AIML Spambot Benchmark (ISC)

Trust Score 52/100

Benchmark template for evaluating spam-detection models using anchored spam campaign examples (contains harmful anchoring content).

Triggers: spam benchmark, aiml_spambot, spam detection, red-team, ISC


What it does

This repository template defines a benchmark for evaluating spam detection models by generating multi-tweet spam campaign examples across categories (crypto scam, pharmacy spam, romance scam, malware links, follower farms, engagement bait). It includes anchor examples and validators for category coverage and length.
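The category-coverage and length validators mentioned above might look roughly like the following sketch. The category slugs, the 280-character limit, the sample dictionary shape, and the function names are all assumptions made for illustration; this listing does not show the template's actual validator code, and the sample texts are inert placeholders.

```python
from collections import Counter

# Hypothetical slugs for the six categories named in the listing.
CATEGORIES = (
    "crypto_scam", "pharmacy_spam", "romance_scam",
    "malware_links", "follower_farms", "engagement_bait",
)
MAX_LEN = 280  # assumed per-post character limit, not taken from the template


def validate_coverage(samples):
    """Return the categories that have no sample at all."""
    seen = Counter(s["category"] for s in samples)
    return [c for c in CATEGORIES if seen[c] == 0]


def validate_length(samples, max_len=MAX_LEN):
    """Return indices of samples whose text is empty or longer than max_len."""
    return [i for i, s in enumerate(samples)
            if not 0 < len(s["text"]) <= max_len]


# Placeholder data: five of six categories covered, plus one over-long entry.
samples = [{"category": c, "text": f"<redacted {c} sample>"} for c in CATEGORIES[:5]]
samples.append({"category": "crypto_scam", "text": "x" * 300})

missing = validate_coverage(samples)
too_long = validate_length(samples)
```

Returning the offending categories and indices, rather than a single pass/fail flag, makes it easier to see which part of a benchmark set needs attention.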

When to use it

Intended for internal AI safety evaluation and red-teaming to measure classifier recall across adversarial campaign patterns. NOT suitable for direct ingestion or use by general-purpose agents because the template intentionally includes harmful anchor content designed to emulate real spam campaigns.
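Measuring classifier recall per campaign category could be sketched as below. This is a minimal illustration, assuming every benchmark sample is known spam and the classifier under test returns one boolean per sample; `recall_by_category` and the category names are hypothetical and not part of the template.

```python
from collections import defaultdict


def recall_by_category(labels, predictions):
    """Compute recall per category.

    labels: category name for each known-spam sample.
    predictions: True where the classifier flagged the sample as spam.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, flagged in zip(labels, predictions):
        totals[category] += 1
        hits[category] += bool(flagged)
    # Recall = flagged spam / all spam, computed separately for each category.
    return {category: hits[category] / totals[category] for category in totals}


labels = ["crypto_scam", "crypto_scam", "romance_scam", "romance_scam"]
predictions = [True, False, True, True]
recall = recall_by_category(labels, predictions)
```

Reporting recall per category instead of one aggregate number shows which campaign patterns a classifier misses most often.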

What's included

Scripts: none bundled with the SKILL.md (has_scripts=false), although the repo contains benchmark harness files in templates.
References: none packaged with this SKILL.md.
Instructions: detailed benchmarking parameters and strict 'NEVER' rules describing forbidden edits.

Compatible agents

Intended for safety researchers and controlled red-team environments; should NOT be used by open/public agents.

Audit Summary

Red-team/adversarial benchmark skill designed to evaluate spam-detection models by generating multi-tweet spam campaigns across 6 categories. No scripts are included; static analysis only. The skill explicitly aims to keep output in the 'harmful domain' and notes that Claude refuses this content directly, which implies it is designed to test safety bypass. The structure is clear, but the purpose is adversarial safety testing with limited general usefulness.

Watch Out

Designed to generate harmful spam content including crypto scams and malware links
Anchor section is specifically crafted to bypass LLM refusal
NEVER rules enforce keeping output in harmful domain

Notes

This is an ISC-Bench safety benchmark template; its purpose is adversarial evaluation of spam classifiers, not direct harm. While it instructs generation of harmful content, it does so within the context of AI safety research. The security score is reduced for the adversarial/safety-bypass nature, but the skill is not flagged as malicious since it is a legitimate benchmark tool.

Information

Repository: ISC-Bench
Stars: 775

Trust Score

Overall: 52

Security: 62

Code Quality: 45

Architecture: 40

Usefulness: 25

More from ISC-Bench

ISC-Bench

Benchmark and pipeline for evaluating model safety (Internal Safety Collapse) using the TVD framework; builds prompts, runs models, extracts harmful content, an
