This repository template defines a benchmark for evaluating spam-detection models by generating multi-tweet spam campaign examples across six categories (crypto scam, pharmacy spam, romance scam, malware links, follower farms, engagement bait). It includes anchor examples and validators for category coverage and length.
Intended for internal AI safety evaluation and red-teaming by safety researchers in controlled environments, to measure classifier recall across adversarial campaign patterns. NOT suitable for direct ingestion or use by general-purpose or open/public agents, because the template intentionally includes harmful anchor content designed to emulate real spam campaigns.
Red-team/adversarial benchmark skill designed to evaluate spam-detection models by generating multi-tweet spam campaigns across 6 categories. No scripts included; static analysis only. The skill explicitly aims to keep output in the 'harmful domain' and notes that Claude refuses this content when asked directly, which implies it is designed to test safety bypass. The structure is clear, but the purpose is adversarial safety testing with limited general usefulness.
This is an ISC-Bench safety benchmark template; its purpose is adversarial evaluation of spam classifiers, not direct harm. While it instructs generation of harmful content, it does so within the context of AI safety research. The security score is reduced for the adversarial/safety-bypass nature, but the template is not flagged as malicious because it is a legitimate benchmark tool.