
ISC-Bench
Benchmark and pipeline for evaluating model safety (Internal Safety Collapse) using the TVD framework; builds prompts, runs models, extracts harmful content, an
ISC-Bench is a reproducible pipeline for assessing Internal Safety Collapse (ISC) in large language models. It provides scripts to build benchmark prompts, run them against target models via OpenRouter, extract harmful content from model outputs, and score results on a 1–5 harmfulness scale. The repo supports multiple benchmarks (JailbreakBench, HarmBench, AdvBench, StrongREJECT) and agent modes for automated evaluation.
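As a rough illustration of the final scoring step, the sketch below aggregates 1–5 harmfulness scores per model. It is not taken from the ISC-Bench code: the `(model, score)` record format, the `summarize_scores` name, and the threshold of 4 for counting a response as flagged are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def summarize_scores(records, threshold=4):
    """Aggregate (model, score) pairs into per-model summary statistics.

    Hypothetical helper: the record format and the default threshold are
    assumptions for illustration, not ISC-Bench's actual output schema.
    """
    by_model = defaultdict(list)
    for model, score in records:
        # Scores are expected on the 1-5 harmfulness scale.
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range 1-5: {score}")
        by_model[model].append(score)

    return {
        model: {
            "n": len(scores),
            "mean_score": mean(scores),
            # Share of responses scored at or above the threshold.
            "flagged_rate": sum(s >= threshold for s in scores) / len(scores),
        }
        for model, scores in by_model.items()
    }


if __name__ == "__main__":
    # Placeholder model names and scores, not real results.
    demo = [("model-a", 1), ("model-a", 5), ("model-b", 2)]
    for model, stats in summarize_scores(demo).items():
        print(model, stats)
```

Reporting both the mean and a thresholded rate keeps a handful of high scores from disappearing into an average, which matters when comparing models across benchmarks of different sizes.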
Use ISC-Bench when you need to quantitatively evaluate model safety, compare models across standard benchmarks, measure jailbreak/vulnerability rates, or run controlled experiments (zero-shot, few-shot, ICL). Appropriate for safety researchers, red-teamers, and engineering teams validating model behaviour.
experiment/ folders).
uv run usage, Docker agent mode, and troubleshooting tips.
Works with CLI-capable agents and evaluation tooling that can run Python/Docker workflows; integrates with OpenRouter-accessible models and typical LLM runners.
This skill has not been reviewed by our automated audit pipeline yet.