
from claude-superskills23
Create, improve, and evaluate Agent Skills with a guided workflow: capture intent, draft SKILL.md, run evals and benchmarks, and optimize the triggering description.
A complete authoring and evaluation workflow for Agent Skills. Guides an author through interviewing the user, drafting SKILL.md content, creating test cases, running with-skill and baseline evaluations, grading results, and producing a reviewer report and benchmark. Also includes tools for iterating on descriptions to improve triggering accuracy.
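The with-skill versus baseline comparison can be summarized as a pass-rate delta. The sketch below is a minimal illustration, not the skill's actual grading script. The RunResult record, its fields, and the benchmark function are hypothetical names, and the sketch assumes each run has already been graded into a count of passed assertions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One graded run of a test case (hypothetical schema, not the skill's own)."""
    case_id: str
    with_skill: bool  # True = run with the skill loaded, False = baseline run
    passed: int       # assertions that passed during grading
    total: int        # assertions checked during grading


def pass_rate(run: RunResult) -> float:
    # Guard against runs with no assertions so they count as 0.0, not a crash.
    return run.passed / run.total if run.total else 0.0


def benchmark(runs: list[RunResult]) -> dict[str, float]:
    """Average pass rate per configuration plus the with-skill minus baseline delta."""
    skill = [pass_rate(r) for r in runs if r.with_skill]
    base = [pass_rate(r) for r in runs if not r.with_skill]
    skill_avg = mean(skill) if skill else 0.0
    base_avg = mean(base) if base else 0.0
    return {"with_skill": skill_avg, "baseline": base_avg, "delta": skill_avg - base_avg}
```

A positive delta suggests the skill helps on the test set; a delta near zero suggests the baseline model already handles those cases and the tests may not be exercising the skill.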
Use this skill when you need to create a new Agent Skill from a user conversation, improve an existing SKILL.md, run repeatable evals and benchmarks, or optimize a skill's frontmatter and triggers for better activation. Useful when you want structured test cases, reproducible grading, and an HTML reviewer for human feedback.
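Optimizing triggers comes down to measuring how often the skill activates when it should and stays inactive when it should not. A minimal sketch, assuming a hypothetical set of eval queries labeled with whether the skill should trigger and an observed did_trigger flag taken from real agent runs; TriggerCase, trigger_metrics, and best_description are illustrative names, not the skill's API.

```python
from dataclasses import dataclass


@dataclass
class TriggerCase:
    """An eval query labeled with expected and observed activation (hypothetical schema)."""
    query: str
    should_trigger: bool  # label: should this query activate the skill?
    did_trigger: bool     # observation: did the agent actually load the skill?


def trigger_metrics(cases: list[TriggerCase]) -> dict[str, float]:
    """Accuracy, precision, and recall of skill activation over labeled queries."""
    tp = sum(c.should_trigger and c.did_trigger for c in cases)
    fp = sum(not c.should_trigger and c.did_trigger for c in cases)
    fn = sum(c.should_trigger and not c.did_trigger for c in cases)
    correct = sum(c.should_trigger == c.did_trigger for c in cases)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(cases) if cases else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


def best_description(results: dict[str, list[TriggerCase]]) -> str:
    """Pick the candidate description whose eval runs scored the highest accuracy."""
    return max(results, key=lambda desc: trigger_metrics(results[desc])["accuracy"])
```

Tracking precision and recall separately shows the direction of a description's error: low precision means it triggers too eagerly, low recall means it is too narrow.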
Best suited to agents that support subagents and script execution (Claude Code, other CLI-capable agents, or environments that can run Python).
Comprehensive skill-creator workflow for building, evaluating, and iterating on agent skills. Includes an eval framework with A/B testing, a description optimization loop, and HTML report generation. The scripts are well structured, but most fail outside the repository context because of module path assumptions (from scripts.X import Y) and a missing anthropic dependency. Only utils.py runs cleanly standalone.
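One common workaround for the import problem, assuming the scripts live in a scripts/ package at the repository root as the from scripts.X import Y form suggests, is to put that root on sys.path before those imports run. The helper below is a hypothetical sketch, not part of the skill.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(start: Path, package: str = "scripts") -> Path:
    """Walk up from `start` to the first directory that contains `package/`
    and prepend it to sys.path so `from scripts.X import Y` can resolve.
    Hypothetical helper; raises FileNotFoundError if no such directory exists."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / package).is_dir():
            root = str(candidate)
            if root not in sys.path:
                sys.path.insert(0, root)  # search the repo root before site-packages
            return candidate
    raise FileNotFoundError(f"no parent of {start} contains a '{package}/' directory")
```

Calling add_repo_root_to_path(Path(__file__).parent) at the top of a script, before its package imports, would let it run from any working directory. Running python -m scripts.<name> from the repository root gets the same effect without code changes. Neither approach supplies the missing anthropic dependency, which still has to be installed separately (for example with pip install anthropic).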
anthropic
Well-designed skill with clear progressive disclosure. SKILL.md is thorough, with good instructions for both creating and improving skills. The eval and benchmark infrastructure is sophisticated. The main issue is that the scripts are not designed to run independently; they assume a repo-root execution context.