
from my_arxiv_daily183
Create, iterate on, and evaluate Agent Skills: draft SKILL.md, run test prompts, collect quantitative and qualitative feedback, and optimize skill descriptions
A comprehensive authoring and evaluation workflow for building Agent Skills. Provides step-by-step guidance to capture intent, draft SKILL.md, devise and run test cases (pairing each with-skill run with a baseline run when subagents are available), grade results, aggregate benchmarks, and iterate until the skill meets quality expectations. Also supports description optimization to improve trigger accuracy.
Use this skill when you need to: create a new Agent Skill from scratch; improve an existing SKILL.md; design and run test cases and evaluations; benchmark skill performance; or optimize the frontmatter description so the platform triggers the skill reliably. Ideal for authors, reviewers, and maintainers who want repeatable iterations.
Best used with platforms that provide subagents and a runner (e.g., Cowork/Claude Code or systems that support spawning parallel runs). Without subagents it still works in a degraded single-agent mode, running tests serially.
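As a rough illustration of the "grade results, aggregate benchmarks" step, the sketch below compares pass rates for with-skill and baseline runs. It is not the skill's actual implementation: the RunResult record, the variant labels "with_skill" and "baseline", and the summarize helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One graded test run (hypothetical record shape, not the skill's own format)."""
    test_id: str
    variant: str   # assumed labels: "with_skill" or "baseline"
    passed: bool


def pass_rate(results, variant):
    """Fraction of runs of the given variant that passed; 0.0 if there are none."""
    graded = [r.passed for r in results if r.variant == variant]
    return sum(graded) / len(graded) if graded else 0.0


def summarize(results):
    """Aggregate pass rates per variant and the improvement over baseline."""
    with_skill = pass_rate(results, "with_skill")
    baseline = pass_rate(results, "baseline")
    return {
        "with_skill": with_skill,
        "baseline": baseline,
        "delta": with_skill - baseline,
    }


if __name__ == "__main__":
    # Illustrative data only, not real benchmark results.
    demo = [
        RunResult("t1", "with_skill", True),
        RunResult("t1", "baseline", False),
        RunResult("t2", "with_skill", True),
        RunResult("t2", "baseline", True),
    ]
    print(summarize(demo))
```

A positive delta suggests the skill helps on the chosen test prompts. Qualitative feedback would still need to be reviewed separately, since a pass/fail flag does not capture it.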
This skill has not been reviewed by our automated audit pipeline yet.