SKILL.md packages that extend Claude Code, Cursor, Copilot, and other AI agents.

kernels
Guidance and examples for writing, benchmarking, and integrating optimized Triton kernels on ROCm (MI355X, R9700) for diffusers and transformers workloads.

codescalebench
Launch, manage, and rerun CodeScaleBench benchmark suites with safety guardrails, paired baseline+full execution, and orchestration utilities.

claude-superskills
Create, improve, and evaluate Agent Skills with a guided workflow: capture intent, draft SKILL.md, run evals and benchmarks, and optimize triggering description

dotfiles
Guides profiling and targeted optimizations for code and systems — measure, identify bottlenecks, and verify improvements across Python, Node, shell, and system

claude-plugins
Evaluation framework and tools for systematically measuring LLM performance using automated metrics, human judgment, and A/B testing.

skillattack
Extract, import, and add structured model evaluation results to Hugging Face model cards; run or import benchmark evaluations and generate model-index YAML for

opencode-skills-collection
Profile, analyze, and optimize Python applications for CPU and memory efficiency using profiling tools and performance best practices.

tao
Structured performance-audit methodology: measure, identify bottlenecks, optimize the true hotspot, and verify improvements with benchmarks.

gstack-ko
Run automated performance baselines and regression detection for web pages (TTFB, FCP, LCP, bundle sizes, requests) and compare against historical baselines.

jiuwenswarm
Drive the skvm CLI to profile models, AOT-compile skills, run single-task executions and benchmarks, and manage compilation/JIT proposals via safe CLI workflows

civic-analytics-agent-workflow-claude-skill
A master workflow for city policy analysis and civic innovation: frames problems, runs evidence-based analysis, crafts communications, benchmarks across cities,

ai-rig
Runs synthetic benchmarks and calibration tests for agents and skills: measures recall, precision, confidence calibration, and A/B comparisons to quantify instr

skill-creator-claw
Create, test, and iteratively improve OpenClaw skills; includes eval workflows, test-case guidance, and packaging tools.

ide-agent-kit
Competitive puzzle arena API for AI agents: timed puzzles, per-model leaderboards, puzzle creation and moderation.

qec-autoresearch-skills
Guidance for selecting quantum error-correction decoder backends based on artifact shape, code family, noise model, and validation goals.

gstack
Measure and detect performance regressions for web pages using automated benchmarks, baselines, and trend analysis.

awesome-omni-skill
Guided workflow for drafting, testing, and iterating Agent Skills: write SKILL.md, run evals, grade outputs, and improve descriptions to improve triggering accu

awesome-copilot
Create, run, and analyze Arize experiments to evaluate and compare model performance using the ax CLI.

arize-skills
Create, run, and analyze Arize experiments for evaluating and comparing model performance using the ax CLI.

claude-skill-registry
Guidance and patterns for Python parallelism and GPU/CPU performance: threading vs multiprocessing vs asyncio, CUDA streams, PyTorch DDP, and benchmarking.

ostack-saas
Automated performance benchmarking and regression detection: captures baselines, measures Core Web Vitals, and compares metrics across PRs to flag regressions.

stella
Guides creation, testing, and iterative improvement of Agent Skills (SKILL.md) including running evals, generating benchmarks, and packaging skill bundles.