Agent Skills

arize, experiments, evaluation

Arize Experiment

awesome-copilot

Create, run, and analyze Arize experiments to evaluate and compare model performance using the ax CLI.

34,827

observability, opentelemetry, collector

OpenTelemetry Skill — Observability Engineering Assistant

opentelemetry-skill

Expert OpenTelemetry observability skill for designing collectors, pipelines, sampling, cardinality management, security, and production-ready deployment patter

Skill Optimizer

opencode-skills-collection

Diagnose and optimize Agent Skills (SKILL.md) using session transcripts and static analysis to improve triggers, workflows, and token efficiency.

quality, audit, analysis

growth, experiments, a/b-testing

Swarma

swarma

Framework for running high-velocity growth experiments with agent teams that generate hypotheses, run tests, observe signals, and build a validated playbook.

174

evaluation, omnidocbench, document-parsing

OmniDocBench Evaluation Helper

opendatalab

Run, validate, and parse OmniDocBench document parsing evaluations with Docker/conda workflows and result parsing.

dbt, semantic-layer, analytics

Answering Natural Language Questions with dbt

skillshub

Use dbt's semantic layer or compiled SQL to answer business data questions: list metrics, modify compiled SQL when needed, and fall back to manifest analysis if

code-quality, metrics, complexity

Code Quality Metrics Guide

dotfiles-claude

Practical guide to quantitative code-quality metrics (cyclomatic, cognitive, Halstead, maintainability index) with thresholds, formulas and measurement commands

6 triggers

Body Composition Analyzer

vitaclaw

Analyze and track body composition metrics (body fat, muscle mass, visceral fat, BMI) and provide evidence-based recommendations and trend analysis.

health, body-composition, metrics

mlflow, mlops, experiment-tracking

Track ML Experiments (MLflow)

agent-almanac

Set up MLflow experiment tracking: server, autologging, artifact storage, run comparison and lifecycle management for reproducible ML workflows.

6 triggers

KB Stats (Knowledge Base Statistics)

ai-first-sdlc-practices

Generates a markdown dashboard of knowledge-base health: inventory, layer/domain distribution, recent activity, and staleness metrics.

knowledge-base, kb, dashboard

metrics, metricflow, data-modeling

MetricFlow: Interactive Metric Definition

datus-agent

Interactively define MetricFlow metrics from natural-language business descriptions; proposes, validates, and dry-runs metric YAML for semantic modeling.

1,161

Metrics Report (codebase test & coverage analyzer)

ai-playground

Scans a repository, discovers and runs tests, computes coverage, evaluates test quality, and generates a metrics report website and JSON output.

testing, coverage, ci

LLM Evaluation

claude-plugins

Evaluation framework and tools for systematically measuring LLM performance using automated metrics, human judgment, and A/B testing.

evaluation, llm, metrics

SaaS Growth & Metrics Playbook

skills

Practical, cross‑language playbook to design and scale SaaS growth (PLG vs SLG), pricing, retention, and key metrics (ARR, NRR, CAC, LTV) with benchmarks and ca

saas, growth, plg

4,473

observability, telemetry, opentelemetry

Maverick Observability Best Practices

maverick

Practical observability standards: metrics, tracing, health checks, SLIs/SLOs and dashboards for production services.

rust, opentelemetry, observability

Maple Rust Observability Style

maple

Implements OpenTelemetry observability in Rust using Maple's HTTP exporter and tracing-opentelemetry bridge.

1,527

4 triggers

Datadog Observability

claude-mpm-skills

Guides setup and best practices for Datadog APM, logs, metrics, synthetics and RUM to implement full-stack monitoring, tracing, and cost optimization in product

observability, monitoring, apm

OpenJudge — LLM Evaluation Pipeline

openjudge

Tools and patterns to build automated evaluation pipelines for LLMs: graders, runners, aggregators, and analysis utilities for comparing model outputs and scori

evaluation, grader, llm

639

triage, graph-analysis, backlog

Beads BV — Graph-Aware Triage

boshu2

Graph-based backlog triage using bv/br: rank priorities, find bottlenecks, and produce machine-readable recommendations for agents (robot mode).

6 triggers

Vigil — Incident Response

claude-code-plugins-plus-skills

Incident response guidance for diagnosing production issues: gather symptoms, read logs, check metrics, trace requests, identify root cause, and recommend fixes

incident-response, observability, devops

2,514

metrics, kpi, product-management

Outcome KPI Framework

nwave

A practical framework for writing measurable outcome-focused KPIs for features and user stories, mapping outcomes to leading indicators and measurement plans.

564

agent, optimization, prompt-engineering

Agent Performance Optimization

agent-skills-hub

A structured workflow for improving agent performance via metrics, prompt engineering, testing (A/B), and safe staged rollouts.

Powertools for AWS Lambda

powertools-mcp

Guidance and utilities for building AWS Lambda functions with best practices: structured logging, X-Ray tracing, EMF metrics, idempotency and batch processing a

aws, lambda, serverless