
from awesome-copilot
Create, run, and analyze Arize experiments to evaluate and compare model performance using the ax CLI.
Provides step-by-step guidance and CLI workflows for creating, exporting, running, and comparing Arize experiments. It covers dataset export, running inference to produce runs, exporting results, and comparing evaluation metrics to benchmark and A/B test models. Includes clear instructions for using the ax CLI to list/get/export experiments and templates for piping experiment exports into inference scripts.
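The "export dataset, run inference, export results" loop can be sketched as a small script that pipes an experiment export into a model call. This is a minimal sketch, not the actual ax export schema: the JSONL format and the `id`/`input` field names are assumptions for illustration, and `model_call` stands in for whatever provider SDK the agent invokes.

```python
import json

def run_inference(example, model_call):
    """Run one dataset example through a model and record a run row.

    Field names ("id", "input", "output") are assumed, not the ax schema.
    """
    output = model_call(example["input"])
    return {"example_id": example.get("id"),
            "input": example["input"],
            "output": output}

def infer_over_export(export_path, runs_path, model_call):
    """Read exported examples (JSONL, one example per line) and
    write one run row per example, also as JSONL."""
    with open(export_path) as src, open(runs_path, "w") as dst:
        for line in src:
            example = json.loads(line)
            dst.write(json.dumps(run_inference(example, model_call)) + "\n")
```

In practice `model_call` would wrap an OpenAI, Anthropic, or Gemini SDK call; the JSONL-in, JSONL-out shape is what makes the export easy to pipe through a shell pipeline.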
Use this skill when you need to evaluate model performance with Arize: creating experiments, exporting runs, running bulk inference over dataset examples, comparing two experiments, or extracting metrics for analysis. Trigger when the user mentions experiments, benchmarks, A/B testing models, model evaluation, exporting runs, or using the ax CLI.
Works with agents that can run shell commands and invoke provider SDKs (OpenAI, Anthropic, Google Gemini, custom OpenAI-compatible proxies).
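The comparison step above can be sketched as a metric diff over two exported run sets. This is a hypothetical helper under stated assumptions: each run row is a dict carrying a numeric evaluation field (here called `score`, which is an assumed name, not an ax field), and the A/B comparison is a simple difference of means.

```python
def compare_experiments(runs_a, runs_b, metric="score"):
    """Compare the mean value of `metric` between two experiments' run rows.

    Rows missing the metric are skipped; returns per-experiment means
    and the B-minus-A delta for a quick A/B read.
    """
    def mean(rows):
        vals = [r[metric] for r in rows if metric in r]
        return sum(vals) / len(vals) if vals else float("nan")

    a_mean, b_mean = mean(runs_a), mean(runs_b)
    return {"a_mean": a_mean, "b_mean": b_mean, "delta": b_mean - a_mean}
```

A positive `delta` means experiment B scored higher on the chosen metric; real comparisons would typically also look at per-example wins and losses rather than means alone.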