
from arize-skills
Create, run, and analyze Arize experiments for evaluating and comparing model performance using the ax CLI.
Provides step-by-step guidance for creating, exporting, running, and comparing Arize experiments with the ax CLI. It explains the experiment, run, and dataset concepts, how to export datasets and collect runs, and how to run statistical comparisons and export results for further analysis. It includes concrete workflows and ax CLI command examples for common tasks such as exporting runs, creating experiments, and piping outputs into analysis tools.
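As a rough illustration of the "statistical comparison" step once runs have been exported, the sketch below compares two sets of per-example scores with a paired sign-flip permutation test, using only the Python standard library. The record layout (`example_id`, `score`) and the sample data are hypothetical and are not taken from the ax CLI export format; adapt the field names to whatever the real export contains.

```python
import random
from statistics import mean


def paired_differences(runs_a, runs_b, key="example_id", field="score"):
    """Per-example score differences (B minus A) for examples present in both runs.

    `key` and `field` are hypothetical field names, not the ax export schema.
    """
    scores_a = {r[key]: r[field] for r in runs_a}
    return [r[field] - scores_a[r[key]] for r in runs_b if r[key] in scores_a]


def sign_flip_p_value(diffs, n_resamples=2000, seed=0):
    """Two-sided paired permutation test: randomly flip the sign of each difference."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        resampled = mean([d if rng.random() < 0.5 else -d for d in diffs])
        if abs(resampled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up scores standing in for two exported experiments.
baseline = [{"example_id": i, "score": s}
            for i, s in enumerate([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])]
candidate = [{"example_id": i, "score": s}
             for i, s in enumerate([1.0, 1.0, 1.0, 1.0, 0.0, 1.0])]

diffs = paired_differences(baseline, candidate)
p_value = sign_flip_p_value(diffs)
print(f"mean improvement: {mean(diffs):.3f}, p-value: {p_value:.3f}")
```

A paired test is used here on the assumption that both experiments were run over the same dataset examples, so each difference compares like with like. With only a handful of examples, treat the p-value as indicative at best.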
Use this skill when you need to evaluate model performance, run A/B model comparisons, export experiment runs for analysis, or automate experiment creation from dataset exports. Trigger when the user asks about creating experiments, exporting runs, comparing models, benchmarking, or measuring accuracy.
Best suited for agents with shell/CLI capabilities (e.g., Claude Code, Codex, Copilot/CLI-enabled agents) that have access to the ax CLI and networked model provider SDKs.
CLI-wrapper skill for Arize experiment management via the ax CLI. No bundled scripts; all interaction is through documented shell commands. SKILL.md is thorough, with clear concepts, detailed flag tables, practical workflow examples, and a troubleshooting section. It explicitly prohibits credential exfiltration and output fabrication, which is a strong security posture. Niche but well executed for its target audience of LLM evaluation practitioners using Arize.
Clean skill with no security concerns. Strong anti-fabrication and anti-credential-exfiltration stance. Well-structured documentation. Usefulness limited by niche audience (Arize platform users doing LLM evaluation), but high quality within that scope.