
from awesome-copilot
Create, run, and analyze Arize experiments to evaluate and compare model performance using the ax CLI.
Provides step-by-step guidance and CLI workflows for creating, exporting, running, and comparing Arize experiments. It covers dataset export, running inference to produce runs, exporting results, and comparing evaluation metrics to benchmark and A/B test models. Includes clear instructions for using the ax CLI to list/get/export experiments and templates for piping experiment exports into inference scripts.
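The "export dataset, run inference, export results" loop can be sketched as a small script that pipes an experiment export into a model call. This is a minimal sketch, not the actual ax export schema: the JSONL format and the `id`/`input` field names are assumptions for illustration, and `model_call` stands in for whatever provider SDK the agent invokes.

```python
import json

def run_inference(example, model_call):
    """Run one dataset example through a model and record a run row.

    Field names ("id", "input", "output") are assumed, not the ax schema.
    """
    output = model_call(example["input"])
    return {"example_id": example.get("id"),
            "input": example["input"],
            "output": output}

def infer_over_export(export_path, runs_path, model_call):
    """Read exported examples (JSONL, one example per line) and
    write one run row per example, also as JSONL."""
    with open(export_path) as src, open(runs_path, "w") as dst:
        for line in src:
            example = json.loads(line)
            dst.write(json.dumps(run_inference(example, model_call)) + "\n")
```

In practice `model_call` would wrap an OpenAI, Anthropic, or Gemini SDK call; the JSONL-in, JSONL-out shape is what makes the export easy to pipe through a shell pipeline.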
Use this skill when you need to evaluate model performance with Arize: creating experiments, exporting runs, running bulk inference over dataset examples, comparing two experiments, or extracting metrics for analysis. Trigger when the user mentions experiments, benchmarks, A/B testing models, model evaluation, exporting runs, or using the ax CLI.
Works with agents that can run shell commands and invoke provider SDKs (OpenAI, Anthropic, Google Gemini, custom OpenAI-compatible proxies).
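The comparison step above can be sketched as a metric diff over two exported run sets. This is a hypothetical helper under stated assumptions: each run row is a dict carrying a numeric evaluation field (here called `score`, which is an assumed name, not an ax field), and the A/B comparison is a simple difference of means.

```python
def compare_experiments(runs_a, runs_b, metric="score"):
    """Compare the mean value of `metric` between two experiments' run rows.

    Rows missing the metric are skipped; returns per-experiment means
    and the B-minus-A delta for a quick A/B read.
    """
    def mean(rows):
        vals = [r[metric] for r in rows if metric in r]
        return sum(vals) / len(vals) if vals else float("nan")

    a_mean, b_mean = mean(runs_a), mean(runs_b)
    return {"a_mean": a_mean, "b_mean": b_mean, "delta": b_mean - a_mean}
```

A positive `delta` means experiment B scored higher on the chosen metric; real comparisons would typically also look at per-example wins and losses rather than means alone.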