Arize Experiment

Trust Score 92/100

from arize-skills (19 stars)

Create, run, and analyze Arize experiments for evaluating and comparing model performance using the ax CLI.

triggers: create experiment, export runs, compare models, ax experiments, run experiment, benchmark models

What it does

Provides step-by-step CLI guidance to create, export, run, and compare Arize experiments. It explains the experiment/run/dataset concepts, how to export datasets and collect runs, and how to run statistical comparisons and exports for further analysis. Concrete workflows and command examples (ax CLI) are included for common tasks such as exporting runs, creating experiments, and piping outputs into analysis tools.
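As a rough illustration of the "export runs, then analyze" workflow described above, the sketch below compares mean scores from two exported run files. It does not reflect the actual ax export schema: the file layout (a JSON list of run records) and the `score` field name are assumptions made for this example, and `load_scores` and `compare_runs` are hypothetical helper names.

```python
import json
from statistics import mean


def load_scores(path, score_key="score"):
    """Read a JSON file containing a list of run records and return their scores.

    Assumption: each record is a dict with a numeric value under `score_key`.
    The real export format may differ; adjust the key and parsing accordingly.
    """
    with open(path, encoding="utf-8") as f:
        runs = json.load(f)
    # Skip records that have no score instead of failing on them.
    return [float(r[score_key]) for r in runs if r.get(score_key) is not None]


def compare_runs(baseline, candidate):
    """Return the mean score of each experiment and the candidate-minus-baseline delta."""
    if not baseline or not candidate:
        raise ValueError("both experiments need at least one scored run")
    base_mean = mean(baseline)
    cand_mean = mean(candidate)
    return {
        "baseline_mean": base_mean,
        "candidate_mean": cand_mean,
        "delta": cand_mean - base_mean,
    }
```

A caller would pass the two score lists, for example `compare_runs(load_scores("baseline.json"), load_scores("candidate.json"))`, where both file names are placeholders. A difference in means is only a first look; a proper comparison would also consider sample size and variance.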

When to use it

Use this skill when you need to evaluate model performance, run A/B model comparisons, export experiment runs for analysis, or automate experiment creation from dataset exports. Trigger when the user asks about creating experiments, exporting runs, comparing models, benchmarking, or measuring accuracy.

What's included

Scripts: none detected in this skill directory
References: included (see references/ for setup and profiles guidance)
Instructions: detailed CLI examples for listing/getting/exporting experiments, creating experiments from run files, and piping exports into analysis commands. Guidance on provider SDKs and running real model API calls is provided.

Compatible agents

Best suited for agents with shell/CLI capabilities and access to the ax CLI and networked model provider SDKs (e.g., Claude Code, Codex, Copilot/CLI-enabled agents).

Audit Summary

CLI-wrapper skill for Arize experiment management via the ax CLI. No bundled scripts — all interaction is through documented shell commands. SKILL.md is thorough with clear concepts, detailed flag tables, practical workflow examples, and a troubleshooting section. Explicitly prohibits credential exfiltration and output fabrication, which is a strong security posture. Niche but well-executed for its target audience of LLM evaluation practitioners using Arize.

Watch Out

Requires the ax CLI installed and an Arize account with API key configured
No scripts to validate — purely reference documentation for CLI commands
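A preflight check along these lines could catch both requirements before any experiment command runs. This is a minimal sketch: the environment variable name `ARIZE_API_KEY` is an assumption (the skill's references describe profile-based setup, which this check does not inspect), and `missing_prerequisites` is a hypothetical helper.

```python
import os
import shutil


def missing_prerequisites(cli_name="ax", key_var="ARIZE_API_KEY", env=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Assumption: the API key is exposed through the environment variable `key_var`.
    Credentials stored in a CLI profile will not be detected by this check.
    """
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(cli_name) is None:
        problems.append(f"{cli_name} not found on PATH")
    if not env.get(key_var):
        problems.append(f"{key_var} is not set")
    return problems
```

The function reports only the variable name, never its value, so the key itself is not echoed into logs.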

Notes

Clean skill with no security concerns. Strong anti-fabrication and anti-credential-exfiltration stance. Well-structured documentation. Usefulness limited by niche audience (Arize platform users doing LLM evaluation), but high quality within that scope.

Information

Repository: arize-skills
Stars: 19

Trust Score

Overall: 92
Security: 100
Code Quality: 85
Architecture: 80
Usefulness: 58
