AEC Bench Experiment Configurator

from aec-bench (23 stars)

Interactively build and validate experiment.yaml configurations for AI benchmarks, including task and agent selection.

triggers: configure experiment, set up an experiment, plan a benchmark, create experiment.yaml

What it does

This skill provides a structured, interactive workflow for creating experiment.yaml configuration files for the AEC Bench tool. It walks the user through selecting tasks, choosing AI agents and models, and defining execution settings, so that the resulting configuration describes a valid benchmark run.

When to use it

Use this skill when a user wants to start a new benchmark experiment, modify an existing configuration, or plan a trial run with a dry run preview.

What's included

References: Includes manifest schemas and agent-provider matrices to ensure compatibility.
Instructions: A multi-step process covering context detection (checking aec-bench.toml), task selection via datasets or disk scanning, agent configuration (including model selection from a compatibility matrix), and execution settings.
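
The context-detection and compatibility steps described above might look roughly like the following sketch. It is illustrative only: the contents of the compatibility matrix, the agent and model names, and the `tasks`/`agents` keys of the config dictionary are all assumptions made for this example, not the skill's actual schema. Only the aec-bench.toml filename comes from the description.

```python
from pathlib import Path
from typing import Optional

# Hypothetical agent-to-model compatibility matrix (placeholder names).
# The skill's real matrix ships with its reference files.
COMPATIBILITY = {
    "example-agent": {"example-model-a", "example-model-b"},
}


def find_project_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory holding aec-bench.toml is found."""
    for directory in [start, *start.parents]:
        if (directory / "aec-bench.toml").is_file():
            return directory
    return None


def validate_experiment(config: dict) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    # Assumed key: "tasks" holds the selected task identifiers.
    if not config.get("tasks"):
        problems.append("no tasks selected")
    # Assumed key: "agents" holds {"agent": ..., "model": ...} entries.
    for entry in config.get("agents", []):
        agent, model = entry.get("agent"), entry.get("model")
        if agent not in COMPATIBILITY:
            problems.append(f"unknown agent: {agent}")
        elif model not in COMPATIBILITY[agent]:
            problems.append(f"model {model} is not listed for agent {agent}")
    return problems
```

Collecting every problem into a list, rather than raising on the first one, suits an interactive workflow because the user can see and fix all issues in a single pass before anything is written to experiment.yaml.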

Compatible agents

Designed for agents capable of executing shell commands and reading/writing YAML files, such as Claude Code or similar autonomous IDE agents.

Not yet audited

This skill has not been reviewed by our automated audit pipeline yet.

Information

Repository: aec-bench
Stars: 23

Related Skills

LLM Evaluation

Evaluation framework and tools for systematically measuring LLM performance using automated metrics, human judgment, and A/B testing.

ROCm Triton Kernels (RMSNorm, RoPE 3D, GEGLU, AdaLN)

Guidance and examples for writing, benchmarking, and integrating optimized Triton kernels on ROCm (MI355X, R9700) for diffusers and transformers workloads.

Skill Creator

Create, improve, and evaluate Agent Skills with a guided workflow: capture intent, draft SKILL.md, run evals and benchmarks, and optimize triggering description

Performance Optimizer

Guides profiling and targeted optimizations for code and systems — measure, identify bottlenecks, and verify improvements across Python, Node, shell, and system

Hugging Face Evaluation Manager

Extract, import, and add structured model evaluation results to Hugging Face model cards; run or import benchmark evaluations and generate model-index YAML for

LLM Council

Run parallel queries across multiple LLMs with a live dashboard to compare outputs, synthesize consensus, and perform anonymous model voting.

TAO Performance Audit

Structured performance-audit methodology: measure, identify bottlenecks, optimize the true hotspot, and verify improvements with benchmarks.

Run Benchmarks

Launch, manage, and rerun CodeScaleBench benchmark suites with safety guardrails, paired baseline+full execution, and orchestration utilities.
