AgentEvals

Supports UI

by agentevals-dev

Framework-agnostic AI agent evaluation using OpenTelemetry traces to score performance and inference quality without re-execution.

0 stars

Works in: Claude

Exposes: Tools

What it does

AgentEvals connects to AI agent execution traces via OpenTelemetry (OTel) to provide deterministic scoring of agent behavior. It allows developers to benchmark agents before production by analyzing tool trajectories and response quality from existing traces, eliminating the need for expensive and slow re-runs.
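As a rough illustration of scoring from existing traces, the sketch below compares the ordered tool calls recorded in a trace against a golden reference without re-running the agent. Everything in it is hypothetical: the simplified span dictionaries, the `tool.name` attribute key, and the `tool_trajectory` and `trajectory_score` helpers are assumptions made for this example, not AgentEvals' actual metrics or trace schema.

```python
from typing import Any


def tool_trajectory(spans: list[dict[str, Any]]) -> list[str]:
    """Return tool names in start-time order from simplified span dicts."""
    # Hypothetical convention: a span is a tool call if it has a "tool.name" attribute.
    tool_spans = [s for s in spans if "tool.name" in s.get("attributes", {})]
    tool_spans.sort(key=lambda s: s["start_time"])
    return [s["attributes"]["tool.name"] for s in tool_spans]


def trajectory_score(actual: list[str], expected: list[str]) -> float:
    """Fraction of positions where actual matches expected; 1.0 is an exact match."""
    if not actual and not expected:
        return 1.0
    matches = sum(a == e for a, e in zip(actual, expected))
    # Dividing by the longer length penalizes both missing and extra calls.
    return matches / max(len(actual), len(expected))


# Illustrative spans, deliberately out of order and with one non-tool span.
spans = [
    {"start_time": 2, "attributes": {"tool.name": "fetch_page"}},
    {"start_time": 1, "attributes": {"tool.name": "search"}},
    {"start_time": 3, "attributes": {"llm.model": "example"}},
]
actual = tool_trajectory(spans)
score = trajectory_score(actual, ["search", "fetch_page", "summarize"])
print(actual, round(score, 2))
```

Because the score depends only on the recorded spans and the reference list, the same trace always produces the same result, which is the sense in which trace-based scoring can be deterministic.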

Tools

list_metrics: Displays all available built-in and community evaluation metrics.
evaluate_traces: Processes local OTLP or Jaeger trace files to generate scores.
list_sessions: Lists active streaming sessions for real-time evaluation.
summarize_session: Provides a structured summary of an agent session's tool calls.
evaluate_sessions: Scores live sessions against a defined golden reference set.

Installation

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "agentevals": {
      "command": "agentevals",
      "args": ["mcp"]
    }
  }
}
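If claude_desktop_config.json already lists other servers, the entry above needs to be merged in rather than pasted over the file. A minimal sketch of that merge follows; the helper names `add_agentevals` and `update_config_file` are hypothetical, and the location of the config file depends on your operating system, so the path is left as a parameter.

```python
import json
from pathlib import Path

# The server entry exactly as shown in the listing above.
AGENTEVALS_ENTRY = {"command": "agentevals", "args": ["mcp"]}


def add_agentevals(config: dict) -> dict:
    """Return a copy of config with the agentevals server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["agentevals"] = AGENTEVALS_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Rewrite the JSON config at path, creating it if it does not exist."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_agentevals(config), indent=2) + "\n")


# Example with an in-memory config that already has one (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_agentevals(existing), indent=2))
```

The merge works on copies, so the original dictionary is left untouched and any servers already configured are preserved.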

Supported hosts

Claude Desktop
Claude Code

Quick install

pip install agentevals-cli

Information

Pricing: free
Published: 6/18/2026
Stars: 0

Related Apps

FinanceToolkit

MCP Server

Professional-grade financial analysis toolkit for equities, options, and risk management.

DiffSitter MCP

MCP Server

AI-powered structural code navigation using tree-sitter ASTs for semantic understanding across 14+ languages.

OpenAI Apps SDK Examples

MCP App

Official example gallery of interactive MCP widgets for ChatGPT — 3D viewers, maps, carousels, shopping carts, and more.

Human MCP

MCP Server

Give AI agents human-like senses: visual analysis, image/video generation, speech synthesis, browser automation, and advanced reasoning — 29 MCP tools in one se

Containarium

MCP Server

Self-hostable agent runtime with SSH-native isolation, eBPF egress policy, and MCP-native CLI.

Shopify MCP Server

MCP Server

Direct interaction with Shopify store data via GraphQL API for managing products, customers, and orders.

Git MCP Server

MCP Server

Full-featured Git MCP server exposing 28 tools for AI agents to clone, commit, branch, diff, merge, rebase, and more via STDIO or Streamable HTTP.

CodexPotter

MCP Server

Autonomous reconciliation loop that drives Codex to align your codebase with instructed states.
