
MCP App Store


AI Model Leaderboard

Compare AI models across agentic, reasoning, coding, and tool-use benchmarks.

Agentic

GAIA, WebArena, WebArena-Infinity, τ³-bench, TheAgentCompany, VisualWebArena, HAL, METR Time Horizon, BrowseComp, BrowseComp (w/ Context Manage), MCP-Atlas (Public Set), Vending Bench 2


τ³-bench

Multi-turn customer service agent eval across retail, airline, and banking domains. Tests policy adherence and reliability over repeated trials.

Agentic · 13 models · pass@1

Qwen3.6-Plus · Undisclosed
Claude Opus 4.6 · Undisclosed
GPT-5.4 · Undisclosed
DeepSeek-V3.2 · 671B
Qwen3.5-397B-A17B · 397B
Gemini 3 Flash · Undisclosed
MiniMax-M2.7 · M2.7
Gemini 3.1 Pro Preview · Undisclosed
Gemini 3 Pro · Undisclosed
Claude Sonnet 4.6 · Undisclosed
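
The leaderboard reports pass@1, and the benchmark description mentions reliability over repeated trials. As a rough illustration of how such numbers can be derived from per-trial outcomes, here is a minimal sketch. It assumes each task is run several times and that each trial is scored as a simple success or failure. The function names, the task ids, and the sample outcomes are all hypothetical and are not taken from the benchmark or from this site. The second function uses a common combinatorial estimator for "all k trials succeed", which may differ from how this leaderboard computes its scores.

```python
from math import comb


def pass_at_1(results):
    """Average single-trial success rate across tasks.

    `results` maps a task id to a list of booleans, one per trial.
    """
    per_task = [sum(trials) / len(trials) for trials in results.values()]
    return sum(per_task) / len(per_task)


def all_k_pass_rate(results, k):
    """Estimated chance that k trials of the same task ALL succeed.

    For a task with n trials and c successes, the estimate is
    C(c, k) / C(n, k), averaged over tasks. Requires k <= n for every task.
    """
    per_task = []
    for task_id, trials in results.items():
        n, c = len(trials), sum(trials)
        if k > n:
            raise ValueError(f"task {task_id!r} has only {n} trials, need {k}")
        per_task.append(comb(c, k) / comb(n, k))
    return sum(per_task) / len(per_task)


# Hypothetical outcomes: three made-up tasks, four trials each.
results = {
    "retail-001": [True, True, True, True],
    "airline-007": [True, False, True, True],
    "banking-003": [False, False, True, False],
}

print(f"pass@1: {pass_at_1(results):.3f}")
print(f"all 2 trials pass: {all_k_pass_rate(results, 2):.3f}")
```

With k = 1 the second function reduces to pass@1. Larger k penalizes models that succeed only intermittently on a task, which is one way to express reliability as a single number.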


© 2026 MCP App Store

All rights reserved.