
MCP App Store


AI Model Leaderboard

Compare AI models across agentic, reasoning, coding, and tool-use benchmarks.

Agentic

GAIA · WebArena · WebArena-Infinity · τ³-bench · TheAgentCompany · VisualWebArena · HAL · METR Time Horizon · BrowseComp · BrowseComp (w/ Context Manage) · MCP-Atlas (Public Set) · Vending Bench 2

Categories: Agentic · Reasoning · Coding · Tool Use · Computer Use


WebArena

WebArena is a self-hosted web environment that simulates real websites (shopping, Reddit, GitLab). Agents navigate pages, fill in forms, and complete goals across 812 tasks.

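Agentic · 8 models · metric: % success

1. DeepSeek-V3.2 (671B parameters): 74.30% success
2. Gemini 3 Pro (parameters undisclosed): 69.60% success
3. Claude Sonnet 4.6 (parameters undisclosed): score not shown
4. GPT-4o (parameters undisclosed): 54.60% success
5. GPT-4.1 (parameters undisclosed): 20.20% success
6. Llama-3.1-70B-Instruct (70B parameters): 7.02% success
7. Llama-3.1-8B-Instruct (8B parameters): 3.32% success
8. Mixtral-8x7B-Instruct-v0.1 (47B parameters): score not shown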


© 2026 MCP App Store

All rights reserved.