MCP App Store

AI Model Leaderboard

Compare AI models across agentic, reasoning, coding, and tool-use benchmarks.

Reasoning

Benchmarks in this category: Humanity's Last Exam · LiveBench · Open LLM Leaderboard v2 · HLE (w/ Tools) · AIME 2026 · HMMT Nov. 2025 · HMMT Feb. 2026 · IMOAnswerBench · GPQA-Diamond

GPQA-Diamond

Graduate-Level Google-Proof Q&A Diamond set

Reasoning · 13 models · scores in percent

Model · Parameters · Score (percent)
Gemini 3.1 Pro Preview · undisclosed · score not shown
GPT-5.4 · undisclosed · score not shown
Claude Opus 4.6 · undisclosed · score not shown
Qwen3.6-Plus · undisclosed · score not shown
MiniMax-M2.7 · not listed · score not shown
gemma-4-31b · 30.7B · 84.30
DeepSeek-V3.2 · 671B · 82.40
gemma-4-26b-a4b · 25.2B total / 3.8B active · 82.30
gemma-4-e4b · 4.5B effective / 8B total · 58.60
gemma-4-e2b · 2.3B effective / 5.1B total · score not shown

© 2026 MCP App Store

All rights reserved.