MCP App Store

AI Model Leaderboard

Compare AI models across agentic, reasoning, coding, and tool-use benchmarks.

BFCL v4

Berkeley Function-Calling Leaderboard v4. Tests tool-use accuracy across serial, parallel, multi-turn, and agentic interactions with web search.
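
To make the "% accuracy" figures below concrete, here is a minimal sketch of how a percentage score over function-calling test cases could be computed. It assumes a simple exact match on function name and arguments; the helper `accuracy_percent` and the call format are hypothetical illustrations, not BFCL's actual scoring method, which this page does not describe.

```python
def accuracy_percent(predicted, expected):
    """Percentage of test cases where the predicted call equals the expected call.

    Assumption: each call is a dict such as
    {"name": "get_weather", "arguments": {"city": "Paris"}}
    and a case counts as correct only on an exact match.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    # Dict equality ignores key order, so argument order does not matter.
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return 100.0 * correct / len(expected)


# Hypothetical test cases: three of the four predictions match.
expected_calls = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "search_web", "arguments": {"query": "MCP"}},
    {"name": "add", "arguments": {"a": 1, "b": 2}},
    {"name": "send_email", "arguments": {"to": "a@example.com"}},
]
predicted_calls = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "search_web", "arguments": {"query": "MCP"}},
    {"name": "add", "arguments": {"b": 2, "a": 1}},
    {"name": "send_email", "arguments": {"to": "b@example.com"}},
]

score = accuracy_percent(predicted_calls, expected_calls)
print(f"{score:.2f}% accuracy")
```

Under this simplified reading, a score such as 77.47% would mean that roughly three in four test cases produced the expected call.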

Tool Use · 44 models · % accuracy

Rank · Model · Parameters · Score

1 · Claude Opus 4.6 · Undisclosed · 77.47% accuracy
2 · Claude Sonnet 4.6 · Undisclosed · 73.24% accuracy
3 · [model name not shown] · 72.80% accuracy
4 · Gemini 3 Pro · Undisclosed · 72.51% accuracy
5 · [model name not shown] · 72.38% accuracy
6 · Grok 4.1 Fast · Undisclosed · 69.57% accuracy
7 · Claude Haiku 4.5 · Undisclosed · 68.70% accuracy
8 · [model name not shown] · 63.05% accuracy
9 · Grok 4 · Undisclosed · 62.97% accuracy
10 · Kimi-K2-Instruct · 1T · 59.06% accuracy
11 · DeepSeek-V3.2-Exp · 671B · 56.73% accuracy
12 · DeepSeek-V3.2 · 671B · 56.73% accuracy
13 · Gemini 2.5 Flash · Undisclosed · 56.24% accuracy
14 · GPT-5.4 · Undisclosed · 55.87% accuracy
15 · GPT-5.4 Mini · Undisclosed · 55.46% accuracy
16 · GPT-4.1 · Undisclosed · 53.96% accuracy
17 · o4-mini · Undisclosed · 53.24% accuracy
18 · Qwen3-235B-A22B · 235B · 52.15% accuracy
19 · GPT-5.4 Nano · Undisclosed · 51.45% accuracy
20 · Nanbeige4.1-3B · 3B · 51.40% accuracy
21 · GPT-4.1 Mini · Undisclosed · 50.45% accuracy
22 · [model name not shown] · 48.71% accuracy
23 · [model name not shown] · 48.71% accuracy
24 · Qwen2.5-14B-Instruct · 14B · 42.57% accuracy
25 · [model name not shown] · 42.57% accuracy

© 2026 MCP App Store

All rights reserved.