AI Model Leaderboard

Compare AI models on agentic, reasoning, coding, and tool-use benchmarks.

Categories: Agentic, Reasoning, Coding, Tool Use, Computer Use

Benchmarks

Open LLM Leaderboard v2

Hugging Face's leaderboard for open-source models. It evaluates models on IFEval, BBH, MATH Lvl 5, GPQA Diamond, MuSR, and MMLU-Pro, and is a widely used reference for ranking open-source models.

Reasoning · 7 models · avg score

Rank  Model                        Params  Avg score
1     Qwen2.5-72B-Instruct         72B     47.98
2     Mistral-Large-Instruct-2411  123B    46.52
3     Llama-3.3-70B-Instruct       70B     44.85
4     Qwen2-72B-Instruct           72B     43.59
5     Llama-3.1-70B-Instruct       70B     43.41
6     Qwen2.5-14B-Instruct         14B     41.31
7     Qwen2.5-Coder-32B-Instruct   32B     39.89
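
For readers who want to work with these numbers programmatically, here is a minimal sketch of how the ranking listed above could be reproduced and filtered by model size. The `Entry` type and the `rank` and `best_under` helpers are hypothetical names introduced for illustration only; they are not part of any leaderboard API. The data is copied from the listing above.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration)."""
    model: str
    params_b: int      # parameter count in billions, as listed
    avg_score: float   # average score, as listed


# Rows copied from the listing above.
ENTRIES = [
    Entry("Qwen2.5-72B-Instruct", 72, 47.98),
    Entry("Mistral-Large-Instruct-2411", 123, 46.52),
    Entry("Llama-3.3-70B-Instruct", 70, 44.85),
    Entry("Qwen2-72B-Instruct", 72, 43.59),
    Entry("Llama-3.1-70B-Instruct", 70, 43.41),
    Entry("Qwen2.5-14B-Instruct", 14, 41.31),
    Entry("Qwen2.5-Coder-32B-Instruct", 32, 39.89),
]


def rank(entries: list[Entry]) -> list[tuple[int, Entry]]:
    """Return (rank, entry) pairs ordered by descending average score."""
    ordered = sorted(entries, key=lambda e: e.avg_score, reverse=True)
    return list(enumerate(ordered, start=1))


def best_under(entries: list[Entry], max_params_b: int) -> Optional[Entry]:
    """Return the highest-scoring entry at or below a size limit, if any."""
    eligible = [e for e in entries if e.params_b <= max_params_b]
    return max(eligible, key=lambda e: e.avg_score) if eligible else None


if __name__ == "__main__":
    for position, entry in rank(ENTRIES):
        print(f"{position}. {entry.model} ({entry.params_b}B): {entry.avg_score}")
    print("Best at or under 40B:", best_under(ENTRIES, 40))
```

Sorting on the listed average keeps the sketch independent of how that average is computed from the individual benchmarks, which this page does not specify.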