Compare AI models on agentic, reasoning, coding, and tool-use benchmarks.
agentic
reasoning
coding
tool-use
computer-use
Contamination-free benchmark, updated monthly with fresh questions across reasoning, math, coding, data analysis, and instruction following. It avoids test-set leakage by drawing new problems from recent sources.