AI Model Leaderboard
Compare AI models across agentic, reasoning, coding, tool-use, and computer-use benchmarks.
Benchmarks
agentic
reasoning
coding
tool-use
computer-use
TheAgentCompany
Simulates a real software company environment (GitLab, Jira, Slack, file system). Agents complete consequential employee-style tasks. ICML 2025.
Agentic · 11 models · % completed