OpenAI
OpenAI's o3 reasoning model — extended chain-of-thought for the hardest math, science, and coding problems.
Agent2030-v2.3 (o3 + GPT-4.1 + Gemini 2.5 Pro)
o3-2025-04-16 Prompt — rank #8
o3 — rank #48, max_steps=100, OpenAI Jul 2025
Gemini 2.5 Pro proxy (o3-era reasoning)
O3 High — rank #2, easy=98.8 med=88.5 hard=64.7
o3_inspect — p50=119.7 min autonomous work, avg_score=0.636
OpenAI's o3-mini — efficient reasoning model with strong STEM performance at significantly lower cost than o3.
OpenAI's o4-mini — next-generation compact reasoning model with improved tool use and agentic capabilities.
MetaAgentv0.5.11 (o3)
O3-Mini High — rank #6, easy=98.8 med=87.1 hard=47.2
Agent2030-v2.2 (o4-mini + GPT-4.1 + Gemini 2.5 Pro)
o4-mini-2025-04-16 FC — rank #21
O4-Mini High — rank #1, easy=98.4 med=91.9 hard=69.3
o1_inspect — p50=38.8 min autonomous work, avg_score=0.511
USACO Episodic+Semantic + o4-mini High — rank #2, $44.04