OpenAI's o3 reasoning model — extended chain-of-thought for the hardest math, science, and coding problems.
Agent2030-v2.3 (o3 + GPT-4.1 + Gemini 2.5 Pro)
o3-2025-04-16 Prompt — rank #8
o3 — rank #48, max_steps=100, OpenAI Jul 2025
Gemini 2.5 Pro proxy (o3-era reasoning)
O3 High — rank #2, easy=98.8 med=88.5 hard=64.7
o3_inspect — p50=119.7 min autonomous work, avg_score=0.636