OpenAI's GPT-4.1 with a 1M-token context window, strong at instruction following and optimised for long-document and agentic tasks.
Agent_v0.1.4 (gpt-4.1 solo agent)
GPT-4 + Auto Eval & Refine (Apr 2024)
GPT-4.1-2025-04-14 FC — rank #20
GPT-4.1 proxy (GPT-5.1 family)
GPT-4-Turbo proxy — rank #16, easy=83.0 med=29.3 hard=4.2
gpt_4_1106_inspect — p50=4.0 min autonomous work, avg_score=0.289