Google's Gemini 2.5 Pro: strong multimodal reasoning with a 1M-token context window, well suited to complex document and code tasks.
Co-Sight_v2.1.0 (ZTE-AICloud, Gemini 2.5 Pro)
OpenHands + Gemini 2.5 Pro — rank #5, 30.3% resolved
Gemini 2.5 Pro — rank #4, calibration error 72.0%
Gemini 2.5 Pro Max Thinking — rank #38
Gemini-2.5-Pro-06-05 — rank #4, easy=99.2 med=90.8 hard=59.2