OpenAI
OpenAI's multimodal GPT-4o — handles text, images, and audio with frontier reasoning performance.
AWorld — GPT-4o + DeepSeek V3 + Claude + Gemini (inclusionAI)
WebOperator + GPT-4o (Dec 2025)
GPT-4o proxy (qwen2.5-vl era baseline)
OpenHands + GPT-4o — 8.6% resolved
GPT-4o SoM+Caption+Image — VisualWebArena baseline (Jun 2024)
GPT-4o — rank #10, calibration error 89.0%
GPT-5.2 No Thinking proxy / GPT-4o era
GPT-4O-2024-08-06 — rank #17, easy=81.8 med=27.5 hard=3.2
gpt_4o_inspect — p50=7.0 min autonomous work, avg_score=0.338
OpenAI's GPT-4o Mini — a capable, lower-cost model suited to tasks that don't require the full GPT-4o.
gpt-4o-mini baseline (GAIA Authors)
GPT-4O-mini-2024-07-18 — rank #18, easy=81.1 med=22.4 hard=4.4