Anthropic
Anthropic's fastest and most compact Claude model — near-instant responses for lightweight tasks and high-throughput use cases.
Fixed Haiku model baseline
Claude 4.5 Haiku high reasoning — mini-SWE-agent v2
Claude-Haiku-4-5-20251001 FC — rank #6
Claude Haiku 4.5 Thinking — rank #29
Claude-3-Haiku proxy — rank #19, easy=62.4 med=9.4 hard=2.1
Anthropic's most intelligent Claude model — best for complex agentic workflows, coding, and long-horizon reasoning tasks.
Anthropic's balanced Claude Sonnet — optimal mix of intelligence, speed, and cost for production AI applications.
CustomGPT Agent v2 (Claude Opus 4.6 + Sonnet 4.6)
Claude 4.5 Opus high reasoning — mini-SWE-agent v2
Claude-Opus-4-5-20251101 FC — rank #1
Claude Opus 4.5 — rank #1, high reasoning, Sierra eval
Claude 4.6 Opus Thinking High Effort — rank #3
Claude-Opus-4 Thinking — rank #12, easy=98.8 med=78.3 hard=31.4
claude_opus_4_6_inspect — p50=718.8 min autonomous work, avg_score=0.789
Claude Code + Claude Opus 4.5 — rank #1, 95.5% manual, $87.16
USACO Episodic+Semantic + Claude Opus 4.1 High — rank #3, $267.72
h2oGPTe Agent v1.6.44 (claude-sonnet-4.5 + gpt-5, h2o.ai)
Claude 4.5 Sonnet high reasoning — mini-SWE-agent v2
Claude Code + GBOX MCP (GBOX AI, Oct 2025)
Claude-Sonnet-4-5-20250929 FC — rank #2
Claude Sonnet 4.5 — rank #7, reasoning enabled, Sierra eval
claude-sonnet-4-6 — rank #1, max_steps=100, Anthropic Mar 2026
OpenHands-Versa + Claude Sonnet 4 — rank #3, 33.1% resolved
Claude 4.5 Sonnet — rank #6, calibration error 65.0%
Claude 4.6 Sonnet Thinking Medium Effort — rank #5
Claude-Sonnet-4 Thinking — rank #13, easy=98.4 med=72.9 hard=31.4
claude_3_7_sonnet_inspect — p50=60.4 min autonomous work, avg_score=0.558
Claude Code + Claude Sonnet 4.5 (Sep 2025) — rank #2, $68.33