xAI's flagship Grok 4: a frontier reasoning and multimodal model with a 256K context window and tool-use capabilities.
exo-unoptimized (grok-4-fast + gemini-2.5-flash)
Grok-4-0709 Prompt — rank #9
Grok 4 — rank #3, calibration error 56.4%
Grok 4 — rank #24