DeepSeek AI
DeepSeek's flagship 671B-parameter reasoning model with chain-of-thought, matching OpenAI o1 performance on math and coding benchmarks.
SU AI Zero (Anthropic+Google+OpenAI, Suzhou AI Lab)
DeepSeek-R1: rank #8, calibration error 73.0%
DeepSeek's May 2025 update to R1, with stronger reasoning in math, coding, and science than the original R1 release.
DeepSeek's pure-RL reasoning model, trained without supervised fine-tuning (SFT); it shows chain-of-thought reasoning emerging from reinforcement learning alone.
DeepSeek-R1-0528: easy 99.0, medium 89.3, hard 60.4