DeepSeek R1
DeepSeek's pure-RL reasoning model trained without SFT — demonstrates emergent chain-of-thought through reinforcement learning alone.
164K tokens | Free / Open weights | MoE | MIT
No benchmark scores available yet for this model.