DeepSeek AI

DeepSeek R1 Distill

Open source · 6 variants

DeepSeek-R1-Distill-Qwen-1.5B

reasoning · 1.5B

DeepSeek's R1 reasoning distilled into a small 1.5B Qwen model: chain-of-thought reasoning at a compute cost low enough for edge deployment.

164K tokens · Free / Open weights · Transformer · MIT

DeepSeek-R1-Distill-Qwen-32B

reasoning · 32B

HuggingFace

DeepSeek's R1 reasoning capabilities distilled into a 32B Qwen model: strong math and coding at a fraction of the cost of full R1.

164K tokens · Free / Open weights · Transformer · MIT

GAIA

DeepSeek-R1-Distill-Qwen-32B: 57.5%
DeepSeek-R1-Distill-Llama-8B: 27.6%
DeepSeek-R1-Distill-Qwen-7B: 25.6%

MetaAgent_v0.2.1 (o3) w/ 32B distill

DeepSeek-R1-Distill-Llama-70B

reasoning · 70B

HuggingFace

DeepSeek's R1 reasoning distilled into Llama 70B: frontier-level chain-of-thought at 70B scale on a Llama architecture.

164K tokens · Free / Open weights · Transformer · MIT

DeepSeek-R1-Distill-Qwen-7B

reasoning · 7B

HuggingFace

DeepSeek's R1 reasoning distilled into a 7B Qwen base: solid chain-of-thought at a highly deployable size.

164K tokens · Free / Open weights · Transformer · MIT

GAIA

DeepSeek-R1-Distill-Qwen-7B: 25.6%
DeepSeek-R1-Distill-Qwen-32B: 57.5%
DeepSeek-R1-Distill-Llama-8B: 27.6%

Qwen3-32B-RL system using distill

DeepSeek-R1-0528-Qwen3-8B

reasoning · 8B

HuggingFace

DeepSeek's R1 reasoning distilled into an 8B Qwen3 base: strong chain-of-thought at a compact, deployable size.

164K tokens · Free / Open weights · Transformer · MIT

DeepSeek-R1-Distill-Llama-8B

reasoning · 8B

HuggingFace

DeepSeek's R1 reasoning distilled into Llama 3.1 8B: strong chain-of-thought on a Llama base for broad ecosystem compatibility.

164K tokens · Free / Open weights · Transformer · MIT

GAIA

DeepSeek-R1-Distill-Llama-8B: 27.6%
DeepSeek-R1-Distill-Qwen-32B: 57.5%
DeepSeek-R1-Distill-Qwen-7B: 25.6%

TapeAgent v0.1 (GPT-4o) referencing distill
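
The cards above describe the variants as ranging from edge-deployable to 70B scale. As a rough illustration of what that means in practice, the sketch below estimates the memory needed just to hold each variant's weights. It assumes the nominal parameter counts in the model names and a set of common weight precisions; neither the precisions nor the helper name `weight_footprint_gb` come from the listing, and the figures ignore KV cache, activations, and runtime overhead.

```python
# Nominal parameter counts in billions, read off the variant names above.
# Real checkpoints differ slightly from these round numbers.
NOMINAL_PARAMS_B = {
    "DeepSeek-R1-Distill-Qwen-1.5B": 1.5,
    "DeepSeek-R1-Distill-Qwen-7B": 7.0,
    "DeepSeek-R1-0528-Qwen3-8B": 8.0,
    "DeepSeek-R1-Distill-Llama-8B": 8.0,
    "DeepSeek-R1-Distill-Qwen-32B": 32.0,
    "DeepSeek-R1-Distill-Llama-70B": 70.0,
}

# Bytes per parameter for some common weight precisions.
# These are illustrative assumptions, not values taken from the listing.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    (params_billion * 1e9 parameters) * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for name, size_b in NOMINAL_PARAMS_B.items():
        row = ", ".join(
            f"{prec}: {weight_footprint_gb(size_b, prec):.1f} GB"
            for prec in BYTES_PER_PARAM
        )
        print(f"{name}: {row}")
```

Under these assumptions the 1.5B variant needs about 3 GB for fp16 weights, while the 70B variant needs about 140 GB at fp16 or about 35 GB at 4-bit. Actual requirements depend on the runtime, the quantization format, and the context length in use.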
