DeepSeek-R1-Distill-Qwen-32B

Standard Transformer, 32.5B Parameters

Model Specifications

Layers: 64
Hidden Dimension: 5,120
Attention Heads: 40
KV Heads: 8
Max Context: 131K tokens
Vocabulary Size: 152,064
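
The KV cache figures on this page can be roughly reproduced from the specifications above. The sketch below assumes the usual grouped-query attention layout (one key and one value vector per layer and per KV head) and a head dimension of 5,120 / 40 = 128; neither assumption is stated on this page, and the function name `kv_cache_gib` is made up for illustration.

```python
def kv_cache_gib(tokens, layers=64, kv_heads=8, head_dim=128, bytes_per_elem=2.0):
    """Estimate KV cache size in GiB.

    Assumes head_dim = hidden dimension / attention heads (5120 / 40 = 128)
    and a standard grouped-query attention cache. bytes_per_elem is 2.0 for
    an FP16 cache and 4.0 for an FP32 cache.
    """
    # The factor 2 accounts for storing one key and one value vector.
    elems_per_token = 2 * layers * kv_heads * head_dim
    return tokens * elems_per_token * bytes_per_elem / 2**30


print(kv_cache_gib(8192))                          # FP16 cache, 8K context
print(kv_cache_gib(131072, bytes_per_elem=4.0))    # FP32 cache, 131K context
```

Under these assumptions an 8K-token FP16 cache comes to 2.0 GiB and a 131K-token FP32 cache to 64.0 GiB, in line with the "+2.0 KV" and "+64.0 KV" entries in the VRAM table.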

VRAM Requirements

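Estimated VRAM usage for every combination of weight quantization and KV cache format. Each total includes the model weights, the KV cache at the given context length (shown in parentheses, in GB), and a base overhead of 0.82 GB (CUDA context + activations).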

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 68.25 GB | 73.08 GB (+4.0 KV) | 77.08 GB (+8.0 KV) | 85.08 GB (+16.0 KV) | 101.08 GB (+32.0 KV) | 133.07 GB (+64.0 KV) |
| FP16 (16.0 bpw) | FP16 | 68.25 GB | 71.08 GB (+2.0 KV) | 73.08 GB (+4.0 KV) | 77.08 GB (+8.0 KV) | 85.08 GB (+16.0 KV) | 101.08 GB (+32.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 68.25 GB | 70.17 GB (+1.1 KV) | 71.28 GB (+2.2 KV) | 73.48 GB (+4.4 KV) | 77.88 GB (+8.8 KV) | 86.67 GB (+17.6 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 68.25 GB | 70.08 GB (+1.0 KV) | 71.08 GB (+2.0 KV) | 73.08 GB (+4.0 KV) | 77.08 GB (+8.0 KV) | 85.08 GB (+16.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 68.25 GB | 69.67 GB (+0.6 KV) | 70.28 GB (+1.2 KV) | 71.48 GB (+2.4 KV) | 73.88 GB (+4.8 KV) | 78.67 GB (+9.6 KV) |
| Q8_0 (8.0 bpw) | FP32 | 34.12 GB | 38.95 GB (+4.0 KV) | 42.95 GB (+8.0 KV) | 50.95 GB (+16.0 KV) | 66.95 GB (+32.0 KV) | 98.95 GB (+64.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 34.12 GB | 36.95 GB (+2.0 KV) | 38.95 GB (+4.0 KV) | 42.95 GB (+8.0 KV) | 50.95 GB (+16.0 KV) | 66.95 GB (+32.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 34.12 GB | 36.05 GB (+1.1 KV) | 37.15 GB (+2.2 KV) | 39.35 GB (+4.4 KV) | 43.75 GB (+8.8 KV) | 52.55 GB (+17.6 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 34.12 GB | 35.95 GB (+1.0 KV) | 36.95 GB (+2.0 KV) | 38.95 GB (+4.0 KV) | 42.95 GB (+8.0 KV) | 50.95 GB (+16.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 34.12 GB | 35.55 GB (+0.6 KV) | 36.15 GB (+1.2 KV) | 37.35 GB (+2.4 KV) | 39.75 GB (+4.8 KV) | 44.55 GB (+9.6 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 19.84 GB | 24.66 GB (+4.0 KV) | 28.66 GB (+8.0 KV) | 36.66 GB (+16.0 KV) | 52.66 GB (+32.0 KV) | 84.66 GB (+64.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 19.84 GB | 22.66 GB (+2.0 KV) | 24.66 GB (+4.0 KV) | 28.66 GB (+8.0 KV) | 36.66 GB (+16.0 KV) | 52.66 GB (+32.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 19.84 GB | 21.76 GB (+1.1 KV) | 22.86 GB (+2.2 KV) | 25.06 GB (+4.4 KV) | 29.46 GB (+8.8 KV) | 38.26 GB (+17.6 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 19.84 GB | 21.66 GB (+1.0 KV) | 22.66 GB (+2.0 KV) | 24.66 GB (+4.0 KV) | 28.66 GB (+8.0 KV) | 36.66 GB (+16.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 19.84 GB | 21.26 GB (+0.6 KV) | 21.86 GB (+1.2 KV) | 23.06 GB (+2.4 KV) | 25.46 GB (+4.8 KV) | 30.26 GB (+9.6 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 19.54 GB | 24.36 GB (+4.0 KV) | 28.36 GB (+8.0 KV) | 36.36 GB (+16.0 KV) | 52.36 GB (+32.0 KV) | 84.36 GB (+64.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 19.54 GB | 22.36 GB (+2.0 KV) | 24.36 GB (+4.0 KV) | 28.36 GB (+8.0 KV) | 36.36 GB (+16.0 KV) | 52.36 GB (+32.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 19.54 GB | 21.46 GB (+1.1 KV) | 22.56 GB (+2.2 KV) | 24.76 GB (+4.4 KV) | 29.16 GB (+8.8 KV) | 37.96 GB (+17.6 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 19.54 GB | 21.36 GB (+1.0 KV) | 22.36 GB (+2.0 KV) | 24.36 GB (+4.0 KV) | 28.36 GB (+8.0 KV) | 36.36 GB (+16.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 19.54 GB | 20.96 GB (+0.6 KV) | 21.56 GB (+1.2 KV) | 22.76 GB (+2.4 KV) | 25.16 GB (+4.8 KV) | 29.96 GB (+9.6 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 16.68 GB | 21.5 GB (+4.0 KV) | 25.5 GB (+8.0 KV) | 33.5 GB (+16.0 KV) | 49.5 GB (+32.0 KV) | 81.5 GB (+64.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 16.68 GB | 19.5 GB (+2.0 KV) | 21.5 GB (+4.0 KV) | 25.5 GB (+8.0 KV) | 33.5 GB (+16.0 KV) | 49.5 GB (+32.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 16.68 GB | 18.6 GB (+1.1 KV) | 19.7 GB (+2.2 KV) | 21.9 GB (+4.4 KV) | 26.3 GB (+8.8 KV) | 35.1 GB (+17.6 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 16.68 GB | 18.5 GB (+1.0 KV) | 19.5 GB (+2.0 KV) | 21.5 GB (+4.0 KV) | 25.5 GB (+8.0 KV) | 33.5 GB (+16.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 16.68 GB | 18.1 GB (+0.6 KV) | 18.7 GB (+1.2 KV) | 19.9 GB (+2.4 KV) | 22.3 GB (+4.8 KV) | 27.1 GB (+9.6 KV) |
| Q2_K (2.63 bpw) | FP32 | 11.22 GB | 16.04 GB (+4.0 KV) | 20.04 GB (+8.0 KV) | 28.04 GB (+16.0 KV) | 44.04 GB (+32.0 KV) | 76.04 GB (+64.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 11.22 GB | 14.04 GB (+2.0 KV) | 16.04 GB (+4.0 KV) | 20.04 GB (+8.0 KV) | 28.04 GB (+16.0 KV) | 44.04 GB (+32.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 11.22 GB | 13.14 GB (+1.1 KV) | 14.24 GB (+2.2 KV) | 16.44 GB (+4.4 KV) | 20.84 GB (+8.8 KV) | 29.64 GB (+17.6 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 11.22 GB | 13.04 GB (+1.0 KV) | 14.04 GB (+2.0 KV) | 16.04 GB (+4.0 KV) | 20.04 GB (+8.0 KV) | 28.04 GB (+16.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 11.22 GB | 12.64 GB (+0.6 KV) | 13.24 GB (+1.2 KV) | 14.44 GB (+2.4 KV) | 16.84 GB (+4.8 KV) | 21.64 GB (+9.6 KV) |

Total VRAM = Model Weights + KV Cache + 0.82 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
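
As a minimal sketch, the formula above can be applied directly to values read from the table. The helper names `total_vram_gb` and `fits`, the 24 GB GPU, and the choice to pad the estimate by the full 5% variance are illustrative assumptions, not part of this page's calculator.

```python
OVERHEAD_GB = 0.82  # base overhead stated on this page


def total_vram_gb(weights_gb, kv_cache_gb, overhead_gb=OVERHEAD_GB):
    """Total VRAM = Model Weights + KV Cache + overhead (all in GB)."""
    return weights_gb + kv_cache_gb + overhead_gb


def fits(total_gb, gpu_vram_gb, margin=0.05):
    """Conservative check: pad the estimate by the stated 5% variance."""
    return total_gb * (1 + margin) <= gpu_vram_gb


# Q4_K_M weights (19.84 GB) with an FP16 cache at 8K context (+2.0 GB KV).
total = total_vram_gb(19.84, 2.0)
print(round(total, 2))    # 22.66, matching the table entry
print(fits(total, 24.0))  # True on a hypothetical 24 GB GPU

# The same weights at 32K context (+8.0 GB KV) no longer fit in 24 GB.
print(fits(total_vram_gb(19.84, 8.0), 24.0))  # False
```

Padding by the upper end of the variance is a deliberately cautious choice; a configuration that only fits without the margin may still run out of memory on some inference engines.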
