DeepSeek-R1-Distill-Qwen-1.5B

Standard Transformer, 1.5B parameters

Model Specifications

- Layers: 28
- Hidden Dimension: 1,536
- Attention Heads: 12
- KV Heads: 2
- Max Context: 131K tokens
- Vocabulary Size: 151,936
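The per-context KV-cache figures in the table below follow directly from these specifications. A minimal sketch of the standard grouped-query-attention KV-cache formula, assuming head_dim = hidden dimension / attention heads (1,536 / 12 = 128) and 2 bytes per element for an FP16 cache:

```python
# KV-cache size = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem * tokens.
# Constants are taken from the specification list above.
LAYERS = 28
HIDDEN = 1536
HEADS = 12
KV_HEADS = 2
HEAD_DIM = HIDDEN // HEADS  # 128; assumes head_dim = hidden / attention heads

def kv_cache_gb(tokens: int, bytes_per_elem: float = 2.0) -> float:
    """KV-cache size in GB for a given context length and cache element size."""
    per_token_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem
    return tokens * per_token_bytes / 1024**3

print(round(kv_cache_gb(8 * 1024), 2))  # FP16 cache at 8K context  -> 0.22
print(round(kv_cache_gb(131_072), 2))   # FP16 cache at 131K context -> 3.5
```

Both values match the "+KV" annotations in the FP16-cache rows of the table, which suggests the site uses this same formula; quantized cache formats (Q8_0, Q4_0) add per-block scale metadata, so their sizes do not scale by the raw bit width alone.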

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.52 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 3.15 GB | 4.10 GB (+0.44 KV) | 4.54 GB (+0.88 KV) | 5.42 GB (+1.75 KV) | 7.17 GB (+3.50 KV) | 10.67 GB (+7.00 KV) |
| FP16 (16.0 bpw) | FP16 | 3.15 GB | 3.88 GB (+0.22 KV) | 4.10 GB (+0.44 KV) | 4.54 GB (+0.88 KV) | 5.42 GB (+1.75 KV) | 7.17 GB (+3.50 KV) |
| FP16 (16.0 bpw) | Q8_0 | 3.15 GB | 3.79 GB (+0.12 KV) | 3.91 GB (+0.24 KV) | 4.15 GB (+0.48 KV) | 4.63 GB (+0.96 KV) | 5.59 GB (+1.93 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 3.15 GB | 3.77 GB (+0.11 KV) | 3.88 GB (+0.22 KV) | 4.10 GB (+0.44 KV) | 4.54 GB (+0.88 KV) | 5.42 GB (+1.75 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 3.15 GB | 3.73 GB (+0.07 KV) | 3.80 GB (+0.13 KV) | 3.93 GB (+0.26 KV) | 4.19 GB (+0.53 KV) | 4.71 GB (+1.05 KV) |
| Q8_0 (8.0 bpw) | FP32 | 1.58 GB | 2.53 GB (+0.44 KV) | 2.97 GB (+0.88 KV) | 3.84 GB (+1.75 KV) | 5.59 GB (+3.50 KV) | 9.09 GB (+7.00 KV) |
| Q8_0 (8.0 bpw) | FP16 | 1.58 GB | 2.31 GB (+0.22 KV) | 2.53 GB (+0.44 KV) | 2.97 GB (+0.88 KV) | 3.84 GB (+1.75 KV) | 5.59 GB (+3.50 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 1.58 GB | 2.21 GB (+0.12 KV) | 2.33 GB (+0.24 KV) | 2.57 GB (+0.48 KV) | 3.05 GB (+0.96 KV) | 4.02 GB (+1.93 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 1.58 GB | 2.20 GB (+0.11 KV) | 2.31 GB (+0.22 KV) | 2.53 GB (+0.44 KV) | 2.97 GB (+0.88 KV) | 3.84 GB (+1.75 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 1.58 GB | 2.16 GB (+0.07 KV) | 2.22 GB (+0.13 KV) | 2.35 GB (+0.26 KV) | 2.62 GB (+0.53 KV) | 3.14 GB (+1.05 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 0.92 GB | 1.87 GB (+0.44 KV) | 2.31 GB (+0.88 KV) | 3.18 GB (+1.75 KV) | 4.93 GB (+3.50 KV) | 8.43 GB (+7.00 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 0.92 GB | 1.65 GB (+0.22 KV) | 1.87 GB (+0.44 KV) | 2.31 GB (+0.88 KV) | 3.18 GB (+1.75 KV) | 4.93 GB (+3.50 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 0.92 GB | 1.55 GB (+0.12 KV) | 1.67 GB (+0.24 KV) | 1.91 GB (+0.48 KV) | 2.39 GB (+0.96 KV) | 3.36 GB (+1.93 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 0.92 GB | 1.54 GB (+0.11 KV) | 1.65 GB (+0.22 KV) | 1.87 GB (+0.44 KV) | 2.31 GB (+0.88 KV) | 3.18 GB (+1.75 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 0.92 GB | 1.50 GB (+0.07 KV) | 1.56 GB (+0.13 KV) | 1.69 GB (+0.26 KV) | 1.96 GB (+0.53 KV) | 2.48 GB (+1.05 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 0.90 GB | 1.85 GB (+0.44 KV) | 2.29 GB (+0.88 KV) | 3.17 GB (+1.75 KV) | 4.92 GB (+3.50 KV) | 8.42 GB (+7.00 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 0.90 GB | 1.64 GB (+0.22 KV) | 1.85 GB (+0.44 KV) | 2.29 GB (+0.88 KV) | 3.17 GB (+1.75 KV) | 4.92 GB (+3.50 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 0.90 GB | 1.54 GB (+0.12 KV) | 1.66 GB (+0.24 KV) | 1.90 GB (+0.48 KV) | 2.38 GB (+0.96 KV) | 3.34 GB (+1.93 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 0.90 GB | 1.53 GB (+0.11 KV) | 1.64 GB (+0.22 KV) | 1.85 GB (+0.44 KV) | 2.29 GB (+0.88 KV) | 3.17 GB (+1.75 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 0.90 GB | 1.48 GB (+0.07 KV) | 1.55 GB (+0.13 KV) | 1.68 GB (+0.26 KV) | 1.94 GB (+0.53 KV) | 2.47 GB (+1.05 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 0.77 GB | 1.72 GB (+0.44 KV) | 2.16 GB (+0.88 KV) | 3.03 GB (+1.75 KV) | 4.78 GB (+3.50 KV) | 8.28 GB (+7.00 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 0.77 GB | 1.50 GB (+0.22 KV) | 1.72 GB (+0.44 KV) | 2.16 GB (+0.88 KV) | 3.03 GB (+1.75 KV) | 4.78 GB (+3.50 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 0.77 GB | 1.41 GB (+0.12 KV) | 1.53 GB (+0.24 KV) | 1.77 GB (+0.48 KV) | 2.25 GB (+0.96 KV) | 3.21 GB (+1.93 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 0.77 GB | 1.39 GB (+0.11 KV) | 1.50 GB (+0.22 KV) | 1.72 GB (+0.44 KV) | 2.16 GB (+0.88 KV) | 3.03 GB (+1.75 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 0.77 GB | 1.35 GB (+0.07 KV) | 1.42 GB (+0.13 KV) | 1.55 GB (+0.26 KV) | 1.81 GB (+0.53 KV) | 2.33 GB (+1.05 KV) |
| Q2_K (2.63 bpw) | FP32 | 0.52 GB | 1.47 GB (+0.44 KV) | 1.91 GB (+0.88 KV) | 2.78 GB (+1.75 KV) | 4.53 GB (+3.50 KV) | 8.03 GB (+7.00 KV) |
| Q2_K (2.63 bpw) | FP16 | 0.52 GB | 1.25 GB (+0.22 KV) | 1.47 GB (+0.44 KV) | 1.91 GB (+0.88 KV) | 2.78 GB (+1.75 KV) | 4.53 GB (+3.50 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 0.52 GB | 1.15 GB (+0.12 KV) | 1.27 GB (+0.24 KV) | 1.51 GB (+0.48 KV) | 2.00 GB (+0.96 KV) | 2.96 GB (+1.93 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 0.52 GB | 1.14 GB (+0.11 KV) | 1.25 GB (+0.22 KV) | 1.47 GB (+0.44 KV) | 1.91 GB (+0.88 KV) | 2.78 GB (+1.75 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 0.52 GB | 1.10 GB (+0.07 KV) | 1.16 GB (+0.13 KV) | 1.30 GB (+0.26 KV) | 1.56 GB (+0.53 KV) | 2.08 GB (+1.05 KV) |

Total VRAM = Model Weights + KV Cache + 0.52 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
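The footnote's formula can be sketched as code. The `fits` helper is an illustration, not part of the site's calculator; its margin mirrors the ±5% note above. Because the table's component values are rounded to two decimals, their sum can differ from a printed cell by about 0.01 GB.

```python
OVERHEAD_GB = 0.52  # base overhead stated above (CUDA context + activations)

def total_vram_gb(weights_gb: float, kv_cache_gb: float) -> float:
    """Total VRAM = model weights + KV cache + base overhead."""
    return weights_gb + kv_cache_gb + OVERHEAD_GB

def fits(gpu_vram_gb: float, weights_gb: float, kv_cache_gb: float,
         margin: float = 0.05) -> bool:
    """True if the configuration fits, reserving the ±5% variance as headroom."""
    return total_vram_gb(weights_gb, kv_cache_gb) * (1 + margin) <= gpu_vram_gb

# Q4_K_M weights (0.92 GB) with a Q8_0 cache at 32K context (+0.48 KV):
print(round(total_vram_gb(0.92, 0.48), 2))  # ~1.92 (printed cell: 1.91, rounding)
print(fits(4.0, 0.92, 0.48))   # -> True: fits a 4 GB card
print(fits(8.0, 3.15, 7.00))   # -> False: FP16 weights + FP32 cache at 131K
```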

Check if your GPU can run DeepSeek-R1-Distill-Qwen-1.5B

Use our calculator to see if this model fits your specific hardware configuration.