
DeepSeek-R1-Distill-Llama-8B

Standard Transformer · 8.0B Parameters

Model Specifications

- Layers: 32
- Hidden Dimension: 4,096
- Attention Heads: 32
- KV Heads: 8
- Max Context: 131K tokens
- Vocabulary Size: 128,256
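
These specifications determine the KV-cache cost directly: with grouped-query attention, only the 8 KV heads are cached per layer, not all 32 attention heads. A minimal sketch of the per-token arithmetic (Python; the constants come from the list above, the function name is illustrative, and the table's KV figures line up with binary, 2^30-byte gigabytes):

```python
# Per-token KV cache size, derived from the specifications above.
# GQA: only the 8 KV heads are cached, not all 32 attention heads.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 4096 // 32  # hidden dimension / attention heads = 128

def kv_bytes_per_token(bytes_per_element: float = 2.0) -> float:
    """K and V tensors, per layer, per KV head (default: FP16 cache)."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_element

# FP16 cache: 131,072 bytes per token, i.e. exactly 1.0 GiB at 8K context,
# which matches the "+1.0 KV" column in the table below.
print(kv_bytes_per_token() * 8192 / 2**30)  # 1.0
```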

VRAM Requirements

The table below lists VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.58 GB (CUDA context plus activation buffers).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 16.8 GB | 19.38 GB (+2.0 KV) | 21.38 GB (+4.0 KV) | 25.38 GB (+8.0 KV) | 33.38 GB (+16.0 KV) | 49.38 GB (+32.0 KV) |
| FP16 (16.0 bpw) | FP16 | 16.8 GB | 18.38 GB (+1.0 KV) | 19.38 GB (+2.0 KV) | 21.38 GB (+4.0 KV) | 25.38 GB (+8.0 KV) | 33.38 GB (+16.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 16.8 GB | 17.93 GB (+0.55 KV) | 18.48 GB (+1.1 KV) | 19.58 GB (+2.2 KV) | 21.78 GB (+4.4 KV) | 26.18 GB (+8.8 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 16.8 GB | 17.88 GB (+0.5 KV) | 18.38 GB (+1.0 KV) | 19.38 GB (+2.0 KV) | 21.38 GB (+4.0 KV) | 25.38 GB (+8.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 16.8 GB | 17.68 GB (+0.3 KV) | 17.98 GB (+0.6 KV) | 18.58 GB (+1.2 KV) | 19.78 GB (+2.4 KV) | 22.18 GB (+4.8 KV) |
| Q8_0 (8.0 bpw) | FP32 | 8.4 GB | 10.98 GB (+2.0 KV) | 12.98 GB (+4.0 KV) | 16.98 GB (+8.0 KV) | 24.98 GB (+16.0 KV) | 40.98 GB (+32.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 8.4 GB | 9.98 GB (+1.0 KV) | 10.98 GB (+2.0 KV) | 12.98 GB (+4.0 KV) | 16.98 GB (+8.0 KV) | 24.98 GB (+16.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 8.4 GB | 9.53 GB (+0.55 KV) | 10.08 GB (+1.1 KV) | 11.18 GB (+2.2 KV) | 13.38 GB (+4.4 KV) | 17.78 GB (+8.8 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 8.4 GB | 9.48 GB (+0.5 KV) | 9.98 GB (+1.0 KV) | 10.98 GB (+2.0 KV) | 12.98 GB (+4.0 KV) | 16.98 GB (+8.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 8.4 GB | 9.28 GB (+0.3 KV) | 9.58 GB (+0.6 KV) | 10.18 GB (+1.2 KV) | 11.38 GB (+2.4 KV) | 13.78 GB (+4.8 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 4.88 GB | 7.46 GB (+2.0 KV) | 9.46 GB (+4.0 KV) | 13.46 GB (+8.0 KV) | 21.46 GB (+16.0 KV) | 37.46 GB (+32.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 4.88 GB | 6.46 GB (+1.0 KV) | 7.46 GB (+2.0 KV) | 9.46 GB (+4.0 KV) | 13.46 GB (+8.0 KV) | 21.46 GB (+16.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 4.88 GB | 6.01 GB (+0.55 KV) | 6.56 GB (+1.1 KV) | 7.66 GB (+2.2 KV) | 9.86 GB (+4.4 KV) | 14.26 GB (+8.8 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 4.88 GB | 5.96 GB (+0.5 KV) | 6.46 GB (+1.0 KV) | 7.46 GB (+2.0 KV) | 9.46 GB (+4.0 KV) | 13.46 GB (+8.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 4.88 GB | 5.76 GB (+0.3 KV) | 6.06 GB (+0.6 KV) | 6.66 GB (+1.2 KV) | 7.86 GB (+2.4 KV) | 10.26 GB (+4.8 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 4.81 GB | 7.39 GB (+2.0 KV) | 9.39 GB (+4.0 KV) | 13.39 GB (+8.0 KV) | 21.39 GB (+16.0 KV) | 37.39 GB (+32.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 4.81 GB | 6.39 GB (+1.0 KV) | 7.39 GB (+2.0 KV) | 9.39 GB (+4.0 KV) | 13.39 GB (+8.0 KV) | 21.39 GB (+16.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 4.81 GB | 5.94 GB (+0.55 KV) | 6.49 GB (+1.1 KV) | 7.59 GB (+2.2 KV) | 9.79 GB (+4.4 KV) | 14.19 GB (+8.8 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 4.81 GB | 5.89 GB (+0.5 KV) | 6.39 GB (+1.0 KV) | 7.39 GB (+2.0 KV) | 9.39 GB (+4.0 KV) | 13.39 GB (+8.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 4.81 GB | 5.69 GB (+0.3 KV) | 5.99 GB (+0.6 KV) | 6.59 GB (+1.2 KV) | 7.79 GB (+2.4 KV) | 10.19 GB (+4.8 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 4.11 GB | 6.69 GB (+2.0 KV) | 8.69 GB (+4.0 KV) | 12.69 GB (+8.0 KV) | 20.69 GB (+16.0 KV) | 36.69 GB (+32.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 4.11 GB | 5.69 GB (+1.0 KV) | 6.69 GB (+2.0 KV) | 8.69 GB (+4.0 KV) | 12.69 GB (+8.0 KV) | 20.69 GB (+16.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 4.11 GB | 5.24 GB (+0.55 KV) | 5.79 GB (+1.1 KV) | 6.89 GB (+2.2 KV) | 9.09 GB (+4.4 KV) | 13.49 GB (+8.8 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 4.11 GB | 5.19 GB (+0.5 KV) | 5.69 GB (+1.0 KV) | 6.69 GB (+2.0 KV) | 8.69 GB (+4.0 KV) | 12.69 GB (+8.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 4.11 GB | 4.99 GB (+0.3 KV) | 5.29 GB (+0.6 KV) | 5.89 GB (+1.2 KV) | 7.09 GB (+2.4 KV) | 9.49 GB (+4.8 KV) |
| Q2_K (2.63 bpw) | FP32 | 2.76 GB | 5.34 GB (+2.0 KV) | 7.34 GB (+4.0 KV) | 11.34 GB (+8.0 KV) | 19.34 GB (+16.0 KV) | 35.34 GB (+32.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 2.76 GB | 4.34 GB (+1.0 KV) | 5.34 GB (+2.0 KV) | 7.34 GB (+4.0 KV) | 11.34 GB (+8.0 KV) | 19.34 GB (+16.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 2.76 GB | 3.89 GB (+0.55 KV) | 4.44 GB (+1.1 KV) | 5.54 GB (+2.2 KV) | 7.74 GB (+4.4 KV) | 12.14 GB (+8.8 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 2.76 GB | 3.84 GB (+0.5 KV) | 4.34 GB (+1.0 KV) | 5.34 GB (+2.0 KV) | 7.34 GB (+4.0 KV) | 11.34 GB (+8.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 2.76 GB | 3.64 GB (+0.3 KV) | 3.94 GB (+0.6 KV) | 4.54 GB (+1.2 KV) | 5.74 GB (+2.4 KV) | 8.14 GB (+4.8 KV) |

Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
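
To reproduce any cell in the table, a sketch of that formula (Python; the effective bits per cached element for Q8_0 and Q4_0 are inferred from the table's +KV deltas, since per-block scales push them above a flat 8 or 4 bits, and the weight sizes are taken from the table rather than computed):

```python
# Sketch of the table's formula: Total = weights + KV cache + 0.58 GB overhead.
OVERHEAD_GB = 0.58
# Effective bits per cached element, implied by the table's "+KV" columns.
KV_BITS = {"FP32": 32, "FP16": 16, "Q8_0": 8.8, "FP8": 8, "Q4_0": 4.8}

N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

def total_vram_gb(weights_gb: float, cache_format: str, context: int) -> float:
    """Total VRAM in GiB for a given weight size, cache format, and context."""
    kv_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * context * KV_BITS[cache_format] / 8
    return weights_gb + kv_bytes / 2**30 + OVERHEAD_GB

# Q4_K_M weights (4.88 GB) with a Q8_0 cache at 32K context:
print(round(total_vram_gb(4.88, "Q8_0", 32 * 1024), 2))  # 7.66, matching the table
```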

Check if your GPU can run DeepSeek-R1-Distill-Llama-8B

Use our calculator to see if this model fits your specific hardware configuration.