
DeepSeek-R1-Distill-Llama-70B

Standard transformer · 70.6B parameters

Model Specifications

| Specification | Value |
|---|---|
| Layers | 80 |
| Hidden dimension | 8,192 |
| Attention heads | 64 |
| KV heads (GQA) | 8 |
| Max context | 131K tokens (131,072) |
| Vocabulary size | 128,256 |
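The per-token KV-cache footprint follows directly from these specifications: each of the 80 layers stores K and V tensors for the 8 KV heads, with a head dimension of 8,192 / 64 = 128. A minimal sketch (the VRAM table's "GB" figures match exactly when computed in binary, 1024³-byte, gigabytes):

```python
# KV-cache size derived from the model specifications above.
LAYERS = 80
KV_HEADS = 8
HEAD_DIM = 8192 // 64  # hidden dim / attention heads = 128

def kv_cache_gb(context_tokens: int, bytes_per_element: float) -> float:
    """KV-cache size in (binary) GB for a given context length and cache dtype."""
    elements_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM  # 2x for K and V
    return context_tokens * elements_per_token * bytes_per_element / 1024**3

print(kv_cache_gb(8_192, 2))    # FP16 cache @ 8K context   -> 2.5  ("+2.5 KV")
print(kv_cache_gb(131_072, 4))  # FP32 cache @ 131K context -> 80.0 ("+80.0 KV")
```

This reproduces the FP32, FP16, and FP8 columns of the table below exactly. The quantized cache formats (Q8_0, Q4_0) add small per-block scale factors on top of their nominal element size, so their columns do not reduce to a single clean bytes-per-element value.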

VRAM Requirements

Estimated VRAM usage for every combination of weight quantization and KV-cache format. The context columns include a base overhead of 1.21 GB (CUDA context + activation buffers); figures in parentheses show the KV-cache share in GB.

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 148.26 GB | 154.47 GB (+5.0 KV) | 159.47 GB (+10.0 KV) | 169.47 GB (+20.0 KV) | 189.47 GB (+40.0 KV) | 229.47 GB (+80.0 KV) |
| FP16 (16.0 bpw) | FP16 | 148.26 GB | 151.97 GB (+2.5 KV) | 154.47 GB (+5.0 KV) | 159.47 GB (+10.0 KV) | 169.47 GB (+20.0 KV) | 189.47 GB (+40.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 148.26 GB | 150.84 GB (+1.38 KV) | 152.22 GB (+2.75 KV) | 154.97 GB (+5.5 KV) | 160.47 GB (+11.0 KV) | 171.47 GB (+22.0 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 148.26 GB | 150.72 GB (+1.25 KV) | 151.97 GB (+2.5 KV) | 154.47 GB (+5.0 KV) | 159.47 GB (+10.0 KV) | 169.47 GB (+20.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 148.26 GB | 150.22 GB (+0.75 KV) | 150.97 GB (+1.5 KV) | 152.47 GB (+3.0 KV) | 155.47 GB (+6.0 KV) | 161.47 GB (+12.0 KV) |
| Q8_0 (8.0 bpw) | FP32 | 74.13 GB | 80.34 GB (+5.0 KV) | 85.34 GB (+10.0 KV) | 95.34 GB (+20.0 KV) | 115.34 GB (+40.0 KV) | 155.34 GB (+80.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 74.13 GB | 77.84 GB (+2.5 KV) | 80.34 GB (+5.0 KV) | 85.34 GB (+10.0 KV) | 95.34 GB (+20.0 KV) | 115.34 GB (+40.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 74.13 GB | 76.71 GB (+1.38 KV) | 78.09 GB (+2.75 KV) | 80.84 GB (+5.5 KV) | 86.34 GB (+11.0 KV) | 97.34 GB (+22.0 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 74.13 GB | 76.59 GB (+1.25 KV) | 77.84 GB (+2.5 KV) | 80.34 GB (+5.0 KV) | 85.34 GB (+10.0 KV) | 95.34 GB (+20.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 74.13 GB | 76.09 GB (+0.75 KV) | 76.84 GB (+1.5 KV) | 78.34 GB (+3.0 KV) | 81.34 GB (+6.0 KV) | 87.34 GB (+12.0 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 43.09 GB | 49.29 GB (+5.0 KV) | 54.29 GB (+10.0 KV) | 64.29 GB (+20.0 KV) | 84.29 GB (+40.0 KV) | 124.29 GB (+80.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 43.09 GB | 46.79 GB (+2.5 KV) | 49.29 GB (+5.0 KV) | 54.29 GB (+10.0 KV) | 64.29 GB (+20.0 KV) | 84.29 GB (+40.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 43.09 GB | 45.67 GB (+1.38 KV) | 47.04 GB (+2.75 KV) | 49.79 GB (+5.5 KV) | 55.29 GB (+11.0 KV) | 66.29 GB (+22.0 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 43.09 GB | 45.54 GB (+1.25 KV) | 46.79 GB (+2.5 KV) | 49.29 GB (+5.0 KV) | 54.29 GB (+10.0 KV) | 64.29 GB (+20.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 43.09 GB | 45.04 GB (+0.75 KV) | 45.79 GB (+1.5 KV) | 47.29 GB (+3.0 KV) | 50.29 GB (+6.0 KV) | 56.29 GB (+12.0 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 42.44 GB | 48.65 GB (+5.0 KV) | 53.65 GB (+10.0 KV) | 63.65 GB (+20.0 KV) | 83.65 GB (+40.0 KV) | 123.65 GB (+80.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 42.44 GB | 46.15 GB (+2.5 KV) | 48.65 GB (+5.0 KV) | 53.65 GB (+10.0 KV) | 63.65 GB (+20.0 KV) | 83.65 GB (+40.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 42.44 GB | 45.02 GB (+1.38 KV) | 46.40 GB (+2.75 KV) | 49.15 GB (+5.5 KV) | 54.65 GB (+11.0 KV) | 65.65 GB (+22.0 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 42.44 GB | 44.90 GB (+1.25 KV) | 46.15 GB (+2.5 KV) | 48.65 GB (+5.0 KV) | 53.65 GB (+10.0 KV) | 63.65 GB (+20.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 42.44 GB | 44.40 GB (+0.75 KV) | 45.15 GB (+1.5 KV) | 46.65 GB (+3.0 KV) | 49.65 GB (+6.0 KV) | 55.65 GB (+12.0 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 36.23 GB | 42.44 GB (+5.0 KV) | 47.44 GB (+10.0 KV) | 57.44 GB (+20.0 KV) | 77.44 GB (+40.0 KV) | 117.44 GB (+80.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 36.23 GB | 39.94 GB (+2.5 KV) | 42.44 GB (+5.0 KV) | 47.44 GB (+10.0 KV) | 57.44 GB (+20.0 KV) | 77.44 GB (+40.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 36.23 GB | 38.81 GB (+1.38 KV) | 40.19 GB (+2.75 KV) | 42.94 GB (+5.5 KV) | 48.44 GB (+11.0 KV) | 59.44 GB (+22.0 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 36.23 GB | 38.69 GB (+1.25 KV) | 39.94 GB (+2.5 KV) | 42.44 GB (+5.0 KV) | 47.44 GB (+10.0 KV) | 57.44 GB (+20.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 36.23 GB | 38.19 GB (+0.75 KV) | 38.94 GB (+1.5 KV) | 40.44 GB (+3.0 KV) | 43.44 GB (+6.0 KV) | 49.44 GB (+12.0 KV) |
| Q2_K (2.63 bpw) | FP32 | 24.37 GB | 30.58 GB (+5.0 KV) | 35.58 GB (+10.0 KV) | 45.58 GB (+20.0 KV) | 65.58 GB (+40.0 KV) | 105.58 GB (+80.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 24.37 GB | 28.08 GB (+2.5 KV) | 30.58 GB (+5.0 KV) | 35.58 GB (+10.0 KV) | 45.58 GB (+20.0 KV) | 65.58 GB (+40.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 24.37 GB | 26.95 GB (+1.38 KV) | 28.33 GB (+2.75 KV) | 31.08 GB (+5.5 KV) | 36.58 GB (+11.0 KV) | 47.58 GB (+22.0 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 24.37 GB | 26.83 GB (+1.25 KV) | 28.08 GB (+2.5 KV) | 30.58 GB (+5.0 KV) | 35.58 GB (+10.0 KV) | 45.58 GB (+20.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 24.37 GB | 26.33 GB (+0.75 KV) | 27.08 GB (+1.5 KV) | 28.58 GB (+3.0 KV) | 31.58 GB (+6.0 KV) | 37.58 GB (+12.0 KV) |

Total VRAM = model weights + KV cache + 1.21 GB overhead. Actual usage may vary by ±5% depending on the inference engine and enabled optimizations.
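That formula can be sketched end to end. The weight column implies an effective size of 74.13 GB at 8.0 bpw (weights scale linearly with bits per weight across every row), which works out slightly above what the stated 70.6B parameters alone would give, presumably covering embeddings and quantization-format overhead. That 74.13 GB figure is read off the table, not an official number:

```python
# Rough total-VRAM estimator reproducing the table above.
# WEIGHTS_AT_8BPW_GB is inferred from the Q8_0 row, not an official figure.
OVERHEAD_GB = 1.21
WEIGHTS_AT_8BPW_GB = 74.13

def total_vram_gb(bpw: float, context_tokens: int,
                  cache_bytes_per_element: float) -> float:
    """Estimated total VRAM: weights + KV cache + fixed overhead."""
    weights = WEIGHTS_AT_8BPW_GB * bpw / 8
    # 80 layers x 8 KV heads x head dim 128, K and V, in binary GB
    kv = context_tokens * 2 * 80 * 8 * 128 * cache_bytes_per_element / 1024**3
    return round(weights + kv + OVERHEAD_GB, 2)

# Q8_0 weights with an FP16 cache at 32K context:
print(total_vram_gb(8.0, 32_768, 2))  # -> 85.34, matching the table row
```

Rows with quantized cache formats (Q8_0, Q4_0) differ slightly, since those formats carry per-block scale overhead beyond a flat bytes-per-element figure, and sub-cent rounding in the table can shift the last digit by 0.01 GB.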

Check if your GPU can run DeepSeek-R1-Distill-Llama-70B

Use our calculator to see if this model fits your specific hardware configuration.