Qwen2.5-72B-Instruct

Standard Transformer 72.7B Parameters

Model Specifications

Layers: 80
Hidden Dimension: 8,192
Attention Heads: 64
KV Heads: 8
Max Context: 131K tokens
Vocabulary Size: 152,064
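The specifications above are enough to derive the per-token KV-cache footprint that drives the VRAM table below. A minimal sketch, assuming the standard grouped-query-attention cache layout (one K and one V tensor per layer, each with `kv_heads × head_dim` elements per token, where `head_dim = hidden / attention heads`):

```python
# Per-token KV-cache size for Qwen2.5-72B-Instruct, derived from the
# spec list above. Assumes the standard GQA cache layout: one K and
# one V tensor per layer, each kv_heads * head_dim elements per token.
LAYERS = 80
HIDDEN_DIM = 8192
ATTN_HEADS = 64
KV_HEADS = 8

HEAD_DIM = HIDDEN_DIM // ATTN_HEADS  # 128

def kv_bytes_per_token(bytes_per_element: float) -> float:
    """K + V, across all layers and KV heads, for a single token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_element

print(kv_bytes_per_token(2.0) / 1024)  # FP16 cache: 320.0 KiB per token
print(kv_bytes_per_token(4.0) / 1024)  # FP32 cache: 640.0 KiB per token
```

At an 8,192-token context, 320 KiB/token × 8,192 ≈ 2.5 GiB for an FP16 cache, which matches the "+2.5 KV" figure in the table below.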

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 1.23 GB (CUDA context + activation buffers).

All values in GB; the KV-cache contribution is shown in parentheses.

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 152.67 | 158.9 (+5.0) | 163.9 (+10.0) | 173.9 (+20.0) | 193.9 (+40.0) | 233.9 (+80.0) |
| FP16 (16.0 bpw) | FP16 | 152.67 | 156.4 (+2.5) | 158.9 (+5.0) | 163.9 (+10.0) | 173.9 (+20.0) | 193.9 (+40.0) |
| FP16 (16.0 bpw) | Q8_0 | 152.67 | 155.27 (+1.38) | 156.65 (+2.75) | 159.4 (+5.5) | 164.9 (+11.0) | 175.9 (+22.0) |
| FP16 (16.0 bpw) | FP8 (Exp) | 152.67 | 155.15 (+1.25) | 156.4 (+2.5) | 158.9 (+5.0) | 163.9 (+10.0) | 173.9 (+20.0) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 152.67 | 154.65 (+0.75) | 155.4 (+1.5) | 156.9 (+3.0) | 159.9 (+6.0) | 165.9 (+12.0) |
| Q8_0 (8.0 bpw) | FP32 | 76.34 | 82.56 (+5.0) | 87.56 (+10.0) | 97.56 (+20.0) | 117.56 (+40.0) | 157.56 (+80.0) |
| Q8_0 (8.0 bpw) | FP16 | 76.34 | 80.06 (+2.5) | 82.56 (+5.0) | 87.56 (+10.0) | 97.56 (+20.0) | 117.56 (+40.0) |
| Q8_0 (8.0 bpw) | Q8_0 | 76.34 | 78.94 (+1.38) | 80.31 (+2.75) | 83.06 (+5.5) | 88.56 (+11.0) | 99.56 (+22.0) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 76.34 | 78.81 (+1.25) | 80.06 (+2.5) | 82.56 (+5.0) | 87.56 (+10.0) | 97.56 (+20.0) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 76.34 | 78.31 (+0.75) | 79.06 (+1.5) | 80.56 (+3.0) | 83.56 (+6.0) | 89.56 (+12.0) |
| Q4_K_M (4.65 bpw) | FP32 | 44.37 | 50.6 (+5.0) | 55.6 (+10.0) | 65.6 (+20.0) | 85.6 (+40.0) | 125.6 (+80.0) |
| Q4_K_M (4.65 bpw) | FP16 | 44.37 | 48.1 (+2.5) | 50.6 (+5.0) | 55.6 (+10.0) | 65.6 (+20.0) | 85.6 (+40.0) |
| Q4_K_M (4.65 bpw) | Q8_0 | 44.37 | 46.97 (+1.38) | 48.35 (+2.75) | 51.1 (+5.5) | 56.6 (+11.0) | 67.6 (+22.0) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 44.37 | 46.85 (+1.25) | 48.1 (+2.5) | 50.6 (+5.0) | 55.6 (+10.0) | 65.6 (+20.0) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 44.37 | 46.35 (+0.75) | 47.1 (+1.5) | 48.6 (+3.0) | 51.6 (+6.0) | 57.6 (+12.0) |
| Q4_K_S (4.58 bpw) | FP32 | 43.7 | 49.93 (+5.0) | 54.93 (+10.0) | 64.93 (+20.0) | 84.93 (+40.0) | 124.93 (+80.0) |
| Q4_K_S (4.58 bpw) | FP16 | 43.7 | 47.43 (+2.5) | 49.93 (+5.0) | 54.93 (+10.0) | 64.93 (+20.0) | 84.93 (+40.0) |
| Q4_K_S (4.58 bpw) | Q8_0 | 43.7 | 46.3 (+1.38) | 47.68 (+2.75) | 50.43 (+5.5) | 55.93 (+11.0) | 66.93 (+22.0) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 43.7 | 46.18 (+1.25) | 47.43 (+2.5) | 49.93 (+5.0) | 54.93 (+10.0) | 64.93 (+20.0) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 43.7 | 45.68 (+0.75) | 46.43 (+1.5) | 47.93 (+3.0) | 50.93 (+6.0) | 56.93 (+12.0) |
| Q3_K_M (3.91 bpw) | FP32 | 37.31 | 43.54 (+5.0) | 48.54 (+10.0) | 58.54 (+20.0) | 78.54 (+40.0) | 118.54 (+80.0) |
| Q3_K_M (3.91 bpw) | FP16 | 37.31 | 41.04 (+2.5) | 43.54 (+5.0) | 48.54 (+10.0) | 58.54 (+20.0) | 78.54 (+40.0) |
| Q3_K_M (3.91 bpw) | Q8_0 | 37.31 | 39.91 (+1.38) | 41.29 (+2.75) | 44.04 (+5.5) | 49.54 (+11.0) | 60.54 (+22.0) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 37.31 | 39.79 (+1.25) | 41.04 (+2.5) | 43.54 (+5.0) | 48.54 (+10.0) | 58.54 (+20.0) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 37.31 | 39.29 (+0.75) | 40.04 (+1.5) | 41.54 (+3.0) | 44.54 (+6.0) | 50.54 (+12.0) |
| Q2_K (2.63 bpw) | FP32 | 25.1 | 31.32 (+5.0) | 36.32 (+10.0) | 46.32 (+20.0) | 66.32 (+40.0) | 106.32 (+80.0) |
| Q2_K (2.63 bpw) | FP16 | 25.1 | 28.82 (+2.5) | 31.32 (+5.0) | 36.32 (+10.0) | 46.32 (+20.0) | 66.32 (+40.0) |
| Q2_K (2.63 bpw) | Q8_0 | 25.1 | 27.7 (+1.38) | 29.07 (+2.75) | 31.82 (+5.5) | 37.32 (+11.0) | 48.32 (+22.0) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 25.1 | 27.57 (+1.25) | 28.82 (+2.5) | 31.32 (+5.0) | 36.32 (+10.0) | 46.32 (+20.0) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 25.1 | 27.07 (+0.75) | 27.82 (+1.5) | 29.32 (+3.0) | 32.32 (+6.0) | 38.32 (+12.0) |

Total VRAM = Model Weights + KV Cache + 1.23 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
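The formula above can be sketched directly. A minimal estimate, assuming the table's GB figures are binary gigabytes (2³⁰ bytes, which is what reproduces its KV numbers) and taking weight sizes from the table rather than recomputing them:

```python
# Total VRAM = model weights + KV cache + 1.23 GB base overhead,
# per the note above. Weight sizes come from the table; the KV cache
# is computed here for FP16/FP32 cache formats (quantized cache
# formats scale per the table). "GB" is treated as GiB (2**30 bytes).
GIB = 2**30
OVERHEAD_GB = 1.23
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

def total_vram_gb(weights_gb: float, ctx_tokens: int,
                  cache_bytes_per_elem: float = 2.0) -> float:
    """Estimated total VRAM in GB for a given weight size and context."""
    kv_gb = (2 * LAYERS * KV_HEADS * HEAD_DIM
             * cache_bytes_per_elem * ctx_tokens) / GIB
    return weights_gb + kv_gb + OVERHEAD_GB

# Q4_K_M weights (44.37 GB) with an FP16 cache at 32K context:
print(round(total_vram_gb(44.37, 32 * 1024), 2))  # 55.6, as in the table
```

This reproduces the table's totals, e.g. FP16 weights with an FP16 cache at 8K context: 152.67 + 2.5 + 1.23 = 156.4 GB.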

Check if your GPU can run Qwen2.5-72B-Instruct

Use our calculator to see if this model fits your specific hardware configuration.