
Qwen2.5-VL-72B-Instruct

Standard Transformer · 73.5B parameters

Model Specifications

Layers: 80
Hidden Dimension: 8,192
Attention Heads: 64
KV Heads: 8
Max Context: 128K tokens
Vocabulary Size: 152,064
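The per-token KV-cache footprint follows directly from these specifications. A minimal sanity check, assuming the usual grouped-query-attention cache layout (K and V tensors per layer, one value per KV head per head dimension) and head_dim = hidden_dim / attention_heads:

```python
# Derive the per-token KV-cache size from the specs listed above.
layers = 80
hidden_dim = 8192
attn_heads = 64
kv_heads = 8

head_dim = hidden_dim // attn_heads                     # 128
# K and V each store kv_heads * head_dim values per layer, per token.
kv_values_per_token = 2 * layers * kv_heads * head_dim  # 163,840

bytes_fp16 = kv_values_per_token * 2                    # 327,680 bytes/token
print(f"head_dim = {head_dim}")
print(f"FP16 KV cache: {bytes_fp16 / 2**20:.4f} MiB per token")
# At 8,192 tokens this works out to 2.5 GiB, matching the +2.5 KV
# figure in the FP16 cache row of the table below.
```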

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 1.23 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 128K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 154.35 GB | 160.59 GB (+5.0 KV) | 165.59 GB (+10.0 KV) | 175.59 GB (+20.0 KV) | 195.59 GB (+40.0 KV) | 233.71 GB (+78.12 KV) |
| FP16 (16.0 bpw) | FP16 | 154.35 GB | 158.09 GB (+2.5 KV) | 160.59 GB (+5.0 KV) | 165.59 GB (+10.0 KV) | 175.59 GB (+20.0 KV) | 194.65 GB (+39.06 KV) |
| FP16 (16.0 bpw) | Q8_0 | 154.35 GB | 156.96 GB (+1.38 KV) | 158.34 GB (+2.75 KV) | 161.09 GB (+5.5 KV) | 166.59 GB (+11.0 KV) | 177.07 GB (+21.48 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 154.35 GB | 156.84 GB (+1.25 KV) | 158.09 GB (+2.5 KV) | 160.59 GB (+5.0 KV) | 165.59 GB (+10.0 KV) | 175.12 GB (+19.53 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 154.35 GB | 156.34 GB (+0.75 KV) | 157.09 GB (+1.5 KV) | 158.59 GB (+3.0 KV) | 161.59 GB (+6.0 KV) | 167.30 GB (+11.72 KV) |
| Q8_0 (8.0 bpw) | FP32 | 77.17 GB | 83.41 GB (+5.0 KV) | 88.41 GB (+10.0 KV) | 98.41 GB (+20.0 KV) | 118.41 GB (+40.0 KV) | 156.54 GB (+78.12 KV) |
| Q8_0 (8.0 bpw) | FP16 | 77.17 GB | 80.91 GB (+2.5 KV) | 83.41 GB (+5.0 KV) | 88.41 GB (+10.0 KV) | 98.41 GB (+20.0 KV) | 117.47 GB (+39.06 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 77.17 GB | 79.78 GB (+1.38 KV) | 81.16 GB (+2.75 KV) | 83.91 GB (+5.5 KV) | 89.41 GB (+11.0 KV) | 99.89 GB (+21.48 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 77.17 GB | 79.66 GB (+1.25 KV) | 80.91 GB (+2.5 KV) | 83.41 GB (+5.0 KV) | 88.41 GB (+10.0 KV) | 97.94 GB (+19.53 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 77.17 GB | 79.16 GB (+0.75 KV) | 79.91 GB (+1.5 KV) | 81.41 GB (+3.0 KV) | 84.41 GB (+6.0 KV) | 90.13 GB (+11.72 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 44.86 GB | 51.09 GB (+5.0 KV) | 56.09 GB (+10.0 KV) | 66.09 GB (+20.0 KV) | 86.09 GB (+40.0 KV) | 124.22 GB (+78.12 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 44.86 GB | 48.59 GB (+2.5 KV) | 51.09 GB (+5.0 KV) | 56.09 GB (+10.0 KV) | 66.09 GB (+20.0 KV) | 85.16 GB (+39.06 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 44.86 GB | 47.47 GB (+1.38 KV) | 48.84 GB (+2.75 KV) | 51.59 GB (+5.5 KV) | 57.09 GB (+11.0 KV) | 67.58 GB (+21.48 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 44.86 GB | 47.34 GB (+1.25 KV) | 48.59 GB (+2.5 KV) | 51.09 GB (+5.0 KV) | 56.09 GB (+10.0 KV) | 65.62 GB (+19.53 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 44.86 GB | 46.84 GB (+0.75 KV) | 47.59 GB (+1.5 KV) | 49.09 GB (+3.0 KV) | 52.09 GB (+6.0 KV) | 57.81 GB (+11.72 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 44.18 GB | 50.42 GB (+5.0 KV) | 55.42 GB (+10.0 KV) | 65.42 GB (+20.0 KV) | 85.42 GB (+40.0 KV) | 123.54 GB (+78.12 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 44.18 GB | 47.92 GB (+2.5 KV) | 50.42 GB (+5.0 KV) | 55.42 GB (+10.0 KV) | 65.42 GB (+20.0 KV) | 84.48 GB (+39.06 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 44.18 GB | 46.79 GB (+1.38 KV) | 48.17 GB (+2.75 KV) | 50.92 GB (+5.5 KV) | 56.42 GB (+11.0 KV) | 66.90 GB (+21.48 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 44.18 GB | 46.67 GB (+1.25 KV) | 47.92 GB (+2.5 KV) | 50.42 GB (+5.0 KV) | 55.42 GB (+10.0 KV) | 64.95 GB (+19.53 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 44.18 GB | 46.17 GB (+0.75 KV) | 46.92 GB (+1.5 KV) | 48.42 GB (+3.0 KV) | 51.42 GB (+6.0 KV) | 57.14 GB (+11.72 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 37.72 GB | 43.95 GB (+5.0 KV) | 48.95 GB (+10.0 KV) | 58.95 GB (+20.0 KV) | 78.95 GB (+40.0 KV) | 117.08 GB (+78.12 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 37.72 GB | 41.45 GB (+2.5 KV) | 43.95 GB (+5.0 KV) | 48.95 GB (+10.0 KV) | 58.95 GB (+20.0 KV) | 78.02 GB (+39.06 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 37.72 GB | 40.33 GB (+1.38 KV) | 41.70 GB (+2.75 KV) | 44.45 GB (+5.5 KV) | 49.95 GB (+11.0 KV) | 60.44 GB (+21.48 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 37.72 GB | 40.20 GB (+1.25 KV) | 41.45 GB (+2.5 KV) | 43.95 GB (+5.0 KV) | 48.95 GB (+10.0 KV) | 58.49 GB (+19.53 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 37.72 GB | 39.70 GB (+0.75 KV) | 40.45 GB (+1.5 KV) | 41.95 GB (+3.0 KV) | 44.95 GB (+6.0 KV) | 50.67 GB (+11.72 KV) |
| Q2_K (2.63 bpw) | FP32 | 25.37 GB | 31.61 GB (+5.0 KV) | 36.61 GB (+10.0 KV) | 46.61 GB (+20.0 KV) | 66.61 GB (+40.0 KV) | 104.73 GB (+78.12 KV) |
| Q2_K (2.63 bpw) | FP16 | 25.37 GB | 29.11 GB (+2.5 KV) | 31.61 GB (+5.0 KV) | 36.61 GB (+10.0 KV) | 46.61 GB (+20.0 KV) | 65.67 GB (+39.06 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 25.37 GB | 27.98 GB (+1.38 KV) | 29.36 GB (+2.75 KV) | 32.11 GB (+5.5 KV) | 37.61 GB (+11.0 KV) | 48.09 GB (+21.48 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 25.37 GB | 27.86 GB (+1.25 KV) | 29.11 GB (+2.5 KV) | 31.61 GB (+5.0 KV) | 36.61 GB (+10.0 KV) | 46.14 GB (+19.53 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 25.37 GB | 27.36 GB (+0.75 KV) | 28.11 GB (+1.5 KV) | 29.61 GB (+3.0 KV) | 32.61 GB (+6.0 KV) | 38.33 GB (+11.72 KV) |

Total VRAM = model weights + KV cache + 1.23 GB overhead; the context columns above already include this overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
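That formula can be written as a small estimator. A minimal sketch, assuming binary gigabytes (the unit the table's KV figures work out to) and the architecture parameters listed above; function and variable names are illustrative:

```python
# Estimate total VRAM for a given weight size, context length, and KV-cache precision.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128  # Qwen2.5-VL-72B-Instruct specs above
OVERHEAD_GB = 1.23                       # CUDA context + activations

def kv_cache_gb(context_tokens: int, bytes_per_value: float) -> float:
    """KV-cache size in GiB: K and V values per layer, per KV head, per head dim."""
    values = 2 * LAYERS * KV_HEADS * HEAD_DIM * context_tokens
    return values * bytes_per_value / 2**30

def total_vram_gb(weights_gb: float, context_tokens: int, bytes_per_value: float) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb + kv_cache_gb(context_tokens, bytes_per_value) + OVERHEAD_GB

# Q4_K_M weights (44.86 GB) with an FP16 cache (2 bytes/value) at 32K context:
print(f"{total_vram_gb(44.86, 32 * 1024, 2):.2f} GB")  # matches the 56.09 GB table entry
```

Quantized cache formats such as Q8_0 fit the same sketch by substituting their effective bytes per value, though their block scales add a small amount on top of the raw bit width.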
