
Qwen2.5-Math-72B-Instruct

Standard Transformer 72.7B Parameters

Model Specifications

Layers: 80
Hidden dimension: 8,192
Attention heads: 64
KV heads: 8
Max context: 4K tokens (4,096)
Vocabulary size: 152,064
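The per-context KV-cache sizes quoted in the VRAM table below follow directly from these specifications. A minimal sketch, assuming the usual grouped-query-attention layout where head dimension = hidden dimension / attention heads (8,192 / 64 = 128) and a factor of 2 for the separate K and V tensors (variable names are illustrative):

```python
# KV-cache size estimate from the model specifications above.
layers = 80
hidden_dim = 8192
attention_heads = 64
kv_heads = 8            # grouped-query attention: only 8 KV heads are cached
context = 4096          # 4K-token context
bytes_per_elem = 2      # FP16 cache format

head_dim = hidden_dim // attention_heads  # 128, assuming the standard layout
# 2x for K and V tensors, per layer, per KV head, per token
kv_bytes = 2 * layers * kv_heads * head_dim * context * bytes_per_elem
print(f"{kv_bytes / 2**30:.2f} GiB")  # 1.25 GiB, matching the "+1.25 KV" column
```

Halving `bytes_per_elem` reproduces the Q8_0/FP8 column (~0.62-0.69 GB) and quartering it the Q4_0 column (~0.38 GB).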

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 1.23 GB (CUDA context + activations). Cache formats marked (Exp) are experimental.

| Quantization | Bits/Weight | Cache Format | Model Weights | Total @ 4K Context |
|--------------|-------------|--------------|---------------|--------------------|
| FP16 | 16.0 bpw | FP32 | 152.67 GB | 156.4 GB (+2.5 GB KV) |
| FP16 | 16.0 bpw | FP16 | 152.67 GB | 155.15 GB (+1.25 GB KV) |
| FP16 | 16.0 bpw | Q8_0 | 152.67 GB | 154.58 GB (+0.69 GB KV) |
| FP16 | 16.0 bpw | FP8 (Exp) | 152.67 GB | 154.52 GB (+0.62 GB KV) |
| FP16 | 16.0 bpw | Q4_0 (Exp) | 152.67 GB | 154.27 GB (+0.38 GB KV) |
| Q8_0 | 8.0 bpw | FP32 | 76.34 GB | 80.06 GB (+2.5 GB KV) |
| Q8_0 | 8.0 bpw | FP16 | 76.34 GB | 78.81 GB (+1.25 GB KV) |
| Q8_0 | 8.0 bpw | Q8_0 | 76.34 GB | 78.25 GB (+0.69 GB KV) |
| Q8_0 | 8.0 bpw | FP8 (Exp) | 76.34 GB | 78.19 GB (+0.62 GB KV) |
| Q8_0 | 8.0 bpw | Q4_0 (Exp) | 76.34 GB | 77.94 GB (+0.38 GB KV) |
| Q4_K_M | 4.65 bpw | FP32 | 44.37 GB | 48.1 GB (+2.5 GB KV) |
| Q4_K_M | 4.65 bpw | FP16 | 44.37 GB | 46.85 GB (+1.25 GB KV) |
| Q4_K_M | 4.65 bpw | Q8_0 | 44.37 GB | 46.28 GB (+0.69 GB KV) |
| Q4_K_M | 4.65 bpw | FP8 (Exp) | 44.37 GB | 46.22 GB (+0.62 GB KV) |
| Q4_K_M | 4.65 bpw | Q4_0 (Exp) | 44.37 GB | 45.97 GB (+0.38 GB KV) |
| Q4_K_S | 4.58 bpw | FP32 | 43.7 GB | 47.43 GB (+2.5 GB KV) |
| Q4_K_S | 4.58 bpw | FP16 | 43.7 GB | 46.18 GB (+1.25 GB KV) |
| Q4_K_S | 4.58 bpw | Q8_0 | 43.7 GB | 45.62 GB (+0.69 GB KV) |
| Q4_K_S | 4.58 bpw | FP8 (Exp) | 43.7 GB | 45.55 GB (+0.62 GB KV) |
| Q4_K_S | 4.58 bpw | Q4_0 (Exp) | 43.7 GB | 45.3 GB (+0.38 GB KV) |
| Q3_K_M | 3.91 bpw | FP32 | 37.31 GB | 41.04 GB (+2.5 GB KV) |
| Q3_K_M | 3.91 bpw | FP16 | 37.31 GB | 39.79 GB (+1.25 GB KV) |
| Q3_K_M | 3.91 bpw | Q8_0 | 37.31 GB | 39.22 GB (+0.69 GB KV) |
| Q3_K_M | 3.91 bpw | FP8 (Exp) | 37.31 GB | 39.16 GB (+0.62 GB KV) |
| Q3_K_M | 3.91 bpw | Q4_0 (Exp) | 37.31 GB | 38.91 GB (+0.38 GB KV) |
| Q2_K | 2.63 bpw | FP32 | 25.1 GB | 28.82 GB (+2.5 GB KV) |
| Q2_K | 2.63 bpw | FP16 | 25.1 GB | 27.57 GB (+1.25 GB KV) |
| Q2_K | 2.63 bpw | Q8_0 | 25.1 GB | 27.01 GB (+0.69 GB KV) |
| Q2_K | 2.63 bpw | FP8 (Exp) | 25.1 GB | 26.95 GB (+0.62 GB KV) |
| Q2_K | 2.63 bpw | Q4_0 (Exp) | 25.1 GB | 26.7 GB (+0.38 GB KV) |

Total VRAM = Model Weights + KV Cache + 1.23 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
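The footnote's formula can be checked against any row of the table. A minimal sketch (the constant and function name are illustrative, not from an official API):

```python
# Total VRAM = model weights + KV cache + fixed overhead, per the footnote above.
OVERHEAD_GB = 1.23  # CUDA context + activations, as stated in the table intro

def total_vram_gb(weights_gb: float, kv_cache_gb: float) -> float:
    """Estimate total VRAM in GB for a given weight size and KV-cache size."""
    return weights_gb + kv_cache_gb + OVERHEAD_GB

# Q4_K_M weights with an FP16 cache at 4K context:
print(f"{total_vram_gb(44.37, 1.25):.2f} GB")  # 46.85 GB, matching that row
```

Note the stated ±5% tolerance: a real inference engine may allocate more (fragmentation, larger activation buffers) or less (paged KV cache) than this estimate.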
