
Qwen2.5-VL-32B-Instruct

Standard Transformer · 33.5B Parameters

Model Specifications

| Specification | Value |
|---|---|
| Layers | 64 |
| Hidden Dimension | 5,120 |
| Attention Heads | 40 |
| KV Heads | 8 |
| Max Context | 128K tokens |
| Vocabulary Size | 152,064 |
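The per-token KV-cache footprint follows directly from these specifications: with grouped-query attention only the 8 KV heads are cached, across all 64 layers. A minimal sketch, assuming the standard head dimension of hidden dimension / attention heads (5,120 / 40 = 128):

```python
# KV-cache size derived from the model specifications above.
# Assumption: head_dim = hidden_dim / attention_heads.

LAYERS = 64
HIDDEN_DIM = 5120
ATTN_HEADS = 40
KV_HEADS = 8                          # grouped-query attention: only KV heads are cached
HEAD_DIM = HIDDEN_DIM // ATTN_HEADS   # 128

def kv_cache_bytes(context_tokens: int, bytes_per_element: float = 2.0) -> float:
    """KV cache in bytes: keys + values, all layers, all KV heads."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_element
    return context_tokens * per_token

# FP16 cache (2 bytes/element) at an 8K (8192-token) context:
print(kv_cache_bytes(8192) / 2**30)   # 2.0 GiB, matching the +2.0 KV column below
```

Quantized cache formats (Q8_0, FP8, Q4_0) shrink this roughly in proportion to their bits per element, which is where the smaller KV increments in the table come from.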

VRAM Requirements

VRAM usage for all combinations of weight quantization and KV-cache format. Base overhead: 0.83 GB (CUDA context + activation buffers).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 128K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 70.35 GB | 75.19 GB (+4.0 KV) | 79.19 GB (+8.0 KV) | 87.19 GB (+16.0 KV) | 103.19 GB (+32.0 KV) | 133.69 GB (+62.5 KV) |
| FP16 (16.0 bpw) | FP16 | 70.35 GB | 73.19 GB (+2.0 KV) | 75.19 GB (+4.0 KV) | 79.19 GB (+8.0 KV) | 87.19 GB (+16.0 KV) | 102.44 GB (+31.25 KV) |
| FP16 (16.0 bpw) | Q8_0 | 70.35 GB | 72.28 GB (+1.1 KV) | 73.39 GB (+2.2 KV) | 75.59 GB (+4.4 KV) | 79.98 GB (+8.8 KV) | 88.37 GB (+17.19 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 70.35 GB | 72.19 GB (+1.0 KV) | 73.19 GB (+2.0 KV) | 75.19 GB (+4.0 KV) | 79.19 GB (+8.0 KV) | 86.81 GB (+15.62 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 70.35 GB | 71.78 GB (+0.6 KV) | 72.39 GB (+1.2 KV) | 73.59 GB (+2.4 KV) | 75.98 GB (+4.8 KV) | 80.56 GB (+9.38 KV) |
| Q8_0 (8.0 bpw) | FP32 | 35.18 GB | 40.01 GB (+4.0 KV) | 44.01 GB (+8.0 KV) | 52.01 GB (+16.0 KV) | 68.01 GB (+32.0 KV) | 98.51 GB (+62.5 KV) |
| Q8_0 (8.0 bpw) | FP16 | 35.18 GB | 38.01 GB (+2.0 KV) | 40.01 GB (+4.0 KV) | 44.01 GB (+8.0 KV) | 52.01 GB (+16.0 KV) | 67.26 GB (+31.25 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 35.18 GB | 37.11 GB (+1.1 KV) | 38.21 GB (+2.2 KV) | 40.41 GB (+4.4 KV) | 44.81 GB (+8.8 KV) | 53.20 GB (+17.19 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 35.18 GB | 37.01 GB (+1.0 KV) | 38.01 GB (+2.0 KV) | 40.01 GB (+4.0 KV) | 44.01 GB (+8.0 KV) | 51.64 GB (+15.62 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 35.18 GB | 36.61 GB (+0.6 KV) | 37.21 GB (+1.2 KV) | 38.41 GB (+2.4 KV) | 40.81 GB (+4.8 KV) | 45.39 GB (+9.38 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 20.45 GB | 25.28 GB (+4.0 KV) | 29.28 GB (+8.0 KV) | 37.28 GB (+16.0 KV) | 53.28 GB (+32.0 KV) | 83.78 GB (+62.5 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 20.45 GB | 23.28 GB (+2.0 KV) | 25.28 GB (+4.0 KV) | 29.28 GB (+8.0 KV) | 37.28 GB (+16.0 KV) | 52.53 GB (+31.25 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 20.45 GB | 22.38 GB (+1.1 KV) | 23.48 GB (+2.2 KV) | 25.68 GB (+4.4 KV) | 30.08 GB (+8.8 KV) | 38.47 GB (+17.19 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 20.45 GB | 22.28 GB (+1.0 KV) | 23.28 GB (+2.0 KV) | 25.28 GB (+4.0 KV) | 29.28 GB (+8.0 KV) | 36.91 GB (+15.62 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 20.45 GB | 21.88 GB (+0.6 KV) | 22.48 GB (+1.2 KV) | 23.68 GB (+2.4 KV) | 26.08 GB (+4.8 KV) | 30.66 GB (+9.38 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 20.14 GB | 24.97 GB (+4.0 KV) | 28.97 GB (+8.0 KV) | 36.97 GB (+16.0 KV) | 52.97 GB (+32.0 KV) | 83.47 GB (+62.5 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 20.14 GB | 22.97 GB (+2.0 KV) | 24.97 GB (+4.0 KV) | 28.97 GB (+8.0 KV) | 36.97 GB (+16.0 KV) | 52.22 GB (+31.25 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 20.14 GB | 22.07 GB (+1.1 KV) | 23.17 GB (+2.2 KV) | 25.37 GB (+4.4 KV) | 29.77 GB (+8.8 KV) | 38.16 GB (+17.19 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 20.14 GB | 21.97 GB (+1.0 KV) | 22.97 GB (+2.0 KV) | 24.97 GB (+4.0 KV) | 28.97 GB (+8.0 KV) | 36.60 GB (+15.62 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 20.14 GB | 21.57 GB (+0.6 KV) | 22.17 GB (+1.2 KV) | 23.37 GB (+2.4 KV) | 25.77 GB (+4.8 KV) | 30.35 GB (+9.38 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 17.19 GB | 22.03 GB (+4.0 KV) | 26.03 GB (+8.0 KV) | 34.03 GB (+16.0 KV) | 50.03 GB (+32.0 KV) | 80.53 GB (+62.5 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 17.19 GB | 20.03 GB (+2.0 KV) | 22.03 GB (+4.0 KV) | 26.03 GB (+8.0 KV) | 34.03 GB (+16.0 KV) | 49.28 GB (+31.25 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 17.19 GB | 19.13 GB (+1.1 KV) | 20.23 GB (+2.2 KV) | 22.43 GB (+4.4 KV) | 26.83 GB (+8.8 KV) | 35.21 GB (+17.19 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 17.19 GB | 19.03 GB (+1.0 KV) | 20.03 GB (+2.0 KV) | 22.03 GB (+4.0 KV) | 26.03 GB (+8.0 KV) | 33.65 GB (+15.62 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 17.19 GB | 18.63 GB (+0.6 KV) | 19.23 GB (+1.2 KV) | 20.43 GB (+2.4 KV) | 22.83 GB (+4.8 KV) | 27.40 GB (+9.38 KV) |
| Q2_K (2.63 bpw) | FP32 | 11.56 GB | 16.40 GB (+4.0 KV) | 20.40 GB (+8.0 KV) | 28.40 GB (+16.0 KV) | 44.40 GB (+32.0 KV) | 74.90 GB (+62.5 KV) |
| Q2_K (2.63 bpw) | FP16 | 11.56 GB | 14.40 GB (+2.0 KV) | 16.40 GB (+4.0 KV) | 20.40 GB (+8.0 KV) | 28.40 GB (+16.0 KV) | 43.65 GB (+31.25 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 11.56 GB | 13.50 GB (+1.1 KV) | 14.60 GB (+2.2 KV) | 16.80 GB (+4.4 KV) | 21.20 GB (+8.8 KV) | 29.59 GB (+17.19 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 11.56 GB | 13.40 GB (+1.0 KV) | 14.40 GB (+2.0 KV) | 16.40 GB (+4.0 KV) | 20.40 GB (+8.0 KV) | 28.02 GB (+15.62 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 11.56 GB | 13.00 GB (+0.6 KV) | 13.60 GB (+1.2 KV) | 14.80 GB (+2.4 KV) | 17.20 GB (+4.8 KV) | 21.77 GB (+9.38 KV) |

Total VRAM = Model Weights + KV Cache + 0.83 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
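The formula above can be applied directly to any row of the table. A minimal sketch, using the table's figures for Q4_K_M weights with an FP16 cache at a 32K context (both values taken from the table, not computed here):

```python
# Total VRAM = model weights + KV cache + fixed overhead, per the formula above.
OVERHEAD_GB = 0.83   # CUDA context + activation buffers

def total_vram_gb(weights_gb: float, kv_cache_gb: float) -> float:
    """Estimated total VRAM in GB for a given weights/KV-cache combination."""
    return weights_gb + kv_cache_gb + OVERHEAD_GB

# Q4_K_M weights (20.45 GB) with an FP16 KV cache at 32K context (+8.0 GB):
print(round(total_vram_gb(20.45, 8.0), 2))  # 29.28, matching the table row
```

Comparing the result against the corresponding table cell is a quick sanity check before committing to a GPU purchase or deployment.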

Check if your GPU can run Qwen2.5-VL-32B-Instruct

Use our calculator to see if this model fits your specific hardware configuration.