
Qwen3-VL-4B-Thinking

Standard Transformer · 4.6B parameters

Model Specifications

Layers: 36
Hidden dimension: 2,560
Attention heads: 32
KV heads: 8
Max context: 262K tokens
Vocabulary size: 151,936
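
The per-token KV-cache footprint follows directly from these specifications. A minimal sketch, assuming a head dimension of 128 (not listed above, but standard for the Qwen3 family and consistent with the FP16 cache figures in the table below, e.g. ~1.12 GB at 8K context):

```python
# Assumptions: head_dim = 128 (not in the spec list above); "GB" follows the
# table's convention of 1024^3 bytes.
LAYERS = 36
KV_HEADS = 8
HEAD_DIM = 128  # assumed

def kv_cache_gb(context_tokens: int, bytes_per_element: float = 2.0) -> float:
    """KV cache size in GB: 2 (K and V) x layers x KV heads x head_dim x bytes."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_element
    return per_token * context_tokens / 1024**3

print(f"{kv_cache_gb(8 * 1024):.2f} GB")   # ~1.12 GB, matching the FP16 column at 8K
print(f"{kv_cache_gb(32 * 1024):.2f} GB")  # 4.50 GB, matching the FP16 column at 32K
```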

VRAM Requirements

Estimated VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.55 GB (CUDA context + activations). Cache formats marked (Exp) are experimental.

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 9.66 GB | 12.46 GB (+2.25 KV) | 14.71 GB (+4.5 KV) | 19.21 GB (+9.0 KV) | 28.21 GB (+18.0 KV) | 46.21 GB (+36.0 KV) | 82.21 GB (+72.0 KV) |
| FP16 (16.0 bpw) | FP16 | 9.66 GB | 11.33 GB (+1.12 KV) | 12.46 GB (+2.25 KV) | 14.71 GB (+4.5 KV) | 19.21 GB (+9.0 KV) | 28.21 GB (+18.0 KV) | 46.21 GB (+36.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 9.66 GB | 10.82 GB (+0.62 KV) | 11.44 GB (+1.24 KV) | 12.68 GB (+2.48 KV) | 15.16 GB (+4.95 KV) | 20.11 GB (+9.9 KV) | 30.01 GB (+19.8 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 9.66 GB | 10.77 GB (+0.56 KV) | 11.33 GB (+1.12 KV) | 12.46 GB (+2.25 KV) | 14.71 GB (+4.5 KV) | 19.21 GB (+9.0 KV) | 28.21 GB (+18.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 9.66 GB | 10.54 GB (+0.34 KV) | 10.88 GB (+0.67 KV) | 11.56 GB (+1.35 KV) | 12.91 GB (+2.7 KV) | 15.61 GB (+5.4 KV) | 21.01 GB (+10.8 KV) |
| Q8_0 (8.0 bpw) | FP32 | 4.83 GB | 7.63 GB (+2.25 KV) | 9.88 GB (+4.5 KV) | 14.38 GB (+9.0 KV) | 23.38 GB (+18.0 KV) | 41.38 GB (+36.0 KV) | 77.38 GB (+72.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 4.83 GB | 6.5 GB (+1.12 KV) | 7.63 GB (+2.25 KV) | 9.88 GB (+4.5 KV) | 14.38 GB (+9.0 KV) | 23.38 GB (+18.0 KV) | 41.38 GB (+36.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 4.83 GB | 5.99 GB (+0.62 KV) | 6.61 GB (+1.24 KV) | 7.85 GB (+2.48 KV) | 10.33 GB (+4.95 KV) | 15.28 GB (+9.9 KV) | 25.18 GB (+19.8 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 4.83 GB | 5.94 GB (+0.56 KV) | 6.5 GB (+1.12 KV) | 7.63 GB (+2.25 KV) | 9.88 GB (+4.5 KV) | 14.38 GB (+9.0 KV) | 23.38 GB (+18.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 4.83 GB | 5.71 GB (+0.34 KV) | 6.05 GB (+0.67 KV) | 6.73 GB (+1.35 KV) | 8.08 GB (+2.7 KV) | 10.78 GB (+5.4 KV) | 16.18 GB (+10.8 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 2.81 GB | 5.6 GB (+2.25 KV) | 7.85 GB (+4.5 KV) | 12.35 GB (+9.0 KV) | 21.35 GB (+18.0 KV) | 39.35 GB (+36.0 KV) | 75.35 GB (+72.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 2.81 GB | 4.48 GB (+1.12 KV) | 5.6 GB (+2.25 KV) | 7.85 GB (+4.5 KV) | 12.35 GB (+9.0 KV) | 21.35 GB (+18.0 KV) | 39.35 GB (+36.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 2.81 GB | 3.97 GB (+0.62 KV) | 4.59 GB (+1.24 KV) | 5.83 GB (+2.48 KV) | 8.3 GB (+4.95 KV) | 13.25 GB (+9.9 KV) | 23.15 GB (+19.8 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 2.81 GB | 3.92 GB (+0.56 KV) | 4.48 GB (+1.12 KV) | 5.6 GB (+2.25 KV) | 7.85 GB (+4.5 KV) | 12.35 GB (+9.0 KV) | 21.35 GB (+18.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 2.81 GB | 3.69 GB (+0.34 KV) | 4.03 GB (+0.67 KV) | 4.7 GB (+1.35 KV) | 6.05 GB (+2.7 KV) | 8.75 GB (+5.4 KV) | 14.15 GB (+10.8 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 2.77 GB | 5.56 GB (+2.25 KV) | 7.81 GB (+4.5 KV) | 12.31 GB (+9.0 KV) | 21.31 GB (+18.0 KV) | 39.31 GB (+36.0 KV) | 75.31 GB (+72.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 2.77 GB | 4.44 GB (+1.12 KV) | 5.56 GB (+2.25 KV) | 7.81 GB (+4.5 KV) | 12.31 GB (+9.0 KV) | 21.31 GB (+18.0 KV) | 39.31 GB (+36.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 2.77 GB | 3.93 GB (+0.62 KV) | 4.55 GB (+1.24 KV) | 5.79 GB (+2.48 KV) | 8.26 GB (+4.95 KV) | 13.21 GB (+9.9 KV) | 23.11 GB (+19.8 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 2.77 GB | 3.87 GB (+0.56 KV) | 4.44 GB (+1.12 KV) | 5.56 GB (+2.25 KV) | 7.81 GB (+4.5 KV) | 12.31 GB (+9.0 KV) | 21.31 GB (+18.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 2.77 GB | 3.65 GB (+0.34 KV) | 3.99 GB (+0.67 KV) | 4.66 GB (+1.35 KV) | 6.01 GB (+2.7 KV) | 8.71 GB (+5.4 KV) | 14.11 GB (+10.8 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 2.36 GB | 5.16 GB (+2.25 KV) | 7.41 GB (+4.5 KV) | 11.91 GB (+9.0 KV) | 20.91 GB (+18.0 KV) | 38.91 GB (+36.0 KV) | 74.91 GB (+72.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 2.36 GB | 4.03 GB (+1.12 KV) | 5.16 GB (+2.25 KV) | 7.41 GB (+4.5 KV) | 11.91 GB (+9.0 KV) | 20.91 GB (+18.0 KV) | 38.91 GB (+36.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 2.36 GB | 3.53 GB (+0.62 KV) | 4.14 GB (+1.24 KV) | 5.38 GB (+2.48 KV) | 7.86 GB (+4.95 KV) | 12.81 GB (+9.9 KV) | 22.71 GB (+19.8 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 2.36 GB | 3.47 GB (+0.56 KV) | 4.03 GB (+1.12 KV) | 5.16 GB (+2.25 KV) | 7.41 GB (+4.5 KV) | 11.91 GB (+9.0 KV) | 20.91 GB (+18.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 2.36 GB | 3.24 GB (+0.34 KV) | 3.58 GB (+0.67 KV) | 4.26 GB (+1.35 KV) | 5.61 GB (+2.7 KV) | 8.31 GB (+5.4 KV) | 13.71 GB (+10.8 KV) |
| Q2_K (2.63 bpw) | FP32 | 1.59 GB | 4.38 GB (+2.25 KV) | 6.63 GB (+4.5 KV) | 11.13 GB (+9.0 KV) | 20.13 GB (+18.0 KV) | 38.13 GB (+36.0 KV) | 74.13 GB (+72.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 1.59 GB | 3.26 GB (+1.12 KV) | 4.38 GB (+2.25 KV) | 6.63 GB (+4.5 KV) | 11.13 GB (+9.0 KV) | 20.13 GB (+18.0 KV) | 38.13 GB (+36.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 1.59 GB | 2.75 GB (+0.62 KV) | 3.37 GB (+1.24 KV) | 4.61 GB (+2.48 KV) | 7.08 GB (+4.95 KV) | 12.03 GB (+9.9 KV) | 21.93 GB (+19.8 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 1.59 GB | 2.7 GB (+0.56 KV) | 3.26 GB (+1.12 KV) | 4.38 GB (+2.25 KV) | 6.63 GB (+4.5 KV) | 11.13 GB (+9.0 KV) | 20.13 GB (+18.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 1.59 GB | 2.47 GB (+0.34 KV) | 2.81 GB (+0.67 KV) | 3.48 GB (+1.35 KV) | 4.83 GB (+2.7 KV) | 7.53 GB (+5.4 KV) | 12.93 GB (+10.8 KV) |

Total VRAM = Model Weights + KV Cache + 0.55 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
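
As a worked check of that formula, here is a sketch using values taken straight from the table (small mismatches elsewhere in the table are rounding artifacts):

```python
OVERHEAD_GB = 0.55  # CUDA context + activations, per the note above

def total_vram_gb(weights_gb: float, kv_cache_gb: float) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb + kv_cache_gb + OVERHEAD_GB

# Q8_0 weights (4.83 GB) with an FP16 cache at 16K context (+2.25 GB KV):
print(f"{total_vram_gb(4.83, 2.25):.2f} GB")  # 7.63 GB, matching the table row
```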

Check if your GPU can run Qwen3-VL-4B-Thinking

Use our calculator to see if this model fits your specific hardware configuration.
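
For a quick offline estimate before reaching for the calculator, a fit check might look like the sketch below. The 5% headroom mirrors the ±5% variance noted above, and the card size is purely illustrative:

```python
def fits(gpu_vram_gb: float, weights_gb: float, kv_cache_gb: float,
         headroom: float = 1.05) -> bool:
    """True if weights + KV cache + 0.55 GB overhead (with 5% headroom) fits."""
    return (weights_gb + kv_cache_gb + 0.55) * headroom <= gpu_vram_gb

# Illustrative: a 12 GB card, Q4_K_M weights (2.81 GB), FP16 cache at 32K (+4.5 GB KV):
print(fits(12.0, 2.81, 4.5))  # True (~7.86 GB x 1.05 ≈ 8.25 GB <= 12 GB)
```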