
Qwen3-VL-8B-Instruct

Standard Transformer 8.2B Parameters

Model Specifications

Layers: 36
Hidden Dimension: 4,096
Attention Heads: 32
KV Heads: 8
Max Context: 262K tokens
Vocabulary Size: 151,936
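The per-token KV-cache footprint follows directly from these specs: with grouped-query attention, each token stores keys and values for the 8 KV heads (not all 32 query heads) across every layer. A minimal sketch of the arithmetic (variable names are illustrative):

```python
# Per-token KV-cache size for Qwen3-VL-8B-Instruct (GQA), derived from the specs above.
layers = 36
hidden = 4096
heads = 32
kv_heads = 8

head_dim = hidden // heads                           # 128
# 2 tensors (K and V) x kv_heads x head_dim elements per layer
elems_per_token = 2 * kv_heads * head_dim * layers   # 73,728 elements/token

bytes_fp16 = elems_per_token * 2                     # 147,456 B = 144 KiB/token at FP16
gib_8k = bytes_fp16 * 8192 / 2**30                   # 1.125 GiB at 8K context
print(round(gib_8k, 3))                              # matches the "+1.12 KV" column below
```

An FP32 cache doubles this (2.25 GiB at 8K), which is exactly the pattern visible in the table.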

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.58 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 17.22 GB | 20.05 GB (+2.25 KV) | 22.3 GB (+4.5 KV) | 26.8 GB (+9.0 KV) | 35.8 GB (+18.0 KV) | 53.8 GB (+36.0 KV) | 89.8 GB (+72.0 KV) |
| FP16 16.0 bpw | FP16 | 17.22 GB | 18.93 GB (+1.12 KV) | 20.05 GB (+2.25 KV) | 22.3 GB (+4.5 KV) | 26.8 GB (+9.0 KV) | 35.8 GB (+18.0 KV) | 53.8 GB (+36.0 KV) |
| FP16 16.0 bpw | Q8_0 | 17.22 GB | 18.42 GB (+0.62 KV) | 19.04 GB (+1.24 KV) | 20.28 GB (+2.48 KV) | 22.75 GB (+4.95 KV) | 27.7 GB (+9.9 KV) | 37.6 GB (+19.8 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 17.22 GB | 18.36 GB (+0.56 KV) | 18.93 GB (+1.12 KV) | 20.05 GB (+2.25 KV) | 22.3 GB (+4.5 KV) | 26.8 GB (+9.0 KV) | 35.8 GB (+18.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 17.22 GB | 18.14 GB (+0.34 KV) | 18.48 GB (+0.67 KV) | 19.15 GB (+1.35 KV) | 20.5 GB (+2.7 KV) | 23.2 GB (+5.4 KV) | 28.6 GB (+10.8 KV) |
| Q8_0 8.0 bpw | FP32 | 8.61 GB | 11.44 GB (+2.25 KV) | 13.69 GB (+4.5 KV) | 18.19 GB (+9.0 KV) | 27.19 GB (+18.0 KV) | 45.19 GB (+36.0 KV) | 81.19 GB (+72.0 KV) |
| Q8_0 8.0 bpw | FP16 | 8.61 GB | 10.32 GB (+1.12 KV) | 11.44 GB (+2.25 KV) | 13.69 GB (+4.5 KV) | 18.19 GB (+9.0 KV) | 27.19 GB (+18.0 KV) | 45.19 GB (+36.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 8.61 GB | 9.81 GB (+0.62 KV) | 10.43 GB (+1.24 KV) | 11.67 GB (+2.48 KV) | 14.14 GB (+4.95 KV) | 19.09 GB (+9.9 KV) | 28.99 GB (+19.8 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 8.61 GB | 9.75 GB (+0.56 KV) | 10.32 GB (+1.12 KV) | 11.44 GB (+2.25 KV) | 13.69 GB (+4.5 KV) | 18.19 GB (+9.0 KV) | 27.19 GB (+18.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 8.61 GB | 9.53 GB (+0.34 KV) | 9.87 GB (+0.67 KV) | 10.54 GB (+1.35 KV) | 11.89 GB (+2.7 KV) | 14.59 GB (+5.4 KV) | 19.99 GB (+10.8 KV) |
| Q4_K_M 4.65 bpw | FP32 | 5.0 GB | 7.84 GB (+2.25 KV) | 10.09 GB (+4.5 KV) | 14.59 GB (+9.0 KV) | 23.59 GB (+18.0 KV) | 41.59 GB (+36.0 KV) | 77.59 GB (+72.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 5.0 GB | 6.71 GB (+1.12 KV) | 7.84 GB (+2.25 KV) | 10.09 GB (+4.5 KV) | 14.59 GB (+9.0 KV) | 23.59 GB (+18.0 KV) | 41.59 GB (+36.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 5.0 GB | 6.21 GB (+0.62 KV) | 6.82 GB (+1.24 KV) | 8.06 GB (+2.48 KV) | 10.54 GB (+4.95 KV) | 15.49 GB (+9.9 KV) | 25.39 GB (+19.8 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 5.0 GB | 6.15 GB (+0.56 KV) | 6.71 GB (+1.12 KV) | 7.84 GB (+2.25 KV) | 10.09 GB (+4.5 KV) | 14.59 GB (+9.0 KV) | 23.59 GB (+18.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 5.0 GB | 5.92 GB (+0.34 KV) | 6.26 GB (+0.67 KV) | 6.94 GB (+1.35 KV) | 8.29 GB (+2.7 KV) | 10.99 GB (+5.4 KV) | 16.39 GB (+10.8 KV) |
| Q4_K_S 4.58 bpw | FP32 | 4.93 GB | 7.76 GB (+2.25 KV) | 10.01 GB (+4.5 KV) | 14.51 GB (+9.0 KV) | 23.51 GB (+18.0 KV) | 41.51 GB (+36.0 KV) | 77.51 GB (+72.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 4.93 GB | 6.64 GB (+1.12 KV) | 7.76 GB (+2.25 KV) | 10.01 GB (+4.5 KV) | 14.51 GB (+9.0 KV) | 23.51 GB (+18.0 KV) | 41.51 GB (+36.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 4.93 GB | 6.13 GB (+0.62 KV) | 6.75 GB (+1.24 KV) | 7.99 GB (+2.48 KV) | 10.46 GB (+4.95 KV) | 15.41 GB (+9.9 KV) | 25.31 GB (+19.8 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 4.93 GB | 6.07 GB (+0.56 KV) | 6.64 GB (+1.12 KV) | 7.76 GB (+2.25 KV) | 10.01 GB (+4.5 KV) | 14.51 GB (+9.0 KV) | 23.51 GB (+18.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 4.93 GB | 5.85 GB (+0.34 KV) | 6.19 GB (+0.67 KV) | 6.86 GB (+1.35 KV) | 8.21 GB (+2.7 KV) | 10.91 GB (+5.4 KV) | 16.31 GB (+10.8 KV) |
| Q3_K_M 3.91 bpw | FP32 | 4.21 GB | 7.04 GB (+2.25 KV) | 9.29 GB (+4.5 KV) | 13.79 GB (+9.0 KV) | 22.79 GB (+18.0 KV) | 40.79 GB (+36.0 KV) | 76.79 GB (+72.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 4.21 GB | 5.92 GB (+1.12 KV) | 7.04 GB (+2.25 KV) | 9.29 GB (+4.5 KV) | 13.79 GB (+9.0 KV) | 22.79 GB (+18.0 KV) | 40.79 GB (+36.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 4.21 GB | 5.41 GB (+0.62 KV) | 6.03 GB (+1.24 KV) | 7.27 GB (+2.48 KV) | 9.74 GB (+4.95 KV) | 14.69 GB (+9.9 KV) | 24.59 GB (+19.8 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 4.21 GB | 5.35 GB (+0.56 KV) | 5.92 GB (+1.12 KV) | 7.04 GB (+2.25 KV) | 9.29 GB (+4.5 KV) | 13.79 GB (+9.0 KV) | 22.79 GB (+18.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 4.21 GB | 5.13 GB (+0.34 KV) | 5.47 GB (+0.67 KV) | 6.14 GB (+1.35 KV) | 7.49 GB (+2.7 KV) | 10.19 GB (+5.4 KV) | 15.59 GB (+10.8 KV) |
| Q2_K 2.63 bpw | FP32 | 2.83 GB | 5.66 GB (+2.25 KV) | 7.91 GB (+4.5 KV) | 12.41 GB (+9.0 KV) | 21.41 GB (+18.0 KV) | 39.41 GB (+36.0 KV) | 75.41 GB (+72.0 KV) |
| Q2_K 2.63 bpw | FP16 | 2.83 GB | 4.54 GB (+1.12 KV) | 5.66 GB (+2.25 KV) | 7.91 GB (+4.5 KV) | 12.41 GB (+9.0 KV) | 21.41 GB (+18.0 KV) | 39.41 GB (+36.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 2.83 GB | 4.03 GB (+0.62 KV) | 4.65 GB (+1.24 KV) | 5.89 GB (+2.48 KV) | 8.36 GB (+4.95 KV) | 13.31 GB (+9.9 KV) | 23.21 GB (+19.8 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 2.83 GB | 3.98 GB (+0.56 KV) | 4.54 GB (+1.12 KV) | 5.66 GB (+2.25 KV) | 7.91 GB (+4.5 KV) | 12.41 GB (+9.0 KV) | 21.41 GB (+18.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 2.83 GB | 3.75 GB (+0.34 KV) | 4.09 GB (+0.67 KV) | 4.76 GB (+1.35 KV) | 6.11 GB (+2.7 KV) | 8.81 GB (+5.4 KV) | 14.21 GB (+10.8 KV) |

Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Actual usage may vary by roughly ±5% depending on the inference engine and its optimizations.
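The formula above can be turned into a small estimator. This sketch hard-codes this model's geometry and uses FP16/FP32 cache sizes (2 and 4 bytes per element); bytes-per-element for the quantized cache formats carries extra block-scale overhead and is not modeled here. The function name is illustrative:

```python
def total_vram_gb(weights_gb, context_tokens, cache_bytes_per_elem, overhead_gb=0.58):
    """Estimate total VRAM in GiB: weights + KV cache + fixed overhead."""
    # K+V tensors x 8 KV heads x 128 head dim x 36 layers (Qwen3-VL-8B geometry)
    elems_per_token = 2 * 8 * 128 * 36
    kv_gb = elems_per_token * cache_bytes_per_elem * context_tokens / 2**30
    return weights_gb + kv_gb + overhead_gb

# Q4_K_M weights (5.0 GB) with an FP16 cache at 32K context:
print(round(total_vram_gb(5.0, 32768, 2.0), 2))   # ~10.08, close to the table's 10.09 GB
```

The small residual difference against the table comes from rounding the weight sizes to two decimal places.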
