Qwen3-VL-30B-A3B-Instruct

Architecture: Mixture of Experts (MoE)
Total Parameters: 30.5B
Active Parameters: 3.3B

Model Specifications

Layers: 48
Hidden Dimension: 2,048
Attention Heads: 32
KV Heads: 4
Max Context: 262K tokens
Vocabulary Size: 151,936
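The KV-cache figures in the table below follow directly from these specifications. A minimal sketch of the calculation, assuming a GQA head dimension of 128 (not listed above, but implied by the table's +0.75 GB FP16-cache figure at 8K context):

```python
# KV-cache size from the specs above.
# Assumption: head_dim = 128 (inferred from the table, not a listed spec).
LAYERS = 48
KV_HEADS = 4
HEAD_DIM = 128  # assumed

def kv_cache_gib(context_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache in GiB: keys + values, for every layer and KV head."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem  # K and V
    return per_token * context_tokens / 2**30

print(kv_cache_gib(8192))   # FP16 cache at 8K context -> 0.75
print(kv_cache_gib(32768))  # -> 3.0
```

Quantized cache formats (Q8_0, FP8, Q4_0) scale this figure down by roughly their bits-per-element ratio, which is where the smaller deltas in the table come from.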

VRAM Requirements

The table below lists estimated VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.8 GB (CUDA context + activation buffers).

| Quantization | Cache Format | Weights | 8K | 16K | 32K | 65K | 131K | 262K |
|---|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 64.05 | 66.36 (+1.5) | 67.86 (+3.0) | 70.86 (+6.0) | 76.86 (+12.0) | 88.86 (+24.0) | 112.86 (+48.0) |
| FP16 (16.0 bpw) | FP16 | 64.05 | 65.61 (+0.75) | 66.36 (+1.5) | 67.86 (+3.0) | 70.86 (+6.0) | 76.86 (+12.0) | 88.86 (+24.0) |
| FP16 (16.0 bpw) | Q8_0 | 64.05 | 65.27 (+0.41) | 65.68 (+0.83) | 66.51 (+1.65) | 68.16 (+3.3) | 71.45 (+6.6) | 78.06 (+13.2) |
| FP16 (16.0 bpw) | FP8 (Exp) | 64.05 | 65.23 (+0.38) | 65.61 (+0.75) | 66.36 (+1.5) | 67.86 (+3.0) | 70.86 (+6.0) | 76.86 (+12.0) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 64.05 | 65.08 (+0.22) | 65.31 (+0.45) | 65.76 (+0.9) | 66.66 (+1.8) | 68.45 (+3.6) | 72.06 (+7.2) |
| Q8_0 (8.0 bpw) | FP32 | 32.02 | 34.33 (+1.5) | 35.83 (+3.0) | 38.83 (+6.0) | 44.83 (+12.0) | 56.83 (+24.0) | 80.83 (+48.0) |
| Q8_0 (8.0 bpw) | FP16 | 32.02 | 33.58 (+0.75) | 34.33 (+1.5) | 35.83 (+3.0) | 38.83 (+6.0) | 44.83 (+12.0) | 56.83 (+24.0) |
| Q8_0 (8.0 bpw) | Q8_0 | 32.02 | 33.24 (+0.41) | 33.66 (+0.83) | 34.48 (+1.65) | 36.13 (+3.3) | 39.43 (+6.6) | 46.03 (+13.2) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 32.02 | 33.20 (+0.38) | 33.58 (+0.75) | 34.33 (+1.5) | 35.83 (+3.0) | 38.83 (+6.0) | 44.83 (+12.0) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 32.02 | 33.05 (+0.22) | 33.28 (+0.45) | 33.73 (+0.9) | 34.63 (+1.8) | 36.43 (+3.6) | 40.03 (+7.2) |
| Q4_K_M (4.65 bpw) | FP32 | 18.61 | 20.92 (+1.5) | 22.42 (+3.0) | 25.42 (+6.0) | 31.42 (+12.0) | 43.42 (+24.0) | 67.42 (+48.0) |
| Q4_K_M (4.65 bpw) | FP16 | 18.61 | 20.17 (+0.75) | 20.92 (+1.5) | 22.42 (+3.0) | 25.42 (+6.0) | 31.42 (+12.0) | 43.42 (+24.0) |
| Q4_K_M (4.65 bpw) | Q8_0 | 18.61 | 19.83 (+0.41) | 20.24 (+0.83) | 21.07 (+1.65) | 22.72 (+3.3) | 26.02 (+6.6) | 32.62 (+13.2) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 18.61 | 19.79 (+0.38) | 20.17 (+0.75) | 20.92 (+1.5) | 22.42 (+3.0) | 25.42 (+6.0) | 31.42 (+12.0) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 18.61 | 19.64 (+0.22) | 19.87 (+0.45) | 20.32 (+0.9) | 21.22 (+1.8) | 23.02 (+3.6) | 26.62 (+7.2) |
| Q4_K_S (4.58 bpw) | FP32 | 18.33 | 20.64 (+1.5) | 22.14 (+3.0) | 25.14 (+6.0) | 31.14 (+12.0) | 43.14 (+24.0) | 67.14 (+48.0) |
| Q4_K_S (4.58 bpw) | FP16 | 18.33 | 19.89 (+0.75) | 20.64 (+1.5) | 22.14 (+3.0) | 25.14 (+6.0) | 31.14 (+12.0) | 43.14 (+24.0) |
| Q4_K_S (4.58 bpw) | Q8_0 | 18.33 | 19.55 (+0.41) | 19.96 (+0.83) | 20.79 (+1.65) | 22.44 (+3.3) | 25.74 (+6.6) | 32.34 (+13.2) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 18.33 | 19.51 (+0.38) | 19.89 (+0.75) | 20.64 (+1.5) | 22.14 (+3.0) | 25.14 (+6.0) | 31.14 (+12.0) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 18.33 | 19.36 (+0.22) | 19.59 (+0.45) | 20.04 (+0.9) | 20.94 (+1.8) | 22.74 (+3.6) | 26.34 (+7.2) |
| Q3_K_M (3.91 bpw) | FP32 | 15.65 | 17.96 (+1.5) | 19.46 (+3.0) | 22.46 (+6.0) | 28.46 (+12.0) | 40.46 (+24.0) | 64.46 (+48.0) |
| Q3_K_M (3.91 bpw) | FP16 | 15.65 | 17.21 (+0.75) | 17.96 (+1.5) | 19.46 (+3.0) | 22.46 (+6.0) | 28.46 (+12.0) | 40.46 (+24.0) |
| Q3_K_M (3.91 bpw) | Q8_0 | 15.65 | 16.87 (+0.41) | 17.28 (+0.83) | 18.11 (+1.65) | 19.76 (+3.3) | 23.06 (+6.6) | 29.66 (+13.2) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 15.65 | 16.83 (+0.38) | 17.21 (+0.75) | 17.96 (+1.5) | 19.46 (+3.0) | 22.46 (+6.0) | 28.46 (+12.0) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 15.65 | 16.68 (+0.22) | 16.91 (+0.45) | 17.36 (+0.9) | 18.26 (+1.8) | 20.06 (+3.6) | 23.66 (+7.2) |
| Q2_K (2.63 bpw) | FP32 | 10.53 | 12.83 (+1.5) | 14.33 (+3.0) | 17.33 (+6.0) | 23.33 (+12.0) | 35.33 (+24.0) | 59.33 (+48.0) |
| Q2_K (2.63 bpw) | FP16 | 10.53 | 12.08 (+0.75) | 12.83 (+1.5) | 14.33 (+3.0) | 17.33 (+6.0) | 23.33 (+12.0) | 35.33 (+24.0) |
| Q2_K (2.63 bpw) | Q8_0 | 10.53 | 11.75 (+0.41) | 12.16 (+0.83) | 12.98 (+1.65) | 14.63 (+3.3) | 17.93 (+6.6) | 24.53 (+13.2) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 10.53 | 11.71 (+0.38) | 12.08 (+0.75) | 12.83 (+1.5) | 14.33 (+3.0) | 17.33 (+6.0) | 23.33 (+12.0) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 10.53 | 11.56 (+0.22) | 11.78 (+0.45) | 12.23 (+0.9) | 13.13 (+1.8) | 14.93 (+3.6) | 18.53 (+7.2) |

All values are in GB; the context columns show total VRAM, with the KV-cache contribution at that context length in parentheses.

Total VRAM = model weights + KV cache + 0.8 GB overhead. Actual usage may vary by roughly ±5% depending on the inference engine and its optimizations.
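The footnote's formula can be applied directly to estimate requirements for context lengths not listed in the table. A rough sketch, assuming an FP16 KV cache; the head dimension of 128 is inferred from the table's deltas rather than stated in the specs, and the table appears to mix decimal GB (weights) with binary GiB (cache), which this estimator reproduces:

```python
def total_vram_gb(weights_gb: float, context_tokens: int,
                  overhead_gb: float = 0.8) -> float:
    """Total VRAM = weights + FP16 KV cache + fixed overhead.

    Architecture constants for Qwen3-VL-30B-A3B: 48 layers, 4 KV heads;
    head_dim = 128 is an assumption inferred from the table.
    """
    layers, kv_heads, head_dim, fp16_bytes = 48, 4, 128, 2
    kv_gib = 2 * layers * kv_heads * head_dim * fp16_bytes * context_tokens / 2**30
    return weights_gb + kv_gib + overhead_gb

# Q4_K_M weights (18.61 GB) with 16K context:
print(round(total_vram_gb(18.61, 16384), 2))  # ~20.91, matching the table's 20.92
```

For quantized cache formats, scale the `kv_gib` term down by the cache format's effective bits per element relative to FP16's 16.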
