
Qwen3-Omni-30B-A3B-Instruct

Architecture: Mixture of Experts
Total Parameters: 34.0B
Active Parameters: 4.0B

Model Specifications

Layers: 48
Hidden Dimension: 2,048
Attention Heads: 32
KV Heads: 4
Max Context: 65K tokens
Vocabulary Size: 152,064
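The KV-cache figures in the table below can be reproduced from these specifications. A minimal sketch, assuming a per-head dimension of 128 — an assumption, since the page does not list head_dim (note that hidden/heads = 2,048/32 = 64 would not match the table's KV numbers, while 128 does; Qwen3-family configs set head_dim explicitly rather than deriving it):

```python
# KV-cache size estimate: 2 tensors (K and V) per layer, per token.
LAYERS = 48
KV_HEADS = 4
HEAD_DIM = 128  # assumed, not stated on the page; see note above


def kv_cache_gb(context_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """Estimated KV-cache size in GiB for a given context length.

    bytes_per_elem: 2.0 for an FP16 cache, 4.0 for FP32.
    """
    per_token_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem
    return context_tokens * per_token_bytes / 1024**3


print(round(kv_cache_gb(8192), 2))    # FP16 cache at 8K  -> 0.75
print(round(kv_cache_gb(65536), 2))   # FP16 cache at 65K -> 6.0
```

These match the "+0.75 KV" (8K, FP16) and "+6.0 KV" (65K, FP16) entries in the table; passing `bytes_per_elem=4.0` reproduces the FP32 column.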

VRAM Requirements

Estimated VRAM usage for each combination of weight quantization and KV-cache format. Base overhead: 0.84 GB (CUDA context + activations).

| Quantization | bpw | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context |
|---|---|---|---|---|---|---|---|
| FP16 | 16.0 | FP32 | 71.4 GB | 73.74 GB (+1.5 KV) | 75.24 GB (+3.0 KV) | 78.24 GB (+6.0 KV) | 84.24 GB (+12.0 KV) |
| FP16 | 16.0 | FP16 | 71.4 GB | 72.99 GB (+0.75 KV) | 73.74 GB (+1.5 KV) | 75.24 GB (+3.0 KV) | 78.24 GB (+6.0 KV) |
| FP16 | 16.0 | Q8_0 | 71.4 GB | 72.65 GB (+0.41 KV) | 73.07 GB (+0.83 KV) | 73.89 GB (+1.65 KV) | 75.54 GB (+3.3 KV) |
| FP16 | 16.0 | FP8 (Exp) | 71.4 GB | 72.62 GB (+0.38 KV) | 72.99 GB (+0.75 KV) | 73.74 GB (+1.5 KV) | 75.24 GB (+3.0 KV) |
| FP16 | 16.0 | Q4_0 (Exp) | 71.4 GB | 72.47 GB (+0.22 KV) | 72.69 GB (+0.45 KV) | 73.14 GB (+0.9 KV) | 74.04 GB (+1.8 KV) |
| Q8_0 | 8.0 | FP32 | 35.7 GB | 38.04 GB (+1.5 KV) | 39.54 GB (+3.0 KV) | 42.54 GB (+6.0 KV) | 48.54 GB (+12.0 KV) |
| Q8_0 | 8.0 | FP16 | 35.7 GB | 37.29 GB (+0.75 KV) | 38.04 GB (+1.5 KV) | 39.54 GB (+3.0 KV) | 42.54 GB (+6.0 KV) |
| Q8_0 | 8.0 | Q8_0 | 35.7 GB | 36.95 GB (+0.41 KV) | 37.37 GB (+0.83 KV) | 38.19 GB (+1.65 KV) | 39.84 GB (+3.3 KV) |
| Q8_0 | 8.0 | FP8 (Exp) | 35.7 GB | 36.92 GB (+0.38 KV) | 37.29 GB (+0.75 KV) | 38.04 GB (+1.5 KV) | 39.54 GB (+3.0 KV) |
| Q8_0 | 8.0 | Q4_0 (Exp) | 35.7 GB | 36.77 GB (+0.22 KV) | 36.99 GB (+0.45 KV) | 37.44 GB (+0.9 KV) | 38.34 GB (+1.8 KV) |
| Q4_K_M | 4.65 | FP32 | 20.75 GB | 23.09 GB (+1.5 KV) | 24.59 GB (+3.0 KV) | 27.59 GB (+6.0 KV) | 33.59 GB (+12.0 KV) |
| Q4_K_M | 4.65 | FP16 | 20.75 GB | 22.34 GB (+0.75 KV) | 23.09 GB (+1.5 KV) | 24.59 GB (+3.0 KV) | 27.59 GB (+6.0 KV) |
| Q4_K_M | 4.65 | Q8_0 | 20.75 GB | 22.0 GB (+0.41 KV) | 22.42 GB (+0.83 KV) | 23.24 GB (+1.65 KV) | 24.89 GB (+3.3 KV) |
| Q4_K_M | 4.65 | FP8 (Exp) | 20.75 GB | 21.97 GB (+0.38 KV) | 22.34 GB (+0.75 KV) | 23.09 GB (+1.5 KV) | 24.59 GB (+3.0 KV) |
| Q4_K_M | 4.65 | Q4_0 (Exp) | 20.75 GB | 21.82 GB (+0.22 KV) | 22.04 GB (+0.45 KV) | 22.49 GB (+0.9 KV) | 23.39 GB (+1.8 KV) |
| Q4_K_S | 4.58 | FP32 | 20.44 GB | 22.78 GB (+1.5 KV) | 24.28 GB (+3.0 KV) | 27.28 GB (+6.0 KV) | 33.28 GB (+12.0 KV) |
| Q4_K_S | 4.58 | FP16 | 20.44 GB | 22.03 GB (+0.75 KV) | 22.78 GB (+1.5 KV) | 24.28 GB (+3.0 KV) | 27.28 GB (+6.0 KV) |
| Q4_K_S | 4.58 | Q8_0 | 20.44 GB | 21.69 GB (+0.41 KV) | 22.1 GB (+0.83 KV) | 22.93 GB (+1.65 KV) | 24.58 GB (+3.3 KV) |
| Q4_K_S | 4.58 | FP8 (Exp) | 20.44 GB | 21.65 GB (+0.38 KV) | 22.03 GB (+0.75 KV) | 22.78 GB (+1.5 KV) | 24.28 GB (+3.0 KV) |
| Q4_K_S | 4.58 | Q4_0 (Exp) | 20.44 GB | 21.5 GB (+0.22 KV) | 21.73 GB (+0.45 KV) | 22.18 GB (+0.9 KV) | 23.08 GB (+1.8 KV) |
| Q3_K_M | 3.91 | FP32 | 17.45 GB | 19.79 GB (+1.5 KV) | 21.29 GB (+3.0 KV) | 24.29 GB (+6.0 KV) | 30.29 GB (+12.0 KV) |
| Q3_K_M | 3.91 | FP16 | 17.45 GB | 19.04 GB (+0.75 KV) | 19.79 GB (+1.5 KV) | 21.29 GB (+3.0 KV) | 24.29 GB (+6.0 KV) |
| Q3_K_M | 3.91 | Q8_0 | 17.45 GB | 18.7 GB (+0.41 KV) | 19.11 GB (+0.83 KV) | 19.94 GB (+1.65 KV) | 21.59 GB (+3.3 KV) |
| Q3_K_M | 3.91 | FP8 (Exp) | 17.45 GB | 18.66 GB (+0.38 KV) | 19.04 GB (+0.75 KV) | 19.79 GB (+1.5 KV) | 21.29 GB (+3.0 KV) |
| Q3_K_M | 3.91 | Q4_0 (Exp) | 17.45 GB | 18.51 GB (+0.22 KV) | 18.74 GB (+0.45 KV) | 19.19 GB (+0.9 KV) | 20.09 GB (+1.8 KV) |
| Q2_K | 2.63 | FP32 | 11.74 GB | 14.08 GB (+1.5 KV) | 15.58 GB (+3.0 KV) | 18.58 GB (+6.0 KV) | 24.58 GB (+12.0 KV) |
| Q2_K | 2.63 | FP16 | 11.74 GB | 13.33 GB (+0.75 KV) | 14.08 GB (+1.5 KV) | 15.58 GB (+3.0 KV) | 18.58 GB (+6.0 KV) |
| Q2_K | 2.63 | Q8_0 | 11.74 GB | 12.99 GB (+0.41 KV) | 13.4 GB (+0.83 KV) | 14.23 GB (+1.65 KV) | 15.88 GB (+3.3 KV) |
| Q2_K | 2.63 | FP8 (Exp) | 11.74 GB | 12.95 GB (+0.38 KV) | 13.33 GB (+0.75 KV) | 14.08 GB (+1.5 KV) | 15.58 GB (+3.0 KV) |
| Q2_K | 2.63 | Q4_0 (Exp) | 11.74 GB | 12.8 GB (+0.22 KV) | 13.03 GB (+0.45 KV) | 13.48 GB (+0.9 KV) | 14.38 GB (+1.8 KV) |

Total VRAM = Model Weights + KV Cache + 0.84 GB overhead. Actual usage may vary by roughly ±5% depending on the inference engine and its optimizations.
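The footnote's formula can be checked directly against the table. A minimal sketch, assuming the weight column is derived from roughly 35.7B stored parameters (back-solved here from 71.4 GB at 16 bpw; this is an assumption, not a figure from the model config) and an FP16 KV cache of 0.75 GB per 8K tokens:

```python
# Reproduce the table's totals: weights + KV cache + fixed overhead.
PARAMS_B = 35.7            # assumed: 71.4 GB / (16 bpw / 8) from the FP16 row
OVERHEAD_GB = 0.84         # CUDA context + activations, per the page
KV_GB_PER_8K_FP16 = 0.75   # FP16-cache cost per 8K tokens, from the table


def total_vram_gb(bpw: float, context_tokens: int,
                  cache_bytes: float = 2.0) -> float:
    """Estimated total VRAM in GB (cache_bytes: 2.0 = FP16, 4.0 = FP32)."""
    weights = PARAMS_B * bpw / 8
    kv = KV_GB_PER_8K_FP16 * (context_tokens / 8192) * (cache_bytes / 2.0)
    return weights + kv + OVERHEAD_GB


# Q4_K_M (4.65 bpw), FP16 cache, 16K context: table says 23.09 GB.
print(round(total_vram_gb(4.65, 16384), 2))  # -> 23.09
```

Q8_0 and Q4_0 cache formats are not exact fractions of FP16 (they carry per-block scale metadata), so those columns need their own per-token costs rather than a simple `cache_bytes` scaling.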

Check if your GPU can run Qwen3-Omni-30B-A3B-Instruct

Use our calculator to see if this model fits your specific hardware configuration.