
Qwen3-Omni-30B-A3B-Captioner

Architecture: Mixture of Experts
Total Parameters: 32.0B
Active Parameters: 3.3B

Model Specifications

Layers: 48
Hidden Dimension: 2,048
Attention Heads: 32
KV Heads: 4
Max Context: 65K tokens
Vocabulary Size: 152,064
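
The KV-cache figures in the table below follow directly from the attention dimensions listed here. The sketch that follows is a minimal sizing helper, not the calculator's actual implementation: the per-head dimension of 128 is an assumption (it is not listed above, but it is typical of Qwen3-family models and reproduces the table's KV columns for the FP32, FP16, and FP8 cache formats). The block-quantized cache formats (Q8_0, Q4_0) also store per-block scales, which is why their columns run slightly above a naive bits-per-element estimate and are omitted here.

```python
# Hypothetical KV-cache sizing sketch based on the specs above.
# HEAD_DIM = 128 is an assumption; it is not listed in the spec card.

N_LAYERS = 48
N_KV_HEADS = 4
HEAD_DIM = 128                                  # assumed per-head dimension
CACHE_BYTES = {"FP32": 4, "FP16": 2, "FP8": 1}  # bytes per cached element

def kv_cache_gb(context_tokens: int, cache_format: str = "FP16") -> float:
    """Approximate KV-cache size in GB: keys and values for every layer and KV head."""
    bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CACHE_BYTES[cache_format]
    return context_tokens * bytes_per_token / 2**30

print(round(kv_cache_gb(8_192, "FP16"), 2))   # 0.75 -> matches the 8K / FP16 KV figure
print(round(kv_cache_gb(65_536, "FP32"), 2))  # 12.0 -> matches the 65K / FP32 KV figure
```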

VRAM Requirements

VRAM usage for every quantization and KV-cache format combination. Each cell shows total VRAM, with the KV-cache contribution (in GB) in parentheses. Base overhead: 0.82 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context |
|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 67.2 GB | 69.52 GB (+1.5 KV) | 71.02 GB (+3.0 KV) | 74.02 GB (+6.0 KV) | 80.02 GB (+12.0 KV) |
| FP16 (16.0 bpw) | FP16 | 67.2 GB | 68.77 GB (+0.75 KV) | 69.52 GB (+1.5 KV) | 71.02 GB (+3.0 KV) | 74.02 GB (+6.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 67.2 GB | 68.43 GB (+0.41 KV) | 68.84 GB (+0.83 KV) | 69.67 GB (+1.65 KV) | 71.32 GB (+3.3 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 67.2 GB | 68.39 GB (+0.38 KV) | 68.77 GB (+0.75 KV) | 69.52 GB (+1.5 KV) | 71.02 GB (+3.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 67.2 GB | 68.24 GB (+0.22 KV) | 68.47 GB (+0.45 KV) | 68.92 GB (+0.9 KV) | 69.82 GB (+1.8 KV) |
| Q8_0 (8.0 bpw) | FP32 | 33.6 GB | 35.92 GB (+1.5 KV) | 37.42 GB (+3.0 KV) | 40.42 GB (+6.0 KV) | 46.42 GB (+12.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 33.6 GB | 35.17 GB (+0.75 KV) | 35.92 GB (+1.5 KV) | 37.42 GB (+3.0 KV) | 40.42 GB (+6.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 33.6 GB | 34.83 GB (+0.41 KV) | 35.25 GB (+0.83 KV) | 36.07 GB (+1.65 KV) | 37.72 GB (+3.3 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 33.6 GB | 34.8 GB (+0.38 KV) | 35.17 GB (+0.75 KV) | 35.92 GB (+1.5 KV) | 37.42 GB (+3.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 33.6 GB | 34.65 GB (+0.22 KV) | 34.87 GB (+0.45 KV) | 35.32 GB (+0.9 KV) | 36.22 GB (+1.8 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 19.53 GB | 21.85 GB (+1.5 KV) | 23.35 GB (+3.0 KV) | 26.35 GB (+6.0 KV) | 32.35 GB (+12.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 19.53 GB | 21.1 GB (+0.75 KV) | 21.85 GB (+1.5 KV) | 23.35 GB (+3.0 KV) | 26.35 GB (+6.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 19.53 GB | 20.76 GB (+0.41 KV) | 21.18 GB (+0.83 KV) | 22.0 GB (+1.65 KV) | 23.65 GB (+3.3 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 19.53 GB | 20.73 GB (+0.38 KV) | 21.1 GB (+0.75 KV) | 21.85 GB (+1.5 KV) | 23.35 GB (+3.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 19.53 GB | 20.58 GB (+0.22 KV) | 20.8 GB (+0.45 KV) | 21.25 GB (+0.9 KV) | 22.15 GB (+1.8 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 19.24 GB | 21.56 GB (+1.5 KV) | 23.06 GB (+3.0 KV) | 26.06 GB (+6.0 KV) | 32.06 GB (+12.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 19.24 GB | 20.81 GB (+0.75 KV) | 21.56 GB (+1.5 KV) | 23.06 GB (+3.0 KV) | 26.06 GB (+6.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 19.24 GB | 20.47 GB (+0.41 KV) | 20.88 GB (+0.83 KV) | 21.71 GB (+1.65 KV) | 23.36 GB (+3.3 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 19.24 GB | 20.43 GB (+0.38 KV) | 20.81 GB (+0.75 KV) | 21.56 GB (+1.5 KV) | 23.06 GB (+3.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 19.24 GB | 20.28 GB (+0.22 KV) | 20.51 GB (+0.45 KV) | 20.96 GB (+0.9 KV) | 21.86 GB (+1.8 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 16.42 GB | 18.74 GB (+1.5 KV) | 20.24 GB (+3.0 KV) | 23.24 GB (+6.0 KV) | 29.24 GB (+12.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 16.42 GB | 17.99 GB (+0.75 KV) | 18.74 GB (+1.5 KV) | 20.24 GB (+3.0 KV) | 23.24 GB (+6.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 16.42 GB | 17.65 GB (+0.41 KV) | 18.07 GB (+0.83 KV) | 18.89 GB (+1.65 KV) | 20.54 GB (+3.3 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 16.42 GB | 17.62 GB (+0.38 KV) | 17.99 GB (+0.75 KV) | 18.74 GB (+1.5 KV) | 20.24 GB (+3.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 16.42 GB | 17.47 GB (+0.22 KV) | 17.69 GB (+0.45 KV) | 18.14 GB (+0.9 KV) | 19.04 GB (+1.8 KV) |
| Q2_K (2.63 bpw) | FP32 | 11.05 GB | 13.37 GB (+1.5 KV) | 14.87 GB (+3.0 KV) | 17.87 GB (+6.0 KV) | 23.87 GB (+12.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 11.05 GB | 12.62 GB (+0.75 KV) | 13.37 GB (+1.5 KV) | 14.87 GB (+3.0 KV) | 17.87 GB (+6.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 11.05 GB | 12.28 GB (+0.41 KV) | 12.69 GB (+0.83 KV) | 13.52 GB (+1.65 KV) | 15.17 GB (+3.3 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 11.05 GB | 12.24 GB (+0.38 KV) | 12.62 GB (+0.75 KV) | 13.37 GB (+1.5 KV) | 14.87 GB (+3.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 11.05 GB | 12.09 GB (+0.22 KV) | 12.32 GB (+0.45 KV) | 12.77 GB (+0.9 KV) | 13.67 GB (+1.8 KV) |

Total VRAM = Model Weights + KV Cache + 0.82 GB overhead. Actual usage may vary by roughly ±5% depending on the inference engine and its optimizations.
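
As a worked example of the formula above, the sketch below reproduces one table cell. It is an approximation under stated assumptions, not the calculator itself: the 33.6B weight count is inferred from the FP16 row (67.2 GB at 2 bytes per weight) rather than taken from the 32.0B headline figure, and the KV-cache helper reuses the assumed per-head dimension of 128 from the earlier sketch.

```python
# Sketch of: Total VRAM = Model Weights + KV Cache + 0.82 GB overhead.
# Assumptions: per-head dimension 128 (as in the earlier sketch) and a
# weight count of 33.6e9, inferred from the FP16 row (67.2 GB / 2 bytes).

OVERHEAD_GB = 0.82  # CUDA context + activations, per the note above
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 4, 128
CACHE_BYTES = {"FP32": 4, "FP16": 2, "FP8": 1}

def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight memory in GB for a given quantization, e.g. 4.65 bpw for Q4_K_M."""
    return n_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int, cache_format: str) -> float:
    """Approximate KV-cache size in GB for the given context length and cache format."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CACHE_BYTES[cache_format]
    return context_tokens * per_token / 2**30

def total_vram_gb(n_params: float, bpw: float, context: int, cache_format: str) -> float:
    return weights_gb(n_params, bpw) + kv_cache_gb(context, cache_format) + OVERHEAD_GB

# Q4_K_M weights (4.65 bpw) with an FP16 cache at 16K context:
print(round(total_vram_gb(33.6e9, 4.65, 16_384, "FP16"), 2))  # 21.85 -> matches the table
```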

Check if your GPU can run Qwen3-Omni-30B-A3B-Captioner

Use our calculator to see if this model fits your specific hardware configuration.