
Qwen2.5-Omni-3B

Standard Transformer · 4.8B Parameters

Model Specifications

Layers: 36
Hidden Dimension: 2,048
Attention Heads: 16
KV Heads: 2
Max Context: 32K tokens
Vocabulary Size: 151,936
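The KV-cache figures in the table below follow directly from these specifications. A minimal sketch of the standard per-token KV-cache formula, assuming head_dim = hidden dimension / attention heads (= 128) and treating the table's "GB" as GiB; the names here are illustrative, not from any particular API:

```python
# KV-cache size from the model specs above:
# 2 (K and V) * kv_heads * head_dim * bytes_per_element, per layer, per token.
LAYERS = 36
HIDDEN = 2048
HEADS = 16
KV_HEADS = 2
HEAD_DIM = HIDDEN // HEADS  # 128, assuming head_dim = hidden / heads

def kv_cache_gib(context_tokens: int, bytes_per_element: float = 2.0) -> float:
    """KV-cache size in GiB for a given context length and cache precision."""
    per_token = 2 * KV_HEADS * HEAD_DIM * bytes_per_element * LAYERS
    return per_token * context_tokens / 2**30

print(round(kv_cache_gib(8192, 2.0), 2))   # FP16 cache at 8K  -> 0.28
print(round(kv_cache_gib(32768, 4.0), 2))  # FP32 cache at 32K -> 2.25
```

Both results match the "+0.28 KV" and "+2.25 KV" entries in the table below; only 2 of the 16 attention heads carry K/V state, which is why the cache stays small relative to the weights.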

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. The context-column totals include the base overhead of 0.55 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context |
| --- | --- | --- | --- | --- | --- |
| FP16 (16.0 bpw) | FP32 | 10.08 GB | 11.19 GB (+0.56 KV) | 11.75 GB (+1.12 KV) | 12.88 GB (+2.25 KV) |
| FP16 (16.0 bpw) | FP16 | 10.08 GB | 10.91 GB (+0.28 KV) | 11.19 GB (+0.56 KV) | 11.75 GB (+1.12 KV) |
| FP16 (16.0 bpw) | Q8_0 | 10.08 GB | 10.78 GB (+0.15 KV) | 10.94 GB (+0.31 KV) | 11.25 GB (+0.62 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 10.08 GB | 10.77 GB (+0.14 KV) | 10.91 GB (+0.28 KV) | 11.19 GB (+0.56 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 10.08 GB | 10.71 GB (+0.08 KV) | 10.80 GB (+0.17 KV) | 10.97 GB (+0.34 KV) |
| Q8_0 (8.0 bpw) | FP32 | 5.04 GB | 6.15 GB (+0.56 KV) | 6.71 GB (+1.12 KV) | 7.84 GB (+2.25 KV) |
| Q8_0 (8.0 bpw) | FP16 | 5.04 GB | 5.87 GB (+0.28 KV) | 6.15 GB (+0.56 KV) | 6.71 GB (+1.12 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 5.04 GB | 5.74 GB (+0.15 KV) | 5.90 GB (+0.31 KV) | 6.21 GB (+0.62 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 5.04 GB | 5.73 GB (+0.14 KV) | 5.87 GB (+0.28 KV) | 6.15 GB (+0.56 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 5.04 GB | 5.67 GB (+0.08 KV) | 5.76 GB (+0.17 KV) | 5.93 GB (+0.34 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 2.93 GB | 4.04 GB (+0.56 KV) | 4.60 GB (+1.12 KV) | 5.73 GB (+2.25 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 2.93 GB | 3.76 GB (+0.28 KV) | 4.04 GB (+0.56 KV) | 4.60 GB (+1.12 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 2.93 GB | 3.63 GB (+0.15 KV) | 3.79 GB (+0.31 KV) | 4.10 GB (+0.62 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 2.93 GB | 3.62 GB (+0.14 KV) | 3.76 GB (+0.28 KV) | 4.04 GB (+0.56 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 2.93 GB | 3.56 GB (+0.08 KV) | 3.65 GB (+0.17 KV) | 3.81 GB (+0.34 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 2.89 GB | 4.00 GB (+0.56 KV) | 4.56 GB (+1.12 KV) | 5.68 GB (+2.25 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 2.89 GB | 3.71 GB (+0.28 KV) | 4.00 GB (+0.56 KV) | 4.56 GB (+1.12 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 2.89 GB | 3.59 GB (+0.15 KV) | 3.74 GB (+0.31 KV) | 4.05 GB (+0.62 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 2.89 GB | 3.57 GB (+0.14 KV) | 3.71 GB (+0.28 KV) | 4.00 GB (+0.56 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 2.89 GB | 3.52 GB (+0.08 KV) | 3.60 GB (+0.17 KV) | 3.77 GB (+0.34 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 2.46 GB | 3.57 GB (+0.56 KV) | 4.14 GB (+1.12 KV) | 5.26 GB (+2.25 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 2.46 GB | 3.29 GB (+0.28 KV) | 3.57 GB (+0.56 KV) | 4.14 GB (+1.12 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 2.46 GB | 3.17 GB (+0.15 KV) | 3.32 GB (+0.31 KV) | 3.63 GB (+0.62 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 2.46 GB | 3.15 GB (+0.14 KV) | 3.29 GB (+0.28 KV) | 3.57 GB (+0.56 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 2.46 GB | 3.10 GB (+0.08 KV) | 3.18 GB (+0.17 KV) | 3.35 GB (+0.34 KV) |
| Q2_K (2.63 bpw) | FP32 | 1.66 GB | 2.77 GB (+0.56 KV) | 3.33 GB (+1.12 KV) | 4.45 GB (+2.25 KV) |
| Q2_K (2.63 bpw) | FP16 | 1.66 GB | 2.49 GB (+0.28 KV) | 2.77 GB (+0.56 KV) | 3.33 GB (+1.12 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 1.66 GB | 2.36 GB (+0.15 KV) | 2.51 GB (+0.31 KV) | 2.82 GB (+0.62 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 1.66 GB | 2.35 GB (+0.14 KV) | 2.49 GB (+0.28 KV) | 2.77 GB (+0.56 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 1.66 GB | 2.29 GB (+0.08 KV) | 2.37 GB (+0.17 KV) | 2.54 GB (+0.34 KV) |

Total VRAM = Model Weights + KV Cache + 0.55 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
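The rule above can be turned into a small estimator. A sketch under stated assumptions: the weight sizes are copied from the table, the KV formula uses the model's 36 layers, 2 KV heads, and 128-dim heads, "GB" is treated as GiB, and the bytes-per-element values for the quantized cache formats (Q8_0, Q4_0) are llama.cpp-style block sizes, so those rows may differ from the table by a few hundredths of a GB:

```python
# Rough total-VRAM estimator: weights + KV cache + 0.55 GB overhead.
# Weight sizes come from the table above; KV bytes-per-element for Q8_0/Q4_0
# are assumed llama.cpp block sizes (34 bytes / 32 elems, 18 bytes / 32 elems),
# so the quantized-cache estimates are approximate.
WEIGHTS_GB = {"FP16": 10.08, "Q8_0": 5.04, "Q4_K_M": 2.93,
              "Q4_K_S": 2.89, "Q3_K_M": 2.46, "Q2_K": 1.66}
KV_BYTES_PER_ELEMENT = {"FP32": 4.0, "FP16": 2.0, "Q8_0": 1.0625,
                        "FP8": 1.0, "Q4_0": 0.5625}
OVERHEAD_GB = 0.55  # CUDA context + activations, per the note above

def total_vram_gb(quant: str, cache: str, context_tokens: int) -> float:
    """Estimated total VRAM in GB for a weight quant, cache format, and context."""
    # K and V, 2 KV heads, head_dim 128, 36 layers:
    per_token = 2 * 2 * 128 * KV_BYTES_PER_ELEMENT[cache] * 36
    kv_gb = per_token * context_tokens / 2**30
    return WEIGHTS_GB[quant] + kv_gb + OVERHEAD_GB

print(round(total_vram_gb("Q4_K_M", "FP16", 16384), 2))  # ~4.04, as in the table
```

For the FP32 and FP16 cache rows this reproduces the table exactly; it is only a planning aid, since real engines add their own allocator and graph overheads.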

Check if your GPU can run Qwen2.5-Omni-3B

Use our calculator to see if this model fits your specific hardware configuration.