
Qwen2.5-Omni-7B

Architecture: Standard Transformer · Parameters: 8.5B

Model Specifications

Layers: 28
Hidden Dimension: 3,584
Attention Heads: 28
KV Heads: 4 (grouped-query attention)
Max Context: 32K tokens
Vocabulary Size: 152,064
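The KV-cache figures in the table below follow directly from these specs: with grouped-query attention, each token stores a key and a value vector for the 4 KV heads across all 28 layers. A minimal sketch of the arithmetic (variable names are illustrative, not from any library):

```python
# KV-cache size per token, derived from the model specs above.
layers = 28
hidden_dim = 3584
attn_heads = 28
kv_heads = 4

head_dim = hidden_dim // attn_heads                 # 128
# 2 tensors (K and V) x layers x kv_heads x head_dim elements per token
elems_per_token = 2 * layers * kv_heads * head_dim  # 28,672

bytes_per_elem = 2                                  # FP16 cache
kv_bytes_per_token = elems_per_token * bytes_per_elem  # 57,344 bytes

for ctx in (8192, 16384, 32768):
    gib = kv_bytes_per_token * ctx / 2**30
    print(f"{ctx:>6} tokens: {gib:.2f} GiB")
# → 0.44, 0.88, 1.75 GiB — matching the FP16-cache rows in the table
```

Halving `bytes_per_elem` reproduces the Q8_0/FP8 columns; quartering it gives the Q4_0 column.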

VRAM Requirements

VRAM usage for all quantization and cache format combinations. Base overhead: 0.58 GB (CUDA context + activations).

| Quantization | bpw | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context |
|---|---|---|---|---|---|---|
| FP16 | 16.0 | FP32 | 17.85 GB | 19.31 GB (+0.88 KV) | 20.19 GB (+1.75 KV) | 21.94 GB (+3.5 KV) |
| FP16 | 16.0 | FP16 | 17.85 GB | 18.87 GB (+0.44 KV) | 19.31 GB (+0.88 KV) | 20.19 GB (+1.75 KV) |
| FP16 | 16.0 | Q8_0 | 17.85 GB | 18.68 GB (+0.24 KV) | 18.92 GB (+0.48 KV) | 19.4 GB (+0.96 KV) |
| FP16 | 16.0 | FP8 (Exp) | 17.85 GB | 18.65 GB (+0.22 KV) | 18.87 GB (+0.44 KV) | 19.31 GB (+0.88 KV) |
| FP16 | 16.0 | Q4_0 (Exp) | 17.85 GB | 18.57 GB (+0.13 KV) | 18.7 GB (+0.26 KV) | 18.96 GB (+0.53 KV) |
| Q8_0 | 8.0 | FP32 | 8.93 GB | 10.39 GB (+0.88 KV) | 11.26 GB (+1.75 KV) | 13.01 GB (+3.5 KV) |
| Q8_0 | 8.0 | FP16 | 8.93 GB | 9.95 GB (+0.44 KV) | 10.39 GB (+0.88 KV) | 11.26 GB (+1.75 KV) |
| Q8_0 | 8.0 | Q8_0 | 8.93 GB | 9.75 GB (+0.24 KV) | 9.99 GB (+0.48 KV) | 10.47 GB (+0.96 KV) |
| Q8_0 | 8.0 | FP8 (Exp) | 8.93 GB | 9.73 GB (+0.22 KV) | 9.95 GB (+0.44 KV) | 10.39 GB (+0.88 KV) |
| Q8_0 | 8.0 | Q4_0 (Exp) | 8.93 GB | 9.64 GB (+0.13 KV) | 9.77 GB (+0.26 KV) | 10.04 GB (+0.53 KV) |
| Q4_K_M | 4.65 | FP32 | 5.19 GB | 6.65 GB (+0.88 KV) | 7.52 GB (+1.75 KV) | 9.27 GB (+3.5 KV) |
| Q4_K_M | 4.65 | FP16 | 5.19 GB | 6.21 GB (+0.44 KV) | 6.65 GB (+0.88 KV) | 7.52 GB (+1.75 KV) |
| Q4_K_M | 4.65 | Q8_0 | 5.19 GB | 6.01 GB (+0.24 KV) | 6.25 GB (+0.48 KV) | 6.74 GB (+0.96 KV) |
| Q4_K_M | 4.65 | FP8 (Exp) | 5.19 GB | 5.99 GB (+0.22 KV) | 6.21 GB (+0.44 KV) | 6.65 GB (+0.88 KV) |
| Q4_K_M | 4.65 | Q4_0 (Exp) | 5.19 GB | 5.9 GB (+0.13 KV) | 6.04 GB (+0.26 KV) | 6.3 GB (+0.53 KV) |
| Q4_K_S | 4.58 | FP32 | 5.11 GB | 6.57 GB (+0.88 KV) | 7.44 GB (+1.75 KV) | 9.19 GB (+3.5 KV) |
| Q4_K_S | 4.58 | FP16 | 5.11 GB | 6.13 GB (+0.44 KV) | 6.57 GB (+0.88 KV) | 7.44 GB (+1.75 KV) |
| Q4_K_S | 4.58 | Q8_0 | 5.11 GB | 5.94 GB (+0.24 KV) | 6.18 GB (+0.48 KV) | 6.66 GB (+0.96 KV) |
| Q4_K_S | 4.58 | FP8 (Exp) | 5.11 GB | 5.91 GB (+0.22 KV) | 6.13 GB (+0.44 KV) | 6.57 GB (+0.88 KV) |
| Q4_K_S | 4.58 | Q4_0 (Exp) | 5.11 GB | 5.83 GB (+0.13 KV) | 5.96 GB (+0.26 KV) | 6.22 GB (+0.53 KV) |
| Q3_K_M | 3.91 | FP32 | 4.36 GB | 5.82 GB (+0.88 KV) | 6.7 GB (+1.75 KV) | 8.45 GB (+3.5 KV) |
| Q3_K_M | 3.91 | FP16 | 4.36 GB | 5.38 GB (+0.44 KV) | 5.82 GB (+0.88 KV) | 6.7 GB (+1.75 KV) |
| Q3_K_M | 3.91 | Q8_0 | 4.36 GB | 5.19 GB (+0.24 KV) | 5.43 GB (+0.48 KV) | 5.91 GB (+0.96 KV) |
| Q3_K_M | 3.91 | FP8 (Exp) | 4.36 GB | 5.17 GB (+0.22 KV) | 5.38 GB (+0.44 KV) | 5.82 GB (+0.88 KV) |
| Q3_K_M | 3.91 | Q4_0 (Exp) | 4.36 GB | 5.08 GB (+0.13 KV) | 5.21 GB (+0.26 KV) | 5.47 GB (+0.53 KV) |
| Q2_K | 2.63 | FP32 | 2.93 GB | 4.39 GB (+0.88 KV) | 5.27 GB (+1.75 KV) | 7.02 GB (+3.5 KV) |
| Q2_K | 2.63 | FP16 | 2.93 GB | 3.96 GB (+0.44 KV) | 4.39 GB (+0.88 KV) | 5.27 GB (+1.75 KV) |
| Q2_K | 2.63 | Q8_0 | 2.93 GB | 3.76 GB (+0.24 KV) | 4.0 GB (+0.48 KV) | 4.48 GB (+0.96 KV) |
| Q2_K | 2.63 | FP8 (Exp) | 2.93 GB | 3.74 GB (+0.22 KV) | 3.96 GB (+0.44 KV) | 4.39 GB (+0.88 KV) |
| Q2_K | 2.63 | Q4_0 (Exp) | 2.93 GB | 3.65 GB (+0.13 KV) | 3.78 GB (+0.26 KV) | 4.04 GB (+0.53 KV) |

Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
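That formula can be sketched as a small helper. The ~8.93B parameter count is an assumption inferred from the Q8_0 row (8.93 GB of weights at 8.0 bits per weight), and the function name is illustrative; the table appears to list weights in decimal GB but KV cache in binary GiB, which the sketch mirrors.

```python
# Rough total-VRAM estimate per the formula above:
# total = model weights + KV cache + 0.58 GB overhead.
# PARAMS is inferred from the Q8_0 row (8.93 GB at 8.0 bpw) -- an
# approximation, not an official figure.
PARAMS = 8.93e9
KV_BYTES_PER_TOKEN_FP16 = 2 * 28 * 4 * 128 * 2  # K+V x layers x KV heads x head_dim x 2 bytes
OVERHEAD_GB = 0.58

def total_vram_gb(bpw: float, context: int, cache_bytes_per_elem: float = 2.0) -> float:
    weights_gb = PARAMS * bpw / 8 / 1e9  # weights in decimal GB, as in the table
    kv_gb = (KV_BYTES_PER_TOKEN_FP16 / 2) * cache_bytes_per_elem * context / 2**30
    return weights_gb + kv_gb + OVERHEAD_GB

# e.g. Q4_K_M (4.65 bpw) with an FP16 cache at 16K context:
print(f"{total_vram_gb(4.65, 16384):.2f} GB")  # → 6.65 GB, matching the table
```

Passing `cache_bytes_per_elem=1.0` approximates the Q8_0/FP8 cache columns, and `0.5` the Q4_0 column.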

Check if your GPU can run Qwen2.5-Omni-7B

Use our calculator to see if this model fits your specific hardware configuration.