
Qwen3-32B

Standard Transformer · 32.5B Parameters

Model Specifications

Layers: 64
Hidden Dimension: 5,120
Attention Heads: 64
KV Heads: 8
Max Context: 40K tokens
Vocabulary Size: 151,936
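With 8 KV heads serving 64 query heads (8× grouped-query attention), the cache stores keys and values for only the 8 KV heads. A quick sanity check of the per-token cache footprint, assuming head_dim = 128 (taken from the published Qwen3 config; it is not hidden_dim / heads here):

```python
# Per-token KV-cache footprint for Qwen3-32B (sketch; head_dim = 128 is an
# assumption from the Qwen3 config, not a value listed in the specs above).
layers, kv_heads, head_dim = 64, 8, 128

# One K and one V vector per KV head per layer, 2 bytes per value at FP16.
bytes_per_token = 2 * layers * kv_heads * head_dim * 2

print(bytes_per_token // 1024, "KiB per token")  # 256 KiB per token
# At 8,192 tokens: 8192 * 256 KiB = 2 GiB, the "+2.0 KV" figure in the
# FP16-cache rows of the table below.
```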

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.82 GB (CUDA context + activation buffers).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 40K Context |
|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 68.25 GB | 73.08 GB (+4.0 KV) | 77.08 GB (+8.0 KV) | 85.08 GB (+16.0 KV) | 89.08 GB (+20.0 KV) |
| FP16 (16.0 bpw) | FP16 | 68.25 GB | 71.08 GB (+2.0 KV) | 73.08 GB (+4.0 KV) | 77.08 GB (+8.0 KV) | 79.08 GB (+10.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 68.25 GB | 70.17 GB (+1.1 KV) | 71.28 GB (+2.2 KV) | 73.48 GB (+4.4 KV) | 74.58 GB (+5.5 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 68.25 GB | 70.08 GB (+1.0 KV) | 71.08 GB (+2.0 KV) | 73.08 GB (+4.0 KV) | 74.08 GB (+5.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 68.25 GB | 69.67 GB (+0.6 KV) | 70.28 GB (+1.2 KV) | 71.48 GB (+2.4 KV) | 72.08 GB (+3.0 KV) |
| Q8_0 (8.0 bpw) | FP32 | 34.12 GB | 38.95 GB (+4.0 KV) | 42.95 GB (+8.0 KV) | 50.95 GB (+16.0 KV) | 54.95 GB (+20.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 34.12 GB | 36.95 GB (+2.0 KV) | 38.95 GB (+4.0 KV) | 42.95 GB (+8.0 KV) | 44.95 GB (+10.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 34.12 GB | 36.05 GB (+1.1 KV) | 37.15 GB (+2.2 KV) | 39.35 GB (+4.4 KV) | 40.45 GB (+5.5 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 34.12 GB | 35.95 GB (+1.0 KV) | 36.95 GB (+2.0 KV) | 38.95 GB (+4.0 KV) | 39.95 GB (+5.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 34.12 GB | 35.55 GB (+0.6 KV) | 36.15 GB (+1.2 KV) | 37.35 GB (+2.4 KV) | 37.95 GB (+3.0 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 19.84 GB | 24.66 GB (+4.0 KV) | 28.66 GB (+8.0 KV) | 36.66 GB (+16.0 KV) | 40.66 GB (+20.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 19.84 GB | 22.66 GB (+2.0 KV) | 24.66 GB (+4.0 KV) | 28.66 GB (+8.0 KV) | 30.66 GB (+10.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 19.84 GB | 21.76 GB (+1.1 KV) | 22.86 GB (+2.2 KV) | 25.06 GB (+4.4 KV) | 26.16 GB (+5.5 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 19.84 GB | 21.66 GB (+1.0 KV) | 22.66 GB (+2.0 KV) | 24.66 GB (+4.0 KV) | 25.66 GB (+5.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 19.84 GB | 21.26 GB (+0.6 KV) | 21.86 GB (+1.2 KV) | 23.06 GB (+2.4 KV) | 23.66 GB (+3.0 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 19.54 GB | 24.36 GB (+4.0 KV) | 28.36 GB (+8.0 KV) | 36.36 GB (+16.0 KV) | 40.36 GB (+20.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 19.54 GB | 22.36 GB (+2.0 KV) | 24.36 GB (+4.0 KV) | 28.36 GB (+8.0 KV) | 30.36 GB (+10.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 19.54 GB | 21.46 GB (+1.1 KV) | 22.56 GB (+2.2 KV) | 24.76 GB (+4.4 KV) | 25.86 GB (+5.5 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 19.54 GB | 21.36 GB (+1.0 KV) | 22.36 GB (+2.0 KV) | 24.36 GB (+4.0 KV) | 25.36 GB (+5.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 19.54 GB | 20.96 GB (+0.6 KV) | 21.56 GB (+1.2 KV) | 22.76 GB (+2.4 KV) | 23.36 GB (+3.0 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 16.68 GB | 21.50 GB (+4.0 KV) | 25.50 GB (+8.0 KV) | 33.50 GB (+16.0 KV) | 37.50 GB (+20.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 16.68 GB | 19.50 GB (+2.0 KV) | 21.50 GB (+4.0 KV) | 25.50 GB (+8.0 KV) | 27.50 GB (+10.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 16.68 GB | 18.60 GB (+1.1 KV) | 19.70 GB (+2.2 KV) | 21.90 GB (+4.4 KV) | 23.00 GB (+5.5 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 16.68 GB | 18.50 GB (+1.0 KV) | 19.50 GB (+2.0 KV) | 21.50 GB (+4.0 KV) | 22.50 GB (+5.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 16.68 GB | 18.10 GB (+0.6 KV) | 18.70 GB (+1.2 KV) | 19.90 GB (+2.4 KV) | 20.50 GB (+3.0 KV) |
| Q2_K (2.63 bpw) | FP32 | 11.22 GB | 16.04 GB (+4.0 KV) | 20.04 GB (+8.0 KV) | 28.04 GB (+16.0 KV) | 32.04 GB (+20.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 11.22 GB | 14.04 GB (+2.0 KV) | 16.04 GB (+4.0 KV) | 20.04 GB (+8.0 KV) | 22.04 GB (+10.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 11.22 GB | 13.14 GB (+1.1 KV) | 14.24 GB (+2.2 KV) | 16.44 GB (+4.4 KV) | 17.54 GB (+5.5 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 11.22 GB | 13.04 GB (+1.0 KV) | 14.04 GB (+2.0 KV) | 16.04 GB (+4.0 KV) | 17.04 GB (+5.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 11.22 GB | 12.64 GB (+0.6 KV) | 13.24 GB (+1.2 KV) | 14.44 GB (+2.4 KV) | 15.04 GB (+3.0 KV) |

Total VRAM = model weights + KV cache + 0.82 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
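The formula above can be turned into a small estimator. A minimal sketch, assuming head_dim = 128 (from the published Qwen3 config) and approximate effective bits-per-value for each cache format; the block-quantized formats (Q8_0, Q4_0) carry per-block scales, so their "+KV" figures in the table won't reproduce exactly. Weight sizes are taken from the table rather than recomputed:

```python
# Sketch: reproduce the table's totals from
#   Total = model weights + KV cache + 0.82 GB overhead.
# head_dim = 128 and the cache bits-per-value below are assumptions,
# not values stated on this page.

LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128
OVERHEAD_GB = 0.82

# Approximate effective bits per cached value (block scales included).
CACHE_BITS = {"FP32": 32.0, "FP16": 16.0, "Q8_0": 8.5, "FP8": 8.0, "Q4_0": 4.5}

def kv_cache_gb(tokens: int, cache_format: str) -> float:
    """KV-cache size in GiB for a given context length and cache format."""
    bits_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * CACHE_BITS[cache_format]
    return tokens * bits_per_token / 8 / 2**30

def total_vram_gb(weights_gb: float, tokens: int, cache_format: str) -> float:
    """Total VRAM estimate: weights (from the table) + KV cache + overhead."""
    return weights_gb + kv_cache_gb(tokens, cache_format) + OVERHEAD_GB

# Q4_K_M weights (19.84 GB) with an FP16 cache at 32K context:
print(round(total_vram_gb(19.84, 32 * 1024, "FP16"), 2))  # 28.66, as in the table
```

For the unquantized cache formats (FP32, FP16, FP8) this matches the table to the cent; for Q8_0 and Q4_0 it lands within a few hundred MB of the listed values.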
