Qwen3-4B

Standard Transformer, 4.2B Parameters

Model Specifications

Layers: 36
Hidden Dimension: 2,560
Attention Heads: 32
KV Heads: 8
Max Context: 40K tokens
Vocabulary Size: 151,936
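
The layer and KV head counts determine how fast the KV cache grows with context length. A minimal sketch of that arithmetic follows. It assumes a head dimension of 128, which is not listed in the specifications above but reproduces the KV figures in the VRAM table when sizes are computed in units of 2^30 bytes. The function name and constants are illustrative only.

```python
# Architecture numbers taken from the specifications above.
LAYERS = 36
KV_HEADS = 8
# Assumption: head dimension is not listed in the specs; 128 is assumed here.
HEAD_DIM = 128


def kv_cache_gib(context_tokens: int, bytes_per_value: float) -> float:
    """Approximate KV cache size in units of 2**30 bytes."""
    # Each token stores one key and one value vector per KV head per layer.
    values_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM
    return context_tokens * values_per_token * bytes_per_value / 2**30


print(kv_cache_gib(8 * 1024, 2.0))   # FP16 cache at 8K context  -> 1.125
print(kv_cache_gib(40 * 1024, 4.0))  # FP32 cache at 40K context -> 11.25
```

Only the plain formats (FP32 at 4 bytes, FP16 at 2 bytes, FP8 at 1 byte per value) are covered here. Block-quantized caches such as Q8_0 and Q4_0 carry extra scale data, so their per-value cost is not a whole number of bytes.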

VRAM Requirements

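Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows the total, with the KV cache portion in parentheses. Totals include a base overhead of 0.54 GB (CUDA context + activations).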

Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 40K Context
FP16 16.0 bpw | FP32 | 8.82 GB | 11.61 GB (+2.25 KV) | 13.86 GB (+4.5 KV) | 18.36 GB (+9.0 KV) | 20.61 GB (+11.25 KV)
FP16 16.0 bpw | FP16 | 8.82 GB | 10.49 GB (+1.12 KV) | 11.61 GB (+2.25 KV) | 13.86 GB (+4.5 KV) | 14.99 GB (+5.62 KV)
FP16 16.0 bpw | Q8_0 | 8.82 GB | 9.98 GB (+0.62 KV) | 10.6 GB (+1.24 KV) | 11.84 GB (+2.48 KV) | 12.46 GB (+3.09 KV)
FP16 16.0 bpw | FP8 (Exp) | 8.82 GB | 9.92 GB (+0.56 KV) | 10.49 GB (+1.12 KV) | 11.61 GB (+2.25 KV) | 12.17 GB (+2.81 KV)
FP16 16.0 bpw | Q4_0 (Exp) | 8.82 GB | 9.7 GB (+0.34 KV) | 10.04 GB (+0.67 KV) | 10.71 GB (+1.35 KV) | 11.05 GB (+1.69 KV)
Q8_0 8.0 bpw | FP32 | 4.41 GB | 7.2 GB (+2.25 KV) | 9.45 GB (+4.5 KV) | 13.95 GB (+9.0 KV) | 16.2 GB (+11.25 KV)
Q8_0 8.0 bpw | FP16 | 4.41 GB | 6.08 GB (+1.12 KV) | 7.2 GB (+2.25 KV) | 9.45 GB (+4.5 KV) | 10.58 GB (+5.62 KV)
Q8_0 8.0 bpw | Q8_0 | 4.41 GB | 5.57 GB (+0.62 KV) | 6.19 GB (+1.24 KV) | 7.43 GB (+2.48 KV) | 8.05 GB (+3.09 KV)
Q8_0 8.0 bpw | FP8 (Exp) | 4.41 GB | 5.51 GB (+0.56 KV) | 6.08 GB (+1.12 KV) | 7.2 GB (+2.25 KV) | 7.76 GB (+2.81 KV)
Q8_0 8.0 bpw | Q4_0 (Exp) | 4.41 GB | 5.29 GB (+0.34 KV) | 5.63 GB (+0.67 KV) | 6.3 GB (+1.35 KV) | 6.64 GB (+1.69 KV)
Q4_K_M 4.65 bpw | FP32 | 2.56 GB | 5.36 GB (+2.25 KV) | 7.61 GB (+4.5 KV) | 12.11 GB (+9.0 KV) | 14.36 GB (+11.25 KV)
Q4_K_M 4.65 bpw | FP16 | 2.56 GB | 4.23 GB (+1.12 KV) | 5.36 GB (+2.25 KV) | 7.61 GB (+4.5 KV) | 8.73 GB (+5.62 KV)
Q4_K_M 4.65 bpw | Q8_0 | 2.56 GB | 3.72 GB (+0.62 KV) | 4.34 GB (+1.24 KV) | 5.58 GB (+2.48 KV) | 6.2 GB (+3.09 KV)
Q4_K_M 4.65 bpw | FP8 (Exp) | 2.56 GB | 3.67 GB (+0.56 KV) | 4.23 GB (+1.12 KV) | 5.36 GB (+2.25 KV) | 5.92 GB (+2.81 KV)
Q4_K_M 4.65 bpw | Q4_0 (Exp) | 2.56 GB | 3.44 GB (+0.34 KV) | 3.78 GB (+0.67 KV) | 4.46 GB (+1.35 KV) | 4.79 GB (+1.69 KV)
Q4_K_S 4.58 bpw | FP32 | 2.52 GB | 5.32 GB (+2.25 KV) | 7.57 GB (+4.5 KV) | 12.07 GB (+9.0 KV) | 14.32 GB (+11.25 KV)
Q4_K_S 4.58 bpw | FP16 | 2.52 GB | 4.19 GB (+1.12 KV) | 5.32 GB (+2.25 KV) | 7.57 GB (+4.5 KV) | 8.69 GB (+5.62 KV)
Q4_K_S 4.58 bpw | Q8_0 | 2.52 GB | 3.69 GB (+0.62 KV) | 4.3 GB (+1.24 KV) | 5.54 GB (+2.48 KV) | 6.16 GB (+3.09 KV)
Q4_K_S 4.58 bpw | FP8 (Exp) | 2.52 GB | 3.63 GB (+0.56 KV) | 4.19 GB (+1.12 KV) | 5.32 GB (+2.25 KV) | 5.88 GB (+2.81 KV)
Q4_K_S 4.58 bpw | Q4_0 (Exp) | 2.52 GB | 3.4 GB (+0.34 KV) | 3.74 GB (+0.67 KV) | 4.42 GB (+1.35 KV) | 4.75 GB (+1.69 KV)
Q3_K_M 3.91 bpw | FP32 | 2.16 GB | 4.95 GB (+2.25 KV) | 7.2 GB (+4.5 KV) | 11.7 GB (+9.0 KV) | 13.95 GB (+11.25 KV)
Q3_K_M 3.91 bpw | FP16 | 2.16 GB | 3.82 GB (+1.12 KV) | 4.95 GB (+2.25 KV) | 7.2 GB (+4.5 KV) | 8.32 GB (+5.62 KV)
Q3_K_M 3.91 bpw | Q8_0 | 2.16 GB | 3.32 GB (+0.62 KV) | 3.93 GB (+1.24 KV) | 5.17 GB (+2.48 KV) | 5.79 GB (+3.09 KV)
Q3_K_M 3.91 bpw | FP8 (Exp) | 2.16 GB | 3.26 GB (+0.56 KV) | 3.82 GB (+1.12 KV) | 4.95 GB (+2.25 KV) | 5.51 GB (+2.81 KV)
Q3_K_M 3.91 bpw | Q4_0 (Exp) | 2.16 GB | 3.03 GB (+0.34 KV) | 3.37 GB (+0.67 KV) | 4.05 GB (+1.35 KV) | 4.38 GB (+1.69 KV)
Q2_K 2.63 bpw | FP32 | 1.45 GB | 4.24 GB (+2.25 KV) | 6.49 GB (+4.5 KV) | 10.99 GB (+9.0 KV) | 13.24 GB (+11.25 KV)
Q2_K 2.63 bpw | FP16 | 1.45 GB | 3.12 GB (+1.12 KV) | 4.24 GB (+2.25 KV) | 6.49 GB (+4.5 KV) | 7.62 GB (+5.62 KV)
Q2_K 2.63 bpw | Q8_0 | 1.45 GB | 2.61 GB (+0.62 KV) | 3.23 GB (+1.24 KV) | 4.47 GB (+2.48 KV) | 5.09 GB (+3.09 KV)
Q2_K 2.63 bpw | FP8 (Exp) | 1.45 GB | 2.55 GB (+0.56 KV) | 3.12 GB (+1.12 KV) | 4.24 GB (+2.25 KV) | 4.8 GB (+2.81 KV)
Q2_K 2.63 bpw | Q4_0 (Exp) | 1.45 GB | 2.33 GB (+0.34 KV) | 2.67 GB (+0.67 KV) | 3.34 GB (+1.35 KV) | 3.68 GB (+1.69 KV)
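
The Model Weights column scales linearly with bits per weight (bpw). The sketch below shows one way to reproduce it from the 4.2B parameter count. The 1.05 multiplier is an inferred assumption: the page does not state it, but it makes every weight figure in the table come out to two decimals. The function name is hypothetical.

```python
PARAMS_BILLIONS = 4.2  # parameter count stated on this page
# Assumption: a 5% allowance on top of the raw weights, inferred from the
# table values rather than documented by the source.
WEIGHT_FACTOR = 1.05


def weights_gb(bits_per_weight: float) -> float:
    """Approximate weight size in GB for a given bits-per-weight value."""
    bytes_per_weight = bits_per_weight / 8
    return PARAMS_BILLIONS * bytes_per_weight * WEIGHT_FACTOR


for name, bpw in [("FP16", 16.0), ("Q8_0", 8.0), ("Q4_K_M", 4.65), ("Q2_K", 2.63)]:
    print(f"{name}: {weights_gb(bpw):.2f} GB")
```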

Total VRAM = Model Weights + KV Cache + 0.54 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
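
As a rough illustration of the formula above, the following sketch adds the three terms and checks the result against a GPU's memory. It pads the estimate by the stated 5% variance. The function names and the 8 GB and 12 GB GPU sizes are hypothetical examples, not part of the source.

```python
OVERHEAD_GB = 0.54  # base overhead stated on this page


def total_vram_gb(weights_gb: float, kv_cache_gb: float) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb + kv_cache_gb + OVERHEAD_GB


def fits(gpu_vram_gb: float, weights_gb: float, kv_cache_gb: float,
         margin: float = 0.05) -> bool:
    """True if the estimate, padded by the 5% variance, fits on the GPU."""
    return total_vram_gb(weights_gb, kv_cache_gb) * (1 + margin) <= gpu_vram_gb


# Q8_0 weights (4.41 GB) with an FP32 cache at 16K context (+4.5 GB KV).
total = total_vram_gb(4.41, 4.5)
print(f"{total:.2f} GB")       # 9.45 GB, matching the table row
print(fits(8.0, 4.41, 4.5))    # False: does not fit a hypothetical 8 GB GPU
print(fits(12.0, 4.41, 4.5))   # True: fits a hypothetical 12 GB GPU
```

Because the table rounds each term to two decimals, summing rounded cells by hand can differ from the listed total by 0.01 GB in some rows.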
