
Qwen3-0.6B

Standard Transformer · 0.6B Parameters

Model Specifications

Layers: 28
Hidden Dimension: 1,024
Attention Heads: 16
KV Heads: 8
Max Context: 40K tokens
Vocabulary Size: 151,936
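
The KV-cache figures in the table below follow directly from these specifications, with one caveat: Qwen3 uses a fixed head dimension of 128 rather than hidden_dim / attention_heads = 64, and the table is only reproducible with that value. A minimal sketch in Python (assuming head_dim = 128, "8K" meaning 8 × 1024 tokens, and binary gigabytes; this is an illustration, not the calculator's actual code):

```python
# Per-token KV cache implied by the specs above.
# Assumption: head_dim = 128 (Qwen3 sets head_dim independently of
# hidden_dim / attention_heads; 1024 / 16 = 64 would not reproduce the table).

LAYERS = 28
KV_HEADS = 8
HEAD_DIM = 128  # assumed, per the Qwen3 config

def kv_bytes_per_token(cache_bits: float) -> float:
    """K and V tensors: 2 x layers x kv_heads x head_dim values per token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * cache_bits / 8

# FP16 cache at an 8K (8,192-token) context:
kv_gib = kv_bytes_per_token(16) * 8 * 1024 / 1024**3
print(f"{kv_gib:.2f} GiB")  # ~0.88, matching the "+0.88 KV" cell below
```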

VRAM Requirements

Estimated VRAM usage for every combination of weight quantization and KV-cache format (bpw = bits per weight). Each context column shows total VRAM, with the KV-cache share in parentheses; cache formats marked (Exp) are experimental. Base overhead: 0.51 GB (CUDA context + activations).

| Quantization | bpw  | Cache Format | Model Weights | 8K Context          | 16K Context         | 32K Context         | 40K Context          |
|--------------|------|--------------|---------------|---------------------|---------------------|---------------------|----------------------|
| FP16         | 16.0 | FP32         | 1.26 GB       | 3.52 GB (+1.75 KV)  | 5.27 GB (+3.50 KV)  | 8.77 GB (+7.00 KV)  | 10.52 GB (+8.75 KV)  |
| FP16         | 16.0 | FP16         | 1.26 GB       | 2.64 GB (+0.88 KV)  | 3.52 GB (+1.75 KV)  | 5.27 GB (+3.50 KV)  | 6.14 GB (+4.38 KV)   |
| FP16         | 16.0 | Q8_0         | 1.26 GB       | 2.25 GB (+0.48 KV)  | 2.73 GB (+0.96 KV)  | 3.69 GB (+1.93 KV)  | 4.17 GB (+2.41 KV)   |
| FP16         | 16.0 | FP8 (Exp)    | 1.26 GB       | 2.20 GB (+0.44 KV)  | 2.64 GB (+0.88 KV)  | 3.52 GB (+1.75 KV)  | 3.95 GB (+2.19 KV)   |
| FP16         | 16.0 | Q4_0 (Exp)   | 1.26 GB       | 2.03 GB (+0.26 KV)  | 2.29 GB (+0.53 KV)  | 2.82 GB (+1.05 KV)  | 3.08 GB (+1.31 KV)   |
| Q8_0         | 8.0  | FP32         | 0.63 GB       | 2.89 GB (+1.75 KV)  | 4.64 GB (+3.50 KV)  | 8.14 GB (+7.00 KV)  | 9.89 GB (+8.75 KV)   |
| Q8_0         | 8.0  | FP16         | 0.63 GB       | 2.01 GB (+0.88 KV)  | 2.89 GB (+1.75 KV)  | 4.64 GB (+3.50 KV)  | 5.51 GB (+4.38 KV)   |
| Q8_0         | 8.0  | Q8_0         | 0.63 GB       | 1.62 GB (+0.48 KV)  | 2.10 GB (+0.96 KV)  | 3.06 GB (+1.93 KV)  | 3.54 GB (+2.41 KV)   |
| Q8_0         | 8.0  | FP8 (Exp)    | 0.63 GB       | 1.57 GB (+0.44 KV)  | 2.01 GB (+0.88 KV)  | 2.89 GB (+1.75 KV)  | 3.32 GB (+2.19 KV)   |
| Q8_0         | 8.0  | Q4_0 (Exp)   | 0.63 GB       | 1.40 GB (+0.26 KV)  | 1.66 GB (+0.53 KV)  | 2.19 GB (+1.05 KV)  | 2.45 GB (+1.31 KV)   |
| Q4_K_M       | 4.65 | FP32         | 0.37 GB       | 2.62 GB (+1.75 KV)  | 4.37 GB (+3.50 KV)  | 7.87 GB (+7.00 KV)  | 9.62 GB (+8.75 KV)   |
| Q4_K_M       | 4.65 | FP16         | 0.37 GB       | 1.75 GB (+0.88 KV)  | 2.62 GB (+1.75 KV)  | 4.37 GB (+3.50 KV)  | 5.25 GB (+4.38 KV)   |
| Q4_K_M       | 4.65 | Q8_0         | 0.37 GB       | 1.35 GB (+0.48 KV)  | 1.83 GB (+0.96 KV)  | 2.80 GB (+1.93 KV)  | 3.28 GB (+2.41 KV)   |
| Q4_K_M       | 4.65 | FP8 (Exp)    | 0.37 GB       | 1.31 GB (+0.44 KV)  | 1.75 GB (+0.88 KV)  | 2.62 GB (+1.75 KV)  | 3.06 GB (+2.19 KV)   |
| Q4_K_M       | 4.65 | Q4_0 (Exp)   | 0.37 GB       | 1.13 GB (+0.26 KV)  | 1.40 GB (+0.53 KV)  | 1.92 GB (+1.05 KV)  | 2.18 GB (+1.31 KV)   |
| Q4_K_S       | 4.58 | FP32         | 0.36 GB       | 2.62 GB (+1.75 KV)  | 4.37 GB (+3.50 KV)  | 7.87 GB (+7.00 KV)  | 9.62 GB (+8.75 KV)   |
| Q4_K_S       | 4.58 | FP16         | 0.36 GB       | 1.74 GB (+0.88 KV)  | 2.62 GB (+1.75 KV)  | 4.37 GB (+3.50 KV)  | 5.24 GB (+4.38 KV)   |
| Q4_K_S       | 4.58 | Q8_0         | 0.36 GB       | 1.35 GB (+0.48 KV)  | 1.83 GB (+0.96 KV)  | 2.79 GB (+1.93 KV)  | 3.27 GB (+2.41 KV)   |
| Q4_K_S       | 4.58 | FP8 (Exp)    | 0.36 GB       | 1.30 GB (+0.44 KV)  | 1.74 GB (+0.88 KV)  | 2.62 GB (+1.75 KV)  | 3.05 GB (+2.19 KV)   |
| Q4_K_S       | 4.58 | Q4_0 (Exp)   | 0.36 GB       | 1.13 GB (+0.26 KV)  | 1.39 GB (+0.53 KV)  | 1.92 GB (+1.05 KV)  | 2.18 GB (+1.31 KV)   |
| Q3_K_M       | 3.91 | FP32         | 0.31 GB       | 2.56 GB (+1.75 KV)  | 4.31 GB (+3.50 KV)  | 7.81 GB (+7.00 KV)  | 9.56 GB (+8.75 KV)   |
| Q3_K_M       | 3.91 | FP16         | 0.31 GB       | 1.69 GB (+0.88 KV)  | 2.56 GB (+1.75 KV)  | 4.31 GB (+3.50 KV)  | 5.19 GB (+4.38 KV)   |
| Q3_K_M       | 3.91 | Q8_0         | 0.31 GB       | 1.30 GB (+0.48 KV)  | 1.78 GB (+0.96 KV)  | 2.74 GB (+1.93 KV)  | 3.22 GB (+2.41 KV)   |
| Q3_K_M       | 3.91 | FP8 (Exp)    | 0.31 GB       | 1.25 GB (+0.44 KV)  | 1.69 GB (+0.88 KV)  | 2.56 GB (+1.75 KV)  | 3.00 GB (+2.19 KV)   |
| Q3_K_M       | 3.91 | Q4_0 (Exp)   | 0.31 GB       | 1.08 GB (+0.26 KV)  | 1.34 GB (+0.53 KV)  | 1.86 GB (+1.05 KV)  | 2.13 GB (+1.31 KV)   |
| Q2_K         | 2.63 | FP32         | 0.21 GB       | 2.46 GB (+1.75 KV)  | 4.21 GB (+3.50 KV)  | 7.71 GB (+7.00 KV)  | 9.46 GB (+8.75 KV)   |
| Q2_K         | 2.63 | FP16         | 0.21 GB       | 1.59 GB (+0.88 KV)  | 2.46 GB (+1.75 KV)  | 4.21 GB (+3.50 KV)  | 5.09 GB (+4.38 KV)   |
| Q2_K         | 2.63 | Q8_0         | 0.21 GB       | 1.19 GB (+0.48 KV)  | 1.68 GB (+0.96 KV)  | 2.64 GB (+1.93 KV)  | 3.12 GB (+2.41 KV)   |
| Q2_K         | 2.63 | FP8 (Exp)    | 0.21 GB       | 1.15 GB (+0.44 KV)  | 1.59 GB (+0.88 KV)  | 2.46 GB (+1.75 KV)  | 2.90 GB (+2.19 KV)   |
| Q2_K         | 2.63 | Q4_0 (Exp)   | 0.21 GB       | 0.98 GB (+0.26 KV)  | 1.24 GB (+0.53 KV)  | 1.76 GB (+1.05 KV)  | 2.03 GB (+1.31 KV)   |

Total VRAM = Model Weights + KV Cache + 0.51 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
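
As a sanity check, the formula in the note above can be scripted. This sketch reuses the kv_bytes_per_token() helper defined earlier and takes the weight size straight from the table; it illustrates the stated formula and is not the calculator itself:

```python
OVERHEAD_GB = 0.51  # CUDA context + activations, per the note above

def total_vram_gb(weights_gb: float, cache_bits: float, n_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    kv_gb = kv_bytes_per_token(cache_bits) * n_tokens / 1024**3
    return weights_gb + kv_gb + OVERHEAD_GB

# Q8_0 weights (0.63 GB from the table), FP16 cache, 16K context:
print(f"{total_vram_gb(0.63, 16, 16 * 1024):.2f} GB")  # ~2.89, matching the table
```

This reproduces the FP32/FP16/FP8 cache rows exactly. The quantized cache formats (Q8_0, Q4_0) are block formats whose effective bits per value include per-block scale overhead, so plugging in a flat 8 or 4 for cache_bits will slightly undershoot those rows.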

Check if your GPU can run Qwen3-0.6B

Use our calculator to see if this model fits your specific hardware configuration.