Qwen3Guard-Gen-4B

Standard Transformer · 4.0B Parameters

Model Specifications

Layers: 36
Hidden Dimension: 2,560
Attention Heads: 32
KV Heads: 8
Max Context: 32K tokens
Vocabulary Size: 151,936
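The specs above are enough to estimate KV-cache size per sequence. A minimal sketch, assuming a per-attention-head dimension of 128 (head_dim is not listed above; 128 is an assumption, but it is the value that reproduces the KV figures in the table below, whereas 2560/32 = 80 would not):

```python
def kv_cache_gib(layers, kv_heads, head_dim, context_len, bytes_per_elem):
    """Size of the K and V caches in GiB for one sequence.

    Factor of 2 accounts for storing both keys and values.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * context_len / 1024**3

# FP16 cache (2 bytes/element) at 8K context with this model's specs:
size = kv_cache_gib(layers=36, kv_heads=8, head_dim=128,
                    context_len=8192, bytes_per_elem=2)
print(f"{size:.3f} GiB")  # ~1.125 GiB, matching the +1.12 KV column below
```

Doubling `bytes_per_elem` to 4 (FP32) or doubling the context length each double the cache, which is exactly the pattern visible across the table's columns.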

VRAM Requirements

VRAM usage for all quantization and cache format combinations. Base overhead: 0.54 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context |
|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 8.4 GB | 11.19 GB (+2.25 KV) | 13.44 GB (+4.5 KV) | 17.94 GB (+9.0 KV) |
| FP16 (16.0 bpw) | FP16 | 8.4 GB | 10.07 GB (+1.12 KV) | 11.19 GB (+2.25 KV) | 13.44 GB (+4.5 KV) |
| FP16 (16.0 bpw) | Q8_0 | 8.4 GB | 9.56 GB (+0.62 KV) | 10.18 GB (+1.24 KV) | 11.41 GB (+2.48 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 8.4 GB | 9.5 GB (+0.56 KV) | 10.07 GB (+1.12 KV) | 11.19 GB (+2.25 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 8.4 GB | 9.28 GB (+0.34 KV) | 9.62 GB (+0.67 KV) | 10.29 GB (+1.35 KV) |
| Q8_0 (8.0 bpw) | FP32 | 4.2 GB | 6.99 GB (+2.25 KV) | 9.24 GB (+4.5 KV) | 13.74 GB (+9.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 4.2 GB | 5.87 GB (+1.12 KV) | 6.99 GB (+2.25 KV) | 9.24 GB (+4.5 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 4.2 GB | 5.36 GB (+0.62 KV) | 5.98 GB (+1.24 KV) | 7.22 GB (+2.48 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 4.2 GB | 5.3 GB (+0.56 KV) | 5.87 GB (+1.12 KV) | 6.99 GB (+2.25 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 4.2 GB | 5.08 GB (+0.34 KV) | 5.42 GB (+0.67 KV) | 6.09 GB (+1.35 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 2.44 GB | 5.23 GB (+2.25 KV) | 7.48 GB (+4.5 KV) | 11.98 GB (+9.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 2.44 GB | 4.11 GB (+1.12 KV) | 5.23 GB (+2.25 KV) | 7.48 GB (+4.5 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 2.44 GB | 3.6 GB (+0.62 KV) | 4.22 GB (+1.24 KV) | 5.46 GB (+2.48 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 2.44 GB | 3.54 GB (+0.56 KV) | 4.11 GB (+1.12 KV) | 5.23 GB (+2.25 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 2.44 GB | 3.32 GB (+0.34 KV) | 3.66 GB (+0.67 KV) | 4.33 GB (+1.35 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 2.4 GB | 5.19 GB (+2.25 KV) | 7.44 GB (+4.5 KV) | 11.94 GB (+9.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 2.4 GB | 4.07 GB (+1.12 KV) | 5.19 GB (+2.25 KV) | 7.44 GB (+4.5 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 2.4 GB | 3.56 GB (+0.62 KV) | 4.18 GB (+1.24 KV) | 5.42 GB (+2.48 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 2.4 GB | 3.51 GB (+0.56 KV) | 4.07 GB (+1.12 KV) | 5.19 GB (+2.25 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 2.4 GB | 3.28 GB (+0.34 KV) | 3.62 GB (+0.67 KV) | 4.29 GB (+1.35 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 2.05 GB | 4.84 GB (+2.25 KV) | 7.09 GB (+4.5 KV) | 11.59 GB (+9.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 2.05 GB | 3.72 GB (+1.12 KV) | 4.84 GB (+2.25 KV) | 7.09 GB (+4.5 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 2.05 GB | 3.21 GB (+0.62 KV) | 3.83 GB (+1.24 KV) | 5.07 GB (+2.48 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 2.05 GB | 3.16 GB (+0.56 KV) | 3.72 GB (+1.12 KV) | 4.84 GB (+2.25 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 2.05 GB | 2.93 GB (+0.34 KV) | 3.27 GB (+0.67 KV) | 3.94 GB (+1.35 KV) |
| Q2_K (2.63 bpw) | FP32 | 1.38 GB | 4.17 GB (+2.25 KV) | 6.42 GB (+4.5 KV) | 10.92 GB (+9.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 1.38 GB | 3.05 GB (+1.12 KV) | 4.17 GB (+2.25 KV) | 6.42 GB (+4.5 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 1.38 GB | 2.54 GB (+0.62 KV) | 3.16 GB (+1.24 KV) | 4.4 GB (+2.48 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 1.38 GB | 2.48 GB (+0.56 KV) | 3.05 GB (+1.12 KV) | 4.17 GB (+2.25 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 1.38 GB | 2.26 GB (+0.34 KV) | 2.6 GB (+0.67 KV) | 3.27 GB (+1.35 KV) |

Total VRAM = Model Weights + KV Cache + 0.54 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
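The footnote's formula can be sketched directly. The example values (Q4_K_M weights with an FP16 cache at 8K context) are read from the table above; small differences from the table's totals come from rounding the KV figure to two decimals:

```python
OVERHEAD_GB = 0.54  # CUDA context + activations, per the note above

def total_vram_gb(model_weights_gb, kv_cache_gb, overhead_gb=OVERHEAD_GB):
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return model_weights_gb + kv_cache_gb + overhead_gb

# Q4_K_M weights (2.44 GB) + FP16 KV cache at 8K (1.12 GB):
print(f"{total_vram_gb(2.44, 1.12):.2f} GB")  # ~4.10 GB; table shows 4.11 GB
```

The same function reproduces the other cells, e.g. FP16 weights with an FP32 cache at 32K: 8.4 + 9.0 + 0.54 = 17.94 GB.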
