
Qwen3Guard-Gen-8B

Standard Transformer 8.0B Parameters

Model Specifications

Layers: 36
Hidden Dimension: 4,096
Attention Heads: 32
KV Heads: 8
Max Context: 32K tokens
Vocabulary Size: 151,936
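The per-token KV-cache cost follows directly from these specifications. A minimal sketch, assuming the standard head dimension of hidden_dim / attention_heads (4096 / 32 = 128) and that both K and V are cached for every layer:

```python
# Sketch: per-token KV-cache size implied by the specs above.
# Assumes head_dim = hidden_dim / attention_heads (4096 / 32 = 128);
# with grouped-query attention, only the 8 KV heads are cached.

LAYERS = 36
KV_HEADS = 8
HEAD_DIM = 4096 // 32  # 128

def kv_bytes_per_token(bytes_per_element: float) -> float:
    """Bytes of KV cache per token: 2 (K and V) x layers x kv_heads x head_dim."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_element

# FP16 cache (2 bytes/element): ~144 KiB per token, so an 8K context
# costs about 1.12 GiB -- matching the "+1.12 KV" column in the table.
fp16_gib_8k = kv_bytes_per_token(2) * 8192 / 1024**3
print(round(fp16_gib_8k, 2))
```

An FP32 cache doubles this (+2.25 GiB at 8K), which is why the FP32 rows in the table carry exactly twice the FP16 KV overhead.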

VRAM Requirements

Estimated VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.58 GB (CUDA context + activations).

| Quantization | bpw | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context |
|---|---|---|---|---|---|---|
| FP16 | 16.0 | FP32 | 16.8 GB | 19.63 GB (+2.25 KV) | 21.88 GB (+4.5 KV) | 26.38 GB (+9.0 KV) |
| FP16 | 16.0 | FP16 | 16.8 GB | 18.5 GB (+1.12 KV) | 19.63 GB (+2.25 KV) | 21.88 GB (+4.5 KV) |
| FP16 | 16.0 | Q8_0 | 16.8 GB | 18.0 GB (+0.62 KV) | 18.62 GB (+1.24 KV) | 19.86 GB (+2.48 KV) |
| FP16 | 16.0 | FP8 (Exp) | 16.8 GB | 17.94 GB (+0.56 KV) | 18.5 GB (+1.12 KV) | 19.63 GB (+2.25 KV) |
| FP16 | 16.0 | Q4_0 (Exp) | 16.8 GB | 17.72 GB (+0.34 KV) | 18.05 GB (+0.67 KV) | 18.73 GB (+1.35 KV) |
| Q8_0 | 8.0 | FP32 | 8.4 GB | 11.23 GB (+2.25 KV) | 13.48 GB (+4.5 KV) | 17.98 GB (+9.0 KV) |
| Q8_0 | 8.0 | FP16 | 8.4 GB | 10.11 GB (+1.12 KV) | 11.23 GB (+2.25 KV) | 13.48 GB (+4.5 KV) |
| Q8_0 | 8.0 | Q8_0 | 8.4 GB | 9.6 GB (+0.62 KV) | 10.22 GB (+1.24 KV) | 11.46 GB (+2.48 KV) |
| Q8_0 | 8.0 | FP8 (Exp) | 8.4 GB | 9.54 GB (+0.56 KV) | 10.11 GB (+1.12 KV) | 11.23 GB (+2.25 KV) |
| Q8_0 | 8.0 | Q4_0 (Exp) | 8.4 GB | 9.32 GB (+0.34 KV) | 9.66 GB (+0.67 KV) | 10.33 GB (+1.35 KV) |
| Q4_K_M | 4.65 | FP32 | 4.88 GB | 7.71 GB (+2.25 KV) | 9.96 GB (+4.5 KV) | 14.46 GB (+9.0 KV) |
| Q4_K_M | 4.65 | FP16 | 4.88 GB | 6.59 GB (+1.12 KV) | 7.71 GB (+2.25 KV) | 9.96 GB (+4.5 KV) |
| Q4_K_M | 4.65 | Q8_0 | 4.88 GB | 6.08 GB (+0.62 KV) | 6.7 GB (+1.24 KV) | 7.94 GB (+2.48 KV) |
| Q4_K_M | 4.65 | FP8 (Exp) | 4.88 GB | 6.03 GB (+0.56 KV) | 6.59 GB (+1.12 KV) | 7.71 GB (+2.25 KV) |
| Q4_K_M | 4.65 | Q4_0 (Exp) | 4.88 GB | 5.8 GB (+0.34 KV) | 6.14 GB (+0.67 KV) | 6.81 GB (+1.35 KV) |
| Q4_K_S | 4.58 | FP32 | 4.81 GB | 7.64 GB (+2.25 KV) | 9.89 GB (+4.5 KV) | 14.39 GB (+9.0 KV) |
| Q4_K_S | 4.58 | FP16 | 4.81 GB | 6.51 GB (+1.12 KV) | 7.64 GB (+2.25 KV) | 9.89 GB (+4.5 KV) |
| Q4_K_S | 4.58 | Q8_0 | 4.81 GB | 6.01 GB (+0.62 KV) | 6.63 GB (+1.24 KV) | 7.86 GB (+2.48 KV) |
| Q4_K_S | 4.58 | FP8 (Exp) | 4.81 GB | 5.95 GB (+0.56 KV) | 6.51 GB (+1.12 KV) | 7.64 GB (+2.25 KV) |
| Q4_K_S | 4.58 | Q4_0 (Exp) | 4.81 GB | 5.73 GB (+0.34 KV) | 6.06 GB (+0.67 KV) | 6.74 GB (+1.35 KV) |
| Q3_K_M | 3.91 | FP32 | 4.11 GB | 6.94 GB (+2.25 KV) | 9.19 GB (+4.5 KV) | 13.69 GB (+9.0 KV) |
| Q3_K_M | 3.91 | FP16 | 4.11 GB | 5.81 GB (+1.12 KV) | 6.94 GB (+2.25 KV) | 9.19 GB (+4.5 KV) |
| Q3_K_M | 3.91 | Q8_0 | 4.11 GB | 5.3 GB (+0.62 KV) | 5.92 GB (+1.24 KV) | 7.16 GB (+2.48 KV) |
| Q3_K_M | 3.91 | FP8 (Exp) | 4.11 GB | 5.25 GB (+0.56 KV) | 5.81 GB (+1.12 KV) | 6.94 GB (+2.25 KV) |
| Q3_K_M | 3.91 | Q4_0 (Exp) | 4.11 GB | 5.02 GB (+0.34 KV) | 5.36 GB (+0.67 KV) | 6.04 GB (+1.35 KV) |
| Q2_K | 2.63 | FP32 | 2.76 GB | 5.59 GB (+2.25 KV) | 7.84 GB (+4.5 KV) | 12.34 GB (+9.0 KV) |
| Q2_K | 2.63 | FP16 | 2.76 GB | 4.47 GB (+1.12 KV) | 5.59 GB (+2.25 KV) | 7.84 GB (+4.5 KV) |
| Q2_K | 2.63 | Q8_0 | 2.76 GB | 3.96 GB (+0.62 KV) | 4.58 GB (+1.24 KV) | 5.82 GB (+2.48 KV) |
| Q2_K | 2.63 | FP8 (Exp) | 2.76 GB | 3.9 GB (+0.56 KV) | 4.47 GB (+1.12 KV) | 5.59 GB (+2.25 KV) |
| Q2_K | 2.63 | Q4_0 (Exp) | 2.76 GB | 3.68 GB (+0.34 KV) | 4.02 GB (+0.67 KV) | 4.69 GB (+1.35 KV) |

Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
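The formula above can be turned into a small estimator. A minimal sketch, assuming an effective weight size of 8.4 GB at 8 bpw (back-derived from the listed 16.8 GB FP16 footprint, not the nominal 8.0B parameter count) and using the table's rounded per-8K KV-cache costs, so totals may drift from the table by ~0.01 GB:

```python
# Sketch: reproduce the table's totals from the stated formula
#   Total VRAM = model weights + KV cache + 0.58 GB overhead.
# PARAMS_B is back-derived from the listed 16.8 GB FP16 weight size
# (an assumption); KV_GB_PER_8K holds the table's 8K-context KV costs
# per cache format and is scaled linearly with context length.

PARAMS_B = 8.4        # effective billions of weights (16.8 GB / 16 bpw * 8)
OVERHEAD_GB = 0.58    # CUDA context + activations, per the table notes

KV_GB_PER_8K = {"FP32": 2.25, "FP16": 1.12, "Q8_0": 0.62, "FP8": 0.56, "Q4_0": 0.34}

def total_vram_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Estimated total VRAM in GB for a given quantization and KV-cache setup."""
    weights_gb = PARAMS_B * bpw / 8                 # GB = params(B) * bits-per-weight / 8
    kv_gb = KV_GB_PER_8K[cache_format] * context_tokens / 8192
    return round(weights_gb + kv_gb + OVERHEAD_GB, 2)

# Q8_0 weights with an FP32 cache at 8K context:
print(total_vram_gb(8.0, "FP32", 8192))  # → 11.23, matching the table
```

The same call with a 32K context returns 17.98 GB, showing why the cache format matters more than weight quantization once contexts grow: at 32K, an FP32 cache alone costs 9 GB.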
