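Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM, with the KV cache share in parentheses. Base overhead: 0.54 GB (CUDA context + activations).
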
| Weight Quantization (bpw) | KV Cache Format | Model Weights | Total at 8K Context | Total at 16K Context | Total at 32K Context |
|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 8.4 GB | 11.19 GB (+2.25 KV) | 13.44 GB (+4.5 KV) | 17.94 GB (+9.0 KV) |
| FP16 16.0 bpw | FP16 | 8.4 GB | 10.07 GB (+1.12 KV) | 11.19 GB (+2.25 KV) | 13.44 GB (+4.5 KV) |
| FP16 16.0 bpw | Q8_0 | 8.4 GB | 9.56 GB (+0.62 KV) | 10.18 GB (+1.24 KV) | 11.41 GB (+2.48 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 8.4 GB | 9.5 GB (+0.56 KV) | 10.07 GB (+1.12 KV) | 11.19 GB (+2.25 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 8.4 GB | 9.28 GB (+0.34 KV) | 9.62 GB (+0.67 KV) | 10.29 GB (+1.35 KV) |
| Q8_0 8.0 bpw | FP32 | 4.2 GB | 6.99 GB (+2.25 KV) | 9.24 GB (+4.5 KV) | 13.74 GB (+9.0 KV) |
| Q8_0 8.0 bpw | FP16 | 4.2 GB | 5.87 GB (+1.12 KV) | 6.99 GB (+2.25 KV) | 9.24 GB (+4.5 KV) |
| Q8_0 8.0 bpw | Q8_0 | 4.2 GB | 5.36 GB (+0.62 KV) | 5.98 GB (+1.24 KV) | 7.22 GB (+2.48 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 4.2 GB | 5.3 GB (+0.56 KV) | 5.87 GB (+1.12 KV) | 6.99 GB (+2.25 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 4.2 GB | 5.08 GB (+0.34 KV) | 5.42 GB (+0.67 KV) | 6.09 GB (+1.35 KV) |
| Q4_K_M 4.65 bpw | FP32 | 2.44 GB | 5.23 GB (+2.25 KV) | 7.48 GB (+4.5 KV) | 11.98 GB (+9.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 2.44 GB | 4.11 GB (+1.12 KV) | 5.23 GB (+2.25 KV) | 7.48 GB (+4.5 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 2.44 GB | 3.6 GB (+0.62 KV) | 4.22 GB (+1.24 KV) | 5.46 GB (+2.48 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 2.44 GB | 3.54 GB (+0.56 KV) | 4.11 GB (+1.12 KV) | 5.23 GB (+2.25 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 2.44 GB | 3.32 GB (+0.34 KV) | 3.66 GB (+0.67 KV) | 4.33 GB (+1.35 KV) |
| Q4_K_S 4.58 bpw | FP32 | 2.4 GB | 5.19 GB (+2.25 KV) | 7.44 GB (+4.5 KV) | 11.94 GB (+9.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 2.4 GB | 4.07 GB (+1.12 KV) | 5.19 GB (+2.25 KV) | 7.44 GB (+4.5 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 2.4 GB | 3.56 GB (+0.62 KV) | 4.18 GB (+1.24 KV) | 5.42 GB (+2.48 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 2.4 GB | 3.51 GB (+0.56 KV) | 4.07 GB (+1.12 KV) | 5.19 GB (+2.25 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 2.4 GB | 3.28 GB (+0.34 KV) | 3.62 GB (+0.67 KV) | 4.29 GB (+1.35 KV) |
| Q3_K_M 3.91 bpw | FP32 | 2.05 GB | 4.84 GB (+2.25 KV) | 7.09 GB (+4.5 KV) | 11.59 GB (+9.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 2.05 GB | 3.72 GB (+1.12 KV) | 4.84 GB (+2.25 KV) | 7.09 GB (+4.5 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 2.05 GB | 3.21 GB (+0.62 KV) | 3.83 GB (+1.24 KV) | 5.07 GB (+2.48 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 2.05 GB | 3.16 GB (+0.56 KV) | 3.72 GB (+1.12 KV) | 4.84 GB (+2.25 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 2.05 GB | 2.93 GB (+0.34 KV) | 3.27 GB (+0.67 KV) | 3.94 GB (+1.35 KV) |
| Q2_K 2.63 bpw | FP32 | 1.38 GB | 4.17 GB (+2.25 KV) | 6.42 GB (+4.5 KV) | 10.92 GB (+9.0 KV) |
| Q2_K 2.63 bpw | FP16 | 1.38 GB | 3.05 GB (+1.12 KV) | 4.17 GB (+2.25 KV) | 6.42 GB (+4.5 KV) |
| Q2_K 2.63 bpw | Q8_0 | 1.38 GB | 2.54 GB (+0.62 KV) | 3.16 GB (+1.24 KV) | 4.4 GB (+2.48 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 1.38 GB | 2.48 GB (+0.56 KV) | 3.05 GB (+1.12 KV) | 4.17 GB (+2.25 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 1.38 GB | 2.26 GB (+0.34 KV) | 2.6 GB (+0.67 KV) | 3.27 GB (+1.35 KV) |
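
Total VRAM = Model Weights + KV Cache + 0.54 GB overhead. Totals appear to be computed from unrounded values, so a few differ by 0.01 GB from the sum of the rounded figures shown. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.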
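
The formula above can be sketched in a few lines of Python. This is a rough estimate built only from the table's own numbers, not the calculator's actual implementation. It assumes that weights scale linearly with bits per weight from the 8.4 GB FP16 row, that the KV cache grows linearly with context length (the table doubles it from 8K to 16K to 32K), and that "8K" means 8192 tokens. The function names and the `margin` parameter are hypothetical.

```python
# Rough VRAM estimator reconstructed from the table (illustrative sketch only).
BASE_OVERHEAD_GB = 0.54   # CUDA context + activations, as stated in the text
FP16_WEIGHTS_GB = 8.4     # model weights at 16.0 bpw, from the FP16 rows

# KV cache size in GB at 32K context, copied from the table's last column.
KV_GB_AT_32K = {
    "FP32": 9.0,
    "FP16": 4.5,
    "Q8_0": 2.48,
    "FP8": 2.25,
    "Q4_0": 1.35,
}


def weights_gb(bpw: float) -> float:
    """Model weights in GB, assuming linear scaling with bits per weight."""
    return FP16_WEIGHTS_GB * bpw / 16.0


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache in GB, assuming linear growth with context length."""
    return KV_GB_AT_32K[cache_format] * context_tokens / 32768


def total_vram_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + base overhead."""
    return weights_gb(bpw) + kv_cache_gb(cache_format, context_tokens) + BASE_OVERHEAD_GB


def fits(bpw: float, cache_format: str, context_tokens: int,
         vram_gb: float, margin: float = 0.05) -> bool:
    """True if the estimate, padded by the ±5% variance, fits in vram_gb."""
    return total_vram_gb(bpw, cache_format, context_tokens) * (1 + margin) <= vram_gb


if __name__ == "__main__":
    # Q4_K_M (4.65 bpw) with an FP16 cache at 8K context.
    print(f"{total_vram_gb(4.65, 'FP16', 8192):.2f} GB")
    # Would the same setup at 32K context fit on an 8 GB card?
    print(fits(4.65, "FP16", 32768, vram_gb=8.0))
```

Padding the estimate by the 5% variance before comparing against the card's capacity is a conservative choice. Configurations that land within a few hundred megabytes of the limit are better checked on the actual inference engine.
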
Use our calculator to see if this model fits your specific hardware configuration.