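Estimated VRAM usage for every combination of weight quantization and KV cache format. Each total includes a fixed base overhead of 0.58 GB (CUDA context + activations), and the value in parentheses is the KV cache size in GB. Cache formats marked (Exp) are experimental.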
| Quantization | KV Cache Format | Model Weights | Total VRAM (8K Context) | Total VRAM (16K Context) | Total VRAM (32K Context) |
|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 17.85 GB | 19.31 GB (+0.88 KV) | 20.19 GB (+1.75 KV) | 21.94 GB (+3.5 KV) |
| FP16 16.0 bpw | FP16 | 17.85 GB | 18.87 GB (+0.44 KV) | 19.31 GB (+0.88 KV) | 20.19 GB (+1.75 KV) |
| FP16 16.0 bpw | Q8_0 | 17.85 GB | 18.68 GB (+0.24 KV) | 18.92 GB (+0.48 KV) | 19.4 GB (+0.96 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 17.85 GB | 18.65 GB (+0.22 KV) | 18.87 GB (+0.44 KV) | 19.31 GB (+0.88 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 17.85 GB | 18.57 GB (+0.13 KV) | 18.7 GB (+0.26 KV) | 18.96 GB (+0.53 KV) |
| Q8_0 8.0 bpw | FP32 | 8.93 GB | 10.39 GB (+0.88 KV) | 11.26 GB (+1.75 KV) | 13.01 GB (+3.5 KV) |
| Q8_0 8.0 bpw | FP16 | 8.93 GB | 9.95 GB (+0.44 KV) | 10.39 GB (+0.88 KV) | 11.26 GB (+1.75 KV) |
| Q8_0 8.0 bpw | Q8_0 | 8.93 GB | 9.75 GB (+0.24 KV) | 9.99 GB (+0.48 KV) | 10.47 GB (+0.96 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 8.93 GB | 9.73 GB (+0.22 KV) | 9.95 GB (+0.44 KV) | 10.39 GB (+0.88 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 8.93 GB | 9.64 GB (+0.13 KV) | 9.77 GB (+0.26 KV) | 10.04 GB (+0.53 KV) |
| Q4_K_M 4.65 bpw | FP32 | 5.19 GB | 6.65 GB (+0.88 KV) | 7.52 GB (+1.75 KV) | 9.27 GB (+3.5 KV) |
| Q4_K_M 4.65 bpw | FP16 | 5.19 GB | 6.21 GB (+0.44 KV) | 6.65 GB (+0.88 KV) | 7.52 GB (+1.75 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 5.19 GB | 6.01 GB (+0.24 KV) | 6.25 GB (+0.48 KV) | 6.74 GB (+0.96 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 5.19 GB | 5.99 GB (+0.22 KV) | 6.21 GB (+0.44 KV) | 6.65 GB (+0.88 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 5.19 GB | 5.9 GB (+0.13 KV) | 6.04 GB (+0.26 KV) | 6.3 GB (+0.53 KV) |
| Q4_K_S 4.58 bpw | FP32 | 5.11 GB | 6.57 GB (+0.88 KV) | 7.44 GB (+1.75 KV) | 9.19 GB (+3.5 KV) |
| Q4_K_S 4.58 bpw | FP16 | 5.11 GB | 6.13 GB (+0.44 KV) | 6.57 GB (+0.88 KV) | 7.44 GB (+1.75 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 5.11 GB | 5.94 GB (+0.24 KV) | 6.18 GB (+0.48 KV) | 6.66 GB (+0.96 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 5.11 GB | 5.91 GB (+0.22 KV) | 6.13 GB (+0.44 KV) | 6.57 GB (+0.88 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 5.11 GB | 5.83 GB (+0.13 KV) | 5.96 GB (+0.26 KV) | 6.22 GB (+0.53 KV) |
| Q3_K_M 3.91 bpw | FP32 | 4.36 GB | 5.82 GB (+0.88 KV) | 6.7 GB (+1.75 KV) | 8.45 GB (+3.5 KV) |
| Q3_K_M 3.91 bpw | FP16 | 4.36 GB | 5.38 GB (+0.44 KV) | 5.82 GB (+0.88 KV) | 6.7 GB (+1.75 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 4.36 GB | 5.19 GB (+0.24 KV) | 5.43 GB (+0.48 KV) | 5.91 GB (+0.96 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 4.36 GB | 5.17 GB (+0.22 KV) | 5.38 GB (+0.44 KV) | 5.82 GB (+0.88 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 4.36 GB | 5.08 GB (+0.13 KV) | 5.21 GB (+0.26 KV) | 5.47 GB (+0.53 KV) |
| Q2_K 2.63 bpw | FP32 | 2.93 GB | 4.39 GB (+0.88 KV) | 5.27 GB (+1.75 KV) | 7.02 GB (+3.5 KV) |
| Q2_K 2.63 bpw | FP16 | 2.93 GB | 3.96 GB (+0.44 KV) | 4.39 GB (+0.88 KV) | 5.27 GB (+1.75 KV) |
| Q2_K 2.63 bpw | Q8_0 | 2.93 GB | 3.76 GB (+0.24 KV) | 4.0 GB (+0.48 KV) | 4.48 GB (+0.96 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 2.93 GB | 3.74 GB (+0.22 KV) | 3.96 GB (+0.44 KV) | 4.39 GB (+0.88 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 2.93 GB | 3.65 GB (+0.13 KV) | 3.78 GB (+0.26 KV) | 4.04 GB (+0.53 KV) |
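Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. All values are rounded to 0.01 GB, so a total can differ from the sum of its parts by 0.01 GB. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.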
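The formula above can be sketched in Python. This is a minimal estimate, not the calculator's actual implementation. It assumes two scaling rules that match the table's numbers. First, weight size scales linearly with bits per weight, anchored at the 17.85 GB FP16 figure. Second, KV cache size scales linearly with context length, anchored at each format's 32K figure. The helper names `weights_gb`, `kv_cache_gb`, `total_vram_gb` and `vram_range_gb` are hypothetical.

```python
# Sketch of the arithmetic behind the table. Assumptions: weights scale
# linearly with bits per weight; KV cache scales linearly with context length.

OVERHEAD_GB = 0.58        # base overhead from the table: CUDA context + activations
FP16_WEIGHTS_GB = 17.85   # FP16 (16.0 bpw) weight size from the table

# KV cache size at 32K context for each cache format, taken from the table.
KV_32K_GB = {"FP32": 3.5, "FP16": 1.75, "Q8_0": 0.96, "FP8": 0.88, "Q4_0": 0.53}


def weights_gb(bpw: float) -> float:
    """Assumption: weight size scales linearly with bits per weight."""
    return FP16_WEIGHTS_GB * bpw / 16.0


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """Assumption: KV cache size scales linearly with context length."""
    return KV_32K_GB[cache_format] * context_tokens / 32768


def total_vram_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb(bpw) + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


def vram_range_gb(total_gb: float, tolerance: float = 0.05) -> tuple[float, float]:
    """Lower and upper bounds for the stated +/-5% variation."""
    return total_gb * (1 - tolerance), total_gb * (1 + tolerance)


if __name__ == "__main__":
    # Q4_K_M (4.65 bpw) with an FP16 KV cache at 16K context
    total = total_vram_gb(4.65, "FP16", 16384)
    low, high = vram_range_gb(total)
    print(f"{total:.2f} GB (expected range {low:.2f} to {high:.2f} GB)")
```

The table rounds each component to 0.01 GB before summing, so this sketch can differ from a table cell by about 0.01 GB. For the example above it gives 6.64 GB, while the Q4_K_M / FP16 / 16K cell lists 6.65 GB. When checking whether a configuration fits a GPU, comparing the upper bound from `vram_range_gb` against available memory is the more conservative choice.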