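Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM in GB, with the KV cache share (also in GB) in parentheses. Base overhead: 0.58 GB (CUDA context + activations).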
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 128K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 17.43 GB | 18.89 GB (+0.88 KV) | 19.76 GB (+1.75 KV) | 21.51 GB (+3.5 KV) | 25.01 GB (+7.0 KV) | 31.68 GB (+13.67 KV) |
| FP16 16.0 bpw | FP16 | 17.43 GB | 18.45 GB (+0.44 KV) | 18.89 GB (+0.88 KV) | 19.76 GB (+1.75 KV) | 21.51 GB (+3.5 KV) | 24.85 GB (+6.84 KV) |
| FP16 16.0 bpw | Q8_0 | 17.43 GB | 18.25 GB (+0.24 KV) | 18.49 GB (+0.48 KV) | 18.98 GB (+0.96 KV) | 19.94 GB (+1.93 KV) | 21.77 GB (+3.76 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 17.43 GB | 18.23 GB (+0.22 KV) | 18.45 GB (+0.44 KV) | 18.89 GB (+0.88 KV) | 19.76 GB (+1.75 KV) | 21.43 GB (+3.42 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 17.43 GB | 18.14 GB (+0.13 KV) | 18.28 GB (+0.26 KV) | 18.54 GB (+0.53 KV) | 19.06 GB (+1.05 KV) | 20.06 GB (+2.05 KV) |
| Q8_0 8.0 bpw | FP32 | 8.72 GB | 10.17 GB (+0.88 KV) | 11.05 GB (+1.75 KV) | 12.8 GB (+3.5 KV) | 16.3 GB (+7.0 KV) | 22.97 GB (+13.67 KV) |
| Q8_0 8.0 bpw | FP16 | 8.72 GB | 9.74 GB (+0.44 KV) | 10.17 GB (+0.88 KV) | 11.05 GB (+1.75 KV) | 12.8 GB (+3.5 KV) | 16.13 GB (+6.84 KV) |
| Q8_0 8.0 bpw | Q8_0 | 8.72 GB | 9.54 GB (+0.24 KV) | 9.78 GB (+0.48 KV) | 10.26 GB (+0.96 KV) | 11.22 GB (+1.93 KV) | 13.06 GB (+3.76 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 8.72 GB | 9.52 GB (+0.22 KV) | 9.74 GB (+0.44 KV) | 10.17 GB (+0.88 KV) | 11.05 GB (+1.75 KV) | 12.72 GB (+3.42 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 8.72 GB | 9.43 GB (+0.13 KV) | 9.56 GB (+0.26 KV) | 9.82 GB (+0.53 KV) | 10.35 GB (+1.05 KV) | 11.35 GB (+2.05 KV) |
| Q4_K_M 4.65 bpw | FP32 | 5.07 GB | 6.52 GB (+0.88 KV) | 7.4 GB (+1.75 KV) | 9.15 GB (+3.5 KV) | 12.65 GB (+7.0 KV) | 19.32 GB (+13.67 KV) |
| Q4_K_M 4.65 bpw | FP16 | 5.07 GB | 6.09 GB (+0.44 KV) | 6.52 GB (+0.88 KV) | 7.4 GB (+1.75 KV) | 9.15 GB (+3.5 KV) | 12.48 GB (+6.84 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 5.07 GB | 5.89 GB (+0.24 KV) | 6.13 GB (+0.48 KV) | 6.61 GB (+0.96 KV) | 7.57 GB (+1.93 KV) | 9.41 GB (+3.76 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 5.07 GB | 5.87 GB (+0.22 KV) | 6.09 GB (+0.44 KV) | 6.52 GB (+0.88 KV) | 7.4 GB (+1.75 KV) | 9.07 GB (+3.42 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 5.07 GB | 5.78 GB (+0.13 KV) | 5.91 GB (+0.26 KV) | 6.17 GB (+0.53 KV) | 6.7 GB (+1.05 KV) | 7.7 GB (+2.05 KV) |
| Q4_K_S 4.58 bpw | FP32 | 4.99 GB | 6.45 GB (+0.88 KV) | 7.32 GB (+1.75 KV) | 9.07 GB (+3.5 KV) | 12.57 GB (+7.0 KV) | 19.24 GB (+13.67 KV) |
| Q4_K_S 4.58 bpw | FP16 | 4.99 GB | 6.01 GB (+0.44 KV) | 6.45 GB (+0.88 KV) | 7.32 GB (+1.75 KV) | 9.07 GB (+3.5 KV) | 12.41 GB (+6.84 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 4.99 GB | 5.81 GB (+0.24 KV) | 6.05 GB (+0.48 KV) | 6.53 GB (+0.96 KV) | 7.5 GB (+1.93 KV) | 9.33 GB (+3.76 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 4.99 GB | 5.79 GB (+0.22 KV) | 6.01 GB (+0.44 KV) | 6.45 GB (+0.88 KV) | 7.32 GB (+1.75 KV) | 8.99 GB (+3.42 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 4.99 GB | 5.7 GB (+0.13 KV) | 5.83 GB (+0.26 KV) | 6.1 GB (+0.53 KV) | 6.62 GB (+1.05 KV) | 7.62 GB (+2.05 KV) |
| Q3_K_M 3.91 bpw | FP32 | 4.26 GB | 5.72 GB (+0.88 KV) | 6.59 GB (+1.75 KV) | 8.34 GB (+3.5 KV) | 11.84 GB (+7.0 KV) | 18.51 GB (+13.67 KV) |
| Q3_K_M 3.91 bpw | FP16 | 4.26 GB | 5.28 GB (+0.44 KV) | 5.72 GB (+0.88 KV) | 6.59 GB (+1.75 KV) | 8.34 GB (+3.5 KV) | 11.68 GB (+6.84 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 4.26 GB | 5.08 GB (+0.24 KV) | 5.32 GB (+0.48 KV) | 5.8 GB (+0.96 KV) | 6.77 GB (+1.93 KV) | 8.6 GB (+3.76 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 4.26 GB | 5.06 GB (+0.22 KV) | 5.28 GB (+0.44 KV) | 5.72 GB (+0.88 KV) | 6.59 GB (+1.75 KV) | 8.26 GB (+3.42 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 4.26 GB | 4.97 GB (+0.13 KV) | 5.1 GB (+0.26 KV) | 5.37 GB (+0.53 KV) | 5.89 GB (+1.05 KV) | 6.89 GB (+2.05 KV) |
| Q2_K 2.63 bpw | FP32 | 2.87 GB | 4.32 GB (+0.88 KV) | 5.2 GB (+1.75 KV) | 6.95 GB (+3.5 KV) | 10.45 GB (+7.0 KV) | 17.12 GB (+13.67 KV) |
| Q2_K 2.63 bpw | FP16 | 2.87 GB | 3.89 GB (+0.44 KV) | 4.32 GB (+0.88 KV) | 5.2 GB (+1.75 KV) | 6.95 GB (+3.5 KV) | 10.28 GB (+6.84 KV) |
| Q2_K 2.63 bpw | Q8_0 | 2.87 GB | 3.69 GB (+0.24 KV) | 3.93 GB (+0.48 KV) | 4.41 GB (+0.96 KV) | 5.37 GB (+1.93 KV) | 7.21 GB (+3.76 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 2.87 GB | 3.67 GB (+0.22 KV) | 3.89 GB (+0.44 KV) | 4.32 GB (+0.88 KV) | 5.2 GB (+1.75 KV) | 6.87 GB (+3.42 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 2.87 GB | 3.58 GB (+0.13 KV) | 3.71 GB (+0.26 KV) | 3.97 GB (+0.53 KV) | 4.5 GB (+1.05 KV) | 5.5 GB (+2.05 KV) |
Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
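The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names, the `vram_budget_gb` parameter, and the choice to pad the estimate by the upper 5% bound are assumptions made for the example. The input values are taken from the table.

```python
BASE_OVERHEAD_GB = 0.58  # CUDA context + activations, as stated in the table caption


def total_vram_gb(weights_gb, kv_cache_gb, overhead_gb=BASE_OVERHEAD_GB):
    """Total VRAM = model weights + KV cache + base overhead (all in GB)."""
    return weights_gb + kv_cache_gb + overhead_gb


def fits(weights_gb, kv_cache_gb, vram_budget_gb, margin=0.05):
    """Hypothetical helper: compare the estimate, padded by the +5% upper
    bound on variation, against a VRAM budget in GB."""
    worst_case = total_vram_gb(weights_gb, kv_cache_gb) * (1 + margin)
    return worst_case <= vram_budget_gb


# Q4_K_M weights (5.07 GB) with an FP16 cache at 32K context (+1.75 GB KV)
estimate = total_vram_gb(5.07, 1.75)
print(f"{estimate:.2f} GB")                   # 7.40 GB, matching the table row
print(fits(5.07, 1.75, vram_budget_gb=8.0))   # True: 7.40 * 1.05 = 7.77 <= 8.0
```

Some rows computed this way can differ from the table by 0.01 GB (for example, 8.72 + 0.88 + 0.58 gives 10.18 against the listed 10.17), presumably because the table rounds after summing unrounded values.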