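VRAM usage for every combination of weight quantization and KV cache format. Each context column shows the total VRAM required, with the KV cache contribution in GB in parentheses. Totals include a base overhead of 0.65 GB (CUDA context + activations).
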
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 30.87 GB | 34.52 GB (+3.0 KV) | 37.52 GB (+6.0 KV) | 43.52 GB (+12.0 KV) | 55.52 GB (+24.0 KV) | 79.52 GB (+48.0 KV) |
| FP16 16.0 bpw | FP16 | 30.87 GB | 33.02 GB (+1.5 KV) | 34.52 GB (+3.0 KV) | 37.52 GB (+6.0 KV) | 43.52 GB (+12.0 KV) | 55.52 GB (+24.0 KV) |
| FP16 16.0 bpw | Q8_0 | 30.87 GB | 32.34 GB (+0.83 KV) | 33.17 GB (+1.65 KV) | 34.82 GB (+3.3 KV) | 38.12 GB (+6.6 KV) | 44.72 GB (+13.2 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 30.87 GB | 32.27 GB (+0.75 KV) | 33.02 GB (+1.5 KV) | 34.52 GB (+3.0 KV) | 37.52 GB (+6.0 KV) | 43.52 GB (+12.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 30.87 GB | 31.97 GB (+0.45 KV) | 32.42 GB (+0.9 KV) | 33.32 GB (+1.8 KV) | 35.12 GB (+3.6 KV) | 38.72 GB (+7.2 KV) |
| Q8_0 8.0 bpw | FP32 | 15.44 GB | 19.08 GB (+3.0 KV) | 22.08 GB (+6.0 KV) | 28.08 GB (+12.0 KV) | 40.08 GB (+24.0 KV) | 64.08 GB (+48.0 KV) |
| Q8_0 8.0 bpw | FP16 | 15.44 GB | 17.58 GB (+1.5 KV) | 19.08 GB (+3.0 KV) | 22.08 GB (+6.0 KV) | 28.08 GB (+12.0 KV) | 40.08 GB (+24.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 15.44 GB | 16.91 GB (+0.83 KV) | 17.73 GB (+1.65 KV) | 19.38 GB (+3.3 KV) | 22.68 GB (+6.6 KV) | 29.28 GB (+13.2 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 15.44 GB | 16.83 GB (+0.75 KV) | 17.58 GB (+1.5 KV) | 19.08 GB (+3.0 KV) | 22.08 GB (+6.0 KV) | 28.08 GB (+12.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 15.44 GB | 16.53 GB (+0.45 KV) | 16.98 GB (+0.9 KV) | 17.88 GB (+1.8 KV) | 19.68 GB (+3.6 KV) | 23.28 GB (+7.2 KV) |
| Q4_K_M 4.65 bpw | FP32 | 8.97 GB | 12.62 GB (+3.0 KV) | 15.62 GB (+6.0 KV) | 21.62 GB (+12.0 KV) | 33.62 GB (+24.0 KV) | 57.62 GB (+48.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 8.97 GB | 11.12 GB (+1.5 KV) | 12.62 GB (+3.0 KV) | 15.62 GB (+6.0 KV) | 21.62 GB (+12.0 KV) | 33.62 GB (+24.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 8.97 GB | 10.44 GB (+0.83 KV) | 11.27 GB (+1.65 KV) | 12.92 GB (+3.3 KV) | 16.22 GB (+6.6 KV) | 22.82 GB (+13.2 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 8.97 GB | 10.37 GB (+0.75 KV) | 11.12 GB (+1.5 KV) | 12.62 GB (+3.0 KV) | 15.62 GB (+6.0 KV) | 21.62 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 8.97 GB | 10.07 GB (+0.45 KV) | 10.52 GB (+0.9 KV) | 11.42 GB (+1.8 KV) | 13.22 GB (+3.6 KV) | 16.82 GB (+7.2 KV) |
| Q4_K_S 4.58 bpw | FP32 | 8.84 GB | 12.48 GB (+3.0 KV) | 15.48 GB (+6.0 KV) | 21.48 GB (+12.0 KV) | 33.48 GB (+24.0 KV) | 57.48 GB (+48.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 8.84 GB | 10.98 GB (+1.5 KV) | 12.48 GB (+3.0 KV) | 15.48 GB (+6.0 KV) | 21.48 GB (+12.0 KV) | 33.48 GB (+24.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 8.84 GB | 10.31 GB (+0.83 KV) | 11.13 GB (+1.65 KV) | 12.78 GB (+3.3 KV) | 16.08 GB (+6.6 KV) | 22.68 GB (+13.2 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 8.84 GB | 10.23 GB (+0.75 KV) | 10.98 GB (+1.5 KV) | 12.48 GB (+3.0 KV) | 15.48 GB (+6.0 KV) | 21.48 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 8.84 GB | 9.93 GB (+0.45 KV) | 10.38 GB (+0.9 KV) | 11.28 GB (+1.8 KV) | 13.08 GB (+3.6 KV) | 16.68 GB (+7.2 KV) |
| Q3_K_M 3.91 bpw | FP32 | 7.54 GB | 11.19 GB (+3.0 KV) | 14.19 GB (+6.0 KV) | 20.19 GB (+12.0 KV) | 32.19 GB (+24.0 KV) | 56.19 GB (+48.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 7.54 GB | 9.69 GB (+1.5 KV) | 11.19 GB (+3.0 KV) | 14.19 GB (+6.0 KV) | 20.19 GB (+12.0 KV) | 32.19 GB (+24.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 7.54 GB | 9.02 GB (+0.83 KV) | 9.84 GB (+1.65 KV) | 11.49 GB (+3.3 KV) | 14.79 GB (+6.6 KV) | 21.39 GB (+13.2 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 7.54 GB | 8.94 GB (+0.75 KV) | 9.69 GB (+1.5 KV) | 11.19 GB (+3.0 KV) | 14.19 GB (+6.0 KV) | 20.19 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 7.54 GB | 8.64 GB (+0.45 KV) | 9.09 GB (+0.9 KV) | 9.99 GB (+1.8 KV) | 11.79 GB (+3.6 KV) | 15.39 GB (+7.2 KV) |
| Q2_K 2.63 bpw | FP32 | 5.07 GB | 8.72 GB (+3.0 KV) | 11.72 GB (+6.0 KV) | 17.72 GB (+12.0 KV) | 29.72 GB (+24.0 KV) | 53.72 GB (+48.0 KV) |
| Q2_K 2.63 bpw | FP16 | 5.07 GB | 7.22 GB (+1.5 KV) | 8.72 GB (+3.0 KV) | 11.72 GB (+6.0 KV) | 17.72 GB (+12.0 KV) | 29.72 GB (+24.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 5.07 GB | 6.55 GB (+0.83 KV) | 7.37 GB (+1.65 KV) | 9.02 GB (+3.3 KV) | 12.32 GB (+6.6 KV) | 18.92 GB (+13.2 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 5.07 GB | 6.47 GB (+0.75 KV) | 7.22 GB (+1.5 KV) | 8.72 GB (+3.0 KV) | 11.72 GB (+6.0 KV) | 17.72 GB (+12.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 5.07 GB | 6.17 GB (+0.45 KV) | 6.62 GB (+0.9 KV) | 7.52 GB (+1.8 KV) | 9.32 GB (+3.6 KV) | 12.92 GB (+7.2 KV) |
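
The Model Weights column is consistent with linear scaling in bits per weight: each value matches the FP16 figure (30.87 GB at 16.0 bpw) multiplied by bpw / 16, to within rounding. A minimal sketch of that relationship follows. The function name `weights_gb` is illustrative, and the scaling rule is inferred from the table rather than stated by the source.

```python
# Reference point taken from the FP16 row of the table above.
FP16_WEIGHTS_GB = 30.87
FP16_BPW = 16.0


def weights_gb(bpw: float) -> float:
    """Estimate weight size in GB by scaling the FP16 size linearly with bpw."""
    return FP16_WEIGHTS_GB * bpw / FP16_BPW


# Bits-per-weight values as listed in the Quantization column.
for name, bpw in [("Q8_0", 8.0), ("Q4_K_M", 4.65), ("Q4_K_S", 4.58),
                  ("Q3_K_M", 3.91), ("Q2_K", 2.63)]:
    print(f"{name}: {weights_gb(bpw):.2f} GB")
```

The estimates agree with the table to within 0.01 GB; small differences come from rounding in the published figures.
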

Total VRAM = Model Weights + KV Cache + 0.65 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
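
The formula can be sketched in a few lines of Python. The per-format KV cache sizes are read from the 8K column of the table and scaled linearly with context length, which matches how the table's KV figures double from one column to the next. The names `KV_GB_AT_8K` and `total_vram_gb` are illustrative and not part of any tool, and the sketch assumes the 65K and 131K labels mean 64 × 1024 and 128 × 1024 tokens.

```python
OVERHEAD_GB = 0.65  # base overhead stated in the text (CUDA context + activations)

# KV cache size in GB at 8K context, per cache format, read from the table.
# Q8_0 is listed as 0.83 at 8K; 0.825 is used here because it is consistent
# with the larger-context entries (for example 13.2 GB at 131K = 16 x 0.825).
KV_GB_AT_8K = {
    "FP32": 3.0,
    "FP16": 1.5,
    "Q8_0": 0.825,
    "FP8": 0.75,
    "Q4_0": 0.45,
}


def total_vram_gb(weights_gb: float, cache_format: str, context_k: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead (all in GB).

    context_k is the context length in units of 1024 tokens (8, 16, 32, 64, 128).
    Assumption: the table's 65K and 131K columns correspond to 64 and 128 here.
    """
    kv_gb = KV_GB_AT_8K[cache_format] * context_k / 8
    return weights_gb + kv_gb + OVERHEAD_GB


# Example: Q4_K_M weights (8.97 GB) with an FP16 cache at 32K context.
print(f"{total_vram_gb(8.97, 'FP16', 32):.2f} GB")  # prints 15.62 GB
```

Because actual usage may differ from these estimates by a few percent, it is prudent to leave some headroom when comparing the result against a GPU's available VRAM.
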