VRAM usage for all quantization and cache format combinations. Base overhead: 0.58 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 1M Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 15.96 GB | 17.41 GB (+0.88 KV) | 18.29 GB (+1.75 KV) | 20.04 GB (+3.5 KV) | 23.54 GB (+7.0 KV) | 30.54 GB (+14.0 KV) | 123.35 GB (+106.81 KV) |
| FP16 16.0 bpw | FP16 | 15.96 GB | 16.97 GB (+0.44 KV) | 17.41 GB (+0.88 KV) | 18.29 GB (+1.75 KV) | 20.04 GB (+3.5 KV) | 23.54 GB (+7.0 KV) | 69.94 GB (+53.41 KV) |
| FP16 16.0 bpw | Q8_0 | 15.96 GB | 16.78 GB (+0.24 KV) | 17.02 GB (+0.48 KV) | 17.5 GB (+0.96 KV) | 18.46 GB (+1.93 KV) | 20.39 GB (+3.85 KV) | 45.91 GB (+29.37 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 15.96 GB | 16.75 GB (+0.22 KV) | 16.97 GB (+0.44 KV) | 17.41 GB (+0.88 KV) | 18.29 GB (+1.75 KV) | 20.04 GB (+3.5 KV) | 43.24 GB (+26.7 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 15.96 GB | 16.67 GB (+0.13 KV) | 16.8 GB (+0.26 KV) | 17.06 GB (+0.53 KV) | 17.59 GB (+1.05 KV) | 18.64 GB (+2.1 KV) | 32.56 GB (+16.02 KV) |
| Q8_0 8.0 bpw | FP32 | 7.98 GB | 9.43 GB (+0.88 KV) | 10.31 GB (+1.75 KV) | 12.06 GB (+3.5 KV) | 15.56 GB (+7.0 KV) | 22.56 GB (+14.0 KV) | 115.37 GB (+106.81 KV) |
| Q8_0 8.0 bpw | FP16 | 7.98 GB | 8.99 GB (+0.44 KV) | 9.43 GB (+0.88 KV) | 10.31 GB (+1.75 KV) | 12.06 GB (+3.5 KV) | 15.56 GB (+7.0 KV) | 61.96 GB (+53.41 KV) |
| Q8_0 8.0 bpw | Q8_0 | 7.98 GB | 8.8 GB (+0.24 KV) | 9.04 GB (+0.48 KV) | 9.52 GB (+0.96 KV) | 10.48 GB (+1.93 KV) | 12.41 GB (+3.85 KV) | 37.93 GB (+29.37 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 7.98 GB | 8.77 GB (+0.22 KV) | 8.99 GB (+0.44 KV) | 9.43 GB (+0.88 KV) | 10.31 GB (+1.75 KV) | 12.06 GB (+3.5 KV) | 35.26 GB (+26.7 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 7.98 GB | 8.69 GB (+0.13 KV) | 8.82 GB (+0.26 KV) | 9.08 GB (+0.53 KV) | 9.61 GB (+1.05 KV) | 10.66 GB (+2.1 KV) | 24.58 GB (+16.02 KV) |
| Q4_K_M 4.65 bpw | FP32 | 4.64 GB | 6.09 GB (+0.88 KV) | 6.96 GB (+1.75 KV) | 8.71 GB (+3.5 KV) | 12.21 GB (+7.0 KV) | 19.21 GB (+14.0 KV) | 112.03 GB (+106.81 KV) |
| Q4_K_M 4.65 bpw | FP16 | 4.64 GB | 5.65 GB (+0.44 KV) | 6.09 GB (+0.88 KV) | 6.96 GB (+1.75 KV) | 8.71 GB (+3.5 KV) | 12.21 GB (+7.0 KV) | 58.62 GB (+53.41 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 4.64 GB | 5.46 GB (+0.24 KV) | 5.7 GB (+0.48 KV) | 6.18 GB (+0.96 KV) | 7.14 GB (+1.93 KV) | 9.06 GB (+3.85 KV) | 34.59 GB (+29.37 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 4.64 GB | 5.43 GB (+0.22 KV) | 5.65 GB (+0.44 KV) | 6.09 GB (+0.88 KV) | 6.96 GB (+1.75 KV) | 8.71 GB (+3.5 KV) | 31.92 GB (+26.7 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 4.64 GB | 5.35 GB (+0.13 KV) | 5.48 GB (+0.26 KV) | 5.74 GB (+0.53 KV) | 6.26 GB (+1.05 KV) | 7.31 GB (+2.1 KV) | 21.24 GB (+16.02 KV) |
| Q4_K_S 4.58 bpw | FP32 | 4.57 GB | 6.02 GB (+0.88 KV) | 6.89 GB (+1.75 KV) | 8.64 GB (+3.5 KV) | 12.14 GB (+7.0 KV) | 19.14 GB (+14.0 KV) | 111.96 GB (+106.81 KV) |
| Q4_K_S 4.58 bpw | FP16 | 4.57 GB | 5.58 GB (+0.44 KV) | 6.02 GB (+0.88 KV) | 6.89 GB (+1.75 KV) | 8.64 GB (+3.5 KV) | 12.14 GB (+7.0 KV) | 58.55 GB (+53.41 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 4.57 GB | 5.39 GB (+0.24 KV) | 5.63 GB (+0.48 KV) | 6.11 GB (+0.96 KV) | 7.07 GB (+1.93 KV) | 8.99 GB (+3.85 KV) | 34.52 GB (+29.37 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 4.57 GB | 5.36 GB (+0.22 KV) | 5.58 GB (+0.44 KV) | 6.02 GB (+0.88 KV) | 6.89 GB (+1.75 KV) | 8.64 GB (+3.5 KV) | 31.85 GB (+26.7 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 4.57 GB | 5.28 GB (+0.13 KV) | 5.41 GB (+0.26 KV) | 5.67 GB (+0.53 KV) | 6.19 GB (+1.05 KV) | 7.24 GB (+2.1 KV) | 21.17 GB (+16.02 KV) |
| Q3_K_M 3.91 bpw | FP32 | 3.9 GB | 5.35 GB (+0.88 KV) | 6.23 GB (+1.75 KV) | 7.98 GB (+3.5 KV) | 11.48 GB (+7.0 KV) | 18.48 GB (+14.0 KV) | 111.29 GB (+106.81 KV) |
| Q3_K_M 3.91 bpw | FP16 | 3.9 GB | 4.91 GB (+0.44 KV) | 5.35 GB (+0.88 KV) | 6.23 GB (+1.75 KV) | 7.98 GB (+3.5 KV) | 11.48 GB (+7.0 KV) | 57.88 GB (+53.41 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 3.9 GB | 4.72 GB (+0.24 KV) | 4.96 GB (+0.48 KV) | 5.44 GB (+0.96 KV) | 6.4 GB (+1.93 KV) | 8.33 GB (+3.85 KV) | 33.85 GB (+29.37 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 3.9 GB | 4.69 GB (+0.22 KV) | 4.91 GB (+0.44 KV) | 5.35 GB (+0.88 KV) | 6.23 GB (+1.75 KV) | 7.98 GB (+3.5 KV) | 31.18 GB (+26.7 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 3.9 GB | 4.61 GB (+0.13 KV) | 4.74 GB (+0.26 KV) | 5.0 GB (+0.53 KV) | 5.53 GB (+1.05 KV) | 6.58 GB (+2.1 KV) | 20.5 GB (+16.02 KV) |
| Q2_K 2.63 bpw | FP32 | 2.62 GB | 4.07 GB (+0.88 KV) | 4.95 GB (+1.75 KV) | 6.7 GB (+3.5 KV) | 10.2 GB (+7.0 KV) | 17.2 GB (+14.0 KV) | 110.01 GB (+106.81 KV) |
| Q2_K 2.63 bpw | FP16 | 2.62 GB | 3.64 GB (+0.44 KV) | 4.07 GB (+0.88 KV) | 4.95 GB (+1.75 KV) | 6.7 GB (+3.5 KV) | 10.2 GB (+7.0 KV) | 56.61 GB (+53.41 KV) |
| Q2_K 2.63 bpw | Q8_0 | 2.62 GB | 3.44 GB (+0.24 KV) | 3.68 GB (+0.48 KV) | 4.16 GB (+0.96 KV) | 5.12 GB (+1.93 KV) | 7.05 GB (+3.85 KV) | 32.57 GB (+29.37 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 2.62 GB | 3.42 GB (+0.22 KV) | 3.64 GB (+0.44 KV) | 4.07 GB (+0.88 KV) | 4.95 GB (+1.75 KV) | 6.7 GB (+3.5 KV) | 29.9 GB (+26.7 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 2.62 GB | 3.33 GB (+0.13 KV) | 3.46 GB (+0.26 KV) | 3.72 GB (+0.53 KV) | 4.25 GB (+1.05 KV) | 5.3 GB (+2.1 KV) | 19.22 GB (+16.02 KV) |
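
Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. In each cell, the value in parentheses is the KV cache size in GB at that context length. Figures are rounded independently, so a total can differ from the sum of its displayed parts by 0.01 GB. Actual usage may vary by ±5% depending on the inference engine and its optimizations.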
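
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names are hypothetical, and the scaling rules (weights proportional to bits per weight, KV cache linear in context length, and the per-format KV ratios) are assumptions inferred from the table's values. The table's figures are also consistent with "8K" meaning 8,192 tokens and "1M" meaning 1,000,000 tokens, which is what the sketch assumes.

```python
# Reference values read from the table; scaling rules are inferred from
# the table's numbers, not taken from any official source.
OVERHEAD_GB = 0.58           # base overhead: CUDA context + activations
FP16_WEIGHTS_GB = 15.96      # model weights at 16.0 bpw
FP32_KV_GB_PER_8K = 0.875    # FP32 KV cache at 8,192 tokens (half the 16K value of 1.75)

# KV cache size relative to FP32 (assumption inferred from the KV columns).
KV_RATIO = {"FP32": 1.0, "FP16": 0.5, "Q8_0": 0.275, "FP8": 0.25, "Q4_0": 0.15}


def weights_gb(bpw: float) -> float:
    """Weight size, assumed to scale linearly with bits per weight."""
    return FP16_WEIGHTS_GB * bpw / 16.0


def kv_cache_gb(context_tokens: int, cache_format: str) -> float:
    """KV cache size, assumed to scale linearly with context length."""
    return FP32_KV_GB_PER_8K * (context_tokens / 8192) * KV_RATIO[cache_format]


def total_vram_gb(bpw: float, context_tokens: int, cache_format: str) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb(bpw) + kv_cache_gb(context_tokens, cache_format) + OVERHEAD_GB


def fits(vram_gb: float, bpw: float, context_tokens: int,
         cache_format: str, margin: float = 0.05) -> bool:
    """Check a configuration against a VRAM budget, padding by the ±5% tolerance."""
    return total_vram_gb(bpw, context_tokens, cache_format) * (1 + margin) <= vram_gb


# Q4_K_M (4.65 bpw) with a Q8_0 cache at 32K context
print(round(total_vram_gb(4.65, 32768, "Q8_0"), 2))   # 6.18, matching the table cell
print(fits(8.0, 4.65, 32768, "Q8_0"))                 # True: fits an 8 GB card
print(fits(8.0, 4.65, 131072, "FP16"))                # False: about 12.2 GB needed
```

Padding the estimate by the stated 5% before comparing against the budget is a conservative choice; drop the `margin` argument to `0.0` to compare against the table's figures directly.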