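Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context-length cell shows the total VRAM, with the KV cache's contribution in parentheses. Base overhead: 0.58 GB (CUDA context + activations).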
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 17.64 GB | 20.35 GB (+2.12 KV) | 22.47 GB (+4.25 KV) | 26.72 GB (+8.5 KV) | 35.22 GB (+17.0 KV) | 52.22 GB (+34.0 KV) | 86.22 GB (+68.0 KV) |
| FP16 16.0 bpw | FP16 | 17.64 GB | 19.29 GB (+1.06 KV) | 20.35 GB (+2.12 KV) | 22.47 GB (+4.25 KV) | 26.72 GB (+8.5 KV) | 35.22 GB (+17.0 KV) | 52.22 GB (+34.0 KV) |
| FP16 16.0 bpw | Q8_0 | 17.64 GB | 18.81 GB (+0.58 KV) | 19.39 GB (+1.17 KV) | 20.56 GB (+2.34 KV) | 22.9 GB (+4.68 KV) | 27.57 GB (+9.35 KV) | 36.92 GB (+18.7 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 17.64 GB | 18.76 GB (+0.53 KV) | 19.29 GB (+1.06 KV) | 20.35 GB (+2.12 KV) | 22.47 GB (+4.25 KV) | 26.72 GB (+8.5 KV) | 35.22 GB (+17.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 17.64 GB | 18.54 GB (+0.32 KV) | 18.86 GB (+0.64 KV) | 19.5 GB (+1.27 KV) | 20.77 GB (+2.55 KV) | 23.32 GB (+5.1 KV) | 28.42 GB (+10.2 KV) |
| Q8_0 8.0 bpw | FP32 | 8.82 GB | 11.53 GB (+2.12 KV) | 13.65 GB (+4.25 KV) | 17.9 GB (+8.5 KV) | 26.4 GB (+17.0 KV) | 43.4 GB (+34.0 KV) | 77.4 GB (+68.0 KV) |
| Q8_0 8.0 bpw | FP16 | 8.82 GB | 10.47 GB (+1.06 KV) | 11.53 GB (+2.12 KV) | 13.65 GB (+4.25 KV) | 17.9 GB (+8.5 KV) | 26.4 GB (+17.0 KV) | 43.4 GB (+34.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 8.82 GB | 9.99 GB (+0.58 KV) | 10.57 GB (+1.17 KV) | 11.74 GB (+2.34 KV) | 14.08 GB (+4.68 KV) | 18.75 GB (+9.35 KV) | 28.1 GB (+18.7 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 8.82 GB | 9.94 GB (+0.53 KV) | 10.47 GB (+1.06 KV) | 11.53 GB (+2.12 KV) | 13.65 GB (+4.25 KV) | 17.9 GB (+8.5 KV) | 26.4 GB (+17.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 8.82 GB | 9.72 GB (+0.32 KV) | 10.04 GB (+0.64 KV) | 10.68 GB (+1.27 KV) | 11.95 GB (+2.55 KV) | 14.5 GB (+5.1 KV) | 19.6 GB (+10.2 KV) |
| Q4_K_M 4.65 bpw | FP32 | 5.13 GB | 7.84 GB (+2.12 KV) | 9.96 GB (+4.25 KV) | 14.21 GB (+8.5 KV) | 22.71 GB (+17.0 KV) | 39.71 GB (+34.0 KV) | 73.71 GB (+68.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 5.13 GB | 6.77 GB (+1.06 KV) | 7.84 GB (+2.12 KV) | 9.96 GB (+4.25 KV) | 14.21 GB (+8.5 KV) | 22.71 GB (+17.0 KV) | 39.71 GB (+34.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 5.13 GB | 6.29 GB (+0.58 KV) | 6.88 GB (+1.17 KV) | 8.05 GB (+2.34 KV) | 10.39 GB (+4.68 KV) | 15.06 GB (+9.35 KV) | 24.41 GB (+18.7 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 5.13 GB | 6.24 GB (+0.53 KV) | 6.77 GB (+1.06 KV) | 7.84 GB (+2.12 KV) | 9.96 GB (+4.25 KV) | 14.21 GB (+8.5 KV) | 22.71 GB (+17.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 5.13 GB | 6.03 GB (+0.32 KV) | 6.35 GB (+0.64 KV) | 6.99 GB (+1.27 KV) | 8.26 GB (+2.55 KV) | 10.81 GB (+5.1 KV) | 15.91 GB (+10.2 KV) |
| Q4_K_S 4.58 bpw | FP32 | 5.05 GB | 7.76 GB (+2.12 KV) | 9.88 GB (+4.25 KV) | 14.13 GB (+8.5 KV) | 22.63 GB (+17.0 KV) | 39.63 GB (+34.0 KV) | 73.63 GB (+68.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 5.05 GB | 6.7 GB (+1.06 KV) | 7.76 GB (+2.12 KV) | 9.88 GB (+4.25 KV) | 14.13 GB (+8.5 KV) | 22.63 GB (+17.0 KV) | 39.63 GB (+34.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 5.05 GB | 6.22 GB (+0.58 KV) | 6.8 GB (+1.17 KV) | 7.97 GB (+2.34 KV) | 10.31 GB (+4.68 KV) | 14.98 GB (+9.35 KV) | 24.33 GB (+18.7 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 5.05 GB | 6.16 GB (+0.53 KV) | 6.7 GB (+1.06 KV) | 7.76 GB (+2.12 KV) | 9.88 GB (+4.25 KV) | 14.13 GB (+8.5 KV) | 22.63 GB (+17.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 5.05 GB | 5.95 GB (+0.32 KV) | 6.27 GB (+0.64 KV) | 6.91 GB (+1.27 KV) | 8.18 GB (+2.55 KV) | 10.73 GB (+5.1 KV) | 15.83 GB (+10.2 KV) |
| Q3_K_M 3.91 bpw | FP32 | 4.31 GB | 7.02 GB (+2.12 KV) | 9.14 GB (+4.25 KV) | 13.39 GB (+8.5 KV) | 21.89 GB (+17.0 KV) | 38.89 GB (+34.0 KV) | 72.89 GB (+68.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 4.31 GB | 5.96 GB (+1.06 KV) | 7.02 GB (+2.12 KV) | 9.14 GB (+4.25 KV) | 13.39 GB (+8.5 KV) | 21.89 GB (+17.0 KV) | 38.89 GB (+34.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 4.31 GB | 5.48 GB (+0.58 KV) | 6.06 GB (+1.17 KV) | 7.23 GB (+2.34 KV) | 9.57 GB (+4.68 KV) | 14.24 GB (+9.35 KV) | 23.59 GB (+18.7 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 4.31 GB | 5.43 GB (+0.53 KV) | 5.96 GB (+1.06 KV) | 7.02 GB (+2.12 KV) | 9.14 GB (+4.25 KV) | 13.39 GB (+8.5 KV) | 21.89 GB (+17.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 4.31 GB | 5.21 GB (+0.32 KV) | 5.53 GB (+0.64 KV) | 6.17 GB (+1.27 KV) | 7.44 GB (+2.55 KV) | 9.99 GB (+5.1 KV) | 15.09 GB (+10.2 KV) |
| Q2_K 2.63 bpw | FP32 | 2.9 GB | 5.61 GB (+2.12 KV) | 7.73 GB (+4.25 KV) | 11.98 GB (+8.5 KV) | 20.48 GB (+17.0 KV) | 37.48 GB (+34.0 KV) | 71.48 GB (+68.0 KV) |
| Q2_K 2.63 bpw | FP16 | 2.9 GB | 4.55 GB (+1.06 KV) | 5.61 GB (+2.12 KV) | 7.73 GB (+4.25 KV) | 11.98 GB (+8.5 KV) | 20.48 GB (+17.0 KV) | 37.48 GB (+34.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 2.9 GB | 4.07 GB (+0.58 KV) | 4.65 GB (+1.17 KV) | 5.82 GB (+2.34 KV) | 8.16 GB (+4.68 KV) | 12.83 GB (+9.35 KV) | 22.18 GB (+18.7 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 2.9 GB | 4.01 GB (+0.53 KV) | 4.55 GB (+1.06 KV) | 5.61 GB (+2.12 KV) | 7.73 GB (+4.25 KV) | 11.98 GB (+8.5 KV) | 20.48 GB (+17.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 2.9 GB | 3.8 GB (+0.32 KV) | 4.12 GB (+0.64 KV) | 4.76 GB (+1.27 KV) | 6.03 GB (+2.55 KV) | 8.58 GB (+5.1 KV) | 13.68 GB (+10.2 KV) |
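Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Summing the displayed values can differ from the listed total by 0.01 GB, apparently because of rounding (for example, 17.64 + 2.12 + 0.58 = 20.34 against a listed 20.35 GB). Actual usage may vary by about ±5% depending on the inference engine and its optimizations.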
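As a rough illustration, the formula above can be sketched in a few lines of Python. The function names, the `fits` helper, and the choice to pad the estimate by the full 5% are assumptions made for this example; they are not taken from the calculator itself.

```python
OVERHEAD_GB = 0.58  # base overhead from the table (CUDA context + activations)


def total_vram_gb(weights_gb: float, kv_cache_gb: float,
                  overhead_gb: float = OVERHEAD_GB) -> float:
    """Total VRAM = Model Weights + KV Cache + overhead, all in GB."""
    return weights_gb + kv_cache_gb + overhead_gb


def fits(total_gb: float, gpu_vram_gb: float, margin: float = 0.05) -> bool:
    """Hypothetical helper: check the estimate against a GPU's VRAM,
    padding it by the quoted 5% variance to stay on the safe side."""
    return total_gb * (1.0 + margin) <= gpu_vram_gb


# Q4_K_M weights (5.13 GB) with an FP16 cache at 32K context (+4.25 GB KV)
estimate = total_vram_gb(5.13, 4.25)
print(f"{estimate:.2f} GB")    # 9.96 GB, matching the table row
print(fits(estimate, 12.0))    # True: about 10.46 GB with margin
print(fits(estimate, 8.0))     # False
```

Padding by the upper end of the ±5% range is a conservative choice; an estimate that only fits without the margin is likely to be tight in practice.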
Use our calculator to see if this model fits your specific hardware configuration.