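Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column lists the total VRAM, with the KV cache share in GB in parentheses. Base overhead: 0.58 GB (CUDA context + activations).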
| Weight Quantization | KV Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 17.22 GB | 20.05 GB (+2.25 KV) | 22.3 GB (+4.5 KV) | 26.8 GB (+9.0 KV) | 35.8 GB (+18.0 KV) | 53.8 GB (+36.0 KV) |
| FP16 16.0 bpw | FP16 | 17.22 GB | 18.93 GB (+1.12 KV) | 20.05 GB (+2.25 KV) | 22.3 GB (+4.5 KV) | 26.8 GB (+9.0 KV) | 35.8 GB (+18.0 KV) |
| FP16 16.0 bpw | Q8_0 | 17.22 GB | 18.42 GB (+0.62 KV) | 19.04 GB (+1.24 KV) | 20.28 GB (+2.48 KV) | 22.75 GB (+4.95 KV) | 27.7 GB (+9.9 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 17.22 GB | 18.36 GB (+0.56 KV) | 18.93 GB (+1.12 KV) | 20.05 GB (+2.25 KV) | 22.3 GB (+4.5 KV) | 26.8 GB (+9.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 17.22 GB | 18.14 GB (+0.34 KV) | 18.48 GB (+0.67 KV) | 19.15 GB (+1.35 KV) | 20.5 GB (+2.7 KV) | 23.2 GB (+5.4 KV) |
| Q8_0 8.0 bpw | FP32 | 8.61 GB | 11.44 GB (+2.25 KV) | 13.69 GB (+4.5 KV) | 18.19 GB (+9.0 KV) | 27.19 GB (+18.0 KV) | 45.19 GB (+36.0 KV) |
| Q8_0 8.0 bpw | FP16 | 8.61 GB | 10.32 GB (+1.12 KV) | 11.44 GB (+2.25 KV) | 13.69 GB (+4.5 KV) | 18.19 GB (+9.0 KV) | 27.19 GB (+18.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 8.61 GB | 9.81 GB (+0.62 KV) | 10.43 GB (+1.24 KV) | 11.67 GB (+2.48 KV) | 14.14 GB (+4.95 KV) | 19.09 GB (+9.9 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 8.61 GB | 9.75 GB (+0.56 KV) | 10.32 GB (+1.12 KV) | 11.44 GB (+2.25 KV) | 13.69 GB (+4.5 KV) | 18.19 GB (+9.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 8.61 GB | 9.53 GB (+0.34 KV) | 9.87 GB (+0.67 KV) | 10.54 GB (+1.35 KV) | 11.89 GB (+2.7 KV) | 14.59 GB (+5.4 KV) |
| Q4_K_M 4.65 bpw | FP32 | 5.0 GB | 7.84 GB (+2.25 KV) | 10.09 GB (+4.5 KV) | 14.59 GB (+9.0 KV) | 23.59 GB (+18.0 KV) | 41.59 GB (+36.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 5.0 GB | 6.71 GB (+1.12 KV) | 7.84 GB (+2.25 KV) | 10.09 GB (+4.5 KV) | 14.59 GB (+9.0 KV) | 23.59 GB (+18.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 5.0 GB | 6.21 GB (+0.62 KV) | 6.82 GB (+1.24 KV) | 8.06 GB (+2.48 KV) | 10.54 GB (+4.95 KV) | 15.49 GB (+9.9 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 5.0 GB | 6.15 GB (+0.56 KV) | 6.71 GB (+1.12 KV) | 7.84 GB (+2.25 KV) | 10.09 GB (+4.5 KV) | 14.59 GB (+9.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 5.0 GB | 5.92 GB (+0.34 KV) | 6.26 GB (+0.67 KV) | 6.94 GB (+1.35 KV) | 8.29 GB (+2.7 KV) | 10.99 GB (+5.4 KV) |
| Q4_K_S 4.58 bpw | FP32 | 4.93 GB | 7.76 GB (+2.25 KV) | 10.01 GB (+4.5 KV) | 14.51 GB (+9.0 KV) | 23.51 GB (+18.0 KV) | 41.51 GB (+36.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 4.93 GB | 6.64 GB (+1.12 KV) | 7.76 GB (+2.25 KV) | 10.01 GB (+4.5 KV) | 14.51 GB (+9.0 KV) | 23.51 GB (+18.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 4.93 GB | 6.13 GB (+0.62 KV) | 6.75 GB (+1.24 KV) | 7.99 GB (+2.48 KV) | 10.46 GB (+4.95 KV) | 15.41 GB (+9.9 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 4.93 GB | 6.07 GB (+0.56 KV) | 6.64 GB (+1.12 KV) | 7.76 GB (+2.25 KV) | 10.01 GB (+4.5 KV) | 14.51 GB (+9.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 4.93 GB | 5.85 GB (+0.34 KV) | 6.19 GB (+0.67 KV) | 6.86 GB (+1.35 KV) | 8.21 GB (+2.7 KV) | 10.91 GB (+5.4 KV) |
| Q3_K_M 3.91 bpw | FP32 | 4.21 GB | 7.04 GB (+2.25 KV) | 9.29 GB (+4.5 KV) | 13.79 GB (+9.0 KV) | 22.79 GB (+18.0 KV) | 40.79 GB (+36.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 4.21 GB | 5.92 GB (+1.12 KV) | 7.04 GB (+2.25 KV) | 9.29 GB (+4.5 KV) | 13.79 GB (+9.0 KV) | 22.79 GB (+18.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 4.21 GB | 5.41 GB (+0.62 KV) | 6.03 GB (+1.24 KV) | 7.27 GB (+2.48 KV) | 9.74 GB (+4.95 KV) | 14.69 GB (+9.9 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 4.21 GB | 5.35 GB (+0.56 KV) | 5.92 GB (+1.12 KV) | 7.04 GB (+2.25 KV) | 9.29 GB (+4.5 KV) | 13.79 GB (+9.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 4.21 GB | 5.13 GB (+0.34 KV) | 5.47 GB (+0.67 KV) | 6.14 GB (+1.35 KV) | 7.49 GB (+2.7 KV) | 10.19 GB (+5.4 KV) |
| Q2_K 2.63 bpw | FP32 | 2.83 GB | 5.66 GB (+2.25 KV) | 7.91 GB (+4.5 KV) | 12.41 GB (+9.0 KV) | 21.41 GB (+18.0 KV) | 39.41 GB (+36.0 KV) |
| Q2_K 2.63 bpw | FP16 | 2.83 GB | 4.54 GB (+1.12 KV) | 5.66 GB (+2.25 KV) | 7.91 GB (+4.5 KV) | 12.41 GB (+9.0 KV) | 21.41 GB (+18.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 2.83 GB | 4.03 GB (+0.62 KV) | 4.65 GB (+1.24 KV) | 5.89 GB (+2.48 KV) | 8.36 GB (+4.95 KV) | 13.31 GB (+9.9 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 2.83 GB | 3.98 GB (+0.56 KV) | 4.54 GB (+1.12 KV) | 5.66 GB (+2.25 KV) | 7.91 GB (+4.5 KV) | 12.41 GB (+9.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 2.83 GB | 3.75 GB (+0.34 KV) | 4.09 GB (+0.67 KV) | 4.76 GB (+1.35 KV) | 6.11 GB (+2.7 KV) | 8.81 GB (+5.4 KV) |
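Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Listed totals can differ from the sum of the listed parts by 0.01 GB because of rounding. Actual usage may vary by ±5% depending on the inference engine and its optimizations.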
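The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to generate the table: the KV figures are copied from the 131K column, the linear scaling with context length is inferred from the way each KV value doubles from one column to the next, and reading "131K" as 131,072 tokens is an assumption. The function names are hypothetical.

```python
OVERHEAD_GB = 0.58  # base overhead stated in the text (CUDA context + activations)

# KV cache size in GB at the 131K column, copied from the table.
KV_GB_AT_131K = {"FP32": 36.0, "FP16": 18.0, "Q8_0": 9.9, "FP8": 9.0, "Q4_0": 5.4}

# Assumption: the "131K" column corresponds to 131,072 tokens.
TOKENS_AT_131K = 131_072


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """Scale the 131K KV figure linearly with context length (inferred from the table)."""
    return KV_GB_AT_131K[cache_format] * context_tokens / TOKENS_AT_131K


def total_vram_gb(weights_gb: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


def vram_range_gb(total_gb: float, tolerance: float = 0.05) -> tuple[float, float]:
    """Lower and upper bounds for the stated +/-5% variation."""
    return total_gb * (1 - tolerance), total_gb * (1 + tolerance)


# Q8_0 weights (8.61 GB) with an FP16 cache at 32K context.
total = total_vram_gb(8.61, "FP16", 32_768)
print(f"{total:.2f} GB")  # 13.69 GB, matching the table row
```

Passing the result to `vram_range_gb` gives the band implied by the ±5% note, which is the more useful number when a configuration sits close to a card's capacity.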
Use our calculator to see if this model fits your specific hardware configuration.
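For a rough offline check, the table's numbers are enough to sketch a fit test. The snippet below is a hypothetical stand-in, not the calculator itself: the weight and KV figures are copied from the table, the per-column divisors assume the KV cache halves with each step down in context (as the table suggests), and budgeting against the upper end of the ±5% range is a design choice made here, not something the source prescribes.

```python
OVERHEAD_GB = 0.58  # base overhead stated in the text

# Model weights in GB per quantization, copied from the table.
WEIGHTS_GB = {
    "FP16": 17.22, "Q8_0": 8.61, "Q4_K_M": 5.0,
    "Q4_K_S": 4.93, "Q3_K_M": 4.21, "Q2_K": 2.83,
}

# KV cache in GB at the 131K column, copied from the table.
KV_GB_AT_131K = {"FP32": 36.0, "FP16": 18.0, "Q8_0": 9.9, "FP8": 9.0, "Q4_0": 5.4}

# Context columns, smallest first, with the divisor applied to the 131K KV figure.
CONTEXT_DIVISORS = [("8K", 16), ("16K", 8), ("32K", 4), ("65K", 2), ("131K", 1)]


def fits(quant: str, cache_format: str, divisor: int, vram_gb: float,
         margin: float = 0.05) -> bool:
    """True if the estimated total, padded by the margin, stays within vram_gb."""
    total = WEIGHTS_GB[quant] + KV_GB_AT_131K[cache_format] / divisor + OVERHEAD_GB
    return total * (1 + margin) <= vram_gb


def largest_context_that_fits(quant: str, cache_format: str, vram_gb: float):
    """Return the largest context column that fits, or None if none does."""
    best = None
    for label, divisor in CONTEXT_DIVISORS:
        if fits(quant, cache_format, divisor, vram_gb):
            best = label
    return best


print(largest_context_that_fits("Q4_K_M", "FP16", 12.0))  # 32K
print(largest_context_that_fits("FP16", "FP32", 16.0))    # None
```

Setting `margin=0.0` reproduces a strict comparison against the table's totals; the default leaves headroom for engine-specific variation.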