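Estimated VRAM usage for every combination of weight quantization (bpw = bits per weight) and KV cache format. Each context column shows the total VRAM, with the KV cache share in parentheses. Every total includes a fixed base overhead of 0.55 GB (CUDA context + activations).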
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 9.66 GB | 12.46 GB (+2.25 KV) | 14.71 GB (+4.5 KV) | 19.21 GB (+9.0 KV) | 28.21 GB (+18.0 KV) | 46.21 GB (+36.0 KV) | 82.21 GB (+72.0 KV) |
| FP16 16.0 bpw | FP16 | 9.66 GB | 11.33 GB (+1.12 KV) | 12.46 GB (+2.25 KV) | 14.71 GB (+4.5 KV) | 19.21 GB (+9.0 KV) | 28.21 GB (+18.0 KV) | 46.21 GB (+36.0 KV) |
| FP16 16.0 bpw | Q8_0 | 9.66 GB | 10.82 GB (+0.62 KV) | 11.44 GB (+1.24 KV) | 12.68 GB (+2.48 KV) | 15.16 GB (+4.95 KV) | 20.11 GB (+9.9 KV) | 30.01 GB (+19.8 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 9.66 GB | 10.77 GB (+0.56 KV) | 11.33 GB (+1.12 KV) | 12.46 GB (+2.25 KV) | 14.71 GB (+4.5 KV) | 19.21 GB (+9.0 KV) | 28.21 GB (+18.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 9.66 GB | 10.54 GB (+0.34 KV) | 10.88 GB (+0.67 KV) | 11.56 GB (+1.35 KV) | 12.91 GB (+2.7 KV) | 15.61 GB (+5.4 KV) | 21.01 GB (+10.8 KV) |
| Q8_0 8.0 bpw | FP32 | 4.83 GB | 7.63 GB (+2.25 KV) | 9.88 GB (+4.5 KV) | 14.38 GB (+9.0 KV) | 23.38 GB (+18.0 KV) | 41.38 GB (+36.0 KV) | 77.38 GB (+72.0 KV) |
| Q8_0 8.0 bpw | FP16 | 4.83 GB | 6.5 GB (+1.12 KV) | 7.63 GB (+2.25 KV) | 9.88 GB (+4.5 KV) | 14.38 GB (+9.0 KV) | 23.38 GB (+18.0 KV) | 41.38 GB (+36.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 4.83 GB | 5.99 GB (+0.62 KV) | 6.61 GB (+1.24 KV) | 7.85 GB (+2.48 KV) | 10.33 GB (+4.95 KV) | 15.28 GB (+9.9 KV) | 25.18 GB (+19.8 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 4.83 GB | 5.94 GB (+0.56 KV) | 6.5 GB (+1.12 KV) | 7.63 GB (+2.25 KV) | 9.88 GB (+4.5 KV) | 14.38 GB (+9.0 KV) | 23.38 GB (+18.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 4.83 GB | 5.71 GB (+0.34 KV) | 6.05 GB (+0.67 KV) | 6.73 GB (+1.35 KV) | 8.08 GB (+2.7 KV) | 10.78 GB (+5.4 KV) | 16.18 GB (+10.8 KV) |
| Q4_K_M 4.65 bpw | FP32 | 2.81 GB | 5.6 GB (+2.25 KV) | 7.85 GB (+4.5 KV) | 12.35 GB (+9.0 KV) | 21.35 GB (+18.0 KV) | 39.35 GB (+36.0 KV) | 75.35 GB (+72.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 2.81 GB | 4.48 GB (+1.12 KV) | 5.6 GB (+2.25 KV) | 7.85 GB (+4.5 KV) | 12.35 GB (+9.0 KV) | 21.35 GB (+18.0 KV) | 39.35 GB (+36.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 2.81 GB | 3.97 GB (+0.62 KV) | 4.59 GB (+1.24 KV) | 5.83 GB (+2.48 KV) | 8.3 GB (+4.95 KV) | 13.25 GB (+9.9 KV) | 23.15 GB (+19.8 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 2.81 GB | 3.92 GB (+0.56 KV) | 4.48 GB (+1.12 KV) | 5.6 GB (+2.25 KV) | 7.85 GB (+4.5 KV) | 12.35 GB (+9.0 KV) | 21.35 GB (+18.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 2.81 GB | 3.69 GB (+0.34 KV) | 4.03 GB (+0.67 KV) | 4.7 GB (+1.35 KV) | 6.05 GB (+2.7 KV) | 8.75 GB (+5.4 KV) | 14.15 GB (+10.8 KV) |
| Q4_K_S 4.58 bpw | FP32 | 2.77 GB | 5.56 GB (+2.25 KV) | 7.81 GB (+4.5 KV) | 12.31 GB (+9.0 KV) | 21.31 GB (+18.0 KV) | 39.31 GB (+36.0 KV) | 75.31 GB (+72.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 2.77 GB | 4.44 GB (+1.12 KV) | 5.56 GB (+2.25 KV) | 7.81 GB (+4.5 KV) | 12.31 GB (+9.0 KV) | 21.31 GB (+18.0 KV) | 39.31 GB (+36.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 2.77 GB | 3.93 GB (+0.62 KV) | 4.55 GB (+1.24 KV) | 5.79 GB (+2.48 KV) | 8.26 GB (+4.95 KV) | 13.21 GB (+9.9 KV) | 23.11 GB (+19.8 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 2.77 GB | 3.87 GB (+0.56 KV) | 4.44 GB (+1.12 KV) | 5.56 GB (+2.25 KV) | 7.81 GB (+4.5 KV) | 12.31 GB (+9.0 KV) | 21.31 GB (+18.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 2.77 GB | 3.65 GB (+0.34 KV) | 3.99 GB (+0.67 KV) | 4.66 GB (+1.35 KV) | 6.01 GB (+2.7 KV) | 8.71 GB (+5.4 KV) | 14.11 GB (+10.8 KV) |
| Q3_K_M 3.91 bpw | FP32 | 2.36 GB | 5.16 GB (+2.25 KV) | 7.41 GB (+4.5 KV) | 11.91 GB (+9.0 KV) | 20.91 GB (+18.0 KV) | 38.91 GB (+36.0 KV) | 74.91 GB (+72.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 2.36 GB | 4.03 GB (+1.12 KV) | 5.16 GB (+2.25 KV) | 7.41 GB (+4.5 KV) | 11.91 GB (+9.0 KV) | 20.91 GB (+18.0 KV) | 38.91 GB (+36.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 2.36 GB | 3.53 GB (+0.62 KV) | 4.14 GB (+1.24 KV) | 5.38 GB (+2.48 KV) | 7.86 GB (+4.95 KV) | 12.81 GB (+9.9 KV) | 22.71 GB (+19.8 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 2.36 GB | 3.47 GB (+0.56 KV) | 4.03 GB (+1.12 KV) | 5.16 GB (+2.25 KV) | 7.41 GB (+4.5 KV) | 11.91 GB (+9.0 KV) | 20.91 GB (+18.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 2.36 GB | 3.24 GB (+0.34 KV) | 3.58 GB (+0.67 KV) | 4.26 GB (+1.35 KV) | 5.61 GB (+2.7 KV) | 8.31 GB (+5.4 KV) | 13.71 GB (+10.8 KV) |
| Q2_K 2.63 bpw | FP32 | 1.59 GB | 4.38 GB (+2.25 KV) | 6.63 GB (+4.5 KV) | 11.13 GB (+9.0 KV) | 20.13 GB (+18.0 KV) | 38.13 GB (+36.0 KV) | 74.13 GB (+72.0 KV) |
| Q2_K 2.63 bpw | FP16 | 1.59 GB | 3.26 GB (+1.12 KV) | 4.38 GB (+2.25 KV) | 6.63 GB (+4.5 KV) | 11.13 GB (+9.0 KV) | 20.13 GB (+18.0 KV) | 38.13 GB (+36.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 1.59 GB | 2.75 GB (+0.62 KV) | 3.37 GB (+1.24 KV) | 4.61 GB (+2.48 KV) | 7.08 GB (+4.95 KV) | 12.03 GB (+9.9 KV) | 21.93 GB (+19.8 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 1.59 GB | 2.7 GB (+0.56 KV) | 3.26 GB (+1.12 KV) | 4.38 GB (+2.25 KV) | 6.63 GB (+4.5 KV) | 11.13 GB (+9.0 KV) | 20.13 GB (+18.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 1.59 GB | 2.47 GB (+0.34 KV) | 2.81 GB (+0.67 KV) | 3.48 GB (+1.35 KV) | 4.83 GB (+2.7 KV) | 7.53 GB (+5.4 KV) | 12.93 GB (+10.8 KV) |
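Total VRAM = Model Weights + KV Cache + 0.55 GB overhead. Table values are rounded, so a row's components may differ from its total by 0.01 GB. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.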
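As a quick check of the formula above, here is a minimal Python sketch that reproduces one table cell and tests it against a GPU size. The function names (`kv_cache_gb`, `total_vram_gb`, `fits`) are hypothetical helpers, not part of any tool. The sketch assumes that "8K" means 8192 tokens and that the KV cache grows linearly with context length, as the doubling pattern in the table suggests. The 5% safety margin mirrors the stated ±5% variance.

```python
BASE_OVERHEAD_GB = 0.55  # CUDA context + activations, as stated for this table


def kv_cache_gb(kv_gb_at_8k: float, context_tokens: int) -> float:
    """Scale a table KV figure from 8K context (assumed 8192 tokens) linearly."""
    return kv_gb_at_8k * context_tokens / 8192


def total_vram_gb(weights_gb: float, kv_gb: float,
                  overhead_gb: float = BASE_OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + base overhead."""
    return weights_gb + kv_gb + overhead_gb


def fits(total_gb: float, available_gb: float, margin: float = 0.05) -> bool:
    """True if the estimate, padded by the margin, fits in available VRAM."""
    return total_gb * (1 + margin) <= available_gb


# FP16 weights (9.66 GB) with an FP32 cache (2.25 GB at 8K), evaluated at 32K.
kv = kv_cache_gb(2.25, 32 * 1024)
total = total_vram_gb(9.66, kv)
print(f"KV cache: {kv:.2f} GB, total: {total:.2f} GB")
print("Fits in 24 GB:", fits(total, 24.0))
```

The result matches the 19.21 GB (+9.0 KV) cell in the FP16 / FP32 row. For other rows, substitute the weight and 8K KV figures from the table; small 0.01 GB differences can appear because the table values are rounded.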