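Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows the total VRAM in GB, with the KV cache share of that total in parentheses. Base overhead: 0.54 GB (CUDA context + activations).
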
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 8.82 GB | 11.61 GB (+2.25 KV) | 13.86 GB (+4.5 KV) | 18.36 GB (+9.0 KV) | 27.36 GB (+18.0 KV) | 45.36 GB (+36.0 KV) | 81.36 GB (+72.0 KV) |
| FP16 16.0 bpw | FP16 | 8.82 GB | 10.49 GB (+1.12 KV) | 11.61 GB (+2.25 KV) | 13.86 GB (+4.5 KV) | 18.36 GB (+9.0 KV) | 27.36 GB (+18.0 KV) | 45.36 GB (+36.0 KV) |
| FP16 16.0 bpw | Q8_0 | 8.82 GB | 9.98 GB (+0.62 KV) | 10.6 GB (+1.24 KV) | 11.84 GB (+2.48 KV) | 14.31 GB (+4.95 KV) | 19.26 GB (+9.9 KV) | 29.16 GB (+19.8 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 8.82 GB | 9.92 GB (+0.56 KV) | 10.49 GB (+1.12 KV) | 11.61 GB (+2.25 KV) | 13.86 GB (+4.5 KV) | 18.36 GB (+9.0 KV) | 27.36 GB (+18.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 8.82 GB | 9.7 GB (+0.34 KV) | 10.04 GB (+0.67 KV) | 10.71 GB (+1.35 KV) | 12.06 GB (+2.7 KV) | 14.76 GB (+5.4 KV) | 20.16 GB (+10.8 KV) |
| Q8_0 8.0 bpw | FP32 | 4.41 GB | 7.2 GB (+2.25 KV) | 9.45 GB (+4.5 KV) | 13.95 GB (+9.0 KV) | 22.95 GB (+18.0 KV) | 40.95 GB (+36.0 KV) | 76.95 GB (+72.0 KV) |
| Q8_0 8.0 bpw | FP16 | 4.41 GB | 6.08 GB (+1.12 KV) | 7.2 GB (+2.25 KV) | 9.45 GB (+4.5 KV) | 13.95 GB (+9.0 KV) | 22.95 GB (+18.0 KV) | 40.95 GB (+36.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 4.41 GB | 5.57 GB (+0.62 KV) | 6.19 GB (+1.24 KV) | 7.43 GB (+2.48 KV) | 9.9 GB (+4.95 KV) | 14.85 GB (+9.9 KV) | 24.75 GB (+19.8 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 4.41 GB | 5.51 GB (+0.56 KV) | 6.08 GB (+1.12 KV) | 7.2 GB (+2.25 KV) | 9.45 GB (+4.5 KV) | 13.95 GB (+9.0 KV) | 22.95 GB (+18.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 4.41 GB | 5.29 GB (+0.34 KV) | 5.63 GB (+0.67 KV) | 6.3 GB (+1.35 KV) | 7.65 GB (+2.7 KV) | 10.35 GB (+5.4 KV) | 15.75 GB (+10.8 KV) |
| Q4_K_M 4.65 bpw | FP32 | 2.56 GB | 5.36 GB (+2.25 KV) | 7.61 GB (+4.5 KV) | 12.11 GB (+9.0 KV) | 21.11 GB (+18.0 KV) | 39.11 GB (+36.0 KV) | 75.11 GB (+72.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 2.56 GB | 4.23 GB (+1.12 KV) | 5.36 GB (+2.25 KV) | 7.61 GB (+4.5 KV) | 12.11 GB (+9.0 KV) | 21.11 GB (+18.0 KV) | 39.11 GB (+36.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 2.56 GB | 3.72 GB (+0.62 KV) | 4.34 GB (+1.24 KV) | 5.58 GB (+2.48 KV) | 8.06 GB (+4.95 KV) | 13.01 GB (+9.9 KV) | 22.91 GB (+19.8 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 2.56 GB | 3.67 GB (+0.56 KV) | 4.23 GB (+1.12 KV) | 5.36 GB (+2.25 KV) | 7.61 GB (+4.5 KV) | 12.11 GB (+9.0 KV) | 21.11 GB (+18.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 2.56 GB | 3.44 GB (+0.34 KV) | 3.78 GB (+0.67 KV) | 4.46 GB (+1.35 KV) | 5.81 GB (+2.7 KV) | 8.51 GB (+5.4 KV) | 13.91 GB (+10.8 KV) |
| Q4_K_S 4.58 bpw | FP32 | 2.52 GB | 5.32 GB (+2.25 KV) | 7.57 GB (+4.5 KV) | 12.07 GB (+9.0 KV) | 21.07 GB (+18.0 KV) | 39.07 GB (+36.0 KV) | 75.07 GB (+72.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 2.52 GB | 4.19 GB (+1.12 KV) | 5.32 GB (+2.25 KV) | 7.57 GB (+4.5 KV) | 12.07 GB (+9.0 KV) | 21.07 GB (+18.0 KV) | 39.07 GB (+36.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 2.52 GB | 3.69 GB (+0.62 KV) | 4.3 GB (+1.24 KV) | 5.54 GB (+2.48 KV) | 8.02 GB (+4.95 KV) | 12.97 GB (+9.9 KV) | 22.87 GB (+19.8 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 2.52 GB | 3.63 GB (+0.56 KV) | 4.19 GB (+1.12 KV) | 5.32 GB (+2.25 KV) | 7.57 GB (+4.5 KV) | 12.07 GB (+9.0 KV) | 21.07 GB (+18.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 2.52 GB | 3.4 GB (+0.34 KV) | 3.74 GB (+0.67 KV) | 4.42 GB (+1.35 KV) | 5.77 GB (+2.7 KV) | 8.47 GB (+5.4 KV) | 13.87 GB (+10.8 KV) |
| Q3_K_M 3.91 bpw | FP32 | 2.16 GB | 4.95 GB (+2.25 KV) | 7.2 GB (+4.5 KV) | 11.7 GB (+9.0 KV) | 20.7 GB (+18.0 KV) | 38.7 GB (+36.0 KV) | 74.7 GB (+72.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 2.16 GB | 3.82 GB (+1.12 KV) | 4.95 GB (+2.25 KV) | 7.2 GB (+4.5 KV) | 11.7 GB (+9.0 KV) | 20.7 GB (+18.0 KV) | 38.7 GB (+36.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 2.16 GB | 3.32 GB (+0.62 KV) | 3.93 GB (+1.24 KV) | 5.17 GB (+2.48 KV) | 7.65 GB (+4.95 KV) | 12.6 GB (+9.9 KV) | 22.5 GB (+19.8 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 2.16 GB | 3.26 GB (+0.56 KV) | 3.82 GB (+1.12 KV) | 4.95 GB (+2.25 KV) | 7.2 GB (+4.5 KV) | 11.7 GB (+9.0 KV) | 20.7 GB (+18.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 2.16 GB | 3.03 GB (+0.34 KV) | 3.37 GB (+0.67 KV) | 4.05 GB (+1.35 KV) | 5.4 GB (+2.7 KV) | 8.1 GB (+5.4 KV) | 13.5 GB (+10.8 KV) |
| Q2_K 2.63 bpw | FP32 | 1.45 GB | 4.24 GB (+2.25 KV) | 6.49 GB (+4.5 KV) | 10.99 GB (+9.0 KV) | 19.99 GB (+18.0 KV) | 37.99 GB (+36.0 KV) | 73.99 GB (+72.0 KV) |
| Q2_K 2.63 bpw | FP16 | 1.45 GB | 3.12 GB (+1.12 KV) | 4.24 GB (+2.25 KV) | 6.49 GB (+4.5 KV) | 10.99 GB (+9.0 KV) | 19.99 GB (+18.0 KV) | 37.99 GB (+36.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 1.45 GB | 2.61 GB (+0.62 KV) | 3.23 GB (+1.24 KV) | 4.47 GB (+2.48 KV) | 6.94 GB (+4.95 KV) | 11.89 GB (+9.9 KV) | 21.79 GB (+19.8 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 1.45 GB | 2.55 GB (+0.56 KV) | 3.12 GB (+1.12 KV) | 4.24 GB (+2.25 KV) | 6.49 GB (+4.5 KV) | 10.99 GB (+9.0 KV) | 19.99 GB (+18.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 1.45 GB | 2.33 GB (+0.34 KV) | 2.67 GB (+0.67 KV) | 3.34 GB (+1.35 KV) | 4.69 GB (+2.7 KV) | 7.39 GB (+5.4 KV) | 12.79 GB (+10.8 KV) |

Total VRAM = Model Weights + KV Cache + 0.54 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
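
The formula above can be sketched in a few lines of Python to reproduce any cell of the table or to estimate a context length that is not listed. This is a minimal sketch, not the calculator's actual implementation: the per-format KV multipliers, the 1.125 GB FP16 baseline per 8,192 tokens, and the roughly 4.41 billion parameter count are all inferred from the table's own numbers, and the function names are hypothetical.

```python
# All constants are read off or inferred from the table above (assumptions,
# not values taken from any inference engine's documentation).
OVERHEAD_GB = 0.54          # CUDA context + activations, as stated in the table
KV_FP16_GB_PER_8K = 1.125   # inferred: FP16 cache is 2.25 GB at 16K tokens
KV_SCALE = {                # cache size relative to FP16, inferred from the KV columns
    "FP32": 2.0,
    "FP16": 1.0,
    "Q8_0": 0.55,
    "FP8": 0.5,
    "Q4_0": 0.3,
}


def weights_gb(params_billion: float, bpw: float) -> float:
    """Weight size in GB from parameter count (billions) and bits per weight."""
    return params_billion * bpw / 8


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache size in GB; grows linearly with context length."""
    return KV_FP16_GB_PER_8K * KV_SCALE[cache_format] * context_tokens / 8192


def total_vram_gb(model_weights_gb: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return model_weights_gb + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


def fits(model_weights_gb: float, cache_format: str, context_tokens: int,
         vram_gb: float, margin: float = 0.05) -> bool:
    """True if the estimate, padded by the stated +5% variance, fits in vram_gb."""
    estimate = total_vram_gb(model_weights_gb, cache_format, context_tokens)
    return estimate * (1 + margin) <= vram_gb


if __name__ == "__main__":
    # Q8_0 weights (4.41 GB) with an FP16 cache at 32K tokens
    print(round(total_vram_gb(4.41, "FP16", 32768), 2))
    # Does Q4_K_M (2.56 GB) with a Q8_0 cache at 32K tokens fit on an 8 GB card?
    print(fits(2.56, "Q8_0", 32768, vram_gb=8.0))
```

Padding the estimate by the upper end of the ±5% range in `fits` is a conservative choice; results computed from the rounded weight sizes may differ from the table by 0.01 GB.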