Estimated VRAM usage for every combination of weight quantization and KV cache format. Each total includes a fixed base overhead of 1.5 GB (CUDA context + activations).
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 1M Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 957.6 GB | 959.73 GB (+0.62 KV) | 960.35 GB (+1.25 KV) | 961.6 GB (+2.5 KV) | 964.1 GB (+5.0 KV) | 969.1 GB (+10.0 KV) | 1039.1 GB (+80.0 KV) |
| FP16 16.0 bpw | FP16 | 957.6 GB | 959.41 GB (+0.31 KV) | 959.73 GB (+0.62 KV) | 960.35 GB (+1.25 KV) | 961.6 GB (+2.5 KV) | 964.1 GB (+5.0 KV) | 999.1 GB (+40.0 KV) |
| FP16 16.0 bpw | Q8_0 | 957.6 GB | 959.27 GB (+0.17 KV) | 959.44 GB (+0.34 KV) | 959.79 GB (+0.69 KV) | 960.48 GB (+1.38 KV) | 961.85 GB (+2.75 KV) | 981.1 GB (+22.0 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 957.6 GB | 959.26 GB (+0.16 KV) | 959.41 GB (+0.31 KV) | 959.73 GB (+0.62 KV) | 960.35 GB (+1.25 KV) | 961.6 GB (+2.5 KV) | 979.1 GB (+20.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 957.6 GB | 959.19 GB (+0.09 KV) | 959.29 GB (+0.19 KV) | 959.48 GB (+0.38 KV) | 959.85 GB (+0.75 KV) | 960.6 GB (+1.5 KV) | 971.1 GB (+12.0 KV) |
| Q8_0 8.0 bpw | FP32 | 478.8 GB | 480.93 GB (+0.62 KV) | 481.55 GB (+1.25 KV) | 482.8 GB (+2.5 KV) | 485.3 GB (+5.0 KV) | 490.3 GB (+10.0 KV) | 560.3 GB (+80.0 KV) |
| Q8_0 8.0 bpw | FP16 | 478.8 GB | 480.61 GB (+0.31 KV) | 480.93 GB (+0.62 KV) | 481.55 GB (+1.25 KV) | 482.8 GB (+2.5 KV) | 485.3 GB (+5.0 KV) | 520.3 GB (+40.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 478.8 GB | 480.47 GB (+0.17 KV) | 480.64 GB (+0.34 KV) | 480.99 GB (+0.69 KV) | 481.68 GB (+1.38 KV) | 483.05 GB (+2.75 KV) | 502.3 GB (+22.0 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 478.8 GB | 480.46 GB (+0.16 KV) | 480.61 GB (+0.31 KV) | 480.93 GB (+0.62 KV) | 481.55 GB (+1.25 KV) | 482.8 GB (+2.5 KV) | 500.3 GB (+20.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 478.8 GB | 480.39 GB (+0.09 KV) | 480.49 GB (+0.19 KV) | 480.68 GB (+0.38 KV) | 481.05 GB (+0.75 KV) | 481.8 GB (+1.5 KV) | 492.3 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | FP32 | 278.3 GB | 280.43 GB (+0.62 KV) | 281.05 GB (+1.25 KV) | 282.3 GB (+2.5 KV) | 284.8 GB (+5.0 KV) | 289.8 GB (+10.0 KV) | 359.8 GB (+80.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 278.3 GB | 280.12 GB (+0.31 KV) | 280.43 GB (+0.62 KV) | 281.05 GB (+1.25 KV) | 282.3 GB (+2.5 KV) | 284.8 GB (+5.0 KV) | 319.8 GB (+40.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 278.3 GB | 279.97 GB (+0.17 KV) | 280.15 GB (+0.34 KV) | 280.49 GB (+0.69 KV) | 281.18 GB (+1.38 KV) | 282.55 GB (+2.75 KV) | 301.8 GB (+22.0 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 278.3 GB | 279.96 GB (+0.16 KV) | 280.12 GB (+0.31 KV) | 280.43 GB (+0.62 KV) | 281.05 GB (+1.25 KV) | 282.3 GB (+2.5 KV) | 299.8 GB (+20.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 278.3 GB | 279.9 GB (+0.09 KV) | 279.99 GB (+0.19 KV) | 280.18 GB (+0.38 KV) | 280.55 GB (+0.75 KV) | 281.3 GB (+1.5 KV) | 291.8 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | FP32 | 274.11 GB | 276.24 GB (+0.62 KV) | 276.86 GB (+1.25 KV) | 278.11 GB (+2.5 KV) | 280.61 GB (+5.0 KV) | 285.61 GB (+10.0 KV) | 355.61 GB (+80.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 274.11 GB | 275.93 GB (+0.31 KV) | 276.24 GB (+0.62 KV) | 276.86 GB (+1.25 KV) | 278.11 GB (+2.5 KV) | 280.61 GB (+5.0 KV) | 315.61 GB (+40.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 274.11 GB | 275.78 GB (+0.17 KV) | 275.96 GB (+0.34 KV) | 276.3 GB (+0.69 KV) | 276.99 GB (+1.38 KV) | 278.36 GB (+2.75 KV) | 297.61 GB (+22.0 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 274.11 GB | 275.77 GB (+0.16 KV) | 275.93 GB (+0.31 KV) | 276.24 GB (+0.62 KV) | 276.86 GB (+1.25 KV) | 278.11 GB (+2.5 KV) | 295.61 GB (+20.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 274.11 GB | 275.71 GB (+0.09 KV) | 275.8 GB (+0.19 KV) | 275.99 GB (+0.38 KV) | 276.36 GB (+0.75 KV) | 277.11 GB (+1.5 KV) | 287.61 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | FP32 | 234.01 GB | 236.14 GB (+0.62 KV) | 236.76 GB (+1.25 KV) | 238.01 GB (+2.5 KV) | 240.51 GB (+5.0 KV) | 245.51 GB (+10.0 KV) | 315.51 GB (+80.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 234.01 GB | 235.83 GB (+0.31 KV) | 236.14 GB (+0.62 KV) | 236.76 GB (+1.25 KV) | 238.01 GB (+2.5 KV) | 240.51 GB (+5.0 KV) | 275.51 GB (+40.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 234.01 GB | 235.69 GB (+0.17 KV) | 235.86 GB (+0.34 KV) | 236.2 GB (+0.69 KV) | 236.89 GB (+1.38 KV) | 238.26 GB (+2.75 KV) | 257.51 GB (+22.0 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 234.01 GB | 235.67 GB (+0.16 KV) | 235.83 GB (+0.31 KV) | 236.14 GB (+0.62 KV) | 236.76 GB (+1.25 KV) | 238.01 GB (+2.5 KV) | 255.51 GB (+20.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 234.01 GB | 235.61 GB (+0.09 KV) | 235.7 GB (+0.19 KV) | 235.89 GB (+0.38 KV) | 236.26 GB (+0.75 KV) | 237.01 GB (+1.5 KV) | 247.51 GB (+12.0 KV) |
| Q2_K 2.63 bpw | FP32 | 157.41 GB | 159.53 GB (+0.62 KV) | 160.16 GB (+1.25 KV) | 161.41 GB (+2.5 KV) | 163.91 GB (+5.0 KV) | 168.91 GB (+10.0 KV) | 238.91 GB (+80.0 KV) |
| Q2_K 2.63 bpw | FP16 | 157.41 GB | 159.22 GB (+0.31 KV) | 159.53 GB (+0.62 KV) | 160.16 GB (+1.25 KV) | 161.41 GB (+2.5 KV) | 163.91 GB (+5.0 KV) | 198.91 GB (+40.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 157.41 GB | 159.08 GB (+0.17 KV) | 159.25 GB (+0.34 KV) | 159.59 GB (+0.69 KV) | 160.28 GB (+1.38 KV) | 161.66 GB (+2.75 KV) | 180.91 GB (+22.0 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 157.41 GB | 159.06 GB (+0.16 KV) | 159.22 GB (+0.31 KV) | 159.53 GB (+0.62 KV) | 160.16 GB (+1.25 KV) | 161.41 GB (+2.5 KV) | 178.91 GB (+20.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 157.41 GB | 159.0 GB (+0.09 KV) | 159.09 GB (+0.19 KV) | 159.28 GB (+0.38 KV) | 159.66 GB (+0.75 KV) | 160.41 GB (+1.5 KV) | 170.91 GB (+12.0 KV) |
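Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. Displayed values are rounded, so a row's total can differ from the sum of its parts by 0.01 GB. Actual usage may vary by ±5% depending on the inference engine and its optimizations.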
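
The formula above can be reproduced with a short Python sketch. Several things here are assumptions rather than values stated on this page: the 478.8 billion parameter count is inferred from the FP16 row (957.6 GB at 16 bpw, taking 1 GB as 10^9 bytes), the KV cache ratios are read off the 1M Context column rather than derived from the model architecture, and the `fits` helper with its 5% headroom is a hypothetical illustration of how the ±5% note might be applied.

```python
BASE_OVERHEAD_GB = 1.5  # stated above: CUDA context + activations

# Assumption: inferred from the FP16 row (957.6 GB * 8 bits / 16 bpw).
PARAMS_BILLIONS = 478.8

# Assumption: KV cache size relative to FP32, read off the 1M Context column
# (80.0, 40.0, 22.0, 20.0, 12.0 GB).
KV_RATIO = {"FP32": 1.0, "FP16": 0.5, "Q8_0": 0.275, "FP8": 0.25, "Q4_0": 0.15}

# From the table: an FP32 cache takes 10.0 GB at 131,072 tokens and scales linearly.
FP32_KV_GB_AT_131K = 10.0


def weights_gb(bpw: float, params_billions: float = PARAMS_BILLIONS) -> float:
    """Weight size in GB from bits per weight (bpw)."""
    return params_billions * bpw / 8


def kv_cache_gb(context_tokens: int, cache_format: str) -> float:
    """KV cache size in GB, scaled linearly from the 131K FP32 entry."""
    return context_tokens / 131072 * FP32_KV_GB_AT_131K * KV_RATIO[cache_format]


def total_vram_gb(bpw: float, context_tokens: int, cache_format: str) -> float:
    """Total VRAM = model weights + KV cache + base overhead."""
    return weights_gb(bpw) + kv_cache_gb(context_tokens, cache_format) + BASE_OVERHEAD_GB


def fits(total_gb: float, available_gb: float, margin: float = 0.05) -> bool:
    """Hypothetical check: require headroom for the +5% end of the variance."""
    return total_gb * (1 + margin) <= available_gb


if __name__ == "__main__":
    # Q4_K_M (4.65 bpw) with an FP16 cache at 32K context.
    total = total_vram_gb(4.65, 32768, "FP16")
    print(f"{total:.2f} GB, fits in 320 GB: {fits(total, 320)}")
```

Because the sketch works from unrounded intermediate values, its results can differ from the table by about 0.01 GB.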