Estimated VRAM usage for every combination of weight quantization and KV cache format. Each total includes a fixed base overhead of 1.5 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 258.3 GB | 265.3 GB (+5.5 KV) | 270.8 GB (+11.0 KV) | 281.8 GB (+22.0 KV) | 303.8 GB (+44.0 KV) | 347.8 GB (+88.0 KV) | 435.8 GB (+176.0 KV) |
| FP16 16.0 bpw | FP16 | 258.3 GB | 262.55 GB (+2.75 KV) | 265.3 GB (+5.5 KV) | 270.8 GB (+11.0 KV) | 281.8 GB (+22.0 KV) | 303.8 GB (+44.0 KV) | 347.8 GB (+88.0 KV) |
| FP16 16.0 bpw | Q8_0 | 258.3 GB | 261.31 GB (+1.51 KV) | 262.82 GB (+3.03 KV) | 265.85 GB (+6.05 KV) | 271.9 GB (+12.1 KV) | 284.0 GB (+24.2 KV) | 308.2 GB (+48.4 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 258.3 GB | 261.18 GB (+1.38 KV) | 262.55 GB (+2.75 KV) | 265.3 GB (+5.5 KV) | 270.8 GB (+11.0 KV) | 281.8 GB (+22.0 KV) | 303.8 GB (+44.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 258.3 GB | 260.62 GB (+0.82 KV) | 261.45 GB (+1.65 KV) | 263.1 GB (+3.3 KV) | 266.4 GB (+6.6 KV) | 273.0 GB (+13.2 KV) | 286.2 GB (+26.4 KV) |
| Q8_0 8.0 bpw | FP32 | 129.15 GB | 136.15 GB (+5.5 KV) | 141.65 GB (+11.0 KV) | 152.65 GB (+22.0 KV) | 174.65 GB (+44.0 KV) | 218.65 GB (+88.0 KV) | 306.65 GB (+176.0 KV) |
| Q8_0 8.0 bpw | FP16 | 129.15 GB | 133.4 GB (+2.75 KV) | 136.15 GB (+5.5 KV) | 141.65 GB (+11.0 KV) | 152.65 GB (+22.0 KV) | 174.65 GB (+44.0 KV) | 218.65 GB (+88.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 129.15 GB | 132.16 GB (+1.51 KV) | 133.68 GB (+3.03 KV) | 136.7 GB (+6.05 KV) | 142.75 GB (+12.1 KV) | 154.85 GB (+24.2 KV) | 179.05 GB (+48.4 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 129.15 GB | 132.03 GB (+1.38 KV) | 133.4 GB (+2.75 KV) | 136.15 GB (+5.5 KV) | 141.65 GB (+11.0 KV) | 152.65 GB (+22.0 KV) | 174.65 GB (+44.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 129.15 GB | 131.47 GB (+0.82 KV) | 132.3 GB (+1.65 KV) | 133.95 GB (+3.3 KV) | 137.25 GB (+6.6 KV) | 143.85 GB (+13.2 KV) | 157.05 GB (+26.4 KV) |
| Q4_K_M 4.65 bpw | FP32 | 75.07 GB | 82.07 GB (+5.5 KV) | 87.57 GB (+11.0 KV) | 98.57 GB (+22.0 KV) | 120.57 GB (+44.0 KV) | 164.57 GB (+88.0 KV) | 252.57 GB (+176.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 75.07 GB | 79.32 GB (+2.75 KV) | 82.07 GB (+5.5 KV) | 87.57 GB (+11.0 KV) | 98.57 GB (+22.0 KV) | 120.57 GB (+44.0 KV) | 164.57 GB (+88.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 75.07 GB | 78.08 GB (+1.51 KV) | 79.59 GB (+3.03 KV) | 82.62 GB (+6.05 KV) | 88.67 GB (+12.1 KV) | 100.77 GB (+24.2 KV) | 124.97 GB (+48.4 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 75.07 GB | 77.94 GB (+1.38 KV) | 79.32 GB (+2.75 KV) | 82.07 GB (+5.5 KV) | 87.57 GB (+11.0 KV) | 98.57 GB (+22.0 KV) | 120.57 GB (+44.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 75.07 GB | 77.39 GB (+0.82 KV) | 78.22 GB (+1.65 KV) | 79.87 GB (+3.3 KV) | 83.17 GB (+6.6 KV) | 89.77 GB (+13.2 KV) | 102.97 GB (+26.4 KV) |
| Q4_K_S 4.58 bpw | FP32 | 73.94 GB | 80.94 GB (+5.5 KV) | 86.44 GB (+11.0 KV) | 97.44 GB (+22.0 KV) | 119.44 GB (+44.0 KV) | 163.44 GB (+88.0 KV) | 251.44 GB (+176.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 73.94 GB | 78.19 GB (+2.75 KV) | 80.94 GB (+5.5 KV) | 86.44 GB (+11.0 KV) | 97.44 GB (+22.0 KV) | 119.44 GB (+44.0 KV) | 163.44 GB (+88.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 73.94 GB | 76.95 GB (+1.51 KV) | 78.46 GB (+3.03 KV) | 81.49 GB (+6.05 KV) | 87.54 GB (+12.1 KV) | 99.64 GB (+24.2 KV) | 123.84 GB (+48.4 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 73.94 GB | 76.81 GB (+1.38 KV) | 78.19 GB (+2.75 KV) | 80.94 GB (+5.5 KV) | 86.44 GB (+11.0 KV) | 97.44 GB (+22.0 KV) | 119.44 GB (+44.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 73.94 GB | 76.26 GB (+0.82 KV) | 77.09 GB (+1.65 KV) | 78.74 GB (+3.3 KV) | 82.04 GB (+6.6 KV) | 88.64 GB (+13.2 KV) | 101.84 GB (+26.4 KV) |
| Q3_K_M 3.91 bpw | FP32 | 63.12 GB | 70.12 GB (+5.5 KV) | 75.62 GB (+11.0 KV) | 86.62 GB (+22.0 KV) | 108.62 GB (+44.0 KV) | 152.62 GB (+88.0 KV) | 240.62 GB (+176.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 63.12 GB | 67.37 GB (+2.75 KV) | 70.12 GB (+5.5 KV) | 75.62 GB (+11.0 KV) | 86.62 GB (+22.0 KV) | 108.62 GB (+44.0 KV) | 152.62 GB (+88.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 63.12 GB | 66.13 GB (+1.51 KV) | 67.65 GB (+3.03 KV) | 70.67 GB (+6.05 KV) | 76.72 GB (+12.1 KV) | 88.82 GB (+24.2 KV) | 113.02 GB (+48.4 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 63.12 GB | 66.0 GB (+1.38 KV) | 67.37 GB (+2.75 KV) | 70.12 GB (+5.5 KV) | 75.62 GB (+11.0 KV) | 86.62 GB (+22.0 KV) | 108.62 GB (+44.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 63.12 GB | 65.45 GB (+0.82 KV) | 66.27 GB (+1.65 KV) | 67.92 GB (+3.3 KV) | 71.22 GB (+6.6 KV) | 77.82 GB (+13.2 KV) | 91.02 GB (+26.4 KV) |
| Q2_K 2.63 bpw | FP32 | 42.46 GB | 49.46 GB (+5.5 KV) | 54.96 GB (+11.0 KV) | 65.96 GB (+22.0 KV) | 87.96 GB (+44.0 KV) | 131.96 GB (+88.0 KV) | 219.96 GB (+176.0 KV) |
| Q2_K 2.63 bpw | FP16 | 42.46 GB | 46.71 GB (+2.75 KV) | 49.46 GB (+5.5 KV) | 54.96 GB (+11.0 KV) | 65.96 GB (+22.0 KV) | 87.96 GB (+44.0 KV) | 131.96 GB (+88.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 42.46 GB | 45.47 GB (+1.51 KV) | 46.98 GB (+3.03 KV) | 50.01 GB (+6.05 KV) | 56.06 GB (+12.1 KV) | 68.16 GB (+24.2 KV) | 92.36 GB (+48.4 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 42.46 GB | 45.33 GB (+1.38 KV) | 46.71 GB (+2.75 KV) | 49.46 GB (+5.5 KV) | 54.96 GB (+11.0 KV) | 65.96 GB (+22.0 KV) | 87.96 GB (+44.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 42.46 GB | 44.78 GB (+0.82 KV) | 45.61 GB (+1.65 KV) | 47.26 GB (+3.3 KV) | 50.56 GB (+6.6 KV) | 57.16 GB (+13.2 KV) | 70.36 GB (+26.4 KV) |

Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. The "+… KV" figure in each cell is the KV cache portion of that total, in GB. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
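
The formula above can be sketched in a few lines of Python. This is only an illustration: the weight and KV cache figures are copied from the table, while the names (`total_vram_gb`, `fits`, `context_multiple`) and the idea of padding the estimate by the 5% variance are assumptions made for this example, not part of any tool. Context length is expressed as a multiple of the 8K column (1, 2, 4, ... 32), since the KV cache doubles from one column to the next.

```python
BASE_OVERHEAD_GB = 1.5  # from the table notes: CUDA context + activations

# Model weights in GB, copied from the "Model Weights" column.
WEIGHTS_GB = {
    "FP16": 258.3,
    "Q8_0": 129.15,
    "Q4_K_M": 75.07,
    "Q4_K_S": 73.94,
    "Q3_K_M": 63.12,
    "Q2_K": 42.46,
}

# KV cache in GB at the largest context column (262K), copied from the table.
# The largest column is used because it carries the least rounding error.
KV_AT_MAX_GB = {
    "FP32": 176.0,
    "FP16": 88.0,
    "Q8_0": 48.4,
    "FP8": 44.0,
    "Q4_0": 26.4,
}
MAX_MULTIPLE = 32  # the 262K column is 32x the 8K column


def total_vram_gb(quant, cache, context_multiple):
    """Total VRAM = weights + KV cache + overhead.

    context_multiple is the context length in units of the 8K column.
    The KV cache is scaled linearly, matching the doubling seen in the table.
    """
    kv = KV_AT_MAX_GB[cache] * context_multiple / MAX_MULTIPLE
    return WEIGHTS_GB[quant] + kv + BASE_OVERHEAD_GB


def fits(quant, cache, context_multiple, vram_gb, margin=0.05):
    """Return True if the estimate, padded by `margin`, fits in vram_gb.

    The 5% default mirrors the stated variance; treating it as a safety
    margin is an assumption of this sketch.
    """
    padded = total_vram_gb(quant, cache, context_multiple) * (1 + margin)
    return padded <= vram_gb


# Q4_K_M weights with an FP16 cache at the 8K column.
print(round(total_vram_gb("Q4_K_M", "FP16", 1), 2))
# Q4_K_M weights with a Q8_0 cache at the 32K column on a 96 GB budget.
print(fits("Q4_K_M", "Q8_0", 4, 96.0))
```

Because the weights in the table are already rounded to two decimals, results can differ from a table cell by about 0.01 GB.
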