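VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM, with the KV cache size in parentheses (both in GB). Base overhead: 1.23 GB (CUDA context + activations).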
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 152.67 GB | 158.9 GB (+5.0 KV) | 163.9 GB (+10.0 KV) | 173.9 GB (+20.0 KV) | 193.9 GB (+40.0 KV) | 233.9 GB (+80.0 KV) |
| FP16 16.0 bpw | FP16 | 152.67 GB | 156.4 GB (+2.5 KV) | 158.9 GB (+5.0 KV) | 163.9 GB (+10.0 KV) | 173.9 GB (+20.0 KV) | 193.9 GB (+40.0 KV) |
| FP16 16.0 bpw | Q8_0 | 152.67 GB | 155.27 GB (+1.38 KV) | 156.65 GB (+2.75 KV) | 159.4 GB (+5.5 KV) | 164.9 GB (+11.0 KV) | 175.9 GB (+22.0 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 152.67 GB | 155.15 GB (+1.25 KV) | 156.4 GB (+2.5 KV) | 158.9 GB (+5.0 KV) | 163.9 GB (+10.0 KV) | 173.9 GB (+20.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 152.67 GB | 154.65 GB (+0.75 KV) | 155.4 GB (+1.5 KV) | 156.9 GB (+3.0 KV) | 159.9 GB (+6.0 KV) | 165.9 GB (+12.0 KV) |
| Q8_0 8.0 bpw | FP32 | 76.34 GB | 82.56 GB (+5.0 KV) | 87.56 GB (+10.0 KV) | 97.56 GB (+20.0 KV) | 117.56 GB (+40.0 KV) | 157.56 GB (+80.0 KV) |
| Q8_0 8.0 bpw | FP16 | 76.34 GB | 80.06 GB (+2.5 KV) | 82.56 GB (+5.0 KV) | 87.56 GB (+10.0 KV) | 97.56 GB (+20.0 KV) | 117.56 GB (+40.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 76.34 GB | 78.94 GB (+1.38 KV) | 80.31 GB (+2.75 KV) | 83.06 GB (+5.5 KV) | 88.56 GB (+11.0 KV) | 99.56 GB (+22.0 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 76.34 GB | 78.81 GB (+1.25 KV) | 80.06 GB (+2.5 KV) | 82.56 GB (+5.0 KV) | 87.56 GB (+10.0 KV) | 97.56 GB (+20.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 76.34 GB | 78.31 GB (+0.75 KV) | 79.06 GB (+1.5 KV) | 80.56 GB (+3.0 KV) | 83.56 GB (+6.0 KV) | 89.56 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | FP32 | 44.37 GB | 50.6 GB (+5.0 KV) | 55.6 GB (+10.0 KV) | 65.6 GB (+20.0 KV) | 85.6 GB (+40.0 KV) | 125.6 GB (+80.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 44.37 GB | 48.1 GB (+2.5 KV) | 50.6 GB (+5.0 KV) | 55.6 GB (+10.0 KV) | 65.6 GB (+20.0 KV) | 85.6 GB (+40.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 44.37 GB | 46.97 GB (+1.38 KV) | 48.35 GB (+2.75 KV) | 51.1 GB (+5.5 KV) | 56.6 GB (+11.0 KV) | 67.6 GB (+22.0 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 44.37 GB | 46.85 GB (+1.25 KV) | 48.1 GB (+2.5 KV) | 50.6 GB (+5.0 KV) | 55.6 GB (+10.0 KV) | 65.6 GB (+20.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 44.37 GB | 46.35 GB (+0.75 KV) | 47.1 GB (+1.5 KV) | 48.6 GB (+3.0 KV) | 51.6 GB (+6.0 KV) | 57.6 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | FP32 | 43.7 GB | 49.93 GB (+5.0 KV) | 54.93 GB (+10.0 KV) | 64.93 GB (+20.0 KV) | 84.93 GB (+40.0 KV) | 124.93 GB (+80.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 43.7 GB | 47.43 GB (+2.5 KV) | 49.93 GB (+5.0 KV) | 54.93 GB (+10.0 KV) | 64.93 GB (+20.0 KV) | 84.93 GB (+40.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 43.7 GB | 46.3 GB (+1.38 KV) | 47.68 GB (+2.75 KV) | 50.43 GB (+5.5 KV) | 55.93 GB (+11.0 KV) | 66.93 GB (+22.0 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 43.7 GB | 46.18 GB (+1.25 KV) | 47.43 GB (+2.5 KV) | 49.93 GB (+5.0 KV) | 54.93 GB (+10.0 KV) | 64.93 GB (+20.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 43.7 GB | 45.68 GB (+0.75 KV) | 46.43 GB (+1.5 KV) | 47.93 GB (+3.0 KV) | 50.93 GB (+6.0 KV) | 56.93 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | FP32 | 37.31 GB | 43.54 GB (+5.0 KV) | 48.54 GB (+10.0 KV) | 58.54 GB (+20.0 KV) | 78.54 GB (+40.0 KV) | 118.54 GB (+80.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 37.31 GB | 41.04 GB (+2.5 KV) | 43.54 GB (+5.0 KV) | 48.54 GB (+10.0 KV) | 58.54 GB (+20.0 KV) | 78.54 GB (+40.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 37.31 GB | 39.91 GB (+1.38 KV) | 41.29 GB (+2.75 KV) | 44.04 GB (+5.5 KV) | 49.54 GB (+11.0 KV) | 60.54 GB (+22.0 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 37.31 GB | 39.79 GB (+1.25 KV) | 41.04 GB (+2.5 KV) | 43.54 GB (+5.0 KV) | 48.54 GB (+10.0 KV) | 58.54 GB (+20.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 37.31 GB | 39.29 GB (+0.75 KV) | 40.04 GB (+1.5 KV) | 41.54 GB (+3.0 KV) | 44.54 GB (+6.0 KV) | 50.54 GB (+12.0 KV) |
| Q2_K 2.63 bpw | FP32 | 25.1 GB | 31.32 GB (+5.0 KV) | 36.32 GB (+10.0 KV) | 46.32 GB (+20.0 KV) | 66.32 GB (+40.0 KV) | 106.32 GB (+80.0 KV) |
| Q2_K 2.63 bpw | FP16 | 25.1 GB | 28.82 GB (+2.5 KV) | 31.32 GB (+5.0 KV) | 36.32 GB (+10.0 KV) | 46.32 GB (+20.0 KV) | 66.32 GB (+40.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 25.1 GB | 27.7 GB (+1.38 KV) | 29.07 GB (+2.75 KV) | 31.82 GB (+5.5 KV) | 37.32 GB (+11.0 KV) | 48.32 GB (+22.0 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 25.1 GB | 27.57 GB (+1.25 KV) | 28.82 GB (+2.5 KV) | 31.32 GB (+5.0 KV) | 36.32 GB (+10.0 KV) | 46.32 GB (+20.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 25.1 GB | 27.07 GB (+0.75 KV) | 27.82 GB (+1.5 KV) | 29.32 GB (+3.0 KV) | 32.32 GB (+6.0 KV) | 38.32 GB (+12.0 KV) |
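Total VRAM = Model Weights + KV Cache + 1.23 GB overhead. Some totals differ by 0.01 GB from the sum of the displayed figures, presumably because of rounding. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.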
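The formula above can be sketched in a few lines of Python. This is only an illustration: the function and dictionary names are made up here, the weights and 8K KV cache sizes are copied from the table, and two things are assumed rather than stated in the source: that the KV cache grows linearly with context (the table doubles it per column), and that the 65K and 131K columns mean 64 and 128 units of 1,024 tokens. The Q8_0 cache value of 1.375 GB is inferred as half of the 16K figure (2.75 GB).

```python
# Values copied from the table above, in GB. All names are illustrative.
OVERHEAD_GB = 1.23  # CUDA context + activations

WEIGHTS_GB = {
    "FP16": 152.67, "Q8_0": 76.34, "Q4_K_M": 44.37,
    "Q4_K_S": 43.7, "Q3_K_M": 37.31, "Q2_K": 25.1,
}

# KV cache size at 8K context. Q8_0 is assumed to be 1.375
# (half of the 16K value 2.75), which the table shows rounded as 1.38.
KV_GB_AT_8K = {"FP32": 5.0, "FP16": 2.5, "Q8_0": 1.375, "FP8": 1.25, "Q4_0": 0.75}


def estimate_vram_gb(quant: str, cache: str, context_k: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    # Assumption: KV cache scales linearly with context length,
    # as the doubling pattern across the table columns suggests.
    kv_gb = KV_GB_AT_8K[cache] * context_k / 8
    return WEIGHTS_GB[quant] + kv_gb + OVERHEAD_GB


def fits(quant: str, cache: str, context_k: int, vram_gb: float,
         margin: float = 0.05) -> bool:
    """True if the estimate, padded by the ±5% variance, fits in vram_gb."""
    return estimate_vram_gb(quant, cache, context_k) * (1 + margin) <= vram_gb
```

For example, `estimate_vram_gb("Q4_K_M", "FP16", 32)` comes out at about 55.6 GB, in line with the corresponding table cell, and `fits("Q4_K_M", "FP16", 32, 80)` checks that estimate against an 80 GB budget with the 5% margin applied.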