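Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows the total VRAM, with the KV cache contribution (in GB) in parentheses. Base overhead: 1.21 GB (CUDA context + activations).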
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 148.26 GB | 154.47 GB (+5.0 KV) | 159.47 GB (+10.0 KV) | 169.47 GB (+20.0 KV) | 189.47 GB (+40.0 KV) | 229.47 GB (+80.0 KV) |
| FP16 16.0 bpw | FP16 | 148.26 GB | 151.97 GB (+2.5 KV) | 154.47 GB (+5.0 KV) | 159.47 GB (+10.0 KV) | 169.47 GB (+20.0 KV) | 189.47 GB (+40.0 KV) |
| FP16 16.0 bpw | Q8_0 | 148.26 GB | 150.84 GB (+1.38 KV) | 152.22 GB (+2.75 KV) | 154.97 GB (+5.5 KV) | 160.47 GB (+11.0 KV) | 171.47 GB (+22.0 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 148.26 GB | 150.72 GB (+1.25 KV) | 151.97 GB (+2.5 KV) | 154.47 GB (+5.0 KV) | 159.47 GB (+10.0 KV) | 169.47 GB (+20.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 148.26 GB | 150.22 GB (+0.75 KV) | 150.97 GB (+1.5 KV) | 152.47 GB (+3.0 KV) | 155.47 GB (+6.0 KV) | 161.47 GB (+12.0 KV) |
| Q8_0 8.0 bpw | FP32 | 74.13 GB | 80.34 GB (+5.0 KV) | 85.34 GB (+10.0 KV) | 95.34 GB (+20.0 KV) | 115.34 GB (+40.0 KV) | 155.34 GB (+80.0 KV) |
| Q8_0 8.0 bpw | FP16 | 74.13 GB | 77.84 GB (+2.5 KV) | 80.34 GB (+5.0 KV) | 85.34 GB (+10.0 KV) | 95.34 GB (+20.0 KV) | 115.34 GB (+40.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 74.13 GB | 76.71 GB (+1.38 KV) | 78.09 GB (+2.75 KV) | 80.84 GB (+5.5 KV) | 86.34 GB (+11.0 KV) | 97.34 GB (+22.0 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 74.13 GB | 76.59 GB (+1.25 KV) | 77.84 GB (+2.5 KV) | 80.34 GB (+5.0 KV) | 85.34 GB (+10.0 KV) | 95.34 GB (+20.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 74.13 GB | 76.09 GB (+0.75 KV) | 76.84 GB (+1.5 KV) | 78.34 GB (+3.0 KV) | 81.34 GB (+6.0 KV) | 87.34 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | FP32 | 43.09 GB | 49.29 GB (+5.0 KV) | 54.29 GB (+10.0 KV) | 64.29 GB (+20.0 KV) | 84.29 GB (+40.0 KV) | 124.29 GB (+80.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 43.09 GB | 46.79 GB (+2.5 KV) | 49.29 GB (+5.0 KV) | 54.29 GB (+10.0 KV) | 64.29 GB (+20.0 KV) | 84.29 GB (+40.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 43.09 GB | 45.67 GB (+1.38 KV) | 47.04 GB (+2.75 KV) | 49.79 GB (+5.5 KV) | 55.29 GB (+11.0 KV) | 66.29 GB (+22.0 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 43.09 GB | 45.54 GB (+1.25 KV) | 46.79 GB (+2.5 KV) | 49.29 GB (+5.0 KV) | 54.29 GB (+10.0 KV) | 64.29 GB (+20.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 43.09 GB | 45.04 GB (+0.75 KV) | 45.79 GB (+1.5 KV) | 47.29 GB (+3.0 KV) | 50.29 GB (+6.0 KV) | 56.29 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | FP32 | 42.44 GB | 48.65 GB (+5.0 KV) | 53.65 GB (+10.0 KV) | 63.65 GB (+20.0 KV) | 83.65 GB (+40.0 KV) | 123.65 GB (+80.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 42.44 GB | 46.15 GB (+2.5 KV) | 48.65 GB (+5.0 KV) | 53.65 GB (+10.0 KV) | 63.65 GB (+20.0 KV) | 83.65 GB (+40.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 42.44 GB | 45.02 GB (+1.38 KV) | 46.4 GB (+2.75 KV) | 49.15 GB (+5.5 KV) | 54.65 GB (+11.0 KV) | 65.65 GB (+22.0 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 42.44 GB | 44.9 GB (+1.25 KV) | 46.15 GB (+2.5 KV) | 48.65 GB (+5.0 KV) | 53.65 GB (+10.0 KV) | 63.65 GB (+20.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 42.44 GB | 44.4 GB (+0.75 KV) | 45.15 GB (+1.5 KV) | 46.65 GB (+3.0 KV) | 49.65 GB (+6.0 KV) | 55.65 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | FP32 | 36.23 GB | 42.44 GB (+5.0 KV) | 47.44 GB (+10.0 KV) | 57.44 GB (+20.0 KV) | 77.44 GB (+40.0 KV) | 117.44 GB (+80.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 36.23 GB | 39.94 GB (+2.5 KV) | 42.44 GB (+5.0 KV) | 47.44 GB (+10.0 KV) | 57.44 GB (+20.0 KV) | 77.44 GB (+40.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 36.23 GB | 38.81 GB (+1.38 KV) | 40.19 GB (+2.75 KV) | 42.94 GB (+5.5 KV) | 48.44 GB (+11.0 KV) | 59.44 GB (+22.0 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 36.23 GB | 38.69 GB (+1.25 KV) | 39.94 GB (+2.5 KV) | 42.44 GB (+5.0 KV) | 47.44 GB (+10.0 KV) | 57.44 GB (+20.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 36.23 GB | 38.19 GB (+0.75 KV) | 38.94 GB (+1.5 KV) | 40.44 GB (+3.0 KV) | 43.44 GB (+6.0 KV) | 49.44 GB (+12.0 KV) |
| Q2_K 2.63 bpw | FP32 | 24.37 GB | 30.58 GB (+5.0 KV) | 35.58 GB (+10.0 KV) | 45.58 GB (+20.0 KV) | 65.58 GB (+40.0 KV) | 105.58 GB (+80.0 KV) |
| Q2_K 2.63 bpw | FP16 | 24.37 GB | 28.08 GB (+2.5 KV) | 30.58 GB (+5.0 KV) | 35.58 GB (+10.0 KV) | 45.58 GB (+20.0 KV) | 65.58 GB (+40.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 24.37 GB | 26.95 GB (+1.38 KV) | 28.33 GB (+2.75 KV) | 31.08 GB (+5.5 KV) | 36.58 GB (+11.0 KV) | 47.58 GB (+22.0 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 24.37 GB | 26.83 GB (+1.25 KV) | 28.08 GB (+2.5 KV) | 30.58 GB (+5.0 KV) | 35.58 GB (+10.0 KV) | 45.58 GB (+20.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 24.37 GB | 26.33 GB (+0.75 KV) | 27.08 GB (+1.5 KV) | 28.58 GB (+3.0 KV) | 31.58 GB (+6.0 KV) | 37.58 GB (+12.0 KV) |
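Total VRAM = Model Weights + KV Cache + 1.21 GB overhead. Because each component is rounded for display, some totals differ from the sum of the displayed values by 0.01 GB. Actual usage may vary by ±5% depending on the inference engine and optimizations.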
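
The formula can be sketched in a few lines of Python. The constants are read off the table. Everything else is an assumption made for illustration: the function names, the reading of "8K" as 8192 tokens, the linear growth of the KV cache with context length (the table doubles it from column to column), and the scaling of weight size with bits per weight (which reproduces the Model Weights column to within rounding).

```python
# Values taken from the table; names are illustrative, not part of any tool.
OVERHEAD_GB = 1.21          # CUDA context + activations
FP16_WEIGHTS_GB = 148.26    # model weights at 16.0 bpw

# KV cache size in GB at 8K context for each cache format.
KV_GB_AT_8K = {
    "FP32": 5.0,
    "FP16": 2.5,
    "Q8_0": 1.375,  # displayed as 1.38 in the table (2.75 at 16K)
    "FP8": 1.25,    # marked experimental in the table
    "Q4_0": 0.75,   # marked experimental in the table
}


def weights_gb_from_bpw(bpw: float) -> float:
    """Scale the FP16 weight size by bits per weight (assumed linear)."""
    return FP16_WEIGHTS_GB * bpw / 16.0


def total_vram_gb(weights_gb: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead, all in GB."""
    # Assumption: "8K" means 8192 tokens and the cache grows linearly with context.
    kv_gb = KV_GB_AT_8K[cache_format] * context_tokens / 8192
    return weights_gb + kv_gb + OVERHEAD_GB


if __name__ == "__main__":
    # Q4_K_M (4.65 bpw) weights with an FP16 cache at 32K context.
    estimate = total_vram_gb(weights_gb_from_bpw(4.65), "FP16", 32768)
    print(f"{estimate:.2f} GB")
```

The result should agree with the corresponding table cell to within the 0.01 GB rounding difference noted above.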
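
As a rough offline check of whether a configuration fits a given VRAM budget, the table values can be filtered directly. This is a minimal sketch, not the site's calculator: `fitting_configs` and its `margin` parameter are hypothetical names, the 5% default margin simply mirrors the ±5% variation quoted for the estimates, and "8K" is assumed to mean 8192 tokens with the KV cache growing linearly with context.

```python
# Model weight sizes in GB per quantization (from the table).
WEIGHTS_GB = {
    "FP16": 148.26,
    "Q8_0": 74.13,
    "Q4_K_M": 43.09,
    "Q4_K_S": 42.44,
    "Q3_K_M": 36.23,
    "Q2_K": 24.37,
}

# KV cache size in GB at 8K context per cache format (from the table).
KV_GB_AT_8K = {"FP32": 5.0, "FP16": 2.5, "Q8_0": 1.375, "FP8": 1.25, "Q4_0": 0.75}

OVERHEAD_GB = 1.21  # CUDA context + activations


def fitting_configs(vram_budget_gb: float, context_tokens: int, margin: float = 0.05):
    """Return (quantization, cache_format, estimated_gb) tuples that fit the budget.

    A configuration is kept only if its estimate, inflated by `margin`,
    still fits within `vram_budget_gb`.
    """
    results = []
    for quant, weights_gb in WEIGHTS_GB.items():
        for cache_format, kv_at_8k in KV_GB_AT_8K.items():
            # Assumption: KV cache scales linearly from the 8K (8192-token) value.
            kv_gb = kv_at_8k * context_tokens / 8192
            estimate = weights_gb + kv_gb + OVERHEAD_GB
            if estimate * (1.0 + margin) <= vram_budget_gb:
                results.append((quant, cache_format, round(estimate, 2)))
    return results


if __name__ == "__main__":
    # Example: which combinations fit in 48 GB at 8K context?
    for quant, cache_format, estimate in fitting_configs(48.0, 8192):
        print(f"{quant:8s} {cache_format:5s} {estimate:6.2f} GB")
```

Setting `margin=0.0` compares the raw estimates against the budget, which is less conservative but matches the table values directly.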