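Estimated VRAM usage for every combination of weight quantization and KV cache format. Each total includes a base overhead of 1.23 GB (CUDA context + activations); the value in parentheses is the KV cache size in GB.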
| Quantization | KV Cache Format | Model Weights | 8K Context (8,192 tokens) | 16K Context (16,384 tokens) | 32K Context (32,768 tokens) | 65K Context (65,536 tokens) | 128K Context (128,000 tokens) |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 154.35 GB | 160.59 GB (+5.00 KV) | 165.59 GB (+10.00 KV) | 175.59 GB (+20.00 KV) | 195.59 GB (+40.00 KV) | 233.71 GB (+78.12 KV) |
| FP16 16.0 bpw | FP16 | 154.35 GB | 158.09 GB (+2.50 KV) | 160.59 GB (+5.00 KV) | 165.59 GB (+10.00 KV) | 175.59 GB (+20.00 KV) | 194.65 GB (+39.06 KV) |
| FP16 16.0 bpw | Q8_0 | 154.35 GB | 156.96 GB (+1.38 KV) | 158.34 GB (+2.75 KV) | 161.09 GB (+5.50 KV) | 166.59 GB (+11.00 KV) | 177.07 GB (+21.48 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 154.35 GB | 156.84 GB (+1.25 KV) | 158.09 GB (+2.50 KV) | 160.59 GB (+5.00 KV) | 165.59 GB (+10.00 KV) | 175.12 GB (+19.53 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 154.35 GB | 156.34 GB (+0.75 KV) | 157.09 GB (+1.50 KV) | 158.59 GB (+3.00 KV) | 161.59 GB (+6.00 KV) | 167.30 GB (+11.72 KV) |
| Q8_0 8.0 bpw | FP32 | 77.17 GB | 83.41 GB (+5.00 KV) | 88.41 GB (+10.00 KV) | 98.41 GB (+20.00 KV) | 118.41 GB (+40.00 KV) | 156.54 GB (+78.12 KV) |
| Q8_0 8.0 bpw | FP16 | 77.17 GB | 80.91 GB (+2.50 KV) | 83.41 GB (+5.00 KV) | 88.41 GB (+10.00 KV) | 98.41 GB (+20.00 KV) | 117.47 GB (+39.06 KV) |
| Q8_0 8.0 bpw | Q8_0 | 77.17 GB | 79.78 GB (+1.38 KV) | 81.16 GB (+2.75 KV) | 83.91 GB (+5.50 KV) | 89.41 GB (+11.00 KV) | 99.89 GB (+21.48 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 77.17 GB | 79.66 GB (+1.25 KV) | 80.91 GB (+2.50 KV) | 83.41 GB (+5.00 KV) | 88.41 GB (+10.00 KV) | 97.94 GB (+19.53 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 77.17 GB | 79.16 GB (+0.75 KV) | 79.91 GB (+1.50 KV) | 81.41 GB (+3.00 KV) | 84.41 GB (+6.00 KV) | 90.13 GB (+11.72 KV) |
| Q4_K_M 4.65 bpw | FP32 | 44.86 GB | 51.09 GB (+5.00 KV) | 56.09 GB (+10.00 KV) | 66.09 GB (+20.00 KV) | 86.09 GB (+40.00 KV) | 124.22 GB (+78.12 KV) |
| Q4_K_M 4.65 bpw | FP16 | 44.86 GB | 48.59 GB (+2.50 KV) | 51.09 GB (+5.00 KV) | 56.09 GB (+10.00 KV) | 66.09 GB (+20.00 KV) | 85.16 GB (+39.06 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 44.86 GB | 47.47 GB (+1.38 KV) | 48.84 GB (+2.75 KV) | 51.59 GB (+5.50 KV) | 57.09 GB (+11.00 KV) | 67.58 GB (+21.48 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 44.86 GB | 47.34 GB (+1.25 KV) | 48.59 GB (+2.50 KV) | 51.09 GB (+5.00 KV) | 56.09 GB (+10.00 KV) | 65.62 GB (+19.53 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 44.86 GB | 46.84 GB (+0.75 KV) | 47.59 GB (+1.50 KV) | 49.09 GB (+3.00 KV) | 52.09 GB (+6.00 KV) | 57.81 GB (+11.72 KV) |
| Q4_K_S 4.58 bpw | FP32 | 44.18 GB | 50.42 GB (+5.00 KV) | 55.42 GB (+10.00 KV) | 65.42 GB (+20.00 KV) | 85.42 GB (+40.00 KV) | 123.54 GB (+78.12 KV) |
| Q4_K_S 4.58 bpw | FP16 | 44.18 GB | 47.92 GB (+2.50 KV) | 50.42 GB (+5.00 KV) | 55.42 GB (+10.00 KV) | 65.42 GB (+20.00 KV) | 84.48 GB (+39.06 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 44.18 GB | 46.79 GB (+1.38 KV) | 48.17 GB (+2.75 KV) | 50.92 GB (+5.50 KV) | 56.42 GB (+11.00 KV) | 66.90 GB (+21.48 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 44.18 GB | 46.67 GB (+1.25 KV) | 47.92 GB (+2.50 KV) | 50.42 GB (+5.00 KV) | 55.42 GB (+10.00 KV) | 64.95 GB (+19.53 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 44.18 GB | 46.17 GB (+0.75 KV) | 46.92 GB (+1.50 KV) | 48.42 GB (+3.00 KV) | 51.42 GB (+6.00 KV) | 57.14 GB (+11.72 KV) |
| Q3_K_M 3.91 bpw | FP32 | 37.72 GB | 43.95 GB (+5.00 KV) | 48.95 GB (+10.00 KV) | 58.95 GB (+20.00 KV) | 78.95 GB (+40.00 KV) | 117.08 GB (+78.12 KV) |
| Q3_K_M 3.91 bpw | FP16 | 37.72 GB | 41.45 GB (+2.50 KV) | 43.95 GB (+5.00 KV) | 48.95 GB (+10.00 KV) | 58.95 GB (+20.00 KV) | 78.02 GB (+39.06 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 37.72 GB | 40.33 GB (+1.38 KV) | 41.70 GB (+2.75 KV) | 44.45 GB (+5.50 KV) | 49.95 GB (+11.00 KV) | 60.44 GB (+21.48 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 37.72 GB | 40.20 GB (+1.25 KV) | 41.45 GB (+2.50 KV) | 43.95 GB (+5.00 KV) | 48.95 GB (+10.00 KV) | 58.49 GB (+19.53 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 37.72 GB | 39.70 GB (+0.75 KV) | 40.45 GB (+1.50 KV) | 41.95 GB (+3.00 KV) | 44.95 GB (+6.00 KV) | 50.67 GB (+11.72 KV) |
| Q2_K 2.63 bpw | FP32 | 25.37 GB | 31.61 GB (+5.00 KV) | 36.61 GB (+10.00 KV) | 46.61 GB (+20.00 KV) | 66.61 GB (+40.00 KV) | 104.73 GB (+78.12 KV) |
| Q2_K 2.63 bpw | FP16 | 25.37 GB | 29.11 GB (+2.50 KV) | 31.61 GB (+5.00 KV) | 36.61 GB (+10.00 KV) | 46.61 GB (+20.00 KV) | 65.67 GB (+39.06 KV) |
| Q2_K 2.63 bpw | Q8_0 | 25.37 GB | 27.98 GB (+1.38 KV) | 29.36 GB (+2.75 KV) | 32.11 GB (+5.50 KV) | 37.61 GB (+11.00 KV) | 48.09 GB (+21.48 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 25.37 GB | 27.86 GB (+1.25 KV) | 29.11 GB (+2.50 KV) | 31.61 GB (+5.00 KV) | 36.61 GB (+10.00 KV) | 46.14 GB (+19.53 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 25.37 GB | 27.36 GB (+0.75 KV) | 28.11 GB (+1.50 KV) | 29.61 GB (+3.00 KV) | 32.61 GB (+6.00 KV) | 38.33 GB (+11.72 KV) |
Total VRAM = Model Weights + KV Cache + 1.23 GB base overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
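The formula above can be sketched in Python. This is a minimal sketch that assumes, as the table's own numbers imply, that weight memory scales linearly with bits per weight and KV cache memory scales linearly with context length. The constants are read off the table (the "(Exp)" suffix is dropped from cache names); the helper names `total_vram_gb` and `configs_that_fit` and the 5% safety margin are illustrative choices, not part of any real tool. Because the table's totals appear to be rounded from unrounded inputs, results can differ from the listed cells by about 0.01 GB.

```python
"""Sketch of the VRAM estimate behind the table (assumed linear model)."""

# Values read off the table; names are illustrative, not from any library.
FP16_WEIGHTS_GB = 154.35  # "FP16 16.0 bpw" model weights
BASE_OVERHEAD_GB = 1.23   # CUDA context + activations

# Bits per weight for each weight-quantization row.
QUANT_BPW = {
    "FP16": 16.0,
    "Q8_0": 8.0,
    "Q4_K_M": 4.65,
    "Q4_K_S": 4.58,
    "Q3_K_M": 3.91,
    "Q2_K": 2.63,
}

# KV cache size in GB per 16,384 tokens, taken from the 16K column.
KV_GB_PER_16K = {
    "FP32": 10.0,
    "FP16": 5.0,
    "Q8_0": 2.75,
    "FP8": 2.5,
    "Q4_0": 1.5,
}


def weights_gb(quant: str) -> float:
    """Weight memory scales linearly with bits per weight (FP16 = 16 bpw)."""
    return FP16_WEIGHTS_GB * QUANT_BPW[quant] / 16.0


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache memory grows linearly with the number of cached tokens."""
    return KV_GB_PER_16K[cache_format] * context_tokens / 16_384


def total_vram_gb(quant: str, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed base overhead."""
    return weights_gb(quant) + kv_cache_gb(cache_format, context_tokens) + BASE_OVERHEAD_GB


def configs_that_fit(budget_gb: float, context_tokens: int, margin: float = 0.05):
    """Return (quant, cache, total_gb) tuples whose total, padded by `margin`, fits the budget.

    Sorted by weight precision (highest bpw first), then by total size (largest first).
    """
    fits = []
    for quant in QUANT_BPW:
        for cache in KV_GB_PER_16K:
            total = total_vram_gb(quant, cache, context_tokens)
            if total * (1 + margin) <= budget_gb:
                fits.append((quant, cache, total))
    return sorted(fits, key=lambda f: (-QUANT_BPW[f[0]], -f[2]))


if __name__ == "__main__":
    # Q4_K_M weights with a Q8_0 cache at 16K context; the table lists 48.84 GB.
    print(f"{total_vram_gb('Q4_K_M', 'Q8_0', 16_384):.2f} GB")
    # Highest-precision configuration that fits a 48 GB budget at 32K context.
    print(configs_that_fit(budget_gb=48.0, context_tokens=32_768)[0])
```

Padding each estimate by 5% mirrors the ±5% variation noted above; raising `margin` gives a more conservative fit check.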