VRAM usage for every combination of weight quantization and KV cache format. Each total includes a base overhead of 1.5 GB (CUDA context + activations).
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 252.0 GB | 254.06 GB (+0.56 KV) | 254.62 GB (+1.12 KV) | 255.75 GB (+2.25 KV) | 258.0 GB (+4.5 KV) | 262.5 GB (+9.0 KV) |
| FP16 16.0 bpw | FP16 | 252.0 GB | 253.78 GB (+0.28 KV) | 254.06 GB (+0.56 KV) | 254.62 GB (+1.12 KV) | 255.75 GB (+2.25 KV) | 258.0 GB (+4.5 KV) |
| FP16 16.0 bpw | Q8_0 | 252.0 GB | 253.65 GB (+0.15 KV) | 253.81 GB (+0.31 KV) | 254.12 GB (+0.62 KV) | 254.74 GB (+1.24 KV) | 255.97 GB (+2.47 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 252.0 GB | 253.64 GB (+0.14 KV) | 253.78 GB (+0.28 KV) | 254.06 GB (+0.56 KV) | 254.62 GB (+1.12 KV) | 255.75 GB (+2.25 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 252.0 GB | 253.58 GB (+0.08 KV) | 253.67 GB (+0.17 KV) | 253.84 GB (+0.34 KV) | 254.18 GB (+0.67 KV) | 254.85 GB (+1.35 KV) |
| Q8_0 8.0 bpw | FP32 | 126.0 GB | 128.06 GB (+0.56 KV) | 128.62 GB (+1.12 KV) | 129.75 GB (+2.25 KV) | 132.0 GB (+4.5 KV) | 136.5 GB (+9.0 KV) |
| Q8_0 8.0 bpw | FP16 | 126.0 GB | 127.78 GB (+0.28 KV) | 128.06 GB (+0.56 KV) | 128.62 GB (+1.12 KV) | 129.75 GB (+2.25 KV) | 132.0 GB (+4.5 KV) |
| Q8_0 8.0 bpw | Q8_0 | 126.0 GB | 127.65 GB (+0.15 KV) | 127.81 GB (+0.31 KV) | 128.12 GB (+0.62 KV) | 128.74 GB (+1.24 KV) | 129.97 GB (+2.47 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 126.0 GB | 127.64 GB (+0.14 KV) | 127.78 GB (+0.28 KV) | 128.06 GB (+0.56 KV) | 128.62 GB (+1.12 KV) | 129.75 GB (+2.25 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 126.0 GB | 127.58 GB (+0.08 KV) | 127.67 GB (+0.17 KV) | 127.84 GB (+0.34 KV) | 128.18 GB (+0.67 KV) | 128.85 GB (+1.35 KV) |
| Q4_K_M 4.65 bpw | FP32 | 73.24 GB | 75.3 GB (+0.56 KV) | 75.86 GB (+1.12 KV) | 76.99 GB (+2.25 KV) | 79.24 GB (+4.5 KV) | 83.74 GB (+9.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 73.24 GB | 75.02 GB (+0.28 KV) | 75.3 GB (+0.56 KV) | 75.86 GB (+1.12 KV) | 76.99 GB (+2.25 KV) | 79.24 GB (+4.5 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 73.24 GB | 74.89 GB (+0.15 KV) | 75.05 GB (+0.31 KV) | 75.36 GB (+0.62 KV) | 75.97 GB (+1.24 KV) | 77.21 GB (+2.47 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 73.24 GB | 74.88 GB (+0.14 KV) | 75.02 GB (+0.28 KV) | 75.3 GB (+0.56 KV) | 75.86 GB (+1.12 KV) | 76.99 GB (+2.25 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 73.24 GB | 74.82 GB (+0.08 KV) | 74.91 GB (+0.17 KV) | 75.08 GB (+0.34 KV) | 75.41 GB (+0.67 KV) | 76.09 GB (+1.35 KV) |
| Q4_K_S 4.58 bpw | FP32 | 72.14 GB | 74.2 GB (+0.56 KV) | 74.76 GB (+1.12 KV) | 75.89 GB (+2.25 KV) | 78.14 GB (+4.5 KV) | 82.63 GB (+9.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 72.14 GB | 73.92 GB (+0.28 KV) | 74.2 GB (+0.56 KV) | 74.76 GB (+1.12 KV) | 75.89 GB (+2.25 KV) | 78.13 GB (+4.5 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 72.14 GB | 73.79 GB (+0.15 KV) | 73.94 GB (+0.31 KV) | 74.25 GB (+0.62 KV) | 74.87 GB (+1.24 KV) | 76.11 GB (+2.47 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 72.14 GB | 73.78 GB (+0.14 KV) | 73.92 GB (+0.28 KV) | 74.2 GB (+0.56 KV) | 74.76 GB (+1.12 KV) | 75.88 GB (+2.25 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 72.14 GB | 73.72 GB (+0.08 KV) | 73.8 GB (+0.17 KV) | 73.97 GB (+0.34 KV) | 74.31 GB (+0.67 KV) | 74.98 GB (+1.35 KV) |
| Q3_K_M 3.91 bpw | FP32 | 61.58 GB | 63.65 GB (+0.56 KV) | 64.21 GB (+1.12 KV) | 65.33 GB (+2.25 KV) | 67.58 GB (+4.5 KV) | 72.08 GB (+9.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 61.58 GB | 63.36 GB (+0.28 KV) | 63.65 GB (+0.56 KV) | 64.21 GB (+1.12 KV) | 65.33 GB (+2.25 KV) | 67.58 GB (+4.5 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 61.58 GB | 63.24 GB (+0.15 KV) | 63.39 GB (+0.31 KV) | 63.7 GB (+0.62 KV) | 64.32 GB (+1.24 KV) | 65.56 GB (+2.47 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 61.58 GB | 63.22 GB (+0.14 KV) | 63.36 GB (+0.28 KV) | 63.65 GB (+0.56 KV) | 64.21 GB (+1.12 KV) | 65.33 GB (+2.25 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 61.58 GB | 63.17 GB (+0.08 KV) | 63.25 GB (+0.17 KV) | 63.42 GB (+0.34 KV) | 63.76 GB (+0.67 KV) | 64.43 GB (+1.35 KV) |
| Q2_K 2.63 bpw | FP32 | 41.42 GB | 43.48 GB (+0.56 KV) | 44.05 GB (+1.12 KV) | 45.17 GB (+2.25 KV) | 47.42 GB (+4.5 KV) | 51.92 GB (+9.0 KV) |
| Q2_K 2.63 bpw | FP16 | 41.42 GB | 43.2 GB (+0.28 KV) | 43.48 GB (+0.56 KV) | 44.05 GB (+1.12 KV) | 45.17 GB (+2.25 KV) | 47.42 GB (+4.5 KV) |
| Q2_K 2.63 bpw | Q8_0 | 41.42 GB | 43.08 GB (+0.15 KV) | 43.23 GB (+0.31 KV) | 43.54 GB (+0.62 KV) | 44.16 GB (+1.24 KV) | 45.4 GB (+2.47 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 41.42 GB | 43.06 GB (+0.14 KV) | 43.2 GB (+0.28 KV) | 43.48 GB (+0.56 KV) | 44.05 GB (+1.12 KV) | 45.17 GB (+2.25 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 41.42 GB | 43.01 GB (+0.08 KV) | 43.09 GB (+0.17 KV) | 43.26 GB (+0.34 KV) | 43.6 GB (+0.67 KV) | 44.27 GB (+1.35 KV) |
Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. The value in parentheses in each cell is the KV cache size in GB. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
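
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the site's calculator: the function names are hypothetical, the 126 billion parameter count is an assumption inferred from the table (252.0 GB at 16.0 bpw), and the 5% headroom in `fits` simply applies the ±5% variance noted above as a safety margin.

```python
OVERHEAD_GB = 1.5  # base overhead from the table notes (CUDA context + activations)


def weights_gb(params_billion: float, bpw: float) -> float:
    """Weight size in GB: parameters (in billions) * bits per weight / 8 bits per byte."""
    return params_billion * bpw / 8


def total_vram_gb(weights: float, kv_cache: float, overhead: float = OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead (all in GB)."""
    return weights + kv_cache + overhead


def fits(total: float, capacity_gb: float, margin: float = 0.05) -> bool:
    """Hypothetical check: does the estimate fit with `margin` headroom for variance?"""
    return total * (1 + margin) <= capacity_gb


# Assumption: 126 billion parameters, inferred from 252.0 GB at 16.0 bpw.
w = weights_gb(126, 4.65)      # Q4_K_M weights
t = total_vram_gb(w, 1.12)     # FP16 cache at 32K context (+1.12 GB KV)
print(f"{w:.2f} GB weights, {t:.2f} GB total")
print("fits in 80 GB:", fits(t, 80.0))
```

With these inputs the total comes out at roughly 75.86 GB, in line with the Q4_K_M / FP16 row at 32K context, and the estimate still fits an 80 GB card after adding the 5% headroom.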