Estimated VRAM usage for every combination of weight quantization and KV cache format. Each total includes a base overhead of 0.8 GB (CUDA context plus activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 202K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 63.0 GB | 64.63 GB (+0.83 KV) | 65.45 GB (+1.65 KV) | 67.1 GB (+3.3 KV) | 70.41 GB (+6.61 KV) | 77.02 GB (+13.22 KV) | 84.25 GB (+20.45 KV) |
| FP16 16.0 bpw | FP16 | 63.0 GB | 64.21 GB (+0.41 KV) | 64.63 GB (+0.83 KV) | 65.45 GB (+1.65 KV) | 67.1 GB (+3.3 KV) | 70.41 GB (+6.61 KV) | 74.02 GB (+10.22 KV) |
| FP16 16.0 bpw | Q8_0 | 63.0 GB | 64.03 GB (+0.23 KV) | 64.25 GB (+0.45 KV) | 64.71 GB (+0.91 KV) | 65.62 GB (+1.82 KV) | 67.44 GB (+3.64 KV) | 69.42 GB (+5.62 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 63.0 GB | 64.01 GB (+0.21 KV) | 64.21 GB (+0.41 KV) | 64.63 GB (+0.83 KV) | 65.45 GB (+1.65 KV) | 67.1 GB (+3.3 KV) | 68.91 GB (+5.11 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 63.0 GB | 63.92 GB (+0.12 KV) | 64.05 GB (+0.25 KV) | 64.3 GB (+0.5 KV) | 64.79 GB (+0.99 KV) | 65.78 GB (+1.98 KV) | 66.87 GB (+3.07 KV) |
| Q8_0 8.0 bpw | FP32 | 31.5 GB | 33.13 GB (+0.83 KV) | 33.95 GB (+1.65 KV) | 35.6 GB (+3.3 KV) | 38.91 GB (+6.61 KV) | 45.52 GB (+13.22 KV) | 52.75 GB (+20.45 KV) |
| Q8_0 8.0 bpw | FP16 | 31.5 GB | 32.71 GB (+0.41 KV) | 33.13 GB (+0.83 KV) | 33.95 GB (+1.65 KV) | 35.6 GB (+3.3 KV) | 38.91 GB (+6.61 KV) | 42.52 GB (+10.22 KV) |
| Q8_0 8.0 bpw | Q8_0 | 31.5 GB | 32.53 GB (+0.23 KV) | 32.75 GB (+0.45 KV) | 33.21 GB (+0.91 KV) | 34.12 GB (+1.82 KV) | 35.94 GB (+3.64 KV) | 37.92 GB (+5.62 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 31.5 GB | 32.51 GB (+0.21 KV) | 32.71 GB (+0.41 KV) | 33.13 GB (+0.83 KV) | 33.95 GB (+1.65 KV) | 35.6 GB (+3.3 KV) | 37.41 GB (+5.11 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 31.5 GB | 32.42 GB (+0.12 KV) | 32.55 GB (+0.25 KV) | 32.8 GB (+0.5 KV) | 33.29 GB (+0.99 KV) | 34.28 GB (+1.98 KV) | 35.37 GB (+3.07 KV) |
| Q4_K_M 4.65 bpw | FP32 | 18.31 GB | 19.94 GB (+0.83 KV) | 20.76 GB (+1.65 KV) | 22.41 GB (+3.3 KV) | 25.72 GB (+6.61 KV) | 32.33 GB (+13.22 KV) | 39.56 GB (+20.45 KV) |
| Q4_K_M 4.65 bpw | FP16 | 18.31 GB | 19.52 GB (+0.41 KV) | 19.94 GB (+0.83 KV) | 20.76 GB (+1.65 KV) | 22.41 GB (+3.3 KV) | 25.72 GB (+6.61 KV) | 29.33 GB (+10.22 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 18.31 GB | 19.34 GB (+0.23 KV) | 19.56 GB (+0.45 KV) | 20.02 GB (+0.91 KV) | 20.93 GB (+1.82 KV) | 22.74 GB (+3.64 KV) | 24.73 GB (+5.62 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 18.31 GB | 19.32 GB (+0.21 KV) | 19.52 GB (+0.41 KV) | 19.94 GB (+0.83 KV) | 20.76 GB (+1.65 KV) | 22.41 GB (+3.3 KV) | 24.22 GB (+5.11 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 18.31 GB | 19.23 GB (+0.12 KV) | 19.36 GB (+0.25 KV) | 19.61 GB (+0.5 KV) | 20.1 GB (+0.99 KV) | 21.09 GB (+1.98 KV) | 22.18 GB (+3.07 KV) |
| Q4_K_S 4.58 bpw | FP32 | 18.03 GB | 19.66 GB (+0.83 KV) | 20.49 GB (+1.65 KV) | 22.14 GB (+3.3 KV) | 25.44 GB (+6.61 KV) | 32.05 GB (+13.22 KV) | 39.28 GB (+20.45 KV) |
| Q4_K_S 4.58 bpw | FP16 | 18.03 GB | 19.25 GB (+0.41 KV) | 19.66 GB (+0.83 KV) | 20.49 GB (+1.65 KV) | 22.14 GB (+3.3 KV) | 25.44 GB (+6.61 KV) | 29.06 GB (+10.22 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 18.03 GB | 19.06 GB (+0.23 KV) | 19.29 GB (+0.45 KV) | 19.74 GB (+0.91 KV) | 20.65 GB (+1.82 KV) | 22.47 GB (+3.64 KV) | 24.46 GB (+5.62 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 18.03 GB | 19.04 GB (+0.21 KV) | 19.25 GB (+0.41 KV) | 19.66 GB (+0.83 KV) | 20.49 GB (+1.65 KV) | 22.14 GB (+3.3 KV) | 23.95 GB (+5.11 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 18.03 GB | 18.96 GB (+0.12 KV) | 19.08 GB (+0.25 KV) | 19.33 GB (+0.5 KV) | 19.83 GB (+0.99 KV) | 20.82 GB (+1.98 KV) | 21.9 GB (+3.07 KV) |
| Q3_K_M 3.91 bpw | FP32 | 15.4 GB | 17.02 GB (+0.83 KV) | 17.85 GB (+1.65 KV) | 19.5 GB (+3.3 KV) | 22.81 GB (+6.61 KV) | 29.41 GB (+13.22 KV) | 36.64 GB (+20.45 KV) |
| Q3_K_M 3.91 bpw | FP16 | 15.4 GB | 16.61 GB (+0.41 KV) | 17.02 GB (+0.83 KV) | 17.85 GB (+1.65 KV) | 19.5 GB (+3.3 KV) | 22.81 GB (+6.61 KV) | 26.42 GB (+10.22 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 15.4 GB | 16.42 GB (+0.23 KV) | 16.65 GB (+0.45 KV) | 17.1 GB (+0.91 KV) | 18.01 GB (+1.82 KV) | 19.83 GB (+3.64 KV) | 21.82 GB (+5.62 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 15.4 GB | 16.4 GB (+0.21 KV) | 16.61 GB (+0.41 KV) | 17.02 GB (+0.83 KV) | 17.85 GB (+1.65 KV) | 19.5 GB (+3.3 KV) | 21.31 GB (+5.11 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 15.4 GB | 16.32 GB (+0.12 KV) | 16.44 GB (+0.25 KV) | 16.69 GB (+0.5 KV) | 17.19 GB (+0.99 KV) | 18.18 GB (+1.98 KV) | 19.26 GB (+3.07 KV) |
| Q2_K 2.63 bpw | FP32 | 10.36 GB | 11.98 GB (+0.83 KV) | 12.81 GB (+1.65 KV) | 14.46 GB (+3.3 KV) | 17.77 GB (+6.61 KV) | 24.37 GB (+13.22 KV) | 31.6 GB (+20.45 KV) |
| Q2_K 2.63 bpw | FP16 | 10.36 GB | 11.57 GB (+0.41 KV) | 11.98 GB (+0.83 KV) | 12.81 GB (+1.65 KV) | 14.46 GB (+3.3 KV) | 17.77 GB (+6.61 KV) | 21.38 GB (+10.22 KV) |
| Q2_K 2.63 bpw | Q8_0 | 10.36 GB | 11.38 GB (+0.23 KV) | 11.61 GB (+0.45 KV) | 12.06 GB (+0.91 KV) | 12.97 GB (+1.82 KV) | 14.79 GB (+3.64 KV) | 16.78 GB (+5.62 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 10.36 GB | 11.36 GB (+0.21 KV) | 11.57 GB (+0.41 KV) | 11.98 GB (+0.83 KV) | 12.81 GB (+1.65 KV) | 14.46 GB (+3.3 KV) | 16.27 GB (+5.11 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 10.36 GB | 11.28 GB (+0.12 KV) | 11.4 GB (+0.25 KV) | 11.65 GB (+0.5 KV) | 12.15 GB (+0.99 KV) | 13.14 GB (+1.98 KV) | 14.22 GB (+3.07 KV) |

Total VRAM = Model Weights + KV Cache + 0.8 GB overhead; the "+… KV" figure in each cell is the KV cache share of that total. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
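
The formula above can be sketched in a few lines of Python. This is an illustrative sketch, not the method used to generate the table: the function names and the `fits` helper are hypothetical. The linear scaling of weight size with bits per weight is not stated in the source; it is an observation that matches the table rows (for example, 63.0 GB × 4.65 / 16 ≈ 18.31 GB). Applying the ±5% variance as a safety margin is likewise an assumption.

```python
OVERHEAD_GB = 0.8  # base overhead stated in the table (CUDA context + activations)


def weights_gb(bits_per_weight, fp16_weights_gb=63.0):
    """Scale the FP16 weight size (16.0 bpw) to another bits-per-weight value.

    Assumption: weight size scales linearly with bpw, which matches the table rows.
    """
    return fp16_weights_gb * bits_per_weight / 16.0


def total_vram_gb(model_weights_gb, kv_cache_gb, overhead_gb=OVERHEAD_GB):
    """Total VRAM = Model Weights + KV Cache + overhead."""
    return model_weights_gb + kv_cache_gb + overhead_gb


def fits(total_gb, available_vram_gb, margin=0.05):
    """Hypothetical helper: pad the estimate by the +5% variance before comparing."""
    return total_gb * (1 + margin) <= available_vram_gb


# Q4_K_M weights with an FP16 cache at 32K context (KV figure copied from the table).
q4_k_m = weights_gb(4.65)
estimate = total_vram_gb(q4_k_m, kv_cache_gb=1.65)
print(f"weights: {q4_k_m:.2f} GB, total: {estimate:.2f} GB")
print("fits in 24 GB:", fits(estimate, 24.0))
```

Padding by the upper end of the variance is a conservative choice: a configuration that only fits without the margin may still run out of memory on a particular inference engine.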