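Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM in GB, with the KV cache share of that total in parentheses (also in GB). Base overhead: 0.7 GB (CUDA context + activations).
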
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 42.0 GB | 43.08 GB (+0.38 KV) | 43.45 GB (+0.75 KV) | 44.2 GB (+1.5 KV) | 45.7 GB (+3.0 KV) | 48.7 GB (+6.0 KV) |
| FP16 16.0 bpw | FP16 | 42.0 GB | 42.89 GB (+0.19 KV) | 43.08 GB (+0.38 KV) | 43.45 GB (+0.75 KV) | 44.2 GB (+1.5 KV) | 45.7 GB (+3.0 KV) |
| FP16 16.0 bpw | Q8_0 | 42.0 GB | 42.8 GB (+0.1 KV) | 42.91 GB (+0.21 KV) | 43.11 GB (+0.41 KV) | 43.53 GB (+0.83 KV) | 44.35 GB (+1.65 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 42.0 GB | 42.79 GB (+0.09 KV) | 42.89 GB (+0.19 KV) | 43.08 GB (+0.38 KV) | 43.45 GB (+0.75 KV) | 44.2 GB (+1.5 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 42.0 GB | 42.76 GB (+0.06 KV) | 42.81 GB (+0.11 KV) | 42.93 GB (+0.22 KV) | 43.15 GB (+0.45 KV) | 43.6 GB (+0.9 KV) |
| Q8_0 8.0 bpw | FP32 | 21.0 GB | 22.07 GB (+0.38 KV) | 22.45 GB (+0.75 KV) | 23.2 GB (+1.5 KV) | 24.7 GB (+3.0 KV) | 27.7 GB (+6.0 KV) |
| Q8_0 8.0 bpw | FP16 | 21.0 GB | 21.89 GB (+0.19 KV) | 22.07 GB (+0.38 KV) | 22.45 GB (+0.75 KV) | 23.2 GB (+1.5 KV) | 24.7 GB (+3.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 21.0 GB | 21.8 GB (+0.1 KV) | 21.91 GB (+0.21 KV) | 22.11 GB (+0.41 KV) | 22.52 GB (+0.83 KV) | 23.35 GB (+1.65 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 21.0 GB | 21.79 GB (+0.09 KV) | 21.89 GB (+0.19 KV) | 22.07 GB (+0.38 KV) | 22.45 GB (+0.75 KV) | 23.2 GB (+1.5 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 21.0 GB | 21.76 GB (+0.06 KV) | 21.81 GB (+0.11 KV) | 21.93 GB (+0.22 KV) | 22.15 GB (+0.45 KV) | 22.6 GB (+0.9 KV) |
| Q4_K_M 4.65 bpw | FP32 | 12.21 GB | 13.28 GB (+0.38 KV) | 13.66 GB (+0.75 KV) | 14.41 GB (+1.5 KV) | 15.91 GB (+3.0 KV) | 18.91 GB (+6.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 12.21 GB | 13.09 GB (+0.19 KV) | 13.28 GB (+0.38 KV) | 13.66 GB (+0.75 KV) | 14.41 GB (+1.5 KV) | 15.91 GB (+3.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 12.21 GB | 13.01 GB (+0.1 KV) | 13.11 GB (+0.21 KV) | 13.32 GB (+0.41 KV) | 13.73 GB (+0.83 KV) | 14.56 GB (+1.65 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 12.21 GB | 13.0 GB (+0.09 KV) | 13.09 GB (+0.19 KV) | 13.28 GB (+0.38 KV) | 13.66 GB (+0.75 KV) | 14.41 GB (+1.5 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 12.21 GB | 12.96 GB (+0.06 KV) | 13.02 GB (+0.11 KV) | 13.13 GB (+0.22 KV) | 13.36 GB (+0.45 KV) | 13.81 GB (+0.9 KV) |
| Q4_K_S 4.58 bpw | FP32 | 12.02 GB | 13.1 GB (+0.38 KV) | 13.47 GB (+0.75 KV) | 14.22 GB (+1.5 KV) | 15.72 GB (+3.0 KV) | 18.72 GB (+6.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 12.02 GB | 12.91 GB (+0.19 KV) | 13.1 GB (+0.38 KV) | 13.47 GB (+0.75 KV) | 14.22 GB (+1.5 KV) | 15.72 GB (+3.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 12.02 GB | 12.83 GB (+0.1 KV) | 12.93 GB (+0.21 KV) | 13.13 GB (+0.41 KV) | 13.55 GB (+0.83 KV) | 14.37 GB (+1.65 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 12.02 GB | 12.82 GB (+0.09 KV) | 12.91 GB (+0.19 KV) | 13.1 GB (+0.38 KV) | 13.47 GB (+0.75 KV) | 14.22 GB (+1.5 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 12.02 GB | 12.78 GB (+0.06 KV) | 12.83 GB (+0.11 KV) | 12.95 GB (+0.22 KV) | 13.17 GB (+0.45 KV) | 13.62 GB (+0.9 KV) |
| Q3_K_M 3.91 bpw | FP32 | 10.26 GB | 11.34 GB (+0.38 KV) | 11.71 GB (+0.75 KV) | 12.46 GB (+1.5 KV) | 13.96 GB (+3.0 KV) | 16.96 GB (+6.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 10.26 GB | 11.15 GB (+0.19 KV) | 11.34 GB (+0.38 KV) | 11.71 GB (+0.75 KV) | 12.46 GB (+1.5 KV) | 13.96 GB (+3.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 10.26 GB | 11.07 GB (+0.1 KV) | 11.17 GB (+0.21 KV) | 11.38 GB (+0.41 KV) | 11.79 GB (+0.83 KV) | 12.61 GB (+1.65 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 10.26 GB | 11.06 GB (+0.09 KV) | 11.15 GB (+0.19 KV) | 11.34 GB (+0.38 KV) | 11.71 GB (+0.75 KV) | 12.46 GB (+1.5 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 10.26 GB | 11.02 GB (+0.06 KV) | 11.08 GB (+0.11 KV) | 11.19 GB (+0.22 KV) | 11.41 GB (+0.45 KV) | 11.86 GB (+0.9 KV) |
| Q2_K 2.63 bpw | FP32 | 6.9 GB | 7.98 GB (+0.38 KV) | 8.35 GB (+0.75 KV) | 9.1 GB (+1.5 KV) | 10.6 GB (+3.0 KV) | 13.6 GB (+6.0 KV) |
| Q2_K 2.63 bpw | FP16 | 6.9 GB | 7.79 GB (+0.19 KV) | 7.98 GB (+0.38 KV) | 8.35 GB (+0.75 KV) | 9.1 GB (+1.5 KV) | 10.6 GB (+3.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 6.9 GB | 7.71 GB (+0.1 KV) | 7.81 GB (+0.21 KV) | 8.02 GB (+0.41 KV) | 8.43 GB (+0.83 KV) | 9.25 GB (+1.65 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 6.9 GB | 7.7 GB (+0.09 KV) | 7.79 GB (+0.19 KV) | 7.98 GB (+0.38 KV) | 8.35 GB (+0.75 KV) | 9.1 GB (+1.5 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 6.9 GB | 7.66 GB (+0.06 KV) | 7.72 GB (+0.11 KV) | 7.83 GB (+0.22 KV) | 8.05 GB (+0.45 KV) | 8.5 GB (+0.9 KV) |
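
The Model Weights column appears consistent with the usual estimate of parameters × bits per weight / 8. The sketch below reproduces that column under two assumptions that the page does not state: a parameter count of about 21 billion (inferred from the 42.0 GB figure at 16.0 bpw) and decimal gigabytes (1 GB = 10^9 bytes). The names `weights_gb` and `ASSUMED_PARAMS` are illustrative only.

```python
def weights_gb(params: float, bpw: float) -> float:
    """Approximate weight size in decimal GB: params * bits-per-weight / 8 bits per byte."""
    return params * bpw / 8 / 1e9


# Assumption: ~21e9 parameters, inferred from the 42.0 GB FP16 (16.0 bpw) row.
ASSUMED_PARAMS = 21e9

# bpw values copied from the Quantization column of the table.
QUANT_BPW = {
    "FP16": 16.0,
    "Q8_0": 8.0,
    "Q4_K_M": 4.65,
    "Q4_K_S": 4.58,
    "Q3_K_M": 3.91,
    "Q2_K": 2.63,
}

for name, bpw in QUANT_BPW.items():
    print(f"{name}: {weights_gb(ASSUMED_PARAMS, bpw):.2f} GB")
```

If the assumptions hold, the printed sizes should line up with the Model Weights column; a different parameter count or a GiB-based convention would shift every row proportionally.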

Total VRAM = Model Weights + KV Cache + 0.7 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
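
As a rough illustration of the formula above, the following sketch adds the three terms for one table row and checks the result against a GPU's memory with the ±5% variation applied as headroom. The helper names (`total_vram_gb`, `fits`) and the 16 GB card are hypothetical, chosen only for the example.

```python
OVERHEAD_GB = 0.7  # base overhead from the text (CUDA context + activations)


def total_vram_gb(weights_gb: float, kv_cache_gb: float,
                  overhead_gb: float = OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead (all in GB)."""
    return weights_gb + kv_cache_gb + overhead_gb


def fits(total_gb: float, available_gb: float, margin: float = 0.05) -> bool:
    """True if the estimate, padded by the stated +/-5% variation, fits in memory."""
    return total_gb * (1 + margin) <= available_gb


# Q4_K_M weights (12.21 GB) with an FP16 cache at 32K context (+0.75 GB KV).
total = total_vram_gb(12.21, 0.75)
print(f"{total:.2f} GB")   # 13.66 GB, matching the table cell
print(fits(total, 16.0))   # hypothetical 16 GB GPU
```

Some cells in the table differ from this sum by 0.01 GB, which is consistent with the table having been computed from unrounded values and rounded afterwards.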