VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM, with the KV cache size in GB in parentheses. Base overhead: 0.8 GB (CUDA context + activations).
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 40K Context |
|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 64.05 GB | 66.36 GB (+1.5 KV) | 67.86 GB (+3.0 KV) | 70.86 GB (+6.0 KV) | 72.36 GB (+7.5 KV) |
| FP16 16.0 bpw | FP16 | 64.05 GB | 65.61 GB (+0.75 KV) | 66.36 GB (+1.5 KV) | 67.86 GB (+3.0 KV) | 68.61 GB (+3.75 KV) |
| FP16 16.0 bpw | Q8_0 | 64.05 GB | 65.27 GB (+0.41 KV) | 65.68 GB (+0.83 KV) | 66.51 GB (+1.65 KV) | 66.92 GB (+2.06 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 64.05 GB | 65.23 GB (+0.38 KV) | 65.61 GB (+0.75 KV) | 66.36 GB (+1.5 KV) | 66.73 GB (+1.88 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 64.05 GB | 65.08 GB (+0.22 KV) | 65.31 GB (+0.45 KV) | 65.76 GB (+0.9 KV) | 65.98 GB (+1.12 KV) |
| Q8_0 8.0 bpw | FP32 | 32.02 GB | 34.33 GB (+1.5 KV) | 35.83 GB (+3.0 KV) | 38.83 GB (+6.0 KV) | 40.33 GB (+7.5 KV) |
| Q8_0 8.0 bpw | FP16 | 32.02 GB | 33.58 GB (+0.75 KV) | 34.33 GB (+1.5 KV) | 35.83 GB (+3.0 KV) | 36.58 GB (+3.75 KV) |
| Q8_0 8.0 bpw | Q8_0 | 32.02 GB | 33.24 GB (+0.41 KV) | 33.66 GB (+0.83 KV) | 34.48 GB (+1.65 KV) | 34.89 GB (+2.06 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 32.02 GB | 33.20 GB (+0.38 KV) | 33.58 GB (+0.75 KV) | 34.33 GB (+1.5 KV) | 34.70 GB (+1.88 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 32.02 GB | 33.05 GB (+0.22 KV) | 33.28 GB (+0.45 KV) | 33.73 GB (+0.9 KV) | 33.95 GB (+1.12 KV) |
| Q4_K_M 4.65 bpw | FP32 | 18.61 GB | 20.92 GB (+1.5 KV) | 22.42 GB (+3.0 KV) | 25.42 GB (+6.0 KV) | 26.92 GB (+7.5 KV) |
| Q4_K_M 4.65 bpw | FP16 | 18.61 GB | 20.17 GB (+0.75 KV) | 20.92 GB (+1.5 KV) | 22.42 GB (+3.0 KV) | 23.17 GB (+3.75 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 18.61 GB | 19.83 GB (+0.41 KV) | 20.24 GB (+0.83 KV) | 21.07 GB (+1.65 KV) | 21.48 GB (+2.06 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 18.61 GB | 19.79 GB (+0.38 KV) | 20.17 GB (+0.75 KV) | 20.92 GB (+1.5 KV) | 21.29 GB (+1.88 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 18.61 GB | 19.64 GB (+0.22 KV) | 19.87 GB (+0.45 KV) | 20.32 GB (+0.9 KV) | 20.54 GB (+1.12 KV) |
| Q4_K_S 4.58 bpw | FP32 | 18.33 GB | 20.64 GB (+1.5 KV) | 22.14 GB (+3.0 KV) | 25.14 GB (+6.0 KV) | 26.64 GB (+7.5 KV) |
| Q4_K_S 4.58 bpw | FP16 | 18.33 GB | 19.89 GB (+0.75 KV) | 20.64 GB (+1.5 KV) | 22.14 GB (+3.0 KV) | 22.89 GB (+3.75 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 18.33 GB | 19.55 GB (+0.41 KV) | 19.96 GB (+0.83 KV) | 20.79 GB (+1.65 KV) | 21.20 GB (+2.06 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 18.33 GB | 19.51 GB (+0.38 KV) | 19.89 GB (+0.75 KV) | 20.64 GB (+1.5 KV) | 21.01 GB (+1.88 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 18.33 GB | 19.36 GB (+0.22 KV) | 19.59 GB (+0.45 KV) | 20.04 GB (+0.9 KV) | 20.26 GB (+1.12 KV) |
| Q3_K_M 3.91 bpw | FP32 | 15.65 GB | 17.96 GB (+1.5 KV) | 19.46 GB (+3.0 KV) | 22.46 GB (+6.0 KV) | 23.96 GB (+7.5 KV) |
| Q3_K_M 3.91 bpw | FP16 | 15.65 GB | 17.21 GB (+0.75 KV) | 17.96 GB (+1.5 KV) | 19.46 GB (+3.0 KV) | 20.21 GB (+3.75 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 15.65 GB | 16.87 GB (+0.41 KV) | 17.28 GB (+0.83 KV) | 18.11 GB (+1.65 KV) | 18.52 GB (+2.06 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 15.65 GB | 16.83 GB (+0.38 KV) | 17.21 GB (+0.75 KV) | 17.96 GB (+1.5 KV) | 18.33 GB (+1.88 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 15.65 GB | 16.68 GB (+0.22 KV) | 16.91 GB (+0.45 KV) | 17.36 GB (+0.9 KV) | 17.58 GB (+1.12 KV) |
| Q2_K 2.63 bpw | FP32 | 10.53 GB | 12.83 GB (+1.5 KV) | 14.33 GB (+3.0 KV) | 17.33 GB (+6.0 KV) | 18.83 GB (+7.5 KV) |
| Q2_K 2.63 bpw | FP16 | 10.53 GB | 12.08 GB (+0.75 KV) | 12.83 GB (+1.5 KV) | 14.33 GB (+3.0 KV) | 15.08 GB (+3.75 KV) |
| Q2_K 2.63 bpw | Q8_0 | 10.53 GB | 11.75 GB (+0.41 KV) | 12.16 GB (+0.83 KV) | 12.98 GB (+1.65 KV) | 13.40 GB (+2.06 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 10.53 GB | 11.71 GB (+0.38 KV) | 12.08 GB (+0.75 KV) | 12.83 GB (+1.5 KV) | 13.21 GB (+1.88 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 10.53 GB | 11.56 GB (+0.22 KV) | 11.78 GB (+0.45 KV) | 12.23 GB (+0.9 KV) | 12.46 GB (+1.12 KV) |
Total VRAM = Model Weights + KV Cache + 0.8 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
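The formula above can be sketched in a few lines of Python for checking a configuration by hand. This is a rough illustration, not the method behind the table: the function names are hypothetical, the per-format KV ratios are inferred from the table's KV figures rather than taken from any engine's documentation, "8K" is assumed to mean 8192 tokens, and the KV cache is assumed to grow linearly with context length.

```python
# KV cache size relative to FP16. ASSUMPTION: these ratios are inferred from
# the KV figures in the table, not from any inference engine's documentation.
KV_RATIO_VS_FP16 = {"FP32": 2.0, "FP16": 1.0, "Q8_0": 0.55, "FP8": 0.5, "Q4_0": 0.3}

FP16_KV_GB_AT_8K = 0.75  # from the table: FP16 cache at 8K context
OVERHEAD_GB = 0.8        # from the table: CUDA context + activations
TOKENS_8K = 8192         # ASSUMPTION: "8K" means 8192 tokens


def kv_cache_gb(context_tokens: int, cache_format: str) -> float:
    """Estimate KV cache size, assuming linear growth with context length."""
    scale = context_tokens / TOKENS_8K
    return FP16_KV_GB_AT_8K * scale * KV_RATIO_VS_FP16[cache_format]


def total_vram_gb(weights_gb: float, context_tokens: int, cache_format: str) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb + kv_cache_gb(context_tokens, cache_format) + OVERHEAD_GB


def fits(weights_gb: float, context_tokens: int, cache_format: str,
         vram_gb: float, margin: float = 0.05) -> bool:
    """True if the estimate, padded by the 5% variance margin, fits in vram_gb."""
    estimate = total_vram_gb(weights_gb, context_tokens, cache_format)
    return estimate * (1 + margin) <= vram_gb


if __name__ == "__main__":
    # Q4_K_M weights (18.61 GB) at 32K context on a hypothetical 24 GB card.
    for fmt in ("FP32", "FP16", "Q8_0"):
        est = total_vram_gb(18.61, 32 * 1024, fmt)
        print(f"{fmt}: {est:.2f} GB, fits in 24 GB: {fits(18.61, 32 * 1024, fmt, 24.0)}")
```

Because the sketch starts from the rounded weight sizes in the table, its totals can differ from the tabulated ones by about 0.01 GB. Padding the estimate by the 5% margin before comparing it to the card's capacity is a conservative choice, so a configuration that passes by only a small amount is still worth verifying on real hardware.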
Use our calculator to see if this model fits your specific hardware configuration.