VRAM usage for all quantization and cache format combinations. Base overhead: 0.58 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 16.8 GB | 19.38 GB (+2.0 KV) | 21.38 GB (+4.0 KV) | 25.38 GB (+8.0 KV) | 33.38 GB (+16.0 KV) | 49.38 GB (+32.0 KV) |
| FP16 16.0 bpw | FP16 | 16.8 GB | 18.38 GB (+1.0 KV) | 19.38 GB (+2.0 KV) | 21.38 GB (+4.0 KV) | 25.38 GB (+8.0 KV) | 33.38 GB (+16.0 KV) |
| FP16 16.0 bpw | Q8_0 | 16.8 GB | 17.93 GB (+0.55 KV) | 18.48 GB (+1.1 KV) | 19.58 GB (+2.2 KV) | 21.78 GB (+4.4 KV) | 26.18 GB (+8.8 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 16.8 GB | 17.88 GB (+0.5 KV) | 18.38 GB (+1.0 KV) | 19.38 GB (+2.0 KV) | 21.38 GB (+4.0 KV) | 25.38 GB (+8.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 16.8 GB | 17.68 GB (+0.3 KV) | 17.98 GB (+0.6 KV) | 18.58 GB (+1.2 KV) | 19.78 GB (+2.4 KV) | 22.18 GB (+4.8 KV) |
| Q8_0 8.0 bpw | FP32 | 8.4 GB | 10.98 GB (+2.0 KV) | 12.98 GB (+4.0 KV) | 16.98 GB (+8.0 KV) | 24.98 GB (+16.0 KV) | 40.98 GB (+32.0 KV) |
| Q8_0 8.0 bpw | FP16 | 8.4 GB | 9.98 GB (+1.0 KV) | 10.98 GB (+2.0 KV) | 12.98 GB (+4.0 KV) | 16.98 GB (+8.0 KV) | 24.98 GB (+16.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 8.4 GB | 9.53 GB (+0.55 KV) | 10.08 GB (+1.1 KV) | 11.18 GB (+2.2 KV) | 13.38 GB (+4.4 KV) | 17.78 GB (+8.8 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 8.4 GB | 9.48 GB (+0.5 KV) | 9.98 GB (+1.0 KV) | 10.98 GB (+2.0 KV) | 12.98 GB (+4.0 KV) | 16.98 GB (+8.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 8.4 GB | 9.28 GB (+0.3 KV) | 9.58 GB (+0.6 KV) | 10.18 GB (+1.2 KV) | 11.38 GB (+2.4 KV) | 13.78 GB (+4.8 KV) |
| Q4_K_M 4.65 bpw | FP32 | 4.88 GB | 7.46 GB (+2.0 KV) | 9.46 GB (+4.0 KV) | 13.46 GB (+8.0 KV) | 21.46 GB (+16.0 KV) | 37.46 GB (+32.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 4.88 GB | 6.46 GB (+1.0 KV) | 7.46 GB (+2.0 KV) | 9.46 GB (+4.0 KV) | 13.46 GB (+8.0 KV) | 21.46 GB (+16.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 4.88 GB | 6.01 GB (+0.55 KV) | 6.56 GB (+1.1 KV) | 7.66 GB (+2.2 KV) | 9.86 GB (+4.4 KV) | 14.26 GB (+8.8 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 4.88 GB | 5.96 GB (+0.5 KV) | 6.46 GB (+1.0 KV) | 7.46 GB (+2.0 KV) | 9.46 GB (+4.0 KV) | 13.46 GB (+8.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 4.88 GB | 5.76 GB (+0.3 KV) | 6.06 GB (+0.6 KV) | 6.66 GB (+1.2 KV) | 7.86 GB (+2.4 KV) | 10.26 GB (+4.8 KV) |
| Q4_K_S 4.58 bpw | FP32 | 4.81 GB | 7.39 GB (+2.0 KV) | 9.39 GB (+4.0 KV) | 13.39 GB (+8.0 KV) | 21.39 GB (+16.0 KV) | 37.39 GB (+32.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 4.81 GB | 6.39 GB (+1.0 KV) | 7.39 GB (+2.0 KV) | 9.39 GB (+4.0 KV) | 13.39 GB (+8.0 KV) | 21.39 GB (+16.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 4.81 GB | 5.94 GB (+0.55 KV) | 6.49 GB (+1.1 KV) | 7.59 GB (+2.2 KV) | 9.79 GB (+4.4 KV) | 14.19 GB (+8.8 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 4.81 GB | 5.89 GB (+0.5 KV) | 6.39 GB (+1.0 KV) | 7.39 GB (+2.0 KV) | 9.39 GB (+4.0 KV) | 13.39 GB (+8.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 4.81 GB | 5.69 GB (+0.3 KV) | 5.99 GB (+0.6 KV) | 6.59 GB (+1.2 KV) | 7.79 GB (+2.4 KV) | 10.19 GB (+4.8 KV) |
| Q3_K_M 3.91 bpw | FP32 | 4.11 GB | 6.69 GB (+2.0 KV) | 8.69 GB (+4.0 KV) | 12.69 GB (+8.0 KV) | 20.69 GB (+16.0 KV) | 36.69 GB (+32.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 4.11 GB | 5.69 GB (+1.0 KV) | 6.69 GB (+2.0 KV) | 8.69 GB (+4.0 KV) | 12.69 GB (+8.0 KV) | 20.69 GB (+16.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 4.11 GB | 5.24 GB (+0.55 KV) | 5.79 GB (+1.1 KV) | 6.89 GB (+2.2 KV) | 9.09 GB (+4.4 KV) | 13.49 GB (+8.8 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 4.11 GB | 5.19 GB (+0.5 KV) | 5.69 GB (+1.0 KV) | 6.69 GB (+2.0 KV) | 8.69 GB (+4.0 KV) | 12.69 GB (+8.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 4.11 GB | 4.99 GB (+0.3 KV) | 5.29 GB (+0.6 KV) | 5.89 GB (+1.2 KV) | 7.09 GB (+2.4 KV) | 9.49 GB (+4.8 KV) |
| Q2_K 2.63 bpw | FP32 | 2.76 GB | 5.34 GB (+2.0 KV) | 7.34 GB (+4.0 KV) | 11.34 GB (+8.0 KV) | 19.34 GB (+16.0 KV) | 35.34 GB (+32.0 KV) |
| Q2_K 2.63 bpw | FP16 | 2.76 GB | 4.34 GB (+1.0 KV) | 5.34 GB (+2.0 KV) | 7.34 GB (+4.0 KV) | 11.34 GB (+8.0 KV) | 19.34 GB (+16.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 2.76 GB | 3.89 GB (+0.55 KV) | 4.44 GB (+1.1 KV) | 5.54 GB (+2.2 KV) | 7.74 GB (+4.4 KV) | 12.14 GB (+8.8 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 2.76 GB | 3.84 GB (+0.5 KV) | 4.34 GB (+1.0 KV) | 5.34 GB (+2.0 KV) | 7.34 GB (+4.0 KV) | 11.34 GB (+8.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 2.76 GB | 3.64 GB (+0.3 KV) | 3.94 GB (+0.6 KV) | 4.54 GB (+1.2 KV) | 5.74 GB (+2.4 KV) | 8.14 GB (+4.8 KV) |

Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
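
The formula above can be sketched in a few lines of Python. This is only an illustration, not the calculator's actual implementation. It assumes the KV cache grows linearly with context length (which matches the table, where each doubling of context doubles the KV figure) and that the "8K" column means 8,192 tokens. The names `total_vram_gb`, `fits`, and `KV_GB_AT_8K` are hypothetical, and the 5% safety margin in `fits` is borrowed from the ±5% variance note as an assumption.

```python
# Sketch of the sizing formula: weights + KV cache + fixed overhead.
OVERHEAD_GB = 0.58  # base overhead quoted in this section

# KV cache cost in GB at 8,192 tokens, copied from the "8K Context" column.
KV_GB_AT_8K = {
    "FP32": 2.0,
    "FP16": 1.0,
    "Q8_0": 0.55,
    "FP8": 0.5,
    "Q4_0": 0.3,
}


def total_vram_gb(weights_gb: float, cache_format: str, context_tokens: int) -> float:
    """Estimate total VRAM in GB, assuming KV cache grows linearly with context."""
    kv_gb = KV_GB_AT_8K[cache_format] * context_tokens / 8192
    return round(weights_gb + kv_gb + OVERHEAD_GB, 2)


def fits(total_gb: float, gpu_vram_gb: float, margin: float = 0.05) -> bool:
    """Hypothetical helper: True if the estimate plus a safety margin fits."""
    return total_gb * (1 + margin) <= gpu_vram_gb


# Q4_K_M weights (4.88 GB) with a Q8_0 cache at 32,768 tokens of context.
estimate = total_vram_gb(4.88, "Q8_0", 32768)
print(estimate, fits(estimate, 8.0))
```

With these inputs the estimate matches the 7.66 GB cell in the Q4_K_M / Q8_0 row. Under the assumed 5% margin, that configuration would be treated as not fitting on an 8 GB card, even though the raw estimate is below 8 GB.
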
Use our calculator to see if this model fits your specific hardware configuration.