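Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM in GB, with the KV cache's share in parentheses. Base overhead: 0.57 GB (CUDA context + activations).
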
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 14.7 GB | 17.27 GB (+2.0 KV) | 19.27 GB (+4.0 KV) | 23.27 GB (+8.0 KV) | 31.27 GB (+16.0 KV) | 47.27 GB (+32.0 KV) | 79.27 GB (+64.0 KV) |
| FP16 16.0 bpw | FP16 | 14.7 GB | 16.27 GB (+1.0 KV) | 17.27 GB (+2.0 KV) | 19.27 GB (+4.0 KV) | 23.27 GB (+8.0 KV) | 31.27 GB (+16.0 KV) | 47.27 GB (+32.0 KV) |
| FP16 16.0 bpw | Q8_0 | 14.7 GB | 15.82 GB (+0.55 KV) | 16.37 GB (+1.1 KV) | 17.47 GB (+2.2 KV) | 19.67 GB (+4.4 KV) | 24.07 GB (+8.8 KV) | 32.87 GB (+17.6 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 14.7 GB | 15.77 GB (+0.5 KV) | 16.27 GB (+1.0 KV) | 17.27 GB (+2.0 KV) | 19.27 GB (+4.0 KV) | 23.27 GB (+8.0 KV) | 31.27 GB (+16.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 14.7 GB | 15.57 GB (+0.3 KV) | 15.87 GB (+0.6 KV) | 16.47 GB (+1.2 KV) | 17.67 GB (+2.4 KV) | 20.07 GB (+4.8 KV) | 24.87 GB (+9.6 KV) |
| Q8_0 8.0 bpw | FP32 | 7.35 GB | 9.92 GB (+2.0 KV) | 11.92 GB (+4.0 KV) | 15.92 GB (+8.0 KV) | 23.92 GB (+16.0 KV) | 39.92 GB (+32.0 KV) | 71.92 GB (+64.0 KV) |
| Q8_0 8.0 bpw | FP16 | 7.35 GB | 8.92 GB (+1.0 KV) | 9.92 GB (+2.0 KV) | 11.92 GB (+4.0 KV) | 15.92 GB (+8.0 KV) | 23.92 GB (+16.0 KV) | 39.92 GB (+32.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 7.35 GB | 8.47 GB (+0.55 KV) | 9.02 GB (+1.1 KV) | 10.12 GB (+2.2 KV) | 12.32 GB (+4.4 KV) | 16.72 GB (+8.8 KV) | 25.52 GB (+17.6 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 7.35 GB | 8.42 GB (+0.5 KV) | 8.92 GB (+1.0 KV) | 9.92 GB (+2.0 KV) | 11.92 GB (+4.0 KV) | 15.92 GB (+8.0 KV) | 23.92 GB (+16.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 7.35 GB | 8.22 GB (+0.3 KV) | 8.52 GB (+0.6 KV) | 9.12 GB (+1.2 KV) | 10.32 GB (+2.4 KV) | 12.72 GB (+4.8 KV) | 17.52 GB (+9.6 KV) |
| Q4_K_M 4.65 bpw | FP32 | 4.27 GB | 6.84 GB (+2.0 KV) | 8.84 GB (+4.0 KV) | 12.84 GB (+8.0 KV) | 20.84 GB (+16.0 KV) | 36.84 GB (+32.0 KV) | 68.84 GB (+64.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 4.27 GB | 5.84 GB (+1.0 KV) | 6.84 GB (+2.0 KV) | 8.84 GB (+4.0 KV) | 12.84 GB (+8.0 KV) | 20.84 GB (+16.0 KV) | 36.84 GB (+32.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 4.27 GB | 5.39 GB (+0.55 KV) | 5.94 GB (+1.1 KV) | 7.04 GB (+2.2 KV) | 9.24 GB (+4.4 KV) | 13.64 GB (+8.8 KV) | 22.44 GB (+17.6 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 4.27 GB | 5.34 GB (+0.5 KV) | 5.84 GB (+1.0 KV) | 6.84 GB (+2.0 KV) | 8.84 GB (+4.0 KV) | 12.84 GB (+8.0 KV) | 20.84 GB (+16.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 4.27 GB | 5.14 GB (+0.3 KV) | 5.44 GB (+0.6 KV) | 6.04 GB (+1.2 KV) | 7.24 GB (+2.4 KV) | 9.64 GB (+4.8 KV) | 14.44 GB (+9.6 KV) |
| Q4_K_S 4.58 bpw | FP32 | 4.21 GB | 6.78 GB (+2.0 KV) | 8.78 GB (+4.0 KV) | 12.78 GB (+8.0 KV) | 20.78 GB (+16.0 KV) | 36.78 GB (+32.0 KV) | 68.78 GB (+64.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 4.21 GB | 5.78 GB (+1.0 KV) | 6.78 GB (+2.0 KV) | 8.78 GB (+4.0 KV) | 12.78 GB (+8.0 KV) | 20.78 GB (+16.0 KV) | 36.78 GB (+32.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 4.21 GB | 5.33 GB (+0.55 KV) | 5.88 GB (+1.1 KV) | 6.98 GB (+2.2 KV) | 9.18 GB (+4.4 KV) | 13.58 GB (+8.8 KV) | 22.38 GB (+17.6 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 4.21 GB | 5.28 GB (+0.5 KV) | 5.78 GB (+1.0 KV) | 6.78 GB (+2.0 KV) | 8.78 GB (+4.0 KV) | 12.78 GB (+8.0 KV) | 20.78 GB (+16.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 4.21 GB | 5.08 GB (+0.3 KV) | 5.38 GB (+0.6 KV) | 5.98 GB (+1.2 KV) | 7.18 GB (+2.4 KV) | 9.58 GB (+4.8 KV) | 14.38 GB (+9.6 KV) |
| Q3_K_M 3.91 bpw | FP32 | 3.59 GB | 6.16 GB (+2.0 KV) | 8.16 GB (+4.0 KV) | 12.16 GB (+8.0 KV) | 20.16 GB (+16.0 KV) | 36.16 GB (+32.0 KV) | 68.16 GB (+64.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 3.59 GB | 5.16 GB (+1.0 KV) | 6.16 GB (+2.0 KV) | 8.16 GB (+4.0 KV) | 12.16 GB (+8.0 KV) | 20.16 GB (+16.0 KV) | 36.16 GB (+32.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 3.59 GB | 4.71 GB (+0.55 KV) | 5.26 GB (+1.1 KV) | 6.36 GB (+2.2 KV) | 8.56 GB (+4.4 KV) | 12.96 GB (+8.8 KV) | 21.76 GB (+17.6 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 3.59 GB | 4.66 GB (+0.5 KV) | 5.16 GB (+1.0 KV) | 6.16 GB (+2.0 KV) | 8.16 GB (+4.0 KV) | 12.16 GB (+8.0 KV) | 20.16 GB (+16.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 3.59 GB | 4.46 GB (+0.3 KV) | 4.76 GB (+0.6 KV) | 5.36 GB (+1.2 KV) | 6.56 GB (+2.4 KV) | 8.96 GB (+4.8 KV) | 13.76 GB (+9.6 KV) |
| Q2_K 2.63 bpw | FP32 | 2.42 GB | 4.99 GB (+2.0 KV) | 6.99 GB (+4.0 KV) | 10.99 GB (+8.0 KV) | 18.99 GB (+16.0 KV) | 34.99 GB (+32.0 KV) | 66.99 GB (+64.0 KV) |
| Q2_K 2.63 bpw | FP16 | 2.42 GB | 3.99 GB (+1.0 KV) | 4.99 GB (+2.0 KV) | 6.99 GB (+4.0 KV) | 10.99 GB (+8.0 KV) | 18.99 GB (+16.0 KV) | 34.99 GB (+32.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 2.42 GB | 3.54 GB (+0.55 KV) | 4.09 GB (+1.1 KV) | 5.19 GB (+2.2 KV) | 7.39 GB (+4.4 KV) | 11.79 GB (+8.8 KV) | 20.59 GB (+17.6 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 2.42 GB | 3.49 GB (+0.5 KV) | 3.99 GB (+1.0 KV) | 4.99 GB (+2.0 KV) | 6.99 GB (+4.0 KV) | 10.99 GB (+8.0 KV) | 18.99 GB (+16.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 2.42 GB | 3.29 GB (+0.3 KV) | 3.59 GB (+0.6 KV) | 4.19 GB (+1.2 KV) | 5.39 GB (+2.4 KV) | 7.79 GB (+4.8 KV) | 12.59 GB (+9.6 KV) |

Total VRAM = Model Weights + KV Cache + 0.57 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
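
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to generate the table: the weight sizes and per-8K KV cache costs are copied from the table, while the linear scaling of the KV cache with context length is an assumption inferred from the way each KV figure doubles from one column to the next. The function names (`kv_cache_gb`, `total_vram_gb`, `fits`) and the 5% safety margin in `fits` are hypothetical and chosen only for this example.

```python
# Fixed overhead from the table notes (CUDA context + activations).
OVERHEAD_GB = 0.57

# Model weight sizes in GB, copied from the "Model Weights" column.
WEIGHTS_GB = {
    "FP16": 14.7,
    "Q8_0": 7.35,
    "Q4_K_M": 4.27,
    "Q4_K_S": 4.21,
    "Q3_K_M": 3.59,
    "Q2_K": 2.42,
}

# KV cache size in GB at 8K (8192 tokens) of context, copied from the
# "8K Context" column for each cache format.
KV_GB_PER_8K = {
    "FP32": 2.0,
    "FP16": 1.0,
    "Q8_0": 0.55,
    "FP8": 0.5,
    "Q4_0": 0.3,
}


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache size in GB, assuming it grows linearly with context length."""
    return KV_GB_PER_8K[cache_format] * context_tokens / 8192


def total_vram_gb(quant: str, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return WEIGHTS_GB[quant] + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


def fits(quant: str, cache_format: str, context_tokens: int,
         vram_gb: float, margin: float = 0.05) -> bool:
    """True if the estimate, padded by `margin` (default 5%), fits in `vram_gb`."""
    return total_vram_gb(quant, cache_format, context_tokens) * (1 + margin) <= vram_gb


# Q4_K_M weights with an FP16 cache at 32K context: 4.27 + 4.0 + 0.57
print(f"{total_vram_gb('Q4_K_M', 'FP16', 32768):.2f} GB")  # 8.84 GB
print(fits("Q4_K_M", "FP16", 32768, vram_gb=12.0))         # True
print(fits("FP16", "FP16", 32768, vram_gb=16.0))           # False
```

Padding the estimate by the stated ±5% before comparing it with the card's capacity is a conservative choice; set `margin=0` to compare against the raw table values.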