VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM, with the KV cache contribution in parentheses. Base overhead: 0.52 GB (CUDA context + activations).
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 4.83 GB | 7.1 GB (+1.75 KV) | 8.85 GB (+3.5 KV) | 12.35 GB (+7.0 KV) | 19.35 GB (+14.0 KV) | 33.35 GB (+28.0 KV) | 61.35 GB (+56.0 KV) |
| FP16 16.0 bpw | FP16 | 4.83 GB | 6.23 GB (+0.88 KV) | 7.1 GB (+1.75 KV) | 8.85 GB (+3.5 KV) | 12.35 GB (+7.0 KV) | 19.35 GB (+14.0 KV) | 33.35 GB (+28.0 KV) |
| FP16 16.0 bpw | Q8_0 | 4.83 GB | 5.83 GB (+0.48 KV) | 6.32 GB (+0.96 KV) | 7.28 GB (+1.93 KV) | 9.2 GB (+3.85 KV) | 13.05 GB (+7.7 KV) | 20.75 GB (+15.4 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 4.83 GB | 5.79 GB (+0.44 KV) | 6.23 GB (+0.88 KV) | 7.1 GB (+1.75 KV) | 8.85 GB (+3.5 KV) | 12.35 GB (+7.0 KV) | 19.35 GB (+14.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 4.83 GB | 5.62 GB (+0.26 KV) | 5.88 GB (+0.53 KV) | 6.4 GB (+1.05 KV) | 7.45 GB (+2.1 KV) | 9.55 GB (+4.2 KV) | 13.75 GB (+8.4 KV) |
| Q8_0 8.0 bpw | FP32 | 2.42 GB | 4.69 GB (+1.75 KV) | 6.44 GB (+3.5 KV) | 9.94 GB (+7.0 KV) | 16.94 GB (+14.0 KV) | 30.94 GB (+28.0 KV) | 58.94 GB (+56.0 KV) |
| Q8_0 8.0 bpw | FP16 | 2.42 GB | 3.81 GB (+0.88 KV) | 4.69 GB (+1.75 KV) | 6.44 GB (+3.5 KV) | 9.94 GB (+7.0 KV) | 16.94 GB (+14.0 KV) | 30.94 GB (+28.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 2.42 GB | 3.42 GB (+0.48 KV) | 3.9 GB (+0.96 KV) | 4.86 GB (+1.93 KV) | 6.79 GB (+3.85 KV) | 10.64 GB (+7.7 KV) | 18.34 GB (+15.4 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 2.42 GB | 3.38 GB (+0.44 KV) | 3.81 GB (+0.88 KV) | 4.69 GB (+1.75 KV) | 6.44 GB (+3.5 KV) | 9.94 GB (+7.0 KV) | 16.94 GB (+14.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 2.42 GB | 3.2 GB (+0.26 KV) | 3.46 GB (+0.53 KV) | 3.99 GB (+1.05 KV) | 5.04 GB (+2.1 KV) | 7.14 GB (+4.2 KV) | 11.34 GB (+8.4 KV) |
| Q4_K_M 4.65 bpw | FP32 | 1.4 GB | 3.68 GB (+1.75 KV) | 5.43 GB (+3.5 KV) | 8.93 GB (+7.0 KV) | 15.93 GB (+14.0 KV) | 29.93 GB (+28.0 KV) | 57.93 GB (+56.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 1.4 GB | 2.8 GB (+0.88 KV) | 3.68 GB (+1.75 KV) | 5.43 GB (+3.5 KV) | 8.93 GB (+7.0 KV) | 15.93 GB (+14.0 KV) | 29.93 GB (+28.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 1.4 GB | 2.41 GB (+0.48 KV) | 2.89 GB (+0.96 KV) | 3.85 GB (+1.93 KV) | 5.78 GB (+3.85 KV) | 9.63 GB (+7.7 KV) | 17.33 GB (+15.4 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 1.4 GB | 2.36 GB (+0.44 KV) | 2.8 GB (+0.88 KV) | 3.68 GB (+1.75 KV) | 5.43 GB (+3.5 KV) | 8.93 GB (+7.0 KV) | 15.93 GB (+14.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 1.4 GB | 2.19 GB (+0.26 KV) | 2.45 GB (+0.53 KV) | 2.98 GB (+1.05 KV) | 4.03 GB (+2.1 KV) | 6.13 GB (+4.2 KV) | 10.33 GB (+8.4 KV) |
| Q4_K_S 4.58 bpw | FP32 | 1.38 GB | 3.66 GB (+1.75 KV) | 5.41 GB (+3.5 KV) | 8.91 GB (+7.0 KV) | 15.91 GB (+14.0 KV) | 29.91 GB (+28.0 KV) | 57.91 GB (+56.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 1.38 GB | 2.78 GB (+0.88 KV) | 3.66 GB (+1.75 KV) | 5.41 GB (+3.5 KV) | 8.91 GB (+7.0 KV) | 15.91 GB (+14.0 KV) | 29.91 GB (+28.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 1.38 GB | 2.39 GB (+0.48 KV) | 2.87 GB (+0.96 KV) | 3.83 GB (+1.93 KV) | 5.76 GB (+3.85 KV) | 9.61 GB (+7.7 KV) | 17.31 GB (+15.4 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 1.38 GB | 2.34 GB (+0.44 KV) | 2.78 GB (+0.88 KV) | 3.66 GB (+1.75 KV) | 5.41 GB (+3.5 KV) | 8.91 GB (+7.0 KV) | 15.91 GB (+14.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 1.38 GB | 2.17 GB (+0.26 KV) | 2.43 GB (+0.53 KV) | 2.96 GB (+1.05 KV) | 4.01 GB (+2.1 KV) | 6.11 GB (+4.2 KV) | 10.31 GB (+8.4 KV) |
| Q3_K_M 3.91 bpw | FP32 | 1.18 GB | 3.45 GB (+1.75 KV) | 5.2 GB (+3.5 KV) | 8.7 GB (+7.0 KV) | 15.7 GB (+14.0 KV) | 29.7 GB (+28.0 KV) | 57.7 GB (+56.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 1.18 GB | 2.58 GB (+0.88 KV) | 3.45 GB (+1.75 KV) | 5.2 GB (+3.5 KV) | 8.7 GB (+7.0 KV) | 15.7 GB (+14.0 KV) | 29.7 GB (+28.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 1.18 GB | 2.18 GB (+0.48 KV) | 2.67 GB (+0.96 KV) | 3.63 GB (+1.93 KV) | 5.55 GB (+3.85 KV) | 9.4 GB (+7.7 KV) | 17.1 GB (+15.4 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 1.18 GB | 2.14 GB (+0.44 KV) | 2.58 GB (+0.88 KV) | 3.45 GB (+1.75 KV) | 5.2 GB (+3.5 KV) | 8.7 GB (+7.0 KV) | 15.7 GB (+14.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 1.18 GB | 1.97 GB (+0.26 KV) | 2.23 GB (+0.53 KV) | 2.75 GB (+1.05 KV) | 3.8 GB (+2.1 KV) | 5.9 GB (+4.2 KV) | 10.1 GB (+8.4 KV) |
| Q2_K 2.63 bpw | FP32 | 0.79 GB | 3.07 GB (+1.75 KV) | 4.82 GB (+3.5 KV) | 8.32 GB (+7.0 KV) | 15.32 GB (+14.0 KV) | 29.32 GB (+28.0 KV) | 57.32 GB (+56.0 KV) |
| Q2_K 2.63 bpw | FP16 | 0.79 GB | 2.19 GB (+0.88 KV) | 3.07 GB (+1.75 KV) | 4.82 GB (+3.5 KV) | 8.32 GB (+7.0 KV) | 15.32 GB (+14.0 KV) | 29.32 GB (+28.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 0.79 GB | 1.8 GB (+0.48 KV) | 2.28 GB (+0.96 KV) | 3.24 GB (+1.93 KV) | 5.17 GB (+3.85 KV) | 9.02 GB (+7.7 KV) | 16.72 GB (+15.4 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 0.79 GB | 1.75 GB (+0.44 KV) | 2.19 GB (+0.88 KV) | 3.07 GB (+1.75 KV) | 4.82 GB (+3.5 KV) | 8.32 GB (+7.0 KV) | 15.32 GB (+14.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 0.79 GB | 1.58 GB (+0.26 KV) | 1.84 GB (+0.53 KV) | 2.37 GB (+1.05 KV) | 3.42 GB (+2.1 KV) | 5.52 GB (+4.2 KV) | 9.72 GB (+8.4 KV) |
Total VRAM = Model Weights + KV Cache + 0.52 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
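The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the tool behind the table: the function names `total_vram_gb` and `fits` and the GPU sizes in the usage lines are hypothetical, and the fit check assumes the pessimistic end of the ±5% range.

```python
OVERHEAD_GB = 0.52   # base overhead from the text (CUDA context + activations)
TOLERANCE = 0.05     # the ±5% variation quoted in the text


def total_vram_gb(weights_gb: float, kv_gb: float,
                  overhead_gb: float = OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + overhead, all in GB."""
    return weights_gb + kv_gb + overhead_gb


def fits(total_gb: float, gpu_vram_gb: float,
         tolerance: float = TOLERANCE) -> bool:
    """Assumption: be conservative and require the +5% upper bound to fit."""
    return total_gb * (1 + tolerance) <= gpu_vram_gb


# Row "Q8_0 8.0 bpw / FP16 cache" at 32K context: 2.42 GB weights, 3.5 GB KV.
total = total_vram_gb(2.42, 3.5)
print(round(total, 2))      # matches the 6.44 GB table entry
print(fits(total, 8.0))     # hypothetical 8 GB card
print(fits(total, 6.0))     # hypothetical 6 GB card
```

Summing the rounded table entries can land about 0.01 GB away from the listed total in some cells (for example 1.4 + 1.75 + 0.52 versus the listed 3.68 GB), presumably because the table was computed from unrounded values.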