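Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context-length cell shows the total, with the KV cache's share in parentheses (in GB). All totals include a fixed 0.64 GB base overhead (CUDA context + activations). Cache formats marked (Exp) are experimental.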
| Weight Quantization | KV Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 29.4 GB | 32.54 GB (+2.5 KV) | 35.04 GB (+5.0 KV) | 40.04 GB (+10.0 KV) | 50.04 GB (+20.0 KV) | 70.04 GB (+40.0 KV) | 110.04 GB (+80.0 KV) |
| FP16 16.0 bpw | FP16 | 29.4 GB | 31.29 GB (+1.25 KV) | 32.54 GB (+2.5 KV) | 35.04 GB (+5.0 KV) | 40.04 GB (+10.0 KV) | 50.04 GB (+20.0 KV) | 70.04 GB (+40.0 KV) |
| FP16 16.0 bpw | Q8_0 | 29.4 GB | 30.73 GB (+0.69 KV) | 31.42 GB (+1.38 KV) | 32.79 GB (+2.75 KV) | 35.54 GB (+5.5 KV) | 41.04 GB (+11.0 KV) | 52.04 GB (+22.0 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 29.4 GB | 30.67 GB (+0.62 KV) | 31.29 GB (+1.25 KV) | 32.54 GB (+2.5 KV) | 35.04 GB (+5.0 KV) | 40.04 GB (+10.0 KV) | 50.04 GB (+20.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 29.4 GB | 30.42 GB (+0.38 KV) | 30.79 GB (+0.75 KV) | 31.54 GB (+1.5 KV) | 33.04 GB (+3.0 KV) | 36.04 GB (+6.0 KV) | 42.04 GB (+12.0 KV) |
| Q8_0 8.0 bpw | FP32 | 14.7 GB | 17.84 GB (+2.5 KV) | 20.34 GB (+5.0 KV) | 25.34 GB (+10.0 KV) | 35.34 GB (+20.0 KV) | 55.34 GB (+40.0 KV) | 95.34 GB (+80.0 KV) |
| Q8_0 8.0 bpw | FP16 | 14.7 GB | 16.59 GB (+1.25 KV) | 17.84 GB (+2.5 KV) | 20.34 GB (+5.0 KV) | 25.34 GB (+10.0 KV) | 35.34 GB (+20.0 KV) | 55.34 GB (+40.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 14.7 GB | 16.03 GB (+0.69 KV) | 16.72 GB (+1.38 KV) | 18.09 GB (+2.75 KV) | 20.84 GB (+5.5 KV) | 26.34 GB (+11.0 KV) | 37.34 GB (+22.0 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 14.7 GB | 15.97 GB (+0.62 KV) | 16.59 GB (+1.25 KV) | 17.84 GB (+2.5 KV) | 20.34 GB (+5.0 KV) | 25.34 GB (+10.0 KV) | 35.34 GB (+20.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 14.7 GB | 15.72 GB (+0.38 KV) | 16.09 GB (+0.75 KV) | 16.84 GB (+1.5 KV) | 18.34 GB (+3.0 KV) | 21.34 GB (+6.0 KV) | 27.34 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | FP32 | 8.54 GB | 11.68 GB (+2.5 KV) | 14.18 GB (+5.0 KV) | 19.18 GB (+10.0 KV) | 29.18 GB (+20.0 KV) | 49.18 GB (+40.0 KV) | 89.18 GB (+80.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 8.54 GB | 10.43 GB (+1.25 KV) | 11.68 GB (+2.5 KV) | 14.18 GB (+5.0 KV) | 19.18 GB (+10.0 KV) | 29.18 GB (+20.0 KV) | 49.18 GB (+40.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 8.54 GB | 9.87 GB (+0.69 KV) | 10.56 GB (+1.38 KV) | 11.93 GB (+2.75 KV) | 14.68 GB (+5.5 KV) | 20.18 GB (+11.0 KV) | 31.18 GB (+22.0 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 8.54 GB | 9.81 GB (+0.62 KV) | 10.43 GB (+1.25 KV) | 11.68 GB (+2.5 KV) | 14.18 GB (+5.0 KV) | 19.18 GB (+10.0 KV) | 29.18 GB (+20.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 8.54 GB | 9.56 GB (+0.38 KV) | 9.93 GB (+0.75 KV) | 10.68 GB (+1.5 KV) | 12.18 GB (+3.0 KV) | 15.18 GB (+6.0 KV) | 21.18 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | FP32 | 8.42 GB | 11.56 GB (+2.5 KV) | 14.06 GB (+5.0 KV) | 19.06 GB (+10.0 KV) | 29.06 GB (+20.0 KV) | 49.06 GB (+40.0 KV) | 89.06 GB (+80.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 8.42 GB | 10.31 GB (+1.25 KV) | 11.56 GB (+2.5 KV) | 14.06 GB (+5.0 KV) | 19.06 GB (+10.0 KV) | 29.06 GB (+20.0 KV) | 49.06 GB (+40.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 8.42 GB | 9.74 GB (+0.69 KV) | 10.43 GB (+1.38 KV) | 11.81 GB (+2.75 KV) | 14.56 GB (+5.5 KV) | 20.06 GB (+11.0 KV) | 31.06 GB (+22.0 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 8.42 GB | 9.68 GB (+0.62 KV) | 10.31 GB (+1.25 KV) | 11.56 GB (+2.5 KV) | 14.06 GB (+5.0 KV) | 19.06 GB (+10.0 KV) | 29.06 GB (+20.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 8.42 GB | 9.43 GB (+0.38 KV) | 9.81 GB (+0.75 KV) | 10.56 GB (+1.5 KV) | 12.06 GB (+3.0 KV) | 15.06 GB (+6.0 KV) | 21.06 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | FP32 | 7.18 GB | 10.32 GB (+2.5 KV) | 12.82 GB (+5.0 KV) | 17.82 GB (+10.0 KV) | 27.82 GB (+20.0 KV) | 47.82 GB (+40.0 KV) | 87.82 GB (+80.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 7.18 GB | 9.07 GB (+1.25 KV) | 10.32 GB (+2.5 KV) | 12.82 GB (+5.0 KV) | 17.82 GB (+10.0 KV) | 27.82 GB (+20.0 KV) | 47.82 GB (+40.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 7.18 GB | 8.51 GB (+0.69 KV) | 9.2 GB (+1.38 KV) | 10.57 GB (+2.75 KV) | 13.32 GB (+5.5 KV) | 18.82 GB (+11.0 KV) | 29.82 GB (+22.0 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 7.18 GB | 8.45 GB (+0.62 KV) | 9.07 GB (+1.25 KV) | 10.32 GB (+2.5 KV) | 12.82 GB (+5.0 KV) | 17.82 GB (+10.0 KV) | 27.82 GB (+20.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 7.18 GB | 8.2 GB (+0.38 KV) | 8.57 GB (+0.75 KV) | 9.32 GB (+1.5 KV) | 10.82 GB (+3.0 KV) | 13.82 GB (+6.0 KV) | 19.82 GB (+12.0 KV) |
| Q2_K 2.63 bpw | FP32 | 4.83 GB | 7.97 GB (+2.5 KV) | 10.47 GB (+5.0 KV) | 15.47 GB (+10.0 KV) | 25.47 GB (+20.0 KV) | 45.47 GB (+40.0 KV) | 85.47 GB (+80.0 KV) |
| Q2_K 2.63 bpw | FP16 | 4.83 GB | 6.72 GB (+1.25 KV) | 7.97 GB (+2.5 KV) | 10.47 GB (+5.0 KV) | 15.47 GB (+10.0 KV) | 25.47 GB (+20.0 KV) | 45.47 GB (+40.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 4.83 GB | 6.16 GB (+0.69 KV) | 6.85 GB (+1.38 KV) | 8.22 GB (+2.75 KV) | 10.97 GB (+5.5 KV) | 16.47 GB (+11.0 KV) | 27.47 GB (+22.0 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 4.83 GB | 6.1 GB (+0.62 KV) | 6.72 GB (+1.25 KV) | 7.97 GB (+2.5 KV) | 10.47 GB (+5.0 KV) | 15.47 GB (+10.0 KV) | 25.47 GB (+20.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 4.83 GB | 5.85 GB (+0.38 KV) | 6.22 GB (+0.75 KV) | 6.97 GB (+1.5 KV) | 8.47 GB (+3.0 KV) | 11.47 GB (+6.0 KV) | 17.47 GB (+12.0 KV) |
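Total VRAM = Model Weights + KV Cache + 0.64 GB overhead. The KV cache grows linearly with context length, so each doubling of context doubles the KV term. Displayed values are rounded to two decimals, so a total can differ by 0.01 GB from the sum of the displayed terms. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.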
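The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the names `WEIGHTS_GB`, `KV_GB_AT_32K`, and `total_vram_gb` are made up for this example, the constants are copied from the table, and the 65K/131K/262K columns are assumed to mean 65,536, 131,072, and 262,144 tokens (which matches the KV term doubling from column to column).

```python
# Model weight sizes in GB, copied from the "Model Weights" column.
WEIGHTS_GB = {
    "FP16": 29.4,
    "Q8_0": 14.7,
    "Q4_K_M": 8.54,
    "Q4_K_S": 8.42,
    "Q3_K_M": 7.18,
    "Q2_K": 4.83,
}

# KV cache size in GB at 32K context, copied from the "32K Context" column.
# The 32K column is used as the reference because its values are not rounded.
KV_GB_AT_32K = {
    "FP32": 10.0,
    "FP16": 5.0,
    "Q8_0": 2.75,
    "FP8": 2.5,
    "Q4_0": 1.5,
}

OVERHEAD_GB = 0.64  # CUDA context + activations, per the table's footnote


def total_vram_gb(quant: str, cache_format: str, context_tokens: int) -> float:
    """Estimate total VRAM as weights + KV cache + fixed overhead."""
    # The KV cache scales linearly with the number of tokens in context.
    kv_gb = KV_GB_AT_32K[cache_format] * context_tokens / 32768
    return WEIGHTS_GB[quant] + kv_gb + OVERHEAD_GB


if __name__ == "__main__":
    for tokens in (8192, 32768, 131072):
        total = total_vram_gb("Q4_K_M", "FP16", tokens)
        print(f"Q4_K_M weights, FP16 cache, {tokens} tokens: {total:.2f} GB")
```

Because the weight sizes in the table are themselves rounded, a result from this sketch can differ from the corresponding table cell by about 0.01 GB.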
Use our calculator to see if this model fits your specific hardware configuration.
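For a quick offline check, the same arithmetic can be turned around to ask which context length fits a given VRAM budget. The sketch below is an assumption-laden illustration rather than the calculator itself: `max_context_that_fits` and the constant names are hypothetical, the figures are copied from the table, and the 5% safety margin simply applies the pessimistic end of the ±5% variation noted above.

```python
# Figures copied from the table (GB). Names are illustrative only.
WEIGHTS_GB = {"FP16": 29.4, "Q8_0": 14.7, "Q4_K_M": 8.54,
              "Q4_K_S": 8.42, "Q3_K_M": 7.18, "Q2_K": 4.83}
KV_GB_AT_32K = {"FP32": 10.0, "FP16": 5.0, "Q8_0": 2.75,
                "FP8": 2.5, "Q4_0": 1.5}
OVERHEAD_GB = 0.64

# Context lengths assumed to correspond to the 8K ... 262K columns.
CONTEXTS = [8192, 16384, 32768, 65536, 131072, 262144]

# Assume real usage may land 5% above the estimate.
MARGIN = 1.05


def max_context_that_fits(vram_gb, quant, cache_format):
    """Return the largest tabulated context (in tokens) that fits, or None."""
    best = None
    for tokens in CONTEXTS:
        kv_gb = KV_GB_AT_32K[cache_format] * tokens / 32768
        total = WEIGHTS_GB[quant] + kv_gb + OVERHEAD_GB
        if total * MARGIN <= vram_gb:
            best = tokens
    return best


if __name__ == "__main__":
    # Example budget of 24 GB; substitute your own card's VRAM.
    for quant in WEIGHTS_GB:
        print(quant, max_context_that_fits(24.0, quant, "FP16"))
```

A return value of `None` means that even the 8K configuration exceeds the budget once the margin is applied, as happens for FP16 weights on a 24 GB budget.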