Estimated VRAM usage for every combination of weight quantization and KV cache format. Base overhead: 0.52 GB (CUDA context + activations).
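| Weight Quantization (bpw) | KV Cache Format | Model Weights | Total at 8K Context (+KV cache, GB) | Total at 16K Context (+KV cache, GB) | Total at 32K Context (+KV cache, GB) | Total at 40K Context (+KV cache, GB) |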
|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 3.57 GB | 5.84 GB (+1.75 KV) | 7.59 GB (+3.5 KV) | 11.09 GB (+7.0 KV) | 12.84 GB (+8.75 KV) |
| FP16 16.0 bpw | FP16 | 3.57 GB | 4.96 GB (+0.88 KV) | 5.84 GB (+1.75 KV) | 7.59 GB (+3.5 KV) | 8.46 GB (+4.38 KV) |
| FP16 16.0 bpw | Q8_0 | 3.57 GB | 4.57 GB (+0.48 KV) | 5.05 GB (+0.96 KV) | 6.01 GB (+1.93 KV) | 6.49 GB (+2.41 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 3.57 GB | 4.52 GB (+0.44 KV) | 4.96 GB (+0.88 KV) | 5.84 GB (+1.75 KV) | 6.27 GB (+2.19 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 3.57 GB | 4.35 GB (+0.26 KV) | 4.61 GB (+0.53 KV) | 5.14 GB (+1.05 KV) | 5.4 GB (+1.31 KV) |
| Q8_0 8.0 bpw | FP32 | 1.78 GB | 4.05 GB (+1.75 KV) | 5.8 GB (+3.5 KV) | 9.3 GB (+7.0 KV) | 11.05 GB (+8.75 KV) |
| Q8_0 8.0 bpw | FP16 | 1.78 GB | 3.18 GB (+0.88 KV) | 4.05 GB (+1.75 KV) | 5.8 GB (+3.5 KV) | 6.68 GB (+4.38 KV) |
| Q8_0 8.0 bpw | Q8_0 | 1.78 GB | 2.78 GB (+0.48 KV) | 3.26 GB (+0.96 KV) | 4.23 GB (+1.93 KV) | 4.71 GB (+2.41 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 1.78 GB | 2.74 GB (+0.44 KV) | 3.18 GB (+0.88 KV) | 4.05 GB (+1.75 KV) | 4.49 GB (+2.19 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 1.78 GB | 2.56 GB (+0.26 KV) | 2.83 GB (+0.53 KV) | 3.35 GB (+1.05 KV) | 3.61 GB (+1.31 KV) |
| Q4_K_M 4.65 bpw | FP32 | 1.04 GB | 3.3 GB (+1.75 KV) | 5.05 GB (+3.5 KV) | 8.55 GB (+7.0 KV) | 10.3 GB (+8.75 KV) |
| Q4_K_M 4.65 bpw | FP16 | 1.04 GB | 2.43 GB (+0.88 KV) | 3.3 GB (+1.75 KV) | 5.05 GB (+3.5 KV) | 5.93 GB (+4.38 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 1.04 GB | 2.04 GB (+0.48 KV) | 2.52 GB (+0.96 KV) | 3.48 GB (+1.93 KV) | 3.96 GB (+2.41 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 1.04 GB | 1.99 GB (+0.44 KV) | 2.43 GB (+0.88 KV) | 3.3 GB (+1.75 KV) | 3.74 GB (+2.19 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 1.04 GB | 1.82 GB (+0.26 KV) | 2.08 GB (+0.53 KV) | 2.6 GB (+1.05 KV) | 2.87 GB (+1.31 KV) |
| Q4_K_S 4.58 bpw | FP32 | 1.02 GB | 3.29 GB (+1.75 KV) | 5.04 GB (+3.5 KV) | 8.54 GB (+7.0 KV) | 10.29 GB (+8.75 KV) |
| Q4_K_S 4.58 bpw | FP16 | 1.02 GB | 2.41 GB (+0.88 KV) | 3.29 GB (+1.75 KV) | 5.04 GB (+3.5 KV) | 5.91 GB (+4.38 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 1.02 GB | 2.02 GB (+0.48 KV) | 2.5 GB (+0.96 KV) | 3.46 GB (+1.93 KV) | 3.95 GB (+2.41 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 1.02 GB | 1.98 GB (+0.44 KV) | 2.41 GB (+0.88 KV) | 3.29 GB (+1.75 KV) | 3.73 GB (+2.19 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 1.02 GB | 1.8 GB (+0.26 KV) | 2.06 GB (+0.53 KV) | 2.59 GB (+1.05 KV) | 2.85 GB (+1.31 KV) |
| Q3_K_M 3.91 bpw | FP32 | 0.87 GB | 3.14 GB (+1.75 KV) | 4.89 GB (+3.5 KV) | 8.39 GB (+7.0 KV) | 10.14 GB (+8.75 KV) |
| Q3_K_M 3.91 bpw | FP16 | 0.87 GB | 2.26 GB (+0.88 KV) | 3.14 GB (+1.75 KV) | 4.89 GB (+3.5 KV) | 5.76 GB (+4.38 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 0.87 GB | 1.87 GB (+0.48 KV) | 2.35 GB (+0.96 KV) | 3.31 GB (+1.93 KV) | 3.8 GB (+2.41 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 0.87 GB | 1.83 GB (+0.44 KV) | 2.26 GB (+0.88 KV) | 3.14 GB (+1.75 KV) | 3.58 GB (+2.19 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 0.87 GB | 1.65 GB (+0.26 KV) | 1.91 GB (+0.53 KV) | 2.44 GB (+1.05 KV) | 2.7 GB (+1.31 KV) |
| Q2_K 2.63 bpw | FP32 | 0.59 GB | 2.85 GB (+1.75 KV) | 4.6 GB (+3.5 KV) | 8.1 GB (+7.0 KV) | 9.85 GB (+8.75 KV) |
| Q2_K 2.63 bpw | FP16 | 0.59 GB | 1.98 GB (+0.88 KV) | 2.85 GB (+1.75 KV) | 4.6 GB (+3.5 KV) | 5.48 GB (+4.38 KV) |
| Q2_K 2.63 bpw | Q8_0 | 0.59 GB | 1.59 GB (+0.48 KV) | 2.07 GB (+0.96 KV) | 3.03 GB (+1.93 KV) | 3.51 GB (+2.41 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 0.59 GB | 1.54 GB (+0.44 KV) | 1.98 GB (+0.88 KV) | 2.85 GB (+1.75 KV) | 3.29 GB (+2.19 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 0.59 GB | 1.37 GB (+0.26 KV) | 1.63 GB (+0.53 KV) | 2.15 GB (+1.05 KV) | 2.42 GB (+1.31 KV) |
Total VRAM = Model Weights + KV Cache + 0.52 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
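The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual calculator: the function names are hypothetical, and the two scaling helpers assume what the table suggests, namely that weight size grows linearly with bits per weight and that KV cache size grows linearly with context length. Because the table appears to be computed from unrounded values, results can differ from the listed figures by about 0.01 GB.

```python
# Minimal sketch of the VRAM estimate described above.
# All function names are hypothetical; only the 0.52 GB overhead, the
# 3.57 GB FP16 weight size, and the +/-5% variance come from the source.

OVERHEAD_GB = 0.52  # base overhead: CUDA context + activations


def weights_gb(bpw: float, fp16_weights_gb: float = 3.57) -> float:
    """Scale the FP16 (16 bpw) weight size to another bits-per-weight value.

    Assumes linear scaling, which matches the table's Model Weights column.
    """
    return fp16_weights_gb * bpw / 16.0


def kv_cache_gb(context_k: float, kv_gb_at_8k: float) -> float:
    """Scale the KV cache size listed for 8K context to another context length.

    Assumes linear scaling with context, which matches the table's KV figures.
    """
    return kv_gb_at_8k * context_k / 8.0


def total_vram_gb(weights: float, kv: float, overhead: float = OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights + kv + overhead


def fits(total_gb: float, available_gb: float, margin: float = 0.05) -> bool:
    """Check whether the estimate fits, leaving headroom for the +/-5% variance."""
    return total_gb * (1.0 + margin) <= available_gb


if __name__ == "__main__":
    # FP16 weights with an FP32 cache at 8K context (first table row).
    total = total_vram_gb(3.57, 1.75)
    print(f"Estimated total: {total:.2f} GB")
    print("Fits in 8 GB:", fits(total, 8.0))
```

The `margin` parameter simply inflates the estimate by the stated ±5% before comparing it with the available memory, so a configuration that lands right at the card's capacity is reported as not fitting.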