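Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows the total VRAM, with the KV cache portion in parentheses. Every total includes a base overhead of 0.52 GB (CUDA context + activations).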
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 3.78 GB | 5.29 GB (+0.99 KV) | 6.28 GB (+1.98 KV) | 8.27 GB (+3.97 KV) | 12.24 GB (+7.94 KV) | 20.17 GB (+15.88 KV) | 36.05 GB (+31.75 KV) |
| FP16 16.0 bpw | FP16 | 3.78 GB | 4.79 GB (+0.5 KV) | 5.29 GB (+0.99 KV) | 6.28 GB (+1.98 KV) | 8.27 GB (+3.97 KV) | 12.24 GB (+7.94 KV) | 20.17 GB (+15.88 KV) |
| FP16 16.0 bpw | Q8_0 | 3.78 GB | 4.57 GB (+0.27 KV) | 4.84 GB (+0.55 KV) | 5.39 GB (+1.09 KV) | 6.48 GB (+2.18 KV) | 8.66 GB (+4.37 KV) | 13.03 GB (+8.73 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 3.78 GB | 4.55 GB (+0.25 KV) | 4.79 GB (+0.5 KV) | 5.29 GB (+0.99 KV) | 6.28 GB (+1.98 KV) | 8.27 GB (+3.97 KV) | 12.24 GB (+7.94 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 3.78 GB | 4.45 GB (+0.15 KV) | 4.6 GB (+0.3 KV) | 4.89 GB (+0.6 KV) | 5.49 GB (+1.19 KV) | 6.68 GB (+2.38 KV) | 9.06 GB (+4.76 KV) |
| Q8_0 8.0 bpw | FP32 | 1.89 GB | 3.4 GB (+0.99 KV) | 4.39 GB (+1.98 KV) | 6.38 GB (+3.97 KV) | 10.35 GB (+7.94 KV) | 18.28 GB (+15.88 KV) | 34.16 GB (+31.75 KV) |
| Q8_0 8.0 bpw | FP16 | 1.89 GB | 2.9 GB (+0.5 KV) | 3.4 GB (+0.99 KV) | 4.39 GB (+1.98 KV) | 6.38 GB (+3.97 KV) | 10.35 GB (+7.94 KV) | 18.28 GB (+15.88 KV) |
| Q8_0 8.0 bpw | Q8_0 | 1.89 GB | 2.68 GB (+0.27 KV) | 2.95 GB (+0.55 KV) | 3.5 GB (+1.09 KV) | 4.59 GB (+2.18 KV) | 6.77 GB (+4.37 KV) | 11.14 GB (+8.73 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 1.89 GB | 2.66 GB (+0.25 KV) | 2.9 GB (+0.5 KV) | 3.4 GB (+0.99 KV) | 4.39 GB (+1.98 KV) | 6.38 GB (+3.97 KV) | 10.35 GB (+7.94 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 1.89 GB | 2.56 GB (+0.15 KV) | 2.71 GB (+0.3 KV) | 3.0 GB (+0.6 KV) | 3.6 GB (+1.19 KV) | 4.79 GB (+2.38 KV) | 7.17 GB (+4.76 KV) |
| Q4_K_M 4.65 bpw | FP32 | 1.1 GB | 2.61 GB (+0.99 KV) | 3.6 GB (+1.98 KV) | 5.59 GB (+3.97 KV) | 9.55 GB (+7.94 KV) | 17.49 GB (+15.88 KV) | 33.37 GB (+31.75 KV) |
| Q4_K_M 4.65 bpw | FP16 | 1.1 GB | 2.11 GB (+0.5 KV) | 2.61 GB (+0.99 KV) | 3.6 GB (+1.98 KV) | 5.59 GB (+3.97 KV) | 9.55 GB (+7.94 KV) | 17.49 GB (+15.88 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 1.1 GB | 1.89 GB (+0.27 KV) | 2.16 GB (+0.55 KV) | 2.71 GB (+1.09 KV) | 3.8 GB (+2.18 KV) | 5.98 GB (+4.37 KV) | 10.35 GB (+8.73 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 1.1 GB | 1.86 GB (+0.25 KV) | 2.11 GB (+0.5 KV) | 2.61 GB (+0.99 KV) | 3.6 GB (+1.98 KV) | 5.59 GB (+3.97 KV) | 9.55 GB (+7.94 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 1.1 GB | 1.77 GB (+0.15 KV) | 1.91 GB (+0.3 KV) | 2.21 GB (+0.6 KV) | 2.81 GB (+1.19 KV) | 4.0 GB (+2.38 KV) | 6.38 GB (+4.76 KV) |
| Q4_K_S 4.58 bpw | FP32 | 1.08 GB | 2.59 GB (+0.99 KV) | 3.58 GB (+1.98 KV) | 5.57 GB (+3.97 KV) | 9.54 GB (+7.94 KV) | 17.48 GB (+15.88 KV) | 33.35 GB (+31.75 KV) |
| Q4_K_S 4.58 bpw | FP16 | 1.08 GB | 2.1 GB (+0.5 KV) | 2.59 GB (+0.99 KV) | 3.58 GB (+1.98 KV) | 5.57 GB (+3.97 KV) | 9.54 GB (+7.94 KV) | 17.48 GB (+15.88 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 1.08 GB | 1.87 GB (+0.27 KV) | 2.15 GB (+0.55 KV) | 2.69 GB (+1.09 KV) | 3.78 GB (+2.18 KV) | 5.97 GB (+4.37 KV) | 10.33 GB (+8.73 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 1.08 GB | 1.85 GB (+0.25 KV) | 2.1 GB (+0.5 KV) | 2.59 GB (+0.99 KV) | 3.58 GB (+1.98 KV) | 5.57 GB (+3.97 KV) | 9.54 GB (+7.94 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 1.08 GB | 1.75 GB (+0.15 KV) | 1.9 GB (+0.3 KV) | 2.2 GB (+0.6 KV) | 2.79 GB (+1.19 KV) | 3.98 GB (+2.38 KV) | 6.36 GB (+4.76 KV) |
| Q3_K_M 3.91 bpw | FP32 | 0.92 GB | 2.43 GB (+0.99 KV) | 3.43 GB (+1.98 KV) | 5.41 GB (+3.97 KV) | 9.38 GB (+7.94 KV) | 17.32 GB (+15.88 KV) | 33.19 GB (+31.75 KV) |
| Q3_K_M 3.91 bpw | FP16 | 0.92 GB | 1.94 GB (+0.5 KV) | 2.43 GB (+0.99 KV) | 3.43 GB (+1.98 KV) | 5.41 GB (+3.97 KV) | 9.38 GB (+7.94 KV) | 17.32 GB (+15.88 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 0.92 GB | 1.71 GB (+0.27 KV) | 1.99 GB (+0.55 KV) | 2.53 GB (+1.09 KV) | 3.62 GB (+2.18 KV) | 5.81 GB (+4.37 KV) | 10.17 GB (+8.73 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 0.92 GB | 1.69 GB (+0.25 KV) | 1.94 GB (+0.5 KV) | 2.43 GB (+0.99 KV) | 3.43 GB (+1.98 KV) | 5.41 GB (+3.97 KV) | 9.38 GB (+7.94 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 0.92 GB | 1.59 GB (+0.15 KV) | 1.74 GB (+0.3 KV) | 2.04 GB (+0.6 KV) | 2.63 GB (+1.19 KV) | 3.82 GB (+2.38 KV) | 6.2 GB (+4.76 KV) |
| Q2_K 2.63 bpw | FP32 | 0.62 GB | 2.13 GB (+0.99 KV) | 3.12 GB (+1.98 KV) | 5.11 GB (+3.97 KV) | 9.08 GB (+7.94 KV) | 17.01 GB (+15.88 KV) | 32.89 GB (+31.75 KV) |
| Q2_K 2.63 bpw | FP16 | 0.62 GB | 1.64 GB (+0.5 KV) | 2.13 GB (+0.99 KV) | 3.12 GB (+1.98 KV) | 5.11 GB (+3.97 KV) | 9.08 GB (+7.94 KV) | 17.01 GB (+15.88 KV) |
| Q2_K 2.63 bpw | Q8_0 | 0.62 GB | 1.41 GB (+0.27 KV) | 1.69 GB (+0.55 KV) | 2.23 GB (+1.09 KV) | 3.32 GB (+2.18 KV) | 5.5 GB (+4.37 KV) | 9.87 GB (+8.73 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 0.62 GB | 1.39 GB (+0.25 KV) | 1.64 GB (+0.5 KV) | 2.13 GB (+0.99 KV) | 3.12 GB (+1.98 KV) | 5.11 GB (+3.97 KV) | 9.08 GB (+7.94 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 0.62 GB | 1.29 GB (+0.15 KV) | 1.44 GB (+0.3 KV) | 1.73 GB (+0.6 KV) | 2.33 GB (+1.19 KV) | 3.52 GB (+2.38 KV) | 5.9 GB (+4.76 KV) |
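Total VRAM = Model Weights + KV Cache + 0.52 GB overhead. Because the displayed components are rounded, a total can differ from the sum of its parts by 0.01 GB. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.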
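The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's actual implementation: the function names `total_vram_gb` and `fits` are hypothetical, the 8 GB card is an assumed example, and padding the estimate by the upper end of the ±5% range is just one conservative way to apply that variance.

```python
OVERHEAD_GB = 0.52  # base overhead from the table (CUDA context + activations)
VARIANCE = 0.05     # the roughly ±5% spread quoted for real-world usage


def total_vram_gb(weights_gb: float, kv_cache_gb: float) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb + kv_cache_gb + OVERHEAD_GB


def fits(weights_gb: float, kv_cache_gb: float, available_gb: float) -> bool:
    """Conservative check: pad the estimate by the upper end of the variance."""
    padded = total_vram_gb(weights_gb, kv_cache_gb) * (1 + VARIANCE)
    return padded <= available_gb


# Q4_K_M weights with an FP16 cache at 32K context (values read from the table).
estimate = total_vram_gb(1.1, 1.98)
print(f"{estimate:.2f} GB")

# Hypothetical 8 GB card, used only as an example budget.
print(fits(1.1, 1.98, available_gb=8.0))
```

For the Q4_K_M row this reproduces the table's 3.6 GB total, and the padded estimate stays well under the assumed 8 GB budget.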