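Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM, with the KV cache share in parentheses (in GB). Every total includes a fixed base overhead of 0.54 GB (CUDA context + activations).
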
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 128K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 7.98 GB | 9.08 GB (+0.56 KV) | 9.64 GB (+1.12 KV) | 10.77 GB (+2.25 KV) | 13.02 GB (+4.50 KV) | 17.31 GB (+8.79 KV) |
| FP16 (16.0 bpw) | FP16 | 7.98 GB | 8.80 GB (+0.28 KV) | 9.08 GB (+0.56 KV) | 9.64 GB (+1.12 KV) | 10.77 GB (+2.25 KV) | 12.91 GB (+4.39 KV) |
| FP16 (16.0 bpw) | Q8_0 | 7.98 GB | 8.67 GB (+0.15 KV) | 8.83 GB (+0.31 KV) | 9.14 GB (+0.62 KV) | 9.76 GB (+1.24 KV) | 10.93 GB (+2.42 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 7.98 GB | 8.66 GB (+0.14 KV) | 8.80 GB (+0.28 KV) | 9.08 GB (+0.56 KV) | 9.64 GB (+1.12 KV) | 10.72 GB (+2.20 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 7.98 GB | 8.60 GB (+0.08 KV) | 8.69 GB (+0.17 KV) | 8.86 GB (+0.34 KV) | 9.19 GB (+0.67 KV) | 9.84 GB (+1.32 KV) |
| Q8_0 (8.0 bpw) | FP32 | 3.99 GB | 5.09 GB (+0.56 KV) | 5.65 GB (+1.12 KV) | 6.78 GB (+2.25 KV) | 9.03 GB (+4.50 KV) | 13.32 GB (+8.79 KV) |
| Q8_0 (8.0 bpw) | FP16 | 3.99 GB | 4.81 GB (+0.28 KV) | 5.09 GB (+0.56 KV) | 5.65 GB (+1.12 KV) | 6.78 GB (+2.25 KV) | 8.92 GB (+4.39 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 3.99 GB | 4.68 GB (+0.15 KV) | 4.84 GB (+0.31 KV) | 5.15 GB (+0.62 KV) | 5.77 GB (+1.24 KV) | 6.94 GB (+2.42 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 3.99 GB | 4.67 GB (+0.14 KV) | 4.81 GB (+0.28 KV) | 5.09 GB (+0.56 KV) | 5.65 GB (+1.12 KV) | 6.73 GB (+2.20 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 3.99 GB | 4.61 GB (+0.08 KV) | 4.70 GB (+0.17 KV) | 4.87 GB (+0.34 KV) | 5.20 GB (+0.67 KV) | 5.85 GB (+1.32 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 2.32 GB | 3.42 GB (+0.56 KV) | 3.98 GB (+1.12 KV) | 5.11 GB (+2.25 KV) | 7.36 GB (+4.50 KV) | 11.65 GB (+8.79 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 2.32 GB | 3.14 GB (+0.28 KV) | 3.42 GB (+0.56 KV) | 3.98 GB (+1.12 KV) | 5.11 GB (+2.25 KV) | 7.25 GB (+4.39 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 2.32 GB | 3.01 GB (+0.15 KV) | 3.17 GB (+0.31 KV) | 3.48 GB (+0.62 KV) | 4.09 GB (+1.24 KV) | 5.27 GB (+2.42 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 2.32 GB | 3.00 GB (+0.14 KV) | 3.14 GB (+0.28 KV) | 3.42 GB (+0.56 KV) | 3.98 GB (+1.12 KV) | 5.05 GB (+2.20 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 2.32 GB | 2.94 GB (+0.08 KV) | 3.03 GB (+0.17 KV) | 3.19 GB (+0.34 KV) | 3.53 GB (+0.67 KV) | 4.18 GB (+1.32 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 2.28 GB | 3.38 GB (+0.56 KV) | 3.95 GB (+1.12 KV) | 5.07 GB (+2.25 KV) | 7.32 GB (+4.50 KV) | 11.61 GB (+8.79 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 2.28 GB | 3.10 GB (+0.28 KV) | 3.38 GB (+0.56 KV) | 3.95 GB (+1.12 KV) | 5.07 GB (+2.25 KV) | 7.22 GB (+4.39 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 2.28 GB | 2.98 GB (+0.15 KV) | 3.13 GB (+0.31 KV) | 3.44 GB (+0.62 KV) | 4.06 GB (+1.24 KV) | 5.24 GB (+2.42 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 2.28 GB | 2.96 GB (+0.14 KV) | 3.10 GB (+0.28 KV) | 3.38 GB (+0.56 KV) | 3.95 GB (+1.12 KV) | 5.02 GB (+2.20 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 2.28 GB | 2.91 GB (+0.08 KV) | 2.99 GB (+0.17 KV) | 3.16 GB (+0.34 KV) | 3.50 GB (+0.67 KV) | 4.14 GB (+1.32 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 1.95 GB | 3.05 GB (+0.56 KV) | 3.61 GB (+1.12 KV) | 4.74 GB (+2.25 KV) | 6.99 GB (+4.50 KV) | 11.28 GB (+8.79 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 1.95 GB | 2.77 GB (+0.28 KV) | 3.05 GB (+0.56 KV) | 3.61 GB (+1.12 KV) | 4.74 GB (+2.25 KV) | 6.88 GB (+4.39 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 1.95 GB | 2.64 GB (+0.15 KV) | 2.80 GB (+0.31 KV) | 3.11 GB (+0.62 KV) | 3.73 GB (+1.24 KV) | 4.91 GB (+2.42 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 1.95 GB | 2.63 GB (+0.14 KV) | 2.77 GB (+0.28 KV) | 3.05 GB (+0.56 KV) | 3.61 GB (+1.12 KV) | 4.69 GB (+2.20 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 1.95 GB | 2.57 GB (+0.08 KV) | 2.66 GB (+0.17 KV) | 2.83 GB (+0.34 KV) | 3.16 GB (+0.67 KV) | 3.81 GB (+1.32 KV) |
| Q2_K (2.63 bpw) | FP32 | 1.31 GB | 2.41 GB (+0.56 KV) | 2.97 GB (+1.12 KV) | 4.10 GB (+2.25 KV) | 6.35 GB (+4.50 KV) | 10.64 GB (+8.79 KV) |
| Q2_K (2.63 bpw) | FP16 | 1.31 GB | 2.13 GB (+0.28 KV) | 2.41 GB (+0.56 KV) | 2.97 GB (+1.12 KV) | 4.10 GB (+2.25 KV) | 6.24 GB (+4.39 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 1.31 GB | 2.00 GB (+0.15 KV) | 2.16 GB (+0.31 KV) | 2.47 GB (+0.62 KV) | 3.09 GB (+1.24 KV) | 4.27 GB (+2.42 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 1.31 GB | 1.99 GB (+0.14 KV) | 2.13 GB (+0.28 KV) | 2.41 GB (+0.56 KV) | 2.97 GB (+1.12 KV) | 4.05 GB (+2.20 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 1.31 GB | 1.93 GB (+0.08 KV) | 2.02 GB (+0.17 KV) | 2.19 GB (+0.34 KV) | 2.52 GB (+0.67 KV) | 3.17 GB (+1.32 KV) |
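
Total VRAM = Model Weights + KV Cache + 0.54 GB overhead. Displayed values are rounded, so a cell can differ from the sum of its displayed parts by 0.01 GB. Actual usage may vary by ±5% depending on the inference engine and its optimizations.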
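
The formula can be sketched in a few lines of Python. This is an illustration and not the method used to generate the table. Its scaling rules are assumptions inferred from the table's own numbers: weights scale linearly with bits per weight from the FP16 row, the KV cache scales linearly with context length, and each cache format is a fixed fraction of the FP32 cache. The function names, the unrounded 8K value of 0.5625 GB, and the token counts (8K as 8192, 128K as 128000) are likewise assumptions chosen to match the KV column.

```python
# Constants read off the table; everything marked "assumed" is inferred
# from the table's numbers, not stated by the source.
FP16_WEIGHTS_GB = 7.98       # "Model Weights" for FP16 at 16.0 bpw
FP32_KV_GB_AT_8K = 0.5625    # assumed unrounded value behind "+0.56 KV"
TOKENS_AT_8K = 8192          # assumed meaning of the "8K" column
OVERHEAD_GB = 0.54           # base overhead stated in the text

# Assumed KV cache size relative to FP32, inferred from the KV column.
CACHE_RATIO_VS_FP32 = {
    "FP32": 1.0,
    "FP16": 0.5,
    "Q8_0": 0.275,
    "FP8": 0.25,
    "Q4_0": 0.15,
}


def weights_gb(bpw: float) -> float:
    """Model weights in GB, scaled linearly from the FP16 (16.0 bpw) row."""
    return FP16_WEIGHTS_GB * bpw / 16.0


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache in GB, linear in context length and cache format ratio."""
    scale = context_tokens / TOKENS_AT_8K
    return FP32_KV_GB_AT_8K * scale * CACHE_RATIO_VS_FP32[cache_format]


def total_vram_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = Model Weights + KV Cache + overhead."""
    return weights_gb(bpw) + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


if __name__ == "__main__":
    # Q4_K_M (4.65 bpw), FP16 cache, 32K context; table cell: 3.98 GB
    print(f"{total_vram_gb(4.65, 'FP16', 32768):.2f} GB")
    # FP16 (16.0 bpw), FP32 cache, 128K context; table cell: 17.31 GB
    print(f"{total_vram_gb(16.0, 'FP32', 128000):.2f} GB")
```

Under these assumptions the sketch agrees with the two cells in the usage lines to within 0.01 GB. The residual differences come from rounding in the displayed values.
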
Use our calculator to see if this model fits your specific hardware configuration.
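
As a rough stand-in for that kind of fit check (this is not the site's calculator), the snippet below picks the largest context length whose table total fits a given VRAM budget. The totals are copied from the Q4_K_M row with FP16 cache. The function name, the 5% safety margin (borrowed from the ±5% note) and the example budgets are hypothetical.

```python
from typing import Dict, Optional

# Total VRAM in GB, copied from the Q4_K_M (4.65 bpw) / FP16 cache row.
Q4_K_M_FP16_TOTALS: Dict[str, float] = {
    "8K": 3.14,
    "16K": 3.42,
    "32K": 3.98,
    "65K": 5.11,
    "128K": 7.25,
}


def largest_context_that_fits(
    totals: Dict[str, float], vram_gb: float, margin: float = 0.05
) -> Optional[str]:
    """Return the context label with the largest total that still fits.

    `vram_gb` should be the VRAM actually free for the model, which is
    usually less than the card's nominal capacity. `margin` pads each total
    to allow for the stated variation (hypothetical default of 5%).
    """
    fitting = {
        label: total
        for label, total in totals.items()
        if total * (1.0 + margin) <= vram_gb
    }
    if not fitting:
        return None
    # Pick the label with the highest total, i.e. the longest context.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (3.0, 4.0, 6.0, 8.0):  # hypothetical free-VRAM budgets in GB
        print(budget, largest_context_that_fits(Q4_K_M_FP16_TOTALS, budget))
```

Swapping in another row of the table only requires replacing the dictionary. Nothing else in the function depends on the quantization or cache format.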