Estimated VRAM usage for every combination of weight quantization and KV cache format. Every total includes a base overhead of 0.53 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 7.14 GB | 9.3 GB (+1.62 KV) | 10.92 GB (+3.25 KV) | 14.17 GB (+6.5 KV) | 20.67 GB (+13.0 KV) | 33.67 GB (+26.0 KV) | 59.67 GB (+52.0 KV) |
| FP16 16.0 bpw | FP16 | 7.14 GB | 8.49 GB (+0.81 KV) | 9.3 GB (+1.62 KV) | 10.92 GB (+3.25 KV) | 14.17 GB (+6.5 KV) | 20.67 GB (+13.0 KV) | 33.67 GB (+26.0 KV) |
| FP16 16.0 bpw | Q8_0 | 7.14 GB | 8.12 GB (+0.45 KV) | 8.57 GB (+0.89 KV) | 9.46 GB (+1.79 KV) | 11.25 GB (+3.58 KV) | 14.82 GB (+7.15 KV) | 21.97 GB (+14.3 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 7.14 GB | 8.08 GB (+0.41 KV) | 8.49 GB (+0.81 KV) | 9.3 GB (+1.62 KV) | 10.92 GB (+3.25 KV) | 14.17 GB (+6.5 KV) | 20.67 GB (+13.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 7.14 GB | 7.92 GB (+0.24 KV) | 8.16 GB (+0.49 KV) | 8.65 GB (+0.97 KV) | 9.62 GB (+1.95 KV) | 11.57 GB (+3.9 KV) | 15.47 GB (+7.8 KV) |
| Q8_0 8.0 bpw | FP32 | 3.57 GB | 5.73 GB (+1.62 KV) | 7.35 GB (+3.25 KV) | 10.6 GB (+6.5 KV) | 17.1 GB (+13.0 KV) | 30.1 GB (+26.0 KV) | 56.1 GB (+52.0 KV) |
| Q8_0 8.0 bpw | FP16 | 3.57 GB | 4.92 GB (+0.81 KV) | 5.73 GB (+1.62 KV) | 7.35 GB (+3.25 KV) | 10.6 GB (+6.5 KV) | 17.1 GB (+13.0 KV) | 30.1 GB (+26.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 3.57 GB | 4.55 GB (+0.45 KV) | 5.0 GB (+0.89 KV) | 5.89 GB (+1.79 KV) | 7.68 GB (+3.58 KV) | 11.25 GB (+7.15 KV) | 18.4 GB (+14.3 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 3.57 GB | 4.51 GB (+0.41 KV) | 4.92 GB (+0.81 KV) | 5.73 GB (+1.62 KV) | 7.35 GB (+3.25 KV) | 10.6 GB (+6.5 KV) | 17.1 GB (+13.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 3.57 GB | 4.35 GB (+0.24 KV) | 4.59 GB (+0.49 KV) | 5.08 GB (+0.97 KV) | 6.05 GB (+1.95 KV) | 8.0 GB (+3.9 KV) | 11.9 GB (+7.8 KV) |
| Q4_K_M 4.65 bpw | FP32 | 2.08 GB | 4.23 GB (+1.62 KV) | 5.86 GB (+3.25 KV) | 9.11 GB (+6.5 KV) | 15.61 GB (+13.0 KV) | 28.61 GB (+26.0 KV) | 54.61 GB (+52.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 2.08 GB | 3.42 GB (+0.81 KV) | 4.23 GB (+1.62 KV) | 5.86 GB (+3.25 KV) | 9.11 GB (+6.5 KV) | 15.61 GB (+13.0 KV) | 28.61 GB (+26.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 2.08 GB | 3.06 GB (+0.45 KV) | 3.5 GB (+0.89 KV) | 4.4 GB (+1.79 KV) | 6.18 GB (+3.58 KV) | 9.76 GB (+7.15 KV) | 16.91 GB (+14.3 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 2.08 GB | 3.02 GB (+0.41 KV) | 3.42 GB (+0.81 KV) | 4.23 GB (+1.62 KV) | 5.86 GB (+3.25 KV) | 9.11 GB (+6.5 KV) | 15.61 GB (+13.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 2.08 GB | 2.85 GB (+0.24 KV) | 3.1 GB (+0.49 KV) | 3.58 GB (+0.97 KV) | 4.56 GB (+1.95 KV) | 6.51 GB (+3.9 KV) | 10.41 GB (+7.8 KV) |
| Q4_K_S 4.58 bpw | FP32 | 2.04 GB | 4.2 GB (+1.62 KV) | 5.83 GB (+3.25 KV) | 9.08 GB (+6.5 KV) | 15.58 GB (+13.0 KV) | 28.58 GB (+26.0 KV) | 54.58 GB (+52.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 2.04 GB | 3.39 GB (+0.81 KV) | 4.2 GB (+1.62 KV) | 5.83 GB (+3.25 KV) | 9.08 GB (+6.5 KV) | 15.58 GB (+13.0 KV) | 28.58 GB (+26.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 2.04 GB | 3.02 GB (+0.45 KV) | 3.47 GB (+0.89 KV) | 4.37 GB (+1.79 KV) | 6.15 GB (+3.58 KV) | 9.73 GB (+7.15 KV) | 16.88 GB (+14.3 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 2.04 GB | 2.98 GB (+0.41 KV) | 3.39 GB (+0.81 KV) | 4.2 GB (+1.62 KV) | 5.83 GB (+3.25 KV) | 9.08 GB (+6.5 KV) | 15.58 GB (+13.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 2.04 GB | 2.82 GB (+0.24 KV) | 3.07 GB (+0.49 KV) | 3.55 GB (+0.97 KV) | 4.53 GB (+1.95 KV) | 6.48 GB (+3.9 KV) | 10.38 GB (+7.8 KV) |
| Q3_K_M 3.91 bpw | FP32 | 1.74 GB | 3.9 GB (+1.62 KV) | 5.53 GB (+3.25 KV) | 8.78 GB (+6.5 KV) | 15.28 GB (+13.0 KV) | 28.28 GB (+26.0 KV) | 54.28 GB (+52.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 1.74 GB | 3.09 GB (+0.81 KV) | 3.9 GB (+1.62 KV) | 5.53 GB (+3.25 KV) | 8.78 GB (+6.5 KV) | 15.28 GB (+13.0 KV) | 28.28 GB (+26.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 1.74 GB | 2.73 GB (+0.45 KV) | 3.17 GB (+0.89 KV) | 4.07 GB (+1.79 KV) | 5.85 GB (+3.58 KV) | 9.43 GB (+7.15 KV) | 16.58 GB (+14.3 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 1.74 GB | 2.69 GB (+0.41 KV) | 3.09 GB (+0.81 KV) | 3.9 GB (+1.62 KV) | 5.53 GB (+3.25 KV) | 8.78 GB (+6.5 KV) | 15.28 GB (+13.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 1.74 GB | 2.52 GB (+0.24 KV) | 2.77 GB (+0.49 KV) | 3.25 GB (+0.97 KV) | 4.23 GB (+1.95 KV) | 6.18 GB (+3.9 KV) | 10.08 GB (+7.8 KV) |
| Q2_K 2.63 bpw | FP32 | 1.17 GB | 3.33 GB (+1.62 KV) | 4.96 GB (+3.25 KV) | 8.21 GB (+6.5 KV) | 14.71 GB (+13.0 KV) | 27.71 GB (+26.0 KV) | 53.71 GB (+52.0 KV) |
| Q2_K 2.63 bpw | FP16 | 1.17 GB | 2.52 GB (+0.81 KV) | 3.33 GB (+1.62 KV) | 4.96 GB (+3.25 KV) | 8.21 GB (+6.5 KV) | 14.71 GB (+13.0 KV) | 27.71 GB (+26.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 1.17 GB | 2.15 GB (+0.45 KV) | 2.6 GB (+0.89 KV) | 3.5 GB (+1.79 KV) | 5.28 GB (+3.58 KV) | 8.86 GB (+7.15 KV) | 16.01 GB (+14.3 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 1.17 GB | 2.11 GB (+0.41 KV) | 2.52 GB (+0.81 KV) | 3.33 GB (+1.62 KV) | 4.96 GB (+3.25 KV) | 8.21 GB (+6.5 KV) | 14.71 GB (+13.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 1.17 GB | 1.95 GB (+0.24 KV) | 2.2 GB (+0.49 KV) | 2.68 GB (+0.97 KV) | 3.66 GB (+1.95 KV) | 5.61 GB (+3.9 KV) | 9.51 GB (+7.8 KV) |
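
Each context column shows total VRAM, with the KV cache portion (in GB) in parentheses. Total VRAM = Model Weights + KV Cache + 0.53 GB overhead. Displayed values are rounded, so a total can differ from the sum of its displayed parts by about 0.01 GB. Actual usage may vary by ±5% depending on the inference engine and its optimizations.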
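
The formula above can be reproduced with a few lines of Python. This is only a sketch: every constant is read or inferred from the table itself (the cache-format ratios come from the 262K column, and the unrounded 8K FP32 figure is assumed to be half of the 16K value), and the names `estimate_vram_gb` and `fits` are hypothetical helpers, not part of the calculator.

```python
# Sketch of the table's formula: total = weights + KV cache + base overhead.
# All constants are read or inferred from the table, not from the calculator's source.

FP16_WEIGHTS_GB = 7.14      # "Model Weights" at 16.0 bpw
BASE_OVERHEAD_GB = 0.53     # CUDA context + activations
FP32_KV_GB_PER_8K = 1.625   # inferred: 3.25 GB at 16K, halved (table shows 1.62)

# KV cache size relative to FP32, inferred from the 262K column (52.0 GB for FP32).
KV_RATIO_VS_FP32 = {
    "FP32": 1.0,     # 52.0 GB
    "FP16": 0.5,     # 26.0 GB
    "Q8_0": 0.275,   # 14.3 GB
    "FP8": 0.25,     # 13.0 GB
    "Q4_0": 0.15,    # 7.8 GB
}


def estimate_vram_gb(bpw, cache_format, context_tokens):
    """Return (weights, kv_cache, total) in GB for one table cell."""
    # Weights scale linearly with bits per weight.
    weights = FP16_WEIGHTS_GB * bpw / 16.0
    # KV cache scales linearly with context length and with the cache format.
    kv_cache = FP32_KV_GB_PER_8K * (context_tokens / 8192) * KV_RATIO_VS_FP32[cache_format]
    total = weights + kv_cache + BASE_OVERHEAD_GB
    return weights, kv_cache, total


def fits(total_gb, vram_gb, margin=0.05):
    """True if the estimate, padded by the stated 5% variance, fits in vram_gb."""
    return total_gb * (1 + margin) <= vram_gb


if __name__ == "__main__":
    # Q4_K_M (4.65 bpw) weights, Q8_0 cache, 32K context.
    weights, kv_cache, total = estimate_vram_gb(4.65, "Q8_0", 32 * 1024)
    print(f"weights={weights:.2f} GB, kv={kv_cache:.2f} GB, total={total:.2f} GB")
    print("fits in 8 GB:", fits(total, 8.0))
```

For the Q4_K_M / Q8_0 / 32K example, the estimate lands within about 0.01 GB of the table's 4.4 GB. The small gap is expected, since the sketch starts from rounded inputs.
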
Use our calculator to see if this model fits your specific hardware configuration.