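Estimated VRAM usage for every combination of weight quantization (in bits per weight, bpw) and KV cache format. Each total includes a base overhead of 0.98 GB (CUDA context + activations); the figure in parentheses is the KV cache size in GB at that context length.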
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 1M Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 100.8 GB | 101.9 GB (+0.12 KV) | 102.02 GB (+0.24 KV) | 102.25 GB (+0.47 KV) | 102.73 GB (+0.95 KV) | 103.68 GB (+1.9 KV) | 116.97 GB (+15.19 KV) |
| FP16 16.0 bpw | FP16 | 100.8 GB | 101.84 GB (+0.06 KV) | 101.9 GB (+0.12 KV) | 102.02 GB (+0.24 KV) | 102.25 GB (+0.47 KV) | 102.73 GB (+0.95 KV) | 109.37 GB (+7.59 KV) |
| FP16 16.0 bpw | Q8_0 | 100.8 GB | 101.81 GB (+0.03 KV) | 101.85 GB (+0.07 KV) | 101.91 GB (+0.13 KV) | 102.04 GB (+0.26 KV) | 102.3 GB (+0.52 KV) | 105.96 GB (+4.18 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 100.8 GB | 101.81 GB (+0.03 KV) | 101.84 GB (+0.06 KV) | 101.9 GB (+0.12 KV) | 102.02 GB (+0.24 KV) | 102.25 GB (+0.47 KV) | 105.58 GB (+3.8 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 100.8 GB | 101.8 GB (+0.02 KV) | 101.82 GB (+0.04 KV) | 101.85 GB (+0.07 KV) | 101.92 GB (+0.14 KV) | 102.06 GB (+0.28 KV) | 104.06 GB (+2.28 KV) |
| Q8_0 8.0 bpw | FP32 | 50.4 GB | 51.5 GB (+0.12 KV) | 51.62 GB (+0.24 KV) | 51.85 GB (+0.47 KV) | 52.33 GB (+0.95 KV) | 53.28 GB (+1.9 KV) | 66.57 GB (+15.19 KV) |
| Q8_0 8.0 bpw | FP16 | 50.4 GB | 51.44 GB (+0.06 KV) | 51.5 GB (+0.12 KV) | 51.62 GB (+0.24 KV) | 51.85 GB (+0.47 KV) | 52.33 GB (+0.95 KV) | 58.97 GB (+7.59 KV) |
| Q8_0 8.0 bpw | Q8_0 | 50.4 GB | 51.41 GB (+0.03 KV) | 51.45 GB (+0.07 KV) | 51.51 GB (+0.13 KV) | 51.64 GB (+0.26 KV) | 51.9 GB (+0.52 KV) | 55.56 GB (+4.18 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 50.4 GB | 51.41 GB (+0.03 KV) | 51.44 GB (+0.06 KV) | 51.5 GB (+0.12 KV) | 51.62 GB (+0.24 KV) | 51.85 GB (+0.47 KV) | 55.18 GB (+3.8 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 50.4 GB | 51.4 GB (+0.02 KV) | 51.42 GB (+0.04 KV) | 51.45 GB (+0.07 KV) | 51.52 GB (+0.14 KV) | 51.66 GB (+0.28 KV) | 53.66 GB (+2.28 KV) |
| Q4_K_M 4.65 bpw | FP32 | 29.3 GB | 30.39 GB (+0.12 KV) | 30.51 GB (+0.24 KV) | 30.75 GB (+0.47 KV) | 31.22 GB (+0.95 KV) | 32.17 GB (+1.9 KV) | 45.46 GB (+15.19 KV) |
| Q4_K_M 4.65 bpw | FP16 | 29.3 GB | 30.33 GB (+0.06 KV) | 30.39 GB (+0.12 KV) | 30.51 GB (+0.24 KV) | 30.75 GB (+0.47 KV) | 31.22 GB (+0.95 KV) | 37.87 GB (+7.59 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 29.3 GB | 30.31 GB (+0.03 KV) | 30.34 GB (+0.07 KV) | 30.41 GB (+0.13 KV) | 30.54 GB (+0.26 KV) | 30.8 GB (+0.52 KV) | 34.45 GB (+4.18 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 29.3 GB | 30.3 GB (+0.03 KV) | 30.33 GB (+0.06 KV) | 30.39 GB (+0.12 KV) | 30.51 GB (+0.24 KV) | 30.75 GB (+0.47 KV) | 34.07 GB (+3.8 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 29.3 GB | 30.29 GB (+0.02 KV) | 30.31 GB (+0.04 KV) | 30.35 GB (+0.07 KV) | 30.42 GB (+0.14 KV) | 30.56 GB (+0.28 KV) | 32.55 GB (+2.28 KV) |
| Q4_K_S 4.58 bpw | FP32 | 28.85 GB | 29.95 GB (+0.12 KV) | 30.07 GB (+0.24 KV) | 30.31 GB (+0.47 KV) | 30.78 GB (+0.95 KV) | 31.73 GB (+1.9 KV) | 45.02 GB (+15.19 KV) |
| Q4_K_S 4.58 bpw | FP16 | 28.85 GB | 29.89 GB (+0.06 KV) | 29.95 GB (+0.12 KV) | 30.07 GB (+0.24 KV) | 30.31 GB (+0.47 KV) | 30.78 GB (+0.95 KV) | 37.43 GB (+7.59 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 28.85 GB | 29.87 GB (+0.03 KV) | 29.9 GB (+0.07 KV) | 29.96 GB (+0.13 KV) | 30.1 GB (+0.26 KV) | 30.36 GB (+0.52 KV) | 34.01 GB (+4.18 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 28.85 GB | 29.86 GB (+0.03 KV) | 29.89 GB (+0.06 KV) | 29.95 GB (+0.12 KV) | 30.07 GB (+0.24 KV) | 30.31 GB (+0.47 KV) | 33.63 GB (+3.8 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 28.85 GB | 29.85 GB (+0.02 KV) | 29.87 GB (+0.04 KV) | 29.91 GB (+0.07 KV) | 29.98 GB (+0.14 KV) | 30.12 GB (+0.28 KV) | 32.11 GB (+2.28 KV) |
| Q3_K_M 3.91 bpw | FP32 | 24.63 GB | 25.73 GB (+0.12 KV) | 25.85 GB (+0.24 KV) | 26.09 GB (+0.47 KV) | 26.56 GB (+0.95 KV) | 27.51 GB (+1.9 KV) | 40.8 GB (+15.19 KV) |
| Q3_K_M 3.91 bpw | FP16 | 24.63 GB | 25.67 GB (+0.06 KV) | 25.73 GB (+0.12 KV) | 25.85 GB (+0.24 KV) | 26.09 GB (+0.47 KV) | 26.56 GB (+0.95 KV) | 33.21 GB (+7.59 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 24.63 GB | 25.65 GB (+0.03 KV) | 25.68 GB (+0.07 KV) | 25.74 GB (+0.13 KV) | 25.87 GB (+0.26 KV) | 26.14 GB (+0.52 KV) | 29.79 GB (+4.18 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 24.63 GB | 25.64 GB (+0.03 KV) | 25.67 GB (+0.06 KV) | 25.73 GB (+0.12 KV) | 25.85 GB (+0.24 KV) | 26.09 GB (+0.47 KV) | 29.41 GB (+3.8 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 24.63 GB | 25.63 GB (+0.02 KV) | 25.65 GB (+0.04 KV) | 25.68 GB (+0.07 KV) | 25.76 GB (+0.14 KV) | 25.9 GB (+0.28 KV) | 27.89 GB (+2.28 KV) |
| Q2_K 2.63 bpw | FP32 | 16.57 GB | 17.67 GB (+0.12 KV) | 17.79 GB (+0.24 KV) | 18.02 GB (+0.47 KV) | 18.5 GB (+0.95 KV) | 19.45 GB (+1.9 KV) | 32.74 GB (+15.19 KV) |
| Q2_K 2.63 bpw | FP16 | 16.57 GB | 17.61 GB (+0.06 KV) | 17.67 GB (+0.12 KV) | 17.79 GB (+0.24 KV) | 18.02 GB (+0.47 KV) | 18.5 GB (+0.95 KV) | 25.14 GB (+7.59 KV) |
| Q2_K 2.63 bpw | Q8_0 | 16.57 GB | 17.58 GB (+0.03 KV) | 17.61 GB (+0.07 KV) | 17.68 GB (+0.13 KV) | 17.81 GB (+0.26 KV) | 18.07 GB (+0.52 KV) | 21.73 GB (+4.18 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 16.57 GB | 17.58 GB (+0.03 KV) | 17.61 GB (+0.06 KV) | 17.67 GB (+0.12 KV) | 17.79 GB (+0.24 KV) | 18.02 GB (+0.47 KV) | 21.35 GB (+3.8 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 16.57 GB | 17.57 GB (+0.02 KV) | 17.58 GB (+0.04 KV) | 17.62 GB (+0.07 KV) | 17.69 GB (+0.14 KV) | 17.83 GB (+0.28 KV) | 19.83 GB (+2.28 KV) |
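Total VRAM = Model Weights + KV Cache + 0.98 GB overhead. Each value in the table is rounded independently, so summing the displayed figures can differ from the listed total by 0.01 GB. Actual usage may vary by ±5% depending on the inference engine and its optimizations.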
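The formula above can be checked against any row with a few lines of Python. This is a minimal sketch, not the method used to generate the table: the function names are hypothetical, the 50.4 billion parameter count is inferred from the table (100.8 GB at 16.0 bpw), and the 5% safety margin in `fits` is one assumed way of applying the stated ±5% variance.

```python
OVERHEAD_GB = 0.98  # base overhead stated in the text (CUDA context + activations)


def weights_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) from bits per weight."""
    return params_billion * bpw / 8


def total_vram_gb(weights: float, kv_cache: float,
                  overhead: float = OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + base overhead (all in GB)."""
    return weights + kv_cache + overhead


def fits(total_gb: float, available_gb: float, margin: float = 0.05) -> bool:
    """Assumed check: leave headroom for the +5% end of the stated variance."""
    return total_gb * (1 + margin) <= available_gb


# Q8_0 weights with an FP16 cache at 32K context, values taken from the table.
total = total_vram_gb(weights=50.4, kv_cache=0.24)
print(f"{total:.2f} GB")           # matches the 51.62 GB listed for that row
print(fits(total, available_gb=48))  # too large for a single 48 GB card
print(fits(total, available_gb=80))  # fits on an 80 GB card
```

Passing the rounded table values reproduces most rows exactly; rows whose weight size was rounded for display (for example Q4_K_M at 29.3 GB) can come out 0.01 GB off.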