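Estimated VRAM usage for every combination of weight quantization and KV-cache format. Each cell gives the total VRAM, with the KV-cache size in GB in parentheses. Base overhead: 0.83 GB (CUDA context + activations), included in every total.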
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 128K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 70.35 GB | 75.19 GB (+4.0 KV) | 79.19 GB (+8.0 KV) | 87.19 GB (+16.0 KV) | 103.19 GB (+32.0 KV) | 133.69 GB (+62.5 KV) |
| FP16 16.0 bpw | FP16 | 70.35 GB | 73.19 GB (+2.0 KV) | 75.19 GB (+4.0 KV) | 79.19 GB (+8.0 KV) | 87.19 GB (+16.0 KV) | 102.44 GB (+31.25 KV) |
| FP16 16.0 bpw | Q8_0 | 70.35 GB | 72.28 GB (+1.1 KV) | 73.39 GB (+2.2 KV) | 75.59 GB (+4.4 KV) | 79.98 GB (+8.8 KV) | 88.37 GB (+17.19 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 70.35 GB | 72.19 GB (+1.0 KV) | 73.19 GB (+2.0 KV) | 75.19 GB (+4.0 KV) | 79.19 GB (+8.0 KV) | 86.81 GB (+15.62 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 70.35 GB | 71.78 GB (+0.6 KV) | 72.39 GB (+1.2 KV) | 73.59 GB (+2.4 KV) | 75.98 GB (+4.8 KV) | 80.56 GB (+9.38 KV) |
| Q8_0 8.0 bpw | FP32 | 35.18 GB | 40.01 GB (+4.0 KV) | 44.01 GB (+8.0 KV) | 52.01 GB (+16.0 KV) | 68.01 GB (+32.0 KV) | 98.51 GB (+62.5 KV) |
| Q8_0 8.0 bpw | FP16 | 35.18 GB | 38.01 GB (+2.0 KV) | 40.01 GB (+4.0 KV) | 44.01 GB (+8.0 KV) | 52.01 GB (+16.0 KV) | 67.26 GB (+31.25 KV) |
| Q8_0 8.0 bpw | Q8_0 | 35.18 GB | 37.11 GB (+1.1 KV) | 38.21 GB (+2.2 KV) | 40.41 GB (+4.4 KV) | 44.81 GB (+8.8 KV) | 53.2 GB (+17.19 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 35.18 GB | 37.01 GB (+1.0 KV) | 38.01 GB (+2.0 KV) | 40.01 GB (+4.0 KV) | 44.01 GB (+8.0 KV) | 51.64 GB (+15.62 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 35.18 GB | 36.61 GB (+0.6 KV) | 37.21 GB (+1.2 KV) | 38.41 GB (+2.4 KV) | 40.81 GB (+4.8 KV) | 45.39 GB (+9.38 KV) |
| Q4_K_M 4.65 bpw | FP32 | 20.45 GB | 25.28 GB (+4.0 KV) | 29.28 GB (+8.0 KV) | 37.28 GB (+16.0 KV) | 53.28 GB (+32.0 KV) | 83.78 GB (+62.5 KV) |
| Q4_K_M 4.65 bpw | FP16 | 20.45 GB | 23.28 GB (+2.0 KV) | 25.28 GB (+4.0 KV) | 29.28 GB (+8.0 KV) | 37.28 GB (+16.0 KV) | 52.53 GB (+31.25 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 20.45 GB | 22.38 GB (+1.1 KV) | 23.48 GB (+2.2 KV) | 25.68 GB (+4.4 KV) | 30.08 GB (+8.8 KV) | 38.47 GB (+17.19 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 20.45 GB | 22.28 GB (+1.0 KV) | 23.28 GB (+2.0 KV) | 25.28 GB (+4.0 KV) | 29.28 GB (+8.0 KV) | 36.91 GB (+15.62 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 20.45 GB | 21.88 GB (+0.6 KV) | 22.48 GB (+1.2 KV) | 23.68 GB (+2.4 KV) | 26.08 GB (+4.8 KV) | 30.66 GB (+9.38 KV) |
| Q4_K_S 4.58 bpw | FP32 | 20.14 GB | 24.97 GB (+4.0 KV) | 28.97 GB (+8.0 KV) | 36.97 GB (+16.0 KV) | 52.97 GB (+32.0 KV) | 83.47 GB (+62.5 KV) |
| Q4_K_S 4.58 bpw | FP16 | 20.14 GB | 22.97 GB (+2.0 KV) | 24.97 GB (+4.0 KV) | 28.97 GB (+8.0 KV) | 36.97 GB (+16.0 KV) | 52.22 GB (+31.25 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 20.14 GB | 22.07 GB (+1.1 KV) | 23.17 GB (+2.2 KV) | 25.37 GB (+4.4 KV) | 29.77 GB (+8.8 KV) | 38.16 GB (+17.19 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 20.14 GB | 21.97 GB (+1.0 KV) | 22.97 GB (+2.0 KV) | 24.97 GB (+4.0 KV) | 28.97 GB (+8.0 KV) | 36.6 GB (+15.62 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 20.14 GB | 21.57 GB (+0.6 KV) | 22.17 GB (+1.2 KV) | 23.37 GB (+2.4 KV) | 25.77 GB (+4.8 KV) | 30.35 GB (+9.38 KV) |
| Q3_K_M 3.91 bpw | FP32 | 17.19 GB | 22.03 GB (+4.0 KV) | 26.03 GB (+8.0 KV) | 34.03 GB (+16.0 KV) | 50.03 GB (+32.0 KV) | 80.53 GB (+62.5 KV) |
| Q3_K_M 3.91 bpw | FP16 | 17.19 GB | 20.03 GB (+2.0 KV) | 22.03 GB (+4.0 KV) | 26.03 GB (+8.0 KV) | 34.03 GB (+16.0 KV) | 49.28 GB (+31.25 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 17.19 GB | 19.13 GB (+1.1 KV) | 20.23 GB (+2.2 KV) | 22.43 GB (+4.4 KV) | 26.83 GB (+8.8 KV) | 35.21 GB (+17.19 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 17.19 GB | 19.03 GB (+1.0 KV) | 20.03 GB (+2.0 KV) | 22.03 GB (+4.0 KV) | 26.03 GB (+8.0 KV) | 33.65 GB (+15.62 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 17.19 GB | 18.63 GB (+0.6 KV) | 19.23 GB (+1.2 KV) | 20.43 GB (+2.4 KV) | 22.83 GB (+4.8 KV) | 27.4 GB (+9.38 KV) |
| Q2_K 2.63 bpw | FP32 | 11.56 GB | 16.4 GB (+4.0 KV) | 20.4 GB (+8.0 KV) | 28.4 GB (+16.0 KV) | 44.4 GB (+32.0 KV) | 74.9 GB (+62.5 KV) |
| Q2_K 2.63 bpw | FP16 | 11.56 GB | 14.4 GB (+2.0 KV) | 16.4 GB (+4.0 KV) | 20.4 GB (+8.0 KV) | 28.4 GB (+16.0 KV) | 43.65 GB (+31.25 KV) |
| Q2_K 2.63 bpw | Q8_0 | 11.56 GB | 13.5 GB (+1.1 KV) | 14.6 GB (+2.2 KV) | 16.8 GB (+4.4 KV) | 21.2 GB (+8.8 KV) | 29.59 GB (+17.19 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 11.56 GB | 13.4 GB (+1.0 KV) | 14.4 GB (+2.0 KV) | 16.4 GB (+4.0 KV) | 20.4 GB (+8.0 KV) | 28.02 GB (+15.62 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 11.56 GB | 13.0 GB (+0.6 KV) | 13.6 GB (+1.2 KV) | 14.8 GB (+2.4 KV) | 17.2 GB (+4.8 KV) | 21.77 GB (+9.38 KV) |
Total VRAM = Model Weights + KV Cache + 0.83 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
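The formula can be checked against the table with a short Python sketch. This is not the calculator's actual code. It assumes that weight size scales linearly with bits per weight from the 70.35 GB FP16 figure, and that KV-cache size scales linearly from the table's 8K column, taken here as 8,192 tokens. Under that assumption, the 128K column's KV values correspond to 128,000 tokens (62.5 / 4.0 = 15.625 = 128,000 / 8,192), while the other columns are powers of two. All function and constant names are hypothetical.

```python
# Sketch of the table's arithmetic: Total VRAM = weights + KV cache + overhead.
# Assumptions (not from the source): weights scale linearly with bits per weight
# from the FP16 figure, and KV size scales linearly from the 8K (8,192-token) column.

BASE_OVERHEAD_GB = 0.83   # CUDA context + activations, per the table note
FP16_WEIGHTS_GB = 70.35   # model weights at 16.0 bpw, from the table

# KV-cache size in GB at 8,192 tokens for each cache format (table's 8K column)
KV_GB_AT_8K = {"FP32": 4.0, "FP16": 2.0, "Q8_0": 1.1, "FP8": 1.0, "Q4_0": 0.6}


def weights_gb(bpw: float) -> float:
    """Weight memory scales linearly with bits per weight."""
    return FP16_WEIGHTS_GB * bpw / 16.0


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV-cache memory grows linearly with the number of context tokens."""
    return KV_GB_AT_8K[cache_format] * context_tokens / 8192


def total_vram_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed base overhead."""
    return weights_gb(bpw) + kv_cache_gb(cache_format, context_tokens) + BASE_OVERHEAD_GB


if __name__ == "__main__":
    # Q4_K_M (4.65 bpw) with a Q8_0 cache at 32K context; the table lists 25.68 GB
    print(f"{total_vram_gb(4.65, 'Q8_0', 32768):.2f} GB")  # prints 25.68 GB
```

The per-8K KV sizes are read off the table rather than derived from the model's layer count and attention-head dimensions, since those are not given here. Small last-digit differences from the table (for example 133.68 vs. 133.69 GB) come from the rounded 70.35 GB weight figure.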