Estimated VRAM usage for every combination of weight quantization and KV cache format. Base overhead: 0.82 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 64K Context | 128K Context |
|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 68.25 GB | 73.08 GB (+4.0 KV) | 77.08 GB (+8.0 KV) | 85.08 GB (+16.0 KV) | 101.08 GB (+32.0 KV) | 133.07 GB (+64.0 KV) |
| FP16 16.0 bpw | FP16 | 68.25 GB | 71.08 GB (+2.0 KV) | 73.08 GB (+4.0 KV) | 77.08 GB (+8.0 KV) | 85.08 GB (+16.0 KV) | 101.08 GB (+32.0 KV) |
| FP16 16.0 bpw | Q8_0 | 68.25 GB | 70.17 GB (+1.1 KV) | 71.28 GB (+2.2 KV) | 73.48 GB (+4.4 KV) | 77.88 GB (+8.8 KV) | 86.67 GB (+17.6 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 68.25 GB | 70.08 GB (+1.0 KV) | 71.08 GB (+2.0 KV) | 73.08 GB (+4.0 KV) | 77.08 GB (+8.0 KV) | 85.08 GB (+16.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 68.25 GB | 69.67 GB (+0.6 KV) | 70.28 GB (+1.2 KV) | 71.48 GB (+2.4 KV) | 73.88 GB (+4.8 KV) | 78.67 GB (+9.6 KV) |
| Q8_0 8.0 bpw | FP32 | 34.12 GB | 38.95 GB (+4.0 KV) | 42.95 GB (+8.0 KV) | 50.95 GB (+16.0 KV) | 66.95 GB (+32.0 KV) | 98.95 GB (+64.0 KV) |
| Q8_0 8.0 bpw | FP16 | 34.12 GB | 36.95 GB (+2.0 KV) | 38.95 GB (+4.0 KV) | 42.95 GB (+8.0 KV) | 50.95 GB (+16.0 KV) | 66.95 GB (+32.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 34.12 GB | 36.05 GB (+1.1 KV) | 37.15 GB (+2.2 KV) | 39.35 GB (+4.4 KV) | 43.75 GB (+8.8 KV) | 52.55 GB (+17.6 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 34.12 GB | 35.95 GB (+1.0 KV) | 36.95 GB (+2.0 KV) | 38.95 GB (+4.0 KV) | 42.95 GB (+8.0 KV) | 50.95 GB (+16.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 34.12 GB | 35.55 GB (+0.6 KV) | 36.15 GB (+1.2 KV) | 37.35 GB (+2.4 KV) | 39.75 GB (+4.8 KV) | 44.55 GB (+9.6 KV) |
| Q4_K_M 4.65 bpw | FP32 | 19.84 GB | 24.66 GB (+4.0 KV) | 28.66 GB (+8.0 KV) | 36.66 GB (+16.0 KV) | 52.66 GB (+32.0 KV) | 84.66 GB (+64.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 19.84 GB | 22.66 GB (+2.0 KV) | 24.66 GB (+4.0 KV) | 28.66 GB (+8.0 KV) | 36.66 GB (+16.0 KV) | 52.66 GB (+32.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 19.84 GB | 21.76 GB (+1.1 KV) | 22.86 GB (+2.2 KV) | 25.06 GB (+4.4 KV) | 29.46 GB (+8.8 KV) | 38.26 GB (+17.6 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 19.84 GB | 21.66 GB (+1.0 KV) | 22.66 GB (+2.0 KV) | 24.66 GB (+4.0 KV) | 28.66 GB (+8.0 KV) | 36.66 GB (+16.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 19.84 GB | 21.26 GB (+0.6 KV) | 21.86 GB (+1.2 KV) | 23.06 GB (+2.4 KV) | 25.46 GB (+4.8 KV) | 30.26 GB (+9.6 KV) |
| Q4_K_S 4.58 bpw | FP32 | 19.54 GB | 24.36 GB (+4.0 KV) | 28.36 GB (+8.0 KV) | 36.36 GB (+16.0 KV) | 52.36 GB (+32.0 KV) | 84.36 GB (+64.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 19.54 GB | 22.36 GB (+2.0 KV) | 24.36 GB (+4.0 KV) | 28.36 GB (+8.0 KV) | 36.36 GB (+16.0 KV) | 52.36 GB (+32.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 19.54 GB | 21.46 GB (+1.1 KV) | 22.56 GB (+2.2 KV) | 24.76 GB (+4.4 KV) | 29.16 GB (+8.8 KV) | 37.96 GB (+17.6 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 19.54 GB | 21.36 GB (+1.0 KV) | 22.36 GB (+2.0 KV) | 24.36 GB (+4.0 KV) | 28.36 GB (+8.0 KV) | 36.36 GB (+16.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 19.54 GB | 20.96 GB (+0.6 KV) | 21.56 GB (+1.2 KV) | 22.76 GB (+2.4 KV) | 25.16 GB (+4.8 KV) | 29.96 GB (+9.6 KV) |
| Q3_K_M 3.91 bpw | FP32 | 16.68 GB | 21.50 GB (+4.0 KV) | 25.50 GB (+8.0 KV) | 33.50 GB (+16.0 KV) | 49.50 GB (+32.0 KV) | 81.50 GB (+64.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 16.68 GB | 19.50 GB (+2.0 KV) | 21.50 GB (+4.0 KV) | 25.50 GB (+8.0 KV) | 33.50 GB (+16.0 KV) | 49.50 GB (+32.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 16.68 GB | 18.60 GB (+1.1 KV) | 19.70 GB (+2.2 KV) | 21.90 GB (+4.4 KV) | 26.30 GB (+8.8 KV) | 35.10 GB (+17.6 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 16.68 GB | 18.50 GB (+1.0 KV) | 19.50 GB (+2.0 KV) | 21.50 GB (+4.0 KV) | 25.50 GB (+8.0 KV) | 33.50 GB (+16.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 16.68 GB | 18.10 GB (+0.6 KV) | 18.70 GB (+1.2 KV) | 19.90 GB (+2.4 KV) | 22.30 GB (+4.8 KV) | 27.10 GB (+9.6 KV) |
| Q2_K 2.63 bpw | FP32 | 11.22 GB | 16.04 GB (+4.0 KV) | 20.04 GB (+8.0 KV) | 28.04 GB (+16.0 KV) | 44.04 GB (+32.0 KV) | 76.04 GB (+64.0 KV) |
| Q2_K 2.63 bpw | FP16 | 11.22 GB | 14.04 GB (+2.0 KV) | 16.04 GB (+4.0 KV) | 20.04 GB (+8.0 KV) | 28.04 GB (+16.0 KV) | 44.04 GB (+32.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 11.22 GB | 13.14 GB (+1.1 KV) | 14.24 GB (+2.2 KV) | 16.44 GB (+4.4 KV) | 20.84 GB (+8.8 KV) | 29.64 GB (+17.6 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 11.22 GB | 13.04 GB (+1.0 KV) | 14.04 GB (+2.0 KV) | 16.04 GB (+4.0 KV) | 20.04 GB (+8.0 KV) | 28.04 GB (+16.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 11.22 GB | 12.64 GB (+0.6 KV) | 13.24 GB (+1.2 KV) | 14.44 GB (+2.4 KV) | 16.84 GB (+4.8 KV) | 21.64 GB (+9.6 KV) |
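
Total VRAM = Model Weights + KV Cache + 0.82 GB overhead. The value in parentheses in each cell is the KV cache size in GB for that context length. Actual usage may vary by ±5% depending on the inference engine and its optimizations.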
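
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the site's calculator: the function names and the `fits` helper are hypothetical, the per-format KV figures are read from the table's 8K column, and it assumes that "8K" means 8,192 tokens and that the KV cache grows linearly with context length, as the table's columns suggest.

```python
# Base overhead stated in this section (CUDA context + activations), in GB.
OVERHEAD_GB = 0.82

# KV cache size in GB at 8K context, copied from the table's 8K column.
KV_GB_AT_8K = {"FP32": 4.0, "FP16": 2.0, "Q8_0": 1.1, "FP8": 1.0, "Q4_0": 0.6}


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """Scale the 8K figure linearly with context (assumes 8K = 8192 tokens)."""
    return KV_GB_AT_8K[cache_format] * context_tokens / 8192


def total_vram_gb(weights_gb: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


def fits(total_gb: float, available_gb: float, margin: float = 0.05) -> bool:
    """Hypothetical check: pad the estimate by the quoted 5% variance."""
    return total_gb * (1 + margin) <= available_gb


if __name__ == "__main__":
    # Q4_K_M weights (19.84 GB) with an FP16 cache at 8K context.
    estimate = total_vram_gb(19.84, "FP16", 8192)
    print(f"{estimate:.2f} GB")   # 22.66 GB, matching the table row
    print(fits(estimate, 24.0))   # True on a 24 GB card under these assumptions
```

Some table cells differ from this sum by 0.01 GB, presumably because the table was computed from unrounded inputs, so treat the result as an estimate rather than an exact match.
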