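Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows the total VRAM in GB, followed in parentheses by the KV cache's share of that total, also in GB. All totals include a fixed base overhead of 1.5 GB (CUDA context + activations).
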
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 200K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 483.0 GB | 488.38 GB (+3.88 KV) | 492.25 GB (+7.75 KV) | 500.0 GB (+15.5 KV) | 515.5 GB (+31.0 KV) | 546.5 GB (+62.0 KV) | 579.13 GB (+94.63 KV) |
| FP16 16.0 bpw | FP16 | 483.0 GB | 486.44 GB (+1.94 KV) | 488.38 GB (+3.88 KV) | 492.25 GB (+7.75 KV) | 500.0 GB (+15.5 KV) | 515.5 GB (+31.0 KV) | 531.82 GB (+47.32 KV) |
| FP16 16.0 bpw | Q8_0 | 483.0 GB | 485.57 GB (+1.07 KV) | 486.63 GB (+2.13 KV) | 488.76 GB (+4.26 KV) | 493.02 GB (+8.53 KV) | 501.55 GB (+17.05 KV) | 510.52 GB (+26.02 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 483.0 GB | 485.47 GB (+0.97 KV) | 486.44 GB (+1.94 KV) | 488.38 GB (+3.88 KV) | 492.25 GB (+7.75 KV) | 500.0 GB (+15.5 KV) | 508.16 GB (+23.66 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 483.0 GB | 485.08 GB (+0.58 KV) | 485.66 GB (+1.16 KV) | 486.82 GB (+2.32 KV) | 489.15 GB (+4.65 KV) | 493.8 GB (+9.3 KV) | 498.69 GB (+14.19 KV) |
| Q8_0 8.0 bpw | FP32 | 241.5 GB | 246.88 GB (+3.88 KV) | 250.75 GB (+7.75 KV) | 258.5 GB (+15.5 KV) | 274.0 GB (+31.0 KV) | 305.0 GB (+62.0 KV) | 337.63 GB (+94.63 KV) |
| Q8_0 8.0 bpw | FP16 | 241.5 GB | 244.94 GB (+1.94 KV) | 246.88 GB (+3.88 KV) | 250.75 GB (+7.75 KV) | 258.5 GB (+15.5 KV) | 274.0 GB (+31.0 KV) | 290.32 GB (+47.32 KV) |
| Q8_0 8.0 bpw | Q8_0 | 241.5 GB | 244.07 GB (+1.07 KV) | 245.13 GB (+2.13 KV) | 247.26 GB (+4.26 KV) | 251.53 GB (+8.53 KV) | 260.05 GB (+17.05 KV) | 269.02 GB (+26.02 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 241.5 GB | 243.97 GB (+0.97 KV) | 244.94 GB (+1.94 KV) | 246.88 GB (+3.88 KV) | 250.75 GB (+7.75 KV) | 258.5 GB (+15.5 KV) | 266.66 GB (+23.66 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 241.5 GB | 243.58 GB (+0.58 KV) | 244.16 GB (+1.16 KV) | 245.32 GB (+2.32 KV) | 247.65 GB (+4.65 KV) | 252.3 GB (+9.3 KV) | 257.19 GB (+14.19 KV) |
| Q4_K_M 4.65 bpw | FP32 | 140.37 GB | 145.75 GB (+3.88 KV) | 149.62 GB (+7.75 KV) | 157.37 GB (+15.5 KV) | 172.87 GB (+31.0 KV) | 203.87 GB (+62.0 KV) | 236.5 GB (+94.63 KV) |
| Q4_K_M 4.65 bpw | FP16 | 140.37 GB | 143.81 GB (+1.94 KV) | 145.75 GB (+3.88 KV) | 149.62 GB (+7.75 KV) | 157.37 GB (+15.5 KV) | 172.87 GB (+31.0 KV) | 189.19 GB (+47.32 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 140.37 GB | 142.94 GB (+1.07 KV) | 144.0 GB (+2.13 KV) | 146.13 GB (+4.26 KV) | 150.4 GB (+8.53 KV) | 158.92 GB (+17.05 KV) | 167.9 GB (+26.02 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 140.37 GB | 142.84 GB (+0.97 KV) | 143.81 GB (+1.94 KV) | 145.75 GB (+3.88 KV) | 149.62 GB (+7.75 KV) | 157.37 GB (+15.5 KV) | 165.53 GB (+23.66 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 140.37 GB | 142.45 GB (+0.58 KV) | 143.03 GB (+1.16 KV) | 144.2 GB (+2.32 KV) | 146.52 GB (+4.65 KV) | 151.17 GB (+9.3 KV) | 156.07 GB (+14.19 KV) |
| Q4_K_S 4.58 bpw | FP32 | 138.26 GB | 143.63 GB (+3.88 KV) | 147.51 GB (+7.75 KV) | 155.26 GB (+15.5 KV) | 170.76 GB (+31.0 KV) | 201.76 GB (+62.0 KV) | 234.39 GB (+94.63 KV) |
| Q4_K_S 4.58 bpw | FP16 | 138.26 GB | 141.7 GB (+1.94 KV) | 143.63 GB (+3.88 KV) | 147.51 GB (+7.75 KV) | 155.26 GB (+15.5 KV) | 170.76 GB (+31.0 KV) | 187.08 GB (+47.32 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 138.26 GB | 140.82 GB (+1.07 KV) | 141.89 GB (+2.13 KV) | 144.02 GB (+4.26 KV) | 148.28 GB (+8.53 KV) | 156.81 GB (+17.05 KV) | 165.78 GB (+26.02 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 138.26 GB | 140.73 GB (+0.97 KV) | 141.7 GB (+1.94 KV) | 143.63 GB (+3.88 KV) | 147.51 GB (+7.75 KV) | 155.26 GB (+15.5 KV) | 163.42 GB (+23.66 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 138.26 GB | 140.34 GB (+0.58 KV) | 140.92 GB (+1.16 KV) | 142.08 GB (+2.32 KV) | 144.41 GB (+4.65 KV) | 149.06 GB (+9.3 KV) | 153.95 GB (+14.19 KV) |
| Q3_K_M 3.91 bpw | FP32 | 118.03 GB | 123.41 GB (+3.88 KV) | 127.28 GB (+7.75 KV) | 135.03 GB (+15.5 KV) | 150.53 GB (+31.0 KV) | 181.53 GB (+62.0 KV) | 214.17 GB (+94.63 KV) |
| Q3_K_M 3.91 bpw | FP16 | 118.03 GB | 121.47 GB (+1.94 KV) | 123.41 GB (+3.88 KV) | 127.28 GB (+7.75 KV) | 135.03 GB (+15.5 KV) | 150.53 GB (+31.0 KV) | 166.85 GB (+47.32 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 118.03 GB | 120.6 GB (+1.07 KV) | 121.66 GB (+2.13 KV) | 123.8 GB (+4.26 KV) | 128.06 GB (+8.53 KV) | 136.58 GB (+17.05 KV) | 145.56 GB (+26.02 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 118.03 GB | 120.5 GB (+0.97 KV) | 121.47 GB (+1.94 KV) | 123.41 GB (+3.88 KV) | 127.28 GB (+7.75 KV) | 135.03 GB (+15.5 KV) | 143.19 GB (+23.66 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 118.03 GB | 120.11 GB (+0.58 KV) | 120.7 GB (+1.16 KV) | 121.86 GB (+2.32 KV) | 124.18 GB (+4.65 KV) | 128.83 GB (+9.3 KV) | 133.73 GB (+14.19 KV) |
| Q2_K 2.63 bpw | FP32 | 79.39 GB | 84.77 GB (+3.88 KV) | 88.64 GB (+7.75 KV) | 96.39 GB (+15.5 KV) | 111.89 GB (+31.0 KV) | 142.89 GB (+62.0 KV) | 175.53 GB (+94.63 KV) |
| Q2_K 2.63 bpw | FP16 | 79.39 GB | 82.83 GB (+1.94 KV) | 84.77 GB (+3.88 KV) | 88.64 GB (+7.75 KV) | 96.39 GB (+15.5 KV) | 111.89 GB (+31.0 KV) | 128.21 GB (+47.32 KV) |
| Q2_K 2.63 bpw | Q8_0 | 79.39 GB | 81.96 GB (+1.07 KV) | 83.02 GB (+2.13 KV) | 85.16 GB (+4.26 KV) | 89.42 GB (+8.53 KV) | 97.94 GB (+17.05 KV) | 106.92 GB (+26.02 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 79.39 GB | 81.86 GB (+0.97 KV) | 82.83 GB (+1.94 KV) | 84.77 GB (+3.88 KV) | 88.64 GB (+7.75 KV) | 96.39 GB (+15.5 KV) | 104.55 GB (+23.66 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 79.39 GB | 81.47 GB (+0.58 KV) | 82.06 GB (+1.16 KV) | 83.22 GB (+2.32 KV) | 85.54 GB (+4.65 KV) | 90.19 GB (+9.3 KV) | 95.09 GB (+14.19 KV) |

Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
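
The formula above can be sketched in a few lines of Python. This is a rough reconstruction, not the method used to generate the table. The constants are assumptions read off the table itself: weights scale linearly from the 483.0 GB FP16 figure, the FP16 KV cache is taken as 1.9375 GB per 8,192 tokens, and the per-format ratios in `KV_SCALE` are inferred from the 8K column. It also assumes the column labels 8K through 131K denote 8,192 through 131,072 tokens. Under those assumptions it reproduces the 8K to 131K columns to within rounding, and the 200K column only to within a few hundredths of a GB.

```python
# Assumed constants, inferred from the table rather than from any engine's source.
OVERHEAD_GB = 1.5            # CUDA context + activations (stated in the text)
FP16_WEIGHTS_GB = 483.0      # model weights at 16.0 bpw
FP16_KV_GB_PER_8K = 1.9375   # FP16 KV cache per 8,192 tokens (inferred)

# KV cache size relative to FP16, inferred from the table's 8K column.
KV_SCALE = {"FP32": 2.0, "FP16": 1.0, "Q8_0": 0.55, "FP8": 0.5, "Q4_0": 0.3}


def weights_gb(bpw: float) -> float:
    """Model weights in GB, scaled linearly from the 16.0 bpw figure."""
    return FP16_WEIGHTS_GB * bpw / 16.0


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache in GB, assumed to grow linearly with context length."""
    return FP16_KV_GB_PER_8K * KV_SCALE[cache_format] * context_tokens / 8192


def total_vram_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb(bpw) + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


# Q4_K_M (4.65 bpw) with an FP16 cache at 32K context.
print(round(total_vram_gb(4.65, "FP16", 32768), 2))  # 149.62
```

The printed value matches the Q4_K_M / FP16 row's 32K Context cell in the table.
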
Use our calculator to see if this model fits your specific hardware configuration.
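
For a quick manual check, a total from the table can be compared against the combined memory of your GPUs. The sketch below is only an illustration: the `fits` helper is hypothetical, the 5% margin mirrors the ±5% variance quoted above, and it assumes the model can be split evenly across devices, which real inference engines may not achieve.

```python
def fits(total_vram_gb: float, gpu_vram_gb: list[float], margin: float = 0.05) -> bool:
    """Return True if the requirement, padded by `margin`, fits in the summed GPU memory.

    Hypothetical helper: ignores per-device fragmentation and uneven layer splits.
    """
    required = total_vram_gb * (1.0 + margin)
    return required <= sum(gpu_vram_gb)


# Q4_K_M with FP16 cache at 32K context (149.62 GB in the table) on two 80 GB GPUs.
print(fits(149.62, [80.0, 80.0]))        # True

# Q8_0 with FP16 cache at 8K context (244.94 GB in the table) on three 80 GB GPUs.
print(fits(244.94, [80.0, 80.0, 80.0]))  # False
```

A passing check here is a necessary condition rather than a guarantee, since the padded figure is still an estimate.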