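Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows the total VRAM in GB, with the KV cache's share of that total (also in GB) in parentheses. Base overhead: 1.5 GB (CUDA context + activations).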
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 163K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 1409.1 GB | 1411.67 GB (+1.07 KV) | 1412.74 GB (+2.14 KV) | 1414.89 GB (+4.29 KV) | 1419.18 GB (+8.58 KV) | 1427.76 GB (+17.16 KV) | 1432.05 GB (+21.45 KV) |
| FP16 16.0 bpw | FP16 | 1409.1 GB | 1411.14 GB (+0.54 KV) | 1411.67 GB (+1.07 KV) | 1412.74 GB (+2.14 KV) | 1414.89 GB (+4.29 KV) | 1419.18 GB (+8.58 KV) | 1421.32 GB (+10.72 KV) |
| FP16 16.0 bpw | Q8_0 | 1409.1 GB | 1410.89 GB (+0.29 KV) | 1411.19 GB (+0.59 KV) | 1411.78 GB (+1.18 KV) | 1412.96 GB (+2.36 KV) | 1415.32 GB (+4.72 KV) | 1416.5 GB (+5.9 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 1409.1 GB | 1410.87 GB (+0.27 KV) | 1411.14 GB (+0.54 KV) | 1411.67 GB (+1.07 KV) | 1412.74 GB (+2.14 KV) | 1414.89 GB (+4.29 KV) | 1415.96 GB (+5.36 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 1409.1 GB | 1410.76 GB (+0.16 KV) | 1410.92 GB (+0.32 KV) | 1411.24 GB (+0.64 KV) | 1411.89 GB (+1.29 KV) | 1413.17 GB (+2.57 KV) | 1413.82 GB (+3.22 KV) |
| Q8_0 8.0 bpw | FP32 | 704.55 GB | 707.12 GB (+1.07 KV) | 708.19 GB (+2.14 KV) | 710.34 GB (+4.29 KV) | 714.63 GB (+8.58 KV) | 723.21 GB (+17.16 KV) | 727.5 GB (+21.45 KV) |
| Q8_0 8.0 bpw | FP16 | 704.55 GB | 706.59 GB (+0.54 KV) | 707.12 GB (+1.07 KV) | 708.19 GB (+2.14 KV) | 710.34 GB (+4.29 KV) | 714.63 GB (+8.58 KV) | 716.77 GB (+10.72 KV) |
| Q8_0 8.0 bpw | Q8_0 | 704.55 GB | 706.34 GB (+0.29 KV) | 706.64 GB (+0.59 KV) | 707.23 GB (+1.18 KV) | 708.41 GB (+2.36 KV) | 710.77 GB (+4.72 KV) | 711.95 GB (+5.9 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 704.55 GB | 706.32 GB (+0.27 KV) | 706.59 GB (+0.54 KV) | 707.12 GB (+1.07 KV) | 708.19 GB (+2.14 KV) | 710.34 GB (+4.29 KV) | 711.41 GB (+5.36 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 704.55 GB | 706.21 GB (+0.16 KV) | 706.37 GB (+0.32 KV) | 706.69 GB (+0.64 KV) | 707.34 GB (+1.29 KV) | 708.62 GB (+2.57 KV) | 709.27 GB (+3.22 KV) |
| Q4_K_M 4.65 bpw | FP32 | 409.52 GB | 412.09 GB (+1.07 KV) | 413.16 GB (+2.14 KV) | 415.31 GB (+4.29 KV) | 419.6 GB (+8.58 KV) | 428.18 GB (+17.16 KV) | 432.47 GB (+21.45 KV) |
| Q4_K_M 4.65 bpw | FP16 | 409.52 GB | 411.56 GB (+0.54 KV) | 412.09 GB (+1.07 KV) | 413.16 GB (+2.14 KV) | 415.31 GB (+4.29 KV) | 419.6 GB (+8.58 KV) | 421.74 GB (+10.72 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 409.52 GB | 411.31 GB (+0.29 KV) | 411.61 GB (+0.59 KV) | 412.2 GB (+1.18 KV) | 413.38 GB (+2.36 KV) | 415.74 GB (+4.72 KV) | 416.92 GB (+5.9 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 409.52 GB | 411.29 GB (+0.27 KV) | 411.56 GB (+0.54 KV) | 412.09 GB (+1.07 KV) | 413.16 GB (+2.14 KV) | 415.31 GB (+4.29 KV) | 416.38 GB (+5.36 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 409.52 GB | 411.18 GB (+0.16 KV) | 411.34 GB (+0.32 KV) | 411.66 GB (+0.64 KV) | 412.31 GB (+1.29 KV) | 413.59 GB (+2.57 KV) | 414.24 GB (+3.22 KV) |
| Q4_K_S 4.58 bpw | FP32 | 403.35 GB | 405.93 GB (+1.07 KV) | 407.0 GB (+2.14 KV) | 409.14 GB (+4.29 KV) | 413.43 GB (+8.58 KV) | 422.01 GB (+17.16 KV) | 426.3 GB (+21.45 KV) |
| Q4_K_S 4.58 bpw | FP16 | 403.35 GB | 405.39 GB (+0.54 KV) | 405.93 GB (+1.07 KV) | 407.0 GB (+2.14 KV) | 409.14 GB (+4.29 KV) | 413.43 GB (+8.58 KV) | 415.58 GB (+10.72 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 403.35 GB | 405.15 GB (+0.29 KV) | 405.44 GB (+0.59 KV) | 406.03 GB (+1.18 KV) | 407.21 GB (+2.36 KV) | 409.57 GB (+4.72 KV) | 410.75 GB (+5.9 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 403.35 GB | 405.12 GB (+0.27 KV) | 405.39 GB (+0.54 KV) | 405.93 GB (+1.07 KV) | 407.0 GB (+2.14 KV) | 409.14 GB (+4.29 KV) | 410.22 GB (+5.36 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 403.35 GB | 405.02 GB (+0.16 KV) | 405.18 GB (+0.32 KV) | 405.5 GB (+0.64 KV) | 406.14 GB (+1.29 KV) | 407.43 GB (+2.57 KV) | 408.07 GB (+3.22 KV) |
| Q3_K_M 3.91 bpw | FP32 | 344.35 GB | 346.92 GB (+1.07 KV) | 347.99 GB (+2.14 KV) | 350.14 GB (+4.29 KV) | 354.43 GB (+8.58 KV) | 363.01 GB (+17.16 KV) | 367.29 GB (+21.45 KV) |
| Q3_K_M 3.91 bpw | FP16 | 344.35 GB | 346.38 GB (+0.54 KV) | 346.92 GB (+1.07 KV) | 347.99 GB (+2.14 KV) | 350.14 GB (+4.29 KV) | 354.43 GB (+8.58 KV) | 356.57 GB (+10.72 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 344.35 GB | 346.14 GB (+0.29 KV) | 346.44 GB (+0.59 KV) | 347.03 GB (+1.18 KV) | 348.21 GB (+2.36 KV) | 350.57 GB (+4.72 KV) | 351.75 GB (+5.9 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 344.35 GB | 346.12 GB (+0.27 KV) | 346.38 GB (+0.54 KV) | 346.92 GB (+1.07 KV) | 347.99 GB (+2.14 KV) | 350.14 GB (+4.29 KV) | 351.21 GB (+5.36 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 344.35 GB | 346.01 GB (+0.16 KV) | 346.17 GB (+0.32 KV) | 346.49 GB (+0.64 KV) | 347.14 GB (+1.29 KV) | 348.42 GB (+2.57 KV) | 349.07 GB (+3.22 KV) |
| Q2_K 2.63 bpw | FP32 | 231.62 GB | 234.19 GB (+1.07 KV) | 235.27 GB (+2.14 KV) | 237.41 GB (+4.29 KV) | 241.7 GB (+8.58 KV) | 250.28 GB (+17.16 KV) | 254.57 GB (+21.45 KV) |
| Q2_K 2.63 bpw | FP16 | 231.62 GB | 233.66 GB (+0.54 KV) | 234.19 GB (+1.07 KV) | 235.27 GB (+2.14 KV) | 237.41 GB (+4.29 KV) | 241.7 GB (+8.58 KV) | 243.84 GB (+10.72 KV) |
| Q2_K 2.63 bpw | Q8_0 | 231.62 GB | 233.42 GB (+0.29 KV) | 233.71 GB (+0.59 KV) | 234.3 GB (+1.18 KV) | 235.48 GB (+2.36 KV) | 237.84 GB (+4.72 KV) | 239.02 GB (+5.9 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 231.62 GB | 233.39 GB (+0.27 KV) | 233.66 GB (+0.54 KV) | 234.19 GB (+1.07 KV) | 235.27 GB (+2.14 KV) | 237.41 GB (+4.29 KV) | 238.48 GB (+5.36 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 231.62 GB | 233.28 GB (+0.16 KV) | 233.44 GB (+0.32 KV) | 233.76 GB (+0.64 KV) | 234.41 GB (+1.29 KV) | 235.69 GB (+2.57 KV) | 236.34 GB (+3.22 KV) |
Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
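
As a quick sanity check, the formula can be applied by hand to any row. The sketch below is only an illustration: the function names `total_vram_gb` and `fits`, and the 8 x 80 GB hardware figure, are hypothetical and not part of the table. The weights, KV cache, and overhead values are copied from the Q4_K_M / FP16 row at 32K context, and the ±5% variation is treated as a worst-case pad.

```python
OVERHEAD_GB = 1.5  # base overhead stated with the table: CUDA context + activations


def total_vram_gb(weights_gb: float, kv_cache_gb: float,
                  overhead_gb: float = OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead (all in GB)."""
    return weights_gb + kv_cache_gb + overhead_gb


def fits(total_gb: float, available_gb: float, margin: float = 0.05) -> bool:
    """Conservative check: pad the estimate by the stated 5% variation.

    This padding rule is an assumption for illustration, not a guarantee.
    """
    return total_gb * (1 + margin) <= available_gb


# Q4_K_M weights (409.52 GB) with an FP16 cache at 32K context (+2.14 GB KV).
estimate = total_vram_gb(409.52, 2.14)
print(f"{estimate:.2f} GB")  # 413.16 GB, matching the table cell

# Hypothetical hardware: eight accelerators with 80 GB each.
print(fits(estimate, available_gb=8 * 80))
```

Padding the estimate upward rather than downward keeps the check on the safe side, since running out of VRAM at load time is the costlier mistake.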