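Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM, with the KV cache's share (in GB) in parentheses; cache formats marked (Exp) are experimental. Base overhead: 0.74 GB (CUDA context + activations).
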
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 393K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 50.4 GB | 53.64 GB (+2.5 KV) | 56.14 GB (+5.0 KV) | 61.14 GB (+10.0 KV) | 71.14 GB (+20.0 KV) | 91.14 GB (+40.0 KV) | 171.14 GB (+120.0 KV) |
| FP16 16.0 bpw | FP16 | 50.4 GB | 52.39 GB (+1.25 KV) | 53.64 GB (+2.5 KV) | 56.14 GB (+5.0 KV) | 61.14 GB (+10.0 KV) | 71.14 GB (+20.0 KV) | 111.14 GB (+60.0 KV) |
| FP16 16.0 bpw | Q8_0 | 50.4 GB | 51.83 GB (+0.69 KV) | 52.52 GB (+1.38 KV) | 53.89 GB (+2.75 KV) | 56.64 GB (+5.5 KV) | 62.14 GB (+11.0 KV) | 84.14 GB (+33.0 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 50.4 GB | 51.77 GB (+0.62 KV) | 52.39 GB (+1.25 KV) | 53.64 GB (+2.5 KV) | 56.14 GB (+5.0 KV) | 61.14 GB (+10.0 KV) | 81.14 GB (+30.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 50.4 GB | 51.52 GB (+0.38 KV) | 51.89 GB (+0.75 KV) | 52.64 GB (+1.5 KV) | 54.14 GB (+3.0 KV) | 57.14 GB (+6.0 KV) | 69.14 GB (+18.0 KV) |
| Q8_0 8.0 bpw | FP32 | 25.2 GB | 28.44 GB (+2.5 KV) | 30.94 GB (+5.0 KV) | 35.94 GB (+10.0 KV) | 45.94 GB (+20.0 KV) | 65.94 GB (+40.0 KV) | 145.94 GB (+120.0 KV) |
| Q8_0 8.0 bpw | FP16 | 25.2 GB | 27.19 GB (+1.25 KV) | 28.44 GB (+2.5 KV) | 30.94 GB (+5.0 KV) | 35.94 GB (+10.0 KV) | 45.94 GB (+20.0 KV) | 85.94 GB (+60.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 25.2 GB | 26.63 GB (+0.69 KV) | 27.32 GB (+1.38 KV) | 28.69 GB (+2.75 KV) | 31.44 GB (+5.5 KV) | 36.94 GB (+11.0 KV) | 58.94 GB (+33.0 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 25.2 GB | 26.57 GB (+0.62 KV) | 27.19 GB (+1.25 KV) | 28.44 GB (+2.5 KV) | 30.94 GB (+5.0 KV) | 35.94 GB (+10.0 KV) | 55.94 GB (+30.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 25.2 GB | 26.32 GB (+0.38 KV) | 26.69 GB (+0.75 KV) | 27.44 GB (+1.5 KV) | 28.94 GB (+3.0 KV) | 31.94 GB (+6.0 KV) | 43.94 GB (+18.0 KV) |
| Q4_K_M 4.65 bpw | FP32 | 14.65 GB | 17.89 GB (+2.5 KV) | 20.39 GB (+5.0 KV) | 25.39 GB (+10.0 KV) | 35.39 GB (+20.0 KV) | 55.39 GB (+40.0 KV) | 135.39 GB (+120.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 14.65 GB | 16.64 GB (+1.25 KV) | 17.89 GB (+2.5 KV) | 20.39 GB (+5.0 KV) | 25.39 GB (+10.0 KV) | 35.39 GB (+20.0 KV) | 75.39 GB (+60.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 14.65 GB | 16.07 GB (+0.69 KV) | 16.76 GB (+1.38 KV) | 18.14 GB (+2.75 KV) | 20.89 GB (+5.5 KV) | 26.39 GB (+11.0 KV) | 48.39 GB (+33.0 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 14.65 GB | 16.01 GB (+0.62 KV) | 16.64 GB (+1.25 KV) | 17.89 GB (+2.5 KV) | 20.39 GB (+5.0 KV) | 25.39 GB (+10.0 KV) | 45.39 GB (+30.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 14.65 GB | 15.76 GB (+0.38 KV) | 16.14 GB (+0.75 KV) | 16.89 GB (+1.5 KV) | 18.39 GB (+3.0 KV) | 21.39 GB (+6.0 KV) | 33.39 GB (+18.0 KV) |
| Q4_K_S 4.58 bpw | FP32 | 14.43 GB | 17.67 GB (+2.5 KV) | 20.17 GB (+5.0 KV) | 25.17 GB (+10.0 KV) | 35.17 GB (+20.0 KV) | 55.17 GB (+40.0 KV) | 135.17 GB (+120.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 14.43 GB | 16.42 GB (+1.25 KV) | 17.67 GB (+2.5 KV) | 20.17 GB (+5.0 KV) | 25.17 GB (+10.0 KV) | 35.17 GB (+20.0 KV) | 75.17 GB (+60.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 14.43 GB | 15.85 GB (+0.69 KV) | 16.54 GB (+1.38 KV) | 17.92 GB (+2.75 KV) | 20.67 GB (+5.5 KV) | 26.17 GB (+11.0 KV) | 48.17 GB (+33.0 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 14.43 GB | 15.79 GB (+0.62 KV) | 16.42 GB (+1.25 KV) | 17.67 GB (+2.5 KV) | 20.17 GB (+5.0 KV) | 25.17 GB (+10.0 KV) | 45.17 GB (+30.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 14.43 GB | 15.54 GB (+0.38 KV) | 15.92 GB (+0.75 KV) | 16.67 GB (+1.5 KV) | 18.17 GB (+3.0 KV) | 21.17 GB (+6.0 KV) | 33.17 GB (+18.0 KV) |
| Q3_K_M 3.91 bpw | FP32 | 12.32 GB | 15.56 GB (+2.5 KV) | 18.06 GB (+5.0 KV) | 23.06 GB (+10.0 KV) | 33.06 GB (+20.0 KV) | 53.06 GB (+40.0 KV) | 133.06 GB (+120.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 12.32 GB | 14.31 GB (+1.25 KV) | 15.56 GB (+2.5 KV) | 18.06 GB (+5.0 KV) | 23.06 GB (+10.0 KV) | 33.06 GB (+20.0 KV) | 73.06 GB (+60.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 12.32 GB | 13.74 GB (+0.69 KV) | 14.43 GB (+1.38 KV) | 15.81 GB (+2.75 KV) | 18.56 GB (+5.5 KV) | 24.06 GB (+11.0 KV) | 46.06 GB (+33.0 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 12.32 GB | 13.68 GB (+0.62 KV) | 14.31 GB (+1.25 KV) | 15.56 GB (+2.5 KV) | 18.06 GB (+5.0 KV) | 23.06 GB (+10.0 KV) | 43.06 GB (+30.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 12.32 GB | 13.43 GB (+0.38 KV) | 13.81 GB (+0.75 KV) | 14.56 GB (+1.5 KV) | 16.06 GB (+3.0 KV) | 19.06 GB (+6.0 KV) | 31.06 GB (+18.0 KV) |
| Q2_K 2.63 bpw | FP32 | 8.28 GB | 11.52 GB (+2.5 KV) | 14.02 GB (+5.0 KV) | 19.02 GB (+10.0 KV) | 29.02 GB (+20.0 KV) | 49.02 GB (+40.0 KV) | 129.02 GB (+120.0 KV) |
| Q2_K 2.63 bpw | FP16 | 8.28 GB | 10.27 GB (+1.25 KV) | 11.52 GB (+2.5 KV) | 14.02 GB (+5.0 KV) | 19.02 GB (+10.0 KV) | 29.02 GB (+20.0 KV) | 69.02 GB (+60.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 8.28 GB | 9.71 GB (+0.69 KV) | 10.40 GB (+1.38 KV) | 11.77 GB (+2.75 KV) | 14.52 GB (+5.5 KV) | 20.02 GB (+11.0 KV) | 42.02 GB (+33.0 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 8.28 GB | 9.65 GB (+0.62 KV) | 10.27 GB (+1.25 KV) | 11.52 GB (+2.5 KV) | 14.02 GB (+5.0 KV) | 19.02 GB (+10.0 KV) | 39.02 GB (+30.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 8.28 GB | 9.40 GB (+0.38 KV) | 9.77 GB (+0.75 KV) | 10.52 GB (+1.5 KV) | 12.02 GB (+3.0 KV) | 15.02 GB (+6.0 KV) | 27.02 GB (+18.0 KV) |
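
The Model Weights and KV cache figures follow simple scaling rules. The Python sketch below reproduces them under assumptions inferred from the table rather than stated on this page: a parameter count of about 25.2 billion (50.4 GB at 16 bpw, i.e. 2 bytes per weight), an FP16 KV cache of 1.25 GB per 8,192 tokens, per-format size factors read off the 8K column, and context labels that stand for 8,192 to 393,216 tokens. The names `weights_gb`, `kv_cache_gb`, and `KV_FORMAT_FACTOR` are hypothetical.

```python
# Hypothetical reconstruction of the table's columns. The constants are
# inferred from the table values, not published model figures.
PARAMS_BILLION = 25.2        # assumed: 50.4 GB at 16 bpw / 2 bytes per weight
KV_FP16_GB_PER_8K = 1.25     # assumed: FP16 KV cache size at 8,192 tokens

# Assumed KV cache size relative to FP16, read off the 8K column.
KV_FORMAT_FACTOR = {
    "FP32": 2.0,
    "FP16": 1.0,
    "Q8_0": 0.55,
    "FP8": 0.5,
    "Q4_0": 0.3,
}


def weights_gb(bpw: float, params_billion: float = PARAMS_BILLION) -> float:
    """Weight memory in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bpw / 8


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache in GB, scaled by format and growing linearly with context."""
    factor = KV_FORMAT_FACTOR[cache_format]
    return KV_FP16_GB_PER_8K * factor * context_tokens / 8192


if __name__ == "__main__":
    for name, bpw in [("FP16", 16.0), ("Q4_K_M", 4.65), ("Q2_K", 2.63)]:
        print(f"{name} weights: {weights_gb(bpw):.2f} GB")
    print(f"Q8_0 KV cache at 131,072 tokens: {kv_cache_gb('Q8_0', 131072):.2f} GB")
```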

Total VRAM = Model Weights + KV Cache + 0.74 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
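
The total-VRAM formula and the ±5% band can be sketched in Python as follows. `total_vram_gb`, `vram_range_gb`, and `fits_on_gpu` are hypothetical helper names, and treating the high estimate as the safe budget is an assumption, not a rule from this page. The example uses the Q4_K_M weights (14.65 GB) with an FP16 KV cache at 32K context (+5.0 KV), which gives the table's 20.39 GB; the 24 GB card is an arbitrary illustration.

```python
OVERHEAD_GB = 0.74  # base overhead stated on this page (CUDA context + activations)


def total_vram_gb(weights_gb: float, kv_gb: float, overhead_gb: float = OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead, all in GB."""
    return weights_gb + kv_gb + overhead_gb


def vram_range_gb(weights_gb: float, kv_gb: float, tolerance: float = 0.05) -> tuple[float, float]:
    """Low and high estimates allowing for the stated +/-5% engine variation."""
    total = total_vram_gb(weights_gb, kv_gb)
    return total * (1 - tolerance), total * (1 + tolerance)


def fits_on_gpu(gpu_vram_gb: float, weights_gb: float, kv_gb: float) -> bool:
    """Conservative check (assumption): require the high estimate to fit."""
    _, high = vram_range_gb(weights_gb, kv_gb)
    return high <= gpu_vram_gb


if __name__ == "__main__":
    # Q4_K_M weights with an FP16 KV cache at 32K context (table cell: 20.39 GB)
    low, high = vram_range_gb(14.65, 5.0)
    print(f"total {total_vram_gb(14.65, 5.0):.2f} GB, range {low:.2f}-{high:.2f} GB")
    print("fits a hypothetical 24 GB card:", fits_on_gpu(24.0, 14.65, 5.0))
```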