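Estimated VRAM usage for every combination of weight quantization and KV cache format at a 4K-token context. Quantization levels are given in bits per weight (bpw). Each total includes a fixed base overhead of 1.23 GB (CUDA context plus activations).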

| Quantization | KV Cache Format | Model Weights | Total VRAM at 4K Context |
|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 152.67 GB | 156.40 GB (+2.50 GB KV) |
| FP16 (16.0 bpw) | FP16 | 152.67 GB | 155.15 GB (+1.25 GB KV) |
| FP16 (16.0 bpw) | Q8_0 | 152.67 GB | 154.58 GB (+0.69 GB KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 152.67 GB | 154.52 GB (+0.62 GB KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 152.67 GB | 154.27 GB (+0.38 GB KV) |
| Q8_0 (8.0 bpw) | FP32 | 76.34 GB | 80.06 GB (+2.50 GB KV) |
| Q8_0 (8.0 bpw) | FP16 | 76.34 GB | 78.81 GB (+1.25 GB KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 76.34 GB | 78.25 GB (+0.69 GB KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 76.34 GB | 78.19 GB (+0.62 GB KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 76.34 GB | 77.94 GB (+0.38 GB KV) |
| Q4_K_M (4.65 bpw) | FP32 | 44.37 GB | 48.10 GB (+2.50 GB KV) |
| Q4_K_M (4.65 bpw) | FP16 | 44.37 GB | 46.85 GB (+1.25 GB KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 44.37 GB | 46.28 GB (+0.69 GB KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 44.37 GB | 46.22 GB (+0.62 GB KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 44.37 GB | 45.97 GB (+0.38 GB KV) |
| Q4_K_S (4.58 bpw) | FP32 | 43.70 GB | 47.43 GB (+2.50 GB KV) |
| Q4_K_S (4.58 bpw) | FP16 | 43.70 GB | 46.18 GB (+1.25 GB KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 43.70 GB | 45.62 GB (+0.69 GB KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 43.70 GB | 45.55 GB (+0.62 GB KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 43.70 GB | 45.30 GB (+0.38 GB KV) |
| Q3_K_M (3.91 bpw) | FP32 | 37.31 GB | 41.04 GB (+2.50 GB KV) |
| Q3_K_M (3.91 bpw) | FP16 | 37.31 GB | 39.79 GB (+1.25 GB KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 37.31 GB | 39.22 GB (+0.69 GB KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 37.31 GB | 39.16 GB (+0.62 GB KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 37.31 GB | 38.91 GB (+0.38 GB KV) |
| Q2_K (2.63 bpw) | FP32 | 25.10 GB | 28.82 GB (+2.50 GB KV) |
| Q2_K (2.63 bpw) | FP16 | 25.10 GB | 27.57 GB (+1.25 GB KV) |
| Q2_K (2.63 bpw) | Q8_0 | 25.10 GB | 27.01 GB (+0.69 GB KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 25.10 GB | 26.95 GB (+0.62 GB KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 25.10 GB | 26.70 GB (+0.38 GB KV) |

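
The Model Weights column appears to scale linearly with bits per weight: each value matches the FP16 figure multiplied by bpw / 16. Here is a minimal Python sketch of that relationship, assuming the linear scaling holds for other quantizations as well. The function name `weights_gb` is illustrative and not part of any tool.

```python
# FP16 (16.0 bpw) weight size taken from the table above.
FP16_WEIGHTS_GB = 152.67


def weights_gb(bpw: float) -> float:
    """Estimate weight size in GB, assuming size is proportional to bpw."""
    return FP16_WEIGHTS_GB * bpw / 16.0


# Quantization names and bpw values as listed in the table.
QUANTS = [
    ("Q8_0", 8.0),
    ("Q4_K_M", 4.65),
    ("Q4_K_S", 4.58),
    ("Q3_K_M", 3.91),
    ("Q2_K", 2.63),
]

for name, bpw in QUANTS:
    print(f"{name}: about {weights_gb(bpw):.2f} GB")
```

The estimates agree with the table to within 0.01 GB, so the same function gives a rough figure for a bpw value that is not listed.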
Total VRAM = Model Weights + KV Cache + 1.23 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
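
The formula above can be sketched in Python, together with the ±5% band. The function names are illustrative assumptions, and the sample inputs are the Q4_K_M weights and FP16 cache figures from the table.

```python
BASE_OVERHEAD_GB = 1.23  # CUDA context + activations, as stated above
VARIANCE = 0.05          # the quoted ±5% spread between engines


def total_vram_gb(weights_gb: float, kv_cache_gb: float,
                  overhead_gb: float = BASE_OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + base overhead (all in GB)."""
    return weights_gb + kv_cache_gb + overhead_gb


def vram_range_gb(total_gb: float, variance: float = VARIANCE):
    """Return the (low, high) bounds implied by the ± variance."""
    return total_gb * (1 - variance), total_gb * (1 + variance)


# Q4_K_M weights (44.37 GB) with an FP16 KV cache (1.25 GB) at 4K context.
total = total_vram_gb(44.37, 1.25)
low, high = vram_range_gb(total)
print(f"Estimated total: {total:.2f} GB (range {low:.2f} to {high:.2f} GB)")
```

For this input the sum reproduces the table's 46.85 GB. Rows with a rounded KV figure (for example +0.69) can differ from the table by 0.01 GB, presumably because the table was computed from unrounded values.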
Use our calculator to see if this model fits your specific hardware configuration.