Estimated VRAM usage for every combination of weight quantization and KV cache format at an 8K-token context. Each total includes a fixed base overhead of 0.53 GB for the CUDA context and activations.
| Weight Format | Bits per Weight | KV Cache Format | Model Weights | KV Cache (8K) | Total VRAM (8K) |
|---|---|---|---|---|---|
| FP16 | 16.0 | FP32 | 6.30 GB | 0.94 GB | 7.77 GB |
| FP16 | 16.0 | FP16 | 6.30 GB | 0.47 GB | 7.30 GB |
| FP16 | 16.0 | Q8_0 | 6.30 GB | 0.26 GB | 7.09 GB |
| FP16 | 16.0 | FP8 (experimental) | 6.30 GB | 0.23 GB | 7.06 GB |
| FP16 | 16.0 | Q4_0 (experimental) | 6.30 GB | 0.14 GB | 6.97 GB |
| Q8_0 | 8.0 | FP32 | 3.15 GB | 0.94 GB | 4.62 GB |
| Q8_0 | 8.0 | FP16 | 3.15 GB | 0.47 GB | 4.15 GB |
| Q8_0 | 8.0 | Q8_0 | 3.15 GB | 0.26 GB | 3.94 GB |
| Q8_0 | 8.0 | FP8 (experimental) | 3.15 GB | 0.23 GB | 3.91 GB |
| Q8_0 | 8.0 | Q4_0 (experimental) | 3.15 GB | 0.14 GB | 3.82 GB |
| Q4_K_M | 4.65 | FP32 | 1.83 GB | 0.94 GB | 3.30 GB |
| Q4_K_M | 4.65 | FP16 | 1.83 GB | 0.47 GB | 2.83 GB |
| Q4_K_M | 4.65 | Q8_0 | 1.83 GB | 0.26 GB | 2.62 GB |
| Q4_K_M | 4.65 | FP8 (experimental) | 1.83 GB | 0.23 GB | 2.60 GB |
| Q4_K_M | 4.65 | Q4_0 (experimental) | 1.83 GB | 0.14 GB | 2.50 GB |
| Q4_K_S | 4.58 | FP32 | 1.80 GB | 0.94 GB | 3.27 GB |
| Q4_K_S | 4.58 | FP16 | 1.80 GB | 0.47 GB | 2.80 GB |
| Q4_K_S | 4.58 | Q8_0 | 1.80 GB | 0.26 GB | 2.59 GB |
| Q4_K_S | 4.58 | FP8 (experimental) | 1.80 GB | 0.23 GB | 2.57 GB |
| Q4_K_S | 4.58 | Q4_0 (experimental) | 1.80 GB | 0.14 GB | 2.47 GB |
| Q3_K_M | 3.91 | FP32 | 1.54 GB | 0.94 GB | 3.01 GB |
| Q3_K_M | 3.91 | FP16 | 1.54 GB | 0.47 GB | 2.54 GB |
| Q3_K_M | 3.91 | Q8_0 | 1.54 GB | 0.26 GB | 2.33 GB |
| Q3_K_M | 3.91 | FP8 (experimental) | 1.54 GB | 0.23 GB | 2.30 GB |
| Q3_K_M | 3.91 | Q4_0 (experimental) | 1.54 GB | 0.14 GB | 2.21 GB |
| Q2_K | 2.63 | FP32 | 1.04 GB | 0.94 GB | 2.50 GB |
| Q2_K | 2.63 | FP16 | 1.04 GB | 0.47 GB | 2.03 GB |
| Q2_K | 2.63 | Q8_0 | 1.04 GB | 0.26 GB | 1.82 GB |
| Q2_K | 2.63 | FP8 (experimental) | 1.04 GB | 0.23 GB | 1.80 GB |
| Q2_K | 2.63 | Q4_0 (experimental) | 1.04 GB | 0.14 GB | 1.71 GB |
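Total VRAM = Model Weights + KV Cache + 0.53 GB overhead. Actual usage may vary by roughly ±5% depending on the inference engine and its optimizations. All figures are rounded, so a total can differ from the sum of its parts by 0.01 GB.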
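The formula above is simple enough to script. The following is a minimal sketch for illustration, not the calculator's actual code. It assumes that GB means 10^9 bytes and back-derives a parameter count of about 3.15 billion from the FP16 row (6.30 GB at 16 bits per weight). That count is an inference, not a published figure. It also treats 8K as 8,192 tokens and assumes full attention, so the KV cache grows linearly with context length. Finally, it keeps the 0.53 GB overhead fixed, as the table does, even though activation memory may grow with context in practice.

```python
# Hypothetical sketch of the table's VRAM estimate (not the calculator's code).
# Assumptions: GB = 10**9 bytes; N_PARAMS is back-derived from the FP16 row.

BASE_OVERHEAD_GB = 0.53  # CUDA context + activations, per the table notes
N_PARAMS = 3.15e9        # hypothetical: 6.3 GB / (16 bits / 8 bits per byte)


def weights_gb(bits_per_weight: float, n_params: float = N_PARAMS) -> float:
    """Weight memory = parameters x bits per weight / 8 bits per byte."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(kv_gb_at_8k: float, context_tokens: int) -> float:
    """Scale the table's 8K KV figure linearly (assumes full attention, 8K = 8192)."""
    return kv_gb_at_8k * context_tokens / 8192


def total_vram_gb(weights: float, kv_cache: float,
                  overhead: float = BASE_OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights + kv_cache + overhead


if __name__ == "__main__":
    w = weights_gb(4.65)  # Q4_K_M at 4.65 bits per weight
    print(f"Q4_K_M weights: {w:.2f} GB")  # 1.83 GB
    print(f"Q4_K_M + FP16 cache, 8K: {total_vram_gb(w, 0.47):.2f} GB")  # 2.83 GB
    kv_16k = kv_cache_gb(0.47, 16384)  # FP16 cache doubled for 16K tokens
    print(f"Q4_K_M + FP16 cache, 16K: {total_vram_gb(w, kv_16k):.2f} GB")  # 3.30 GB
```

The first two printed values reproduce the Q4_K_M / FP16 row of the table. The 16K figure is an extrapolation under the linear-scaling assumption, not a measured number.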