Active Parameters: 4.0B
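VRAM usage for every combination of weight quantization and KV cache format. Base overhead: 0.84 GB (CUDA context + activations). Each context column shows the total VRAM, with the KV cache size in GB in parentheses; (Exp) marks experimental cache formats. Model weights scale with the total parameter count, not the 4.0B active parameters: 71.4 GB at 16 bpw implies roughly 35.7B parameters in total.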
| Weight Quantization | KV Cache Format | Model Weights | Total @ 8K Context | Total @ 16K Context | Total @ 32K Context | Total @ 65K Context |
|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 71.4 GB | 73.74 GB (+1.5 KV) | 75.24 GB (+3.0 KV) | 78.24 GB (+6.0 KV) | 84.24 GB (+12.0 KV) |
| FP16 16.0 bpw | FP16 | 71.4 GB | 72.99 GB (+0.75 KV) | 73.74 GB (+1.5 KV) | 75.24 GB (+3.0 KV) | 78.24 GB (+6.0 KV) |
| FP16 16.0 bpw | Q8_0 | 71.4 GB | 72.65 GB (+0.41 KV) | 73.07 GB (+0.83 KV) | 73.89 GB (+1.65 KV) | 75.54 GB (+3.3 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 71.4 GB | 72.62 GB (+0.38 KV) | 72.99 GB (+0.75 KV) | 73.74 GB (+1.5 KV) | 75.24 GB (+3.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 71.4 GB | 72.47 GB (+0.22 KV) | 72.69 GB (+0.45 KV) | 73.14 GB (+0.9 KV) | 74.04 GB (+1.8 KV) |
| Q8_0 8.0 bpw | FP32 | 35.7 GB | 38.04 GB (+1.5 KV) | 39.54 GB (+3.0 KV) | 42.54 GB (+6.0 KV) | 48.54 GB (+12.0 KV) |
| Q8_0 8.0 bpw | FP16 | 35.7 GB | 37.29 GB (+0.75 KV) | 38.04 GB (+1.5 KV) | 39.54 GB (+3.0 KV) | 42.54 GB (+6.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 35.7 GB | 36.95 GB (+0.41 KV) | 37.37 GB (+0.83 KV) | 38.19 GB (+1.65 KV) | 39.84 GB (+3.3 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 35.7 GB | 36.92 GB (+0.38 KV) | 37.29 GB (+0.75 KV) | 38.04 GB (+1.5 KV) | 39.54 GB (+3.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 35.7 GB | 36.77 GB (+0.22 KV) | 36.99 GB (+0.45 KV) | 37.44 GB (+0.9 KV) | 38.34 GB (+1.8 KV) |
| Q4_K_M 4.65 bpw | FP32 | 20.75 GB | 23.09 GB (+1.5 KV) | 24.59 GB (+3.0 KV) | 27.59 GB (+6.0 KV) | 33.59 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 20.75 GB | 22.34 GB (+0.75 KV) | 23.09 GB (+1.5 KV) | 24.59 GB (+3.0 KV) | 27.59 GB (+6.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 20.75 GB | 22.0 GB (+0.41 KV) | 22.42 GB (+0.83 KV) | 23.24 GB (+1.65 KV) | 24.89 GB (+3.3 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 20.75 GB | 21.97 GB (+0.38 KV) | 22.34 GB (+0.75 KV) | 23.09 GB (+1.5 KV) | 24.59 GB (+3.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 20.75 GB | 21.82 GB (+0.22 KV) | 22.04 GB (+0.45 KV) | 22.49 GB (+0.9 KV) | 23.39 GB (+1.8 KV) |
| Q4_K_S 4.58 bpw | FP32 | 20.44 GB | 22.78 GB (+1.5 KV) | 24.28 GB (+3.0 KV) | 27.28 GB (+6.0 KV) | 33.28 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 20.44 GB | 22.03 GB (+0.75 KV) | 22.78 GB (+1.5 KV) | 24.28 GB (+3.0 KV) | 27.28 GB (+6.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 20.44 GB | 21.69 GB (+0.41 KV) | 22.1 GB (+0.83 KV) | 22.93 GB (+1.65 KV) | 24.58 GB (+3.3 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 20.44 GB | 21.65 GB (+0.38 KV) | 22.03 GB (+0.75 KV) | 22.78 GB (+1.5 KV) | 24.28 GB (+3.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 20.44 GB | 21.5 GB (+0.22 KV) | 21.73 GB (+0.45 KV) | 22.18 GB (+0.9 KV) | 23.08 GB (+1.8 KV) |
| Q3_K_M 3.91 bpw | FP32 | 17.45 GB | 19.79 GB (+1.5 KV) | 21.29 GB (+3.0 KV) | 24.29 GB (+6.0 KV) | 30.29 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 17.45 GB | 19.04 GB (+0.75 KV) | 19.79 GB (+1.5 KV) | 21.29 GB (+3.0 KV) | 24.29 GB (+6.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 17.45 GB | 18.7 GB (+0.41 KV) | 19.11 GB (+0.83 KV) | 19.94 GB (+1.65 KV) | 21.59 GB (+3.3 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 17.45 GB | 18.66 GB (+0.38 KV) | 19.04 GB (+0.75 KV) | 19.79 GB (+1.5 KV) | 21.29 GB (+3.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 17.45 GB | 18.51 GB (+0.22 KV) | 18.74 GB (+0.45 KV) | 19.19 GB (+0.9 KV) | 20.09 GB (+1.8 KV) |
| Q2_K 2.63 bpw | FP32 | 11.74 GB | 14.08 GB (+1.5 KV) | 15.58 GB (+3.0 KV) | 18.58 GB (+6.0 KV) | 24.58 GB (+12.0 KV) |
| Q2_K 2.63 bpw | FP16 | 11.74 GB | 13.33 GB (+0.75 KV) | 14.08 GB (+1.5 KV) | 15.58 GB (+3.0 KV) | 18.58 GB (+6.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 11.74 GB | 12.99 GB (+0.41 KV) | 13.4 GB (+0.83 KV) | 14.23 GB (+1.65 KV) | 15.88 GB (+3.3 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 11.74 GB | 12.95 GB (+0.38 KV) | 13.33 GB (+0.75 KV) | 14.08 GB (+1.5 KV) | 15.58 GB (+3.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 11.74 GB | 12.8 GB (+0.22 KV) | 13.03 GB (+0.45 KV) | 13.48 GB (+0.9 KV) | 14.38 GB (+1.8 KV) |
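Total VRAM = Model Weights + KV Cache + 0.84 GB overhead. The KV cache grows linearly with context length, so each doubling of context doubles its size. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.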
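The totals above can be reproduced with a few simple rules. The source does not document its formulas, so the constants in this minimal sketch are assumptions read off the table itself: about 35.7B total parameters, 0.75 GB of FP16 KV cache per 8,192 tokens (treating "8K" as 8,192), and a fixed KV-size ratio per cache format (here "FP8" and "Q4_0" stand for the "(Exp)" rows).

```python
# A sketch that reproduces this table's numbers. Every constant below is
# inferred from the table itself, not taken from a documented formula.

TOTAL_PARAMS_B = 35.7      # billions of parameters (71.4 GB at 16 bpw, i.e. 2 bytes each)
OVERHEAD_GB = 0.84         # CUDA context + activations, from the caption
FP16_KV_GB_PER_8K = 0.75   # FP16 KV cache at 8K context (assumed to be 8,192 tokens)

# KV cache size relative to FP16, read off the 65K column (12.0 / 6.0 / 3.3 / 3.0 / 1.8 GB)
KV_RATIO = {"FP32": 2.0, "FP16": 1.0, "Q8_0": 0.55, "FP8": 0.5, "Q4_0": 0.3}


def weights_gb(bpw: float) -> float:
    """Weight memory in GB: parameters x bits per weight / 8 bits per byte."""
    return TOTAL_PARAMS_B * bpw / 8


def kv_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache in GB; grows linearly with the number of context tokens."""
    return FP16_KV_GB_PER_8K * (context_tokens / 8192) * KV_RATIO[cache_format]


def total_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb(bpw) + kv_gb(cache_format, context_tokens) + OVERHEAD_GB


# Q4_K_M (4.65 bpw) weights with a Q8_0 KV cache at 32K (32,768 tokens)
print(
    round(weights_gb(4.65), 2),
    round(kv_gb("Q8_0", 32768), 2),
    round(total_gb(4.65, "Q8_0", 32768), 2),
)  # prints 20.75 1.65 23.24
```

The printed values match the Q4_K_M / Q8_0 / 32K row of the table (20.75 GB weights, +1.65 GB KV, 23.24 GB total).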
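To check whether a configuration fits a particular GPU, a table total can be widened by the ±5% engine-dependent variation noted above and compared with the card's memory. This is a minimal sketch; the function names, the 5% default, the three-way verdict, and the 24 GB card are illustrative assumptions, not part of the source.

```python
# A sketch of a fit check against one GPU's memory, widening each table total
# by a ±tolerance band. The verdict labels and the 24 GB card are hypothetical.


def vram_range_gb(total_gb: float, tolerance: float = 0.05) -> tuple[float, float]:
    """Return the (low, high) VRAM estimate for a table total."""
    return total_gb * (1 - tolerance), total_gb * (1 + tolerance)


def fit_verdict(total_gb: float, gpu_vram_gb: float, tolerance: float = 0.05) -> str:
    """Classify a configuration against a GPU's nominal VRAM capacity."""
    low, high = vram_range_gb(total_gb, tolerance)
    if high <= gpu_vram_gb:
        return "fits"          # fits even at the high end of the range
    if low <= gpu_vram_gb:
        return "borderline"    # fits only if usage lands toward the low end
    return "does not fit"      # exceeds capacity even at the low end


# Totals at 32K context from the table, checked against a hypothetical 24 GB card
for label, total in [
    ("Q3_K_M + Q8_0 cache", 19.94),
    ("Q4_K_M + Q8_0 cache", 23.24),
    ("Q4_K_M + FP32 cache", 27.59),
]:
    print(label, fit_verdict(total, gpu_vram_gb=24.0))
```

The comparison uses the card's nominal capacity only; memory already held by other processes on the same GPU is not accounted for.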