Active Parameters: 3.3B
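Estimated VRAM usage for every combination of weight quantization and KV cache format. The value in parentheses in each context column is the KV cache size in GB. Base overhead: 0.82 GB (CUDA context + activations).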
| Weight Quantization | KV Cache Format | Model Weights | Total VRAM, 8K Context | Total VRAM, 16K Context | Total VRAM, 32K Context | Total VRAM, 65K Context |
|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 67.2 GB | 69.52 GB (+1.5 KV) | 71.02 GB (+3.0 KV) | 74.02 GB (+6.0 KV) | 80.02 GB (+12.0 KV) |
| FP16 16.0 bpw | FP16 | 67.2 GB | 68.77 GB (+0.75 KV) | 69.52 GB (+1.5 KV) | 71.02 GB (+3.0 KV) | 74.02 GB (+6.0 KV) |
| FP16 16.0 bpw | Q8_0 | 67.2 GB | 68.43 GB (+0.41 KV) | 68.84 GB (+0.83 KV) | 69.67 GB (+1.65 KV) | 71.32 GB (+3.3 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 67.2 GB | 68.39 GB (+0.38 KV) | 68.77 GB (+0.75 KV) | 69.52 GB (+1.5 KV) | 71.02 GB (+3.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 67.2 GB | 68.24 GB (+0.22 KV) | 68.47 GB (+0.45 KV) | 68.92 GB (+0.9 KV) | 69.82 GB (+1.8 KV) |
| Q8_0 8.0 bpw | FP32 | 33.6 GB | 35.92 GB (+1.5 KV) | 37.42 GB (+3.0 KV) | 40.42 GB (+6.0 KV) | 46.42 GB (+12.0 KV) |
| Q8_0 8.0 bpw | FP16 | 33.6 GB | 35.17 GB (+0.75 KV) | 35.92 GB (+1.5 KV) | 37.42 GB (+3.0 KV) | 40.42 GB (+6.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 33.6 GB | 34.83 GB (+0.41 KV) | 35.25 GB (+0.83 KV) | 36.07 GB (+1.65 KV) | 37.72 GB (+3.3 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 33.6 GB | 34.8 GB (+0.38 KV) | 35.17 GB (+0.75 KV) | 35.92 GB (+1.5 KV) | 37.42 GB (+3.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 33.6 GB | 34.65 GB (+0.22 KV) | 34.87 GB (+0.45 KV) | 35.32 GB (+0.9 KV) | 36.22 GB (+1.8 KV) |
| Q4_K_M 4.65 bpw | FP32 | 19.53 GB | 21.85 GB (+1.5 KV) | 23.35 GB (+3.0 KV) | 26.35 GB (+6.0 KV) | 32.35 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 19.53 GB | 21.1 GB (+0.75 KV) | 21.85 GB (+1.5 KV) | 23.35 GB (+3.0 KV) | 26.35 GB (+6.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 19.53 GB | 20.76 GB (+0.41 KV) | 21.18 GB (+0.83 KV) | 22.0 GB (+1.65 KV) | 23.65 GB (+3.3 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 19.53 GB | 20.73 GB (+0.38 KV) | 21.1 GB (+0.75 KV) | 21.85 GB (+1.5 KV) | 23.35 GB (+3.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 19.53 GB | 20.58 GB (+0.22 KV) | 20.8 GB (+0.45 KV) | 21.25 GB (+0.9 KV) | 22.15 GB (+1.8 KV) |
| Q4_K_S 4.58 bpw | FP32 | 19.24 GB | 21.56 GB (+1.5 KV) | 23.06 GB (+3.0 KV) | 26.06 GB (+6.0 KV) | 32.06 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 19.24 GB | 20.81 GB (+0.75 KV) | 21.56 GB (+1.5 KV) | 23.06 GB (+3.0 KV) | 26.06 GB (+6.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 19.24 GB | 20.47 GB (+0.41 KV) | 20.88 GB (+0.83 KV) | 21.71 GB (+1.65 KV) | 23.36 GB (+3.3 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 19.24 GB | 20.43 GB (+0.38 KV) | 20.81 GB (+0.75 KV) | 21.56 GB (+1.5 KV) | 23.06 GB (+3.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 19.24 GB | 20.28 GB (+0.22 KV) | 20.51 GB (+0.45 KV) | 20.96 GB (+0.9 KV) | 21.86 GB (+1.8 KV) |
| Q3_K_M 3.91 bpw | FP32 | 16.42 GB | 18.74 GB (+1.5 KV) | 20.24 GB (+3.0 KV) | 23.24 GB (+6.0 KV) | 29.24 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 16.42 GB | 17.99 GB (+0.75 KV) | 18.74 GB (+1.5 KV) | 20.24 GB (+3.0 KV) | 23.24 GB (+6.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 16.42 GB | 17.65 GB (+0.41 KV) | 18.07 GB (+0.83 KV) | 18.89 GB (+1.65 KV) | 20.54 GB (+3.3 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 16.42 GB | 17.62 GB (+0.38 KV) | 17.99 GB (+0.75 KV) | 18.74 GB (+1.5 KV) | 20.24 GB (+3.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 16.42 GB | 17.47 GB (+0.22 KV) | 17.69 GB (+0.45 KV) | 18.14 GB (+0.9 KV) | 19.04 GB (+1.8 KV) |
| Q2_K 2.63 bpw | FP32 | 11.05 GB | 13.37 GB (+1.5 KV) | 14.87 GB (+3.0 KV) | 17.87 GB (+6.0 KV) | 23.87 GB (+12.0 KV) |
| Q2_K 2.63 bpw | FP16 | 11.05 GB | 12.62 GB (+0.75 KV) | 13.37 GB (+1.5 KV) | 14.87 GB (+3.0 KV) | 17.87 GB (+6.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 11.05 GB | 12.28 GB (+0.41 KV) | 12.69 GB (+0.83 KV) | 13.52 GB (+1.65 KV) | 15.17 GB (+3.3 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 11.05 GB | 12.24 GB (+0.38 KV) | 12.62 GB (+0.75 KV) | 13.37 GB (+1.5 KV) | 14.87 GB (+3.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 11.05 GB | 12.09 GB (+0.22 KV) | 12.32 GB (+0.45 KV) | 12.77 GB (+0.9 KV) | 13.67 GB (+1.8 KV) |
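
The Model Weights column appears to follow from the bits-per-weight (bpw) figure alone. A minimal sketch, assuming a total of 33.6 billion weights (inferred from the FP16 row, 67.2 GB at 16.0 bpw, and not stated on this page) and 1 GB = 10^9 bytes. This total is distinct from the 3.3B active parameters listed above. The function name `weight_size_gb` is hypothetical.

```python
# Assumption: total parameter count inferred from the FP16 row
# (67.2 GB at 16.0 bpw -> 33.6 billion weights); the page does not state it.
TOTAL_PARAMS_BILLIONS = 33.6


def weight_size_gb(bits_per_weight: float,
                   total_params_billions: float = TOTAL_PARAMS_BILLIONS) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return total_params_billions * bits_per_weight / 8


# Quantization names and bpw values copied from the table rows.
QUANTS = [("FP16", 16.0), ("Q8_0", 8.0), ("Q4_K_M", 4.65),
          ("Q4_K_S", 4.58), ("Q3_K_M", 3.91), ("Q2_K", 2.63)]

for name, bpw in QUANTS:
    print(f"{name}: {weight_size_gb(bpw):.2f} GB")
```

Under these assumptions the results agree with the table to two decimal places, which suggests the weights column is a size estimate derived from bpw rather than a measured file size.
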
Total VRAM = Model Weights + KV Cache + 0.82 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
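
The formula above can be sketched in a few lines of Python. This is an illustration, not the site's calculator: the helper names are hypothetical, the per-8K KV cache sizes are back-computed from the 65K column (divided by 8), and it assumes that "8K" means 8,192 tokens and that the KV cache grows linearly with context length, as the table columns suggest.

```python
OVERHEAD_GB = 0.82  # base overhead stated above (CUDA context + activations)

# KV cache size in GB per 8,192 tokens, derived from the 65K column / 8.
# Assumption: the cache scales linearly with context length.
KV_GB_PER_8K = {
    "FP32": 1.5,
    "FP16": 0.75,
    "Q8_0": 0.4125,
    "FP8": 0.375,
    "Q4_0": 0.225,
}


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache size in GB for the given cache format and context length."""
    return KV_GB_PER_8K[cache_format] * context_tokens / 8192


def total_vram_gb(weights_gb: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead."""
    return weights_gb + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


def fits_with_margin(total_gb: float, vram_gb: float, margin: float = 0.05) -> bool:
    """True if the estimate still fits after adding the stated 5% variance."""
    return total_gb * (1 + margin) <= vram_gb


# Q4_K_M weights (19.53 GB) with an FP16 cache at 8K context.
estimate = total_vram_gb(19.53, "FP16", 8192)
print(round(estimate, 2), fits_with_margin(estimate, 24.0))
```

Applying the 5% margin matters near a card's limit. For example, the Q4_K_M row with an FP32 cache at 16K context is listed at 23.35 GB, which is nominally under 24 GB but exceeds it once the upper end of the ±5% range is included.
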