Active Parameters: 3.0B
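Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM with the KV cache share in parentheses, and every total includes a 1.3 GB base overhead (CUDA context + activations).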
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 168.0 GB | 170.8 GB (+1.5 KV) | 172.3 GB (+3.0 KV) | 175.3 GB (+6.0 KV) | 181.3 GB (+12.0 KV) | 193.3 GB (+24.0 KV) | 217.3 GB (+48.0 KV) |
| FP16 16.0 bpw | FP16 | 168.0 GB | 170.05 GB (+0.75 KV) | 170.8 GB (+1.5 KV) | 172.3 GB (+3.0 KV) | 175.3 GB (+6.0 KV) | 181.3 GB (+12.0 KV) | 193.3 GB (+24.0 KV) |
| FP16 16.0 bpw | Q8_0 | 168.0 GB | 169.71 GB (+0.41 KV) | 170.12 GB (+0.83 KV) | 170.95 GB (+1.65 KV) | 172.6 GB (+3.3 KV) | 175.9 GB (+6.6 KV) | 182.5 GB (+13.2 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 168.0 GB | 169.68 GB (+0.38 KV) | 170.05 GB (+0.75 KV) | 170.8 GB (+1.5 KV) | 172.3 GB (+3.0 KV) | 175.3 GB (+6.0 KV) | 181.3 GB (+12.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 168.0 GB | 169.53 GB (+0.22 KV) | 169.75 GB (+0.45 KV) | 170.2 GB (+0.9 KV) | 171.1 GB (+1.8 KV) | 172.9 GB (+3.6 KV) | 176.5 GB (+7.2 KV) |
| Q8_0 8.0 bpw | FP32 | 84.0 GB | 86.8 GB (+1.5 KV) | 88.3 GB (+3.0 KV) | 91.3 GB (+6.0 KV) | 97.3 GB (+12.0 KV) | 109.3 GB (+24.0 KV) | 133.3 GB (+48.0 KV) |
| Q8_0 8.0 bpw | FP16 | 84.0 GB | 86.05 GB (+0.75 KV) | 86.8 GB (+1.5 KV) | 88.3 GB (+3.0 KV) | 91.3 GB (+6.0 KV) | 97.3 GB (+12.0 KV) | 109.3 GB (+24.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 84.0 GB | 85.71 GB (+0.41 KV) | 86.12 GB (+0.83 KV) | 86.95 GB (+1.65 KV) | 88.6 GB (+3.3 KV) | 91.9 GB (+6.6 KV) | 98.5 GB (+13.2 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 84.0 GB | 85.67 GB (+0.38 KV) | 86.05 GB (+0.75 KV) | 86.8 GB (+1.5 KV) | 88.3 GB (+3.0 KV) | 91.3 GB (+6.0 KV) | 97.3 GB (+12.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 84.0 GB | 85.52 GB (+0.22 KV) | 85.75 GB (+0.45 KV) | 86.2 GB (+0.9 KV) | 87.1 GB (+1.8 KV) | 88.9 GB (+3.6 KV) | 92.5 GB (+7.2 KV) |
| Q4_K_M 4.65 bpw | FP32 | 48.83 GB | 51.62 GB (+1.5 KV) | 53.12 GB (+3.0 KV) | 56.12 GB (+6.0 KV) | 62.12 GB (+12.0 KV) | 74.12 GB (+24.0 KV) | 98.12 GB (+48.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 48.83 GB | 50.88 GB (+0.75 KV) | 51.62 GB (+1.5 KV) | 53.12 GB (+3.0 KV) | 56.12 GB (+6.0 KV) | 62.12 GB (+12.0 KV) | 74.12 GB (+24.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 48.83 GB | 50.54 GB (+0.41 KV) | 50.95 GB (+0.83 KV) | 51.77 GB (+1.65 KV) | 53.42 GB (+3.3 KV) | 56.73 GB (+6.6 KV) | 63.33 GB (+13.2 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 48.83 GB | 50.5 GB (+0.38 KV) | 50.88 GB (+0.75 KV) | 51.62 GB (+1.5 KV) | 53.12 GB (+3.0 KV) | 56.12 GB (+6.0 KV) | 62.12 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 48.83 GB | 50.35 GB (+0.22 KV) | 50.58 GB (+0.45 KV) | 51.02 GB (+0.9 KV) | 51.92 GB (+1.8 KV) | 53.73 GB (+3.6 KV) | 57.33 GB (+7.2 KV) |
| Q4_K_S 4.58 bpw | FP32 | 48.09 GB | 50.89 GB (+1.5 KV) | 52.39 GB (+3.0 KV) | 55.39 GB (+6.0 KV) | 61.39 GB (+12.0 KV) | 73.39 GB (+24.0 KV) | 97.39 GB (+48.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 48.09 GB | 50.14 GB (+0.75 KV) | 50.89 GB (+1.5 KV) | 52.39 GB (+3.0 KV) | 55.39 GB (+6.0 KV) | 61.39 GB (+12.0 KV) | 73.39 GB (+24.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 48.09 GB | 49.8 GB (+0.41 KV) | 50.21 GB (+0.83 KV) | 51.04 GB (+1.65 KV) | 52.69 GB (+3.3 KV) | 55.99 GB (+6.6 KV) | 62.59 GB (+13.2 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 48.09 GB | 49.76 GB (+0.38 KV) | 50.14 GB (+0.75 KV) | 50.89 GB (+1.5 KV) | 52.39 GB (+3.0 KV) | 55.39 GB (+6.0 KV) | 61.39 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 48.09 GB | 49.61 GB (+0.22 KV) | 49.84 GB (+0.45 KV) | 50.29 GB (+0.9 KV) | 51.19 GB (+1.8 KV) | 52.99 GB (+3.6 KV) | 56.59 GB (+7.2 KV) |
| Q3_K_M 3.91 bpw | FP32 | 41.05 GB | 43.85 GB (+1.5 KV) | 45.35 GB (+3.0 KV) | 48.35 GB (+6.0 KV) | 54.35 GB (+12.0 KV) | 66.36 GB (+24.0 KV) | 90.36 GB (+48.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 41.05 GB | 43.1 GB (+0.75 KV) | 43.85 GB (+1.5 KV) | 45.35 GB (+3.0 KV) | 48.35 GB (+6.0 KV) | 54.35 GB (+12.0 KV) | 66.36 GB (+24.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 41.05 GB | 42.77 GB (+0.41 KV) | 43.18 GB (+0.83 KV) | 44.0 GB (+1.65 KV) | 45.65 GB (+3.3 KV) | 48.95 GB (+6.6 KV) | 55.55 GB (+13.2 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 41.05 GB | 42.73 GB (+0.38 KV) | 43.1 GB (+0.75 KV) | 43.85 GB (+1.5 KV) | 45.35 GB (+3.0 KV) | 48.35 GB (+6.0 KV) | 54.35 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 41.05 GB | 42.58 GB (+0.22 KV) | 42.8 GB (+0.45 KV) | 43.25 GB (+0.9 KV) | 44.15 GB (+1.8 KV) | 45.95 GB (+3.6 KV) | 49.55 GB (+7.2 KV) |
| Q2_K 2.63 bpw | FP32 | 27.61 GB | 30.41 GB (+1.5 KV) | 31.91 GB (+3.0 KV) | 34.91 GB (+6.0 KV) | 40.91 GB (+12.0 KV) | 52.91 GB (+24.0 KV) | 76.91 GB (+48.0 KV) |
| Q2_K 2.63 bpw | FP16 | 27.61 GB | 29.66 GB (+0.75 KV) | 30.41 GB (+1.5 KV) | 31.91 GB (+3.0 KV) | 34.91 GB (+6.0 KV) | 40.91 GB (+12.0 KV) | 52.91 GB (+24.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 27.61 GB | 29.33 GB (+0.41 KV) | 29.74 GB (+0.83 KV) | 30.56 GB (+1.65 KV) | 32.21 GB (+3.3 KV) | 35.51 GB (+6.6 KV) | 42.11 GB (+13.2 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 27.61 GB | 29.29 GB (+0.38 KV) | 29.66 GB (+0.75 KV) | 30.41 GB (+1.5 KV) | 31.91 GB (+3.0 KV) | 34.91 GB (+6.0 KV) | 40.91 GB (+12.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 27.61 GB | 29.14 GB (+0.22 KV) | 29.36 GB (+0.45 KV) | 29.81 GB (+0.9 KV) | 30.71 GB (+1.8 KV) | 32.51 GB (+3.6 KV) | 36.11 GB (+7.2 KV) |
Total VRAM = Model Weights + KV Cache + 1.3 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
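The formula above can be reproduced with a short Python sketch. It relies on a few assumptions that the page does not state: the total parameter count (84B) is inferred from the 168.0 GB weight size at 16.0 bpw, the "262K" label is taken to mean 2**18 tokens, and the KV cache is assumed to scale linearly with context length, as the table columns suggest. The names `weights_gb`, `kv_cache_gb`, `total_vram_gb`, and `fits` are hypothetical helpers, not part of any calculator or library.

```python
OVERHEAD_GB = 1.3  # base overhead stated in the text (CUDA context + activations)

# KV cache size in the 262K column, copied from the table (GB).
KV_AT_262K_GB = {"FP32": 48.0, "FP16": 24.0, "Q8_0": 13.2, "FP8": 12.0, "Q4_0": 7.2}
REF_TOKENS = 262_144  # assumption: "262K" means 2**18 tokens

# Assumption: total parameters (billions) inferred from 168.0 GB at 16.0 bpw.
INFERRED_TOTAL_PARAMS_B = 84.0


def weights_gb(bpw: float) -> float:
    """Weight size in GB for a given bits-per-weight figure."""
    return INFERRED_TOTAL_PARAMS_B * bpw / 8.0


def kv_cache_gb(cache_format: str, context_tokens: int) -> float:
    """KV cache size in GB, scaled linearly from the 262K column."""
    return KV_AT_262K_GB[cache_format] * context_tokens / REF_TOKENS


def total_vram_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = Model Weights + KV Cache + overhead."""
    return weights_gb(bpw) + kv_cache_gb(cache_format, context_tokens) + OVERHEAD_GB


def fits(total_gb: float, available_gb: float, margin: float = 0.05) -> bool:
    """Check whether an estimate fits, padding it by the stated 5% variance."""
    return total_gb * (1.0 + margin) <= available_gb


if __name__ == "__main__":
    estimate = total_vram_gb(16.0, "FP16", 8192)
    print(round(estimate, 2))  # 170.05, matching the FP16 / FP16 / 8K cell
```

Results can differ from the table by 0.01 GB in some cells because the table rounds each figure to two decimals.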