Active Parameters: 22.0B
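Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM, with the KV cache portion in parentheses. Totals include a base overhead of 1.5 GB (CUDA context + activations).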
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 493.5 GB | 497.94 GB (+2.94 KV) | 500.88 GB (+5.88 KV) | 506.75 GB (+11.75 KV) | 518.5 GB (+23.5 KV) | 542.0 GB (+47.0 KV) | 589.0 GB (+94.0 KV) |
| FP16 16.0 bpw | FP16 | 493.5 GB | 496.47 GB (+1.47 KV) | 497.94 GB (+2.94 KV) | 500.88 GB (+5.88 KV) | 506.75 GB (+11.75 KV) | 518.5 GB (+23.5 KV) | 542.0 GB (+47.0 KV) |
| FP16 16.0 bpw | Q8_0 | 493.5 GB | 495.81 GB (+0.81 KV) | 496.62 GB (+1.62 KV) | 498.23 GB (+3.23 KV) | 501.46 GB (+6.46 KV) | 507.93 GB (+12.93 KV) | 520.85 GB (+25.85 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 493.5 GB | 495.73 GB (+0.73 KV) | 496.47 GB (+1.47 KV) | 497.94 GB (+2.94 KV) | 500.88 GB (+5.88 KV) | 506.75 GB (+11.75 KV) | 518.5 GB (+23.5 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 493.5 GB | 495.44 GB (+0.44 KV) | 495.88 GB (+0.88 KV) | 496.76 GB (+1.76 KV) | 498.52 GB (+3.52 KV) | 502.05 GB (+7.05 KV) | 509.1 GB (+14.1 KV) |
| Q8_0 8.0 bpw | FP32 | 246.75 GB | 251.19 GB (+2.94 KV) | 254.12 GB (+5.88 KV) | 260.0 GB (+11.75 KV) | 271.75 GB (+23.5 KV) | 295.25 GB (+47.0 KV) | 342.25 GB (+94.0 KV) |
| Q8_0 8.0 bpw | FP16 | 246.75 GB | 249.72 GB (+1.47 KV) | 251.19 GB (+2.94 KV) | 254.12 GB (+5.88 KV) | 260.0 GB (+11.75 KV) | 271.75 GB (+23.5 KV) | 295.25 GB (+47.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 246.75 GB | 249.06 GB (+0.81 KV) | 249.87 GB (+1.62 KV) | 251.48 GB (+3.23 KV) | 254.71 GB (+6.46 KV) | 261.18 GB (+12.93 KV) | 274.1 GB (+25.85 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 246.75 GB | 248.98 GB (+0.73 KV) | 249.72 GB (+1.47 KV) | 251.19 GB (+2.94 KV) | 254.12 GB (+5.88 KV) | 260.0 GB (+11.75 KV) | 271.75 GB (+23.5 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 246.75 GB | 248.69 GB (+0.44 KV) | 249.13 GB (+0.88 KV) | 250.01 GB (+1.76 KV) | 251.78 GB (+3.52 KV) | 255.3 GB (+7.05 KV) | 262.35 GB (+14.1 KV) |
| Q4_K_M 4.65 bpw | FP32 | 143.42 GB | 147.86 GB (+2.94 KV) | 150.8 GB (+5.88 KV) | 156.67 GB (+11.75 KV) | 168.42 GB (+23.5 KV) | 191.92 GB (+47.0 KV) | 238.92 GB (+94.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 143.42 GB | 146.39 GB (+1.47 KV) | 147.86 GB (+2.94 KV) | 150.8 GB (+5.88 KV) | 156.67 GB (+11.75 KV) | 168.42 GB (+23.5 KV) | 191.92 GB (+47.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 143.42 GB | 145.73 GB (+0.81 KV) | 146.54 GB (+1.62 KV) | 148.15 GB (+3.23 KV) | 151.39 GB (+6.46 KV) | 157.85 GB (+12.93 KV) | 170.77 GB (+25.85 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 143.42 GB | 145.66 GB (+0.73 KV) | 146.39 GB (+1.47 KV) | 147.86 GB (+2.94 KV) | 150.8 GB (+5.88 KV) | 156.67 GB (+11.75 KV) | 168.42 GB (+23.5 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 143.42 GB | 145.36 GB (+0.44 KV) | 145.8 GB (+0.88 KV) | 146.69 GB (+1.76 KV) | 148.45 GB (+3.52 KV) | 151.97 GB (+7.05 KV) | 159.02 GB (+14.1 KV) |
| Q4_K_S 4.58 bpw | FP32 | 141.26 GB | 145.7 GB (+2.94 KV) | 148.64 GB (+5.88 KV) | 154.51 GB (+11.75 KV) | 166.26 GB (+23.5 KV) | 189.76 GB (+47.0 KV) | 236.76 GB (+94.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 141.26 GB | 144.23 GB (+1.47 KV) | 145.7 GB (+2.94 KV) | 148.64 GB (+5.88 KV) | 154.51 GB (+11.75 KV) | 166.26 GB (+23.5 KV) | 189.76 GB (+47.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 141.26 GB | 143.57 GB (+0.81 KV) | 144.38 GB (+1.62 KV) | 146.0 GB (+3.23 KV) | 149.23 GB (+6.46 KV) | 155.69 GB (+12.93 KV) | 168.61 GB (+25.85 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 141.26 GB | 143.5 GB (+0.73 KV) | 144.23 GB (+1.47 KV) | 145.7 GB (+2.94 KV) | 148.64 GB (+5.88 KV) | 154.51 GB (+11.75 KV) | 166.26 GB (+23.5 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 141.26 GB | 143.21 GB (+0.44 KV) | 143.65 GB (+0.88 KV) | 144.53 GB (+1.76 KV) | 146.29 GB (+3.52 KV) | 149.81 GB (+7.05 KV) | 156.86 GB (+14.1 KV) |
| Q3_K_M 3.91 bpw | FP32 | 120.6 GB | 125.04 GB (+2.94 KV) | 127.97 GB (+5.88 KV) | 133.85 GB (+11.75 KV) | 145.6 GB (+23.5 KV) | 169.1 GB (+47.0 KV) | 216.1 GB (+94.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 120.6 GB | 123.57 GB (+1.47 KV) | 125.04 GB (+2.94 KV) | 127.97 GB (+5.88 KV) | 133.85 GB (+11.75 KV) | 145.6 GB (+23.5 KV) | 169.1 GB (+47.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 120.6 GB | 122.91 GB (+0.81 KV) | 123.71 GB (+1.62 KV) | 125.33 GB (+3.23 KV) | 128.56 GB (+6.46 KV) | 135.02 GB (+12.93 KV) | 147.95 GB (+25.85 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 120.6 GB | 122.83 GB (+0.73 KV) | 123.57 GB (+1.47 KV) | 125.04 GB (+2.94 KV) | 127.97 GB (+5.88 KV) | 133.85 GB (+11.75 KV) | 145.6 GB (+23.5 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 120.6 GB | 122.54 GB (+0.44 KV) | 122.98 GB (+0.88 KV) | 123.86 GB (+1.76 KV) | 125.62 GB (+3.52 KV) | 129.15 GB (+7.05 KV) | 136.2 GB (+14.1 KV) |
| Q2_K 2.63 bpw | FP32 | 81.12 GB | 85.56 GB (+2.94 KV) | 88.49 GB (+5.88 KV) | 94.37 GB (+11.75 KV) | 106.12 GB (+23.5 KV) | 129.62 GB (+47.0 KV) | 176.62 GB (+94.0 KV) |
| Q2_K 2.63 bpw | FP16 | 81.12 GB | 84.09 GB (+1.47 KV) | 85.56 GB (+2.94 KV) | 88.49 GB (+5.88 KV) | 94.37 GB (+11.75 KV) | 106.12 GB (+23.5 KV) | 129.62 GB (+47.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 81.12 GB | 83.43 GB (+0.81 KV) | 84.23 GB (+1.62 KV) | 85.85 GB (+3.23 KV) | 89.08 GB (+6.46 KV) | 95.54 GB (+12.93 KV) | 108.47 GB (+25.85 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 81.12 GB | 83.35 GB (+0.73 KV) | 84.09 GB (+1.47 KV) | 85.56 GB (+2.94 KV) | 88.49 GB (+5.88 KV) | 94.37 GB (+11.75 KV) | 106.12 GB (+23.5 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 81.12 GB | 83.06 GB (+0.44 KV) | 83.5 GB (+0.88 KV) | 84.38 GB (+1.76 KV) | 86.14 GB (+3.52 KV) | 89.67 GB (+7.05 KV) | 96.72 GB (+14.1 KV) |
Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.
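The formula above can be checked against any table cell with a few lines of Python. This is a minimal sketch, not the calculator's actual implementation: the function names `total_vram_gb` and `fits` are hypothetical, the 160 GB and 152 GB budgets are made-up examples, and treating the ±5% variance as a worst-case upper bound is an assumption.

```python
OVERHEAD_GB = 1.5  # base overhead from the table notes (CUDA context + activations)


def total_vram_gb(weights_gb: float, kv_cache_gb: float,
                  overhead_gb: float = OVERHEAD_GB) -> float:
    """Total VRAM = model weights + KV cache + fixed overhead (all in GB)."""
    return weights_gb + kv_cache_gb + overhead_gb


def fits(total_gb: float, available_gb: float, tolerance: float = 0.05) -> bool:
    """Conservative check (assumption): budget for the upper end of the
    stated +/-5% variance before comparing against available VRAM."""
    return total_gb * (1 + tolerance) <= available_gb


# Q4_K_M weights with an FP16 cache at 32K context, values taken from the table.
total = total_vram_gb(weights_gb=143.42, kv_cache_gb=5.88)
print(f"{total:.2f} GB")

# Hypothetical VRAM budgets, for illustration only.
print(fits(total, available_gb=160.0))
print(fits(total, available_gb=152.0))
```

The first call reproduces the 150.8 GB shown in the Q4_K_M / FP16 row at 32K context. The second budget fails the check even though the nominal total is below 152 GB, because the sketch reserves headroom for the upper end of the ±5% variance.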