Active Parameters: 35.0B
Estimated VRAM usage for every combination of weight quantization and KV cache format. Each total includes a fixed base overhead of 1.5 GB (CUDA context + activations).
| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 1008.0 GB | 1013.38 GB (+3.88 KV) | 1017.25 GB (+7.75 KV) | 1025.0 GB (+15.5 KV) | 1040.5 GB (+31.0 KV) | 1071.5 GB (+62.0 KV) | 1133.5 GB (+124.0 KV) |
| FP16 16.0 bpw | FP16 | 1008.0 GB | 1011.44 GB (+1.94 KV) | 1013.38 GB (+3.88 KV) | 1017.25 GB (+7.75 KV) | 1025.0 GB (+15.5 KV) | 1040.5 GB (+31.0 KV) | 1071.5 GB (+62.0 KV) |
| FP16 16.0 bpw | Q8_0 | 1008.0 GB | 1010.57 GB (+1.07 KV) | 1011.63 GB (+2.13 KV) | 1013.76 GB (+4.26 KV) | 1018.02 GB (+8.53 KV) | 1026.55 GB (+17.05 KV) | 1043.6 GB (+34.1 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 1008.0 GB | 1010.47 GB (+0.97 KV) | 1011.44 GB (+1.94 KV) | 1013.38 GB (+3.88 KV) | 1017.25 GB (+7.75 KV) | 1025.0 GB (+15.5 KV) | 1040.5 GB (+31.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 1008.0 GB | 1010.08 GB (+0.58 KV) | 1010.66 GB (+1.16 KV) | 1011.83 GB (+2.32 KV) | 1014.15 GB (+4.65 KV) | 1018.8 GB (+9.3 KV) | 1028.1 GB (+18.6 KV) |
| Q8_0 8.0 bpw | FP32 | 504.0 GB | 509.38 GB (+3.88 KV) | 513.25 GB (+7.75 KV) | 521.0 GB (+15.5 KV) | 536.5 GB (+31.0 KV) | 567.5 GB (+62.0 KV) | 629.5 GB (+124.0 KV) |
| Q8_0 8.0 bpw | FP16 | 504.0 GB | 507.44 GB (+1.94 KV) | 509.38 GB (+3.88 KV) | 513.25 GB (+7.75 KV) | 521.0 GB (+15.5 KV) | 536.5 GB (+31.0 KV) | 567.5 GB (+62.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 504.0 GB | 506.57 GB (+1.07 KV) | 507.63 GB (+2.13 KV) | 509.76 GB (+4.26 KV) | 514.02 GB (+8.53 KV) | 522.55 GB (+17.05 KV) | 539.6 GB (+34.1 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 504.0 GB | 506.47 GB (+0.97 KV) | 507.44 GB (+1.94 KV) | 509.38 GB (+3.88 KV) | 513.25 GB (+7.75 KV) | 521.0 GB (+15.5 KV) | 536.5 GB (+31.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 504.0 GB | 506.08 GB (+0.58 KV) | 506.66 GB (+1.16 KV) | 507.82 GB (+2.32 KV) | 510.15 GB (+4.65 KV) | 514.8 GB (+9.3 KV) | 524.1 GB (+18.6 KV) |
| Q4_K_M 4.65 bpw | FP32 | 292.95 GB | 298.32 GB (+3.88 KV) | 302.2 GB (+7.75 KV) | 309.95 GB (+15.5 KV) | 325.45 GB (+31.0 KV) | 356.45 GB (+62.0 KV) | 418.45 GB (+124.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 292.95 GB | 296.39 GB (+1.94 KV) | 298.32 GB (+3.88 KV) | 302.2 GB (+7.75 KV) | 309.95 GB (+15.5 KV) | 325.45 GB (+31.0 KV) | 356.45 GB (+62.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 292.95 GB | 295.52 GB (+1.07 KV) | 296.58 GB (+2.13 KV) | 298.71 GB (+4.26 KV) | 302.97 GB (+8.53 KV) | 311.5 GB (+17.05 KV) | 328.55 GB (+34.1 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 292.95 GB | 295.42 GB (+0.97 KV) | 296.39 GB (+1.94 KV) | 298.32 GB (+3.88 KV) | 302.2 GB (+7.75 KV) | 309.95 GB (+15.5 KV) | 325.45 GB (+31.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 292.95 GB | 295.03 GB (+0.58 KV) | 295.61 GB (+1.16 KV) | 296.77 GB (+2.32 KV) | 299.1 GB (+4.65 KV) | 303.75 GB (+9.3 KV) | 313.05 GB (+18.6 KV) |
| Q4_K_S 4.58 bpw | FP32 | 288.54 GB | 293.92 GB (+3.88 KV) | 297.79 GB (+7.75 KV) | 305.54 GB (+15.5 KV) | 321.04 GB (+31.0 KV) | 352.04 GB (+62.0 KV) | 414.04 GB (+124.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 288.54 GB | 291.98 GB (+1.94 KV) | 293.92 GB (+3.88 KV) | 297.79 GB (+7.75 KV) | 305.54 GB (+15.5 KV) | 321.04 GB (+31.0 KV) | 352.04 GB (+62.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 288.54 GB | 291.11 GB (+1.07 KV) | 292.17 GB (+2.13 KV) | 294.3 GB (+4.26 KV) | 298.56 GB (+8.53 KV) | 307.09 GB (+17.05 KV) | 324.14 GB (+34.1 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 288.54 GB | 291.01 GB (+0.97 KV) | 291.98 GB (+1.94 KV) | 293.92 GB (+3.88 KV) | 297.79 GB (+7.75 KV) | 305.54 GB (+15.5 KV) | 321.04 GB (+31.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 288.54 GB | 290.62 GB (+0.58 KV) | 291.2 GB (+1.16 KV) | 292.37 GB (+2.32 KV) | 294.69 GB (+4.65 KV) | 299.34 GB (+9.3 KV) | 308.64 GB (+18.6 KV) |
| Q3_K_M 3.91 bpw | FP32 | 246.33 GB | 251.71 GB (+3.88 KV) | 255.58 GB (+7.75 KV) | 263.33 GB (+15.5 KV) | 278.83 GB (+31.0 KV) | 309.83 GB (+62.0 KV) | 371.83 GB (+124.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 246.33 GB | 249.77 GB (+1.94 KV) | 251.71 GB (+3.88 KV) | 255.58 GB (+7.75 KV) | 263.33 GB (+15.5 KV) | 278.83 GB (+31.0 KV) | 309.83 GB (+62.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 246.33 GB | 248.9 GB (+1.07 KV) | 249.96 GB (+2.13 KV) | 252.09 GB (+4.26 KV) | 256.36 GB (+8.53 KV) | 264.88 GB (+17.05 KV) | 281.93 GB (+34.1 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 246.33 GB | 248.8 GB (+0.97 KV) | 249.77 GB (+1.94 KV) | 251.71 GB (+3.88 KV) | 255.58 GB (+7.75 KV) | 263.33 GB (+15.5 KV) | 278.83 GB (+31.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 246.33 GB | 248.41 GB (+0.58 KV) | 248.99 GB (+1.16 KV) | 250.16 GB (+2.32 KV) | 252.48 GB (+4.65 KV) | 257.13 GB (+9.3 KV) | 266.43 GB (+18.6 KV) |
| Q2_K 2.63 bpw | FP32 | 165.69 GB | 171.06 GB (+3.88 KV) | 174.94 GB (+7.75 KV) | 182.69 GB (+15.5 KV) | 198.19 GB (+31.0 KV) | 229.19 GB (+62.0 KV) | 291.19 GB (+124.0 KV) |
| Q2_K 2.63 bpw | FP16 | 165.69 GB | 169.13 GB (+1.94 KV) | 171.06 GB (+3.88 KV) | 174.94 GB (+7.75 KV) | 182.69 GB (+15.5 KV) | 198.19 GB (+31.0 KV) | 229.19 GB (+62.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 165.69 GB | 168.26 GB (+1.07 KV) | 169.32 GB (+2.13 KV) | 171.45 GB (+4.26 KV) | 175.72 GB (+8.53 KV) | 184.24 GB (+17.05 KV) | 201.29 GB (+34.1 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 165.69 GB | 168.16 GB (+0.97 KV) | 169.13 GB (+1.94 KV) | 171.06 GB (+3.88 KV) | 174.94 GB (+7.75 KV) | 182.69 GB (+15.5 KV) | 198.19 GB (+31.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 165.69 GB | 167.77 GB (+0.58 KV) | 168.35 GB (+1.16 KV) | 169.51 GB (+2.32 KV) | 171.84 GB (+4.65 KV) | 176.49 GB (+9.3 KV) | 185.79 GB (+18.6 KV) |
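Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. The "(+N KV)" figure in each cell is the KV cache size in GB at that context length. Actual usage may vary by about ±5% depending on the inference engine and its optimizations.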
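The formula above can be sketched in a few lines of Python. This is a rough reconstruction from the table, not the site's actual calculator. The total parameter count (504B) is inferred from the 1008.0 GB FP16 weights figure at 16 bpw. The cache-format ratios are read off the KV columns relative to FP16. Context labels are assumed to be powers of two (e.g. 32K = 32768 tokens). All function and constant names are hypothetical.

```python
# Sketch of: Total VRAM = Model Weights + KV Cache + 1.5 GB overhead.
# All constants below are inferred from the table, not taken from an official source.

OVERHEAD_GB = 1.5          # base overhead stated in the text
TOTAL_PARAMS_B = 504.0     # assumption: 1008.0 GB / (16 bits / 8) = 504B parameters

# FP16 KV cache size at the largest context, copied from the table.
KV_FP16_GB_AT_262K = 62.0
MAX_CONTEXT_TOKENS = 262144  # assumption: "262K" means 2**18 tokens

# KV size relative to FP16, as observed in the table's KV columns.
CACHE_RATIO = {
    "FP32": 2.0,
    "FP16": 1.0,
    "Q8_0": 0.55,
    "FP8": 0.5,
    "Q4_0": 0.3,
}


def weights_gb(bpw: float, params_b: float = TOTAL_PARAMS_B) -> float:
    """Weight size in GB: parameters (billions) times bits per weight, divided by 8."""
    return params_b * bpw / 8.0


def kv_gb(context_tokens: int, cache_format: str) -> float:
    """KV cache size in GB, scaled linearly with context length from the 262K FP16 value."""
    scale = context_tokens / MAX_CONTEXT_TOKENS
    return KV_FP16_GB_AT_262K * scale * CACHE_RATIO[cache_format]


def total_vram_gb(bpw: float, context_tokens: int, cache_format: str) -> float:
    """Estimated total VRAM in GB for one quantization / cache / context combination."""
    return weights_gb(bpw) + kv_gb(context_tokens, cache_format) + OVERHEAD_GB


if __name__ == "__main__":
    # Q4_K_M (4.65 bpw) with an FP16 cache at 32K context.
    print(f"{total_vram_gb(4.65, 32768, 'FP16'):.2f} GB")
```

For Q4_K_M with an FP16 cache at 32K context, this gives 292.95 + 7.75 + 1.5, which agrees with the 302.2 GB cell in the table. The ±5% caveat still applies to any number it produces.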