Devstral-2-123B-Instruct-2512

Standard Transformer, 123.0B parameters

Model Specifications

Layers: 88
Hidden Dimension: 12,288
Attention Heads: 96
KV Heads: 8
Max Context: 262K tokens
Vocabulary Size: 131,072
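
The per-token KV cache footprint can be estimated from the specifications above. The sketch below assumes a head dimension equal to the hidden dimension divided by the number of attention heads (12,288 / 96 = 128), which this page does not state explicitly, and a grouped-query layout in which every layer stores one key and one value vector per KV head. The function names are illustrative only.

```python
# Specifications copied from the list above.
LAYERS = 88
HIDDEN_DIM = 12_288
ATTENTION_HEADS = 96
KV_HEADS = 8

# Assumption: head dimension = hidden dimension / attention heads.
HEAD_DIM = HIDDEN_DIM // ATTENTION_HEADS  # 128


def kv_bytes_per_token(bytes_per_element: float) -> float:
    """Bytes of KV cache stored for one token across all layers."""
    # The factor 2 covers the key tensor and the value tensor.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_element


def kv_cache_gib(context_tokens: int, bytes_per_element: float) -> float:
    """KV cache size in units of 2**30 bytes for a given context length."""
    return kv_bytes_per_token(bytes_per_element) * context_tokens / 2**30


if __name__ == "__main__":
    print(kv_bytes_per_token(2))      # FP16 cache: 360448 bytes per token
    print(kv_cache_gib(8 * 1024, 2))  # FP16 cache, 8K context: 2.75
    print(kv_cache_gib(8 * 1024, 4))  # FP32 cache, 8K context: 5.5
```

Under these assumptions, an FP16 cache at 8,192 tokens comes to 2.75, which agrees with the "+2.75 KV" figure listed for the FP16 cache format at 8K context.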

VRAM Requirements

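Estimated VRAM usage for every combination of weight quantization and KV cache format. Each total includes a base overhead of 1.5 GB (CUDA context + activations); the KV cache share of each total is shown in parentheses.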

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 258.3 GB | 265.3 GB (+5.5 KV) | 270.8 GB (+11.0 KV) | 281.8 GB (+22.0 KV) | 303.8 GB (+44.0 KV) | 347.8 GB (+88.0 KV) | 435.8 GB (+176.0 KV) |
| FP16 (16.0 bpw) | FP16 | 258.3 GB | 262.55 GB (+2.75 KV) | 265.3 GB (+5.5 KV) | 270.8 GB (+11.0 KV) | 281.8 GB (+22.0 KV) | 303.8 GB (+44.0 KV) | 347.8 GB (+88.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 258.3 GB | 261.31 GB (+1.51 KV) | 262.82 GB (+3.03 KV) | 265.85 GB (+6.05 KV) | 271.9 GB (+12.1 KV) | 284.0 GB (+24.2 KV) | 308.2 GB (+48.4 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 258.3 GB | 261.18 GB (+1.38 KV) | 262.55 GB (+2.75 KV) | 265.3 GB (+5.5 KV) | 270.8 GB (+11.0 KV) | 281.8 GB (+22.0 KV) | 303.8 GB (+44.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 258.3 GB | 260.62 GB (+0.82 KV) | 261.45 GB (+1.65 KV) | 263.1 GB (+3.3 KV) | 266.4 GB (+6.6 KV) | 273.0 GB (+13.2 KV) | 286.2 GB (+26.4 KV) |
| Q8_0 (8.0 bpw) | FP32 | 129.15 GB | 136.15 GB (+5.5 KV) | 141.65 GB (+11.0 KV) | 152.65 GB (+22.0 KV) | 174.65 GB (+44.0 KV) | 218.65 GB (+88.0 KV) | 306.65 GB (+176.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 129.15 GB | 133.4 GB (+2.75 KV) | 136.15 GB (+5.5 KV) | 141.65 GB (+11.0 KV) | 152.65 GB (+22.0 KV) | 174.65 GB (+44.0 KV) | 218.65 GB (+88.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 129.15 GB | 132.16 GB (+1.51 KV) | 133.68 GB (+3.03 KV) | 136.7 GB (+6.05 KV) | 142.75 GB (+12.1 KV) | 154.85 GB (+24.2 KV) | 179.05 GB (+48.4 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 129.15 GB | 132.03 GB (+1.38 KV) | 133.4 GB (+2.75 KV) | 136.15 GB (+5.5 KV) | 141.65 GB (+11.0 KV) | 152.65 GB (+22.0 KV) | 174.65 GB (+44.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 129.15 GB | 131.47 GB (+0.82 KV) | 132.3 GB (+1.65 KV) | 133.95 GB (+3.3 KV) | 137.25 GB (+6.6 KV) | 143.85 GB (+13.2 KV) | 157.05 GB (+26.4 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 75.07 GB | 82.07 GB (+5.5 KV) | 87.57 GB (+11.0 KV) | 98.57 GB (+22.0 KV) | 120.57 GB (+44.0 KV) | 164.57 GB (+88.0 KV) | 252.57 GB (+176.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 75.07 GB | 79.32 GB (+2.75 KV) | 82.07 GB (+5.5 KV) | 87.57 GB (+11.0 KV) | 98.57 GB (+22.0 KV) | 120.57 GB (+44.0 KV) | 164.57 GB (+88.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 75.07 GB | 78.08 GB (+1.51 KV) | 79.59 GB (+3.03 KV) | 82.62 GB (+6.05 KV) | 88.67 GB (+12.1 KV) | 100.77 GB (+24.2 KV) | 124.97 GB (+48.4 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 75.07 GB | 77.94 GB (+1.38 KV) | 79.32 GB (+2.75 KV) | 82.07 GB (+5.5 KV) | 87.57 GB (+11.0 KV) | 98.57 GB (+22.0 KV) | 120.57 GB (+44.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 75.07 GB | 77.39 GB (+0.82 KV) | 78.22 GB (+1.65 KV) | 79.87 GB (+3.3 KV) | 83.17 GB (+6.6 KV) | 89.77 GB (+13.2 KV) | 102.97 GB (+26.4 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 73.94 GB | 80.94 GB (+5.5 KV) | 86.44 GB (+11.0 KV) | 97.44 GB (+22.0 KV) | 119.44 GB (+44.0 KV) | 163.44 GB (+88.0 KV) | 251.44 GB (+176.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 73.94 GB | 78.19 GB (+2.75 KV) | 80.94 GB (+5.5 KV) | 86.44 GB (+11.0 KV) | 97.44 GB (+22.0 KV) | 119.44 GB (+44.0 KV) | 163.44 GB (+88.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 73.94 GB | 76.95 GB (+1.51 KV) | 78.46 GB (+3.03 KV) | 81.49 GB (+6.05 KV) | 87.54 GB (+12.1 KV) | 99.64 GB (+24.2 KV) | 123.84 GB (+48.4 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 73.94 GB | 76.81 GB (+1.38 KV) | 78.19 GB (+2.75 KV) | 80.94 GB (+5.5 KV) | 86.44 GB (+11.0 KV) | 97.44 GB (+22.0 KV) | 119.44 GB (+44.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 73.94 GB | 76.26 GB (+0.82 KV) | 77.09 GB (+1.65 KV) | 78.74 GB (+3.3 KV) | 82.04 GB (+6.6 KV) | 88.64 GB (+13.2 KV) | 101.84 GB (+26.4 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 63.12 GB | 70.12 GB (+5.5 KV) | 75.62 GB (+11.0 KV) | 86.62 GB (+22.0 KV) | 108.62 GB (+44.0 KV) | 152.62 GB (+88.0 KV) | 240.62 GB (+176.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 63.12 GB | 67.37 GB (+2.75 KV) | 70.12 GB (+5.5 KV) | 75.62 GB (+11.0 KV) | 86.62 GB (+22.0 KV) | 108.62 GB (+44.0 KV) | 152.62 GB (+88.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 63.12 GB | 66.13 GB (+1.51 KV) | 67.65 GB (+3.03 KV) | 70.67 GB (+6.05 KV) | 76.72 GB (+12.1 KV) | 88.82 GB (+24.2 KV) | 113.02 GB (+48.4 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 63.12 GB | 66.0 GB (+1.38 KV) | 67.37 GB (+2.75 KV) | 70.12 GB (+5.5 KV) | 75.62 GB (+11.0 KV) | 86.62 GB (+22.0 KV) | 108.62 GB (+44.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 63.12 GB | 65.45 GB (+0.82 KV) | 66.27 GB (+1.65 KV) | 67.92 GB (+3.3 KV) | 71.22 GB (+6.6 KV) | 77.82 GB (+13.2 KV) | 91.02 GB (+26.4 KV) |
| Q2_K (2.63 bpw) | FP32 | 42.46 GB | 49.46 GB (+5.5 KV) | 54.96 GB (+11.0 KV) | 65.96 GB (+22.0 KV) | 87.96 GB (+44.0 KV) | 131.96 GB (+88.0 KV) | 219.96 GB (+176.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 42.46 GB | 46.71 GB (+2.75 KV) | 49.46 GB (+5.5 KV) | 54.96 GB (+11.0 KV) | 65.96 GB (+22.0 KV) | 87.96 GB (+44.0 KV) | 131.96 GB (+88.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 42.46 GB | 45.47 GB (+1.51 KV) | 46.98 GB (+3.03 KV) | 50.01 GB (+6.05 KV) | 56.06 GB (+12.1 KV) | 68.16 GB (+24.2 KV) | 92.36 GB (+48.4 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 42.46 GB | 45.33 GB (+1.38 KV) | 46.71 GB (+2.75 KV) | 49.46 GB (+5.5 KV) | 54.96 GB (+11.0 KV) | 65.96 GB (+22.0 KV) | 87.96 GB (+44.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 42.46 GB | 44.78 GB (+0.82 KV) | 45.61 GB (+1.65 KV) | 47.26 GB (+3.3 KV) | 50.56 GB (+6.6 KV) | 57.16 GB (+13.2 KV) | 70.36 GB (+26.4 KV) |

Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
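
The formula above can be sketched in Python as follows. Only the 1.5 GB overhead and the parameter count come from this page. The 5% factor on the weights, the head dimension of 128, and the bytes-per-element values for the quantized cache formats are assumptions back-derived from the listed numbers, and may not match how the page's figures were actually produced. The function and constant names are hypothetical.

```python
PARAMS_BILLIONS = 123.0  # parameter count stated on this page
OVERHEAD_GB = 1.5        # base overhead stated on this page

# Assumption: KV elements per token = 2 (K and V) * 88 layers * 8 KV heads
# * head dimension 128 (12288 hidden / 96 attention heads).
KV_ELEMENTS_PER_TOKEN = 2 * 88 * 8 * 128

# Assumption: effective bytes per cached element, inferred from the table.
CACHE_BYTES = {"FP32": 4.0, "FP16": 2.0, "Q8_0": 1.1, "FP8": 1.0, "Q4_0": 0.6}


def model_weights_gb(bits_per_weight: float) -> float:
    """Weight size in GB; the 1.05 factor is an inferred assumption."""
    return PARAMS_BILLIONS * bits_per_weight / 8 * 1.05


def kv_cache_gb(context_tokens: int, cache_format: str) -> float:
    """KV cache size, computed in units of 2**30 bytes as the table appears to do."""
    bytes_total = KV_ELEMENTS_PER_TOKEN * CACHE_BYTES[cache_format] * context_tokens
    return bytes_total / 2**30


def total_vram_gb(bits_per_weight: float, context_tokens: int, cache_format: str) -> float:
    """Total VRAM = model weights + KV cache + base overhead."""
    return (
        model_weights_gb(bits_per_weight)
        + kv_cache_gb(context_tokens, cache_format)
        + OVERHEAD_GB
    )


if __name__ == "__main__":
    # Q4_K_M weights (4.65 bpw), 32K context, FP16 cache.
    print(round(total_vram_gb(4.65, 32 * 1024, "FP16"), 2))  # 87.57
```

Comparing the result with the VRAM available on a given GPU, or summed across several, gives a first estimate of whether a configuration fits, subject to the ±5% variation noted above.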
