Devstral-Small-2-24B-Instruct

Standard Transformer 24.0B Parameters

Model Specifications

Layers: 40
Hidden dimension: 5,120
Attention heads: 32
KV heads: 8
Max context: 393K tokens
Vocabulary size: 131,072
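The per-token KV-cache footprint follows directly from these specifications. A minimal sketch, assuming a head dimension of 128 (an assumption: the KV figures in the table below are consistent with 128, not with hidden_size / attention_heads = 160):

```python
# Per-token KV-cache size for this model: keys + values across all layers.
# Assumption: HEAD_DIM = 128 (matches the table's KV figures; it is NOT
# hidden_size / attention_heads, which would give 160).
N_LAYERS = 40
N_KV_HEADS = 8
HEAD_DIM = 128  # assumed

def kv_bytes_per_token(bytes_per_elem: float) -> float:
    # Factor of 2 covers both the key and the value tensors.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem

# FP16 cache (2 bytes/element): 160 KiB per token,
# i.e. 1.25 GiB for an 8K (8,192-token) context.
per_token = kv_bytes_per_token(2)
print(per_token / 1024)            # 160.0 (KiB per token)
print(per_token * 8192 / 1024**3)  # 1.25 (GiB at 8K context)
```

Halving the bytes per element halves the cache, which is why the FP16 rows below show exactly half the FP32 KV figures.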

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.74 GB (CUDA context plus activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 393K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 50.4 GB | 53.64 GB (+2.5 KV) | 56.14 GB (+5.0 KV) | 61.14 GB (+10.0 KV) | 71.14 GB (+20.0 KV) | 91.14 GB (+40.0 KV) | 171.14 GB (+120.0 KV) |
| FP16 16.0 bpw | FP16 | 50.4 GB | 52.39 GB (+1.25 KV) | 53.64 GB (+2.5 KV) | 56.14 GB (+5.0 KV) | 61.14 GB (+10.0 KV) | 71.14 GB (+20.0 KV) | 111.14 GB (+60.0 KV) |
| FP16 16.0 bpw | Q8_0 | 50.4 GB | 51.83 GB (+0.69 KV) | 52.52 GB (+1.38 KV) | 53.89 GB (+2.75 KV) | 56.64 GB (+5.5 KV) | 62.14 GB (+11.0 KV) | 84.14 GB (+33.0 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 50.4 GB | 51.77 GB (+0.62 KV) | 52.39 GB (+1.25 KV) | 53.64 GB (+2.5 KV) | 56.14 GB (+5.0 KV) | 61.14 GB (+10.0 KV) | 81.14 GB (+30.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 50.4 GB | 51.52 GB (+0.38 KV) | 51.89 GB (+0.75 KV) | 52.64 GB (+1.5 KV) | 54.14 GB (+3.0 KV) | 57.14 GB (+6.0 KV) | 69.14 GB (+18.0 KV) |
| Q8_0 8.0 bpw | FP32 | 25.2 GB | 28.44 GB (+2.5 KV) | 30.94 GB (+5.0 KV) | 35.94 GB (+10.0 KV) | 45.94 GB (+20.0 KV) | 65.94 GB (+40.0 KV) | 145.94 GB (+120.0 KV) |
| Q8_0 8.0 bpw | FP16 | 25.2 GB | 27.19 GB (+1.25 KV) | 28.44 GB (+2.5 KV) | 30.94 GB (+5.0 KV) | 35.94 GB (+10.0 KV) | 45.94 GB (+20.0 KV) | 85.94 GB (+60.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 25.2 GB | 26.63 GB (+0.69 KV) | 27.32 GB (+1.38 KV) | 28.69 GB (+2.75 KV) | 31.44 GB (+5.5 KV) | 36.94 GB (+11.0 KV) | 58.94 GB (+33.0 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 25.2 GB | 26.57 GB (+0.62 KV) | 27.19 GB (+1.25 KV) | 28.44 GB (+2.5 KV) | 30.94 GB (+5.0 KV) | 35.94 GB (+10.0 KV) | 55.94 GB (+30.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 25.2 GB | 26.32 GB (+0.38 KV) | 26.69 GB (+0.75 KV) | 27.44 GB (+1.5 KV) | 28.94 GB (+3.0 KV) | 31.94 GB (+6.0 KV) | 43.94 GB (+18.0 KV) |
| Q4_K_M 4.65 bpw | FP32 | 14.65 GB | 17.89 GB (+2.5 KV) | 20.39 GB (+5.0 KV) | 25.39 GB (+10.0 KV) | 35.39 GB (+20.0 KV) | 55.39 GB (+40.0 KV) | 135.39 GB (+120.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 14.65 GB | 16.64 GB (+1.25 KV) | 17.89 GB (+2.5 KV) | 20.39 GB (+5.0 KV) | 25.39 GB (+10.0 KV) | 35.39 GB (+20.0 KV) | 75.39 GB (+60.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 14.65 GB | 16.07 GB (+0.69 KV) | 16.76 GB (+1.38 KV) | 18.14 GB (+2.75 KV) | 20.89 GB (+5.5 KV) | 26.39 GB (+11.0 KV) | 48.39 GB (+33.0 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 14.65 GB | 16.01 GB (+0.62 KV) | 16.64 GB (+1.25 KV) | 17.89 GB (+2.5 KV) | 20.39 GB (+5.0 KV) | 25.39 GB (+10.0 KV) | 45.39 GB (+30.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 14.65 GB | 15.76 GB (+0.38 KV) | 16.14 GB (+0.75 KV) | 16.89 GB (+1.5 KV) | 18.39 GB (+3.0 KV) | 21.39 GB (+6.0 KV) | 33.39 GB (+18.0 KV) |
| Q4_K_S 4.58 bpw | FP32 | 14.43 GB | 17.67 GB (+2.5 KV) | 20.17 GB (+5.0 KV) | 25.17 GB (+10.0 KV) | 35.17 GB (+20.0 KV) | 55.17 GB (+40.0 KV) | 135.17 GB (+120.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 14.43 GB | 16.42 GB (+1.25 KV) | 17.67 GB (+2.5 KV) | 20.17 GB (+5.0 KV) | 25.17 GB (+10.0 KV) | 35.17 GB (+20.0 KV) | 75.17 GB (+60.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 14.43 GB | 15.85 GB (+0.69 KV) | 16.54 GB (+1.38 KV) | 17.92 GB (+2.75 KV) | 20.67 GB (+5.5 KV) | 26.17 GB (+11.0 KV) | 48.17 GB (+33.0 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 14.43 GB | 15.79 GB (+0.62 KV) | 16.42 GB (+1.25 KV) | 17.67 GB (+2.5 KV) | 20.17 GB (+5.0 KV) | 25.17 GB (+10.0 KV) | 45.17 GB (+30.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 14.43 GB | 15.54 GB (+0.38 KV) | 15.92 GB (+0.75 KV) | 16.67 GB (+1.5 KV) | 18.17 GB (+3.0 KV) | 21.17 GB (+6.0 KV) | 33.17 GB (+18.0 KV) |
| Q3_K_M 3.91 bpw | FP32 | 12.32 GB | 15.56 GB (+2.5 KV) | 18.06 GB (+5.0 KV) | 23.06 GB (+10.0 KV) | 33.06 GB (+20.0 KV) | 53.06 GB (+40.0 KV) | 133.06 GB (+120.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 12.32 GB | 14.31 GB (+1.25 KV) | 15.56 GB (+2.5 KV) | 18.06 GB (+5.0 KV) | 23.06 GB (+10.0 KV) | 33.06 GB (+20.0 KV) | 73.06 GB (+60.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 12.32 GB | 13.74 GB (+0.69 KV) | 14.43 GB (+1.38 KV) | 15.81 GB (+2.75 KV) | 18.56 GB (+5.5 KV) | 24.06 GB (+11.0 KV) | 46.06 GB (+33.0 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 12.32 GB | 13.68 GB (+0.62 KV) | 14.31 GB (+1.25 KV) | 15.56 GB (+2.5 KV) | 18.06 GB (+5.0 KV) | 23.06 GB (+10.0 KV) | 43.06 GB (+30.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 12.32 GB | 13.43 GB (+0.38 KV) | 13.81 GB (+0.75 KV) | 14.56 GB (+1.5 KV) | 16.06 GB (+3.0 KV) | 19.06 GB (+6.0 KV) | 31.06 GB (+18.0 KV) |
| Q2_K 2.63 bpw | FP32 | 8.28 GB | 11.52 GB (+2.5 KV) | 14.02 GB (+5.0 KV) | 19.02 GB (+10.0 KV) | 29.02 GB (+20.0 KV) | 49.02 GB (+40.0 KV) | 129.02 GB (+120.0 KV) |
| Q2_K 2.63 bpw | FP16 | 8.28 GB | 10.27 GB (+1.25 KV) | 11.52 GB (+2.5 KV) | 14.02 GB (+5.0 KV) | 19.02 GB (+10.0 KV) | 29.02 GB (+20.0 KV) | 69.02 GB (+60.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 8.28 GB | 9.71 GB (+0.69 KV) | 10.4 GB (+1.38 KV) | 11.77 GB (+2.75 KV) | 14.52 GB (+5.5 KV) | 20.02 GB (+11.0 KV) | 42.02 GB (+33.0 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 8.28 GB | 9.65 GB (+0.62 KV) | 10.27 GB (+1.25 KV) | 11.52 GB (+2.5 KV) | 14.02 GB (+5.0 KV) | 19.02 GB (+10.0 KV) | 39.02 GB (+30.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 8.28 GB | 9.4 GB (+0.38 KV) | 9.77 GB (+0.75 KV) | 10.52 GB (+1.5 KV) | 12.02 GB (+3.0 KV) | 15.02 GB (+6.0 KV) | 27.02 GB (+18.0 KV) |

Total VRAM = Model Weights + KV Cache + 0.74 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.

Check if your GPU can run Devstral-Small-2-24B-Instruct

Use our calculator to see if this model fits your specific hardware configuration.