Ministral-3-14B-Instruct-2512

Standard Transformer · 14.0B Parameters

Model Specifications

Layers: 40
Hidden Dimension: 5,120
Attention Heads: 32
KV Heads: 8
Max Context: 262K tokens
Vocabulary Size: 131,072
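
The KV cache figures in the table below follow directly from these numbers. Here is a minimal sketch, assuming a head dimension of 128; the head dimension is not listed above, and 128 is what the table's +1.25 GB FP16 figure at 8K context implies rather than hidden dimension divided by attention heads.

```python
# Per-token KV cache size for a grouped-query attention model.
# Assumption: head_dim = 128 (not listed in the specs above).
layers, kv_heads, head_dim = 40, 8, 128
bytes_per_value = 2  # FP16 cache

# K and V are each stored once per layer per KV head.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value

print(kv_bytes_per_token)                   # 163840 bytes per token
print(kv_bytes_per_token * 8192 / 1024**3)  # 1.25 -> the "+1.25 KV" value at 8K context
```

Because only the 8 KV heads are cached (versus 32 attention heads), the cache is a quarter of what full multi-head attention would require at the same head dimension.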

VRAM Requirements

VRAM usage for all quantization and KV cache format combinations. Base overhead: 0.64 GB (CUDA context + activations). Parenthesized values show the KV cache contribution in GB.

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 16.0 bpw | FP32 | 29.4 GB | 32.54 GB (+2.5 KV) | 35.04 GB (+5.0 KV) | 40.04 GB (+10.0 KV) | 50.04 GB (+20.0 KV) | 70.04 GB (+40.0 KV) | 110.04 GB (+80.0 KV) |
| FP16 16.0 bpw | FP16 | 29.4 GB | 31.29 GB (+1.25 KV) | 32.54 GB (+2.5 KV) | 35.04 GB (+5.0 KV) | 40.04 GB (+10.0 KV) | 50.04 GB (+20.0 KV) | 70.04 GB (+40.0 KV) |
| FP16 16.0 bpw | Q8_0 | 29.4 GB | 30.73 GB (+0.69 KV) | 31.42 GB (+1.38 KV) | 32.79 GB (+2.75 KV) | 35.54 GB (+5.5 KV) | 41.04 GB (+11.0 KV) | 52.04 GB (+22.0 KV) |
| FP16 16.0 bpw | FP8 (Exp) | 29.4 GB | 30.67 GB (+0.62 KV) | 31.29 GB (+1.25 KV) | 32.54 GB (+2.5 KV) | 35.04 GB (+5.0 KV) | 40.04 GB (+10.0 KV) | 50.04 GB (+20.0 KV) |
| FP16 16.0 bpw | Q4_0 (Exp) | 29.4 GB | 30.42 GB (+0.38 KV) | 30.79 GB (+0.75 KV) | 31.54 GB (+1.5 KV) | 33.04 GB (+3.0 KV) | 36.04 GB (+6.0 KV) | 42.04 GB (+12.0 KV) |
| Q8_0 8.0 bpw | FP32 | 14.7 GB | 17.84 GB (+2.5 KV) | 20.34 GB (+5.0 KV) | 25.34 GB (+10.0 KV) | 35.34 GB (+20.0 KV) | 55.34 GB (+40.0 KV) | 95.34 GB (+80.0 KV) |
| Q8_0 8.0 bpw | FP16 | 14.7 GB | 16.59 GB (+1.25 KV) | 17.84 GB (+2.5 KV) | 20.34 GB (+5.0 KV) | 25.34 GB (+10.0 KV) | 35.34 GB (+20.0 KV) | 55.34 GB (+40.0 KV) |
| Q8_0 8.0 bpw | Q8_0 | 14.7 GB | 16.03 GB (+0.69 KV) | 16.72 GB (+1.38 KV) | 18.09 GB (+2.75 KV) | 20.84 GB (+5.5 KV) | 26.34 GB (+11.0 KV) | 37.34 GB (+22.0 KV) |
| Q8_0 8.0 bpw | FP8 (Exp) | 14.7 GB | 15.97 GB (+0.62 KV) | 16.59 GB (+1.25 KV) | 17.84 GB (+2.5 KV) | 20.34 GB (+5.0 KV) | 25.34 GB (+10.0 KV) | 35.34 GB (+20.0 KV) |
| Q8_0 8.0 bpw | Q4_0 (Exp) | 14.7 GB | 15.72 GB (+0.38 KV) | 16.09 GB (+0.75 KV) | 16.84 GB (+1.5 KV) | 18.34 GB (+3.0 KV) | 21.34 GB (+6.0 KV) | 27.34 GB (+12.0 KV) |
| Q4_K_M 4.65 bpw | FP32 | 8.54 GB | 11.68 GB (+2.5 KV) | 14.18 GB (+5.0 KV) | 19.18 GB (+10.0 KV) | 29.18 GB (+20.0 KV) | 49.18 GB (+40.0 KV) | 89.18 GB (+80.0 KV) |
| Q4_K_M 4.65 bpw | FP16 | 8.54 GB | 10.43 GB (+1.25 KV) | 11.68 GB (+2.5 KV) | 14.18 GB (+5.0 KV) | 19.18 GB (+10.0 KV) | 29.18 GB (+20.0 KV) | 49.18 GB (+40.0 KV) |
| Q4_K_M 4.65 bpw | Q8_0 | 8.54 GB | 9.87 GB (+0.69 KV) | 10.56 GB (+1.38 KV) | 11.93 GB (+2.75 KV) | 14.68 GB (+5.5 KV) | 20.18 GB (+11.0 KV) | 31.18 GB (+22.0 KV) |
| Q4_K_M 4.65 bpw | FP8 (Exp) | 8.54 GB | 9.81 GB (+0.62 KV) | 10.43 GB (+1.25 KV) | 11.68 GB (+2.5 KV) | 14.18 GB (+5.0 KV) | 19.18 GB (+10.0 KV) | 29.18 GB (+20.0 KV) |
| Q4_K_M 4.65 bpw | Q4_0 (Exp) | 8.54 GB | 9.56 GB (+0.38 KV) | 9.93 GB (+0.75 KV) | 10.68 GB (+1.5 KV) | 12.18 GB (+3.0 KV) | 15.18 GB (+6.0 KV) | 21.18 GB (+12.0 KV) |
| Q4_K_S 4.58 bpw | FP32 | 8.42 GB | 11.56 GB (+2.5 KV) | 14.06 GB (+5.0 KV) | 19.06 GB (+10.0 KV) | 29.06 GB (+20.0 KV) | 49.06 GB (+40.0 KV) | 89.06 GB (+80.0 KV) |
| Q4_K_S 4.58 bpw | FP16 | 8.42 GB | 10.31 GB (+1.25 KV) | 11.56 GB (+2.5 KV) | 14.06 GB (+5.0 KV) | 19.06 GB (+10.0 KV) | 29.06 GB (+20.0 KV) | 49.06 GB (+40.0 KV) |
| Q4_K_S 4.58 bpw | Q8_0 | 8.42 GB | 9.74 GB (+0.69 KV) | 10.43 GB (+1.38 KV) | 11.81 GB (+2.75 KV) | 14.56 GB (+5.5 KV) | 20.06 GB (+11.0 KV) | 31.06 GB (+22.0 KV) |
| Q4_K_S 4.58 bpw | FP8 (Exp) | 8.42 GB | 9.68 GB (+0.62 KV) | 10.31 GB (+1.25 KV) | 11.56 GB (+2.5 KV) | 14.06 GB (+5.0 KV) | 19.06 GB (+10.0 KV) | 29.06 GB (+20.0 KV) |
| Q4_K_S 4.58 bpw | Q4_0 (Exp) | 8.42 GB | 9.43 GB (+0.38 KV) | 9.81 GB (+0.75 KV) | 10.56 GB (+1.5 KV) | 12.06 GB (+3.0 KV) | 15.06 GB (+6.0 KV) | 21.06 GB (+12.0 KV) |
| Q3_K_M 3.91 bpw | FP32 | 7.18 GB | 10.32 GB (+2.5 KV) | 12.82 GB (+5.0 KV) | 17.82 GB (+10.0 KV) | 27.82 GB (+20.0 KV) | 47.82 GB (+40.0 KV) | 87.82 GB (+80.0 KV) |
| Q3_K_M 3.91 bpw | FP16 | 7.18 GB | 9.07 GB (+1.25 KV) | 10.32 GB (+2.5 KV) | 12.82 GB (+5.0 KV) | 17.82 GB (+10.0 KV) | 27.82 GB (+20.0 KV) | 47.82 GB (+40.0 KV) |
| Q3_K_M 3.91 bpw | Q8_0 | 7.18 GB | 8.51 GB (+0.69 KV) | 9.2 GB (+1.38 KV) | 10.57 GB (+2.75 KV) | 13.32 GB (+5.5 KV) | 18.82 GB (+11.0 KV) | 29.82 GB (+22.0 KV) |
| Q3_K_M 3.91 bpw | FP8 (Exp) | 7.18 GB | 8.45 GB (+0.62 KV) | 9.07 GB (+1.25 KV) | 10.32 GB (+2.5 KV) | 12.82 GB (+5.0 KV) | 17.82 GB (+10.0 KV) | 27.82 GB (+20.0 KV) |
| Q3_K_M 3.91 bpw | Q4_0 (Exp) | 7.18 GB | 8.2 GB (+0.38 KV) | 8.57 GB (+0.75 KV) | 9.32 GB (+1.5 KV) | 10.82 GB (+3.0 KV) | 13.82 GB (+6.0 KV) | 19.82 GB (+12.0 KV) |
| Q2_K 2.63 bpw | FP32 | 4.83 GB | 7.97 GB (+2.5 KV) | 10.47 GB (+5.0 KV) | 15.47 GB (+10.0 KV) | 25.47 GB (+20.0 KV) | 45.47 GB (+40.0 KV) | 85.47 GB (+80.0 KV) |
| Q2_K 2.63 bpw | FP16 | 4.83 GB | 6.72 GB (+1.25 KV) | 7.97 GB (+2.5 KV) | 10.47 GB (+5.0 KV) | 15.47 GB (+10.0 KV) | 25.47 GB (+20.0 KV) | 45.47 GB (+40.0 KV) |
| Q2_K 2.63 bpw | Q8_0 | 4.83 GB | 6.16 GB (+0.69 KV) | 6.85 GB (+1.38 KV) | 8.22 GB (+2.75 KV) | 10.97 GB (+5.5 KV) | 16.47 GB (+11.0 KV) | 27.47 GB (+22.0 KV) |
| Q2_K 2.63 bpw | FP8 (Exp) | 4.83 GB | 6.1 GB (+0.62 KV) | 6.72 GB (+1.25 KV) | 7.97 GB (+2.5 KV) | 10.47 GB (+5.0 KV) | 15.47 GB (+10.0 KV) | 25.47 GB (+20.0 KV) |
| Q2_K 2.63 bpw | Q4_0 (Exp) | 4.83 GB | 5.85 GB (+0.38 KV) | 6.22 GB (+0.75 KV) | 6.97 GB (+1.5 KV) | 8.47 GB (+3.0 KV) | 11.47 GB (+6.0 KV) | 17.47 GB (+12.0 KV) |

Total VRAM = Model Weights + KV Cache + 0.64 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
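
To reproduce any cell in the table, the rule above can be written as a small estimator. This is a sketch under the same assumptions as before (40 layers, 8 KV heads, an assumed head dimension of 128, 1024-based GB, and approximate bytes per value for the quantized cache formats); an inference engine's actual accounting will differ somewhat.

```python
# Sketch of the rule above: Total VRAM = model weights + KV cache + 0.64 GB overhead.
# Assumptions: head_dim = 128 (not listed in the specs) and approximate
# bytes per cached value for each cache format.

OVERHEAD_GB = 0.64
LAYERS, KV_HEADS, HEAD_DIM = 40, 8, 128

# Approximate bytes per cached K/V value; Q8_0 and Q4_0 carry block scales,
# so they cost slightly more than 8 and 4 bits per value.
CACHE_BYTES = {"FP32": 4.0, "FP16": 2.0, "Q8_0": 1.1, "FP8": 1.0, "Q4_0": 0.6}

def total_vram_gb(model_weights_gb: float, context_tokens: int, cache_format: str) -> float:
    # K and V are each stored once per layer per KV head.
    kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * CACHE_BYTES[cache_format]
    kv_gb = kv_bytes_per_token * context_tokens / 1024**3
    return model_weights_gb + kv_gb + OVERHEAD_GB

# Q4_K_M weights (8.54 GB) with an FP16 cache at 32K context:
print(round(total_vram_gb(8.54, 32 * 1024, "FP16"), 2))  # 14.18, matching the table row
```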

Check if your GPU can run Ministral-3-14B-Instruct-2512

Use our calculator to see if this model fits your specific hardware configuration.