
Ministral-3-8B-Instruct-2512

Standard Transformer 8.4B Parameters

Model Specifications

Layers: 34
Hidden Dimension: 4,096
Attention Heads: 32
KV Heads: 8
Max Context: 262K tokens
Vocabulary Size: 131,072
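A couple of useful quantities follow directly from these figures, assuming the conventional transformer sizing where the per-head dimension is the hidden dimension divided by the number of attention heads (variable names below are illustrative):

```python
# Derived attention geometry for Ministral-3-8B-Instruct-2512 (sketch,
# assuming head_dim = hidden_dim / attention heads, as in standard sizing).
hidden_dim, attn_heads, kv_heads = 4096, 32, 8

head_dim = hidden_dim // attn_heads   # dimension of each attention head
gqa_group = attn_heads // kv_heads    # query heads sharing each KV head (grouped-query attention)

print(head_dim, gqa_group)            # 128 query/key head dim, groups of 4
```

The 32:8 head ratio means the KV cache only stores 8 heads' worth of keys and values per layer, which is why the KV columns in the table below are a quarter of what full multi-head attention would require.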

VRAM Requirements

VRAM usage for all quantization and cache format combinations. Base overhead: 0.58 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 17.64 GB | 20.35 GB (+2.12 KV) | 22.47 GB (+4.25 KV) | 26.72 GB (+8.5 KV) | 35.22 GB (+17.0 KV) | 52.22 GB (+34.0 KV) | 86.22 GB (+68.0 KV) |
| FP16 (16.0 bpw) | FP16 | 17.64 GB | 19.29 GB (+1.06 KV) | 20.35 GB (+2.12 KV) | 22.47 GB (+4.25 KV) | 26.72 GB (+8.5 KV) | 35.22 GB (+17.0 KV) | 52.22 GB (+34.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 17.64 GB | 18.81 GB (+0.58 KV) | 19.39 GB (+1.17 KV) | 20.56 GB (+2.34 KV) | 22.9 GB (+4.68 KV) | 27.57 GB (+9.35 KV) | 36.92 GB (+18.7 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 17.64 GB | 18.76 GB (+0.53 KV) | 19.29 GB (+1.06 KV) | 20.35 GB (+2.12 KV) | 22.47 GB (+4.25 KV) | 26.72 GB (+8.5 KV) | 35.22 GB (+17.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 17.64 GB | 18.54 GB (+0.32 KV) | 18.86 GB (+0.64 KV) | 19.5 GB (+1.27 KV) | 20.77 GB (+2.55 KV) | 23.32 GB (+5.1 KV) | 28.42 GB (+10.2 KV) |
| Q8_0 (8.0 bpw) | FP32 | 8.82 GB | 11.53 GB (+2.12 KV) | 13.65 GB (+4.25 KV) | 17.9 GB (+8.5 KV) | 26.4 GB (+17.0 KV) | 43.4 GB (+34.0 KV) | 77.4 GB (+68.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 8.82 GB | 10.47 GB (+1.06 KV) | 11.53 GB (+2.12 KV) | 13.65 GB (+4.25 KV) | 17.9 GB (+8.5 KV) | 26.4 GB (+17.0 KV) | 43.4 GB (+34.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 8.82 GB | 9.99 GB (+0.58 KV) | 10.57 GB (+1.17 KV) | 11.74 GB (+2.34 KV) | 14.08 GB (+4.68 KV) | 18.75 GB (+9.35 KV) | 28.1 GB (+18.7 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 8.82 GB | 9.94 GB (+0.53 KV) | 10.47 GB (+1.06 KV) | 11.53 GB (+2.12 KV) | 13.65 GB (+4.25 KV) | 17.9 GB (+8.5 KV) | 26.4 GB (+17.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 8.82 GB | 9.72 GB (+0.32 KV) | 10.04 GB (+0.64 KV) | 10.68 GB (+1.27 KV) | 11.95 GB (+2.55 KV) | 14.5 GB (+5.1 KV) | 19.6 GB (+10.2 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 5.13 GB | 7.84 GB (+2.12 KV) | 9.96 GB (+4.25 KV) | 14.21 GB (+8.5 KV) | 22.71 GB (+17.0 KV) | 39.71 GB (+34.0 KV) | 73.71 GB (+68.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 5.13 GB | 6.77 GB (+1.06 KV) | 7.84 GB (+2.12 KV) | 9.96 GB (+4.25 KV) | 14.21 GB (+8.5 KV) | 22.71 GB (+17.0 KV) | 39.71 GB (+34.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 5.13 GB | 6.29 GB (+0.58 KV) | 6.88 GB (+1.17 KV) | 8.05 GB (+2.34 KV) | 10.39 GB (+4.68 KV) | 15.06 GB (+9.35 KV) | 24.41 GB (+18.7 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 5.13 GB | 6.24 GB (+0.53 KV) | 6.77 GB (+1.06 KV) | 7.84 GB (+2.12 KV) | 9.96 GB (+4.25 KV) | 14.21 GB (+8.5 KV) | 22.71 GB (+17.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 5.13 GB | 6.03 GB (+0.32 KV) | 6.35 GB (+0.64 KV) | 6.99 GB (+1.27 KV) | 8.26 GB (+2.55 KV) | 10.81 GB (+5.1 KV) | 15.91 GB (+10.2 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 5.05 GB | 7.76 GB (+2.12 KV) | 9.88 GB (+4.25 KV) | 14.13 GB (+8.5 KV) | 22.63 GB (+17.0 KV) | 39.63 GB (+34.0 KV) | 73.63 GB (+68.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 5.05 GB | 6.7 GB (+1.06 KV) | 7.76 GB (+2.12 KV) | 9.88 GB (+4.25 KV) | 14.13 GB (+8.5 KV) | 22.63 GB (+17.0 KV) | 39.63 GB (+34.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 5.05 GB | 6.22 GB (+0.58 KV) | 6.8 GB (+1.17 KV) | 7.97 GB (+2.34 KV) | 10.31 GB (+4.68 KV) | 14.98 GB (+9.35 KV) | 24.33 GB (+18.7 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 5.05 GB | 6.16 GB (+0.53 KV) | 6.7 GB (+1.06 KV) | 7.76 GB (+2.12 KV) | 9.88 GB (+4.25 KV) | 14.13 GB (+8.5 KV) | 22.63 GB (+17.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 5.05 GB | 5.95 GB (+0.32 KV) | 6.27 GB (+0.64 KV) | 6.91 GB (+1.27 KV) | 8.18 GB (+2.55 KV) | 10.73 GB (+5.1 KV) | 15.83 GB (+10.2 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 4.31 GB | 7.02 GB (+2.12 KV) | 9.14 GB (+4.25 KV) | 13.39 GB (+8.5 KV) | 21.89 GB (+17.0 KV) | 38.89 GB (+34.0 KV) | 72.89 GB (+68.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 4.31 GB | 5.96 GB (+1.06 KV) | 7.02 GB (+2.12 KV) | 9.14 GB (+4.25 KV) | 13.39 GB (+8.5 KV) | 21.89 GB (+17.0 KV) | 38.89 GB (+34.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 4.31 GB | 5.48 GB (+0.58 KV) | 6.06 GB (+1.17 KV) | 7.23 GB (+2.34 KV) | 9.57 GB (+4.68 KV) | 14.24 GB (+9.35 KV) | 23.59 GB (+18.7 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 4.31 GB | 5.43 GB (+0.53 KV) | 5.96 GB (+1.06 KV) | 7.02 GB (+2.12 KV) | 9.14 GB (+4.25 KV) | 13.39 GB (+8.5 KV) | 21.89 GB (+17.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 4.31 GB | 5.21 GB (+0.32 KV) | 5.53 GB (+0.64 KV) | 6.17 GB (+1.27 KV) | 7.44 GB (+2.55 KV) | 9.99 GB (+5.1 KV) | 15.09 GB (+10.2 KV) |
| Q2_K (2.63 bpw) | FP32 | 2.9 GB | 5.61 GB (+2.12 KV) | 7.73 GB (+4.25 KV) | 11.98 GB (+8.5 KV) | 20.48 GB (+17.0 KV) | 37.48 GB (+34.0 KV) | 71.48 GB (+68.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 2.9 GB | 4.55 GB (+1.06 KV) | 5.61 GB (+2.12 KV) | 7.73 GB (+4.25 KV) | 11.98 GB (+8.5 KV) | 20.48 GB (+17.0 KV) | 37.48 GB (+34.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 2.9 GB | 4.07 GB (+0.58 KV) | 4.65 GB (+1.17 KV) | 5.82 GB (+2.34 KV) | 8.16 GB (+4.68 KV) | 12.83 GB (+9.35 KV) | 22.18 GB (+18.7 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 2.9 GB | 4.01 GB (+0.53 KV) | 4.55 GB (+1.06 KV) | 5.61 GB (+2.12 KV) | 7.73 GB (+4.25 KV) | 11.98 GB (+8.5 KV) | 20.48 GB (+17.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 2.9 GB | 3.8 GB (+0.32 KV) | 4.12 GB (+0.64 KV) | 4.76 GB (+1.27 KV) | 6.03 GB (+2.55 KV) | 8.58 GB (+5.1 KV) | 13.68 GB (+10.2 KV) |

Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
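As a sketch, the KV columns in the table can be reproduced from the specs above (34 layers, 8 KV heads, head dimension 128), assuming the standard formula of two cached tensors (K and V) per layer, per KV head, per token; GB here means GiB (1024³ bytes), which matches the table's figures, and the function name is illustrative:

```python
# Sketch: estimate KV-cache VRAM for Ministral-3-8B-Instruct-2512.
# Assumes the standard grouped-query-attention cache layout; defaults come
# from the spec table above (head_dim = 4096 hidden / 32 heads = 128).

def kv_cache_gb(context_tokens: int, bytes_per_value: float,
                layers: int = 34, kv_heads: int = 8, head_dim: int = 128) -> float:
    """KV cache size in GiB: 2 tensors (K and V) per layer per token."""
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1024**3

# FP16 cache (2 bytes/value) at 8K context:
print(round(kv_cache_gb(8192, 2), 2))   # ~1.06, matching the +1.06 KV column
# FP32 cache (4 bytes/value) at 32K context:
print(round(kv_cache_gb(32768, 4), 2))  # ~8.5, matching the +8.5 KV column
```

The quantized cache rows (Q8_0, Q4_0) follow the same formula with a fractional `bytes_per_value`, since block-quantized formats also store per-block scale metadata alongside the values.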
