HunYuan-MT1.5-1.8B

Standard Transformer · 1.8B Parameters

Model Specifications

| Specification | Value |
|---|---|
| Layers | 32 |
| Hidden Dimension | 2,048 |
| Attention Heads | 16 |
| KV Heads | 4 |
| Max Context | 262K tokens |
| Vocabulary Size | 120,818 |
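
These attention dimensions fix the KV-cache cost per token, which drives every context column in the table below. A minimal sketch of the derivation, assuming the standard grouped-query attention layout where head_dim = hidden dimension / attention heads:

```python
# Per-token KV-cache footprint, derived from the specs above.
# Assumption: head_dim = hidden_dim / attention_heads (standard GQA layout).
layers = 32
hidden_dim = 2048
attention_heads = 16
kv_heads = 4

head_dim = hidden_dim // attention_heads             # 128
# 2x for K and V, stored for each layer and each KV head.
values_per_token = 2 * layers * kv_heads * head_dim  # 32,768 values/token

# FP16 stores 2 bytes per value -> 64 KiB of KV cache per token.
bytes_per_token_fp16 = values_per_token * 2
print(bytes_per_token_fp16 / 1024)                   # 64.0 (KiB)

# At 8K (8,192) tokens that is 0.5 GiB, matching the "+0.5 KV"
# figure in the FP16-cache rows of the table below.
print(8192 * bytes_per_token_fp16 / 1024**3)         # 0.5 (GiB)
```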

VRAM Requirements

Estimated VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.52 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 3.78 GB | 5.29 GB (+0.99 KV) | 6.28 GB (+1.98 KV) | 8.27 GB (+3.97 KV) | 12.24 GB (+7.94 KV) | 20.17 GB (+15.88 KV) | 36.05 GB (+31.75 KV) |
| FP16 (16.0 bpw) | FP16 | 3.78 GB | 4.79 GB (+0.5 KV) | 5.29 GB (+0.99 KV) | 6.28 GB (+1.98 KV) | 8.27 GB (+3.97 KV) | 12.24 GB (+7.94 KV) | 20.17 GB (+15.88 KV) |
| FP16 (16.0 bpw) | Q8_0 | 3.78 GB | 4.57 GB (+0.27 KV) | 4.84 GB (+0.55 KV) | 5.39 GB (+1.09 KV) | 6.48 GB (+2.18 KV) | 8.66 GB (+4.37 KV) | 13.03 GB (+8.73 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 3.78 GB | 4.55 GB (+0.25 KV) | 4.79 GB (+0.5 KV) | 5.29 GB (+0.99 KV) | 6.28 GB (+1.98 KV) | 8.27 GB (+3.97 KV) | 12.24 GB (+7.94 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 3.78 GB | 4.45 GB (+0.15 KV) | 4.6 GB (+0.3 KV) | 4.89 GB (+0.6 KV) | 5.49 GB (+1.19 KV) | 6.68 GB (+2.38 KV) | 9.06 GB (+4.76 KV) |
| Q8_0 (8.0 bpw) | FP32 | 1.89 GB | 3.4 GB (+0.99 KV) | 4.39 GB (+1.98 KV) | 6.38 GB (+3.97 KV) | 10.35 GB (+7.94 KV) | 18.28 GB (+15.88 KV) | 34.16 GB (+31.75 KV) |
| Q8_0 (8.0 bpw) | FP16 | 1.89 GB | 2.9 GB (+0.5 KV) | 3.4 GB (+0.99 KV) | 4.39 GB (+1.98 KV) | 6.38 GB (+3.97 KV) | 10.35 GB (+7.94 KV) | 18.28 GB (+15.88 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 1.89 GB | 2.68 GB (+0.27 KV) | 2.95 GB (+0.55 KV) | 3.5 GB (+1.09 KV) | 4.59 GB (+2.18 KV) | 6.77 GB (+4.37 KV) | 11.14 GB (+8.73 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 1.89 GB | 2.66 GB (+0.25 KV) | 2.9 GB (+0.5 KV) | 3.4 GB (+0.99 KV) | 4.39 GB (+1.98 KV) | 6.38 GB (+3.97 KV) | 10.35 GB (+7.94 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 1.89 GB | 2.56 GB (+0.15 KV) | 2.71 GB (+0.3 KV) | 3.0 GB (+0.6 KV) | 3.6 GB (+1.19 KV) | 4.79 GB (+2.38 KV) | 7.17 GB (+4.76 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 1.1 GB | 2.61 GB (+0.99 KV) | 3.6 GB (+1.98 KV) | 5.59 GB (+3.97 KV) | 9.55 GB (+7.94 KV) | 17.49 GB (+15.88 KV) | 33.37 GB (+31.75 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 1.1 GB | 2.11 GB (+0.5 KV) | 2.61 GB (+0.99 KV) | 3.6 GB (+1.98 KV) | 5.59 GB (+3.97 KV) | 9.55 GB (+7.94 KV) | 17.49 GB (+15.88 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 1.1 GB | 1.89 GB (+0.27 KV) | 2.16 GB (+0.55 KV) | 2.71 GB (+1.09 KV) | 3.8 GB (+2.18 KV) | 5.98 GB (+4.37 KV) | 10.35 GB (+8.73 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 1.1 GB | 1.86 GB (+0.25 KV) | 2.11 GB (+0.5 KV) | 2.61 GB (+0.99 KV) | 3.6 GB (+1.98 KV) | 5.59 GB (+3.97 KV) | 9.55 GB (+7.94 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 1.1 GB | 1.77 GB (+0.15 KV) | 1.91 GB (+0.3 KV) | 2.21 GB (+0.6 KV) | 2.81 GB (+1.19 KV) | 4.0 GB (+2.38 KV) | 6.38 GB (+4.76 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 1.08 GB | 2.59 GB (+0.99 KV) | 3.58 GB (+1.98 KV) | 5.57 GB (+3.97 KV) | 9.54 GB (+7.94 KV) | 17.48 GB (+15.88 KV) | 33.35 GB (+31.75 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 1.08 GB | 2.1 GB (+0.5 KV) | 2.59 GB (+0.99 KV) | 3.58 GB (+1.98 KV) | 5.57 GB (+3.97 KV) | 9.54 GB (+7.94 KV) | 17.48 GB (+15.88 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 1.08 GB | 1.87 GB (+0.27 KV) | 2.15 GB (+0.55 KV) | 2.69 GB (+1.09 KV) | 3.78 GB (+2.18 KV) | 5.97 GB (+4.37 KV) | 10.33 GB (+8.73 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 1.08 GB | 1.85 GB (+0.25 KV) | 2.1 GB (+0.5 KV) | 2.59 GB (+0.99 KV) | 3.58 GB (+1.98 KV) | 5.57 GB (+3.97 KV) | 9.54 GB (+7.94 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 1.08 GB | 1.75 GB (+0.15 KV) | 1.9 GB (+0.3 KV) | 2.2 GB (+0.6 KV) | 2.79 GB (+1.19 KV) | 3.98 GB (+2.38 KV) | 6.36 GB (+4.76 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 0.92 GB | 2.43 GB (+0.99 KV) | 3.43 GB (+1.98 KV) | 5.41 GB (+3.97 KV) | 9.38 GB (+7.94 KV) | 17.32 GB (+15.88 KV) | 33.19 GB (+31.75 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 0.92 GB | 1.94 GB (+0.5 KV) | 2.43 GB (+0.99 KV) | 3.43 GB (+1.98 KV) | 5.41 GB (+3.97 KV) | 9.38 GB (+7.94 KV) | 17.32 GB (+15.88 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 0.92 GB | 1.71 GB (+0.27 KV) | 1.99 GB (+0.55 KV) | 2.53 GB (+1.09 KV) | 3.62 GB (+2.18 KV) | 5.81 GB (+4.37 KV) | 10.17 GB (+8.73 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 0.92 GB | 1.69 GB (+0.25 KV) | 1.94 GB (+0.5 KV) | 2.43 GB (+0.99 KV) | 3.43 GB (+1.98 KV) | 5.41 GB (+3.97 KV) | 9.38 GB (+7.94 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 0.92 GB | 1.59 GB (+0.15 KV) | 1.74 GB (+0.3 KV) | 2.04 GB (+0.6 KV) | 2.63 GB (+1.19 KV) | 3.82 GB (+2.38 KV) | 6.2 GB (+4.76 KV) |
| Q2_K (2.63 bpw) | FP32 | 0.62 GB | 2.13 GB (+0.99 KV) | 3.12 GB (+1.98 KV) | 5.11 GB (+3.97 KV) | 9.08 GB (+7.94 KV) | 17.01 GB (+15.88 KV) | 32.89 GB (+31.75 KV) |
| Q2_K (2.63 bpw) | FP16 | 0.62 GB | 1.64 GB (+0.5 KV) | 2.13 GB (+0.99 KV) | 3.12 GB (+1.98 KV) | 5.11 GB (+3.97 KV) | 9.08 GB (+7.94 KV) | 17.01 GB (+15.88 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 0.62 GB | 1.41 GB (+0.27 KV) | 1.69 GB (+0.55 KV) | 2.23 GB (+1.09 KV) | 3.32 GB (+2.18 KV) | 5.5 GB (+4.37 KV) | 9.87 GB (+8.73 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 0.62 GB | 1.39 GB (+0.25 KV) | 1.64 GB (+0.5 KV) | 2.13 GB (+0.99 KV) | 3.12 GB (+1.98 KV) | 5.11 GB (+3.97 KV) | 9.08 GB (+7.94 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 0.62 GB | 1.29 GB (+0.15 KV) | 1.44 GB (+0.3 KV) | 1.73 GB (+0.6 KV) | 2.33 GB (+1.19 KV) | 3.52 GB (+2.38 KV) | 5.9 GB (+4.76 KV) |

Total VRAM = Model Weights + KV Cache + 0.52 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
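
As a rough cross-check of that formula, here is a hedged sketch of an estimator. The bytes-per-value figures are assumptions modeled on llama.cpp-style block quantization, not values published with this table, so results land close to, but not exactly on, the table's cells:

```python
# Assumed per-value KV-cache cost for each cache format.
# Q8_0 and Q4_0 include the per-block fp16 scale overhead
# (32-value blocks), as in llama.cpp-style quantization.
KV_BYTES_PER_VALUE = {
    "FP32": 4.0,
    "FP16": 2.0,
    "Q8_0": 34 / 32,   # 32 int8 values + 2-byte scale per block
    "FP8": 1.0,
    "Q4_0": 18 / 32,   # 16 packed bytes + 2-byte scale per block
}

VALUES_PER_TOKEN = 2 * 32 * 4 * 128   # 2 * layers * kv_heads * head_dim (from the specs)
BASE_OVERHEAD_GB = 0.52               # CUDA context + activations

def total_vram_gb(weights_gb: float, cache_format: str, context_tokens: int) -> float:
    """Total VRAM = model weights + KV cache + base overhead (in GiB)."""
    kv_gb = context_tokens * VALUES_PER_TOKEN * KV_BYTES_PER_VALUE[cache_format] / 1024**3
    return weights_gb + kv_gb + BASE_OVERHEAD_GB

# Q4_K_M weights (1.1 GB) with an FP16 cache at 16K context:
print(round(total_vram_gb(1.1, "FP16", 16_384), 2))
# -> 2.62, vs 2.61 in the table; the small gap comes from
#    rounding inside the table's own KV figures.
```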

Check if your GPU can run HunYuan-MT1.5-1.8B

Use our calculator to see if this model fits your specific hardware configuration.