
HunYuan-MT1.5-7B

Standard Transformer 7.0B Parameters

Model Specifications

Layers: 32
Hidden dimension: 4,096
Attention heads: 32
KV heads: 8
Max context: 262K tokens
Vocabulary size: 128,167
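The per-token KV-cache cost follows directly from these numbers: with grouped-query attention, each token stores keys and values for the 8 KV heads in every layer. A rough sketch, assuming a head dimension of hidden / attention heads = 128 (not stated explicitly above, but consistent with the cache sizes in the table below):

```python
# Per-token KV-cache size for HunYuan-MT1.5-7B.
# Assumption: head_dim = hidden_dim / attention_heads = 4096 / 32 = 128.
layers, kv_heads, head_dim = 32, 8, 4096 // 32
bytes_fp16 = 2  # bytes per element in an FP16 cache

# Factor 2 covers both keys and values, stored for every layer and KV head.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_fp16
print(kv_bytes_per_token)                  # 131072 bytes = 128 KiB per token
print(8192 * kv_bytes_per_token / 2**30)   # 1.0 GiB for an 8K context
```

This matches the table below, where an FP16 cache adds 1.0 GB per 8K tokens of context.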

VRAM Requirements

VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 0.57 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 14.7 GB | 17.27 GB (+2.0 KV) | 19.27 GB (+4.0 KV) | 23.27 GB (+8.0 KV) | 31.27 GB (+16.0 KV) | 47.27 GB (+32.0 KV) | 79.27 GB (+64.0 KV) |
| FP16 (16.0 bpw) | FP16 | 14.7 GB | 16.27 GB (+1.0 KV) | 17.27 GB (+2.0 KV) | 19.27 GB (+4.0 KV) | 23.27 GB (+8.0 KV) | 31.27 GB (+16.0 KV) | 47.27 GB (+32.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 14.7 GB | 15.82 GB (+0.55 KV) | 16.37 GB (+1.1 KV) | 17.47 GB (+2.2 KV) | 19.67 GB (+4.4 KV) | 24.07 GB (+8.8 KV) | 32.87 GB (+17.6 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 14.7 GB | 15.77 GB (+0.5 KV) | 16.27 GB (+1.0 KV) | 17.27 GB (+2.0 KV) | 19.27 GB (+4.0 KV) | 23.27 GB (+8.0 KV) | 31.27 GB (+16.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 14.7 GB | 15.57 GB (+0.3 KV) | 15.87 GB (+0.6 KV) | 16.47 GB (+1.2 KV) | 17.67 GB (+2.4 KV) | 20.07 GB (+4.8 KV) | 24.87 GB (+9.6 KV) |
| Q8_0 (8.0 bpw) | FP32 | 7.35 GB | 9.92 GB (+2.0 KV) | 11.92 GB (+4.0 KV) | 15.92 GB (+8.0 KV) | 23.92 GB (+16.0 KV) | 39.92 GB (+32.0 KV) | 71.92 GB (+64.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 7.35 GB | 8.92 GB (+1.0 KV) | 9.92 GB (+2.0 KV) | 11.92 GB (+4.0 KV) | 15.92 GB (+8.0 KV) | 23.92 GB (+16.0 KV) | 39.92 GB (+32.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 7.35 GB | 8.47 GB (+0.55 KV) | 9.02 GB (+1.1 KV) | 10.12 GB (+2.2 KV) | 12.32 GB (+4.4 KV) | 16.72 GB (+8.8 KV) | 25.52 GB (+17.6 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 7.35 GB | 8.42 GB (+0.5 KV) | 8.92 GB (+1.0 KV) | 9.92 GB (+2.0 KV) | 11.92 GB (+4.0 KV) | 15.92 GB (+8.0 KV) | 23.92 GB (+16.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 7.35 GB | 8.22 GB (+0.3 KV) | 8.52 GB (+0.6 KV) | 9.12 GB (+1.2 KV) | 10.32 GB (+2.4 KV) | 12.72 GB (+4.8 KV) | 17.52 GB (+9.6 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 4.27 GB | 6.84 GB (+2.0 KV) | 8.84 GB (+4.0 KV) | 12.84 GB (+8.0 KV) | 20.84 GB (+16.0 KV) | 36.84 GB (+32.0 KV) | 68.84 GB (+64.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 4.27 GB | 5.84 GB (+1.0 KV) | 6.84 GB (+2.0 KV) | 8.84 GB (+4.0 KV) | 12.84 GB (+8.0 KV) | 20.84 GB (+16.0 KV) | 36.84 GB (+32.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 4.27 GB | 5.39 GB (+0.55 KV) | 5.94 GB (+1.1 KV) | 7.04 GB (+2.2 KV) | 9.24 GB (+4.4 KV) | 13.64 GB (+8.8 KV) | 22.44 GB (+17.6 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 4.27 GB | 5.34 GB (+0.5 KV) | 5.84 GB (+1.0 KV) | 6.84 GB (+2.0 KV) | 8.84 GB (+4.0 KV) | 12.84 GB (+8.0 KV) | 20.84 GB (+16.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 4.27 GB | 5.14 GB (+0.3 KV) | 5.44 GB (+0.6 KV) | 6.04 GB (+1.2 KV) | 7.24 GB (+2.4 KV) | 9.64 GB (+4.8 KV) | 14.44 GB (+9.6 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 4.21 GB | 6.78 GB (+2.0 KV) | 8.78 GB (+4.0 KV) | 12.78 GB (+8.0 KV) | 20.78 GB (+16.0 KV) | 36.78 GB (+32.0 KV) | 68.78 GB (+64.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 4.21 GB | 5.78 GB (+1.0 KV) | 6.78 GB (+2.0 KV) | 8.78 GB (+4.0 KV) | 12.78 GB (+8.0 KV) | 20.78 GB (+16.0 KV) | 36.78 GB (+32.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 4.21 GB | 5.33 GB (+0.55 KV) | 5.88 GB (+1.1 KV) | 6.98 GB (+2.2 KV) | 9.18 GB (+4.4 KV) | 13.58 GB (+8.8 KV) | 22.38 GB (+17.6 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 4.21 GB | 5.28 GB (+0.5 KV) | 5.78 GB (+1.0 KV) | 6.78 GB (+2.0 KV) | 8.78 GB (+4.0 KV) | 12.78 GB (+8.0 KV) | 20.78 GB (+16.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 4.21 GB | 5.08 GB (+0.3 KV) | 5.38 GB (+0.6 KV) | 5.98 GB (+1.2 KV) | 7.18 GB (+2.4 KV) | 9.58 GB (+4.8 KV) | 14.38 GB (+9.6 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 3.59 GB | 6.16 GB (+2.0 KV) | 8.16 GB (+4.0 KV) | 12.16 GB (+8.0 KV) | 20.16 GB (+16.0 KV) | 36.16 GB (+32.0 KV) | 68.16 GB (+64.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 3.59 GB | 5.16 GB (+1.0 KV) | 6.16 GB (+2.0 KV) | 8.16 GB (+4.0 KV) | 12.16 GB (+8.0 KV) | 20.16 GB (+16.0 KV) | 36.16 GB (+32.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 3.59 GB | 4.71 GB (+0.55 KV) | 5.26 GB (+1.1 KV) | 6.36 GB (+2.2 KV) | 8.56 GB (+4.4 KV) | 12.96 GB (+8.8 KV) | 21.76 GB (+17.6 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 3.59 GB | 4.66 GB (+0.5 KV) | 5.16 GB (+1.0 KV) | 6.16 GB (+2.0 KV) | 8.16 GB (+4.0 KV) | 12.16 GB (+8.0 KV) | 20.16 GB (+16.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 3.59 GB | 4.46 GB (+0.3 KV) | 4.76 GB (+0.6 KV) | 5.36 GB (+1.2 KV) | 6.56 GB (+2.4 KV) | 8.96 GB (+4.8 KV) | 13.76 GB (+9.6 KV) |
| Q2_K (2.63 bpw) | FP32 | 2.42 GB | 4.99 GB (+2.0 KV) | 6.99 GB (+4.0 KV) | 10.99 GB (+8.0 KV) | 18.99 GB (+16.0 KV) | 34.99 GB (+32.0 KV) | 66.99 GB (+64.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 2.42 GB | 3.99 GB (+1.0 KV) | 4.99 GB (+2.0 KV) | 6.99 GB (+4.0 KV) | 10.99 GB (+8.0 KV) | 18.99 GB (+16.0 KV) | 34.99 GB (+32.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 2.42 GB | 3.54 GB (+0.55 KV) | 4.09 GB (+1.1 KV) | 5.19 GB (+2.2 KV) | 7.39 GB (+4.4 KV) | 11.79 GB (+8.8 KV) | 20.59 GB (+17.6 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 2.42 GB | 3.49 GB (+0.5 KV) | 3.99 GB (+1.0 KV) | 4.99 GB (+2.0 KV) | 6.99 GB (+4.0 KV) | 10.99 GB (+8.0 KV) | 18.99 GB (+16.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 2.42 GB | 3.29 GB (+0.3 KV) | 3.59 GB (+0.6 KV) | 4.19 GB (+1.2 KV) | 5.39 GB (+2.4 KV) | 7.79 GB (+4.8 KV) | 12.59 GB (+9.6 KV) |

Total VRAM = Model Weights + KV Cache + 0.57 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
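That formula can be sketched as a small estimator. A minimal sketch, assuming the KV cache scales linearly with context length from the per-8K figures in the table (the function name and constants are illustrative, not from any library):

```python
# Rough total-VRAM estimate: weights + KV cache + fixed overhead.
# Constants are taken from the table above; KV cache grows linearly with context.
OVERHEAD_GB = 0.57
KV_GB_PER_8K = {"FP32": 2.0, "FP16": 1.0, "Q8_0": 0.55, "FP8": 0.5, "Q4_0": 0.3}

def vram_gb(weights_gb: float, context_tokens: int, cache_format: str = "FP16") -> float:
    """Estimate total VRAM in GB for a given weight size, context, and cache format."""
    kv = KV_GB_PER_8K[cache_format] * context_tokens / 8192
    return round(weights_gb + kv + OVERHEAD_GB, 2)

print(vram_gb(4.27, 32768, "Q8_0"))  # Q4_K_M at 32K with a Q8_0 cache -> 7.04 GB
print(vram_gb(14.7, 8192, "FP16"))   # FP16 weights at 8K -> 16.27 GB
```

Both results reproduce the corresponding table cells, modulo the stated ±5% engine-dependent variation.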
