
Kimi-Linear-48B-A3B-Base

Attention: MLA
Total Parameters: 48.0B
Active Parameters: 3.0B

Model Specifications

Layers: 27
Hidden Dimension: 2,304
Attention Heads: 32
Max Context: 1M tokens
Vocabulary Size: 163,840
KV LoRA Rank: 512
RoPE Dimension: 64
KV Sharing: 4x compression
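For intuition, the KV-cache figures in the table below can be roughly reproduced from these specifications. The sketch below is a back-of-envelope estimate, not the calculator's actual method: it assumes MLA-style caching stores `kv_lora_rank + rope_dim` values per token per attention layer, and models the "4x compression" from KV sharing as an effective layer count of `layers / 4`.

```python
# Back-of-envelope KV-cache estimate from the specs above. Assumptions
# (not stated on this page): each caching layer stores
# kv_lora_rank + rope_dim values per token, and "4x compression" is
# modeled as only a quarter of the layers holding a cache.

LAYERS = 27
KV_LORA_RANK = 512   # compressed KV latent per token
ROPE_DIM = 64        # decoupled RoPE key per token
KV_SHARING = 4       # "4x compression"

def kv_gib(context_tokens: int, bytes_per_value: int = 2) -> float:
    """Estimated KV cache in GiB (default FP16: 2 bytes per value)."""
    effective_layers = LAYERS / KV_SHARING                # 6.75
    bytes_per_token = effective_layers * (KV_LORA_RANK + ROPE_DIM) * bytes_per_value
    return bytes_per_token * context_tokens / 2**30

# 2**20 tokens at FP16 -> 7.59 GiB, matching the table's "+7.59 KV" at 1M;
# with 4 bytes per value (FP32) it gives 15.19 GiB, matching "+15.19 KV".
print(f"{kv_gib(2**20):.2f} GiB")
```

Under these assumptions the estimate lands exactly on the table's FP16 and FP32 figures (treating the table's "GB" as GiB and "1M" as 2^20 tokens); the block-quantized cache formats (Q8_0, Q4_0) carry extra per-block overhead and do not follow this simple formula.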

VRAM Requirements

VRAM usage for all quantization and cache format combinations. Base overhead: 0.98 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 1M Context |
|---|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 100.8 GB | 101.9 GB (+0.12 KV) | 102.02 GB (+0.24 KV) | 102.25 GB (+0.47 KV) | 102.73 GB (+0.95 KV) | 103.68 GB (+1.9 KV) | 116.97 GB (+15.19 KV) |
| FP16 (16.0 bpw) | FP16 | 100.8 GB | 101.84 GB (+0.06 KV) | 101.9 GB (+0.12 KV) | 102.02 GB (+0.24 KV) | 102.25 GB (+0.47 KV) | 102.73 GB (+0.95 KV) | 109.37 GB (+7.59 KV) |
| FP16 (16.0 bpw) | Q8_0 | 100.8 GB | 101.81 GB (+0.03 KV) | 101.85 GB (+0.07 KV) | 101.91 GB (+0.13 KV) | 102.04 GB (+0.26 KV) | 102.3 GB (+0.52 KV) | 105.96 GB (+4.18 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 100.8 GB | 101.81 GB (+0.03 KV) | 101.84 GB (+0.06 KV) | 101.9 GB (+0.12 KV) | 102.02 GB (+0.24 KV) | 102.25 GB (+0.47 KV) | 105.58 GB (+3.8 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 100.8 GB | 101.8 GB (+0.02 KV) | 101.82 GB (+0.04 KV) | 101.85 GB (+0.07 KV) | 101.92 GB (+0.14 KV) | 102.06 GB (+0.28 KV) | 104.06 GB (+2.28 KV) |
| Q8_0 (8.0 bpw) | FP32 | 50.4 GB | 51.5 GB (+0.12 KV) | 51.62 GB (+0.24 KV) | 51.85 GB (+0.47 KV) | 52.33 GB (+0.95 KV) | 53.28 GB (+1.9 KV) | 66.57 GB (+15.19 KV) |
| Q8_0 (8.0 bpw) | FP16 | 50.4 GB | 51.44 GB (+0.06 KV) | 51.5 GB (+0.12 KV) | 51.62 GB (+0.24 KV) | 51.85 GB (+0.47 KV) | 52.33 GB (+0.95 KV) | 58.97 GB (+7.59 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 50.4 GB | 51.41 GB (+0.03 KV) | 51.45 GB (+0.07 KV) | 51.51 GB (+0.13 KV) | 51.64 GB (+0.26 KV) | 51.9 GB (+0.52 KV) | 55.56 GB (+4.18 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 50.4 GB | 51.41 GB (+0.03 KV) | 51.44 GB (+0.06 KV) | 51.5 GB (+0.12 KV) | 51.62 GB (+0.24 KV) | 51.85 GB (+0.47 KV) | 55.18 GB (+3.8 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 50.4 GB | 51.4 GB (+0.02 KV) | 51.42 GB (+0.04 KV) | 51.45 GB (+0.07 KV) | 51.52 GB (+0.14 KV) | 51.66 GB (+0.28 KV) | 53.66 GB (+2.28 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 29.3 GB | 30.39 GB (+0.12 KV) | 30.51 GB (+0.24 KV) | 30.75 GB (+0.47 KV) | 31.22 GB (+0.95 KV) | 32.17 GB (+1.9 KV) | 45.46 GB (+15.19 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 29.3 GB | 30.33 GB (+0.06 KV) | 30.39 GB (+0.12 KV) | 30.51 GB (+0.24 KV) | 30.75 GB (+0.47 KV) | 31.22 GB (+0.95 KV) | 37.87 GB (+7.59 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 29.3 GB | 30.31 GB (+0.03 KV) | 30.34 GB (+0.07 KV) | 30.41 GB (+0.13 KV) | 30.54 GB (+0.26 KV) | 30.8 GB (+0.52 KV) | 34.45 GB (+4.18 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 29.3 GB | 30.3 GB (+0.03 KV) | 30.33 GB (+0.06 KV) | 30.39 GB (+0.12 KV) | 30.51 GB (+0.24 KV) | 30.75 GB (+0.47 KV) | 34.07 GB (+3.8 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 29.3 GB | 30.29 GB (+0.02 KV) | 30.31 GB (+0.04 KV) | 30.35 GB (+0.07 KV) | 30.42 GB (+0.14 KV) | 30.56 GB (+0.28 KV) | 32.55 GB (+2.28 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 28.85 GB | 29.95 GB (+0.12 KV) | 30.07 GB (+0.24 KV) | 30.31 GB (+0.47 KV) | 30.78 GB (+0.95 KV) | 31.73 GB (+1.9 KV) | 45.02 GB (+15.19 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 28.85 GB | 29.89 GB (+0.06 KV) | 29.95 GB (+0.12 KV) | 30.07 GB (+0.24 KV) | 30.31 GB (+0.47 KV) | 30.78 GB (+0.95 KV) | 37.43 GB (+7.59 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 28.85 GB | 29.87 GB (+0.03 KV) | 29.9 GB (+0.07 KV) | 29.96 GB (+0.13 KV) | 30.1 GB (+0.26 KV) | 30.36 GB (+0.52 KV) | 34.01 GB (+4.18 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 28.85 GB | 29.86 GB (+0.03 KV) | 29.89 GB (+0.06 KV) | 29.95 GB (+0.12 KV) | 30.07 GB (+0.24 KV) | 30.31 GB (+0.47 KV) | 33.63 GB (+3.8 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 28.85 GB | 29.85 GB (+0.02 KV) | 29.87 GB (+0.04 KV) | 29.91 GB (+0.07 KV) | 29.98 GB (+0.14 KV) | 30.12 GB (+0.28 KV) | 32.11 GB (+2.28 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 24.63 GB | 25.73 GB (+0.12 KV) | 25.85 GB (+0.24 KV) | 26.09 GB (+0.47 KV) | 26.56 GB (+0.95 KV) | 27.51 GB (+1.9 KV) | 40.8 GB (+15.19 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 24.63 GB | 25.67 GB (+0.06 KV) | 25.73 GB (+0.12 KV) | 25.85 GB (+0.24 KV) | 26.09 GB (+0.47 KV) | 26.56 GB (+0.95 KV) | 33.21 GB (+7.59 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 24.63 GB | 25.65 GB (+0.03 KV) | 25.68 GB (+0.07 KV) | 25.74 GB (+0.13 KV) | 25.87 GB (+0.26 KV) | 26.14 GB (+0.52 KV) | 29.79 GB (+4.18 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 24.63 GB | 25.64 GB (+0.03 KV) | 25.67 GB (+0.06 KV) | 25.73 GB (+0.12 KV) | 25.85 GB (+0.24 KV) | 26.09 GB (+0.47 KV) | 29.41 GB (+3.8 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 24.63 GB | 25.63 GB (+0.02 KV) | 25.65 GB (+0.04 KV) | 25.68 GB (+0.07 KV) | 25.76 GB (+0.14 KV) | 25.9 GB (+0.28 KV) | 27.89 GB (+2.28 KV) |
| Q2_K (2.63 bpw) | FP32 | 16.57 GB | 17.67 GB (+0.12 KV) | 17.79 GB (+0.24 KV) | 18.02 GB (+0.47 KV) | 18.5 GB (+0.95 KV) | 19.45 GB (+1.9 KV) | 32.74 GB (+15.19 KV) |
| Q2_K (2.63 bpw) | FP16 | 16.57 GB | 17.61 GB (+0.06 KV) | 17.67 GB (+0.12 KV) | 17.79 GB (+0.24 KV) | 18.02 GB (+0.47 KV) | 18.5 GB (+0.95 KV) | 25.14 GB (+7.59 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 16.57 GB | 17.58 GB (+0.03 KV) | 17.61 GB (+0.07 KV) | 17.68 GB (+0.13 KV) | 17.81 GB (+0.26 KV) | 18.07 GB (+0.52 KV) | 21.73 GB (+4.18 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 16.57 GB | 17.58 GB (+0.03 KV) | 17.61 GB (+0.06 KV) | 17.67 GB (+0.12 KV) | 17.79 GB (+0.24 KV) | 18.02 GB (+0.47 KV) | 21.35 GB (+3.8 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 16.57 GB | 17.57 GB (+0.02 KV) | 17.58 GB (+0.04 KV) | 17.62 GB (+0.07 KV) | 17.69 GB (+0.14 KV) | 17.83 GB (+0.28 KV) | 19.83 GB (+2.28 KV) |

Total VRAM = Model Weights + KV Cache + 0.98 GB overhead. Actual usage may vary by about ±5% depending on the inference engine and its optimizations. A minimal estimator implementing this formula is sketched below.
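The sketch reads the per-format KV rates off the table's "1M Context" column and scales them linearly, which matches how the KV deltas halve exactly from 1M to 131K to 65K (so "1M" here behaves as 2^20 tokens). These rates are this page's figures for this model, not universal constants.

```python
# Minimal sketch of the footnote's formula:
#   Total VRAM = Model Weights + KV Cache + 0.98 GB overhead.
# KV rates below are read off the table's "1M Context" column (treating
# "1M" as 2**20 tokens); they apply to this model only.

OVERHEAD_GB = 0.98  # CUDA context + activations (from this page)

# KV cache in GB per 2**20 tokens, by cache format (from the table above).
KV_GB_PER_1M = {"FP32": 15.19, "FP16": 7.59, "Q8_0": 4.18, "FP8": 3.80, "Q4_0": 2.28}

def total_vram_gb(weights_gb: float, cache_format: str, context_tokens: int) -> float:
    """Estimate total VRAM (GB) for a given weight size, cache format, and context."""
    kv_gb = KV_GB_PER_1M[cache_format] * context_tokens / 2**20
    return weights_gb + kv_gb + OVERHEAD_GB

# Example: Q4_K_M weights (29.3 GB) with an FP16 cache at a 65,536-token
# context -> ~30.75 GB, matching the table row above within rounding.
print(f"{total_vram_gb(29.3, 'FP16', 65_536):.2f} GB")
```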

Check if your GPU can run Kimi-Linear-48B-A3B-Base

Use our calculator to see if this model fits your specific hardware configuration.