
GLM-4.7-Flash

Total Parameters: 30.0B (Multi-head Latent Attention)

Active Parameters: 3.0B

Model Specifications

Layers: 47
Hidden Dimension: 2,048
Attention Heads: 20
Max Context: 202K tokens
Vocabulary Size: 154,880
KV LoRA Rank: 512
RoPE Dimension: 64
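
Because the model uses MLA, each token caches one compressed KV latent (KV LoRA Rank, 512 elements) plus a decoupled RoPE key (RoPE Dimension, 64 elements) per layer, instead of full per-head keys and values. The sketch below reproduces the KV figures in the table from the specifications above; it is a minimal sketch, assuming the table reports GiB and the effective bytes-per-element values in `BYTES_PER_ELEM`, which are inferred from the table rather than taken from a published formula.

```python
# Minimal sketch: estimate the MLA KV-cache size from the specs above.
# Assumption: one (KV LoRA Rank + RoPE Dimension) vector is cached per
# token per layer, and the table's figures are GiB (2^30 bytes).

LAYERS = 47
KV_LORA_RANK = 512   # compressed KV latent per token per layer
ROPE_DIM = 64        # decoupled RoPE key per token per layer

BYTES_PER_ELEM = {   # effective bytes/element, inferred from the table
    "FP32": 4.0,
    "FP16": 2.0,
    "Q8_0": 9 / 8,
    "FP8": 1.0,
    "Q4_0": 9 / 16,
}

def kv_cache_gib(context_tokens: int, cache_format: str = "FP16") -> float:
    """KV-cache size in GiB for a given context length and cache format."""
    per_token = (KV_LORA_RANK + ROPE_DIM) * LAYERS * BYTES_PER_ELEM[cache_format]
    return context_tokens * per_token / 1024**3

print(f"{kv_cache_gib(8 * 1024):.2f}")    # ~0.41, the 8K/FP16 KV delta
print(f"{kv_cache_gib(128 * 1024):.2f}")  # ~6.61, the 131K/FP16 KV delta
```

This is why the KV columns below grow so slowly: even at the full 202K window, an FP16 cache adds only about 10 GB, a fraction of what caching full per-head keys and values would require.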

VRAM Requirements

Estimated VRAM usage for every combination of weight quantization and KV-cache format. All totals include a base overhead of 0.8 GB (CUDA context plus activations); cache formats marked (Exp) are experimental.

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 202K Context |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP16 (16.0 bpw) | FP32 | 63.0 GB | 64.63 GB (+0.83 KV) | 65.45 GB (+1.65 KV) | 67.1 GB (+3.3 KV) | 70.41 GB (+6.61 KV) | 77.02 GB (+13.22 KV) | 84.25 GB (+20.45 KV) |
| FP16 (16.0 bpw) | FP16 | 63.0 GB | 64.21 GB (+0.41 KV) | 64.63 GB (+0.83 KV) | 65.45 GB (+1.65 KV) | 67.1 GB (+3.3 KV) | 70.41 GB (+6.61 KV) | 74.02 GB (+10.22 KV) |
| FP16 (16.0 bpw) | Q8_0 | 63.0 GB | 64.03 GB (+0.23 KV) | 64.25 GB (+0.45 KV) | 64.71 GB (+0.91 KV) | 65.62 GB (+1.82 KV) | 67.44 GB (+3.64 KV) | 69.42 GB (+5.62 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 63.0 GB | 64.01 GB (+0.21 KV) | 64.21 GB (+0.41 KV) | 64.63 GB (+0.83 KV) | 65.45 GB (+1.65 KV) | 67.1 GB (+3.3 KV) | 68.91 GB (+5.11 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 63.0 GB | 63.92 GB (+0.12 KV) | 64.05 GB (+0.25 KV) | 64.3 GB (+0.5 KV) | 64.79 GB (+0.99 KV) | 65.78 GB (+1.98 KV) | 66.87 GB (+3.07 KV) |
| Q8_0 (8.0 bpw) | FP32 | 31.5 GB | 33.13 GB (+0.83 KV) | 33.95 GB (+1.65 KV) | 35.6 GB (+3.3 KV) | 38.91 GB (+6.61 KV) | 45.52 GB (+13.22 KV) | 52.75 GB (+20.45 KV) |
| Q8_0 (8.0 bpw) | FP16 | 31.5 GB | 32.71 GB (+0.41 KV) | 33.13 GB (+0.83 KV) | 33.95 GB (+1.65 KV) | 35.6 GB (+3.3 KV) | 38.91 GB (+6.61 KV) | 42.52 GB (+10.22 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 31.5 GB | 32.53 GB (+0.23 KV) | 32.75 GB (+0.45 KV) | 33.21 GB (+0.91 KV) | 34.12 GB (+1.82 KV) | 35.94 GB (+3.64 KV) | 37.92 GB (+5.62 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 31.5 GB | 32.51 GB (+0.21 KV) | 32.71 GB (+0.41 KV) | 33.13 GB (+0.83 KV) | 33.95 GB (+1.65 KV) | 35.6 GB (+3.3 KV) | 37.41 GB (+5.11 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 31.5 GB | 32.42 GB (+0.12 KV) | 32.55 GB (+0.25 KV) | 32.8 GB (+0.5 KV) | 33.29 GB (+0.99 KV) | 34.28 GB (+1.98 KV) | 35.37 GB (+3.07 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 18.31 GB | 19.94 GB (+0.83 KV) | 20.76 GB (+1.65 KV) | 22.41 GB (+3.3 KV) | 25.72 GB (+6.61 KV) | 32.33 GB (+13.22 KV) | 39.56 GB (+20.45 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 18.31 GB | 19.52 GB (+0.41 KV) | 19.94 GB (+0.83 KV) | 20.76 GB (+1.65 KV) | 22.41 GB (+3.3 KV) | 25.72 GB (+6.61 KV) | 29.33 GB (+10.22 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 18.31 GB | 19.34 GB (+0.23 KV) | 19.56 GB (+0.45 KV) | 20.02 GB (+0.91 KV) | 20.93 GB (+1.82 KV) | 22.74 GB (+3.64 KV) | 24.73 GB (+5.62 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 18.31 GB | 19.32 GB (+0.21 KV) | 19.52 GB (+0.41 KV) | 19.94 GB (+0.83 KV) | 20.76 GB (+1.65 KV) | 22.41 GB (+3.3 KV) | 24.22 GB (+5.11 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 18.31 GB | 19.23 GB (+0.12 KV) | 19.36 GB (+0.25 KV) | 19.61 GB (+0.5 KV) | 20.1 GB (+0.99 KV) | 21.09 GB (+1.98 KV) | 22.18 GB (+3.07 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 18.03 GB | 19.66 GB (+0.83 KV) | 20.49 GB (+1.65 KV) | 22.14 GB (+3.3 KV) | 25.44 GB (+6.61 KV) | 32.05 GB (+13.22 KV) | 39.28 GB (+20.45 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 18.03 GB | 19.25 GB (+0.41 KV) | 19.66 GB (+0.83 KV) | 20.49 GB (+1.65 KV) | 22.14 GB (+3.3 KV) | 25.44 GB (+6.61 KV) | 29.06 GB (+10.22 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 18.03 GB | 19.06 GB (+0.23 KV) | 19.29 GB (+0.45 KV) | 19.74 GB (+0.91 KV) | 20.65 GB (+1.82 KV) | 22.47 GB (+3.64 KV) | 24.46 GB (+5.62 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 18.03 GB | 19.04 GB (+0.21 KV) | 19.25 GB (+0.41 KV) | 19.66 GB (+0.83 KV) | 20.49 GB (+1.65 KV) | 22.14 GB (+3.3 KV) | 23.95 GB (+5.11 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 18.03 GB | 18.96 GB (+0.12 KV) | 19.08 GB (+0.25 KV) | 19.33 GB (+0.5 KV) | 19.83 GB (+0.99 KV) | 20.82 GB (+1.98 KV) | 21.9 GB (+3.07 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 15.4 GB | 17.02 GB (+0.83 KV) | 17.85 GB (+1.65 KV) | 19.5 GB (+3.3 KV) | 22.81 GB (+6.61 KV) | 29.41 GB (+13.22 KV) | 36.64 GB (+20.45 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 15.4 GB | 16.61 GB (+0.41 KV) | 17.02 GB (+0.83 KV) | 17.85 GB (+1.65 KV) | 19.5 GB (+3.3 KV) | 22.81 GB (+6.61 KV) | 26.42 GB (+10.22 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 15.4 GB | 16.42 GB (+0.23 KV) | 16.65 GB (+0.45 KV) | 17.1 GB (+0.91 KV) | 18.01 GB (+1.82 KV) | 19.83 GB (+3.64 KV) | 21.82 GB (+5.62 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 15.4 GB | 16.4 GB (+0.21 KV) | 16.61 GB (+0.41 KV) | 17.02 GB (+0.83 KV) | 17.85 GB (+1.65 KV) | 19.5 GB (+3.3 KV) | 21.31 GB (+5.11 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 15.4 GB | 16.32 GB (+0.12 KV) | 16.44 GB (+0.25 KV) | 16.69 GB (+0.5 KV) | 17.19 GB (+0.99 KV) | 18.18 GB (+1.98 KV) | 19.26 GB (+3.07 KV) |
| Q2_K (2.63 bpw) | FP32 | 10.36 GB | 11.98 GB (+0.83 KV) | 12.81 GB (+1.65 KV) | 14.46 GB (+3.3 KV) | 17.77 GB (+6.61 KV) | 24.37 GB (+13.22 KV) | 31.6 GB (+20.45 KV) |
| Q2_K (2.63 bpw) | FP16 | 10.36 GB | 11.57 GB (+0.41 KV) | 11.98 GB (+0.83 KV) | 12.81 GB (+1.65 KV) | 14.46 GB (+3.3 KV) | 17.77 GB (+6.61 KV) | 21.38 GB (+10.22 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 10.36 GB | 11.38 GB (+0.23 KV) | 11.61 GB (+0.45 KV) | 12.06 GB (+0.91 KV) | 12.97 GB (+1.82 KV) | 14.79 GB (+3.64 KV) | 16.78 GB (+5.62 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 10.36 GB | 11.36 GB (+0.21 KV) | 11.57 GB (+0.41 KV) | 11.98 GB (+0.83 KV) | 12.81 GB (+1.65 KV) | 14.46 GB (+3.3 KV) | 16.27 GB (+5.11 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 10.36 GB | 11.28 GB (+0.12 KV) | 11.4 GB (+0.25 KV) | 11.65 GB (+0.5 KV) | 12.15 GB (+0.99 KV) | 13.14 GB (+1.98 KV) | 14.22 GB (+3.07 KV) |

Total VRAM = Model Weights + KV Cache + 0.8 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
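
To sanity-check any cell, the formula above can be evaluated directly. The snippet below is a minimal sketch under the same GiB assumption as the KV estimate earlier; `total_vram_gb` and `WEIGHTS_GB` are illustrative names, and the weight sizes are copied from the table's Model Weights column rather than derived.

```python
# Minimal sketch of the formula above:
#   Total VRAM = model weights + KV cache + 0.8 GB overhead.

LAYERS, KV_LORA_RANK, ROPE_DIM = 47, 512, 64
OVERHEAD_GB = 0.8

WEIGHTS_GB = {  # "Model Weights" column from the table above
    "FP16": 63.0, "Q8_0": 31.5, "Q4_K_M": 18.31,
    "Q4_K_S": 18.03, "Q3_K_M": 15.4, "Q2_K": 10.36,
}

def total_vram_gb(quant: str, context_tokens: int,
                  kv_bytes_per_elem: float = 2.0) -> float:
    """Estimated total VRAM in GB; defaults to an FP16 KV cache."""
    kv_gb = ((KV_LORA_RANK + ROPE_DIM) * LAYERS * kv_bytes_per_elem
             * context_tokens / 1024**3)
    return WEIGHTS_GB[quant] + kv_gb + OVERHEAD_GB

print(f"{total_vram_gb('Q4_K_M', 32 * 1024):.2f} GB")  # ~20.76, matches the table
```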
