
DeepSeek-R1-Distill-Qwen-7B

Standard Transformer · 7.6B parameters

Model Specifications

| Specification | Value |
|---|---|
| Layers | 28 |
| Hidden Dimension | 3,584 |
| Attention Heads | 28 |
| KV Heads | 4 |
| Max Context | 131K tokens |
| Vocabulary Size | 152,064 |

VRAM Requirements

VRAM usage for all quantization and cache format combinations. Base overhead: 0.58 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 15.96 GB | 17.41 GB (+0.88 KV) | 18.29 GB (+1.75 KV) | 20.04 GB (+3.5 KV) | 23.54 GB (+7.0 KV) | 30.54 GB (+14.0 KV) |
| FP16 (16.0 bpw) | FP16 | 15.96 GB | 16.97 GB (+0.44 KV) | 17.41 GB (+0.88 KV) | 18.29 GB (+1.75 KV) | 20.04 GB (+3.5 KV) | 23.54 GB (+7.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 15.96 GB | 16.78 GB (+0.24 KV) | 17.02 GB (+0.48 KV) | 17.5 GB (+0.96 KV) | 18.46 GB (+1.93 KV) | 20.39 GB (+3.85 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 15.96 GB | 16.75 GB (+0.22 KV) | 16.97 GB (+0.44 KV) | 17.41 GB (+0.88 KV) | 18.29 GB (+1.75 KV) | 20.04 GB (+3.5 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 15.96 GB | 16.67 GB (+0.13 KV) | 16.8 GB (+0.26 KV) | 17.06 GB (+0.53 KV) | 17.59 GB (+1.05 KV) | 18.64 GB (+2.1 KV) |
| Q8_0 (8.0 bpw) | FP32 | 7.98 GB | 9.43 GB (+0.88 KV) | 10.31 GB (+1.75 KV) | 12.06 GB (+3.5 KV) | 15.56 GB (+7.0 KV) | 22.56 GB (+14.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 7.98 GB | 8.99 GB (+0.44 KV) | 9.43 GB (+0.88 KV) | 10.31 GB (+1.75 KV) | 12.06 GB (+3.5 KV) | 15.56 GB (+7.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 7.98 GB | 8.8 GB (+0.24 KV) | 9.04 GB (+0.48 KV) | 9.52 GB (+0.96 KV) | 10.48 GB (+1.93 KV) | 12.41 GB (+3.85 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 7.98 GB | 8.77 GB (+0.22 KV) | 8.99 GB (+0.44 KV) | 9.43 GB (+0.88 KV) | 10.31 GB (+1.75 KV) | 12.06 GB (+3.5 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 7.98 GB | 8.69 GB (+0.13 KV) | 8.82 GB (+0.26 KV) | 9.08 GB (+0.53 KV) | 9.61 GB (+1.05 KV) | 10.66 GB (+2.1 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 4.64 GB | 6.09 GB (+0.88 KV) | 6.96 GB (+1.75 KV) | 8.71 GB (+3.5 KV) | 12.21 GB (+7.0 KV) | 19.21 GB (+14.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 4.64 GB | 5.65 GB (+0.44 KV) | 6.09 GB (+0.88 KV) | 6.96 GB (+1.75 KV) | 8.71 GB (+3.5 KV) | 12.21 GB (+7.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 4.64 GB | 5.46 GB (+0.24 KV) | 5.7 GB (+0.48 KV) | 6.18 GB (+0.96 KV) | 7.14 GB (+1.93 KV) | 9.06 GB (+3.85 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 4.64 GB | 5.43 GB (+0.22 KV) | 5.65 GB (+0.44 KV) | 6.09 GB (+0.88 KV) | 6.96 GB (+1.75 KV) | 8.71 GB (+3.5 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 4.64 GB | 5.35 GB (+0.13 KV) | 5.48 GB (+0.26 KV) | 5.74 GB (+0.53 KV) | 6.26 GB (+1.05 KV) | 7.31 GB (+2.1 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 4.57 GB | 6.02 GB (+0.88 KV) | 6.89 GB (+1.75 KV) | 8.64 GB (+3.5 KV) | 12.14 GB (+7.0 KV) | 19.14 GB (+14.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 4.57 GB | 5.58 GB (+0.44 KV) | 6.02 GB (+0.88 KV) | 6.89 GB (+1.75 KV) | 8.64 GB (+3.5 KV) | 12.14 GB (+7.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 4.57 GB | 5.39 GB (+0.24 KV) | 5.63 GB (+0.48 KV) | 6.11 GB (+0.96 KV) | 7.07 GB (+1.93 KV) | 8.99 GB (+3.85 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 4.57 GB | 5.36 GB (+0.22 KV) | 5.58 GB (+0.44 KV) | 6.02 GB (+0.88 KV) | 6.89 GB (+1.75 KV) | 8.64 GB (+3.5 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 4.57 GB | 5.28 GB (+0.13 KV) | 5.41 GB (+0.26 KV) | 5.67 GB (+0.53 KV) | 6.19 GB (+1.05 KV) | 7.24 GB (+2.1 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 3.9 GB | 5.35 GB (+0.88 KV) | 6.23 GB (+1.75 KV) | 7.98 GB (+3.5 KV) | 11.48 GB (+7.0 KV) | 18.48 GB (+14.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 3.9 GB | 4.91 GB (+0.44 KV) | 5.35 GB (+0.88 KV) | 6.23 GB (+1.75 KV) | 7.98 GB (+3.5 KV) | 11.48 GB (+7.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 3.9 GB | 4.72 GB (+0.24 KV) | 4.96 GB (+0.48 KV) | 5.44 GB (+0.96 KV) | 6.4 GB (+1.93 KV) | 8.33 GB (+3.85 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 3.9 GB | 4.69 GB (+0.22 KV) | 4.91 GB (+0.44 KV) | 5.35 GB (+0.88 KV) | 6.23 GB (+1.75 KV) | 7.98 GB (+3.5 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 3.9 GB | 4.61 GB (+0.13 KV) | 4.74 GB (+0.26 KV) | 5.0 GB (+0.53 KV) | 5.53 GB (+1.05 KV) | 6.58 GB (+2.1 KV) |
| Q2_K (2.63 bpw) | FP32 | 2.62 GB | 4.07 GB (+0.88 KV) | 4.95 GB (+1.75 KV) | 6.7 GB (+3.5 KV) | 10.2 GB (+7.0 KV) | 17.2 GB (+14.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 2.62 GB | 3.64 GB (+0.44 KV) | 4.07 GB (+0.88 KV) | 4.95 GB (+1.75 KV) | 6.7 GB (+3.5 KV) | 10.2 GB (+7.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 2.62 GB | 3.44 GB (+0.24 KV) | 3.68 GB (+0.48 KV) | 4.16 GB (+0.96 KV) | 5.12 GB (+1.93 KV) | 7.05 GB (+3.85 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 2.62 GB | 3.42 GB (+0.22 KV) | 3.64 GB (+0.44 KV) | 4.07 GB (+0.88 KV) | 4.95 GB (+1.75 KV) | 6.7 GB (+3.5 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 2.62 GB | 3.33 GB (+0.13 KV) | 3.46 GB (+0.26 KV) | 3.72 GB (+0.53 KV) | 4.25 GB (+1.05 KV) | 5.3 GB (+2.1 KV) |

Total VRAM = Model Weights + KV Cache + 0.58 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
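The formula above can be sketched in Python. This is a minimal estimate, assuming the standard grouped-query-attention KV cache layout (2 tensors per layer, kv_heads × head_dim elements per token); the constants come from the specifications table on this page, and with a 2-byte (FP16) cache element the sketch reproduces the KV figures in the table.

```python
GIB = 1024 ** 3  # the table's "GB" figures match binary gigabytes (GiB)

# DeepSeek-R1-Distill-Qwen-7B specifications (from the table above)
LAYERS = 28
HIDDEN = 3584
N_HEADS = 28
KV_HEADS = 4
HEAD_DIM = HIDDEN // N_HEADS  # 128
OVERHEAD_GIB = 0.58           # CUDA context + activations (per this page)


def kv_cache_gib(context_len: int, bytes_per_elem: float) -> float:
    """KV cache size: K and V tensors (hence the factor of 2) for each
    layer, each storing KV_HEADS * HEAD_DIM elements per token."""
    elems = 2 * LAYERS * KV_HEADS * HEAD_DIM * context_len
    return elems * bytes_per_elem / GIB


def total_vram_gib(weights_gib: float, context_len: int,
                   bytes_per_elem: float) -> float:
    """Total VRAM = model weights + KV cache + base overhead."""
    return weights_gib + kv_cache_gib(context_len, bytes_per_elem) + OVERHEAD_GIB


# FP16 cache (2 bytes/element) at 8K context gives the table's +0.44 KV:
print(round(kv_cache_gib(8192, 2), 2))   # 0.44
# ...and at 131K context, exactly the table's +7.0 KV:
print(round(kv_cache_gib(131072, 2), 1))  # 7.0
# Q4_K_M weights (4.64 GB) with an FP16 cache at 16K context, as in the table:
print(round(total_vram_gib(4.64, 16384, 2), 1))
```

Quantized cache formats scale the same way: Q8_0 is roughly 1.1 bytes per element and Q4_0 roughly 0.6, which matches the table's +0.24 KV and +0.13 KV columns at 8K context.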
