
DeepSeek-R1-Distill-Qwen-14B

Standard Transformer · 14.7B Parameters

Model Specifications

Layers: 48
Hidden Dimension: 5,120
Attention Heads: 40
KV Heads: 8
Max Context: 131K tokens
Vocabulary Size: 152,064
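
The KV cache figures in the VRAM table below follow directly from these specifications: each token caches one key and one value vector per layer for the 8 KV heads, at a head dimension of 5,120 / 40 = 128. A minimal Python sketch of that arithmetic, assuming 2 bytes per element for an FP16 cache (the function is illustrative, not this site's calculator code):

```python
# Sketch: deriving the per-token KV cache size from the specs above.
LAYERS = 48
HIDDEN_DIM = 5120
ATTN_HEADS = 40
KV_HEADS = 8
HEAD_DIM = HIDDEN_DIM // ATTN_HEADS  # 5,120 / 40 = 128

def kv_cache_gib(context_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """GiB needed to cache keys and values for a given context length.

    bytes_per_elem is 2.0 for an FP16 cache and 4.0 for FP32; the quantized
    cache formats (Q8_0, FP8, Q4_0) use fewer bytes per element plus some
    block-scale overhead, which this sketch does not model exactly.
    """
    elems_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM  # 2 = one K and one V vector
    return context_tokens * elems_per_token * bytes_per_elem / 2**30

print(kv_cache_gib(8_192))         # 1.5  -> the "+1.5 KV" entry at 8K, FP16 cache
print(kv_cache_gib(8_192, 4.0))    # 3.0  -> the "+3.0 KV" entry at 8K, FP32 cache
print(kv_cache_gib(131_072))       # 24.0 -> the "+24.0 KV" entry at 131K, FP16 cache
```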

VRAM Requirements

The table below lists estimated VRAM usage for each combination of weight quantization and KV cache format. Context columns show the total (weights + KV cache + overhead), with the KV cache portion in parentheses. Base overhead: 0.65 GB (CUDA context + activation buffers).

| Quantization | bpw | KV Cache Format | Weights (GB) | 8K Context (GB) | 16K Context (GB) | 32K Context (GB) | 65K Context (GB) | 131K Context (GB) |
|---|---|---|---|---|---|---|---|---|
| FP16 | 16.0 | FP32 | 30.87 | 34.52 (+3.0 KV) | 37.52 (+6.0 KV) | 43.52 (+12.0 KV) | 55.52 (+24.0 KV) | 79.52 (+48.0 KV) |
| FP16 | 16.0 | FP16 | 30.87 | 33.02 (+1.5 KV) | 34.52 (+3.0 KV) | 37.52 (+6.0 KV) | 43.52 (+12.0 KV) | 55.52 (+24.0 KV) |
| FP16 | 16.0 | Q8_0 | 30.87 | 32.34 (+0.83 KV) | 33.17 (+1.65 KV) | 34.82 (+3.3 KV) | 38.12 (+6.6 KV) | 44.72 (+13.2 KV) |
| FP16 | 16.0 | FP8 (Exp) | 30.87 | 32.27 (+0.75 KV) | 33.02 (+1.5 KV) | 34.52 (+3.0 KV) | 37.52 (+6.0 KV) | 43.52 (+12.0 KV) |
| FP16 | 16.0 | Q4_0 (Exp) | 30.87 | 31.97 (+0.45 KV) | 32.42 (+0.9 KV) | 33.32 (+1.8 KV) | 35.12 (+3.6 KV) | 38.72 (+7.2 KV) |
| Q8_0 | 8.0 | FP32 | 15.44 | 19.08 (+3.0 KV) | 22.08 (+6.0 KV) | 28.08 (+12.0 KV) | 40.08 (+24.0 KV) | 64.08 (+48.0 KV) |
| Q8_0 | 8.0 | FP16 | 15.44 | 17.58 (+1.5 KV) | 19.08 (+3.0 KV) | 22.08 (+6.0 KV) | 28.08 (+12.0 KV) | 40.08 (+24.0 KV) |
| Q8_0 | 8.0 | Q8_0 | 15.44 | 16.91 (+0.83 KV) | 17.73 (+1.65 KV) | 19.38 (+3.3 KV) | 22.68 (+6.6 KV) | 29.28 (+13.2 KV) |
| Q8_0 | 8.0 | FP8 (Exp) | 15.44 | 16.83 (+0.75 KV) | 17.58 (+1.5 KV) | 19.08 (+3.0 KV) | 22.08 (+6.0 KV) | 28.08 (+12.0 KV) |
| Q8_0 | 8.0 | Q4_0 (Exp) | 15.44 | 16.53 (+0.45 KV) | 16.98 (+0.9 KV) | 17.88 (+1.8 KV) | 19.68 (+3.6 KV) | 23.28 (+7.2 KV) |
| Q4_K_M | 4.65 | FP32 | 8.97 | 12.62 (+3.0 KV) | 15.62 (+6.0 KV) | 21.62 (+12.0 KV) | 33.62 (+24.0 KV) | 57.62 (+48.0 KV) |
| Q4_K_M | 4.65 | FP16 | 8.97 | 11.12 (+1.5 KV) | 12.62 (+3.0 KV) | 15.62 (+6.0 KV) | 21.62 (+12.0 KV) | 33.62 (+24.0 KV) |
| Q4_K_M | 4.65 | Q8_0 | 8.97 | 10.44 (+0.83 KV) | 11.27 (+1.65 KV) | 12.92 (+3.3 KV) | 16.22 (+6.6 KV) | 22.82 (+13.2 KV) |
| Q4_K_M | 4.65 | FP8 (Exp) | 8.97 | 10.37 (+0.75 KV) | 11.12 (+1.5 KV) | 12.62 (+3.0 KV) | 15.62 (+6.0 KV) | 21.62 (+12.0 KV) |
| Q4_K_M | 4.65 | Q4_0 (Exp) | 8.97 | 10.07 (+0.45 KV) | 10.52 (+0.9 KV) | 11.42 (+1.8 KV) | 13.22 (+3.6 KV) | 16.82 (+7.2 KV) |
| Q4_K_S | 4.58 | FP32 | 8.84 | 12.48 (+3.0 KV) | 15.48 (+6.0 KV) | 21.48 (+12.0 KV) | 33.48 (+24.0 KV) | 57.48 (+48.0 KV) |
| Q4_K_S | 4.58 | FP16 | 8.84 | 10.98 (+1.5 KV) | 12.48 (+3.0 KV) | 15.48 (+6.0 KV) | 21.48 (+12.0 KV) | 33.48 (+24.0 KV) |
| Q4_K_S | 4.58 | Q8_0 | 8.84 | 10.31 (+0.83 KV) | 11.13 (+1.65 KV) | 12.78 (+3.3 KV) | 16.08 (+6.6 KV) | 22.68 (+13.2 KV) |
| Q4_K_S | 4.58 | FP8 (Exp) | 8.84 | 10.23 (+0.75 KV) | 10.98 (+1.5 KV) | 12.48 (+3.0 KV) | 15.48 (+6.0 KV) | 21.48 (+12.0 KV) |
| Q4_K_S | 4.58 | Q4_0 (Exp) | 8.84 | 9.93 (+0.45 KV) | 10.38 (+0.9 KV) | 11.28 (+1.8 KV) | 13.08 (+3.6 KV) | 16.68 (+7.2 KV) |
| Q3_K_M | 3.91 | FP32 | 7.54 | 11.19 (+3.0 KV) | 14.19 (+6.0 KV) | 20.19 (+12.0 KV) | 32.19 (+24.0 KV) | 56.19 (+48.0 KV) |
| Q3_K_M | 3.91 | FP16 | 7.54 | 9.69 (+1.5 KV) | 11.19 (+3.0 KV) | 14.19 (+6.0 KV) | 20.19 (+12.0 KV) | 32.19 (+24.0 KV) |
| Q3_K_M | 3.91 | Q8_0 | 7.54 | 9.02 (+0.83 KV) | 9.84 (+1.65 KV) | 11.49 (+3.3 KV) | 14.79 (+6.6 KV) | 21.39 (+13.2 KV) |
| Q3_K_M | 3.91 | FP8 (Exp) | 7.54 | 8.94 (+0.75 KV) | 9.69 (+1.5 KV) | 11.19 (+3.0 KV) | 14.19 (+6.0 KV) | 20.19 (+12.0 KV) |
| Q3_K_M | 3.91 | Q4_0 (Exp) | 7.54 | 8.64 (+0.45 KV) | 9.09 (+0.9 KV) | 9.99 (+1.8 KV) | 11.79 (+3.6 KV) | 15.39 (+7.2 KV) |
| Q2_K | 2.63 | FP32 | 5.07 | 8.72 (+3.0 KV) | 11.72 (+6.0 KV) | 17.72 (+12.0 KV) | 29.72 (+24.0 KV) | 53.72 (+48.0 KV) |
| Q2_K | 2.63 | FP16 | 5.07 | 7.22 (+1.5 KV) | 8.72 (+3.0 KV) | 11.72 (+6.0 KV) | 17.72 (+12.0 KV) | 29.72 (+24.0 KV) |
| Q2_K | 2.63 | Q8_0 | 5.07 | 6.55 (+0.83 KV) | 7.37 (+1.65 KV) | 9.02 (+3.3 KV) | 12.32 (+6.6 KV) | 18.92 (+13.2 KV) |
| Q2_K | 2.63 | FP8 (Exp) | 5.07 | 6.47 (+0.75 KV) | 7.22 (+1.5 KV) | 8.72 (+3.0 KV) | 11.72 (+6.0 KV) | 17.72 (+12.0 KV) |
| Q2_K | 2.63 | Q4_0 (Exp) | 5.07 | 6.17 (+0.45 KV) | 6.62 (+0.9 KV) | 7.52 (+1.8 KV) | 9.32 (+3.6 KV) | 12.92 (+7.2 KV) |

Total VRAM = model weights + KV cache + 0.65 GB overhead. Actual usage may vary by roughly ±5% depending on the inference engine and its optimizations.
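
As a worked example, the Q4_K_M row with an FP16 cache at 32K context is 8.97 + 6.0 + 0.65 = 15.62 GB. A small sketch of that sum (the helper name is illustrative, not the calculator's code):

```python
# Sketch of the total-VRAM formula stated above.
def total_vram_gb(weights_gb: float, kv_cache_gb: float, overhead_gb: float = 0.65) -> float:
    """Estimated VRAM = model weights + KV cache + fixed overhead, all in GB."""
    return weights_gb + kv_cache_gb + overhead_gb

# Q4_K_M weights with an FP16 cache at 32K context (values from the table above):
print(total_vram_gb(8.97, 6.0))  # 15.62
```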

Check if your GPU can run DeepSeek-R1-Distill-Qwen-14B

Use our calculator to see if this model fits your specific hardware configuration.