
DeepSeek-V3.1-671B

Attention: MLA (Multi-head Latent Attention)

Total Parameters: 671.0B

Active Parameters: 37.0B

Model Specifications

Layers 61
Hidden Dimension 7,168
Attention Heads 128
Max Context 163K tokens
Vocabulary Size 129,280
KV LoRA Rank 512
RoPE Dimension 64
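
With MLA, each layer caches one compressed latent vector (the KV LoRA rank) plus one RoPE key per token, rather than full per-head keys and values. The sketch below estimates the KV cache footprint from the specifications above. The bytes-per-element values and the use of GiB (2^30 bytes) are assumptions: they were chosen because they reproduce the KV figures in the VRAM table, they are not stated by the source, and real inference engines may lay out the cache differently.

```python
# Specification values taken from the list above.
LAYERS = 61
KV_LORA_RANK = 512
ROPE_DIM = 64

# ASSUMPTION: bytes per cached element for each cache format.
# These are inferred from the table's KV figures, not documented by the source.
BYTES_PER_ELEMENT = {
    "FP32": 4.0,
    "FP16": 2.0,
    "Q8_0": 1.1,
    "FP8": 1.0,
    "Q4_0": 0.6,
}

GIB = 2**30  # ASSUMPTION: the table's KV figures are in units of 2^30 bytes


def kv_cache_size(tokens: int, cache_format: str) -> float:
    """Estimate the MLA KV cache size for a given context length."""
    # One latent vector plus one RoPE key per token, per layer.
    elements_per_token = LAYERS * (KV_LORA_RANK + ROPE_DIM)
    return elements_per_token * BYTES_PER_ELEMENT[cache_format] * tokens / GIB


# Print the estimate for an 8K context (8 * 1024 tokens) in every format.
for fmt in BYTES_PER_ELEMENT:
    print(f"{fmt}: {kv_cache_size(8 * 1024, fmt):.2f}")
```

Under these assumptions the "163K" column corresponds to 160 × 1024 = 163,840 tokens.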

VRAM Requirements

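Estimated VRAM usage for every combination of weight quantization and KV cache format. Each context column shows total VRAM, with the KV cache share in parentheses. Base overhead: 1.5 GB (CUDA context + activations).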

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 163K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 (16.0 bpw) | FP32 | 1409.1 GB | 1411.67 GB (+1.07 KV) | 1412.74 GB (+2.14 KV) | 1414.89 GB (+4.29 KV) | 1419.18 GB (+8.58 KV) | 1427.76 GB (+17.16 KV) | 1432.05 GB (+21.45 KV) |
| FP16 (16.0 bpw) | FP16 | 1409.1 GB | 1411.14 GB (+0.54 KV) | 1411.67 GB (+1.07 KV) | 1412.74 GB (+2.14 KV) | 1414.89 GB (+4.29 KV) | 1419.18 GB (+8.58 KV) | 1421.32 GB (+10.72 KV) |
| FP16 (16.0 bpw) | Q8_0 | 1409.1 GB | 1410.89 GB (+0.29 KV) | 1411.19 GB (+0.59 KV) | 1411.78 GB (+1.18 KV) | 1412.96 GB (+2.36 KV) | 1415.32 GB (+4.72 KV) | 1416.5 GB (+5.9 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 1409.1 GB | 1410.87 GB (+0.27 KV) | 1411.14 GB (+0.54 KV) | 1411.67 GB (+1.07 KV) | 1412.74 GB (+2.14 KV) | 1414.89 GB (+4.29 KV) | 1415.96 GB (+5.36 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 1409.1 GB | 1410.76 GB (+0.16 KV) | 1410.92 GB (+0.32 KV) | 1411.24 GB (+0.64 KV) | 1411.89 GB (+1.29 KV) | 1413.17 GB (+2.57 KV) | 1413.82 GB (+3.22 KV) |
| Q8_0 (8.0 bpw) | FP32 | 704.55 GB | 707.12 GB (+1.07 KV) | 708.19 GB (+2.14 KV) | 710.34 GB (+4.29 KV) | 714.63 GB (+8.58 KV) | 723.21 GB (+17.16 KV) | 727.5 GB (+21.45 KV) |
| Q8_0 (8.0 bpw) | FP16 | 704.55 GB | 706.59 GB (+0.54 KV) | 707.12 GB (+1.07 KV) | 708.19 GB (+2.14 KV) | 710.34 GB (+4.29 KV) | 714.63 GB (+8.58 KV) | 716.77 GB (+10.72 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 704.55 GB | 706.34 GB (+0.29 KV) | 706.64 GB (+0.59 KV) | 707.23 GB (+1.18 KV) | 708.41 GB (+2.36 KV) | 710.77 GB (+4.72 KV) | 711.95 GB (+5.9 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 704.55 GB | 706.32 GB (+0.27 KV) | 706.59 GB (+0.54 KV) | 707.12 GB (+1.07 KV) | 708.19 GB (+2.14 KV) | 710.34 GB (+4.29 KV) | 711.41 GB (+5.36 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 704.55 GB | 706.21 GB (+0.16 KV) | 706.37 GB (+0.32 KV) | 706.69 GB (+0.64 KV) | 707.34 GB (+1.29 KV) | 708.62 GB (+2.57 KV) | 709.27 GB (+3.22 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 409.52 GB | 412.09 GB (+1.07 KV) | 413.16 GB (+2.14 KV) | 415.31 GB (+4.29 KV) | 419.6 GB (+8.58 KV) | 428.18 GB (+17.16 KV) | 432.47 GB (+21.45 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 409.52 GB | 411.56 GB (+0.54 KV) | 412.09 GB (+1.07 KV) | 413.16 GB (+2.14 KV) | 415.31 GB (+4.29 KV) | 419.6 GB (+8.58 KV) | 421.74 GB (+10.72 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 409.52 GB | 411.31 GB (+0.29 KV) | 411.61 GB (+0.59 KV) | 412.2 GB (+1.18 KV) | 413.38 GB (+2.36 KV) | 415.74 GB (+4.72 KV) | 416.92 GB (+5.9 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 409.52 GB | 411.29 GB (+0.27 KV) | 411.56 GB (+0.54 KV) | 412.09 GB (+1.07 KV) | 413.16 GB (+2.14 KV) | 415.31 GB (+4.29 KV) | 416.38 GB (+5.36 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 409.52 GB | 411.18 GB (+0.16 KV) | 411.34 GB (+0.32 KV) | 411.66 GB (+0.64 KV) | 412.31 GB (+1.29 KV) | 413.59 GB (+2.57 KV) | 414.24 GB (+3.22 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 403.35 GB | 405.93 GB (+1.07 KV) | 407.0 GB (+2.14 KV) | 409.14 GB (+4.29 KV) | 413.43 GB (+8.58 KV) | 422.01 GB (+17.16 KV) | 426.3 GB (+21.45 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 403.35 GB | 405.39 GB (+0.54 KV) | 405.93 GB (+1.07 KV) | 407.0 GB (+2.14 KV) | 409.14 GB (+4.29 KV) | 413.43 GB (+8.58 KV) | 415.58 GB (+10.72 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 403.35 GB | 405.15 GB (+0.29 KV) | 405.44 GB (+0.59 KV) | 406.03 GB (+1.18 KV) | 407.21 GB (+2.36 KV) | 409.57 GB (+4.72 KV) | 410.75 GB (+5.9 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 403.35 GB | 405.12 GB (+0.27 KV) | 405.39 GB (+0.54 KV) | 405.93 GB (+1.07 KV) | 407.0 GB (+2.14 KV) | 409.14 GB (+4.29 KV) | 410.22 GB (+5.36 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 403.35 GB | 405.02 GB (+0.16 KV) | 405.18 GB (+0.32 KV) | 405.5 GB (+0.64 KV) | 406.14 GB (+1.29 KV) | 407.43 GB (+2.57 KV) | 408.07 GB (+3.22 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 344.35 GB | 346.92 GB (+1.07 KV) | 347.99 GB (+2.14 KV) | 350.14 GB (+4.29 KV) | 354.43 GB (+8.58 KV) | 363.01 GB (+17.16 KV) | 367.29 GB (+21.45 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 344.35 GB | 346.38 GB (+0.54 KV) | 346.92 GB (+1.07 KV) | 347.99 GB (+2.14 KV) | 350.14 GB (+4.29 KV) | 354.43 GB (+8.58 KV) | 356.57 GB (+10.72 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 344.35 GB | 346.14 GB (+0.29 KV) | 346.44 GB (+0.59 KV) | 347.03 GB (+1.18 KV) | 348.21 GB (+2.36 KV) | 350.57 GB (+4.72 KV) | 351.75 GB (+5.9 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 344.35 GB | 346.12 GB (+0.27 KV) | 346.38 GB (+0.54 KV) | 346.92 GB (+1.07 KV) | 347.99 GB (+2.14 KV) | 350.14 GB (+4.29 KV) | 351.21 GB (+5.36 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 344.35 GB | 346.01 GB (+0.16 KV) | 346.17 GB (+0.32 KV) | 346.49 GB (+0.64 KV) | 347.14 GB (+1.29 KV) | 348.42 GB (+2.57 KV) | 349.07 GB (+3.22 KV) |
| Q2_K (2.63 bpw) | FP32 | 231.62 GB | 234.19 GB (+1.07 KV) | 235.27 GB (+2.14 KV) | 237.41 GB (+4.29 KV) | 241.7 GB (+8.58 KV) | 250.28 GB (+17.16 KV) | 254.57 GB (+21.45 KV) |
| Q2_K (2.63 bpw) | FP16 | 231.62 GB | 233.66 GB (+0.54 KV) | 234.19 GB (+1.07 KV) | 235.27 GB (+2.14 KV) | 237.41 GB (+4.29 KV) | 241.7 GB (+8.58 KV) | 243.84 GB (+10.72 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 231.62 GB | 233.42 GB (+0.29 KV) | 233.71 GB (+0.59 KV) | 234.3 GB (+1.18 KV) | 235.48 GB (+2.36 KV) | 237.84 GB (+4.72 KV) | 239.02 GB (+5.9 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 231.62 GB | 233.39 GB (+0.27 KV) | 233.66 GB (+0.54 KV) | 234.19 GB (+1.07 KV) | 235.27 GB (+2.14 KV) | 237.41 GB (+4.29 KV) | 238.48 GB (+5.36 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 231.62 GB | 233.28 GB (+0.16 KV) | 233.44 GB (+0.32 KV) | 233.76 GB (+0.64 KV) | 234.41 GB (+1.29 KV) | 235.69 GB (+2.57 KV) | 236.34 GB (+3.22 KV) |
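
The Model Weights figures appear consistent with total parameters × bits per weight / 8, scaled up by about 5%. The following is a rough sketch under that assumption. The 1.05 allowance and the function name are inferred from the table values for illustration; the source does not document how the weights column is computed.

```python
TOTAL_PARAMS_BILLION = 671.0  # total parameter count from the model header


def estimate_weights_gb(
    bpw: float,
    params_billion: float = TOTAL_PARAMS_BILLION,
    allowance: float = 1.05,  # ASSUMPTION: inferred from the table, not documented
) -> float:
    """Estimate weight memory in GB from bits per weight (bpw)."""
    # billions of parameters * bits / 8 bits per byte gives gigabytes
    return params_billion * bpw / 8 * allowance


quantizations = [
    ("FP16", 16.0),
    ("Q8_0", 8.0),
    ("Q4_K_M", 4.65),
    ("Q4_K_S", 4.58),
    ("Q3_K_M", 3.91),
    ("Q2_K", 2.63),
]

for name, bpw in quantizations:
    print(f"{name}: {estimate_weights_gb(bpw):.2f} GB")
```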

Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
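
As a worked example, the formula above can be applied to one table row (Q4_K_M weights with an FP16 cache at 32K context) and compared against a hardware budget. This is a minimal sketch: the `fits` helper is hypothetical, and it assumes the model shards perfectly across GPUs with no per-device duplication, which real inference engines do not guarantee.

```python
OVERHEAD_GB = 1.5  # base overhead stated above
TOLERANCE = 0.05   # the stated variation of plus or minus 5%


def total_vram_gb(weights_gb: float, kv_gb: float) -> float:
    """Total VRAM = model weights + KV cache + base overhead."""
    return weights_gb + kv_gb + OVERHEAD_GB


def vram_range_gb(total_gb: float) -> tuple[float, float]:
    """Lower and upper bounds implied by the stated tolerance."""
    return total_gb * (1 - TOLERANCE), total_gb * (1 + TOLERANCE)


def fits(total_gb: float, gpu_count: int, vram_per_gpu_gb: float) -> bool:
    """Hypothetical check: compare the upper bound to the pooled VRAM.

    ASSUMPTION: memory is pooled evenly across GPUs (ideal sharding).
    """
    _, upper = vram_range_gb(total_gb)
    return upper <= gpu_count * vram_per_gpu_gb


# Q4_K_M weights (409.52 GB) with an FP16 cache at 32K context (+2.14 KV).
total = total_vram_gb(409.52, 2.14)
low, high = vram_range_gb(total)
print(f"Total: {total:.2f} GB (range {low:.2f} to {high:.2f} GB)")
print("Fits on 8 x 80 GB:", fits(total, gpu_count=8, vram_per_gpu_gb=80.0))
print("Fits on 4 x 80 GB:", fits(total, gpu_count=4, vram_per_gpu_gb=80.0))
```

Checking against the upper bound rather than the nominal total is a conservative design choice, since a configuration that only fits at the nominal figure leaves no margin for engine-specific buffers.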
