DeepSeek-OCR-2

Architecture: Mixture of Experts
Total Parameters: 3.0B
Active Parameters: 0.5B

Model Specifications

Layers: 12
Hidden Dimension: 1,280
Attention Heads: 10
KV Heads: 10
Max Context: 8K tokens
Vocabulary Size: 129,278

VRAM Requirements

Estimated VRAM usage at the full 8K context for every combination of weight quantization and KV cache format. Each total includes a base overhead of 0.53 GB (CUDA context + activations).

| Quantization | Bits per Weight | Cache Format | Model Weights | Total at 8K Context | KV Cache |
|--------------|-----------------|--------------|---------------|---------------------|----------|
| FP16         | 16.0 bpw        | FP32         | 6.3 GB        | 7.77 GB             | 0.94 GB  |
| FP16         | 16.0 bpw        | FP16         | 6.3 GB        | 7.3 GB              | 0.47 GB  |
| FP16         | 16.0 bpw        | Q8_0         | 6.3 GB        | 7.09 GB             | 0.26 GB  |
| FP16         | 16.0 bpw        | FP8 (Exp)    | 6.3 GB        | 7.06 GB             | 0.23 GB  |
| FP16         | 16.0 bpw        | Q4_0 (Exp)   | 6.3 GB        | 6.97 GB             | 0.14 GB  |
| Q8_0         | 8.0 bpw         | FP32         | 3.15 GB       | 4.62 GB             | 0.94 GB  |
| Q8_0         | 8.0 bpw         | FP16         | 3.15 GB       | 4.15 GB             | 0.47 GB  |
| Q8_0         | 8.0 bpw         | Q8_0         | 3.15 GB       | 3.94 GB             | 0.26 GB  |
| Q8_0         | 8.0 bpw         | FP8 (Exp)    | 3.15 GB       | 3.91 GB             | 0.23 GB  |
| Q8_0         | 8.0 bpw         | Q4_0 (Exp)   | 3.15 GB       | 3.82 GB             | 0.14 GB  |
| Q4_K_M       | 4.65 bpw        | FP32         | 1.83 GB       | 3.3 GB              | 0.94 GB  |
| Q4_K_M       | 4.65 bpw        | FP16         | 1.83 GB       | 2.83 GB             | 0.47 GB  |
| Q4_K_M       | 4.65 bpw        | Q8_0         | 1.83 GB       | 2.62 GB             | 0.26 GB  |
| Q4_K_M       | 4.65 bpw        | FP8 (Exp)    | 1.83 GB       | 2.6 GB              | 0.23 GB  |
| Q4_K_M       | 4.65 bpw        | Q4_0 (Exp)   | 1.83 GB       | 2.5 GB              | 0.14 GB  |
| Q4_K_S       | 4.58 bpw        | FP32         | 1.8 GB        | 3.27 GB             | 0.94 GB  |
| Q4_K_S       | 4.58 bpw        | FP16         | 1.8 GB        | 2.8 GB              | 0.47 GB  |
| Q4_K_S       | 4.58 bpw        | Q8_0         | 1.8 GB        | 2.59 GB             | 0.26 GB  |
| Q4_K_S       | 4.58 bpw        | FP8 (Exp)    | 1.8 GB        | 2.57 GB             | 0.23 GB  |
| Q4_K_S       | 4.58 bpw        | Q4_0 (Exp)   | 1.8 GB        | 2.47 GB             | 0.14 GB  |
| Q3_K_M       | 3.91 bpw        | FP32         | 1.54 GB       | 3.01 GB             | 0.94 GB  |
| Q3_K_M       | 3.91 bpw        | FP16         | 1.54 GB       | 2.54 GB             | 0.47 GB  |
| Q3_K_M       | 3.91 bpw        | Q8_0         | 1.54 GB       | 2.33 GB             | 0.26 GB  |
| Q3_K_M       | 3.91 bpw        | FP8 (Exp)    | 1.54 GB       | 2.3 GB              | 0.23 GB  |
| Q3_K_M       | 3.91 bpw        | Q4_0 (Exp)   | 1.54 GB       | 2.21 GB             | 0.14 GB  |
| Q2_K         | 2.63 bpw        | FP32         | 1.04 GB       | 2.5 GB              | 0.94 GB  |
| Q2_K         | 2.63 bpw        | FP16         | 1.04 GB       | 2.03 GB             | 0.47 GB  |
| Q2_K         | 2.63 bpw        | Q8_0         | 1.04 GB       | 1.82 GB             | 0.26 GB  |
| Q2_K         | 2.63 bpw        | FP8 (Exp)    | 1.04 GB       | 1.8 GB              | 0.23 GB  |
| Q2_K         | 2.63 bpw        | Q4_0 (Exp)   | 1.04 GB       | 1.71 GB             | 0.14 GB  |
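
The KV cache figures for the FP16 and FP32 cache formats can be roughly reproduced from the Model Specifications. The following is a minimal sketch, not the page's own calculation. It assumes a standard attention cache (one key tensor and one value tensor per layer), a head dimension equal to hidden dimension divided by attention heads, and that "8K" means 8,192 tokens. The function name `kv_cache_gib` is hypothetical.

```python
def kv_cache_gib(layers, kv_heads, head_dim, context_tokens, bytes_per_element):
    """Estimate KV cache size in GiB for a standard attention cache."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    elements = 2 * layers * kv_heads * head_dim * context_tokens
    return elements * bytes_per_element / 2**30


# Values taken from the Model Specifications section.
LAYERS = 12
HIDDEN_DIM = 1280
ATTENTION_HEADS = 10
KV_HEADS = 10

# Assumption: "8K" context means 8,192 tokens.
CONTEXT_TOKENS = 8192

# Assumption: head dimension = hidden dimension / attention heads.
HEAD_DIM = HIDDEN_DIM // ATTENTION_HEADS

fp16 = kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, 2)
fp32 = kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, 4)

print(f"FP16 cache: {fp16:.2f} GiB")
print(f"FP32 cache: {fp32:.2f} GiB")
```

Under these assumptions the estimate comes to about 0.47 for FP16 and 0.94 for FP32, in line with the table. That agreement suggests, though the page does not state it, that the KV figures are in binary gigabytes. The quantized cache formats (Q8_0, FP8, Q4_0) carry format-specific overhead, so a plain bytes-per-element estimate may not match their rows exactly.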

Total VRAM = Model Weights + KV Cache + 0.53 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
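
As an illustration of the formula above, here is a short sketch that adds up the three terms and checks the result against a GPU's memory. The helper names `total_vram_gb` and `fits` are hypothetical, as are the 4 GB and 8 GB GPU sizes. Inflating the estimate by the full 5% is a conservative reading of the stated variance, not a rule given on the page.

```python
BASE_OVERHEAD_GB = 0.53  # from the page: CUDA context + activations


def total_vram_gb(weights_gb, kv_cache_gb, overhead_gb=BASE_OVERHEAD_GB):
    """Total VRAM = Model Weights + KV Cache + base overhead."""
    return weights_gb + kv_cache_gb + overhead_gb


def fits(gpu_vram_gb, weights_gb, kv_cache_gb, margin=0.05):
    """Return True if the estimate, inflated by the margin, fits on the GPU."""
    # Assumption: treat the stated +/-5% variance as a worst-case increase.
    worst_case = total_vram_gb(weights_gb, kv_cache_gb) * (1 + margin)
    return worst_case <= gpu_vram_gb


# Q4_K_M weights (1.83 GB) with an FP16 cache (0.47 GB) from the table.
q4km_fp16 = total_vram_gb(1.83, 0.47)
print(f"Q4_K_M + FP16 cache: {q4km_fp16:.2f} GB")

# Hypothetical GPUs: a 4 GB card and an 8 GB card.
print(fits(4, 1.83, 0.47))  # Q4_K_M weights, FP16 cache
print(fits(8, 6.3, 0.94))   # FP16 weights, FP32 cache
```

With the table's values, the Q4_K_M and FP16 cache combination totals 2.83 GB and fits a 4 GB card with room to spare. FP16 weights with an FP32 cache total 7.77 GB, which exceeds 8 GB once the 5% margin is applied.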
