
OpenAI-GPT-OSS-20B

Mixture-of-Experts (MoE) Transformer 21B Parameters

Active Parameters: 3.6B

Model Specifications

Layers 24
Hidden Dimension 2,880
Attention Heads 64
KV Heads 8
Max Context 131K tokens
Vocabulary Size 201,088
Full-Attention Layers 12 of 24 (the remaining 12 layers use sliding-window attention)
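
The KV-cache figures in the VRAM table below follow from these specifications. Here is a minimal sketch with two hedged assumptions that are not derivable from the spec list alone: a per-head dimension of 64, and a cache that grows with context only in the 12 full-attention layers (the sliding-window layers keep a short fixed window).

```python
# Rough sketch of where the KV-cache numbers in the table below come from.
# HEAD_DIM = 64 is an assumption; the other constants come from the spec list.

KV_HEADS = 8              # from the spec list
FULL_ATTN_LAYERS = 12     # 12 of 24 layers grow with context
HEAD_DIM = 64             # assumed per-head dimension
BYTES_PER_ELEM = 2        # FP16 cache entries

def kv_cache_bytes(context_tokens: int) -> int:
    # K and V for every full-attention layer, for every cached token
    per_token = 2 * FULL_ATTN_LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return per_token * context_tokens

print(kv_cache_bytes(131_072) / 2**30)  # ~3.0 GiB, matching the FP16 cache column at 131K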

VRAM Requirements

VRAM usage for all combinations of weight quantization and KV-cache format. Base overhead: 0.7 GB (CUDA context + activations). The value in parentheses in each cell is the KV-cache contribution in GB.

| Quantization | Bits per Weight | KV Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP16 | 16.0 | FP32 | 42.0 GB | 43.08 GB (+0.38 KV) | 43.45 GB (+0.75 KV) | 44.2 GB (+1.5 KV) | 45.7 GB (+3.0 KV) | 48.7 GB (+6.0 KV) |
| FP16 | 16.0 | FP16 | 42.0 GB | 42.89 GB (+0.19 KV) | 43.08 GB (+0.38 KV) | 43.45 GB (+0.75 KV) | 44.2 GB (+1.5 KV) | 45.7 GB (+3.0 KV) |
| FP16 | 16.0 | Q8_0 | 42.0 GB | 42.8 GB (+0.1 KV) | 42.91 GB (+0.21 KV) | 43.11 GB (+0.41 KV) | 43.53 GB (+0.83 KV) | 44.35 GB (+1.65 KV) |
| FP16 | 16.0 | FP8 (Exp) | 42.0 GB | 42.79 GB (+0.09 KV) | 42.89 GB (+0.19 KV) | 43.08 GB (+0.38 KV) | 43.45 GB (+0.75 KV) | 44.2 GB (+1.5 KV) |
| FP16 | 16.0 | Q4_0 (Exp) | 42.0 GB | 42.76 GB (+0.06 KV) | 42.81 GB (+0.11 KV) | 42.93 GB (+0.22 KV) | 43.15 GB (+0.45 KV) | 43.6 GB (+0.9 KV) |
| Q8_0 | 8.0 | FP32 | 21.0 GB | 22.07 GB (+0.38 KV) | 22.45 GB (+0.75 KV) | 23.2 GB (+1.5 KV) | 24.7 GB (+3.0 KV) | 27.7 GB (+6.0 KV) |
| Q8_0 | 8.0 | FP16 | 21.0 GB | 21.89 GB (+0.19 KV) | 22.07 GB (+0.38 KV) | 22.45 GB (+0.75 KV) | 23.2 GB (+1.5 KV) | 24.7 GB (+3.0 KV) |
| Q8_0 | 8.0 | Q8_0 | 21.0 GB | 21.8 GB (+0.1 KV) | 21.91 GB (+0.21 KV) | 22.11 GB (+0.41 KV) | 22.52 GB (+0.83 KV) | 23.35 GB (+1.65 KV) |
| Q8_0 | 8.0 | FP8 (Exp) | 21.0 GB | 21.79 GB (+0.09 KV) | 21.89 GB (+0.19 KV) | 22.07 GB (+0.38 KV) | 22.45 GB (+0.75 KV) | 23.2 GB (+1.5 KV) |
| Q8_0 | 8.0 | Q4_0 (Exp) | 21.0 GB | 21.76 GB (+0.06 KV) | 21.81 GB (+0.11 KV) | 21.93 GB (+0.22 KV) | 22.15 GB (+0.45 KV) | 22.6 GB (+0.9 KV) |
| Q4_K_M | 4.65 | FP32 | 12.21 GB | 13.28 GB (+0.38 KV) | 13.66 GB (+0.75 KV) | 14.41 GB (+1.5 KV) | 15.91 GB (+3.0 KV) | 18.91 GB (+6.0 KV) |
| Q4_K_M | 4.65 | FP16 | 12.21 GB | 13.09 GB (+0.19 KV) | 13.28 GB (+0.38 KV) | 13.66 GB (+0.75 KV) | 14.41 GB (+1.5 KV) | 15.91 GB (+3.0 KV) |
| Q4_K_M | 4.65 | Q8_0 | 12.21 GB | 13.01 GB (+0.1 KV) | 13.11 GB (+0.21 KV) | 13.32 GB (+0.41 KV) | 13.73 GB (+0.83 KV) | 14.56 GB (+1.65 KV) |
| Q4_K_M | 4.65 | FP8 (Exp) | 12.21 GB | 13.0 GB (+0.09 KV) | 13.09 GB (+0.19 KV) | 13.28 GB (+0.38 KV) | 13.66 GB (+0.75 KV) | 14.41 GB (+1.5 KV) |
| Q4_K_M | 4.65 | Q4_0 (Exp) | 12.21 GB | 12.96 GB (+0.06 KV) | 13.02 GB (+0.11 KV) | 13.13 GB (+0.22 KV) | 13.36 GB (+0.45 KV) | 13.81 GB (+0.9 KV) |
| Q4_K_S | 4.58 | FP32 | 12.02 GB | 13.1 GB (+0.38 KV) | 13.47 GB (+0.75 KV) | 14.22 GB (+1.5 KV) | 15.72 GB (+3.0 KV) | 18.72 GB (+6.0 KV) |
| Q4_K_S | 4.58 | FP16 | 12.02 GB | 12.91 GB (+0.19 KV) | 13.1 GB (+0.38 KV) | 13.47 GB (+0.75 KV) | 14.22 GB (+1.5 KV) | 15.72 GB (+3.0 KV) |
| Q4_K_S | 4.58 | Q8_0 | 12.02 GB | 12.83 GB (+0.1 KV) | 12.93 GB (+0.21 KV) | 13.13 GB (+0.41 KV) | 13.55 GB (+0.83 KV) | 14.37 GB (+1.65 KV) |
| Q4_K_S | 4.58 | FP8 (Exp) | 12.02 GB | 12.82 GB (+0.09 KV) | 12.91 GB (+0.19 KV) | 13.1 GB (+0.38 KV) | 13.47 GB (+0.75 KV) | 14.22 GB (+1.5 KV) |
| Q4_K_S | 4.58 | Q4_0 (Exp) | 12.02 GB | 12.78 GB (+0.06 KV) | 12.83 GB (+0.11 KV) | 12.95 GB (+0.22 KV) | 13.17 GB (+0.45 KV) | 13.62 GB (+0.9 KV) |
| Q3_K_M | 3.91 | FP32 | 10.26 GB | 11.34 GB (+0.38 KV) | 11.71 GB (+0.75 KV) | 12.46 GB (+1.5 KV) | 13.96 GB (+3.0 KV) | 16.96 GB (+6.0 KV) |
| Q3_K_M | 3.91 | FP16 | 10.26 GB | 11.15 GB (+0.19 KV) | 11.34 GB (+0.38 KV) | 11.71 GB (+0.75 KV) | 12.46 GB (+1.5 KV) | 13.96 GB (+3.0 KV) |
| Q3_K_M | 3.91 | Q8_0 | 10.26 GB | 11.07 GB (+0.1 KV) | 11.17 GB (+0.21 KV) | 11.38 GB (+0.41 KV) | 11.79 GB (+0.83 KV) | 12.61 GB (+1.65 KV) |
| Q3_K_M | 3.91 | FP8 (Exp) | 10.26 GB | 11.06 GB (+0.09 KV) | 11.15 GB (+0.19 KV) | 11.34 GB (+0.38 KV) | 11.71 GB (+0.75 KV) | 12.46 GB (+1.5 KV) |
| Q3_K_M | 3.91 | Q4_0 (Exp) | 10.26 GB | 11.02 GB (+0.06 KV) | 11.08 GB (+0.11 KV) | 11.19 GB (+0.22 KV) | 11.41 GB (+0.45 KV) | 11.86 GB (+0.9 KV) |
| Q2_K | 2.63 | FP32 | 6.9 GB | 7.98 GB (+0.38 KV) | 8.35 GB (+0.75 KV) | 9.1 GB (+1.5 KV) | 10.6 GB (+3.0 KV) | 13.6 GB (+6.0 KV) |
| Q2_K | 2.63 | FP16 | 6.9 GB | 7.79 GB (+0.19 KV) | 7.98 GB (+0.38 KV) | 8.35 GB (+0.75 KV) | 9.1 GB (+1.5 KV) | 10.6 GB (+3.0 KV) |
| Q2_K | 2.63 | Q8_0 | 6.9 GB | 7.71 GB (+0.1 KV) | 7.81 GB (+0.21 KV) | 8.02 GB (+0.41 KV) | 8.43 GB (+0.83 KV) | 9.25 GB (+1.65 KV) |
| Q2_K | 2.63 | FP8 (Exp) | 6.9 GB | 7.7 GB (+0.09 KV) | 7.79 GB (+0.19 KV) | 7.98 GB (+0.38 KV) | 8.35 GB (+0.75 KV) | 9.1 GB (+1.5 KV) |
| Q2_K | 2.63 | Q4_0 (Exp) | 6.9 GB | 7.66 GB (+0.06 KV) | 7.72 GB (+0.11 KV) | 7.83 GB (+0.22 KV) | 8.05 GB (+0.45 KV) | 8.5 GB (+0.9 KV) |

Total VRAM = Model Weights + KV Cache + 0.7 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
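
A minimal sketch of this formula, under a few assumptions: roughly 21 billion total parameters (consistent with the weight sizes in the table), weight sizes reported in decimal GB, KV cache in binary GiB, the cache geometry from the earlier sketch, and per-element cache sizes for the quantized formats taken as effective values inferred from the table.

```python
# Sketch of: Total VRAM = weights + KV cache + 0.7 GB overhead.
# PARAMS, CACHE_BYTES for Q8_0/Q4_0, and the GB-vs-GiB split are assumptions
# inferred from the table, not values stated by the model vendor.

PARAMS = 21e9
OVERHEAD_GB = 0.7  # CUDA context + activations, per the note above

# Effective bytes per cached element; quantized formats include block overhead.
CACHE_BYTES = {"FP32": 4.0, "FP16": 2.0, "Q8_0": 1.1, "FP8": 1.0, "Q4_0": 0.6}

def weights_gb(bits_per_weight: float) -> float:
    return PARAMS * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int, cache_format: str) -> float:
    # K and V per token across the 12 full-attention layers (8 KV heads, head_dim 64)
    per_token = 2 * 12 * 8 * 64 * CACHE_BYTES[cache_format]
    return per_token * context_tokens / 2**30

def total_vram_gb(bits_per_weight: float, cache_format: str, context_tokens: int) -> float:
    return weights_gb(bits_per_weight) + kv_cache_gb(context_tokens, cache_format) + OVERHEAD_GB

# Q4_K_M (4.65 bpw) with an FP16 cache at 32K context -> ~13.66 GB, matching the table row.
print(round(total_vram_gb(4.65, "FP16", 32_768), 2))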

Check if your GPU can run OpenAI-GPT-OSS-20B

Use our calculator to see if this model fits your specific hardware configuration.