
OpenAI-GPT-OSS-120B

Sparse Mixture-of-Experts (MoE) Transformer, 120.0B Parameters

Active Parameters: 5.1B

Model Specifications

Layers: 36
Hidden Dimension: 2,880
Attention Heads: 64
KV Heads: 8
Max Context: 131K tokens
Vocabulary Size: 201,088
Full-Attention Layers: 18 of 36 (the remaining 18 use sliding-window attention)

VRAM Requirements

VRAM usage for all quantization and cache format combinations. Base overhead: 1.5 GB (CUDA context + activations).

| Quantization | bpw | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context |
|---|---|---|---|---|---|---|---|---|
| FP16 | 16.0 | FP32 | 252.0 GB | 254.06 GB (+0.56 KV) | 254.62 GB (+1.12 KV) | 255.75 GB (+2.25 KV) | 258.0 GB (+4.5 KV) | 262.5 GB (+9.0 KV) |
| FP16 | 16.0 | FP16 | 252.0 GB | 253.78 GB (+0.28 KV) | 254.06 GB (+0.56 KV) | 254.62 GB (+1.12 KV) | 255.75 GB (+2.25 KV) | 258.0 GB (+4.5 KV) |
| FP16 | 16.0 | Q8_0 | 252.0 GB | 253.65 GB (+0.15 KV) | 253.81 GB (+0.31 KV) | 254.12 GB (+0.62 KV) | 254.74 GB (+1.24 KV) | 255.97 GB (+2.47 KV) |
| FP16 | 16.0 | FP8 (Exp) | 252.0 GB | 253.64 GB (+0.14 KV) | 253.78 GB (+0.28 KV) | 254.06 GB (+0.56 KV) | 254.62 GB (+1.12 KV) | 255.75 GB (+2.25 KV) |
| FP16 | 16.0 | Q4_0 (Exp) | 252.0 GB | 253.58 GB (+0.08 KV) | 253.67 GB (+0.17 KV) | 253.84 GB (+0.34 KV) | 254.18 GB (+0.67 KV) | 254.85 GB (+1.35 KV) |
| Q8_0 | 8.0 | FP32 | 126.0 GB | 128.06 GB (+0.56 KV) | 128.62 GB (+1.12 KV) | 129.75 GB (+2.25 KV) | 132.0 GB (+4.5 KV) | 136.5 GB (+9.0 KV) |
| Q8_0 | 8.0 | FP16 | 126.0 GB | 127.78 GB (+0.28 KV) | 128.06 GB (+0.56 KV) | 128.62 GB (+1.12 KV) | 129.75 GB (+2.25 KV) | 132.0 GB (+4.5 KV) |
| Q8_0 | 8.0 | Q8_0 | 126.0 GB | 127.65 GB (+0.15 KV) | 127.81 GB (+0.31 KV) | 128.12 GB (+0.62 KV) | 128.74 GB (+1.24 KV) | 129.97 GB (+2.47 KV) |
| Q8_0 | 8.0 | FP8 (Exp) | 126.0 GB | 127.64 GB (+0.14 KV) | 127.78 GB (+0.28 KV) | 128.06 GB (+0.56 KV) | 128.62 GB (+1.12 KV) | 129.75 GB (+2.25 KV) |
| Q8_0 | 8.0 | Q4_0 (Exp) | 126.0 GB | 127.58 GB (+0.08 KV) | 127.67 GB (+0.17 KV) | 127.84 GB (+0.34 KV) | 128.18 GB (+0.67 KV) | 128.85 GB (+1.35 KV) |
| Q4_K_M | 4.65 | FP32 | 73.24 GB | 75.3 GB (+0.56 KV) | 75.86 GB (+1.12 KV) | 76.99 GB (+2.25 KV) | 79.24 GB (+4.5 KV) | 83.74 GB (+9.0 KV) |
| Q4_K_M | 4.65 | FP16 | 73.24 GB | 75.02 GB (+0.28 KV) | 75.3 GB (+0.56 KV) | 75.86 GB (+1.12 KV) | 76.99 GB (+2.25 KV) | 79.24 GB (+4.5 KV) |
| Q4_K_M | 4.65 | Q8_0 | 73.24 GB | 74.89 GB (+0.15 KV) | 75.05 GB (+0.31 KV) | 75.36 GB (+0.62 KV) | 75.97 GB (+1.24 KV) | 77.21 GB (+2.47 KV) |
| Q4_K_M | 4.65 | FP8 (Exp) | 73.24 GB | 74.88 GB (+0.14 KV) | 75.02 GB (+0.28 KV) | 75.3 GB (+0.56 KV) | 75.86 GB (+1.12 KV) | 76.99 GB (+2.25 KV) |
| Q4_K_M | 4.65 | Q4_0 (Exp) | 73.24 GB | 74.82 GB (+0.08 KV) | 74.91 GB (+0.17 KV) | 75.08 GB (+0.34 KV) | 75.41 GB (+0.67 KV) | 76.09 GB (+1.35 KV) |
| Q4_K_S | 4.58 | FP32 | 72.14 GB | 74.2 GB (+0.56 KV) | 74.76 GB (+1.12 KV) | 75.89 GB (+2.25 KV) | 78.14 GB (+4.5 KV) | 82.63 GB (+9.0 KV) |
| Q4_K_S | 4.58 | FP16 | 72.14 GB | 73.92 GB (+0.28 KV) | 74.2 GB (+0.56 KV) | 74.76 GB (+1.12 KV) | 75.89 GB (+2.25 KV) | 78.13 GB (+4.5 KV) |
| Q4_K_S | 4.58 | Q8_0 | 72.14 GB | 73.79 GB (+0.15 KV) | 73.94 GB (+0.31 KV) | 74.25 GB (+0.62 KV) | 74.87 GB (+1.24 KV) | 76.11 GB (+2.47 KV) |
| Q4_K_S | 4.58 | FP8 (Exp) | 72.14 GB | 73.78 GB (+0.14 KV) | 73.92 GB (+0.28 KV) | 74.2 GB (+0.56 KV) | 74.76 GB (+1.12 KV) | 75.88 GB (+2.25 KV) |
| Q4_K_S | 4.58 | Q4_0 (Exp) | 72.14 GB | 73.72 GB (+0.08 KV) | 73.8 GB (+0.17 KV) | 73.97 GB (+0.34 KV) | 74.31 GB (+0.67 KV) | 74.98 GB (+1.35 KV) |
| Q3_K_M | 3.91 | FP32 | 61.58 GB | 63.65 GB (+0.56 KV) | 64.21 GB (+1.12 KV) | 65.33 GB (+2.25 KV) | 67.58 GB (+4.5 KV) | 72.08 GB (+9.0 KV) |
| Q3_K_M | 3.91 | FP16 | 61.58 GB | 63.36 GB (+0.28 KV) | 63.65 GB (+0.56 KV) | 64.21 GB (+1.12 KV) | 65.33 GB (+2.25 KV) | 67.58 GB (+4.5 KV) |
| Q3_K_M | 3.91 | Q8_0 | 61.58 GB | 63.24 GB (+0.15 KV) | 63.39 GB (+0.31 KV) | 63.7 GB (+0.62 KV) | 64.32 GB (+1.24 KV) | 65.56 GB (+2.47 KV) |
| Q3_K_M | 3.91 | FP8 (Exp) | 61.58 GB | 63.22 GB (+0.14 KV) | 63.36 GB (+0.28 KV) | 63.65 GB (+0.56 KV) | 64.21 GB (+1.12 KV) | 65.33 GB (+2.25 KV) |
| Q3_K_M | 3.91 | Q4_0 (Exp) | 61.58 GB | 63.17 GB (+0.08 KV) | 63.25 GB (+0.17 KV) | 63.42 GB (+0.34 KV) | 63.76 GB (+0.67 KV) | 64.43 GB (+1.35 KV) |
| Q2_K | 2.63 | FP32 | 41.42 GB | 43.48 GB (+0.56 KV) | 44.05 GB (+1.12 KV) | 45.17 GB (+2.25 KV) | 47.42 GB (+4.5 KV) | 51.92 GB (+9.0 KV) |
| Q2_K | 2.63 | FP16 | 41.42 GB | 43.2 GB (+0.28 KV) | 43.48 GB (+0.56 KV) | 44.05 GB (+1.12 KV) | 45.17 GB (+2.25 KV) | 47.42 GB (+4.5 KV) |
| Q2_K | 2.63 | Q8_0 | 41.42 GB | 43.08 GB (+0.15 KV) | 43.23 GB (+0.31 KV) | 43.54 GB (+0.62 KV) | 44.16 GB (+1.24 KV) | 45.4 GB (+2.47 KV) |
| Q2_K | 2.63 | FP8 (Exp) | 41.42 GB | 43.06 GB (+0.14 KV) | 43.2 GB (+0.28 KV) | 43.48 GB (+0.56 KV) | 44.05 GB (+1.12 KV) | 45.17 GB (+2.25 KV) |
| Q2_K | 2.63 | Q4_0 (Exp) | 41.42 GB | 43.01 GB (+0.08 KV) | 43.09 GB (+0.17 KV) | 43.26 GB (+0.34 KV) | 43.6 GB (+0.67 KV) | 44.27 GB (+1.35 KV) |

Total VRAM = Model Weights + KV Cache + 1.5 GB overhead. Actual usage may vary by ±5% depending on the inference engine and its optimizations.
