
Qwen3-Next-80B-A3B-Thinking

Mixture of Experts · 80.0B Total Parameters

Active Parameters: 3.0B

Model Specifications

Layers: 48
Hidden Dimension: 2,048
Attention Heads: 16
KV Heads: 2
Max Context: 262K tokens
Vocabulary Size: 151,936
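
The per-context KV figures in the VRAM table below follow directly from these attention dimensions. Here is a minimal sketch of that calculation, assuming a per-head dimension of 256 (the head size is not listed above, and hidden dimension / attention heads would give 128; only 256 reproduces the table), assuming every layer keeps a full K/V cache, and treating 1 GB as 2^30 bytes:

```python
# Hedged sketch: FP16 KV-cache size per context length for
# Qwen3-Next-80B-A3B-Thinking, derived from the specs above.
# Assumptions (not stated on this page): head_dim = 256 and all 48 layers
# hold a standard K/V cache; 1 GB is treated as 2**30 bytes.

LAYERS = 48
KV_HEADS = 2
HEAD_DIM = 256          # assumption: not hidden_dim / attention_heads (= 128)
BYTES_FP16 = 2

def kv_cache_gb(context_tokens: int, bytes_per_elem: int = BYTES_FP16) -> float:
    """K and V tensors: 2 * layers * kv_heads * head_dim * context * bytes."""
    total_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * context_tokens * bytes_per_elem
    return total_bytes / 2**30

# The table labels these context sizes 8K, 16K, 32K, 65K, 131K, and 262K.
for tokens in (8_192, 16_384, 32_768, 65_536, 131_072, 262_144):
    print(f"{tokens:>7} tokens: {kv_cache_gb(tokens):5.2f} GB FP16 KV cache")
# Prints 0.75, 1.50, 3.00, 6.00, 12.00, 24.00 -- the "+... KV" values in the
# FP16-cache rows below.
```

The FP32 cache column is exactly double these values, and the quantized cache columns (Q8_0, FP8, Q4_0) scale down by roughly their bits per element.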

VRAM Requirements

Estimated VRAM usage for every combination of weight quantization and KV-cache format. Base overhead: 1.3 GB (CUDA context + activations).

| Quantization | Cache Format | Model Weights | 8K Context | 16K Context | 32K Context | 65K Context | 131K Context | 262K Context |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP16 (16.0 bpw) | FP32 | 168.0 GB | 170.8 GB (+1.5 KV) | 172.3 GB (+3.0 KV) | 175.3 GB (+6.0 KV) | 181.3 GB (+12.0 KV) | 193.3 GB (+24.0 KV) | 217.3 GB (+48.0 KV) |
| FP16 (16.0 bpw) | FP16 | 168.0 GB | 170.05 GB (+0.75 KV) | 170.8 GB (+1.5 KV) | 172.3 GB (+3.0 KV) | 175.3 GB (+6.0 KV) | 181.3 GB (+12.0 KV) | 193.3 GB (+24.0 KV) |
| FP16 (16.0 bpw) | Q8_0 | 168.0 GB | 169.71 GB (+0.41 KV) | 170.12 GB (+0.83 KV) | 170.95 GB (+1.65 KV) | 172.6 GB (+3.3 KV) | 175.9 GB (+6.6 KV) | 182.5 GB (+13.2 KV) |
| FP16 (16.0 bpw) | FP8 (Exp) | 168.0 GB | 169.68 GB (+0.38 KV) | 170.05 GB (+0.75 KV) | 170.8 GB (+1.5 KV) | 172.3 GB (+3.0 KV) | 175.3 GB (+6.0 KV) | 181.3 GB (+12.0 KV) |
| FP16 (16.0 bpw) | Q4_0 (Exp) | 168.0 GB | 169.53 GB (+0.22 KV) | 169.75 GB (+0.45 KV) | 170.2 GB (+0.9 KV) | 171.1 GB (+1.8 KV) | 172.9 GB (+3.6 KV) | 176.5 GB (+7.2 KV) |
| Q8_0 (8.0 bpw) | FP32 | 84.0 GB | 86.8 GB (+1.5 KV) | 88.3 GB (+3.0 KV) | 91.3 GB (+6.0 KV) | 97.3 GB (+12.0 KV) | 109.3 GB (+24.0 KV) | 133.3 GB (+48.0 KV) |
| Q8_0 (8.0 bpw) | FP16 | 84.0 GB | 86.05 GB (+0.75 KV) | 86.8 GB (+1.5 KV) | 88.3 GB (+3.0 KV) | 91.3 GB (+6.0 KV) | 97.3 GB (+12.0 KV) | 109.3 GB (+24.0 KV) |
| Q8_0 (8.0 bpw) | Q8_0 | 84.0 GB | 85.71 GB (+0.41 KV) | 86.12 GB (+0.83 KV) | 86.95 GB (+1.65 KV) | 88.6 GB (+3.3 KV) | 91.9 GB (+6.6 KV) | 98.5 GB (+13.2 KV) |
| Q8_0 (8.0 bpw) | FP8 (Exp) | 84.0 GB | 85.67 GB (+0.38 KV) | 86.05 GB (+0.75 KV) | 86.8 GB (+1.5 KV) | 88.3 GB (+3.0 KV) | 91.3 GB (+6.0 KV) | 97.3 GB (+12.0 KV) |
| Q8_0 (8.0 bpw) | Q4_0 (Exp) | 84.0 GB | 85.52 GB (+0.22 KV) | 85.75 GB (+0.45 KV) | 86.2 GB (+0.9 KV) | 87.1 GB (+1.8 KV) | 88.9 GB (+3.6 KV) | 92.5 GB (+7.2 KV) |
| Q4_K_M (4.65 bpw) | FP32 | 48.83 GB | 51.62 GB (+1.5 KV) | 53.12 GB (+3.0 KV) | 56.12 GB (+6.0 KV) | 62.12 GB (+12.0 KV) | 74.12 GB (+24.0 KV) | 98.12 GB (+48.0 KV) |
| Q4_K_M (4.65 bpw) | FP16 | 48.83 GB | 50.88 GB (+0.75 KV) | 51.62 GB (+1.5 KV) | 53.12 GB (+3.0 KV) | 56.12 GB (+6.0 KV) | 62.12 GB (+12.0 KV) | 74.12 GB (+24.0 KV) |
| Q4_K_M (4.65 bpw) | Q8_0 | 48.83 GB | 50.54 GB (+0.41 KV) | 50.95 GB (+0.83 KV) | 51.77 GB (+1.65 KV) | 53.42 GB (+3.3 KV) | 56.73 GB (+6.6 KV) | 63.33 GB (+13.2 KV) |
| Q4_K_M (4.65 bpw) | FP8 (Exp) | 48.83 GB | 50.5 GB (+0.38 KV) | 50.88 GB (+0.75 KV) | 51.62 GB (+1.5 KV) | 53.12 GB (+3.0 KV) | 56.12 GB (+6.0 KV) | 62.12 GB (+12.0 KV) |
| Q4_K_M (4.65 bpw) | Q4_0 (Exp) | 48.83 GB | 50.35 GB (+0.22 KV) | 50.58 GB (+0.45 KV) | 51.02 GB (+0.9 KV) | 51.92 GB (+1.8 KV) | 53.73 GB (+3.6 KV) | 57.33 GB (+7.2 KV) |
| Q4_K_S (4.58 bpw) | FP32 | 48.09 GB | 50.89 GB (+1.5 KV) | 52.39 GB (+3.0 KV) | 55.39 GB (+6.0 KV) | 61.39 GB (+12.0 KV) | 73.39 GB (+24.0 KV) | 97.39 GB (+48.0 KV) |
| Q4_K_S (4.58 bpw) | FP16 | 48.09 GB | 50.14 GB (+0.75 KV) | 50.89 GB (+1.5 KV) | 52.39 GB (+3.0 KV) | 55.39 GB (+6.0 KV) | 61.39 GB (+12.0 KV) | 73.39 GB (+24.0 KV) |
| Q4_K_S (4.58 bpw) | Q8_0 | 48.09 GB | 49.8 GB (+0.41 KV) | 50.21 GB (+0.83 KV) | 51.04 GB (+1.65 KV) | 52.69 GB (+3.3 KV) | 55.99 GB (+6.6 KV) | 62.59 GB (+13.2 KV) |
| Q4_K_S (4.58 bpw) | FP8 (Exp) | 48.09 GB | 49.76 GB (+0.38 KV) | 50.14 GB (+0.75 KV) | 50.89 GB (+1.5 KV) | 52.39 GB (+3.0 KV) | 55.39 GB (+6.0 KV) | 61.39 GB (+12.0 KV) |
| Q4_K_S (4.58 bpw) | Q4_0 (Exp) | 48.09 GB | 49.61 GB (+0.22 KV) | 49.84 GB (+0.45 KV) | 50.29 GB (+0.9 KV) | 51.19 GB (+1.8 KV) | 52.99 GB (+3.6 KV) | 56.59 GB (+7.2 KV) |
| Q3_K_M (3.91 bpw) | FP32 | 41.05 GB | 43.85 GB (+1.5 KV) | 45.35 GB (+3.0 KV) | 48.35 GB (+6.0 KV) | 54.35 GB (+12.0 KV) | 66.36 GB (+24.0 KV) | 90.36 GB (+48.0 KV) |
| Q3_K_M (3.91 bpw) | FP16 | 41.05 GB | 43.1 GB (+0.75 KV) | 43.85 GB (+1.5 KV) | 45.35 GB (+3.0 KV) | 48.35 GB (+6.0 KV) | 54.35 GB (+12.0 KV) | 66.36 GB (+24.0 KV) |
| Q3_K_M (3.91 bpw) | Q8_0 | 41.05 GB | 42.77 GB (+0.41 KV) | 43.18 GB (+0.83 KV) | 44.0 GB (+1.65 KV) | 45.65 GB (+3.3 KV) | 48.95 GB (+6.6 KV) | 55.55 GB (+13.2 KV) |
| Q3_K_M (3.91 bpw) | FP8 (Exp) | 41.05 GB | 42.73 GB (+0.38 KV) | 43.1 GB (+0.75 KV) | 43.85 GB (+1.5 KV) | 45.35 GB (+3.0 KV) | 48.35 GB (+6.0 KV) | 54.35 GB (+12.0 KV) |
| Q3_K_M (3.91 bpw) | Q4_0 (Exp) | 41.05 GB | 42.58 GB (+0.22 KV) | 42.8 GB (+0.45 KV) | 43.25 GB (+0.9 KV) | 44.15 GB (+1.8 KV) | 45.95 GB (+3.6 KV) | 49.55 GB (+7.2 KV) |
| Q2_K (2.63 bpw) | FP32 | 27.61 GB | 30.41 GB (+1.5 KV) | 31.91 GB (+3.0 KV) | 34.91 GB (+6.0 KV) | 40.91 GB (+12.0 KV) | 52.91 GB (+24.0 KV) | 76.91 GB (+48.0 KV) |
| Q2_K (2.63 bpw) | FP16 | 27.61 GB | 29.66 GB (+0.75 KV) | 30.41 GB (+1.5 KV) | 31.91 GB (+3.0 KV) | 34.91 GB (+6.0 KV) | 40.91 GB (+12.0 KV) | 52.91 GB (+24.0 KV) |
| Q2_K (2.63 bpw) | Q8_0 | 27.61 GB | 29.33 GB (+0.41 KV) | 29.74 GB (+0.83 KV) | 30.56 GB (+1.65 KV) | 32.21 GB (+3.3 KV) | 35.51 GB (+6.6 KV) | 42.11 GB (+13.2 KV) |
| Q2_K (2.63 bpw) | FP8 (Exp) | 27.61 GB | 29.29 GB (+0.38 KV) | 29.66 GB (+0.75 KV) | 30.41 GB (+1.5 KV) | 31.91 GB (+3.0 KV) | 34.91 GB (+6.0 KV) | 40.91 GB (+12.0 KV) |
| Q2_K (2.63 bpw) | Q4_0 (Exp) | 27.61 GB | 29.14 GB (+0.22 KV) | 29.36 GB (+0.45 KV) | 29.81 GB (+0.9 KV) | 30.71 GB (+1.8 KV) | 32.51 GB (+3.6 KV) | 36.11 GB (+7.2 KV) |

Total VRAM = Model Weights + KV Cache + 1.3 GB overhead. Actual usage may vary ±5% based on inference engine and optimizations.
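
As a cross-check, here is a minimal Python sketch of that formula. It assumes weight sizes scale linearly with bits per weight from the table's FP16 row, and the quantized-cache scale factors are back-solved from the table's KV columns rather than taken from any specification:

```python
# Hedged sketch of "Total VRAM = Model Weights + KV Cache + 1.3 GB overhead",
# using values read off the table above.

FP16_WEIGHTS_GB = 168.0     # "Model Weights" for the FP16 (16.0 bpw) rows
OVERHEAD_GB = 1.3           # CUDA context + activations, per this page
KV_FP16_GB_PER_8K = 0.75    # "+0.75 KV" at 8K context with an FP16 cache

CACHE_SCALE = {             # KV-cache size relative to FP16 (approximate)
    "FP32": 2.0,
    "FP16": 1.0,
    "Q8_0": 0.55,           # ~0.83 / 1.5 in the table
    "FP8": 0.5,
    "Q4_0": 0.3,            # ~0.45 / 1.5 in the table
}

def total_vram_gb(bpw: float, cache_format: str, context_tokens: int) -> float:
    """Rough total VRAM for a given weight bpw, cache format, and context."""
    weights = FP16_WEIGHTS_GB * bpw / 16.0
    kv = KV_FP16_GB_PER_8K * (context_tokens / 8_192) * CACHE_SCALE[cache_format]
    return weights + kv + OVERHEAD_GB

# Q4_K_M (4.65 bpw) with an FP16 cache at 32K context:
print(f"{total_vram_gb(4.65, 'FP16', 32_768):.1f} GB")   # ~53.1 GB; table: 53.12 GB
```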

Check if your GPU can run Qwen3-Next-80B-A3B-Thinking

Use our calculator to see if this model fits your specific hardware configuration.
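
For a rough offline version of that check, here is a minimal sketch that picks the largest quantization from the table above whose weights, FP16 KV cache, and 1.3 GB overhead fit a given VRAM budget. The GPU memory sizes in the example loop are illustrative assumptions, not values from this page:

```python
# Hedged sketch: largest quantization from the table above that fits a given
# VRAM budget, assuming an FP16 KV cache (0.75 GB per 8K tokens) and the
# 1.3 GB base overhead.

WEIGHTS_GB = {              # "Model Weights" column, largest to smallest
    "FP16": 168.0,
    "Q8_0": 84.0,
    "Q4_K_M": 48.83,
    "Q4_K_S": 48.09,
    "Q3_K_M": 41.05,
    "Q2_K": 27.61,
}
OVERHEAD_GB = 1.3
KV_FP16_GB_PER_8K = 0.75

def best_fit(budget_gb: float, context_tokens: int) -> str | None:
    """Return the first (largest) quantization whose total fits the budget."""
    kv = KV_FP16_GB_PER_8K * context_tokens / 8_192
    for name, weights in WEIGHTS_GB.items():
        if weights + kv + OVERHEAD_GB <= budget_gb:
            return name
    return None

# Illustrative VRAM budgets (not from this page), checked at 32K context:
for budget in (96, 48, 32, 24):
    print(budget, "GB ->", best_fit(budget, 32_768) or "does not fit")
# 96 GB -> Q8_0, 48 GB -> Q3_K_M, 32 GB -> Q2_K, 24 GB -> does not fit
```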