LLM Cost Calculator

Free AI cost calculator to estimate costs for ChatGPT (GPT-5, GPT-5 mini), Claude (Opus 4.5, Sonnet 4.5, Haiku), Gemini, and other language models. Compare pricing across OpenAI, Anthropic, Google, and other AI providers, and optimize your AI API spending.

Models are Added Regularly

Related: See the LLM Context Window Comparison for help matching document sizes to model context windows.

Token Usage

Cost Breakdown

Select a model and enter token counts to see the estimated cost.

Compare AI Model Pricing - ChatGPT vs Claude vs Gemini

Quick price comparison of popular AI models (per 1 million tokens) from OpenAI, Anthropic, Google, Meta, DeepSeek, and more.

Model                            | Provider | Input ($/1M) | Output ($/1M) | Features
Qwen3 Max (<32k input)           | Alibaba  | $1.20        | $6.00         | Caching, Batch
Qwen3 Max (>32k input)           | Alibaba  | $2.40        | $12.00        | Batch, Tiered
Qwen3 Omni Flash (Text)          | Alibaba  | $0.43        | $1.66         | -
Qwen Flash                       | Alibaba  | $0.05        | $0.40         | Tiered
Qwen Long Latest                 | Alibaba  | $0.072       | $0.287        | -
Qwen Omni Turbo (Text)           | Alibaba  | $0.07        | $0.27         | -
Qwen Plus Latest (Non Thinking)  | Alibaba  | $0.40        | $1.20         | Tiered
Qwen Plus Latest (Thinking)      | Alibaba  | $0.40        | $4.00         | Tiered
Qwen Turbo (Non Thinking)        | Alibaba  | $0.05        | $0.20         | Batch
Qwen Turbo (Thinking)            | Alibaba  | $0.05        | $0.50         | Batch

The "Hidden" API Costs

Price per million tokens is only half the story. As a developer, here is what actually inflates your bill.

The "Thinking" Tax

Be very careful with reasoning models like GPT-5.2, Gemini 3 Pro, or DeepSeek-V3.2 (Thinking). They generate hidden "Chain of Thought" tokens that you never see in the final response, but you are billed for them.

A simple 500-token output might actually cost you 3,000 tokens in backend processing. While you can limit this via API parameters, doing so usually breaks the model's logic. Always buffer your budget by 4x for reasoning tasks.
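A minimal sketch of that 4x buffer in practice. The buffer factor and the $10/1M output rate below are illustrative assumptions, not provider quotes:

```python
# Rough budgeting helper for reasoning models: hidden chain-of-thought
# tokens are billed even though they never appear in the response.
def reasoning_cost_estimate(visible_output_tokens: int,
                            output_price_per_m: float,
                            buffer: float = 4.0) -> float:
    """Estimate billed output cost when hidden reasoning tokens
    inflate the visible output count by `buffer`."""
    billed_tokens = visible_output_tokens * buffer
    return billed_tokens / 1_000_000 * output_price_per_m

# A visible 500-token answer, buffered 4x at $10/1M output: ~$0.02
cost = reasoning_cost_estimate(500, 10.0)
```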

Batch API is Your Best Friend

If you are running background jobs (like summarizing daily logs), stop paying full price. I consistently use the Batch API for 50% discounts. The 24-hour SLA sounds scary, but in practice, my logs show jobs usually finish in under 90 minutes. It is the single easiest way to cut your bill in half without changing models.
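A quick back-of-the-envelope comparison. The 50% figure matches commonly advertised batch discounts, but verify against your provider's batch pricing:

```python
# Standard vs Batch API spend for a background workload.
def batch_savings(total_tokens_m: float, price_per_m: float,
                  batch_discount: float = 0.50) -> tuple:
    """Return (standard_cost, batch_cost) for a token volume given
    in millions of tokens and a per-million price."""
    standard = total_tokens_m * price_per_m
    batch = standard * (1 - batch_discount)
    return standard, batch

# e.g. 200M tokens/month at $1.20/1M: $240 standard vs $120 batched
standard, batched = batch_savings(200, 1.20)
```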

Stop Overpaying for Logic

For 90% of tasks (classification, extraction, regex), Gemini Flash and GPT-5 mini or nano are indistinguishable from their 'Pro' counterparts. You should only be paying the premium rates ($5.00+/1M) for complex creative writing or deep architectural coding with GPT-5 or Gemini 3 Pro. For everything else, the budget models have effectively won.

How LLM Pricing Works

What Are Tokens?

Tokens are the basic units LLMs use to process text. In English, 1 token ≈ 4 characters or ~0.75 words. The word "ChatGPT" is 2 tokens, while "AI" is 1 token. You pay based on tokens processed.

Use our Token Counter to count tokens in your text.
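The ~4-characters-per-token rule above can be coded as a quick heuristic. This is a rough sketch for ballpark estimates only; use a real tokenizer (e.g. tiktoken) when you need exact counts:

```python
def rough_token_count(text: str) -> int:
    """Rule-of-thumb estimate: ~4 English characters per token.
    Accurate enough for budgeting, not for billing."""
    return max(1, round(len(text) / 4))

# Matches the examples above: "ChatGPT" -> 2 tokens, "AI" -> 1 token
```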

Input vs Output Tokens

Input tokens are your prompts and context. Output tokens are the AI's responses. Output typically costs 2-5x more because generating text requires more computation than reading it.
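With separate input and output rates, a per-request cost works out as follows (the rates in the example are illustrative, not quotes from any provider):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float,
                 output_price_per_m: float) -> float:
    """Per-request cost: input and output tokens are billed at
    separate per-million rates."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)

# Illustrative rates: $0.40/1M input, $1.20/1M output.
# 10,000 input + 2,000 output tokens -> $0.0064
cost = request_cost(10_000, 2_000, 0.40, 1.20)
```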

Save with Caching & Batch

Prompt caching saves 75-90% on repeated prompts. Batch API offers 50% off for non-urgent requests. Combine both to dramatically reduce your AI costs.
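Stacking the two discounts can be sketched like this. The 90% cache and 50% batch figures are the upper-end numbers quoted on this page; real rates vary by provider and model:

```python
# Apply a prompt-cache discount to the cached share of input tokens,
# then an optional batch discount on top.
def discounted_input_cost(tokens: int, price_per_m: float,
                          cached_fraction: float = 0.0,
                          cache_discount: float = 0.90,
                          use_batch: bool = False,
                          batch_discount: float = 0.50) -> float:
    full_price = tokens * (1 - cached_fraction)
    cached = tokens * cached_fraction * (1 - cache_discount)
    cost = (full_price + cached) / 1_000_000 * price_per_m
    return cost * (1 - batch_discount) if use_batch else cost

# 1M input tokens at $1.00/1M with 80% cache hits: $0.28
# ...and batched on top of that: $0.14
```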

6 Ways to Reduce Your LLM API Costs

1. Count Your Tokens First

Use our Token Counter to accurately measure input/output tokens before making API calls. Knowing exact counts helps you choose the right model and avoid surprises.

2. Choose the Right Model

Use smaller, cheaper models (GPT-5 mini, Claude Haiku) for simple tasks. Reserve expensive models for complex reasoning.

3. Optimize Your Prompts

Shorter, clearer prompts use fewer input tokens. Remove unnecessary context and be specific about what you need.

4. Use Prompt Caching

If you send the same system prompt repeatedly, enable caching. Anthropic and OpenAI offer massive discounts on cached tokens.

5. Batch Non-Urgent Requests

For tasks that don't need instant responses, use Batch API. Get 50% off by allowing requests to complete within 24 hours.

6. Set Max Token Limits

Always set max_tokens in your API calls. This prevents unexpectedly long (and expensive) responses.
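A useful side effect: once max_tokens is set, you can bound the worst-case output cost of any single call. A minimal sketch, with an illustrative $6.00/1M rate:

```python
def worst_case_output_cost(max_tokens: int,
                           output_price_per_m: float) -> float:
    """Upper bound on one response's output cost once max_tokens is
    set: the API cannot bill more output tokens than the cap allows."""
    return max_tokens / 1_000_000 * output_price_per_m

# Capping at 1,000 tokens on a $6.00/1M model bounds each call at $0.006
cap = worst_case_output_cost(1_000, 6.00)
```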

Frequently Asked Questions

What is an LLM token?

A token is a unit of text that language models process. In English, one token is roughly 4 characters or about 0.75 words. For example, "ChatGPT" is 2 tokens, while "AI" is 1 token. API pricing is based on the number of tokens processed.

How is LLM API pricing calculated?

LLM API pricing is typically calculated per million tokens, with separate rates for input (prompt) and output (completion) tokens. Output tokens are usually more expensive because they require more computation to generate.

What is prompt caching and how does it save money?

Prompt caching stores frequently used prompts so they don't need to be reprocessed. Cached input tokens are significantly cheaper (often 75-90% less) than regular input tokens. This is ideal for applications with repetitive system prompts or instructions.

What is batch API pricing?

Batch API allows you to send multiple requests at once for non-time-sensitive tasks. Most providers offer a 50% discount on batch requests because they can process them during off-peak hours. Results are typically available within 24 hours.

Which LLM is the most cost-effective?

Cost-effectiveness depends on your use case. For simple tasks like classification or summarization, smaller models like GPT-5 mini or Claude Haiku offer excellent value. For complex reasoning or coding, larger models may be more cost-effective despite higher per-token costs because they require fewer attempts to get accurate results.

How accurate are these price estimates?

Our calculator uses official API pricing from each provider. Prices are updated regularly, but we recommend checking the official pricing pages before making budget decisions. Actual costs may vary slightly based on your usage patterns and any negotiated enterprise rates.