OpenAI vs Anthropic vs Gemini Pricing: 2026 Per-Token Cost Comparison

This comparison covers per-token prices across the three major providers, including the discounts that matter most: prompt caching (cuts cached-prefix cost 75-90%) and batch APIs (halve cost for non-real-time work). The cheapest provider depends on your tier (frontier or budget) and your context length. Here is the full breakdown and the decision rule for which provider to default to.

By the LLM Academy team · Reviewed July 2026 · Prices from official provider pages as of mid-2026; always verify current rates

TL;DR — who is cheapest?

There is no single cheapest provider; it depends on the tier. Google Gemini has the lowest list price at every tier compared here and is the only option for contexts above 200K tokens. OpenAI GPT-4o-mini costs twice as much as Gemini Flash but has the broadest ecosystem among budget models. Anthropic Claude is rarely the cheapest but is preferred for nuanced reasoning, long-form writing, and coding tasks where quality matters more than raw cost.

Decision rule

Budget tier (simple tasks) → GPT-4o-mini or Gemini Flash.

Frontier quality → GPT-4o or Claude Sonnet (pick by quality on your eval, not price; they're close).

Long context (>200K) → Gemini (cheapest per token and largest context window).

Batch / background → whichever provider has the best frontier-batch rate at your quality bar.
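
The rule above can be sketched as a small function. This is a minimal illustration, not a routing library: the function name, the `task` labels, and the order in which the rules are checked are assumptions made for the example, and the returned strings are placeholders rather than API model identifiers.

```python
def pick_default(task: str, context_tokens: int, realtime: bool = True) -> str:
    """Return a default model family following the decision rule.

    `task` is a hypothetical label: "simple" means budget-tier work,
    anything else is treated as frontier-quality work.
    The check order (context, then batch, then tier) is an assumption.
    """
    if context_tokens > 200_000:
        return "gemini"  # only option above 200K in this comparison
    if not realtime:
        return "best-frontier-batch-rate"  # decide from your own eval and quotes
    if task == "simple":
        return "gpt-4o-mini-or-gemini-flash"
    return "gpt-4o-or-claude-sonnet"  # pick by eval quality, not price


print(pick_default("simple", 1_500))
print(pick_default("code-review", 350_000))
```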

Frontier tier: GPT-4o vs Claude 3.5 Sonnet vs Gemini 1.5 Pro

The frontier tier is where quality matters most and price gaps are real. At mid-2026 rates, here are the per-million-token prices for the flagship models from each provider.

| Model | Input ($/M tokens) | Output ($/M tokens) | Context window |
| --- | --- | --- | --- |
| OpenAI GPT-4o | $2.50 | $10.00 | 128K |
| Anthropic Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Google Gemini 1.5 Pro | $1.25 | $5.00 | 2M |

On raw price, Gemini Pro is the clear winner — roughly half of GPT-4o and a third of Claude Sonnet on output tokens. But raw price is only half the story. Claude Sonnet frequently outperforms GPT-4o and Gemini on coding, reasoning, and long-form writing benchmarks; if it produces a correct answer in one pass where GPT-4o needs two, the effective cost per correct answer can favor Claude despite the higher per-token rate. Always benchmark on your own eval, not on published model cards.
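
To make the "cost per correct answer" point concrete, here is a rough sketch using the list prices from the table. The pass rates in the usage lines are hypothetical placeholders, not benchmark results; substitute the rates you measure on your own eval. The model assumes you retry until an answer is correct, so the expected number of attempts is 1 / pass rate.

```python
# Frontier list prices from the table, in $ per million tokens (input, output).
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-1.5-pro": (1.25, 5.00),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at list price."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


def cost_per_correct(model: str, input_tokens: int, output_tokens: int,
                     pass_rate: float) -> float:
    """Expected dollar cost per correct answer when retrying on failure."""
    return cost_per_call(model, input_tokens, output_tokens) / pass_rate


# Hypothetical pass rates, for illustration only.
print(cost_per_correct("gpt-4o", 2_000, 500, pass_rate=0.5))
print(cost_per_correct("claude-3.5-sonnet", 2_000, 500, pass_rate=0.9))
```

With these made-up pass rates, the model with the higher per-token price comes out cheaper per correct answer, which is why measured quality has to enter the comparison.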

Prices change frequently

Provider pricing shifts quarterly — often downward as competition intensifies. The numbers here are a mid-2026 snapshot. Before any capacity commitment, pull live rates from each provider's pricing page. A gateway like LiteLLM can track and re-route automatically as prices change.

Budget tier: GPT-4o-mini vs Claude Haiku vs Gemini Flash

The budget tier is where most of your traffic should land after model routing. At the list prices below, each budget model is roughly 12-17x cheaper than the same provider's frontier model, and it handles classification, extraction, summarization, and FAQ answering with adequate quality.

| Model | Input ($/M tokens) | Output ($/M tokens) | Best for |
| --- | --- | --- | --- |
| GPT-4o-mini | $0.15 | $0.60 | Broadest ecosystem, tool use |
| Claude 3 Haiku | $0.25 | $1.25 | Fast, slightly better reasoning |
| Gemini 1.5 Flash | $0.075 | $0.30 | Cheapest, huge context (1M) |

Gemini Flash is the absolute cheapest, but GPT-4o-mini has the best tool-calling and structured-output support, which matters for agent workloads. Claude Haiku sits in between on price but often edges the others on raw response quality for its tier. The right choice depends on what your downstream pipeline expects.
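
For a sense of scale, the sketch below turns the budget-tier list prices into a monthly bill. The workload (100,000 requests per day, 1,000 input and 200 output tokens each) is an assumed example, not a measured one.

```python
# Budget-tier list prices from the table, in $ per million tokens (input, output).
BUDGET_PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-haiku": (0.25, 1.25),
    "gemini-1.5-flash": (0.075, 0.30),
}


def monthly_cost(model: str, requests_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Dollar cost per month at list price, with no caching or batch discount."""
    inp, out = BUDGET_PRICES[model]
    per_request = (input_tokens * inp + output_tokens * out) / 1_000_000
    return per_request * requests_per_day * days


# Assumed workload: 100K requests/day, 1,000 input + 200 output tokens each.
for model in BUDGET_PRICES:
    print(f"{model}: ${monthly_cost(model, 100_000, 1_000, 200):,.2f}/month")
```

Under these assumptions the bill is about $810 for GPT-4o-mini, $1,500 for Claude Haiku, and $405 for Gemini Flash, so the gap is real but small in absolute terms compared with the cost of a tool-calling or quality regression.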

Prompt caching: the 75-90% discount you should already be using

All three providers now offer prompt caching — if your prompt shares a prefix with a recent call (a system prompt, a shared document, few-shot examples), the cached prefix tokens are billed at a steep discount. For agent and RAG workloads that resend the same system prompt every turn, this is the single biggest cost lever after model selection.

| Provider | Cached input price | Discount vs normal input | Minimum cacheable prefix |
| --- | --- | --- | --- |
| OpenAI (GPT-4o) | ~$0.3125/M | 87.5% off | 1024 tokens |
| Anthropic (Claude Sonnet) | ~$0.30/M | 90% off | 1024 tokens |
| Google (Gemini Pro) | ~$0.3125/M | 75% off | 2048 tokens |

The caveat is that each provider's cache has a minimum prefix length (1024-2048 tokens) and a TTL (5-60 minutes). Short prompts see no benefit; long, repeated prefixes see huge savings. Anthropic also charges a small write premium for the initial caching, but the net is still a large win on multi-turn agent traffic.
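
The minimum-prefix caveat can be expressed as a small cost function. This is a simplified sketch built from the figures in the table: it ignores TTL expiry and Anthropic's cache-write premium, and the dictionary keys are labels chosen for this example rather than API model names.

```python
# (normal input $/M, cached input $/M, minimum cacheable prefix in tokens)
CACHE_TERMS = {
    "gpt-4o": (2.50, 0.3125, 1024),
    "claude-sonnet": (3.00, 0.30, 1024),
    "gemini-pro": (1.25, 0.3125, 2048),
}


def input_cost(model: str, prefix_tokens: int, dynamic_tokens: int,
               cache_hit: bool = True) -> float:
    """Dollar input cost of one call with a repeated prefix.

    The prefix is billed at the cached rate only on a cache hit and only
    if it meets the provider's minimum length; otherwise at the normal rate.
    """
    normal, cached, min_prefix = CACHE_TERMS[model]
    eligible = cache_hit and prefix_tokens >= min_prefix
    prefix_rate = cached if eligible else normal
    return (prefix_tokens * prefix_rate + dynamic_tokens * normal) / 1_000_000


# A 1,500-token prefix is cacheable on GPT-4o but too short for Gemini Pro.
print(input_cost("gpt-4o", 1_500, 300))
print(input_cost("gemini-pro", 1_500, 300))
```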

Structure prompts for cache hits

Put stable content (system prompt, tool schemas, few-shot examples) at the start of the prompt and dynamic content (user query, retrieved context) at the end. This maximizes the cacheable prefix. A 2000-token system prompt reused across 1000 daily turns moves 2M input tokens per day from the normal rate to the cached rate. At the GPT-4o rates above ($2.50 vs ~$0.3125 per million), that saves about $4.38/day, or roughly $130/month, from one prompt-structure change.
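
A minimal sketch of that ordering follows. The function and argument names are hypothetical, and the prefix check compares characters only to illustrate the idea; real provider caches match on tokens and apply their own minimum lengths.

```python
def build_prompt(system_prompt: str, tool_schemas: list[str], few_shot: list[str],
                 retrieved_context: str, user_query: str) -> str:
    """Assemble a prompt with stable parts first and per-request parts last."""
    stable = [system_prompt, *tool_schemas, *few_shot]
    dynamic = [retrieved_context, user_query]
    return "\n\n".join(stable + dynamic)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


turn_1 = build_prompt("SYS", ["TOOL"], ["EX"], "ctx one", "first question")
turn_2 = build_prompt("SYS", ["TOOL"], ["EX"], "other ctx", "second question")
# The shared prefix covers the whole stable block, so it stays cacheable.
print(shared_prefix_len(turn_1, turn_2))
```

If the retrieved context or user query were placed first instead, the shared prefix between turns would shrink to almost nothing and no call would qualify for the cached rate.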

Batch API: 50% off for non-real-time workloads

If your workload tolerates latency (eval runs, content generation, data labeling, document processing), the batch APIs halve your cost. All three providers offer this with a 24-hour SLA.

| Provider | Batch API | Discount | SLA |
| --- | --- | --- | --- |
| OpenAI | Batch API | 50% | Within 24 hours |
| Anthropic | Message Batches | 50% | Within 24 hours |
| Google | Batch API (Gemini) | 50% | Within 24 hours |

The batch API stacks with prompt caching — a batch job with a shared system prompt gets both the 50% batch discount and the 75-90% cache discount on the prefix. For eval harnesses running thousands of test prompts with a shared instruction, this can cut eval cost by 80%+. See the cost optimization playbook for where batch fits in the full stack.
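
The stacking arithmetic can be checked with a short calculation. This sketch assumes the provider applies both discounts multiplicatively and that every prompt hits the cache (in practice the first call pays full price or a write premium). The eval shape in the usage lines, 5,000 prompts with a 4,000-token shared instruction, is an invented example.

```python
def eval_run_cost(n_prompts: int, prefix_tokens: int, dynamic_tokens: int,
                  output_tokens: int, input_rate: float, output_rate: float,
                  cache_discount: float = 0.0, batch_discount: float = 0.0) -> float:
    """Dollar cost of an eval run; rates in $/M tokens, discounts as fractions."""
    prefix = prefix_tokens * input_rate * (1 - cache_discount)
    rest = dynamic_tokens * input_rate + output_tokens * output_rate
    per_prompt = (prefix + rest) * (1 - batch_discount) / 1_000_000
    return per_prompt * n_prompts


# Assumed eval: 5,000 prompts, 4,000-token shared prefix, 200 dynamic input
# tokens and 100 output tokens each, at the GPT-4o list rates in this article.
list_price = eval_run_cost(5_000, 4_000, 200, 100, 2.50, 10.00)
stacked = eval_run_cost(5_000, 4_000, 200, 100, 2.50, 10.00,
                        cache_discount=0.875, batch_discount=0.5)
print(f"list ${list_price:.2f}, stacked ${stacked:.2f}, "
      f"saving {1 - stacked / list_price:.0%}")
```

The saving is this large only because the shared prefix dominates the token count; output-heavy jobs get closer to the plain 50% batch discount.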

Long context: who handles 200K-2M tokens?

Context length is where the providers diverge sharply. Gemini 1.5 Pro supports up to 2 million tokens, Claude Sonnet supports 200K, and GPT-4o caps at 128K. For workloads that require loading entire codebases, long documents, or hours of transcripts into context, Gemini is often the only option — and its per-token price for long context is also the lowest.

    # Effective cost to process a 500K-token document (input only)
    Gemini 1.5 Pro: 500K × $1.25/M = $0.625
    Claude Sonnet:  (not supported — 200K max)
    GPT-4o:         (not supported — 128K max)
    # Gemini is the only frontier option above 200K tokens

Beware that long-context quality degrades on all models — the "lost in the middle" problem. Just because a model accepts 2M tokens does not mean it attends to all of them equally. Run your eval at the context lengths you actually use; do not assume a 2M context window gives you 2M tokens of reliable recall.

The case for multi-provider

Relying on a single provider is both a reliability risk (outages, rate limits) and a cost risk (you cannot benefit from price competition). A gateway like LiteLLM or Portkey lets you route by cost, fall back across providers on errors, and A/B-test quality continuously. The typical production setup uses Gemini Flash or GPT-4o-mini for bulk traffic, GPT-4o or Claude Sonnet for quality-sensitive traffic, and Gemini Pro for long-context work — all behind a single gateway endpoint.
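
The fallback behaviour a gateway provides can be illustrated in a few lines. This is a generic sketch, not LiteLLM's or Portkey's API: `call_with_fallback` and the stub provider functions are hypothetical, and a real setup would catch provider-specific errors and add timeouts and retries.

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str], str]]


def call_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real SDK calls.
def primary(prompt: str) -> str:
    raise TimeoutError("rate limited")


def secondary(prompt: str) -> str:
    return f"answer to: {prompt}"


print(call_with_fallback("hello", [("primary", primary), ("secondary", secondary)]))
```

Ordering the list by price gives cost-based routing with reliability as a side effect: the cheapest provider that meets your quality bar goes first, and the next one takes over only on errors.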

Negotiated rates

At scale (typically >$50K/month committed spend), all three providers offer volume discounts off the list price — often 15-30%. If you are at that scale, do not pay list price. Use the gateway's usage data as leverage in negotiation and re-quote annually.

FAQ

Which is cheaper: OpenAI, Anthropic, or Gemini?

It depends on the tier. Frontier: Gemini Pro is cheapest ($1.25/$5.00 per million input/output tokens), then GPT-4o ($2.50/$10.00), then Claude Sonnet ($3.00/$15.00). Budget: Gemini Flash ($0.075/$0.30) is cheapest, then GPT-4o-mini ($0.15/$0.60), then Claude Haiku ($0.25/$1.25). For contexts above 200K tokens, Gemini is the only frontier option in this comparison.

How much does prompt caching save?

Prompt caching cuts the cost of cached prefix tokens by 75-90% (OpenAI: 87.5% off, Anthropic: 90% off, Google: 75% off). For workloads with repeated system prompts (agents, RAG), blended input cost drops 40-60%. Requires a minimum cacheable prefix (1024-2048 tokens).

How much does the batch API save?

50% off list price, across all three providers, with a 24-hour completion SLA. It stacks with prompt caching for compounded savings. Use it for evals, content generation, and any non-real-time workload.

Should I use multiple providers?

Yes — for reliability and cost optimization. A gateway (LiteLLM, Portkey) lets you route by cost, fall back on outages, and benchmark quality continuously. The typical setup: budget model for bulk, frontier model for quality, Gemini Pro for long context.

Sources

Pricing changes frequently and varies by region, commitment level, and enterprise agreement. All figures are mid-2026 list prices from public provider pages. Verify current rates before any procurement decision.