OpenAI vs Anthropic vs Gemini Pricing: 2026 Per-Token Cost Comparison
Per-token prices across the three major providers, including the discounts that matter most: prompt caching (cuts cached-prefix cost 75-90%) and batch APIs (halve cost for non-real-time work). The cheapest provider depends on your tier (frontier or budget) and your context length. Here is the full breakdown and the decision rule for which provider to default to.
TL;DR — who is cheapest?
There is no single cheapest provider; it depends on the tier. Google Gemini has the lowest list price in both tiers covered here and is the only viable option for 1M+ token contexts. OpenAI GPT-4o-mini costs about twice as much as Gemini Flash but has the broadest ecosystem of the budget models. Anthropic Claude is rarely the cheapest but is preferred for nuanced reasoning, long-form writing, and coding tasks where quality matters more than raw cost.
- Budget tier (simple tasks) → GPT-4o-mini or Gemini Flash.
- Frontier quality → GPT-4o or Claude Sonnet (pick by quality on your eval, not price; they're close).
- Long context (>200K) → Gemini (cheapest per token and largest context window).
- Batch / background → whichever provider has the best frontier-batch rate at your quality bar.
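The rule above can be written as a small lookup, which makes the precedence between the cases explicit. This is a minimal sketch: the function name `default_choice` and the order of the checks (long context first, then batch, then tier) are assumptions made for illustration, not something the comparison itself prescribes.

```python
def default_choice(simple_task: bool, context_tokens: int, realtime: bool = True) -> str:
    """Return the default suggested by the TL;DR rule.

    The precedence of the checks below is an assumption of this sketch.
    """
    if context_tokens > 200_000:
        # Only Gemini is listed with a context window above 200K tokens.
        return "Gemini"
    if not realtime:
        return "best frontier batch rate at your quality bar"
    if simple_task:
        return "GPT-4o-mini or Gemini Flash"
    return "GPT-4o or Claude Sonnet (pick on your eval)"
```

Returning labels rather than concrete model IDs keeps the sketch independent of any provider SDK; a real router would map each label to the model names it actually deploys.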
Frontier tier: GPT-4o vs Claude 3.5 Sonnet vs Gemini 1.5 Pro
The frontier tier is where quality matters most and price gaps are real. At mid-2026 rates, here are the per-million-token prices for the flagship models from each provider.
| Model | Input ($/M tok) | Output ($/M tok) | Context window |
|---|---|---|---|
| OpenAI GPT-4o | $2.50 | $10.00 | 128K |
| Anthropic Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Google Gemini 1.5 Pro | $1.25 | $5.00 | 2M |
On raw price, Gemini Pro is the clear winner — roughly half of GPT-4o and a third of Claude Sonnet on output tokens. But raw price is only half the story. Claude Sonnet frequently outperforms GPT-4o and Gemini on coding, reasoning, and long-form writing benchmarks; if it produces a correct answer in one pass where GPT-4o needs two, the effective cost per correct answer can favor Claude despite the higher per-token rate. Always benchmark on your own eval, not on published model cards.
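To make "effective cost per correct answer" concrete, the sketch below divides the cost of one call by the pass rate measured on your own eval. The prices come from the frontier table; the token counts and pass rates in the usage lines are hypothetical placeholders, and the model assumes each retry is independent and costs the same as the first attempt.

```python
# USD per million tokens (input, output), copied from the frontier table.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-1.5-pro": (1.25, 5.00),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_correct_answer(model: str, input_tokens: int, output_tokens: int,
                            pass_rate: float) -> float:
    """Expected spend per correct answer, assuming independent retries."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_call(model, input_tokens, output_tokens) / pass_rate


# Hypothetical pass rates: substitute the numbers from your own eval.
for model, pass_rate in [("gpt-4o", 0.5), ("claude-3.5-sonnet", 1.0)]:
    print(model, cost_per_correct_answer(model, 2_000, 500, pass_rate))
```

With these placeholder pass rates the pricier model comes out cheaper per correct answer, which is the effect the paragraph describes. With equal pass rates, the ranking reverts to raw price.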
Provider pricing shifts quarterly — often downward as competition intensifies. The numbers here are a mid-2026 snapshot. Before any capacity commitment, pull live rates from each provider's pricing page. A gateway like LiteLLM can track and re-route automatically as prices change.
Budget tier: GPT-4o-mini vs Claude Haiku vs Gemini Flash
The budget tier is where most of your traffic should land after model routing. At the list prices in this article, each budget model costs roughly 12-17x less than the same provider's frontier model, and these models handle classification, extraction, summarization, and FAQ with adequate quality.
| Model | Input ($/M tok) | Output ($/M tok) | Best for |
|---|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 | Broadest ecosystem, tool use |
| Claude 3 Haiku | $0.25 | $1.25 | Fast, slightly better reasoning |
| Gemini 1.5 Flash | $0.075 | $0.30 | Cheapest, huge context (1M) |
Gemini Flash is the absolute cheapest, but GPT-4o-mini has the best tool-calling and structured-output support, which matters for agent workloads. Claude Haiku is the most expensive of the three but often edges the others on raw response quality for its tier. The right choice depends on what your downstream pipeline expects.
Prompt caching: the 75-90% discount you should already be using
All three providers now offer prompt caching — if your prompt shares a prefix with a recent call (a system prompt, a shared document, few-shot examples), the cached prefix tokens are billed at a steep discount. For agent and RAG workloads that resend the same system prompt every turn, this is the single biggest cost lever after model selection.
| Provider | Cached input price | vs normal input | Min cache prefix |
|---|---|---|---|
| OpenAI (GPT-4o) | ~$0.3125/M | 87.5% off | 1024 tokens |
| Anthropic (Claude Sonnet) | ~$0.30/M | 90% off | 1024 tokens |
| Google (Gemini Pro) | ~$0.3125/M | 75% off | 2048 tokens |
The caveat is that each provider's cache has a minimum prefix length (1024-2048 tokens) and a TTL (5-60 minutes). Short prompts see no benefit; long, repeated prefixes see huge savings. Anthropic also charges a small write premium for the initial caching, but the net is still a large win on multi-turn agent traffic.
Put stable content (system prompt, tool schemas, few-shot examples) at the start of the prompt and dynamic content (user query, retrieved context) at the end. This maximizes the cacheable prefix. A 2000-token system prompt cached across 1000 daily turns moves 2M input tokens per day from the normal rate to the cached rate. At the GPT-4o rates above ($2.50/M normal, ~$0.3125/M cached), that saves about $4.38/day, or roughly $131/month, from one prompt-structure change.
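The arithmetic behind that estimate is sketched below. Cached tokens are still billed, just at the cached rate, so the saving is the difference between the two rates multiplied by the cached token volume. The helper names are hypothetical, and the calculation assumes every call hits the cache and ignores any cache-write premium.

```python
def order_prompt(stable_parts, dynamic_parts):
    """Join stable content first so the shared, cacheable prefix is as long as possible."""
    return "\n\n".join(list(stable_parts) + list(dynamic_parts))


def daily_cache_saving(prefix_tokens, calls_per_day, input_price, cached_price):
    """USD saved per day; prices are USD per million tokens.

    Assumes every call is a cache hit and there is no write premium.
    """
    cached_tokens = prefix_tokens * calls_per_day
    return cached_tokens * (input_price - cached_price) / 1_000_000


# GPT-4o figures from the caching table: $2.50/M normal, ~$0.3125/M cached.
saving = daily_cache_saving(2_000, 1_000, input_price=2.50, cached_price=0.3125)
print(f"saving per day: ${saving:.2f}, per 30 days: ${saving * 30:.2f}")
```

Real savings will be somewhat lower because the first call after each TTL expiry misses the cache.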
Batch API: 50% off for non-real-time workloads
If your workload tolerates latency (eval runs, content generation, data labeling, document processing), the batch APIs halve your cost. All three providers offer this with a 24-hour SLA.
| Provider | Batch API | Discount | SLA |
|---|---|---|---|
| OpenAI | Batch API | 50% | Within 24 hours |
| Anthropic | Message Batches | 50% | Within 24 hours |
| Google | Batch API (Gemini) | 50% | Within 24 hours |
The batch discount can stack with prompt caching: a batch job with a shared system prompt gets the 50% batch discount and, when the prefix hits the cache, the 75-90% cache discount on that prefix. Cache hits inside a batch are not guaranteed, so check each provider's documentation before budgeting on the combined rate. For eval harnesses running thousands of test prompts with a shared instruction, this can cut eval cost by 80%+. See the cost optimization playbook for where batch fits in the full stack.
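A rough way to see where the "80%+" figure comes from is to price one eval run at list rates and again with both discounts applied. The sketch below uses the GPT-4o rates from the tables above. The function name, the token counts, and the assumption that the batch discount multiplies on top of the cached rate with a 100% hit rate are all illustrative, so treat the result as an upper bound.

```python
def eval_cost(n_prompts, prefix_tokens, unique_tokens, output_tokens,
              input_price, cached_price, output_price,
              batch_discount=0.0, prefix_cached=False):
    """Total USD for an eval run; prices are USD per million tokens."""
    prefix_price = cached_price if prefix_cached else input_price
    per_prompt = (prefix_tokens * prefix_price
                  + unique_tokens * input_price
                  + output_tokens * output_price) / 1_000_000
    # Assumption: the batch discount applies to the whole request.
    return n_prompts * per_prompt * (1 - batch_discount)


# Hypothetical run: 5000 prompts, 3000-token shared instruction,
# 200 unique input tokens and 100 output tokens each.
shape = dict(n_prompts=5_000, prefix_tokens=3_000, unique_tokens=200, output_tokens=100,
             input_price=2.50, cached_price=0.3125, output_price=10.00)
list_cost = eval_cost(**shape)
stacked_cost = eval_cost(**shape, batch_discount=0.5, prefix_cached=True)
print(f"list ${list_cost:.2f}, stacked ${stacked_cost:.2f}, "
      f"saving {1 - stacked_cost / list_cost:.0%}")
```

The saving is largest when the shared prefix dominates the prompt. With short prefixes or long outputs it falls back toward the plain 50% batch discount.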
Long context: who handles 200K-2M tokens?
Context length is where the providers diverge sharply. Gemini 1.5 Pro supports up to 2 million tokens, Claude Sonnet supports 200K, and GPT-4o caps at 128K. For workloads that require loading entire codebases, long documents, or hours of transcripts into context, Gemini is often the only option — and its per-token price for long context is also the lowest.
Beware that long-context quality degrades on all models — the "lost in the middle" problem. Just because a model accepts 2M tokens does not mean it attends to all of them equally. Run your eval at the context lengths you actually use; do not assume a 2M context window gives you 2M tokens of reliable recall.
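One common way to run that check is a "needle in a haystack" probe: plant a known fact at different depths of a long prompt and see whether the model can retrieve it. The sketch below is provider-agnostic. `ask_model` is a hypothetical callable that you would wire to whichever client you use, and the filler text, needle, and depths are placeholders.

```python
def build_needle_prompt(filler_sentences, needle, depth):
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    index = round(depth * len(filler_sentences))
    parts = filler_sentences[:index] + [needle] + filler_sentences[index:]
    return " ".join(parts)


def recall_by_depth(ask_model, filler_sentences, needle, question, expected, depths):
    """Return {depth: True/False} for whether the expected answer was recovered."""
    results = {}
    for depth in depths:
        prompt = build_needle_prompt(filler_sentences, needle, depth) + "\n\n" + question
        answer = ask_model(prompt)  # hypothetical callable: prompt -> response text
        results[depth] = expected.lower() in answer.lower()
    return results
```

Run it with filler sized to the context lengths you actually use, and repeat each depth several times, since a single probe per depth is noisy.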
The case for multi-provider
Relying on a single provider is both a reliability risk (outages, rate limits) and a cost risk (you cannot benefit from price competition). A gateway like LiteLLM or Portkey lets you route by cost, fall back across providers on errors, and A/B-test quality continuously. The typical production setup uses Gemini Flash or GPT-4o-mini for bulk traffic, GPT-4o or Claude Sonnet for quality-sensitive traffic, and Gemini Pro for long-context work — all behind a single gateway endpoint.
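The fallback behavior a gateway provides can be illustrated without any SDK. The sketch below tries providers in order and moves on when one raises. It is not LiteLLM's or Portkey's API: `call_with_fallback` and the provider callables are hypothetical stand-ins for real client calls.

```python
from typing import Callable, Sequence, Tuple


def call_with_fallback(prompt: str,
                       providers: Sequence[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:
            # A production setup would catch only timeouts, rate limits, and 5xx errors.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Ordering the list from cheapest to most expensive gives cost-based routing with fallback in the same loop. A real gateway adds retries, timeouts, and usage logging on top of this.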
At scale (typically >$50K/month committed spend), all three providers offer volume discounts off the list price — often 15-30%. If you are at that scale, do not pay list price. Use the gateway's usage data as leverage in negotiation and re-quote annually.
FAQ
Which is cheaper: OpenAI, Anthropic, or Gemini?
It depends on the tier. Frontier: Gemini Pro is cheapest (~$1.25/$5.00), then GPT-4o ($2.50/$10.00), then Claude Sonnet ($3.00/$15.00). Budget: Gemini Flash ($0.075/$0.30) is cheapest, then GPT-4o-mini ($0.15/$0.60). For 1M+ token contexts, Gemini is the only viable frontier option.
How much does prompt caching save?
Prompt caching cuts the cost of cached prefix tokens by 75-90% (OpenAI: 87.5% off, Anthropic: 90% off, Google: 75% off). For workloads with repeated system prompts (agents, RAG), blended input cost drops 40-60%. Requires a minimum cacheable prefix (1024-2048 tokens).
How much does the batch API save?
50% off list price, across all three providers, with a 24-hour completion SLA. It stacks with prompt caching for compounded savings. Use it for evals, content generation, and any non-real-time workload.
Should I use multiple providers?
Yes — for reliability and cost optimization. A gateway (LiteLLM, Portkey) lets you route by cost, fall back on outages, and benchmark quality continuously. The typical setup: budget model for bulk, frontier model for quality, Gemini Pro for long context.
Related deep dives
- LLM Cost Optimization Playbook — the full 5-lever stack that builds on these prices
- Model Routing — how to send traffic to the cheapest viable tier
- Semantic Caching — eliminates the call entirely on repeated prompts
- LiteLLM vs Portkey vs OpenRouter — the gateways that enable multi-provider routing
Sources
- OpenAI pricing page, "GPT-4o and GPT-4o-mini API pricing," 2026
- Anthropic pricing page, "Claude 3.5 Sonnet and Haiku API pricing," 2026
- Google AI Studio / Vertex AI pricing, "Gemini 1.5 Pro and Flash API pricing," 2026
- OpenAI, "Prompt Caching" documentation, 2024-2025
- Anthropic, "Prompt Caching and Message Batches" documentation, 2024-2025
- Artificial Analysis, "LLM Performance and Pricing Leaderboard," 2026 (independent cross-provider comparison)
Pricing changes frequently and varies by region, commitment level, and enterprise agreement. All figures are mid-2026 list prices from public provider pages. Verify current rates before any procurement decision.