OpenAI vs Anthropic vs Gemini Pricing: 2026 Per-Token Cost Comparison

Q: Which is cheaper: OpenAI, Anthropic, or Gemini?

It depends on the tier. For frontier models, Google's Gemini 1.5 Pro is typically cheapest at ~$1.25/$5.00 per million input/output tokens, followed by GPT-4o at $2.50/$10.00 and Claude 3.5 Sonnet at $3.00/$15.00. For budget tiers, GPT-4o-mini ($0.15/$0.60) and Gemini 1.5 Flash ($0.075/$0.30) are the cheapest. For 1M+ token contexts, Gemini is the only viable option at most price points.

Q: How much does prompt caching save?

Prompt caching (offered by all three providers) cuts the cost of cached prefix tokens by 50-80%. OpenAI charges ~10% of normal input price for cached tokens; Anthropic charges ~10% (90% discount); Google charges ~25% of normal for cached tokens. For workloads with repeated system prompts or shared context (agents, RAG), this reduces effective input cost dramatically — often by 40-60% blended.

Q: How much does the batch API save?

All three providers offer a batch API (process within 24 hours) at a 50% discount. OpenAI Batch, Anthropic Message Batches, and Gemini Batch API all halve per-token cost for non-real-time workloads. If your use case tolerates latency (background processing, evals, content generation), batch is the single biggest cost lever after model selection.

Q: Should I use multiple providers?

Yes, for production reliability and cost optimization. A gateway (LiteLLM, Portkey) lets you route by cost, fall back on rate limits or outages, and benchmark providers continuously. Many teams use Gemini Flash for bulk work, GPT-4o-mini for mid-tier, and reserve Claude Sonnet or GPT-4o for the prompts that need frontier quality.

Per-token prices across the three major providers, including the discounts that matter most: prompt caching (cuts cached-prefix cost 75-90%) and batch APIs (halves cost for non-real-time work). The cheapest provider depends on your tier — frontier, mid, or budget — and your context length. Here is the full breakdown and the decision rule for which provider to default to.

By the LLM Academy team · Reviewed July 2026 · Prices from official provider pages as of mid-2026; always verify current rates

TL;DR — who is cheapest?

There is no single cheapest provider — it depends on the tier. Google Gemini wins on absolute price at most tiers and is the only viable option for 1M+ token contexts. OpenAI GPT-4o-mini is the cheapest budget model with the broadest ecosystem. Anthropic Claude is rarely the cheapest but is preferred for nuanced reasoning, long-form writing, and coding tasks where quality matters more than raw cost.

Decision rule

Budget tier (simple tasks) → GPT-4o-mini or Gemini Flash. Frontier quality → GPT-4o or Claude Sonnet (pick by quality on your eval, not price — they're close). Long context (>200k) → Gemini (cheapest per-token and highest context window). Batch / background → whichever provider has the best frontier-batch rate at your quality bar.

Frontier tier: GPT-4o vs Claude 3.5 Sonnet vs Gemini 1.5 Pro

The frontier tier is where quality matters most and price gaps are real. At mid-2026 rates, here are the per-million-token prices for the flagship models from each provider.

Model	Input ($/M tok)	Output ($/M tok)	Context window
OpenAI GPT-4o	$2.50	$10.00	128K
Anthropic Claude 3.5 Sonnet	$3.00	$15.00	200K
Google Gemini 1.5 Pro	$1.25	$5.00	2M

On raw price, Gemini Pro is the clear winner — roughly half of GPT-4o and a third of Claude Sonnet on output tokens. But raw price is only half the story. Claude Sonnet frequently outperforms GPT-4o and Gemini on coding, reasoning, and long-form writing benchmarks; if it produces a correct answer in one pass where GPT-4o needs two, the effective cost per correct answer can favor Claude despite the higher per-token rate. Always benchmark on your own eval, not on published model cards.

Prices change frequently

Provider pricing shifts quarterly — often downward as competition intensifies. The numbers here are a mid-2026 snapshot. Before any capacity commitment, pull live rates from each provider's pricing page. A gateway like LiteLLM can track and re-route automatically as prices change.

Budget tier: GPT-4o-mini vs Claude Haiku vs Gemini Flash

The budget tier is where most of your traffic should land after model routing. These models are 10-30x cheaper than the frontier tier and handle classification, extraction, summarization, and FAQ with adequate quality.

Model	Input ($/M tok)	Output ($/M tok)	Best for
GPT-4o-mini	$0.15	$0.60	Broadest ecosystem, tool use
Claude 3 Haiku	$0.25	$1.25	Fast, slightly better reasoning
Gemini 1.5 Flash	$0.075	$0.30	Cheapest, huge context (1M)

Gemini Flash is the absolute cheapest, but GPT-4o-mini has the best tool-calling and structured-output support, which matters for agent workloads. Claude Haiku sits in between on price but often edges the others on raw response quality for its tier. The right choice depends on what your downstream pipeline expects.

Prompt caching: the 75-90% discount you should already be using

All three providers now offer prompt caching — if your prompt shares a prefix with a recent call (a system prompt, a shared document, few-shot examples), the cached prefix tokens are billed at a steep discount. For agent and RAG workloads that resend the same system prompt every turn, this is the single biggest cost lever after model selection.

Provider	Cached input price	vs normal input	Min cache prefix
OpenAI (GPT-4o)	~$0.3125/M	87.5% off	1024 tokens
Anthropic (Claude Sonnet)	~$0.30/M	90% off	1024 tokens
Google (Gemini Pro)	~$0.3125/M	75% off	2048 tokens

The caveat is that each provider's cache has a minimum prefix length (1024-2048 tokens) and a TTL (5-60 minutes). Short prompts see no benefit; long, repeated prefixes see huge savings. Anthropic also charges a small write premium for the initial caching, but the net is still a large win on multi-turn agent traffic.

Structure prompts for cache hits

Put stable content (system prompt, tool schemas, few-shot examples) at the start of the prompt and dynamic content (user query, retrieved context) at the end. This maximizes the cacheable prefix. A 2000-token system prompt cached across 1000 daily turns saves 2M input tokens per day — at GPT-4o rates, that is ~$5/day or ~$150/month from one prompt-structure change.

Batch API: 50% off for non-real-time workloads

If your workload tolerates latency (eval runs, content generation, data labeling, document processing), the batch APIs halve your cost. All three providers offer this with a 24-hour SLA.

Provider	Batch API	Discount	SLA
OpenAI	Batch API	50%	Within 24 hours
Anthropic	Message Batches	50%	Within 24 hours
Google	Batch API (Gemini)	50%	Within 24 hours

The batch API stacks with prompt caching — a batch job with a shared system prompt gets both the 50% batch discount and the 75-90% cache discount on the prefix. For eval harnesses running thousands of test prompts with a shared instruction, this can cut eval cost by 80%+. See the cost optimization playbook for where batch fits in the full stack.

Long context: who handles 200K-2M tokens?

Context length is where the providers diverge sharply. Gemini 1.5 Pro supports up to 2 million tokens, Claude Sonnet supports 200K, and GPT-4o caps at 128K. For workloads that require loading entire codebases, long documents, or hours of transcripts into context, Gemini is often the only option — and its per-token price for long context is also the lowest.

# Effective cost to process a 500K-token document (input only) Gemini 1.5 Pro: 500K × $1.25/M = $0.625 Claude Sonnet: (not supported — 200K max) GPT-4o: (not supported — 128K max) # Gemini is the only frontier option above 200K tokens

Beware that long-context quality degrades on all models — the "lost in the middle" problem. Just because a model accepts 2M tokens does not mean it attends to all of them equally. Run your eval at the context lengths you actually use; do not assume a 2M context window gives you 2M tokens of reliable recall.

The case for multi-provider

Relying on a single provider is both a reliability risk (outages, rate limits) and a cost risk (you cannot benefit from price competition). A gateway like LiteLLM or Portkey lets you route by cost, fall back across providers on errors, and A/B-test quality continuously. The typical production setup uses Gemini Flash or GPT-4o-mini for bulk traffic, GPT-4o or Claude Sonnet for quality-sensitive traffic, and Gemini Pro for long-context work — all behind a single gateway endpoint.

Negotiated rates

At scale (typically >$50K/month committed spend), all three providers offer volume discounts off the list price — often 15-30%. If you are at that scale, do not pay list price. Use the gateway's usage data as leverage in negotiation and re-quote annually.

FAQ

Which is cheaper: OpenAI, Anthropic, or Gemini?

It depends on the tier. Frontier: Gemini Pro is cheapest (~$1.25/$5.00), then GPT-4o ($2.50/$10.00), then Claude Sonnet ($3.00/$15.00). Budget: Gemini Flash ($0.075/$0.30) is cheapest, then GPT-4o-mini ($0.15/$0.60). For 1M+ token contexts, Gemini is the only viable frontier option.

How much does prompt caching save?

Prompt caching cuts the cost of cached prefix tokens by 75-90% (OpenAI: 87.5% off, Anthropic: 90% off, Google: 75% off). For workloads with repeated system prompts (agents, RAG), blended input cost drops 40-60%. Requires a minimum cacheable prefix (1024-2048 tokens).

How much does the batch API save?

50% off list price, across all three providers, with a 24-hour completion SLA. It stacks with prompt caching for compounded savings. Use it for evals, content generation, and any non-real-time workload.

Should I use multiple providers?

Yes — for reliability and cost optimization. A gateway (LiteLLM, Portkey) lets you route by cost, fall back on outages, and benchmark quality continuously. The typical setup: budget model for bulk, frontier model for quality, Gemini Pro for long context.

Related deep dives

LLM Cost Optimization Playbook — the full 5-lever stack that builds on these prices
Model Routing — how to send traffic to the cheapest viable tier
Semantic Caching — eliminates the call entirely on repeated prompts
LiteLLM vs Portkey vs OpenRouter — the gateways that enable multi-provider routing

Sources

OpenAI pricing page, "GPT-4o and GPT-4o-mini API pricing," 2026
Anthropic pricing page, "Claude 3.5 Sonnet and Haiku API pricing," 2026
Google AI Studio / Vertex AI pricing, "Gemini 1.5 Pro and Flash API pricing," 2026
OpenAI, "Prompt Caching" documentation, 2024-2025
Anthropic, "Prompt Caching and Message Batches" documentation, 2024-2025
Artificial Analysis, "LLM Performance and Pricing Leaderboard," 2026 (independent cross-provider comparison)

Pricing changes frequently and varies by region, commitment level, and enterprise agreement. All figures are mid-2026 list prices from public provider pages. Verify current rates before any procurement decision.