OpenAI prices cached input at 10% of the standard input rate in 2026: GPT-5.5 cached input is $0.50/Mtoken versus $5.00 uncached, and GPT-5.4 is $0.25 versus $2.50. Caching is automatic (no code change, prefix-based) for qualifying prompts. Anthropic's model is similar on reads (cache hit at 0.1x input) but charges a separate cache write (1.25x for the 5-minute tier). For a reused 50K-token system prompt, caching pays off after the first reuse. Model the break-even with the Token Cost Optimizer.
TL;DR
| Provider | Cached input | Standard input | Discount | Write cost | Mechanism |
|---|---|---|---|---|---|
| OpenAI GPT-5.5 | $0.50 / Mtok | $5.00 / Mtok | 90% | none separate | Automatic, prefix-based |
| OpenAI GPT-5.4 | $0.25 / Mtok | $2.50 / Mtok | 90% | none separate | Automatic, prefix-based |
| Anthropic Sonnet 4.6 | $0.30 / Mtok | $3.00 / Mtok | 90% | $3.75 / Mtok (5m write) | Explicit or automatic cache_control |
| Anthropic Opus 4.7 | $0.50 / Mtok | $5.00 / Mtok | 90% | $6.25 / Mtok (5m write) | Explicit or automatic cache_control |
All figures verified 2026-05-25 against the official OpenAI and Anthropic pricing pages. Both providers give a 90% read discount; the structural difference is the cache-write charge.
OpenAI cached-input pricing
OpenAI's pricing page lists a dedicated "Cached input" column. Verified 2026-05-25:
- GPT-5.5: input $5.00, cached input $0.50, output $30.00 per Mtoken.
- GPT-5.4: input $2.50, cached input $0.25, output $15.00 per Mtoken.
- GPT-5.4-mini: input $0.75, cached input $0.075, output $4.50 per Mtoken.
- GPT-5.4-nano: input $0.20, cached input $0.02, output $1.25 per Mtoken.
The pattern is consistent: cached input is exactly 10% of standard input across the GPT-5 family, a flat 90% discount on the reused prefix. There is no separate cache-write line item; you pay standard input the first time and cached input on subsequent hits.
Is OpenAI caching automatic?
Yes. OpenAI prompt caching is automatic and requires no code change: when a request shares a long prefix with a recent prior request, the matching prefix is served from cache at the cached-input rate. It is prefix-based, so the cacheable content (system prompt, instructions, few-shot examples, a pinned document) should sit at the front of the prompt with the variable content after it. Cache entries are short-lived; OpenAI's caches persist on the order of minutes, so caching helps bursty repeated calls, not requests spread hours apart. Confirm the current minimum-prefix-length and TTL details on the official caching guide before tuning.
Anthropic for comparison
Anthropic's prompt caching reaches the same 90% read discount but structures it differently. Verified 2026-05-25:
- Cache read (hit): 0.1x base input. Sonnet 4.6 reads at $0.30/Mtok, Opus 4.7 at $0.50/Mtok, Haiku 4.5 at $0.10/Mtok.
- Cache write: a separate charge. The 5-minute write is 1.25x base input (Sonnet $3.75, Opus $6.25), the 1-hour write is 2x.
So Anthropic makes the write cost explicit, while OpenAI folds the write into the standard first-call input price. The economics work out close, but Anthropic's model rewards careful breakpoint placement and gives an explicit 1-hour cache option that OpenAI's automatic short-lived cache does not advertise.
The break-even, computed live
The Token Cost Optimizer computes the per-call and monthly cost with caching applied to the reused prefix. The scenario below models a 50K-token reused system prompt at a 90% cache-hit rate on Claude Sonnet 4.6 across 200 calls/day, the canonical finance-research-loop setup where the same instructions and schema repeat on every call. The verified output block at the foot of the page is computed live from the shipped engine bundle.
The break-even logic: with a 90% read discount, caching beats no-cache after a single reuse for the short-lived (5-minute / OpenAI-automatic) tier. For Anthropic's 1-hour write (2x), it pays off after two reads. For a system prompt reused hundreds of times a day, the discount is effectively the full 90% on the cached portion. The OpenAI side of the engine does not model the cache rate (the bundle lacks OpenAI cache fields), so use the verified OpenAI cached-input prices in the table above for GPT-5 family math.
Break-even reuse count for a 50K-token prompt
Concretely, for a 50K-token cacheable prefix:
- Uncached: every call pays full input on all 50K tokens.
- OpenAI automatic: first call full input, each subsequent in-window call pays 10% on the 50K prefix. Net win from the second call onward.
- Anthropic 5-minute: first call pays 1.25x on the write, each hit pays 0.1x. Win after one read.
- Anthropic 1-hour: first call pays 2x on the write, each hit pays 0.1x. Win after two reads.
For any workload that reuses the prefix more than twice inside the cache window, caching is a clear saving. The only case where it does not help is single-shot prompts with no shared prefix.
Decision guidance
- Bursty repeated calls with a shared prefix (finance research loop): caching is a near-free 90% discount on the reused portion; enable it.
- OpenAI stack: caching is automatic; just put stable content at the front of the prompt.
- Anthropic stack: place cache_control breakpoints deliberately; use the 1-hour write if reuse spans more than a few minutes.
- Single-shot, unique prompts: caching does not help; do not over-engineer for it.
Related in this series
- Best LLM for Financial Analysis 2026: the task-tiered model picks.
- Cheapest LLM for SEC Filings 2026: caching the repeated filing boilerplate.
- Claude vs GPT-5 vs Gemini for Financial Analysis 2026: the three-way pricing head-to-head.
- Prompt Caching Economics for Finance: the methodology behind cache break-even.
Connects to
- Token Cost Optimizer: the engine behind this page's break-even block.
- Model Selector for Finance: which model to cache against.
- Caching Strategies for LLM Pipelines 2026: cache-architecture patterns.
References
- OpenAI. "API Pricing." developers.openai.com/api/docs/pricing, verified 2026-05-25 (GPT-5.5 cached $0.50 / input $5.00; GPT-5.4 cached $0.25 / input $2.50).
- OpenAI. "Prompt caching guide." platform.openai.com/docs/guides/prompt-caching, accessed 2026-05-25 (automatic, prefix-based).
- Anthropic. "Pricing — prompt caching." platform.claude.com/docs/en/about-claude/pricing, verified 2026-05-25 (read 0.1x, 5m write 1.25x, 1h write 2x; Sonnet read $0.30, Opus read $0.50).
Verified engine output
Show the recompute-verified inputs and outputs
| model_id | claude-sonnet-4-6 |
|---|---|
| input_tokens_per_call | 50000 |
| output_tokens_per_call | 1500 |
| calls_per_idea | 1 |
| retry_rate | 0 |
| ideas_per_day | 200 |
| validation_rate | 0.2 |
| cache_hit_rate | 0.9 |
| model › id | claude-sonnet-4-6 |
|---|---|
| model › provider | anthropic |
| model › name | Claude Sonnet 4.6 |
| model › input usd per mtoken | 3 |
| model › output usd per mtoken | 15 |
| model › cache write usd per mtoken | 3.75 |
| model › cache read usd per mtoken | 0.3 |
| model › context window | 500000 |
| model › notes | Best price/performance for bulk research loops. |
| effective cost per call | 0.051 |
| cost per idea | 0.051 |
| cost per validated trade | 0.25499999999999995 |
| cost per day | 10.2 |
| cost per month | 306 |
| cost per year | 3722.9999999999995 |
Computed live at build time.
Frequently asked questions
- What is OpenAI's prompt caching discount in 2026?
- Cached input is 10% of the standard input rate (a 90% discount) across the GPT-5 family: GPT-5.5 cached input is $0.50/Mtok vs $5.00 uncached; GPT-5.4 is $0.25 vs $2.50 (verified 2026-05-25).
- Is OpenAI prompt caching automatic?
- Yes. It is automatic and prefix-based, requiring no code change. Put stable content (system prompt, instructions, pinned documents) at the front so the variable content does not break the cacheable prefix.
- How does OpenAI caching compare to Anthropic's?
- Both give a 90% read discount. OpenAI has no separate cache-write charge (first call is standard input); Anthropic charges an explicit cache write (1.25x for 5-minute, 2x for 1-hour) and offers a longer 1-hour cache tier.
- When does prompt caching pay off?
- After the first reuse for the short-lived OpenAI/Anthropic-5-minute tiers, and after two reads for Anthropic's 1-hour write. For a system prompt reused many times a day, the saving is effectively the full 90% on the cached portion.