Prompt Cache Break-Even Formula
Prompt caching stores a prompt prefix so that repeated calls pay a discounted cached-read rate instead of the full input rate, but writing the cache costs a premium over the normal input rate. The break-even is the number of cache reads needed for the per-read savings to repay the one-time write premium. Above it, caching is cheaper; below it, the write premium is not fully recovered.
Formula
Copy the exact expression or work through it step by step below.
Reads_breakeven = (P_write - P_in) / (P_in - P_read)
where P_write = cache-write price, P_in = standard input price, P_read = cached-read price (all per token over the cached prefix).
Variables
P_write
Cache-write price
Per-token price to write the prefix into the cache, charged each time the prefix is written: when the cache is first created and again whenever it is re-created after expiring. It is a premium over the standard input price, commonly around 1.25x that price.
P_in
Standard input price
Per-token price you would pay for the prefix on every call without caching. It is the baseline the cache is competing against.
P_read
Cached-read price
Discounted per-token price for reading the cached prefix on subsequent calls, often 0.1x the standard input price. The gap between P_in and P_read is the per-read saving.
Reads_breakeven
Break-even read count
The number of cached reads at which cumulative savings equal the write premium. Reuse the cached prefix more times than this within its time-to-live and caching is net cheaper.
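The formula and variables above can be sketched as a small Python function. This is a minimal illustration, not a provider API: the function name `breakeven_reads` and its guard against a non-positive per-read saving are assumptions added here, and the prices are illustrative values per million tokens.

```python
def breakeven_reads(p_write: float, p_in: float, p_read: float) -> float:
    """Cached reads needed for per-read savings to repay the write premium."""
    saving_per_read = p_in - p_read
    if saving_per_read <= 0:
        # Without a discount on cached reads there is nothing to repay the premium.
        raise ValueError("cached-read price must be below the standard input price")
    # Write premium divided by per-read saving; any common price unit works.
    return (p_write - p_in) / saving_per_read


# Illustrative prices per million tokens (hypothetical, not a quoted rate card).
n = breakeven_reads(p_write=6.25, p_in=5.00, p_read=0.50)
print(f"{n:.3f}")  # 0.278
```

Because the result is a ratio of price differences, it depends only on the write and read multipliers: a 1.25x write and 0.1x read give the same 0.278 at any standard input price.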
Step By Step
1. Identify the three per-token prices for the cached prefix: write, standard input, and cached read.
   P_write = 6.25, P_in = 5.00, P_read = 0.50 (per million tokens).
2. Compute the write premium, the extra paid once to populate the cache.
   P_write - P_in = 6.25 - 5.00 = 1.25 per million.
3. Compute the per-read saving, what each cached read saves versus the standard rate.
   P_in - P_read = 5.00 - 0.50 = 4.50 per million.
4. Divide the write premium by the per-read saving to get the break-even read count.
   1.25 / 4.50 = 0.278, so the very first cached read already repays the write premium.
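One way to sanity-check the steps above is to compare cumulative prefix cost with and without caching for a few read counts. The sketch below assumes a single cache write followed by cached reads that all land before expiry; the helper names `cost_with_cache` and `cost_without_cache` are hypothetical.

```python
def cost_with_cache(reads: int, p_write: float, p_read: float) -> float:
    """One cache write followed by `reads` cached reads."""
    return p_write + reads * p_read


def cost_without_cache(reads: int, p_in: float) -> float:
    """The same 1 + `reads` calls, all billed at the standard input rate."""
    return (1 + reads) * p_in


# Prices per million tokens, taken from the steps above.
P_WRITE, P_IN, P_READ = 6.25, 5.00, 0.50

for reads in range(4):
    cached = cost_with_cache(reads, P_WRITE, P_READ)
    uncached = cost_without_cache(reads, P_IN)
    # A positive difference means caching is cheaper at this read count.
    print(reads, cached, uncached, uncached - cached)
```

With zero reads the difference is -1.25 (the unrecovered write premium); with one read it is already +3.25, consistent with a break-even below one read.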
Worked Example
Caching a large fixed system prompt and document context reused across queries
Cache-write price (per 1M tokens): 6.25
Standard input price (per 1M tokens): 5.00
Cached-read price (per 1M tokens): 0.50
Write premium = 6.25 - 5.00 = 1.25. Per-read saving = 5.00 - 0.50 = 4.50. Break-even reads = 1.25 / 4.50 = 0.278. Reads come in whole numbers, so the first cached read already leaves you 4.50 - 1.25 = 3.25 per million prefix tokens ahead of never caching.
Break-even is well under one read: with a typical 1.25x write and 0.1x read multiplier, caching pays off on the very first reuse. The practical constraint is not the break-even count but the cache time-to-live: if the prefix is not reused before it expires (often a few minutes), you pay the write premium with no offsetting reads. Cache only prefixes you will hit again within the TTL.
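The TTL effect can be illustrated with a rough simulation. The sketch below is built on stated assumptions rather than any provider's documented billing: a call arriving within `ttl` seconds of the previous call counts as a cache hit, every call restarts the TTL window, and a miss pays the full write price again. The function name `cached_cost`, the 300-second TTL, and the call timings are all hypothetical.

```python
from typing import Sequence


def cached_cost(call_times: Sequence[float], ttl: float,
                p_write: float, p_read: float) -> float:
    """Total prefix cost (per million tokens) for calls at the given times in seconds.

    Assumption: a call within `ttl` seconds of the previous call is a cache hit,
    and every call, hit or write, restarts the TTL window.
    """
    total = 0.0
    last_call = None
    for t in sorted(call_times):
        if last_call is not None and t - last_call <= ttl:
            total += p_read   # cache still warm: discounted read
        else:
            total += p_write  # cache cold or expired: pay the write price again
        last_call = t
    return total


P_WRITE, P_IN, P_READ = 6.25, 5.00, 0.50  # illustrative prices per million tokens
TTL = 300.0                               # hypothetical 5-minute TTL

busy = [0, 60, 120, 180]       # every call lands inside the TTL
sparse = [0, 600, 1200, 1800]  # every call arrives after the cache expired

for name, times in (("busy", busy), ("sparse", sparse)):
    # Compare the cached total with the same calls at the standard input rate.
    print(name, cached_cost(times, TTL, P_WRITE, P_READ), len(times) * P_IN)
```

Under these assumptions the busy pattern costs 7.75 against 20.0 uncached, while the sparse pattern costs 25.0 against 20.0: every call pays the write premium and caching becomes a net loss.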
Sources & References
- Prompt Caching with Claude — Anthropic
- Prompt Caching (OpenAI API) — OpenAI