Prompt Cache Break-Even Formula

Prompt caching stores a prompt prefix so repeated calls pay a discounted cached-read rate instead of the full input rate, but writing the cache costs a premium over the normal input rate. The break-even is the number of cache reads needed for the per-read savings to repay the one-time write premium. Above it, caching is cheaper; below it, the write premium is wasted.

Published May 26, 2026

By AI Fin Hub Research · AI Fin Hub Team

Formula

Copy the exact expression or work through it step by step below.

 Reads_breakeven = (P_write - P_in) / (P_in - P_read)

where P_write = cache-write price, P_in = standard input price, P_read = cached-read price (all per token over the cached prefix)
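The expression can be recovered by equating cumulative cost with and without caching. The short derivation below is a sketch under one assumption that the page does not state explicitly: the cache is written once on the first call and stays alive for all N subsequent reads. N is an illustrative symbol for the number of cached reads.

```latex
\begin{align*}
\underbrace{P_{\text{write}} + N\,P_{\text{read}}}_{\text{with caching}}
  &= \underbrace{(N + 1)\,P_{\text{in}}}_{\text{without caching}} \\
P_{\text{write}} - P_{\text{in}}
  &= N\,\bigl(P_{\text{in}} - P_{\text{read}}\bigr) \\
N = \text{Reads}_{\text{breakeven}}
  &= \frac{P_{\text{write}} - P_{\text{in}}}{P_{\text{in}} - P_{\text{read}}}
\end{align*}
```

The division is only meaningful when P_read is below P_in; if cached reads carried no discount, no number of reads would repay a write premium.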

Variables

P_write

Cache-write price

Per-token price to write the prefix into the cache, charged when the cache is created and again if it has to be re-created after expiring. It is a premium over the standard input price, commonly around 1.25x.

P_in

Standard input price

Per-token price you would pay for the prefix on every call without caching. It is the baseline the cache is competing against.

P_read

Cached-read price

Discounted per-token price for reading the cached prefix on subsequent calls, often 0.1x the standard input price. The gap between P_in and P_read is the per-read saving.

Reads_breakeven

Break-even read count

The number of cached reads at which cumulative savings equal the write premium. Reuse the cached prefix more times than this within its time-to-live and caching is net cheaper.

Step By Step

1. Identify the three per-token prices for the cached prefix: write, standard input, and cached read.

   P_write = 6.25, P_in = 5.00, P_read = 0.50 (per million tokens).

2. Compute the write premium, the extra paid once to populate the cache.

   P_write - P_in = 6.25 - 5.00 = 1.25 per million.

3. Compute the per-read saving, what each cached read saves versus the standard rate.

   P_in - P_read = 5.00 - 0.50 = 4.50 per million.

4. Divide the write premium by the per-read saving to get the break-even read count.

   1.25 / 4.50 = 0.278, so the very first cached read already repays the write premium.
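The four steps above can be sketched in a few lines of Python. The function name `breakeven_reads` and its parameter names are illustrative and not part of any provider SDK; the inputs are the per-million prices used in the walkthrough.

```python
def breakeven_reads(p_write: float, p_in: float, p_read: float) -> float:
    """Cached reads needed for the per-read savings to repay the write premium.

    All three prices must use the same unit (for example, per million tokens).
    """
    if p_read >= p_in:
        # Without a read discount the write premium can never be repaid.
        raise ValueError("cached-read price must be below the standard input price")
    write_premium = p_write - p_in        # step 2: extra paid once to populate the cache
    per_read_saving = p_in - p_read       # step 3: saved on each cached read
    return write_premium / per_read_saving  # step 4: break-even read count


# Step 1: the three prices from the walkthrough, per million tokens.
result = breakeven_reads(p_write=6.25, p_in=5.00, p_read=0.50)
print(round(result, 3))  # 0.278
```

Any result below 1 means a single reuse is enough; a result of, say, 2.4 would mean the third cached read is the first one that leaves caching ahead.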

Worked Example

Caching a large fixed system prompt and document context reused across queries

Cache-write price (per 1M): 6.25

Standard input price (per 1M): 5.00

Cached-read price (per 1M): 0.50

Write premium = 6.25 - 5.00 = 1.25. Per-read saving = 5.00 - 0.50 = 4.50. Break-even reads = 1.25 / 4.50 = 0.278. Reads come in whole numbers, so the first cached read (read number 1) already leaves you 4.50 - 1.25 = 3.25 ahead per million prefix tokens versus never caching.

Break-even is well under one read: with a typical 1.25x write and 0.1x read multiplier, caching pays off on the very first reuse. The practical constraint is not the break-even count but the cache time-to-live: if the prefix is not reused before it expires (often a few minutes), you pay the write premium with no offsetting reads. Cache only prefixes you will hit again within the TTL.
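To see why the break-even sits below one read, it can help to tabulate cumulative cost with and without caching. The sketch below uses the worked-example prices; `cumulative_costs` and `breakeven_from_multipliers` are hypothetical helper names, and the model assumes the cache is written once and never expires during the run.

```python
def cumulative_costs(p_write, p_in, p_read, reads):
    """Return (cached, uncached) cost per million prefix tokens after one
    cache-creating call followed by `reads` cached reads."""
    cached = p_write + reads * p_read   # one write, then discounted reads
    uncached = (reads + 1) * p_in       # every call pays the standard rate
    return cached, uncached


def breakeven_from_multipliers(write_mult, read_mult):
    """Break-even reads when prices are given as multiples of the input price.

    Dividing the formula through by P_in shows the input price cancels out.
    """
    return (write_mult - 1.0) / (1.0 - read_mult)


for reads in range(4):
    cached, uncached = cumulative_costs(6.25, 5.00, 0.50, reads)
    print(f"reads={reads}  cached={cached:.2f}  uncached={uncached:.2f}  "
          f"net saving={uncached - cached:+.2f}")

print(round(breakeven_from_multipliers(1.25, 0.1), 3))  # 0.278
```

With zero reads the cached path is 1.25 behind, which is exactly the wasted write premium when the prefix expires unused; from the first read onward it is ahead. The multiplier form also shows that a 1.25x write and 0.1x read give the same 0.278 break-even at any input price.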

Common Variations

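Per-call amortized cost: total input cost over N calls is P_write x prefix + (N-1) x P_read x prefix, plus the uncached suffix of each call billed at the standard input rate. Divide the total by N to get the amortized cost per call, useful for budgeting a whole session.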

TTL-constrained reuse: discount the expected reuse count by the probability the prefix is hit again before the cache expires.

Multi-tier prefixes: cache only the stable head (system prompt, reference docs) while leaving volatile context uncached.
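The first two variations can be sketched as small helpers. Everything here is illustrative: the function names, the token counts, and the hit probability are assumptions, output tokens are left out, and a cache miss is simplified to paying the standard input rate rather than triggering a fresh cache write. Splitting each call into a cached prefix and an uncached suffix also mirrors the multi-tier idea of caching only the stable head.

```python
TOKENS_PER_MILLION = 1_000_000


def session_cost(p_write, p_read, p_in, prefix_tokens, suffix_tokens_per_call, n_calls):
    """Total input cost of a session: one cache write, n_calls - 1 cached reads,
    and an uncached suffix billed at the standard rate on every call.
    Prices are per million tokens."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    prefix = (p_write + (n_calls - 1) * p_read) * prefix_tokens / TOKENS_PER_MILLION
    suffix = n_calls * p_in * suffix_tokens_per_call / TOKENS_PER_MILLION
    return prefix + suffix


def expected_net_saving(p_write, p_in, p_read, planned_reads, hit_probability):
    """Expected saving per million prefix tokens versus never caching, with the
    planned read count discounted by the chance of arriving before the TTL ends."""
    effective_reads = planned_reads * hit_probability
    return effective_reads * (p_in - p_read) - (p_write - p_in)


total = session_cost(6.25, 0.50, 5.00, prefix_tokens=100_000,
                     suffix_tokens_per_call=2_000, n_calls=20)
print(f"session total: {total:.3f}, per call: {total / 20:.4f}")

# One planned reuse that arrives in time only 20% of the time loses money;
# at 50% it comes out ahead.
print(round(expected_net_saving(6.25, 5.00, 0.50, 1, 0.2), 2))  # -0.35
print(round(expected_net_saving(6.25, 5.00, 0.50, 1, 0.5), 2))  # 1.0
```

With a single planned reuse, the hit probability at which the expected saving crosses zero equals the break-even read count, 0.278 for these prices.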

Sources & References

Prompt Caching with Claude — Anthropic
Prompt Caching (OpenAI API) — OpenAI
