
Prompt Cache Break-Even Formula

Prompt caching stores a prompt prefix so repeated calls pay a discounted cached-read rate instead of the full input rate, but writing the cache costs a premium over the normal input rate. The break-even is the number of cache reads needed for the per-read savings to repay the one-time write premium. Above it, caching is cheaper; below it, the write premium is wasted.

By AI Fin Hub Research · AI Fin Hub Team

Formula


Reads_breakeven = (P_write - P_in) / (P_in - P_read)

where P_write = cache-write price, P_in = standard input price, P_read = cached-read price (all per token over the cached prefix).
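The expression follows from equating the cost of the two strategies over one initial call plus k later calls on the same prefix. The derivation below is a sketch per token of cached prefix; k and the C symbols are introduced here for illustration, and it assumes P_in is greater than P_read so that each cached read actually saves money.

```latex
\begin{align*}
C_{\text{cached}}(k)   &= P_{\text{write}} + k\,P_{\text{read}} \\
C_{\text{uncached}}(k) &= (1 + k)\,P_{\text{in}} \\
C_{\text{cached}}(k) = C_{\text{uncached}}(k)
  \;&\Longrightarrow\;
  k = \frac{P_{\text{write}} - P_{\text{in}}}{P_{\text{in}} - P_{\text{read}}}
\end{align*}
```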

Variables

P_write

Cache-write price

Per-token price to write the prefix into the cache, charged once when the cache is created or refreshed. It is a premium over the standard input price, commonly around 1.25x.

P_in

Standard input price

Per-token price you would pay for the prefix on every call without caching. It is the baseline the cache is competing against.

P_read

Cached-read price

Discounted per-token price for reading the cached prefix on subsequent calls, often 0.1x the standard input price. The gap between P_in and P_read is the per-read saving.

Reads_breakeven

Break-even read count

The number of cached reads at which cumulative savings equal the write premium. Reuse the cached prefix more times than this within its time-to-live and caching is net cheaper.

Step By Step

  1. Identify the three per-token prices for the cached prefix: write, standard input, and cached read.

    P_write = 6.25, P_in = 5.00, P_read = 0.50 (per million tokens).

  2. Compute the write premium, the extra paid once to populate the cache.

    P_write - P_in = 6.25 - 5.00 = 1.25 per million.

  3. Compute the per-read saving, what each cached read saves versus the standard rate.

    P_in - P_read = 5.00 - 0.50 = 4.50 per million.

  4. Divide the write premium by the per-read saving to get the break-even read count.

    1.25 / 4.50 = 0.278, so the very first cached read already repays the write premium.
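The four steps can be sketched as a small Python function. The function and variable names are illustrative, not part of any provider's API, and the prices are the example figures used above (any unit works as long as all three prices share it).

```python
def break_even_reads(p_write: float, p_in: float, p_read: float) -> float:
    """Return the number of cached reads needed to repay the write premium.

    All three prices must be quoted in the same unit,
    for example dollars per million tokens.
    """
    if p_in <= p_read:
        # Without a per-read saving the premium can never be repaid.
        raise ValueError("cached-read price must be below the standard input price")
    write_premium = p_write - p_in          # step 2: extra paid once to write the cache
    per_read_saving = p_in - p_read         # step 3: saved on each cached read
    return write_premium / per_read_saving  # step 4: break-even read count


# Step 1: the three example prices, per million tokens.
reads = break_even_reads(p_write=6.25, p_in=5.00, p_read=0.50)
print(f"break-even reads: {reads:.3f}")
```

With the example prices this prints a value of about 0.278, matching step 4.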

Worked Example

Caching a large fixed system prompt and document context reused across queries

Cache-write price (per 1M): 6.25
Standard input price (per 1M): 5.00
Cached-read price (per 1M): 0.50

Write premium = 6.25 - 5.00 = 1.25. Per-read saving = 5.00 - 0.50 = 4.50. Break-even reads = 1.25 / 4.50 = 0.278. Since reads come in whole numbers, the first cached read (read number 1) already saves 4.50 - 1.25 = 3.25 net per million prefix tokens versus never caching.

Break-even is well under one read: with a typical 1.25x write and 0.1x read multiplier, caching pays off on the very first reuse. The practical constraint is not the break-even count but the cache time-to-live: if the prefix is not reused before it expires (often a few minutes), you pay the write premium with no offsetting reads. Cache only prefixes you will hit again within the TTL.
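To see how the TTL changes the picture, the sketch below compares the cached and uncached cost of one initial call plus however many reads actually land before the cache expires. It is a simplified model under stated assumptions: a single cache write, a hypothetical `reads_within_ttl` count supplied by the caller, and the example prices per million prefix tokens.

```python
def net_saving(p_write: float, p_in: float, p_read: float,
               reads_within_ttl: int) -> float:
    """Saving per million prefix tokens versus never caching.

    A negative result means caching cost more than it saved.
    """
    # With caching: one write, then each later call reads at the discounted rate.
    cached = p_write + reads_within_ttl * p_read
    # Without caching: every call, including the first, pays the standard rate.
    uncached = (1 + reads_within_ttl) * p_in
    return uncached - cached


for reads in (0, 1, 5):
    print(reads, net_saving(6.25, 5.00, 0.50, reads))
```

With zero reads before expiry the result is -1.25, the wasted write premium; with one read it is 3.25, the net saving from the worked example.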

Common Variations

Per-call amortized cost: total cost over N calls is P_write x prefix + (N-1) x P_read x prefix, plus the uncached variable suffix on every call; divide the total by N for the amortized per-call cost. Useful for budgeting a whole session.
TTL-constrained reuse: discount the expected reuse count by the probability the prefix is hit again before the cache expires.
Multi-tier prefixes: cache only the stable head (system prompt, reference docs) while leaving volatile context uncached.
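The first two variations can be sketched together in Python. This is a rough planning model, not a billing implementation: it assumes one cache write on the first call, a cache hit on every later call, a same-length uncached suffix billed at the standard input rate, and a single hit probability applied uniformly to the planned reads. The function names and the token counts in the usage lines are hypothetical.

```python
def session_cost(p_write: float, p_read: float, p_in: float,
                 prefix_tokens: int, suffix_tokens: int, n_calls: int) -> float:
    """Total input cost of n_calls sharing one cached prefix.

    Prices are per million tokens; token counts are raw token numbers.
    """
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    prefix_m = prefix_tokens / 1_000_000
    suffix_m = suffix_tokens / 1_000_000
    # One write on the first call, then (n_calls - 1) discounted reads.
    prefix_cost = (p_write + (n_calls - 1) * p_read) * prefix_m
    # The volatile suffix is never cached, so it pays p_in on every call.
    suffix_cost = n_calls * p_in * suffix_m
    return prefix_cost + suffix_cost


def expected_reads(planned_reads: float, hit_probability: float) -> float:
    """Discount the planned reuse count by the chance of a hit before expiry."""
    return planned_reads * hit_probability


# Hypothetical session: 100k-token prefix, 2k-token suffix, 10 calls.
total = session_cost(6.25, 0.50, 5.00, 100_000, 2_000, 10)
per_call = total / 10
likely_reads = expected_reads(planned_reads=9, hit_probability=0.5)
print(f"session total: {total:.4f}, per call: {per_call:.4f}")
print(f"expected reads within TTL: {likely_reads}")
```

Comparing `likely_reads` against the break-even read count gives a quick check on whether a prefix is worth caching once TTL misses are taken into account.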


Planning estimates only — not financial, tax, or investment advice.