Engine-computed reference · 6×5 grid · 30 cells

Token Cost vs Cache Hit Reference Grid

This page gives the monthly cost of a fixed LLM research loop for every combination of input-token size and prompt-cache hit rate in the grid, priced on Claude Sonnet 4.6. Each cell is a live run of the Token Cost Optimizer engine; no value on this page was entered by hand.

Monthly cost rises linearly with input-token size on top of a fixed output-token cost, and falls as the prompt-cache hit rate rises, because cached input reads are priced far below fresh input tokens. The cache discount applies only to input, so its effect grows with prompt size: at 2,000 tokens, moving from 0% to 90% cache trims the bill from $154 to $128, while at 64,000 tokens it cuts the bill by 73%, from $1,158 to $318. Across the grid, monthly cost runs from $128 (smallest prompt, 2,000 tokens, 90% cache) up to $1,158 (heaviest prompt, 64,000 tokens, 0% cache). Caching is the largest lever that leaves the prompt unchanged; shrinking the prompt itself from 64,000 to 2,000 tokens at 0% cache cuts the bill further, by 87%. For how to amortize the cache write across a loop, see token-cost cache amortization. Education only, not investment advice.
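The lever comparison can be checked straight from the grid's corner cells, with no assumptions about the engine's internals. A minimal sketch; `pct_cut` is a hypothetical helper and the dollar values are copied from the table:

```python
# Monthly cost in USD, copied from the grid: (input tokens, cache hit %) -> cost.
COST = {
    (2000, 0): 154,
    (2000, 90): 128,
    (64000, 0): 1158,
    (64000, 90): 318,
}


def pct_cut(before: float, after: float) -> float:
    """Percentage reduction when cost moves from `before` to `after`."""
    return 100 * (before - after) / before


# Raise the cache hit rate from 0% to 90%, holding prompt size fixed.
cache_small = pct_cut(COST[(2000, 0)], COST[(2000, 90)])
cache_heavy = pct_cut(COST[(64000, 0)], COST[(64000, 90)])
# Shrink the prompt from 64,000 to 2,000 tokens, holding the cache at 0%.
prompt_shrink = pct_cut(COST[(64000, 0)], COST[(2000, 0)])

print(f"cache lever at 2,000 tokens:  {cache_small:.0f}%")   # 17%
print(f"cache lever at 64,000 tokens: {cache_heavy:.0f}%")   # 73%
print(f"prompt shrink at 0% cache:    {prompt_shrink:.0f}%")  # 87%
```

The fixed output-token cost explains the weak cache lever on small prompts: the discount touches only input, which is a small share of the 2,000-token bill.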

Monthly cost by input tokens × cache hit rate

Rows = input tokens per call. Columns = prompt-cache hit rate. Each value is the engine's costPerMonth output in USD.

Monthly research-loop cost for each input-token size and cache hit rate.
input tokens \ cache hit	0%	25%	50%	75%	90%
2,000	$154	$147	$139	$132	$128
4,000	$186	$172	$157	$143	$134
8,000	$251	$222	$193	$164	$146
16,000	$381	$322	$264	$206	$171
32,000	$640	$523	$407	$290	$220
64,000	$1,158	$925	$692	$458	$318

Headline metric: monthly cost (USD). The CSV download below also carries the cost per idea, cost per validated trade, and effective cost per call.

Download CSV (30 rows)
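The CSV's per-unit columns can be approximated from any monthly figure plus the fixed inputs. This sketch assumes a 30-day month and plausible definitions: cost per idea over all ideas generated, cost per validated trade over ideas × validation_rate, and effective cost per call over all calls including retries. The engine's exact definitions live on its methodology page and may differ; `LoopInputs` and `unit_costs` are hypothetical names.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # assumption: 30-day billing month


@dataclass(frozen=True)
class LoopInputs:
    # Defaults mirror the fixed inputs listed under Provenance.
    ideas_per_day: int = 30
    calls_per_idea: int = 5
    retry_rate: float = 0.2
    validation_rate: float = 0.25


def unit_costs(monthly_usd: float, loop: LoopInputs = LoopInputs()) -> dict[str, float]:
    """Hypothetical per-unit breakdown of one monthly cost figure."""
    ideas = loop.ideas_per_day * DAYS_PER_MONTH
    validated = ideas * loop.validation_rate
    # Assumption: retries scale call volume by (1 + retry_rate).
    calls = ideas * loop.calls_per_idea * (1 + loop.retry_rate)
    return {
        "cost_per_idea": monthly_usd / ideas,
        "cost_per_validated_trade": monthly_usd / validated,
        "effective_cost_per_call": monthly_usd / calls,
    }


# Heaviest cell in the grid: 64,000 input tokens at 0% cache.
print(unit_costs(1158))
```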

Provenance

Engine: Token Cost Optimizer (token-cost-optimizer) — computed live from /engines/token-cost-optimizer.js
Grid: input tokens ∈ {2000, 4000, 8000, 16000, 32000, 64000} × cache hit ∈ {0%, 25%, 50%, 75%, 90%} = 30 cells
Fixed inputs: model=claude-sonnet-4-6, output_tokens=1500, calls_per_idea=5, retry_rate=0.2, ideas_per_day=30, validation_rate=0.25
Computed: 2026-05-23, recomputed in CI on every build

The engine is deterministic: token accounting and cache-read pricing are closed-form, so the same input always returns the same output. The full method — the per-call token math, the retry multiplier, and the cache discount — is documented at the Token Cost Optimizer methodology page. For tying spend back to a per-trade unit, see the cost-per-validated-trade framework.
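A closed-form model of this kind fits in a few lines. What follows is a minimal sketch, not the engine's source. The rate card ($3 per million fresh input tokens, $0.30 per million cache-read tokens, $15 per million output tokens), the 30-day month, the (1 + retry_rate) call multiplier, and the absence of any cache-write premium are all assumptions. Under those assumptions the sweep reproduces every cell of the grid above to the dollar. `calls_per_month` and `cost_per_month` are hypothetical names, not the engine's API.

```python
from itertools import product

# Assumed rate card, USD per million tokens (not read from the engine).
PRICE_INPUT = 3.00       # fresh input tokens
PRICE_CACHE_READ = 0.30  # cached input reads
PRICE_OUTPUT = 15.00     # output tokens
DAYS_PER_MONTH = 30      # assumed billing month

# Fixed inputs from the Provenance block.
OUTPUT_TOKENS = 1500
CALLS_PER_IDEA = 5
RETRY_RATE = 0.2
IDEAS_PER_DAY = 30


def calls_per_month() -> float:
    # Assumption: retries add calls as a (1 + retry_rate) multiplier.
    return IDEAS_PER_DAY * CALLS_PER_IDEA * (1 + RETRY_RATE) * DAYS_PER_MONTH


def cost_per_month(input_tokens: int, cache_hit: float) -> float:
    # Split each call's input into fresh and cache-read portions.
    # No cache-write premium is modelled here.
    fresh = input_tokens * (1 - cache_hit) * PRICE_INPUT
    cached = input_tokens * cache_hit * PRICE_CACHE_READ
    output = OUTPUT_TOKENS * PRICE_OUTPUT
    per_call_usd = (fresh + cached + output) / 1_000_000
    return per_call_usd * calls_per_month()


TOKENS = [2000, 4000, 8000, 16000, 32000, 64000]
CACHE_HITS = [0.0, 0.25, 0.50, 0.75, 0.90]

# Sweep the 6 x 5 grid, rounding to whole dollars as the table does.
grid = {(t, h): round(cost_per_month(t, h)) for t, h in product(TOKENS, CACHE_HITS)}

if __name__ == "__main__":
    for t in TOKENS:
        print(f"{t:>6,}", *(f"${grid[(t, h)]:,}" for h in CACHE_HITS), sep="\t")
```

Because cost is affine in both token count and hit rate under this model, each 25-point step in cache hit rate removes the same dollar amount within a row (up to rounding), which is why the table's first four columns are evenly spaced.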

Reference grids are planning aids, not financial, tax, or investment advice. Prices reflect the engine's published rate card and may differ from your billing.