Engine-computed reference · 6×5 grid · 30 cells
Token Cost vs Cache Hit Reference Grid
The monthly cost of a fixed LLM research loop for every combination of input-token size and prompt-cache hit rate across this grid, priced on Claude Sonnet 4.6. Each cell is a live run of the Token Cost Optimizer engine; no value on this page was entered by hand.
Monthly cost rises linearly with input-token size on top of a fixed output-token cost (output_tokens is held at 1500), and falls as the prompt-cache hit rate rises, because cached input reads are priced far below fresh input tokens. Across this grid the monthly cost runs from $128 (smallest prompt at 2,000 tokens, 90% cache) up to $1158 (heaviest prompt at 64,000 tokens, 0% cache). At the heaviest prompt, lifting the cache hit rate from 0% to 90% cuts the bill by 73%, the single largest lever on this surface. For how to amortize the cache write across a loop, see token-cost cache amortization. Education only, not investment advice.
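The cache saving quoted above can be checked directly against the 0% and 90% columns of the grid on this page. The short sketch below does only that arithmetic. The dictionary values are copied from the table, while the names `GRID` and `pct_saving` are illustrative and not part of the engine.

```python
# Monthly cost in USD, copied from the grid on this page:
# input tokens per call -> (cost at 0% cache hit, cost at 90% cache hit)
GRID = {
    2_000: (154, 128),
    4_000: (186, 134),
    8_000: (251, 146),
    16_000: (381, 171),
    32_000: (640, 220),
    64_000: (1158, 318),
}


def pct_saving(cost_no_cache: float, cost_cached: float) -> float:
    """Percentage reduction in monthly cost when moving from 0% to 90% cache hits."""
    return 100 * (cost_no_cache - cost_cached) / cost_no_cache


# Rounded saving per row, keyed by input tokens per call.
savings = {tokens: round(pct_saving(c0, c90)) for tokens, (c0, c90) in GRID.items()}

for tokens, pct in savings.items():
    print(f"{tokens:>6,} tokens: {pct}% saved at 90% cache")
```

The saving grows with prompt size because only the input-token share of the bill is discounted; the output-token share is the same in every cell.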
Monthly cost by input tokens × cache hit rate
Rows = input tokens per call. Columns = prompt-cache hit rate. Each value is the engine's costPerMonth output in USD.
| tokens \ cache | 0% | 25% | 50% | 75% | 90% |
|---|---|---|---|---|---|
| 2,000 | $154 | $147 | $139 | $132 | $128 |
| 4,000 | $186 | $172 | $157 | $143 | $134 |
| 8,000 | $251 | $222 | $193 | $164 | $146 |
| 16,000 | $381 | $322 | $264 | $206 | $171 |
| 32,000 | $640 | $523 | $407 | $290 | $220 |
| 64,000 | $1158 | $925 | $692 | $458 | $318 |
Headline metric: monthly cost (USD). The CSV download below also carries the cost per idea, cost per validated trade, and effective cost per call.
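As a rough illustration of how those per-unit figures could relate to the monthly cost, the sketch below divides a monthly figure by monthly volumes derived from the fixed inputs listed under Provenance. Everything here is an assumption rather than the engine's definition: the 30-day month, the choice to count retried calls in the per-call denominator, and the names `LoopInputs` and `unit_costs` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopInputs:
    # Values taken from the page's "Fixed inputs" line.
    calls_per_idea: int = 5
    retry_rate: float = 0.2
    ideas_per_day: int = 30
    validation_rate: float = 0.25
    # Assumption: the page does not state the month length.
    days_per_month: int = 30


def unit_costs(cost_per_month: float, inputs: LoopInputs = LoopInputs()) -> dict[str, float]:
    """Spread a monthly cost over ideas, validated trades, and calls (assumed definitions)."""
    ideas = inputs.ideas_per_day * inputs.days_per_month
    validated = ideas * inputs.validation_rate
    # Assumption: retries count as extra billed calls.
    calls = ideas * inputs.calls_per_idea * (1 + inputs.retry_rate)
    return {
        "cost_per_idea": cost_per_month / ideas,
        "cost_per_validated_trade": cost_per_month / validated,
        "effective_cost_per_call": cost_per_month / calls,
    }


# Example: the 32,000-token, 0% cache cell from the grid ($640 per month).
units = unit_costs(640)
for name, value in units.items():
    print(f"{name}: ${value:.4f}")
```

The CSV remains the authoritative source for these columns; if its values differ from this sketch, the engine's definitions differ from the ones assumed here.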
Provenance
- Engine: Token Cost Optimizer (token-cost-optimizer), computed live from /engines/token-cost-optimizer.js
- Grid: input tokens ∈ {2000, 4000, 8000, 16000, 32000, 64000} × cache hit ∈ {0%, 25%, 50%, 75%, 90%} = 30 cells
- Fixed inputs: model=claude-sonnet-4-6, output_tokens=1500, calls_per_idea=5, retry_rate=0.2, ideas_per_day=30, validation_rate=0.25
- Computed: 2026-05-23, recomputed in CI on every build
The engine is deterministic: token accounting and cache-read pricing are closed-form, so the same input always returns the same output. The full method — the per-call token math, the retry multiplier, and the cache discount — is documented at the Token Cost Optimizer methodology page. For tying spend back to a per-trade unit, see the cost-per-validated-trade framework.
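To make the closed-form idea concrete, here is a minimal sketch of one possible model, not the engine's actual code. It assumes a rate card of $3 per million fresh input tokens, $15 per million output tokens, cache reads billed at 10% of the input rate, a 30-day month, retries applied as a (1 + retry_rate) multiplier on call volume, and no cache-write premium. None of these are stated on this page, and the name `cost_per_month` and the constants are illustrative.

```python
# Assumed rate card (USD per million tokens); not stated on this page.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
CACHE_READ_MULTIPLIER = 0.10  # assumed: cached reads cost 10% of fresh input
DAYS_PER_MONTH = 30           # assumed month length

TOKENS = [2_000, 4_000, 8_000, 16_000, 32_000, 64_000]
HIT_RATES = [0.0, 0.25, 0.50, 0.75, 0.90]


def cost_per_month(
    input_tokens: int,
    cache_hit: float,
    output_tokens: int = 1500,
    calls_per_idea: int = 5,
    retry_rate: float = 0.2,
    ideas_per_day: int = 30,
) -> float:
    """Monthly USD cost under the assumptions listed above."""
    # Retry multiplier: each intended call is billed (1 + retry_rate) times on average.
    calls = ideas_per_day * DAYS_PER_MONTH * calls_per_idea * (1 + retry_rate)
    fresh = input_tokens * (1 - cache_hit)
    cached = input_tokens * cache_hit
    per_call_usd = (
        fresh * INPUT_USD_PER_MTOK
        + cached * INPUT_USD_PER_MTOK * CACHE_READ_MULTIPLIER
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return calls * per_call_usd


# Rounded grid keyed by (input tokens, cache hit rate).
grid = {(t, h): round(cost_per_month(t, h)) for t in TOKENS for h in HIT_RATES}

for t in TOKENS:
    print(f"{t:>6,}", *(f"${grid[(t, h)]}" for h in HIT_RATES))
```

Under these assumptions the rounded output lines up with the table above, which suggests the model is a reasonable mental picture, but it does not confirm that the engine uses this exact formula or rate card. The methodology page remains the reference for the real calculation.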
Reference grids are planning aids, not financial, tax, or investment advice. Prices reflect the engine's published rate card and may differ from your billing.