aifinhub

Engine-computed reference · 6×5 grid · 30 cells

Token Cost vs Cache Hit Reference Grid

The monthly cost of a fixed LLM research loop for every combination of input-token size and prompt-cache hit rate across this grid, priced on Claude Sonnet 4.6. Each cell is a live run of the Token Cost Optimizer engine; no value on this page was entered by hand.

Monthly cost rises linearly with input-token size on top of a fixed floor from the 1,500 output tokens per call, and falls as the prompt-cache hit rate rises, because cached input reads are priced far below fresh input tokens. Across this grid the monthly cost runs from $128 (smallest prompt, 2,000 tokens at 90% cache) to $1,158 (heaviest prompt, 64,000 tokens at 0% cache). At the heaviest prompt, lifting the cache hit rate from 0% to 90% cuts the bill by 73%, the single largest lever on this surface; at the smallest prompt the same lift saves only 17%, because output tokens dominate the bill there. For how to amortize the cache write across a loop, see token-cost cache amortization. Education only, not investment advice.
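The savings percentages can be checked from the grid values alone. The snippet below is a minimal Python check, not the engine's code: the dictionary copies the 0% and 90% columns from the table on this page, and the name `cache_savings_pct` is hypothetical.

```python
# Monthly cost in USD at 0% and 90% cache hit, copied from the grid on this page.
# Keys are input tokens per call; values are (cost at 0%, cost at 90%).
GRID_0_AND_90 = {
    2_000: (154, 128),
    4_000: (186, 134),
    8_000: (251, 146),
    16_000: (381, 171),
    32_000: (640, 220),
    64_000: (1158, 318),
}


def cache_savings_pct(tokens: int) -> float:
    """Percent drop in monthly cost when the cache hit rate goes from 0% to 90%."""
    cold, warm = GRID_0_AND_90[tokens]
    return 100 * (cold - warm) / cold


for tokens in GRID_0_AND_90:
    print(f"{tokens:>6} tokens: {cache_savings_pct(tokens):.0f}% saved")
```

The 64,000-token row comes out at roughly 73% and the 2,000-token row at roughly 17%, which shows how much more caching matters as the prompt grows.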

Monthly cost by input tokens × cache hit rate

Rows = input tokens per call. Columns = prompt-cache hit rate. Each value is the engine's costPerMonth output in USD.

Monthly research-loop cost for each input-token size and cache hit rate.
tokens \ cache      0%      25%     50%     75%     90%
2,000             $154    $147    $139    $132    $128
4,000             $186    $172    $157    $143    $134
8,000             $251    $222    $193    $164    $146
16,000            $381    $322    $264    $206    $171
32,000            $640    $523    $407    $290    $220
64,000          $1,158    $925    $692    $458    $318

Headline metric: monthly cost (USD). The CSV download below also carries the cost per idea, cost per validated trade, and effective cost per call.

Download CSV (30 rows)

Provenance

Engine: Token Cost Optimizer (token-cost-optimizer), computed live from /engines/token-cost-optimizer.js
Grid: input tokens ∈ {2000, 4000, 8000, 16000, 32000, 64000} × cache hit ∈ {0%, 25%, 50%, 75%, 90%} = 30 cells
Fixed inputs: model=claude-sonnet-4-6, output_tokens=1500, calls_per_idea=5, retry_rate=0.2, ideas_per_day=30, validation_rate=0.25
Computed: 2026-05-23, recomputed in CI on every build
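To make the grid definition concrete, the sketch below enumerates the 30 cells and estimates the monthly call volume implied by the fixed inputs. It is written in Python for illustration and is not the engine's code. Two points are assumptions, since the page does not state them: a 30-day month, and retries modeled as a `(1 + retry_rate)` multiplier on planned calls. The names `grid_cells` and `calls_per_month` are hypothetical.

```python
from itertools import product

# Grid axes as listed under Provenance.
INPUT_TOKENS = [2_000, 4_000, 8_000, 16_000, 32_000, 64_000]
CACHE_HIT_RATES = [0.0, 0.25, 0.50, 0.75, 0.90]

# Fixed inputs as listed under Provenance.
FIXED = {
    "model": "claude-sonnet-4-6",
    "output_tokens": 1500,
    "calls_per_idea": 5,
    "retry_rate": 0.2,
    "ideas_per_day": 30,
    "validation_rate": 0.25,
}

DAYS_PER_MONTH = 30  # assumption: the page does not state the month length


def grid_cells():
    """Every (input_tokens, cache_hit_rate) pair in the grid."""
    return list(product(INPUT_TOKENS, CACHE_HIT_RATES))


def calls_per_month(fixed=FIXED, days=DAYS_PER_MONTH):
    """Planned calls per month, inflated by retries.

    Assumption: each planned call incurs retry_rate extra calls on average.
    """
    planned_per_day = fixed["ideas_per_day"] * fixed["calls_per_idea"]
    return planned_per_day * (1 + fixed["retry_rate"]) * days


print(len(grid_cells()), "cells")
print(round(calls_per_month()), "calls per month")
```

Under these assumptions the loop makes about 5,400 calls a month, so any per-call saving is multiplied by that volume.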

The engine is deterministic: token accounting and cache-read pricing are closed-form, so the same input always returns the same output. The full method — the per-call token math, the retry multiplier, and the cache discount — is documented at the Token Cost Optimizer methodology page. For tying spend back to a per-trade unit, see the cost-per-validated-trade framework.
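A closed-form calculation of this kind can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the engine's source. The per-million-token prices (3.00 USD fresh input, 0.30 USD cache read, 15.00 USD output) are assumed and not taken from this page. The 30-day month and the `(1 + retry_rate)` multiplier are also assumptions, and cache-write cost is ignored. All function and constant names are hypothetical.

```python
# Assumed USD prices per million tokens; NOT taken from this page.
PRICE_INPUT = 3.00        # fresh (uncached) input tokens
PRICE_CACHE_READ = 0.30   # cached input tokens read back
PRICE_OUTPUT = 15.00      # output tokens

# ideas/day * calls/idea * retry multiplier * assumed 30-day month
CALLS_PER_MONTH = 30 * 5 * (1 + 0.2) * 30


def cost_per_call(input_tokens, cache_hit, output_tokens=1500):
    """USD cost of one call; cache_hit is the cached fraction of input tokens."""
    fresh = input_tokens * (1 - cache_hit) * PRICE_INPUT
    cached = input_tokens * cache_hit * PRICE_CACHE_READ
    out = output_tokens * PRICE_OUTPUT  # output tokens are never cached
    return (fresh + cached + out) / 1_000_000


def cost_per_month(input_tokens, cache_hit):
    """USD cost of the whole loop for one month (cache writes ignored)."""
    return cost_per_call(input_tokens, cache_hit) * CALLS_PER_MONTH


for tokens, hit in [(2_000, 0.90), (64_000, 0.0), (64_000, 0.90)]:
    print(f"{tokens:>6} tokens at {hit:.0%} cache: ${cost_per_month(tokens, hit):,.0f}")
```

With these assumed rates the sketch rounds to the same dollar values as the grid, which suggests the engine uses a similar formula. The methodology page remains the authoritative description. Because every term is a product of fixed quantities, the same inputs always give the same output, which is what makes the grid reproducible in CI.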

Reference grids are planning aids, not financial, tax, or investment advice. Prices reflect the engine's published rate card and may differ from your billing.
