# How the Token-Cost Optimizer works

How the Token-Cost Optimizer prices LLM research loops. Last updated 2026-04-20.
## Formulas
effective_call = (input_tokens × price_in + output_tokens × price_out) / 1,000,000
(Anthropic models: the cache-hit fraction of input tokens is billed at the cache-read rate instead of price_in)
calls_per_idea = calls × (1 + retry_rate)
cost_per_idea = effective_call × calls_per_idea
cost_per_day = cost_per_idea × ideas_per_day
cost_per_validated = cost_per_idea / validation_rate
cost_per_year = cost_per_day × 365

## Pricing rate table (2026-04-20, USD per 1M tokens)
| Model | Input | Output | Cache read |
|---|---|---|---|
| Claude Opus 4.7 | $15 | $75 | $1.50 |
| Claude Sonnet 4.6 | $3 | $15 | $0.30 |
| Claude Haiku 4.5 | $1 | $5 | $0.10 |
| GPT-5 | $10 | $40 | — |
| GPT-5 mini | $2 | $8 | — |
| o4-mini | $3 | $12 | — |
| Gemini 2.5 Pro | $1.25 | $10 | — |
| Gemini 2.5 Flash | $0.30 | $2.50 | — |
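
Putting the formulas and the rate table together, here is a minimal Python sketch of the cost model. It is illustrative, not the tool's actual implementation; the function name, parameter names, and the example workload (a Claude Sonnet 4.6 loop) are assumptions chosen to mirror the formulas above.

```python
# Minimal sketch of the cost model above. Prices are USD per 1M tokens,
# matching the rate table; dividing by 1_000_000 converts raw token
# counts to that unit.

def cost_model(
    input_tokens: int,        # avg input tokens per call
    output_tokens: int,       # avg output tokens per call
    price_in: float,          # USD per 1M input tokens
    price_out: float,         # USD per 1M output tokens
    calls: int,               # calls per idea, before retries
    retry_rate: float,        # fraction of calls retried, e.g. 0.10
    ideas_per_day: float,
    validation_rate: float,   # fraction of ideas that validate, e.g. 0.25
    cache_hit_rate: float = 0.0,    # Anthropic prompt caching only
    price_cache_read: float = 0.0,  # USD per 1M cache-read tokens
) -> dict:
    # The cache-hit fraction of input is billed at the cache-read rate.
    billed_input = input_tokens * (
        (1 - cache_hit_rate) * price_in + cache_hit_rate * price_cache_read
    )
    effective_call = (billed_input + output_tokens * price_out) / 1_000_000
    calls_per_idea = calls * (1 + retry_rate)
    cost_per_idea = effective_call * calls_per_idea
    cost_per_day = cost_per_idea * ideas_per_day
    return {
        "effective_call": effective_call,
        "cost_per_idea": cost_per_idea,
        "cost_per_day": cost_per_day,
        "cost_per_validated": cost_per_idea / validation_rate,
        "cost_per_year": cost_per_day * 365,
    }

# Example: Claude Sonnet 4.6 ($3 in / $15 out / $0.30 cache read) on a
# hypothetical workload of 20 calls per idea, 5 ideas per day.
print(cost_model(
    input_tokens=12_000, output_tokens=1_500,
    price_in=3.0, price_out=15.0,
    calls=20, retry_rate=0.10,
    ideas_per_day=5, validation_rate=0.25,
    cache_hit_rate=0.60, price_cache_read=0.30,
))
```

For the non-Anthropic rows in the table, leave cache_hit_rate at its default of 0 so the whole input is billed at price_in.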
## Pricing sources

## Assumptions + limitations
- Direct-API pricing only. Batch-API discounts (Anthropic 50%, OpenAI 50%) and enterprise rates are not modeled; a manual adjustment is sketched after this list.
- Deterministic token counts. Real prompts have variance; use the average of a representative sample for input/output token counts.
- Cache hit rate applies only to Anthropic models with prompt caching. For other providers, that slider has no effect.
- No image or tool-use pricing. Multimodal inputs and tool-call round trips add tokens not counted in the calculator — add them to the input/output fields manually if material.
- Validation rate is a user estimate. Track it empirically for accuracy; the tool cannot infer it.
- Retry rate models transient failures and simple re-prompts at the same token counts. Retries that escalate with a larger context (for example, appending error output to the next prompt) are undercounted.
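
If your workload runs through a batch API, you can approximate the discount manually before plugging rates into the formulas. A minimal sketch, assuming the flat 50% discount noted above; BATCH_DISCOUNT and batch_prices are illustrative names, and whether cache-read rates are also discounted is not modeled here.

```python
# Hedged sketch of a manual Batch-API adjustment: halve the per-token rates
# before feeding them to the cost model above. The 50% figure comes from the
# note in this list; verify against current provider pricing pages.

BATCH_DISCOUNT = 0.50  # Anthropic and OpenAI batch APIs

def batch_prices(price_in: float, price_out: float) -> tuple[float, float]:
    """Return (price_in, price_out) discounted for batch submission."""
    return price_in * (1 - BATCH_DISCOUNT), price_out * (1 - BATCH_DISCOUNT)

# GPT-5 from the rate table: $10 in / $40 out becomes $5 in / $20 out.
print(batch_prices(10.0, 40.0))
```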
## Changelog
- 2026-04-20 — Initial release with 8 models across Anthropic, OpenAI, and Google.