Methodology · Tool · Last updated 2026-04-20

How Token-Cost Optimizer works

How the Token-Cost Optimizer prices LLM research loops.

Formulas

effective_call      = input_tokens × price_in + output_tokens × price_out
                      (Anthropic: the cache-hit fraction of input tokens is priced at cache_read instead of price_in)
calls_per_idea      = calls × (1 + retry_rate)
cost_per_idea       = effective_call × calls_per_idea
cost_per_day        = cost_per_idea × ideas_per_day
cost_per_validated  = cost_per_idea / validation_rate
cost_per_year       = cost_per_day × 365
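
The formulas above can be sketched in Python. The cache-hit handling is an interpretation of the Anthropic note (the cache-hit fraction of input tokens billed at the cache_read rate); all prices are USD per 1M tokens, and the sample numbers are hypothetical inputs, not tool defaults:

```python
def effective_call(input_tokens, output_tokens, price_in, price_out,
                   cache_hit=0.0, price_cache=0.0):
    """Cost of one API call in USD. Prices are USD per 1M tokens; the
    cache-hit fraction of input tokens is billed at price_cache instead."""
    uncached_in = input_tokens * (1 - cache_hit) * price_in
    cached_in = input_tokens * cache_hit * price_cache
    return (uncached_in + cached_in + output_tokens * price_out) / 1_000_000

def cost_per_idea(call_cost, calls, retry_rate):
    # calls_per_idea = calls × (1 + retry_rate)
    return call_cost * calls * (1 + retry_rate)

# Hypothetical run: Claude Sonnet 4.6, 10k input / 2k output tokens, no caching
call = effective_call(10_000, 2_000, price_in=3, price_out=15)  # ≈ $0.06
idea = cost_per_idea(call, calls=20, retry_rate=0.10)           # ≈ $1.32
day = idea * 5            # ideas_per_day = 5        → ≈ $6.60
validated = idea / 0.25   # validation_rate = 0.25   → ≈ $5.28
year = day * 365          #                          → ≈ $2409
```
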

Pricing rate table (2026-04-20, USD per 1M tokens)

Model                Input    Output    Cache read
Claude Opus 4.7      $15      $75       $1.50
Claude Sonnet 4.6    $3       $15       $0.30
Claude Haiku 4.5     $1       $5        $0.10
GPT-5                $10      $40       n/a
GPT-5 mini           $2       $8        n/a
o4-mini              $3       $12       n/a
Gemini 2.5 Pro       $1.25    $10       n/a
Gemini 2.5 Flash     $0.30    $2.50     n/a
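
For scripting against the table, the rates can be expressed as a lookup structure. The dictionary keys and field names below are illustrative, not the tool's internal schema; None marks models without an Anthropic-style cache-read tier:

```python
# USD per 1M tokens, as of 2026-04-20; None = no cache-read rate listed
RATES = {
    "Claude Opus 4.7":   {"in": 15.00, "out": 75.00, "cache_read": 1.50},
    "Claude Sonnet 4.6": {"in": 3.00,  "out": 15.00, "cache_read": 0.30},
    "Claude Haiku 4.5":  {"in": 1.00,  "out": 5.00,  "cache_read": 0.10},
    "GPT-5":             {"in": 10.00, "out": 40.00, "cache_read": None},
    "GPT-5 mini":        {"in": 2.00,  "out": 8.00,  "cache_read": None},
    "o4-mini":           {"in": 3.00,  "out": 12.00, "cache_read": None},
    "Gemini 2.5 Pro":    {"in": 1.25,  "out": 10.00, "cache_read": None},
    "Gemini 2.5 Flash":  {"in": 0.30,  "out": 2.50,  "cache_read": None},
}

# Example lookup: per-call cost for GPT-5 mini, 8k input / 1k output tokens
r = RATES["GPT-5 mini"]
cost = (8_000 * r["in"] + 1_000 * r["out"]) / 1_000_000  # ≈ $0.024
```
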

Assumptions + limitations

  1. Direct-API pricing only. Batch-API discounts (Anthropic 50%, OpenAI 50%) and enterprise rates are not modeled.
  2. Deterministic token counts. Real prompts have variance; use the average of a representative sample for input/output token counts.
  3. Cache hit rate applies only to Anthropic models with prompt caching. For other providers, that slider has no effect.
  4. No image or tool-use pricing. Multimodal inputs and tool-call round trips add tokens not counted in the calculator — add them to the input/output fields manually if material.
  5. Validation rate is a user estimate. Track it empirically for accuracy; the tool cannot infer it.
  6. Retry rate models transient failures and model re-prompts. Retries that resend a larger context (for example, the original prompt plus an appended error message) are undercounted.
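
Two of these limitations are easy to correct by hand on the tool's output. A sketch under stated assumptions: the 50% discount figure is the one quoted in assumption 1, and tokens_per_trip is a placeholder you would measure yourself, not a published number:

```python
def apply_batch_discount(cost_usd, discount=0.50):
    # Assumption 1: Batch-API discounts (Anthropic/OpenAI, ~50%) are not
    # modeled; apply them to the calculator's reported cost manually.
    return cost_usd * (1 - discount)

def add_tool_use_tokens(input_tokens, tool_round_trips, tokens_per_trip=1_500):
    # Assumption 4: tool-call round trips add input tokens the calculator
    # doesn't count. tokens_per_trip is a placeholder; measure your own.
    return input_tokens + tool_round_trips * tokens_per_trip

adjusted = apply_batch_discount(6.60)    # ≈ $3.30/day under batch pricing
inputs = add_tool_use_tokens(10_000, 4)  # → 16_000 tokens to enter as input
```
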

Changelog

  • 2026-04-20 — Initial release with 8 models across Anthropic, OpenAI, Google.

Planning estimates only — not financial, tax, or investment advice.