aifinhub

Worked example

Running the shipped token-cost-optimizer engine on the input below produces exactly this output. Continuous integration recomputes it against the engine bundle on every build, so these numbers cannot drift from the code.

Input

{
  "tool": "token-cost-optimizer",
  "input_tokens_per_call": 6000,
  "output_tokens_per_call": 1200,
  "calls_per_idea": 4,
  "retry_rate": 0.15,
  "ideas_per_day": 20,
  "validation_rate": 0.2,
  "cache_hit_rate": 0.5
}

Output

{
  "model": {
    "id": "claude-sonnet-4-6",
    "provider": "anthropic",
    "name": "Claude Sonnet 4.6",
    "inputUsdPerMToken": 3,
    "outputUsdPerMToken": 15,
    "cacheWriteUsdPerMToken": 3.75,
    "cacheReadUsdPerMToken": 0.3,
    "contextWindow": 500000,
    "notes": "Best price/performance for bulk research loops."
  },
  "effectiveCostPerCall": 0.0279,
  "costPerIdea": 0.12833999999999998,
  "costPerValidatedTrade": 0.6416999999999998,
  "costPerDay": 2.5667999999999997,
  "costPerMonth": 77.00399999999999,
  "costPerYear": 936.882
}
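The output can be checked by hand. This assumes the Sonnet 4.6 rates shown in the output's model block. It also assumes the 50% cache hit rate moves half of the input tokens to the cache-read price and leaves the other half at the base input price. That reading is inferred from `effectiveCostPerCall`: it reproduces 0.0279 exactly, and the cache-write rate of 3.75 never enters the result. The arithmetic is:

```latex
\begin{aligned}
\text{effective\_call} &= \frac{3000 \times 3 + 3000 \times 0.30 + 1200 \times 15}{10^{6}}
                        = \frac{9000 + 900 + 18000}{10^{6}} = 0.0279 \\
\text{billed\_calls} &= 4 \times (1 + 0.15) = 4.6 \\
\text{cost\_per\_idea} &= 0.0279 \times 4.6 = 0.12834 \\
\text{cost\_per\_validated} &= 0.12834 / 0.2 = 0.6417 \\
\text{cost\_per\_day} &= 0.12834 \times 20 = 2.5668 \\
\text{cost\_per\_month} &= 2.5668 \times 30 = 77.004 \\
\text{cost\_per\_year} &= 2.5668 \times 365 = 936.882
\end{aligned}
```

The long tails in the JSON, such as 0.12833999999999998, appear to be double-precision floating-point rounding of these exact values.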

Frequently asked questions

What does the Token-Cost Optimizer methodology page document?
It documents the LLM pricing rate table, formulas, assumptions, data sources, limitations, and reproducibility steps behind AI Fin Hub's Token-Cost Optimizer (Finance category), with source citations, assumption deltas, and as-of dates.
When was the Token-Cost Optimizer methodology last reviewed?
This methodology was last reviewed on 2026-04-20. The matching tool is at https://aifinhub.io/token-cost-optimizer/.
Are the Token-Cost Optimizer numbers reproducible?
Yes. This page embeds a worked example whose output is the verbatim result of running the shipped token-cost-optimizer engine on a fixed input; the embedded JSON is recomputed and diffed against the engine in CI, so the numbers cannot drift from the code.

Methodology · Tool · Last updated 2026-04-20

How the Token-Cost Optimizer works

How the Token-Cost Optimizer prices LLM research loops.

Formulas

effective_call      = (input_tokens × price_in + output_tokens × price_out) / 1,000,000
                      (Anthropic: the cache_hit_rate share of input_tokens is priced at cache_read instead of price_in)
billed_calls        = calls_per_idea × (1 + retry_rate)
cost_per_idea       = effective_call × billed_calls
cost_per_day        = cost_per_idea × ideas_per_day
cost_per_validated  = cost_per_idea / validation_rate
cost_per_month      = cost_per_day × 30
cost_per_year       = cost_per_day × 365
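The following minimal Python sketch of the formulas above lets readers rerun the arithmetic outside the tool. It is not the shipped engine. The names `Rates`, `effective_call`, and `project_costs` are hypothetical, and the output keys only mirror the worked-example JSON. Two details are inferred from the worked example rather than stated on this page: uncached input is billed at the base input rate (no cache-write premium), and a month is 30 days.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rates:
    """List prices in USD per 1M tokens (hypothetical container)."""
    input: float
    output: float
    cache_read: Optional[float] = None  # None: no cache pricing modeled


def effective_call(input_tokens: float, output_tokens: float,
                   rates: Rates, cache_hit_rate: float = 0.0) -> float:
    """USD cost of one call. If cache pricing exists, the cache-hit share of the
    input is billed at cache_read and the rest at the base input rate
    (no cache-write premium; inferred from the worked example)."""
    if rates.cache_read is None:
        input_price = rates.input  # cache slider has no effect
    else:
        input_price = ((1 - cache_hit_rate) * rates.input
                       + cache_hit_rate * rates.cache_read)
    return (input_tokens * input_price + output_tokens * rates.output) / 1_000_000


def project_costs(input_tokens: float, output_tokens: float, calls_per_idea: float,
                  retry_rate: float, ideas_per_day: float, validation_rate: float,
                  cache_hit_rate: float, rates: Rates) -> dict:
    """Chain the formulas: per call -> per idea -> per day, month, and year."""
    per_call = effective_call(input_tokens, output_tokens, rates, cache_hit_rate)
    billed_calls = calls_per_idea * (1 + retry_rate)  # retries priced like first attempts
    per_idea = per_call * billed_calls
    per_day = per_idea * ideas_per_day
    return {
        "effectiveCostPerCall": per_call,
        "costPerIdea": per_idea,
        "costPerValidatedTrade": per_idea / validation_rate,
        "costPerDay": per_day,
        "costPerMonth": per_day * 30,  # 30-day month, inferred from the worked example
        "costPerYear": per_day * 365,
    }


sonnet = Rates(input=3, output=15, cache_read=0.3)
result = project_costs(6000, 1200, 4, 0.15, 20, 0.2, 0.5, sonnet)
for key, usd in result.items():
    print(f"{key:<22} {usd:.6f}")
```

With the worked-example input, the printed figures should match the published output up to floating-point rounding. Non-Anthropic models would be passed as `Rates(input, output)` with no cache price, so their cache hit rate is ignored.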

Pricing rate table (2026-05-25, USD per 1M tokens)

Model                    Input    Output   Cache read
Claude Opus 4.7          $5       $25      $0.50
Claude Sonnet 4.6        $3       $15      $0.30
Claude Haiku 4.5         $1       $5       $0.10
GPT-5.5                  $5       $30      not modeled
GPT-5.4 mini             $0.75    $4.50    not modeled
o4-mini                  $3       $12      not modeled
Gemini 2.5 Pro           $1.25    $10      not modeled
Gemini 3.5 Flash         $1.50    $9       not modeled
Gemini 2.5 Flash         $0.30    $2.50    not modeled
Gemini 2.5 Flash-Lite    $0.10    $0.40    not modeled
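The rate table above can be turned into a quick per-call comparison for a given workload. The following sketch makes several assumptions: the rates are transcribed by hand from the 2026-05-25 table, cache blending applies only where the table lists a cache-read price, and cache writes are not modeled. `RATES`, `cost_per_call`, and `rank_models` are hypothetical names. The ranking uses cost alone. Cheapest per call is not the same as best price/performance, which also depends on output quality for the task.

```python
# Rates transcribed from the 2026-05-25 table (USD per 1M tokens):
# (input, output, cache_read); cache_read is None where the table lists none.
RATES = {
    "Claude Opus 4.7": (5.00, 25.00, 0.50),
    "Claude Sonnet 4.6": (3.00, 15.00, 0.30),
    "Claude Haiku 4.5": (1.00, 5.00, 0.10),
    "GPT-5.5": (5.00, 30.00, None),
    "GPT-5.4 mini": (0.75, 4.50, None),
    "o4-mini": (3.00, 12.00, None),
    "Gemini 2.5 Pro": (1.25, 10.00, None),
    "Gemini 3.5 Flash": (1.50, 9.00, None),
    "Gemini 2.5 Flash": (0.30, 2.50, None),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40, None),
}


def cost_per_call(model: str, input_tokens: float, output_tokens: float,
                  cache_hit_rate: float = 0.0) -> float:
    """USD per call; the cache hit rate only affects models with a cache-read price."""
    price_in, price_out, cache_read = RATES[model]
    if cache_read is not None:
        price_in = (1 - cache_hit_rate) * price_in + cache_hit_rate * cache_read
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def rank_models(input_tokens: float, output_tokens: float,
                cache_hit_rate: float = 0.0) -> list:
    """Return (model, usd_per_call) pairs, cheapest first."""
    costs = {m: cost_per_call(m, input_tokens, output_tokens, cache_hit_rate)
             for m in RATES}
    return sorted(costs.items(), key=lambda kv: kv[1])


for model, usd in rank_models(6000, 1200, cache_hit_rate=0.5):
    print(f"{model:<24} ${usd:.5f}")
```

Multiplying any row by `calls_per_idea × (1 + retry_rate) × ideas_per_day` gives that model's daily cost under the same assumptions.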

Pricing sources

Assumptions + limitations

  1. Direct-API pricing only. Batch-API discounts (Anthropic 50%, OpenAI 50%) and enterprise rates are not modeled.
  2. Deterministic token counts. Real prompts have variance; use the average of a representative sample for input/output token counts.
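  3. Cache hit rate applies only to Anthropic models with prompt caching. For other providers, that slider has no effect. The cache-write premium is not modeled: uncached input tokens are priced at the base input rate, which is consistent with the worked example's effectiveCostPerCall of 0.0279.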
  4. No image or tool-use pricing. Multimodal inputs and tool-call round trips add tokens not counted in the calculator — add them to the input/output fields manually if material.
  5. Validation rate is a user estimate. Track it empirically for accuracy; the tool cannot infer it.
  6. Retry rate covers transient failures and model re-prompts. Each retry is priced like the original call, so structured retries that resend a larger context are undercounted.
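Assumptions 2, 4, and 5 ask you to supply averaged, hand-adjusted inputs. The following minimal sketch prepares them from your own call logs. The sample data, `per_call_inputs`, and `empirical_rate` are hypothetical. By default, tool-call tokens are folded into the input side. That default is an assumption: tool results sent back to the model are input tokens, while tool-call arguments the model generates are output tokens, so route them to whichever side dominates in your workload.

```python
from statistics import mean

# Hypothetical logged calls: (input_tokens, output_tokens, extra_tool_tokens).
# extra_tool_tokens stands in for tool-call round trips the calculator does not count.
sample = [
    (5800, 1100, 400),
    (6400, 1300, 0),
    (5900, 1250, 650),
    (6100, 1150, 150),
]


def per_call_inputs(calls, tool_tokens_as_input=True):
    """Average input/output tokens per call, adding tool-call tokens to one side."""
    avg_in = mean(i + (t if tool_tokens_as_input else 0) for i, _, t in calls)
    avg_out = mean(o + (0 if tool_tokens_as_input else t) for _, o, t in calls)
    return round(avg_in), round(avg_out)


def empirical_rate(hits, total):
    """Observed share, e.g. validated ideas / ideas generated in a tracked period."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


print(per_call_inputs(sample))       # averages to enter as input/output tokens
print(empirical_rate(9, 45))         # e.g. 9 of 45 ideas validated
```

Re-sampling periodically keeps the averages representative as prompts change.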

Changelog

  • 2026-04-20 — Initial release with 8 models across Anthropic, OpenAI, Google.
  • 2026-05-25 — Refreshed rates to current list prices: Claude Opus 4.7 to $5/$25 (cache read $0.50), GPT-5.5 to $5/$30, GPT-5.4 mini to $0.75/$4.50.
  • 2026-05-25 — Added Gemini 3.5 Flash ($1.50/$9, a frontier agent-tier) and Gemini 2.5 Flash-Lite ($0.10/$0.40, the cheapest tier); table now tracks 10 models.
Planning estimates only — not financial, tax, or investment advice.