
Token-Cost Optimizer

Author: AI Fin Hub Research

Cost calculator for LLM-driven trading research loops: prompt length × model × retry rate × call volume → dollar cost per idea and per validated trade. Runs entirely in the browser. Free.

AI Fin Hub Research · Published Apr 20, 2026

Transparent by design: computed in your browser from a published formula and sourced rates, not a black box. Data verified May 25, 2026. Sources: Anthropic pricing · OpenAI pricing · Google AI / Gemini pricing.

Inputs: Form inputs / CSV
Runtime: Instant
Privacy: Client-side · no upload
API key: Not required
Methodology: Open

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results.

1 · Configure your research loop

Primary model: Claude Sonnet 4.6
Input tokens per call
Output tokens per call
Calls per idea: 5
Retry rate: 15%
Ideas per day
Validation rate: 30%
Cache hit rate (Anthropic): 50%

Monthly cost: $62
Claude Sonnet 4.6 · cache hit 50% · 25.5× the cost of the cheapest model (Gemini 2.5 Flash-Lite)
Annual run-rate: $749

Unit cost breakdown
Per call: $0.036 (effective, after cache)
Per idea: $0.205 (5 calls × 1.15 for the 15% retry rate)
Per validated trade: $0.684 (30% pass rate)

2 · Model comparison at these inputs

Model	Provider	Per call	Per idea	Per month	Per validated trade
Gemini 2.5 Flash-Lite	Google	$0.00140	$0.00805	$2	$0.027
Gemini 2.5 Flash	Google	$0.00615	$0.035	$11	$0.118
Claude Haiku 4.5	Anthropic	$0.012	$0.068	$21	$0.228
GPT-5.4 mini	OpenAI	$0.013	$0.073	$22	$0.244
Gemini 2.5 Pro	Google	$0.025	$0.144	$43	$0.479
Gemini 3.5 Flash	Google	$0.026	$0.147	$44	$0.489
Claude Sonnet 4.6 (primary)	Anthropic	$0.036	$0.205	$62	$0.684
o4-mini (reasoning)	OpenAI	$0.042	$0.242	$72	$0.805
Claude Opus 4.8	Anthropic	$0.059	$0.342	$103	$1.14
GPT-5.5	OpenAI	$0.085	$0.489	$147	$1.63

How the cost flows

effective_call   = input × [(1 − cache_hit) × price_in + cache_hit × price_cache_read]
                   + output × price_out
                   (cache_hit applies to Anthropic models only; prices in $ per token)
calls_per_idea   = calls × (1 + retry_rate)
cost_per_idea    = effective_call × calls_per_idea
cost_per_day     = cost_per_idea × ideas_per_day
cost_per_val     = cost_per_idea / validation_rate
cost_per_year    = cost_per_day × 365

Pricing last verified 2026-04-20. See methodology for the full rate table with vendor-page sources.
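The flow above can be sketched in JavaScript, the language of the tool's published ES module. This is a minimal sketch, not the published compute() engine: the function name costBreakdown and its field names are hypothetical, prices are taken in $ per 1M tokens (as vendor pages quote them) and divided by 1,000,000, cache-write surcharges are ignored, and the 30-day month behind perMonth is an assumption that is consistent with the monthly column above.

```javascript
// Minimal sketch of the cost flow. Names are hypothetical, not the published
// compute() contract. Prices are $ per 1M tokens, as vendor pages quote them.
function costBreakdown({
  inputTokens, // input tokens per call
  outputTokens, // output tokens per call
  priceIn, // $ per 1M input tokens
  priceOut, // $ per 1M output tokens
  priceCacheRead = priceIn, // $ per 1M cached input tokens (Anthropic)
  cacheHitRate = 0, // share of input tokens read from cache, 0..1
  callsPerIdea,
  retryRate, // 0.15 means 15%
  ideasPerDay,
  validationRate, // 0.30 means 30%
}) {
  if (validationRate <= 0) throw new RangeError("validationRate must be > 0");
  // Cache hits are billed at the cache-read rate, misses at the base input rate.
  const blendedIn = (1 - cacheHitRate) * priceIn + cacheHitRate * priceCacheRead;
  const perCall = (inputTokens * blendedIn + outputTokens * priceOut) / 1e6;
  // calls_per_idea = calls × (1 + retry_rate)
  const perIdea = perCall * callsPerIdea * (1 + retryRate);
  const perDay = perIdea * ideasPerDay;
  return {
    perCall,
    perIdea,
    perDay,
    perValidatedTrade: perIdea / validationRate,
    perMonth: perDay * 30, // assumption: 30-day month
    perYear: perDay * 365,
  };
}

// Hypothetical inputs with assumed $3 / $15 / $0.30 per 1M rates.
const example = costBreakdown({
  inputTokens: 8000,
  outputTokens: 1500,
  priceIn: 3,
  priceOut: 15,
  priceCacheRead: 0.3,
  cacheHitRate: 0.5,
  callsPerIdea: 5,
  retryRate: 0.15,
  ideasPerDay: 10,
  validationRate: 0.3,
});
console.log(example.perCall.toFixed(4)); // 0.0357
console.log(example.perIdea.toFixed(3)); // 0.205
console.log(example.perValidatedTrade.toFixed(3)); // 0.684
console.log(example.perMonth.toFixed(0), example.perYear.toFixed(0)); // 62 749
```

With these hypothetical inputs (8,000 input and 1,500 output tokens per call, 10 ideas per day) and the assumed Sonnet-class rates, the sketch lands on the figures in the Claude Sonnet 4.6 row; substitute your own token counts and current vendor rates.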

How to use

Step-by-step


1. Enter prompt length (input tokens per call), expected output length (output tokens per call), calls per idea, and ideas per day, then pick the primary model. Set the retry and validation rates. Cache hit rate is optional and often skipped; fill it in if your prompts share a long, repeated prefix.
2. Read the cost per call, per idea, per validated trade, and per month. Check the input vs. output cost split: long-context, short-output workloads (research) skew to input cost; long-output workloads (content) skew to output cost.
3. Compare across models by switching the dropdown or reading the comparison table. At the example inputs, Claude Opus 4.8 costs about 1.6× as much per call as Claude Sonnet 4.6 ($0.059 vs. $0.036), and Sonnet about 3× as much as Claude Haiku 4.5 ($0.036 vs. $0.012). The ratios shift with your input/output mix.
4. Set the cache hit rate. If your input has a stable prefix (system prompt, schema), Anthropic bills cache reads at 10% of the base input rate, so cached prefix tokens cost about 90% less.
5. Re-run with worst-case retry assumptions. A retry-on-error workflow can double or quadruple the cost (a retry rate of 100% to 300%), so budget for it.
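The retry step follows directly from calls_per_idea = calls × (1 + retry_rate): cost per idea scales linearly with 1 + retry_rate. A minimal sketch, assuming the rounded $0.036 Sonnet per-call figure from the table and 5 calls per idea; the helper name costPerIdea is hypothetical.

```javascript
// Per-idea cost under the formula's retry model:
// calls_per_idea = calls × (1 + retry_rate).
function costPerIdea(perCall, callsPerIdea, retryRate) {
  if (retryRate < 0) throw new RangeError("retryRate must be >= 0");
  return perCall * callsPerIdea * (1 + retryRate);
}

const sonnetPerCall = 0.036; // rounded per-call figure from the comparison table
const noRetry = costPerIdea(sonnetPerCall, 5, 0);

// 0% = no retries, 15% = the example setting, 100% and 300% = worst cases.
for (const retryRate of [0, 0.15, 1.0, 3.0]) {
  const cost = costPerIdea(sonnetPerCall, 5, retryRate);
  const multiple = cost / noRetry;
  console.log(
    `retry ${Math.round(retryRate * 100)}%: $${cost.toFixed(3)} per idea, ` +
      `${multiple.toFixed(2)}× the no-retry cost`
  );
}
```

The 15% row prints $0.207 rather than the panel's $0.205 only because the per-call input is rounded to three decimals.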

For agents

Use in an agent

Same math and same result shape as the UI above, packaged as a static ES module. No HTTP request, no auth, no rate limit.

import { compute } from "https://aifinhub.io/engines/token-cost-optimizer.js";

Contract: /contracts/token-cost-optimizer.json


Questions people ask next

FAQ

How are model prices kept current?

The tool reads from a versioned price table that is updated by hand when Anthropic, OpenAI, or Google publishes new pricing. The methodology page shows the table's asOfDate. Prices change infrequently (typically about once per quarter); when they do, the new rates apply from the new asOfDate onward, and historical comparisons use the rates that were in effect at the time.
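A versioned price table with an asOfDate lookup could look like the sketch below. The table shape, the field names (inPerM, outPerM), the model key, and the rates are all hypothetical placeholders, not the tool's actual data; the point is the rule that a date resolves to the latest entry whose asOfDate is on or before it.

```javascript
// Hypothetical versioned price table: each model keeps a list of rate entries,
// each valid from its asOfDate onward. Model key and rates are placeholders.
const PRICE_TABLE = {
  "example-model": [
    { asOfDate: "2025-01-01", inPerM: 3.0, outPerM: 15.0 },
    { asOfDate: "2026-04-20", inPerM: 2.5, outPerM: 12.0 },
  ],
};

// Return the entry in effect on `date` (YYYY-MM-DD): the latest entry whose
// asOfDate is on or before `date`. Dates in this format sort correctly as strings.
function ratesOn(model, date) {
  const entries = PRICE_TABLE[model];
  if (!entries) throw new Error(`unknown model: ${model}`);
  let current = null;
  for (const entry of entries) {
    const inEffect = entry.asOfDate <= date;
    if (inEffect && (current === null || entry.asOfDate > current.asOfDate)) {
      current = entry;
    }
  }
  if (current === null) throw new Error(`no rates for ${model} on ${date}`);
  return current;
}

// A comparison dated before the price change keeps the old rate.
console.log(ratesOn("example-model", "2025-06-30").inPerM); // 3
console.log(ratesOn("example-model", "2026-05-01").inPerM); // 2.5
```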

What's the difference between input and output token cost?

Input tokens are what you send to the model (prompt, system message, tool definitions); output tokens are what the model generates. Output tokens are typically priced several times higher than input tokens: 5× on Claude list prices and about 8× on GPT-5 and Gemini 2.5 Pro. For research loops with long context but short answers, input cost dominates; for long-output content generation, output cost dominates.
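To see which side dominates for a given workload, split one call's cost into its input and output parts. A minimal sketch; the helper name costSplit and the token counts are hypothetical, and the $3 / $15 per 1M rates are an assumed 5× output multiple for illustration.

```javascript
// Split one call's cost into input and output parts.
// Rates are in $ per 1M tokens; $3 in / $15 out is an assumption for illustration.
function costSplit(inputTokens, outputTokens, inPerM, outPerM) {
  const inputCost = (inputTokens * inPerM) / 1e6;
  const outputCost = (outputTokens * outPerM) / 1e6;
  return { inputCost, outputCost, inputShare: inputCost / (inputCost + outputCost) };
}

// Research-style call: long context, short answer.
const research = costSplit(20000, 500, 3, 15);
// Content-style call: short prompt, long answer.
const content = costSplit(1000, 4000, 3, 15);

console.log(research.inputShare.toFixed(2)); // 0.89
console.log(content.inputShare.toFixed(2)); // 0.05
```

Even with output priced at 5× input, the research call spends about 89% of its cost on input because it sends 40× more tokens than it receives.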

Does the tool account for prompt caching?

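Yes, for Anthropic models. When the input starts with a prefix that repeats across many calls, Anthropic prompt caching bills cache reads at 10% of the base input rate, a 90% discount on those tokens; OpenAI also discounts cached input, at a rate that varies by model. The tool exposes a 'Cache hit rate (Anthropic)' field: set it to 80% and 80% of input tokens are priced at the cache-read rate, so the cost falls accordingly. The methodology page shows the discount factors per provider.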
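The cache hit rate blends two input prices, following the cache_read step of the cost formula. A minimal sketch, assuming rates of $3.00 base input and $0.30 cache read per 1M tokens; the helper name blendedInputPrice is hypothetical and cache-write surcharges are ignored.

```javascript
// Blended input price at a given cache hit rate: hits are billed at the
// cache-read rate, misses at the base input rate. Cache writes are ignored.
function blendedInputPrice(basePerM, cacheReadPerM, cacheHitRate) {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  return (1 - cacheHitRate) * basePerM + cacheHitRate * cacheReadPerM;
}

// Assumed rates: $3.00 base input, $0.30 cache read (90% off), per 1M tokens.
for (const hitRate of [0, 0.5, 0.8]) {
  const price = blendedInputPrice(3.0, 0.3, hitRate);
  console.log(`hit rate ${Math.round(hitRate * 100)}%: $${price.toFixed(2)} per 1M input tokens`);
}
// Prints $3.00, $1.65, and $0.84 for the three hit rates.
```

At an 80% hit rate the input price falls to $0.84, a 72% cut rather than 90%, because only the cache hits are discounted.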

How do I estimate token count from text?

Rule of thumb: 1 token ≈ 4 characters of English text, or about 0.75 words. The tool uses 4 chars/token as the default but lets you override it per call. For accurate counts, use the provider's tokenizer: tiktoken for OpenAI, Anthropic's token-counting endpoint for Claude, or the Gemini API's countTokens method. Token counts vary by language; CJK text typically fits far fewer characters into each token than English.
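The 4-characters-per-token rule of thumb translates into a one-line estimator. A minimal sketch with a hypothetical helper name; JavaScript's String length counts UTF-16 code units, and the result is only an estimate, not a billing-grade count.

```javascript
// Rough token estimate from the chars-per-token rule of thumb
// (default 4 for English). Use the provider's tokenizer for exact counts.
function estimateTokens(text, charsPerToken = 4) {
  if (charsPerToken <= 0) throw new RangeError("charsPerToken must be positive");
  // String length counts UTF-16 code units, not characters or bytes.
  return Math.ceil(text.length / charsPerToken);
}

const prompt = "Summarize the latest 10-Q and flag any change in guidance.";
console.log(estimateTokens(prompt));
// Lower charsPerToken for text that tokenizes less efficiently.
console.log(estimateTokens(prompt, 2));
```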

What's a 'research loop' cost in practice?

A typical agent research loop calls the model 5-30 times per task: tool selection, retrieval, summarization, error recovery. The tool multiplies per-call cost by call volume to estimate the total cost per idea evaluated. For high-frequency strategies that re-run the loop hourly, monthly LLM bills can rival market-data subscriptions; that is the use case the tool was built for.


Used in

Decision workflows that use this tool

Goal-driven flows that bundle this tool with adjacent ones.

Plan Your Agent Stack
Estimate first-year cost for an LLM agent — token budget, vendor selection, MCP servers.

Complementary tools

Prompt Regression Tester

Run the same prompt against multiple models (Claude 4.5/4.6/4.7, GPT-5, Gemini 2.5) with your own keys. Diff outputs, score drift, catch regressions.


Data-Vendor TCO Calculator

Compute annual cost of market data across Databento, Polygon, Alpaca, Tiingo, FMP, and Alpha Vantage for your exact universe, bar resolution, history.


Agent Skill Tester for Markets

Paste a SKILL.md definition + sample input + your Anthropic API key. See structured extraction, token cost, and latency — all in your browser. No signup.