Calculator
Token-Cost Optimizer
LLM trading research loop cost calculator. Prompt length × model × retry × call volume → dollar cost per idea and per validated trade. Browser-only. Free.
Transparent by design — computed in your browser from a published formula and sourced rates, not a black box. Data verified May 25, 2026. Sources: Anthropic pricing ↗ · OpenAI pricing ↗ · Google AI / Gemini pricing ↗ Full methodology →
- Inputs: Form inputs / CSV
- Runtime: Instant
- Privacy: Client-side · no upload
- API key: Not required
- Methodology: Open →
1 · Configure your research loop
Monthly cost
$62
Claude Sonnet 4.6 · cache hit 50% · 25.5× the cost of the cheapest model (Gemini 2.5 Flash-Lite)
Annual run-rate: $749
Unit cost breakdown
Per call
$0.036
effective, after cache
Per idea
$0.205
5 calls × retries
Per validated trade
$0.684
30% pass rate
2 · Model comparison at these inputs
| Model | Provider | Per call | Per idea | Per month | Per validated trade |
|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | Google | $0.00140 | $0.00805 | $2 | $0.027 |
| Gemini 2.5 Flash | Google | $0.00615 | $0.035 | $11 | $0.118 |
| Claude Haiku 4.5 | Anthropic | $0.012 | $0.068 | $21 | $0.228 |
| GPT-5.4 mini | OpenAI | $0.013 | $0.073 | $22 | $0.244 |
| Gemini 2.5 Pro | Google | $0.025 | $0.144 | $43 | $0.479 |
| Gemini 3.5 Flash | Google | $0.026 | $0.147 | $44 | $0.489 |
| Claude Sonnet 4.6 (primary) | Anthropic | $0.036 | $0.205 | $62 | $0.684 |
| o4-mini (reasoning) | OpenAI | $0.042 | $0.242 | $72 | $0.805 |
| Claude Opus 4.8 | Anthropic | $0.059 | $0.342 | $103 | $1.14 |
| GPT-5.5 | OpenAI | $0.085 | $0.489 | $147 | $1.63 |
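The "× cheapest" figure in the summary can be recomputed from the table. A minimal sketch, assuming the multiple is taken on per-idea cost; the function name `multiplesOfCheapest` is hypothetical and only a few rows are copied in:

```javascript
// Per-idea costs (USD) copied from a subset of the comparison table above.
const perIdeaCost = {
  "Gemini 2.5 Flash-Lite": 0.00805,
  "Gemini 2.5 Flash": 0.035,
  "Claude Haiku 4.5": 0.068,
  "Claude Sonnet 4.6": 0.205,
  "Claude Opus 4.8": 0.342,
};

// Express every model's cost as a multiple of the cheapest entry.
function multiplesOfCheapest(costs) {
  const cheapest = Math.min(...Object.values(costs));
  return Object.fromEntries(
    Object.entries(costs).map(([model, cost]) => [model, cost / cheapest])
  );
}

const multiples = multiplesOfCheapest(perIdeaCost);
console.log(multiples["Claude Sonnet 4.6"].toFixed(1)); // "25.5"
```

Because the table values are rounded, the multiple can shift slightly depending on which column is used.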
How the cost flows
effective_call = input × price_in + output × price_out
(Anthropic: cache-hit fraction priced at cache_read)
calls_per_idea = calls × (1 + retry_rate)
cost_per_idea = effective_call × calls_per_idea
cost_per_day = cost_per_idea × ideas_per_day
cost_per_val = cost_per_idea / validation_rate
cost_per_year = cost_per_day × 365
Pricing last verified 2026-04-20. See methodology for the full rate table with vendor-page sources.
How to use
Step-by-step
- 1
Enter prompt length (input tokens), expected output length (output tokens), call volume (calls per period), and pick the model. Cache hit rate is optional and commonly skipped; fill it in if your prompts share a high-repeat prefix.
- 2
Read total cost per call and per period. Check input vs. output cost split — long-context, short-output workloads (research) skew to input cost; long-output workloads (content) skew to output.
- 3
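Compare across models by switching the dropdown. At the inputs shown above, Claude Sonnet 4.6 costs about 1.7× less per idea than Claude Opus 4.8, and Claude Haiku 4.5 about 3× less than Sonnet 4.6; the ratios shift with your input/output mix and cache hit rate.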
- 4
Toggle prompt caching. If your input has a stable prefix (system prompt, schema), caching the prefix saves up to 90% on those tokens; the exact discount varies by provider.
- 5
Re-run with worst-case retry assumptions. A retry-on-error workflow can multiply the cost 2-4×, so budget for it.
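The worst-case re-run in step 5 can be sketched as follows. It applies the page's `calls × (1 + retry_rate)` formula to the displayed $0.036 per-call cost and 5 calls per idea. The function name `costPerIdeaWithRetries` and the retry rates tried are assumptions; a retry rate of 1 is read here as one extra attempt per call on average.

```javascript
// cost_per_idea = effective_call × calls × (1 + retry_rate)
function costPerIdeaWithRetries(effectiveCall, calls, retryRate) {
  return effectiveCall * calls * (1 + retryRate);
}

const perCall = 0.036;        // effective per-call cost shown on the page (USD)
const baseCallsPerIdea = 5;   // calls per idea before retries
const baseline = costPerIdeaWithRetries(perCall, baseCallsPerIdea, 0);

// Hypothetical retry rates, from no retries to a pessimistic 3 extra attempts per call.
const retryScenarios = [0, 0.15, 1, 3].map((retryRate) => {
  const cost = costPerIdeaWithRetries(perCall, baseCallsPerIdea, retryRate);
  return { retryRate, cost, multiple: cost / baseline };
});

for (const s of retryScenarios) {
  console.log(`retry ${s.retryRate}: $${s.cost.toFixed(3)} (${s.multiple.toFixed(2)}x)`);
}
```

Under this reading, the 2-4× range corresponds to retry rates between 1 and 3.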
For agents
Use in an agent
Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.
import { compute } from "https://aifinhub.io/engines/token-cost-optimizer.js";
Contract: /contracts/token-cost-optimizer.json
Full agent guide →
Questions people ask next
FAQ
How are model prices kept current?
The tool reads from a versioned price table that is manually updated when Anthropic, OpenAI, or Google publishes new pricing. The methodology page shows the asOfDate. Prices change infrequently (typically once per quarter). When they change, the new rates apply from the asOfDate forward; historical comparisons use the rates that were in effect at the time.
What's the difference between input and output token cost?
Input tokens are what you send to the model (prompt, system message, tool definitions); output tokens are what the model generates. Output is typically 3-5× more expensive than input. For research loops with long context but short answers, input cost dominates; for content generation it flips.
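A small sketch of the input/output split described above. The function name `costSplit`, the token counts, and the rates ($3 in, $15 out per million tokens, a 5× ratio) are assumptions chosen for illustration.

```javascript
// Split one call's cost into input and output components (rates in USD per million tokens).
function costSplit(inputTokens, outputTokens, priceIn, priceOut) {
  const inputCost = (inputTokens * priceIn) / 1e6;
  const outputCost = (outputTokens * priceOut) / 1e6;
  const total = inputCost + outputCost;
  return { inputCost, outputCost, total, inputShare: inputCost / total };
}

// Research-style call: long context, short answer (hypothetical sizes).
const researchCall = costSplit(20000, 500, 3, 15);
// Content-style call: short prompt, long generation (hypothetical sizes).
const contentCall = costSplit(1000, 4000, 3, 15);

console.log(`research input share: ${(researchCall.inputShare * 100).toFixed(0)}%`);
console.log(`content input share: ${(contentCall.inputShare * 100).toFixed(0)}%`);
```

With these numbers input makes up most of the research call's cost and only a small fraction of the content call's, which is the flip the answer describes.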
Does the tool account for prompt caching?
Yes. When the input starts with a prefix that repeats across many calls, provider prompt caching (Anthropic, OpenAI) discounts the cached tokens by up to 90%. The tool exposes a 'cached input' field; if you set the cache hit rate to 80% and the discount is 90%, input cost falls by 72% (0.8 × 0.9). The methodology page shows the discount factors per provider.
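The caching discount can be sketched as a blended input price. This is an illustration under stated assumptions: the function name `cachedInputCost`, the $3 per million input rate, and the 90% default discount are placeholders, and the sketch ignores any cache-write surcharge a provider may apply.

```javascript
// Input cost for one call when a fraction of the input tokens is served from cache.
// priceIn is USD per million tokens; cacheDiscount is the fraction taken off cached tokens.
function cachedInputCost(inputTokens, cacheHitRate, priceIn, cacheDiscount = 0.9) {
  const cachedPrice = priceIn * (1 - cacheDiscount);
  const blendedPrice = (1 - cacheHitRate) * priceIn + cacheHitRate * cachedPrice;
  return (inputTokens * blendedPrice) / 1e6;
}

const uncachedCost = cachedInputCost(8000, 0, 3);   // no cache hits
const cachedCost = cachedInputCost(8000, 0.8, 3);   // 80% cache hit rate
const inputSaving = 1 - cachedCost / uncachedCost;

console.log(`input cost saving: ${(inputSaving * 100).toFixed(0)}%`); // "72%"
```

Output tokens are not discounted, so the saving on total call cost is smaller than the saving on input cost.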
How do I estimate token count from text?
Rule of thumb: 1 token ≈ 4 characters of English text, or about 0.75 words. The tool uses 4 chars/token as the default but lets you override it per call. For accuracy, use the provider's own tokenizer: tiktoken for OpenAI, Anthropic's token-counting endpoint for Claude, or Gemini's countTokens API. Token counts vary by language: 1 token ≈ 2-3 chars in CJK text.
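The rule of thumb can be sketched as a one-line estimator. The function name `estimateTokens` and the sample prompt are hypothetical; this is a rough heuristic, not a substitute for a provider tokenizer.

```javascript
// Estimate token count from character count (default 4 chars/token for English).
function estimateTokens(text, charsPerToken = 4) {
  // Array.from counts Unicode code points, so CJK and emoji are not double-counted.
  const charCount = Array.from(text).length;
  return Math.ceil(charCount / charsPerToken);
}

const samplePrompt = "Summarize the risk factors section of this 10-K filing.";
console.log(estimateTokens(samplePrompt));       // English default
console.log(estimateTokens(samplePrompt, 2.5));  // override, e.g. for CJK-heavy text
```

Rounding up keeps the estimate on the conservative side when budgeting.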
What's a 'research loop' cost in practice?
A typical agent research loop runs the model 5-30 times per task: tool selection, retrieval, summarization, error recovery. The tool multiplies per-call cost by call volume to estimate total cost per 'idea evaluated'. For high-frequency strategies that re-run the loop hourly, monthly LLM bills can rival market-data subscriptions — that's the use case the tool was built for.
Related deep dive
All articles →
Read further
Long-form context behind the tool output.
- Comparison · Benchmark · 10 min
Financial QA LLM Benchmarks 2026: FinanceBench & Fin-RATE
Financial QA LLM benchmarks 2026: FinQA, FinanceBench, DocFinQA, and Fin-RATE leaderboard scores, plus whole-filing read costs verified 2026-06-17.
- Pillar · Guide · 7 min
DeepSeek V4 for Finance 2026: SEC Filing Extraction Cost
DeepSeek V4 for finance 2026: V4-Flash reads a full 10-K for about $0.018 at $0.14/$0.28 per Mtok with a 1M-token window. Legacy model IDs retire July 24.
- Pillar · Guide · 10 min
The 2026 Engineer's Guide to AI in Markets
An engineer's map of where LLMs, MCP servers, and market-data APIs fit into a 2026 trading stack, and where they still break. Direct, no hype, no grift.
Complementary tools
Users of this tool often explore
Prompt Regression Tester
Run the same prompt against multiple models (Claude 4.5/4.6/4.7, GPT-5, Gemini 2.5) with your own keys. Diff outputs, score drift, catch regressions.
Data-Vendor TCO Calculator
Compute annual cost of market data across Databento, Polygon, Alpaca, Tiingo, FMP, and Alpha Vantage for your exact universe, bar resolution, history.
Agent Skill Tester for Markets
Paste a SKILL.md definition + sample input + your Anthropic API key. See structured extraction, token cost, and latency — all in your browser. No signup.