aifinhub

Calculator

Token-Cost Optimizer

LLM trading research loop cost calculator. Prompt length × model × retry × call volume → dollar cost per idea and per validated trade. Browser-only. Free.

Transparent by design — computed in your browser from a published formula and sourced rates, not a black box. Data verified May 25, 2026. Sources: Anthropic pricing ↗ · OpenAI pricing ↗ · Google AI / Gemini pricing ↗ Full methodology →

Inputs: Form inputs / CSV
Runtime: Instant
Privacy: Client-side · no upload
API key: Not required
Methodology: Open →

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results.

1 · Configure your research loop

Retry rate: 15%
Validation pass rate: 30%
Cache hit rate: 50%

Monthly cost

$62

Claude Sonnet 4.6 · cache hit 50% · 25.5× the cost of the cheapest model (Gemini 2.5 Flash-Lite)

Annual run-rate: $749

Unit cost breakdown

Per call: $0.036 (effective, after cache)

Per idea: $0.205 (5 calls × retries)

Per validated trade: $0.684 (30% pass rate)

2 · Model comparison at these inputs

Model                          Provider    Per call   Per idea   Per month   Per validated trade
Gemini 2.5 Flash-Lite          google      $0.00140   $0.00805   $2          $0.027
Gemini 2.5 Flash               google      $0.00615   $0.035     $11         $0.118
Claude Haiku 4.5               anthropic   $0.012     $0.068     $21         $0.228
GPT-5.4 mini                   openai      $0.013     $0.073     $22         $0.244
Gemini 2.5 Pro                 google      $0.025     $0.144     $43         $0.479
Gemini 3.5 Flash               google      $0.026     $0.147     $44         $0.489
Claude Sonnet 4.6 (primary)    anthropic   $0.036     $0.205     $62         $0.684
o4-mini (reasoning)            openai      $0.042     $0.242     $72         $0.805
Claude Opus 4.8                anthropic   $0.059     $0.342     $103        $1.14
GPT-5.5                        openai      $0.085     $0.489     $147        $1.63
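
The "× cheapest" figure in the summary can be approximated from the table itself. The sketch below is an illustration only: the helper name `multiplesOfCheapest` is hypothetical, and the per-call values are a subset of the rounded figures listed above.

```javascript
// Per-call costs in USD, copied from the comparison table (rounded as displayed).
const perCall = {
  "Gemini 2.5 Flash-Lite": 0.0014,
  "Claude Haiku 4.5": 0.012,
  "Claude Sonnet 4.6": 0.036,
  "Claude Opus 4.8": 0.059,
  "GPT-5.5": 0.085,
};

// Hypothetical helper: express every model's cost as a multiple of the cheapest one.
function multiplesOfCheapest(costs) {
  const cheapest = Math.min(...Object.values(costs));
  return Object.fromEntries(
    Object.entries(costs).map(([model, cost]) => [model, cost / cheapest])
  );
}

const multiples = multiplesOfCheapest(perCall);
for (const [model, multiple] of Object.entries(multiples)) {
  console.log(`${model}: ${multiple.toFixed(1)}× cheapest`);
}
```

Because the table shows rounded costs, the Sonnet multiple computed this way (about 25.7×) differs slightly from the 25.5× in the summary, which is presumably derived from unrounded values.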

How the cost flows

effective_call   = input × price_in + output × price_out
                   (Anthropic: cache-hit fraction priced at cache_read)
calls_per_idea   = calls × (1 + retry_rate)
cost_per_idea    = effective_call × calls_per_idea
cost_per_day     = cost_per_idea × ideas_per_day
cost_per_val     = cost_per_idea / validation_rate
cost_per_year    = cost_per_day × 365
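
The formula above can be sketched in a few lines of JavaScript. This is not the tool's own engine code: the function name `costFlow`, its parameter names, and every numeric input are assumptions. Prices are per million tokens and are illustrative placeholders. The token counts and ideas-per-day value are not stated on this page; they were picked because they happen to reproduce the Sonnet figures shown above.

```javascript
// Sketch of the published cost flow. All inputs below are assumptions.
// Prices are USD per million tokens.
function costFlow(p) {
  // Cache-hit fraction of input is billed at the cache-read price, the rest at full price.
  const blendedIn =
    p.cacheHitRate * p.priceCacheRead + (1 - p.cacheHitRate) * p.priceIn;
  const effectiveCall =
    (p.inputTokens * blendedIn + p.outputTokens * p.priceOut) / 1e6;
  const callsPerIdea = p.calls * (1 + p.retryRate);
  const costPerIdea = effectiveCall * callsPerIdea;
  const costPerDay = costPerIdea * p.ideasPerDay;
  const costPerVal = costPerIdea / p.validationRate;
  const costPerYear = costPerDay * 365;
  return { effectiveCall, callsPerIdea, costPerIdea, costPerDay, costPerVal, costPerYear };
}

const example = costFlow({
  inputTokens: 8000,     // assumed prompt length
  outputTokens: 1500,    // assumed output length
  priceIn: 3,            // illustrative input price
  priceOut: 15,          // illustrative output price
  priceCacheRead: 0.3,   // illustrative cache-read price
  cacheHitRate: 0.5,
  calls: 5,
  retryRate: 0.15,
  ideasPerDay: 10,       // assumed
  validationRate: 0.3,
});

console.log(example.costPerIdea.toFixed(3));   // 0.205
console.log(example.costPerVal.toFixed(3));    // 0.684
console.log(Math.round(example.costPerYear));  // 749
```

For providers without a cache-read rate in the formula, setting `cacheHitRate` to 0 reduces the blend to the plain input price.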

Pricing last verified 2026-04-20. See methodology for the full rate table with vendor-page sources.

How to use

Step-by-step

Full calculator guide →
  1. Enter prompt length (input tokens), expected output length (output tokens), call volume (calls per period), and pick the model. Cache hit rate is optional and commonly skipped; fill it in if you have a high-repeat prefix.

  2. Read total cost per call and per period. Check the input vs. output cost split: long-context, short-output workloads (research) skew to input cost; long-output workloads (content) skew to output cost.

  3. Compare across models by switching the dropdown. In the comparison table above, Claude Sonnet 4.6 costs about 1.6× less per call than Claude Opus 4.8, and Claude Haiku 4.5 about 3× less than Sonnet.

  4. Toggle prompt caching. If your input has a stable prefix (system prompt, schema), caching the prefix saves up to 90% on those tokens when they are read from cache.

  5. Re-run with worst-case retry assumptions. A retry-on-error workflow can multiply cost 2-4×, so budget for it.
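
The retry stress test in the last step can be sketched as follows, using the published relation calls_per_idea = calls × (1 + retry_rate). The function name `costPerIdeaWithRetries` and the list of retry rates are assumptions for illustration; the per-call cost and call count are the rounded figures from the example above.

```javascript
// cost_per_idea = effective_call × calls × (1 + retry_rate)
function costPerIdeaWithRetries(effectiveCallUsd, calls, retryRate) {
  return effectiveCallUsd * calls * (1 + retryRate);
}

const perCallUsd = 0.036; // rounded per-call figure shown above
const callsInLoop = 5;
const noRetryCost = costPerIdeaWithRetries(perCallUsd, callsInLoop, 0);

// Assumed scenarios: the configured 15%, then progressively worse cases.
for (const retryRate of [0.15, 0.5, 1, 3]) {
  const cost = costPerIdeaWithRetries(perCallUsd, callsInLoop, retryRate);
  const multiple = (cost / noRetryCost).toFixed(2);
  console.log(`retry_rate ${retryRate}: $${cost.toFixed(3)} per idea (${multiple}× no-retry cost)`);
}
```

Under this formula, a 2-4× cost multiple corresponds to retry rates of 1.0 to 3.0, that is, one to three extra attempts per call on average.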

For agents

Use in an agent

Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.

import { compute } from "https://aifinhub.io/engines/token-cost-optimizer.js";

Contract: /contracts/token-cost-optimizer.json Full agent guide →

Questions people ask next

FAQ

How are model prices kept current?

The tool reads from a versioned price table that is manually updated when Anthropic, OpenAI, or Google publishes new pricing. The methodology page shows the asOfDate. Prices change infrequently (typically once per quarter). When they change, the new rates apply from the asOfDate forward; historical comparisons use the rates that were in effect at the time.

What's the difference between input and output token cost?

Input tokens are what you send to the model (prompt, system message, tool definitions); output tokens are what the model generates. Output is typically 3-5× more expensive than input. For research loops with long context but short answers, input cost dominates; for content generation it flips.
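
A minimal sketch of the input/output split, assuming hypothetical prices of $1 per million input tokens and $5 per million output tokens. The function name `costSplit` and both workloads are made up for illustration.

```javascript
// Split one call's cost into input and output parts. Prices are USD per million tokens.
function costSplit(inputTokens, outputTokens, priceIn, priceOut) {
  const inputCost = (inputTokens * priceIn) / 1e6;
  const outputCost = (outputTokens * priceOut) / 1e6;
  const total = inputCost + outputCost;
  return { inputCost, outputCost, total, inputShare: inputCost / total };
}

// Hypothetical workloads at the assumed $1 / $5 prices.
const research = costSplit(20000, 500, 1, 5); // long context, short answer
const content = costSplit(500, 4000, 1, 5);   // short prompt, long output

console.log(`research: ${(research.inputShare * 100).toFixed(0)}% of cost is input`);
console.log(`content: ${(content.inputShare * 100).toFixed(0)}% of cost is input`);
```

Even with output priced at 5× input, the long-context research call is dominated by input cost, while the content-generation call is dominated by output cost.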

Does the tool account for prompt caching?

Yes. When the input has a prefix that repeats across many calls, Anthropic prompt caching (and OpenAI prompt caching) discounts the cached tokens by up to 90%. The tool exposes a 'cached input' field: if you set the cache hit rate to 80%, that share of input tokens is priced at the discounted cache-read rate and the rest at the full input price. The methodology page shows the discount factors per provider.
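
To make the cache arithmetic concrete, here is a small sketch. The function name `blendedInputPrice`, the $3 input price, and the 90% discount are assumptions for illustration, and the sketch ignores any cache-write surcharge a provider may apply.

```javascript
// Blend full-price and cache-read input tokens into one effective input price.
function blendedInputPrice(priceIn, cacheHitRate, cacheDiscount) {
  const cacheReadPrice = priceIn * (1 - cacheDiscount);
  return cacheHitRate * cacheReadPrice + (1 - cacheHitRate) * priceIn;
}

const fullPrice = 3; // hypothetical USD per million input tokens
const blended = blendedInputPrice(fullPrice, 0.8, 0.9);
const inputSaving = 1 - blended / fullPrice;

console.log(`blended input price: $${blended.toFixed(2)} per million tokens`);
console.log(`input-side saving: ${(inputSaving * 100).toFixed(0)}%`);
```

With these assumed numbers, an 80% hit rate at a 90% discount cuts input cost by 72%. Output cost is unaffected, so the saving on the whole call is smaller.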

How do I estimate token count from text?

Rule of thumb: 1 token ≈ 4 characters in English, or 0.75 words. The tool uses 4 chars/token as the default but lets you override per-call. For accuracy, use the provider's tokenizer (tiktoken for OpenAI, the Claude tokenizer endpoint, or Gemini's tokenize API). Token counts vary by language: 1 token ≈ 2-3 chars in CJK text.
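
The rule of thumb can be sketched as a one-line estimator. The function name `estimateTokens` is hypothetical, and the result is only a planning estimate; a provider tokenizer will give different counts.

```javascript
// Rough token estimate from character count; charsPerToken defaults to the 4-char rule of thumb.
function estimateTokens(text, charsPerToken = 4) {
  // Spread into code points so characters outside the BMP count once, not twice.
  const charCount = [...text].length;
  return Math.ceil(charCount / charsPerToken);
}

const prompt = "Summarize the latest earnings call and list three risks to the thesis.";
console.log(estimateTokens(prompt));      // default 4 chars/token
console.log(estimateTokens(prompt, 2.5)); // override, e.g. for denser tokenization
```

Rounding up keeps the estimate conservative for budgeting.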

What's a 'research loop' cost in practice?

A typical agent research loop runs the model 5-30 times per task: tool selection, retrieval, summarization, error recovery. The tool multiplies per-call cost by call volume to estimate total cost per 'idea evaluated'. For high-frequency strategies that re-run the loop hourly, monthly LLM bills can rival market-data subscriptions — that's the use case the tool was built for.

Planning estimates only — not financial, tax, or investment advice.