How to Use the Token-Cost Optimizer
From prompt length, response length, model choice, retry rate, and call volume, it computes per-decision and monthly token cost across Claude, GPT, and Gemini so you can spot where to trim or switch models.
What It Does
Use the calculator with intent
It is built for builders running LLM workloads at scale who need to know whether the bill comes from prompt length, response length, or sloppy retries.
Interpreting Results
Read the cost-per-call column first; that's the lever you can attack. If response cost dominates, model swap helps most. If prompt cost dominates, prompt caching or shrinking the system message helps more.
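That diagnosis can be sketched in a few lines. The per-million-token prices below are illustrative assumptions, not the tool's snapshot pricing:

```python
# Sketch of the per-call cost split. Prices are illustrative assumptions
# (dollars per million tokens), not the tool's snapshot pricing.
INPUT_PRICE = 3.00 / 1_000_000    # $ per input token (assumed)
OUTPUT_PRICE = 15.00 / 1_000_000  # $ per output token (assumed)

def cost_split(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in dollars for one call."""
    return prompt_tokens * INPUT_PRICE, output_tokens * OUTPUT_PRICE

# Long-context, short-output call: input cost dominates, so prompt
# caching or trimming the system message is the bigger lever.
inp, out = cost_split(prompt_tokens=8000, output_tokens=300)
print(f"input ${inp:.4f} vs output ${out:.4f}")
```

If the output term dominates instead, a model swap moves the bill more than any prompt surgery.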
Input Steps
Field by field
1. Enter inputs. Provide prompt length (input tokens), expected output length (output tokens), call volume (calls per period), and pick the model. Cache hit rate is optional and often skipped; fill it in if your prompts share a high-repeat prefix.
2. Read outputs. Read total cost per call and per period, and check the input vs. output cost split — long-context, short-output workloads (research) skew to input cost; long-output workloads (content) skew to output.
3. Compare models. Switch the model dropdown to compare. Sonnet is typically 5x cheaper than Opus at similar quality; Haiku is another 4x cheaper than Sonnet.
4. Toggle prompt caching. If your input has a stable prefix (system prompt, schema), caching that prefix saves 90% on those tokens.
5. Re-run with worst-case retry assumptions. A retry-on-error workflow can multiply cost 2-4x, so budget for it.
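The steps above reduce to one small function. The prices, the 90% cache discount, and the retry multiplier below are assumptions for illustration, not the tool's snapshot pricing:

```python
# Minimal sketch of the calculation the steps above walk through.
# Per-million-token prices, the 90% cache discount, and the retry
# multiplier are assumptions, not the tool's snapshot pricing.
def monthly_cost(prompt_tokens: int, output_tokens: int, calls_per_day: int,
                 input_price_per_mtok: float, output_price_per_mtok: float,
                 cached_prefix_tokens: int = 0, cache_discount: float = 0.90,
                 retry_multiplier: float = 1.0, days: int = 30) -> tuple[float, float]:
    in_price = input_price_per_mtok / 1e6
    out_price = output_price_per_mtok / 1e6
    # A cached prefix is billed at a steep discount on repeat calls.
    uncached = prompt_tokens - cached_prefix_tokens
    input_cost = (uncached + cached_prefix_tokens * (1 - cache_discount)) * in_price
    output_cost = output_tokens * out_price
    per_call = (input_cost + output_cost) * retry_multiplier
    return per_call, per_call * calls_per_day * days

per_call, monthly = monthly_cost(2000, 300, 50_000, 3.0, 15.0)
# Step 5: worst-case retries double the bill.
_, worst_case = monthly_cost(2000, 300, 50_000, 3.0, 15.0, retry_multiplier=2.0)
```

Flipping one argument at a time (model prices, cached prefix, retry multiplier) reproduces steps 3 through 5.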
Common Scenarios
Use realistic starting points
High-volume cheap workflow
- Prompt tokens: 2000
- Response tokens: 300
- Calls per day: 50000
- Model: Sonnet
Monthly cost dominated by prompt tokens; prompt caching cuts the bill ~80%. Cheaper model (Haiku) reduces it further if quality holds.
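Worked through with illustrative Sonnet-class prices (assumed $3/$15 per million input/output tokens), the arithmetic looks like this; exact caching savings depend on the pricing snapshot and cache hit rate, so they come out lower than ~80% at these particular assumed numbers:

```python
# High-volume scenario at assumed prices: $3/MTok input, $15/MTok output.
prompt_cost = 2000 * 3 / 1e6    # $0.006 per call
response_cost = 300 * 15 / 1e6  # $0.0045 per call
per_call = prompt_cost + response_cost   # $0.0105 per call
monthly = per_call * 50_000 * 30         # $15,750 per month
# Caching the stable 2000-token prompt at an assumed 90% discount:
cached_monthly = (prompt_cost * 0.10 + response_cost) * 50_000 * 30
print(f"${monthly:,.0f} -> ${cached_monthly:,.0f} with caching")
```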
Low-volume premium workflow
- Prompt tokens: 8000
- Response tokens: 2000
- Calls per day: 500
- Model: Opus
Response tokens drive cost; Sonnet handles most use-cases at ~30% the cost. Reserve Opus for the calls that genuinely need it.
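The same arithmetic for the premium scenario, with assumed Opus-class ($15/$75 per MTok) and Sonnet-class ($3/$15 per MTok) prices; at these assumed numbers Sonnet lands at 20% of the Opus bill, in the same ballpark as the ~30% above:

```python
# Low-volume premium scenario, assumed per-million-token prices.
def per_call(prompt_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    return (prompt_tokens * in_price + output_tokens * out_price) / 1e6

opus = per_call(8000, 2000, 15.0, 75.0)   # $0.27/call; the output term ($0.15) dominates
sonnet = per_call(8000, 2000, 3.0, 15.0)  # $0.054 per call
monthly_opus = opus * 500 * 30            # $4,050 per month
```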
Try These Tools
Run the numbers next
Agent Cost Envelope Calculator
Model an LLM research loop end-to-end — steps, tool calls, convergence checks, markets per day — and see per-loop, daily, and monthly cost with a cost cap.
Batch vs Real-Time Cost Calculator
Jobs per day, tokens per job, model, deadline — get real-time vs. batch cost side-by-side with a savings estimate and batch-eligibility flag.
Earnings-Call Summarization Cost Calculator
LLM cost per stock per quarter to summarize earnings transcripts across Sonnet, Opus, GPT-4o, Gemini 2.5 Pro/Flash. Cache-hit-rate aware. Snapshot pricing.
FAQ
Questions people ask next
The short answers readers usually want after the first pass.
Related Content
Keep the topic connected
Agent-Cost Envelope
The agent-cost envelope: the loop of (calls × tokens × retries × model_price) that determines the dollar cost of an LLM-driven trading agent per decision.
MCP (Model Context Protocol)
Model Context Protocol: Anthropic's open standard for letting LLMs discover and call tools — the interface, why it matters, and finance MCP server checks.