How to Use the Token-Cost Optimizer
From prompt length, response length, model choice, retry rate, and call volume, it computes per-decision and monthly token cost across Claude, GPT, and Gemini so you can spot where to trim or switch models.
What It Does
Use the calculator with intent
It is built for builders running LLM workloads at scale who need to know whether the bill comes from prompt length, response length, or sloppy retries.
Interpreting Results
Read the cost-per-call column first; that's the lever you can attack. If response cost dominates, model swap helps most. If prompt cost dominates, prompt caching or shrinking the system message helps more.
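That diagnosis can be sketched in a few lines. The per-million-token prices below are illustrative assumptions, not the tool's snapshot pricing:

```python
# Sketch of the per-call cost split. Prices are illustrative assumptions
# (dollars per million tokens), not the tool's snapshot pricing.
INPUT_PRICE = 3.00 / 1_000_000    # $ per input token (assumed)
OUTPUT_PRICE = 15.00 / 1_000_000  # $ per output token (assumed)

def cost_split(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in dollars for one call."""
    return prompt_tokens * INPUT_PRICE, output_tokens * OUTPUT_PRICE

# Long-context, short-output call: input cost dominates, so prompt
# caching or trimming the system message is the bigger lever.
inp, out = cost_split(prompt_tokens=8000, output_tokens=300)
print(f"input ${inp:.4f} vs output ${out:.4f}")
```

If the output term dominates instead, a model swap moves the bill more than any prompt surgery.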
Input Steps
Field by field
1. Enter inputs. Provide prompt length (input tokens), expected output length (output tokens), call volume (calls per period), and pick the model. Cache hit rate is optional and often skipped; fill it in if your prompts share a high-repeat prefix.
2. Read outputs. Read total cost per call and per period, and check the input vs. output cost split — long-context, short-output workloads (research) skew to input cost; long-output workloads (content) skew to output.
3. Compare models. Switch the model dropdown to compare. Sonnet is typically 5x cheaper than Opus at similar quality; Haiku is another 4x cheaper than Sonnet.
4. Toggle prompt caching. If your input has a stable prefix (system prompt, schema), caching that prefix saves 90% on those tokens.
5. Re-run with worst-case retry assumptions. A retry-on-error workflow can multiply cost 2-4x, so budget for it.
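The steps above reduce to one small function. The prices, the 90% cache discount, and the retry multiplier below are assumptions for illustration, not the tool's snapshot pricing:

```python
# Minimal sketch of the calculation the steps above walk through.
# Per-million-token prices, the 90% cache discount, and the retry
# multiplier are assumptions, not the tool's snapshot pricing.
def monthly_cost(prompt_tokens: int, output_tokens: int, calls_per_day: int,
                 input_price_per_mtok: float, output_price_per_mtok: float,
                 cached_prefix_tokens: int = 0, cache_discount: float = 0.90,
                 retry_multiplier: float = 1.0, days: int = 30) -> tuple[float, float]:
    in_price = input_price_per_mtok / 1e6
    out_price = output_price_per_mtok / 1e6
    # A cached prefix is billed at a steep discount on repeat calls.
    uncached = prompt_tokens - cached_prefix_tokens
    input_cost = (uncached + cached_prefix_tokens * (1 - cache_discount)) * in_price
    output_cost = output_tokens * out_price
    per_call = (input_cost + output_cost) * retry_multiplier
    return per_call, per_call * calls_per_day * days

per_call, monthly = monthly_cost(2000, 300, 50_000, 3.0, 15.0)
# Step 5: worst-case retries double the bill.
_, worst_case = monthly_cost(2000, 300, 50_000, 3.0, 15.0, retry_multiplier=2.0)
```

Flipping one argument at a time (model prices, cached prefix, retry multiplier) reproduces steps 3 through 5.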
Common Scenarios
Use realistic starting points
High-volume cheap workflow
- Prompt tokens: 2000
- Response tokens: 300
- Calls per day: 50000
- Model: Sonnet
Monthly cost dominated by prompt tokens; prompt caching cuts the bill ~80%. Cheaper model (Haiku) reduces it further if quality holds.
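Worked through with illustrative Sonnet-class prices (assumed $3/$15 per million input/output tokens), the arithmetic looks like this; exact caching savings depend on the pricing snapshot and cache hit rate, so they come out lower than ~80% at these particular assumed numbers:

```python
# High-volume scenario at assumed prices: $3/MTok input, $15/MTok output.
prompt_cost = 2000 * 3 / 1e6    # $0.006 per call
response_cost = 300 * 15 / 1e6  # $0.0045 per call
per_call = prompt_cost + response_cost   # $0.0105 per call
monthly = per_call * 50_000 * 30         # $15,750 per month
# Caching the stable 2000-token prompt at an assumed 90% discount:
cached_monthly = (prompt_cost * 0.10 + response_cost) * 50_000 * 30
print(f"${monthly:,.0f} -> ${cached_monthly:,.0f} with caching")
```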
Low-volume premium workflow
- Prompt tokens: 8000
- Response tokens: 2000
- Calls per day: 500
- Model: Opus
Response tokens drive cost; Sonnet handles most use-cases at ~30% the cost. Reserve Opus for the calls that genuinely need it.
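The same arithmetic for the premium scenario, with assumed Opus-class ($15/$75 per MTok) and Sonnet-class ($3/$15 per MTok) prices; at these assumed numbers Sonnet lands at 20% of the Opus bill, in the same ballpark as the ~30% above:

```python
# Low-volume premium scenario, assumed per-million-token prices.
def per_call(prompt_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    return (prompt_tokens * in_price + output_tokens * out_price) / 1e6

opus = per_call(8000, 2000, 15.0, 75.0)   # $0.27/call; the output term ($0.15) dominates
sonnet = per_call(8000, 2000, 3.0, 15.0)  # $0.054 per call
monthly_opus = opus * 500 * 30            # $4,050 per month
```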
Try These Tools
Run the numbers next
Agent Cost Envelope Calculator
Model an LLM research loop end-to-end — steps, tool calls, convergence checks, markets per day — and see per-loop, daily, and monthly cost with a cost cap.
Batch vs Real-Time Cost Calculator
Jobs per day, tokens per job, model, deadline — get real-time vs. batch cost side-by-side with a savings estimate and batch-eligibility flag.
Earnings-Call Summarization Cost Calculator
LLM cost per stock per quarter to summarize earnings transcripts across Sonnet, Opus, GPT-4o, Gemini 2.5 Pro/Flash. Cache-hit-rate aware. Snapshot pricing.
FAQ
Questions people ask next
The short answers readers usually want after the first pass.
Related Content
Keep the topic connected
Agent-Cost Envelope
The agent-cost envelope: the loop of (calls × tokens × retries × model_price) that determines the dollar cost of an LLM-driven trading agent per decision.
MCP (Model Context Protocol)
Model Context Protocol: Anthropic's open standard for letting LLMs discover and call tools — the interface, why it matters, and finance MCP server checks.