Token Cost Optimizer: Worked Examples
Changing one variable at a time against a fixed loop shape is how you isolate where cost actually lives. The loop is held constant: 4,000 input tokens and 1,000 output tokens per call, three calls per idea, 20 ideas per day, and a 30% validation rate. The scenarios then vary the model, caching, and retries: scenarios 1 and 2 differ only in model, while scenario 3 also drops caching and retries to show the floor. Caching discounts only the input side, and only on Anthropic models here. All figures use published per-million-token list prices.
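The roll-up from a per-call cost to the per-idea, per-validated-trade, and monthly figures can be sketched as below. This is a minimal sketch, not the calculator's actual code: the 30-day month and the treatment of retries as a multiplier of (1 + retry rate) on the call count are assumptions inferred from the published figures, and the names `LoopShape` and `roll_up` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopShape:
    """The fixed loop described above."""
    calls_per_idea: int = 3
    ideas_per_day: int = 20
    validation_rate: float = 0.30
    days_per_month: int = 30  # assumption: the page does not state the month length


def roll_up(cost_per_call: float, retry_rate: float,
            loop: LoopShape = LoopShape()) -> dict[str, float]:
    """Scale a per-call cost up to per-idea, per-validated-trade and per-month.

    Assumption: a retry rate r adds r extra calls per planned call on average.
    """
    per_idea = cost_per_call * loop.calls_per_idea * (1 + retry_rate)
    per_validated = per_idea / loop.validation_rate
    per_month = per_idea * loop.ideas_per_day * loop.days_per_month
    return {
        "per_idea": per_idea,
        "per_validated": per_validated,
        "per_month": per_month,
    }


# Check against the first scenario's per-call figure of $0.0324 with 10% retries.
costs = roll_up(cost_per_call=0.0324, retry_rate=0.10)
for name, value in costs.items():
    print(f"{name}: ${value:.4f}")
```

Under these assumptions the sketch reproduces the first scenario's $0.107 per idea, $0.356 per validated trade, and $64.15 per month.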
Worked Examples
See the inputs and outcome together
Each scenario keeps the starting point, the outcome, and the actual lesson in one place so the page reads like a decision notebook, not a data dump.
1. Flagship model with caching
This scenario runs the loop on Claude Opus 4.7 with a 70 percent cache hit rate on the input prompt and a 10 percent retry rate. It represents a quality-first research pipeline.
Cost per call $0.0324, per idea $0.107, per validated trade $0.356, per month $64.15.
- Model: Claude Opus 4.7
- Input / output tokens per call: 4,000 / 1,000
- Calls per idea: 3
- Retry rate: 10%
- Ideas per day: 20
- Validation rate: 30%
- Cache hit rate: 70%
Output dominates: at $25 per million the 1,000 output tokens cost $0.025 of the $0.0324 call, while caching cuts the input side to under a cent. On a flagship model your bill is mostly the tokens you generate, not the prompt you send.
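The split between input and output can be checked with a few lines of arithmetic. Only the $25 output price is stated above. The $5 input price and the $0.50 cached-read price (10% of the input price) are assumptions, chosen because they reproduce the $0.0324 per-call figure. Cache-write surcharges are ignored in this sketch.

```python
# Per-million-token prices in USD.
INPUT_PRICE = 5.00        # assumption: back-solved from the $0.0324 per-call figure
CACHE_READ_PRICE = 0.50   # assumption: cached input billed at 10% of the input price
OUTPUT_PRICE = 25.00      # stated in the text above


def call_cost(input_tokens: int, output_tokens: int,
              cache_hit_rate: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for one call."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    input_cost = (uncached * INPUT_PRICE + cached * CACHE_READ_PRICE) / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE / 1_000_000
    return input_cost, output_cost


input_cost, output_cost = call_cost(4_000, 1_000, cache_hit_rate=0.70)
total = input_cost + output_cost
print(f"input ${input_cost:.4f}, output ${output_cost:.4f}, total ${total:.4f}")
print(f"output share of the call: {output_cost / total:.0%}")
```

With these prices the input side comes to $0.0074 and output to $0.025, so output is roughly 77% of each call.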
2. Same loop on a small model
The identical workload is moved to Claude Haiku 4.5, keeping the 70 percent cache hit rate and the 10 percent retry rate. Haiku is the natural choice for filtering and pre-processing.
Cost per call $0.00648, per idea $0.0214, per validated trade $0.0713, per month $12.83.
- Model: Claude Haiku 4.5
- Input / output tokens per call: 4,000 / 1,000
- Calls per idea: 3
- Retry rate: 10%
- Ideas per day: 20
- Validation rate: 30%
- Cache hit rate: 70%
Haiku runs the same loop for one fifth the cost of Opus, $12.83 versus $64.15 a month. The five-to-one gap matches the output-price ratio ($5 versus $25 per million), and the input side shrinks by the same factor ($0.00148 versus $0.0074 per call), so the whole bill scales by five rather than only its dominant term. Route filtering to Haiku and reserve Opus for the calls that actually need it.
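The five-to-one ratio can be verified from the published per-call figures alone. The sketch below subtracts the output cost (1,000 tokens at the stated output price) from each per-call total to recover the input side, then compares the two models term by term. The helper name `split_call` is hypothetical.

```python
OUTPUT_TOKENS = 1_000


def split_call(cost_per_call: float, output_price_per_million: float) -> tuple[float, float]:
    """Split a per-call total into (input_cost, output_cost) in USD."""
    output_cost = OUTPUT_TOKENS * output_price_per_million / 1_000_000
    return cost_per_call - output_cost, output_cost


# Per-call totals and output prices as published in scenarios 1 and 2.
opus_input, opus_output = split_call(0.0324, 25.0)
haiku_input, haiku_output = split_call(0.00648, 5.0)

ratios = {
    "input": opus_input / haiku_input,
    "output": opus_output / haiku_output,
    "total": 0.0324 / 0.00648,
}
for term, ratio in ratios.items():
    print(f"{term}: Opus costs {ratio:.2f}x Haiku")
```

All three ratios come out at five, which is why the monthly totals differ by exactly that factor at identical cache and retry settings.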
3. Cheapest tier, no cache, no retries
This scenario is the floor of the table: Gemini 2.5 Flash-Lite with no caching and no retries. It shows what a throwaway pre-filter or classification pass costs.
Cost per call $0.0008, per idea $0.0024, per validated trade $0.008, per month $1.44.
- Model: Gemini 2.5 Flash-Lite
- Input / output tokens per call: 4,000 / 1,000
- Calls per idea: 3
- Retry rate: 0%
- Ideas per day: 20
- Validation rate: 30%
- Cache hit rate: 0%
At $1.44 a month this tier is effectively free for a 20-idea-per-day loop. The lesson is architectural: a cheap model can run a coarse first pass on every idea, and a flagship only touches the survivors, collapsing total spend without losing quality on the calls that matter.
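To put a rough number on the two-stage idea, the sketch below runs the Flash-Lite loop on every idea and the Opus loop only on the fraction that survives the first pass. The per-call costs come from scenarios 1 and 3. The survivor fractions, the 30-day month, and the assumption that each stage still uses three calls per idea are hypothetical choices for illustration, not figures from the page.

```python
CALLS_PER_IDEA = 3
IDEAS_PER_DAY = 20
DAYS_PER_MONTH = 30  # assumption: month length is not stated on the page


def per_idea(cost_per_call: float, retry_rate: float) -> float:
    """Cost of one idea, treating retries as a (1 + retry_rate) multiplier."""
    return cost_per_call * CALLS_PER_IDEA * (1 + retry_rate)


FLASH_LITE_PER_IDEA = per_idea(0.0008, retry_rate=0.0)  # scenario 3 per-call cost
OPUS_PER_IDEA = per_idea(0.0324, retry_rate=0.10)       # scenario 1 per-call cost


def two_stage_monthly(survivor_fraction: float) -> float:
    """Monthly cost when every idea gets the cheap pass and survivors get Opus."""
    ideas = IDEAS_PER_DAY * DAYS_PER_MONTH
    return ideas * (FLASH_LITE_PER_IDEA + survivor_fraction * OPUS_PER_IDEA)


flagship_only = IDEAS_PER_DAY * DAYS_PER_MONTH * OPUS_PER_IDEA
print(f"Opus on every idea: ${flagship_only:.2f}/month")
for fraction in (0.1, 0.3, 0.5):  # hypothetical survivor fractions
    print(f"survivors {fraction:.0%}: ${two_stage_monthly(fraction):.2f}/month")
```

With a 30% survivor fraction the sketch gives about $20.69 a month against $64.15 for Opus on every idea. The saving depends entirely on how aggressively the first pass filters, and the sketch says nothing about how many good ideas a cheap filter might wrongly discard.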
Sources & References
- Anthropic API Pricing — Anthropic (2026)
- Gemini API Pricing — Google (2026)