Token Cost Optimizer: Worked Examples

Changing one variable at a time against a fixed loop shape is how you isolate where cost actually lives. The loop is held constant: 4,000 input tokens and 1,000 output per call, three calls per idea, 20 ideas per day, 30% validation rate. The scenarios then vary model, caching, and pipeline structure independently. Caching discounts only the input side, and only on Anthropic models here. All figures use published per-million-token list prices.

By AI Fin Hub Research · AI Fin Hub Team
Token-Cost Optimizer

Compute the dollar cost of a trading research loop across Claude, GPT, and Gemini. Prompt length × model × retry × call volume → cost per idea and per.

Worked Examples

See the inputs and outcome together

Each scenario keeps the starting point, the outcome, and the actual lesson in one place so the page reads like a decision notebook, not a data dump.
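The headline figures in each scenario appear to follow from one piece of arithmetic, sketched below. This is a reconstruction, not the calculator's own code: the input prices ($5.00, $1.00, and $0.10 per million), the Flash-Lite output price ($0.40), and the cache-read multiplier (cached input billed at 10% of the input price) are assumptions chosen because they reproduce the published numbers. The function and parameter names are hypothetical.

```python
# Prices are USD per million tokens. Input prices, the Flash-Lite output price,
# and the cache-read multiplier are assumptions that match the published figures.
SCENARIOS = {
    "Claude Opus 4.7": dict(input_price=5.00, output_price=25.00, cache_hit=0.70, retry=0.10),
    "Claude Haiku 4.5": dict(input_price=1.00, output_price=5.00, cache_hit=0.70, retry=0.10),
    "Gemini 2.5 Flash-Lite": dict(input_price=0.10, output_price=0.40, cache_hit=0.00, retry=0.00),
}


def loop_costs(input_price, output_price, cache_hit, retry,
               input_tokens=4000, output_tokens=1000, calls_per_idea=3,
               ideas_per_day=20, validation_rate=0.30,
               cache_read_multiplier=0.10, days_per_month=30):
    """Return per-call, per-idea, per-validated-trade, and monthly cost in USD."""
    # Cached input tokens are billed at a fraction of the input price (assumed 10%).
    blended_input_price = input_price * ((1 - cache_hit) + cache_hit * cache_read_multiplier)
    input_cost = input_tokens * blended_input_price / 1e6
    output_cost = output_tokens * output_price / 1e6
    per_call = input_cost + output_cost
    # Retries are modelled as extra calls at the same per-call cost.
    per_idea = per_call * calls_per_idea * (1 + retry)
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "per_call": per_call,
        "per_idea": per_idea,
        "per_validated_trade": per_idea / validation_rate,
        "per_month": per_idea * ideas_per_day * days_per_month,
    }


results = {name: loop_costs(**params) for name, params in SCENARIOS.items()}
for name, r in results.items():
    print(f"{name}: call ${r['per_call']:.5f}, idea ${r['per_idea']:.4f}, "
          f"validated trade ${r['per_validated_trade']:.4f}, month ${r['per_month']:.2f}")
```

Keeping the input and output costs separate in the result makes it easy to see which side of the call drives the total for a given model.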

  1. Flagship model with caching

    The loop runs on Claude Opus 4.7 with a 70 percent cache hit rate on the input prompt and a 10 percent retry rate. This is a quality-first research pipeline.

    Cost per call $0.0324, per idea $0.107, per validated trade $0.356, per month $64.15.

    Model: Claude Opus 4.7
    Input / output tokens per call: 4,000 / 1,000
    Calls per idea: 3
    Retry rate: 10%
    Ideas per day: 20
    Validation rate: 30%
    Cache hit rate: 70%

    Output dominates: at $25 per million the 1,000 output tokens cost $0.025 of the $0.0324 call, while caching cuts the input side to under a cent. On a flagship model your bill is mostly the tokens you generate, not the prompt you send.

  2. Same loop on a small model

    The identical workload moves to Claude Haiku 4.5, keeping the 70 percent cache hit rate and the 10 percent retry rate. A small model is the natural choice for filtering and pre-processing.

    Cost per call $0.00648, per idea $0.0214, per validated trade $0.0713, per month $12.83.

    Model: Claude Haiku 4.5
    Input / output tokens per call: 4,000 / 1,000
    Calls per idea: 3
    Retry rate: 10%
    Ideas per day: 20
    Validation rate: 30%
    Cache hit rate: 70%

    Haiku runs the same loop for one fifth the cost of Opus, $12.83 versus $64.15 a month. The gap is exactly five to one because every cost component scales by that factor: output is priced at $5 versus $25 per million, and the cached input side falls in the same proportion ($0.00148 versus $0.0074 per call). Route filtering to Haiku and reserve Opus for the calls that actually need it.

  3. Cheapest tier, no cache, no retries

    The floor of the table: Gemini 2.5 Flash-Lite with no caching and no retries. This is what a throwaway pre-filter or classification pass costs.

    Cost per call $0.0008, per idea $0.0024, per validated trade $0.008, per month $1.44.

    Model: Gemini 2.5 Flash-Lite
    Input / output tokens per call: 4,000 / 1,000
    Calls per idea: 3
    Retry rate: 0%
    Ideas per day: 20
    Validation rate: 30%
    Cache hit rate: 0%

    At $1.44 a month this tier is effectively free for a 20-idea-per-day loop. The lesson is architectural: a cheap model can run a coarse first pass on every idea, and a flagship only touches the survivors, collapsing total spend without losing quality on the calls that matter.
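The two-tier idea from scenario 3 can be put in rough numbers. The sketch below assumes every idea gets the Flash-Lite pass at the scenario 3 per-idea cost and that only a fraction of ideas goes on to the full Opus loop from scenario 1. That survivor fraction is a hypothetical input, not a figure from this page, and the sketch ignores any change in validation rate that filtering might cause.

```python
IDEAS_PER_MONTH = 20 * 30          # 20 ideas per day over a 30-day month
FLASH_LITE_PER_IDEA = 0.0024       # scenario 3: cheap first pass, USD per idea
OPUS_PER_IDEA = 0.0324 * 3 * 1.1   # scenario 1: about $0.107 per idea


def two_tier_monthly(survivor_fraction):
    """Monthly cost when all ideas get the cheap pass and only survivors reach the flagship."""
    if not 0.0 <= survivor_fraction <= 1.0:
        raise ValueError("survivor_fraction must be between 0 and 1")
    per_idea = FLASH_LITE_PER_IDEA + survivor_fraction * OPUS_PER_IDEA
    return IDEAS_PER_MONTH * per_idea


opus_only = IDEAS_PER_MONTH * OPUS_PER_IDEA
for fraction in (0.10, 0.25, 0.50):  # hypothetical survivor fractions
    monthly = two_tier_monthly(fraction)
    print(f"{fraction:.0%} survive: ${monthly:.2f}/month vs ${opus_only:.2f} Opus-only")
```

Under these assumptions the pre-filter pays for itself whenever it screens out more than about 2% of ideas, since the cheap pass costs roughly 2% of the flagship loop per idea.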

Patterns

Output tokens dominate cost on flagship models: in scenario 1 they account for $0.025 of the $0.0324 call, so once the prompt is cached, further input-side savings barely move the total.
Moving the same loop from Opus to Haiku cut monthly cost five to one, matching the output-price ratio.
The cost-per-validated-trade metric divides cost per idea by the validation rate, so a low hit rate quietly multiplies your true cost per usable signal: at 30%, each validated trade costs about 3.3 times the per-idea figure.
A two-tier design, cheap model for filtering and flagship for survivors, is the single biggest cost lever this tool surfaces.
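To make the validation-rate pattern concrete, here is a small sketch that holds the scenario 1 cost per idea fixed and varies only the validation rate. The 15% and 5% rates are hypothetical comparison points, and the function name is illustrative.

```python
def cost_per_validated_trade(cost_per_idea, validation_rate):
    """Cost per idea divided by the share of ideas that validate."""
    if not 0.0 < validation_rate <= 1.0:
        raise ValueError("validation_rate must be in (0, 1]")
    return cost_per_idea / validation_rate


OPUS_PER_IDEA = 0.0324 * 3 * 1.1  # scenario 1: about $0.107 per idea

# 30% is the rate used on this page; 15% and 5% are hypothetical.
for rate in (0.30, 0.15, 0.05):
    cost = cost_per_validated_trade(OPUS_PER_IDEA, rate)
    print(f"validation rate {rate:.0%}: ${cost:.3f} per validated trade")
```

Because the relationship is a simple reciprocal, halving the validation rate doubles the cost per validated trade without changing anything on the invoice.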

Planning estimates only — not financial, tax, or investment advice.