At 150 markets a day, a finance research agent blows a $500/month budget by 1.72× before you touch the model choice. For that loop on Claude Sonnet 4.6 (8,000 input + 1,500 output tokens per step, 5 steps per loop, 60% convergence-check rate), the Agent Cost Envelope Calculator returns cost-per-loop $0.260, cost-per-day $39.06, and cost-per-month $859.32. Lifting the convergence check from 60% to 75% raises cost to $882.34 (worse, because more checks). Switching from Sonnet to Opus 4.7 at the same 60% convergence raises cost 1.67× to $1,432.20. The binding constraint at 150 markets a day is not model choice, it is the number of steps multiplied by the per-step token shape.
TL;DR
Three runs of the cost envelope, same workload (150 markets × 5 steps × weekdays = 22 trading days):
| Configuration | Cost/loop | Cost/day | Cost/month | Within $500 budget? |
|---|---|---|---|---|
| Sonnet 4.6, 60% convergence | $0.2604 | $39.06 | $859.32 | No (1.72×) |
| Sonnet 4.6, 75% convergence | $0.2674 | $40.11 | $882.34 | No (1.76×) |
| Opus 4.7, 60% convergence | $0.4340 | $65.10 | $1,432.20 | No (2.86×) |
Sonnet 4.6 at 60% convergence is the cheapest of the three but still over budget. The actionable axes are: cut steps_per_loop from 5 to 3, cut input_tokens_per_step from 8,000 to 4,000, or move to a cheaper model (the engine reports both Sonnet and Opus; Haiku and Gemini Flash live in the Token Cost Optimizer lookup).
The cost decomposition
For the Sonnet 4.6 / 60% convergence run, the engine returns:
- Cost per loop: $0.2604. This is the per-market all-in cost. 5 steps × ($0.024 input + $0.0225 output) per step = $0.2325 in tool-use steps, plus $0.0279 in convergence-check cost. Convergence-check fires at the 60% rate (one check per 0.6 expected loops) and adds incrementally.
- Cost per day: $39.06. 150 markets × $0.2604 per loop. The $39.06 number is the daily run rate at full coverage.
- Cost per month: $859.32. 22 trading days × $39.06. The 22-day convention is weekdays-only; the engine accepts a calendar_mode parameter for 30-day continuous workloads.
- Tokens per loop: 53,200. 5 steps × (8,000 input + 1,500 output) + convergence overhead.
- Blended USD per 1k tokens: $0.00489. This is the effective Sonnet rate after the engine's cache-aware blending (cache reads at 10× discount on Anthropic models).
Why lifting convergence makes it worse, not better
A naïve reading: "convergence-check is a sanity filter, run it more often to catch bad outputs." The engine's output contradicts that. Lifting the convergence rate from 60% to 75% adds $23.02 monthly ($882.34 − $859.32), about $276 over a year. The increment is small but it moves the wrong direction: more checks cost more, and the engine surfaces that cost directly.
The convergence-check architecture decision is when to run it, not how often. A convergence check that fires only on low-confidence outputs (say, when the structured-output adherence score drops below 0.7) costs the same as one that fires at 60% rate but catches more meaningful errors. The engine's single-knob model is a simplification; production implementations should fire convergence conditionally.
For a solo retail loop at 150 markets a day, the practical pattern is: run convergence at 100% rate for the first 200 loops to calibrate the confidence-score threshold, then drop convergence to 5–10% rate (conditional on low-confidence flags) for the remainder.
The model-swap arithmetic
The Sonnet → Opus 4.7 swap at 60% convergence: cost-per-loop $0.260 → $0.434, a 1.67× multiplier (consistent with the input/output rate ratio $5/$25 vs $3/$15). The engine accepts this faithfully; the question is whether the 1.67× cost buys 1.67× value.
For a 150-market daily research loop, the value differential between Sonnet and Opus is task-specific. On extraction tasks the gap is small (under 5% absolute accuracy difference on most retail benchmarks). On price-blind multi-step reasoning the gap widens (15–25% accuracy difference on adversarial setups). The defensible procedure is to evaluate the model on a representative 50-task subset before committing to a 30-day run; see Bounded Cost Agentic Research for the eval protocol.
The five-step constraint
5 steps per loop is the binding constraint on cost at this scale. Each step at 8,000 input + 1,500 output tokens costs $0.0465 on Sonnet. Five steps × $0.0465 = $0.2325, 89% of the per-loop tool-use cost. Cutting steps from 5 to 3 drops per-loop tool-use to $0.140, per-month to ~$540. That moves the workload within budget without changing model or convergence.
Steps reduction is the highest-yield optimization axis on the engine output. The architectural question is which two steps to remove: typically the candidate is "deep reasoning" steps that the engine implements as tool-use but whose value is bounded by the first three steps' output quality. A retail workflow that pares to 3 steps (gather, structure, conclude) and uses convergence selectively often ships at half the cost.
The token shape
8,000 input + 1,500 output tokens per step is a typical "long-context summarization" shape. The cost is dominated by input tokens, 8,000 × $3/Mtok = $0.024 vs 1,500 × $15/Mtok = $0.0225. They are roughly balanced because Sonnet's output-to-input rate ratio is 5×.
For a workload that could shape itself as 16,000 input + 250 output (deeper context, terser output), the per-step cost becomes 16,000 × $3/Mtok = $0.048 vs 250 × $15/Mtok = $0.00375. The reshaped step is 1.4× the cost but operates on 2× the input — which can be cheaper per unit of context processed if the workflow runs fewer steps.
The reshape only pays back if the output is dense enough to drop a downstream step. On a 150-market loop the budget pressure justifies aggressive output compression: prefer JSON over prose, prefer enumerated decisions over free-form reasoning. The Token Cost Optimizer lets you sweep the (input, output) shape against the workload's marginal cost.
Where the engine breaks
The engine assumes the steps are sequential and converged. A real agent that retries on tool-error, branches on intermediate output, or uses extended-thinking budgets above the per-step token count will see 1.3–2.5× the engine's reported cost. Production budgets should add a 30–50% buffer on top of the engine number.
The engine also models cache as a single per-call hit rate. For a 150-market loop the cache architecture matters: per-market prompts share little context (each market is independent), per-step system prompts share a lot (the agent's instruction block is identical). A 60% blended cache hit rate is plausible if the system prompt is well-cached; closer to 30% if only the system prompt caches and the per-market context evicts.
Convergence cost is modeled as a per-loop overhead with a fixed share of the total tokens. Production convergence implementations vary widely: structured-output validation costs almost nothing; LLM-as-judge convergence costs as much as a full step. The engine's single-line cost is a midpoint estimate.
The strategic question
If 150 markets a day costs $859/month at Sonnet and the target is $500, three architectural choices each cut cost to within budget:
- Cut market count. 150 → 90 markets a day at the same loop cost: $39.06 × (90/150) × 22 = $515/month. The workflow shrinks; coverage drops.
- Cut steps per loop. 5 → 3 steps at 150 markets: ~$540/month. The workflow narrows; per-market depth drops.
- Switch model. Sonnet 4.6 → Claude Haiku 4.5 at 150 markets, 5 steps: ~$280/month using the Token Cost Optimizer rate. The workflow keeps coverage; quality drops.
There is no fourth option at this scale. Each cut is a coverage / depth / quality trade-off. The engine's value is that it makes the trade-offs explicit in dollars, not in vibes.
Connects to
- Bounded Cost Agentic Research — the eval protocol for sub-step-count decisions.
- Inference Cost Attribution per Trade — how to amortize loop cost to per-validated-trade cost.
- Cost per Validated Trade Framework — the broader unit economics.
- Agent Cost Envelope Calculator — engine endpoint.
- Token Cost Optimizer — for sweeping cheaper-model alternatives.
- Model Selector for Finance — the tier-selection layer above the cost-envelope analysis.
References
- Anthropic. "Claude Pricing." anthropic.com, accessed 2026-05-21. https://www.anthropic.com/pricing
- OpenAI. "API Pricing." openai.com, accessed 2026-05-21. https://openai.com/api/pricing/
- Anthropic. "Prompt caching." docs.anthropic.com/en/docs/build-with-claude/prompt-caching, accessed 2026-05-21.
- Anthropic. "Agent SDK overview." docs.anthropic.com/en/docs/build-with-claude/agent-sdk, accessed 2026-05-21. Reference for the cost-envelope step model.
- BIS. "Cost-efficiency in technology systems for banks." BIS Working Paper, 2023. Reference for tooling-cost-per-decision benchmarks.
Verified engine output
Show the recompute-verified inputs and outputs
| model_id | claude-sonnet-4-6 |
|---|---|
| input_tokens_per_step | 8000 |
| output_tokens_per_step | 1500 |
| steps_per_loop | 5 |
| convergence_check_pct | 60 |
| markets_per_day | 150 |
| target_monthly_usd | 500 |
| calendar_mode | business |
| model › id | claude-sonnet-4-6 |
|---|---|
| model › provider | anthropic |
| model › name | Claude Sonnet 4.6 |
| model › tier | mid |
| model › input usd per mtoken | 3 |
| model › output usd per mtoken | 15 |
| model › cache read usd per mtoken | 0.3 |
| model › context window | 500000 |
| model › notes | Default pick for bulk research loops. |
| steps › row 1 › label | Step 1 — tool call + reasoning |
| steps › row 1 › kind | step |
| steps › row 1 › input cost | 0.024 |
| steps › row 1 › output cost | 0.0225 |
| steps › row 1 › total cost | 0.0465 |
| steps › row 2 › label | Step 2 — tool call + reasoning |
| steps › row 2 › kind | step |
| steps › row 2 › input cost | 0.024 |
| steps › row 2 › output cost | 0.0225 |
| steps › row 2 › total cost | 0.0465 |
| steps › row 3 › label | Step 3 — tool call + reasoning |
| steps › row 3 › kind | step |
| steps › row 3 › input cost | 0.024 |
| steps › row 3 › output cost | 0.0225 |
| steps › row 3 › total cost | 0.0465 |
| steps › row 4 › label | Step 4 — tool call + reasoning |
| steps › row 4 › kind | step |
| steps › row 4 › input cost | 0.024 |
| steps › row 4 › output cost | 0.0225 |
| steps › row 4 › total cost | 0.0465 |
| steps › row 5 › label | Step 5 — tool call + reasoning |
| steps › row 5 › kind | step |
| steps › row 5 › input cost | 0.024 |
| steps › row 5 › output cost | 0.0225 |
| steps › row 5 › total cost | 0.0465 |
| steps › row 6 › label | Convergence check — final analysis |
| steps › row 6 › kind | convergence |
| steps › row 6 › input cost | 0.0144 |
| steps › row 6 › output cost | 0.0135 |
| steps › row 6 › total cost | 0.027899999999999998 |
| cost per loop | 0.26039999999999996 |
| tool use subtotal | 0.23249999999999998 |
| convergence cost | 0.027899999999999998 |
| cost per day | 39.059999999999995 |
| cost per month | 859.3199999999999 |
| days per month | 22 |
| tokens per loop | 53200 |
| blended usd per1 ktokens | 0.0048947368421052625 |
| within budget | false |
| budget utilization | 1.71864 |
Computed live at build time.
Frequently asked questions
- Why does the engine return $859/month when 150 × $0.26 × 22 = $858?
- Floating-point and convergence-cost rounding. The engine returns budgetUtilization = 1.7186 with full-precision arithmetic; the dollar display rounds.
- Is the 22-day calendar correct for crypto markets?
- No. For crypto use calendar_mode = continuous (30 days). The canonical input here is weekdays so equity-market timing applies; crypto re-runs at 30/22 = 1.36× the monthly cost.
- Should I trust the convergence-cost portion?
- It is a midpoint model. Production convergence varies from near-zero (schema-only validation) to step-equivalent (LLM-as-judge). Calibrate against your own implementation.
- What's the realistic floor on cost-per-loop at this token shape?
- The cheapest tier, Gemini 2.5 Flash-Lite, lands near $0.008 per loop (~$26/month) on the 8,000/1,500 shape — within the $500 budget. Gemini 2.5 Flash is ~$114/month and Haiku 4.5 ~$286/month, both under budget. The expensive tiers (Sonnet $859, Opus $1,432) blow through it; where you cannot drop to a cheap model on quality grounds, the token shape is the real lever.
- How sensitive is the cost to input tokens per step?
- Linearly. Cutting input tokens from 8,000 to 4,000 drops monthly cost by about 30%; doubling them roughly doubles the input-cost contribution which is 60% of total.