Why does the engine return $859/month when 150 × $0.26 × 22 = $858?

Floating-point and convergence-cost rounding. The engine returns budgetUtilization = 1.7186 with full-precision arithmetic; the dollar display rounds.

Is the 22-day calendar correct for crypto markets?

No. For crypto use calendar_mode = continuous (30 days). The canonical input here is weekdays so equity-market timing applies; crypto re-runs at 30/22 = 1.36× the monthly cost.

Should I trust the convergence-cost portion?

It is a midpoint model. Production convergence varies from near-zero (schema-only validation) to step-equivalent (LLM-as-judge). Calibrate against your own implementation.

What's the realistic floor on cost-per-loop at this token shape?

The cheapest tier, Gemini 2.5 Flash-Lite, lands near $0.008 per loop (~$26/month) on the 8,000/1,500 shape — within the $500 budget. Gemini 2.5 Flash is ~$114/month and Haiku 4.5 ~$286/month, both under budget. The expensive tiers (Sonnet $859, Opus $1,432) blow through it; where you cannot drop to a cheap model on quality grounds, the token shape is the real lever.

How sensitive is the cost to input tokens per step?

Linearly. Cutting input tokens from 8,000 to 4,000 drops monthly cost by about 30%; doubling them roughly doubles the input-cost contribution which is 60% of total.

Agent Cost Envelope: 150 Markets a Day, Three Configurations

At 150 markets a day, a finance research agent blows a $500/month budget by 1.72× before you touch the model choice. For that loop on Claude Sonnet 4.6 (8,000 input + 1,500 output tokens per step, 5 steps per loop, 60% convergence-check rate), the Agent Cost Envelope Calculator returns cost-per-loop $0.260, cost-per-day $39.06, and cost-per-month $859.32. Lifting the convergence check from 60% to 75% raises cost to $882.34 (worse, because more checks). Switching from Sonnet to Opus 4.8 at the same 60% convergence raises cost 1.67× to $1,432.20. The binding constraint at 150 markets a day is not model choice, it is the number of steps multiplied by the per-step token shape.

TL;DR

Three runs of the cost envelope, same workload (150 markets × 5 steps × weekdays = 22 trading days):

Configuration	Cost/loop	Cost/day	Cost/month	Within $500 budget?
Sonnet 4.6, 60% convergence	$0.2604	$39.06	$859.32	No (1.72×)
Sonnet 4.6, 75% convergence	$0.2674	$40.11	$882.34	No (1.76×)
Opus 4.8, 60% convergence	$0.4340	$65.10	$1,432.20	No (2.86×)

Sonnet 4.6 at 60% convergence is the cheapest of the three but still over budget. The actionable axes are: cut steps_per_loop from 5 to 3, cut input_tokens_per_step from 8,000 to 4,000, or move to a cheaper model (the engine reports both Sonnet and Opus; Haiku and Gemini Flash live in the Token Cost Optimizer lookup).

The cost decomposition

For the Sonnet 4.6 / 60% convergence run, the engine returns:

Cost per loop: $0.2604. This is the per-market all-in cost. 5 steps × ($0.024 input + $0.0225 output) per step = $0.2325 in tool-use steps, plus $0.0279 in convergence-check cost. Convergence-check fires at the 60% rate (one check per 0.6 expected loops) and adds incrementally.
Cost per day: $39.06. 150 markets × $0.2604 per loop. The $39.06 number is the daily run rate at full coverage.
Cost per month: $859.32. 22 trading days × $39.06. The 22-day convention is weekdays-only; the engine accepts a calendar_mode parameter for 30-day continuous workloads.
Tokens per loop: 53,200. 5 steps × (8,000 input + 1,500 output) + convergence overhead.
Blended USD per 1k tokens: $0.00489. This is the effective Sonnet rate after the engine's cache-aware blending (cache reads at 10× discount on Anthropic models).

Why lifting convergence makes it worse, not better

A naïve reading: "convergence-check is a sanity filter, run it more often to catch bad outputs." The engine's output contradicts that. Lifting the convergence rate from 60% to 75% adds $23.02 monthly ($882.34 − $859.32), about $276 over a year. The increment is small but it moves the wrong direction: more checks cost more, and the engine surfaces that cost directly.

The convergence-check architecture decision is when to run it, not how often. A convergence check that fires only on low-confidence outputs (say, when the structured-output adherence score drops below 0.7) costs the same as one that fires at 60% rate but catches more meaningful errors. The engine's single-knob model is a simplification; production implementations should fire convergence conditionally.

For a solo retail loop at 150 markets a day, the practical pattern is: run convergence at 100% rate for the first 200 loops to calibrate the confidence-score threshold, then drop convergence to 5–10% rate (conditional on low-confidence flags) for the remainder.

The model-swap arithmetic

The Sonnet → Opus 4.8 swap at 60% convergence: cost-per-loop $0.260 → $0.434, a 1.67× multiplier (consistent with the input/output rate ratio $5/$25 vs $3/$15). The engine accepts this faithfully; the question is whether the 1.67× cost buys 1.67× value.

For a 150-market daily research loop, the value differential between Sonnet and Opus is task-specific. On extraction tasks the gap is small (under 5% absolute accuracy difference on most retail benchmarks). On price-blind multi-step reasoning the gap widens (15–25% accuracy difference on adversarial setups). The defensible procedure is to evaluate the model on a representative 50-task subset before committing to a 30-day run; see Bounded Cost Agentic Research for the eval protocol.

The five-step constraint

5 steps per loop is the binding constraint on cost at this scale. Each step at 8,000 input + 1,500 output tokens costs $0.0465 on Sonnet. Five steps × $0.0465 = $0.2325, 89% of the per-loop tool-use cost. Cutting steps from 5 to 3 drops per-loop tool-use to $0.140, per-month to ~$540. That moves the workload within budget without changing model or convergence.

Steps reduction is the highest-yield optimization axis on the engine output. The architectural question is which two steps to remove: typically the candidate is "deep reasoning" steps that the engine implements as tool-use but whose value is bounded by the first three steps' output quality. A retail workflow that pares to 3 steps (gather, structure, conclude) and uses convergence selectively often ships at half the cost.

The token shape

8,000 input + 1,500 output tokens per step is a typical "long-context summarization" shape. The cost is dominated by input tokens, 8,000 × $3/Mtok = $0.024 vs 1,500 × $15/Mtok = $0.0225. They are roughly balanced because Sonnet's output-to-input rate ratio is 5×.

For a workload that could shape itself as 16,000 input + 250 output (deeper context, terser output), the per-step cost becomes 16,000 × $3/Mtok = $0.048 vs 250 × $15/Mtok = $0.00375. The reshaped step is 1.4× the cost but operates on 2× the input — which can be cheaper per unit of context processed if the workflow runs fewer steps.

The reshape only pays back if the output is dense enough to drop a downstream step. On a 150-market loop the budget pressure justifies aggressive output compression: prefer JSON over prose, prefer enumerated decisions over free-form reasoning. The Token Cost Optimizer lets you sweep the (input, output) shape against the workload's marginal cost.

Where the engine breaks

The engine assumes the steps are sequential and converged. A real agent that retries on tool-error, branches on intermediate output, or uses extended-thinking budgets above the per-step token count will see 1.3–2.5× the engine's reported cost. Production budgets should add a 30–50% buffer on top of the engine number.

The engine also models cache as a single per-call hit rate. For a 150-market loop the cache architecture matters: per-market prompts share little context (each market is independent), per-step system prompts share a lot (the agent's instruction block is identical). A 60% blended cache hit rate is plausible if the system prompt is well-cached; closer to 30% if only the system prompt caches and the per-market context evicts.

Convergence cost is modeled as a per-loop overhead with a fixed share of the total tokens. Production convergence implementations vary widely: structured-output validation costs almost nothing; LLM-as-judge convergence costs as much as a full step. The engine's single-line cost is a midpoint estimate.

The strategic question

If 150 markets a day costs $859/month at Sonnet and the target is $500, three architectural choices each cut cost to within budget:

Cut market count. 150 → 90 markets a day at the same loop cost: $39.06 × (90/150) × 22 = $515/month. The workflow shrinks; coverage drops.
Cut steps per loop. 5 → 3 steps at 150 markets: ~$540/month. The workflow narrows; per-market depth drops.
Switch model. Sonnet 4.6 → Claude Haiku 4.5 at 150 markets, 5 steps: ~$280/month using the Token Cost Optimizer rate. The workflow keeps coverage; quality drops.

There is no fourth option at this scale. Each cut is a coverage / depth / quality trade-off. The engine's value is that it makes the trade-offs explicit in dollars, not in vibes.

Connects to

Bounded Cost Agentic Research — the eval protocol for sub-step-count decisions.
Inference Cost Attribution per Trade — how to amortize loop cost to per-validated-trade cost.
Cost per Validated Trade Framework — the broader unit economics.
Agent Cost Envelope Calculator — engine endpoint.
Token Cost Optimizer — for sweeping cheaper-model alternatives.
Model Selector for Finance — the tier-selection layer above the cost-envelope analysis.

References

Anthropic. "Claude Pricing." anthropic.com, accessed 2026-06-18. https://www.anthropic.com/pricing
OpenAI. "API Pricing." openai.com, accessed 2026-05-21. https://openai.com/api/pricing/
Anthropic. "Prompt caching." docs.anthropic.com/en/docs/build-with-claude/prompt-caching, accessed 2026-05-21.
Anthropic. "Agent SDK overview." docs.anthropic.com/en/docs/build-with-claude/agent-sdk, accessed 2026-05-21. Reference for the cost-envelope step model.
BIS. "Cost-efficiency in technology systems for banks." BIS Working Paper, 2023. Reference for tooling-cost-per-decision benchmarks.

Verified engine output

Show the recompute-verified inputs and outputs

Inputs
model_id	claude-sonnet-4-6
input_tokens_per_step	8000
output_tokens_per_step	1500
steps_per_loop	5
convergence_check_pct	60
markets_per_day	150
target_monthly_usd	500
calendar_mode	business

Result
model › id	claude-sonnet-4-6
model › provider	anthropic
model › name	Claude Sonnet 4.6
model › tier	mid
model › input usd per mtoken	3
model › output usd per mtoken	15
model › cache read usd per mtoken	0.3
model › context window	500000
model › notes	Default pick for bulk research loops.
steps › row 1 › label	Step 1 — tool call + reasoning
steps › row 1 › kind	step
steps › row 1 › input cost	0.024
steps › row 1 › output cost	0.0225
steps › row 1 › total cost	0.0465
steps › row 2 › label	Step 2 — tool call + reasoning
steps › row 2 › kind	step
steps › row 2 › input cost	0.024
steps › row 2 › output cost	0.0225
steps › row 2 › total cost	0.0465
steps › row 3 › label	Step 3 — tool call + reasoning
steps › row 3 › kind	step
steps › row 3 › input cost	0.024
steps › row 3 › output cost	0.0225
steps › row 3 › total cost	0.0465
steps › row 4 › label	Step 4 — tool call + reasoning
steps › row 4 › kind	step
steps › row 4 › input cost	0.024
steps › row 4 › output cost	0.0225
steps › row 4 › total cost	0.0465
steps › row 5 › label	Step 5 — tool call + reasoning
steps › row 5 › kind	step
steps › row 5 › input cost	0.024
steps › row 5 › output cost	0.0225
steps › row 5 › total cost	0.0465
steps › row 6 › label	Convergence check — final analysis
steps › row 6 › kind	convergence
steps › row 6 › input cost	0.0144
steps › row 6 › output cost	0.0135
steps › row 6 › total cost	0.027899999999999998
cost per loop	0.26039999999999996
tool use subtotal	0.23249999999999998
convergence cost	0.027899999999999998
cost per day	39.059999999999995
cost per month	859.3199999999999
days per month	22
tokens per loop	53200
blended usd per1 ktokens	0.0048947368421052625
within budget	false
budget utilization	1.71864

Computed live at build time.

Frequently asked questions

Why does the engine return $859/month when 150 × $0.26 × 22 = $858?: Floating-point and convergence-cost rounding. The engine returns budgetUtilization = 1.7186 with full-precision arithmetic; the dollar display rounds.
Is the 22-day calendar correct for crypto markets?: No. For crypto use calendar_mode = continuous (30 days). The canonical input here is weekdays so equity-market timing applies; crypto re-runs at 30/22 = 1.36× the monthly cost.
Should I trust the convergence-cost portion?: It is a midpoint model. Production convergence varies from near-zero (schema-only validation) to step-equivalent (LLM-as-judge). Calibrate against your own implementation.
What's the realistic floor on cost-per-loop at this token shape?: The cheapest tier, Gemini 2.5 Flash-Lite, lands near $0.008 per loop (~$26/month) on the 8,000/1,500 shape — within the $500 budget. Gemini 2.5 Flash is ~$114/month and Haiku 4.5 ~$286/month, both under budget. The expensive tiers (Sonnet $859, Opus $1,432) blow through it; where you cannot drop to a cheap model on quality grounds, the token shape is the real lever.
How sensitive is the cost to input tokens per step?: Linearly. Cutting input tokens from 8,000 to 4,000 drops monthly cost by about 30%; doubling them roughly doubles the input-cost contribution which is 60% of total.