Batch vs Realtime LLM Cost: Examples
The 50% batch discount exists only when the deadline allows it. A workload specified by model, jobs per day, tokens per job, and deadline in hours pays real-time rates the moment its deadline falls under 24 hours, regardless of volume. These scenarios show how the same workload flips from half-price to full-price on that single variable. The savings figure is real only when the deadline permits batch.
Worked Examples
See the inputs and outcome together
Each scenario keeps the starting point, the outcome, and the actual lesson in one place so the page reads like a decision notebook, not a data dump.
- 1
Overnight batch on a flagship model
Scoring 1,000 documents a day on Claude Opus 4.7 with a comfortable 24-hour deadline. The textbook case for batch processing.
Real-time $35/day, batch $17.50/day, savings $525/month. Batch is used.
Model
Claude Opus 4.7
Jobs per day
1,000
Input / output tokens per job
3,000 / 800
Deadline
24 hours
The deadline equals the batch SLA, so the job qualifies and you halve the bill to $17.50 a day, $525 saved per month. Any workload that can wait overnight should default to batch; the discount is free money for non-urgent jobs.
- 2
Same job, one-hour deadline
Identical workload, but now the output is needed within an hour, for example a live screening step. The deadline is tighter than the batch SLA.
Effective cost $35/day, savings $0. Real-time required; batch not eligible.
Model
Claude Opus 4.7
Jobs per day
1,000
Input / output tokens per job
3,000 / 800
Deadline
1 hour
The batch price is still $17.50 in theory, but a one-hour deadline cannot wait for a 24-hour SLA, so the effective cost stays at the full $35. The discount is not a pricing choice; it is gated entirely by whether your latency budget allows it.
- 3
High-volume small model, twelve-hour deadline
Classifying 5,000 items a day on Claude Haiku 4.5 with a 12-hour deadline. High volume, cheap model, but the deadline is still under the SLA.
Effective cost $22.50/day, savings $0. Real-time required; 12h is below the 24h SLA.
Model
Claude Haiku 4.5
Jobs per day
5,000
Input / output tokens per job
2,000 / 500
Deadline
12 hours
A 12-hour deadline feels generous but still falls short of the 24-hour batch SLA, so the $22.50 daily cost cannot be halved. If you can stretch the deadline to 24 hours you cut this to $11.25; the cheapest optimization here is patience, not a model swap.
Patterns
Try These Tools
Run the numbers next
Token-Cost Optimizer
Compute the dollar cost of a trading research loop across Claude, GPT, and Gemini. Prompt length × model × retry × call volume → cost per idea and per.
Agent Cost Envelope Calculator
Model an LLM research loop end-to-end — steps, tool calls, convergence checks, markets per day — and see per-loop, daily, and monthly cost with cost-cap.
Sources & References
- Message Batches API — Anthropic (2026)
- OpenAI Batch API — OpenAI (2026)
Related Content
Keep the topic connected
Agent-Cost Envelope
The agent-cost envelope: the loop of (calls × tokens × retries × model_price) that determines the dollar cost of an LLM-driven trading agent per decision.
MCP (Model Context Protocol)
Model Context Protocol: Anthropic's open standard for letting LLMs discover and call tools — the interface, why it matters, and finance MCP server checks.
LLM for Finance Deployment Checklist
A pre-flight checklist for putting a large language model into a finance workflow: scoping, grounding, input security, numerical verification, and drift monitoring.