A 12-hour deadline kills the batch discount. Anthropic, OpenAI, and Google all publish a 24-hour batch SLA, so anything tighter pays real-time rates. The Batch vs Realtime Cost Calculator, run on 5,000 daily jobs at 8k input + 1.2k output tokens through Claude Sonnet 4.6, reports a real-time cost of €210/day and a batch cost of €105/day, but at a 12h deadline the engine returns batchEligible=false and the effective cost stays at €210. The mode-suggestion field says so explicitly: real-time is required because the 12h deadline is shorter than the 24h batch SLA. That single boolean is the whole decision.
TL;DR
- Real-time cost on Sonnet 4.6 at 5,000 jobs/day, 8k+1.2k tokens: €210/day (€6,300/month).
- Batch cost at the same volume: €105/day (€3,150/month), but only if the deadline is ≥ 24h.
- Effective cost with a 12h deadline: €210/day. Batch ineligible.
- Cheapest provider in the comparison: Gemini 2.5 Flash at €27/day real-time, €13.50/day batch.
- The decision rule is mechanical: ≥24h tolerance → batch; <24h → real-time; pick provider on cost-vs-capability per scenario.
The scenario
A nightly research loop processes 5,000 SEC filings per day at 8,000 input tokens + 1,200 output tokens per job. The deadline is 12 hours (start at 19:00 UTC, results required by 07:00 UTC for the morning research review). The Batch vs Realtime Cost Calculator returns the following for Claude Sonnet 4.6:
| Metric | Value |
|---|---|
| Real-time cost per day | €210 |
| Batch cost per day | €105 |
| Batch SLA | 24 hours |
| Deadline | 12 hours |
| Batch eligible | No |
| Effective cost per day | €210 |
| Monthly savings if switched | €0 |
| Mode suggestion | "Real-time required — deadline 12h < batch SLA 24h" |
The 50% discount Anthropic publishes for batch processing [1] is unavailable because the deadline is tighter than the SLA. Same shape for OpenAI [2] and Google [3]: all three publish a 24h batch SLA.
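The figures in the table follow directly from per-token list prices. Below is a minimal Python sketch of that arithmetic, using the Sonnet 4.6 prices and the 50% batch discount listed in the engine output at the end of this article. The function name `daily_cost` and the 30-day month are assumptions made for illustration; this is not the engine's actual code.

```python
def daily_cost(jobs_per_day, input_tokens, output_tokens,
               input_price_per_mtok, output_price_per_mtok):
    """Daily real-time cost from per-job token counts and per-million-token prices."""
    per_job = (input_tokens * input_price_per_mtok
               + output_tokens * output_price_per_mtok) / 1_000_000
    return jobs_per_day * per_job


# Scenario inputs; prices as listed for claude-sonnet-4-6 in the engine output.
realtime = daily_cost(5000, 8000, 1200, 3.0, 15.0)
batch = realtime * (1 - 50 / 100)   # 50% batch discount
monthly_realtime = realtime * 30    # assumption: 30-day month

print(round(realtime, 2), round(batch, 2), round(monthly_realtime, 2))
```

Input tokens contribute 8,000 × 3 / 1,000,000 = 0.024 per job and output tokens 1,200 × 15 / 1,000,000 = 0.018 per job, so output accounts for roughly 43% of the bill despite being 13% of the tokens.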
The provider comparison
The engine returns a comparison table across providers at the same workload shape (5,000 jobs × 9,200 tokens):
| Provider / Model | Real-time/day | Batch/day | Eligible at 12h? |
|---|---|---|---|
| Anthropic Claude Sonnet 4.6 | €210 | €105 | No |
| Anthropic Claude Haiku 4.5 | €70 | €35 | No |
| OpenAI GPT-5.4 mini | €57 | €28.50 | No |
| Google Gemini 2.5 Flash | €27 | €13.50 | No |
Gemini 2.5 Flash at €27/day is the cheapest at this token shape. The capability gap matters — Flash sits below Sonnet 4.6 on multi-step reasoning and structured-output adherence on finance tasks. The right pick is workload-specific; the price column is one input, not the decision.
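A comparison like the one above can be reproduced with a few lines of Python. This sketch is restricted to the three models whose per-token prices appear in the engine output and that also have a row in the table; the `Model` dataclass and the function name are hypothetical, not the engine's own types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float            # per million input tokens
    output_price: float           # per million output tokens
    batch_discount_pct: float = 50.0


def realtime_cost_per_day(model, jobs, in_tok, out_tok):
    """Daily cost at list price for a fixed per-job token shape."""
    return jobs * (in_tok * model.input_price
                   + out_tok * model.output_price) / 1_000_000


# Prices copied from the engine output at the end of this article.
MODELS = [
    Model("Claude Sonnet 4.6", 3.0, 15.0),
    Model("Claude Haiku 4.5", 1.0, 5.0),
    Model("GPT-5.4 mini", 0.75, 4.5),
]

rows = []
for m in MODELS:
    rt = realtime_cost_per_day(m, 5000, 8000, 1200)
    batch = rt * (1 - m.batch_discount_pct / 100)
    rows.append((m.name, round(rt, 2), round(batch, 2)))

rows.sort(key=lambda r: r[1])  # cheapest real-time cost first
for name, rt, batch in rows:
    print(f"{name:<20} {rt:>8.2f} {batch:>8.2f}")
```

Sorting by price only orders the candidates; it says nothing about whether a model passes the capability bar for the workload.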
Why the 24h SLA is the right design
Batch APIs work because the provider can run inference during low-utilisation windows. The 24h SLA gives the provider's scheduler the slack to fill capacity gaps without breaking the customer's deadline. A 12h SLA would erode the slack and force batch jobs into peak-utilisation hours, defeating the cost advantage that justifies the discount [4].
For the customer, the SLA is a hard switch. The engine's batchEligible field encodes the comparison directly: deadline ≥ batchSlaHours → true. Anything else returns false and the comparison treats batch as if it doesn't exist.
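The hard switch can be sketched in Python as follows. The comparison `deadline >= SLA` is the one described above; the function name, the returned keys, and the 30-day month are assumptions made for illustration rather than the engine's real interface.

```python
def evaluate(deadline_hours, batch_sla_hours, realtime_cost_per_day,
             batch_discount_pct=50.0, days_per_month=30):
    """Apply the eligibility switch and derive the effective cost."""
    batch_eligible = deadline_hours >= batch_sla_hours
    batch_cost = realtime_cost_per_day * (1 - batch_discount_pct / 100)
    # When batch is ineligible, the batch price is ignored entirely.
    effective = batch_cost if batch_eligible else realtime_cost_per_day
    savings_per_day = realtime_cost_per_day - effective
    return {
        "batch_eligible": batch_eligible,
        "effective_cost_per_day": effective,
        "savings_per_month": savings_per_day * days_per_month,
    }


twelve_hour = evaluate(deadline_hours=12, batch_sla_hours=24,
                       realtime_cost_per_day=210.0)
full_day = evaluate(deadline_hours=24, batch_sla_hours=24,
                    realtime_cost_per_day=210.0)
print(twelve_hour)
print(full_day)
```

The comparison is inclusive: a deadline of exactly 24h qualifies, while 23h does not.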
Three deadline regimes
The decision tree for batch-vs-real-time is a three-position switch:
Deadline ≥ 24 hours (batch wins)
Overnight loops with a 24-48h delivery window are the textbook batch case. 50% discount applies, no latency cost, batch eligible. Use batch.
Deadline 6 to under 24 hours (mixed regime)
Some workloads have a soft 24h deadline with hard requirements at intermediate checkpoints. Split the workload: send a batch for the bulk and a real-time burst for the deadline-critical subset. The Token Cost Optimizer does the math for blended pipelines.
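The blended cost of a split pipeline is a weighted sum. The Python sketch below assumes every job costs the same and that the batch portion genuinely tolerates the 24h SLA; `urgent_share` (the fraction of jobs that must run in real time) and the 20% figure in the example are hypothetical values, not output from the Token Cost Optimizer.

```python
def blended_daily_cost(realtime_cost_per_day, urgent_share,
                       batch_discount_pct=50.0):
    """Daily cost when urgent_share of jobs run real-time and the rest run in batch."""
    if not 0.0 <= urgent_share <= 1.0:
        raise ValueError("urgent_share must be between 0 and 1")
    realtime_part = realtime_cost_per_day * urgent_share
    batch_part = (realtime_cost_per_day * (1 - urgent_share)
                  * (1 - batch_discount_pct / 100))
    return realtime_part + batch_part


# Hypothetical split: 20% of the Sonnet 4.6 workload is deadline-critical.
blended = blended_daily_cost(210.0, urgent_share=0.2)
print(round(blended, 2))
```

At the two extremes the function reduces to the pure cases: `urgent_share=1.0` gives the full real-time cost and `urgent_share=0.0` gives the full batch cost.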
Deadline < 6 hours (real-time only)
Real-time inference. No discount available. Pick the cheapest provider that meets capability requirements. Gemini 2.5 Flash and Haiku-class tiers cover most retail workloads at real-time without breaking the budget.
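The three regimes can be written as a small classifier. This is a sketch in Python with the 24h and 6h thresholds taken from the headings above; the function name and the returned labels are assumptions. Note that "mixed" only means a split is worth investigating: a workload in that band is still batch-ineligible as a whole.

```python
def deadline_regime(deadline_hours, batch_sla_hours=24,
                    realtime_only_below_hours=6):
    """Map a deadline to one of the three regimes described above."""
    if deadline_hours >= batch_sla_hours:
        return "batch"
    if deadline_hours >= realtime_only_below_hours:
        return "mixed"      # split only if part of the workload tolerates the SLA
    return "real-time"


for hours in (48, 24, 12, 6, 3):
    print(hours, deadline_regime(hours))
```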
What the engine doesn't model
The engine quotes published list prices. Three factors it does not include:
- Prompt caching. Anthropic publishes a 90% discount on cached read tokens; OpenAI publishes a 50% discount on cached input tokens. For research loops with stable system prompts, cache hits drop the effective real-time cost by 40-60%. See Prompt Caching Economics for Finance and the Token Cost Optimizer for the cache-aware version.
- Volume discounts. All three providers offer enterprise discounts above ~€10k/month spend. The engine prices on retail list.
- Provider failover. A multi-provider stack carries failover overhead — typically 5-10% on cost. See Fallback Chain Simulator for the blended-cost model.
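Of the three omissions, prompt caching is the easiest to approximate by hand. The Python sketch below treats the hit rate as the share of input tokens billed at the cached-read price and leaves output tokens untouched. It ignores cache-write surcharges and cache expiry, so it is an optimistic estimate; the function name and parameters are hypothetical and this is not the Token Cost Optimizer's model.

```python
def cached_daily_cost(jobs, in_tok, out_tok, in_price, out_price,
                      hit_rate, cached_read_discount_pct):
    """Daily cost when hit_rate of input tokens are billed at the cached-read price."""
    input_cost = jobs * in_tok * in_price / 1_000_000
    output_cost = jobs * out_tok * out_price / 1_000_000   # output is never cached
    cached_input = input_cost * hit_rate * (1 - cached_read_discount_pct / 100)
    uncached_input = input_cost * (1 - hit_rate)
    return cached_input + uncached_input + output_cost


# Sonnet 4.6 scenario with an assumed 60% hit rate and a 90% cached-read discount.
with_cache = cached_daily_cost(5000, 8000, 1200, 3.0, 15.0,
                               hit_rate=0.6, cached_read_discount_pct=90)
print(round(with_cache, 2))
```

Because only the input share of the bill shrinks, the achievable saving is capped by the input-to-output cost ratio of the workload.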
The retail accounting
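For the 12h deadline scenario, the honest monthly bill on Sonnet 4.6 is €6,300/month. Output tokens account for €2,700 of that and cannot be cached, so prompt caching only works on the €3,600 input share: at a 60% hit rate with Anthropic's 90% cached-read discount, and ignoring cache-write surcharges, the bill drops to roughly €4,400/month. With a provider switch to Gemini Flash (capability permitting), it drops to about €800/month. The cache + cheap-tier combination is the dominant cost lever for retail research loops, not the batch toggle.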
For workloads that can tolerate a 24h delay, the batch toggle saves €3,150/month at Sonnet 4.6 prices. That is the same order of magnitude as a tier switch from Sonnet to Haiku, which saves €4,200/month at real-time rates. The trader's choice is whether the capability gap or the latency gap is more tolerable; both levers close a gap of similar size.
Failure modes
- Treating the deadline as advisory. Batch SLAs are firm. A "soft" 24h that occasionally needs to deliver in 18h does not qualify for batch; that workload pays real-time rates.
- Mixing batch and real-time pricing in budgets. Quote either-or, not blended, unless the pipeline is genuinely split.
- Forgetting that batch SLAs are wall-clock from submit-time. A batch submitted at 03:00 with a 24h SLA delivers by 03:00 next day, not by some business-hour proxy.
- Treating the cheapest provider as the right answer. Capability differences across providers on finance-specific tasks can dwarf the cost gap. Run the eval first; see Eval Harness for Finance LLMs.
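The wall-clock point is easy to check in code. This Python sketch takes the worst case, in which the batch completes exactly at the SLA limit, and compares it with the time results are needed. The function name and the calendar date are hypothetical; the 19:00 and 07:00 UTC times come from the scenario above.

```python
from datetime import datetime, timedelta, timezone


def batch_meets_deadline(submitted_at, needed_by, batch_sla_hours=24):
    """True only if a batch finishing at the SLA limit still arrives in time."""
    worst_case_done = submitted_at + timedelta(hours=batch_sla_hours)
    return worst_case_done <= needed_by


# Scenario times; the date itself is an arbitrary placeholder.
submitted = datetime(2025, 1, 6, 19, 0, tzinfo=timezone.utc)
needed = datetime(2025, 1, 7, 7, 0, tzinfo=timezone.utc)

print(batch_meets_deadline(submitted, needed))  # False: worst case lands at 19:00 on 7 Jan
```

Using timezone-aware datetimes keeps the comparison in wall-clock terms rather than in business hours.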
Connects to
- Batch API Economics for Finance: when 24h windows are available.
- Prompt Caching Economics for Finance: the cache lever.
- Token Cost Reality for LLM Trading Research: wider cost picture.
- Inference Cost Attribution per Trade: per-trade unit economics.
- Batch vs Realtime Cost Calculator: re-run on your spec.
- Batch vs Realtime Cost Calculator methodology: full input/output specification.
References
1. Anthropic (2024). "Batch processing for the Claude API." docs.anthropic.com
2. OpenAI (2024). "Batch API documentation." platform.openai.com
3. Google Cloud (2024). "Vertex AI batch prediction." cloud.google.com
4. Pope, R., et al. (2023). "Efficiently Scaling Transformer Inference." MLSys 2023. arxiv.org/abs/2211.05102
Verified engine output
Recompute-verified inputs:
| Input | Value |
|---|---|
| model_id | claude-sonnet-4-6 |
| jobs_per_day | 5000 |
| input_tokens_per_job | 8000 |
| output_tokens_per_job | 1200 |
| deadline_hours | 12 |
Recompute-verified outputs:
| Output | Value |
|---|---|
| model › id | claude-sonnet-4-6 |
| model › provider | anthropic |
| model › name | Claude Sonnet 4.6 |
| model › input usd per mtoken | 3 |
| model › output usd per mtoken | 15 |
| model › supports batch | true |
| model › batch discount pct | 50 |
| model › batch sla hours | 24 |
| model › context window | 500000 |
| model › notes | Best price/performance for bulk workloads. |
| realtime cost per day | 209.99999999999997 |
| batch cost per day | 104.99999999999999 |
| effective cost per day | 209.99999999999997 |
| savings per day | 0 |
| savings per month | 0 |
| batch eligible | false |
| using batch | false |
| mode suggestion | Real-time required — deadline 12h < batch SLA 24h |
| provider comparison › row 1 › provider | anthropic |
| provider comparison › row 1 › model › id | claude-haiku-4-5 |
| provider comparison › row 1 › model › provider | anthropic |
| provider comparison › row 1 › model › name | Claude Haiku 4.5 |
| provider comparison › row 1 › model › input usd per mtoken | 1 |
| provider comparison › row 1 › model › output usd per mtoken | 5 |
| provider comparison › row 1 › model › supports batch | true |
| provider comparison › row 1 › model › batch discount pct | 50 |
| provider comparison › row 1 › model › batch sla hours | 24 |
| provider comparison › row 1 › model › context window | 200000 |
| provider comparison › row 1 › model › notes | Cheapest Anthropic option — filtering, classification. |
| provider comparison › row 1 › realtime cost per day | 69.99999999999999 |
| provider comparison › row 1 › batch cost per day | 34.99999999999999 |
| provider comparison › row 1 › batch eligible | false |
| provider comparison › row 1 › supports batch | true |
| provider comparison › row 1 › batch sla hours | 24 |
| provider comparison › row 2 › provider | openai |
| provider comparison › row 2 › model › id | gpt-5-mini |
| provider comparison › row 2 › model › provider | openai |
| provider comparison › row 2 › model › name | GPT-5.4 mini |
| provider comparison › row 2 › model › input usd per mtoken | 0.75 |
| provider comparison › row 2 › model › output usd per mtoken | 4.5 |
| provider comparison › row 2 › model › supports batch | true |
| provider comparison › row 2 › model › batch discount pct | 50 |
| provider comparison › row 2 › model › batch sla hours | 24 |
| provider comparison › row 2 › model › context window | 256000 |
| provider comparison › row 2 › model › notes | Mid-tier OpenAI. |
| provider comparison › row 2 › realtime cost per day | 57 |
| provider comparison › row 2 › batch cost per day | 28.5 |
| provider comparison › row 2 › batch eligible | false |
| provider comparison › row 2 › supports batch | true |
| provider comparison › row 2 › batch sla hours | 24 |
| provider comparison › row 3 › provider | google |
| provider comparison › row 3 › model › id | gemini-2-5-flash-lite |
| provider comparison › row 3 › model › provider | google |
| provider comparison › row 3 › model › name | Gemini 2.5 Flash-Lite |
| provider comparison › row 3 › model › input usd per mtoken | 0.1 |
| provider comparison › row 3 › model › output usd per mtoken | 0.4 |
| provider comparison › row 3 › model › supports batch | true |
| provider comparison › row 3 › model › batch discount pct | 50 |
| provider comparison › row 3 › model › batch sla hours | 24 |
| provider comparison › row 3 › model › context window | 1000000 |
| provider comparison › row 3 › model › notes | Cheapest model in this table. Batch mode available. |
| provider comparison › row 3 › realtime cost per day | 6.3999999999999995 |
| provider comparison › row 3 › batch cost per day | 3.1999999999999997 |
| provider comparison › row 3 › batch eligible | false |
| provider comparison › row 3 › supports batch | true |
| provider comparison › row 3 › batch sla hours | 24 |
Computed live at build time.
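The trailing digits in values such as 209.99999999999997 are ordinary binary floating-point rounding noise, not a pricing difference. A short Python check, with the raw values copied from the table above and the rounded figures from the article's tables, confirms they agree to the cent. The dictionary layout and the 0.005 tolerance are assumptions for illustration.

```python
import math

# Raw engine floats paired with the rounded figures quoted in the article.
checks = {
    "sonnet realtime/day": (209.99999999999997, 210.0),
    "sonnet batch/day": (104.99999999999999, 105.0),
    "haiku realtime/day": (69.99999999999999, 70.0),
    "haiku batch/day": (34.99999999999999, 35.0),
    "gpt mini realtime/day": (57, 57.0),
    "gpt mini batch/day": (28.5, 28.5),
}

mismatches = [
    label for label, (raw, quoted) in checks.items()
    if not math.isclose(raw, quoted, abs_tol=0.005)   # agree to within half a cent
]
print(mismatches)
```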
Frequently asked questions
- Why is batch ineligible at a 12h deadline?
- All three major providers publish a 24h batch SLA. A 12h deadline cannot rely on a 24h SLA. The engine's batchEligible boolean encodes the comparison and returns false. The price column for batch is shown for reference, but the effective cost stays at the real-time rate.
- Can I run hybrid batch + real-time for the same workload?
- Yes. Split the workload into a batch portion with a tolerant deadline and a real-time portion for the deadline-critical subset. Most production research loops are structured this way.
- How does prompt caching change this calculation?
- Caching cuts the price of cached input tokens by 50-90% depending on provider; the overall saving also depends on the hit rate and on how much of the bill is input. For the Sonnet 4.6 scenario at a 60% hit rate and a 90% cached-read discount, the effective real-time daily cost drops from €210 to roughly €145, which recovers about three-fifths of what the batch discount would have saved. The Batch vs Realtime engine does not model caching; the Token Cost Optimizer does.