Batch vs Realtime Overnight Cost

A 12-hour deadline kills the batch discount. Anthropic, OpenAI and Google all publish a 24-hour batch SLA, so anything tighter pays real-time rates. Run on 5,000 daily jobs at 8k input + 1.2k output tokens through Claude Sonnet 4.6, the Batch vs Realtime Cost Calculator prices real-time at €210/day and batch at €105/day, but at a 12h deadline the engine returns batchEligible=false and the effective cost stays at €210/day. The mode-suggestion field is explicit: "Real-time required — deadline 12h < batch SLA 24h." That single boolean is the whole decision.

TL;DR

Real-time cost on Sonnet 4.6 at 5,000 jobs/day, 8k+1.2k tokens: €210/day (€6,300/month).
Batch cost at the same volume: €105/day (€3,150/month), but only if the deadline is ≥ 24h.
Effective cost with a 12h deadline: €210/day. Batch ineligible.
Cheapest provider in the comparison: Gemini 2.5 Flash at €27/day real-time, €13.50/day batch.
The decision rule is mechanical: ≥24h tolerance → batch; <24h → real-time; pick provider on cost-vs-capability per scenario.

The scenario

A nightly research loop processes 5,000 SEC filings per day at 8,000 input tokens + 1,200 output tokens per job. The deadline is 12 hours (start at 19:00 UTC, results required by 07:00 UTC for the morning research review). Running the Batch vs Realtime Cost Calculator on Claude Sonnet 4.6 gives:

Metric	Value
Real-time cost per day	€210
Batch cost per day	€105
Batch SLA	24 hours
Deadline	12 hours
Batch eligible	No
Effective cost per day	€210
Monthly savings if switched	€0
Mode suggestion	"Real-time required — deadline 12h < batch SLA 24h"

The 50% discount Anthropic publishes for batch processing¹ is unavailable because the deadline is tighter than the SLA. The same holds for OpenAI² and Google³: all three publish a 24h batch SLA.
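The per-day figures follow directly from per-million-token list prices. The Python sketch below assumes the Sonnet 4.6 prices shown in the verified engine output (3 per million input tokens, 15 per million output tokens, 50% batch discount) and a 30-day month, which is what the article's monthly figures imply. `ModelPrice` and the two cost functions are illustrative names, not the engine's API; costs come out in the same units as the prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_mtok: float            # list price per 1M input tokens
    output_per_mtok: float           # list price per 1M output tokens
    batch_discount_pct: float = 50.0


def realtime_cost_per_day(m: ModelPrice, jobs_per_day: int,
                          input_tokens: int, output_tokens: int) -> float:
    input_mtok = jobs_per_day * input_tokens / 1_000_000    # millions of input tokens per day
    output_mtok = jobs_per_day * output_tokens / 1_000_000  # millions of output tokens per day
    return input_mtok * m.input_per_mtok + output_mtok * m.output_per_mtok


def batch_cost_per_day(m: ModelPrice, jobs_per_day: int,
                       input_tokens: int, output_tokens: int) -> float:
    # Batch applies a flat percentage discount to the real-time price.
    rt = realtime_cost_per_day(m, jobs_per_day, input_tokens, output_tokens)
    return rt * (1 - m.batch_discount_pct / 100)


sonnet = ModelPrice("Claude Sonnet 4.6", 3.0, 15.0)
rt = realtime_cost_per_day(sonnet, 5000, 8000, 1200)  # 40M input -> 120, 6M output -> 90
bt = batch_cost_per_day(sonnet, 5000, 8000, 1200)
print(rt, bt, rt * 30, bt * 30)  # 210.0 105.0 6300.0 3150.0
```

The engine's raw 209.99999999999997 is most likely floating-point noise from a different order of operations on the same sum.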

The provider comparison

The engine returns a comparison table across providers at the same workload shape (5,000 jobs × 9,200 tokens):

Provider / Model	Real-time/day	Batch/day	Eligible at 12h?
Anthropic Claude Sonnet 4.6	€210	€105	No
Anthropic Claude Haiku 4.5	€70	€35	No
OpenAI GPT-5.4 mini	€57	€28.50	No
Google Gemini 2.5 Flash	€27	€13.50	No

Gemini 2.5 Flash at €27/day is the cheapest at this token shape. The capability gap matters — Flash sits below Sonnet 4.6 on multi-step reasoning and structured-output adherence on finance tasks. The right pick is workload-specific; the price column is one input, not the decision.
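Treating price as one input means filtering on capability before sorting on cost. A minimal Python sketch, assuming you already have a pass/fail result per model from your own evaluation; `cheapest_capable` is a hypothetical helper and the `passes` values are made up for illustration, not real eval results.

```python
from typing import Dict, Optional


def cheapest_capable(cost_per_day: Dict[str, float],
                     passes_eval: Dict[str, bool]) -> Optional[str]:
    """Return the cheapest model that passed the capability gate, or None."""
    capable = [(cost, name) for name, cost in cost_per_day.items()
               if passes_eval.get(name, False)]  # models without a result are excluded
    if not capable:
        return None
    return min(capable)[1]


costs = {"Claude Sonnet 4.6": 210.0, "Claude Haiku 4.5": 70.0,
         "GPT-5.4 mini": 57.0, "Gemini 2.5 Flash": 27.0}
# Hypothetical eval outcomes: replace with your own harness output.
passes = {"Claude Sonnet 4.6": True, "Claude Haiku 4.5": True,
          "GPT-5.4 mini": False, "Gemini 2.5 Flash": False}
print(cheapest_capable(costs, passes))  # Claude Haiku 4.5
```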

Why the 24h SLA is the right design

Batch APIs work because the provider can run inference during low-utilisation windows. The 24h SLA gives the provider's scheduler the slack to fill capacity gaps without breaking the customer's deadline. A 12h SLA would erode the slack and force batch jobs into peak-utilisation hours, defeating the cost advantage that justifies the discount⁴.

For the customer, the SLA is a hard switch. The engine's batchEligible field encodes the comparison directly: deadline ≥ batchSlaHours → true. Anything else returns false and the comparison treats batch as if it doesn't exist.
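A minimal Python sketch of that hard switch. `decide` and `Decision` are hypothetical names, not the engine's code; requiring `supports_batch` as well is an assumption based on the per-model field the verified output reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    batch_eligible: bool
    effective_cost_per_day: float
    savings_per_day: float


def decide(realtime_cost: float, batch_cost: float, deadline_hours: float,
           batch_sla_hours: float, supports_batch: bool = True) -> Decision:
    # Batch counts only if the model offers it AND the deadline is at least
    # as long as the provider's published batch SLA.
    eligible = supports_batch and deadline_hours >= batch_sla_hours
    # An ineligible batch price is ignored entirely; the bill stays real-time.
    effective = batch_cost if eligible else realtime_cost
    return Decision(eligible, effective, realtime_cost - effective)


print(decide(210.0, 105.0, deadline_hours=12, batch_sla_hours=24))
# Decision(batch_eligible=False, effective_cost_per_day=210.0, savings_per_day=0.0)
print(decide(210.0, 105.0, deadline_hours=24, batch_sla_hours=24))
# Decision(batch_eligible=True, effective_cost_per_day=105.0, savings_per_day=105.0)
```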

Three deadline regimes

The decision tree for batch-vs-real-time is a three-position switch:

Deadline ≥ 24 hours (batch wins)

Overnight loops with a 24-48h delivery window are the textbook batch case. The 50% discount applies, the extra latency costs nothing against the deadline, and batch is eligible. Use batch.

Deadline 6-24 hours (mixed regime)

Some workloads have a soft 24h deadline with hard requirements at intermediate checkpoints. Split the workload: send the portion whose own deadline is at least 24h to batch, and run the deadline-critical subset in real time. Only the batch portion earns the discount. The Token Cost Optimizer does the math for blended pipelines.
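The blended cost of such a split is a weighted average of the two rates. A minimal Python sketch, assuming both portions share the same per-job token shape; `blended_cost_per_day` is a hypothetical helper, and the 50/50 split in the example is illustrative only.

```python
def blended_cost_per_day(realtime_cost_per_day: float, batch_discount_pct: float,
                         batch_fraction: float) -> float:
    """Daily cost when batch_fraction of the jobs can wait for the batch SLA.

    Assumes the batch and real-time portions have the same per-job token shape.
    """
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    batch_cost = realtime_cost_per_day * (1 - batch_discount_pct / 100)
    return batch_fraction * batch_cost + (1 - batch_fraction) * realtime_cost_per_day


# Sonnet 4.6 real-time rate, half the jobs tolerant of a 24h SLA.
print(blended_cost_per_day(210.0, 50, batch_fraction=0.5))  # 157.5
```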

Deadline < 6 hours (real-time only)

Real-time inference. No discount available. Pick the cheapest provider that meets capability requirements. Gemini 2.5 Flash and Haiku-class tiers cover most retail workloads at real-time without breaking the budget.
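The three positions map onto two thresholds. A minimal Python sketch, assuming the 24h batch SLA and the 6h boundary used in this section; `deadline_regime` is a hypothetical helper name.

```python
def deadline_regime(deadline_hours: float, batch_sla_hours: float = 24.0,
                    realtime_only_below_hours: float = 6.0) -> str:
    """Map a workload deadline onto the three-position switch."""
    if deadline_hours >= batch_sla_hours:
        return "batch"      # the whole workload can wait for the batch SLA
    if deadline_hours >= realtime_only_below_hours:
        return "mixed"      # split: batch only the portion that tolerates the SLA
    return "real-time"      # no discount; pick the cheapest capable model


for h in (36, 24, 12, 4):
    print(h, deadline_regime(h))
# 36 batch
# 24 batch
# 12 mixed
# 4 real-time
```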

What the engine doesn't model

The engine quotes published list prices. Three factors it does not include:

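Prompt caching. Anthropic publishes a 90% discount on cached read tokens; OpenAI publishes a 50% discount on cached input tokens. Caching discounts input tokens only, so the saving is capped by the input share of the bill. On this workload, input is about 57% of the Sonnet 4.6 real-time cost (€120 of €210/day), so a 60% hit rate at Anthropic's 90% discount cuts the real-time cost by roughly 30%. See Prompt Caching Economics for Finance and the Token Cost Optimizer for the cache-aware version.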
Volume discounts. All three providers offer enterprise discounts above ~€10k/month spend. The engine prices on retail list.
Provider failover. A multi-provider stack carries failover overhead — typically 5-10% on cost. See Fallback Chain Simulator for the blended-cost model.

The retail accounting

For the 12h deadline scenario, the honest monthly bill on Sonnet 4.6 is €6,300/month. With prompt caching at a 60% hit rate on input tokens (Anthropic's 90% cached-read discount, cache-write surcharges ignored), that drops to roughly €4,350/month; output tokens alone cost €2,700/month and caching does not touch them. With a provider switch to Gemini Flash (capability permitting), it drops to about €800/month. The cache + cheap-tier combination is the dominant cost lever for retail research loops, not the batch toggle.
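The cached figure can be checked with a few lines of arithmetic. A minimal Python sketch, assuming the hit rate is the fraction of input tokens served from cache, cache-write surcharges are ignored, and output tokens are always billed at full price; `cached_realtime_cost_per_day` is a hypothetical helper, not part of either engine.

```python
def cached_realtime_cost_per_day(jobs_per_day: int, input_tokens: int, output_tokens: int,
                                 input_per_mtok: float, output_per_mtok: float,
                                 hit_rate: float, cached_read_discount_pct: float) -> float:
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    input_mtok = jobs_per_day * input_tokens / 1_000_000
    output_mtok = jobs_per_day * output_tokens / 1_000_000
    cached_mtok = input_mtok * hit_rate          # input tokens served from cache
    uncached_mtok = input_mtok - cached_mtok     # input tokens billed at full price
    cached_price = input_per_mtok * (1 - cached_read_discount_pct / 100)
    input_cost = uncached_mtok * input_per_mtok + cached_mtok * cached_price
    return input_cost + output_mtok * output_per_mtok  # output is never discounted


daily = cached_realtime_cost_per_day(5000, 8000, 1200, 3.0, 15.0,
                                     hit_rate=0.6, cached_read_discount_pct=90)
print(round(daily, 1), round(daily * 30))  # 145.2 4356
```

Even a 100% hit rate leaves the 90/day output bill untouched, which is why caching alone cannot halve this workload's cost.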

For workloads that can tolerate a 24h delay, the batch toggle saves €3,150/month at Sonnet 4.6 prices. That is the same order of magnitude as a tier switch from Sonnet to Haiku, which saves €4,200/month (€210/day down to €70/day). The trader's choice is whether the capability gap or the latency gap is more tolerable; both close a euro gap of similar size.
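The lever comparison can be made explicit by ranking monthly savings and dropping any lever the deadline rules out. A minimal Python sketch, assuming a 30-day month and the per-day figures quoted in this article; `Lever` and `rank_levers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Lever:
    name: str
    savings_per_day: float
    needs_batch_sla: bool  # True if the lever only works when deadline >= batch SLA


def rank_levers(levers: List[Lever], deadline_hours: float,
                batch_sla_hours: float = 24.0,
                days_per_month: int = 30) -> List[Tuple[str, float]]:
    usable = [lv for lv in levers
              if not lv.needs_batch_sla or deadline_hours >= batch_sla_hours]
    monthly = [(lv.name, lv.savings_per_day * days_per_month) for lv in usable]
    return sorted(monthly, key=lambda item: item[1], reverse=True)


levers = [
    Lever("batch toggle on Sonnet 4.6", 210.0 - 105.0, needs_batch_sla=True),
    Lever("tier switch Sonnet 4.6 -> Haiku 4.5", 210.0 - 70.0, needs_batch_sla=False),
]
print(rank_levers(levers, deadline_hours=12))
# [('tier switch Sonnet 4.6 -> Haiku 4.5', 4200.0)]
print(rank_levers(levers, deadline_hours=24))
# [('tier switch Sonnet 4.6 -> Haiku 4.5', 4200.0), ('batch toggle on Sonnet 4.6', 3150.0)]
```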

Failure modes

Treating the deadline as advisory. Batch SLAs are firm. A "soft" 24h that occasionally needs to deliver in 18h does not qualify for batch; that workload pays real-time rates.
Mixing batch and real-time pricing in budgets. Quote either-or, not blended, unless the pipeline is genuinely split.
Forgetting that batch SLAs are wall-clock from submit-time. A batch submitted at 03:00 with a 24h SLA delivers by 03:00 next day, not by some business-hour proxy.
Treating the cheapest provider as the right answer. Capability differences across providers on finance-specific tasks can dwarf the cost gap. Run the eval first; see Eval Harness for Finance LLMs.
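The wall-clock point is easy to check against concrete timestamps. A minimal Python sketch using the standard `datetime` module; `batch_meets_deadline` is a hypothetical helper, and the calendar dates are arbitrary placeholders around the scenario's 19:00 UTC start and 07:00 UTC deadline.

```python
from datetime import datetime, timedelta, timezone


def batch_meets_deadline(submitted_at: datetime, deadline_at: datetime,
                         batch_sla_hours: float = 24.0) -> bool:
    """True only if worst-case completion (submit time + SLA) is at or before the deadline."""
    worst_case_done = submitted_at + timedelta(hours=batch_sla_hours)
    return worst_case_done <= deadline_at


submit = datetime(2025, 1, 6, 19, 0, tzinfo=timezone.utc)    # 19:00 UTC start
deadline = datetime(2025, 1, 7, 7, 0, tzinfo=timezone.utc)   # 07:00 UTC next morning
print(batch_meets_deadline(submit, deadline))  # False: worst case lands at 19:00 UTC on 7 Jan
```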

Connects to

Batch API Economics for Finance: when 24h windows are available.
Prompt Caching Economics for Finance: the cache lever.
Token Cost Reality for LLM Trading Research: wider cost picture.
Inference Cost Attribution per Trade: per-trade unit economics.
Batch vs Realtime Cost Calculator: re-run on your spec.
Batch vs Realtime Cost Calculator methodology: full input/output specification.

References

1. Anthropic (2024). "Batch processing for the Claude API." docs.anthropic.com
2. OpenAI (2024). "Batch API documentation." platform.openai.com
3. Google Cloud (2024). "Vertex AI batch prediction." cloud.google.com
4. Pope, R., et al. (2023). "Efficiently Scaling Transformer Inference." MLSys 2023. arxiv.org/abs/2211.05102

Verified engine output


Claude Sonnet 4.6, 5,000 jobs/day, 8k input + 1.2k output tokens, 12h deadline

Inputs
model_id	claude-sonnet-4-6
jobs_per_day	5000
input_tokens_per_job	8000
output_tokens_per_job	1200
deadline_hours	12

Result
model › id	claude-sonnet-4-6
model › provider	anthropic
model › name	Claude Sonnet 4.6
model › input usd per mtoken	3
model › output usd per mtoken	15
model › supports batch	true
model › batch discount pct	50
model › batch sla hours	24
model › context window	500000
model › notes	Best price/performance for bulk workloads.
realtime cost per day	209.99999999999997
batch cost per day	104.99999999999999
effective cost per day	209.99999999999997
savings per day	0
savings per month	0
batch eligible	false
using batch	false
mode suggestion	Real-time required — deadline 12h < batch SLA 24h
provider comparison › row 1 › provider	anthropic
provider comparison › row 1 › model › id	claude-haiku-4-5
provider comparison › row 1 › model › provider	anthropic
provider comparison › row 1 › model › name	Claude Haiku 4.5
provider comparison › row 1 › model › input usd per mtoken	1
provider comparison › row 1 › model › output usd per mtoken	5
provider comparison › row 1 › model › supports batch	true
provider comparison › row 1 › model › batch discount pct	50
provider comparison › row 1 › model › batch sla hours	24
provider comparison › row 1 › model › context window	200000
provider comparison › row 1 › model › notes	Cheapest Anthropic option — filtering, classification.
provider comparison › row 1 › realtime cost per day	69.99999999999999
provider comparison › row 1 › batch cost per day	34.99999999999999
provider comparison › row 1 › batch eligible	false
provider comparison › row 1 › supports batch	true
provider comparison › row 1 › batch sla hours	24
provider comparison › row 2 › provider	openai
provider comparison › row 2 › model › id	gpt-5-mini
provider comparison › row 2 › model › provider	openai
provider comparison › row 2 › model › name	GPT-5.4 mini
provider comparison › row 2 › model › input usd per mtoken	0.75
provider comparison › row 2 › model › output usd per mtoken	4.5
provider comparison › row 2 › model › supports batch	true
provider comparison › row 2 › model › batch discount pct	50
provider comparison › row 2 › model › batch sla hours	24
provider comparison › row 2 › model › context window	256000
provider comparison › row 2 › model › notes	Mid-tier OpenAI.
provider comparison › row 2 › realtime cost per day	57
provider comparison › row 2 › batch cost per day	28.5
provider comparison › row 2 › batch eligible	false
provider comparison › row 2 › supports batch	true
provider comparison › row 2 › batch sla hours	24
provider comparison › row 3 › provider	google
provider comparison › row 3 › model › id	gemini-2-5-flash-lite
provider comparison › row 3 › model › provider	google
provider comparison › row 3 › model › name	Gemini 2.5 Flash-Lite
provider comparison › row 3 › model › input usd per mtoken	0.1
provider comparison › row 3 › model › output usd per mtoken	0.4
provider comparison › row 3 › model › supports batch	true
provider comparison › row 3 › model › batch discount pct	50
provider comparison › row 3 › model › batch sla hours	24
provider comparison › row 3 › model › context window	1000000
provider comparison › row 3 › model › notes	Cheapest model in this table. Batch mode available.
provider comparison › row 3 › realtime cost per day	6.3999999999999995
provider comparison › row 3 › batch cost per day	3.1999999999999997
provider comparison › row 3 › batch eligible	false
provider comparison › row 3 › supports batch	true
provider comparison › row 3 › batch sla hours	24

Computed live at build time.

Frequently asked questions

Why is batch ineligible at a 12h deadline?
All three major providers publish a 24h batch SLA, and a 12h deadline cannot rely on a 24h SLA. The engine's batchEligible boolean encodes that comparison and returns false. The batch price column is shown for reference, but the effective cost stays at the real-time rate.

Can I run hybrid batch + real-time for the same workload?
Yes. Split the workload into a batch portion whose deadline is at least 24h and a real-time portion for the deadline-critical subset. Most production research loops are structured this way.

How does prompt caching change this calculation?
Cached input tokens are billed at a 50-90% discount depending on provider, so the saving scales with the hit rate and never applies to output tokens. On this workload at Sonnet 4.6 prices, a 60% hit rate with Anthropic's 90% cached-read discount lowers the real-time cost from €210/day to about €145/day, roughly 30%. That narrows the gap to batch's €105/day but does not close it. The Batch vs Realtime engine does not model caching; the Token Cost Optimizer does.