A 12-hour deadline kills the batch discount. Anthropic, OpenAI, and Google all publish a 24-hour batch SLA, so anything tighter pays real-time rates. Run through the Batch vs Realtime Cost Calculator, 5,000 daily jobs at 8k input + 1.2k output tokens on Claude Sonnet 4.6 cost €210/day in real time and would cost €105/day in batch. At a 12h deadline, however, the engine returns batchEligible=false and the effective cost stays at €210. The mode-suggestion field states the reason directly: real-time is required because the 12h deadline is shorter than the 24h batch SLA. That single boolean is the whole decision.

TL;DR

  • Real-time cost on Sonnet 4.6 at 5,000 jobs/day, 8k+1.2k tokens: €210/day (€6,300/month).
  • Batch cost at the same volume: €105/day (€3,150/month), but only if the deadline is ≥ 24h.
  • Effective cost with a 12h deadline: €210/day. Batch ineligible.
  • Cheapest provider in the comparison: Gemini 2.5 Flash at €27/day real-time, €13.50/day batch.
  • The decision rule is mechanical: ≥24h tolerance → batch; <24h → real-time; pick provider on cost-vs-capability per scenario.

The scenario

A nightly research loop processes 5,000 SEC filings per day at 8,000 input tokens + 1,200 output tokens per job. The deadline is 12 hours: jobs start at 19:00 UTC and results are required by 07:00 UTC for the morning research review. The Batch vs Realtime Cost Calculator returns the following for Claude Sonnet 4.6:

Metric | Value
Real-time cost per day | €210
Batch cost per day | €105
Batch SLA | 24 hours
Deadline | 12 hours
Batch eligible | No
Effective cost per day | €210
Monthly savings if switched | €0
Mode suggestion | "Real-time required — deadline 12h < batch SLA 24h"

The 50% discount Anthropic publishes for batch processing [1] is unavailable because the deadline is tighter than the SLA. The same holds for OpenAI [2] and Google [3]: all three publish a 24h batch SLA.
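The table's figures follow from token counts and list prices. A minimal sketch of that arithmetic, assuming the per-million-token prices listed in the verified engine output (3 for input, 15 for output on Claude Sonnet 4.6) and the 50% batch discount. The function name `daily_cost` is illustrative, not the engine's API:

```python
def daily_cost(jobs_per_day, input_tokens, output_tokens,
               input_price_per_mtok, output_price_per_mtok):
    """Daily list-price cost for a fixed per-job token shape."""
    input_cost = jobs_per_day * input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = jobs_per_day * output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Scenario from the article: 5,000 jobs/day, 8k input + 1.2k output tokens.
realtime = daily_cost(5000, 8000, 1200,
                      input_price_per_mtok=3.0, output_price_per_mtok=15.0)
batch = realtime * (1 - 50 / 100)  # 50% batch discount

print(f"real-time: {realtime:.2f}/day, batch: {batch:.2f}/day")
# real-time: 210.00/day, batch: 105.00/day
```

Input tokens account for 120 of the 210 and output tokens for the remaining 90, so neither side of the token shape can be ignored when sizing the bill.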

The provider comparison

The engine returns a comparison table across providers at the same workload shape (5,000 jobs × 9,200 tokens):

Provider / Model | Real-time/day | Batch/day | Eligible at 12h?
Anthropic Claude Sonnet 4.6 | €210 | €105 | No
Anthropic Claude Haiku 4.5 | €70 | €35 | No
OpenAI GPT-5.4 mini | €57 | €28.50 | No
Google Gemini 2.5 Flash | €27 | €13.50 | No

Gemini 2.5 Flash at €27/day is the cheapest at this token shape. The capability gap matters — Flash sits below Sonnet 4.6 on multi-step reasoning and structured-output adherence on finance tasks. The right pick is workload-specific; the price column is one input, not the decision.

Why the 24h SLA is the right design

Batch APIs work because the provider can run inference during low-utilisation windows. The 24h SLA gives the provider's scheduler the slack to fill capacity gaps without breaking the customer's deadline. A 12h SLA would erode the slack and force batch jobs into peak-utilisation hours, defeating the cost advantage that justifies the discount [4].

For the customer, the SLA is a hard switch. The engine's batchEligible field encodes the comparison directly: deadline ≥ batchSlaHours → true. Anything else returns false and the comparison treats batch as if it doesn't exist.
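The hard switch can be sketched in a few lines. This is an illustration of the rule as described, not the engine's source; the function name and return shape are assumptions:

```python
def effective_daily_cost(realtime_cost, batch_cost, deadline_hours,
                         batch_sla_hours=24):
    """Return (batch_eligible, effective_cost) for one workload.

    Batch pricing counts only when the deadline is at least as long
    as the batch SLA; otherwise the real-time price is the bill.
    """
    batch_eligible = deadline_hours >= batch_sla_hours
    effective = batch_cost if batch_eligible else realtime_cost
    return batch_eligible, effective


print(effective_daily_cost(210.0, 105.0, deadline_hours=12))  # (False, 210.0)
print(effective_daily_cost(210.0, 105.0, deadline_hours=24))  # (True, 105.0)
```

The comparison is inclusive: a deadline of exactly 24 hours qualifies, while 23 hours does not.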

Three deadline regimes

The decision tree for batch-vs-real-time is a three-position switch:

Deadline ≥ 24 hours (batch wins)

Overnight loops with a 24-48h delivery window are the textbook batch case. 50% discount applies, no latency cost, batch eligible. Use batch.

Deadline 6-24 hours (mixed regime)

Some workloads have a soft 24h deadline with hard requirements at intermediate checkpoints. Split the workload: send a batch for the bulk and a real-time burst for the deadline-critical subset. The Token Cost Optimizer does the math for blended pipelines.
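For a rough sense of what a split costs, here is a sketch that assumes every job has the same token shape, so cost scales linearly with the share of jobs sent to each mode. The 80% batch share is a hypothetical input, and this is not the Token Cost Optimizer's model:

```python
def blended_daily_cost(realtime_cost, batch_cost, batch_fraction):
    """Daily cost when batch_fraction of jobs go to batch, the rest real-time.

    realtime_cost and batch_cost are the full-workload daily costs in each mode.
    """
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    return batch_fraction * batch_cost + (1.0 - batch_fraction) * realtime_cost


# Hypothetical split: 80% of jobs tolerate 24h, 20% are deadline-critical.
cost = blended_daily_cost(210.0, 105.0, batch_fraction=0.8)
print(f"{cost:.2f}")  # 126.00
```

Only the jobs that can actually tolerate the 24h SLA belong in the batch share; the fraction is a property of the workload, not a tuning knob.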

Deadline < 6 hours (real-time only)

Real-time inference. No discount available. Pick the cheapest provider that meets capability requirements. Gemini 2.5 Flash and Haiku-class tiers cover most retail workloads at real-time without breaking the budget.
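The three regimes above reduce to two threshold checks. A minimal sketch, using the 24h and 6h boundaries from the headings; the function name and labels are illustrative:

```python
def deadline_regime(deadline_hours, batch_sla_hours=24,
                    realtime_only_below_hours=6):
    """Classify a deadline into one of the three regimes described above."""
    if deadline_hours >= batch_sla_hours:
        return "batch"
    if deadline_hours >= realtime_only_below_hours:
        return "mixed"
    return "real-time"


for hours in (48, 12, 3):
    print(hours, deadline_regime(hours))
# 48 batch
# 12 mixed
# 3 real-time
```

A "mixed" result only means a split is worth investigating. If no part of the workload can wait 24 hours, as in the 12h scenario, the whole bill stays at real-time rates.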

What the engine doesn't model

The engine quotes published list prices. Three factors it does not include:

  • Prompt caching. Anthropic publishes a 90% discount on cached read tokens; OpenAI publishes a 50% discount on cached input tokens. For research loops with stable system prompts, cache hits drop the effective real-time cost by 40-60%. See Prompt Caching Economics for Finance and the Token Cost Optimizer for the cache-aware version.
  • Volume discounts. All three providers offer enterprise discounts above ~€10k/month spend. The engine prices on retail list.
  • Provider failover. A multi-provider stack carries failover overhead — typically 5-10% on cost. See Fallback Chain Simulator for the blended-cost model.

The retail accounting

For the 12h deadline scenario, the honest monthly bill on Sonnet 4.6 is €6,300/month. With prompt caching at 60% hit rate, that drops to roughly €2,500/month. With provider switch to Gemini Flash (capability permitting), it drops to €800/month. The cache + cheap-tier combination is the dominant cost lever for retail research loops, not the batch toggle.

For workloads that can tolerate a 24h delay, the batch toggle saves €3,150/month at Sonnet 4.6 prices. That is roughly the same magnitude as a tier switch from Sonnet to Haiku. The trader's choice is whether the capability gap or the latency gap is more tolerable — both close the same euro-amount gap.
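The two levers can be put side by side using the daily figures from the provider comparison. The sketch assumes a 30-day month, which is the convention implied by €210/day becoming €6,300/month:

```python
DAYS_PER_MONTH = 30  # assumption: matches 210/day -> 6,300/month in the text

sonnet_realtime = 210.0  # per day, from the comparison table
sonnet_batch = 105.0     # per day, from the comparison table
haiku_realtime = 70.0    # per day, from the comparison table

batch_saving = (sonnet_realtime - sonnet_batch) * DAYS_PER_MONTH
tier_saving = (sonnet_realtime - haiku_realtime) * DAYS_PER_MONTH

print(f"batch toggle: {batch_saving:.0f}/month")  # batch toggle: 3150/month
print(f"tier switch:  {tier_saving:.0f}/month")   # tier switch:  4200/month
```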

Failure modes

  • Treating the deadline as advisory. Batch SLAs are firm. A "soft" 24h that occasionally needs to deliver in 18h does not qualify for batch; that workload pays real-time rates.
  • Mixing batch and real-time pricing in budgets. Quote either-or, not blended, unless the pipeline is genuinely split.
  • Forgetting that batch SLAs are wall-clock from submit-time. A batch submitted at 03:00 with a 24h SLA delivers by 03:00 next day, not by some business-hour proxy.
  • Treating the cheapest provider as the right answer. Capability differences across providers on finance-specific tasks can dwarf the cost gap. Run the eval first; see Eval Harness for Finance LLMs.
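The wall-clock point is easy to check with timestamps. A small sketch, assuming the SLA is counted from submit time as described above; the calendar dates are arbitrary placeholders and the function names are illustrative:

```python
from datetime import datetime, timedelta, timezone


def latest_batch_completion(submitted_at, sla_hours=24):
    """Latest time the batch may complete under a wall-clock SLA."""
    return submitted_at + timedelta(hours=sla_hours)


def batch_meets_deadline(submitted_at, required_by, sla_hours=24):
    """True only if the SLA guarantees results by required_by."""
    return latest_batch_completion(submitted_at, sla_hours) <= required_by


# The article's scenario: submit at 19:00 UTC, results needed by 07:00 UTC.
submitted = datetime(2025, 1, 6, 19, 0, tzinfo=timezone.utc)
required = datetime(2025, 1, 7, 7, 0, tzinfo=timezone.utc)

print(latest_batch_completion(submitted))        # 2025-01-07 19:00:00+00:00
print(batch_meets_deadline(submitted, required))  # False
```

Using timezone-aware datetimes keeps the comparison honest when the submit time and the deadline are recorded in different zones.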

References

  1. Anthropic (2024). "Batch processing for the Claude API." docs.anthropic.com

  2. OpenAI (2024). "Batch API documentation." platform.openai.com

  3. Google Cloud (2024). "Vertex AI batch prediction." cloud.google.com

  4. Pope, R., et al. (2023). "Efficiently Scaling Transformer Inference." MLSys 2023. arxiv.org/abs/2211.05102

Verified engine output

Claude Sonnet 4.6, 5,000 jobs/day, 8k input + 1.2k output tokens, 12h deadline

Inputs
model_id: claude-sonnet-4-6
jobs_per_day: 5000
input_tokens_per_job: 8000
output_tokens_per_job: 1200
deadline_hours: 12

Result
model › id: claude-sonnet-4-6
model › provider: anthropic
model › name: Claude Sonnet 4.6
model › input usd per mtoken: 3
model › output usd per mtoken: 15
model › supports batch: true
model › batch discount pct: 50
model › batch sla hours: 24
model › context window: 500000
model › notes: Best price/performance for bulk workloads.
realtime cost per day: 209.99999999999997
batch cost per day: 104.99999999999999
effective cost per day: 209.99999999999997
savings per day: 0
savings per month: 0
batch eligible: false
using batch: false
mode suggestion: Real-time required — deadline 12h < batch SLA 24h

provider comparison › row 1 › provider: anthropic
provider comparison › row 1 › model › id: claude-haiku-4-5
provider comparison › row 1 › model › provider: anthropic
provider comparison › row 1 › model › name: Claude Haiku 4.5
provider comparison › row 1 › model › input usd per mtoken: 1
provider comparison › row 1 › model › output usd per mtoken: 5
provider comparison › row 1 › model › supports batch: true
provider comparison › row 1 › model › batch discount pct: 50
provider comparison › row 1 › model › batch sla hours: 24
provider comparison › row 1 › model › context window: 200000
provider comparison › row 1 › model › notes: Cheapest Anthropic option — filtering, classification.
provider comparison › row 1 › realtime cost per day: 69.99999999999999
provider comparison › row 1 › batch cost per day: 34.99999999999999
provider comparison › row 1 › batch eligible: false
provider comparison › row 1 › supports batch: true
provider comparison › row 1 › batch sla hours: 24

provider comparison › row 2 › provider: openai
provider comparison › row 2 › model › id: gpt-5-mini
provider comparison › row 2 › model › provider: openai
provider comparison › row 2 › model › name: GPT-5.4 mini
provider comparison › row 2 › model › input usd per mtoken: 0.75
provider comparison › row 2 › model › output usd per mtoken: 4.5
provider comparison › row 2 › model › supports batch: true
provider comparison › row 2 › model › batch discount pct: 50
provider comparison › row 2 › model › batch sla hours: 24
provider comparison › row 2 › model › context window: 256000
provider comparison › row 2 › model › notes: Mid-tier OpenAI.
provider comparison › row 2 › realtime cost per day: 57
provider comparison › row 2 › batch cost per day: 28.5
provider comparison › row 2 › batch eligible: false
provider comparison › row 2 › supports batch: true
provider comparison › row 2 › batch sla hours: 24

provider comparison › row 3 › provider: google
provider comparison › row 3 › model › id: gemini-2-5-flash-lite
provider comparison › row 3 › model › provider: google
provider comparison › row 3 › model › name: Gemini 2.5 Flash-Lite
provider comparison › row 3 › model › input usd per mtoken: 0.1
provider comparison › row 3 › model › output usd per mtoken: 0.4
provider comparison › row 3 › model › supports batch: true
provider comparison › row 3 › model › batch discount pct: 50
provider comparison › row 3 › model › batch sla hours: 24
provider comparison › row 3 › model › context window: 1000000
provider comparison › row 3 › model › notes: Cheapest model in this table. Batch mode available.
provider comparison › row 3 › realtime cost per day: 6.3999999999999995
provider comparison › row 3 › batch cost per day: 3.1999999999999997
provider comparison › row 3 › batch eligible: false
provider comparison › row 3 › supports batch: true
provider comparison › row 3 › batch sla hours: 24

Computed live at build time.

Frequently asked questions

Why is batch ineligible at a 12h deadline?
All three major providers publish a 24h batch SLA. A 12h deadline cannot rely on a 24h SLA. The engine's batchEligible boolean encodes the comparison and returns false. The price column for batch is shown for reference, but the effective cost stays at the real-time rate.

Can I run hybrid batch + real-time for the same workload?
Yes. Split the workload into a batch portion with a tolerant deadline and a real-time portion for the deadline-critical subset. Most production research loops are structured this way.

How does prompt caching change this calculation?
Caching cuts real-time input-token costs by 50-90% depending on provider and hit rate. For a stable system prompt at 60% hit rate, the effective real-time daily cost drops to roughly half, eliminating most of the batch discount's advantage. The Batch vs Realtime engine does not model caching; the Token Cost Optimizer does.