Methodology · Tool · Last updated 2026-04-23
How the Batch vs Real-Time Cost Calculator works
How the Batch vs Real-Time Cost Calculator prices LLM workloads against each vendor's batch endpoint, and when batch is not a valid choice.
What the tool computes
For a configured workload — jobs per day, input and output tokens per job, chosen model, and a hard deadline in hours — the calculator produces:
- Real-time (direct API) daily cost.
- Batch daily cost, applying each vendor's published discount on input + output.
- Savings per day and per month when batch is feasible.
- A mode suggestion: batch-eligible, real-time required, or vendor has no batch endpoint.
- A per-provider comparison using each provider's cheapest model at the current workload.
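As a rough sketch, the inputs and outputs above might be shaped like this in code (the field and type names are illustrative only, not the tool's actual schema):

```ts
// Hypothetical shapes for illustration only; the tool's real schema may differ.
interface Workload {
  jobsPerDay: number;
  inputTokensPerJob: number;
  outputTokensPerJob: number;
  model: string;          // e.g. "Claude Sonnet 4.6"
  deadlineHours: number;  // hard deadline for results
}

type Mode = "batch-eligible" | "real-time required" | "no batch endpoint";

interface CalculatorResult {
  realtimeCostPerDay: number;
  batchCostPerDay: number | null;  // null when the vendor has no batch endpoint
  savingsPerDay: number;
  savingsPerMonth: number;
  mode: Mode;
}
```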
How batch APIs work
Anthropic, OpenAI, and Google all expose an asynchronous batch endpoint: you upload a file of requests, the vendor processes them when capacity is available, and you download results within a vendor SLA — typically up to 24 hours. In exchange for giving up real-time response, each provider currently offers a 50% discount on input and output tokens against the posted per-token rates. The calculator treats that as a flat multiplier on the per-job cost.
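For example, take the Claude Sonnet 4.6 rates from the pricing table below ($3 input / $15 output per 1M tokens) and a job with 10,000 input tokens and 2,000 output tokens:

10,000 × $3/1M + 2,000 × $15/1M = $0.03 + $0.03 = $0.06 per job in real time
$0.06 × (1 - 0.50) = $0.03 per job via batch

At 5,000 jobs per day, batch saves $150 per day, roughly $4,500 per month.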
If your use case permits an overnight or intraday-lagged result — end-of-day report generation, document triage, research batches, classification passes over news archives — batch is usually the correct default. See Batch API economics for finance LLM workloads for a fuller breakdown with worked examples.
Formulas
cost_per_job_realtime = input × price_in + output × price_out
cost_per_job_batch = cost_per_job_realtime × (1 - batch_discount)
realtime_cost_day = cost_per_job_realtime × jobs_per_day
batch_cost_day = cost_per_job_batch × jobs_per_day
use_batch = supports_batch AND deadline_hours >= batch_sla_hours
effective_cost_day = use_batch ? batch_cost_day : realtime_cost_day
savings_per_day = use_batch ? realtime_cost_day - batch_cost_day : 0
savings_per_month = savings_per_day × 30
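As a minimal sketch, the formulas translate to code along the following lines. Note that the pricing table below quotes USD per 1M tokens, so per-job token counts are divided by 1,000,000; the function and field names here are illustrative, not the calculator's actual implementation:

```ts
interface ModelPricing {
  pricePer1MIn: number;    // USD per 1M input tokens
  pricePer1MOut: number;   // USD per 1M output tokens
  supportsBatch: boolean;
  batchDiscount: number;   // e.g. 0.50
  batchSlaHours: number;   // e.g. 24
}

function compare(
  jobsPerDay: number,
  inputTokens: number,
  outputTokens: number,
  deadlineHours: number,
  p: ModelPricing,
) {
  // Per-job real-time cost: token counts are scaled to the per-1M-token rates.
  const costPerJobRealtime =
    (inputTokens / 1e6) * p.pricePer1MIn + (outputTokens / 1e6) * p.pricePer1MOut;

  // Batch is the same job at the vendor's flat discount.
  const costPerJobBatch = costPerJobRealtime * (1 - p.batchDiscount);

  const realtimeCostDay = costPerJobRealtime * jobsPerDay;
  const batchCostDay = costPerJobBatch * jobsPerDay;

  // Batch only counts if the vendor offers it and the SLA fits inside the deadline.
  const useBatch = p.supportsBatch && deadlineHours >= p.batchSlaHours;

  const effectiveCostDay = useBatch ? batchCostDay : realtimeCostDay;
  const savingsPerDay = useBatch ? realtimeCostDay - batchCostDay : 0;
  const savingsPerMonth = savingsPerDay * 30;

  return { realtimeCostDay, batchCostDay, useBatch, effectiveCostDay, savingsPerDay, savingsPerMonth };
}
```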
Pricing + batch table (2026-04-23, USD per 1M tokens)
| Model | Input | Output | Batch? | Discount | SLA |
|---|---|---|---|---|---|
| Claude Opus 4.7 | $15 | $75 | yes | 50% | 24h |
| Claude Sonnet 4.6 | $3 | $15 | yes | 50% | 24h |
| Claude Haiku 4.5 | $1 | $5 | yes | 50% | 24h |
| GPT-5 | $10 | $40 | yes | 50% | 24h |
| GPT-5 mini | $2 | $8 | yes | 50% | 24h |
| o4-mini | $3 | $12 | yes | 50% | 24h |
| Gemini 2.5 Pro | $1.25 | $10 | yes | 50% | 24h |
| Gemini 2.5 Flash | $0.30 | $2.50 | yes | 50% | 24h |
Batch discount sources
- Anthropic — Message Batches (50% discount, up to 24h SLA).
- OpenAI — Batch API guide and pricing (50% discount, up to 24h SLA).
- Google — Gemini Batch mode and pricing (50% discount on supported tiers, up to 24h SLA).
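As an illustration of the per-provider comparison mentioned under "What the tool computes", a sketch that picks each provider's cheapest model for the current workload could look like this (the catalog is an abbreviated copy of the pricing table above; the names and structure are illustrative, not the tool's actual data model):

```ts
// Abbreviated catalog drawn from the pricing table above (USD per 1M tokens).
const catalog = [
  { provider: "Anthropic", model: "Claude Haiku 4.5", pricePer1MIn: 1.0, pricePer1MOut: 5.0 },
  { provider: "Anthropic", model: "Claude Sonnet 4.6", pricePer1MIn: 3.0, pricePer1MOut: 15.0 },
  { provider: "OpenAI", model: "GPT-5 mini", pricePer1MIn: 2.0, pricePer1MOut: 8.0 },
  { provider: "Google", model: "Gemini 2.5 Flash", pricePer1MIn: 0.3, pricePer1MOut: 2.5 },
];

// Cheapest model per provider at a given workload. Cost depends on the
// input/output token mix, so the ranking can change with the workload.
function cheapestPerProvider(inputTokens: number, outputTokens: number) {
  const best = new Map<string, { model: string; costPerJob: number }>();
  for (const m of catalog) {
    const costPerJob =
      (inputTokens / 1e6) * m.pricePer1MIn + (outputTokens / 1e6) * m.pricePer1MOut;
    const current = best.get(m.provider);
    if (!current || costPerJob < current.costPerJob) {
      best.set(m.provider, { model: m.model, costPerJob });
    }
  }
  return best;
}
```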
When batch is not a valid choice
- Deadline shorter than batch SLA. A 24-hour SLA means the vendor only guarantees completion within 24 hours. If your deadline is 4 hours, you cannot rely on batch — the calculator flips the mode to real-time required and surfaces the full direct-API cost.
- User-facing latency. Any interactive chat, agentic loop, or human-in-the-loop workflow is real-time by definition. Batch is for jobs the user never waits on.
- Tool-use round trips. Multi-turn agentic flows with tool calls cannot be batched — each turn depends on the last. Only the final independent passes (e.g. a post-hoc summarization over completed trades) are candidates.
- Streaming required. Anywhere you need token-by-token output — live writing assistants, dashboards — batch is off the table.
- Model not covered. Not every SKU at every provider supports batch; always verify the exact model ID against the vendor page before committing to a batch-based budget.
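A minimal sketch of the mode suggestion implied by these rules, using the three modes listed under "What the tool computes" (everything else here is illustrative):

```ts
type Mode = "batch-eligible" | "real-time required" | "no batch endpoint";

function suggestMode(
  supportsBatch: boolean,
  deadlineHours: number,
  batchSlaHours: number,
): Mode {
  if (!supportsBatch) return "no batch endpoint";                  // model/vendor has no batch SKU
  if (deadlineHours < batchSlaHours) return "real-time required";  // deadline tighter than the SLA
  return "batch-eligible";
}
```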
Assumptions + limitations
- Flat discount model. The calculator applies a single discount percentage to input + output. Vendors occasionally price input and output discounts differently — verify against the live pricing page.
- No cache, no tool-use accounting. Prompt caching and tool-call overhead are modeled in separate AI Fin Hub calculators.
- Deterministic token counts. The calculator treats input and output tokens per job as fixed values; use the average of a representative sample for your estimates.
- SLA is not latency. Vendor SLAs are upper bounds. Jobs often complete faster, but you cannot budget on that.
- Planning tool only. This is not investment advice. Verify all numbers against vendor pages before locking a production budget.
Related articles
- Batch API economics for finance LLM workloads — when overnight pipelines beat real-time.
- Model selection framework for finance — choosing between Opus / Sonnet / Haiku / GPT / Gemini tiers.
- Inference cost attribution per trade — attributing LLM spend to validated trades and P&L.
Changelog
- 2026-04-23 — Initial release. 8 models, batch SLA + discount flags per provider, per-provider comparison.