Methodology · Tool · Last updated 2026-04-23
How the Batch vs Real-Time Cost Calculator works
How the Batch vs Real-Time Cost Calculator prices LLM workloads against each vendor's batch endpoint, and when batch is not a valid choice.
What the tool computes
For a configured workload — jobs per day, input and output tokens per job, chosen model, and a hard deadline in hours — the calculator produces:
- Real-time (direct API) daily cost.
- Batch daily cost, applying each vendor's published discount on input + output.
- Savings per day and per month when batch is feasible.
- A mode suggestion: batch-eligible, real-time required, or vendor has no batch endpoint.
- A per-provider comparison using each provider's cheapest model at the current workload.
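As a rough sketch, the inputs and outputs above might be shaped like this in code (the field and type names are illustrative only, not the tool's actual schema):

```ts
// Hypothetical shapes for illustration only; the tool's real schema may differ.
interface Workload {
  jobsPerDay: number;
  inputTokensPerJob: number;
  outputTokensPerJob: number;
  model: string;          // e.g. "Claude Sonnet 4.6"
  deadlineHours: number;  // hard deadline for results
}

type Mode = "batch-eligible" | "real-time required" | "no batch endpoint";

interface CalculatorResult {
  realtimeCostPerDay: number;
  batchCostPerDay: number | null;  // null when the vendor has no batch endpoint
  savingsPerDay: number;
  savingsPerMonth: number;
  mode: Mode;
}
```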
How batch APIs work
Anthropic, OpenAI, and Google all expose an asynchronous batch endpoint: you upload a file of requests, the vendor processes them when capacity is available, and you download results within a vendor SLA — typically up to 24 hours. In exchange for giving up real-time response, each provider currently offers a 50% discount on input and output tokens against the posted per-token rates. The calculator treats that as a flat multiplier on the per-job cost.
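For example, take the Claude Sonnet 4.6 rates from the pricing table below ($3 input / $15 output per 1M tokens) and a job with 10,000 input tokens and 2,000 output tokens:

10,000 × $3/1M + 2,000 × $15/1M = $0.03 + $0.03 = $0.06 per job in real time
$0.06 × (1 - 0.50) = $0.03 per job via batch

At 5,000 jobs per day, batch saves $150 per day, roughly $4,500 per month.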
If your use case permits an overnight or intraday-lagged result — end-of-day report generation, document triage, research batches, classification passes over news archives — batch is usually the correct default. See Batch API economics for finance LLM workloads for a fuller breakdown with worked examples.
Formulas
cost_per_job_realtime = input × price_in + output × price_out
cost_per_job_batch = cost_per_job_realtime × (1 - batch_discount)
realtime_cost_day = cost_per_job_realtime × jobs_per_day
batch_cost_day = cost_per_job_batch × jobs_per_day
use_batch = supports_batch AND deadline_hours >= batch_sla_hours
effective_cost_day = use_batch ? batch_cost_day : realtime_cost_day
savings_per_day = use_batch ? realtime_cost_day - batch_cost_day : 0
savings_per_month = savings_per_day × 30
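As a minimal sketch, the formulas translate to code along the following lines. Note that the pricing table below quotes USD per 1M tokens, so per-job token counts are divided by 1,000,000; the function and field names here are illustrative, not the calculator's actual implementation:

```ts
interface ModelPricing {
  pricePer1MIn: number;    // USD per 1M input tokens
  pricePer1MOut: number;   // USD per 1M output tokens
  supportsBatch: boolean;
  batchDiscount: number;   // e.g. 0.50
  batchSlaHours: number;   // e.g. 24
}

function compare(
  jobsPerDay: number,
  inputTokens: number,
  outputTokens: number,
  deadlineHours: number,
  p: ModelPricing,
) {
  // Per-job real-time cost: token counts are scaled to the per-1M-token rates.
  const costPerJobRealtime =
    (inputTokens / 1e6) * p.pricePer1MIn + (outputTokens / 1e6) * p.pricePer1MOut;

  // Batch is the same job at the vendor's flat discount.
  const costPerJobBatch = costPerJobRealtime * (1 - p.batchDiscount);

  const realtimeCostDay = costPerJobRealtime * jobsPerDay;
  const batchCostDay = costPerJobBatch * jobsPerDay;

  // Batch only counts if the vendor offers it and the SLA fits inside the deadline.
  const useBatch = p.supportsBatch && deadlineHours >= p.batchSlaHours;

  const effectiveCostDay = useBatch ? batchCostDay : realtimeCostDay;
  const savingsPerDay = useBatch ? realtimeCostDay - batchCostDay : 0;
  const savingsPerMonth = savingsPerDay * 30;

  return { realtimeCostDay, batchCostDay, useBatch, effectiveCostDay, savingsPerDay, savingsPerMonth };
}
```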
Pricing + batch table (2026-04-23, USD per 1M tokens)
| Model | Input | Output | Batch? | Discount | SLA |
|---|---|---|---|---|---|
| Claude Opus 4.7 | $15 | $75 | yes | 50% | 24h |
| Claude Sonnet 4.6 | $3 | $15 | yes | 50% | 24h |
| Claude Haiku 4.5 | $1 | $5 | yes | 50% | 24h |
| GPT-5 | $10 | $40 | yes | 50% | 24h |
| GPT-5 mini | $2 | $8 | yes | 50% | 24h |
| o4-mini | $3 | $12 | yes | 50% | 24h |
| Gemini 2.5 Pro | $1.25 | $10 | yes | 50% | 24h |
| Gemini 2.5 Flash | $0.30 | $2.50 | yes | 50% | 24h |
Batch discount sources
- Anthropic — Message Batches (50% discount, up to 24h SLA).
- OpenAI — Batch API guide and pricing (50% discount, up to 24h SLA).
- Google — Gemini Batch mode and pricing (50% discount on supported tiers, up to 24h SLA).
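As an illustration of the per-provider comparison mentioned under "What the tool computes", a sketch that picks each provider's cheapest model for the current workload could look like this (the catalog is an abbreviated copy of the pricing table above; the names and structure are illustrative, not the tool's actual data model):

```ts
// Abbreviated catalog drawn from the pricing table above (USD per 1M tokens).
const catalog = [
  { provider: "Anthropic", model: "Claude Haiku 4.5", pricePer1MIn: 1.0, pricePer1MOut: 5.0 },
  { provider: "Anthropic", model: "Claude Sonnet 4.6", pricePer1MIn: 3.0, pricePer1MOut: 15.0 },
  { provider: "OpenAI", model: "GPT-5 mini", pricePer1MIn: 2.0, pricePer1MOut: 8.0 },
  { provider: "Google", model: "Gemini 2.5 Flash", pricePer1MIn: 0.3, pricePer1MOut: 2.5 },
];

// Cheapest model per provider at a given workload. Cost depends on the
// input/output token mix, so the ranking can change with the workload.
function cheapestPerProvider(inputTokens: number, outputTokens: number) {
  const best = new Map<string, { model: string; costPerJob: number }>();
  for (const m of catalog) {
    const costPerJob =
      (inputTokens / 1e6) * m.pricePer1MIn + (outputTokens / 1e6) * m.pricePer1MOut;
    const current = best.get(m.provider);
    if (!current || costPerJob < current.costPerJob) {
      best.set(m.provider, { model: m.model, costPerJob });
    }
  }
  return best;
}
```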
When batch is not a valid choice
- Deadline shorter than batch SLA. A 24-hour SLA means the vendor only guarantees completion within 24 hours. If your deadline is 4 hours, you cannot rely on batch — the calculator flips the mode to real-time required and surfaces the full direct-API cost.
- User-facing latency. Any interactive chat, agentic loop, or human-in-the-loop workflow is real-time by definition. Batch is for jobs the user never waits on.
- Tool-use round trips. Multi-turn agentic flows with tool calls cannot be batched — each turn depends on the last. Only the final independent passes (e.g. a post-hoc summarization over completed trades) are candidates.
- Streaming required. Anywhere you need token-by-token output — live writing assistants, dashboards — batch is off the table.
- Model not covered. Not every SKU at every provider supports batch; always verify the exact model ID against the vendor page before committing to a batch-based budget.
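A minimal sketch of the mode suggestion implied by these rules, using the three modes listed under "What the tool computes" (everything else here is illustrative):

```ts
type Mode = "batch-eligible" | "real-time required" | "no batch endpoint";

function suggestMode(
  supportsBatch: boolean,
  deadlineHours: number,
  batchSlaHours: number,
): Mode {
  if (!supportsBatch) return "no batch endpoint";                  // model/vendor has no batch SKU
  if (deadlineHours < batchSlaHours) return "real-time required";  // deadline tighter than the SLA
  return "batch-eligible";
}
```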
Assumptions + limitations
- Flat discount model. The calculator applies a single discount percentage to input + output. Vendors occasionally price input and output discounts differently — verify against the live pricing page.
- No cache, no tool-use accounting. Prompt caching and tool-call overhead are modeled in separate AI Fin Hub calculators.
- Deterministic token counts. The calculator treats input and output tokens per job as fixed values; use the average of a representative sample for your estimates.
- SLA is not latency. Vendor SLAs are upper bounds. Jobs often complete faster, but you cannot budget on that.
- Planning tool only. This is not investment advice. Verify all numbers against vendor pages before locking a production budget.
Related articles
- Batch API economics for finance LLM workloads — when overnight pipelines beat real-time.
- Model selection framework for finance — choosing between Opus / Sonnet / Haiku / GPT / Gemini tiers.
- Inference cost attribution per trade — attributing LLM spend to validated trades and P&L.
Changelog
- 2026-04-23 — Initial release. 8 models, batch SLA + discount flags per provider, per-provider comparison.