aifinhub

Methodology · Tool · Last updated 2026-04-23

How Batch vs Real-Time Cost Calculator works

How the Batch vs Real-Time Cost Calculator prices LLM workloads against each vendor's batch endpoint, and when batch is not a valid choice.

What the tool computes

For a configured workload — jobs per day, input and output tokens per job, chosen model, and a hard deadline in hours — the calculator produces:

  • Real-time (direct API) daily cost.
  • Batch daily cost, applying each vendor's published discount on input + output.
  • Savings per day and per month when batch is feasible.
  • A mode suggestion: batch-eligible, real-time required, or vendor has no batch endpoint.
  • A per-provider comparison using each provider's cheapest model at the current workload.

How batch APIs work

Anthropic, OpenAI, and Google all expose an asynchronous batch endpoint: you upload a file of requests, the vendor processes them when capacity is available, and you download results within a vendor SLA — typically up to 24 hours. In exchange for giving up real-time response, each provider currently offers a 50% discount on input and output tokens against the posted per-token rates. The calculator treats that as a flat multiplier on the per-job cost.

If your use case permits an overnight or intraday-lagged result — end-of-day report generation, document triage, research batches, classification passes over news archives — batch is usually the correct default. See Batch API economics for finance LLM workloads for a fuller breakdown with worked examples.

Formulas

cost_per_job_realtime = input_tokens × price_in + output_tokens × price_out
cost_per_job_batch    = cost_per_job_realtime × (1 - batch_discount)
realtime_cost_day     = cost_per_job_realtime × jobs_per_day
batch_cost_day        = cost_per_job_batch × jobs_per_day
use_batch             = supports_batch AND deadline_hours >= batch_sla_hours
effective_cost_day    = use_batch ? batch_cost_day : realtime_cost_day
savings_per_day       = use_batch ? realtime_cost_day - batch_cost_day : 0
savings_per_month     = savings_per_day × 30

(price_in and price_out are per-token prices; the table below quotes USD per 1M tokens, so divide by 1,000,000.)
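The formulas above can be sketched as a single Python function. The function name and the example workload (1,000 jobs/day, 2,000 input + 500 output tokens, Claude Sonnet 4.6 rates from the pricing table) are illustrative, not part of the tool itself:

```python
def workload_costs(jobs_per_day, input_tokens, output_tokens,
                   price_in, price_out, batch_discount=0.50,
                   supports_batch=True, deadline_hours=24, batch_sla_hours=24):
    """Daily costs per the formulas above. Prices are USD per 1M tokens."""
    per_job_rt = input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out
    per_job_batch = per_job_rt * (1 - batch_discount)
    realtime_day = per_job_rt * jobs_per_day
    batch_day = per_job_batch * jobs_per_day
    # Batch only counts if the vendor supports it AND the deadline covers the SLA.
    use_batch = supports_batch and deadline_hours >= batch_sla_hours
    savings_day = (realtime_day - batch_day) if use_batch else 0.0
    return {
        "realtime_day": realtime_day,
        "batch_day": batch_day,
        "use_batch": use_batch,
        "effective_day": batch_day if use_batch else realtime_day,
        "savings_day": savings_day,
        "savings_month": savings_day * 30,
    }

# Example workload: 1,000 jobs/day, 2,000 input + 500 output tokens per job,
# at $3 in / $15 out per 1M tokens with a 24 h deadline.
r = workload_costs(1000, 2000, 500, price_in=3.0, price_out=15.0)
```

At a 50% discount the batch daily cost is exactly half the real-time cost, so the monthly saving is half the real-time monthly spend.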

Pricing + batch table (2026-04-23, USD per 1M tokens)

Model              Input   Output  Batch?  Discount  SLA
Claude Opus 4.7    $15     $75     yes     50%       24h
Claude Sonnet 4.6  $3      $15     yes     50%       24h
Claude Haiku 4.5   $1      $5      yes     50%       24h
GPT-5              $10     $40     yes     50%       24h
GPT-5 mini         $2      $8      yes     50%       24h
o4-mini            $3      $12     yes     50%       24h
Gemini 2.5 Pro     $1.25   $10     yes     50%       24h
Gemini 2.5 Flash   $0.30   $2.50   yes     50%       24h
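The per-provider comparison mentioned earlier picks each provider's cheapest model at the configured workload. A minimal sketch, encoding the pricing table above as data (the provider grouping is inferred from the model names; the dict structure is illustrative):

```python
# Pricing table above: model -> (provider, USD per 1M input, USD per 1M output).
# All listed models support batch at a 50% discount with a 24 h SLA.
PRICING = {
    "Claude Opus 4.7":   ("Anthropic", 15.00, 75.00),
    "Claude Sonnet 4.6": ("Anthropic",  3.00, 15.00),
    "Claude Haiku 4.5":  ("Anthropic",  1.00,  5.00),
    "GPT-5":             ("OpenAI",    10.00, 40.00),
    "GPT-5 mini":        ("OpenAI",     2.00,  8.00),
    "o4-mini":           ("OpenAI",     3.00, 12.00),
    "Gemini 2.5 Pro":    ("Google",     1.25, 10.00),
    "Gemini 2.5 Flash":  ("Google",     0.30,  2.50),
}

def cheapest_per_provider(input_tokens, output_tokens):
    """Cheapest model per provider at this workload: provider -> (model, $/job)."""
    best = {}
    for model, (provider, p_in, p_out) in PRICING.items():
        cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
        if provider not in best or cost < best[provider][1]:
            best[provider] = (model, cost)
    return best

# Example: 2,000 input + 500 output tokens per job.
best = cheapest_per_provider(2000, 500)
```

Note the cheapest model depends on the input/output mix: a model with cheap input but expensive output can lose at output-heavy workloads.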

When batch is not a valid choice

  1. Deadline shorter than batch SLA. A 24-hour SLA means the vendor only guarantees completion within 24 hours. If your deadline is 4 hours, you cannot rely on batch — the calculator flips the mode to real-time required and surfaces the full direct-API cost.
  2. User-facing latency. Any interactive chat, agentic loop, or human-in-the-loop workflow is real-time by definition. Batch is for jobs the user never waits on.
  3. Tool-use round trips. Multi-turn agentic flows with tool calls cannot be batched — each turn depends on the last. Only the final independent passes (e.g. a post-hoc summarization over completed trades) are candidates.
  4. Streaming required. Anywhere you need token-by-token output — live writing assistants, dashboards — batch is off the table.
  5. Model not covered. Not every SKU at every provider supports batch; always verify the exact model ID against the vendor page before committing to a batch-based budget.
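The eligibility rules above reduce to a small decision function. A sketch with illustrative parameter names (the workflow flags for rules 2–4 are assumptions about how such signals might be passed in):

```python
def suggest_mode(supports_batch, batch_sla_hours, deadline_hours,
                 user_facing=False, needs_streaming=False, multi_turn_tools=False):
    """Mode suggestion per the rules above: 'no batch endpoint',
    'real-time required', or 'batch-eligible'."""
    if not supports_batch:
        return "no batch endpoint"           # rule 5: model/SKU not covered
    if user_facing or needs_streaming or multi_turn_tools:
        return "real-time required"          # rules 2-4: latency-bound work
    if deadline_hours < batch_sla_hours:
        return "real-time required"          # rule 1: deadline beats the SLA
    return "batch-eligible"
```

For example, a 4-hour deadline against a 24-hour SLA yields "real-time required", while the same workload with a 48-hour deadline is batch-eligible.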

Assumptions + limitations

  1. Flat discount model. The calculator applies a single discount percentage to input + output. Vendors occasionally price input and output discounts differently — verify against the live pricing page.
  2. No cache, no tool-use accounting. Prompt caching and tool-call overhead are modeled in separate AI Fin Hub calculators.
  3. Deterministic token counts. Use the average of a representative sample for your input and output token estimates.
  4. SLA is not latency. Vendor SLAs are upper bounds. Jobs often complete faster, but you cannot budget on that.
  5. Planning tool only. This is not investment advice. Verify all numbers against vendor pages before locking a production budget.

Changelog

  • 2026-04-23 — Initial release. 8 models, batch SLA + discount flags per provider, per-provider comparison.