Methodology · Tool · Last updated 2026-05-08
How the Earnings-Call Summarization Cost Calculator works
Pricing snapshot, cost formula, and caveats for the Earnings-Call Summarization Cost Calculator.
Cost formula
fresh_input_tok = transcript_tok · (1 − cache_hit_rate)
cached_input_tok = transcript_tok · cache_hit_rate
input_cost = fresh_input_tok · price_input / 1e6 + cached_input_tok · price_cache / 1e6
output_cost = summary_tok · price_output / 1e6
cost_per_attempt = input_cost + output_cost
cost_per_stock_per_quarter = cost_per_attempt · attempts
cost_per_stock_per_year = cost_per_stock_per_quarter · 4
total_universe_per_year = cost_per_stock_per_year · tickers

Pricing snapshot — as of 2026-04-25
Prices in USD per million tokens, at listed on-demand input/output rates (no batch discount). The cache-read price is the discounted rate vendors charge for tokens served from cache after a cache write has been paid at the full input rate.
- Anthropic Claude Sonnet 4.5: input $3.00, output $15.00, cache read $0.30.
- Anthropic Claude Opus 4.7: input $15.00, output $75.00, cache read $1.50.
- Anthropic Claude Haiku 4.5: input $1.00, output $5.00, cache read $0.10.
- OpenAI GPT-4o: input $2.50, output $10.00, cache read $1.25.
- OpenAI GPT-4o mini: input $0.15, output $0.60, cache read $0.075.
- Google Gemini 2.5 Pro: input $1.25, output $10.00, cache read $0.31.
- Google Gemini 2.5 Flash: input $0.30, output $2.50, cache read $0.075.
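The formula chain above can be sketched in a few lines of Python. The workload numbers in the example call (transcript length, summary length, cache hit rate, attempts, ticker count) are illustrative assumptions, not tool defaults; the rates are the Claude Sonnet 4.5 figures from the snapshot.

```python
def summarization_cost(transcript_tok, summary_tok, cache_hit_rate,
                       price_input, price_cache, price_output,
                       attempts=1, tickers=1):
    """Per-attempt, per-stock-per-year, and universe-per-year cost in USD.

    Prices are USD per million tokens, hence the / 1e6.
    """
    fresh_input_tok = transcript_tok * (1 - cache_hit_rate)
    cached_input_tok = transcript_tok * cache_hit_rate
    input_cost = (fresh_input_tok * price_input
                  + cached_input_tok * price_cache) / 1e6
    output_cost = summary_tok * price_output / 1e6
    per_attempt = input_cost + output_cost
    per_stock_per_year = per_attempt * attempts * 4  # 4 quarters
    return per_attempt, per_stock_per_year, per_stock_per_year * tickers

# Illustrative workload: 12k-token transcript, 800-token summary,
# 25% of input served from cache, 1 attempt, 500-ticker universe.
per_attempt, per_year, universe = summarization_cost(
    transcript_tok=12_000, summary_tok=800, cache_hit_rate=0.25,
    price_input=3.00, price_cache=0.30, price_output=15.00,
    attempts=1, tickers=500)
print(f"${per_attempt:.4f}/attempt  ${per_year:.4f}/stock/yr  ${universe:.2f}/universe/yr")
```

Under these assumptions the run costs about $0.04 per attempt, or roughly $80 per year across a 500-ticker universe.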
Assumptions
- The formula applies cache_hit_rate to the whole input-token count. In practice only the system-prompt and few-shot prefix are reliably cacheable and the transcript itself is cold (uncached), so set cache_hit_rate to the cacheable prefix's share of total input tokens.
- Token counts are tokenizer-agnostic: we report a single transcript-token figure for all vendors. In practice tokenizers differ by roughly 5–15%, so per-vendor counts will vary.
- summary_tok assumes a typical single-summary length; longer multi-section outputs scale output cost linearly.
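Because cost scales roughly linearly with token count, the 5–15% tokenizer spread noted above translates directly into an error band on any estimate. A minimal sketch (the 0.0399 nominal figure is a hypothetical per-attempt cost, not a tool output):

```python
def tokenizer_band(nominal_cost, drift=0.15):
    """Bound a cost estimate given +/- `drift` cross-vendor tokenizer variation.

    Token counts scale cost approximately linearly, so a 15% swing in
    token counts maps to roughly a 15% swing in cost.
    """
    return nominal_cost * (1 - drift), nominal_cost * (1 + drift)

low, high = tokenizer_band(0.0399)  # hypothetical per-attempt cost in USD
```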
Limitations
- Pricing changes. Treat the snapshot date as the only authoritative figure on this page; verify against vendor pricing pages before committing to a budget.
- Batch APIs (Anthropic, OpenAI) discount further, typically 50%. If the workload can be deferred to an overnight batch window, rerun the tool with halved prices.
- Reasoning/thinking tokens (OpenAI o-series, Anthropic extended thinking) are not modelled; they bill as an additional line of output tokens.
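The "second run with halved prices" suggested for batch workloads can be sketched as a price transform. This assumes the ~50% batch discount applies uniformly to input, cache-read, and output rates, which is an assumption to verify per vendor (cache interaction with batch billing varies):

```python
def batch_prices(price_input, price_cache, price_output, discount=0.5):
    """Apply a flat batch discount to all three per-million-token rates.

    discount=0.5 reflects the typical 50% batch-API discount; confirm the
    exact figure and which rates it covers on each vendor's pricing page.
    """
    return (price_input * (1 - discount),
            price_cache * (1 - discount),
            price_output * (1 - discount))

# Claude Sonnet 4.5 on-demand rates from the snapshot above:
b_in, b_cache, b_out = batch_prices(3.00, 0.30, 15.00)
```

Feed the discounted rates back into the cost formula to get the deferred-workload estimate.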