Earnings-Call Summarization Cost Calculator
Compute LLM cost per stock per quarter to summarize earnings transcripts across Sonnet, Opus, GPT-4o, Gemini 2.5 Pro/Flash. Cache-hit-rate aware.
- Inputs: Form inputs / CSV
- Runtime: Instant
- Privacy: Client-side · no upload
- API key: Not required
- Methodology: Open →
Workload
Primary model
Pricing snapshot as of 2026-04-25. List prices, no batch discount.
Cost per stock per quarter
$0.04
Claude Sonnet 4.5 · $0.15/stock/year · $7.35 for full 50-ticker universe per year.
Ranked by annual cost (full universe)
| Model | $/stock/qtr | $/stock/yr | Universe/yr |
|---|---|---|---|
| OpenAI · GPT-4o mini | $0.002 | $0.01 | $0.43 |
| Google · Gemini 2.5 Flash | $0.005 | $0.02 | $0.96 |
| Anthropic · Claude Haiku 4.5 | $0.012 | $0.05 | $2.45 |
| Google · Gemini 2.5 Pro | $0.020 | $0.08 | $3.94 |
| OpenAI · GPT-4o | $0.036 | $0.14 | $7.22 |
| Anthropic · Claude Sonnet 4.5 | $0.037 | $0.15 | $7.35 |
| Anthropic · Claude Opus 4.7 | $0.184 | $0.73 | $36.75 |
Caveats
Estimates assume average tokens; actual transcripts vary 5×. Cache reads only apply when prompts share a stable prefix (system prompt + few-shot). Earnings transcripts themselves are cold input. See methodology.
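The per-stock figures above reduce to simple token arithmetic. A minimal sketch, assuming ~10K input tokens and a 1K-token summary (these counts are assumptions, so it won't reproduce the table exactly):

```python
def summarization_cost(input_tokens: int, output_tokens: int,
                       in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one summarization call at list prices (no caching)."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Assumed workload: ~10K-token transcript, 1K-token summary, one call per stock per quarter.
per_call = summarization_cost(10_000, 1_000, in_price_per_m=3.0, out_price_per_m=15.0)
per_year = per_call * 4      # four earnings calls per year
universe = per_year * 50     # 50-ticker universe

print(f"${per_call:.3f}/stock/qtr · ${per_year:.2f}/stock/yr · ${universe:.2f}/universe/yr")
# → $0.045/stock/qtr · $0.18/stock/yr · $9.00/universe/yr
```

Swap in each model's list input/output prices to recover the shape of the ranking table.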
How to use
Step-by-step
1. Enter the average call duration in minutes (typical: 60), or upload sample transcripts to measure actual token counts.
2. Pick a model: Sonnet for budget efficiency, Opus for highest extraction accuracy, GPT-4o for tool-use chains.
3. Set the summary output length (typical: 500–1,500 tokens for a structured summary).
4. Read the per-call and per-quarter cost (multiplied by your call volume), and compare across models.
5. Toggle prompt caching if you have a stable extraction schema; the 90% discount on cached tokens often shifts the cost ranking between models.
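Step 5's cache toggle amounts to an effective input cost per call: the transcript stays cold while the stable prefix gets the discount. A minimal sketch (prefix length and prices here are illustrative assumptions):

```python
def input_cost_per_call(transcript_tokens: int, prefix_tokens: int,
                        in_price_per_m: float, cache_discount: float = 0.90) -> float:
    """Input cost of one call when a stable prompt prefix is served from cache.

    The transcript itself is always cold input; only the shared prefix
    (system prompt + extraction schema) gets the cache discount.
    """
    cold = transcript_tokens * in_price_per_m
    cached = prefix_tokens * in_price_per_m * (1 - cache_discount)
    return (cold + cached) / 1_000_000

# Assumed: 10K-token transcript, 2K-token stable prefix, Sonnet input at $3/M.
no_cache   = input_cost_per_call(10_000, 2_000, 3.0, cache_discount=0.0)
with_cache = input_cost_per_call(10_000, 2_000, 3.0)
print(f"input cost/call: ${no_cache:.4f} uncached vs ${with_cache:.4f} cached")
```

Real provider cache pricing also bills cache writes at a small premium, which this sketch ignores.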
Questions people ask next
FAQ
What's the per-call cost range?
Depends on transcript length and model. A 60-minute earnings call is roughly 9,000-12,000 tokens of transcript. At Claude Sonnet pricing (input $3/M, output $15/M), summarization with 1,000-token output costs ~$0.04. At Opus, ~$0.20. At GPT-4, ~$0.30. For 500 companies/quarter, full-Opus runs ~$100, Sonnet ~$20.
Why does long-context model selection matter?
Some calls run long (Q&A pushes them past 80 minutes), producing 15,000+ token transcripts. Models with large context windows (Claude's 200K, GPT-4o's 128K) handle these in one pass. Models with 32K or 16K context need chunking, which increases cost (each chunk repeats the prompt overhead) and degrades summary quality. The tool surfaces this tradeoff.
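The chunking overhead can be estimated directly from the usable window. A sketch where the per-chunk prompt overhead and output reserve are assumed values:

```python
import math

def billed_input_tokens(transcript_tokens: int, context_limit: int,
                        prompt_overhead: int = 2_000, output_reserve: int = 1_500) -> int:
    """Total input tokens billed when a transcript may need chunking.

    Each chunk re-sends the prompt overhead (system prompt + instructions);
    output_reserve keeps room in the window for the chunk's summary.
    """
    usable = context_limit - prompt_overhead - output_reserve
    n_chunks = math.ceil(transcript_tokens / usable)
    return transcript_tokens + n_chunks * prompt_overhead

one_pass = billed_input_tokens(15_000, context_limit=200_000)  # fits in 1 chunk
chunked  = billed_input_tokens(15_000, context_limit=16_000)   # needs 2 chunks
print(one_pass, chunked)  # → 17000 19000
```

Chunked runs usually also need a merge pass over the partial summaries, so the true overhead is larger than this token count alone.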
How does prompt caching help here?
If your summarization prompt is the same across calls (system prompt + extraction schema), caching that prefix saves 90% on the input cost of those tokens. For 500 calls/quarter with a 2,000-token system prompt, that's ~$2.70 saved per quarter on Sonnet ($3/M input), ~$13.50 on Opus ($15/M). The tool models this directly.
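The savings figure is a single multiplication; a sketch at list input prices (call volume and prefix size are assumptions):

```python
def quarterly_cache_savings(calls: int, prefix_tokens: int,
                            in_price_per_m: float, cache_discount: float = 0.90) -> float:
    """Dollars saved per quarter by caching a stable prompt prefix."""
    return calls * prefix_tokens * in_price_per_m * cache_discount / 1_000_000

# Assumed: 500 calls/quarter, 2,000-token stable prefix.
print(f"Sonnet ($3/M in): ${quarterly_cache_savings(500, 2_000, 3.0):.2f}")   # → $2.70
print(f"Opus ($15/M in):  ${quarterly_cache_savings(500, 2_000, 15.0):.2f}")  # → $13.50
```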
Can I summarize calls in real-time?
Latency-wise, yes: 12K input → 1K output runs in 5–15 seconds across the major providers. The bottleneck isn't compute, it's transcript availability; most calls have transcripts within 1–3 hours. For real-time research, audio-to-text plus immediate summarization is possible but adds Whisper-style transcription cost (~$0.36 for a 60-minute call at $0.006/min).
What if I need to extract structured data, not a summary?
Structured extraction (e.g., guidance changes, KPI mentions) costs more than summarization because it requires longer outputs and stricter prompting. The tool has a 'structured extraction' mode that adjusts the output token estimate accordingly. For high-volume extraction at scale, fine-tuning a smaller model often beats the big-model API.
Related deep dive
Read further · All articles →
Long-form context behind the tool output.
- Tutorial · Runnable · 12 min
Prompt Patterns for Earnings Calls
Five copy-paste patterns: speaker attribution, hedged-guidance confidence, multi-quarter delta, risk aggregator, forward-outlook separator.
Read
- Pillar · Guide · 10 min
MCP Server Latency: The Hidden Cost of Tool-Call
Each MCP tool call adds latency. For multi-step agents, the total roundtrip cost dominates. Architecture patterns to amortize it.
Read
- Methodology · Opinion · 8 min
The Token-Cost Reality of LLM Trading Research
What LLM trading research costs per idea and per validated trade across Claude, GPT-5, and Gemini 2.5. Pricing, caching, and model mix under $200/month.
Read
Complementary tools
Users of this tool often explore
Financial Document Token Estimator
Paste a 10-K, 10-Q, 8-K or earnings transcript and see token count + one-pass extraction cost across eight frontier LLMs, with cache-hit toggle.
Token-Cost Optimizer
Compute the dollar cost of a trading research loop across Claude, GPT, and Gemini. Prompt length × model × retry × call volume → cost per idea and per trade.
Model Selector for Finance
Input task, latency budget, cost budget, context size, and quality sensitivity; get ranked model recommendations with rationale, grounded in published pricing.