For a 250-ticker quarterly earnings-call summarization workload at 12,000 input + 500 output tokens, a 55% cache hit rate, and a single-attempt path, the Earnings Call Summarization Cost engine returns the following per-ticker-quarter costs: Gemini 2.5 Flash-Lite $0.00090, GPT-4o mini $0.00160, Gemini 2.5 Flash $0.00336, Claude Haiku 4.5 $0.00856, Gemini 2.5 Pro $0.01380, Gemini 3.5 Flash $0.01507, Claude Sonnet 4.5 $0.02568, GPT-4o $0.02675, Claude Opus 4.7 $0.04280. Multiplied by 250 tickers × 4 quarters, annual cost ranges from $0.91 (gemini-2-5-flash-lite) to $42.80 (claude-opus-4-7). On a two-attempt path, every number doubles. A $200/year budget on this workload is easy to stay under at any tier, including Opus; the binding constraint is operator review time, not API spend.

TL;DR

Annual cost for 250-ticker quarterly summarization across nine models (single attempt, 55% cache):

Model                    Per-ticker-quarter   Quarterly total   Annual total
Gemini 2.5 Flash-Lite    $0.0009              $0.23             $0.91
GPT-4o mini              $0.0016              $0.40             $1.60
Gemini 2.5 Flash         $0.0034              $0.84             $3.36
Claude Haiku 4.5         $0.0086              $2.14             $8.56
Gemini 2.5 Pro           $0.0138              $3.45             $13.80
Gemini 3.5 Flash         $0.0151              $3.77             $15.08
Claude Sonnet 4.5        $0.0257              $6.42             $25.68
GPT-4o                   $0.0268              $6.69             $26.75
Claude Opus 4.7          $0.0428              $10.70            $42.80

Even Opus, at $43/year, sits comfortably under a typical $200/year retail budget. The cost spread across models is 47× (Opus to Gemini 2.5 Flash-Lite), but the absolute differential is only $42/year, which is too small to determine the workflow. Total cost is driven by operator review time, not API spend.
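The table's totals follow directly from the per-ticker-quarter figures quoted in the opening paragraph. A minimal Python sketch of that roll-up, with unit costs copied from this article rather than fetched from the engine (the dictionary and function names are illustrative, not the engine's API):

```python
# Per-ticker-quarter costs in USD, copied from the figures quoted in this article.
UNIT_COST = {
    "Gemini 2.5 Flash-Lite": 0.00090,
    "GPT-4o mini": 0.00160,
    "Gemini 2.5 Flash": 0.00336,
    "Claude Haiku 4.5": 0.00856,
    "Gemini 2.5 Pro": 0.01380,
    "Gemini 3.5 Flash": 0.01507,
    "Claude Sonnet 4.5": 0.02568,
    "GPT-4o": 0.02675,
    "Claude Opus 4.7": 0.04280,
}

def roll_up(unit_cost, tickers=250, quarters=4, attempts=1):
    """Return (quarterly_total, annual_total) in USD for one model."""
    quarterly = unit_cost * tickers * attempts
    return quarterly, quarterly * quarters

# Print the models from cheapest to most expensive.
for model, unit in sorted(UNIT_COST.items(), key=lambda kv: kv[1]):
    quarterly, annual = roll_up(unit)
    print(f"{model:<24} ${unit:.5f}  ${quarterly:6.2f}  ${annual:6.2f}")
```

Because the inputs here are the five-decimal values printed in the article, a row can differ from the table by a cent; the engine presumably carries more precision.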

The cost-vs-quality irrelevance at this scale

For a 250-ticker watchlist, all nine models in the engine's catalogue land under $50/year on the canonical workload. The choice among them turns not on cost but on summarization quality, latency, and the review-time multiplier.

The "review-time multiplier" is the load-bearing variable. A model that produces materially incorrect summaries (numeric substitution, missing key item, misattribution) at a 10% rate forces every summary to be reviewed against the source. At 5 minutes per summary and 1,000 summaries per year (250 tickers × 4 quarters), that is about 83 hours of review per year. A model with a 1% rate of materially incorrect summaries needs only about 10% of summaries reviewed, which drops the load to 8.3 hours per year.

At $30/hour of operator time, the time-cost differential between a 10% error rate and a 1% error rate is 75 hours × $30 = $2,250/year. The API-cost differential between GPT-4o mini and Opus is $41/year. Review time outweighs the API cost spread by roughly 55×.
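The review-time arithmetic can be checked with a few lines of Python. This is a sketch under the assumptions stated in the text (5 minutes per review, $30/hour, full review versus a 10% sample) applied to a batch of 1,000 summaries; the function name and constants are illustrative.

```python
def review_hours(summaries, minutes_each, share_reviewed):
    """Hours of operator time needed to review a share of a batch of summaries."""
    return summaries * share_reviewed * minutes_each / 60

# Assumptions taken from the text: 5 minutes per review, $30/hour,
# full review at a 10% error rate, a 10% sample at a 1% error rate.
BATCH = 1_000        # summaries in the batch being compared
HOURLY_RATE = 30.0   # USD per operator hour

full = review_hours(BATCH, 5, 1.0)       # every summary checked
sampled = review_hours(BATCH, 5, 0.10)   # one in ten checked
saved_hours = full - sampled

print(f"full review:    {full:.1f} h")
print(f"sampled review: {sampled:.1f} h")
print(f"difference:     {saved_hours:.1f} h = ${saved_hours * HOURLY_RATE:,.0f}")
```

The script works at full precision, so its output can differ slightly from figures computed by hand with rounded intermediate values.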

The strategic implication: at this workload scale, picking the cheapest model is reasonable only if the cheap model produces a low enough error rate to skip review. If review is required regardless, the API cost is negligible and the right choice is whichever model minimizes review time per summary.

The two-attempt path

The engine's summarization_attempts input drives a 2× multiplier on cost. With 2 attempts:

Model                Per-ticker-quarter (2 attempts)   Annual total (2 attempts)
GPT-4o mini          $0.0032                           $3.20
Gemini 2.5 Flash     $0.0067                           $6.72
Claude Haiku 4.5     $0.0171                           $17.12
Gemini 2.5 Pro       $0.0276                           $27.60
Claude Sonnet 4.5    $0.0514                           $51.36
Claude Opus 4.7      $0.0856                           $85.60

A two-attempt path (run the model twice, compare summaries, escalate on disagreement) is the right architecture when summary quality matters. The cost penalty is 2×, but the quality lift is meaningful: the LLM-disagreement signal catches ~70% of the substitution-style errors that single-attempt paths miss. The Claude Opus two-attempt run lands at $86/year, still well under $1/day.

For a solo retail trader who wants high-quality coverage on 250 tickers across a year, the two-attempt path on Claude Sonnet 4.5 at $51/year is the right operating point. It buys most of the quality of Opus at 60% of the cost ($51 versus $86 on the same two-attempt path), and the disagreement detection catches the worst failure mode.
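One way to picture the "compare summaries, escalate on disagreement" step is a check on the numbers each attempt cites, since numeric substitution is the failure mode named above. The sketch below is an assumption about how such a check could look, not the engine's or any vendor's method; the function names and sample summaries are hypothetical, and a real pipeline would likely also compare non-numeric content.

```python
import re

# Matches plain figures and percentages such as 12, 4.2, or 12%.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")

def numeric_facts(summary: str) -> set[str]:
    """Collect the numeric tokens in a summary, ignoring thousands separators."""
    return set(NUMBER.findall(summary.replace(",", "")))

def needs_escalation(first: str, second: str) -> bool:
    """Flag the pair for human review when the two attempts cite different numbers."""
    return numeric_facts(first) != numeric_facts(second)

# Hypothetical attempts at the same call; attempt_b swaps a guidance figure,
# attempt_c rewords attempt_a without changing any number.
attempt_a = "Revenue rose 12% to $4.2 billion; FY guidance raised to $17.5 billion."
attempt_b = "Revenue rose 12% to $4.2 billion; FY guidance raised to $15.7 billion."
attempt_c = "Revenue was $4.2 billion, up 12%. Full-year guidance now $17.5 billion."

print(needs_escalation(attempt_a, attempt_b))  # True: 17.5 vs 15.7
print(needs_escalation(attempt_a, attempt_c))  # False: same figures, different wording
```

Comparing sets of figures tolerates rewording between attempts while still surfacing a swapped number, which is why it suits the substitution case; it says nothing about omissions that both attempts share.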

The cache architecture for earnings calls

55% cache hit rate is the canonical assumption. Earnings calls have a specific cache shape:

  1. System prompt: the summarization instruction (extract key topics, risk mentions, guidance changes). 100% cache hit across all tickers and quarters; never changes within a calendar quarter.
  2. Per-ticker scaffold: the company-specific reference (recent guidance, last-quarter context). 50–70% cache hit if same-quarter re-runs are common, 10–20% otherwise.
  3. The transcript itself: the 12k-token call text. 0% cache hit on first read; potentially high cache hit if the same transcript is re-queried by the downstream pipeline.

A 55% blended cache hit rate is achievable when the system prompt is well cached and the workflow re-queries transcripts during the same earnings window (say, for a risk-mention re-check after a follow-up news item). Without re-query, the blended cache hit rate drops to 20–30%, raising the annual cost by roughly 30–40% across the catalogue.
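A blended hit rate is a token-weighted average over the three segments listed above. The sketch below illustrates that weighting; the segment sizes for the system prompt and scaffold, and the 40% transcript hit rate under re-query, are hypothetical values picked for illustration, since the article gives only hit-rate ranges.

```python
def blended_hit_rate(segments):
    """Token-weighted average cache hit rate over (tokens, hit_rate) pairs."""
    total = sum(tokens for tokens, _ in segments)
    return sum(tokens * rate for tokens, rate in segments) / total

# Hypothetical segment sizes; hit rates sit inside the ranges given in the text.
with_requery = [
    (4_000, 1.00),    # system prompt, always cached
    (2_000, 0.60),    # per-ticker scaffold, same-quarter re-runs common
    (12_000, 0.40),   # transcript, re-queried within the earnings window
]
without_requery = [
    (4_000, 1.00),    # system prompt, always cached
    (2_000, 0.15),    # scaffold rarely re-used
    (12_000, 0.00),   # transcript read once
]

print(f"with re-query:    {blended_hit_rate(with_requery):.2f}")
print(f"without re-query: {blended_hit_rate(without_requery):.2f}")
```

Because the transcript dominates the token count, its hit rate moves the blended figure far more than the scaffold's does.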

The transcript-token-count input

The canonical 12,000-input-token assumption matches a representative earnings-call transcript (45–60 minutes of speech at ~250 tokens per minute). Variation across companies:

  • Megacap (FAANG, large industrials): 16,000–20,000 tokens. Detailed prepared remarks plus extended Q&A.
  • Mid-cap industrial: 10,000–14,000 tokens. Most match the canonical 12k.
  • Small-cap, light analyst coverage: 5,000–8,000 tokens. Short Q&A.

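The engine accepts avg_transcript_tokens as an input. For a watchlist that skews megacap, set 18,000 and re-run; input tokens rise 50%, and total cost rises somewhat less because the 500 output tokens are unchanged. For a small-cap-heavy watchlist, set 7,000; input tokens drop about 40%, and total cost drops by somewhat less for the same reason. The engine's perStockPerQuarter is linear in the input-token assumption (modulo cache effects).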
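The engine's internals are not shown in this article, so the following is only one plausible cost model: uncached input, cached input, and output tokens are each billed at their own per-million-token rate. The function name mirrors the engine's field but is hypothetical, and the three default prices are illustrative assumptions, chosen because they reproduce the $0.02568 figure quoted for Claude Sonnet 4.5 at the canonical inputs. Check the provider's pricing page before relying on them.

```python
def per_stock_per_quarter(transcript_tokens, summary_tokens=500, cache_hit_rate=0.55,
                          attempts=1, price_in=3.00, price_cached=0.30, price_out=15.00):
    """Per-call cost in USD; prices are USD per million tokens (assumed values)."""
    cached = transcript_tokens * cache_hit_rate
    uncached = transcript_tokens - cached
    input_cost = (uncached * price_in + cached * price_cached) / 1_000_000
    output_cost = summary_tokens * price_out / 1_000_000
    return attempts * (input_cost + output_cost)

# Small-cap, canonical, and megacap transcript lengths from the list above.
for tokens in (7_000, 12_000, 18_000):
    print(f"{tokens:>6} tokens -> ${per_stock_per_quarter(tokens):.5f}")
```

In this model the input term scales in proportion to avg_transcript_tokens while the output term stays fixed, so the total moves by a smaller percentage than the transcript length does.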

Where the engine breaks

The engine returns four fields per model: perStockPerQuarter, perStockPerYear, perQuarterTotal, and perYearTotal. The totals are exact multiples of the per-stock figure (perYearTotal = perStockPerQuarter × tickers × 4), so they always reconcile when you carry full precision. The per-ticker-per-quarter value is the load-bearing number, but its rounded display form (e.g. $0.0034) will not multiply back to the total cleanly: $0.0034 × 1,000 gives $3.40, not the $3.36 in the table. Compute volume totals from perYearTotal directly, not by multiplying the rounded unit cost.
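The rounding gap is easy to reproduce. A small sketch using Python's decimal module, with the Gemini 2.5 Flash unit cost quoted earlier in this article (variable names are illustrative):

```python
from decimal import Decimal

unit = Decimal("0.00336")                      # per-ticker-quarter cost as quoted
displayed = unit.quantize(Decimal("0.0001"))   # four-decimal display value: 0.0034
calls_per_year = 250 * 4                       # tickers × quarters

from_full_precision = unit * calls_per_year
from_display_value = displayed * calls_per_year

print(f"from full precision: ${from_full_precision:.2f}")  # $3.36
print(f"from display value:  ${from_display_value:.2f}")   # $3.40
```

The four-cent gap is trivial here, but it grows with the multiplier, which is the reason to read totals from the engine's own total fields.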

The engine assumes one earnings call per ticker per quarter. For companies that do mid-quarter pre-announcements, investor-day events, or interim updates, the call count is higher and the cost scales linearly. For ADR-listed companies that report on different calendars, the quarterly cadence may not align with the engine's 4-quarter assumption; budget by call count, not by ticker-quarter.

The engine does not model the cost of transcript-acquisition (subscription to a transcript service like AlphaSense, Seeking Alpha, or direct from the company IR site). Transcripts are not free; a $0.40/quarter API cost on top of a $50/month transcript subscription does not move the unit economics.

References

  • Anthropic. "Pricing." anthropic.com, accessed 2026-05-21. https://www.anthropic.com/pricing
  • OpenAI. "API Pricing." openai.com/api/pricing, accessed 2026-05-21. https://openai.com/api/pricing/
  • Google. "Gemini API pricing." ai.google.dev/pricing, accessed 2026-05-21. https://ai.google.dev/pricing
  • Anthropic. "Prompt caching." docs.anthropic.com/en/docs/build-with-claude/prompt-caching, accessed 2026-05-21.
  • Loughran, T., & McDonald, B. (2011). "When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks." Journal of Finance 66(1). Methodology reference for finance-specific text-analysis cost-effectiveness studies.

Verified engine output

Inputs
  • tickers_per_quarter: 250
  • avg_transcript_tokens: 12000
  • avg_summary_tokens: 500
  • cache_hit_rate: 0.55
  • summarization_attempts: 1
Result
  • ranked (9 items) [...]
  • models (9 items) [...]

Computed live at build time.

Frequently asked questions

How do the engine's per-ticker and total fields reconcile?
The engine returns perStockPerQuarter, perStockPerYear, perQuarterTotal, and perYearTotal. Each total is an exact multiple of the per-stock figure at full precision (perYearTotal = perStockPerQuarter × tickers × 4); use perYearTotal directly rather than multiplying the rounded display value.
Is GPT-4o mini's quality good enough for a 250-ticker watchlist?
Maybe for general summarization; on finance-specific numeric extraction the error rate is higher than Sonnet/Pro. For numeric fidelity, the API saving is a false economy.
What's the realistic operating point for a solo retail trader?
Claude Sonnet 4.5 on a two-attempt path at $51/year. Quality is high enough that review time is low; disagreement-detection catches the worst substitution failures.
How sensitive is the cost to the 55% cache hit rate?
Moderately. Halving to 27% raises cost by 30–40% across the catalogue. The cache architecture is more about reaching 50% than chasing 80%.
Should I summarize calls or extract structured data?
Both. The canonical workload is summarization (12k input, 500 output); structured extraction has lower output cost but typically requires a Sonnet+ tier model.