The short answer

The best LLM for SEC 10-K and earnings extraction in 2026 is a budget tier matched to extraction difficulty, not one model. Per document (80k in, 4k out): Gemini 2.5 Flash-Lite costs $0.0096, Gemini 3.5 Flash $0.156, Claude Opus 4.7 $0.32, and GPT-5.5 $0.52 — a 54x spread, computed from the Token Cost Optimizer. Route the easy 90% to the cheap tier and escalate hard cases. Verified 2026-06-07.

The best LLM for SEC 10-K and earnings extraction in 2026 is not one model — it is a budget tier matched to how hard the extraction is. On a per-document basis for a research desk (80k input, 4k output, 50 documents a day, verified vendor list prices), the cost spread is roughly 54x: Gemini 2.5 Flash-Lite at $0.0096 per document versus GPT-5.5 at $0.52, with Claude Opus 4.7 at $0.32 and Gemini 3.5 Flash at $0.156 in between. Every figure below is computed live from the Token Cost Optimizer on the shipped bundle, not typed by hand. The model choice is a cost-vs-accuracy bet, and this page sizes the cost side honestly so you can weigh it against an accuracy eval you run on your own filings.

TL;DR — cost per document and per month

Model Cost / document Cost / month (50/day) Context Tier
Gemini 2.5 Flash-Lite $0.0096 $15.12 1M budget
Gemini 2.5 Flash $0.0340 $53.55 1M economy
Claude Haiku 4.5 $0.0640 $100.80 200K budget (cached)
Gemini 2.5 Pro $0.1400 $220.50 2M frontier
Gemini 3.5 Flash $0.1560 $245.70 1M frontier
Claude Opus 4.7 $0.3200 $504.00 1M frontier
GPT-5.5 $0.5200 $819.00 400K frontier

Same workload for every row: 80,000 input + 4,000 output tokens per document, one call per document, 5% retry, a 0.90 validation rate, 50 documents a day. Anthropic input reflects a 50% cache hit (a realistic figure when re-extracting many sections of the same filing); Google and OpenAI are priced at full list input. Verified vendor list prices, recomputed by CI against the engine bundle.

Why "best" is a routing decision, not a single model

A 10-K and an earnings transcript are not one extraction problem. Pulling a revenue line, a filing date, or a standard balance-sheet item from a regularly-formatted table is structural, and a budget model handles it. Reconciling a restatement, resolving an ambiguous segment disclosure, or reading the intent of a footnote is a reasoning problem where a frontier model may earn its 30-to-50x cost premium in fewer downstream errors. The right answer is to route the easy 90% to the cheap tier and escalate the hard cases, which is why "best LLM" resolves to a budget-tier default plus a frontier verifier, not a single pick.

The cheap end: Gemini 2.5 Flash-Lite and Flash

At $0.0096 per document ($15.12/month at 50/day), Gemini 2.5 Flash-Lite is the cost floor, and its 1M-token context holds a full 10-K body in one pass with no chunking penalty. Gemini 2.5 Flash steps up to $0.0340 per document ($53.55/month) for a stronger model at the same 1M context. For structural extraction over regularly-formatted filings, the budget tier is usually enough, and the premium over it buys little on the easy cases.

The word doing the work is "structural." Flash-Lite is the cheapest model that can hold a full filing in context; it is not automatically the most accurate on adversarial fields. That is an accuracy question this cost analysis does not answer — you answer it by running an eval on your own filings.

The frontier end: when 30-50x might pay for itself

At the top, the per-document cost rises sharply:

  • Gemini 2.5 Pro — $0.1400/document ($220.50/month), with the largest context here (2M) for documents that exceed a 1M window.
  • Gemini 3.5 Flash — $0.1560/document ($245.70/month), the cheapest frontier-reasoning pick at Flash latency.
  • Claude Opus 4.7 — $0.3200/document ($504.00/month), even with a 50% input cache hit applied. Its cache-read rate makes repeated passes over the same filing cheaper than the sticker implies.
  • GPT-5.5 — $0.5200/document ($819.00/month), the most expensive single-model path here.

A desk standardized on GPT-5.5 pays roughly 54x what Flash-Lite costs for the same documents. That premium is only defensible if your eval shows the frontier model is materially more accurate on the fields you actually extract. On structural fields it usually is not; on footnote-reasoning and restatement fields it can be.

The two-stage path is the real "best"

The cost-optimal architecture is rarely one model. Run all 50 documents a day through Gemini 2.5 Flash-Lite ($15.12/month) for the structural pass, validate the output, and escalate only the fraction that fails validation to a frontier model. If 10% of documents need a Gemini 3.5 Flash re-pass, that adds about $24.57/month (10% of $245.70), for a blended bill near $40/month versus $245.70 to run every document on the frontier model. You capture most of the frontier accuracy on the hard cases while paying the budget rate on the easy 90%. This is why the "best LLM" for extraction is a pipeline, not a checkbox.

On accuracy: what this page does and does not claim

These are cost numbers, computed from verified list prices and recomputed by CI against the engine. No model here was tested for extraction accuracy, and this page makes no accuracy ranking. The honest workflow is: price the tiers (done here), then run a labeled eval on a sample of your own 10-Ks and earnings transcripts, measure field-level accuracy per model, and pick the cheapest model that clears your error bar, escalating only the fields it misses. A budget model that misreads a parenthetical "(loss)" as a gain is expensive in errors, not cheap in dollars.

Decision guidance

  1. Default to the budget tier for structural fields. Gemini 2.5 Flash-Lite at $0.0096/document is the floor; most line-item and date extraction does not need more.
  2. Eval before you escalate. Measure field-level accuracy on your own filings; do not assume the frontier model is worth 54x without evidence.
  3. Two-stage the hard fields. A budget extractor plus a frontier verifier on the contested subset beats running everything on either alone.
  4. Mind the cache on Anthropic. Opus 4.7's cache-read rate makes repeated passes over one filing materially cheaper; the $0.32/document above already assumes a 50% cache hit.
  5. Recompute with your real token shape. Filing size and output verbosity move the per-document cost more than the model choice does within a tier. Use the Token Cost Optimizer with your numbers.

Connects to

References

Verified engine output

Show the recompute-verified inputs and outputs
Per-document — Gemini 2.5 Flash-Lite (cost floor)
Inputs
input_tokens_per_call80000
output_tokens_per_call4000
calls_per_idea1
retry_rate0.05
ideas_per_day50
validation_rate0.9
cache_hit_rate0.5
model_idgemini-2-5-flash-lite
Result
model › idgemini-2-5-flash-lite
model › providergoogle
model › nameGemini 2.5 Flash-Lite
model › input usd per mtoken0.1
model › output usd per mtoken0.4
model › context window1000000
model › notesCheapest tier in this table; 1M context.
effective cost per call0.009600000000000001
cost per idea0.010080000000000002
cost per validated trade0.011200000000000002
cost per day0.5040000000000001
cost per month15.120000000000003
cost per year183.96000000000004

Computed live at build time.

Per-document — Gemini 2.5 Flash
Inputs
input_tokens_per_call80000
output_tokens_per_call4000
calls_per_idea1
retry_rate0.05
ideas_per_day50
validation_rate0.9
cache_hit_rate0.5
model_idgemini-2-5-flash
Result
model › idgemini-2-5-flash
model › providergoogle
model › nameGemini 2.5 Flash
model › input usd per mtoken0.3
model › output usd per mtoken2.5
model › context window1000000
model › notesFast mid-tier; 1M context.
effective cost per call0.034
cost per idea0.0357
cost per validated trade0.03966666666666667
cost per day1.7850000000000001
cost per month53.550000000000004
cost per year651.5250000000001

Computed live at build time.

Per-document — Claude Haiku 4.5 (50% cache hit)
Inputs
input_tokens_per_call80000
output_tokens_per_call4000
calls_per_idea1
retry_rate0.05
ideas_per_day50
validation_rate0.9
cache_hit_rate0.5
model_idclaude-haiku-4-5
Result
model › idclaude-haiku-4-5
model › provideranthropic
model › nameClaude Haiku 4.5
model › input usd per mtoken1
model › output usd per mtoken5
model › cache write usd per mtoken1.25
model › cache read usd per mtoken0.1
model › context window200000
model › notesFast, cheap — filtering + pre-processing layers.
effective cost per call0.064
cost per idea0.06720000000000001
cost per validated trade0.07466666666666667
cost per day3.3600000000000003
cost per month100.80000000000001
cost per year1226.4

Computed live at build time.

Per-document — Gemini 2.5 Pro (2M context)
Inputs
input_tokens_per_call80000
output_tokens_per_call4000
calls_per_idea1
retry_rate0.05
ideas_per_day50
validation_rate0.9
cache_hit_rate0.5
model_idgemini-2-5-pro
Result
model › idgemini-2-5-pro
model › providergoogle
model › nameGemini 2.5 Pro
model › input usd per mtoken1.25
model › output usd per mtoken10
model › context window2000000
model › notesLarge context (2M). Strong on document analysis.
effective cost per call0.14
cost per idea0.14700000000000002
cost per validated trade0.16333333333333336
cost per day7.350000000000001
cost per month220.50000000000006
cost per year2682.7500000000005

Computed live at build time.

Per-document — Gemini 3.5 Flash (cheapest frontier)
Inputs
input_tokens_per_call80000
output_tokens_per_call4000
calls_per_idea1
retry_rate0.05
ideas_per_day50
validation_rate0.9
cache_hit_rate0.5
model_idgemini-3-5-flash
Result
model › idgemini-3-5-flash
model › providergoogle
model › nameGemini 3.5 Flash
model › input usd per mtoken1.5
model › output usd per mtoken9
model › context window1000000
model › notesFrontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call0.156
cost per idea0.1638
cost per validated trade0.182
cost per day8.19
cost per month245.7
cost per year2989.35

Computed live at build time.

Per-document — Claude Opus 4.7 (50% input cache hit)
Inputs
input_tokens_per_call80000
output_tokens_per_call4000
calls_per_idea1
retry_rate0.05
ideas_per_day50
validation_rate0.9
cache_hit_rate0.5
model_idclaude-opus-4-7
Result
model › idclaude-opus-4-7
model › provideranthropic
model › nameClaude Opus 4.7
model › input usd per mtoken5
model › output usd per mtoken25
model › cache write usd per mtoken6.25
model › cache read usd per mtoken0.5
model › context window1000000
model › notesFlagship reasoning model — 1M context.
effective cost per call0.32
cost per idea0.336
cost per validated trade0.37333333333333335
cost per day16.8
cost per month504
cost per year6132

Computed live at build time.

Per-document — GPT-5.5 (premium)
Inputs
input_tokens_per_call80000
output_tokens_per_call4000
calls_per_idea1
retry_rate0.05
ideas_per_day50
validation_rate0.9
cache_hit_rate0.5
model_idgpt-5
Result
model › idgpt-5
model › provideropenai
model › nameGPT-5.5
model › input usd per mtoken5
model › output usd per mtoken30
model › context window400000
model › notesOpenAI frontier model (GPT-5.5).
effective cost per call0.52
cost per idea0.546
cost per validated trade0.6066666666666667
cost per day27.3
cost per month819
cost per year9964.5

Computed live at build time.

Frequently asked questions

What is the cheapest LLM for SEC 10-K and earnings extraction in 2026?
Gemini 2.5 Flash-Lite at $0.0096 per document (80k input, 4k output), or $15.12/month at 50 documents a day, computed from the Token Cost Optimizer. Its 1M context holds a full 10-K in one pass. It is the cost floor but not automatically the most accurate on adversarial fields. Verified 2026-06-07.
How much more does GPT-5.5 or Claude Opus 4.7 cost than a budget model for filing extraction?
On the same per-document workload, GPT-5.5 costs $0.52 and Claude Opus 4.7 costs $0.32 (with a 50% input cache hit), versus $0.0096 for Gemini 2.5 Flash-Lite — roughly 54x and 33x. That premium is only worth paying if your accuracy eval shows the frontier model is materially better on the fields you extract. Verified 2026-06-07.
Should I use one LLM or a pipeline for SEC filing extraction?
A pipeline. Run all documents through a budget model like Gemini 2.5 Flash-Lite for the structural pass, validate, and escalate only the roughly 10% that fail to a frontier model. At 50 documents a day that blends to near $40/month versus $245.70 to run everything on Gemini 3.5 Flash. The best 'model' for extraction is a two-stage route.
Do these numbers rank model accuracy?
No. Every figure is a cost number computed from verified vendor list prices and recomputed by CI against the shipped cost engine. No model was tested for extraction accuracy. Run a labeled eval on your own 10-Ks and earnings transcripts before relying on the cheapest tier.
Why does Anthropic's per-document cost assume a cache hit?
Re-extracting many sections of the same filing reuses the same input context, so Anthropic's prompt cache applies. The $0.32 per-document figure for Claude Opus 4.7 reflects a 50% input cache hit at the $0.50-per-Mtok cache-read rate; without caching the cost would be higher. Verified 2026-06-07.