What is the cheapest LLM for SEC 10-K and earnings extraction in 2026?

Gemini 2.5 Flash-Lite at $0.0096 per document (80k input, 4k output), or $15.12/month at 50 documents a day, computed from the Token Cost Optimizer. Its 1M context holds a full 10-K in one pass. It is the cost floor but not automatically the most accurate on adversarial fields. Verified 2026-06-07.

How much more does GPT-5.5 or Claude Opus 4.7 cost than a budget model for filing extraction?

On the same per-document workload, GPT-5.5 costs $0.52 and Claude Opus 4.7 costs $0.32 (with a 50% input cache hit), versus $0.0096 for Gemini 2.5 Flash-Lite — roughly 54x and 33x. That premium is only worth paying if your accuracy eval shows the frontier model is materially better on the fields you extract. Verified 2026-06-07.

Should I use one LLM or a pipeline for SEC filing extraction?

A pipeline. Run all documents through a budget model like Gemini 2.5 Flash-Lite for the structural pass, validate, and escalate only the roughly 10% that fail to a frontier model. At 50 documents a day that blends to near $40/month versus $245.70 to run everything on Gemini 3.5 Flash. The best 'model' for extraction is a two-stage route.

Do these numbers rank model accuracy?

No. Every figure is a cost number computed from verified vendor list prices and recomputed by CI against the shipped cost engine. No model was tested for extraction accuracy. Run a labeled eval on your own 10-Ks and earnings transcripts before relying on the cheapest tier.

Why does Anthropic's per-document cost assume a cache hit?

Re-extracting many sections of the same filing reuses the same input context, so Anthropic's prompt cache applies. The $0.32 per-document figure for Claude Opus 4.7 reflects a 50% input cache hit at the $0.50-per-Mtok cache-read rate; without caching the cost would be higher. Verified 2026-06-07.

Best LLM for SEC 10-K and Earnings Extraction 2026

The short answer

The best LLM for SEC 10-K and earnings extraction in 2026 is a budget tier matched to extraction difficulty, not one model. Per document (80k in, 4k out): Gemini 2.5 Flash-Lite costs $0.0096, Gemini 3.5 Flash $0.156, Claude Opus 4.7 $0.32, and GPT-5.5 $0.52 — a 54x spread, computed from the Token Cost Optimizer. Route the easy 90% to the cheap tier and escalate hard cases. Verified 2026-06-07.

The best LLM for SEC 10-K and earnings extraction in 2026 is not one model — it is a budget tier matched to how hard the extraction is. On a per-document basis for a research desk (80k input, 4k output, 50 documents a day, verified vendor list prices), the cost spread is roughly 54x: Gemini 2.5 Flash-Lite at $0.0096 per document versus GPT-5.5 at $0.52, with Claude Opus 4.7 at $0.32 and Gemini 3.5 Flash at $0.156 in between. Every figure below is computed live from the Token Cost Optimizer on the shipped bundle, not typed by hand. The model choice is a cost-vs-accuracy bet, and this page sizes the cost side honestly so you can weigh it against an accuracy eval you run on your own filings.

TL;DR — cost per document and per month

Model	Cost / document	Cost / month (50/day)	Context	Tier
Gemini 2.5 Flash-Lite	$0.0096	$15.12	1M	budget
Gemini 2.5 Flash	$0.0340	$53.55	1M	economy
Claude Haiku 4.5	$0.0640	$100.80	200K	budget (cached)
Gemini 2.5 Pro	$0.1400	$220.50	2M	frontier
Gemini 3.5 Flash	$0.1560	$245.70	1M	frontier
Claude Opus 4.7	$0.3200	$504.00	1M	frontier
GPT-5.5	$0.5200	$819.00	400K	frontier

Same workload for every row: 80,000 input + 4,000 output tokens per document, one call per document, 5% retry, a 0.90 validation rate, 50 documents a day. Anthropic input reflects a 50% cache hit (a realistic figure when re-extracting many sections of the same filing); Google and OpenAI are priced at full list input. Verified vendor list prices, recomputed by CI against the engine bundle.

Why "best" is a routing decision, not a single model

A 10-K and an earnings transcript are not one extraction problem. Pulling a revenue line, a filing date, or a standard balance-sheet item from a regularly-formatted table is structural, and a budget model handles it. Reconciling a restatement, resolving an ambiguous segment disclosure, or reading the intent of a footnote is a reasoning problem where a frontier model may earn its 30-to-50x cost premium in fewer downstream errors. The right answer is to route the easy 90% to the cheap tier and escalate the hard cases, which is why "best LLM" resolves to a budget-tier default plus a frontier verifier, not a single pick.

The cheap end: Gemini 2.5 Flash-Lite and Flash

At $0.0096 per document ($15.12/month at 50/day), Gemini 2.5 Flash-Lite is the cost floor, and its 1M-token context holds a full 10-K body in one pass with no chunking penalty. Gemini 2.5 Flash steps up to $0.0340 per document ($53.55/month) for a stronger model at the same 1M context. For structural extraction over regularly-formatted filings, the budget tier is usually enough, and the premium over it buys little on the easy cases.

The word doing the work is "structural." Flash-Lite is the cheapest model that can hold a full filing in context; it is not automatically the most accurate on adversarial fields. That is an accuracy question this cost analysis does not answer — you answer it by running an eval on your own filings.

The frontier end: when 30-50x might pay for itself

At the top, the per-document cost rises sharply:

Gemini 2.5 Pro — $0.1400/document ($220.50/month), with the largest context here (2M) for documents that exceed a 1M window.
Gemini 3.5 Flash — $0.1560/document ($245.70/month), the cheapest frontier-reasoning pick at Flash latency.
Claude Opus 4.7 — $0.3200/document ($504.00/month), even with a 50% input cache hit applied. Its cache-read rate makes repeated passes over the same filing cheaper than the sticker implies.
GPT-5.5 — $0.5200/document ($819.00/month), the most expensive single-model path here.

A desk standardized on GPT-5.5 pays roughly 54x what Flash-Lite costs for the same documents. That premium is only defensible if your eval shows the frontier model is materially more accurate on the fields you actually extract. On structural fields it usually is not; on footnote-reasoning and restatement fields it can be.

The two-stage path is the real "best"

The cost-optimal architecture is rarely one model. Run all 50 documents a day through Gemini 2.5 Flash-Lite ($15.12/month) for the structural pass, validate the output, and escalate only the fraction that fails validation to a frontier model. If 10% of documents need a Gemini 3.5 Flash re-pass, that adds about $24.57/month (10% of $245.70), for a blended bill near $40/month versus $245.70 to run every document on the frontier model. You capture most of the frontier accuracy on the hard cases while paying the budget rate on the easy 90%. This is why the "best LLM" for extraction is a pipeline, not a checkbox.

On accuracy: what this page does and does not claim

These are cost numbers, computed from verified list prices and recomputed by CI against the engine. No model here was tested for extraction accuracy, and this page makes no accuracy ranking. The honest workflow is: price the tiers (done here), then run a labeled eval on a sample of your own 10-Ks and earnings transcripts, measure field-level accuracy per model, and pick the cheapest model that clears your error bar, escalating only the fields it misses. A budget model that misreads a parenthetical "(loss)" as a gain is expensive in errors, not cheap in dollars.

Decision guidance

Default to the budget tier for structural fields. Gemini 2.5 Flash-Lite at $0.0096/document is the floor; most line-item and date extraction does not need more.
Eval before you escalate. Measure field-level accuracy on your own filings; do not assume the frontier model is worth 54x without evidence.
Two-stage the hard fields. A budget extractor plus a frontier verifier on the contested subset beats running everything on either alone.
Mind the cache on Anthropic. Opus 4.7's cache-read rate makes repeated passes over one filing materially cheaper; the $0.32/document above already assumes a 50% cache hit.
Recompute with your real token shape. Filing size and output verbosity move the per-document cost more than the model choice does within a tier. Use the Token Cost Optimizer with your numbers.

Cheapest LLM for SEC 10-K Extraction at 10,000 Filings a Month 2026: the same cost engine at market-sweep scale.
Best LLM APIs for SEC Filing Extraction 2026: the API-surface and capability survey.
Cheapest LLM for SEC Filings 2026: the per-filing budget deep dive.
Best LLM for Financial Analysis 2026: the broader model-selection guide.

Connects to

Token Cost Optimizer: the per-document cost engine behind every figure here. Recompute at your own volume and token shape.
Model Selector for Finance: match a model to a finance task profile.

References

Verified engine output

Show the recompute-verified inputs and outputs

Per-document — Gemini 2.5 Flash-Lite (cost floor)

Inputs
input_tokens_per_call	80000
output_tokens_per_call	4000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	50
validation_rate	0.9
cache_hit_rate	0.5
model_id	gemini-2-5-flash-lite

Result
model › id	gemini-2-5-flash-lite
model › provider	google
model › name	Gemini 2.5 Flash-Lite
model › input usd per mtoken	0.1
model › output usd per mtoken	0.4
model › context window	1000000
model › notes	Cheapest tier in this table; 1M context.
effective cost per call	0.009600000000000001
cost per idea	0.010080000000000002
cost per validated trade	0.011200000000000002
cost per day	0.5040000000000001
cost per month	15.120000000000003
cost per year	183.96000000000004

Computed live at build time.

Per-document — Gemini 2.5 Flash

Inputs
input_tokens_per_call	80000
output_tokens_per_call	4000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	50
validation_rate	0.9
cache_hit_rate	0.5
model_id	gemini-2-5-flash

Result
model › id	gemini-2-5-flash
model › provider	google
model › name	Gemini 2.5 Flash
model › input usd per mtoken	0.3
model › output usd per mtoken	2.5
model › context window	1000000
model › notes	Fast mid-tier; 1M context.
effective cost per call	0.034
cost per idea	0.0357
cost per validated trade	0.03966666666666667
cost per day	1.7850000000000001
cost per month	53.550000000000004
cost per year	651.5250000000001

Computed live at build time.

Per-document — Claude Haiku 4.5 (50% cache hit)

Inputs
input_tokens_per_call	80000
output_tokens_per_call	4000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	50
validation_rate	0.9
cache_hit_rate	0.5
model_id	claude-haiku-4-5

Result
model › id	claude-haiku-4-5
model › provider	anthropic
model › name	Claude Haiku 4.5
model › input usd per mtoken	1
model › output usd per mtoken	5
model › cache write usd per mtoken	1.25
model › cache read usd per mtoken	0.1
model › context window	200000
model › notes	Fast, cheap — filtering + pre-processing layers.
effective cost per call	0.064
cost per idea	0.06720000000000001
cost per validated trade	0.07466666666666667
cost per day	3.3600000000000003
cost per month	100.80000000000001
cost per year	1226.4

Computed live at build time.

Per-document — Gemini 2.5 Pro (2M context)

Inputs
input_tokens_per_call	80000
output_tokens_per_call	4000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	50
validation_rate	0.9
cache_hit_rate	0.5
model_id	gemini-2-5-pro

Result
model › id	gemini-2-5-pro
model › provider	google
model › name	Gemini 2.5 Pro
model › input usd per mtoken	1.25
model › output usd per mtoken	10
model › context window	2000000
model › notes	Large context (2M). Strong on document analysis.
effective cost per call	0.14
cost per idea	0.14700000000000002
cost per validated trade	0.16333333333333336
cost per day	7.350000000000001
cost per month	220.50000000000006
cost per year	2682.7500000000005

Computed live at build time.

Per-document — Gemini 3.5 Flash (cheapest frontier)

Inputs
input_tokens_per_call	80000
output_tokens_per_call	4000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	50
validation_rate	0.9
cache_hit_rate	0.5
model_id	gemini-3-5-flash

Result
model › id	gemini-3-5-flash
model › provider	google
model › name	Gemini 3.5 Flash
model › input usd per mtoken	1.5
model › output usd per mtoken	9
model › context window	1000000
model › notes	Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call	0.156
cost per idea	0.1638
cost per validated trade	0.182
cost per day	8.19
cost per month	245.7
cost per year	2989.35

Computed live at build time.

Per-document — Claude Opus 4.7 (50% input cache hit)

Inputs
input_tokens_per_call	80000
output_tokens_per_call	4000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	50
validation_rate	0.9
cache_hit_rate	0.5
model_id	claude-opus-4-7

Result
model › id	claude-opus-4-7
model › provider	anthropic
model › name	Claude Opus 4.7
model › input usd per mtoken	5
model › output usd per mtoken	25
model › cache write usd per mtoken	6.25
model › cache read usd per mtoken	0.5
model › context window	1000000
model › notes	Flagship reasoning model — 1M context.
effective cost per call	0.32
cost per idea	0.336
cost per validated trade	0.37333333333333335
cost per day	16.8
cost per month	504
cost per year	6132

Computed live at build time.

Per-document — GPT-5.5 (premium)

Inputs
input_tokens_per_call	80000
output_tokens_per_call	4000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	50
validation_rate	0.9
cache_hit_rate	0.5
model_id	gpt-5

Result
model › id	gpt-5
model › provider	openai
model › name	GPT-5.5
model › input usd per mtoken	5
model › output usd per mtoken	30
model › context window	400000
model › notes	OpenAI frontier model (GPT-5.5).
effective cost per call	0.52
cost per idea	0.546
cost per validated trade	0.6066666666666667
cost per day	27.3
cost per month	819
cost per year	9964.5

Computed live at build time.

Frequently asked questions

What is the cheapest LLM for SEC 10-K and earnings extraction in 2026?: Gemini 2.5 Flash-Lite at $0.0096 per document (80k input, 4k output), or $15.12/month at 50 documents a day, computed from the Token Cost Optimizer. Its 1M context holds a full 10-K in one pass. It is the cost floor but not automatically the most accurate on adversarial fields. Verified 2026-06-07.
How much more does GPT-5.5 or Claude Opus 4.7 cost than a budget model for filing extraction?: On the same per-document workload, GPT-5.5 costs $0.52 and Claude Opus 4.7 costs $0.32 (with a 50% input cache hit), versus $0.0096 for Gemini 2.5 Flash-Lite — roughly 54x and 33x. That premium is only worth paying if your accuracy eval shows the frontier model is materially better on the fields you extract. Verified 2026-06-07.
Should I use one LLM or a pipeline for SEC filing extraction?: A pipeline. Run all documents through a budget model like Gemini 2.5 Flash-Lite for the structural pass, validate, and escalate only the roughly 10% that fail to a frontier model. At 50 documents a day that blends to near $40/month versus $245.70 to run everything on Gemini 3.5 Flash. The best 'model' for extraction is a two-stage route.
Do these numbers rank model accuracy?: No. Every figure is a cost number computed from verified vendor list prices and recomputed by CI against the shipped cost engine. No model was tested for extraction accuracy. Run a labeled eval on your own 10-Ks and earnings transcripts before relying on the cheapest tier.
Why does Anthropic's per-document cost assume a cache hit?: Re-extracting many sections of the same filing reuses the same input context, so Anthropic's prompt cache applies. The $0.32 per-document figure for Claude Opus 4.7 reflects a 50% input cache hit at the $0.50-per-Mtok cache-read rate; without caching the cost would be higher. Verified 2026-06-07.

TL;DR — cost per document and per month

Why "best" is a routing decision, not a single model

The cheap end: Gemini 2.5 Flash-Lite and Flash

The frontier end: when 30-50x might pay for itself

The two-stage path is the real "best"

On accuracy: what this page does and does not claim

Decision guidance

Related in this series

Connects to

References

Verified engine output

Frequently asked questions