The short answer
The best LLM for SEC 10-K and earnings extraction in 2026 is a budget tier matched to extraction difficulty, not one model. Per document (80k in, 4k out): Gemini 2.5 Flash-Lite costs $0.0096, Gemini 3.5 Flash $0.156, Claude Opus 4.7 $0.32, and GPT-5.5 $0.52 — a 54x spread, computed from the Token Cost Optimizer. Route the easy 90% to the cheap tier and escalate hard cases. Verified 2026-06-07.
The best LLM for SEC 10-K and earnings extraction in 2026 is not one model — it is a budget tier matched to how hard the extraction is. On a per-document basis for a research desk (80k input, 4k output, 50 documents a day, verified vendor list prices), the cost spread is roughly 54x: Gemini 2.5 Flash-Lite at $0.0096 per document versus GPT-5.5 at $0.52, with Claude Opus 4.7 at $0.32 and Gemini 3.5 Flash at $0.156 in between. Every figure below is computed live from the Token Cost Optimizer on the shipped bundle, not typed by hand. The model choice is a cost-vs-accuracy bet, and this page sizes the cost side honestly so you can weigh it against an accuracy eval you run on your own filings.
TL;DR — cost per document and per month
| Model | Cost / document | Cost / month (50/day) | Context | Tier |
|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.0096 | $15.12 | 1M | budget |
| Gemini 2.5 Flash | $0.0340 | $53.55 | 1M | economy |
| Claude Haiku 4.5 | $0.0640 | $100.80 | 200K | budget (cached) |
| Gemini 2.5 Pro | $0.1400 | $220.50 | 2M | frontier |
| Gemini 3.5 Flash | $0.1560 | $245.70 | 1M | frontier |
| Claude Opus 4.7 | $0.3200 | $504.00 | 1M | frontier |
| GPT-5.5 | $0.5200 | $819.00 | 400K | frontier |
Same workload for every row: 80,000 input + 4,000 output tokens per document, one call per document, 5% retry, a 0.90 validation rate, 50 documents a day. Anthropic input reflects a 50% cache hit (a realistic figure when re-extracting many sections of the same filing); Google and OpenAI are priced at full list input. Verified vendor list prices, recomputed by CI against the engine bundle.
Why "best" is a routing decision, not a single model
A 10-K and an earnings transcript are not one extraction problem. Pulling a revenue line, a filing date, or a standard balance-sheet item from a regularly-formatted table is structural, and a budget model handles it. Reconciling a restatement, resolving an ambiguous segment disclosure, or reading the intent of a footnote is a reasoning problem where a frontier model may earn its 30-to-50x cost premium in fewer downstream errors. The right answer is to route the easy 90% to the cheap tier and escalate the hard cases, which is why "best LLM" resolves to a budget-tier default plus a frontier verifier, not a single pick.
The cheap end: Gemini 2.5 Flash-Lite and Flash
At $0.0096 per document ($15.12/month at 50/day), Gemini 2.5 Flash-Lite is the cost floor, and its 1M-token context holds a full 10-K body in one pass with no chunking penalty. Gemini 2.5 Flash steps up to $0.0340 per document ($53.55/month) for a stronger model at the same 1M context. For structural extraction over regularly-formatted filings, the budget tier is usually enough, and the premium over it buys little on the easy cases.
The word doing the work is "structural." Flash-Lite is the cheapest model that can hold a full filing in context; it is not automatically the most accurate on adversarial fields. That is an accuracy question this cost analysis does not answer — you answer it by running an eval on your own filings.
The frontier end: when 30-50x might pay for itself
At the top, the per-document cost rises sharply:
- Gemini 2.5 Pro — $0.1400/document ($220.50/month), with the largest context here (2M) for documents that exceed a 1M window.
- Gemini 3.5 Flash — $0.1560/document ($245.70/month), the cheapest frontier-reasoning pick at Flash latency.
- Claude Opus 4.7 — $0.3200/document ($504.00/month), even with a 50% input cache hit applied. Its cache-read rate makes repeated passes over the same filing cheaper than the sticker implies.
- GPT-5.5 — $0.5200/document ($819.00/month), the most expensive single-model path here.
A desk standardized on GPT-5.5 pays roughly 54x what Flash-Lite costs for the same documents. That premium is only defensible if your eval shows the frontier model is materially more accurate on the fields you actually extract. On structural fields it usually is not; on footnote-reasoning and restatement fields it can be.
The two-stage path is the real "best"
The cost-optimal architecture is rarely one model. Run all 50 documents a day through Gemini 2.5 Flash-Lite ($15.12/month) for the structural pass, validate the output, and escalate only the fraction that fails validation to a frontier model. If 10% of documents need a Gemini 3.5 Flash re-pass, that adds about $24.57/month (10% of $245.70), for a blended bill near $40/month versus $245.70 to run every document on the frontier model. You capture most of the frontier accuracy on the hard cases while paying the budget rate on the easy 90%. This is why the "best LLM" for extraction is a pipeline, not a checkbox.
On accuracy: what this page does and does not claim
These are cost numbers, computed from verified list prices and recomputed by CI against the engine. No model here was tested for extraction accuracy, and this page makes no accuracy ranking. The honest workflow is: price the tiers (done here), then run a labeled eval on a sample of your own 10-Ks and earnings transcripts, measure field-level accuracy per model, and pick the cheapest model that clears your error bar, escalating only the fields it misses. A budget model that misreads a parenthetical "(loss)" as a gain is expensive in errors, not cheap in dollars.
Decision guidance
- Default to the budget tier for structural fields. Gemini 2.5 Flash-Lite at $0.0096/document is the floor; most line-item and date extraction does not need more.
- Eval before you escalate. Measure field-level accuracy on your own filings; do not assume the frontier model is worth 54x without evidence.
- Two-stage the hard fields. A budget extractor plus a frontier verifier on the contested subset beats running everything on either alone.
- Mind the cache on Anthropic. Opus 4.7's cache-read rate makes repeated passes over one filing materially cheaper; the $0.32/document above already assumes a 50% cache hit.
- Recompute with your real token shape. Filing size and output verbosity move the per-document cost more than the model choice does within a tier. Use the Token Cost Optimizer with your numbers.
Related in this series
- Cheapest LLM for SEC 10-K Extraction at 10,000 Filings a Month 2026: the same cost engine at market-sweep scale.
- Best LLM APIs for SEC Filing Extraction 2026: the API-surface and capability survey.
- Cheapest LLM for SEC Filings 2026: the per-filing budget deep dive.
- Best LLM for Financial Analysis 2026: the broader model-selection guide.
Connects to
- Token Cost Optimizer: the per-document cost engine behind every figure here. Recompute at your own volume and token shape.
- Model Selector for Finance: match a model to a finance task profile.
References
Verified engine output
Show the recompute-verified inputs and outputs
| input_tokens_per_call | 80000 |
|---|---|
| output_tokens_per_call | 4000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 50 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.5 |
| model_id | gemini-2-5-flash-lite |
| model › id | gemini-2-5-flash-lite |
|---|---|
| model › provider | |
| model › name | Gemini 2.5 Flash-Lite |
| model › input usd per mtoken | 0.1 |
| model › output usd per mtoken | 0.4 |
| model › context window | 1000000 |
| model › notes | Cheapest tier in this table; 1M context. |
| effective cost per call | 0.009600000000000001 |
| cost per idea | 0.010080000000000002 |
| cost per validated trade | 0.011200000000000002 |
| cost per day | 0.5040000000000001 |
| cost per month | 15.120000000000003 |
| cost per year | 183.96000000000004 |
Computed live at build time.
| input_tokens_per_call | 80000 |
|---|---|
| output_tokens_per_call | 4000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 50 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.5 |
| model_id | gemini-2-5-flash |
| model › id | gemini-2-5-flash |
|---|---|
| model › provider | |
| model › name | Gemini 2.5 Flash |
| model › input usd per mtoken | 0.3 |
| model › output usd per mtoken | 2.5 |
| model › context window | 1000000 |
| model › notes | Fast mid-tier; 1M context. |
| effective cost per call | 0.034 |
| cost per idea | 0.0357 |
| cost per validated trade | 0.03966666666666667 |
| cost per day | 1.7850000000000001 |
| cost per month | 53.550000000000004 |
| cost per year | 651.5250000000001 |
Computed live at build time.
| input_tokens_per_call | 80000 |
|---|---|
| output_tokens_per_call | 4000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 50 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.5 |
| model_id | claude-haiku-4-5 |
| model › id | claude-haiku-4-5 |
|---|---|
| model › provider | anthropic |
| model › name | Claude Haiku 4.5 |
| model › input usd per mtoken | 1 |
| model › output usd per mtoken | 5 |
| model › cache write usd per mtoken | 1.25 |
| model › cache read usd per mtoken | 0.1 |
| model › context window | 200000 |
| model › notes | Fast, cheap — filtering + pre-processing layers. |
| effective cost per call | 0.064 |
| cost per idea | 0.06720000000000001 |
| cost per validated trade | 0.07466666666666667 |
| cost per day | 3.3600000000000003 |
| cost per month | 100.80000000000001 |
| cost per year | 1226.4 |
Computed live at build time.
| input_tokens_per_call | 80000 |
|---|---|
| output_tokens_per_call | 4000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 50 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.5 |
| model_id | gemini-2-5-pro |
| model › id | gemini-2-5-pro |
|---|---|
| model › provider | |
| model › name | Gemini 2.5 Pro |
| model › input usd per mtoken | 1.25 |
| model › output usd per mtoken | 10 |
| model › context window | 2000000 |
| model › notes | Large context (2M). Strong on document analysis. |
| effective cost per call | 0.14 |
| cost per idea | 0.14700000000000002 |
| cost per validated trade | 0.16333333333333336 |
| cost per day | 7.350000000000001 |
| cost per month | 220.50000000000006 |
| cost per year | 2682.7500000000005 |
Computed live at build time.
| input_tokens_per_call | 80000 |
|---|---|
| output_tokens_per_call | 4000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 50 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.5 |
| model_id | gemini-3-5-flash |
| model › id | gemini-3-5-flash |
|---|---|
| model › provider | |
| model › name | Gemini 3.5 Flash |
| model › input usd per mtoken | 1.5 |
| model › output usd per mtoken | 9 |
| model › context window | 1000000 |
| model › notes | Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash). |
| effective cost per call | 0.156 |
| cost per idea | 0.1638 |
| cost per validated trade | 0.182 |
| cost per day | 8.19 |
| cost per month | 245.7 |
| cost per year | 2989.35 |
Computed live at build time.
| input_tokens_per_call | 80000 |
|---|---|
| output_tokens_per_call | 4000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 50 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.5 |
| model_id | claude-opus-4-7 |
| model › id | claude-opus-4-7 |
|---|---|
| model › provider | anthropic |
| model › name | Claude Opus 4.7 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 25 |
| model › cache write usd per mtoken | 6.25 |
| model › cache read usd per mtoken | 0.5 |
| model › context window | 1000000 |
| model › notes | Flagship reasoning model — 1M context. |
| effective cost per call | 0.32 |
| cost per idea | 0.336 |
| cost per validated trade | 0.37333333333333335 |
| cost per day | 16.8 |
| cost per month | 504 |
| cost per year | 6132 |
Computed live at build time.
| input_tokens_per_call | 80000 |
|---|---|
| output_tokens_per_call | 4000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 50 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.5 |
| model_id | gpt-5 |
| model › id | gpt-5 |
|---|---|
| model › provider | openai |
| model › name | GPT-5.5 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 30 |
| model › context window | 400000 |
| model › notes | OpenAI frontier model (GPT-5.5). |
| effective cost per call | 0.52 |
| cost per idea | 0.546 |
| cost per validated trade | 0.6066666666666667 |
| cost per day | 27.3 |
| cost per month | 819 |
| cost per year | 9964.5 |
Computed live at build time.
Frequently asked questions
- What is the cheapest LLM for SEC 10-K and earnings extraction in 2026?
- Gemini 2.5 Flash-Lite at $0.0096 per document (80k input, 4k output), or $15.12/month at 50 documents a day, computed from the Token Cost Optimizer. Its 1M context holds a full 10-K in one pass. It is the cost floor but not automatically the most accurate on adversarial fields. Verified 2026-06-07.
- How much more does GPT-5.5 or Claude Opus 4.7 cost than a budget model for filing extraction?
- On the same per-document workload, GPT-5.5 costs $0.52 and Claude Opus 4.7 costs $0.32 (with a 50% input cache hit), versus $0.0096 for Gemini 2.5 Flash-Lite — roughly 54x and 33x. That premium is only worth paying if your accuracy eval shows the frontier model is materially better on the fields you extract. Verified 2026-06-07.
- Should I use one LLM or a pipeline for SEC filing extraction?
- A pipeline. Run all documents through a budget model like Gemini 2.5 Flash-Lite for the structural pass, validate, and escalate only the roughly 10% that fail to a frontier model. At 50 documents a day that blends to near $40/month versus $245.70 to run everything on Gemini 3.5 Flash. The best 'model' for extraction is a two-stage route.
- Do these numbers rank model accuracy?
- No. Every figure is a cost number computed from verified vendor list prices and recomputed by CI against the shipped cost engine. No model was tested for extraction accuracy. Run a labeled eval on your own 10-Ks and earnings transcripts before relying on the cheapest tier.
- Why does Anthropic's per-document cost assume a cache hit?
- Re-extracting many sections of the same filing reuses the same input context, so Anthropic's prompt cache applies. The $0.32 per-document figure for Claude Opus 4.7 reflects a 50% input cache hit at the $0.50-per-Mtok cache-read rate; without caching the cost would be higher. Verified 2026-06-07.