The short answer
The cheapest LLM for SEC filings in 2026 that still fits a full 10-K in context is Gemini 2.5 Flash at $0.30/$2.50 per Mtok, about $0.046 per 120k-token filing with a 1M window. DeepSeek V4-Flash is cheaper on paper ($0.14/$0.28, about $0.018/filing); the deciding axis is context fit and your accuracy floor, not the headline rate.
The cheapest LLM for SEC filings in 2026 that still fits a full 10-K in context is Gemini 2.5 Flash at $0.30 / $2.50 per Mtok: about $0.046 to read a 120k-token filing and emit a 4k-token extraction, with a 1M-token window that swallows the largest filings whole. DeepSeek V4-Flash is cheaper still on paper ($0.14 / $0.28, ~$0.018/filing) with a 1M window, and GPT-5.4-nano lands between them. Verified list prices below. The deciding axis is not the headline rate but context fit and the accuracy floor your extraction needs. Model the per-filing cost with the Token Cost Optimizer.
TL;DR
| Model | Input $/Mtok | Output $/Mtok | Context | ~$/filing (120k in + 4k out) |
|---|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M | $0.018 |
| GPT-5.4-nano | $0.20 | $1.25 | long-context | $0.029 |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | $0.046 |
| GPT-5.4-mini | $0.75 | $4.50 | long-context | $0.108 |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200k | $0.140 |
| Gemini 2.5 Pro (≤200k) | $1.25 | $10.00 | 1M+ | $0.190 |
Per-filing costs are computed from the verified list prices (120,000 input tokens × input rate + 4,000 output tokens × output rate), not from a benchmark run. All prices verified 2026-05-25 against each vendor's official pricing page.
Why context fit decides this, not the headline rate
A 10-K is large. The text body of a full annual report commonly lands in the ~100k–150k-token range, and complex filings push higher. If the model's context window cannot hold the filing, the cheap per-token rate is irrelevant: you are forced into chunking, retrieval, and stitching, which adds engineering cost and accuracy risk.
That immediately reshapes the budget shortlist:
- Claude Haiku 4.5 is cheap ($1/$5) but its 200k context window is the binding constraint for the largest filings: comfortable for most 10-Ks, tight for the biggest.1
- Gemini 2.5 Flash carries a 1M-token window, so any single filing fits with room for instructions and few-shot examples.2
- DeepSeek V4-Flash also carries a 1M context window with 384k max output, making it the cheapest full-filing-fit option on a per-token basis.3
- GPT-5.4-nano / mini use tiered short/long-context pricing; verify your filing size against the tier boundary before relying on the short-context rate.4
The verified budget-tier prices
Gemini 2.5 Flash $0.30 / Mtok input, $2.50 / Mtok output, 1M context.2
Claude Haiku 4.5 $1 / Mtok input, $5 / Mtok output, 200k context; cache reads at 0.1x base input.1
GPT-5.4-mini ($0.75 / Mtok input ($0.075 cached), $4.50 / Mtok output. GPT-5.4-nano) $0.20 / Mtok input ($0.02 cached), $1.25 / Mtok output.4
DeepSeek V4-Flash $0.14 / Mtok cache-miss input ($0.0028 cache-hit), $0.28 / Mtok output, 1M context. Automatic context caching is on by default, so the boilerplate that repeats across filings can hit the cache-hit rate.3
Caching amortizes the filing boilerplate
Filings share enormous structural boilerplate: risk-factor headers, accounting-policy language, standard table layouts. If your prompt pins a fixed extraction schema and instruction block ahead of the filing-specific text, caching that prefix turns it nearly free after the first call. Anthropic cache reads are 0.1x base input; OpenAI cached input is a 90% discount; DeepSeek's cache-hit input is ~2% of the cache-miss rate ($0.0028 vs $0.14).143 Over a thousand-filing sweep, the cached prefix can shave a meaningful fraction off the bill.
The accuracy floor caveat
Cheaper does not mean adequate. A budget model that misreads a parenthetical "(loss)" as a positive number, or drops a footnote that reverses the headline figure, is not cheap: it is expensive in errors. The defensible workflow: pick the cheapest model that clears your extraction accuracy bar on your own filings, measured with an eval harness, not the cheapest model outright. For high-stakes numeric fields, a budget extractor feeding a frontier verifier (see Claude vs GPT-5 vs Gemini) often beats either alone.
Verified engine output
The block below runs the Token Cost Optimizer on a filing-extraction workload: Gemini 2.5 Flash, 120k input + 4k output per call, one call per filing, 50 filings/day, with a partial cache-hit assumption. It returns the per-call, per-validated, and monthly cost from the engine's own rate table. The output is computed live from the shipped bundle, not typed by hand.
Decision guidance
- Absolute cheapest, full-filing fit, hosted API DeepSeek V4-Flash (1M context, ~$0.018/filing). Verify data-handling terms suit your use.
- Cheapest from a major US-frontier vendor with 1M context Gemini 2.5 Flash (~$0.046/filing).
- Need 90%+ extraction accuracy on hard numeric fields run an eval; a budget extractor + frontier verifier may be the real cheapest-correct path.
- Repeated schema/boilerplate across thousands of filings turn on caching; it changes the per-call math materially.
Connects to
- Token Cost Optimizer: per-filing cost from your own token shape.
- Cheapest SEC EDGAR API source: the free filing text these models read.
- Claude vs GPT-5 vs Gemini for Financial Analysis 2026: when the task needs a reasoning tier above budget.
- DeepSeek vs Mistral for Financial Analysis 2026: the open-weight budget corner in depth.
References
Footnotes
-
Anthropic. "Pricing." platform.claude.com, verified 2026-05-25. https://platform.claude.com/docs/en/about-claude/pricing ↩ ↩2 ↩3
-
Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-25. https://ai.google.dev/gemini-api/docs/pricing ↩ ↩2
-
DeepSeek. "Models & Pricing." api-docs.deepseek.com, verified 2026-05-25. https://api-docs.deepseek.com/quick_start/pricing ↩ ↩2 ↩3
-
OpenAI. "API Pricing." developers.openai.com, verified 2026-05-25. https://developers.openai.com/api/docs/pricing ↩ ↩2 ↩3
Verified engine output
Show the recompute-verified inputs and outputs
| model_id | gemini-2-5-flash |
|---|---|
| input_tokens_per_call | 120000 |
| output_tokens_per_call | 4000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 50 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.6 |
| model › id | gemini-2-5-flash |
|---|---|
| model › provider | |
| model › name | Gemini 2.5 Flash |
| model › input usd per mtoken | 0.3 |
| model › output usd per mtoken | 2.5 |
| model › context window | 1000000 |
| model › notes | Fast mid-tier; 1M context. |
| effective cost per call | 0.046 |
| cost per idea | 0.0483 |
| cost per validated trade | 0.05366666666666667 |
| cost per day | 2.415 |
| cost per month | 72.45 |
| cost per year | 881.475 |
Computed live at build time.
Frequently asked questions
- What is the cheapest LLM for processing SEC filings in 2026?
- On a per-token basis with full-filing context fit, DeepSeek V4-Flash ($0.14/$0.28, 1M context) is cheapest at ~$0.018 per filing. Among major US-frontier vendors, Gemini 2.5 Flash ($0.30/$2.50, 1M context) at ~$0.046 per filing.
- Which budget models fit a full 10-K in context?
- Gemini 2.5 Flash and DeepSeek V4-Flash both carry 1M-token windows. Claude Haiku 4.5 has 200k, fine for most filings but tight for the largest. Verify GPT-5.4 nano/mini against their context-tier boundary.
- Does caching make filing extraction cheaper?
- Yes. Filings share heavy boilerplate; caching a fixed schema/instruction prefix costs about 10% of base input on Anthropic and Gemini, a 90% discount on OpenAI, and about 2% of cache-miss on DeepSeek.
- Is the cheapest model good enough for filings?
- Not automatically. A budget model that misreads a numeric field is expensive in errors. Pick the cheapest model that clears your accuracy bar on an eval of your own filings.
- Where do these prices come from?
- Each vendor's official pricing page, verified 2026-05-25. Per-filing costs are computed from those list prices, not a benchmark run.