The short answer

The cheapest LLM for SEC filings in 2026 that still fits a full 10-K in context is Gemini 2.5 Flash at $0.30 / $2.50 per Mtok: about $0.046 to read a 120k-token filing and emit a 4k-token extraction, with a 1M-token window that swallows the largest filings whole. DeepSeek V4-Flash is cheaper still on paper ($0.14 / $0.28, ~$0.018/filing) with a 1M window, and GPT-5.4-nano lands between them. Verified list prices below. The deciding axis is not the headline rate but context fit and the accuracy floor your extraction needs. Model the per-filing cost with the Token Cost Optimizer.

TL;DR

| Model | Input $/Mtok | Output $/Mtok | Context | ~$/filing (120k in + 4k out) |
| --- | --- | --- | --- | --- |
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M | $0.018 |
| GPT-5.4-nano | $0.20 | $1.25 | long-context | $0.029 |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | $0.046 |
| GPT-5.4-mini | $0.75 | $4.50 | long-context | $0.108 |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200k | $0.140 |
| Gemini 2.5 Pro (≤200k) | $1.25 | $10.00 | 1M+ | $0.190 |

Per-filing costs are computed from the verified list prices (120,000 input tokens × input rate + 4,000 output tokens × output rate), not from a benchmark run. All prices verified 2026-05-25 against each vendor's official pricing page.
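The per-filing arithmetic can be reproduced in a few lines. This is a minimal sketch, assuming only the list prices in the table above and the 120k-in / 4k-out workload; the function and variable names are illustrative, not part of any vendor SDK.

```python
# List prices in USD per million tokens, copied from the TL;DR table.
PRICES = {
    "DeepSeek V4-Flash": (0.14, 0.28),
    "GPT-5.4-nano": (0.20, 1.25),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "GPT-5.4-mini": (0.75, 4.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 2.5 Pro (<=200k)": (1.25, 10.00),
}


def cost_per_filing(input_rate, output_rate, input_tokens=120_000, output_tokens=4_000):
    """USD for one call: tokens times rate, with rates quoted per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


for model, (input_rate, output_rate) in PRICES.items():
    print(f"{model:<26} ${cost_per_filing(input_rate, output_rate):.3f}")
```

Changing `input_tokens` to your own median filing size is the quickest way to see how far the ranking moves for your corpus.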

Why context fit decides this, not the headline rate

A 10-K is large. The text body of a full annual report commonly lands in the ~100k–150k-token range, and complex filings push higher. If the model's context window cannot hold the filing, the cheap per-token rate is irrelevant: you are forced into chunking, retrieval, and stitching, which add engineering cost and accuracy risk.

That immediately reshapes the budget shortlist:

  • Claude Haiku 4.5 is cheap ($1/$5) but its 200k context window is the binding constraint for the largest filings: comfortable for most 10-Ks, tight for the biggest.[1]
  • Gemini 2.5 Flash carries a 1M-token window, so any single filing fits with room for instructions and few-shot examples.[2]
  • DeepSeek V4-Flash also carries a 1M context window with 384k max output, making it the cheapest full-filing-fit option on a per-token basis.[3]
  • GPT-5.4-nano / mini use tiered short/long-context pricing; verify your filing size against the tier boundary before relying on the short-context rate.[4]
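A context-fit check along these lines can run before any pricing comparison. It is a rough sketch: the window sizes come from the list above, while the 3,000-token prompt overhead, the 195k-token filing, and the choice to reserve output tokens inside the window are assumptions for illustration (vendors differ on how output counts against the window).

```python
# Context windows in tokens, taken from the shortlist above.
WINDOWS = {
    "Claude Haiku 4.5": 200_000,
    "Gemini 2.5 Flash": 1_000_000,
    "DeepSeek V4-Flash": 1_000_000,
}


def fits_in_context(filing_tokens, context_window, prompt_tokens=3_000, max_output_tokens=4_000):
    """True if the filing, the prompt, and the reserved output all fit in the window.

    prompt_tokens is a hypothetical allowance for instructions and few-shot examples.
    """
    return filing_tokens + prompt_tokens + max_output_tokens <= context_window


# 120k and 150k are in the typical range; 195k stands in for an unusually large filing.
for filing_tokens in (120_000, 150_000, 195_000):
    fitting = [m for m, window in WINDOWS.items() if fits_in_context(filing_tokens, window)]
    print(filing_tokens, fitting)
```

Filings that fail the check are the ones that force chunking, so counting them across your corpus tells you how often the cheap rate is actually available.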

The verified budget-tier prices

Gemini 2.5 Flash: $0.30 / Mtok input, $2.50 / Mtok output, 1M context.[2]

Claude Haiku 4.5: $1 / Mtok input, $5 / Mtok output, 200k context; cache reads at 0.1x base input.[1]

GPT-5.4-mini: $0.75 / Mtok input ($0.075 cached), $4.50 / Mtok output.[4]

GPT-5.4-nano: $0.20 / Mtok input ($0.02 cached), $1.25 / Mtok output.[4]

DeepSeek V4-Flash: $0.14 / Mtok cache-miss input ($0.0028 cache-hit), $0.28 / Mtok output, 1M context. Automatic context caching is on by default, so a prompt prefix that repeats across calls, such as a fixed schema and instruction block, can bill at the cache-hit rate.[3]

Caching amortizes the filing boilerplate

Filings share enormous structural boilerplate: risk-factor headers, accounting-policy language, standard table layouts. If your prompt pins a fixed extraction schema and instruction block ahead of the filing-specific text, caching that prefix turns it nearly free after the first call. Anthropic cache reads are 0.1x base input; OpenAI cached input is a 90% discount; DeepSeek's cache-hit input is ~2% of the cache-miss rate ($0.0028 vs $0.14).[1][4][3] Over a thousand-filing sweep, the cached prefix can shave a meaningful fraction off the bill.
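How much the cached prefix saves depends on how large it is relative to the filing. The sketch below estimates input spend for a sweep, assuming a hypothetical 5,000-token schema-and-instruction prefix and the Claude Haiku 4.5 rates quoted above; it ignores any cache-write surcharge, cache expiry, and minimum cacheable length, all of which vary by vendor.

```python
def input_cost(prefix_tokens, filing_tokens, base_rate, cached_rate, prefix_cached):
    """Input cost in USD for one call; rates are USD per million tokens."""
    prefix_rate = cached_rate if prefix_cached else base_rate
    return (prefix_tokens * prefix_rate + filing_tokens * base_rate) / 1_000_000


def sweep_input_cost(n_filings, prefix_tokens, filing_tokens, base_rate, cached_rate):
    """Assume the first call pays full price for the prefix and later calls read it from cache."""
    first = input_cost(prefix_tokens, filing_tokens, base_rate, cached_rate, prefix_cached=False)
    rest = input_cost(prefix_tokens, filing_tokens, base_rate, cached_rate, prefix_cached=True)
    return first + (n_filings - 1) * rest


# Hypothetical sweep: 1,000 filings, 5k-token prefix, 120k-token filings.
# Rates from the article: $1.00 base input, cache reads at 0.1x ($0.10).
uncached = sweep_input_cost(1_000, 5_000, 120_000, base_rate=1.00, cached_rate=1.00)
cached = sweep_input_cost(1_000, 5_000, 120_000, base_rate=1.00, cached_rate=0.10)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}, saved {1 - cached / uncached:.1%}")
```

Under these assumptions the saving is capped by the prefix's share of input tokens: a 5k prefix on 125k of input trims roughly 3.6% of input spend, so the benefit grows only when the shared prefix (few-shot examples, long schemas) is a larger part of each call.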

The accuracy floor caveat

Cheaper does not mean adequate. A budget model that misreads a parenthetical "(loss)" as a positive number, or drops a footnote that reverses the headline figure, is not cheap: it is expensive in errors. The defensible workflow: pick the cheapest model that clears your extraction accuracy bar on your own filings, measured with an eval harness, not the cheapest model outright. For high-stakes numeric fields, a budget extractor feeding a frontier verifier (see Claude vs GPT-5 vs Gemini) often beats either alone.
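A bare-bones version of that workflow might look like the following. It is a sketch, not a full eval harness: the parsing rule covers only the parenthesized-negative convention mentioned above, the exact-match scoring is one possible metric, and the candidate names, costs, and accuracies in the example are made-up placeholders rather than measured results.

```python
def parse_amount(text):
    """Parse a filing-style number; parentheses mean negative, so "(1,234)" becomes -1234.0."""
    s = text.strip().replace("$", "").replace(",", "").strip()
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    try:
        value = float(s)
    except ValueError:
        return None  # not a number we can score
    return -value if negative else value


def field_accuracy(predicted, gold):
    """Share of gold fields whose parsed value exactly matches the prediction."""
    hits = 0
    for key, truth in gold.items():
        expected = parse_amount(truth)
        if expected is not None and parse_amount(predicted.get(key, "")) == expected:
            hits += 1
    return hits / len(gold)


def cheapest_passing(candidates, accuracy_bar):
    """candidates: (name, cost_per_filing, accuracy) tuples. Cheapest name at or above the bar, else None."""
    passing = [c for c in candidates if c[2] >= accuracy_bar]
    return min(passing, key=lambda c: c[1])[0] if passing else None


# A dropped sign counts as a miss: the gold value is a loss, the prediction is positive.
print(field_accuracy({"net_income": "1,234"}, {"net_income": "(1,234)"}))

# Placeholder numbers for illustration only.
candidates = [("model_a", 0.018, 0.80), ("model_b", 0.046, 0.95), ("model_c", 0.140, 0.97)]
print(cheapest_passing(candidates, accuracy_bar=0.90))
```

The selection step is deliberately trivial; the work is in building a gold set from your own filings that includes the sign and footnote cases a budget model is most likely to miss.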

Verified engine output

The block below runs the Token Cost Optimizer on a filing-extraction workload: Gemini 2.5 Flash, 120k input + 4k output per call, one call per filing, 50 filings/day, with a partial cache-hit assumption. It returns the per-call, per-validated, and monthly cost from the engine's own rate table. The output is computed live from the shipped bundle, not typed by hand.

Decision guidance

  • Absolute cheapest, full-filing fit, hosted API: DeepSeek V4-Flash (1M context, ~$0.018/filing). Verify data-handling terms suit your use.
  • Cheapest from a major US-frontier vendor with 1M context: Gemini 2.5 Flash (~$0.046/filing).
  • Need 90%+ extraction accuracy on hard numeric fields: run an eval; a budget extractor + frontier verifier may be the real cheapest-correct path.
  • Repeated schema/boilerplate across thousands of filings: turn on caching; it changes the per-call math materially.

Footnotes

  1. Anthropic. "Pricing." platform.claude.com, verified 2026-05-25. https://platform.claude.com/docs/en/about-claude/pricing

  2. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-25. https://ai.google.dev/gemini-api/docs/pricing

  3. DeepSeek. "Models & Pricing." api-docs.deepseek.com, verified 2026-05-25. https://api-docs.deepseek.com/quick_start/pricing

  4. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-25. https://developers.openai.com/api/docs/pricing

Verified engine output

Filing extraction: Gemini 2.5 Flash, 120k in + 4k out, 50 filings/day

Inputs
model_id: gemini-2-5-flash
input_tokens_per_call: 120000
output_tokens_per_call: 4000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 50
validation_rate: 0.9
cache_hit_rate: 0.6

Result
model › id: gemini-2-5-flash
model › provider: google
model › name: Gemini 2.5 Flash
model › input usd per mtoken: 0.3
model › output usd per mtoken: 2.5
model › context window: 1000000
model › notes: Fast mid-tier; 1M context.
effective cost per call: 0.046
cost per idea: 0.0483
cost per validated trade: 0.05366666666666667
cost per day: 2.415
cost per month: 72.45
cost per year: 881.475

Computed live at build time.
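The result figures can be cross-checked by hand. The relationships below are inferred from the listed inputs and outputs, not taken from the engine's source, so treat them as an assumption that happens to reproduce the numbers: retries scale the per-call cost, the validation rate divides it, and the month and year use 30 and 365 days. The per-call figure of 0.046 matches the uncached list-price cost, so it is passed in as given rather than derived from the cache-hit rate.

```python
def workload_costs(cost_per_call, calls_per_idea, retry_rate, ideas_per_day, validation_rate):
    """Roll a per-call cost up to per-idea, per-validated, daily, monthly, and yearly USD.

    Formulas are inferred from the published figures, not from the engine itself.
    """
    cost_per_idea = cost_per_call * calls_per_idea * (1 + retry_rate)
    cost_per_day = cost_per_idea * ideas_per_day
    return {
        "cost_per_idea": cost_per_idea,
        "cost_per_validated": cost_per_idea / validation_rate,
        "cost_per_day": cost_per_day,
        "cost_per_month": cost_per_day * 30,
        "cost_per_year": cost_per_day * 365,
    }


result = workload_costs(
    cost_per_call=0.046,
    calls_per_idea=1,
    retry_rate=0.05,
    ideas_per_day=50,
    validation_rate=0.9,
)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```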

Frequently asked questions

What is the cheapest LLM for processing SEC filings in 2026?
On a per-token basis with full-filing context fit, DeepSeek V4-Flash ($0.14/$0.28, 1M context) is cheapest at ~$0.018 per filing. Among major US-frontier vendors, Gemini 2.5 Flash ($0.30/$2.50, 1M context) is cheapest at ~$0.046 per filing.
Which budget models fit a full 10-K in context?
Gemini 2.5 Flash and DeepSeek V4-Flash both carry 1M-token windows. Claude Haiku 4.5 has 200k, fine for most filings but tight for the largest. Verify GPT-5.4 nano/mini against their context-tier boundary.
Does caching make filing extraction cheaper?
Yes. Filings share heavy boilerplate; caching a fixed schema/instruction prefix costs about 10% of base input on Anthropic and Gemini, a 90% discount on OpenAI, and about 2% of cache-miss on DeepSeek.
Is the cheapest model good enough for filings?
Not automatically. A budget model that misreads a numeric field is expensive in errors. Pick the cheapest model that clears your accuracy bar on an eval of your own filings.
Where do these prices come from?
Each vendor's official pricing page, verified 2026-05-25. Per-filing costs are computed from those list prices, not a benchmark run.