The short answer

At 10,000 filings a month, the cheapest viable LLM for 10-K extraction is Gemini 2.5 Flash-Lite at $161.70/month [1], computed live from the Token Cost Optimizer on a 130k-input, 6k-output filing shape. The next rung, Gemini 2.5 Flash, is $567.00/month. Every frontier model runs to four figures a month and five to six figures a year: Gemini 3.5 Flash $2,614.50, Claude Opus 4.7 $5,943.00 [2], GPT-5.5 $8,715.00 [3]. At this volume the model tier is not a preference; it is the budget.

TL;DR

Model | Cost / filing | Cost / month (10,000 filings) | Cost / year
Gemini 2.5 Flash-Lite | $0.0162 | $161.70 | $1,967.35
Gemini 2.5 Flash | $0.0567 | $567.00 | $6,898.50
Claude Haiku 4.5 | $0.1189 | $1,188.60 | $14,461.30
GPT-5.4 mini | $0.1307 | $1,307.25 | $15,904.88
Gemini 2.5 Pro | $0.2336 | $2,336.25 | $28,424.38
Gemini 3.5 Flash | $0.2615 | $2,614.50 | $31,809.75
Claude Opus 4.7 | $0.5943 | $5,943.00 | $72,306.50
GPT-5.5 | $0.8715 | $8,715.00 | $106,032.50

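Same workload for every row: 130,000 input + 6,000 output tokens per filing, one call per filing, a 5% retry rate, a 0.85 validation rate, and 10,000 filings a month. The engine's generic unit is an "idea", here one filing: the run sets ideas/day to 333.3 over a 30-day month, so the monthly figure is exactly 10,000 filings, and the annual column is 365 days of cost (about 121,667 filings) rather than 12 times the monthly figure. The validation rate does not change any figure in the table; it only feeds the engine's cost-per-validated field (labelled "cost per validated trade"). Anthropic input reflects a 40% cache hit, with cached tokens billed at the cache-read rate and the rest at the base input rate; Google and OpenAI are priced at full list input.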
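The table can be reproduced with a few lines of arithmetic. The sketch below is reconstructed from the published engine inputs and outputs, not taken from the Token Cost Optimizer's source, and the names `ModelPrice`, `cost_per_call`, and `workload_costs` are hypothetical. It assumes, as the engine numbers imply, that cached Anthropic input is billed at the cache-read rate and uncached input at the base input rate with no cache-write surcharge, and that retries multiply the per-call cost by 1.05.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelPrice:
    input_usd_per_mtok: float
    output_usd_per_mtok: float
    # Cache-read price; None means no cache discount is applied (Google and OpenAI rows).
    cache_read_usd_per_mtok: Optional[float] = None


# List prices as recorded in the verified engine output (USD per million tokens).
MODELS: Dict[str, ModelPrice] = {
    "Gemini 2.5 Flash-Lite": ModelPrice(0.10, 0.40),
    "Gemini 2.5 Flash": ModelPrice(0.30, 2.50),
    "Claude Haiku 4.5": ModelPrice(1.00, 5.00, cache_read_usd_per_mtok=0.10),
    "GPT-5.4 mini": ModelPrice(0.75, 4.50),
    "Gemini 2.5 Pro": ModelPrice(1.25, 10.00),
    "Gemini 3.5 Flash": ModelPrice(1.50, 9.00),
    "Claude Opus 4.7": ModelPrice(5.00, 25.00, cache_read_usd_per_mtok=0.50),
    "GPT-5.5": ModelPrice(5.00, 30.00),
}


def cost_per_call(p: ModelPrice, input_tokens: int, output_tokens: int, cache_hit_rate: float) -> float:
    """USD for one call: cached input at the cache-read rate, the rest at base input."""
    if p.cache_read_usd_per_mtok is None:
        input_usd = input_tokens * p.input_usd_per_mtok
    else:
        cached = input_tokens * cache_hit_rate
        input_usd = cached * p.cache_read_usd_per_mtok + (input_tokens - cached) * p.input_usd_per_mtok
    return (input_usd + output_tokens * p.output_usd_per_mtok) / 1_000_000


def workload_costs(
    p: ModelPrice,
    input_tokens: int = 130_000,
    output_tokens: int = 6_000,
    calls_per_filing: int = 1,
    retry_rate: float = 0.05,
    filings_per_month: float = 10_000,
    validation_rate: float = 0.85,
    cache_hit_rate: float = 0.4,
) -> Dict[str, float]:
    per_call = cost_per_call(p, input_tokens, output_tokens, cache_hit_rate)
    per_filing = per_call * calls_per_filing * (1 + retry_rate)
    per_day = per_filing * filings_per_month / 30  # 30-day month, as in the engine run
    return {
        "per_filing": per_filing,
        "per_validated": per_filing / validation_rate,  # does not feed the monthly total
        "per_month": per_day * 30,
        "per_year": per_day * 365,  # 365 days of cost, not 12 x the monthly figure
    }


for name, price in MODELS.items():
    c = workload_costs(price)
    print(f"{name:<22} ${c['per_filing']:.4f}  ${c['per_month']:>9,.2f}/mo  ${c['per_year']:>11,.2f}/yr")
```

Passing `validation_rate=0.95` changes only `per_validated`; the monthly and annual totals are unaffected, which matches how the engine reports them.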

Why 10,000 filings a month is the number that matters

A single filing's cost is a rounding error: even GPT-5.5, the most expensive model here, costs $0.87 for one 10-K. The decision feels free, so teams pick on familiarity. At 10,000 filings a month, those per-filing rates scale into a $1,967/year versus $106,032/year decision: a roughly $104,000 annual gap for the identical token shape. Volume is what turns a per-call rounding error into a line item that needs a sign-off.
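Because every filing is an independent call, the annual gap is simply the per-filing difference times the number of filings; nothing compounds. A minimal sketch, using the per-filing costs from the verified engine output and its 365-day year (about 121,667 filings at 10,000 a month); `annual_gap` is a hypothetical helper name.

```python
FILINGS_PER_DAY = 10_000 / 30  # the engine run's 333.33 filings/day
DAYS_PER_YEAR = 365

# Per-filing cost (call cost x 1.05 for retries), from the verified engine output.
PER_FILING_USD = {
    "Gemini 2.5 Flash-Lite": 0.01617,
    "GPT-5.5": 0.8715,
}


def annual_gap(expensive: str, cheap: str) -> float:
    """Linear scaling: per-filing difference x filings per year."""
    per_filing_gap = PER_FILING_USD[expensive] - PER_FILING_USD[cheap]
    return per_filing_gap * FILINGS_PER_DAY * DAYS_PER_YEAR


gap = annual_gap("GPT-5.5", "Gemini 2.5 Flash-Lite")
print(f"annual gap: ${gap:,.2f}")
```

The per-filing difference is about 86 cents; the six-figure gap comes entirely from multiplying it by roughly 121,667 filings a year.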

10,000 filings a month is a realistic figure for a market-wide nightly sweep. The US has roughly 6,000 to 8,000 reporting companies filing 10-Ks, 10-Qs, and 8-Ks; a pipeline that processes every new filing for a broad universe, re-extracts on amendments, and back-fills history lands in this range fast.

The cheapest viable model: Gemini 2.5 Flash-Lite

At $161.70/month, Gemini 2.5 Flash-Lite is 3.5x cheaper than the next rung (Gemini 2.5 Flash at $567.00) and 16x cheaper than the cheapest frontier model (Gemini 3.5 Flash at $2,614.50). Its 1M context window fits a full 10-K body in a single pass, so no chunking is needed.

The word "viable" is doing real work. Flash-Lite is the cheapest model here that holds a full filing in context (every model in the table fits the 136k-token shape); it is not automatically the most accurate. For structural extraction (pulling line items, dates, totals, and standard disclosures where the layout is regular), the budget tier is often enough, and the 16x premium for Gemini 3.5 Flash may buy little. For extraction that requires reasoning across footnotes, reconciling a restatement, or resolving an ambiguous segment disclosure, a frontier model may earn its cost in fewer downstream errors. That is an accuracy question this cost analysis does not answer.
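Whether a filing fits in one pass is a simple comparison against each model's context window. A minimal sketch using the window sizes recorded in the engine output, assuming (as is typical) that input and output tokens share the same window; `headroom` and `fits_in_one_pass` are hypothetical helpers.

```python
# Context windows in tokens, as recorded in the verified engine output.
CONTEXT_WINDOW = {
    "Gemini 2.5 Flash-Lite": 1_000_000,
    "Gemini 2.5 Flash": 1_000_000,
    "Claude Haiku 4.5": 200_000,
    "GPT-5.4 mini": 256_000,
    "Gemini 2.5 Pro": 2_000_000,
    "Gemini 3.5 Flash": 1_000_000,
    "Claude Opus 4.7": 1_000_000,
    "GPT-5.5": 400_000,
}

INPUT_TOKENS, OUTPUT_TOKENS = 130_000, 6_000  # the filing shape used throughout


def headroom(window: int, input_tokens: int = INPUT_TOKENS, output_tokens: int = OUTPUT_TOKENS) -> int:
    """Tokens left after one full-filing call; negative means the filing must be chunked."""
    return window - (input_tokens + output_tokens)


def fits_in_one_pass(window: int, input_tokens: int = INPUT_TOKENS, output_tokens: int = OUTPUT_TOKENS) -> bool:
    return headroom(window, input_tokens, output_tokens) >= 0


for name, window in sorted(CONTEXT_WINDOW.items(), key=lambda kv: kv[1]):
    print(f"{name:<22} fits={fits_in_one_pass(window)}  headroom={headroom(window):,}")
```

Every model fits the 136k-token shape. Claude Haiku 4.5 has the least headroom (64,000 tokens), so under the shared-window assumption a filing with more than about 194,000 input tokens would need chunking there while the 1M-window models would not.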

The frontier tier is a four-figure monthly commitment

At 10,000 filings a month, every frontier model crosses into serious money, and so does the large-context Gemini 2.5 Pro:

  • Gemini 3.5 Flash $2,614.50/month. The cheapest frontier pick, at Flash latency. Worth it only when the extraction genuinely needs agent-tier reasoning.
  • Gemini 2.5 Pro $2,336.25/month. Not counted as frontier here, but it sits in the same cost band, slightly below 3.5 Flash, with the largest context window in the table (2M).
  • Claude Opus 4.7 $5,943.00/month, even with a 40% input cache hit applied.
  • GPT-5.5 $8,715.00/month, the most expensive single-model path at this scale.

A team standardized on Opus or GPT-5.5 is paying a 2.3x to 3.3x premium over Gemini 3.5 Flash, or 37x to 54x over Flash-Lite, for the same 10,000 extractions.
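The multiples in this section are plain ratios of the monthly bills. A small sketch that recomputes them from the TL;DR table; `multiple` is a hypothetical helper name.

```python
# Monthly USD at 10,000 filings, from the TL;DR table.
MONTHLY_USD = {
    "Gemini 2.5 Flash-Lite": 161.70,
    "Gemini 2.5 Flash": 567.00,
    "Gemini 3.5 Flash": 2614.50,
    "Claude Opus 4.7": 5943.00,
    "GPT-5.5": 8715.00,
}


def multiple(model: str, baseline: str) -> float:
    """How many times the baseline's monthly bill this model costs."""
    return MONTHLY_USD[model] / MONTHLY_USD[baseline]


for model in ("Claude Opus 4.7", "GPT-5.5"):
    print(
        f"{model}: {multiple(model, 'Gemini 3.5 Flash'):.1f}x Gemini 3.5 Flash, "
        f"{multiple(model, 'Gemini 2.5 Flash-Lite'):.0f}x Gemini 2.5 Flash-Lite"
    )
```

Since every row shares the same token shape, the ratios carry over unchanged to any filing volume.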

The two-stage path beats any single model

The cost-optimal architecture at this volume is rarely one model. Route all 10,000 filings through Flash-Lite ($161.70/month) for the structural pass, run a cheap validator on the output, and escalate only the fraction that fails validation to a frontier model. If 10% of filings need a frontier re-pass on Gemini 3.5 Flash, that adds roughly $261/month (10% of $2,614.50), for a blended bill near $423/month, versus $2,614.50 to run every filing on the frontier model. If Flash-Lite's failure rate matched the table's 0.85 validation rate, 15% would escalate and the bill would be near $554/month, still about a fifth of frontier-only. Neither figure includes the validator's own cost. The two-stage pipeline is designed to keep frontier accuracy on the hard cases while paying the budget rate on the easy majority.
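The blended bill is a one-line formula: the budget pass on every filing, plus the escalation rate times the frontier bill. A minimal sketch, assuming an escalated filing costs one full frontier call of the same token shape; `validator_monthly` is a hypothetical placeholder for the validator's cost, which the figures above do not price.

```python
def blended_monthly(
    budget_monthly: float,
    frontier_monthly: float,
    escalation_rate: float,
    validator_monthly: float = 0.0,
) -> float:
    """Every filing pays the budget pass; escalated filings also pay a full frontier re-pass."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return budget_monthly + escalation_rate * frontier_monthly + validator_monthly


def breakeven_escalation_rate(budget_monthly: float, frontier_monthly: float) -> float:
    """Escalation rate at which two-stage costs the same as frontier-only (validator excluded)."""
    return 1.0 - budget_monthly / frontier_monthly


FLASH_LITE, GEMINI_35_FLASH = 161.70, 2614.50  # monthly USD at 10,000 filings

for rate in (0.10, 0.15):
    print(f"{rate:.0%} escalated: ${blended_monthly(FLASH_LITE, GEMINI_35_FLASH, rate):,.2f}/month")
print(f"break-even escalation rate: {breakeven_escalation_rate(FLASH_LITE, GEMINI_35_FLASH):.1%}")
```

The break-even line is the useful one: at these prices, and before counting the validator, the two-stage pipeline stays cheaper than frontier-only until roughly 94% of filings escalate.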

Decision guidance

  1. Eval accuracy on your own filings first. A budget model that misreads a parenthetical "(loss)" is expensive in errors, not cheap.
  2. Price your real token shape. Filing size and output verbosity move the per-call cost more than the model choice does within a tier. Recompute the table above with your numbers in the Token Cost Optimizer.
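  3. Two-stage the hard fields. A budget extractor with a frontier verifier on the contested subset costs far less than running everything on the frontier model, and is likely to catch more errors than running everything on the budget model alone.
  4. Watch the validation rate. At a 0.85 validation rate, 15% of extractions fail validation, so the cost per validated extraction is the per-filing cost divided by 0.85 (for Flash-Lite, $0.0190 rather than $0.0162). Lifting validation from 0.85 to 0.95 cuts that cost by about 10.5% on every model (0.85/0.95 ≈ 0.895). That matches only the smallest steps between adjacent rows in the table (GPT-5.4 mini down to Claude Haiku 4.5 saves 9.1%; Gemini 3.5 Flash down to Gemini 2.5 Pro saves 10.6%) and is far smaller than a drop such as Gemini 2.5 Flash to Flash-Lite (3.5x).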
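The "(loss)" trap in item 1 and the validator behind items 3 and 4 can be sketched together. This is a minimal, hypothetical validator: it parses accounting-style parentheses as negatives, checks the balance-sheet identity (total assets = total liabilities + total equity), and routes each extraction to accept or escalate. The field names, the 0.5-unit tolerance, and the single identity check are assumptions for illustration; a production validator would check many more fields.

```python
import math
from typing import Dict, List, Mapping, Optional

REQUIRED = ("total_assets", "total_liabilities", "total_equity")  # hypothetical field names


def parse_amount(raw: str) -> Optional[float]:
    """Parse a filing-style amount; accounting parentheses mean negative: '(1,234)' -> -1234.0."""
    s = raw.strip().replace("$", "").replace(",", "")
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1].strip()
    try:
        value = float(s)
    except ValueError:
        return None  # e.g. a bare '(loss)' label with no number
    if not math.isfinite(value):
        return None
    return -value if negative else value


def validate(fields: Mapping[str, str], tolerance: float = 0.5) -> List[str]:
    """Cheap checks on one extraction; an empty list means it passes."""
    problems: List[str] = []
    values: Dict[str, float] = {}
    for key in REQUIRED:
        raw = fields.get(key)
        value = parse_amount(raw) if raw is not None else None
        if value is None:
            problems.append(f"missing or unparseable: {key}")
        else:
            values[key] = value
    if not problems:
        # Balance-sheet identity, in the filing's own reporting units.
        diff = values["total_assets"] - (values["total_liabilities"] + values["total_equity"])
        if abs(diff) > tolerance:
            problems.append(f"does not balance (off by {diff:,.0f})")
    return problems


def route(fields: Mapping[str, str]) -> str:
    """Accept the budget extraction, or escalate it to the frontier re-pass."""
    return "accept" if not validate(fields) else "escalate"


good = {"total_assets": "1,000", "total_liabilities": "1,500", "total_equity": "(500)"}
misread = {**good, "total_equity": "500"}  # parentheses dropped: a deficit read as positive
print(route(good), route(misread))
```

A dropped pair of parentheses flips the sign of an equity deficit, so the identity fails and the filing is escalated instead of silently accepted.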

Footnotes

  1. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-26. https://ai.google.dev/gemini-api/docs/pricing

  2. Anthropic. "Pricing." platform.claude.com, verified 2026-05-26. https://platform.claude.com/docs/en/about-claude/pricing

  3. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-26. https://developers.openai.com/api/docs/pricing

Verified engine output

10,000 filings/month — Gemini 2.5 Flash-Lite (cheapest viable)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 333.3333333333333
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-flash-lite
Result
model › id: gemini-2-5-flash-lite
model › provider: google
model › name: Gemini 2.5 Flash-Lite
model › input usd per mtoken: 0.1
model › output usd per mtoken: 0.4
model › context window: 1000000
model › notes: Cheapest tier in this table; 1M context.
effective cost per call: 0.0154
cost per idea: 0.01617
cost per validated trade: 0.019023529411764706
cost per day: 5.39
cost per month: 161.7
cost per year: 1967.35

Computed live at build time.

10,000 filings/month — Gemini 2.5 Flash
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 333.3333333333333
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-flash
Result
model › id: gemini-2-5-flash
model › provider: google
model › name: Gemini 2.5 Flash
model › input usd per mtoken: 0.3
model › output usd per mtoken: 2.5
model › context window: 1000000
model › notes: Fast mid-tier; 1M context.
effective cost per call: 0.054
cost per idea: 0.0567
cost per validated trade: 0.06670588235294118
cost per day: 18.9
cost per month: 567
cost per year: 6898.499999999999

10,000 filings/month — Claude Haiku 4.5
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 333.3333333333333
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: claude-haiku-4-5
Result
model › id: claude-haiku-4-5
model › provider: anthropic
model › name: Claude Haiku 4.5
model › input usd per mtoken: 1
model › output usd per mtoken: 5
model › cache write usd per mtoken: 1.25
model › cache read usd per mtoken: 0.1
model › context window: 200000
model › notes: Fast, cheap: filtering + pre-processing layers.
effective cost per call: 0.1132
cost per idea: 0.11886
cost per validated trade: 0.13983529411764706
cost per day: 39.62
cost per month: 1188.6
cost per year: 14461.3

10,000 filings/month — GPT-5.4 mini
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 333.3333333333333
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gpt-5-mini
Result
model › id: gpt-5-mini
model › provider: openai
model › name: GPT-5.4 mini
model › input usd per mtoken: 0.75
model › output usd per mtoken: 4.5
model › context window: 256000
model › notes: Mid-tier OpenAI (GPT-5.4 mini).
effective cost per call: 0.1245
cost per idea: 0.130725
cost per validated trade: 0.15379411764705883
cost per day: 43.575
cost per month: 1307.25
cost per year: 15904.875000000002

10,000 filings/month — Gemini 2.5 Pro (2M context)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 333.3333333333333
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-pro
Result
model › id: gemini-2-5-pro
model › provider: google
model › name: Gemini 2.5 Pro
model › input usd per mtoken: 1.25
model › output usd per mtoken: 10
model › context window: 2000000
model › notes: Large context (2M). Strong on document analysis.
effective cost per call: 0.2225
cost per idea: 0.23362500000000003
cost per validated trade: 0.27485294117647063
cost per day: 77.875
cost per month: 2336.25
cost per year: 28424.375

10,000 filings/month — Gemini 3.5 Flash (cheapest frontier)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 333.3333333333333
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-3-5-flash
Result
model › id: gemini-3-5-flash
model › provider: google
model › name: Gemini 3.5 Flash
model › input usd per mtoken: 1.5
model › output usd per mtoken: 9
model › context window: 1000000
model › notes: Frontier agent-tier at Flash speed; not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call: 0.249
cost per idea: 0.26145
cost per validated trade: 0.30758823529411766
cost per day: 87.15
cost per month: 2614.5
cost per year: 31809.750000000004

10,000 filings/month — Claude Opus 4.7 (40% cache hit on input)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 333.3333333333333
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: claude-opus-4-7
Result
model › id: claude-opus-4-7
model › provider: anthropic
model › name: Claude Opus 4.7
model › input usd per mtoken: 5
model › output usd per mtoken: 25
model › cache write usd per mtoken: 6.25
model › cache read usd per mtoken: 0.5
model › context window: 1000000
model › notes: Flagship reasoning model; 1M context.
effective cost per call: 0.5660000000000001
cost per idea: 0.5943
cost per validated trade: 0.6991764705882354
cost per day: 198.1
cost per month: 5943
cost per year: 72306.5

10,000 filings/month — GPT-5.5 (premium)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 333.3333333333333
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gpt-5
Result
model › id: gpt-5
model › provider: openai
model › name: GPT-5.5
model › input usd per mtoken: 5
model › output usd per mtoken: 30
model › context window: 400000
model › notes: OpenAI frontier model (GPT-5.5).
effective cost per call: 0.8300000000000001
cost per idea: 0.8715000000000002
cost per validated trade: 1.0252941176470591
cost per day: 290.50000000000006
cost per month: 8715.000000000002
cost per year: 106032.50000000001

Frequently asked questions

What is the cheapest LLM for SEC 10-K extraction at 10,000 filings a month?
Gemini 2.5 Flash-Lite at $161.70 per month, computed from the Token Cost Optimizer on a 130k-input, 6k-output filing shape. It is 3.5x cheaper than Gemini 2.5 Flash ($567.00) and 16x cheaper than the cheapest frontier model, Gemini 3.5 Flash ($2,614.50).
How much does GPT-5.5 cost to extract 10,000 10-Ks a month?
$8,715.00 per month at the verified $5/$30 per Mtok rate on a 130k-input, 6k-output filing shape, the most expensive single-model path at this scale and 54x the cost of Gemini 2.5 Flash-Lite.
Should I use a frontier model for SEC filing extraction at scale?
Only for the subset that needs reasoning. The cost-optimal pattern is a two-stage pipeline: run all 10,000 filings through Gemini 2.5 Flash-Lite ($161.70/month), then escalate only the filings that fail validation to a frontier model. If 10% escalate to Gemini 3.5 Flash, the blended bill is near $423/month, instead of $2,614.50 to run everything on Gemini 3.5 Flash.
Are these accuracy rankings?
No. Every figure is a cost number computed from verified vendor list prices and recomputed by CI against the shipped cost engine. No model was tested for extraction accuracy. Run your own eval on your own filings before relying on the cheapest tier.
Why does the monthly cost scale linearly with filing count?
Each filing is one independent extraction call, so cost scales linearly with volume. That is why a per-filing difference of about 86 cents ($0.8715 versus $0.0162) becomes a $104,000-a-year decision between Gemini 2.5 Flash-Lite and GPT-5.5 at 10,000 filings a month.