The short answer
At 10,000 filings a month, the cheapest viable LLM for 10-K extraction is Gemini 2.5 Flash-Lite at $161.70/month [1], computed live from the Token Cost Optimizer on a 130k-input, 6k-output filing shape. The next rung, Gemini 2.5 Flash, is $567.00/month. Every frontier model runs into four or five figures: Gemini 3.5 Flash $2,614.50, Claude Opus 4.7 $5,943.00 [2], GPT-5.5 $8,715.00 [3]. At this volume the model tier is not a preference, it is the budget.
TL;DR
| Model | Cost / filing | Cost / month (10,000 filings) | Cost / year |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.0162 | $161.70 | $1,967.35 |
| Gemini 2.5 Flash | $0.0567 | $567.00 | $6,898.50 |
| Claude Haiku 4.5 | $0.1189 | $1,188.60 | $14,461.30 |
| GPT-5.4 mini | $0.1307 | $1,307.25 | $15,904.88 |
| Gemini 2.5 Pro | $0.2336 | $2,336.25 | $28,424.38 |
| Gemini 3.5 Flash | $0.2615 | $2,614.50 | $31,809.75 |
| Claude Opus 4.7 | $0.5943 | $5,943.00 | $72,306.50 |
| GPT-5.5 | $0.8715 | $8,715.00 | $106,032.50 |
Same workload for every row: 130,000 input + 6,000 output tokens per filing, one call per filing, a 5% retry rate, a 0.85 validation rate, and 10,000 filings a month (the engine run sets ideas/day to 333.3 so the monthly figure is exactly 10,000 filings). Monthly figures use a 30-day month and yearly figures use 365 days, which is why the yearly column is slightly more than 12x the monthly one. Anthropic input reflects a 40% cache hit; Google and OpenAI are priced at full list input.
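The arithmetic behind each row can be reproduced in a few lines. The following is a minimal sketch, not the Token Cost Optimizer's actual code: the formula is inferred from the published engine outputs, and the names `Price`, `cost_per_filing`, `monthly_cost`, and `yearly_cost` are hypothetical. It assumes cached input is billed at the cache-read rate and that cache-write charges are not included, which is what the published Anthropic figures imply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Price:
    """List prices in USD per million tokens (hypothetical container)."""
    input_usd_per_mtok: float
    output_usd_per_mtok: float
    # None means no cache discount is applied (full list input price).
    cache_read_usd_per_mtok: Optional[float] = None


def cost_per_filing(price: Price, input_tokens: int = 130_000,
                    output_tokens: int = 6_000, cache_hit_rate: float = 0.4,
                    retry_rate: float = 0.05, calls: int = 1) -> float:
    """Cost of one filing, including the retry overhead."""
    if price.cache_read_usd_per_mtok is None:
        input_cost = input_tokens * price.input_usd_per_mtok
    else:
        cached = input_tokens * cache_hit_rate
        input_cost = ((input_tokens - cached) * price.input_usd_per_mtok
                      + cached * price.cache_read_usd_per_mtok)
    output_cost = output_tokens * price.output_usd_per_mtok
    per_call = (input_cost + output_cost) / 1_000_000
    return per_call * calls * (1 + retry_rate)


def monthly_cost(price: Price, filings_per_month: int = 10_000) -> float:
    return cost_per_filing(price) * filings_per_month


def yearly_cost(price: Price, filings_per_month: int = 10_000) -> float:
    # Assumption inferred from the table: 30-day month, 365-day year.
    return cost_per_filing(price) * (filings_per_month / 30) * 365


flash_lite = Price(0.10, 0.40)
opus = Price(5.00, 25.00, cache_read_usd_per_mtok=0.50)

print(round(monthly_cost(flash_lite), 2))  # 161.7
print(round(monthly_cost(opus), 2))        # 5943.0
print(round(yearly_cost(flash_lite), 2))   # 1967.35
```

Swapping in your own token counts and retry rate is the quickest way to see how far your workload sits from the 130k/6k shape used here.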
Why 10,000 filings a month is the number that matters
A single filing's cost is a rounding error: even GPT-5.5, the most expensive model here, costs $0.87 for one 10-K. The decision feels free, so teams pick on familiarity. At 10,000 filings a month, the same per-filing rate compounds into a $1,967/year versus $106,032/year decision: a $104,000 annual gap for the identical token shape. Volume is what turns a per-call rounding error into a line item that needs a sign-off.
10,000 filings a month is a realistic figure for a market-wide nightly sweep. The US has roughly 6,000 to 8,000 reporting companies filing 10-Ks, 10-Qs, and 8-Ks; a pipeline that processes every new filing for a broad universe, re-extracts on amendments, and back-fills history lands in this range fast.
The cheapest viable model: Gemini 2.5 Flash-Lite
At $161.70/month, Gemini 2.5 Flash-Lite is 3.5x cheaper than the next rung (Gemini 2.5 Flash at $567.00) and 16x cheaper than the cheapest frontier model (Gemini 3.5 Flash at $2,614.50). Its 1M context window fits a full 10-K body in a single pass, so there is no chunking penalty for context fit.
The word "viable" is doing real work. Flash-Lite is the cheapest model that can hold a full filing in context; it is not automatically the most accurate. For structural extraction (pulling line items, dates, totals, and standard disclosures where the layout is regular), the budget tier is usually enough, and the 16x premium over Flash-Lite buys little. For extraction that requires reasoning across footnotes, reconciling a restatement, or resolving an ambiguous segment disclosure, a frontier model may earn its cost in fewer downstream errors. That is an accuracy question this cost analysis does not answer.
The frontier tier is a four-to-five-figure monthly commitment
At 10,000 filings a month, every frontier model crosses into serious money:
- Gemini 3.5 Flash $2,614.50/month. The cheapest frontier pick, at Flash latency. Worth it only when the extraction genuinely needs agent-tier reasoning.
- Gemini 2.5 Pro $2,336.25/month, with the largest context window here (2M) at slightly lower cost than 3.5 Flash.
- Claude Opus 4.7 $5,943.00/month, even with a 40% input cache hit applied.
- GPT-5.5 $8,715.00/month, the most expensive single-model path at this scale.
A team standardized on Opus or GPT-5.5 is paying a 2.3x to 3.3x premium over Gemini 3.5 Flash, or 37x to 54x over Flash-Lite, for the same 10,000 extractions.
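The premium multiples quoted above are plain ratios of the monthly figures. A small sketch, using the monthly costs from the table and a hypothetical helper `premium`:

```python
# Monthly cost in USD at 10,000 filings, copied from the table above.
monthly_usd = {
    "Gemini 2.5 Flash-Lite": 161.70,
    "Gemini 3.5 Flash": 2614.50,
    "Claude Opus 4.7": 5943.00,
    "GPT-5.5": 8715.00,
}


def premium(model: str, baseline: str) -> float:
    """How many times more expensive `model` is than `baseline`."""
    return monthly_usd[model] / monthly_usd[baseline]


for model in ("Claude Opus 4.7", "GPT-5.5"):
    for baseline in ("Gemini 3.5 Flash", "Gemini 2.5 Flash-Lite"):
        print(f"{model} vs {baseline}: {premium(model, baseline):.1f}x")
```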
The two-stage path beats any single model
The cost-optimal architecture at this volume is rarely one model. Route all 10,000 filings through Flash-Lite ($161.70/month) for the structural pass, run a cheap validator on the output, and escalate only the small fraction that fails validation to a frontier model. If 10% of filings need a frontier re-pass on Gemini 3.5 Flash, that adds roughly $261/month (10% of $2,614.50), for a blended bill near $423/month, versus $2,614.50 to run every filing on the frontier model. The two-stage pipeline captures most of the frontier accuracy on the hard cases while paying the budget rate on the easy 90%.
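The blended figure is easy to recompute for other escalation rates. The sketch below is illustrative only: `blended_monthly_cost` and `break_even_escalation_rate` are hypothetical helpers, and the validator is assumed to cost nothing, which understates the real bill by whatever the validator costs to run.

```python
def blended_monthly_cost(budget_monthly: float, frontier_monthly: float,
                         escalation_rate: float) -> float:
    """Every filing goes through the budget model; a fraction is re-run
    on the frontier model. Validator cost is assumed to be zero."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return budget_monthly + escalation_rate * frontier_monthly


def break_even_escalation_rate(budget_monthly: float,
                               frontier_monthly: float) -> float:
    """Escalation rate at which two-stage costs the same as frontier-only."""
    return 1.0 - budget_monthly / frontier_monthly


FLASH_LITE = 161.70      # USD/month, from the table above
GEMINI_35_FLASH = 2614.50

for rate in (0.05, 0.10, 0.25):
    cost = blended_monthly_cost(FLASH_LITE, GEMINI_35_FLASH, rate)
    print(f"{rate:.0%} escalated: ${cost:,.2f}/month")

print(f"break-even: {break_even_escalation_rate(FLASH_LITE, GEMINI_35_FLASH):.1%}")
```

Under these assumptions the two-stage path stays cheaper than frontier-only until roughly 94% of filings are escalated, so the cost case rarely hinges on the exact escalation rate.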
Decision guidance
- Evaluate accuracy on your own filings first. A budget model that misreads a parenthetical "(loss)" is expensive in errors, not cheap.
- Price your real token shape. Filing size and output verbosity move the per-call cost more than the model choice does within a tier. Recompute the table above with your numbers in the Token Cost Optimizer.
- Two-stage the hard fields. A budget extractor with a frontier verifier on the contested subset beats running everything on either model alone.
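- Watch the validation rate. At a 0.85 validation rate, 15% of extractions are reworked. Lifting validation from 0.85 to 0.95 cuts the cost per validated extraction by about 10.5% (0.85 / 0.95 ≈ 0.895) on any model, roughly the saving from stepping down from Gemini 3.5 Flash to Gemini 2.5 Pro, without touching the model choice.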
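The validation-rate effect can be checked directly. This is a sketch that assumes cost per validated extraction is simply cost per filing divided by the validation rate, which matches the "cost per validated trade" values in the engine output; `cost_per_validated` is a hypothetical name.

```python
def cost_per_validated(cost_per_filing: float, validation_rate: float) -> float:
    """Spend per extraction that passes validation."""
    if not 0.0 < validation_rate <= 1.0:
        raise ValueError("validation_rate must be in (0, 1]")
    return cost_per_filing / validation_rate


flash_lite_per_filing = 0.01617  # USD, from the engine output

before = cost_per_validated(flash_lite_per_filing, 0.85)
after = cost_per_validated(flash_lite_per_filing, 0.95)
saving = 1.0 - after / before

print(f"0.85: ${before:.5f}  0.95: ${after:.5f}  saving: {saving:.1%}")
```

The relative saving depends only on the two validation rates, so it is the same for every model in the table.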
Connects to
- Token Cost Optimizer: the per-filing cost engine behind every figure here. Recompute at your own volume.
- The LLM-in-Finance Economics Report 2026: the full four-workload report this spoke feeds into.
- Cheapest LLM for SEC Filings 2026: the per-filing budget deep dive across all vendors.
- Gemini 3.5 Flash vs GPT-5.5 vs Claude Opus 4.7 for Finance Extraction 2026: the focused three-way extraction comparison.
References
1. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-26. https://ai.google.dev/gemini-api/docs/pricing
2. Anthropic. "Pricing." platform.claude.com, verified 2026-05-26. https://platform.claude.com/docs/en/about-claude/pricing
3. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-26. https://developers.openai.com/api/docs/pricing
Verified engine output
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 333.3333333333333 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-flash-lite |

| Output | Value |
|---|---|
| model › id | gemini-2-5-flash-lite |
| model › provider | |
| model › name | Gemini 2.5 Flash-Lite |
| model › input usd per mtoken | 0.1 |
| model › output usd per mtoken | 0.4 |
| model › context window | 1000000 |
| model › notes | Cheapest tier in this table; 1M context. |
| effective cost per call | 0.0154 |
| cost per idea | 0.01617 |
| cost per validated trade | 0.019023529411764706 |
| cost per day | 5.39 |
| cost per month | 161.7 |
| cost per year | 1967.35 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 333.3333333333333 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-flash |

| Output | Value |
|---|---|
| model › id | gemini-2-5-flash |
| model › provider | |
| model › name | Gemini 2.5 Flash |
| model › input usd per mtoken | 0.3 |
| model › output usd per mtoken | 2.5 |
| model › context window | 1000000 |
| model › notes | Fast mid-tier; 1M context. |
| effective cost per call | 0.054 |
| cost per idea | 0.0567 |
| cost per validated trade | 0.06670588235294118 |
| cost per day | 18.9 |
| cost per month | 567 |
| cost per year | 6898.499999999999 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 333.3333333333333 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | claude-haiku-4-5 |

| Output | Value |
|---|---|
| model › id | claude-haiku-4-5 |
| model › provider | anthropic |
| model › name | Claude Haiku 4.5 |
| model › input usd per mtoken | 1 |
| model › output usd per mtoken | 5 |
| model › cache write usd per mtoken | 1.25 |
| model › cache read usd per mtoken | 0.1 |
| model › context window | 200000 |
| model › notes | Fast, cheap — filtering + pre-processing layers. |
| effective cost per call | 0.1132 |
| cost per idea | 0.11886 |
| cost per validated trade | 0.13983529411764706 |
| cost per day | 39.62 |
| cost per month | 1188.6 |
| cost per year | 14461.3 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 333.3333333333333 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gpt-5-mini |

| Output | Value |
|---|---|
| model › id | gpt-5-mini |
| model › provider | openai |
| model › name | GPT-5.4 mini |
| model › input usd per mtoken | 0.75 |
| model › output usd per mtoken | 4.5 |
| model › context window | 256000 |
| model › notes | Mid-tier OpenAI (GPT-5.4 mini). |
| effective cost per call | 0.1245 |
| cost per idea | 0.130725 |
| cost per validated trade | 0.15379411764705883 |
| cost per day | 43.575 |
| cost per month | 1307.25 |
| cost per year | 15904.875000000002 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 333.3333333333333 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-pro |

| Output | Value |
|---|---|
| model › id | gemini-2-5-pro |
| model › provider | |
| model › name | Gemini 2.5 Pro |
| model › input usd per mtoken | 1.25 |
| model › output usd per mtoken | 10 |
| model › context window | 2000000 |
| model › notes | Large context (2M). Strong on document analysis. |
| effective cost per call | 0.2225 |
| cost per idea | 0.23362500000000003 |
| cost per validated trade | 0.27485294117647063 |
| cost per day | 77.875 |
| cost per month | 2336.25 |
| cost per year | 28424.375 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 333.3333333333333 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-3-5-flash |

| Output | Value |
|---|---|
| model › id | gemini-3-5-flash |
| model › provider | |
| model › name | Gemini 3.5 Flash |
| model › input usd per mtoken | 1.5 |
| model › output usd per mtoken | 9 |
| model › context window | 1000000 |
| model › notes | Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash). |
| effective cost per call | 0.249 |
| cost per idea | 0.26145 |
| cost per validated trade | 0.30758823529411766 |
| cost per day | 87.15 |
| cost per month | 2614.5 |
| cost per year | 31809.750000000004 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 333.3333333333333 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | claude-opus-4-7 |

| Output | Value |
|---|---|
| model › id | claude-opus-4-7 |
| model › provider | anthropic |
| model › name | Claude Opus 4.7 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 25 |
| model › cache write usd per mtoken | 6.25 |
| model › cache read usd per mtoken | 0.5 |
| model › context window | 1000000 |
| model › notes | Flagship reasoning model — 1M context. |
| effective cost per call | 0.5660000000000001 |
| cost per idea | 0.5943 |
| cost per validated trade | 0.6991764705882354 |
| cost per day | 198.1 |
| cost per month | 5943 |
| cost per year | 72306.5 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 333.3333333333333 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gpt-5 |

| Output | Value |
|---|---|
| model › id | gpt-5 |
| model › provider | openai |
| model › name | GPT-5.5 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 30 |
| model › context window | 400000 |
| model › notes | OpenAI frontier model (GPT-5.5). |
| effective cost per call | 0.8300000000000001 |
| cost per idea | 0.8715000000000002 |
| cost per validated trade | 1.0252941176470591 |
| cost per day | 290.50000000000006 |
| cost per month | 8715.000000000002 |
| cost per year | 106032.50000000001 |
Computed live at build time.
Frequently asked questions
- What is the cheapest LLM for SEC 10-K extraction at 10,000 filings a month?
- Gemini 2.5 Flash-Lite at $161.70 per month, computed from the Token Cost Optimizer on a 130k-input, 6k-output filing shape. It is 3.5x cheaper than Gemini 2.5 Flash ($567.00) and 16x cheaper than the cheapest frontier model, Gemini 3.5 Flash ($2,614.50).
- How much does GPT-5.5 cost to extract 10,000 10-Ks a month?
- $8,715.00 per month at the verified $5/$30 per Mtok rate on a 130k-input, 6k-output filing shape, the most expensive single-model path at this scale and 54x the cost of Gemini 2.5 Flash-Lite.
- Should I use a frontier model for SEC filing extraction at scale?
- Only for the subset that needs reasoning. The cost-optimal pattern is a two-stage pipeline: run all 10,000 filings through Gemini 2.5 Flash-Lite ($161.70/month), then escalate the roughly 10% that fail validation to a frontier model, for a blended bill near $423/month instead of $2,614.50 to run everything on Gemini 3.5 Flash.
- Are these accuracy rankings?
- No. Every figure is a cost number computed from verified vendor list prices and recomputed by CI against the shipped cost engine. No model was tested for extraction accuracy. Run your own eval on your own filings before relying on the cheapest tier.
- Why does the monthly cost scale linearly with filing count?
- Each filing is one independent extraction call, so cost scales linearly with volume. That is why a per-filing difference of about $0.86 becomes a $104,000-a-year decision between Gemini 2.5 Flash-Lite and GPT-5.5 at 10,000 filings a month.