Cheapest LLM for SEC 10-K Extraction at 10,000 Filings a Month 2026

The short answer

At 10,000 filings a month, the cheapest viable LLM for 10-K extraction is Gemini 2.5 Flash-Lite at $161.70/month¹, computed live from the Token Cost Optimizer on a 130k-input, 6k-output filing shape. The next rung, Gemini 2.5 Flash, is $567.00/month. Every frontier model runs into four figures a month: Gemini 3.5 Flash $2,614.50, Claude Opus 4.8 $5,943.00², GPT-5.5 $8,715.00³. At this volume the model tier is not a preference; it is the budget.

TL;DR

Model	Cost / filing	Cost / month (10,000 filings)	Cost / year
Gemini 2.5 Flash-Lite	$0.0162	$161.70	$1,967.35
Gemini 2.5 Flash	$0.0567	$567.00	$6,898.50
Claude Haiku 4.5	$0.1189	$1,188.60	$14,461.30
GPT-5.4 mini	$0.1307	$1,307.25	$15,904.88
Gemini 2.5 Pro	$0.2336	$2,336.25	$28,424.38
Gemini 3.5 Flash	$0.2615	$2,614.50	$31,809.75
Claude Opus 4.8	$0.5943	$5,943.00	$72,306.50
GPT-5.5	$0.8715	$8,715.00	$106,032.50

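Same workload for every row: 130,000 input + 6,000 output tokens per filing, one call per filing, a 5% retry rate, a 0.85 validation rate, and 10,000 filings a month (the engine run sets ideas_per_day to 333.33 over a 30-day month, so the monthly figure is exactly 10,000 filings). Cost / year is the daily cost times 365, so it is slightly more than 12 times the monthly figure. The retry rate is included in every column; the validation rate only affects the engine's cost per validated extraction, not the per-filing, monthly, or yearly figures. Anthropic input reflects a 40% cache hit; Google and OpenAI are priced at full list input.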
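
The arithmetic behind the table can be reproduced in a few lines. The sketch below is a reconstruction from the published inputs and outputs in the engine section, not the Token Cost Optimizer's own code; the `Price` class and the function names are hypothetical. It assumes cached input tokens are billed at the cache-read rate where one is listed, and it ignores cache-write charges, since the published Anthropic figures are reproduced without them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    """List prices in USD per million tokens (hypothetical container)."""
    input_usd: float
    output_usd: float
    cache_read_usd: Optional[float] = None  # None: no cache discount modeled


def cost_per_filing(price: Price, input_tokens: int = 130_000,
                    output_tokens: int = 6_000, retry_rate: float = 0.05,
                    cache_hit_rate: float = 0.4) -> float:
    """One extraction call per filing, inflated by the retry rate."""
    if price.cache_read_usd is None:
        # Full list price on every input token.
        input_cost = input_tokens * price.input_usd
    else:
        # Assumption: the cached share is billed at the cache-read rate.
        cached = input_tokens * cache_hit_rate
        input_cost = ((input_tokens - cached) * price.input_usd
                      + cached * price.cache_read_usd)
    per_call = (input_cost + output_tokens * price.output_usd) / 1_000_000
    return per_call * (1 + retry_rate)


def monthly_cost(price: Price, filings_per_month: int = 10_000) -> float:
    return cost_per_filing(price) * filings_per_month


flash_lite = Price(input_usd=0.10, output_usd=0.40)
opus = Price(input_usd=5.0, output_usd=25.0, cache_read_usd=0.50)
gpt_5_5 = Price(input_usd=5.0, output_usd=30.0)

for name, price in [("Gemini 2.5 Flash-Lite", flash_lite),
                    ("Claude Opus 4.8", opus),
                    ("GPT-5.5", gpt_5_5)]:
    print(f"{name}: ${monthly_cost(price):,.2f}/month")
```

With the list prices from the engine output, this reproduces the $161.70, $5,943.00, and $8,715.00 rows. Swapping in your own `input_tokens` and `output_tokens` is the quickest way to see how far your filing shape moves the bill.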

Why 10,000 filings a month is the number that matters

A single filing's cost is a rounding error: even GPT-5.5, the most expensive model here, costs $0.87 for one 10-K. The decision feels free, so teams pick on familiarity. At 10,000 filings a month, the same per-filing rate compounds into a $1,967/year versus $106,032/year decision: a $104,000 annual gap for the identical token shape. Volume is what turns a per-call rounding error into a line item that needs a sign-off.
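
The annual figures follow from the engine's daily cost rather than from 12 times the monthly cost. A minimal sketch of that conversion, assuming a 30-day month and a 365-day year as implied by the engine output (the constant and function names are illustrative):

```python
DAYS_PER_MONTH = 30   # 333.33 filings/day x 30 days = 10,000 filings
DAYS_PER_YEAR = 365


def yearly_from_monthly(monthly_usd: float) -> float:
    """Convert a 30-day monthly cost into a 365-day yearly cost."""
    return monthly_usd / DAYS_PER_MONTH * DAYS_PER_YEAR


flash_lite_year = yearly_from_monthly(161.70)
gpt_5_5_year = yearly_from_monthly(8715.00)
gap = gpt_5_5_year - flash_lite_year

print(f"${flash_lite_year:,.2f} vs ${gpt_5_5_year:,.2f}, gap ${gap:,.2f}")
# prints "$1,967.35 vs $106,032.50, gap $104,065.15"
```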

10,000 filings a month is a realistic figure for a market-wide nightly sweep. The US has roughly 6,000 to 8,000 reporting companies filing 10-Ks, 10-Qs, and 8-Ks; a pipeline that processes every new filing for a broad universe, re-extracts on amendments, and back-fills history lands in this range fast.

The cheapest viable model: Gemini 2.5 Flash-Lite

At $161.70/month, Gemini 2.5 Flash-Lite is 3.5x cheaper than the next rung (Gemini 2.5 Flash at $567.00) and 16x cheaper than the cheapest frontier model (Gemini 3.5 Flash at $2,614.50). Its 1M context window fits a full 10-K body in a single pass, so there is no chunking penalty for context fit.
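
Context fit can be checked before pricing anything. The sketch below uses the context windows listed in the engine output and assumes, conservatively, that the output budget counts against the window; vendors differ on this, so treat the helper names and that rule as assumptions.

```python
# Context windows in tokens, copied from the engine output in this article.
CONTEXT_WINDOWS = {
    "gemini-2-5-flash-lite": 1_000_000,
    "gemini-2-5-flash": 1_000_000,
    "claude-haiku-4-5": 200_000,
    "gpt-5-mini": 256_000,
    "gemini-2-5-pro": 2_000_000,
    "gemini-3-5-flash": 1_000_000,
    "claude-opus-4-8": 1_000_000,
    "gpt-5": 400_000,
}


def headroom(model_id: str, input_tokens: int = 130_000,
             output_tokens: int = 6_000) -> int:
    """Tokens left in the window after one filing plus its output."""
    return CONTEXT_WINDOWS[model_id] - (input_tokens + output_tokens)


def fits_single_pass(model_id: str, input_tokens: int = 130_000,
                     output_tokens: int = 6_000) -> bool:
    return headroom(model_id, input_tokens, output_tokens) >= 0


for model_id in CONTEXT_WINDOWS:
    print(model_id, fits_single_pass(model_id), headroom(model_id))
```

At the 136k-token shape used here, every model in the table fits in one pass; the smaller windows simply leave less room for unusually long filings.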

The word "viable" is doing real work. Flash-Lite is the cheapest model that can hold a full filing in context; it is not automatically the most accurate. For structural extraction, pulling line items, dates, totals, and standard disclosures where the layout is regular, the budget tier is usually enough, and the 16x premium over Flash-Lite buys little. For extraction that requires reasoning across footnotes, reconciling a restatement, or resolving an ambiguous segment disclosure, a frontier model may earn its cost in fewer downstream errors. That is an accuracy question this cost analysis does not answer.

The frontier tier is a four-figure monthly commitment

At 10,000 filings a month, every frontier model costs thousands of dollars a month:

Gemini 3.5 Flash: $2,614.50/month. The cheapest frontier pick, at Flash latency. Worth it only when the extraction genuinely needs agent-tier reasoning.
Gemini 2.5 Pro: $2,336.25/month. Not labeled frontier in the engine output, but listed for comparison: it has the largest context window here (2M) and costs slightly less than Gemini 3.5 Flash.
Claude Opus 4.8: $5,943.00/month, even with a 40% input cache hit applied.
GPT-5.5: $8,715.00/month, the most expensive single-model path at this scale.

A team standardized on Opus or GPT-5.5 is paying a 2.3x to 3.3x premium over Gemini 3.5 Flash, or 37x to 54x over Flash-Lite, for the same 10,000 extractions.
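
The multiples quoted above are plain ratios of the monthly figures in the table. A short sketch to recompute them (the dictionary and function names are illustrative):

```python
# Monthly cost in USD at 10,000 filings, copied from the table above.
MONTHLY_USD = {
    "Gemini 2.5 Flash-Lite": 161.70,
    "Gemini 3.5 Flash": 2614.50,
    "Claude Opus 4.8": 5943.00,
    "GPT-5.5": 8715.00,
}


def premium(model: str, baseline: str) -> float:
    """How many times more the model costs than the baseline."""
    return MONTHLY_USD[model] / MONTHLY_USD[baseline]


for model in ("Claude Opus 4.8", "GPT-5.5"):
    vs_frontier = premium(model, "Gemini 3.5 Flash")
    vs_budget = premium(model, "Gemini 2.5 Flash-Lite")
    print(f"{model}: {vs_frontier:.1f}x Gemini 3.5 Flash, "
          f"{vs_budget:.0f}x Flash-Lite")
```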

The two-stage path beats any single model

The cost-optimal architecture at this volume is rarely one model. Route all 10,000 filings through Flash-Lite ($161.70/month) for the structural pass, run a cheap validator on the output, and escalate only the small fraction that fails validation to a frontier model. If 10% of filings need a frontier re-pass on Gemini 3.5 Flash, that adds roughly $261/month (10% of $2,614.50), for a blended bill near $423/month, versus $2,614.50 to run every filing on the frontier model. The two-stage pipeline captures most of the frontier accuracy on the hard cases while paying the budget rate on the easy 90%.
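
The blended figure is simple enough to sanity-check, and the same formula shows how high the escalation rate can climb before the two-stage path stops saving money. This is a sketch under the article's assumptions: every filing pays the budget rate, the escalated fraction additionally pays the full frontier rate, and the validator itself costs nothing. The function names are illustrative.

```python
def blended_monthly_cost(budget_monthly: float, frontier_monthly: float,
                         escalation_rate: float) -> float:
    """All filings run on the budget model; a fraction is re-run on frontier."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return budget_monthly + escalation_rate * frontier_monthly


def break_even_escalation_rate(budget_monthly: float,
                               frontier_monthly: float) -> float:
    """Escalation rate at which two-stage costs the same as frontier-only."""
    return 1.0 - budget_monthly / frontier_monthly


blended = blended_monthly_cost(161.70, 2614.50, escalation_rate=0.10)
break_even = break_even_escalation_rate(161.70, 2614.50)

print(f"blended: ${blended:,.2f}/month")          # prints "blended: $423.15/month"
print(f"break-even escalation: {break_even:.1%}")  # prints "break-even escalation: 93.8%"
```

Under these assumptions the two-stage pipeline stays cheaper than frontier-only until roughly 94% of filings need escalation, so the saving is robust to a validator that is considerably stricter than the 10% assumed here.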

Decision guidance

Evaluate accuracy on your own filings first. A budget model that misreads a parenthetical "(loss)" is expensive in errors, not cheap.
Price your real token shape. Filing size and output verbosity move the per-call cost more than the model choice does within a tier. Recompute the table above with your numbers in the Token Cost Optimizer.
Two-stage the hard fields. A budget extractor with a frontier verifier on the contested subset beats running everything on either model alone.
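Watch the validation rate. At a 0.85 validation rate, 15% of extractions fail validation and are reworked. Lifting validation from 0.85 to 0.95 cuts the cost per validated extraction by about 10.5%, which is comparable to stepping down between adjacent rows such as Gemini 3.5 Flash to Gemini 2.5 Pro (10.6%) or GPT-5.4 mini to Claude Haiku 4.5 (9.1%), without changing models.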
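
The validation-rate effect can be worked through directly. In the engine output, the cost per validated extraction equals the per-filing cost divided by the validation rate; the sketch below applies that relationship, with illustrative function names.

```python
def cost_per_validated(cost_per_filing_usd: float,
                       validation_rate: float) -> float:
    """Per-filing cost spread over only the extractions that pass validation."""
    if not 0.0 < validation_rate <= 1.0:
        raise ValueError("validation_rate must be in (0, 1]")
    return cost_per_filing_usd / validation_rate


def saving_from_lift(old_rate: float, new_rate: float) -> float:
    """Fractional drop in cost per validated extraction when the rate rises."""
    return 1.0 - old_rate / new_rate


# Gemini 2.5 Flash-Lite per-filing cost from the engine output.
at_085 = cost_per_validated(0.01617, 0.85)
at_095 = cost_per_validated(0.01617, 0.95)

print(f"${at_085:.5f} -> ${at_095:.5f} per validated extraction")
print(f"saving: {saving_from_lift(0.85, 0.95):.1%}")  # prints "saving: 10.5%"
```

The saving is independent of the model: it depends only on the two validation rates, so it stacks with whatever tier you choose.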

Connects to

Token Cost Optimizer: the per-filing cost engine behind every figure here. Recompute at your own volume.
The LLM-in-Finance Economics Report 2026: the full four-workload report this spoke feeds into.
Cheapest LLM for SEC Filings 2026: the per-filing budget deep dive across all vendors.
Gemini 3.5 Flash vs GPT-5.5 vs Claude Opus 4.8 for Finance Extraction 2026: the focused three-way extraction comparison.

References

1. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-26. https://ai.google.dev/gemini-api/docs/pricing
2. Anthropic. "Pricing." platform.claude.com, verified 2026-06-18. https://platform.claude.com/docs/en/about-claude/pricing
3. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-26. https://developers.openai.com/api/docs/pricing

Verified engine output

10,000 filings/month — Gemini 2.5 Flash-Lite (cheapest viable)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	333.3333333333333
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-flash-lite

Result
model › id	gemini-2-5-flash-lite
model › provider	google
model › name	Gemini 2.5 Flash-Lite
model › input usd per mtoken	0.1
model › output usd per mtoken	0.4
model › context window	1000000
model › notes	Cheapest tier in this table; 1M context.
effective cost per call	0.0154
cost per idea	0.01617
cost per validated trade	0.019023529411764706
cost per day	5.39
cost per month	161.7
cost per year	1967.35

Computed live at build time.

10,000 filings/month — Gemini 2.5 Flash

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	333.3333333333333
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-flash

Result
model › id	gemini-2-5-flash
model › provider	google
model › name	Gemini 2.5 Flash
model › input usd per mtoken	0.3
model › output usd per mtoken	2.5
model › context window	1000000
model › notes	Fast mid-tier; 1M context.
effective cost per call	0.054
cost per idea	0.0567
cost per validated trade	0.06670588235294118
cost per day	18.9
cost per month	567
cost per year	6898.499999999999

10,000 filings/month — Claude Haiku 4.5

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	333.3333333333333
validation_rate	0.85
cache_hit_rate	0.4
model_id	claude-haiku-4-5

Result
model › id	claude-haiku-4-5
model › provider	anthropic
model › name	Claude Haiku 4.5
model › input usd per mtoken	1
model › output usd per mtoken	5
model › cache write usd per mtoken	1.25
model › cache read usd per mtoken	0.1
model › context window	200000
model › notes	Fast, cheap — filtering + pre-processing layers.
effective cost per call	0.1132
cost per idea	0.11886
cost per validated trade	0.13983529411764706
cost per day	39.62
cost per month	1188.6
cost per year	14461.3

10,000 filings/month — GPT-5.4 mini

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	333.3333333333333
validation_rate	0.85
cache_hit_rate	0.4
model_id	gpt-5-mini

Result
model › id	gpt-5-mini
model › provider	openai
model › name	GPT-5.4 mini
model › input usd per mtoken	0.75
model › output usd per mtoken	4.5
model › context window	256000
model › notes	Mid-tier OpenAI (GPT-5.4 mini).
effective cost per call	0.1245
cost per idea	0.130725
cost per validated trade	0.15379411764705883
cost per day	43.575
cost per month	1307.25
cost per year	15904.875000000002

10,000 filings/month — Gemini 2.5 Pro (2M context)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	333.3333333333333
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-pro

Result
model › id	gemini-2-5-pro
model › provider	google
model › name	Gemini 2.5 Pro
model › input usd per mtoken	1.25
model › output usd per mtoken	10
model › context window	2000000
model › notes	Large context (2M). Strong on document analysis.
effective cost per call	0.2225
cost per idea	0.23362500000000003
cost per validated trade	0.27485294117647063
cost per day	77.875
cost per month	2336.25
cost per year	28424.375

10,000 filings/month — Gemini 3.5 Flash (cheapest frontier)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	333.3333333333333
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-3-5-flash

Result
model › id	gemini-3-5-flash
model › provider	google
model › name	Gemini 3.5 Flash
model › input usd per mtoken	1.5
model › output usd per mtoken	9
model › context window	1000000
model › notes	Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call	0.249
cost per idea	0.26145
cost per validated trade	0.30758823529411766
cost per day	87.15
cost per month	2614.5
cost per year	31809.750000000004

10,000 filings/month — Claude Opus 4.8 (40% cache hit on input)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	333.3333333333333
validation_rate	0.85
cache_hit_rate	0.4
model_id	claude-opus-4-8

Result
model › id	claude-opus-4-8
model › provider	anthropic
model › name	Claude Opus 4.8
model › input usd per mtoken	5
model › output usd per mtoken	25
model › cache write usd per mtoken	6.25
model › cache read usd per mtoken	0.5
model › context window	1000000
model › notes	Flagship reasoning model — 1M context.
effective cost per call	0.5660000000000001
cost per idea	0.5943
cost per validated trade	0.6991764705882354
cost per day	198.1
cost per month	5943
cost per year	72306.5

10,000 filings/month — GPT-5.5 (premium)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	333.3333333333333
validation_rate	0.85
cache_hit_rate	0.4
model_id	gpt-5

Result
model › id	gpt-5
model › provider	openai
model › name	GPT-5.5
model › input usd per mtoken	5
model › output usd per mtoken	30
model › context window	400000
model › notes	OpenAI frontier model (GPT-5.5).
effective cost per call	0.8300000000000001
cost per idea	0.8715000000000002
cost per validated trade	1.0252941176470591
cost per day	290.50000000000006
cost per month	8715.000000000002
cost per year	106032.50000000001

Frequently asked questions

What is the cheapest LLM for SEC 10-K extraction at 10,000 filings a month?
Gemini 2.5 Flash-Lite at $161.70 per month, computed from the Token Cost Optimizer on a 130k-input, 6k-output filing shape. It is 3.5x cheaper than Gemini 2.5 Flash ($567.00) and 16x cheaper than the cheapest frontier model, Gemini 3.5 Flash ($2,614.50).

How much does GPT-5.5 cost to extract 10,000 10-Ks a month?
$8,715.00 per month at the verified $5/$30 per Mtok rate on a 130k-input, 6k-output filing shape. That makes it the most expensive single-model path at this scale, at 54x the cost of Gemini 2.5 Flash-Lite.

Should I use a frontier model for SEC filing extraction at scale?
Only for the subset that needs reasoning. The cost-optimal pattern is a two-stage pipeline: run all 10,000 filings through Gemini 2.5 Flash-Lite ($161.70/month), then escalate the roughly 10% that fail validation to a frontier model, for a blended bill near $423/month instead of $2,614.50 to run everything on Gemini 3.5 Flash.

Are these accuracy rankings?
No. Every figure is a cost number computed from verified vendor list prices and recomputed by CI against the shipped cost engine. No model was tested for extraction accuracy. Run your own eval on your own filings before relying on the cheapest tier.

Why does the monthly cost scale linearly with filing count?
Each filing is one independent extraction call, so cost scales linearly with volume. That is why a per-filing difference of about $0.86 becomes a $104,000-a-year decision between Gemini 2.5 Flash-Lite and GPT-5.5 at 10,000 filings a month.
