Gemini 3.5 Flash vs GPT-5.5 vs Claude Opus 4.8 for Finance Extraction 2026

The short answer

For 10-K extraction in 2026, the three-way pick among Gemini 3.5 Flash, GPT-5.5, and Claude Opus 4.8 is not close on price. On a full-filing extraction workload (130k input + 6k output per filing, 30 filings/day) the Token Cost Optimizer prices Gemini 3.5 Flash at $0.2490/call and $235.31/month¹, against GPT-5.5 at $0.8300/call ($784.35/mo)² and Claude Opus 4.8 at $0.5660/call ($534.87/mo)³. The honest caveat: Gemini 3.5 Flash is the cheapest of those three, but it is still ~16x the genuine budget tier, Gemini 2.5 Flash-Lite ($0.0154/call), and ~4.6x Gemini 2.5 Flash. Frontier intelligence at Flash speed, not budget pricing. Every figure below is recomputed live from the shipped engine bundle.

TL;DR

Model	$/Mtok in	$/Mtok out	Cost / filing	Cost / validated	Cost / month
Gemini 2.5 Flash-Lite	$0.10	$0.40	$0.0154	$0.0190	$14.55
Gemini 2.5 Flash	$0.30	$2.50	$0.0540	$0.0667	$51.03
Gemini 2.5 Pro	$1.25	$10.00	$0.2225	$0.2749	$210.26
Gemini 3.5 Flash	$1.50	$9.00	$0.2490	$0.3076	$235.31
Claude Opus 4.8	$5.00	$25.00	$0.5660	$0.6992	$534.87
GPT-5.5	$5.00	$30.00	$0.8300	$1.0253	$784.35

Same workload for every row: 130,000 input + 6,000 output tokens per filing, one call per filing, 5% retry, 30 filings/day, 0.85 validation rate, 0.40 cache-hit assumption. Per-call and monthly costs are the engine's own output on each model's verified list rate, not a benchmark run.
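The "Cost / filing" column for the uncached rows can be reproduced from the two list rates alone. The sketch below assumes the per-call cost is simply tokens times list rate; that formula is inferred from the published figures, not taken from the engine's source, and the function and variable names are illustrative. Claude Opus 4.8 is left out because its figure includes a cache-hit adjustment.

```python
# List rates in USD per million tokens, copied from the table above.
RATES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "GPT-5.5": (5.00, 30.00),
}

INPUT_TOKENS = 130_000   # filing plus extraction schema
OUTPUT_TOKENS = 6_000    # structured field dump


def cost_per_call(in_rate, out_rate,
                  input_tokens=INPUT_TOKENS, output_tokens=OUTPUT_TOKENS):
    """Uncached per-call cost in USD (assumed formula: tokens x list rate)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


for name, (in_rate, out_rate) in RATES.items():
    print(f"{name:<24} ${cost_per_call(in_rate, out_rate):.4f}")
```

Rounded to four decimals, each result matches the corresponding table row.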

The 10-K extraction scenario

A full 10-K body lands around 100k-150k tokens. The workload here pins 130k input (the filing plus a fixed extraction schema) and 6k output (a structured field dump), one call per filing, 30 filings a day, i.e. a market-wide nightly sweep. The validation rate of 0.85 means 85% of extractions clear the downstream check and become usable, so dividing by 0.85 marks up the retry-adjusted cost by ~18%. Combined with the 5% retry rate, the cost-per-validated figure lands ~24% above the raw call cost ($0.3076 against $0.2490 for Gemini 3.5 Flash).
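The step from a raw per-call cost to the per-validated and monthly figures can be sketched as follows. The formulas are inferred from the published engine output (they reproduce it to the cent), not read from the engine's source; the 30-day month and 365-day year are likewise implied by the published numbers, and `roll_up` is an illustrative name.

```python
def roll_up(cost_per_call, calls_per_idea=1, retry_rate=0.05,
            ideas_per_day=30, validation_rate=0.85):
    """Roll a per-call cost up to per-idea, per-validated, and period costs."""
    per_idea = cost_per_call * calls_per_idea * (1 + retry_rate)  # retries add 5%
    per_validated = per_idea / validation_rate  # only 85% of results are usable
    per_day = per_idea * ideas_per_day
    return {
        "per_idea": per_idea,
        "per_validated": per_validated,
        "per_day": per_day,
        "per_month": per_day * 30,   # assumed 30-day month
        "per_year": per_day * 365,   # assumed 365-day year
    }


flash = roll_up(0.2490)  # Gemini 3.5 Flash per-call cost from the table
markup = flash["per_validated"] / 0.2490 - 1
print(f"per validated: ${flash['per_validated']:.4f}")
print(f"per month:     ${flash['per_month']:.2f}")
print(f"markup over raw call cost: {markup:.1%}")
```

Validation alone multiplies the cost by 1/0.85, roughly 1.18x; with the 5% retry rate included, the per-validated figure is 1.05/0.85, roughly 1.24x the raw call cost.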

The cache-hit assumption (0.40) matters for one family only. The Token Cost Optimizer applies cache pricing to Anthropic models (cache reads at 0.1x base input); for Google and OpenAI the engine prices input at the full list rate. That is a deliberate, conservative modeling choice, and it is why Claude Opus 4.8's $0.5660/call already reflects a 40% cache hit on its input. Without caching, the same list rates give $0.80/call (130k input at $5.00/Mtok plus 6k output at $25.00/Mtok), just under GPT-5.5.
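A minimal sketch of the cache adjustment, assuming the cached share of input is billed at the cache-read rate and the remainder at the base input rate. This blend is inferred from the published Opus figure rather than taken from the engine's source. It ignores the cache-write rate ($6.25/Mtok in the engine output), and the published $0.5660 is reproduced without it.

```python
def cached_cost_per_call(input_tokens, output_tokens, in_rate, out_rate,
                         cache_read_rate, cache_hit_rate):
    """Per-call cost in USD with a share of input billed at the cache-read rate."""
    cached = input_tokens * cache_hit_rate   # tokens served from cache
    uncached = input_tokens - cached         # tokens billed at full input rate
    input_cost = uncached * in_rate + cached * cache_read_rate
    return (input_cost + output_tokens * out_rate) / 1_000_000


# Claude Opus 4.8 rates from the engine output: $5 in, $25 out, $0.50 cache read.
opus_cached = cached_cost_per_call(130_000, 6_000, 5.00, 25.00, 0.50, 0.40)
opus_uncached = cached_cost_per_call(130_000, 6_000, 5.00, 25.00, 0.50, 0.0)

print(f"40% cache hit: ${opus_cached:.4f}")
print(f"no cache:      ${opus_uncached:.4f}")
```

Setting `cache_hit_rate` to 0 shows what Opus would cost under the same full-list-input treatment applied to the Google and OpenAI rows.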

Gemini 3.5 Flash is the cheapest of the three frontier picks

Among the three headline models, the ranking is unambiguous: Gemini 3.5 Flash ($0.2490/call) beats Claude Opus 4.8 ($0.5660/call, 2.3x the cost) and GPT-5.5 ($0.8300/call, 3.3x the cost). On a 30-filing-a-day sweep that is the difference between $235/mo and $784/mo, a $549/mo gap for the same token shape.

If your extraction needs agent-tier reasoning (chasing a figure across a footnote, reconciling a restatement, resolving an ambiguous segment disclosure) and you want it at Flash latency, Gemini 3.5 Flash is the value pick of the frontier tier. That is the genuine win here, and it is worth stating plainly.

But it is not the budget pick

Now the honest part. The same engine, same workload, prices Gemini 2.5 Flash-Lite at $0.0154/call and Gemini 2.5 Flash at $0.0540/call. Gemini 3.5 Flash is ~16x Flash-Lite and ~4.6x Flash on this filing. For pure field extraction — pulling line items, dates, and totals where the structure is regular and the model does not need to reason — that 16x premium buys very little. The "cheap Gemini for filings" story is Flash-Lite, and it has the same 1M context window that swallows a full 10-K whole.

Note too that Gemini 3.5 Flash ($0.2490/call) and Gemini 2.5 Pro ($0.2225/call) sit within 12% of each other. Gemini 3.5 Flash's edge over Pro is latency, not price.

Google's benchmark claim is Google's claim

At the I/O launch, Google said Gemini 3.5 Flash beats Gemini 3.1 Pro on coding and agentic benchmarks. On extraction accuracy for financial filings specifically, there was no independent third-party eval at launch, and we have run none. The numbers in this article are cost numbers, computed from list prices; they say nothing about which model reads a 10-K most accurately. The defensible workflow is unchanged: run a small accuracy eval on your own filings, pick the cheapest model that clears your bar, and do not let a vendor benchmark stand in for your own.
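The "cheapest model that clears your bar" step can be sketched as below. Everything in the usage portion is a toy placeholder: the candidate names, the gold fields, and the predictions are invented only to exercise the functions and are not measurements of any real model. Exact-match field accuracy is one possible metric, chosen here for simplicity.

```python
def field_accuracy(predicted, gold):
    """Share of gold fields whose predicted value matches exactly."""
    if not gold:
        raise ValueError("gold set is empty")
    hits = sum(1 for key, value in gold.items() if predicted.get(key) == value)
    return hits / len(gold)


def cheapest_passing(cost_per_call, accuracy, min_accuracy):
    """Return the cheapest model whose accuracy meets the bar, or None."""
    passing = [m for m in cost_per_call if accuracy.get(m, 0.0) >= min_accuracy]
    return min(passing, key=cost_per_call.get) if passing else None


# Toy gold labels and predictions (hypothetical, not real eval results).
gold = {"total_revenue": "1000", "net_income": "(50)", "fiscal_year_end": "2025-12-31"}
predictions = {
    "candidate_cheap": {**gold, "net_income": "50"},  # drops the loss parentheses
    "candidate_mid": dict(gold),
    "candidate_frontier": dict(gold),
}
COST = {"candidate_cheap": 0.0154, "candidate_mid": 0.0540, "candidate_frontier": 0.2490}

accuracy = {name: field_accuracy(pred, gold) for name, pred in predictions.items()}
print(cheapest_passing(COST, accuracy, min_accuracy=1.0))
```

In practice the gold set would be a hand-checked sample of your own filings, and the bar would be set by how costly a misread field is downstream.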

Where each model is the right pick

Pure field extraction at volume: Gemini 2.5 Flash-Lite ($0.0154/call). The cheapest correct path when the task is structural, not reasoning-heavy.
Extraction that needs agent-tier reasoning, at Flash latency: Gemini 3.5 Flash ($0.2490/call). The frontier value pick; budget ~$235/mo for the sweep above.
Maximum extraction accuracy with a 2M window, latency not critical: Gemini 2.5 Pro ($0.2225/call), at near-identical cost to Gemini 3.5 Flash.
You are already standardized on Claude or OpenAI: Opus 4.8 ($0.5660/call) or GPT-5.5 ($0.8300/call). Pay the premium for the vendor relationship, or route only the hard subset to them.
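Routing only the hard subset to a premium model gives a blended per-filing cost that is a weighted average of the two per-call costs. The sketch below assumes the decision about which filings are hard costs nothing, and the 10% hard fraction is a hypothetical value chosen for illustration, not a figure from this analysis.

```python
def blended_cost_per_filing(cheap_cost, premium_cost, hard_fraction):
    """Average per-filing cost when a fraction of filings goes to the premium model."""
    if not 0.0 <= hard_fraction <= 1.0:
        raise ValueError("hard_fraction must be between 0 and 1")
    return (1 - hard_fraction) * cheap_cost + hard_fraction * premium_cost


FLASH_LITE = 0.0154   # Gemini 2.5 Flash-Lite per-call cost from the table
OPUS = 0.5660         # Claude Opus 4.8 per-call cost from the table

blended = blended_cost_per_filing(FLASH_LITE, OPUS, hard_fraction=0.10)
print(f"blended cost per filing: ${blended:.4f}")
```

With these inputs the blend is about $0.0705 per filing, which shows how sensitive the result is to the hard fraction: every extra percentage point routed to Opus adds roughly half a cent per filing.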

Decision guidance

Eval accuracy first. A budget model that misreads a parenthetical "(loss)" is expensive in errors, not cheap.
Match the tier to the task. Structural extraction → Flash-Lite. Reasoning-heavy extraction at speed → Gemini 3.5 Flash.
Two-stage the hard fields. A budget extractor feeding a frontier verifier on the contested fields often beats either model alone.
Price your real token shape. Filing size and output verbosity move the per-call cost more than the model choice does within a tier.
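To price a real token shape, the same tokens-times-rate arithmetic can be swept over a range of filing sizes and output lengths. This sketch uses Gemini 3.5 Flash list rates and the 100k-150k input range mentioned earlier; the 3k and 12k output sizes are hypothetical, and the uncached formula is an assumption inferred from the published figures.

```python
def cost_per_call(input_tokens, output_tokens, in_rate, out_rate):
    """Uncached per-call cost in USD (assumed formula: tokens x list rate)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


IN_RATE, OUT_RATE = 1.50, 9.00  # Gemini 3.5 Flash, USD per million tokens

grid = {
    (input_tokens, output_tokens): cost_per_call(input_tokens, output_tokens,
                                                 IN_RATE, OUT_RATE)
    for input_tokens in (100_000, 130_000, 150_000)
    for output_tokens in (3_000, 6_000, 12_000)   # hypothetical output sizes
}

for (input_tokens, output_tokens), cost in grid.items():
    print(f"{input_tokens:>7} in / {output_tokens:>6} out  ${cost:.4f}")
```

At 6k output, moving from a 100k-token to a 150k-token filing takes Gemini 3.5 Flash from about $0.204 to about $0.279 per call, a wider swing than its gap to Gemini 2.5 Pro ($0.2225) at the pinned 130k shape.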

Connects to

Token Cost Optimizer: the per-filing cost engine behind every number here.
Cheapest LLM for SEC Filings 2026: the budget-extraction deep dive across all vendors.
Best LLM for Financial Analysis 2026: the task-tiered pillar.
Claude vs GPT-5 vs Gemini for Financial Analysis 2026: the tier-vs-vendor framing in full.

References

1. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-25. https://ai.google.dev/gemini-api/docs/pricing
2. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-25. https://developers.openai.com/api/docs/pricing
3. Anthropic. "Pricing." platform.claude.com, verified 2026-05-25. https://platform.claude.com/docs/en/about-claude/pricing

Verified engine output

10-K extraction — Gemini 3.5 Flash (130k in + 6k out, 30 filings/day)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-3-5-flash

Result
model › id	gemini-3-5-flash
model › provider	google
model › name	Gemini 3.5 Flash
model › input usd per mtoken	1.5
model › output usd per mtoken	9
model › context window	1000000
model › notes	Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call	0.249
cost per idea	0.26145
cost per validated trade	0.30758823529411766
cost per day	7.843500000000001
cost per month	235.305
cost per year	2862.8775

Computed live at build time.

Same workload — GPT-5.5

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gpt-5

Result
model › id	gpt-5
model › provider	openai
model › name	GPT-5.5
model › input usd per mtoken	5
model › output usd per mtoken	30
model › context window	400000
model › notes	OpenAI frontier model (GPT-5.5).
effective cost per call	0.8300000000000001
cost per idea	0.8715000000000002
cost per validated trade	1.0252941176470591
cost per day	26.145000000000003
cost per month	784.3500000000001
cost per year	9542.925000000001

Computed live at build time.

Same workload — Claude Opus 4.8 (40% cache hit on input)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	claude-opus-4-8

Result
model › id	claude-opus-4-8
model › provider	anthropic
model › name	Claude Opus 4.8
model › input usd per mtoken	5
model › output usd per mtoken	25
model › cache write usd per mtoken	6.25
model › cache read usd per mtoken	0.5
model › context window	1000000
model › notes	Flagship reasoning model — 1M context.
effective cost per call	0.5660000000000001
cost per idea	0.5943
cost per validated trade	0.6991764705882354
cost per day	17.829
cost per month	534.87
cost per year	6507.585

Computed live at build time.

Same workload — Gemini 2.5 Pro

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-pro

Result
model › id	gemini-2-5-pro
model › provider	google
model › name	Gemini 2.5 Pro
model › input usd per mtoken	1.25
model › output usd per mtoken	10
model › context window	2000000
model › notes	Large context (2M). Strong on document analysis.
effective cost per call	0.2225
cost per idea	0.23362500000000003
cost per validated trade	0.27485294117647063
cost per day	7.008750000000001
cost per month	210.26250000000002
cost per year	2558.1937500000004

Computed live at build time.

Same workload — Gemini 2.5 Flash

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-flash

Result
model › id	gemini-2-5-flash
model › provider	google
model › name	Gemini 2.5 Flash
model › input usd per mtoken	0.3
model › output usd per mtoken	2.5
model › context window	1000000
model › notes	Fast mid-tier; 1M context.
effective cost per call	0.054
cost per idea	0.0567
cost per validated trade	0.06670588235294118
cost per day	1.701
cost per month	51.03
cost per year	620.865

Computed live at build time.

Same workload — Gemini 2.5 Flash-Lite (genuine budget tier)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-flash-lite

Result
model › id	gemini-2-5-flash-lite
model › provider	google
model › name	Gemini 2.5 Flash-Lite
model › input usd per mtoken	0.1
model › output usd per mtoken	0.4
model › context window	1000000
model › notes	Cheapest tier in this table; 1M context.
effective cost per call	0.0154
cost per idea	0.01617
cost per validated trade	0.019023529411764706
cost per day	0.48510000000000003
cost per month	14.553
cost per year	177.06150000000002

Computed live at build time.

Frequently asked questions

Which is cheapest for 10-K extraction: Gemini 3.5 Flash, GPT-5.5, or Opus 4.8?
Gemini 3.5 Flash at $0.2490 per call ($235.31 per month on a 30-filing/day sweep), against Opus 4.8 at $0.5660 per call and GPT-5.5 at $0.8300 per call. But Gemini 2.5 Flash-Lite ($0.0154 per call) is about 16x cheaper still if the task does not need agent-tier reasoning.

Is Gemini 3.5 Flash a budget model for filings?
No. It is the cheapest of the three frontier picks here, but about 16x Gemini 2.5 Flash-Lite and about 4.6x Gemini 2.5 Flash on the same filing. The budget tier is Flash-Lite; Gemini 3.5 Flash is frontier intelligence at Flash speed.

Why does Opus 4.8 look cheaper than its raw rate suggests?
The engine applies a 40% cache-hit on Anthropic input (cache reads at 0.1x base input). Google and OpenAI are priced at full list input here, a conservative modeling choice. Without caching, Opus would cost more.

Is Google's claim that 3.5 Flash beats 3.1 Pro verified for extraction?
No. That is Google's launch benchmark claim, not an independent finance-extraction eval, and none was run here. These are cost numbers from list prices; run your own accuracy eval before relying on a model.

Where do these extraction-cost numbers come from?
Each model's verified 2026-05-25 list rate, run through the Token Cost Optimizer on one fixed extraction workload. Recomputed from the shipped bundle, not a benchmark run.

What's the cheapest LLM for 10-K extraction under $100/month?
On a 30-filing/day sweep, only the two economy Gemini tiers come in under $100: Gemini 2.5 Flash-Lite at $14.55 per month and Gemini 2.5 Flash at $51.03 per month. Gemini 2.5 Pro ($210.26), Gemini 3.5 Flash ($235.31), Opus 4.8 ($534.87), and GPT-5.5 ($784.35) all run well over $100 at this volume.

Which Gemini tier should I use for SEC filing extraction?
For pure field extraction where the structure is regular, use Gemini 2.5 Flash-Lite: $14.55 per month on this sweep, with a 1M context that swallows a full 10-K. Step up to Gemini 3.5 Flash ($235.31 per month) only when the extraction needs agent-tier reasoning (reconciling a restatement, chasing a figure across a footnote) at Flash latency. The 3.5 Flash premium is about 16x Flash-Lite per filing.

Is Gemini 3.5 Flash worth it over GPT-5.5 for filing extraction at scale?
On cost, yes. Gemini 3.5 Flash is $235.31 per month against GPT-5.5's $784.35 per month on the same 30-filing/day sweep, a $549 per month gap for the same token shape. These are list-price cost numbers only; run your own accuracy eval before switching, since neither was independently benchmarked here for extraction accuracy.
