The short answer

For 10-K extraction in 2026, the three-way pick among Gemini 3.5 Flash, GPT-5.5, and Claude Opus 4.7 is not close on price. On a full-filing extraction workload (130k input + 6k output tokens per filing, 30 filings/day), the Token Cost Optimizer prices Gemini 3.5 Flash at $0.2490/call and $235.31/month [1], against GPT-5.5 at $0.8300/call ($784.35/mo) [2] and Claude Opus 4.7 at $0.5660/call ($534.87/mo) [3]. The honest caveat: Gemini 3.5 Flash is the cheapest of those three, but it is still ~16x the genuine budget tier, Gemini 2.5 Flash-Lite ($0.0154/call), and ~4.6x Gemini 2.5 Flash. Frontier intelligence at Flash speed, not budget pricing. Every figure below is recomputed live from the shipped engine bundle.

TL;DR

| Model | $/Mtok in | $/Mtok out | Cost / filing | Cost / validated | Cost / month |
| --- | --- | --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.0154 | $0.0190 | $14.55 |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.0540 | $0.0667 | $51.03 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.2225 | $0.2749 | $210.26 |
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.2490 | $0.3076 | $235.31 |
| Claude Opus 4.7 | $5.00 | $25.00 | $0.5660 | $0.6992 | $534.87 |
| GPT-5.5 | $5.00 | $30.00 | $0.8300 | $1.0253 | $784.35 |

Same workload for every row: 130,000 input + 6,000 output tokens per filing, one call per filing, 5% retry, 30 filings/day, 0.85 validation rate, 0.40 cache-hit assumption. Per-call and monthly costs are the engine's own output on each model's verified list rate, not a benchmark run.

The 10-K extraction scenario

A full 10-K body lands around 100k-150k tokens. The workload here pins 130k input (the filing plus a fixed extraction schema) and 6k output (a structured field dump), one call per filing, 30 filings a day: a market-wide nightly sweep. The validation rate of 0.85 means 85% of extractions clear the downstream check and become usable, so dividing by 0.85 adds ~18% on top of the retry-adjusted cost. Combined with the 5% retry rate, the cost-per-validated figure lands ~24% above the raw call cost (for Gemini 3.5 Flash, $0.2490 becomes $0.3076).
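The roll-up from list rate to monthly cost can be sketched in a few lines. This is a minimal reconstruction, not the Token Cost Optimizer's source: the function and field names are illustrative, and the 30-day month is an assumption inferred from the published outputs (it reproduces the Gemini 3.5 Flash figures exactly). No cache discount is applied, which matches how Google and OpenAI rows are priced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """The fixed 10-K extraction workload used for every row in the table."""
    input_tokens_per_call: int = 130_000
    output_tokens_per_call: int = 6_000
    calls_per_idea: int = 1        # one call per filing
    retry_rate: float = 0.05
    ideas_per_day: int = 30        # filings per day
    validation_rate: float = 0.85


def rollup(input_usd_per_mtok: float, output_usd_per_mtok: float,
           w: Workload = Workload()) -> dict:
    """Roll list rates up to per-call, per-idea, per-validated and monthly cost."""
    per_call = (w.input_tokens_per_call * input_usd_per_mtok
                + w.output_tokens_per_call * output_usd_per_mtok) / 1_000_000
    per_idea = per_call * w.calls_per_idea * (1 + w.retry_rate)  # retries inflate spend
    per_validated = per_idea / w.validation_rate                 # only 85% are usable
    per_day = per_idea * w.ideas_per_day
    return {
        "per_call": per_call,
        "per_idea": per_idea,
        "per_validated": per_validated,
        "per_month": per_day * 30,  # assumed 30-day month, inferred from the outputs
    }


if __name__ == "__main__":
    flash = rollup(1.50, 9.00)  # Gemini 3.5 Flash list rates from the table
    print({k: round(v, 4) for k, v in flash.items()})
    # Ratio of cost-per-validated to raw call cost: (1 + retry) / validation rate
    print(round(flash["per_validated"] / flash["per_call"], 3))
```

Swapping in another row's rates (for example `rollup(0.10, 0.40)` for Gemini 2.5 Flash-Lite) should reproduce that row's figures, since every non-cached model goes through the same arithmetic.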

The cache-hit assumption (0.40) matters for one family only. The Token Cost Optimizer applies cache pricing to Anthropic models (cache reads at 0.1x base input); for Google and OpenAI the engine prices input at the full list rate. That is a deliberate, conservative modeling choice, and it is why Claude Opus 4.7's $0.5660/call already reflects a 40% cache hit on its input — without caching, Opus would be more expensive still.
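To see how much the cache assumption moves the Opus figure, here is a small sketch of blended input pricing. It is an inference from the published numbers rather than the engine's code: charging the uncached 60% of input at the base rate and the cached 40% at the cache-read rate reproduces $0.5660/call. The function name is illustrative, and cache-write charges are deliberately left out because the published figure is matched without them.

```python
def call_cost_with_cache(input_tokens: int, output_tokens: int,
                         input_rate: float, output_rate: float,
                         cache_read_rate: float, cache_hit_rate: float) -> float:
    """Per-call cost in USD, with a share of input billed at the cache-read rate.

    Rates are USD per million tokens. Cache-write cost is not modeled (assumption).
    """
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total = (uncached * input_rate
             + cached * cache_read_rate
             + output_tokens * output_rate)
    return total / 1_000_000


if __name__ == "__main__":
    # Claude Opus 4.7 rates from the engine output: $5 in, $25 out, $0.50 cache read.
    with_cache = call_cost_with_cache(130_000, 6_000, 5.0, 25.0, 0.5, 0.40)
    no_cache = call_cost_with_cache(130_000, 6_000, 5.0, 25.0, 0.5, 0.0)
    print(round(with_cache, 4), round(no_cache, 4))
```

Under these assumptions the uncached call comes to about $0.80, so the 40% cache hit is worth roughly $0.23 per filing on this workload.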

Gemini 3.5 Flash is the cheapest of the three frontier picks

Among the three headline models, the ranking is unambiguous: Gemini 3.5 Flash ($0.2490/call) beats Claude Opus 4.7 ($0.5660/call, 2.3x more) and GPT-5.5 ($0.8300/call, 3.3x more). On a 30-filing-a-day sweep that is the difference between $235/mo and $784/mo — a $549/mo gap for the same token shape.

If your extraction needs agent-tier reasoning (chasing a figure across a footnote, reconciling a restatement, resolving an ambiguous segment disclosure) and you want it at Flash latency, Gemini 3.5 Flash is the value pick of the frontier tier. That is the genuine win here, and it is worth stating plainly.

But it is not the budget pick

Now the honest part. The same engine, same workload, prices Gemini 2.5 Flash-Lite at $0.0154/call and Gemini 2.5 Flash at $0.0540/call. Gemini 3.5 Flash is ~16x Flash-Lite and ~4.6x Flash on this filing. For pure field extraction — pulling line items, dates, and totals where the structure is regular and the model does not need to reason — that 16x premium buys very little. The "cheap Gemini for filings" story is Flash-Lite, and it has the same 1M context window that swallows a full 10-K whole.

Note too that Gemini 3.5 Flash ($0.2490/call) and Gemini 2.5 Pro ($0.2225/call) sit within 12% of each other. Gemini 3.5 Flash's edge over Pro is latency, not price.
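The multiples quoted in the last few paragraphs are plain ratios of the per-call costs in the table. A short sketch for rechecking them, or for rerunning the comparison after a price change (the dictionary and helper names are illustrative):

```python
# Per-call costs in USD from the table (130k input + 6k output per filing).
COST_PER_CALL = {
    "Gemini 2.5 Flash-Lite": 0.0154,
    "Gemini 2.5 Flash": 0.0540,
    "Gemini 2.5 Pro": 0.2225,
    "Gemini 3.5 Flash": 0.2490,
    "Claude Opus 4.7": 0.5660,
    "GPT-5.5": 0.8300,
}


def ratio(model: str, baseline: str) -> float:
    """How many times more expensive `model` is than `baseline`, per call."""
    return COST_PER_CALL[model] / COST_PER_CALL[baseline]


if __name__ == "__main__":
    pairs = [
        ("Claude Opus 4.7", "Gemini 3.5 Flash"),
        ("GPT-5.5", "Gemini 3.5 Flash"),
        ("Gemini 3.5 Flash", "Gemini 2.5 Flash-Lite"),
        ("Gemini 3.5 Flash", "Gemini 2.5 Flash"),
        ("Gemini 3.5 Flash", "Gemini 2.5 Pro"),
    ]
    for model, baseline in pairs:
        print(f"{model} vs {baseline}: {ratio(model, baseline):.2f}x")
```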

Google's benchmark claim is Google's claim

At the I/O launch, Google said Gemini 3.5 Flash beats Gemini 3.1 Pro on coding and agentic benchmarks. On extraction accuracy for financial filings specifically, there was no independent third-party eval at launch, and we have run none. The numbers in this article are cost numbers, computed from list prices; they say nothing about which model reads a 10-K most accurately. The defensible workflow is unchanged: run a small accuracy eval on your own filings, pick the cheapest model that clears your bar, and do not let a vendor benchmark stand in for your own.
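The "cheapest model that clears your bar" rule is simple enough to sketch. The selection helper below is illustrative, and the accuracy scores in the usage example are loudly placeholder values, not measurements of any model: substitute the results of your own eval.

```python
from typing import Optional

# Per-call costs in USD from the table (same workload for every model).
COST_PER_CALL = {
    "Gemini 2.5 Flash-Lite": 0.0154,
    "Gemini 2.5 Flash": 0.0540,
    "Gemini 2.5 Pro": 0.2225,
    "Gemini 3.5 Flash": 0.2490,
    "Claude Opus 4.7": 0.5660,
    "GPT-5.5": 0.8300,
}


def cheapest_passing(accuracy: dict, bar: float,
                     cost: dict = COST_PER_CALL) -> Optional[str]:
    """Return the cheapest model whose measured accuracy meets the bar.

    Models missing from `accuracy` are treated as not evaluated and skipped.
    Returns None when nothing clears the bar.
    """
    passing = [m for m in cost if m in accuracy and accuracy[m] >= bar]
    return min(passing, key=lambda m: cost[m]) if passing else None


if __name__ == "__main__":
    # PLACEHOLDER scores for illustration only. These are NOT measured results.
    placeholder_accuracy = {
        "Gemini 2.5 Flash-Lite": 0.90,
        "Gemini 2.5 Flash": 0.93,
        "Gemini 3.5 Flash": 0.97,
    }
    print(cheapest_passing(placeholder_accuracy, bar=0.95))
```

With the placeholder scores, only one model clears a 0.95 bar; lowering the bar to 0.90 would hand the pick to the cheapest tier instead, which is the whole point of measuring before choosing.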

Where each model is the right pick

  • Pure field extraction at volume: Gemini 2.5 Flash-Lite ($0.0154/call). The cheapest correct path when the task is structural, not reasoning-heavy.
  • Extraction that needs agent-tier reasoning, at Flash latency: Gemini 3.5 Flash ($0.2490/call). The frontier value pick; budget ~$235/mo for the sweep above.
  • Maximum extraction accuracy with a 2M window, latency not critical: Gemini 2.5 Pro ($0.2225/call), at near-identical cost to Gemini 3.5 Flash.
  • You are already standardized on Claude or OpenAI: Opus 4.7 ($0.5660/call) or GPT-5.5 ($0.8300/call). Pay the premium for the vendor relationship, or route only the hard subset to them.

Decision guidance

  1. Eval accuracy first. A budget model that misreads a parenthetical "(loss)" is expensive in errors, not cheap.
  2. Match the tier to the task. Structural extraction → Flash-Lite. Reasoning-heavy extraction at speed → Gemini 3.5 Flash.
  3. Two-stage the hard fields. A budget extractor feeding a frontier verifier on the contested fields often beats either model alone.
  4. Price your real token shape. Filing size and output verbosity move the per-call cost more than the model choice does within a tier.
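The two-stage idea in step 3 has a simple cost model. The sketch below assumes the worst case, where every escalated filing is re-read in full by the frontier verifier; the escalation rate is a hypothetical parameter you would measure on your own pipeline, and the function names are illustrative.

```python
def two_stage_cost(budget_cost: float, frontier_cost: float,
                   escalation_rate: float) -> float:
    """Expected per-filing cost when a budget model runs on every filing and a
    frontier model re-reads the escalated share in full (worst-case assumption)."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return budget_cost + escalation_rate * frontier_cost


def break_even_escalation(budget_cost: float, frontier_cost: float) -> float:
    """Escalation rate above which two-stage costs more than frontier-only."""
    return 1.0 - budget_cost / frontier_cost


if __name__ == "__main__":
    lite, flash35 = 0.0154, 0.2490  # per-call costs from the table
    # Hypothetical: 20% of filings have contested fields and get escalated.
    print(round(two_stage_cost(lite, flash35, 0.20), 4))
    print(round(break_even_escalation(lite, flash35), 3))
```

On these list prices the two-stage setup stays cheaper than running Gemini 3.5 Flash on everything until nearly all filings are escalated. If the verifier only sees the contested fields rather than the whole filing, its share of the cost would be lower than this sketch suggests.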

References

Footnotes

  1. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-25. https://ai.google.dev/gemini-api/docs/pricing

  2. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-25. https://developers.openai.com/api/docs/pricing

  3. Anthropic. "Pricing." platform.claude.com, verified 2026-05-25. https://platform.claude.com/docs/en/about-claude/pricing

Verified engine output

10-K extraction — Gemini 3.5 Flash (130k in + 6k out, 30 filings/day)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-3-5-flash
Result
model › id: gemini-3-5-flash
model › provider: google
model › name: Gemini 3.5 Flash
model › input usd per mtoken: 1.5
model › output usd per mtoken: 9
model › context window: 1000000
model › notes: Frontier agent-tier at Flash speed, not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call: 0.249
cost per idea: 0.26145
cost per validated trade: 0.30758823529411766
cost per day: 7.843500000000001
cost per month: 235.305
cost per year: 2862.8775

Computed live at build time.

Same workload — GPT-5.5
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gpt-5
Result
model › id: gpt-5
model › provider: openai
model › name: GPT-5.5
model › input usd per mtoken: 5
model › output usd per mtoken: 30
model › context window: 400000
model › notes: OpenAI frontier model (GPT-5.5).
effective cost per call: 0.8300000000000001
cost per idea: 0.8715000000000002
cost per validated trade: 1.0252941176470591
cost per day: 26.145000000000003
cost per month: 784.3500000000001
cost per year: 9542.925000000001

Same workload — Claude Opus 4.7 (40% cache hit on input)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: claude-opus-4-7
Result
model › id: claude-opus-4-7
model › provider: anthropic
model › name: Claude Opus 4.7
model › input usd per mtoken: 5
model › output usd per mtoken: 25
model › cache write usd per mtoken: 6.25
model › cache read usd per mtoken: 0.5
model › context window: 1000000
model › notes: Flagship reasoning model, 1M context.
effective cost per call: 0.5660000000000001
cost per idea: 0.5943
cost per validated trade: 0.6991764705882354
cost per day: 17.829
cost per month: 534.87
cost per year: 6507.585

Same workload — Gemini 2.5 Pro
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-pro
Result
model › id: gemini-2-5-pro
model › provider: google
model › name: Gemini 2.5 Pro
model › input usd per mtoken: 1.25
model › output usd per mtoken: 10
model › context window: 2000000
model › notes: Large context (2M). Strong on document analysis.
effective cost per call: 0.2225
cost per idea: 0.23362500000000003
cost per validated trade: 0.27485294117647063
cost per day: 7.008750000000001
cost per month: 210.26250000000002
cost per year: 2558.1937500000004

Same workload — Gemini 2.5 Flash
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-flash
Result
model › id: gemini-2-5-flash
model › provider: google
model › name: Gemini 2.5 Flash
model › input usd per mtoken: 0.3
model › output usd per mtoken: 2.5
model › context window: 1000000
model › notes: Fast mid-tier; 1M context.
effective cost per call: 0.054
cost per idea: 0.0567
cost per validated trade: 0.06670588235294118
cost per day: 1.701
cost per month: 51.03
cost per year: 620.865

Same workload — Gemini 2.5 Flash-Lite (genuine budget tier)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-flash-lite
Result
model › id: gemini-2-5-flash-lite
model › provider: google
model › name: Gemini 2.5 Flash-Lite
model › input usd per mtoken: 0.1
model › output usd per mtoken: 0.4
model › context window: 1000000
model › notes: Cheapest tier in this table; 1M context.
effective cost per call: 0.0154
cost per idea: 0.01617
cost per validated trade: 0.019023529411764706
cost per day: 0.48510000000000003
cost per month: 14.553
cost per year: 177.06150000000002

Frequently asked questions

Which is cheapest for 10-K extraction: Gemini 3.5 Flash, GPT-5.5, or Opus 4.7?
Gemini 3.5 Flash at $0.2490 per call ($235.31 per month on a 30-filing/day sweep), against Opus 4.7 at $0.5660 per call and GPT-5.5 at $0.8300 per call. But Gemini 2.5 Flash-Lite ($0.0154 per call) is about 16x cheaper still if the task does not need agent-tier reasoning.
Is Gemini 3.5 Flash a budget model for filings?
No. It is the cheapest of the three frontier picks here, but about 16x Gemini 2.5 Flash-Lite and about 4.6x Gemini 2.5 Flash on the same filing. The budget tier is Flash-Lite; Gemini 3.5 Flash is frontier intelligence at Flash speed.
Why does Opus 4.7 look cheaper than its raw rate suggests?
The engine applies a 40% cache-hit on Anthropic input (cache reads at 0.1x base input). Google and OpenAI are priced at full list input here, a conservative modeling choice. Without caching, Opus would cost more.
Is Google's claim that 3.5 Flash beats 3.1 Pro verified for extraction?
No. That is Google's launch benchmark claim, not an independent finance-extraction eval, and none was run here. These are cost numbers from list prices; run your own accuracy eval before relying on a model.
Where do these extraction-cost numbers come from?
Each model's verified 2026-05-25 list rate, run through the Token Cost Optimizer on one fixed extraction workload. Recomputed from the shipped bundle, not a benchmark run.
What's the cheapest LLM for 10-K extraction under $100/month?
On a 30-filing/day sweep, only the two economy Gemini tiers come in under $100: Gemini 2.5 Flash-Lite at $14.55 per month and Gemini 2.5 Flash at $51.03 per month. Gemini 2.5 Pro ($210.26), Gemini 3.5 Flash ($235.31), Opus 4.7 ($534.87), and GPT-5.5 ($784.35) all run well over $100 at this volume.
Which Gemini tier should I use for SEC filing extraction?
For pure field extraction where the structure is regular, Gemini 2.5 Flash-Lite — $14.55 per month on this sweep, with a 1M context that swallows a full 10-K. Step up to Gemini 3.5 Flash ($235.31 per month) only when the extraction needs agent-tier reasoning (reconciling a restatement, chasing a figure across a footnote) at Flash latency. The 3.5 Flash premium is about 16x Flash-Lite per filing.
Is Gemini 3.5 Flash worth it over GPT-5.5 for filing extraction at scale?
On cost, yes — Gemini 3.5 Flash is $235.31 per month against GPT-5.5's $784.35 per month on the same 30-filing/day sweep, a $549 per month gap for the same token shape. These are list-price cost numbers only; run your own accuracy eval before switching, since neither was independently benchmarked here for extraction accuracy.