The short answer
For 10-K extraction in 2026, the three-way pick among Gemini 3.5 Flash, GPT-5.5, and Claude Opus 4.7 is not close on price. On a full-filing extraction workload (130k input + 6k output per filing, 30 filings/day) the Token Cost Optimizer prices Gemini 3.5 Flash at $0.2490/call and $235.31/month [1], against GPT-5.5 at $0.8300/call ($784.35/mo) [2] and Claude Opus 4.7 at $0.5660/call ($534.87/mo) [3]. The honest caveat: Gemini 3.5 Flash is the cheapest of those three, but it is still ~16x the genuine budget tier, Gemini 2.5 Flash-Lite ($0.0154/call), and ~4.6x Gemini 2.5 Flash. Frontier intelligence at Flash speed, not budget pricing. Every figure below is recomputed live from the shipped engine bundle.
TL;DR
| Model | Input ($/Mtok) | Output ($/Mtok) | Cost / filing | Cost / validated filing | Cost / month |
|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.0154 | $0.0190 | $14.55 |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.0540 | $0.0667 | $51.03 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.2225 | $0.2749 | $210.26 |
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.2490 | $0.3076 | $235.31 |
| Claude Opus 4.7 | $5.00 | $25.00 | $0.5660 | $0.6992 | $534.87 |
| GPT-5.5 | $5.00 | $30.00 | $0.8300 | $1.0253 | $784.35 |
Same workload for every row: 130,000 input + 6,000 output tokens per filing, one call per filing, 5% retry, 30 filings/day, 0.85 validation rate, 0.40 cache-hit assumption. Per-call and monthly costs are the engine's own output on each model's verified list rate, not a benchmark run.
The 10-K extraction scenario
A full 10-K body lands around 100k-150k tokens. The workload here pins 130k input (the filing plus a fixed extraction schema) and 6k output (a structured field dump), one call per filing, 30 filings a day (a market-wide nightly sweep). The validation rate of 0.85 means 85% of extractions clear the downstream check and become usable, so the cost-per-validated figure divides the retry-adjusted cost by 0.85, a ~18% markup. Stacked on the 5% retry allowance, that puts the cost per validated extraction about 24% above the raw per-call cost (for Gemini 3.5 Flash, $0.2490 becomes $0.3076).
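The arithmetic behind the workload can be sketched in a few lines of Python. This is a minimal reconstruction inferred from the published inputs and outputs, not the Token Cost Optimizer's actual source: the `Workload` class, the `cost_breakdown` function, and the 30-day month are assumptions chosen because they reproduce the reported figures. Cache pricing is deliberately left out, so the sketch matches the Google and OpenAI rows only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical container for the fixed extraction workload in this article."""
    input_tokens: int = 130_000
    output_tokens: int = 6_000
    calls_per_filing: int = 1
    retry_rate: float = 0.05
    filings_per_day: int = 30
    validation_rate: float = 0.85


def cost_breakdown(in_rate: float, out_rate: float, w: Workload = Workload()) -> dict:
    """Return USD costs given list rates in USD per million tokens (no caching)."""
    per_call = (w.input_tokens * in_rate + w.output_tokens * out_rate) / 1_000_000
    # Retries inflate the number of calls actually paid for per filing.
    per_filing = per_call * w.calls_per_filing * (1 + w.retry_rate)
    # Only a fraction of extractions pass validation, so usable ones cost more.
    per_validated = per_filing / w.validation_rate
    per_day = per_filing * w.filings_per_day
    return {
        "per_call": per_call,
        "per_filing_with_retries": per_filing,
        "per_validated": per_validated,
        "per_month": per_day * 30,  # assumption: 30-day month
    }


gemini_35_flash = cost_breakdown(in_rate=1.50, out_rate=9.00)
for name, usd in gemini_35_flash.items():
    print(f"{name}: ${usd:.4f}")
```

Swapping in another model's list rates, for example `cost_breakdown(5.00, 30.00)` for GPT-5.5, gives that model's row on the same token shape.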
The cache-hit assumption (0.40) matters for one family only. The Token Cost Optimizer applies cache pricing to Anthropic models (cache reads at 0.1x base input); for Google and OpenAI the engine prices input at the full list rate. That is a deliberate, conservative modeling choice, and it is why Claude Opus 4.7's $0.5660/call already reflects a 40% cache hit on its input. Without caching, the same Opus call would cost $0.80 (130k input at $5.00/Mtok plus 6k output at $25.00/Mtok).
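To make the cache adjustment concrete, here is a small sketch of how a 40% cache hit changes the Opus input bill. The function name `cached_input_cost` is hypothetical, and the sketch ignores the cache-write rate ($6.25/Mtok in the engine output) on the assumption that the engine does too, since the reported $0.5660 is reproduced without it.

```python
def cached_input_cost(input_tokens: int, in_rate: float,
                      cache_read_rate: float, cache_hit_rate: float) -> float:
    """Input cost in USD when a share of input tokens is billed at the cache-read rate.

    Rates are USD per million tokens. Cache-write cost is not modeled (assumption).
    """
    cached_tokens = input_tokens * cache_hit_rate
    uncached_tokens = input_tokens - cached_tokens
    return (uncached_tokens * in_rate + cached_tokens * cache_read_rate) / 1_000_000


OUTPUT_COST = 6_000 * 25.00 / 1_000_000  # 6k output tokens at $25.00/Mtok

opus_call = cached_input_cost(130_000, 5.00, 0.50, 0.40) + OUTPUT_COST
opus_call_no_cache = cached_input_cost(130_000, 5.00, 0.50, 0.0) + OUTPUT_COST

print(f"with 40% cache hit: ${opus_call:.4f}")
print(f"without caching:    ${opus_call_no_cache:.4f}")
```

Under these assumptions the cache hit takes roughly 29% off the uncached per-call cost, which is worth keeping in mind when comparing the Opus row against rows priced at full list input.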
Gemini 3.5 Flash is the cheapest of the three frontier picks
Among the three headline models, the ranking is unambiguous: Gemini 3.5 Flash ($0.2490/call) beats Claude Opus 4.7 ($0.5660/call, 2.3x the cost) and GPT-5.5 ($0.8300/call, 3.3x the cost). On a 30-filing-a-day sweep that is the difference between $235/mo and $784/mo, a $549/mo gap for the same token shape.
If your extraction needs agent-tier reasoning (chasing a figure across a footnote, reconciling a restatement, resolving an ambiguous segment disclosure) and you want it at Flash latency, Gemini 3.5 Flash is the value pick of the frontier tier. That is the genuine win here, and it is worth stating plainly.
But it is not the budget pick
Now the honest part. The same engine, same workload, prices Gemini 2.5 Flash-Lite at $0.0154/call and Gemini 2.5 Flash at $0.0540/call. Gemini 3.5 Flash is ~16x Flash-Lite and ~4.6x Flash on this filing. For pure field extraction — pulling line items, dates, and totals where the structure is regular and the model does not need to reason — that 16x premium buys very little. The "cheap Gemini for filings" story is Flash-Lite, and it has the same 1M context window that swallows a full 10-K whole.
Note too that Gemini 3.5 Flash ($0.2490/call) and Gemini 2.5 Pro ($0.2225/call) sit within 12% of each other. Gemini 3.5 Flash's edge over Pro is latency, not price.
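The multiples quoted in this section follow directly from the per-call figures in the TL;DR table. A short sketch for checking them, or for rerunning them on your own numbers, might look like this; the dictionary and function names are illustrative only.

```python
# Per-call costs in USD, copied from the TL;DR table.
PER_CALL_USD = {
    "Gemini 2.5 Flash-Lite": 0.0154,
    "Gemini 2.5 Flash": 0.0540,
    "Gemini 2.5 Pro": 0.2225,
    "Gemini 3.5 Flash": 0.2490,
}


def cost_multiple(model: str, baseline: str) -> float:
    """How many times the baseline's per-call cost the given model costs."""
    return PER_CALL_USD[model] / PER_CALL_USD[baseline]


for baseline in ("Gemini 2.5 Flash-Lite", "Gemini 2.5 Flash", "Gemini 2.5 Pro"):
    ratio = cost_multiple("Gemini 3.5 Flash", baseline)
    print(f"Gemini 3.5 Flash vs {baseline}: {ratio:.2f}x")
```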
Google's benchmark claim is Google's claim
At the I/O launch, Google said Gemini 3.5 Flash beats Gemini 3.1 Pro on coding and agentic benchmarks. On extraction accuracy for financial filings specifically, there was no independent third-party eval at launch, and we have run none. The numbers in this article are cost numbers, computed from list prices; they say nothing about which model reads a 10-K most accurately. The defensible workflow is unchanged: run a small accuracy eval on your own filings, pick the cheapest model that clears your bar, and do not let a vendor benchmark stand in for your own.
Where each model is the right pick
- Pure field extraction at volume: Gemini 2.5 Flash-Lite ($0.0154/call). The cheapest correct path when the task is structural, not reasoning-heavy.
- Extraction that needs agent-tier reasoning, at Flash latency: Gemini 3.5 Flash ($0.2490/call). The frontier value pick; budget ~$235/mo for the sweep above.
- Maximum extraction accuracy with a 2M window, latency not critical: Gemini 2.5 Pro ($0.2225/call), at near-identical cost to Gemini 3.5 Flash.
- You are already standardized on Claude or OpenAI: Opus 4.7 ($0.5660/call) or GPT-5.5 ($0.8300/call). Pay the premium for the vendor relationship, or route only the hard subset to them.
Decision guidance
- Eval accuracy first. A budget model that misreads a parenthetical "(loss)" is expensive in errors, not cheap.
- Match the tier to the task. Structural extraction → Flash-Lite. Reasoning-heavy extraction at speed → Gemini 3.5 Flash.
- Two-stage the hard fields. A budget extractor feeding a frontier verifier on the contested fields often beats either model alone.
- Price your real token shape. Filing size and output verbosity move the per-call cost more than the model choice does within a tier.
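The two-stage suggestion above can be priced with a rough blended-cost sketch. The assumptions here are loud ones: every filing goes through the budget extractor once, a hypothetical share of filings (`escalation_rate`, 20% in the example) is escalated to the frontier verifier, and the verifier re-reads the whole filing at its full per-call cost. A verifier that only sees the contested fields would be cheaper, so treat this as an upper bound.

```python
def two_stage_cost_per_filing(budget_call: float, verifier_call: float,
                              escalation_rate: float) -> float:
    """Blended USD cost per filing: budget pass on all, verifier pass on a share."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return budget_call + escalation_rate * verifier_call


def break_even_escalation_rate(budget_call: float, verifier_call: float) -> float:
    """Escalation share above which two-stage costs more than the verifier alone."""
    return 1.0 - budget_call / verifier_call


FLASH_LITE = 0.0154  # Gemini 2.5 Flash-Lite, USD per call (from the table)
FLASH_35 = 0.2490    # Gemini 3.5 Flash, USD per call (from the table)

blended = two_stage_cost_per_filing(FLASH_LITE, FLASH_35, escalation_rate=0.20)
print(f"blended cost per filing: ${blended:.4f}")
print(f"break-even escalation rate: {break_even_escalation_rate(FLASH_LITE, FLASH_35):.1%}")
```

On cost alone, the two-stage setup stays cheaper than running Gemini 3.5 Flash on everything unless nearly all filings escalate. Whether it is also more accurate is a question only your own eval can answer.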
Connects to
- Token Cost Optimizer: the per-filing cost engine behind every number here.
- Cheapest LLM for SEC Filings 2026: the budget-extraction deep dive across all vendors.
- Best LLM for Financial Analysis 2026: the task-tiered pillar.
- Claude vs GPT-5 vs Gemini for Financial Analysis 2026: the tier-vs-vendor framing in full.
References
1. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-25. https://ai.google.dev/gemini-api/docs/pricing
2. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-25. https://developers.openai.com/api/docs/pricing
3. Anthropic. "Pricing." platform.claude.com, verified 2026-05-25. https://platform.claude.com/docs/en/about-claude/pricing
Verified engine output
| Field | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-3-5-flash |
| model › id | gemini-3-5-flash |
| model › provider | |
| model › name | Gemini 3.5 Flash |
| model › input usd per mtoken | 1.5 |
| model › output usd per mtoken | 9 |
| model › context window | 1000000 |
| model › notes | Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash). |
| effective cost per call | 0.249 |
| cost per idea | 0.26145 |
| cost per validated trade | 0.30758823529411766 |
| cost per day | 7.843500000000001 |
| cost per month | 235.305 |
| cost per year | 2862.8775 |
Computed live at build time.
| Field | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gpt-5 |
| model › id | gpt-5 |
| model › provider | openai |
| model › name | GPT-5.5 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 30 |
| model › context window | 400000 |
| model › notes | OpenAI frontier model (GPT-5.5). |
| effective cost per call | 0.8300000000000001 |
| cost per idea | 0.8715000000000002 |
| cost per validated trade | 1.0252941176470591 |
| cost per day | 26.145000000000003 |
| cost per month | 784.3500000000001 |
| cost per year | 9542.925000000001 |
Computed live at build time.
| Field | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | claude-opus-4-7 |
| model › id | claude-opus-4-7 |
| model › provider | anthropic |
| model › name | Claude Opus 4.7 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 25 |
| model › cache write usd per mtoken | 6.25 |
| model › cache read usd per mtoken | 0.5 |
| model › context window | 1000000 |
| model › notes | Flagship reasoning model — 1M context. |
| effective cost per call | 0.5660000000000001 |
| cost per idea | 0.5943 |
| cost per validated trade | 0.6991764705882354 |
| cost per day | 17.829 |
| cost per month | 534.87 |
| cost per year | 6507.585 |
Computed live at build time.
| Field | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-pro |
| model › id | gemini-2-5-pro |
| model › provider | |
| model › name | Gemini 2.5 Pro |
| model › input usd per mtoken | 1.25 |
| model › output usd per mtoken | 10 |
| model › context window | 2000000 |
| model › notes | Large context (2M). Strong on document analysis. |
| effective cost per call | 0.2225 |
| cost per idea | 0.23362500000000003 |
| cost per validated trade | 0.27485294117647063 |
| cost per day | 7.008750000000001 |
| cost per month | 210.26250000000002 |
| cost per year | 2558.1937500000004 |
Computed live at build time.
| Field | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-flash |
| model › id | gemini-2-5-flash |
| model › provider | |
| model › name | Gemini 2.5 Flash |
| model › input usd per mtoken | 0.3 |
| model › output usd per mtoken | 2.5 |
| model › context window | 1000000 |
| model › notes | Fast mid-tier; 1M context. |
| effective cost per call | 0.054 |
| cost per idea | 0.0567 |
| cost per validated trade | 0.06670588235294118 |
| cost per day | 1.701 |
| cost per month | 51.03 |
| cost per year | 620.865 |
Computed live at build time.
| Field | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-flash-lite |
| model › id | gemini-2-5-flash-lite |
| model › provider | |
| model › name | Gemini 2.5 Flash-Lite |
| model › input usd per mtoken | 0.1 |
| model › output usd per mtoken | 0.4 |
| model › context window | 1000000 |
| model › notes | Cheapest tier in this table; 1M context. |
| effective cost per call | 0.0154 |
| cost per idea | 0.01617 |
| cost per validated trade | 0.019023529411764706 |
| cost per day | 0.48510000000000003 |
| cost per month | 14.553 |
| cost per year | 177.06150000000002 |
Computed live at build time.
Frequently asked questions
- Which is cheapest for 10-K extraction: Gemini 3.5 Flash, GPT-5.5, or Opus 4.7?
- Gemini 3.5 Flash at $0.2490 per call ($235.31 per month on a 30-filing/day sweep), against Opus 4.7 at $0.5660 per call and GPT-5.5 at $0.8300 per call. But Gemini 2.5 Flash-Lite ($0.0154 per call) is about 16x cheaper still if the task does not need agent-tier reasoning.
- Is Gemini 3.5 Flash a budget model for filings?
- No. It is the cheapest of the three frontier picks here, but about 16x Gemini 2.5 Flash-Lite and about 4.6x Gemini 2.5 Flash on the same filing. The budget tier is Flash-Lite; Gemini 3.5 Flash is frontier intelligence at Flash speed.
- Why does Opus 4.7 look cheaper than its raw rate suggests?
- The engine applies a 40% cache-hit rate to Anthropic input (cache reads at 0.1x base input). Google and OpenAI are priced at full list input here, a conservative modeling choice. Without caching, the same Opus call would cost $0.80 instead of $0.5660.
- Is Google's claim that 3.5 Flash beats 3.1 Pro verified for extraction?
- No. That is Google's launch benchmark claim, not an independent finance-extraction eval, and none was run here. These are cost numbers from list prices; run your own accuracy eval before relying on a model.
- Where do these extraction-cost numbers come from?
- Each model's verified 2026-05-25 list rate, run through the Token Cost Optimizer on one fixed extraction workload. Recomputed from the shipped bundle, not a benchmark run.
- What's the cheapest LLM for 10-K extraction under $100/month?
- On a 30-filing/day sweep, only the two economy Gemini tiers come in under $100: Gemini 2.5 Flash-Lite at $14.55 per month and Gemini 2.5 Flash at $51.03 per month. Gemini 2.5 Pro ($210.26), Gemini 3.5 Flash ($235.31), Opus 4.7 ($534.87), and GPT-5.5 ($784.35) all run well over $100 at this volume.
- Which Gemini tier should I use for SEC filing extraction?
- For pure field extraction where the structure is regular, Gemini 2.5 Flash-Lite — $14.55 per month on this sweep, with a 1M context that swallows a full 10-K. Step up to Gemini 3.5 Flash ($235.31 per month) only when the extraction needs agent-tier reasoning (reconciling a restatement, chasing a figure across a footnote) at Flash latency. The 3.5 Flash premium is about 16x Flash-Lite per filing.
- Is Gemini 3.5 Flash worth it over GPT-5.5 for filing extraction at scale?
- On cost, yes — Gemini 3.5 Flash is $235.31 per month against GPT-5.5's $784.35 per month on the same 30-filing/day sweep, a $549 per month gap for the same token shape. These are list-price cost numbers only; run your own accuracy eval before switching, since neither was independently benchmarked here for extraction accuracy.
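For the budget question above, the same per-call figures can be turned around to ask how many filings per day fit under a monthly cap. This is a sketch under the article's stated workload (5% retry, one call per filing) plus an assumed 30-day month; the function and dictionary names are illustrative, not part of the Token Cost Optimizer.

```python
import math

# Per-call costs in USD, copied from the TL;DR table.
PER_CALL_USD = {
    "Gemini 2.5 Flash-Lite": 0.0154,
    "Gemini 2.5 Flash": 0.0540,
    "Gemini 2.5 Pro": 0.2225,
    "Gemini 3.5 Flash": 0.2490,
    "Claude Opus 4.7": 0.5660,
    "GPT-5.5": 0.8300,
}


def max_filings_per_day(per_call: float, monthly_budget: float,
                        retry_rate: float = 0.05, days: int = 30) -> int:
    """Largest whole number of filings per day that stays within the monthly budget."""
    per_filing = per_call * (1 + retry_rate)
    return math.floor(monthly_budget / (per_filing * days))


for model, per_call in PER_CALL_USD.items():
    print(f"{model}: {max_filings_per_day(per_call, monthly_budget=100.0)} filings/day")
```

At a $100 cap, only the two economy Gemini tiers reach the 30-filing/day sweep used throughout this article; the other four models top out well below it.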