The short answer
After Google I/O 2026, the Gemini lineup for finance work is a clean three-rung cost ladder, and Gemini 3.5 Flash (launched May 19) sits at the top of it, not the bottom. On a general finance research loop (12k in / 2k out per call, 5 calls per idea plus 10% retries, 20 ideas/day, 30-day month) the Token Cost Optimizer prices Gemini 3.5 Flash at $0.0360/call and $118.80/month: essentially tied with Gemini 2.5 Pro ($115.50/mo), about 4x Gemini 2.5 Flash ($28.38/mo), and ~18x Gemini 2.5 Flash-Lite ($6.60/mo). The "Flash" name suggests budget; the price says frontier. Every number below is recomputed live from the shipped engine bundle.
TL;DR: the 2026 Gemini cost ladder
| Rung | Model | $/Mtok in | $/Mtok out | Cost / idea | Cost / month |
|---|---|---|---|---|---|
| Budget floor | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.0110 | $6.60 |
| Fast economy | Gemini 2.5 Flash | $0.30 | $2.50 | $0.0473 | $28.38 |
| Frontier (latency) | Gemini 3.5 Flash | $1.50 | $9.00 | $0.1980 | $118.80 |
| Frontier (context) | Gemini 2.5 Pro | $1.25 | $10.00 | $0.1925 | $115.50 |
Same loop for every row: 12,000 input + 2,000 output tokens per call, 5 calls per idea, 10% retry, 20 ideas/day, 0.25 validation rate, 0.50 cache-hit. Costs are the engine's own output on each model's verified list rate, not a benchmark run.
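The table can be reproduced from the list rates with a few lines of arithmetic. The sketch below is a reconstruction, not the Token Cost Optimizer's actual code: the function names, the `Loop` dataclass, and the 30-day month are assumptions inferred from the published figures. It leaves out the cache-hit and validation rates, because the per-call, per-idea, and per-month numbers above match the list rates without them.

```python
from dataclasses import dataclass

# USD per million tokens (input, output), copied from the table above.
RATES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


@dataclass(frozen=True)
class Loop:
    """The fixed research loop; field names are illustrative."""
    input_tokens: int = 12_000
    output_tokens: int = 2_000
    calls_per_idea: int = 5
    retry_rate: float = 0.10
    ideas_per_day: int = 20
    days_per_month: int = 30  # assumption: matches the published monthly figures


def cost_per_call(rate_in: float, rate_out: float, loop: Loop) -> float:
    """List-price cost of one call, in USD."""
    return (loop.input_tokens * rate_in + loop.output_tokens * rate_out) / 1_000_000


def cost_per_idea(rate_in: float, rate_out: float, loop: Loop) -> float:
    """Calls per idea, inflated by the retry rate."""
    return cost_per_call(rate_in, rate_out, loop) * loop.calls_per_idea * (1 + loop.retry_rate)


def cost_per_month(rate_in: float, rate_out: float, loop: Loop) -> float:
    """Per-idea cost scaled to ideas per day and days per month."""
    return cost_per_idea(rate_in, rate_out, loop) * loop.ideas_per_day * loop.days_per_month


loop = Loop()
for name, (rate_in, rate_out) in RATES.items():
    print(f"{name:<22} idea ${cost_per_idea(rate_in, rate_out, loop):.4f}  "
          f"month ${cost_per_month(rate_in, rate_out, loop):.2f}")
```

Changing any field on `Loop` (for example, `Loop(output_tokens=1_000)`) reprices every rung on the same footing.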
The three rungs
Rung 1, budget floor: Gemini 2.5 Flash-Lite ($0.10 / $0.40). The cheapest tier in the lineup, $6.60/mo on this loop. A 1M context window means it still fits a full 10-K. This is the model for high-volume extraction, classification, routing, and any step where the work is structural rather than reasoning-heavy.
Rung 2, fast economy: Gemini 2.5 Flash ($0.30 / $2.50). $28.38/mo, ~4.3x Flash-Lite. The default workhorse for mid-weight tasks: summarization, light synthesis, multi-document comparison that does not need frontier judgment.
Rung 3, frontier: Gemini 3.5 Flash ($1.50 / $9.00) and Gemini 2.5 Pro ($1.25 / $10.00). $118.80/mo and $115.50/mo respectively, within 3% of each other and ~18x the budget floor. Gemini 3.5 Flash buys agent-tier reasoning at Flash latency; Gemini 2.5 Pro buys a 2M context window. The price barely separates them; the choice is latency-vs-context.
Where Gemini 3.5 Flash actually fits
The mistake the name invites is treating Gemini 3.5 Flash as a drop-in upgrade for Gemini 2.5 Flash. It is not a fast-economy model: its output rate ($9.00) is ~3.6x Gemini 2.5 Flash's ($2.50) and the loop cost reflects that ($118.80 vs $28.38/mo, a 4.2x jump). Gemini 3.5 Flash earns its place when the task genuinely needs frontier reasoning and you cannot accept Pro-tier latency. For anything below that bar, you are paying a frontier premium for economy-tier work.
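The 4.2x loop jump sits between the two rate ratios because the loop cost blends input and output spend. A quick check, using only the list rates and the 12k/2k token shape from this article (variable and function names are illustrative):

```python
IN_TOKENS, OUT_TOKENS = 12_000, 2_000

# USD per million tokens (input, output) from the cost ladder table.
FLASH_2_5 = (0.30, 2.50)
FLASH_3_5 = (1.50, 9.00)


def call_cost(rates: tuple[float, float]) -> float:
    """List-price cost of one 12k-in / 2k-out call, in USD."""
    rate_in, rate_out = rates
    return (IN_TOKENS * rate_in + OUT_TOKENS * rate_out) / 1_000_000


input_ratio = FLASH_3_5[0] / FLASH_2_5[0]    # input rate jump
output_ratio = FLASH_3_5[1] / FLASH_2_5[1]   # output rate jump
loop_ratio = call_cost(FLASH_3_5) / call_cost(FLASH_2_5)

print(f"input rate:  {input_ratio:.1f}x")
print(f"output rate: {output_ratio:.1f}x")
print(f"loop cost:   {loop_ratio:.1f}x")
```

The input rate rises 5x and the output rate 3.6x. Retries and idea volume multiply both models equally, so the per-call ratio of about 4.2x carries through to the monthly bill.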
The cleanest pattern is to tier the loop: run the bulk of the calls on Flash-Lite or Flash, and route only the steps that need agent-tier judgment to Gemini 3.5 Flash. On the loop above, a mostly-economy loop with a thin frontier layer lands far closer to the $6-28/mo rungs than to the $119/mo rung.
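As a rough illustration, suppose four of the five calls per idea run on Flash-Lite and one runs on Gemini 3.5 Flash. That split is hypothetical, and the sketch assumes every call keeps the same 12k/2k token shape, which a real tiered loop may not.

```python
# USD per million tokens (input, output) from the cost ladder table.
RATES = {
    "flash-lite": (0.10, 0.40),
    "flash": (0.30, 2.50),
    "3.5-flash": (1.50, 9.00),
}

IN_TOKENS, OUT_TOKENS = 12_000, 2_000
RETRY_RATE = 0.10
IDEAS_PER_DAY = 20
DAYS_PER_MONTH = 30  # assumption: matches the published monthly figures


def call_cost(model: str) -> float:
    """List-price cost of one call on the given model, in USD."""
    rate_in, rate_out = RATES[model]
    return (IN_TOKENS * rate_in + OUT_TOKENS * rate_out) / 1_000_000


def tiered_monthly_cost(calls_by_model: dict[str, int]) -> float:
    """Monthly USD cost when each idea spreads its calls across models."""
    per_idea = sum(call_cost(model) * n for model, n in calls_by_model.items())
    return per_idea * (1 + RETRY_RATE) * IDEAS_PER_DAY * DAYS_PER_MONTH


all_frontier = tiered_monthly_cost({"3.5-flash": 5})
thin_frontier = tiered_monthly_cost({"flash-lite": 4, "3.5-flash": 1})
print(f"all frontier:  ${all_frontier:.2f}/mo")
print(f"thin frontier: ${thin_frontier:.2f}/mo")
```

Under these assumptions the tiered loop comes to about $29/mo against $118.80/mo for the all-frontier loop.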
Google's launch claim, stated as Google's claim
At I/O, Google said Gemini 3.5 Flash beats Gemini 3.1 Pro on coding and agentic benchmarks. That is a vendor benchmark. There was no independent third-party finance-task eval at launch, and none was run here. This article ranks the models on cost (computed from verified list prices) and says nothing about which reads a filing or reasons over a thesis most accurately. Confirm capability on your own task; the ladder here only tells you what each rung costs.
Decision guidance
- Start at the bottom. Default a finance step to Flash-Lite; promote it up a rung only when an eval shows the cheaper rung misses.
- Treat Gemini 3.5 Flash as a frontier choice, not an economy upgrade. Budget ~$119/mo for the loop above; the bill scales linearly with call volume and with input and output tokens per call.
- Pick the top rung on latency vs context. Gemini 3.5 Flash for Flash latency, Gemini 2.5 Pro for the 2M window. Cost is a near-tie.
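- Watch output tokens at the top rung. They are priced 6-8x input there, so even at this loop's 6:1 input-to-output ratio they make up half the per-call cost on Gemini 3.5 Flash ($0.018 of $0.036) and 57% on Gemini 2.5 Pro ($0.020 of $0.035). Terse outputs cut that share; tighter prompts cut the rest.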
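The first and third bullets amount to a small promotion rule. The sketch below is one way to write it down; `passes_eval` stands in for whatever task-specific eval you run and is hypothetical, as are the function and parameter names.

```python
from typing import Callable

ECONOMY_RUNGS = ("Gemini 2.5 Flash-Lite", "Gemini 2.5 Flash")


def pick_model(passes_eval: Callable[[str], bool], needs_2m_context: bool = False) -> str:
    """Return the cheapest rung whose eval passes, walking the ladder upward."""
    # The top rung is a near-tie on cost, so choose it on latency vs context.
    frontier = "Gemini 2.5 Pro" if needs_2m_context else "Gemini 3.5 Flash"
    for model in (*ECONOMY_RUNGS, frontier):
        if passes_eval(model):
            return model
    raise LookupError("no rung passed the eval; revisit the task or the eval")


# Example: a stand-in eval in which only Flash-Lite misses on this step.
print(pick_model(lambda model: model != "Gemini 2.5 Flash-Lite"))
```

Raising when even the top rung fails keeps a bad eval or an unsuitable task from silently defaulting to the most expensive model.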
Connects to
- Token Cost Optimizer: the per-call cost engine behind every rung here.
- Cheapest LLM for SEC Filings 2026: where the budget rung does its best work.
- Claude vs GPT-5 vs Gemini for Financial Analysis 2026: the cross-vendor tier comparison.
- Best LLM for Financial Analysis 2026: the task-tiered pillar that places each model.
References
- Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-25. https://ai.google.dev/gemini-api/docs/pricing
- Anthropic. "Pricing." platform.claude.com, verified 2026-05-25. https://platform.claude.com/docs/en/about-claude/pricing
Verified engine output
Recompute-verified inputs and outputs, one pair of tables per model:
| Input | Value |
|---|---|
| input_tokens_per_call | 12000 |
| output_tokens_per_call | 2000 |
| calls_per_idea | 5 |
| retry_rate | 0.1 |
| ideas_per_day | 20 |
| validation_rate | 0.25 |
| cache_hit_rate | 0.5 |
| model_id | gemini-2-5-flash-lite |

| Output | Value |
|---|---|
| model › id | gemini-2-5-flash-lite |
| model › provider | |
| model › name | Gemini 2.5 Flash-Lite |
| model › input usd per mtoken | 0.1 |
| model › output usd per mtoken | 0.4 |
| model › context window | 1000000 |
| model › notes | Cheapest tier in this table; 1M context. |
| effective cost per call | 0.002 |
| cost per idea | 0.011 |
| cost per validated trade | 0.044 |
| cost per day | 0.22 |
| cost per month | 6.6 |
| cost per year | 80.3 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 12000 |
| output_tokens_per_call | 2000 |
| calls_per_idea | 5 |
| retry_rate | 0.1 |
| ideas_per_day | 20 |
| validation_rate | 0.25 |
| cache_hit_rate | 0.5 |
| model_id | gemini-2-5-flash |

| Output | Value |
|---|---|
| model › id | gemini-2-5-flash |
| model › provider | |
| model › name | Gemini 2.5 Flash |
| model › input usd per mtoken | 0.3 |
| model › output usd per mtoken | 2.5 |
| model › context window | 1000000 |
| model › notes | Fast mid-tier; 1M context. |
| effective cost per call | 0.0086 |
| cost per idea | 0.0473 |
| cost per validated trade | 0.1892 |
| cost per day | 0.946 |
| cost per month | 28.38 |
| cost per year | 345.29 |
| Input | Value |
|---|---|
| input_tokens_per_call | 12000 |
| output_tokens_per_call | 2000 |
| calls_per_idea | 5 |
| retry_rate | 0.1 |
| ideas_per_day | 20 |
| validation_rate | 0.25 |
| cache_hit_rate | 0.5 |
| model_id | gemini-3-5-flash |

| Output | Value |
|---|---|
| model › id | gemini-3-5-flash |
| model › provider | |
| model › name | Gemini 3.5 Flash |
| model › input usd per mtoken | 1.5 |
| model › output usd per mtoken | 9 |
| model › context window | 1000000 |
| model › notes | Frontier agent-tier at Flash speed, not a budget model (output ~3.6x Gemini 2.5 Flash). |
| effective cost per call | 0.036 |
| cost per idea | 0.198 |
| cost per validated trade | 0.792 |
| cost per day | 3.96 |
| cost per month | 118.8 |
| cost per year | 1445.4 |
| Input | Value |
|---|---|
| input_tokens_per_call | 12000 |
| output_tokens_per_call | 2000 |
| calls_per_idea | 5 |
| retry_rate | 0.1 |
| ideas_per_day | 20 |
| validation_rate | 0.25 |
| cache_hit_rate | 0.5 |
| model_id | gemini-2-5-pro |

| Output | Value |
|---|---|
| model › id | gemini-2-5-pro |
| model › provider | |
| model › name | Gemini 2.5 Pro |
| model › input usd per mtoken | 1.25 |
| model › output usd per mtoken | 10 |
| model › context window | 2000000 |
| model › notes | Large context (2M). Strong on document analysis. |
| effective cost per call | 0.035 |
| cost per idea | 0.1925 |
| cost per validated trade | 0.77 |
| cost per day | 3.85 |
| cost per month | 115.5 |
| cost per year | 1405.25 |
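The downstream fields in each output table follow from cost per idea. A minimal sketch of that derivation, assuming a 30-day month and a 365-day year (both inferred from the figures above, not taken from the engine's source) and rounding for display; the function name is illustrative:

```python
def derive_rollups(cost_per_idea: float, ideas_per_day: int = 20,
                   validation_rate: float = 0.25) -> dict[str, float]:
    """Roll a per-idea cost up to the per-trade, daily, monthly, and yearly fields."""
    per_day = cost_per_idea * ideas_per_day
    return {
        # At a 0.25 validation rate, one validated trade costs four ideas.
        "cost_per_validated_trade": round(cost_per_idea / validation_rate, 4),
        "cost_per_day": round(per_day, 4),
        "cost_per_month": round(per_day * 30, 2),   # assumed 30-day month
        "cost_per_year": round(per_day * 365, 2),   # assumed 365-day year
    }


print(derive_rollups(0.198))   # Gemini 3.5 Flash
print(derive_rollups(0.011))   # Gemini 2.5 Flash-Lite
```

Rounding at the end keeps binary floating-point noise out of the displayed values without changing the underlying arithmetic.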
Frequently asked questions
- What is the cheapest Gemini model for finance in 2026?
  - Gemini 2.5 Flash-Lite ($0.10/$0.40 per Mtok), at $6.60 per month on the research loop here. It is the budget floor of the lineup, with a 1M context window. Gemini 2.5 Flash ($28.38 per month) is the next rung up.
- Is Gemini 3.5 Flash the budget option?
  - No. On this loop it costs $118.80 per month, about 18x Gemini 2.5 Flash-Lite and 4x Gemini 2.5 Flash. It is a frontier agent-tier model at Flash latency, priced like Gemini 2.5 Pro, not like the economy tier.
- Gemini 3.5 Flash or Gemini 2.5 Pro?
  - They cost within 3% on the same loop ($118.80 vs $115.50 per month). Choose Gemini 3.5 Flash for Flash latency, Gemini 2.5 Pro for the 2M context window. Price barely decides it.
- Is Google's benchmark claim for 3.5 Flash verified?
  - No. It is Google's own launch claim on coding and agentic benchmarks, not an independent finance-task eval, and none was run here. Confirm capability on your own task.
- Where do these Gemini costs come from?
  - Each model's verified 2026-05-25 list rate, run through the Token Cost Optimizer on one fixed research loop. Recomputed from the shipped bundle, not a benchmark run.
- What's the cheapest Gemini for a finance research loop under $10/month?
  - Gemini 2.5 Flash-Lite, at $6.60 per month on this loop, is the only rung under $10. The next rung, Gemini 2.5 Flash, is $28.38 per month, already over. Frontier rungs (Gemini 3.5 Flash $118.80, Gemini 2.5 Pro $115.50) are an order of magnitude above a $10 budget.
- Which Gemini tier should I use for high-volume SEC filing extraction?
  - Gemini 2.5 Flash-Lite. It is the budget floor ($6.60 per month on this loop) and its 1M context still fits a full 10-K, so structural extraction work does not need a higher rung. Promote a step to Gemini 2.5 Flash ($28.38 per month) or a frontier rung only when an eval shows Flash-Lite actually misses on that step.
- Is Gemini 3.5 Flash worth it over Gemini 2.5 Flash for mid-weight summarization?
  - Usually not. On this loop 3.5 Flash is $118.80 per month against $28.38 for 2.5 Flash, a roughly 4x jump for the same token shape, and summarization rarely needs frontier reasoning. Reserve 3.5 Flash for steps that genuinely need agent-tier judgment at Flash latency; keep summarization on 2.5 Flash.