The LLM-in-Finance Economics Report 2026

The short answer

The LLM-in-Finance Economics Report prices every frontier model against four finance workloads using verified 2026 list rates. On 10-K extraction at 30 filings a day, cost runs from $14.55/month (Gemini 2.5 Flash-Lite) to $784.35/month (GPT-5.5), a factor of 54. The cheapest correct answer is matching tier to task, not picking a cheaper frontier model.

This report prices every frontier LLM against four real finance workloads, using the same shipped cost engines that power the calculators on this site. No model was benchmarked for accuracy here. Every dollar figure is computed live at build from each vendor's verified 2026-05-26 list rate, recompiled on every deploy, and independently recomputed by a CI gate against the engine bundle. The headline: on 10-K extraction at 30 filings a day, the spread between the cheapest viable model and the most expensive is a factor of 54, and the cheapest correct answer is rarely the cheapest model.

TL;DR

The cheapest-viable model per workload, derived from the computed costs below:

Workload	Cheapest viable	Computed cost	Frontier value pick	Premium pick
10-K extraction (per filing, 30/day)	Gemini 2.5 Flash-Lite	$0.0162/filing, $14.55/mo	Gemini 3.5 Flash ($235.31/mo)	GPT-5.5 ($784.35/mo)
Earnings-call summary (per stock/quarter)	Gemini 2.5 Flash-Lite	$0.0013, $0.50/yr at 100 tickers	Gemini 3.5 Flash ($7.90/yr)	Opus 4.8 ($24.92/yr)
News sentiment (cost per 1,000 calls)	Gemini 2.5 Flash-Lite	$1.00 / 1,000	Gemini 3.5 Flash ($16.50)	GPT-5.5 ($55.00)
10-K extraction at scale (10,000 filings/mo)	Gemini 2.5 Flash-Lite	$161.70/mo	Gemini 3.5 Flash ($2,614.50/mo)	GPT-5.5 ($8,715/mo)

The single most reliable finding across all four: the budget tier (Gemini 2.5 Flash-Lite) is roughly 16x to 55x cheaper than the frontier tier, one to two orders of magnitude, and the choice that actually saves money is matching the tier to the task, not picking a cheaper frontier model.

What this report is, and is not

This is a cost report. Costs are computed, not estimated: each figure is the output of a shipped engine bundle run on a verified list price, recomputed on every build. That makes the numbers reproducible and auditable.

It is not an accuracy benchmark. No model was tested, measured, or scored for extraction quality, summarization faithfulness, or reasoning here. Where a vendor makes a capability claim, it is cited as that vendor's claim. Anywhere a number would require a private eval we did not run, the report says so. Cost tells you the ceiling of what a workload can cost; it never tells you which model reads a 10-K correctly. That answer comes from your own eval on your own documents.

The model lineup and prices below are the engine's verified rate table as of 2026-05-26. Two models named in common comparisons, DeepSeek V4 and Grok 4.3, are not in this engine's priced table, so they are not costed here; their published rates are noted as vendor claims only, never run through the cost math.

The verified price table

Every rate below is the list price the cost engine uses, cross-checked against the vendor's own pricing page on 2026-05-26: Anthropic¹, OpenAI², and Google³.

Model	Provider	$/Mtok input	$/Mtok output	Context	Cache read
Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	1M	not priced separately here
Gemini 2.5 Flash	Google	$0.30	$2.50	1M	not priced separately here
Claude Haiku 4.5	Anthropic	$1.00	$5.00	200K	$0.10 (0.1x input)
Gemini 2.5 Pro	Google	$1.25	$10.00	2M	not priced separately here
Gemini 3.5 Flash	Google	$1.50	$9.00	1M	$0.15 (in earnings engine)
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	1M	$0.30 (0.1x input)
o4-mini	OpenAI	$3.00	$12.00	200K	not priced separately here
GPT-5.4 mini	OpenAI	$0.75	$4.50	256K	not priced separately here
Claude Opus 4.8	Anthropic	$5.00	$25.00	1M	$0.50 (0.1x input)
GPT-5.5	OpenAI	$5.00	$30.00	400K	not priced separately here

Two modeling rules carry through every cost in this report. First, the Token Cost Optimizer applies prompt-cache pricing to Anthropic input only (cache reads at 0.1x the base input rate); Google and OpenAI input is priced at the full list rate, a deliberately conservative choice that, if anything, understates the Anthropic-vs-rest gap when caching is heavy. Second, the cache-hit assumption used below (0.40 on the extraction sweep) lowers the effective Anthropic input cost but does nothing for Google or OpenAI in these figures.
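The two caching rules can be sketched as a small function. This is a minimal sketch, not the Token Cost Optimizer's source: the function name and parameters are hypothetical, and the blend (uncached share at the base input rate, cached share at the cache-read rate, no cache-write charge) is an assumption chosen because it reproduces the per-call figures in the verified engine output.

```python
def effective_cost_per_call(input_tokens, output_tokens, input_rate, output_rate,
                            cache_hit_rate=0.0, cache_read_rate=None):
    """USD cost of one call. All rates are USD per million tokens.

    cache_read_rate=None models a provider whose input is priced at the
    full list rate regardless of cache hits (Google and OpenAI here).
    """
    if cache_read_rate is None:
        blended_input_rate = input_rate
    else:
        # Assumed blend: cached share billed at the cache-read rate.
        blended_input_rate = ((1 - cache_hit_rate) * input_rate
                              + cache_hit_rate * cache_read_rate)
    return (input_tokens * blended_input_rate
            + output_tokens * output_rate) / 1_000_000


# 10-K shape: 130k input, 6k output, 0.40 cache-hit assumption.
opus_call = effective_cost_per_call(130_000, 6_000, 5.00, 25.00,
                                    cache_hit_rate=0.40, cache_read_rate=0.50)
gpt55_call = effective_cost_per_call(130_000, 6_000, 5.00, 30.00,
                                     cache_hit_rate=0.40)
print(round(opus_call, 4), round(gpt55_call, 4))
```

Both models share a $5.00 input rate, so the difference between the two results comes from the cache discount on Opus input and the higher GPT-5.5 output rate.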

Workload 1: 10-K extraction

The scenario pins 130,000 input tokens (a full 10-K body plus a fixed extraction schema) and 6,000 output tokens (a structured field dump), one call per filing, a 5% retry rate, 30 filings a day, a 0.85 validation rate, and a 0.40 cache-hit assumption. A 10-K body runs roughly 100k to 150k tokens, so 130k is a realistic single-pass shape on any 1M-context model.

Model	Cost / filing	Cost / validated	Cost / month
Gemini 2.5 Flash-Lite	$0.0162	$0.0190	$14.55
Gemini 2.5 Flash	$0.0567	$0.0667	$51.03
Claude Haiku 4.5	$0.1189	$0.1398	$106.97
GPT-5.4 mini	$0.1307	$0.1538	$117.65
Gemini 2.5 Pro	$0.2336	$0.2749	$210.26
Gemini 3.5 Flash	$0.2615	$0.3076	$235.31
Claude Sonnet 4.6	$0.3566	$0.4195	$320.92
o4-mini	$0.4851	$0.5707	$436.59
Claude Opus 4.8	$0.5943	$0.6992	$534.87
GPT-5.5	$0.8715	$1.0253	$784.35

The spread runs from $14.55/mo to $784.35/mo for the identical token shape: a factor of 54 between the cheapest and the costliest. The "cost per validated" column marks each call up by the inverse of the 0.85 validation rate, the honest unit if 15% of extractions fail a downstream check and must be reworked.
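The step from a per-call cost to the three table columns is plain arithmetic, sketched here under stated assumptions. The function name is hypothetical, the retry markup is assumed to be a simple multiplier of (1 + retry rate), and the 30-day month is inferred from the engine output, where cost per month equals 30 times cost per day.

```python
def sweep_costs(cost_per_call, calls_per_filing=1, retry_rate=0.05,
                filings_per_day=30, validation_rate=0.85, days_per_month=30):
    """Return (cost per filing, cost per validated filing, cost per month)."""
    per_filing = cost_per_call * calls_per_filing * (1 + retry_rate)
    # Mark up by the inverse of the validation rate: failed extractions
    # are paid for but produce no usable result.
    per_validated = per_filing / validation_rate
    per_month = per_filing * filings_per_day * days_per_month
    return per_filing, per_validated, per_month


# Effective per-call costs taken from the verified engine output.
flash_lite = sweep_costs(0.0154)
gpt55 = sweep_costs(0.83)
spread = gpt55[2] / flash_lite[2]

print([round(x, 4) for x in flash_lite])
print([round(x, 4) for x in gpt55])
print(round(spread, 1))
```

Because every column is a constant multiple of the per-call cost, the ratio between two models is the same in all three columns.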

The frontier picks cluster: Gemini 3.5 Flash ($0.2615/filing) sits within 12% of Gemini 2.5 Pro ($0.2336) and is 2.3x cheaper than Opus 4.8 and 3.3x cheaper than GPT-5.5. For pure structural extraction, though, none of that matters: Flash-Lite at $0.0162 is about 16x cheaper than Gemini 3.5 Flash and clears the same 1M context.

The retrieval layer adds almost nothing

If the extraction runs through a RAG layer rather than full-context, the embedding cost is a rounding error next to the LLM cost. The SEC Filing Chunk Optimizer, run on a 10-K body with a 768-token structural chunking strategy, a 12% overlap, and the voyage-finance-2 embedding model re-embedded across 250 queries, returns 178 chunks averaging 766 tokens, 136,348 tokens ingested, a one-time embedding cost of $0.0164 and $0.0012 per 100 queries. The embedding pass costs less than a tenth of a single Gemini 3.5 Flash extraction call. The model choice, not the retrieval design, owns the bill.
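The embedding figures above follow from token count times the embedding rate. The sketch below reproduces them from the engine output; the 100-token average query length is an assumption (it is not stated in the report) that happens to reproduce the $0.0012 per 100 queries figure.

```python
USD_PER_MTOK = 0.12          # voyage-finance-2 rate from the engine output
CHUNK_COUNT = 178
AVG_CHUNK_TOKENS = 766

tokens_ingested = CHUNK_COUNT * AVG_CHUNK_TOKENS
embedding_cost_once = tokens_ingested * USD_PER_MTOK / 1_000_000

# Assumption: an average query of about 100 tokens.
ASSUMED_TOKENS_PER_QUERY = 100
cost_per_100_queries = 100 * ASSUMED_TOKENS_PER_QUERY * USD_PER_MTOK / 1_000_000

# Compare against one Gemini 3.5 Flash extraction call ($0.26145 per filing).
share_of_llm_call = embedding_cost_once / 0.26145

print(tokens_ingested)
print(round(embedding_cost_once, 6), round(cost_per_100_queries, 4))
print(round(share_of_llm_call, 3))
```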

And so does the market-data feed

A finance extraction pipeline also needs the filings and fundamentals themselves. The Data Vendor TCO engine, run for a fundamentals-first provider (Financial Modeling Prep) on a medium universe at daily resolution with no live feed, returns the Starter tier at $22/month, $264/year all-in. At the 10,000-filing-a-month scale in the TL;DR table, the data feed is a fixed cost dwarfed by inference. The lesson holds at every scale: in an LLM finance stack, the token bill is the variable that moves, and the model tier is the lever that moves it.

Workload 2: earnings-call summarization

The scenario: a 14,000-token transcript (prepared remarks plus Q&A), a 700-token summary, one summarization attempt, a 0.40 cache-hit assumption, across 100 tickers a quarter. The earnings-call engine prices cache reads for every provider, so the Anthropic-only caching caveat does not apply here.

Model	Per stock / quarter	Per stock / year	All 100 tickers / year
Gemini 2.5 Flash-Lite	$0.0013	$0.0050	$0.50
Gemini 2.5 Flash	$0.0047	$0.0188	$1.88
Claude Haiku 4.5	$0.0125	$0.0498	$4.98
Gemini 2.5 Pro	$0.0192	$0.0769	$7.69
Gemini 3.5 Flash	$0.0197	$0.0790	$7.90
Claude Opus 4.8	$0.0623	$0.2492	$24.92

Earnings-call summarization is cheap in absolute terms for every model: even Opus 4.8, the most expensive row, costs under $25 a year to summarize 100 tickers across four quarters. The transcript is short (14k tokens) and the output is short (700 tokens), so the per-unit cost is a fraction of a 10-K. Here the model choice barely matters on cost: the gap between Flash-Lite and Opus is 50x in ratio but $24.42 a year in dollars. This is the one workload where you should pick on quality, not price, because price is nearly free.
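The ratio-versus-dollars point can be checked with a short calculation. This is a sketch under assumptions: the function names are hypothetical, the cache blend mirrors the 0.40 cache-hit assumption with each provider's cache-read rate from the engine output, and treating the attempt count as a straight multiplier is a guess that cannot be confirmed from a run with one attempt.

```python
def summary_cost(transcript_tokens, summary_tokens, input_rate, output_rate,
                 cache_read_rate, cache_hit_rate=0.40, attempts=1):
    """USD per stock per quarter. Rates are USD per million tokens."""
    blended_input = ((1 - cache_hit_rate) * input_rate
                     + cache_hit_rate * cache_read_rate)
    per_attempt = (transcript_tokens * blended_input
                   + summary_tokens * output_rate) / 1_000_000
    return attempts * per_attempt  # assumed: attempts scale cost linearly


def yearly_total(per_stock_per_quarter, tickers=100, quarters=4):
    return per_stock_per_quarter * tickers * quarters


flash_lite_year = yearly_total(summary_cost(14_000, 700, 0.10, 0.40, 0.025))
opus_year = yearly_total(summary_cost(14_000, 700, 5.00, 25.00, 0.50))

print(round(flash_lite_year, 2), round(opus_year, 2))
print(round(opus_year / flash_lite_year, 1))   # ratio between the two
print(round(opus_year - flash_lite_year, 2))   # gap in dollars per year
```

The ratio is large, but the dollar gap is what a budget actually sees, which is why this workload is best chosen on quality.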

Workload 3: real-time news sentiment

Sentiment scoring is high-volume and short-context: an 8,000-token input (a news item plus instruction), a 500-token structured score, a 0.30 cache-hit assumption, priced as cost per 1,000 calls.

Model	Cost per 1,000 calls
Gemini 2.5 Flash-Lite	$1.00
Gemini 2.5 Flash	$3.65
Claude Haiku 4.5	$8.34
Gemini 3.5 Flash	$16.50
Claude Opus 4.8	$41.70
GPT-5.5	$55.00

At a realistic news-feed volume (tens of thousands of items a day), the model choice compounds fast. A desk scoring 50,000 items a day pays $50/day on Flash-Lite versus $2,750/day on GPT-5.5. Sentiment is the textbook case for the budget tier: the task is structural classification, the volume is enormous, and a frontier model buys very little that a fine-tuned cheap model cannot match. Reserve frontier reasoning for the contested items a cheap first pass flags as ambiguous.
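The daily figures, and the cost of a two-tier cascade, can be sketched as follows. The function names are hypothetical, and the 5% escalation rate is purely illustrative: the report does not measure how many items a first pass would flag, so substitute the rate from your own pipeline.

```python
def daily_cost(items_per_day, cost_per_1000_calls):
    """USD per day for a given volume at a per-1,000-call price."""
    return items_per_day / 1000 * cost_per_1000_calls


def cascade_daily_cost(items_per_day, cheap_per_1000, frontier_per_1000,
                       escalation_rate):
    """Every item gets the cheap pass; the flagged share also gets a frontier pass."""
    first_pass = daily_cost(items_per_day, cheap_per_1000)
    second_pass = daily_cost(items_per_day * escalation_rate, frontier_per_1000)
    return first_pass + second_pass


ITEMS_PER_DAY = 50_000
flash_lite_only = daily_cost(ITEMS_PER_DAY, 1.00)
gpt55_only = daily_cost(ITEMS_PER_DAY, 55.00)

# Hypothetical: 5% of items escalated from Flash-Lite to GPT-5.5.
cascade = cascade_daily_cost(ITEMS_PER_DAY, 1.00, 55.00, escalation_rate=0.05)

print(flash_lite_only, gpt55_only, round(cascade, 2))
```

Under that illustrative rate the cascade stays far closer to the budget-tier cost than to the frontier-only cost, because the frontier price applies only to the escalated share.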

Workload 4: options-greeks reasoning

This is the workload the cost engines deliberately do not price as a single LLM call, and the honest report says why. Options-greeks computation (delta, gamma, theta, vega from spot, strike, time, and implied vol) is deterministic math, handled on this site by the Options Greeks Explorer engine, not an LLM. The LLM's role is reasoning over the computed greeks: explaining a position, flagging a tail risk, choosing a hedge. That reasoning call looks like an agent workload, not a fixed extraction, so its cost depends entirely on how many tool calls and reasoning steps the loop runs. We do not invent a per-call number for it. If your greeks-reasoning agent runs the 6-step loop priced in the prompt-caching spoke, use those figures; if it is a single explanatory call over pre-computed greeks with a similar token shape (about 8,000 tokens in, 500 out), it costs the same as one news-sentiment call on the relevant model.
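To illustrate why this part needs no LLM, here is a minimal sketch of the greeks for a European call under the Black-Scholes model with no dividends. The choice of Black-Scholes, the function names, and the default zero risk-free rate are all assumptions; the report does not say which pricing model the Options Greeks Explorer uses.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def call_greeks(spot, strike, years, vol, rate=0.0):
    """Black-Scholes greeks for a European call, no dividends (assumed model).

    vol and rate are annualized decimals; theta is per year and vega is per
    1.00 change in volatility (divide by 100 for a one-point move).
    """
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    delta = norm_cdf(d1)
    gamma = norm_pdf(d1) / (spot * vol * sqrt_t)
    vega = spot * norm_pdf(d1) * sqrt_t
    theta = (-spot * norm_pdf(d1) * vol / (2 * sqrt_t)
             - rate * strike * math.exp(-rate * years) * norm_cdf(d2))
    return {"delta": delta, "gamma": gamma, "theta": theta, "vega": vega}


# At-the-money call, one year to expiry, 20% implied vol.
greeks = call_greeks(spot=100.0, strike=100.0, years=1.0, vol=0.20)
print({name: round(value, 4) for name, value in greeks.items()})
```

The same inputs always produce the same outputs, so the only token cost in this workload is the reasoning call that reads these numbers.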

The decision matrix

Putting the four workloads together, the cheapest viable model per task tier:

Structural extraction at any volume: Gemini 2.5 Flash-Lite. Cheapest correct path when the task is pulling line items, dates, and totals where the structure is regular. The 1M context swallows a full 10-K.
Extraction needing agent-tier reasoning at speed: Gemini 3.5 Flash. The frontier value pick: cheapest of the three frontier headline models, at Flash latency. Budget about $235/mo for a 30-filing-a-day sweep.
Maximum context with frontier reasoning, latency not critical: Gemini 2.5 Pro (2M context), at near-identical cost to Gemini 3.5 Flash.
High-volume short-context classification (sentiment, tagging): Gemini 2.5 Flash-Lite, with frontier escalation only on flagged-ambiguous items.
You are standardized on Anthropic or OpenAI: Opus 4.8 or GPT-5.5, paying the 2.3x to 3.3x premium for the vendor relationship, or routing only the hard subset to them. On Anthropic, heavy prompt caching narrows the input gap but never the output gap (see the caching spoke).
Earnings-call summarization: pick on quality, not price. Every model costs under $25/yr for 100 tickers; the cost difference is noise.

How every number here was produced

Each figure in every table is the output of a shipped engine bundle, computed at build time and embedded in a machine-readable block this site's CI independently recomputes against the same bundle (1e-9 tolerance) on every push. A writer cannot hand-type a cost number; if the prose disagreed with the engine, the build would fail. The verified inputs and outputs for each run are in the expandable block below.
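A recompute gate of this kind can be sketched in a few lines. This is not the site's CI code: the function and key names are hypothetical, and reading the 1e-9 tolerance as an absolute (rather than relative) tolerance is an assumption.

```python
import math


def find_mismatches(embedded, recomputed, tolerance=1e-9):
    """Return the keys whose embedded value disagrees with the recomputed one."""
    return [
        key for key, value in embedded.items()
        if key not in recomputed
        or not math.isclose(value, recomputed[key], rel_tol=0.0, abs_tol=tolerance)
    ]


# Recompute the Flash-Lite 10-K row from its per-call cost and retry rate.
per_filing = 0.0154 * 1.05
recomputed = {"cost_per_idea": per_filing, "cost_per_month": per_filing * 30 * 30}

engine_block = {"cost_per_idea": 0.01617, "cost_per_month": 14.553}
hand_typed = {"cost_per_idea": 0.0162, "cost_per_month": 14.55}

print(find_mismatches(engine_block, recomputed))  # full-precision values
print(find_mismatches(hand_typed, recomputed))    # rounded, hand-typed values
```

Values rounded for display differ from the recomputed ones by far more than the tolerance, which is why such a check has to run against the embedded full-precision block rather than the prose.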

Connects to

Token Cost Optimizer: the per-call cost engine behind workloads 1, 3, and 4. Recompute any row with your own token shapes.
Data Vendor TCO: the market-data feed cost behind the pipeline figure.
Cheapest LLM for SEC 10-K Extraction at 10,000 Filings a Month 2026: the at-scale spoke of this report.
Prompt-Caching ROI for Finance LLM Agents 2026: why caching closes an input gap but never an output gap.
Finance-Workload Cost per 1,000 Tasks: Gemini 3.5 Flash vs Opus 4.8 vs GPT-5.5: the three-way frontier cost spoke.
Gemini 3.5 Flash vs GPT-5.5 vs Claude Opus 4.8 for Finance Extraction 2026: the focused three-way extraction comparison.
Cheapest LLM for SEC Filings 2026: the budget-extraction deep dive.

References

¹ Anthropic. "Pricing." platform.claude.com, verified 2026-05-26. https://platform.claude.com/docs/en/about-claude/pricing
² OpenAI. "API Pricing." developers.openai.com, verified 2026-05-26. https://developers.openai.com/api/docs/pricing
³ Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-26. https://ai.google.dev/gemini-api/docs/pricing

Verified engine output

10-K extraction — Gemini 2.5 Flash-Lite (budget tier, 130k in + 6k out, 30 filings/day)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-flash-lite

Result
model › id	gemini-2-5-flash-lite
model › provider	google
model › name	Gemini 2.5 Flash-Lite
model › input usd per mtoken	0.1
model › output usd per mtoken	0.4
model › context window	1000000
model › notes	Cheapest tier in this table; 1M context.
effective cost per call	0.0154
cost per idea	0.01617
cost per validated trade	0.019023529411764706
cost per day	0.48510000000000003
cost per month	14.553
cost per year	177.06150000000002

Computed live at build time.

10-K extraction — Gemini 2.5 Flash (economy)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-flash

Result
model › id	gemini-2-5-flash
model › provider	google
model › name	Gemini 2.5 Flash
model › input usd per mtoken	0.3
model › output usd per mtoken	2.5
model › context window	1000000
model › notes	Fast mid-tier; 1M context.
effective cost per call	0.054
cost per idea	0.0567
cost per validated trade	0.06670588235294118
cost per day	1.701
cost per month	51.03
cost per year	620.865

10-K extraction — Claude Haiku 4.5

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	claude-haiku-4-5

Result
model › id	claude-haiku-4-5
model › provider	anthropic
model › name	Claude Haiku 4.5
model › input usd per mtoken	1
model › output usd per mtoken	5
model › cache write usd per mtoken	1.25
model › cache read usd per mtoken	0.1
model › context window	200000
model › notes	Fast, cheap — filtering + pre-processing layers.
effective cost per call	0.1132
cost per idea	0.11886
cost per validated trade	0.13983529411764706
cost per day	3.5658
cost per month	106.97399999999999
cost per year	1301.517

10-K extraction — GPT-5.4 mini

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gpt-5-mini

Result
model › id	gpt-5-mini
model › provider	openai
model › name	GPT-5.4 mini
model › input usd per mtoken	0.75
model › output usd per mtoken	4.5
model › context window	256000
model › notes	Mid-tier OpenAI (GPT-5.4 mini).
effective cost per call	0.1245
cost per idea	0.130725
cost per validated trade	0.15379411764705883
cost per day	3.9217500000000003
cost per month	117.6525
cost per year	1431.43875

10-K extraction — Gemini 2.5 Pro (2M context)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-2-5-pro

Result
model › id	gemini-2-5-pro
model › provider	google
model › name	Gemini 2.5 Pro
model › input usd per mtoken	1.25
model › output usd per mtoken	10
model › context window	2000000
model › notes	Large context (2M). Strong on document analysis.
effective cost per call	0.2225
cost per idea	0.23362500000000003
cost per validated trade	0.27485294117647063
cost per day	7.008750000000001
cost per month	210.26250000000002
cost per year	2558.1937500000004

10-K extraction — Gemini 3.5 Flash (frontier value pick)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gemini-3-5-flash

Result
model › id	gemini-3-5-flash
model › provider	google
model › name	Gemini 3.5 Flash
model › input usd per mtoken	1.5
model › output usd per mtoken	9
model › context window	1000000
model › notes	Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call	0.249
cost per idea	0.26145
cost per validated trade	0.30758823529411766
cost per day	7.843500000000001
cost per month	235.305
cost per year	2862.8775

10-K extraction — Claude Sonnet 4.6

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	claude-sonnet-4-6

Result
model › id	claude-sonnet-4-6
model › provider	anthropic
model › name	Claude Sonnet 4.6
model › input usd per mtoken	3
model › output usd per mtoken	15
model › cache write usd per mtoken	3.75
model › cache read usd per mtoken	0.3
model › context window	1000000
model › notes	Best price/performance for bulk research loops.
effective cost per call	0.3396
cost per idea	0.35658
cost per validated trade	0.4195058823529412
cost per day	10.6974
cost per month	320.922
cost per year	3904.551

10-K extraction — o4-mini (reasoning)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	o4-mini

Result
model › id	o4-mini
model › provider	openai
model › name	o4-mini (reasoning)
model › input usd per mtoken	3
model › output usd per mtoken	12
model › context window	200000
model › notes	OpenAI reasoning-optimized mid-tier.
effective cost per call	0.462
cost per idea	0.48510000000000003
cost per validated trade	0.5707058823529412
cost per day	14.553
cost per month	436.59000000000003
cost per year	5311.845

10-K extraction — Claude Opus 4.8 (40% cache hit on input)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	claude-opus-4-8

Result
model › id	claude-opus-4-8
model › provider	anthropic
model › name	Claude Opus 4.8
model › input usd per mtoken	5
model › output usd per mtoken	25
model › cache write usd per mtoken	6.25
model › cache read usd per mtoken	0.5
model › context window	1000000
model › notes	Flagship reasoning model — 1M context.
effective cost per call	0.5660000000000001
cost per idea	0.5943
cost per validated trade	0.6991764705882354
cost per day	17.829
cost per month	534.87
cost per year	6507.585

10-K extraction — GPT-5.5 (premium)

Inputs
input_tokens_per_call	130000
output_tokens_per_call	6000
calls_per_idea	1
retry_rate	0.05
ideas_per_day	30
validation_rate	0.85
cache_hit_rate	0.4
model_id	gpt-5

Result
model › id	gpt-5
model › provider	openai
model › name	GPT-5.5
model › input usd per mtoken	5
model › output usd per mtoken	30
model › context window	400000
model › notes	OpenAI frontier model (GPT-5.5).
effective cost per call	0.8300000000000001
cost per idea	0.8715000000000002
cost per validated trade	1.0252941176470591
cost per day	26.145000000000003
cost per month	784.3500000000001
cost per year	9542.925000000001

Earnings-call summary — Gemini 2.5 Flash-Lite (14k transcript, 100 tickers/qtr)

Inputs
tickers_per_quarter	100
avg_transcript_tokens	14000
avg_summary_tokens	700
cache_hit_rate	0.4
summarization_attempts	1
model_id	gemini-2-5-flash-lite

Result
model › id	gemini-2-5-flash-lite
model › name	Gemini 2.5 Flash-Lite
model › provider	Google
model › input usd per mtok	0.1
model › output usd per mtok	0.4
model › cache read usd per mtok	0.025
per stock per quarter	0.0012599999999999998
per stock per year	0.005039999999999999
per quarter total	0.12599999999999997
per year total	0.5039999999999999

Earnings-call summary — Gemini 3.5 Flash

Inputs
tickers_per_quarter	100
avg_transcript_tokens	14000
avg_summary_tokens	700
cache_hit_rate	0.4
summarization_attempts	1
model_id	gemini-3-5-flash

Result
model › id	gemini-3-5-flash
model › name	Gemini 3.5 Flash
model › provider	Google
model › input usd per mtok	1.5
model › output usd per mtok	9
model › cache read usd per mtok	0.15
per stock per quarter	0.01974
per stock per year	0.07896
per quarter total	1.974
per year total	7.896

Earnings-call summary — Claude Opus 4.8

Inputs
tickers_per_quarter	100
avg_transcript_tokens	14000
avg_summary_tokens	700
cache_hit_rate	0.4
summarization_attempts	1
model_id	claude-opus-4-8

Result
model › id	claude-opus-4-8
model › name	Claude Opus 4.8
model › provider	Anthropic
model › input usd per mtok	5
model › output usd per mtok	25
model › cache read usd per mtok	0.5
per stock per quarter	0.0623
per stock per year	0.2492
per quarter total	6.23
per year total	24.92

News sentiment — Gemini 2.5 Flash-Lite, cost per 1,000 calls (costPerDay at 1,000/day)

Inputs
input_tokens_per_call	8000
output_tokens_per_call	500
calls_per_idea	1
retry_rate	0
ideas_per_day	1000
validation_rate	0.9
cache_hit_rate	0.3
model_id	gemini-2-5-flash-lite

Result
model › id	gemini-2-5-flash-lite
model › provider	google
model › name	Gemini 2.5 Flash-Lite
model › input usd per mtoken	0.1
model › output usd per mtoken	0.4
model › context window	1000000
model › notes	Cheapest tier in this table; 1M context.
effective cost per call	0.001
cost per idea	0.001
cost per validated trade	0.0011111111111111111
cost per day	1
cost per month	30
cost per year	365

News sentiment — Gemini 3.5 Flash, cost per 1,000 calls

Inputs
input_tokens_per_call	8000
output_tokens_per_call	500
calls_per_idea	1
retry_rate	0
ideas_per_day	1000
validation_rate	0.9
cache_hit_rate	0.3
model_id	gemini-3-5-flash

Result
model › id	gemini-3-5-flash
model › provider	google
model › name	Gemini 3.5 Flash
model › input usd per mtoken	1.5
model › output usd per mtoken	9
model › context window	1000000
model › notes	Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call	0.0165
cost per idea	0.0165
cost per validated trade	0.018333333333333333
cost per day	16.5
cost per month	495
cost per year	6022.5

News sentiment — GPT-5.5, cost per 1,000 calls

Inputs
input_tokens_per_call	8000
output_tokens_per_call	500
calls_per_idea	1
retry_rate	0
ideas_per_day	1000
validation_rate	0.9
cache_hit_rate	0.3
model_id	gpt-5

Result
model › id	gpt-5
model › provider	openai
model › name	GPT-5.5
model › input usd per mtoken	5
model › output usd per mtoken	30
model › context window	400000
model › notes	OpenAI frontier model (GPT-5.5).
effective cost per call	0.055
cost per idea	0.055
cost per validated trade	0.06111111111111111
cost per day	55
cost per month	1650
cost per year	20075

RAG embedding layer for a 10-K body (voyage-finance-2, 768-token structural chunks)

Inputs
archetype_id	10k-body
chunk_size	768
overlap_pct	0.12
strategy	structural
embedding_model_id	voyage-finance-2
query_reembed_count	250

Result
archetype › id	10k-body
archetype › name	10-K (full body)
archetype › total tokens	120000
archetype › structural boundaries	12
archetype › table heavy	true
archetype › notes	Form 10-K business + risk + MD&A + financials. ~12 Items. Dense tables in Item 7 / 8.
embedding › id	voyage-finance-2
embedding › name	voyage-finance-2
embedding › vendor	Voyage AI
embedding › usd per mtokens	0.12
embedding › dim	1024
embedding › source	https://docs.voyageai.com/docs/pricing
strategy	structural
chunk count	178
avg tokens	766
min tokens	768
max tokens	768
tokens ingested	136348
embedding cost once	0.01636176
embedding cost per100 queries	0.0012
strategy notes	Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean.

Market-data feed — Financial Modeling Prep, medium universe, daily, no live

Inputs
vendor_id	fmp
universe	medium
resolution	daily
needs_live	false

Result
vendor › id	fmp
vendor › name	Financial Modeling Prep
vendor › url	https://site.financialmodelingprep.com
vendor › short pitch	Fundamentals-heavy. Earnings, filings, transcripts. Price data is a secondary offer.
vendor › has overage	false
vendor › last checked	2026-07-12
vendor › tiers › row 1 › name	Starter
vendor › tiers › row 1 › monthly	22
vendor › tiers › row 1 › includes live	false
vendor › tiers › row 1 › includes options	false
vendor › tiers › row 1 › includes futures	false
vendor › tiers › row 1 › resolutions › row 1	daily
vendor › tiers › row 1 › notes › row 1	Billed annually ($22/mo equivalent)
vendor › tiers › row 1 › notes › row 2	5 years history
vendor › tiers › row 1 › notes › row 3	300 API calls/min
vendor › tiers › row 2 › name	Premium
vendor › tiers › row 2 › monthly	59
vendor › tiers › row 2 › includes live	false
vendor › tiers › row 2 › includes options	false
vendor › tiers › row 2 › includes futures	false
vendor › tiers › row 2 › resolutions › row 1	daily
vendor › tiers › row 2 › resolutions › row 2	minute
vendor › tiers › row 2 › notes › row 1	Billed annually
vendor › tiers › row 2 › notes › row 2	30+ years history
vendor › tiers › row 2 › notes › row 3	750 calls/min
vendor › tiers › row 3 › name	Ultimate
vendor › tiers › row 3 › monthly	149
vendor › tiers › row 3 › includes live	true
vendor › tiers › row 3 › includes options	false
vendor › tiers › row 3 › includes futures	false
vendor › tiers › row 3 › resolutions › row 1	daily
vendor › tiers › row 3 › resolutions › row 2	minute
vendor › tiers › row 3 › resolutions › row 3	second
vendor › tiers › row 3 › notes › row 1	Billed annually
vendor › tiers › row 3 › notes › row 2	Real-time
vendor › tiers › row 3 › notes › row 3	3,000 calls/min
vendor › tiers › row 3 › notes › row 4	Earnings-call transcripts require this tier
tier › name	Starter
tier › monthly	22
tier › includes live	false
tier › includes options	false
tier › includes futures	false
tier › resolutions › row 1	daily
tier › notes › row 1	Billed annually ($22/mo equivalent)
tier › notes › row 2	5 years history
tier › notes › row 3	300 API calls/min
monthly	22
one time	0
annual total	264
meets resolution	true
meets live	true
meets options	true
meets futures	true
meets all	true

Frequently asked questions

What is the cheapest LLM for 10-K extraction in 2026?: Gemini 2.5 Flash-Lite at $0.0162 per filing ($14.55 per month on a 30-filing/day sweep), with a 1M context that fits a full 10-K. It is about 16x cheaper than the cheapest frontier model, Gemini 3.5 Flash, for structural extraction that does not need agent-tier reasoning.
How big is the cost gap between the cheapest and most expensive LLM for finance extraction?: On the 10-K extraction workload (130k input, 6k output, 30 filings/day), the spread is a factor of 54: Gemini 2.5 Flash-Lite at $14.55/month against GPT-5.5 at $784.35/month, for the identical token shape.
Are these numbers benchmark results or accuracy scores?: Neither. They are cost numbers computed from verified vendor list prices, run through this site's shipped cost engines and recomputed by CI on every build. No model was tested or scored for accuracy here. Cost tells you the ceiling of what a workload can cost, never which model reads a filing correctly.
Does prompt caching make Claude Opus 4.8 competitive on cost?: Caching narrows the gap but does not close it. The cost engine applies cache pricing to Anthropic input only (reads at 0.1x base input); it does nothing for output. On an output-heavy agent loop, even 90% cache on Opus 4.8 does not beat Gemini 3.5 Flash uncached, because Opus's $25/Mtok output rate dominates.
Why are DeepSeek V4 and Grok 4.3 not costed in this report?: They are not in this site's verified-price cost-engine table, so running them would require unverified numbers. Their published rates are treated as vendor claims only and never run through the cost math. Every costed figure traces to a model in the engine's verified 2026-05-26 rate table.
Which model should I pick for real-time news sentiment scoring?: Gemini 2.5 Flash-Lite at $1.00 per 1,000 calls. Sentiment is high-volume structural classification where the budget tier wins decisively (GPT-5.5 costs $55.00 per 1,000 on the same shape). Escalate to a frontier model only for the ambiguous items a cheap first pass flags.
