The short answer

The LLM-in-Finance Economics Report prices ten models from Anthropic, OpenAI, and Google against four finance workloads using list rates verified on 2026-05-26. On 10-K extraction at 30 filings a day, cost runs from $14.55/month (Gemini 2.5 Flash-Lite) to $784.35/month (GPT-5.5), a factor of 54. The cheapest correct answer comes from matching tier to task, not from picking a cheaper frontier model.

This report prices ten models, from budget tier to frontier, against four real finance workloads, using the same shipped cost engines that power the calculators on this site. No model was benchmarked for accuracy here. Every dollar figure is computed at build time from each vendor's list rate as verified on 2026-05-26, recomputed on every deploy, and independently recomputed by a CI gate against the engine bundle. The headline: on 10-K extraction at 30 filings a day, the spread between the cheapest model and the most expensive is a factor of 54, and the saving comes from matching the model tier to the task, not from picking a cheaper frontier model.

TL;DR

The cheapest-viable model per workload, derived from the computed costs below:

| Workload | Cheapest viable | Computed cost | Frontier value pick | Premium pick |
| --- | --- | --- | --- | --- |
| 10-K extraction (per filing, 30/day) | Gemini 2.5 Flash-Lite | $0.0162/filing, $14.55/mo | Gemini 3.5 Flash ($235.31/mo) | GPT-5.5 ($784.35/mo) |
| Earnings-call summary (per stock/quarter) | Gemini 2.5 Flash-Lite | $0.0013, $0.50/yr at 100 tickers | Gemini 3.5 Flash ($7.90/yr) | Opus 4.7 ($24.92/yr) |
| News sentiment (cost per 1,000 calls) | Gemini 2.5 Flash-Lite | $1.00 / 1,000 | Gemini 3.5 Flash ($16.50) | GPT-5.5 ($55.00) |
| 10-K extraction at scale (10,000 filings/mo) | Gemini 2.5 Flash-Lite | $161.70/mo | Gemini 3.5 Flash ($2,614.50/mo) | GPT-5.5 ($8,715/mo) |

The single most reliable finding across all four rows: the budget tier (Gemini 2.5 Flash-Lite) is roughly 16x to 55x cheaper than the frontier picks, and the choice that actually saves money is matching the tier to the task, not picking a cheaper frontier model.

What this report is, and is not

This is a cost report. Costs are computed, not estimated: each figure is the output of a shipped engine bundle run on a verified list price, recomputed on every build. That makes the numbers reproducible and auditable.

It is not an accuracy benchmark. No model was tested, measured, or scored for extraction quality, summarization faithfulness, or reasoning here. Where a vendor makes a capability claim, it is cited as that vendor's claim. Anywhere a number would require a private eval we did not run, the report says so. Cost tells you the ceiling of what a workload can cost; it never tells you which model reads a 10-K correctly. That answer comes from your own eval on your own documents.

The model lineup and prices below are the engine's verified rate table as of 2026-05-26. Two models named in common comparisons, DeepSeek V4 and Grok 4.3, are not in this engine's priced table, so they are not costed here; their published rates are noted as vendor claims only, never run through the cost math.

The verified price table

Every rate below is the list price the cost engine uses, cross-checked against the vendor's own pricing page on 2026-05-26: Anthropic [1], OpenAI [2], and Google [3].

| Model | Provider | Input ($/Mtok) | Output ($/Mtok) | Context | Cache read ($/Mtok) |
| --- | --- | --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | not priced separately here |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | not priced separately here |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | $0.10 (0.1x input) |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 2M | not priced separately here |
| Gemini 3.5 Flash | Google | $1.50 | $9.00 | 1M | $0.15 (in earnings engine) |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M | $0.30 (0.1x input) |
| o4-mini | OpenAI | $3.00 | $12.00 | 200K | not priced separately here |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 | 256K | not priced separately here |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M | $0.50 (0.1x input) |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 400K | not priced separately here |

Two modeling rules carry through every cost in this report. First, the Token Cost Optimizer applies prompt-cache pricing to Anthropic input only (cache reads at 0.1x the base input rate); Google and OpenAI input is priced at the full list rate, a deliberately conservative choice that, if anything, understates the Anthropic-vs-rest gap when caching is heavy. Second, the cache-hit assumption used below (0.40 on the extraction sweep) lowers the effective Anthropic input cost but does nothing for Google or OpenAI in these figures.
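The two rules above can be sketched as a single per-call formula. This is a minimal illustration, not the engine's source: the function and parameter names are hypothetical, and the formula is inferred from the published engine output (it reproduces the Haiku 4.5 and Flash-Lite "effective cost per call" values without charging any cache-write surcharge).

```python
from typing import Optional


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_usd_per_mtok: float, output_usd_per_mtok: float,
                  cache_hit_rate: float = 0.0,
                  cache_read_usd_per_mtok: Optional[float] = None) -> float:
    """USD for one call. Cached input is discounted only when a cache-read rate is supplied."""
    if cache_read_usd_per_mtok is None:
        # Google / OpenAI in the Token Cost Optimizer: the cache-hit rate has no effect.
        input_cost = input_tokens * input_usd_per_mtok
    else:
        # Anthropic: the cached share of input is billed at the cache-read rate.
        cached = input_tokens * cache_hit_rate
        input_cost = (cached * cache_read_usd_per_mtok
                      + (input_tokens - cached) * input_usd_per_mtok)
    return (input_cost + output_tokens * output_usd_per_mtok) / 1_000_000


# 10-K extraction shape: 130k input, 6k output, 0.40 cache-hit assumption.
haiku = cost_per_call(130_000, 6_000, 1.00, 5.00, 0.40, cache_read_usd_per_mtok=0.10)
flash_lite = cost_per_call(130_000, 6_000, 0.10, 0.40, 0.40)
print(f"Haiku 4.5: ${haiku:.4f}  Flash-Lite: ${flash_lite:.4f}")
```

Passing `None` for the cache-read rate is what makes the 0.40 cache-hit assumption a no-op for Google and OpenAI rows.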

Workload 1: 10-K extraction

The scenario pins 130,000 input tokens (a full 10-K body plus a fixed extraction schema) and 6,000 output tokens (a structured field dump), one call per filing, a 5% retry rate, 30 filings a day, a 0.85 validation rate, and a 0.40 cache-hit assumption. A 10-K body runs roughly 100k to 150k tokens, so 130k fits in a single pass on every model in the price table, where the smallest context window is 200K.

| Model | Cost / filing | Cost / validated | Cost / month |
| --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | $0.0162 | $0.0190 | $14.55 |
| Gemini 2.5 Flash | $0.0567 | $0.0667 | $51.03 |
| Claude Haiku 4.5 | $0.1189 | $0.1398 | $106.97 |
| GPT-5.4 mini | $0.1307 | $0.1538 | $117.65 |
| Gemini 2.5 Pro | $0.2336 | $0.2749 | $210.26 |
| Gemini 3.5 Flash | $0.2615 | $0.3076 | $235.31 |
| Claude Sonnet 4.6 | $0.3566 | $0.4195 | $320.92 |
| o4-mini | $0.4851 | $0.5707 | $436.59 |
| Claude Opus 4.7 | $0.5943 | $0.6992 | $534.87 |
| GPT-5.5 | $0.8715 | $1.0253 | $784.35 |

The spread runs from $14.55/mo to $784.35/mo for the identical token shape: a factor of 54 between the cheapest and the costliest. The "cost per validated" column marks each call up by the inverse of the 0.85 validation rate, the honest unit if 15% of extractions fail a downstream check and must be reworked.
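The roll-up from a per-call cost to the three table columns can be sketched as follows. The function name is hypothetical; the 30-day month and 365-day year are inferred from the engine output (cost per day times 30 and times 365), and the two per-call inputs are the "effective cost per call" values from the verified engine output.

```python
def roll_up(cost_per_call: float, retry_rate: float = 0.05,
            validation_rate: float = 0.85, filings_per_day: int = 30) -> dict:
    """Expand one call's cost into the per-filing, per-validated, and periodic figures."""
    per_filing = cost_per_call * (1 + retry_rate)   # 5% of calls are paid for twice
    per_day = per_filing * filings_per_day
    return {
        "per_filing": per_filing,
        "per_validated": per_filing / validation_rate,  # mark-up for failed extractions
        "per_day": per_day,
        "per_month": per_day * 30,    # assumption: 30-day month, as in the engine output
        "per_year": per_day * 365,
    }


flash_lite = roll_up(0.0154)   # Gemini 2.5 Flash-Lite effective cost per call
gpt_5_5 = roll_up(0.83)        # GPT-5.5 effective cost per call
spread = gpt_5_5["per_month"] / flash_lite["per_month"]
print(f"${flash_lite['per_month']:.2f}/mo vs ${gpt_5_5['per_month']:.2f}/mo, spread {spread:.1f}x")
```

Because every model shares the same retry rate, validation rate, and volume, the spread is simply the ratio of the two per-call costs.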

The frontier picks cluster: Gemini 3.5 Flash ($0.2615/filing) sits within 12% of Gemini 2.5 Pro ($0.2336) and is 2.3x cheaper than Opus 4.7 and 3.3x cheaper than GPT-5.5. For pure structural extraction, though, none of that matters: Flash-Lite at $0.0162 is about 16x cheaper than Gemini 3.5 Flash and clears the same 1M context.

The retrieval layer adds almost nothing

If the extraction runs through a RAG layer rather than full-context, the embedding cost is a rounding error next to the LLM cost. The SEC Filing Chunk Optimizer, run on a 10-K body with a 768-token structural chunking strategy, a 12% overlap, and the voyage-finance-2 embedding model re-embedded across 250 queries, returns 178 chunks averaging 766 tokens, 136,348 tokens ingested, a one-time embedding cost of $0.0164 and $0.0012 per 100 queries. The embedding pass costs less than a tenth of a single Gemini 3.5 Flash extraction call. The model choice, not the retrieval design, owns the bill.
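The embedding arithmetic is short enough to check by hand. A minimal sketch, with a hypothetical function name; the $0.12 per million tokens rate for voyage-finance-2 and the $0.26145 Gemini 3.5 Flash cost per filing are taken from the verified engine output.

```python
def embedding_cost_usd(chunk_count: int, avg_chunk_tokens: int, usd_per_mtok: float) -> float:
    """One-time cost to embed every chunk of a document."""
    tokens_ingested = chunk_count * avg_chunk_tokens
    return tokens_ingested * usd_per_mtok / 1_000_000


one_time = embedding_cost_usd(178, 766, 0.12)   # 178 chunks averaging 766 tokens
share_of_one_call = one_time / 0.26145          # versus one Gemini 3.5 Flash extraction
print(f"embedding: ${one_time:.4f}, {share_of_one_call:.1%} of one extraction call")
```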

The market-data feed is a small fixed cost

A finance extraction pipeline also needs the filings and fundamentals themselves. The Data Vendor TCO engine, run for a fundamentals-first provider (Financial Modeling Prep) on a medium universe at daily resolution with no live feed, returns the Starter tier at $14/month, $168/year all-in (the engine flags this list price as unconfirmed as of 2026-05-25). The feed is a fixed cost. At the 10,000-filing-a-month scale in the TL;DR table, inference dwarfs it on every model ($161.70/month even on Flash-Lite); on the 30-filing-a-day sweep it is about the size of the Flash-Lite bill ($14.55/month). Either way, in an LLM finance stack the token bill is the variable that moves, and the model tier is the lever that moves it.

Workload 2: earnings-call summarization

The scenario: a 14,000-token transcript (prepared remarks plus Q&A), a 700-token summary, one summarization attempt, a 0.40 cache-hit assumption, across 100 tickers a quarter. The earnings-call engine prices cache reads for every provider, so the Anthropic-only caching caveat does not apply here.

| Model | Per stock / quarter | Per stock / year | All 100 tickers / year |
| --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | $0.0013 | $0.0050 | $0.50 |
| Gemini 2.5 Flash | $0.0047 | $0.0188 | $1.88 |
| Claude Haiku 4.5 | $0.0125 | $0.0498 | $4.98 |
| Gemini 2.5 Pro | $0.0192 | $0.0769 | $7.69 |
| Gemini 3.5 Flash | $0.0197 | $0.0790 | $7.90 |
| Claude Opus 4.7 | $0.0623 | $0.2492 | $24.92 |

Earnings-call summarization is cheap in absolute terms for every model: even Opus 4.7, the most expensive row, costs under $25 a year to summarize 100 tickers across four quarters. The transcript is short (14k tokens) and the output is short (700 tokens), so the per-unit cost is a fraction of a 10-K extraction. Here the model choice barely matters on cost: the gap between Flash-Lite and Opus is about 49x in ratio but only $24.42 a year in dollars. This is the one workload where you should pick on quality, not price, because price is nearly free.
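The earnings-call figures can be reproduced with a small variant of the per-call formula in which every provider gets a cache-read rate. This is a sketch under stated assumptions: the function names are hypothetical, the rates are those listed in the verified engine output, and multiplying by the number of attempts is our guess at how `summarization_attempts` is applied (it is 1 in every run shown).

```python
def summary_cost_usd(transcript_tokens: int, summary_tokens: int,
                     input_rate: float, output_rate: float, cache_read_rate: float,
                     cache_hit_rate: float = 0.40, attempts: int = 1) -> float:
    """USD to summarize one transcript; all rates are USD per million tokens."""
    cached = transcript_tokens * cache_hit_rate
    fresh = transcript_tokens - cached
    usd = (cached * cache_read_rate + fresh * input_rate
           + summary_tokens * output_rate) / 1_000_000
    return usd * attempts  # assumption: each attempt re-pays the full call


def yearly_total(per_stock_per_quarter: float, tickers: int = 100, quarters: int = 4) -> float:
    return per_stock_per_quarter * tickers * quarters


opus = summary_cost_usd(14_000, 700, 5.00, 25.00, cache_read_rate=0.50)
lite = summary_cost_usd(14_000, 700, 0.10, 0.40, cache_read_rate=0.025)
gap = yearly_total(opus) - yearly_total(lite)
print(f"Opus 4.7 ${yearly_total(opus):.2f}/yr, Flash-Lite ${yearly_total(lite):.2f}/yr, gap ${gap:.2f}")
```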

Workload 3: real-time news sentiment

Sentiment scoring is high-volume and short-context: an 8,000-token input (a news item plus instruction), a 500-token structured score, a 0.30 cache-hit assumption, priced as cost per 1,000 calls.

| Model | Cost per 1,000 calls |
| --- | --- |
| Gemini 2.5 Flash-Lite | $1.00 |
| Gemini 2.5 Flash | $3.65 |
| Claude Haiku 4.5 | $8.34 |
| Gemini 3.5 Flash | $16.50 |
| Claude Opus 4.7 | $41.70 |
| GPT-5.5 | $55.00 |

At a realistic news-feed volume (tens of thousands of items a day), the model choice compounds fast. A desk scoring 50,000 items a day pays $50/day on Flash-Lite versus $2,750/day on GPT-5.5. Sentiment is the textbook case for the budget tier: the task is structural classification, the volume is enormous, and a frontier model buys very little that a fine-tuned cheap model cannot match. Reserve frontier reasoning for the contested items a cheap first-pass flags as ambiguous.
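The first-pass-then-escalate pattern described above can be costed with one line of arithmetic. In this sketch the function name is hypothetical and the 5% escalation rate is purely illustrative, not a measured figure; the per-1,000 rates come from the table above.

```python
def daily_cost_usd(items_per_day: int, cheap_per_1k: float,
                   frontier_per_1k: float, escalation_rate: float) -> float:
    """Every item gets the cheap pass; a fraction is re-scored on the frontier model."""
    cheap = items_per_day / 1000 * cheap_per_1k
    escalated = items_per_day * escalation_rate / 1000 * frontier_per_1k
    return cheap + escalated


ITEMS = 50_000
flash_lite_only = daily_cost_usd(ITEMS, 1.00, 55.00, escalation_rate=0.0)
gpt_only = ITEMS / 1000 * 55.00
# Hypothetical: 5% of items are flagged ambiguous and escalated to GPT-5.5.
two_tier = daily_cost_usd(ITEMS, 1.00, 55.00, escalation_rate=0.05)
print(f"Flash-Lite only ${flash_lite_only:.2f}, two-tier ${two_tier:.2f}, GPT-5.5 only ${gpt_only:.2f}")
```

Under that illustrative escalation rate the two-tier bill stays far closer to the budget figure than to the frontier one, because the frontier rate only applies to the flagged slice.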

Workload 4: options-greeks reasoning

This is the workload the cost engines deliberately do not price as a single LLM call, and here is why. Options-Greeks computation (delta, gamma, theta, vega from spot, strike, time, and implied vol) is deterministic math, handled on this site by the Options Greeks Explorer engine, not an LLM. The LLM's role is reasoning over the computed Greeks: explaining a position, flagging a tail risk, choosing a hedge. That reasoning call looks like the agent workload priced in the prompt-caching spoke, not a fixed extraction, so its cost depends entirely on how many tool calls and reasoning steps the loop runs. We do not invent a per-call number for it. If your Greeks-reasoning agent runs the 6-step loop priced in that spoke, use those figures; if it is a single explanatory call over pre-computed Greeks with a token shape close to the sentiment scenario (about 8,000 tokens in, 500 out), it costs about the same as one news-sentiment call on the same model.
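To make "deterministic math" concrete, here is a minimal sketch of closed-form Greeks for a European call under the Black-Scholes model. This is an assumption on our part: the report does not say which pricing model the Options Greeks Explorer uses, and the risk-free rate parameter, the no-dividend simplification, and all function names are illustrative.

```python
import math


def _norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _norm_cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def call_greeks(spot: float, strike: float, years: float, vol: float, rate: float = 0.0) -> dict:
    """Black-Scholes Greeks for a European call on a non-dividend-paying underlying."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return {
        "delta": _norm_cdf(d1),
        "gamma": _norm_pdf(d1) / (spot * vol * sqrt_t),
        "vega": spot * _norm_pdf(d1) * sqrt_t,  # per 1.00 change in volatility
        "theta": (-spot * _norm_pdf(d1) * vol / (2 * sqrt_t)
                  - rate * strike * math.exp(-rate * years) * _norm_cdf(d2)),  # per year
    }


greeks = call_greeks(spot=100.0, strike=100.0, years=1.0, vol=0.20, rate=0.05)
print({name: round(value, 4) for name, value in greeks.items()})
```

No tokens are spent here: the same inputs always give the same outputs, which is why the LLM is only priced for the reasoning that follows.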

The decision matrix

Putting the four workloads together, the cheapest viable model per task tier:

  • Structural extraction at any volume: Gemini 2.5 Flash-Lite. The cheapest viable path when the task is pulling line items, dates, and totals from regularly structured filings. The 1M context holds a full 10-K.
  • Extraction needing agent-tier reasoning at speed: Gemini 3.5 Flash. The frontier value pick: cheapest of the three frontier headline models, at Flash latency. Budget about $235/mo for a 30-filing-a-day sweep.
  • Maximum context with frontier reasoning, latency not critical: Gemini 2.5 Pro (2M context), about 11% cheaper than Gemini 3.5 Flash on the 10-K shape.
  • High-volume short-context classification (sentiment, tagging): Gemini 2.5 Flash-Lite, with frontier escalation only on flagged-ambiguous items.
  • You are standardized on Anthropic or OpenAI: Opus 4.7 or GPT-5.5, paying a 2.3x to 3.3x premium over Gemini 3.5 Flash for the vendor relationship, or routing only the hard subset to them. On Anthropic, heavy prompt caching narrows the input gap but never the output gap (see the caching spoke).
  • Earnings-call summarization: pick on quality, not price. Every model in the earnings table costs under $25/yr for 100 tickers; the cost difference is noise.

How every number here was produced

Each figure in every table is the output of a shipped engine bundle, computed at build time and embedded in a machine-readable block this site's CI independently recomputes against the same bundle (1e-9 tolerance) on every push. A writer cannot hand-type a cost number; if the prose disagreed with the engine, the build would fail. The verified inputs and outputs for each run are in the expandable block below.
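A recompute gate of this kind can be sketched in a few lines. This is an illustration, not the site's CI code: the function names are hypothetical, the 1e-9 tolerance is treated as absolute (the report does not say whether it is absolute or relative), and the recompute covers only the uncached Google/OpenAI formula.

```python
import math


def recompute_cost_per_month(inputs: dict, input_rate: float, output_rate: float) -> float:
    """Rebuild the monthly figure from raw inputs (no cache discount applied)."""
    per_call = (inputs["input_tokens_per_call"] * input_rate
                + inputs["output_tokens_per_call"] * output_rate) / 1_000_000
    per_idea = per_call * inputs["calls_per_idea"] * (1 + inputs["retry_rate"])
    return per_idea * inputs["ideas_per_day"] * 30  # assumption: 30-day month


def gate(embedded: float, recomputed: float, tol: float = 1e-9) -> bool:
    """True when the embedded number matches the recompute within the tolerance."""
    return math.isclose(embedded, recomputed, rel_tol=0.0, abs_tol=tol)


inputs = {
    "input_tokens_per_call": 130_000,
    "output_tokens_per_call": 6_000,
    "calls_per_idea": 1,
    "retry_rate": 0.05,
    "ideas_per_day": 30,
}
embedded = 14.553  # "cost per month" from the Flash-Lite block
recomputed = recompute_cost_per_month(inputs, input_rate=0.10, output_rate=0.40)
if not gate(embedded, recomputed):
    raise SystemExit(f"cost mismatch: embedded {embedded}, recomputed {recomputed}")
print("gate passed")
```

A tolerance this tight absorbs floating-point noise in the last digits but fails on any hand-edited figure, including one that differs only in the third decimal place.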

References

Footnotes

  1. Anthropic. "Pricing." platform.claude.com, verified 2026-05-26. https://platform.claude.com/docs/en/about-claude/pricing

  2. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-26. https://developers.openai.com/api/docs/pricing

  3. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-26. https://ai.google.dev/gemini-api/docs/pricing

Verified engine output

10-K extraction: Gemini 2.5 Flash-Lite (budget tier, 130k in + 6k out, 30 filings/day)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-flash-lite
Result
model › id: gemini-2-5-flash-lite
model › provider: google
model › name: Gemini 2.5 Flash-Lite
model › input usd per mtoken: 0.1
model › output usd per mtoken: 0.4
model › context window: 1000000
model › notes: Cheapest tier in this table; 1M context.
effective cost per call: 0.0154
cost per idea: 0.01617
cost per validated trade: 0.019023529411764706
cost per day: 0.48510000000000003
cost per month: 14.553
cost per year: 177.06150000000002

Computed live at build time.

10-K extraction: Gemini 2.5 Flash (economy)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-flash
Result
model › id: gemini-2-5-flash
model › provider: google
model › name: Gemini 2.5 Flash
model › input usd per mtoken: 0.3
model › output usd per mtoken: 2.5
model › context window: 1000000
model › notes: Fast mid-tier; 1M context.
effective cost per call: 0.054
cost per idea: 0.0567
cost per validated trade: 0.06670588235294118
cost per day: 1.701
cost per month: 51.03
cost per year: 620.865

10-K extraction: Claude Haiku 4.5
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: claude-haiku-4-5
Result
model › id: claude-haiku-4-5
model › provider: anthropic
model › name: Claude Haiku 4.5
model › input usd per mtoken: 1
model › output usd per mtoken: 5
model › cache write usd per mtoken: 1.25
model › cache read usd per mtoken: 0.1
model › context window: 200000
model › notes: Fast, cheap: filtering + pre-processing layers.
effective cost per call: 0.1132
cost per idea: 0.11886
cost per validated trade: 0.13983529411764706
cost per day: 3.5658
cost per month: 106.97399999999999
cost per year: 1301.517

10-K extraction: GPT-5.4 mini
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gpt-5-mini
Result
model › id: gpt-5-mini
model › provider: openai
model › name: GPT-5.4 mini
model › input usd per mtoken: 0.75
model › output usd per mtoken: 4.5
model › context window: 256000
model › notes: Mid-tier OpenAI (GPT-5.4 mini).
effective cost per call: 0.1245
cost per idea: 0.130725
cost per validated trade: 0.15379411764705883
cost per day: 3.9217500000000003
cost per month: 117.6525
cost per year: 1431.43875

10-K extraction: Gemini 2.5 Pro (2M context)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-2-5-pro
Result
model › id: gemini-2-5-pro
model › provider: google
model › name: Gemini 2.5 Pro
model › input usd per mtoken: 1.25
model › output usd per mtoken: 10
model › context window: 2000000
model › notes: Large context (2M). Strong on document analysis.
effective cost per call: 0.2225
cost per idea: 0.23362500000000003
cost per validated trade: 0.27485294117647063
cost per day: 7.008750000000001
cost per month: 210.26250000000002
cost per year: 2558.1937500000004

10-K extraction: Gemini 3.5 Flash (frontier value pick)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gemini-3-5-flash
Result
model › id: gemini-3-5-flash
model › provider: google
model › name: Gemini 3.5 Flash
model › input usd per mtoken: 1.5
model › output usd per mtoken: 9
model › context window: 1000000
model › notes: Frontier agent-tier at Flash speed, not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call: 0.249
cost per idea: 0.26145
cost per validated trade: 0.30758823529411766
cost per day: 7.843500000000001
cost per month: 235.305
cost per year: 2862.8775

10-K extraction: Claude Sonnet 4.6
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: claude-sonnet-4-6
Result
model › id: claude-sonnet-4-6
model › provider: anthropic
model › name: Claude Sonnet 4.6
model › input usd per mtoken: 3
model › output usd per mtoken: 15
model › cache write usd per mtoken: 3.75
model › cache read usd per mtoken: 0.3
model › context window: 500000
model › notes: Best price/performance for bulk research loops.
effective cost per call: 0.3396
cost per idea: 0.35658
cost per validated trade: 0.4195058823529412
cost per day: 10.6974
cost per month: 320.922
cost per year: 3904.551

10-K extraction: o4-mini (reasoning)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: o4-mini
Result
model › id: o4-mini
model › provider: openai
model › name: o4-mini (reasoning)
model › input usd per mtoken: 3
model › output usd per mtoken: 12
model › context window: 200000
model › notes: OpenAI reasoning-optimized mid-tier.
effective cost per call: 0.462
cost per idea: 0.48510000000000003
cost per validated trade: 0.5707058823529412
cost per day: 14.553
cost per month: 436.59000000000003
cost per year: 5311.845

10-K extraction: Claude Opus 4.7 (40% cache hit on input)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: claude-opus-4-7
Result
model › id: claude-opus-4-7
model › provider: anthropic
model › name: Claude Opus 4.7
model › input usd per mtoken: 5
model › output usd per mtoken: 25
model › cache write usd per mtoken: 6.25
model › cache read usd per mtoken: 0.5
model › context window: 1000000
model › notes: Flagship reasoning model; 1M context.
effective cost per call: 0.5660000000000001
cost per idea: 0.5943
cost per validated trade: 0.6991764705882354
cost per day: 17.829
cost per month: 534.87
cost per year: 6507.585

10-K extraction: GPT-5.5 (premium)
Inputs
input_tokens_per_call: 130000
output_tokens_per_call: 6000
calls_per_idea: 1
retry_rate: 0.05
ideas_per_day: 30
validation_rate: 0.85
cache_hit_rate: 0.4
model_id: gpt-5
Result
model › id: gpt-5
model › provider: openai
model › name: GPT-5.5
model › input usd per mtoken: 5
model › output usd per mtoken: 30
model › context window: 400000
model › notes: OpenAI frontier model (GPT-5.5).
effective cost per call: 0.8300000000000001
cost per idea: 0.8715000000000002
cost per validated trade: 1.0252941176470591
cost per day: 26.145000000000003
cost per month: 784.3500000000001
cost per year: 9542.925000000001

Earnings-call summary: Gemini 2.5 Flash-Lite (14k transcript, 100 tickers/qtr)
Inputs
tickers_per_quarter: 100
avg_transcript_tokens: 14000
avg_summary_tokens: 700
cache_hit_rate: 0.4
summarization_attempts: 1
model_id: gemini-2-5-flash-lite
Result
model › id: gemini-2-5-flash-lite
model › name: Gemini 2.5 Flash-Lite
model › provider: Google
model › input usd per mtok: 0.1
model › output usd per mtok: 0.4
model › cache read usd per mtok: 0.025
per stock per quarter: 0.0012599999999999998
per stock per year: 0.005039999999999999
per quarter total: 0.12599999999999997
per year total: 0.5039999999999999

Earnings-call summary: Gemini 3.5 Flash
Inputs
tickers_per_quarter: 100
avg_transcript_tokens: 14000
avg_summary_tokens: 700
cache_hit_rate: 0.4
summarization_attempts: 1
model_id: gemini-3-5-flash
Result
model › id: gemini-3-5-flash
model › name: Gemini 3.5 Flash
model › provider: Google
model › input usd per mtok: 1.5
model › output usd per mtok: 9
model › cache read usd per mtok: 0.15
per stock per quarter: 0.01974
per stock per year: 0.07896
per quarter total: 1.974
per year total: 7.896

Earnings-call summary: Claude Opus 4.7
Inputs
tickers_per_quarter: 100
avg_transcript_tokens: 14000
avg_summary_tokens: 700
cache_hit_rate: 0.4
summarization_attempts: 1
model_id: claude-opus-4-7
Result
model › id: claude-opus-4-7
model › name: Claude Opus 4.7
model › provider: Anthropic
model › input usd per mtok: 5
model › output usd per mtok: 25
model › cache read usd per mtok: 0.5
per stock per quarter: 0.0623
per stock per year: 0.2492
per quarter total: 6.23
per year total: 24.92

News sentiment: Gemini 2.5 Flash-Lite, cost per 1,000 calls (costPerDay at 1,000/day)
Inputs
input_tokens_per_call: 8000
output_tokens_per_call: 500
calls_per_idea: 1
retry_rate: 0
ideas_per_day: 1000
validation_rate: 0.9
cache_hit_rate: 0.3
model_id: gemini-2-5-flash-lite
Result
model › id: gemini-2-5-flash-lite
model › provider: google
model › name: Gemini 2.5 Flash-Lite
model › input usd per mtoken: 0.1
model › output usd per mtoken: 0.4
model › context window: 1000000
model › notes: Cheapest tier in this table; 1M context.
effective cost per call: 0.001
cost per idea: 0.001
cost per validated trade: 0.0011111111111111111
cost per day: 1
cost per month: 30
cost per year: 365

News sentiment: Gemini 3.5 Flash, cost per 1,000 calls
Inputs
input_tokens_per_call: 8000
output_tokens_per_call: 500
calls_per_idea: 1
retry_rate: 0
ideas_per_day: 1000
validation_rate: 0.9
cache_hit_rate: 0.3
model_id: gemini-3-5-flash
Result
model › id: gemini-3-5-flash
model › provider: google
model › name: Gemini 3.5 Flash
model › input usd per mtoken: 1.5
model › output usd per mtoken: 9
model › context window: 1000000
model › notes: Frontier agent-tier at Flash speed, not a budget model (output ~3.6x Gemini 2.5 Flash).
effective cost per call: 0.0165
cost per idea: 0.0165
cost per validated trade: 0.018333333333333333
cost per day: 16.5
cost per month: 495
cost per year: 6022.5

News sentiment: GPT-5.5, cost per 1,000 calls
Inputs
input_tokens_per_call: 8000
output_tokens_per_call: 500
calls_per_idea: 1
retry_rate: 0
ideas_per_day: 1000
validation_rate: 0.9
cache_hit_rate: 0.3
model_id: gpt-5
Result
model › id: gpt-5
model › provider: openai
model › name: GPT-5.5
model › input usd per mtoken: 5
model › output usd per mtoken: 30
model › context window: 400000
model › notes: OpenAI frontier model (GPT-5.5).
effective cost per call: 0.055
cost per idea: 0.055
cost per validated trade: 0.06111111111111111
cost per day: 55
cost per month: 1650
cost per year: 20075

RAG embedding layer for a 10-K body (voyage-finance-2, 768-token structural chunks)
Inputs
archetype_id: 10k-body
chunk_size: 768
overlap_pct: 0.12
strategy: structural
embedding_model_id: voyage-finance-2
query_reembed_count: 250
Result
archetype › id: 10k-body
archetype › name: 10-K (full body)
archetype › total tokens: 120000
archetype › structural boundaries: 12
archetype › table heavy: true
archetype › notes: Form 10-K business + risk + MD&A + financials. ~12 Items. Dense tables in Item 7 / 8.
embedding › id: voyage-finance-2
embedding › name: voyage-finance-2
embedding › vendor: Voyage AI
embedding › usd per mtokens: 0.12
embedding › dim: 1024
embedding › source: https://docs.voyageai.com/docs/pricing
strategy: structural
chunk count: 178
avg tokens: 766
min tokens: 768
max tokens: 768
tokens ingested: 136348
embedding cost once: 0.01636176
embedding cost per 100 queries: 0.0012
strategy notes: Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean.

Market-data feed: Financial Modeling Prep, medium universe, daily, no live
Inputs
vendor_id: fmp
universe: medium
resolution: daily
needs_live: false
Result
vendor › id: fmp
vendor › name: Financial Modeling Prep
vendor › url: https://site.financialmodelingprep.com
vendor › short pitch: Fundamentals-heavy. Earnings, filings, transcripts. Price data is a secondary offer.
vendor › has overage: false
vendor › last checked: 2026-04-20
vendor › tiers › row 1 › name: Starter
vendor › tiers › row 1 › monthly: 14
vendor › tiers › row 1 › includes live: false
vendor › tiers › row 1 › includes options: false
vendor › tiers › row 1 › includes futures: false
vendor › tiers › row 1 › resolutions › row 1: daily
vendor › tiers › row 1 › notes › row 1: 5 years history
vendor › tiers › row 1 › notes › row 2: 250 API calls/day
vendor › tiers › row 1 › notes › row 3: Price unconfirmed 2026-05-25; FMP list prices not consistently published
vendor › tiers › row 2 › name: Premium
vendor › tiers › row 2 › monthly: 29
vendor › tiers › row 2 › includes live: false
vendor › tiers › row 2 › includes options: false
vendor › tiers › row 2 › includes futures: false
vendor › tiers › row 2 › resolutions › row 1: daily
vendor › tiers › row 2 › resolutions › row 2: minute
vendor › tiers › row 2 › notes › row 1: Full history
vendor › tiers › row 2 › notes › row 2: 750 calls/day
vendor › tiers › row 3 › name: Ultimate
vendor › tiers › row 3 › monthly: 79
vendor › tiers › row 3 › includes live: true
vendor › tiers › row 3 › includes options: false
vendor › tiers › row 3 › includes futures: false
vendor › tiers › row 3 › resolutions › row 1: daily
vendor › tiers › row 3 › resolutions › row 2: minute
vendor › tiers › row 3 › resolutions › row 3: second
vendor › tiers › row 3 › notes › row 1: Real-time
vendor › tiers › row 3 › notes › row 2: Unlimited calls
tier › name: Starter
tier › monthly: 14
tier › includes live: false
tier › includes options: false
tier › includes futures: false
tier › resolutions › row 1: daily
tier › notes › row 1: 5 years history
tier › notes › row 2: 250 API calls/day
tier › notes › row 3: Price unconfirmed 2026-05-25; FMP list prices not consistently published
monthly: 14
one time: 0
annual total: 168
meets resolution: true
meets live: true
meets options: true
meets futures: true
meets all: true

Frequently asked questions

What is the cheapest LLM for 10-K extraction in 2026?
Gemini 2.5 Flash-Lite at $0.0162 per filing ($14.55 per month on a 30-filing/day sweep), with a 1M context that fits a full 10-K. It is about 16x cheaper than the cheapest frontier model, Gemini 3.5 Flash, for structural extraction that does not need agent-tier reasoning.
How big is the cost gap between the cheapest and most expensive LLM for finance extraction?
On the 10-K extraction workload (130k input, 6k output, 30 filings/day), the spread is a factor of 54: Gemini 2.5 Flash-Lite at $14.55/month against GPT-5.5 at $784.35/month, for the identical token shape.
Are these numbers benchmark results or accuracy scores?
Neither. They are cost numbers computed from verified vendor list prices, run through this site's shipped cost engines and recomputed by CI on every build. No model was tested or scored for accuracy here. Cost tells you the ceiling of what a workload can cost, never which model reads a filing correctly.
Does prompt caching make Claude Opus 4.7 competitive on cost?
Caching narrows the gap but does not close it. The cost engine applies cache pricing to Anthropic input only (reads at 0.1x base input); it does nothing for output. On an output-heavy agent loop, even 90% cache on Opus 4.7 does not beat Gemini 3.5 Flash uncached, because Opus's $25/Mtok output rate dominates.
Why are DeepSeek V4 and Grok 4.3 not costed in this report?
They are not in this site's verified-price cost-engine table, so running them would require unverified numbers. Their published rates are treated as vendor claims only and never run through the cost math. Every costed figure traces to a model in the engine's verified 2026-05-26 rate table.
Which model should I pick for real-time news sentiment scoring?
Gemini 2.5 Flash-Lite at $1.00 per 1,000 calls. Sentiment is high-volume structural classification where the budget tier wins decisively (GPT-5.5 costs $55.00 per 1,000 on the same shape). Escalate to a frontier model only for the ambiguous items a cheap first pass flags.
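The prompt-caching answer in the FAQ above can be checked with list rates alone. This sketch applies the Token Cost Optimizer rule (cache discount on Anthropic input only, Gemini 3.5 Flash uncached), ignores cache-write surcharges, and uses hypothetical function names; the break-even ratio is our own derivation from the two rate cards, not an engine output.

```python
def blended_input_rate(base: float, cache_read: float, hit_rate: float) -> float:
    """Effective USD per million input tokens at a given cache-hit rate."""
    return hit_rate * cache_read + (1 - hit_rate) * base


def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


opus_in = blended_input_rate(base=5.00, cache_read=0.50, hit_rate=0.90)
opus = call_cost_usd(130_000, 6_000, opus_in, 25.00)   # Opus 4.7 at 90% cache hit
flash = call_cost_usd(130_000, 6_000, 1.50, 9.00)      # Gemini 3.5 Flash, uncached

# Opus loses whenever output tokens exceed this fraction of input tokens.
break_even = (1.50 - opus_in) / (25.00 - 9.00)
print(f"Opus ${opus:.4f} vs Flash ${flash:.4f}; break-even output/input = {break_even:.3f}")
```

On the 10-K shape the output is about 4.6% of the input, above the break-even ratio, so even a 90% cache hit leaves Opus 4.7 more expensive per call; output-heavy agent loops sit further above it still.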