The short answer
The LLM-in-Finance Economics Report prices ten models from Anthropic, OpenAI, and Google against four finance workloads using verified 2026 list rates. On 10-K extraction at 30 filings a day, cost runs from $14.55/month (Gemini 2.5 Flash-Lite) to $784.35/month (GPT-5.5), a factor of 54. The cheapest correct answer is matching tier to task, not picking a cheaper frontier model.
This report prices ten models from Anthropic, OpenAI, and Google against four real finance workloads, using the same shipped cost engines that power the calculators on this site. No model was benchmarked for accuracy here. Every dollar figure is computed at build time from each vendor's list rate as verified on 2026-05-26, recomputed on every deploy, and independently rechecked by a CI gate against the engine bundle. The headline: on 10-K extraction at 30 filings a day, the spread between the cheapest model and the most expensive is a factor of 54, and the choice that saves money is matching the tier to the task, not picking a cheaper frontier model.
TL;DR
The cheapest-viable model per workload, derived from the computed costs below:
| Workload | Cheapest viable | Computed cost | Frontier value pick | Premium pick |
|---|---|---|---|---|
| 10-K extraction (per filing, 30/day) | Gemini 2.5 Flash-Lite | $0.0162/filing, $14.55/mo | Gemini 3.5 Flash ($235.31/mo) | GPT-5.5 ($784.35/mo) |
| Earnings-call summary (per stock/quarter) | Gemini 2.5 Flash-Lite | $0.0013, $0.50/yr at 100 tickers | Gemini 3.5 Flash ($7.90/yr) | Opus 4.7 ($24.92/yr) |
| News sentiment (cost per 1,000 calls) | Gemini 2.5 Flash-Lite | $1.00 / 1,000 | Gemini 3.5 Flash ($16.50) | GPT-5.5 ($55.00) |
| 10-K extraction at scale (10,000 filings/mo) | Gemini 2.5 Flash-Lite | $161.70/mo | Gemini 3.5 Flash ($2,614.50/mo) | GPT-5.5 ($8,715/mo) |
The single most reliable finding across all four: the budget tier (Gemini 2.5 Flash-Lite) is roughly 16x to 55x cheaper than the frontier tier, between one and two orders of magnitude, and the choice that actually saves money is matching the tier to the task, not picking a cheaper frontier model.
What this report is, and is not
This is a cost report. Costs are computed, not estimated: each figure is the output of a shipped engine bundle run on a verified list price, recomputed on every build. That makes the numbers reproducible and auditable.
It is not an accuracy benchmark. No model was tested, measured, or scored for extraction quality, summarization faithfulness, or reasoning here. Where a vendor makes a capability claim, it is cited as that vendor's claim. Anywhere a number would require a private eval we did not run, the report says so. Cost tells you the ceiling of what a workload can cost; it never tells you which model reads a 10-K correctly. That answer comes from your own eval on your own documents.
The model lineup and prices below are the engine's verified rate table as of 2026-05-26. Two models named in common comparisons, DeepSeek V4 and Grok 4.3, are not in this engine's priced table, so they are not costed here; their published rates are noted as vendor claims only, never run through the cost math.
The verified price table
Every rate below is the list price the cost engine uses, cross-checked against the vendor's own pricing page on 2026-05-26: Anthropic[1], OpenAI[2], and Google[3].
| Model | Provider | $/Mtok input | $/Mtok output | Context | Cache read |
|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | not priced separately here |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | not priced separately here |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | $0.10 (0.1x input) |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 2M | not priced separately here |
| Gemini 3.5 Flash | Google | $1.50 | $9.00 | 1M | $0.15 (in earnings engine) |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 500K | $0.30 (0.1x input) |
| o4-mini | OpenAI | $3.00 | $12.00 | 200K | not priced separately here |
| GPT-5.4 mini | OpenAI | $0.75 | $4.50 | 256K | not priced separately here |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M | $0.50 (0.1x input) |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 400K | not priced separately here |
Two modeling rules carry through every cost in this report. First, the Token Cost Optimizer applies prompt-cache pricing to Anthropic input only (cache reads at 0.1x the base input rate); Google and OpenAI input is priced at the full list rate, a deliberately conservative choice that, if anything, understates the Anthropic-vs-rest gap when caching is heavy. Second, the cache-hit assumption used below (0.40 on the extraction sweep) lowers the effective Anthropic input cost but does nothing for Google or OpenAI in these figures.
Workload 1: 10-K extraction
The scenario pins 130,000 input tokens (a full 10-K body plus a fixed extraction schema) and 6,000 output tokens (a structured field dump), one call per filing, a 5% retry rate, 30 filings a day, a 0.85 validation rate, and a 0.40 cache-hit assumption. A 10-K body runs roughly 100k to 150k tokens, so 130k is a realistic single-pass shape on any 1M-context model.
| Model | Cost / filing | Cost / validated | Cost / month |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.0162 | $0.0190 | $14.55 |
| Gemini 2.5 Flash | $0.0567 | $0.0667 | $51.03 |
| Claude Haiku 4.5 | $0.1189 | $0.1398 | $106.97 |
| GPT-5.4 mini | $0.1307 | $0.1538 | $117.65 |
| Gemini 2.5 Pro | $0.2336 | $0.2749 | $210.26 |
| Gemini 3.5 Flash | $0.2615 | $0.3076 | $235.31 |
| Claude Sonnet 4.6 | $0.3566 | $0.4195 | $320.92 |
| o4-mini | $0.4851 | $0.5707 | $436.59 |
| Claude Opus 4.7 | $0.5943 | $0.6992 | $534.87 |
| GPT-5.5 | $0.8715 | $1.0253 | $784.35 |
The spread runs from $14.55/mo to $784.35/mo for the identical token shape: a factor of 54 between the cheapest and the costliest. The "cost per validated" column marks each call up by the inverse of the 0.85 validation rate, the honest unit if 15% of extractions fail a downstream check and must be reworked.
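The arithmetic behind this table can be reproduced in a few lines. The sketch below is a reconstruction from the published inputs and outputs, not the Token Cost Optimizer's actual source: the `Rate` type, the function name, and the field names are hypothetical, and the 30-day month is inferred from the ratio of the engine's daily and monthly figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rate:
    input_usd_per_mtok: float
    output_usd_per_mtok: float
    # Set only where this engine prices cache reads (Anthropic in this sweep).
    cache_read_usd_per_mtok: Optional[float] = None


def extraction_costs(rate, input_tokens=130_000, output_tokens=6_000,
                     retry_rate=0.05, filings_per_day=30,
                     validation_rate=0.85, cache_hit_rate=0.40):
    """Per-filing, per-validated, and monthly cost for one model."""
    if rate.cache_read_usd_per_mtok is None:
        # Google and OpenAI input is charged at the full list rate.
        input_rate = rate.input_usd_per_mtok
    else:
        # Blend full-price input with cache reads at the assumed hit rate.
        input_rate = ((1 - cache_hit_rate) * rate.input_usd_per_mtok
                      + cache_hit_rate * rate.cache_read_usd_per_mtok)
    per_call = (input_tokens * input_rate
                + output_tokens * rate.output_usd_per_mtok) / 1e6
    per_filing = per_call * (1 + retry_rate)  # retries mark up every call
    return {
        "per_filing": per_filing,
        "per_validated": per_filing / validation_rate,
        "per_month": per_filing * filings_per_day * 30,  # assumed 30-day month
    }


flash_lite = extraction_costs(Rate(0.10, 0.40))
opus = extraction_costs(Rate(5.00, 25.00, cache_read_usd_per_mtok=0.50))
gpt55 = extraction_costs(Rate(5.00, 30.00))
spread = gpt55["per_month"] / flash_lite["per_month"]
```

Under these assumptions `spread` comes out just under 54, which is the factor quoted above.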
The frontier picks cluster: Gemini 3.5 Flash ($0.2615/filing) sits within 12% of Gemini 2.5 Pro ($0.2336) and is 2.3x cheaper than Opus 4.7 and 3.3x cheaper than GPT-5.5. For pure structural extraction, though, none of that matters: Flash-Lite at $0.0162 is about 16x cheaper than Gemini 3.5 Flash and clears the same 1M context.
The retrieval layer adds almost nothing
If the extraction runs through a RAG layer rather than full-context, the embedding cost is a rounding error next to the LLM cost. The SEC Filing Chunk Optimizer, run on a 10-K body with a 768-token structural chunking strategy, a 12% overlap, and the voyage-finance-2 embedding model re-embedded across 250 queries, returns 178 chunks averaging 766 tokens, 136,348 tokens ingested, a one-time embedding cost of $0.0164 and $0.0012 per 100 queries. The embedding pass costs less than a tenth of a single Gemini 3.5 Flash extraction call. The model choice, not the retrieval design, owns the bill.
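As a rough check on those figures, the sketch below rebuilds the embedding cost from the engine's published inputs (a 120,000-token 10-K body, 768-token chunks, 12% overlap, $0.12 per million tokens). The stride formula is an assumption that happens to reproduce the engine's chunk count; it is not taken from the SEC Filing Chunk Optimizer's source, and the function name is hypothetical. The 766-token average is copied from the engine output rather than derived.

```python
import math


def embedding_cost(total_tokens=120_000, chunk_size=768, overlap_pct=0.12,
                   avg_chunk_tokens=766, usd_per_mtok=0.12):
    """Return (chunk count, tokens ingested, one-time embedding cost in USD)."""
    # Assumed: each chunk advances by the non-overlapping part of its window.
    stride = chunk_size * (1 - overlap_pct)
    chunk_count = math.ceil(total_tokens / stride)
    tokens_ingested = chunk_count * avg_chunk_tokens
    return chunk_count, tokens_ingested, tokens_ingested * usd_per_mtok / 1e6


chunks, ingested, one_time_usd = embedding_cost()

# Compare with one Gemini 3.5 Flash extraction call from the table above.
flash_call_usd = 0.26145
embedding_share = one_time_usd / flash_call_usd
```

With these inputs the embedding pass is a little over 6% of one Gemini 3.5 Flash extraction call, consistent with the "less than a tenth" statement.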
And so does the market-data feed
A finance extraction pipeline also needs the filings and fundamentals themselves. The Data Vendor TCO engine, run for a fundamentals-first provider (Financial Modeling Prep) on a medium universe at daily resolution with no live feed, returns the Starter tier at $14/month, $168/year all-in (the engine flags this list price as unconfirmed as of 2026-05-25). At the 10,000-filing-a-month scale shown in the TL;DR table, the data feed is a fixed cost dwarfed by inference. The lesson holds at every scale: in an LLM finance stack, the token bill is the variable that moves, and the model tier is the lever that moves it.
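To put the fixed feed cost next to the variable token bill, the sketch below computes the feed's share of the monthly total at the 10,000-filing scale, using the at-scale figures from the TL;DR table. The function and dictionary names are hypothetical.

```python
def feed_share(inference_usd_per_month, feed_usd_per_month=14.0):
    """Fraction of the monthly bill that goes to the fixed data feed."""
    return feed_usd_per_month / (feed_usd_per_month + inference_usd_per_month)


# Monthly inference cost at 10,000 filings, from the TL;DR table.
AT_SCALE_USD = {
    "Gemini 2.5 Flash-Lite": 161.70,
    "Gemini 3.5 Flash": 2614.50,
    "GPT-5.5": 8715.00,
}
shares = {model: feed_share(cost) for model, cost in AT_SCALE_USD.items()}

# The small sweep for comparison: 30 filings a day on Flash-Lite.
small_sweep_share = feed_share(14.553)
```

At 10,000 filings a month the feed is about 8% of the bill on Flash-Lite and well under 1% on the frontier rows. On the 30-filing-a-day Flash-Lite sweep it is closer to half, so "dwarfed" applies to the at-scale and frontier rows; in every case the feed stays fixed while the token bill moves.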
Workload 2: earnings-call summarization
The scenario: a 14,000-token transcript (prepared remarks plus Q&A), a 700-token summary, one summarization attempt, a 0.40 cache-hit assumption, across 100 tickers a quarter. The earnings-call engine prices cache reads for every provider, so the Anthropic-only caching caveat does not apply here.
| Model | Per stock / quarter | Per stock / year | All 100 tickers / year |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.0013 | $0.0050 | $0.50 |
| Gemini 2.5 Flash | $0.0047 | $0.0188 | $1.88 |
| Claude Haiku 4.5 | $0.0125 | $0.0498 | $4.98 |
| Gemini 2.5 Pro | $0.0192 | $0.0769 | $7.69 |
| Gemini 3.5 Flash | $0.0197 | $0.0790 | $7.90 |
| Claude Opus 4.7 | $0.0623 | $0.2492 | $24.92 |
Earnings-call summarization is cheap in absolute terms for every model: even Opus 4.7, the most expensive row, costs under $25 a year to summarize 100 tickers across four quarters. The transcript is short (14k tokens) and the output is short (700 tokens), so the per-unit cost is a fraction of a 10-K. Here the model choice barely matters on cost: the gap between Flash-Lite and Opus is 50x in ratio but $24.42 a year in dollars. This is the one workload where you should pick on quality, not price, because price is nearly free.
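The per-stock figures follow from a blended input rate. The sketch below is a reconstruction from the published inputs and outputs, not the earnings-call engine's source; the function name is hypothetical, and treating `attempts` as a simple multiplier is an assumption (it is 1 in every run shown here).

```python
def earnings_cost(input_rate, output_rate, cache_read_rate,
                  transcript_tokens=14_000, summary_tokens=700,
                  cache_hit_rate=0.40, attempts=1, tickers=100):
    """Cost of summarizing earnings calls; rates are USD per million tokens."""
    # This engine prices cache reads for every provider.
    blended_input = ((1 - cache_hit_rate) * input_rate
                     + cache_hit_rate * cache_read_rate)
    per_quarter = attempts * (transcript_tokens * blended_input
                              + summary_tokens * output_rate) / 1e6
    return {
        "per_stock_quarter": per_quarter,
        "per_stock_year": per_quarter * 4,
        "all_tickers_year": per_quarter * 4 * tickers,
    }


flash_lite = earnings_cost(0.10, 0.40, 0.025)
opus = earnings_cost(5.00, 25.00, 0.50)
gap_usd_per_year = opus["all_tickers_year"] - flash_lite["all_tickers_year"]
```

The unrounded gap is about $24.42 a year, which is why the ratio looks dramatic while the dollar difference stays small.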
Workload 3: real-time news sentiment
Sentiment scoring is high-volume and short-context: an 8,000-token input (a news item plus instruction), a 500-token structured score, a 0.30 cache-hit assumption, priced as cost per 1,000 calls.
| Model | Cost per 1,000 calls |
|---|---|
| Gemini 2.5 Flash-Lite | $1.00 |
| Gemini 2.5 Flash | $3.65 |
| Claude Haiku 4.5 | $8.34 |
| Gemini 3.5 Flash | $16.50 |
| Claude Opus 4.7 | $41.70 |
| GPT-5.5 | $55.00 |
At a realistic news-feed volume (tens of thousands of items a day), the model choice compounds fast. A desk scoring 50,000 items a day pays $50/day on Flash-Lite versus $2,750/day on GPT-5.5. Sentiment is the textbook cost case for the budget tier: the task is structural classification and the volume is enormous. Whether a frontier model scores items more accurately is a question for your own eval, which this report did not run. A cost-driven design reserves frontier reasoning for the contested items a cheap first pass flags as ambiguous.
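The escalation pattern can be priced with the table's per-1,000 rates. In the sketch below the 5% escalation rate is purely hypothetical (this report did not measure how many items a first pass would flag), and the function names are illustrative.

```python
# USD per 1,000 calls, from the sentiment table above.
COST_PER_1000 = {"Gemini 2.5 Flash-Lite": 1.00, "GPT-5.5": 55.00}


def daily_cost(items_per_day, usd_per_1000):
    return items_per_day / 1000 * usd_per_1000


def cascade_daily_cost(items_per_day, escalation_rate,
                       cheap_usd_per_1000, frontier_usd_per_1000):
    """Every item gets the cheap pass; only flagged items get the frontier pass."""
    first_pass = daily_cost(items_per_day, cheap_usd_per_1000)
    second_pass = daily_cost(items_per_day * escalation_rate, frontier_usd_per_1000)
    return first_pass + second_pass


cheap_only = daily_cost(50_000, COST_PER_1000["Gemini 2.5 Flash-Lite"])
frontier_only = daily_cost(50_000, COST_PER_1000["GPT-5.5"])
cascade = cascade_daily_cost(
    50_000,
    escalation_rate=0.05,  # hypothetical: 5% of items flagged as ambiguous
    cheap_usd_per_1000=COST_PER_1000["Gemini 2.5 Flash-Lite"],
    frontier_usd_per_1000=COST_PER_1000["GPT-5.5"],
)
```

Under the assumed 5% escalation rate the cascade costs $187.50 a day, against $50 for the cheap pass alone and $2,750 for sending everything to GPT-5.5. Swap in your own measured escalation rate before relying on the number.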
Workload 4: options-greeks reasoning
This is the workload the cost engines deliberately do not price as a single LLM call, and an honest report says why. Options-greeks computation (delta, gamma, theta, vega from spot, strike, time, and implied vol) is deterministic math, handled on this site by the Options Greeks Explorer engine, not an LLM. The LLM's role is reasoning over the computed greeks: explaining a position, flagging a tail risk, choosing a hedge. That reasoning call looks like the agent workload priced in the prompt-caching spoke, not a fixed extraction, so its cost depends entirely on how many tool calls and reasoning steps the loop runs. We do not invent a per-call number for it. If your greeks-reasoning agent runs the 6-step loop priced in the prompt-caching spoke, use those figures; if it is a single explanatory call over pre-computed greeks with a similar token shape, it costs the same as one news-sentiment call on the relevant model.
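To illustrate why the greeks themselves need no LLM, here is a sketch using the standard Black-Scholes formulas for a European call on a non-dividend-paying asset. It is not the Options Greeks Explorer's code: the function name is hypothetical, and the risk-free `rate` parameter is an added assumption, since the text above lists only spot, strike, time, and implied vol.

```python
import math


def call_greeks(spot, strike, years, vol, rate=0.0):
    """Black-Scholes greeks for a European call (no dividends)."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    pdf_d1 = math.exp(-0.5 * d1 ** 2) / math.sqrt(2 * math.pi)

    def cdf(x):
        # Standard normal CDF via the error function.
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    return {
        "delta": cdf(d1),
        "gamma": pdf_d1 / (spot * vol * sqrt_t),
        "vega": spot * pdf_d1 * sqrt_t,  # per 1.00 change in vol
        "theta": (-spot * pdf_d1 * vol / (2 * sqrt_t)
                  - rate * strike * math.exp(-rate * years) * cdf(d2)),  # per year
    }


greeks = call_greeks(spot=100, strike=100, years=1.0, vol=0.20, rate=0.05)
```

The same inputs always return the same outputs, so there is nothing here for a token bill to attach to; the LLM cost starts only when a model is asked to reason over the resulting numbers.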
The decision matrix
Putting the four workloads together, the cheapest viable model per task tier:
- Structural extraction at any volume: Gemini 2.5 Flash-Lite. Cheapest correct path when the task is pulling line items, dates, and totals where the structure is regular. The 1M context swallows a full 10-K.
- Extraction needing agent-tier reasoning at speed: Gemini 3.5 Flash. The frontier value pick: cheapest of the three frontier headline models, at Flash latency. Budget about $235/mo for a 30-filing-a-day sweep.
- Maximum context with frontier reasoning, latency not critical: Gemini 2.5 Pro (2M context) at near-identical cost to Gemini 3.5 Flash.
- High-volume short-context classification (sentiment, tagging): Gemini 2.5 Flash-Lite, with frontier escalation only on flagged-ambiguous items.
- You are standardized on Anthropic or OpenAI: Opus 4.7 or GPT-5.5, paying the 2.3x to 3.3x premium for the vendor relationship, or routing only the hard subset to them. On Anthropic, heavy prompt caching narrows the input gap but never the output gap (see the caching spoke).
- Earnings-call summarization: pick on quality, not price. Every model costs under $25/yr for 100 tickers; the cost difference is noise.
How every number here was produced
Each figure in every table is the output of a shipped engine bundle, computed at build time and embedded in a machine-readable block this site's CI independently recomputes against the same bundle (1e-9 tolerance) on every push. A writer cannot hand-type a cost number; if the prose disagreed with the engine, the build would fail. The verified inputs and outputs for each run are in the expandable block below.
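A recompute gate of this kind can be sketched as follows. This is an illustration of the idea, not this site's CI code: the function and key names are hypothetical, and reading the 1e-9 tolerance as an absolute tolerance is an assumption.

```python
import math


def recompute_gate(embedded, recomputed, tol=1e-9):
    """Return the keys whose embedded value disagrees with a fresh recompute."""
    mismatches = []
    for key, published in embedded.items():
        fresh = recomputed.get(key)
        # A missing key or a difference above the tolerance fails the gate.
        if fresh is None or not math.isclose(published, fresh,
                                             rel_tol=0.0, abs_tol=tol):
            mismatches.append(key)
    return mismatches


# Fresh recompute of the Flash-Lite monthly figure from its inputs:
# per-call cost x (1 + retry rate) x 30 filings/day x 30 days.
fresh = {"cost_per_month": 0.0154 * 1.05 * 30 * 30}

engine_value_ok = recompute_gate({"cost_per_month": 14.553}, fresh)
hand_typed_bad = recompute_gate({"cost_per_month": 14.55}, fresh)
```

The engine's own value passes, while a hand-rounded 14.55 is flagged, which is the behavior the paragraph above describes: prose that drifts from the engine fails the build.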
Connects to
- Token Cost Optimizer: the per-call cost engine behind workloads 1, 3, and 4. Recompute any row with your own token shapes.
- Data Vendor TCO: the market-data feed cost behind the pipeline figure.
- Cheapest LLM for SEC 10-K Extraction at 10,000 Filings a Month 2026: the at-scale spoke of this report.
- Prompt-Caching ROI for Finance LLM Agents 2026: why caching closes an input gap but never an output gap.
- Finance-Workload Cost per 1,000 Tasks: Gemini 3.5 Flash vs Opus 4.7 vs GPT-5.5: the three-way frontier cost spoke.
- Gemini 3.5 Flash vs GPT-5.5 vs Claude Opus 4.7 for Finance Extraction 2026: the focused three-way extraction comparison.
- Cheapest LLM for SEC Filings 2026: the budget-extraction deep dive.
References
1. Anthropic. "Pricing." platform.claude.com, verified 2026-05-26. https://platform.claude.com/docs/en/about-claude/pricing
2. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-26. https://developers.openai.com/api/docs/pricing
3. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-26. https://ai.google.dev/gemini-api/docs/pricing
Verified engine output
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-flash-lite |

| Output | Value |
|---|---|
| model › id | gemini-2-5-flash-lite |
| model › provider | google |
| model › name | Gemini 2.5 Flash-Lite |
| model › input usd per mtoken | 0.1 |
| model › output usd per mtoken | 0.4 |
| model › context window | 1000000 |
| model › notes | Cheapest tier in this table; 1M context. |
| effective cost per call | 0.0154 |
| cost per idea | 0.01617 |
| cost per validated trade | 0.019023529411764706 |
| cost per day | 0.48510000000000003 |
| cost per month | 14.553 |
| cost per year | 177.06150000000002 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-flash |

| Output | Value |
|---|---|
| model › id | gemini-2-5-flash |
| model › provider | google |
| model › name | Gemini 2.5 Flash |
| model › input usd per mtoken | 0.3 |
| model › output usd per mtoken | 2.5 |
| model › context window | 1000000 |
| model › notes | Fast mid-tier; 1M context. |
| effective cost per call | 0.054 |
| cost per idea | 0.0567 |
| cost per validated trade | 0.06670588235294118 |
| cost per day | 1.701 |
| cost per month | 51.03 |
| cost per year | 620.865 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | claude-haiku-4-5 |

| Output | Value |
|---|---|
| model › id | claude-haiku-4-5 |
| model › provider | anthropic |
| model › name | Claude Haiku 4.5 |
| model › input usd per mtoken | 1 |
| model › output usd per mtoken | 5 |
| model › cache write usd per mtoken | 1.25 |
| model › cache read usd per mtoken | 0.1 |
| model › context window | 200000 |
| model › notes | Fast, cheap — filtering + pre-processing layers. |
| effective cost per call | 0.1132 |
| cost per idea | 0.11886 |
| cost per validated trade | 0.13983529411764706 |
| cost per day | 3.5658 |
| cost per month | 106.97399999999999 |
| cost per year | 1301.517 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gpt-5-mini |

| Output | Value |
|---|---|
| model › id | gpt-5-mini |
| model › provider | openai |
| model › name | GPT-5.4 mini |
| model › input usd per mtoken | 0.75 |
| model › output usd per mtoken | 4.5 |
| model › context window | 256000 |
| model › notes | Mid-tier OpenAI (GPT-5.4 mini). |
| effective cost per call | 0.1245 |
| cost per idea | 0.130725 |
| cost per validated trade | 0.15379411764705883 |
| cost per day | 3.9217500000000003 |
| cost per month | 117.6525 |
| cost per year | 1431.43875 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-2-5-pro |

| Output | Value |
|---|---|
| model › id | gemini-2-5-pro |
| model › provider | google |
| model › name | Gemini 2.5 Pro |
| model › input usd per mtoken | 1.25 |
| model › output usd per mtoken | 10 |
| model › context window | 2000000 |
| model › notes | Large context (2M). Strong on document analysis. |
| effective cost per call | 0.2225 |
| cost per idea | 0.23362500000000003 |
| cost per validated trade | 0.27485294117647063 |
| cost per day | 7.008750000000001 |
| cost per month | 210.26250000000002 |
| cost per year | 2558.1937500000004 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gemini-3-5-flash |

| Output | Value |
|---|---|
| model › id | gemini-3-5-flash |
| model › provider | google |
| model › name | Gemini 3.5 Flash |
| model › input usd per mtoken | 1.5 |
| model › output usd per mtoken | 9 |
| model › context window | 1000000 |
| model › notes | Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash). |
| effective cost per call | 0.249 |
| cost per idea | 0.26145 |
| cost per validated trade | 0.30758823529411766 |
| cost per day | 7.843500000000001 |
| cost per month | 235.305 |
| cost per year | 2862.8775 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | claude-sonnet-4-6 |

| Output | Value |
|---|---|
| model › id | claude-sonnet-4-6 |
| model › provider | anthropic |
| model › name | Claude Sonnet 4.6 |
| model › input usd per mtoken | 3 |
| model › output usd per mtoken | 15 |
| model › cache write usd per mtoken | 3.75 |
| model › cache read usd per mtoken | 0.3 |
| model › context window | 500000 |
| model › notes | Best price/performance for bulk research loops. |
| effective cost per call | 0.3396 |
| cost per idea | 0.35658 |
| cost per validated trade | 0.4195058823529412 |
| cost per day | 10.6974 |
| cost per month | 320.922 |
| cost per year | 3904.551 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | o4-mini |

| Output | Value |
|---|---|
| model › id | o4-mini |
| model › provider | openai |
| model › name | o4-mini (reasoning) |
| model › input usd per mtoken | 3 |
| model › output usd per mtoken | 12 |
| model › context window | 200000 |
| model › notes | OpenAI reasoning-optimized mid-tier. |
| effective cost per call | 0.462 |
| cost per idea | 0.48510000000000003 |
| cost per validated trade | 0.5707058823529412 |
| cost per day | 14.553 |
| cost per month | 436.59000000000003 |
| cost per year | 5311.845 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | claude-opus-4-7 |

| Output | Value |
|---|---|
| model › id | claude-opus-4-7 |
| model › provider | anthropic |
| model › name | Claude Opus 4.7 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 25 |
| model › cache write usd per mtoken | 6.25 |
| model › cache read usd per mtoken | 0.5 |
| model › context window | 1000000 |
| model › notes | Flagship reasoning model — 1M context. |
| effective cost per call | 0.5660000000000001 |
| cost per idea | 0.5943 |
| cost per validated trade | 0.6991764705882354 |
| cost per day | 17.829 |
| cost per month | 534.87 |
| cost per year | 6507.585 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 130000 |
| output_tokens_per_call | 6000 |
| calls_per_idea | 1 |
| retry_rate | 0.05 |
| ideas_per_day | 30 |
| validation_rate | 0.85 |
| cache_hit_rate | 0.4 |
| model_id | gpt-5 |

| Output | Value |
|---|---|
| model › id | gpt-5 |
| model › provider | openai |
| model › name | GPT-5.5 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 30 |
| model › context window | 400000 |
| model › notes | OpenAI frontier model (GPT-5.5). |
| effective cost per call | 0.8300000000000001 |
| cost per idea | 0.8715000000000002 |
| cost per validated trade | 1.0252941176470591 |
| cost per day | 26.145000000000003 |
| cost per month | 784.3500000000001 |
| cost per year | 9542.925000000001 |
Computed live at build time.
| Input | Value |
|---|---|
| tickers_per_quarter | 100 |
| avg_transcript_tokens | 14000 |
| avg_summary_tokens | 700 |
| cache_hit_rate | 0.4 |
| summarization_attempts | 1 |
| model_id | gemini-2-5-flash-lite |

| Output | Value |
|---|---|
| model › id | gemini-2-5-flash-lite |
| model › name | Gemini 2.5 Flash-Lite |
| model › provider | Google |
| model › input usd per mtok | 0.1 |
| model › output usd per mtok | 0.4 |
| model › cache read usd per mtok | 0.025 |
| per stock per quarter | 0.0012599999999999998 |
| per stock per year | 0.005039999999999999 |
| per quarter total | 0.12599999999999997 |
| per year total | 0.5039999999999999 |
Computed live at build time.
| Input | Value |
|---|---|
| tickers_per_quarter | 100 |
| avg_transcript_tokens | 14000 |
| avg_summary_tokens | 700 |
| cache_hit_rate | 0.4 |
| summarization_attempts | 1 |
| model_id | gemini-3-5-flash |

| Output | Value |
|---|---|
| model › id | gemini-3-5-flash |
| model › name | Gemini 3.5 Flash |
| model › provider | Google |
| model › input usd per mtok | 1.5 |
| model › output usd per mtok | 9 |
| model › cache read usd per mtok | 0.15 |
| per stock per quarter | 0.01974 |
| per stock per year | 0.07896 |
| per quarter total | 1.974 |
| per year total | 7.896 |
Computed live at build time.
| Input | Value |
|---|---|
| tickers_per_quarter | 100 |
| avg_transcript_tokens | 14000 |
| avg_summary_tokens | 700 |
| cache_hit_rate | 0.4 |
| summarization_attempts | 1 |
| model_id | claude-opus-4-7 |

| Output | Value |
|---|---|
| model › id | claude-opus-4-7 |
| model › name | Claude Opus 4.7 |
| model › provider | Anthropic |
| model › input usd per mtok | 5 |
| model › output usd per mtok | 25 |
| model › cache read usd per mtok | 0.5 |
| per stock per quarter | 0.0623 |
| per stock per year | 0.2492 |
| per quarter total | 6.23 |
| per year total | 24.92 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 8000 |
| output_tokens_per_call | 500 |
| calls_per_idea | 1 |
| retry_rate | 0 |
| ideas_per_day | 1000 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.3 |
| model_id | gemini-2-5-flash-lite |

| Output | Value |
|---|---|
| model › id | gemini-2-5-flash-lite |
| model › provider | google |
| model › name | Gemini 2.5 Flash-Lite |
| model › input usd per mtoken | 0.1 |
| model › output usd per mtoken | 0.4 |
| model › context window | 1000000 |
| model › notes | Cheapest tier in this table; 1M context. |
| effective cost per call | 0.001 |
| cost per idea | 0.001 |
| cost per validated trade | 0.0011111111111111111 |
| cost per day | 1 |
| cost per month | 30 |
| cost per year | 365 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 8000 |
| output_tokens_per_call | 500 |
| calls_per_idea | 1 |
| retry_rate | 0 |
| ideas_per_day | 1000 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.3 |
| model_id | gemini-3-5-flash |

| Output | Value |
|---|---|
| model › id | gemini-3-5-flash |
| model › provider | google |
| model › name | Gemini 3.5 Flash |
| model › input usd per mtoken | 1.5 |
| model › output usd per mtoken | 9 |
| model › context window | 1000000 |
| model › notes | Frontier agent-tier at Flash speed — not a budget model (output ~3.6x Gemini 2.5 Flash). |
| effective cost per call | 0.0165 |
| cost per idea | 0.0165 |
| cost per validated trade | 0.018333333333333333 |
| cost per day | 16.5 |
| cost per month | 495 |
| cost per year | 6022.5 |
Computed live at build time.
| Input | Value |
|---|---|
| input_tokens_per_call | 8000 |
| output_tokens_per_call | 500 |
| calls_per_idea | 1 |
| retry_rate | 0 |
| ideas_per_day | 1000 |
| validation_rate | 0.9 |
| cache_hit_rate | 0.3 |
| model_id | gpt-5 |

| Output | Value |
|---|---|
| model › id | gpt-5 |
| model › provider | openai |
| model › name | GPT-5.5 |
| model › input usd per mtoken | 5 |
| model › output usd per mtoken | 30 |
| model › context window | 400000 |
| model › notes | OpenAI frontier model (GPT-5.5). |
| effective cost per call | 0.055 |
| cost per idea | 0.055 |
| cost per validated trade | 0.06111111111111111 |
| cost per day | 55 |
| cost per month | 1650 |
| cost per year | 20075 |
Computed live at build time.
| Input | Value |
|---|---|
| archetype_id | 10k-body |
| chunk_size | 768 |
| overlap_pct | 0.12 |
| strategy | structural |
| embedding_model_id | voyage-finance-2 |
| query_reembed_count | 250 |

| Output | Value |
|---|---|
| archetype › id | 10k-body |
| archetype › name | 10-K (full body) |
| archetype › total tokens | 120000 |
| archetype › structural boundaries | 12 |
| archetype › table heavy | true |
| archetype › notes | Form 10-K business + risk + MD&A + financials. ~12 Items. Dense tables in Item 7 / 8. |
| embedding › id | voyage-finance-2 |
| embedding › name | voyage-finance-2 |
| embedding › vendor | Voyage AI |
| embedding › usd per mtokens | 0.12 |
| embedding › dim | 1024 |
| embedding › source | https://docs.voyageai.com/docs/pricing |
| strategy | structural |
| chunk count | 178 |
| avg tokens | 766 |
| min tokens | 768 |
| max tokens | 768 |
| tokens ingested | 136348 |
| embedding cost once | 0.01636176 |
| embedding cost per100 queries | 0.0012 |
| strategy notes | Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean. |
Computed live at build time.
| Input | Value |
|---|---|
| vendor_id | fmp |
| universe | medium |
| resolution | daily |
| needs_live | false |

| Output | Value |
|---|---|
| vendor › id | fmp |
| vendor › name | Financial Modeling Prep |
| vendor › url | https://site.financialmodelingprep.com |
| vendor › short pitch | Fundamentals-heavy. Earnings, filings, transcripts. Price data is a secondary offer. |
| vendor › has overage | false |
| vendor › last checked | 2026-04-20 |
| vendor › tiers › row 1 › name | Starter |
| vendor › tiers › row 1 › monthly | 14 |
| vendor › tiers › row 1 › includes live | false |
| vendor › tiers › row 1 › includes options | false |
| vendor › tiers › row 1 › includes futures | false |
| vendor › tiers › row 1 › resolutions › row 1 | daily |
| vendor › tiers › row 1 › notes › row 1 | 5 years history |
| vendor › tiers › row 1 › notes › row 2 | 250 API calls/day |
| vendor › tiers › row 1 › notes › row 3 | Price unconfirmed 2026-05-25 — FMP list prices not consistently published |
| vendor › tiers › row 2 › name | Premium |
| vendor › tiers › row 2 › monthly | 29 |
| vendor › tiers › row 2 › includes live | false |
| vendor › tiers › row 2 › includes options | false |
| vendor › tiers › row 2 › includes futures | false |
| vendor › tiers › row 2 › resolutions › row 1 | daily |
| vendor › tiers › row 2 › resolutions › row 2 | minute |
| vendor › tiers › row 2 › notes › row 1 | Full history |
| vendor › tiers › row 2 › notes › row 2 | 750 calls/day |
| vendor › tiers › row 3 › name | Ultimate |
| vendor › tiers › row 3 › monthly | 79 |
| vendor › tiers › row 3 › includes live | true |
| vendor › tiers › row 3 › includes options | false |
| vendor › tiers › row 3 › includes futures | false |
| vendor › tiers › row 3 › resolutions › row 1 | daily |
| vendor › tiers › row 3 › resolutions › row 2 | minute |
| vendor › tiers › row 3 › resolutions › row 3 | second |
| vendor › tiers › row 3 › notes › row 1 | Real-time |
| vendor › tiers › row 3 › notes › row 2 | Unlimited calls |
| tier › name | Starter |
| tier › monthly | 14 |
| tier › includes live | false |
| tier › includes options | false |
| tier › includes futures | false |
| tier › resolutions › row 1 | daily |
| tier › notes › row 1 | 5 years history |
| tier › notes › row 2 | 250 API calls/day |
| tier › notes › row 3 | Price unconfirmed 2026-05-25 — FMP list prices not consistently published |
| monthly | 14 |
| one time | 0 |
| annual total | 168 |
| meets resolution | true |
| meets live | true |
| meets options | true |
| meets futures | true |
| meets all | true |
Computed live at build time.
Frequently asked questions
- What is the cheapest LLM for 10-K extraction in 2026?
- Gemini 2.5 Flash-Lite at $0.0162 per filing ($14.55 per month on a 30-filing/day sweep), with a 1M context that fits a full 10-K. It is about 16x cheaper than the cheapest frontier model, Gemini 3.5 Flash, for structural extraction that does not need agent-tier reasoning.
- How big is the cost gap between the cheapest and most expensive LLM for finance extraction?
- On the 10-K extraction workload (130k input, 6k output, 30 filings/day), the spread is a factor of 54: Gemini 2.5 Flash-Lite at $14.55/month against GPT-5.5 at $784.35/month, for the identical token shape.
- Are these numbers benchmark results or accuracy scores?
- Neither. They are cost numbers computed from verified vendor list prices, run through this site's shipped cost engines and recomputed by CI on every build. No model was tested or scored for accuracy here. Cost tells you the ceiling of what a workload can cost, never which model reads a filing correctly.
- Does prompt caching make Claude Opus 4.7 competitive on cost?
- Caching narrows the gap but does not close it. The cost engine applies cache pricing to Anthropic input only (reads at 0.1x base input); it does nothing for output. On an output-heavy agent loop, even 90% cache on Opus 4.7 does not beat Gemini 3.5 Flash uncached, because Opus's $25/Mtok output rate dominates.
- Why are DeepSeek V4 and Grok 4.3 not costed in this report?
- They are not in this site's verified-price cost-engine table, so running them would require unverified numbers. Their published rates are treated as vendor claims only and never run through the cost math. Every costed figure traces to a model in the engine's verified 2026-05-26 rate table.
- Which model should I pick for real-time news sentiment scoring?
- Gemini 2.5 Flash-Lite at $1.00 per 1,000 calls. Sentiment is high-volume structural classification where the budget tier wins decisively (GPT-5.5 costs $55.00 per 1,000 on the same shape). Escalate to a frontier model only for the ambiguous items a cheap first pass flags.