Name: State of LLM Pricing for Finance Q3 2026
Creator: AI Fin Hub Research
Published: 2026-06-21
License: https://creativecommons.org/licenses/by/4.0/

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results. Q3 2026 pricing inventory. Figures pulled from vendor pricing pages on 2026-06-21. Vendors may change rates at any time. Supersedes the Q2 2026 snapshot. Editorial standards Sponsor disclosure Corrections

TL;DR

This release is a pricing-grade snapshot of every finance-relevant LLM tier as of 2026-06-21, sourced line-by-line from official vendor pricing pages. It now spans six providers: Anthropic (Opus 4.8, Fable 5, Sonnet 4.6, Haiku 4.5), OpenAI (GPT-5.5 and the GPT-5.4 family), Google (Gemini 3.5 Flash plus the Gemini 2.5 Pro / Flash / Flash-Lite tiers), xAI (Grok 4.3), DeepSeek (V4 Flash and V4 Pro), and Mistral (Medium 3.5, Large 3, Small 4). Four worked finance scenarios show per-model cost at published rates: nightly SEC-filing triage, quarterly earnings-call summarization, multi-document peer-comparison synthesis, and an intraday research loop. No quality or accuracy measurements are included; those require running a task-specific eval. The purpose is traceable inventory, not ranking. This supersedes the Q2 2026 snapshot.

What changed since Q2

The Q2 snapshot (2026-04-23) sat on the GPT-5.4 / Opus 4.6-4.7 / Gemini 2.5 generation. By Q3 the frontier moved on four fronts:

Anthropic shipped Opus 4.8 ($5/$25, the current Opus flagship) and the GA top-of-stack Fable 5 ($10/$50, roughly 2x Opus per token). Opus 4.1 ($15/$75) is deprecated and retires 2026-08-05.
OpenAI released GPT-5.5 ($5/$30) as the new flagship; the >272K-token long-context premium is still in force ($10/$45 above the boundary). There is no GPT-5.5 mini or nano — the small tiers remain on GPT-5.4.
Google brought Gemini 3.5 Flash to GA at $1.50/$9 — an agent-tier reasoning model priced like a frontier model, not a budget tier. The Gemini 2.5 Pro two-tier split (≤200K vs >200K input) carries over.
xAI, DeepSeek, and Mistral are added to the benchmark for the first time. Grok 4.3 is xAI's current flagship at a notably low $1.25/$2.50; DeepSeek V4 ships as Flash and Pro SKUs; Mistral Medium 3.5 is the current premier flagship.

Methodology

Every figure in this release traces to a vendor pricing page pulled on 2026-06-21. No quality scores, latency measurements, or accuracy numbers are reported. Running an eval against your own task is the only defensible way to rank models on output quality.

The sources are:

Anthropic: platform.claude.com/docs/en/about-claude/pricing¹ and the model-overview page for context windows and max output tokens². (The legacy docs.anthropic.com pricing URL now 301-redirects here.)
OpenAI: developers.openai.com/api/docs/pricing³, the canonical API pricing page; the consumer-facing openai.com/pricing pages are not machine-readable (HTTP 403) and are not used.
Google: ai.google.dev/gemini-api/docs/pricing⁴ plus per-model spec pages for context windows.
xAI: docs.x.ai/developers/models and the per-model pages⁵.
DeepSeek: api-docs.deepseek.com/quick_start/pricing⁶.
Mistral: mistral.ai/pricing⁷ and the model-overview page.

Where a vendor page does not publish a figure, the field is recorded as null in the accompanying data.csv and data.json. Historical rates are not reconstructed from memory or third-party pages. Three aggregator figures were checked against the official pages and rejected as wrong: a "$4/$24" GPT-5.5 reading, a "$2/$6, 2M context" Grok 4.3 reading, and a "$0.10/$0.30" Mistral Small 4 reading (a JS-render parse artifact).

Prices are in USD per 1M tokens (text input/output) unless stated. Batch API discounts are 50 percent where the vendor offers a batch tier; prompt caching uses each vendor's published multiplier relative to base input. The four worked scenarios are computed arithmetically from the master table, so a reader can reproduce every figure from the input and output rates alone.

Pricing changes. Vendors ship price cuts and new tiers on monthly or quarterly cadence. The snapshot is correct on the as-of date; readers running material workloads should re-pull the vendor pages before any spend decision.

Master pricing table

All figures pulled 2026-06-21. "MTok" means per million tokens. Empty cells indicate the figure is not published on the vendor pricing page.

Anthropic (Claude family)

Model	Input / MTok	Output / MTok	Cache read / MTok	5m cache write / MTok	1h cache write / MTok	Batch	Context	Max output
Claude Opus 4.8	$5.00	$25.00	$0.50	$6.25	$10.00	50%	1M	128K
Claude Fable 5	$10.00	$50.00	$1.00	$12.50	$20.00	50%	1M	128K
Claude Sonnet 4.6	$3.00	$15.00	$0.30	$3.75	$6.00	50%	1M	64K
Claude Haiku 4.5	$1.00	$5.00	$0.10	$1.25	$2.00	50%	200K	64K

Cache multipliers across the Claude family are constant: read at 0.1x input, 5-minute write at 1.25x input, 1-hour write at 2x input¹. There is no long-context premium on any current Claude model — the full 1M window bills at the standard rate. Opus 4.8 has a premium fast-mode research preview at $10/$50; the table above is the standard rate. Fable 5 (GA 2026-06-09) is the most capable Anthropic model overall and is priced at roughly 2x Opus 4.8.

OpenAI (GPT-5.5 and GPT-5.4 families)

Model	Input / MTok	Cached input / MTok	Output / MTok	Batch	Context	Max output
GPT-5.5	$5.00	$0.50	$30.00	50%	1.05M	128K
GPT-5.5 (prompts > 272K)	$10.00	$1.00	$45.00	50%	1.05M	128K
GPT-5.5 Pro	$30.00	—	$180.00	50%	1.05M	128K
GPT-5.4	$2.50	$0.25	$15.00	50%	1.05M	128K
GPT-5.4 mini	$0.75	$0.075	$4.50	50%	400K	128K
GPT-5.4 nano	$0.20	$0.02	$1.25	50%	400K	128K

The >272K-token pricing boundary still applies on GPT-5.5 and GPT-5.4: prompts above 272,000 input tokens are billed at 2x input and 1.5x output for the full session³. On GPT-5.5 that flips $5/$30 to $10/$45; on GPT-5.4, $2.50/$15 to $5/$22.50. The boundary does not apply to GPT-5.5 Pro, mini, or nano. There is no GPT-5.5 mini or nano — the small tiers stay on GPT-5.4. Cached input is billed at 0.1x standard input across the family. Older GPT-5 and o3 snapshots were noticed for deprecation on 2026-06-11 and leave the API on 2026-12-11.

Google (Gemini family)

Model	Input / MTok	Output / MTok	Cache read / MTok	Batch	Context	Max output
Gemini 3.5 Flash	$1.50	$9.00	$0.15	50%	1,048,576	65,536
Gemini 2.5 Pro (prompts <= 200K)	$1.25	$10.00	$0.125	50%	1,048,576	65,536
Gemini 2.5 Pro (prompts > 200K)	$2.50	$15.00	$0.25	50%	1,048,576	65,536
Gemini 2.5 Flash	$0.30	$2.50	$0.03	50%	1,048,576	65,536
Gemini 2.5 Flash-Lite	$0.10	$0.40	$0.01	50%	1,048,576	65,536

Gemini 3.5 Flash is GA and priced as an agent-tier reasoning model at $1.50/$9 — well above the 2.5 Flash economy tier, so it is not a drop-in cheap default⁴. Gemini 2.5 Pro keeps its two-tier structure: at the 200K input-token line the per-token rate doubles on input and scales 1.5x on output. Audio input carries a higher price ceiling on the Flash tiers; the figures above track the text / image / video rate that nearly all finance workloads hit. Context cache storage is billed separately (Pro at $4.50, Flash at $1.00 per 1M tokens per hour). The prior-generation Gemini 2.0 Flash and 2.0 Flash-Lite are retired (no longer served as of 2026-06-01) and are excluded from this table.

xAI (Grok)

Model	Input / MTok	Output / MTok	Cached input / MTok	Batch	Context	Max output
Grok 4.3	$1.25	$2.50	$0.20	—	1,000,000	—

Grok 4.3 (launched 2026-04-30) is xAI's current flagship, aliased grok-4.3-latest and grok-latest⁵. The entire 4.x lineup shares one price, and the historical context-tier premium is gone — the full 1M window bills at the flat rate. The output rate ($2.50) is conspicuously low against the frontier. xAI does not publish a max-output token figure or a batch-discount percentage; live-search and tool calls are billed separately per 1,000 calls. Grok 4, Grok 4 Fast, and Grok 4.1 Fast are no longer in the current lineup.

DeepSeek

Model	Input (cache miss) / MTok	Input (cache hit) / MTok	Output / MTok	Context	Max output
DeepSeek V4 Flash	$0.14	$0.0028	$0.28	1M	384K
DeepSeek V4 Pro	$0.435	$0.003625	$0.87	1M	384K

DeepSeek splits input into cache-miss and cache-hit rates rather than a separate cache-read line⁶. V4 ships as two SKUs: Flash for cost-sensitive throughput and Pro for harder reasoning. The V4 Pro list price is the standard rate — the 75 percent launch discount was made permanent around 2026-05-22, not a temporary promo. No off-peak discount is documented for V4; the legacy V3/R1 off-peak window does not appear on the current page, so it is reported as not publicly documented. The deepseek-chat and deepseek-reasoner model names deprecate 2026-07-24 and now alias V4 Flash modes.

Mistral

Model	Input / MTok	Output / MTok	Batch	Context
Mistral Medium 3.5	$1.50	$7.50	50%	128K
Mistral Large 3	$0.50	$1.50	50%	128K
Mistral Small 4	$0.15	$0.60	50%	128K

Mistral Medium 3.5 is the current premier flagship⁷. Small 4 is $0.15/$0.60 — the lower $0.10/$0.30 figure that some aggregators show is a JavaScript-render parse artifact and is not the published rate. Mistral does not publish max-output tokens. Mistral Medium 3.1 ($0.40/$2.00) is deprecated and retires 2026-08-31.

Worked scenario A: nightly SEC-filing triage

Setup: 500 10-Ks per night, 4 fact-extraction questions per filing, 50,000 input tokens per question (the filing plus extraction prompt), 1,000 output tokens per answer. 21 trading days per month.

Volume: 500 x 4 x 21 = 42,000 queries/month. 2,100 MTok input, 42 MTok output.

Model	Standard monthly cost	Batch monthly cost
Gemini 2.5 Flash-Lite	$226.80	$113.40
DeepSeek V4 Flash	$305.76	n/a
Mistral Small 4	$340.20	$170.10
GPT-5.4 nano	$472.50	$236.25
Gemini 2.5 Flash	$735.00	$367.50
Claude Haiku 4.5	$2,310.00	$1,155.00
Grok 4.3	$2,730.00	n/a
Gemini 2.5 Pro (<= 200K)	$3,045.00	$1,522.50
Mistral Medium 3.5	$3,465.00	$1,732.50
Gemini 3.5 Flash	$3,528.00	$1,764.00
Claude Sonnet 4.6	$6,930.00	$3,465.00
Claude Opus 4.8	$11,550.00	$5,775.00
GPT-5.5	$11,760.00	$5,880.00
Claude Fable 5	$23,100.00	$11,550.00

The spread between the cheapest option (Flash-Lite at $226.80) and the most expensive (Fable 5 at $23,100) on a triage workload is about 100x. A triage pipeline that uses a frontier model to read every 10-K sentence burns two orders of magnitude more cash than one that runs the fact-extraction pass on Flash-Lite, DeepSeek V4 Flash, or nano and escalates to a stronger model only on low confidence. DeepSeek V4 Flash and Grok 4.3 carry no published batch tier, so their batch column is n/a. The Token-Cost Optimizer and the Financial Document Token Estimator accept the same input shape as the table, so the numbers retarget to a custom universe or question count without leaving the site.

Caching changes the picture sharply. If the 10-K body is cached as a system-prompt prefix and reused across the four questions per filing, Anthropic's 0.1x cache-read multiplier drops the input share of Haiku 4.5 from $2,100 to about $210 on the three cache-hit passes per filing, which near-halves the monthly bill for that provider. The same lever is even sharper on DeepSeek V4, where cache-hit input is $0.0028/MTok against $0.14 cache-miss. Details belong in Prompt Caching Economics for Finance LLMs.

Worked scenario B: quarterly earnings-call summarization

Setup: 20 tickers, one earnings call per quarter, 50,000 input tokens per call (transcript plus context), 3,000 output tokens (structured summary). Quarterly volume: 20 calls, 1 MTok input, 0.06 MTok output.

Model	Per-quarter cost
Gemini 2.5 Flash-Lite	$0.12
DeepSeek V4 Flash	$0.16
Mistral Small 4	$0.19
GPT-5.4 nano	$0.28
Gemini 2.5 Flash	$0.45
Claude Haiku 4.5	$1.30
Grok 4.3	$1.40
Gemini 2.5 Pro (<= 200K)	$1.85
Mistral Medium 3.5	$1.95
Gemini 3.5 Flash	$2.04
Claude Sonnet 4.6	$3.90
Claude Opus 4.8	$6.50
GPT-5.5	$6.80
Claude Fable 5	$13.00

This workload is the classic case where token cost is a rounding error and output quality dominates the decision. A small-universe earnings-call loop faces a total quarterly bill of under $13 on every model listed; the difference between the cheapest and the most expensive is $12.88 per quarter. Choosing on price alone is a category error at this scale. The numbers only matter to show how the per-token spread compounds into scenarios A, C, and D, where it scales into five- and six-figure monthly variance. The 50,000-token call transcript sits inside every model's context window, so no tier boundary is crossed. Scaling from 20 tickers to 2,000 multiplies each figure by 100 and keeps the per-prompt token budget identical.

Worked scenario C: multi-document peer-comparison synthesis

Setup: 5 filings fed into a single call, 125,000 input tokens per run, 2,000 output tokens. Run daily across 21 business days. Monthly volume: 2.625 MTok input, 0.042 MTok output.

Model	Standard monthly cost	Batch monthly cost
Gemini 2.5 Flash-Lite	$0.28	$0.14
DeepSeek V4 Flash	$0.38	n/a
Mistral Small 4	$0.42	$0.21
GPT-5.4 nano	$0.58	$0.29
Gemini 2.5 Flash	$0.89	$0.45
Claude Haiku 4.5	$2.83	$1.42
Grok 4.3	$3.39	n/a
Gemini 2.5 Pro (<= 200K)	$3.70	$1.85
Mistral Medium 3.5	$4.25	$2.13
Gemini 3.5 Flash	$4.32	$2.16
Claude Sonnet 4.6	$8.51	$4.25
Claude Opus 4.8	$14.18	$7.09
GPT-5.5	$14.38	$7.19
Claude Fable 5	$28.35	$14.18

At 125,000 input tokens, the prompt fits inside the <= 200K Gemini Pro tier and inside the < 272K GPT-5.5 tier, so the cheaper per-token rate applies for both. Push the peer set from 5 to 10 filings and per-run input becomes 250,000 tokens, which crosses the Gemini Pro 200K boundary (input doubles, output 1.5x) and the GPT-5.5 272K boundary (input 2x, output 1.5x). A practitioner who designs the synthesis step to sit just under those boundaries extracts a real per-call discount; one who lets input drift over surprises the bill reviewer. On Claude Sonnet 4.6, Opus 4.8, Fable 5, and Grok 4.3, a 125K-token prompt is inside a flat-priced window with no kink — the bill scales linearly with tokens, which is the strongest single argument for these models on long-context synthesis.

Worked scenario D: intraday agent research loop

Setup: 10 research ideas per day, 8,000 input tokens per idea, 1,000 output tokens. Monthly volume over 30 days: 2.4 MTok input, 0.3 MTok output.

Model	Standard monthly cost	Batch monthly cost
Gemini 2.5 Flash-Lite	$0.36	$0.18
DeepSeek V4 Flash	$0.42	n/a
Mistral Small 4	$0.54	$0.27
GPT-5.4 nano	$0.85	$0.43
Gemini 2.5 Flash	$1.47	$0.74
Grok 4.3	$3.75	n/a
Claude Haiku 4.5	$3.90	$1.95
Mistral Medium 3.5	$5.85	$2.93
Gemini 2.5 Pro (<= 200K)	$6.00	$3.00
Gemini 3.5 Flash	$6.30	$3.15
Claude Sonnet 4.6	$11.70	$5.85
Claude Opus 4.8	$19.50	$9.75
GPT-5.5	$21.00	$10.50
Claude Fable 5	$39.00	$19.50

Agent loops with tight per-idea budgets are the tier where batch becomes awkward: a real-time loop (fire on signal, consume a recommendation within minutes) cannot use the 50 percent batch discount, which applies only to asynchronous submissions with 24-hour SLAs. Real-time is the right column; batch is a reference for off-hours overnight re-scoring. Grok 4.3 stands out here: its flat $1.25/$2.50 rate plus the low output price keep a 10-idea loop near $3.75/month, comparable to Haiku and below every Gemini Pro / frontier option. The Agent Cost Envelope Calculator models scenario D at arbitrary idea volume and context size; the Batch vs Realtime Cost Calculator is built for the boundary case.

Which tier wins for which workload

Mapped from the four worked scenarios against the master table. "Wins" means lowest total cost at published rates for the given input and output volumes; it does not mean best output quality, which is not measured here.

Workload archetype	Cheapest at published rates	Runner-up
High-volume triage (scenario A)	Gemini 2.5 Flash-Lite	DeepSeek V4 Flash
Low-volume summarization (scenario B)	Gemini 2.5 Flash-Lite	DeepSeek V4 Flash
Mid-context synthesis under 200K (scenario C)	Gemini 2.5 Flash-Lite	DeepSeek V4 Flash
Real-time agent loop (scenario D)	Gemini 2.5 Flash-Lite	DeepSeek V4 Flash
Flat-rate long-context (>200K input)	Claude Sonnet 4.6	Grok 4.3 (flat 1M, low output rate)
Cheap reasoning at frontier-ish quality	Grok 4.3 ($1.25/$2.50)	DeepSeek V4 Pro
Deep-reasoning escalation tier	Claude Opus 4.8 ($5/$25)	GPT-5.5 ($5/$30)
Top-of-stack maximum capability	Claude Fable 5 ($10/$50)	GPT-5.5 Pro ($30/$180)

Two caveats sit on top of this table. First, a triage-tier model's price advantage is irrelevant if it hallucinates on a filing excerpt and a downstream step acts on it, which is why the Hallucination Detector and the Price-Blind Auditor exist. Second, the "cheapest" column assumes deterministic token counts; agent loops with variable output lengths need probabilistic cost envelopes, not point estimates. The Model Selector for Finance walks through both modifiers.

Caveats and limitations

This release is a pricing inventory, not a performance ranking or a buy-side recommendation. A small operator choosing a model needs four inputs: per-token cost, output quality on the specific task, latency, and rate-limit ceiling. This document covers only the first axis, and only the published surface of that axis.

Vendor pricing changes on monthly and quarterly cadence. Between Q2 and Q3 alone, OpenAI shipped GPT-5.5, Anthropic shipped Opus 4.8 and Fable 5, Google brought Gemini 3.5 Flash to GA, and three providers (xAI, DeepSeek, Mistral) were added here for the first time. Quality is not measured: two models at the same per-token rate can differ by a factor of two on the fraction of 10-K answers that are factually correct, which is the only question that matters for an agent wired into an execution loop. The correct way to measure that is an eval harness run against a labeled task set on the caller's own data; a minimal reference lives in Eval Harness for Finance LLMs. Quality rankings published without an eval attached are marketing, not evidence.

A few cells are reported as not publicly documented rather than guessed: xAI's max-output token figure and batch discount; Mistral's max-output tokens; DeepSeek's off-peak discount window for V4. Enterprise pricing differs from the self-serve rates tabled above; above roughly $10,000 to $50,000 monthly spend a re-negotiation is standard and the table rates are starting points, not floors. Refresh the vendor pricing pages before any material spend decision.

Data downloads

Master pricing dataset (CSV): /benchmarks/state-of-llm-pricing-for-finance-q3-2026/data.csv
Master pricing dataset (JSON): /benchmarks/state-of-llm-pricing-for-finance-q3-2026/data.json

Both files carry the same 21 model rows across six providers, each with its source URL and as-of date.

Connects to

State of LLM Pricing for Finance Q2 2026: the prior snapshot this release supersedes, kept as a historical record.
The 2026 Engineer's Guide to AI in Markets: pillar context for where pricing fits into a full AI-in-markets stack.
Reading Financial Filings with LLMs (2026): filing-scale prompts and why the token budget matters.
Best LLM for Financial Analysis 2026: the task-tiered model picker that consumes these rates.
Claude vs GPT-5 vs Gemini for Finance Pricing 2026: the head-to-head narrative across the three biggest vendors.
Prompt Caching Economics for Finance LLMs: cache-read and cache-write mechanics applied to SEC-filing workflows.
Model Selection Framework for Finance: a decision tree pairing model choice with task type and budget envelope.
Batch API Economics for Finance LLMs: when the 50-percent batch discount is reachable and when it is not.
Eval Harness for Finance LLMs: the quality axis this release deliberately does not measure.
State of AI Market Data 2026: the sister benchmark covering market-data vendors, MCP servers, and retail adoption signals.
Financial Document Token Estimator: retargets scenario A to a custom filing universe.
Model Selector for Finance: a selection tool parameterised on the master table above.
Batch vs Realtime Cost Calculator: the boundary case between real-time and batch submissions.
Token-Cost Optimizer: the existing calculator that consumes the master rates for arbitrary workloads.
Agent Cost Envelope Calculator: probabilistic envelope for agent loops with variable output length.

References

Anthropic. "Prompt caching." platform.claude.com/docs/en/build-with-claude/prompt-caching. 5-minute and 1-hour cache duration mechanics.
OpenAI. "Batch API." platform.openai.com/docs/guides/batch. Async submission format and 24-hour SLA for the 50-percent discount.
Google. "Batch mode." ai.google.dev/gemini-api/docs/batch-mode. Batch request format and 50-percent discount.

Anthropic. "Pricing." platform.claude.com/docs/en/about-claude/pricing (accessed 2026-06-21). Model-pricing table, batch-processing table, prompt-caching multiplier table. ↩ ↩²
Anthropic. "Models overview." platform.claude.com/docs/en/about-claude/models/overview (accessed 2026-06-21). Context windows and max-output tokens for Opus 4.8, Fable 5, Sonnet 4.6, and Haiku 4.5. ↩
OpenAI. "API pricing." developers.openai.com/api/docs/pricing (accessed 2026-06-21). Standard, batch, and cached-input rows for GPT-5.5, GPT-5.5 Pro, GPT-5.4 / mini / nano, and the >272K long-context tier. ↩ ↩²
Google. "Gemini API pricing." ai.google.dev/gemini-api/docs/pricing (accessed 2026-06-21). Standard and batch rates for Gemini 3.5 Flash, 2.5 Pro (<=200K and >200K tiers), 2.5 Flash, and 2.5 Flash-Lite, plus context-cache storage rates. ↩ ↩²
xAI. "Models." docs.x.ai/developers/models and the grok-4.3 model page (accessed 2026-06-21). Input, output, cached-input rates and 1M context window for Grok 4.3. ↩ ↩²
DeepSeek. "Pricing." api-docs.deepseek.com/quick_start/pricing (accessed 2026-06-21). Cache-hit / cache-miss input and output rates for DeepSeek V4 Flash and V4 Pro. ↩ ↩²
Mistral AI. "Pricing." mistral.ai/pricing and the models overview page (accessed 2026-06-21). Input and output rates for Mistral Medium 3.5, Large 3, and Small 4. ↩ ↩²