LLM Cost Trends for Finance Workloads Statistics

LLM inference prices for a fixed capability level have fallen between 9 and 900 times per year since 2022, with a median of about 50x per year. At that rate, a cost estimate from six months ago is already stale. Each figure below comes from named research and is listed with its source and year; none was generated by this site. The price figures are capability-normalized prices, not single-model prices, and the rate of decline varies by task, so re-run your own cost model before committing a budget.

5 STATS · Published May 26, 2026

By AI Fin Hub Research · AI Fin Hub Team

Statistics

The numbers worth quoting

The cost to query a model at GPT-3.5-level performance (64.8% on MMLU) fell about 280-fold, from roughly 20 US dollars per million tokens in November 2022 to about 0.07 US dollars per million tokens by October 2024

The AI Index defines the capability level by MMLU accuracy and tracks the cheapest model that meets it over time. Between those two dates, the price fell by more than two orders of magnitude.
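The arithmetic behind the headline figure can be checked directly. The sketch below uses only the two prices quoted above; the 23-month elapsed time is an assumption taken from counting months between November 2022 and October 2024, and the annualized rate assumes a constant exponential decline, which is a simplification.

```python
# Prices quoted in the AI Index statistic above (USD per million tokens).
start_price = 20.00   # November 2022
end_price = 0.07      # October 2024

# Assumption: elapsed time counted in whole months between the two dates.
months_elapsed = 23

# Total fold decline over the window.
fold_decline = start_price / end_price

# Annualized factor f satisfies f ** (months_elapsed / 12) == fold_decline,
# assuming the price fell at a constant exponential rate.
annualized_factor = fold_decline ** (12 / months_elapsed)

print(f"Total decline: {fold_decline:.0f}x")
print(f"Implied annualized decline: {annualized_factor:.1f}x per year")
```

Under these assumptions the implied annual rate comes out at roughly 19x, which sits inside the 9x to 900x range reported across benchmarks.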

Source: Stanford HAI, Artificial Intelligence Index Report 2025

Across benchmarks, LLM inference prices for a fixed capability have fallen between 9 and 900 times per year, with a median of about 50 times per year

Epoch measured the price to reach a given capability across six benchmarks, including MMLU, GPQA Diamond, MATH, HumanEval, and Chatbot Arena Elo. The wide range reflects that some tasks commoditize far faster than others.
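To see what this range means for an existing budget, the following minimal sketch estimates how far a six-month-old price estimate overstates today's price at the low, median, and high ends quoted above. The function name `staleness_factor` is hypothetical, and the constant-rate assumption is a simplification: actual declines are uneven and may not persist.

```python
def staleness_factor(annual_decline: float, months_old: float) -> float:
    """Factor by which an old estimate overstates today's price,
    assuming a constant exponential decline of `annual_decline`x per year."""
    return annual_decline ** (months_old / 12)


# Low end, median, and high end of the annual decline range quoted above.
scenarios = [("low", 9), ("median", 50), ("high", 900)]

for label, rate in scenarios:
    factor = staleness_factor(rate, months_old=6)
    print(f"{label:>6}: {rate}x/yr -> six-month-old estimate is ~{factor:.1f}x too high")
```

Because six months is half a year, each factor is simply the square root of the annual rate: about 3x at the low end, about 7x at the median, and about 30x at the high end.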

Source: Epoch AI, LLM inference price trends

Restricting to data from January 2024 onward, the median annual price decline rose to about 200 times per year

Epoch notes the fastest declines (toward the 900x end) begin after January 2024 and cautions these rapid rates may not persist, so they should not be extrapolated linearly.

Source: Epoch AI, LLM inference price trends

The price to match GPT-4's performance on a set of PhD-level science questions fell by about 40 times per year

This single-benchmark figure illustrates that even hard, reasoning-heavy tasks (the kind closest to financial analysis) saw order-of-magnitude annual cost declines, though slower than the easiest tasks.

Source: Epoch AI, LLM inference price trends

Retrieval-augmented generation adds roughly 5 percentage points of accuracy and fine-tuning roughly 6 percentage points on a domain task, with the gains stacking

This is cost-relevant because the cheaper-to-maintain option (RAG) captured most of the accuracy gain of the more expensive option (fine-tuning) in this study, which informs build-versus-retrieve cost decisions for domain workloads.

Source: Balaguer et al., RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture (Microsoft)

Key Takeaways

Cost to reach a fixed capability level has fallen by orders of magnitude per year, not single-digit percentages.

The Stanford AI Index put the GPT-3.5-level query cost drop at about 280-fold between November 2022 and October 2024.

Decline rates vary hugely by task (9x to 900x per year), so a single blended assumption misleads.

Prices for the reasoning-heavy tasks closest to financial analysis still fell about 40x per year, though more slowly than for the easiest tasks.

Re-run finance LLM cost estimates each quarter; the curve moves faster than annual budgeting cycles.

Methodology

Figures are drawn from the Stanford AI Index, Epoch AI, and Balaguer et al. (Microsoft), each reported with its source and year. The price figures measure the price to reach a fixed capability level rather than the list price of a single named model, and rates differ by benchmark; the RAG and fine-tuning figure measures accuracy gains, not price. No statistic on this page is derived from data collected by this site.
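As an illustration of what "price to reach a fixed capability level" means, the sketch below picks, for a given date, the cheapest model that meets a score threshold. It is a minimal sketch, not the method used by Stanford HAI or Epoch AI, whose procedures may differ in detail. The `ModelPrice` type, the `cheapest_at_capability` function, and every model, score, and price in `catalog` are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelPrice:
    name: str
    released: date
    benchmark_score: float          # e.g. accuracy in percent
    usd_per_million_tokens: float


def cheapest_at_capability(models, threshold, as_of):
    """Cheapest model released on or before `as_of` whose score meets
    `threshold`. Returns None if no model qualifies yet."""
    eligible = [
        m for m in models
        if m.released <= as_of and m.benchmark_score >= threshold
    ]
    return min(eligible, key=lambda m: m.usd_per_million_tokens, default=None)


# Hypothetical catalog: not real models, scores, or prices.
catalog = [
    ModelPrice("model-a", date(2023, 1, 1), 70.0, 20.00),
    ModelPrice("model-b", date(2023, 9, 1), 66.0, 2.00),
    ModelPrice("model-c", date(2024, 6, 1), 60.0, 0.05),  # cheap, but below threshold
    ModelPrice("model-d", date(2024, 8, 1), 68.0, 0.10),
]

for as_of in (date(2023, 6, 1), date(2024, 1, 1), date(2024, 12, 1)):
    best = cheapest_at_capability(catalog, threshold=65.0, as_of=as_of)
    print(as_of, best.name, best.usd_per_million_tokens)
```

The capability-normalized price can fall even when no single model's list price changes, because a newer, cheaper model that clears the threshold replaces the previous cheapest one.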

Sources & References

Artificial Intelligence Index Report 2025 — Stanford Institute for Human-Centered AI (2025)
LLM inference prices have fallen rapidly but unequally across tasks — Epoch AI (2025)
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture — Angels Balaguer et al., Microsoft (2024)
