LLM Cost Trends for Finance Workloads Statistics
LLM inference prices for a fixed capability level have fallen between 9 and 900 times per year since 2022, with a median of about 50 times per year. That rate means a cost estimate from six months ago is already stale. Each figure below comes from named research, cited with source and year; none was generated by this site. The price figures are capability-normalized, not single-model prices, and the rate varies by task, so re-run your own cost model before committing a budget.
The numbers worth quoting
The cost to query a model at GPT-3.5-level performance (64.8% on MMLU) fell about 280-fold, from roughly 20 US dollars per million tokens in November 2022 to about 0.07 US dollars per million tokens by October 2024 (Stanford AI Index, 2025).
The AI Index defines the capability level by MMLU accuracy and tracks the cheapest model meeting it over time. Over that window of roughly two years (23 months), the price fell by more than two orders of magnitude.
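The two price points above can be converted into an annualized rate, which is easier to compare with per-year figures. The sketch below assumes a constant exponential decline and counts 23 calendar months between November 2022 and October 2024; the function name is illustrative and does not come from the AI Index.

```python
def annualized_decline(start_price: float, end_price: float, months: float) -> float:
    """Per-year price-decline factor implied by two price points.

    Assumes the price fell at a constant exponential rate over the window.
    """
    total_fold = start_price / end_price
    return total_fold ** (12.0 / months)


# Price points quoted above, in US dollars per million tokens.
START_PRICE = 20.0   # November 2022
END_PRICE = 0.07     # October 2024
MONTHS = 23          # assumption: calendar months between the two dates

total_fold = START_PRICE / END_PRICE
per_year = annualized_decline(START_PRICE, END_PRICE, MONTHS)

print(f"total decline: {total_fold:.0f}x")
print(f"implied annual decline: {per_year:.1f}x per year")
```

Under these assumptions the implied rate comes out a little under 20x per year. Because it is derived from only two endpoints on one benchmark, treat it as a rough cross-check rather than a forecast.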
Across benchmarks, LLM inference prices for a fixed capability have fallen between 9 and 900 times per year, with a median of about 50 times per year (Epoch AI, 2025).
Epoch measured price-to-reach-capability across six benchmarks, including MMLU, GPQA Diamond, MATH, HumanEval, and Chatbot Arena Elo. The wide range reflects that some tasks commoditize far faster than others.
Restricting to data from January 2024 onward, the median annual price decline rose to about 200 times per year (Epoch AI, 2025).
Epoch notes that the fastest declines (toward the 900x end) begin after January 2024 and cautions that these rapid rates may not persist, so they should not be extrapolated forward as if they were constant.
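To see how quickly an old estimate goes stale, an annual decline factor can be scaled to any horizon by raising it to the power of months divided by 12. The sketch below applies the rates quoted on this page to a hypothetical 10,000 US dollar monthly estimate made six months ago. The dollar amount, the elapsed time, and the function names are illustrative assumptions, and the calculation assumes a constant rate, which Epoch cautions may not hold.

```python
# Annual price-decline factors quoted on this page (Epoch AI).
ANNUAL_DECLINE = {
    "low": 9.0,
    "median": 50.0,
    "post_2024_median": 200.0,
    "high": 900.0,
}


def decline_over(annual_factor: float, months: float) -> float:
    """Decline factor over `months`, assuming a constant exponential rate."""
    return annual_factor ** (months / 12.0)


def rescale_estimate(old_cost: float, annual_factor: float, months_elapsed: float) -> float:
    """Re-price an old cost estimate for the same capability level today."""
    return old_cost / decline_over(annual_factor, months_elapsed)


OLD_MONTHLY_ESTIMATE_USD = 10_000.0  # hypothetical figure, for illustration only
MONTHS_ELAPSED = 6                   # hypothetical age of the estimate

scenarios = {
    label: rescale_estimate(OLD_MONTHLY_ESTIMATE_USD, factor, MONTHS_ELAPSED)
    for label, factor in ANNUAL_DECLINE.items()
}

for label, cost in scenarios.items():
    print(f"{label:>17}: ${cost:,.0f} per month")
```

Even the slowest quoted rate (9x per year) implies a threefold drop over six months, which is why the spread across scenarios matters more for budgeting than any single point estimate.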
The price to match GPT-4's performance on a set of PhD-level science questions fell by about 40 times per year (Epoch AI, 2025).
This single-benchmark figure shows that even hard, reasoning-heavy tasks (the kind closest to financial analysis) saw order-of-magnitude annual cost declines, though slower than the fastest-commoditizing tasks.
Retrieval-augmented generation adds roughly 5 percentage points of accuracy and fine-tuning roughly 6 percentage points on a domain task, with the gains stacking (Balaguer et al., Microsoft, 2024).
This matters for cost because the cheaper-to-maintain option (RAG) captured most of the accuracy gain of the more expensive option (fine-tuning) in this study, which informs build-versus-retrieve decisions for domain workloads.
Key Takeaways
Methodology
Figures are drawn from the Stanford AI Index, Epoch AI, and a Microsoft RAG-versus-fine-tuning study, each reported with its source and year. The price figures measure price-to-reach a fixed capability level rather than the list price of a single named model, and rates differ by benchmark. No statistic on this page is derived from data collected by this site.
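The difference between a capability-normalized price and a single model's list price can be sketched as a lookup: at each date, take the cheapest model that was already released and meets the benchmark threshold. The example below is a minimal illustration of that idea, not the AI Index or Epoch AI pipeline. The model names, dates, scores, and prices are placeholder values invented for the example; only the 64.8 threshold comes from this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ModelRelease:
    name: str
    released: date
    benchmark_score: float          # e.g. benchmark accuracy in percent
    usd_per_million_tokens: float


def cheapest_at_capability(
    releases: list[ModelRelease], threshold: float, as_of: date
) -> Optional[ModelRelease]:
    """Cheapest model released on or before `as_of` that meets `threshold`."""
    eligible = [
        r for r in releases
        if r.released <= as_of and r.benchmark_score >= threshold
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.usd_per_million_tokens)


# Placeholder data for illustration only; not real models or prices.
RELEASES = [
    ModelRelease("model-a", date(2023, 1, 1), 65.0, 20.00),
    ModelRelease("model-b", date(2023, 9, 1), 80.0, 30.00),
    ModelRelease("model-c", date(2024, 3, 1), 66.0, 0.50),
    ModelRelease("model-d", date(2024, 6, 1), 60.0, 0.05),  # cheap, but below threshold
]

THRESHOLD = 64.8  # the GPT-3.5-level MMLU score used on this page

for as_of in (date(2023, 6, 1), date(2024, 1, 1), date(2024, 12, 1)):
    best = cheapest_at_capability(RELEASES, THRESHOLD, as_of)
    if best is not None:
        print(as_of, best.name, best.usd_per_million_tokens)
```

In this toy data the tracked price stays flat when a more capable but more expensive model appears, and drops only when a cheaper model clears the threshold. The cheapest model overall is ignored because it falls below the capability bar.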
Sources & References
- Artificial Intelligence Index Report 2025 — Stanford Institute for Human-Centered AI (2025)
- LLM inference prices have fallen rapidly but unequally across tasks — Epoch AI (2025)
- RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture — Angels Balaguer et al., Microsoft (2024)