Comparator

Model Selector for Finance

Model selector for finance: pick the right LLM for extract, summarize, forecast, compare, rank, and synthesize tasks, scored on cost, latency, context, and quality axes.

AI Fin Hub Research · Published Apr 23, 2026

Transparent by design: computed in your browser from a published formula and sourced rates, not a black box. Data verified May 25, 2026. Sources: Anthropic pricing · OpenAI pricing · Google AI / Gemini pricing.

Inputs: Scenario form
Runtime: Instant
Privacy: Client-side · no upload
API key: Not required
Methodology: Open →

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results.

Recommended model

Gemini 2.5 Flash-Lite

$3/mo

google · haiku tier · 1M ctx

Reference workload: 6,000 in / 1,200 out × 3,000 calls/mo.

1 · Configure your task profile

Task type · Latency budget · Cost budget · Context-size need · Quality sensitivity

Reference workload for cost fit: 6,000 in / 1,200 out × 3,000 calls/mo. See methodology.

2 · Top 3 recommendations

#1 · google

Gemini 2.5 Flash-Lite

haiku tier · 1M ctx

Cheapest published rate in this table, with 1M context. Built for the highest-volume extraction tiers. Reference monthly spend at this tool's default workload is ~$3, within the $50/mo budget. Published context window 1M covers the 32K–200K requirement. Vendor positions the Haiku tier for summarize workloads.

Vendor pricing →

#2 · openai

GPT-5.4 mini

sonnet tier · 256K ctx

Mid-tier OpenAI. 256K context at a sub-sonnet input rate. Reference monthly spend at this tool's default workload is ~$30, within the $50/mo budget. Published context window 256K covers the 32K–200K requirement. Vendor positions the Sonnet tier for summarize workloads.

Vendor pricing →

#3 · google

Gemini 2.5 Flash

haiku tier · 1M ctx

Fast mid-tier with 1M context. Positioned for high-throughput pipelines. Reference monthly spend at this tool's default workload is ~$14, within the $50/mo budget. Published context window 1M covers the 32K–200K requirement. Vendor positions the Haiku tier for summarize workloads.

Vendor pricing →

Published-rate-based; verify with your own eval harness (see D1 — Eval harness for finance LLMs).

3 · Full ranked list with why-not notes

Rank	Model	Vendor · tier	Gates	Score	Why-not note
#1	Gemini 2.5 Flash-Lite	google · haiku	passed	86	Passes all gates; top of the ranking.
#2	GPT-5.4 mini	openai · sonnet	passed	84	Passes all gates; simply outranked by a model with better combined fit.
#3	Gemini 2.5 Flash	google · haiku	passed	83	Passes all gates; simply outranked by a model with better combined fit.
#4	Claude Haiku 4.5	anthropic · haiku	passed	76	Passes all gates; simply outranked by a model with better combined fit.
#5	Claude Sonnet 4.6	anthropic · sonnet	gate failed	13	Over the chosen cost budget at default workload.
#6	o4-mini (reasoning)	openai · sonnet	gate failed	1	Over the chosen cost budget at default workload.
#7	Claude Opus 4.8	anthropic · opus	gate failed	0	Over the chosen cost budget at default workload.
#8	GPT-5.5	openai · opus	gate failed	0	Over the chosen cost budget at default workload.
#9	Gemini 2.5 Pro	google · opus	gate failed	0	Over the chosen cost budget at default workload.
#10	Gemini 3.5 Flash	google · opus	gate failed	0	Over the chosen cost budget at default workload.

4 · Per-axis comparison (all models)

Model	Input $/1M	Output $/1M	Context	Thinking	Ref $/mo	Cost gate	Latency gate	Context gate	Capability gate
Gemini 2.5 Flash-Lite	$0.10	$0.40	1M	—	$3	pass	pass	pass	pass
GPT-5.4 mini	$0.75	$4.50	256K	—	$30	pass	pass	pass	pass
Gemini 2.5 Flash	$0.30	$2.50	1M	—	$14	pass	pass	pass	pass
Claude Haiku 4.5	$1.00	$5.00	200K	—	$36	pass	pass	pass	pass
Claude Sonnet 4.6	$3.00	$15.00	1M	yes	$108	fail	pass	pass	pass
o4-mini (reasoning)	$3.00	$12.00	200K	yes	$97	fail	pass	pass	fail
Claude Opus 4.8	$5.00	$25.00	1M	yes	$180	fail	fail	pass	fail
GPT-5.5	$5.00	$30.00	400K	yes	$198	fail	fail	pass	fail
Gemini 2.5 Pro	$1.25	$10.00	2M	yes	$58	fail	fail	pass	pass
Gemini 3.5 Flash	$1.50	$9.00	1M	yes	$59	fail	fail	pass	fail

Rates and context windows sourced from vendor pricing pages, as of 2026-04-23.
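The Ref $/mo column can be checked by hand from the published rates and the reference workload. The sketch below is one way to do that arithmetic; it assumes plain per-token pricing with no caching, batch discount, or thinking-token surcharge, and the names `WORKLOAD`, `RATES`, and `monthlyCost` are illustrative, not part of the tool.

```javascript
// Reference workload quoted on this page:
// 6,000 input and 1,200 output tokens per call, 3,000 calls per month.
const WORKLOAD = { inputTokens: 6000, outputTokens: 1200, callsPerMonth: 3000 };

// Published rates in USD per 1M tokens, copied from the table above (subset).
const RATES = {
  "Gemini 2.5 Flash-Lite": { input: 0.10, output: 0.40 },
  "GPT-5.4 mini": { input: 0.75, output: 4.50 },
  "Claude Haiku 4.5": { input: 1.00, output: 5.00 },
  "Claude Sonnet 4.6": { input: 3.00, output: 15.00 },
};

// Hypothetical helper (assumption: linear per-token pricing only).
// Monthly USD = millions of input tokens * input rate
//             + millions of output tokens * output rate.
function monthlyCost(rate, workload) {
  const inputMillions = (workload.inputTokens * workload.callsPerMonth) / 1e6;
  const outputMillions = (workload.outputTokens * workload.callsPerMonth) / 1e6;
  return inputMillions * rate.input + outputMillions * rate.output;
}

for (const [model, rate] of Object.entries(RATES)) {
  console.log(`${model}: ~$${monthlyCost(rate, WORKLOAD).toFixed(2)}/mo`);
}
```

For these four models the results round to the $3, $30, $36, and $108 shown in the Ref $/mo column.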

Scoring framework

score = cost_match + latency_match + context_match
      + capability_bonus + quality_boost
cost_match       : 0 if monthly estimate > budget ceiling
latency_match    : 0 if tier slower than latency budget
context_match    : 0 if context window < required
capability_bonus : bonus if task ∈ model.best_for
quality_boost    : boost flagship tiers when quality = high

Deliberately no accuracy numbers. See methodology for why, and the framework article for deeper rationale.
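To make the shape of the framework concrete, here is a minimal sketch of an additive score with zeroing gates. The page publishes the formula's structure but not its weights, so every number in `WEIGHTS`, the field names on `model` and `profile`, and the latency ranks are assumptions made for illustration. The sketch does not reproduce the scores in the ranked list.

```javascript
// Hypothetical weights: the page does not publish the real values.
const WEIGHTS = { cost: 30, latency: 25, context: 25, capability: 10, quality: 10 };

// Field names on `model` and `profile` are assumptions for this sketch.
function score(model, profile) {
  // Each *_match term drops to 0 when its gate fails, as in the framework above.
  const costMatch = model.monthlyCost > profile.budgetCeiling ? 0 : WEIGHTS.cost;
  const latencyMatch = model.latencyRank > profile.maxLatencyRank ? 0 : WEIGHTS.latency;
  const contextMatch = model.contextTokens < profile.requiredContext ? 0 : WEIGHTS.context;
  // Bonus when the task is one the model is positioned for.
  const capabilityBonus = model.bestFor.includes(profile.task) ? WEIGHTS.capability : 0;
  // Boost flagship tiers only when quality sensitivity is high.
  const qualityBoost = profile.quality === "high" && model.flagship ? WEIGHTS.quality : 0;
  return costMatch + latencyMatch + contextMatch + capabilityBonus + qualityBoost;
}

// Illustrative task profile and two made-up model records.
const profile = {
  task: "summarize",
  budgetCeiling: 50,      // USD per month
  maxLatencyRank: 2,      // 1 = fastest tier (assumed scale)
  requiredContext: 200000,
  quality: "medium",
};
const cheap = { monthlyCost: 3, latencyRank: 1, contextTokens: 1000000, bestFor: ["extract", "summarize"], flagship: false };
const pricey = { monthlyCost: 180, latencyRank: 3, contextTokens: 1000000, bestFor: ["synthesize"], flagship: true };

console.log(score(cheap, profile), score(pricey, profile));
```

With these assumed weights a model that fails the cost and latency gates keeps only its context points, which is why gate failures sink to the bottom of a ranking.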

How to use

Step-by-step

Full calculator guide →

1. Enter task type, accuracy requirement (acceptable percentage), latency budget (max acceptable response time), and monthly call volume.
2. Read the recommended model with the cost and latency it implies.
3. Toggle the cost-vs-latency-vs-accuracy axes to see the Pareto frontier; there are usually 2-3 reasonable choices, not one.
4. Cross-check the recommendation against the methodology page's per-task accuracy benchmarks.
5. Re-run when you scale call volume by 5x or more, because the cost-optimal model often changes at scale.
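The last step is easier to see with numbers. With per-token pricing, spend grows linearly with call volume, so a model inside the budget at the reference workload can fall outside it at 5x. The sketch below reuses the published rates of the four models that pass the cost gate and the $50/mo budget quoted in the recommendations; `PASSING_RATES`, `costAtVolume`, and `withinBudget` are hypothetical names, and the cost model assumes no caching or batch discounts.

```javascript
const BUDGET_USD = 50; // the $50/mo budget quoted in the recommendations
const TOKENS_PER_CALL = { input: 6000, output: 1200 }; // reference workload

// Published USD per 1M tokens for the models that pass the cost gate.
const PASSING_RATES = {
  "Gemini 2.5 Flash-Lite": { input: 0.10, output: 0.40 },
  "GPT-5.4 mini": { input: 0.75, output: 4.50 },
  "Gemini 2.5 Flash": { input: 0.30, output: 2.50 },
  "Claude Haiku 4.5": { input: 1.00, output: 5.00 },
};

// Monthly spend in USD at a given call volume (assumption: linear pricing).
function costAtVolume(rate, callsPerMonth) {
  const inputMillions = (TOKENS_PER_CALL.input * callsPerMonth) / 1e6;
  const outputMillions = (TOKENS_PER_CALL.output * callsPerMonth) / 1e6;
  return inputMillions * rate.input + outputMillions * rate.output;
}

// Names of the models whose monthly spend stays within the budget.
function withinBudget(callsPerMonth) {
  return Object.keys(PASSING_RATES).filter(
    (name) => costAtVolume(PASSING_RATES[name], callsPerMonth) <= BUDGET_USD
  );
}

console.log("3,000 calls/mo:", withinBudget(3000));
console.log("15,000 calls/mo:", withinBudget(15000));
```

At 3,000 calls/mo all four models fit the budget. At 15,000 calls/mo only Gemini 2.5 Flash-Lite does, so the candidate set, and with it the recommendation, can shift even though no rate changed.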

For agents

Use in an agent

Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.

import { compute } from "https://aifinhub.io/engines/model-selector-finance.js";

Contract: /contracts/model-selector-finance.json


Questions people ask next

FAQ

How does the selector recommend a model?

Three criteria documented on the methodology page: task fit (does the model class hit acceptable accuracy on this task type?), cost envelope (does the call volume × per-call cost fit budget?), and latency budget (does the model respond fast enough?). The tool shows the Pareto frontier across all three so you can see tradeoffs explicitly.
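A Pareto frontier keeps only the options that no other option beats on every axis at once. The sketch below illustrates the idea on two axes that this page does publish, reference monthly cost and context window, because no accuracy or latency figures appear here. It is a stand-in for the tool's own axes, not its implementation, and `MODELS`, `dominates`, and `paretoFrontier` are hypothetical names.

```javascript
// Ref $/mo and context window taken from the per-axis table (subset).
const MODELS = [
  { name: "Gemini 2.5 Flash-Lite", costUsd: 3, contextTokens: 1000000 },
  { name: "GPT-5.4 mini", costUsd: 30, contextTokens: 256000 },
  { name: "Claude Haiku 4.5", costUsd: 36, contextTokens: 200000 },
  { name: "Gemini 2.5 Pro", costUsd: 58, contextTokens: 2000000 },
  { name: "Claude Sonnet 4.6", costUsd: 108, contextTokens: 1000000 },
  { name: "GPT-5.5", costUsd: 198, contextTokens: 400000 },
];

// a dominates b when a is no worse on both axes and strictly better on one.
// Lower cost is better; larger context is better.
function dominates(a, b) {
  const noWorse = a.costUsd <= b.costUsd && a.contextTokens >= b.contextTokens;
  const strictlyBetter = a.costUsd < b.costUsd || a.contextTokens > b.contextTokens;
  return noWorse && strictlyBetter;
}

// Keep every model that no other model dominates.
function paretoFrontier(models) {
  return models.filter((m) => !models.some((other) => dominates(other, m)));
}

console.log(paretoFrontier(MODELS).map((m) => m.name));
```

On these two axes the frontier is Gemini 2.5 Flash-Lite (cheapest) and Gemini 2.5 Pro (largest context); every other model in the subset costs more than Flash-Lite without offering a larger window. Adding a third axis such as latency would typically widen the frontier.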

Why is Claude Sonnet recommended for most workloads?

Three reasons: (1) it hits 90%+ accuracy on most finance task types in the evaluation suite, (2) cost sits in the mid tier — well below Opus 4.8 and GPT-5.5 at the top, above Haiku and Gemini Flash-Lite at the bottom, (3) latency is competitive. Opus is recommended only when accuracy must be at the absolute top end (legal, regulatory, large position sizing). Haiku is recommended when latency is the top constraint.

When does the selector recommend a GPT model over Claude?

When the task requires strong tool-use chains (function calling) at scale — OpenAI's tool-use protocol has more deployed examples. Or when latency at high concurrency is critical — OpenAI's infrastructure has lower tail latency at high volume. Both are deployment-pattern reasons, not capability differences.

Are open-source models considered?

Yes — Llama 3.1 70B, Mistral Large, and Qwen 2.5 72B are scored on the same suite. They typically lag the frontier closed models by 10-20% on finance tasks but cost 5-10× less when self-hosted. The selector recommends them when cost dominates and accuracy is acceptable at the lower band.

How often is the selector updated?

Every 90 days, or when a major model is released (Claude 4.x, GPT-5, Gemini 2.x). Each update re-runs the full eval suite and revises recommendations. The methodology page shows update history.


Complementary tools

Token-Cost Optimizer

Compute the dollar cost of a trading research loop across Claude, GPT, and Gemini. Prompt length × model × retry × call volume → cost per idea and per.


Financial Document Token Estimator

Paste a 10-K, 10-Q, 8-K or earnings transcript and see token count + one-pass extraction cost across eight frontier LLMs, with cache-hit toggle.


Batch vs Real-Time Cost Calculator

Jobs per day, tokens per job, model, deadline — get real-time vs batch cost side-by-side with savings estimate and batch-eligibility flag. Based.


Planning estimates only — not financial, tax, or investment advice.