Model Selector for Finance
Model selector for finance: pick the right LLM for extract, summarize, forecast, compare, rank, and synthesize tasks, weighing cost, latency, context, and quality.
Transparent by design: computed in your browser from a published formula and sourced rates, not a black box. Data verified May 25, 2026. Sources: Anthropic pricing, OpenAI pricing, Google AI / Gemini pricing. The full methodology is on the methodology page.
- Inputs: Scenario form
- Runtime: Instant
- Privacy: Client-side, no upload
- API key: Not required
Recommended model
Gemini 2.5 Flash-Lite
$3/mo
google · haiku tier · 1M ctx
Reference workload: 6,000 in / 1,200 out × 3,000 calls/mo.
1 · Configure your task profile
Reference workload for cost fit: 6,000 in / 1,200 out × 3,000 calls/mo. See methodology.
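As a rough check on the reference figures, here is a minimal sketch of the monthly spend arithmetic. It assumes spend is simply (input tokens × input rate + output tokens × output rate) per call, times calls per month, with no prompt caching, batch discount, or thinking-token overhead. The names `monthlyCost` and `REFERENCE_WORKLOAD` are hypothetical and are not part of the tool's engine.

```javascript
// Reference workload stated by the tool: 6,000 input and 1,200 output
// tokens per call, 3,000 calls per month.
const REFERENCE_WORKLOAD = { inputTokens: 6000, outputTokens: 1200, callsPerMonth: 3000 };

// Monthly spend in USD for one model, given published per-1M-token rates.
// Assumption: linear pricing only (no caching, batching, or thinking surcharge).
function monthlyCost(rates, workload = REFERENCE_WORKLOAD) {
  const perCall =
    (workload.inputTokens / 1e6) * rates.inputPerM +
    (workload.outputTokens / 1e6) * rates.outputPerM;
  return perCall * workload.callsPerMonth;
}

// Gemini 2.5 Flash-Lite rates from the comparison table: $0.10 in, $0.40 out per 1M tokens.
const flashLite = { inputPerM: 0.1, outputPerM: 0.4 };
console.log(monthlyCost(flashLite).toFixed(2)); // prints 3.24

// Same model at 5x the call volume.
const fiveX = { ...REFERENCE_WORKLOAD, callsPerMonth: 15000 };
console.log(monthlyCost(flashLite, fiveX).toFixed(2)); // prints 16.20
```

Under this linear model, scaling calls by 5× scales every model's spend by 5×, so the price ordering stays the same while the set of models that fit a fixed budget shrinks.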
2 · Top 3 recommendations
Gemini 2.5 Flash-Lite
haiku tier · 1M ctx
Cheapest published rate in this table, with 1M context. Built for the highest-volume extraction workloads. Reference monthly spend at this tool's default workload is ~$3, within the $50/mo budget. The published 1M context window covers the 32K–200K requirement. The vendor positions this haiku-tier model for summarize workloads.
GPT-5.4 mini
sonnet tier · 256K ctx
Mid-tier OpenAI model: 256K context at an input rate below the sonnet tier. Reference monthly spend at this tool's default workload is ~$30, within the $50/mo budget. The published 256K context window covers the 32K–200K requirement. The vendor positions this sonnet-tier model for summarize workloads.
Gemini 2.5 Flash
haiku tier · 1M ctx
Fast mid-tier model with 1M context, positioned for high-throughput pipelines. Reference monthly spend at this tool's default workload is ~$14, within the $50/mo budget. The published 1M context window covers the 32K–200K requirement. The vendor positions this haiku-tier model for summarize workloads.
Recommendations are based on published rates; verify them with your own eval harness (see D1: Eval harness for finance LLMs).
3 · Full ranked list with why-not notes
- Gemini 2.5 Flash-Lite, GPT-5.4 mini, Gemini 2.5 Flash, Claude Haiku 4.5: pass all gates; any of them below the top pick is simply outranked by a model with better combined fit.
- Claude Sonnet 4.6, o4-mini (reasoning), Claude Opus 4.8, GPT-5.5, Gemini 2.5 Pro, Gemini 3.5 Flash: over the chosen cost budget at the default workload.
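The why-not notes read like the first gate a model fails. A minimal sketch of that logic, assuming gates are checked in the order cost, latency, context, capability; cost-first is consistent with the per-axis table, where every model that fails any gate also fails cost and gets the cost note. `whyNot` and the latency, context, and capability note texts are hypothetical; only the cost and pass-all wording comes from the tool.

```javascript
// Gate order is an assumption: cost first, then latency, context, capability.
const GATE_NOTES = [
  ["cost", "Over the chosen cost budget at default workload."],
  // Hypothetical wording for the remaining gates; the tool's own text is not shown.
  ["latency", "Slower than the chosen latency budget."],
  ["context", "Context window smaller than the required context."],
  ["capability", "Task type not in this model's best-for list."],
];
const PASS_NOTE = "Passes all gates; simply outranked by a model with better combined fit.";

// Return the note for the first failing gate, or the pass-all note.
function whyNot(gates) {
  for (const [gate, note] of GATE_NOTES) {
    if (!gates[gate]) return note;
  }
  return PASS_NOTE;
}

// Gate results copied from two rows of the per-axis table.
const o4mini = { cost: false, latency: true, context: true, capability: false };
const haiku45 = { cost: true, latency: true, context: true, capability: true };

console.log(whyNot(o4mini)); // prints: Over the chosen cost budget at default workload.
console.log(whyNot(haiku45)); // prints: Passes all gates; simply outranked by a model with better combined fit.
```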
4 · Per-axis comparison (all models)
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context window | Thinking mode | Ref. cost ($/mo) | Cost gate | Latency gate | Context gate | Capability gate |
|---|---|---|---|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M | — | $3 | pass | pass | pass | pass |
| GPT-5.4 mini | $0.75 | $4.50 | 256K | — | $30 | pass | pass | pass | pass |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | — | $14 | pass | pass | pass | pass |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | — | $36 | pass | pass | pass | pass |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | yes | $108 | fail | pass | pass | pass |
| o4-mini (reasoning) | $3.00 | $12.00 | 200K | yes | $97 | fail | pass | pass | fail |
| Claude Opus 4.8 | $5.00 | $25.00 | 1M | yes | $180 | fail | fail | pass | fail |
| GPT-5.5 | $5.00 | $30.00 | 400K | yes | $198 | fail | fail | pass | fail |
| Gemini 2.5 Pro | $1.25 | $10.00 | 2M | yes | $58 | fail | fail | pass | pass |
| Gemini 3.5 Flash | $1.50 | $9.00 | 1M | yes | $59 | fail | fail | pass | fail |
Rates and context windows are sourced from vendor pricing pages as of 2026-04-23.
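To reproduce the reference cost and cost-gate columns from the published rates, a sketch like the following applies the reference workload to each row and compares the result with the $50/mo budget used in the recommendations. It assumes a model passes when its estimate is at or below the budget, matching the scoring rule that cost_match is 0 only when the estimate exceeds the ceiling. `costGate` and the constant names are hypothetical.

```javascript
// Published per-1M-token rates copied from the per-axis table.
const MODELS = [
  { name: "Gemini 2.5 Flash-Lite", inputPerM: 0.1, outputPerM: 0.4 },
  { name: "GPT-5.4 mini", inputPerM: 0.75, outputPerM: 4.5 },
  { name: "Gemini 2.5 Flash", inputPerM: 0.3, outputPerM: 2.5 },
  { name: "Claude Haiku 4.5", inputPerM: 1.0, outputPerM: 5.0 },
  { name: "Claude Sonnet 4.6", inputPerM: 3.0, outputPerM: 15.0 },
  { name: "o4-mini (reasoning)", inputPerM: 3.0, outputPerM: 12.0 },
  { name: "Claude Opus 4.8", inputPerM: 5.0, outputPerM: 25.0 },
  { name: "GPT-5.5", inputPerM: 5.0, outputPerM: 30.0 },
  { name: "Gemini 2.5 Pro", inputPerM: 1.25, outputPerM: 10.0 },
  { name: "Gemini 3.5 Flash", inputPerM: 1.5, outputPerM: 9.0 },
];

const BUDGET_PER_MONTH = 50; // the $50/mo budget used in the recommendations
const CALLS = 3000, IN_TOKENS = 6000, OUT_TOKENS = 1200; // reference workload

// Estimate monthly spend per model and mark it pass/fail against the budget.
function costGate(models, budget) {
  return models.map((m) => {
    const monthly = (CALLS * (IN_TOKENS * m.inputPerM + OUT_TOKENS * m.outputPerM)) / 1e6;
    return { name: m.name, monthly, cost: monthly <= budget ? "pass" : "fail" };
  });
}

for (const row of costGate(MODELS, BUDGET_PER_MONTH)) {
  console.log(`${row.name}: $${row.monthly.toFixed(2)} -> ${row.cost}`);
}
```

The table shows these estimates in whole dollars (Gemini 2.5 Pro's $58.50 appears as $58), and the six models this sketch marks over budget are the same six whose cost gate reads fail.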
Scoring framework
score = cost_match + latency_match + context_match + capability_bonus + quality_boost
cost_match: 0 if monthly estimate > budget ceiling
latency_match: 0 if tier slower than latency budget
context_match: 0 if context window < required
capability_bonus: bonus if task ∈ model.best_for
quality_boost: boost flagship tiers when quality = high
Deliberately no accuracy numbers. See the methodology page for why, and the framework article for deeper rationale.
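A minimal sketch of the scoring formula, under stated assumptions: the page says only that each match term is 0 when its condition fails, so this sketch scores a passing match as 1, uses hypothetical bonus sizes of 0.5, treats latency tiers as an ordered list (fast, medium, slow), and treats the opus tier as the flagship tier. All field names (`monthlyCost`, `latencyTier`, `bestFor`, and so on) and the example models' `bestFor` lists are illustrative, not the engine's contract.

```javascript
// Hypothetical constants: the page gives no weights, only that a failed match scores 0.
const MATCH = 1;
const CAPABILITY_BONUS = 0.5;
const QUALITY_BOOST = 0.5;
const LATENCY_TIERS = ["fast", "medium", "slow"]; // ordered fastest to slowest
const FLAGSHIP_TIER = "opus";

// score = cost_match + latency_match + context_match + capability_bonus + quality_boost
function score(model, req) {
  const costMatch = model.monthlyCost > req.budgetPerMonth ? 0 : MATCH;
  const tooSlow =
    LATENCY_TIERS.indexOf(model.latencyTier) > LATENCY_TIERS.indexOf(req.latencyBudget);
  const latencyMatch = tooSlow ? 0 : MATCH;
  const contextMatch = model.contextTokens < req.requiredContext ? 0 : MATCH;
  const capabilityBonus = model.bestFor.includes(req.task) ? CAPABILITY_BONUS : 0;
  const qualityBoost =
    req.quality === "high" && model.tier === FLAGSHIP_TIER ? QUALITY_BOOST : 0;
  return costMatch + latencyMatch + contextMatch + capabilityBonus + qualityBoost;
}

// Example requirement: $50/mo budget, 200K context (top of the 32K–200K range).
const req = {
  budgetPerMonth: 50, latencyBudget: "medium", requiredContext: 200_000,
  task: "summarize", quality: "standard",
};

// Illustrative records: costs match the reference table; other fields are assumed.
const flashLite = {
  monthlyCost: 3.24, latencyTier: "fast", contextTokens: 1_000_000,
  bestFor: ["extract", "summarize"], tier: "haiku",
};
const opus = {
  monthlyCost: 180, latencyTier: "slow", contextTokens: 1_000_000,
  bestFor: ["synthesize"], tier: "opus",
};

console.log(score(flashLite, req)); // prints 3.5
console.log(score(opus, req)); // prints 1
```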
How to use
Step-by-step
1. Enter the task type, accuracy requirement (acceptable percentage), latency budget (maximum acceptable response time), and monthly call volume.
2. Read the recommended model and the cost and latency it implies.
3. Toggle the cost, latency, and accuracy axes to see the Pareto frontier; there are usually 2–3 reasonable choices, not one.
4. Cross-check the recommendation against the per-task accuracy benchmarks on the methodology page.
5. Re-run when you scale call volume by 5× or more; the cost-optimal model often changes at scale.
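Step 3's Pareto frontier is the set of options that no other option beats on every axis at once. A minimal sketch, assuming three axes (monthly cost and latency, where lower is better; accuracy, where higher is better) and purely hypothetical options A through D, since the tool itself publishes no accuracy numbers. `dominates` and `paretoFrontier` are illustrative names.

```javascript
// a dominates b if a is no worse on every axis and strictly better on at least one.
function dominates(a, b) {
  const noWorse = a.cost <= b.cost && a.latencyMs <= b.latencyMs && a.accuracy >= b.accuracy;
  const better = a.cost < b.cost || a.latencyMs < b.latencyMs || a.accuracy > b.accuracy;
  return noWorse && better;
}

// Keep every option that no other option dominates.
function paretoFrontier(options) {
  return options.filter((o) => !options.some((other) => dominates(other, o)));
}

// Hypothetical options; all numbers are made up for illustration.
const options = [
  { name: "A", cost: 3, latencyMs: 400, accuracy: 0.8 },
  { name: "B", cost: 30, latencyMs: 600, accuracy: 0.88 },
  { name: "C", cost: 36, latencyMs: 700, accuracy: 0.85 }, // worse than B on every axis
  { name: "D", cost: 180, latencyMs: 2500, accuracy: 0.93 },
];

console.log(paretoFrontier(options).map((o) => o.name).join(", ")); // prints A, B, D
```

Three options survive here, which is the "usually 2–3 reasonable choices" situation: picking among A, B, and D is a judgment about which axis matters most, not something the frontier decides.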
For agents
Use in an agent
Same math and same result shape as the UI above, packaged as a static ES module. No HTTP request, no auth, no rate limit.
import { compute } from "https://aifinhub.io/engines/model-selector-finance.js";
Contract: /contracts/model-selector-finance.json. See the full agent guide for details.
Questions people ask next
FAQ
How does the selector recommend a model?
Three criteria documented on the methodology page: task fit (does the model class hit acceptable accuracy on this task type?), cost envelope (does call volume × per-call cost fit the budget?), and latency budget (does the model respond fast enough?). The tool shows the Pareto frontier across all three so you can see the tradeoffs explicitly.
Why is Claude Sonnet recommended for most workloads?
Three reasons: (1) it hits 90%+ accuracy on most finance task types in the evaluation suite; (2) its cost sits in the mid tier, well below Opus 4.8 and GPT-5.5 at the top and above Haiku and Gemini Flash-Lite at the bottom; (3) its latency is competitive. Opus is recommended only when accuracy must be at the absolute top end (legal, regulatory, large position sizing). Haiku is recommended when latency is the top constraint.
When does the selector recommend a GPT model over Claude?
When the task requires strong tool-use chains (function calling) at scale — OpenAI's tool-use protocol has more deployed examples. Or when latency at high concurrency is critical — OpenAI's infrastructure has lower tail latency at high volume. Both are deployment-pattern reasons, not capability differences.
Are open-source models considered?
Yes — Llama 3.1 70B, Mistral Large, and Qwen 2.5 72B are scored on the same suite. They typically lag the frontier closed models by 10-20% on finance tasks but cost 5-10× less when self-hosted. The selector recommends them when cost dominates and accuracy is acceptable at the lower band.
How often is the selector updated?
Every 90 days, or when a major model is released (Claude 4.x, GPT-5, Gemini 2.x). Each update re-runs the full eval suite and revises recommendations. The methodology page shows update history.
Related deep dive
Read further
Long-form context behind the tool output.
- Comparison · Benchmark · 10 min
  Financial QA LLM Benchmarks 2026: FinanceBench & Fin-RATE
  Financial QA LLM benchmarks 2026: FinQA, FinanceBench, DocFinQA, and Fin-RATE leaderboard scores, plus whole-filing read costs verified 2026-06-17.
- Comparison · Benchmark · 12 min
  Model Selection Framework for Finance Tasks
  A task × latency × cost × context decision tree for finance LLM workloads. Ten concrete scenarios mapped to tier bands. Grounded in published pricing, not.
- Methodology · Opinion · 11 min
  Thinking Tokens for Finance Tasks
  When extended-thinking and reasoning-effort modes earn their 3-10x cost tax on finance workloads, and when they are a silent drain on the budget.
Complementary tools
Users of this tool often explore
Token-Cost Optimizer
Compute the dollar cost of a trading research loop across Claude, GPT, and Gemini. Prompt length × model × retry × call volume → cost per idea and per.
Financial Document Token Estimator
Paste a 10-K, 10-Q, 8-K or earnings transcript and see token count + one-pass extraction cost across eight frontier LLMs, with cache-hit toggle.
Batch vs Real-Time Cost Calculator
Jobs per day, tokens per job, model, deadline — get real-time vs batch cost side-by-side with savings estimate and batch-eligibility flag. Based.