
Worked example

Running the shipped model-selector-finance engine on the input below produces exactly this output. Continuous integration recomputes it against the engine bundle on every build, so these numbers cannot drift from the code.

Input

{
  "tool": "model_selector_finance",
  "task": "extract",
  "latency": "sub_5s",
  "cost": "b50",
  "context": "k32_200k",
  "quality": "medium"
}

Output

{
  "ranked": [
    {
      "model": {
        "id": "gemini-2-5-flash-lite",
        "name": "Gemini 2.5 Flash-Lite",
        "provider": "google",
        "tier": "haiku",
        "inputRate": 0.1,
        "outputRate": 0.4,
        "contextWindow": 1000000,
        "supportsThinking": false,
        "bestFor": [
          "extract",
          "summarize"
        ],
        "docsUrl": "https://ai.google.dev/pricing",
        "positioning": "Cheapest published rate in this table, with 1M context. Built for the highest-volume extraction tiers."
      },
      "score": 86.02799999999999,
      "rationale": "Cheapest published rate in this table, with 1M context. Built for the highest-volume extraction tiers. Reference monthly spend at this tool's default workload is ~$3, within the $50/mo budget. Published context window 1M covers the 32K–200K requirement. Vendor positions the Haiku tier for extract workloads.",
      "whyNot": "Passes all gates; simply outranked by a model with better combined fit.",
      "axes": [
        {
          "axis": "cost",
          "pass": true,
          "note": "Fits in $50/mo at reference workload (~$3/mo)."
        },
        {
          "axis": "latency",
          "pass": true,
          "note": "Haiku tier fits a < 5s budget."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "1M window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": true,
          "note": "Vendor-positioned for extract workloads at this tier."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Haiku tier fits."
        }
      ],
      "monthlyBudgetEstimate": 3.24,
      "disqualified": false
    },
    {
      "model": {
        "id": "gpt-5-mini",
        "name": "GPT-5.4 mini",
        "provider": "openai",
        "tier": "sonnet",
        "inputRate": 0.75,
        "outputRate": 4.5,
        "contextWindow": 256000,
        "supportsThinking": false,
        "bestFor": [
          "summarize",
          "extract",
          "compare"
        ],
        "docsUrl": "https://openai.com/api/pricing/",
        "positioning": "Mid-tier OpenAI. 256K context at a sub-sonnet input rate."
      },
      "score": 84.09,
      "rationale": "Mid-tier OpenAI. 256K context at a sub-sonnet input rate. Reference monthly spend at this tool's default workload is ~$30, within the $50/mo budget. Published context window 256K covers the 32K–200K requirement. Vendor positions the Sonnet tier for extract workloads.",
      "whyNot": "Passes all gates; simply outranked by a model with better combined fit.",
      "axes": [
        {
          "axis": "cost",
          "pass": true,
          "note": "Fits in $50/mo at reference workload (~$30/mo)."
        },
        {
          "axis": "latency",
          "pass": true,
          "note": "Sonnet tier fits a < 5s budget."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "256K window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": true,
          "note": "Vendor-positioned for extract workloads at this tier."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Sonnet tier fits."
        }
      ],
      "monthlyBudgetEstimate": 29.700000000000003,
      "disqualified": false
    },
    {
      "model": {
        "id": "gemini-2-5-flash",
        "name": "Gemini 2.5 Flash",
        "provider": "google",
        "tier": "haiku",
        "inputRate": 0.3,
        "outputRate": 2.5,
        "contextWindow": 1000000,
        "supportsThinking": false,
        "bestFor": [
          "extract",
          "summarize"
        ],
        "docsUrl": "https://ai.google.dev/pricing",
        "positioning": "Fast mid-tier with 1M context. Positioned for high-throughput pipelines."
      },
      "score": 82.68,
      "rationale": "Fast mid-tier with 1M context. Positioned for high-throughput pipelines. Reference monthly spend at this tool's default workload is ~$14, within the $50/mo budget. Published context window 1M covers the 32K–200K requirement. Vendor positions the Haiku tier for extract workloads.",
      "whyNot": "Passes all gates; simply outranked by a model with better combined fit.",
      "axes": [
        {
          "axis": "cost",
          "pass": true,
          "note": "Fits in $50/mo at reference workload (~$14/mo)."
        },
        {
          "axis": "latency",
          "pass": true,
          "note": "Haiku tier fits a < 5s budget."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "1M window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": true,
          "note": "Vendor-positioned for extract workloads at this tier."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Haiku tier fits."
        }
      ],
      "monthlyBudgetEstimate": 14.399999999999999,
      "disqualified": false
    },
    {
      "model": {
        "id": "claude-haiku-4-5",
        "name": "Claude Haiku 4.5",
        "provider": "anthropic",
        "tier": "haiku",
        "inputRate": 1,
        "outputRate": 5,
        "contextWindow": 200000,
        "supportsThinking": false,
        "bestFor": [
          "extract",
          "summarize"
        ],
        "docsUrl": "https://www.anthropic.com/pricing",
        "positioning": "Haiku-tier. Cheapest Anthropic rate, positioned for latency-sensitive filtering and extraction."
      },
      "score": 76.2,
      "rationale": "Haiku-tier. Cheapest Anthropic rate, positioned for latency-sensitive filtering and extraction. Reference monthly spend at this tool's default workload is ~$36, within the $50/mo budget. Published context window 200K covers the 32K–200K requirement. Vendor positions the Haiku tier for extract workloads.",
      "whyNot": "Passes all gates; simply outranked by a model with better combined fit.",
      "axes": [
        {
          "axis": "cost",
          "pass": true,
          "note": "Fits in $50/mo at reference workload (~$36/mo)."
        },
        {
          "axis": "latency",
          "pass": true,
          "note": "Haiku tier fits a < 5s budget."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "200K window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": true,
          "note": "Vendor-positioned for extract workloads at this tier."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Haiku tier fits."
        }
      ],
      "monthlyBudgetEstimate": 36,
      "disqualified": false
    },
    {
      "model": {
        "id": "claude-sonnet-4-6",
        "name": "Claude Sonnet 4.6",
        "provider": "anthropic",
        "tier": "sonnet",
        "inputRate": 3,
        "outputRate": 15,
        "contextWindow": 1000000,
        "supportsThinking": true,
        "bestFor": [
          "summarize",
          "extract",
          "compare",
          "synthesize"
        ],
        "docsUrl": "https://www.anthropic.com/pricing",
        "positioning": "Sonnet-tier workhorse. 1M context and thinking-tokens at 1/5 of opus input rate."
      },
      "score": 13,
      "rationale": "Sonnet-tier workhorse. 1M context and thinking-tokens at 1/5 of opus input rate. Reference monthly spend (~$108) exceeds the $50/mo budget at default workload. Published context window 1M covers the 32K–200K requirement. Vendor positions the Sonnet tier for extract workloads.",
      "whyNot": "Over the chosen cost budget at default workload.",
      "axes": [
        {
          "axis": "cost",
          "pass": false,
          "note": "Exceeds $50/mo at reference workload (~$108/mo)."
        },
        {
          "axis": "latency",
          "pass": true,
          "note": "Sonnet tier fits a < 5s budget."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "1M window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": true,
          "note": "Vendor-positioned for extract workloads at this tier."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Sonnet tier fits."
        }
      ],
      "monthlyBudgetEstimate": 108,
      "disqualified": true
    },
    {
      "model": {
        "id": "o4-mini",
        "name": "o4-mini (reasoning)",
        "provider": "openai",
        "tier": "sonnet",
        "inputRate": 3,
        "outputRate": 12,
        "contextWindow": 200000,
        "supportsThinking": true,
        "bestFor": [
          "forecast",
          "rank",
          "compare"
        ],
        "docsUrl": "https://openai.com/api/pricing/",
        "positioning": "OpenAI reasoning-optimized mid-tier. Thinking-mode at sonnet-class input rate."
      },
      "score": 1,
      "rationale": "OpenAI reasoning-optimized mid-tier. Thinking-mode at sonnet-class input rate. Reference monthly spend (~$97) exceeds the $50/mo budget at default workload. Published context window 200K covers the 32K–200K requirement.",
      "whyNot": "Over the chosen cost budget at default workload.",
      "axes": [
        {
          "axis": "cost",
          "pass": false,
          "note": "Exceeds $50/mo at reference workload (~$97/mo)."
        },
        {
          "axis": "latency",
          "pass": true,
          "note": "Sonnet tier fits a < 5s budget."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "200K window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": false,
          "note": "Usable but not the tier's primary positioning for extract."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Sonnet tier fits."
        }
      ],
      "monthlyBudgetEstimate": 97.2,
      "disqualified": true
    },
    {
      "model": {
        "id": "claude-opus-4-7",
        "name": "Claude Opus 4.7",
        "provider": "anthropic",
        "tier": "opus",
        "inputRate": 5,
        "outputRate": 25,
        "contextWindow": 1000000,
        "supportsThinking": true,
        "bestFor": [
          "forecast",
          "synthesize",
          "compare",
          "rank"
        ],
        "docsUrl": "https://www.anthropic.com/pricing",
        "positioning": "Anthropic flagship. 1M context and thinking-tokens at published opus-tier rates."
      },
      "score": 0,
      "rationale": "Anthropic flagship. 1M context and thinking-tokens at published opus-tier rates. Reference monthly spend (~$180) exceeds the $50/mo budget at default workload. Published context window 1M covers the 32K–200K requirement.",
      "whyNot": "Over the chosen cost budget at default workload.",
      "axes": [
        {
          "axis": "cost",
          "pass": false,
          "note": "Exceeds $50/mo at reference workload (~$180/mo)."
        },
        {
          "axis": "latency",
          "pass": false,
          "note": "Opus tier typically heavier than < 5s."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "1M window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": false,
          "note": "Usable but not the tier's primary positioning for extract."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Opus tier fits."
        }
      ],
      "monthlyBudgetEstimate": 180,
      "disqualified": true
    },
    {
      "model": {
        "id": "gpt-5",
        "name": "GPT-5.5",
        "provider": "openai",
        "tier": "opus",
        "inputRate": 5,
        "outputRate": 30,
        "contextWindow": 400000,
        "supportsThinking": true,
        "bestFor": [
          "forecast",
          "synthesize",
          "compare",
          "rank"
        ],
        "docsUrl": "https://openai.com/api/pricing/",
        "positioning": "OpenAI frontier. 400K context, reasoning-mode support at published flagship rates."
      },
      "score": 0,
      "rationale": "OpenAI frontier. 400K context, reasoning-mode support at published flagship rates. Reference monthly spend (~$198) exceeds the $50/mo budget at default workload. Published context window 400K covers the 32K–200K requirement.",
      "whyNot": "Over the chosen cost budget at default workload.",
      "axes": [
        {
          "axis": "cost",
          "pass": false,
          "note": "Exceeds $50/mo at reference workload (~$198/mo)."
        },
        {
          "axis": "latency",
          "pass": false,
          "note": "Opus tier typically heavier than < 5s."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "400K window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": false,
          "note": "Usable but not the tier's primary positioning for extract."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Opus tier fits."
        }
      ],
      "monthlyBudgetEstimate": 198,
      "disqualified": true
    },
    {
      "model": {
        "id": "gemini-2-5-pro",
        "name": "Gemini 2.5 Pro",
        "provider": "google",
        "tier": "opus",
        "inputRate": 1.25,
        "outputRate": 10,
        "contextWindow": 2000000,
        "supportsThinking": true,
        "bestFor": [
          "synthesize",
          "summarize",
          "compare",
          "rank"
        ],
        "docsUrl": "https://ai.google.dev/pricing",
        "positioning": "Largest context window in this table (2M). Published input rate below sonnet-class."
      },
      "score": 0,
      "rationale": "Largest context window in this table (2M). Published input rate below sonnet-class. Reference monthly spend (~$58) exceeds the $50/mo budget at default workload. Published context window 2M covers the 32K–200K requirement.",
      "whyNot": "Over the chosen cost budget at default workload.",
      "axes": [
        {
          "axis": "cost",
          "pass": false,
          "note": "Exceeds $50/mo at reference workload (~$58/mo)."
        },
        {
          "axis": "latency",
          "pass": false,
          "note": "Opus tier typically heavier than < 5s."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "2M window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": false,
          "note": "Usable but not the tier's primary positioning for extract."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Opus tier fits."
        }
      ],
      "monthlyBudgetEstimate": 58.49999999999999,
      "disqualified": true
    },
    {
      "model": {
        "id": "gemini-3-5-flash",
        "name": "Gemini 3.5 Flash",
        "provider": "google",
        "tier": "opus",
        "inputRate": 1.5,
        "outputRate": 9,
        "contextWindow": 1000000,
        "supportsThinking": true,
        "bestFor": [
          "forecast",
          "synthesize",
          "compare",
          "rank"
        ],
        "docsUrl": "https://ai.google.dev/pricing",
        "positioning": "Frontier agent-tier at Flash speed, with 1M context. Output rate ~3.6x Gemini 2.5 Flash — a capability pick, not a budget one."
      },
      "score": 0,
      "rationale": "Frontier agent-tier at Flash speed, with 1M context. Output rate ~3.6x Gemini 2.5 Flash — a capability pick, not a budget one. Reference monthly spend (~$59) exceeds the $50/mo budget at default workload. Published context window 1M covers the 32K–200K requirement.",
      "whyNot": "Over the chosen cost budget at default workload.",
      "axes": [
        {
          "axis": "cost",
          "pass": false,
          "note": "Exceeds $50/mo at reference workload (~$59/mo)."
        },
        {
          "axis": "latency",
          "pass": false,
          "note": "Opus tier typically heavier than < 5s."
        },
        {
          "axis": "context",
          "pass": true,
          "note": "1M window covers 32K–200K need."
        },
        {
          "axis": "capability",
          "pass": false,
          "note": "Usable but not the tier's primary positioning for extract."
        },
        {
          "axis": "quality",
          "pass": true,
          "note": "Medium-quality budget; Opus tier fits."
        }
      ],
      "monthlyBudgetEstimate": 59.400000000000006,
      "disqualified": true
    }
  ]
}
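
The output above is a plain JSON structure, so it can be consumed with a few lines of code. The following is a minimal sketch, not part of the shipped engine: it assumes the JSON has been saved or pasted as a string, uses a trimmed excerpt of two of the ten entries (notes and model metadata omitted), and the helper names `qualifying_ids` and `failed_axes` are hypothetical.

```python
import json

# Trimmed excerpt of the worked-example output: two of the ten entries,
# keeping only the fields this sketch reads.
OUTPUT_JSON = """
{
  "ranked": [
    {
      "model": {"id": "gemini-2-5-flash-lite"},
      "score": 86.02799999999999,
      "axes": [
        {"axis": "cost", "pass": true},
        {"axis": "latency", "pass": true},
        {"axis": "context", "pass": true},
        {"axis": "capability", "pass": true},
        {"axis": "quality", "pass": true}
      ],
      "disqualified": false
    },
    {
      "model": {"id": "claude-opus-4-7"},
      "score": 0,
      "axes": [
        {"axis": "cost", "pass": false},
        {"axis": "latency", "pass": false},
        {"axis": "context", "pass": true},
        {"axis": "capability", "pass": false},
        {"axis": "quality", "pass": true}
      ],
      "disqualified": true
    }
  ]
}
"""


def qualifying_ids(output: dict) -> list[str]:
    """Return ids of models that were not disqualified, in ranked order."""
    return [e["model"]["id"] for e in output["ranked"] if not e["disqualified"]]


def failed_axes(entry: dict) -> list[str]:
    """Return the names of the axes an entry did not pass."""
    return [a["axis"] for a in entry["axes"] if not a["pass"]]


output = json.loads(OUTPUT_JSON)
print("qualifying:", qualifying_ids(output))
for entry in output["ranked"]:
    if entry["disqualified"]:
        print(entry["model"]["id"], "failed:", failed_axes(entry))
```

Reading the failed axes of a disqualified entry is a quick way to see which requirement would need loosening before that model could qualify.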

Frequently asked questions

What does the Model Selector for Finance methodology page document?
It documents how the Model Selector for Finance ranks LLMs on pricing, context, latency, and capability. It reports no accuracy numbers; verification belongs in your own harness. The page states the formulas, assumptions, data sources, limitations, and reproducibility steps behind the Model Selector for Finance, in the Finance category.
When was the Model Selector for Finance methodology last reviewed?
This methodology was last reviewed on 2026-04-24. The matching tool is at https://aifinhub.io/model-selector-finance/.
Are the Model Selector for Finance numbers reproducible?
Yes. This page embeds a worked example whose output is the verbatim result of running the shipped model-selector-finance engine on a fixed input; the embedded JSON is recomputed and diffed against the engine in CI, so the numbers cannot drift from the code.

How Model Selector for Finance works

The Model Selector for Finance ranks ten LLMs against a task profile you provide. It scores every model on five axes — cost, latency, context, capability, and quality sensitivity — and returns a full ranking with per-axis pass/fail notes and plain-English rationale. It does not rank models by benchmark accuracy. That is a deliberate design choice explained below.

What the tool computes

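You pick a task (extract, summarize, forecast, compare, rank, synthesize), a latency budget, a cost budget, a context-size need, and a quality sensitivity. The engine evaluates each model against those inputs and outputs a ranked list. As the worked example above shows, each entry carries a score, a plain-English rationale, a "why not" note, per-axis pass/fail notes, a reference monthly spend estimate, and a disqualified flag.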

Inputs and assumptions

Scoring framework

The score for each model is the sum of five terms:

score = cost_match + latency_match + context_match
      + capability_bonus + quality_boost

cost_match       : 0 if monthly estimate > budget ceiling, else 25 + headroom bonus
latency_match    : 0 if tier slower than latency budget, else base + haiku bonus
context_match    : 0 if context window < required, else base + large-context bonus
capability_bonus : bonus if task ∈ model.bestFor
quality_boost    : boost flagship tiers when quality = high; boost haiku when low

A model that fails any hard gate (cost, latency, or context) is flagged "gate failed" (the disqualified field in the worked example output) and ranked below all qualifying models. It is still displayed, with an axis note explaining which constraint it missed, so you can see what you would gain or lose by loosening a requirement.
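
The ordering rule described above can be sketched as a two-part sort key: gate status first, score second. This is an illustration under stated assumptions, not the engine's actual code. The `Scored` type and `rank` function are hypothetical names; the scores and flags are copied from six entries of the worked example and deliberately shuffled.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    model_id: str
    score: float
    disqualified: bool  # True when a hard gate (cost, latency, context) failed


def rank(entries: list[Scored]) -> list[Scored]:
    """Qualifying models first, gate-failed models after; higher score first in each group."""
    # False sorts before True, so qualifying entries lead; -score gives descending order.
    return sorted(entries, key=lambda e: (e.disqualified, -e.score))


# Scores and flags taken from the worked example, in shuffled order.
entries = [
    Scored("claude-sonnet-4-6", 13, True),
    Scored("claude-haiku-4-5", 76.2, False),
    Scored("o4-mini", 1, True),
    Scored("gemini-2-5-flash-lite", 86.02799999999999, False),
    Scored("gpt-5-mini", 84.09, False),
    Scored("gemini-2-5-flash", 82.68, False),
]

ranked = rank(entries)
for e in ranked:
    print(e.model_id, e.score, "gate failed" if e.disqualified else "ok")
```

Because the gate flag is the first element of the key, a gate-failed model stays below every qualifying model regardless of its score.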

Why there are no accuracy numbers

Published LLM leaderboards drift, are gamed, and almost never match your finance workload. A selector that claims "Sonnet scored 89% on this benchmark" pretends those numbers transfer to your extraction, forecast, or comparison pipeline. They usually do not.

The alternative is honest: frame selection around pricing, context, latency, and vendor-documented capabilities, and insist that quality be measured in your harness, on your data. The related article "Eval harness for finance LLMs" walks through how to build one in a weekend.

So the tool does what it can ground: it picks the models that fit your budget, context, and latency gates, nudges you toward vendor-positioned tiers for your task, and then hands the accuracy question back to you.

Formulas and sources

Reference monthly dollar estimate, used only to compare models against a cost budget:

ref_monthly_usd =
  (REF_INPUT_TOKENS_PER_CALL  / 1e6) × input_rate  × REF_CALLS_PER_MONTH
+ (REF_OUTPUT_TOKENS_PER_CALL / 1e6) × output_rate × REF_CALLS_PER_MONTH

REF_INPUT_TOKENS_PER_CALL  = 6000
REF_OUTPUT_TOKENS_PER_CALL = 1200
REF_CALLS_PER_MONTH        = 3000
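
The formula can be checked by hand against the worked example. The sketch below is a direct transcription rather than the engine's own code: the function name `ref_monthly_usd` mirrors the formula, rates are taken to be USD per million tokens (as the division by 1e6 implies), and `BUDGET_CEILING_USD = 50` is an assumption based on the "$50/mo" wording in the axis notes for the `b50` cost input.

```python
REF_INPUT_TOKENS_PER_CALL = 6000
REF_OUTPUT_TOKENS_PER_CALL = 1200
REF_CALLS_PER_MONTH = 3000


def ref_monthly_usd(input_rate: float, output_rate: float) -> float:
    """Reference monthly spend in USD; rates are USD per million tokens."""
    input_cost = (REF_INPUT_TOKENS_PER_CALL / 1e6) * input_rate * REF_CALLS_PER_MONTH
    output_cost = (REF_OUTPUT_TOKENS_PER_CALL / 1e6) * output_rate * REF_CALLS_PER_MONTH
    return input_cost + output_cost


# (inputRate, outputRate, monthlyBudgetEstimate) copied from the worked example.
EXAMPLES = {
    "gemini-2-5-flash-lite": (0.1, 0.4, 3.24),
    "gpt-5-mini": (0.75, 4.5, 29.7),
    "claude-haiku-4-5": (1, 5, 36),
    "claude-sonnet-4-6": (3, 15, 108),
}

BUDGET_CEILING_USD = 50  # assumption: the "b50" input means $50/mo

for model_id, (inp, out, expected) in EXAMPLES.items():
    estimate = ref_monthly_usd(inp, out)
    # Compare with a tolerance; floating-point results carry tiny rounding error.
    assert abs(estimate - expected) < 1e-9
    within = estimate <= BUDGET_CEILING_USD
    print(f"{model_id}: ~${estimate:.2f}/mo, within budget: {within}")
```

With the reference workload fixed, the formula reduces to 18 × input_rate + 3.6 × output_rate, which is why Claude Haiku 4.5 (rates 1 and 5) lands at $36 and Claude Sonnet 4.6 (rates 3 and 15) at $108, over the $50 ceiling.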

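Rates and context windows are sourced from vendor pricing pages as of 2026-04-23. The values the engine uses appear in each model entry of the worked example above: inputRate and outputRate (USD per million tokens, as the formula implies), contextWindow (tokens), and docsUrl (the vendor pricing page).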

Per-tier latency conventions

These are vendor-positioning conventions, not guaranteed SLAs. Always measure latency in your own deployment before committing a production path.

Limitations

Related articles

Changelog

Planning estimates only — not financial, tax, or investment advice.