AI in Markets Calculator Guide

How to use the LLM Finance Error Taxonomy

12 documented LLM-on-finance failure modes (hallucinated ticker, stale price, units, currency, off-by-100, fictional source, more). Paste an LLM output and the page flags which categories trigger so you can triage fast.

By Orbyd Editorial · AI Fin Hub Team

What It Does

Use the calculator with intent

12 documented LLM-on-finance failure modes (hallucinated ticker, stale price, units, currency, off-by-100, fictional source, more). Paste an LLM output and the page flags which categories trigger so you can triage fast.

This guide is for engineers debugging LLM-driven finance pipelines who need a structured taxonomy of failure modes rather than chasing each bug fresh every time.

Interpreting Results

Each triggered category is a flag, not a verdict. Most triggers are false alarms, but a human should review each one before the output goes downstream. Off-by-100 is the most insidious because the answer looks right at a glance.
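To make the off-by-100 case concrete, here is a minimal sketch of a check that compares a model-reported number against a trusted reference value. The function name `off_by_100` and the 2% tolerance are assumptions made for illustration; this is not how the page itself detects the category.

```python
import math


def off_by_100(observed: float, reference: float, rel_tol: float = 0.02) -> bool:
    """Return True when observed is roughly 100x or 1/100x the reference."""
    if observed == 0 or reference == 0:
        return False  # the ratio is undefined; leave zero values to other checks
    ratio = abs(observed / reference)
    return (
        math.isclose(ratio, 100.0, rel_tol=rel_tol)
        or math.isclose(ratio, 0.01, rel_tol=rel_tol)
    )


# Hypothetical example: the source states a margin of 4.25 (percent),
# and the model output reports 0.0425, 425.0, or 4.30.
print(off_by_100(0.0425, 4.25))  # True: reported as a fraction
print(off_by_100(425.0, 4.25))   # True: reported in basis points
print(off_by_100(4.30, 4.25))    # False: an ordinary small discrepancy
```

A hit from a check like this is still only a flag: a genuine hundredfold move is possible, so the result should route the value to review rather than reject it.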

Input Steps

Field by field

  1. Browse

     Browse the six top-level categories: factual, reasoning, arithmetic, formatting, refusal, prompt-injection.

  2. Drill

     Drill into a category to see specific failure modes with example prompts and expected vs. observed outputs.

  3. Use result

     Use the category structure to design your own evals: pick the categories most relevant to your task.

  4. Reference

     Reference the per-model error rates on the methodology page when choosing a model; error profile matters more than aggregate accuracy.

  5. Re-check

     Re-check the taxonomy after each major model release; error rates shift with new versions.
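The idea in steps 3 and 4, weighting categories by task relevance instead of reading one aggregate number, can be sketched as follows. The category names come from step 1; the function `weighted_error`, the two models, the weights, and every error rate below are invented placeholders for illustration, not figures from the methodology page.

```python
CATEGORIES = (
    "factual", "reasoning", "arithmetic",
    "formatting", "refusal", "prompt-injection",
)


def weighted_error(error_rates: dict[str, float], weights: dict[str, float]) -> float:
    """Average per-category error rates using task-specific weights."""
    unknown = set(weights) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(error_rates[c] * w for c, w in weights.items()) / total


# Placeholder error rates, made up for this example only.
model_a = {"factual": 0.02, "reasoning": 0.05, "arithmetic": 0.20,
           "formatting": 0.01, "refusal": 0.01, "prompt-injection": 0.01}
model_b = {"factual": 0.06, "reasoning": 0.06, "arithmetic": 0.04,
           "formatting": 0.06, "refusal": 0.04, "prompt-injection": 0.04}

uniform = {c: 1.0 for c in CATEGORIES}
# Hypothetical earnings-extraction task: arithmetic matters most.
task_weights = {"arithmetic": 3.0, "factual": 1.0}

for name, rates in (("model_a", model_a), ("model_b", model_b)):
    print(name,
          f"aggregate={weighted_error(rates, uniform):.3f}",
          f"task={weighted_error(rates, task_weights):.3f}")
```

With these placeholder numbers both models have the same aggregate error, yet the task-weighted error differs by more than a factor of three, which is the sense in which the error profile can matter more than aggregate accuracy.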

Common Scenarios

Use realistic starting points

Quarterly earnings extraction

Output type: table of financial numbers

Off-by-100 (basis points vs. percent) and currency confusion are the most common triggers; actual hallucination is rare if the source was provided.
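One way to guard an extraction pipeline against the basis-points vs. percent mix-up is to carry the unit alongside every number and compare values only after converting to a common scale. The sketch below assumes a hypothetical `Rate` type and `same_rate` helper; neither is part of the tool.

```python
from dataclasses import dataclass

# Multipliers that convert each unit to a plain fraction (1.0 == 100%).
TO_FRACTION = {"fraction": 1.0, "percent": 0.01, "bps": 0.0001}


@dataclass(frozen=True)
class Rate:
    value: float
    unit: str  # must be one of the keys in TO_FRACTION

    def as_fraction(self) -> float:
        return self.value * TO_FRACTION[self.unit]


def same_rate(a: Rate, b: Rate, abs_tol: float = 1e-9) -> bool:
    """Compare two rates after normalising both to fractions."""
    return abs(a.as_fraction() - b.as_fraction()) <= abs_tol


source = Rate(25, "bps")  # hypothetical value taken from the source document
print(same_rate(source, Rate(0.25, "percent")))  # True: same quantity
print(same_rate(source, Rate(25, "percent")))    # False: off by 100
```

Currency confusion could be handled the same way, by tagging each amount with a currency code and refusing to compare amounts whose codes differ.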

Macro analysis output

Output type: narrative analysis with cited stats

Fictional source and stale price are more common here; the LLM may cite a Bloomberg link that doesn't exist or quote a price from 6 months ago.
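Both triggers in this scenario lend themselves to simple pre-checks before human review. The sketch below is an assumption-laden illustration: `is_stale`, `cited_urls`, the one-day freshness window, the dates, and the example URL are all hypothetical, and extracting a link only queues it for verification; it does not prove the source exists.

```python
import re
from datetime import date, timedelta

URL_RE = re.compile(r"https?://[^\s<>()]+")


def is_stale(quote_date: date, as_of: date,
             max_age: timedelta = timedelta(days=1)) -> bool:
    """True when a quoted price is older than max_age relative to as_of."""
    return as_of - quote_date > max_age


def cited_urls(text: str) -> list[str]:
    """Collect URLs from model output so each can be checked by hand or by a link checker."""
    return [u.rstrip(".,;:") for u in URL_RE.findall(text)]


as_of = date(2024, 6, 28)  # hypothetical analysis date
print(is_stale(date(2024, 6, 27), as_of))   # False: one day old
print(is_stale(date(2023, 12, 29), as_of))  # True: about six months old

output = "Source: https://example.com/markets/report, retrieved later."
print(cited_urls(output))  # ['https://example.com/markets/report']
```

The freshness window is a design choice that depends on the instrument: an intraday pipeline would want minutes, while a quarterly macro summary might tolerate days.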

FAQ

Questions people ask next

The short answers readers usually want after the first pass.

The taxonomy has six top-level categories, taken from the methodology page: factual errors (a wrong date, number, or entity), reasoning errors (correct facts, wrong inference), arithmetic errors (computational mistakes on simple math), formatting errors (output schema violations), refusal errors (the model refuses to answer when it shouldn't), and prompt-injection compromises. Each has 3 to 7 subcategories.

Planning estimates only — not financial, tax, or investment advice.