What does the LLM Finance Error Taxonomy methodology page document?

It documents the 12 LLM-on-finance failure modes recognised by the AI Fin Hub LLM Finance Error Taxonomy detector. It also states the formulas, assumptions, data sources, limitations, and reproducibility steps behind the tool, which is listed in the Finance category.

When was the LLM Finance Error Taxonomy methodology last reviewed?

This methodology was last reviewed on 2026-05-08. The matching tool is at https://aifinhub.io/llm-finance-error-taxonomy/.

Are the LLM Finance Error Taxonomy numbers reproducible?

Yes. This page embeds a worked example whose output is the verbatim result of running the shipped llm-finance-error-taxonomy engine on a fixed input; the embedded JSON is recomputed and diffed against the engine in CI, so the numbers cannot drift from the code.

Worked example

Running the shipped llm-finance-error-taxonomy engine on the input below produces exactly this output. Continuous integration recomputes it against the engine bundle on every build, so these numbers cannot drift from the code.

Input

{
  "tool": "llm_finance_error_taxonomy",
  "text": "XYZW shares trade at €120 with P/E = price * earnings.",
  "ground_truth": ""
}

Output

{
  "text": "XYZW shares trade at €120 with P/E = price * earnings.",
  "flags": [
    {
      "modeId": "hallucinated_ticker",
      "evidence": "Symbol \"XYZW\" is referenced as a ticker but is not in the verified universe.",
      "confidence": "low"
    },
    {
      "modeId": "ratio_mistake",
      "evidence": "P/E stated as price × earnings (should be price / earnings)",
      "confidence": "high"
    }
  ],
  "modes": [
    {
      "id": "hallucinated_ticker",
      "label": "Hallucinated ticker",
      "description": "Fabricated stock symbol or one that does not exist on the implied exchange.",
      "remediation": "Constrain to a verified ticker list and reject unknown symbols."
    },
    {
      "id": "stale_price",
      "label": "Stale price",
      "description": "Price quoted appears to come from training data, not a live source. Phrases like 'as of my last update' or specific old-looking dates.",
      "remediation": "Require live data tool calls; reject answers with 'as of my knowledge' caveats."
    },
    {
      "id": "ratio_mistake",
      "label": "Ratio mistake",
      "description": "Standard financial ratio formula is mis-stated (e.g. 'P/E = price × earnings').",
      "remediation": "Cross-check ratio formulas against a fixed glossary before publishing."
    },
    {
      "id": "units_error",
      "label": "Units error",
      "description": "Percentage where decimal expected, basis points where percent expected, or shares where dollars expected.",
      "remediation": "Strict input/output schemas with explicit units; validate after generation."
    },
    {
      "id": "currency_error",
      "label": "Currency error",
      "description": "Numbers presented in the wrong currency or without currency markers (e.g. EUR for USD).",
      "remediation": "Force explicit currency annotation; cross-check against ticker exchange."
    },
    {
      "id": "timezone_error",
      "label": "Time-zone error",
      "description": "Market times reported in wrong timezone (e.g. 09:30 CET for NYSE open).",
      "remediation": "Convert all times to UTC before display and require timezone tags."
    },
    {
      "id": "magnitude_off_100",
      "label": "Off-by-100 magnitude",
      "description": "Decimals shifted: 5% reported as 0.05% or 500%.",
      "remediation": "Sanity-check ranges (return |x| < 100, ratios within plausible bands)."
    },
    {
      "id": "fictional_source",
      "label": "Fictional source",
      "description": "Citation to a paper, page, or filing that doesn't exist or is fabricated.",
      "remediation": "Require URLs that retrieve OK and DOIs that resolve before citing."
    },
    {
      "id": "wrong_period",
      "label": "Wrong period",
      "description": "Quarterly figures presented as annual (or vice versa) without conversion.",
      "remediation": "Tag every figure with its period; reject mixed-period aggregations."
    },
    {
      "id": "wrong_split_adj",
      "label": "Wrong split-adjustment",
      "description": "Pre-split prices used in calculations involving post-split share count, or vice versa.",
      "remediation": "Always work with adjusted close; verify against vendor-published adjustment factors."
    },
    {
      "id": "wrong_tax_bracket",
      "label": "Wrong tax bracket / rate",
      "description": "Tax bracket from a different jurisdiction or year applied to the calculation.",
      "remediation": "Annotate tax tables with year + jurisdiction; reject answers without year stamps."
    },
    {
      "id": "double_count_dividend",
      "label": "Double-counted dividend",
      "description": "Dividend included in both total return and as a separate income line — counted twice.",
      "remediation": "Choose one convention (price return vs total return) and stick to it across the answer."
    }
  ]
}
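
The recompute-and-diff step described above can be sketched as follows. This is a minimal illustration, not the project's actual CI job: `check_no_drift`, `canonical`, and `fake_engine` are hypothetical names, and the stand-in engine simply echoes the input with no flags. The idea is to parse the embedded JSON, run the engine on the fixed input, and compare canonical serialisations so that whitespace or key order never counts as drift.

```python
import json


def canonical(obj):
    """Serialise with sorted keys so formatting differences never count as drift."""
    return json.dumps(obj, sort_keys=True, ensure_ascii=False, separators=(",", ":"))


def check_no_drift(embedded_json, run_engine, payload):
    """Return True when the engine's fresh output equals the embedded JSON."""
    expected = json.loads(embedded_json)
    actual = run_engine(payload)
    return canonical(expected) == canonical(actual)


def fake_engine(payload):
    # Hypothetical stand-in for the shipped engine: echoes the text, raises no flags.
    return {"text": payload["text"], "flags": []}


payload = {"tool": "llm_finance_error_taxonomy", "text": "ACME trades at $10.", "ground_truth": ""}
embedded = '{"flags": [], "text": "ACME trades at $10."}'

print(check_no_drift(embedded, fake_engine, payload))  # True
```

A build would fail whenever the function returns False, which is what keeps the published example tied to the code.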

Methodology · Tool · Last updated 2026-05-08

How LLM Finance Error Taxonomy works

The LLM Finance Error Taxonomy tool detects 12 documented failure modes in LLM-generated text about finance.

The 12 modes

Hallucinated ticker — fabricated or non-existent symbol on the implied exchange.
Stale price — quote sourced from training data, not live data.
Ratio mistake — financial ratio formula corrupted (e.g. P/E inverted).
Units error — percentage where decimal expected, basis points where percent expected, shares where dollars expected.
Currency error — wrong currency or missing currency tag.
Time-zone error — market times reported in wrong tz (NYSE in CET, LSE in ET).
Off-by-100 magnitude — decimal/percent confusion (5% as 500% or 0.05%).
Fictional source — citation to a paper, page, or filing that doesn't exist.
Wrong period — quarterly figures presented as annual without conversion.
Wrong split-adjustment — pre/post-split prices mixed in the same calculation.
Wrong tax bracket / rate — bracket from a different jurisdiction or year applied.
Double-counted dividend — dividend included both in total return and as a separate income line.

Detection approach

The tool runs a set of regex and lexical heuristics against pasted output. Each match raises a flag with low, medium, or high confidence. The heuristics are deliberately tuned to over-flag rather than under-flag: false positives are tolerable, while false negatives are caught only by human review. The tool is a screen, not a verdict.
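
A rule-table screen of this kind might look like the sketch below. It is an assumption-laden illustration rather than the shipped engine: the `RULES` table, the `Flag` class, the `screen` function, the exact patterns, and the "medium" confidence on the time-zone rule are all hypothetical. Only the mode ids and the "high" confidence for the P/E case come from the worked example.

```python
import re
from dataclasses import dataclass


@dataclass
class Flag:
    mode_id: str
    evidence: str
    confidence: str  # "low", "medium", or "high"


# Hypothetical rule table: (mode id, compiled pattern, confidence).
RULES = [
    (
        "ratio_mistake",
        re.compile(r"P/E\s*(?:=|is)\s*price\s*(?:\*|×|x|times)\s*earnings", re.IGNORECASE),
        "high",
    ),
    (
        "timezone_error",
        re.compile(r"\b\d{1,2}:\d{2}\s*CET\b[^.]*\bNYSE\b", re.IGNORECASE),
        "medium",  # assumed level; the source does not state it
    ),
]


def screen(text):
    """Run every rule once and collect one flag per matching rule."""
    flags = []
    for mode_id, pattern, confidence in RULES:
        match = pattern.search(text)
        if match:
            flags.append(Flag(mode_id, match.group(0), confidence))
    return flags


flags = screen("XYZW shares trade at €120 with P/E = price * earnings.")
print([(f.mode_id, f.confidence) for f in flags])  # [('ratio_mistake', 'high')]
```

Keeping the rules in a flat table makes each heuristic individually auditable, which suits a tool that positions itself as a screen for human review.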

Why heuristics, not an LLM judge

Using an LLM to grade an LLM is circular: the judge may share the failure modes it is meant to catch. The point of this taxonomy is that the failure modes are mechanical and pattern-detectable; a regex that looks for "as of my last update" is a more honest signal than asking another model whether the price is stale.
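
To make the stale-price case concrete, here is a minimal sketch of such a phrase check. The pattern covers only the two phrasings quoted on this page ("as of my last update" and "as of my knowledge"); the names `STALE_PHRASES` and `stale_price_evidence` are hypothetical, and the real detector may match more variants.

```python
import re

# Only the two caveat phrasings quoted in this methodology; an assumption, not the full rule.
STALE_PHRASES = re.compile(r"\bas of my (?:last update|knowledge)\b", re.IGNORECASE)


def stale_price_evidence(text):
    """Return the matched caveat phrase, or None when no caveat is present."""
    match = STALE_PHRASES.search(text)
    return match.group(0) if match else None


print(stale_price_evidence("As of my last update, AAPL traded near $150."))       # As of my last update
print(stale_price_evidence("AAPL last traded at $150 per the live quote tool."))  # None
```

The check is deterministic and its evidence is the literal matched phrase, so a reviewer can see exactly why the flag was raised.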

Limitations

Heuristics miss subtle errors, such as correctly formatted figures that carry the wrong number.
The ticker check uses a verified universe of ~50 large-caps, so legitimate small-cap symbols will be falsely flagged as hallucinated.
Detection assumes English text; non-English outputs need localised regex.
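
The ticker limitation is easiest to see in code. The sketch below assumes a tiny stand-in universe of five symbols and a hypothetical context rule (an uppercase token followed by "shares" or "stock"); neither is the tool's actual list or pattern. It shows how a genuine listing outside the verified set is flagged in the same way as a fabricated one.

```python
import re

# Hypothetical stand-in; the real verified universe is described as ~50 large-caps.
VERIFIED_UNIVERSE = {"AAPL", "MSFT", "GOOGL", "AMZN", "NVDA"}

# Assumed context rule: 2 to 5 uppercase letters directly followed by "shares" or "stock".
TICKER_CONTEXT = re.compile(r"\b([A-Z]{2,5})\b(?=\s+(?:shares|stock)\b)")


def unknown_tickers(text):
    """Return ticker-like symbols that are absent from the verified universe."""
    return [s for s in TICKER_CONTEXT.findall(text) if s not in VERIFIED_UNIVERSE]


print(unknown_tickers("XYZW shares trade at €120."))  # ['XYZW']
print(unknown_tickers("AAPL stock rose."))            # []
print(unknown_tickers("CROX shares fell."))           # ['CROX'] (a real listing, but outside the sample set)
```

Because the check cannot tell an unlisted symbol from an unrecognised one, a low confidence level on this flag, as in the worked example, is a reasonable default.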