
Methodology · Tool · Last updated 2026-05-08

How the LLM Finance Error Taxonomy works

The 12 documented failure modes detected by the LLM Finance Error Taxonomy tool.

The 12 modes

  1. Hallucinated ticker — fabricated or non-existent symbol on the implied exchange.
  2. Stale price — quote sourced from training data, not live data.
  3. Ratio mistake — financial ratio formula corrupted (e.g. P/E inverted).
  4. Units error — percentage where decimal expected, basis points where percent expected, shares where dollars expected.
  5. Currency error — wrong currency or missing currency tag.
  6. Time-zone error — market times reported in the wrong time zone (NYSE in CET, LSE in ET).
  7. Off-by-100 magnitude — decimal/percent confusion (5% as 500% or 0.05%).
  8. Fictional source — citation to a paper, page, or filing that doesn't exist.
  9. Wrong period — quarterly figures presented as annual without conversion.
  10. Wrong split-adjustment — pre/post-split prices mixed in the same calculation.
  11. Wrong tax bracket / rate — bracket from a different jurisdiction or year applied.
  12. Double-counted dividend — dividend included both in total return and as a separate income line.
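
As a worked illustration of modes 3 and 7, the arithmetic below shows what the correct and corrupted figures look like side by side. This is only a sketch: the prices and rates are made-up numbers, and the function names are hypothetical, not part of the tool.

```python
def pe_ratio(price: float, eps: float) -> float:
    """Price-to-earnings: share price divided by earnings per share."""
    return price / eps


def percent_to_decimal(pct: float) -> float:
    """Convert a percentage (5 means 5%) to a decimal fraction (0.05)."""
    return pct / 100.0


# Mode 3, ratio mistake: inverting P/E yields the earnings yield instead.
price, eps = 150.0, 6.0           # made-up inputs
correct_pe = pe_ratio(price, eps)  # 25.0
inverted_pe = eps / price          # about 0.04, not a P/E at all

# Mode 7, off-by-100 magnitude: 5% applied to a 1,000 balance.
balance = 1000.0
correct_interest = balance * percent_to_decimal(5.0)  # 5% treated as 0.05
inflated_interest = balance * 5.0                      # 5% treated as 500%
```

The corrupted values are well formed on their own; only the comparison with the intended formula exposes the error.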

Detection approach

The tool runs a set of regex and lexical heuristics against pasted output. Each match raises a flag with low, medium, or high confidence. The heuristics are deliberately biased toward recall rather than precision: false positives are tolerable, whereas anything the heuristics miss (a false negative) is caught only by human review. The tool is a screen, not a verdict.
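
A minimal sketch of how such a screen could be structured is shown here. The `Heuristic`, `Flag`, and `screen` names and both regex patterns are assumptions made for illustration; the tool's actual rules and confidence assignments are not documented on this page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Heuristic:
    mode: str            # failure mode name from the taxonomy
    pattern: re.Pattern  # compiled regex that triggers the flag
    confidence: str      # "low", "medium", or "high"


@dataclass(frozen=True)
class Flag:
    mode: str
    confidence: str
    excerpt: str         # the text that matched


# Hypothetical rules, chosen only to illustrate the mechanism.
HEURISTICS = [
    # A percentage of 100% or more may be a decimal/percent slip; real
    # triple-digit returns exist, so the confidence is low.
    Heuristic("Off-by-100 magnitude",
              re.compile(r"(?<![\d.,])\d{3,}(?:\.\d+)?\s?%"), "low"),
    # NYSE mentioned with a Central European time zone in one sentence.
    Heuristic("Time-zone error",
              re.compile(r"\bNYSE\b[^.]*\b(?:CET|CEST)\b"), "medium"),
]


def screen(text: str) -> list[Flag]:
    """Return one flag per heuristic match; an empty list means no flags."""
    flags = []
    for h in HEURISTICS:
        for match in h.pattern.finditer(text):
            flags.append(Flag(h.mode, h.confidence, match.group(0)))
    return flags


sample = "NYSE opens at 15:30 CET. The fund returned 500% last year."
flags = screen(sample)
```

An empty result only means that no pattern fired; it does not certify the text as correct.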

Why heuristics, not an LLM judge

Using an LLM to grade an LLM is circular: the judge can share the failure modes it is meant to catch. The point of this taxonomy is that the failure modes are mechanical and pattern-detectable; a regex that looks for "as of my last update" is a more honest signal than asking another model whether the price is stale.
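
The stale-price check mentioned above could look roughly like the following. The phrase variants beyond "as of my last update" and the function name are assumptions for illustration, not the tool's published pattern.

```python
import re

# Matches "as of my last update" plus a few assumed variants such as
# "as of my knowledge cutoff" or "as of my most recent training data".
STALE_PRICE_PATTERN = re.compile(
    r"\bas of my (?:(?:last|latest|most recent) )?"
    r"(?:update|training data|knowledge cutoff|knowledge update)\b",
    re.IGNORECASE,
)


def has_stale_price_marker(text: str) -> bool:
    """True if the text admits that its figures come from training data."""
    return STALE_PRICE_PATTERN.search(text) is not None
```

The check is deterministic: the same text always gives the same answer, and a reviewer can read the pattern to see exactly why a flag was raised.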

References

  • Ji, Z. et al. (2023). "Survey of Hallucination in Natural Language Generation." ACM Computing Surveys 55(12): 1–38. DOI: 10.1145/3571730.
  • Vasarhelyi, M. A. et al. (2024). "Large language models in financial reporting: A taxonomy of risks." Journal of Emerging Technologies in Accounting 21(1): 1–18.
  • Anthropic (2024). "Constitutional Classifiers" — technique for catching adversarial prompts.

Limitations

  • Heuristics miss subtle errors, such as figures that are correctly formatted but numerically wrong.
  • Ticker check uses a verified universe of ~50 large-caps; valid small-cap tickers outside that universe will be falsely flagged.
  • Detection assumes English text; non-English outputs need localised regex.
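
The ticker limitation can be made concrete with a small sketch. The universe below is a six-symbol stand-in rather than the tool's actual list, the cashtag convention (`$AAPL`) is an assumption about the input, and `SMLL` is a hypothetical small-cap symbol.

```python
import re

# Illustrative stand-in for the verified universe; not the tool's real list.
VERIFIED_UNIVERSE = frozenset({"AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "JPM"})

# Assumes tickers are written as cashtags, e.g. "$AAPL".
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")


def unverified_tickers(text: str) -> list[str]:
    """Return symbols not in the verified universe, in order of appearance."""
    seen = []
    for symbol in CASHTAG.findall(text):
        if symbol not in VERIFIED_UNIVERSE and symbol not in seen:
            seen.append(symbol)
    return seen


# A fabricated symbol and a legitimate small-cap look identical to this check.
result = unverified_tickers("Compare $AAPL with $SMLL before buying at $170.")
```

Because membership in the set is the only test, a flag here means "not in the verified list", which is weaker than "does not exist".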

Planning estimates only — not financial, tax, or investment advice.