What does the Hallucination Detector methodology page document?

It documents the grounding algorithm, supported claim classes, tolerance thresholds, and limitations of the Hallucination Detector, a tool in the Finance category. It also states the formulas, assumptions, data sources, and reproducibility steps behind the tool.

When was the Hallucination Detector methodology last reviewed?

This methodology was last reviewed on 2026-04-20. The matching tool is at https://aifinhub.io/hallucination-detector/.

Are the Hallucination Detector numbers reproducible?

Yes. This page embeds a worked example whose output is the verbatim result of running the shipped hallucination-detector engine on a fixed input; the embedded JSON is recomputed and diffed against the engine in CI, so the numbers cannot drift from the code.

Methodology: Hallucination Detector

Scope

This tool detects numeric-class hallucinations in an LLM's extraction of a source document. Numbers are the highest-value class to catch in financial settings: fabricated revenue, invented period dates, and made-up growth rates or percentages lead directly to bad decisions.

Prose-level fabrication (an LLM inventing a risk factor that isn't in the document, or fabricating a narrative attribution) is not detected. A future iteration will layer an embedding-based grounding pass on top of this numeric check.

Claim extraction regex layer

Kind	Pattern (illustrative)
Currency	`[$€£]` or `USD|EUR|GBP`, then digit(s), then an optional suffix (B/M/K/bn/million/thousand)
Percent	`-?\d+(\.\d+)?%`
Date	`Q[1-4] YYYY`, `FY YYYY`, `YYYY-MM-DD`, or a bare 4-digit year `20NN`
Number	Comma-grouped numbers (e.g. 1,234) or numbers with 4 or more digits

Claims are deduplicated by (kind, normalized-value).
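The extraction and deduplication step can be sketched in Python as follows. This is a minimal illustration, not the shipped engine: the exact regexes, the `normalize` helper (which here only collapses whitespace and drops commas), and the rule that a span already claimed by an earlier kind is skipped are all assumptions made for the example.

```python
import re

# Illustrative patterns modelled on the table above (assumed, not the engine's own).
PATTERNS = [
    ("currency", re.compile(
        r"(?:[$€£]|\b(?:USD|EUR|GBP)\s?)\d+(?:,\d{3})*(?:\.\d+)?"
        r"(?:\s?(?:bn|million|thousand|[BMK])\b)?")),
    ("percent", re.compile(r"-?\d+(?:\.\d+)?%")),
    ("date", re.compile(
        r"\bQ[1-4] \d{4}\b|\bFY ?\d{4}\b|\b\d{4}-\d{2}-\d{2}\b|\b20\d{2}\b")),
    ("number", re.compile(r"\b\d{1,3}(?:,\d{3})+(?:\.\d+)?\b|\b\d{4,}\b")),
]


def normalize(text: str) -> str:
    """Hypothetical normalization: collapse whitespace and drop commas."""
    return re.sub(r"\s+", " ", text.strip()).replace(",", "")


def extract_claims(text: str) -> list[tuple[str, str]]:
    """Return unique (kind, normalized-value) claims in pattern order."""
    claims: list[tuple[str, str]] = []
    seen: set[tuple[str, str]] = set()
    taken: list[tuple[int, int]] = []  # spans already claimed by an earlier kind
    for kind, pattern in PATTERNS:
        for match in pattern.finditer(text):
            start, end = match.span()
            # Assumption: skip text already matched, so the "2,847" inside
            # "$2,847 million" is not counted again as a bare number.
            if any(start < e and s < end for s, e in taken):
                continue
            taken.append((start, end))
            key = (kind, normalize(match.group()))
            if key not in seen:  # deduplicate by (kind, normalized-value)
                seen.add(key)
                claims.append(key)
    return claims
```

Under these assumptions, a sentence that repeats "$2,847 million" yields a single currency claim, and the year inside "Q3 2025" is not reported again as a separate number.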

Grounding check

Direct substring match. Normalized claim present verbatim in source → grounded.
Numeric proximity. For currency and number claims, magnitude suffixes are expanded (B → ×10⁹, M → ×10⁶, K → ×10³) and the resulting value is compared to every numeric token in the source. Any token within ±1% of the target → grounded. Otherwise the closest value is recorded as nearest for user inspection.
Date fallback. Any 4-digit year substring of the claim found in the source → grounded.
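The three checks above can be sketched as one cascade. This is a hedged illustration under stated assumptions: the function names, the returned dictionary shape, the numeric-token regex, and the inclusion of the word suffixes `bn`, `million`, `thousand`, and `billion` in the suffix table are choices made for the example, not the engine's confirmed behaviour.

```python
import re

# Assumed suffix table; the page confirms only B, M, and K expansions.
SUFFIX = {"b": 1e9, "bn": 1e9, "billion": 1e9,
          "m": 1e6, "million": 1e6,
          "k": 1e3, "thousand": 1e3}

NUMERIC_TOKEN = re.compile(
    r"(\d+(?:,\d{3})*(?:\.\d+)?)(?:\s?(billion|million|thousand|bn|[bmk])\b)?",
    re.IGNORECASE,
)


def parse_values(text: str) -> list[float]:
    """Expand every numeric token in text to a plain float."""
    return [float(num.replace(",", "")) * SUFFIX.get(suffix.lower(), 1.0)
            for num, suffix in NUMERIC_TOKEN.findall(text)]


def _norm(text: str) -> str:
    # Hypothetical normalization applied to both claim and source.
    return re.sub(r"\s+", " ", text).replace(",", "")


def ground_claim(kind: str, claim: str, source: str, tolerance: float = 0.01) -> dict:
    # Step 1: direct substring match.
    if _norm(claim) in _norm(source):
        return {"grounded": True, "method": "substring", "nearest": None}
    # Step 2: numeric proximity, for currency and number claims only.
    if kind in ("currency", "number"):
        targets, candidates = parse_values(claim), parse_values(source)
        if targets and candidates:
            target = targets[0]
            nearest = min(candidates, key=lambda v: abs(v - target))
            ok = abs(nearest - target) <= tolerance * abs(target)
            return {"grounded": ok, "method": "proximity" if ok else None,
                    "nearest": nearest}
    # Step 3: date fallback on any 4-digit year substring of the claim.
    if kind == "date" and any(y in source for y in re.findall(r"\d{4}", claim)):
        return {"grounded": True, "method": "year", "nearest": None}
    return {"grounded": False, "method": None, "nearest": None}
```

With a source that reports "2.847 billion", the claim "$2,847 million" is grounded by proximity in this sketch, while "$3.1 billion" is not grounded and reports the 2.847 billion figure as its nearest value.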

Grounding score

Score = grounded_claims / total_claims × 100%. Traffic-light tones:

≥ 90%: green (acceptable)
70–89%: amber (inspect manually)
< 70%: red (probable fabrication)
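The score and tone mapping above can be written out as a short sketch. Two points are assumptions rather than documented behaviour: a document with no claims returns `None` instead of a score, and any score from 70 up to (but not including) 90 is treated as amber, which covers fractional values such as 89.5 that the listed bands do not spell out.

```python
from typing import Optional


def grounding_score(grounded_claims: int, total_claims: int) -> Optional[float]:
    """Percentage of claims that were grounded."""
    if total_claims == 0:
        return None  # assumption: no claims means no score
    # Multiply before dividing so exact ratios such as 9/10 land exactly on 90.0.
    return 100 * grounded_claims / total_claims


def tone(score: float) -> str:
    """Map a grounding score to its traffic-light tone."""
    if score >= 90:
        return "green"  # acceptable
    if score >= 70:
        return "amber"  # inspect manually
    return "red"        # probable fabrication
```

For example, 9 grounded claims out of 10 sits on the green boundary, and 6 out of 10 falls in the red band.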

Limitations

No prose grounding. A hallucinated causal claim without embedded numbers is invisible to this tool.
Number unit inference is basic. "$2,847 million" vs "2.847 billion" is handled; exotic units (basis points, millis, thousands-of-thousands) may be missed.
Within-1% tolerance is a heuristic. For true equality (e.g. exact share counts) this is too loose; for approximations (e.g. rounded margins) it's fine. Override on a case-by-case basis.
Locale. Only the comma thousands separator is supported. European formatting (period as thousands separator, comma as decimal separator) is not supported yet.
No context-aware grounding. A claim can be grounded on the number but wrong on its attribution (the LLM says revenue when the source reports cost of goods). This tool will mark it grounded; you still need to read carefully.
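The tolerance limitation above is easiest to see with numbers. The sketch below uses a hypothetical `within_tolerance` helper with an adjustable `tolerance` parameter; how the shipped tool exposes an override, if it does, is not described on this page.

```python
def within_tolerance(claimed: float, source_value: float, tolerance: float = 0.01) -> bool:
    """True when source_value lies within ±tolerance of the claimed value."""
    return abs(source_value - claimed) <= tolerance * abs(claimed)


# A rounded margin: 41.0 claimed vs 41.2 in the source is a reasonable match at 1%.
rounded_margin_ok = within_tolerance(41.0, 41.2)

# An exact share count: 1,005,000 claimed vs 1,000,000 in the source still
# passes at 1%, which is too loose for a figure that should match exactly.
share_count_loose = within_tolerance(1_005_000, 1_000_000)

# Setting the tolerance to zero demands exact equality for such cases.
share_count_strict = within_tolerance(1_005_000, 1_000_000, tolerance=0.0)
```

In this sketch the first two checks pass and the third fails, which is the case-by-case override the limitation describes.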

Worked example
