aifinhub

Worked example

Running the shipped hallucination-detector engine on the input below produces exactly this output. Continuous integration recomputes it against the engine bundle on every build, so these numbers cannot drift from the code.

Input

{
  "tool": "hallucination-detector",
  "source": "",
  "output": ""
}

Output

{
  "claims": [],
  "totalClaims": 0,
  "groundedCount": 0,
  "ungroundedCount": 0,
  "groundingRate": 1
}

Frequently asked questions

What does the Hallucination Detector methodology page document?
It documents the grounding algorithm, supported claim classes, tolerance thresholds, and limitations of the Hallucination Detector, a tool in the Finance category, together with the formulas, assumptions, data sources, and reproducibility steps behind it.
When was the Hallucination Detector methodology last reviewed?
This methodology was last reviewed on 2026-04-20. The matching tool is at https://aifinhub.io/hallucination-detector/.
Are the Hallucination Detector numbers reproducible?
Yes. This page embeds a worked example whose output is the verbatim result of running the shipped hallucination-detector engine on a fixed input; the embedded JSON is recomputed and diffed against the engine in CI, so the numbers cannot drift from the code.

Methodology · Playground · Last updated 2026-04-20

How Hallucination Detector works

How the Hallucination Detector tool actually works — assumptions, algorithms, limitations.

Scope

This tool detects numeric-class hallucinations in an LLM's extraction of a source document. Numbers are the highest-value class to catch in financial settings: fabricated revenue, invented period dates, and made-up growth rates or percentages lead directly to bad decisions.

Prose-level fabrication (an LLM inventing a risk factor that isn't in the document, or fabricating a narrative attribution) is not detected. A future iteration will layer an embedding-based grounding pass on top of this numeric check.

Claim extraction regex layer

Kind      Regex (illustrative)
Currency  [$€£]|USD|EUR|GBP + digit(s) + optional suffix (B/M/K/bn/million/thousand)
Percent   -?\d+(\.\d+)?%
Date      Q[1-4] YYYY | FY YYYY | YYYY-MM-DD | bare 4-digit year 20NN
Number    grouped (1,234) or >= 4-digit numbers

Claims are deduplicated by (kind, normalized-value).
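The extraction layer described above can be sketched in Python as follows. This is a minimal illustration, not the shipped engine: the exact patterns, the `normalize` helper, the rule that earlier kinds win when two patterns overlap, and the names `PATTERNS` and `extract_claims` are all assumptions made for the example.

```python
import re

# Illustrative patterns that loosely follow the table above (assumed, not the
# engine's real expressions). Order matters: earlier kinds claim text first.
PATTERNS = [
    ("currency", re.compile(
        r"(?:[$€£]|\b(?:USD|EUR|GBP)\s?)\d+(?:,\d{3})*(?:\.\d+)?"
        r"(?:\s?(?:bn|million|thousand|[BMK])\b)?")),
    ("percent", re.compile(r"-?\d+(?:\.\d+)?%")),
    ("date", re.compile(
        r"\bQ[1-4] \d{4}\b|\bFY ?\d{4}\b|\b\d{4}-\d{2}-\d{2}\b|\b20\d{2}\b")),
    ("number", re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d{4,}\b")),
]


def normalize(raw: str) -> str:
    """Assumed normalization: drop commas and whitespace, lowercase."""
    return re.sub(r"[,\s]", "", raw).lower()


def extract_claims(text: str) -> list[dict]:
    """Extract claims and deduplicate them by (kind, normalized value)."""
    taken: list[tuple[int, int]] = []   # character spans already claimed
    seen: set[tuple[str, str]] = set()  # (kind, normalized value) keys
    claims: list[dict] = []
    for kind, pattern in PATTERNS:
        for match in pattern.finditer(text):
            start, end = match.span()
            # Skip text that an earlier kind already claimed, e.g. the
            # "2,847" inside "$2,847 million" or the year inside "Q3 2024".
            if any(start < e and s < end for s, e in taken):
                continue
            taken.append((start, end))
            key = (kind, normalize(match.group()))
            if key not in seen:
                seen.add(key)
                claims.append({"kind": kind, "raw": match.group(), "value": key[1]})
    return claims


if __name__ == "__main__":
    sample = ("Revenue was $2,847 million in Q3 2024, up 12.5% from "
              "$2,847 million a year earlier; headcount reached 12,400.")
    for claim in extract_claims(sample):
        print(claim)
```

The repeated "$2,847 million" yields a single claim because both occurrences share the same (kind, normalized value) key.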

Grounding check

  1. Direct substring match. Normalized claim present verbatim in source → grounded.
  2. Numeric proximity. For currency and number claims, magnitude suffixes are expanded (B → ×10⁹, M → ×10⁶, K → ×10³) and the result is compared with every numeric token in the source. If any token is within ±1% of the target → grounded. Otherwise the closest value is recorded as nearest for user inspection.
  3. Date fallback. Any 4-digit year substring of the claim found in the source → grounded.
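The three steps above can be sketched as a single function. This is a hedged illustration under stated assumptions: the token pattern `NUM_TOKEN`, the `normalize` helper, the result dictionary, and the function names are hypothetical, and the tolerance is measured relative to the claimed value.

```python
import re

# Assumed suffix table; the page lists B/M/K/bn/million/thousand and also
# mentions "billion" in its limitations.
MULTIPLIER = {"b": 10**9, "bn": 10**9, "billion": 10**9,
              "m": 10**6, "million": 10**6,
              "k": 10**3, "thousand": 10**3}
NUM_TOKEN = re.compile(
    r"(\d+(?:,\d{3})*(?:\.\d+)?)\s?(bn|billion|million|thousand|[bmk])?\b",
    re.IGNORECASE)


def normalize(text: str) -> str:
    """Assumed normalization: drop commas and whitespace, lowercase."""
    return re.sub(r"[,\s]", "", text).lower()


def numeric_values(text: str) -> list[float]:
    """Parse every numeric token, expanding magnitude suffixes."""
    return [float(num.replace(",", "")) * MULTIPLIER.get(suffix.lower(), 1)
            for num, suffix in NUM_TOKEN.findall(text)]


def check_grounding(kind: str, raw: str, source: str,
                    tolerance: float = 0.01) -> dict:
    nearest = None
    # Step 1: direct substring match on normalized text.
    if normalize(raw) in normalize(source):
        return {"grounded": True, "method": "substring", "nearest": None}
    # Step 2: numeric proximity for currency and number claims.
    if kind in ("currency", "number"):
        targets = numeric_values(raw)
        candidates = numeric_values(source)
        if targets and candidates:
            target = targets[0]
            nearest = min(candidates, key=lambda v: abs(v - target))
            if abs(nearest - target) <= tolerance * abs(target):
                return {"grounded": True, "method": "proximity", "nearest": nearest}
    # Step 3: date fallback on any 4-digit run inside the claim.
    if kind == "date":
        if any(year in source for year in re.findall(r"\d{4}", raw)):
            return {"grounded": True, "method": "year", "nearest": None}
    return {"grounded": False, "method": None, "nearest": nearest}


if __name__ == "__main__":
    src = ("Total revenue for fiscal 2024 was 2.847 billion dollars, "
           "with 12,400 employees.")
    print(check_grounding("currency", "$2,847 million", src))
    print(check_grounding("date", "Q3 2024", src))
    print(check_grounding("number", "15,000", src))
```

In this sketch "$2,847 million" is grounded by proximity against "2.847 billion", "Q3 2024" is grounded only through the year fallback, and "15,000" is ungrounded with 12,400 reported as the nearest source value.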

Grounding score

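Score = grounded_claims / total_claims × 100%. The engine output carries the same ratio as a fraction in groundingRate; in the worked example, an input with zero claims returns a groundingRate of 1. Traffic-light tones: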

  • ≥ 90%: green (acceptable)
  • 70–89%: amber (inspect manually)
  • < 70%: red (probable fabrication)
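The score and its traffic-light tone can be sketched as below. The function names are hypothetical. Two details are assumptions: the zero-claim case returns 1 to match the worked example's output, and scores between 89% and 90%, which the thresholds above leave unspecified, are treated as amber.

```python
def grounding_rate(grounded: int, total: int) -> float:
    """Fraction of claims that are grounded, in the range 0..1."""
    # The worked example reports groundingRate = 1 when there are no claims.
    if total == 0:
        return 1.0
    return grounded / total


def tone(rate: float) -> str:
    """Map a grounding rate (as a fraction) to a traffic-light tone."""
    if rate >= 0.90:
        return "green"   # acceptable
    if rate >= 0.70:
        return "amber"   # inspect manually
    return "red"         # probable fabrication


if __name__ == "__main__":
    for grounded, total in [(0, 0), (9, 10), (8, 10), (6, 10)]:
        rate = grounding_rate(grounded, total)
        print(f"{grounded}/{total}: {rate:.0%} {tone(rate)}")
```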

Limitations

  1. No prose grounding. A hallucinated causal claim without embedded numbers is invisible to this tool.
  2. Number unit inference is basic. "$2,847 million" vs "2.847 billion" is handled; exotic units (basis points, millis, thousands-of-thousands) may be missed.
  3. Within-1% tolerance is a heuristic. For true equality (e.g. exact share counts) this is too loose; for approximations (e.g. rounded margins) it's fine. Override on a case-by-case basis.
  4. Locale. Only the comma thousands separator is supported. European formatting (. for thousands, , for decimals) is not supported yet.
  5. No context-aware grounding. A claim can be grounded on the number but wrong on its attribution (the LLM says revenue when the source reports cost of goods). This tool will mark it grounded; you still need to read carefully.
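To make limitation 3 concrete, the sketch below shows how a 1% relative tolerance accepts a share count that is off by 5,000, and how a stricter tolerance rejects it. The `within_tolerance` helper and its `tolerance` parameter are hypothetical; this page does not describe how the tool exposes an override.

```python
def within_tolerance(claimed: float, source: float,
                     tolerance: float = 0.01) -> bool:
    """True if the source value lies within ±tolerance of the claimed value."""
    return abs(source - claimed) <= tolerance * abs(claimed)


if __name__ == "__main__":
    # Exact share counts: a 5,000-share discrepancy passes at the default 1%.
    print(within_tolerance(1_000_000, 1_005_000))               # loose check
    # Requiring exact equality (tolerance 0) rejects the same pair.
    print(within_tolerance(1_000_000, 1_005_000, tolerance=0))  # strict check
    # A rounded margin (23.4 vs 23.44) is a reasonable use of the default.
    print(within_tolerance(23.4, 23.44))
```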

Planning estimates only — not financial, tax, or investment advice.