How to use Hallucination Detector
Paste a source document and an LLM's extraction. Every numeric claim in the output is matched against the source — mismatches and unsupported claims are flagged so you catch fabrication before the number reaches a trading rule.
What It Does
Use the detector with intent
Who It's For
Engineers piping LLM extractions into trading or research pipelines who need a deterministic check that the numbers in the output actually appear in the source.
Interpreting Results
Flagged claims are the work. Each flag falls into one of three buckets: number not found in source (hallucination), number found but mis-attributed (paraphrase error), number rounded outside tolerance (precision drift). All three deserve a manual review before action.
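The three buckets can be sketched as a triage function. This is a minimal sketch, assuming per-entity number lists from the source; the names `classify_claim`, `tol`, and `drift_band` are hypothetical, not the detector's actual API or thresholds:

```python
# Illustrative triage of the three flag buckets described above.
# Names and tolerance bands are assumptions, not the tool's API.

def classify_claim(value, entity, source_numbers, tol=0.005, drift_band=0.05):
    """source_numbers: entity -> list of numbers the source states for it.
    Returns None when grounded, else one of the three flag buckets."""
    def within(n, band):
        return abs(value - n) <= band * max(abs(n), 1.0)

    own = source_numbers.get(entity, [])
    others = [n for e, nums in source_numbers.items() if e != entity for n in nums]

    if any(within(n, tol) for n in own):
        return None                 # grounded: matches within rounding tolerance
    if any(within(n, drift_band) for n in own):
        return "precision_drift"    # right entity, rounded outside tolerance
    if any(within(n, tol) for n in others):
        return "misattribution"     # number exists, attached to the wrong entity
    return "hallucination"          # number appears nowhere in the source
```

The ordering matters: an exact match for the right entity short-circuits before the looser checks, so a correct figure is never flagged as drift.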
Input Steps
Field by field
1. Provide source context (the document or retrieval result the model was supposed to ground its answer in).
2. Provide the model's output to be checked.
3. Run the detector. It checks entity grounding, numerical grounding, and self-consistency across N samples.
4. Read flagged spans. Each flag includes the failure type and severity. Review flags before accepting the output.
5. For batch use, upload a list of (context, output) pairs and read aggregate flag rates. A flag rate above 10% suggests the prompt or model needs adjustment.
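The batch flow in step 5 can be sketched as follows. `extract_numbers`, `check_pair`, and `batch_flag_rate` are hypothetical helpers using exact matching for simplicity, not the tool's implementation:

```python
# Sketch of the batch flow: flag output numbers with no match in the context.
# Helper names are illustrative assumptions, not the tool's API.
import re

def extract_numbers(text):
    """Pull numeric tokens (ints, decimals, negatives) out of text."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))]

def check_pair(context, output, tol=1e-9):
    """Return output numbers with no match in the context (candidate flags)."""
    source = extract_numbers(context)
    return [v for v in extract_numbers(output)
            if not any(abs(v - s) <= tol * max(abs(s), 1.0) for s in source)]

def batch_flag_rate(pairs):
    """Fraction of (context, output) pairs producing at least one flag."""
    flagged = sum(1 for ctx, out in pairs if check_pair(ctx, out))
    return flagged / len(pairs) if pairs else 0.0
```

A real detector would also normalize units and scale words ("million", "%"), which this sketch deliberately omits.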
Common Scenarios
Use realistic starting points
10-K extraction sanity check
- Source: 10-K filing
- Extraction: LLM-generated financials table
- Every dollar figure in the table should match the filing line item within rounding tolerance; unmatched figures are the hallucinations.

Earnings transcript Q&A summary
- Source: Earnings call transcript
- Extraction: Bulleted summary with growth rates
- Growth-rate quotes need to match management's spoken numbers exactly; errors from reconstructing figures off the prior year are common.
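The "within rounding tolerance" check in the 10-K scenario can be made concrete. Filings often report figures in millions rounded to one decimal, so a half-unit of the last reported digit is a natural slack; `matches_within_rounding` is a hypothetical helper, not the tool's API:

```python
# One way to express "within rounding tolerance" for filing line items.
# Hypothetical helper: accepts half a unit of the last reported digit.

def matches_within_rounding(reported, source_value, decimals=1):
    """True if `reported` could be `source_value` rounded to `decimals` places."""
    half_ulp = 0.5 * 10 ** (-decimals)
    return abs(reported - source_value) <= half_ulp

# e.g. filing says 1234.56; a table showing 1234.6 passes, 1235.0 is flagged
```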
Try These Tools
Run the numbers next
Agent Skill Tester for Markets
Paste a SKILL.md definition + sample input + your Anthropic API key. See structured extraction, token cost, and latency — all in your browser. No signup.
Prompt Regression Tester
Run the same prompt against multiple models (Claude 4.5/4.6/4.7, GPT-5, Gemini 2.5) with your own keys. Diff outputs, score drift, catch regressions.
Structured Schema Validator for Finance
Paste LLM JSON output and validate against four pre-built finance schemas — research output, trade decision, risk snapshot, peer comparison — with sanity checks.
Related Content
Keep the topic connected
Hallucination Detection
Detecting LLM hallucinations in financial outputs: the verifiable-claim approach, citation grounding, and cross-model agreement signals that work.
Model Drift
Model drift: when an LLM's behavior changes between calls, versions, or weeks. The monitoring stack that catches it before production breaks.
Agent Skill Testing
Agent skill testing: the regression-test discipline for LLM-driven agents. What to test, how to score, and the difference between pass-rate and capability.