How to use the Price-Blind Research Auditor
Paste a research prompt or agent context bundle. The auditor flags price numbers, directional words, and outcome-leaking phrases that make LLMs confabulate post-hoc rationales for a known answer.
What It Does
The auditor is for engineers building research agents in which the LLM should reach conclusions from the data, rather than being told the conclusion in its context, and who need a deterministic check.
Interpreting Results
Each flag falls into one of three buckets: numeric leak (price or return numbers in the prompt), directional leak (words such as up, down, bull, bear), or outcome leak (post-hoc framing). Remove all three kinds of leak before re-running, so that the agent can reach an honest conclusion.
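The three buckets can be illustrated with a small deterministic checker. This is a minimal sketch, not the auditor's actual rule set: the regular expressions, the `Flag` type, and the `audit` function are all assumptions made for illustration, and a real word list would be much longer.

```python
import re
from typing import NamedTuple


class Flag(NamedTuple):
    bucket: str  # "numeric", "directional", or "outcome"
    text: str    # the matched span
    start: int   # character offset in the prompt


# Illustrative patterns only (assumed, not the tool's documented rules).
PATTERNS = {
    # Dollar amounts and percentages, e.g. "$45.30" or "+3.2%".
    "numeric": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?|[+-]?\d+(?:\.\d+)?%"),
    # A few direction-implying words.
    "directional": re.compile(
        r"\b(?:up|down|bull(?:ish)?|bear(?:ish)?|rall(?:y|ied)|surged?|plunged?)\b",
        re.IGNORECASE,
    ),
    # Phrases that frame the result as already known.
    "outcome": re.compile(
        r"\b(?:as expected|continuing the trend|in hindsight|turned out)\b",
        re.IGNORECASE,
    ),
}


def audit(prompt: str) -> list[Flag]:
    """Return every flagged span in the prompt, ordered by position."""
    flags = [
        Flag(bucket, match.group(0), match.start())
        for bucket, pattern in PATTERNS.items()
        for match in pattern.finditer(prompt)
    ]
    return sorted(flags, key=lambda f: f.start)
```

Because the check is pure pattern matching, the same prompt always yields the same flags, which is what makes it usable as a gate before an agent run.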
Input Steps
Field by field
1. Upload data: Upload your trade log. The auditor strips outcome data and shuffles trade order.
2. Review outputs: Review each entry/exit decision in isolation. Grade decision quality (A/B/C/F) and tag your reasoning.
3. Submit: Submit grades. The auditor un-blinds the outcomes and compares each grade against its outcome.
4. Read outputs: Read the calibration report: do your A trades actually win at a higher rate than your C trades? If they do not, your grading is misaligned with outcomes, which is a sign of self-deception.
5. Iterate: Repeat monthly with new trades. Calibration improves with practice if your decision-quality criteria genuinely predict outcomes.
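The blind-then-un-blind loop in the steps above can be sketched in a few lines. This is a rough illustration under stated assumptions: trades are plain dictionaries, `id` and `pnl` are hypothetical field names, a "win" is taken to mean positive `pnl`, and `blind` and `win_rate_by_grade` are illustrative helpers rather than the auditor's own functions.

```python
import random
from collections import defaultdict

# Hypothetical names for the outcome columns to hide during grading.
OUTCOME_FIELDS = {"pnl", "return_pct"}


def blind(trades, seed=None):
    """Return shuffled copies of the trades with outcome fields removed."""
    blinded = [
        {key: value for key, value in trade.items() if key not in OUTCOME_FIELDS}
        for trade in trades
    ]
    random.Random(seed).shuffle(blinded)
    return blinded


def win_rate_by_grade(trades, grades):
    """Un-blind: join grades (keyed by trade id) to outcomes, then compute win rate per grade."""
    wins, counts = defaultdict(int), defaultdict(int)
    for trade in trades:
        grade = grades.get(trade["id"])
        if grade is None:
            continue  # ungraded trades are left out of the report
        counts[grade] += 1
        wins[grade] += trade["pnl"] > 0  # assumed definition of a win
    return {grade: wins[grade] / counts[grade] for grade in sorted(counts)}
```

If the resulting win rate for A is not clearly above the rate for C, the grading criteria are not predicting outcomes. With only a handful of trades per grade the rates are noisy, so treat small differences with caution.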
Common Scenarios
Use realistic starting points
Stock pitch agent prompt
Prompt length: ~2000 tokens
Includes price history: yes
Recent prices in the prompt are numeric leaks; the agent may rationalize the recent move rather than reach an independent conclusion. Strip and re-test.
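One way to do the "strip" step is to replace price-like numbers with a placeholder before re-running the agent. The sketch below is only an assumption about how that might look: `PRICE_RE` covers dollar amounts and percentages and nothing else, and `strip_prices` is a hypothetical helper, not part of the auditor.

```python
import re

# Assumed pattern: dollar amounts and percentages.
# Bare numbers such as years are deliberately left alone.
PRICE_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?|[+-]?\d+(?:\.\d+)?%")


def strip_prices(prompt: str, placeholder: str = "[REDACTED]") -> tuple[str, int]:
    """Replace price-like numbers with a placeholder.

    Returns the rewritten prompt and the number of replacements made.
    """
    return PRICE_RE.subn(placeholder, prompt)
```

Returning the replacement count makes it easy to confirm that a re-test really ran on a prompt with the numeric leaks removed.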
Macro analysis agent prompt
Prompt length: ~3000 tokens
Includes commentary: yes
Phrases like 'as expected' or 'continuing the trend' are outcome leaks. Rewrite them in neutral framing before re-running.
Try These Tools
Run the numbers next
Hallucination Detector
Paste a source document + an LLM's extraction. Every numeric claim in the output is checked against the source. Client-side. Catches silent fabrication.
Prompt Injection Tester
Red-team a finance agent against 24 documented prompt-injection attacks — direct override, role confusion, indirect injection via retrieved content.
Prompt Regression Tester
Run the same prompt against multiple models (Claude 4.5/4.6/4.7, GPT-5, Gemini 2.5) with your own keys. Diff outputs, score drift, catch regressions.
Related Content
Keep the topic connected
Look-Ahead Bias
Look-ahead bias: when a backtest accidentally uses data the strategy wouldn't have had at decision time. The most common variants and how to catch them.
Hallucination Detection
Detecting LLM hallucinations in financial outputs: the verifiable-claim approach, citation grounding, and cross-model agreement signals that work.
Prompt Injection
Prompt injection: when untrusted text in a prompt overrides system instructions. The attack patterns and the structural defenses that work in production.