Last updated 2026-04-20
How Price-Blind Research Auditor works
A walkthrough of the Price-Blind Research Auditor's rule families, scoring, and limitations.
Why this exists
When a large language model sees current prices, directional language, or position state while generating a trade thesis, it produces an argument that rationalises whichever direction the data implies. Agents whose context carries this contamination are systematically biased toward confirming their existing positions. The fix is architectural: the LLM works only from price-blind context, and only the risk engine, after the thesis has been produced, reconciles that view with market state.
This tool is the lint layer for that boundary. It does not "fix" contamination; it reveals it so a human can rewrite the prompt or redact the retrieved context before handing the bundle to the model.
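The boundary described above can be sketched as a set of types. This is a minimal illustration under stated assumptions: `PriceBlindContext`, `Thesis`, `MarketState`, `ThesisGenerator`, `reconcile`, and `stubGenerator` are hypothetical names, not part of the auditor or of any specific agent framework, and the stub stands in for a real LLM call.

```typescript
// Illustrative types only: every name here is hypothetical, not the tool's API.

// Everything the LLM may see. There is no price, quote, or position field.
interface PriceBlindContext {
  instrument: string;
  researchNotes: string[]; // retrieved documents, linted by the auditor first
}

// The LLM's output: a directional view plus its reasoning.
interface Thesis {
  instrument: string;
  view: "long" | "short" | "neutral";
  rationale: string;
}

// Market and position state, visible only downstream of the LLM.
interface MarketState {
  lastPrice: number; // carried for completeness; this sketch only compares direction
  positionSize: number; // signed: > 0 long, < 0 short, 0 flat
}

// The generator's signature admits only price-blind context.
type ThesisGenerator = (ctx: PriceBlindContext) => Thesis;

// Only the risk engine sees the thesis and market state together.
function reconcile(thesis: Thesis, market: MarketState): string {
  const held =
    market.positionSize > 0 ? "long" : market.positionSize < 0 ? "short" : "flat";
  if (thesis.view === "neutral") return `neutral view; currently ${held}`;
  return thesis.view === held
    ? `view (${thesis.view}) agrees with current position`
    : `view (${thesis.view}) differs from current position (${held})`;
}

// Stub standing in for the real LLM call (hypothetical).
const stubGenerator: ThesisGenerator = (ctx) => ({
  instrument: ctx.instrument,
  view: "long",
  rationale: `based on ${ctx.researchNotes.length} research note(s)`,
});

const thesis = stubGenerator({ instrument: "SYNTHETIC_A", researchNotes: ["Q3 filing"] });
console.log(reconcile(thesis, { lastPrice: 451.2, positionSize: -100 }));
```

The generator's signature has no field through which a price could arrive. A type cannot stop a price embedded in free text inside `researchNotes`, which is the gap the auditor is meant to cover.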
Rule families
The ruleset targets four classes of leakage:
1. Explicit prices
- Ticker–price pairs: SYNTHETIC_A 451.20, BTC $67,400
- Dollar-denominated numbers: $451.20
- Bid / ask / mid / last quotes: bid: 451.18, ask: 451.22
2. Directional framing
- Percentage-move verbs: up 4.7%, dropped 2.1%
- Standalone directional verbs: rallying, dumping, ripping, crashing
- New-high / new-low language
- After-the-fact framing: after the rally, following the crash
- Chart-pattern labels: descending triangle, head and shoulders
- Recency + price: this morning + numeric context
3. Position state / P&L leakage
- Position language: open position, held since, long from
- Unrealised / realised P&L, mark-to-market values
- Stop-loss / take-profit / target-price metadata
4. Sentiment anchors
- bullish, bearish, hawkish, dovish, risk-on, risk-off
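A rule in this style can be expressed as a global regular expression tagged with a family and a severity. The sketch below is an illustration under assumptions, not the auditor's actual ruleset. The rule IDs and patterns are hypothetical, and so is the severity assigned to each rule: the document lists severity weights but not which rule carries which. It also covers only a subset of the rules listed above.

```typescript
// Hypothetical rule shape; the auditor's real patterns and severities may differ.
type Severity = "high" | "medium" | "low";
type Family = "explicit-price" | "directional" | "position-state" | "sentiment";

interface Rule {
  id: string;
  family: Family;
  severity: Severity; // assumed per-rule severities, for illustration only
  pattern: RegExp; // global flag so String.match returns every match
}

const SAMPLE_RULES: Rule[] = [
  { id: "ticker-price", family: "explicit-price", severity: "high",
    pattern: /\b[A-Z][A-Z0-9_]{1,11}\s+\d[\d,]*(?:\.\d+)?\b/g },
  { id: "dollar-amount", family: "explicit-price", severity: "high",
    pattern: /\$\d[\d,]*(?:\.\d+)?/g },
  { id: "bid-ask-quote", family: "explicit-price", severity: "high",
    pattern: /\b(?:bid|ask|mid|last)\s*:\s*\d+(?:\.\d+)?/gi },
  { id: "pct-move", family: "directional", severity: "medium",
    pattern: /\b(?:up|down|rose|fell|dropped|gained)\s+\d+(?:\.\d+)?%/gi },
  { id: "directional-verb", family: "directional", severity: "medium",
    pattern: /\b(?:rallying|dumping|ripping|crashing)\b/gi },
  { id: "position-language", family: "position-state", severity: "high",
    pattern: /\b(?:open position|held since|long from)\b/gi },
  { id: "sentiment-anchor", family: "sentiment", severity: "low",
    pattern: /\b(?:bullish|bearish|hawkish|dovish|risk-on|risk-off)\b/gi },
];

interface Finding {
  ruleId: string;
  family: Family;
  severity: Severity;
  text: string; // the matched substring
}

// Run every rule over one line and collect one finding per match.
function auditLine(line: string, rules: Rule[] = SAMPLE_RULES): Finding[] {
  const findings: Finding[] = [];
  for (const rule of rules) {
    // String.match with a global regex returns all matches, or null if none.
    const matches: string[] = line.match(rule.pattern) ?? [];
    for (const text of matches) {
      findings.push({ ruleId: rule.id, family: rule.family, severity: rule.severity, text });
    }
  }
  return findings;
}
```

With these sample rules, `auditLine("SYNTHETIC_A is up 4.7% and rallying; bid: 451.18")` returns three findings (`bid-ask-quote`, `pct-move`, `directional-verb`). The paraphrase "today was a very good day for the stock" returns none, which is the regex limitation described under Limitations.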
Scoring
Each match is weighted by severity:
- high → 1.0
- medium → 0.5
- low → 0.2
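The per-line leakage score is leakage = min(1, total_weight / 10), where total_weight is the sum of the severity weights of every match on the line. The score saturates at 1 once total_weight reaches 10: ten high-severity matches, twenty medium, or fifty low. Verdict bands:
- 0: clean
- above 0 but below 0.2: light
- 0.2 – 0.5: caution
- above 0.5: contaminated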
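The formula and bands above translate directly into code. This is a minimal sketch: the function and type names are hypothetical. Which band the exact values 0.2 and 0.5 fall into is also an assumption, because the text gives the middle band as "0.2 – 0.5" without saying whether its endpoints are inclusive. As a worked example, one high and one medium match sum to 1.5, giving 1.5 / 10 = 0.15, which is "light".

```typescript
type Severity = "high" | "medium" | "low";
type Verdict = "clean" | "light" | "caution" | "contaminated";

// Severity weights as listed in the Scoring section.
const WEIGHTS: Record<Severity, number> = { high: 1.0, medium: 0.5, low: 0.2 };

// leakage = min(1, total_weight / 10), one severity entry per match on the line.
function leakageScore(severities: Severity[]): number {
  let total = 0;
  for (const s of severities) total += WEIGHTS[s];
  return Math.min(1, total / 10);
}

// Assumption: 0.2 and 0.5 themselves are treated as "caution".
function verdict(score: number): Verdict {
  if (score === 0) return "clean";
  if (score < 0.2) return "light";
  if (score <= 0.5) return "caution";
  return "contaminated";
}

console.log(verdict(leakageScore(["high", "medium"]))); // "light"
```

Because the score saturates, any line with at least ten high-severity matches (or the equivalent weight) lands in "contaminated" at the same value, 1.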
False-positive discipline
The rules are deliberately conservative: the cost of a false positive is that a developer re-reads a line, while the cost of a false negative is a silently contaminated agent. Content that is legitimate but still flagged (for example, a discussion of historical market regimes) should be annotated as a fixture the auditor is expected to catch, then wrapped or redacted before it is used.
Limitations
- Regex, not semantic. The tool catches structural leakage (numbers, verbs, metadata labels). A semantically rich paraphrase such as "today was a very good day for the stock" would slip through.
- Language: English only. Prompts in German or any other language are not covered in this release.
- Not context-aware. The tool does not check whether flagged language appears inside a quoted user utterance that the model is expected to treat as data. Treat every flag as a review item.
- No redaction. Output is a diagnostic report, not a rewritten bundle. Leaving redaction to a human is a deliberate design choice: automated rewriting would strip nuance.
- No image / audio scanning. Multi-modal inputs (screenshots, charts, voice notes) are out of scope.
Privacy
All pattern matching runs in the browser. Nothing is uploaded. No cookies, no tracking scripts. Refresh the page and the pasted bundle is gone.
Connects to
- Prompt Regression Tester — after cleaning, regression-test the clean prompt across providers.
- Hallucination Detector — a clean prompt doesn't prevent fabrication; check the output too.
- Trading System Blueprinter — the price-blind boundary is a load-bearing architectural choice.
Changelog
- 2026-04-20 — Initial release with 14 rules across 4 families, severity weighting, and saturating leakage score.