Last updated 2026-04-20
How the Price-Blind Research Auditor works
Rule families, scoring, and limitations of the Price-Blind Research Auditor.
Why this exists
When a large language model sees current prices, directional language, or position state while generating a trade thesis, it tends to produce an argument that rationalises whichever direction the data implies. Agents exposed to this contamination are systematically biased toward confirming their existing positions. The fix is architectural: the LLM works only from price-blind context, and only the risk engine, once the thesis has been produced, reconciles that view with market state.
This tool is the lint layer for that boundary. It does not "fix" contamination; it reveals it so a human can rewrite the prompt or redact the retrieved context before handing the bundle to the model.
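The boundary described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's code: ResearchBundle, MarketState, build_thesis_prompt and reconcile are hypothetical names, and a two-alternative regex stands in for the full auditor. The point is structural: the prompt builder never receives market state, and the risk-engine step is the only function that does.

```python
import re
from dataclasses import dataclass


@dataclass
class ResearchBundle:
    """Price-blind context handed to the LLM (hypothetical type)."""
    documents: list[str]


@dataclass
class MarketState:
    """Data the LLM must never see (hypothetical type)."""
    last_price: float
    position_qty: float  # signed: > 0 long, < 0 short, 0 flat


# Crude stand-in for the auditor: dollar amounts and percentage figures only.
_LEAK = re.compile(r"\$\d|\b\d+(?:\.\d+)?%")


def build_thesis_prompt(bundle: ResearchBundle) -> str:
    """Build the LLM prompt from price-blind context only.

    MarketState is deliberately not a parameter, so market data has no path
    into the prompt through this function. Flagged documents block the call
    until a human has reviewed them.
    """
    flagged = [doc for doc in bundle.documents if _LEAK.search(doc)]
    if flagged:
        raise ValueError(f"{len(flagged)} document(s) need review before prompting")
    return "Write a trade thesis from these notes:\n" + "\n".join(bundle.documents)


def reconcile(direction: str, state: MarketState) -> str:
    """Hypothetical risk-engine step, run after the thesis exists.

    This is the only place the view meets position state. Returns 'hold'
    if the current position already points the way the thesis does,
    otherwise 'rebalance'.
    """
    target_sign = 1 if direction == "long" else -1
    current_sign = (state.position_qty > 0) - (state.position_qty < 0)
    return "hold" if current_sign == target_sign else "rebalance"
```

With this split, build_thesis_prompt(ResearchBundle(["Shares rose 4.7% on the print."])) raises instead of passing the figure to the model, and prices only enter the picture inside reconcile.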
Rule families
The ruleset targets four classes of leakage:
1. Explicit prices
- Ticker–price pairs: SYNTHETIC_A 451.20, BTC $67,400
- Dollar-denominated numbers: $451.20
- Bid / ask / mid / last quotes: bid: 451.18, ask: 451.22
2. Directional framing
- Percentage-move verbs: up 4.7%, dropped 2.1%
- Standalone directional verbs: rallying, dumping, ripping, crashing
- New-high / new-low language
- After-the-fact framing: after the rally, following the crash
- Chart-pattern labels: descending triangle, head and shoulders
- Recency + price: this morning + numeric context
3. Position state / P&L leakage
- Position language: open position, held since, long from
- Unrealised / realised P&L, mark-to-market values
- Stop-loss / take-profit / target-price metadata
4. Sentiment anchors
- bullish, bearish, hawkish, dovish, risk-on, risk-off
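The four families translate naturally into a table of regular expressions. The sketch below is illustrative only: the rule IDs, patterns and severity assignments are assumptions chosen to match the examples above, not the 14 rules the tool ships, and scan_line is a hypothetical helper that reports one (rule_id, severity) pair per match on a line.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    family: str
    severity: str  # "high", "medium" or "low"
    pattern: re.Pattern


def _rule(rule_id: str, family: str, severity: str, regex: str,
          flags: int = re.IGNORECASE) -> Rule:
    return Rule(rule_id, family, severity, re.compile(regex, flags))


# Illustrative subset; IDs, regexes and severities are assumptions,
# not the tool's shipped ruleset.
RULES = [
    # 1. Explicit prices (ticker rule is case-sensitive so "up 4.7" is not a ticker)
    _rule("ticker-price", "explicit-price", "high",
          r"\b[A-Z][A-Z0-9_]{1,11}\s+\$?\d[\d,]*(?:\.\d+)?", flags=0),
    _rule("dollar-amount", "explicit-price", "high", r"\$\d[\d,]*(?:\.\d+)?"),
    _rule("bid-ask-quote", "explicit-price", "high", r"\b(?:bid|ask|mid|last)\s*[:=]\s*\d"),
    # 2. Directional framing
    _rule("pct-move", "directional", "high",
          r"\b(?:up|down|rose|fell|gained|lost|dropped|jumped|slid)\s+\d+(?:\.\d+)?%"),
    _rule("directional-verb", "directional", "medium",
          r"\b(?:rallying|dumping|ripping|crashing)\b"),
    _rule("new-high-low", "directional", "medium",
          r"\b(?:new|record|all-time)\s+(?:highs?|lows?)\b"),
    _rule("after-the-fact", "directional", "medium",
          r"\b(?:after|following)\s+the\s+(?:rally|crash|sell-?off)\b"),
    _rule("chart-pattern", "directional", "low",
          r"\b(?:ascending|descending)\s+triangle\b|\bhead\s+and\s+shoulders\b"),
    # Recency word followed later on the line by a digit
    _rule("recency-price", "directional", "medium",
          r"\b(?:this morning|today|overnight)\b.*\d"),
    # 3. Position state / P&L
    _rule("position-language", "position", "high",
          r"\b(?:open position|held since|long from|short from)\b"),
    _rule("pnl", "position", "high",
          r"\b(?:unrealised|unrealized|realised|realized)\s+p&l\b|\bmark-to-market\b"),
    _rule("stop-target", "position", "medium",
          r"\b(?:stop-loss|stop loss|take-profit|take profit|target price|price target)\b"),
    # 4. Sentiment anchors
    _rule("sentiment-anchor", "sentiment", "low",
          r"\b(?:bullish|bearish|hawkish|dovish|risk-on|risk-off)\b"),
]


def scan_line(line: str) -> list[tuple[str, str]]:
    """Return one (rule_id, severity) pair per match on a single line."""
    return [(rule.rule_id, rule.severity)
            for rule in RULES
            for _ in rule.pattern.finditer(line)]
```

Overlapping rules each report a match in this sketch, so BTC $67,400 trips both ticker-price and dollar-amount; the page does not say whether the real tool de-duplicates overlapping matches.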
Scoring
Each match is weighted by severity:
high → 1.0
medium → 0.5
low → 0.2
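The per-line leakage score is
leakage = min(1, total_weight / 10), where total_weight is the summed weight of every match on the line. The score saturates at 1 once total_weight reaches 10, for example ten high-severity or twenty medium-severity matches. Verdict bands: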
- 0: clean
- < 0.2: light
- 0.2 – 0.5: caution
- > 0.5: contaminated
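The weights, the saturating formula and the verdict bands fit in a few lines. A sketch in Python, where leakage_score and verdict are hypothetical names; the formula is the one given above, but the band edges are an assumption, because the page does not say whether exactly 0.2 and 0.5 belong to caution or to the neighbouring bands.

```python
SEVERITY_WEIGHT = {"high": 1.0, "medium": 0.5, "low": 0.2}


def leakage_score(severities: list[str]) -> float:
    """Per-line score: min(1, total_weight / 10)."""
    total_weight = sum(SEVERITY_WEIGHT[s] for s in severities)
    return min(1.0, total_weight / 10)


def verdict(score: float) -> str:
    """Map a score onto the verdict bands.

    Assumption: 0.2 and 0.5 both fall in 'caution'.
    """
    if score == 0:
        return "clean"
    if score < 0.2:
        return "light"
    if score <= 0.5:
        return "caution"
    return "contaminated"


# Worked example: three high, two medium and one low match on one line.
# total_weight = 3 * 1.0 + 2 * 0.5 + 0.2 = 4.2, so leakage = 0.42.
line = ["high", "high", "high", "medium", "medium", "low"]
score = leakage_score(line)
print(round(score, 2), verdict(score))  # prints: 0.42 caution
```

Because the score saturates, a line with twelve high-severity matches scores the same 1.0 as a line with ten; the verdict says a line is contaminated, not how badly.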
False-positive discipline
The rules deliberately err on the side of flagging. A false positive costs a developer one re-read of a line; a false negative leaves an agent silently contaminated. Content that is legitimate but still flagged (e.g. a discussion of historical market regimes) should be recorded as a fixture the auditor is expected to catch, then wrapped or redacted before use.
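One way to keep that discipline honest is to turn known legitimate-but-flagged lines into fixtures that must keep being caught. A minimal sketch, assuming a hypothetical two-rule subset and hypothetical helpers flagged_rules, check_fixtures and redact. The auditor itself does not redact, so redact stands in for the manual step a developer would take afterwards.

```python
import re

# Hypothetical two-rule subset, just enough to exercise the fixtures.
RULES = {
    "sentiment-anchor": re.compile(
        r"\b(?:bullish|bearish|hawkish|dovish|risk-on|risk-off)\b", re.IGNORECASE),
    "after-the-fact": re.compile(
        r"\b(?:after|following)\s+the\s+(?:rally|crash)\b", re.IGNORECASE),
}


def flagged_rules(line: str) -> set[str]:
    """IDs of every rule that matches somewhere on the line."""
    return {rule_id for rule_id, pattern in RULES.items() if pattern.search(line)}


# Legitimate lines the auditor is expected to catch. If a rule change
# stops flagging one of them, check_fixtures reports it.
EXPECTED_CATCHES = [
    ("Following the crash of 2008, risk-off flows dominated credit.",
     {"after-the-fact", "sentiment-anchor"}),
    ("The 1994 Fed was unusually hawkish.", {"sentiment-anchor"}),
]


def check_fixtures() -> list[tuple[str, set[str]]]:
    """Return (text, missing_rule_ids) for every fixture no longer fully caught."""
    failures = []
    for text, expected in EXPECTED_CATCHES:
        missing = expected - flagged_rules(text)
        if missing:
            failures.append((text, missing))
    return failures


def redact(line: str, placeholder: str = "[REDACTED]") -> str:
    """Naive manual redaction: replace every matched span with a placeholder."""
    for pattern in RULES.values():
        line = pattern.sub(placeholder, line)
    return line
```

Running check_fixtures in a test suite turns "the auditor should flag this" into a regression check, while redact shows the kind of hand edit that happens before the line reaches the model.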
Limitations
- Regex, not semantic. The tool catches structural leakage (numbers, verbs, metadata labels). A semantically rich paraphrase such as "today was a very good day for the stock" slips through.
- Language: English only. German and other non-English prompts are not covered in this release.
- Not context-aware. The tool does not check whether flagged language sits inside a quoted user utterance that the model is expected to treat as data. Treat every flag as a review item.
- No redaction. Output is a diagnostic report, not a rewritten bundle. Leaving redaction to a human is a deliberate design choice: automated rewriting would strip nuance.
- No image / audio scanning. Multi-modal inputs (screenshots, charts, voice notes) are out of scope.
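The first and third limitations are easy to see with a single rule. The snippet below runs a hypothetical percentage-move regex (not the shipped rule) over three lines: a structural leak, a semantic paraphrase, and a quoted user question.

```python
import re

# Hypothetical percentage-move rule in the spirit of the directional family.
PCT_MOVE = re.compile(
    r"\b(?:up|down|rose|fell|gained|dropped)\s+\d+(?:\.\d+)?%", re.IGNORECASE)

samples = {
    "structural": "Shares were up 4.7% by the close.",
    "paraphrase": "Today was a very good day for the stock.",
    "quoted": 'User asked: "why is it up 4.7%?"',
}

# True means the rule flagged the line.
results = {name: bool(PCT_MOVE.search(text)) for name, text in samples.items()}
print(results)  # {'structural': True, 'paraphrase': False, 'quoted': True}
```

The paraphrase passes even though it carries the same directional signal, and the quoted question is flagged even though the model may be meant to treat it as data. That asymmetry is why every flag is a review item rather than a verdict.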
Privacy
All pattern matching runs in the browser. Nothing is uploaded. No cookies, no third-party trackers. Refresh the page and the pasted bundle is gone.
Connects to
- Prompt Regression Tester — after cleaning, regression-test the clean prompt across providers.
- Hallucination Detector — a clean prompt doesn't prevent fabrication; check the output too.
- Trading System Blueprinter — the price-blind boundary is a load-bearing architectural choice.
Changelog
- 2026-04-20: initial release with 14 rules across 4 families, severity weighting, and a saturating leakage score.