Feature Leakage Audit Checklist
Feature leakage is the most expensive backtest error because it disguises itself as skill. The model looks brilliant in-sample and fails the moment it sees genuinely unseen data. This checklist is a per-feature audit you run during construction, not a one-time pass.
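One way to make the per-feature audit concrete is a truncation test: compute the feature on the full history, recompute it on a history cut off at some row, and compare the overlapping values. If removing future rows changes past feature values, the feature was reading the future. The sketch below is illustrative only. The helper `leaks_future`, the two z-score functions, and the synthetic price series are hypothetical names and data introduced for this example, not part of the checklist.

```python
import numpy as np
import pandas as pd


def leaks_future(feature_fn, series, cut):
    """Return True if feature values before `cut` change once later rows are removed."""
    # Feature computed with the whole history available, then trimmed to the first `cut` rows.
    full = feature_fn(series).iloc[:cut]
    # Feature computed when only the first `cut` rows exist.
    truncated = feature_fn(series.iloc[:cut])
    # equal_nan=True so that warm-up NaNs in the same positions count as matching.
    return not np.allclose(full.to_numpy(), truncated.to_numpy(), equal_nan=True)


def zscore_full_sample(s):
    # Leaky: the mean and std are taken over the entire series, including future rows.
    return (s - s.mean()) / s.std()


def zscore_expanding(s):
    # Point-in-time: each row uses only observations up to and including itself.
    return (s - s.expanding().mean()) / s.expanding().std()


# Synthetic random-walk prices, used purely as stand-in data for the demonstration.
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(size=500).cumsum())

print("full-sample z-score leaks:", leaks_future(zscore_full_sample, prices, cut=250))
print("expanding z-score leaks:  ", leaks_future(zscore_expanding, prices, cut=250))
```

On this data the full-sample version should be flagged and the expanding version should pass. A single cut point can miss leakage that only appears near particular dates, so in practice it is reasonable to repeat the check at several cut points for each feature as it is built.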
Checklist Sections
- Phase 1: Statistical leakage
- Phase 2: Label and target leakage
- Phase 3: Source data leakage
- Phase 4: Detection and documentation
Sources & References
- Leakage in Data Mining: Formulation, Detection, and Avoidance — Kaufman, Rosset, Perlich, Stitelman, ACM TKDD (2012)
- Advances in Financial Machine Learning — Marcos Lopez de Prado, Wiley (2018)