Overfitting
Overfitting occurs when model parameters are tuned so tightly to historical data that the strategy captures random structure rather than persistent signal. The hallmark is an excellent in-sample Sharpe that collapses out of sample. The mechanism is multiple testing: every parameter setting explored is a hypothesis tested, and with enough trials random noise produces strategies that look profitable purely by chance.
Why it matters
Most retail backtests, and a meaningful fraction of institutional ones, are overfit. The strategies look great until they go live, at which point the ratio of live to backtested Sharpe typically lands at 0.3 to 0.5, a haircut of 50 to 70 percent. Diagnosing overfitting is therefore often worth more than designing another strategy.
How it works
Hold out a true out-of-sample period and never look at it during development. Use walk-forward validation, re-fitting on each training window and scoring only on the window that follows it. Compute the Probability of Backtest Overfitting (PBO) by comparing in-sample and out-of-sample rankings across parameter combinations: PBO is the share of data splits in which the in-sample winner ranks at or below the median out of sample. Apply Bailey and López de Prado's deflated Sharpe ratio to penalize multiple testing. Treat any strategy whose deflated Sharpe collapses to near zero as overfit until proven otherwise.
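The PBO step can be sketched in a few lines. This is a minimal illustration of the combinatorially symmetric cross-validation idea, not a reference implementation: the function names `sharpe` and `pbo_cscv`, the default of 8 blocks, and the synthetic input are assumptions made for the example, and a per-period Sharpe ratio is used as the performance measure.

```python
from itertools import combinations

import numpy as np


def sharpe(returns: np.ndarray) -> np.ndarray:
    """Per-period Sharpe ratio of each column; zero-variance columns score 0."""
    mean = returns.mean(axis=0)
    std = returns.std(axis=0, ddof=1)
    return np.divide(mean, std, out=np.zeros_like(mean), where=std > 0)


def pbo_cscv(returns: np.ndarray, n_blocks: int = 8) -> float:
    """Estimate PBO from a (periods x configurations) matrix of returns."""
    if n_blocks % 2 != 0:
        raise ValueError("n_blocks must be even")
    n_periods, n_configs = returns.shape
    # Contiguous time blocks; half go in-sample, the other half out-of-sample.
    blocks = np.array_split(np.arange(n_periods), n_blocks)
    logits = []
    for is_ids in combinations(range(n_blocks), n_blocks // 2):
        oos_ids = [b for b in range(n_blocks) if b not in is_ids]
        is_rows = np.concatenate([blocks[b] for b in is_ids])
        oos_rows = np.concatenate([blocks[b] for b in oos_ids])
        # Configuration that would have been selected on in-sample data.
        winner = int(np.argmax(sharpe(returns[is_rows])))
        oos_sharpe = sharpe(returns[oos_rows])
        # Relative out-of-sample rank of the winner, strictly inside (0, 1).
        rank = int(np.sum(oos_sharpe <= oos_sharpe[winner]))
        omega = rank / (n_configs + 1)
        logits.append(np.log(omega / (1 - omega)))
    # Share of splits where the winner lands at or below the OOS median.
    return float(np.mean(np.array(logits) <= 0))


# Synthetic demo: 20 configurations of pure noise (an assumption of this sketch).
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=(400, 20))
print("PBO on pure noise:", pbo_cscv(noise))
```

On pure noise the estimate should typically sit somewhere around one half, since the in-sample winner has no reason to rank well out of sample. A configuration with a large genuine edge pushes the estimate toward zero.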
Example
Ten thousand random strategies on equity returns
Strategies tested: 10,000
Best in-sample Sharpe: 2.4
Same strategy, out-of-sample Sharpe: 0.3
Deflated Sharpe: 0.1 (not significant)
The 'best' strategy from a 10,000-strategy search has an in-sample Sharpe that looks impressive and a deflated Sharpe that says you found random luck. Expect live performance to disappoint and, once costs are included, probably to lose money.
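The selection effect in this example is easy to reproduce on synthetic data. The sketch below is an illustration under stated assumptions, not a reconstruction of the figures above: returns are zero-mean Gaussian noise with 1% daily volatility, each sample is one year of 252 days, and the names `annualized_sharpe` and `best_of_n` are hypothetical. The exact numbers depend on the seed and on the sample length, so they will not match the table.

```python
import numpy as np


def annualized_sharpe(returns: np.ndarray, periods_per_year: int = 252) -> np.ndarray:
    """Annualized Sharpe ratio of each column of a (periods x strategies) matrix."""
    return np.sqrt(periods_per_year) * returns.mean(axis=0) / returns.std(axis=0, ddof=1)


def best_of_n(n_strategies: int = 10_000, n_days: int = 252, seed: int = 0):
    """Select the best in-sample strategy among strategies that are pure noise.

    Returns (in-sample Sharpe of the winner, out-of-sample Sharpe of the winner).
    """
    rng = np.random.default_rng(seed)
    # Zero-mean daily returns: by construction no strategy has any edge.
    in_sample = rng.normal(0.0, 0.01, size=(n_days, n_strategies))
    out_of_sample = rng.normal(0.0, 0.01, size=(n_days, n_strategies))
    is_sharpe = annualized_sharpe(in_sample)
    winner = int(np.argmax(is_sharpe))
    oos_sharpe = annualized_sharpe(out_of_sample[:, [winner]])[0]
    return float(is_sharpe[winner]), float(oos_sharpe)


best_is, best_oos = best_of_n()
print(f"best in-sample Sharpe:       {best_is:.2f}")
print(f"same strategy out of sample: {best_oos:.2f}")
```

Because every strategy here is noise, any gap between the two printed numbers comes purely from selecting the maximum of many trials. Lengthening the sample (a larger `n_days`) shrinks the spread of Sharpe estimates and with it the best in-sample value.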
Key Takeaways
Every parameter you tune is a hypothesis you test, so multiple testing applies to every backtest whether or not you correct for it.
Deflated Sharpe is the cleanest single-number check on whether a reported Sharpe survives the number of trials behind it.
If you can't reproduce the strategy on a held-out, never-touched dataset, treat it as overfit.
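For readers who want to see what the deflated Sharpe ratio computes, here is a minimal sketch following the published Bailey and López de Prado formula as commonly stated. The function names, argument names, and the example inputs are assumptions for illustration. The Sharpe ratio passed in is per period (not annualized), `kurtosis` is the raw kurtosis (3 for a normal distribution), and `sharpe_variance` is the variance of the Sharpe estimates across all trials.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """Approximate expected maximum Sharpe among n_trials trials with no skill."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z1 = _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    z2 = _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)


def deflated_sharpe_ratio(
    sharpe: float,
    n_obs: int,
    n_trials: int,
    sharpe_variance: float,
    skew: float = 0.0,
    kurtosis: float = 3.0,
) -> float:
    """Probability that the true Sharpe exceeds the best-of-n_trials benchmark."""
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    # Standard error term adjusts for non-normal returns via skew and kurtosis.
    denom = math.sqrt(1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe**2)
    z = (sharpe - benchmark) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)


# Hypothetical inputs: a daily Sharpe of 0.15 over 756 days, selected as the
# best of 10,000 trials whose Sharpe estimates have variance 1/756.
dsr = deflated_sharpe_ratio(0.15, n_obs=756, n_trials=10_000, sharpe_variance=1 / 756)
print(f"deflated Sharpe ratio: {dsr:.3f}")
```

The result is a probability, so values near 0.5 or below mean the observed Sharpe is no better than what the best of that many skill-less trials would be expected to show. Increasing `n_trials` raises the benchmark and lowers the result for the same observed Sharpe.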
Sources & References
- The Probability of Backtest Overfitting. Bailey, Borwein, López de Prado, Zhu (2014)