Bailey-Lopez de Prado PBO
Definition
Bailey-Lopez (PBO)
Probability of Backtest Overfitting (PBO) is a methodology by Bailey, Borwein, Lopez de Prado and Zhu (2014). For a set of N strategies (for example, parameter combinations), partition the time series into S non-overlapping subsets, evaluate each strategy on every way of splitting those subsets into equal in-sample and out-of-sample halves, then ask: how often does the in-sample best strategy fall at or below the out-of-sample median? PBO is that frequency, between 0 (never overfits) and 1 (always overfits).
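Written out, the frequency in the definition can be sketched as follows, following the paper's relative-rank and logit construction. For each of the C(S, S/2) in-sample/out-of-sample splits c, let ω_c be the out-of-sample rank of the in-sample winner among the N strategies (1 = worst, N = best). A logit at or below zero means the winner finished at or below the out-of-sample median:

```latex
\begin{aligned}
\bar{\omega}_c &= \frac{\omega_c}{N+1}, \qquad
\lambda_c = \ln\frac{\bar{\omega}_c}{1-\bar{\omega}_c}, \\
\mathrm{PBO} &= \frac{1}{\binom{S}{S/2}} \sum_{c=1}^{\binom{S}{S/2}} \mathbf{1}\!\left[\lambda_c \le 0\right]
\end{aligned}
```

With S = 16 the sum runs over C(16, 8) = 12,870 splits.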
Why it matters
PBO turns the abstract "is this overfit?" question into a number that can be estimated on the same data the strategy was developed on, with no held-out sample required. PBO above 0.5 means the in-sample winner lands below the out-of-sample median more often than not, which is worse than picking one of the candidates at random. Most retail strategy searches produce PBO above 0.7.
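Why 0.5 is the dividing line can be checked with a small simulation. This is a sketch under stated assumptions: i.i.d. Gaussian returns with no edge in any strategy, Sharpe as the metric, and a single 50/50 in-sample/out-of-sample split per trial rather than the full combinatorial scheme. `noise_below_median_rate` is a hypothetical helper name. With no real edge, the in-sample winner's out-of-sample rank is uniformly random, so it should land at or below the median about half the time.

```python
import numpy as np


def noise_below_median_rate(n_strategies: int = 50, n_periods: int = 252,
                            n_trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often a pure-noise search's in-sample
    winner finishes at or below the out-of-sample median (assumed setup:
    i.i.d. Gaussian returns, one 50/50 split per trial)."""
    rng = np.random.default_rng(seed)
    half = n_periods // 2
    hits = 0
    for _ in range(n_trials):
        # Zero-mean returns: no strategy has any genuine edge.
        r = rng.normal(0.0, 0.01, size=(2 * half, n_strategies))
        is_r, oos_r = r[:half], r[half:]
        # Per-period Sharpe of every strategy on each half.
        is_sr = is_r.mean(axis=0) / is_r.std(axis=0, ddof=1)
        oos_sr = oos_r.mean(axis=0) / oos_r.std(axis=0, ddof=1)
        best = int(np.argmax(is_sr))  # in-sample winner
        # Out-of-sample rank of the winner: 1 = worst, N = best.
        rank = int(np.sum(oos_sr <= oos_sr[best]))
        if rank / (n_strategies + 1) <= 0.5:
            hits += 1
    return hits / n_trials


print(noise_below_median_rate())
```

Expect a value near 0.5: that is the "no skill, no harm" baseline. A search whose PBO sits well above it is selecting strategies that do worse out-of-sample than a random pick from the same candidates.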
How it works
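List N strategies and arrange their returns as a T × N matrix (T periods, one column per strategy). Split the rows into S contiguous, non-overlapping blocks of equal length (S must be even; 16 is typical). Then form every combination of S/2 blocks as the in-sample set, with the remaining S/2 blocks as the out-of-sample set; for S = 16 that is C(16, 8) = 12,870 splits. For each split, compute the performance metric (Sharpe, by convention) of every strategy on the in-sample rows and on the out-of-sample rows, identify the in-sample best strategy, and find its out-of-sample rank percentile. PBO is the fraction of splits in which that percentile is at or below the median (50%).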
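The steps above, usually called combinatorially symmetric cross-validation (CSCV), can be sketched in NumPy. Assumptions: `returns` is a T × N array of per-period returns, the metric is a non-annualized Sharpe ratio, no strategy has zero return variance, and `pbo_cscv` is a hypothetical function name, not a published library API.

```python
import itertools

import numpy as np


def pbo_cscv(returns: np.ndarray, n_splits: int = 16) -> float:
    """Estimate PBO by combinatorially symmetric cross-validation.

    returns:  T x N array, one column of per-period returns per strategy.
    n_splits: S, the number of contiguous row blocks (must be even).
    """
    t, n = returns.shape
    if n_splits % 2 != 0:
        raise ValueError("n_splits must be even")
    # Partition the row indices into S contiguous, non-overlapping blocks.
    blocks = np.array_split(np.arange(t), n_splits)

    def sharpe(rows: np.ndarray) -> np.ndarray:
        # Per-period (non-annualized) Sharpe of every strategy on these rows.
        sub = returns[rows]
        return sub.mean(axis=0) / sub.std(axis=0, ddof=1)

    logits = []
    # Every choice of S/2 blocks is one in-sample set; the rest are out-of-sample.
    for is_ids in itertools.combinations(range(n_splits), n_splits // 2):
        is_set = set(is_ids)
        is_rows = np.concatenate([blocks[i] for i in is_ids])
        oos_rows = np.concatenate(
            [blocks[i] for i in range(n_splits) if i not in is_set])
        is_sr = sharpe(is_rows)
        oos_sr = sharpe(oos_rows)
        best = int(np.argmax(is_sr))  # in-sample winner
        # Out-of-sample rank of the winner: 1 = worst, N = best.
        rank = int(np.sum(oos_sr <= oos_sr[best]))
        rel_rank = rank / (n + 1)
        logits.append(np.log(rel_rank / (1.0 - rel_rank)))
    # Share of splits where the winner finished at or below the OOS median.
    return float(np.mean(np.array(logits) <= 0.0))


rng = np.random.default_rng(0)
sweep = rng.normal(0.0, 0.01, size=(800, 40))  # 40 hypothetical no-edge strategies
print(pbo_cscv(sweep, n_splits=8))
```

The sketch recomputes Sharpe from raw rows for every split, which keeps it readable but is slow at S = 16 with many strategies; precomputing per-block sums and sums of squares is a natural optimization.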
Example
Parameter sweep over 200 combinations, 5-year daily backtest
Best in-sample Sharpe: 1.9
PBO: 0.74
Out-of-sample expectation: below median
A PBO of 0.74 means that in 74% of the in-sample/out-of-sample splits, the strategy that won the in-sample search finished at or below the median of the 200 candidates out-of-sample. The 1.9 Sharpe is mostly noise.
Key Takeaways
PBO is computed on the same data — no holdout needed.
PBO > 0.5 is the threshold beyond which the search probably destroyed information.
Pair PBO with deflated Sharpe for a complete overfitting picture.
Sources & References
- The Probability of Backtest Overfitting — Bailey, Borwein, Lopez de Prado, Zhu (2014)