Backtesting & Validation Explainer

Bailey-Lopez de Prado PBO

Probability of Backtest Overfitting (PBO) is a methodology by Bailey, Borwein, Lopez de Prado and Zhu (2014). For a set of N strategies (parameter combinations), partition the time series into S non-overlapping subsets, evaluate each strategy on every subset, then ask: how often does the in-sample best strategy fall below the out-of-sample median? PBO is that frequency, between 0 (never overfits) and 1 (always overfits).

By Orbyd Editorial · AI Fin Hub Team


Why it matters

PBO turns the abstract "is this overfit?" question into a number computable on the same data the strategy was developed on, with no held-out sample required. A PBO above 0.5 means the in-sample winner falls below the out-of-sample median more often than not. Most retail strategy searches produce PBO above 0.7.

How it works

List N strategies (e.g., every combination in a parameter sweep). Split the time series into S non-overlapping subsets (typically 16). Compute the performance metric (Sharpe, by convention) for each strategy on each subset. Form every combination of S/2 subsets as the in-sample set, with the remaining S/2 as out-of-sample, giving C(S, S/2) combinations in all. In each combination, identify the in-sample best strategy and record its out-of-sample rank percentile. PBO is the fraction of combinations in which that percentile falls below the median, i.e., how often the in-sample winner loses to the typical strategy out-of-sample.
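The combinatorially symmetric cross-validation (CSCV) procedure above can be sketched in Python. This is a minimal sketch with NumPy; the function name, the contiguous equal-size partitioning, and the zero-risk-free, non-annualized Sharpe are illustrative choices, not prescriptions from the original paper:

```python
import itertools
import numpy as np

def pbo_cscv(returns, n_subsets=16):
    """Probability of Backtest Overfitting via CSCV, after
    Bailey, Borwein, Lopez de Prado and Zhu (2014).

    returns : (T, N) array of per-period returns for N strategies.
    """
    T, N = returns.shape
    # Partition the T periods into S contiguous, non-overlapping subsets.
    subsets = np.array_split(np.arange(T), n_subsets)
    half = n_subsets // 2
    logits = []
    for in_idx in itertools.combinations(range(n_subsets), half):
        out_idx = [i for i in range(n_subsets) if i not in in_idx]
        r_is = np.concatenate([returns[subsets[i]] for i in in_idx])
        r_oos = np.concatenate([returns[subsets[i]] for i in out_idx])
        # Sharpe per strategy on each half (zero risk-free rate,
        # no annualization -- ranks are unaffected by scaling).
        sr_is = r_is.mean(axis=0) / r_is.std(axis=0, ddof=1)
        sr_oos = r_oos.mean(axis=0) / r_oos.std(axis=0, ddof=1)
        best = np.argmax(sr_is)  # in-sample winner
        # Out-of-sample rank percentile of the in-sample winner.
        rank = np.argsort(np.argsort(sr_oos))[best]
        w = (rank + 1) / (N + 1)
        logits.append(np.log(w / (1 - w)))
    # PBO = fraction of splits where the winner fell at or below
    # the out-of-sample median (logit <= 0).
    return float(np.mean(np.array(logits) <= 0))
```

For S = 16 this evaluates C(16, 8) = 12,870 in-sample/out-of-sample splits; a smaller S (e.g., 8, giving 70 splits) keeps the run fast while exploring the same idea.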

Example

Parameter sweep over 200 combinations, 5-year daily backtest:

Best in-sample Sharpe: 1.9
PBO: 0.74
Out-of-sample expectation: below median

A PBO of 0.74 says the strategy that won the in-sample search is more likely than not to underperform the median strategy in that same search out-of-sample. The 1.9 Sharpe is mostly selection noise.
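The selection effect behind that number is easy to reproduce. The quick simulation below is illustrative only (the noise scale, seed, and sizes are assumptions, not the article's data): sweeping 200 pure-noise "strategies" over five years of daily returns and picking the best in-sample Sharpe routinely yields an impressive-looking winner, even though every strategy is skill-free.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sweep: 200 parameter combinations, each producing
# pure-noise daily returns over 5 years (~1260 trading days).
n_strategies, n_days = 200, 1260
returns = rng.normal(loc=0.0, scale=0.01, size=(n_days, n_strategies))

# Annualized in-sample Sharpe for every strategy.
sharpe = returns.mean(axis=0) / returns.std(axis=0, ddof=1) * np.sqrt(252)

print(f"best in-sample Sharpe: {sharpe.max():.2f}")    # inflated by selection
print(f"median Sharpe:         {np.median(sharpe):.2f}")  # ~0, the truth
```

The best of 200 noise strategies typically lands well above a Sharpe of 1 in-sample, while the median sits near zero, which is exactly the gap PBO is designed to expose.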

Key Takeaways

1. PBO is computed on the same data; no holdout needed.
2. PBO > 0.5 is the threshold beyond which the search probably destroyed information.
3. Pair PBO with the deflated Sharpe ratio for a complete overfitting picture.
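The deflated Sharpe ratio mentioned in takeaway 3 asks a complementary question: given how many strategies were tried, what is the probability that the observed Sharpe beats the best Sharpe a pure-noise search of that size would produce? A minimal stdlib-only sketch follows, based on Bailey and Lopez de Prado's "The Deflated Sharpe Ratio" (2014); the function names and default arguments are illustrative, and `sr_var` is the variance of the Sharpe estimates across the trials in the search:

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_N = NormalDist()                 # standard normal

def expected_max_sharpe(n_trials, sr_var):
    """Expected maximum Sharpe among n_trials skill-free strategies
    whose Sharpe estimates have cross-trial variance sr_var."""
    return sqrt(sr_var) * (
        (1 - EULER_GAMMA) * _N.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _N.inv_cdf(1 - 1 / (n_trials * e))
    )

def deflated_sharpe(sr, n_trials, sr_var, n_obs, skew=0.0, kurt=3.0):
    """Probability that the observed per-period Sharpe sr exceeds the
    expected max Sharpe of a noise-only search (kurt is non-excess,
    so kurt=3.0 corresponds to Gaussian returns)."""
    sr0 = expected_max_sharpe(n_trials, sr_var)
    z = (sr - sr0) * sqrt(n_obs - 1) / sqrt(
        1 - skew * sr + (kurt - 1) / 4 * sr ** 2
    )
    return _N.cdf(z)
```

Read the result like a confidence level: values near 1 mean the observed Sharpe is hard to explain as the lucky winner of the search, while values below 0.5 mean the winner looks no better than the expected maximum under pure noise.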


FAQ

What counts as a good PBO?

Below 0.3, the in-sample winner is more likely than not to beat the out-of-sample median as well. Above 0.5, the search is overfit. In between is a yellow zone: diagnose before deploying.


