Backtesting & Validation Explainer

Bailey-Lopez de Prado PBO

Probability of Backtest Overfitting (PBO) is a methodology by Bailey, Borwein, Lopez de Prado and Zhu (2014). For a set of N strategies (parameter combinations), partition the time series into S non-overlapping subsets, evaluate each strategy on every subset, then ask: how often does the in-sample best strategy fall below the out-of-sample median? PBO is that frequency, between 0 (never overfits) and 1 (always overfits).

By Orbyd Editorial · AI Fin Hub Team


Why it matters

PBO turns the abstract "is this overfit?" question into a number computable on the same data the strategy was developed on, with no held-out sample required. A PBO above 0.5 means the in-sample winner falls below the out-of-sample median more often than not. Most retail strategy searches produce PBO above 0.7.

How it works

List N strategies (e.g., every combination in a parameter sweep). Split the time series into S non-overlapping subsets (typically 16). Compute the performance metric (Sharpe, by convention) for each strategy on each subset. Form every combination of S/2 subsets as the in-sample set, with the remaining S/2 as out-of-sample, giving C(S, S/2) combinations in all. In each combination, identify the in-sample best strategy and record its out-of-sample rank percentile. PBO is the fraction of combinations in which that percentile falls below the median, i.e., how often the in-sample winner loses to the typical strategy out-of-sample.
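The combinatorially symmetric cross-validation (CSCV) procedure above can be sketched in Python. This is a minimal sketch with NumPy; the function name, the contiguous equal-size partitioning, and the zero-risk-free, non-annualized Sharpe are illustrative choices, not prescriptions from the original paper:

```python
import itertools
import numpy as np

def pbo_cscv(returns, n_subsets=16):
    """Probability of Backtest Overfitting via CSCV, after
    Bailey, Borwein, Lopez de Prado and Zhu (2014).

    returns : (T, N) array of per-period returns for N strategies.
    """
    T, N = returns.shape
    # Partition the T periods into S contiguous, non-overlapping subsets.
    subsets = np.array_split(np.arange(T), n_subsets)
    half = n_subsets // 2
    logits = []
    for in_idx in itertools.combinations(range(n_subsets), half):
        out_idx = [i for i in range(n_subsets) if i not in in_idx]
        r_is = np.concatenate([returns[subsets[i]] for i in in_idx])
        r_oos = np.concatenate([returns[subsets[i]] for i in out_idx])
        # Sharpe per strategy on each half (zero risk-free rate,
        # no annualization -- ranks are unaffected by scaling).
        sr_is = r_is.mean(axis=0) / r_is.std(axis=0, ddof=1)
        sr_oos = r_oos.mean(axis=0) / r_oos.std(axis=0, ddof=1)
        best = np.argmax(sr_is)  # in-sample winner
        # Out-of-sample rank percentile of the in-sample winner.
        rank = np.argsort(np.argsort(sr_oos))[best]
        w = (rank + 1) / (N + 1)
        logits.append(np.log(w / (1 - w)))
    # PBO = fraction of splits where the winner fell at or below
    # the out-of-sample median (logit <= 0).
    return float(np.mean(np.array(logits) <= 0))
```

For S = 16 this evaluates C(16, 8) = 12,870 in-sample/out-of-sample splits; a smaller S (e.g., 8, giving 70 splits) keeps the run fast while exploring the same idea.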

Example

Parameter sweep over 200 combinations, 5-year daily backtest:

Best in-sample Sharpe: 1.9
PBO: 0.74
Out-of-sample expectation: below median

A PBO of 0.74 says the strategy that won the in-sample search is more likely than not to underperform the median strategy in that same search out-of-sample. The 1.9 Sharpe is mostly selection noise.
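The selection effect behind that number is easy to reproduce. The quick simulation below is illustrative only (the noise scale, seed, and sizes are assumptions, not the article's data): sweeping 200 pure-noise "strategies" over five years of daily returns and picking the best in-sample Sharpe routinely yields an impressive-looking winner, even though every strategy is skill-free.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sweep: 200 parameter combinations, each producing
# pure-noise daily returns over 5 years (~1260 trading days).
n_strategies, n_days = 200, 1260
returns = rng.normal(loc=0.0, scale=0.01, size=(n_days, n_strategies))

# Annualized in-sample Sharpe for every strategy.
sharpe = returns.mean(axis=0) / returns.std(axis=0, ddof=1) * np.sqrt(252)

print(f"best in-sample Sharpe: {sharpe.max():.2f}")    # inflated by selection
print(f"median Sharpe:         {np.median(sharpe):.2f}")  # ~0, the truth
```

The best of 200 noise strategies typically lands well above a Sharpe of 1 in-sample, while the median sits near zero, which is exactly the gap PBO is designed to expose.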

Key Takeaways

1. PBO is computed on the same data; no holdout needed.
2. PBO > 0.5 is the threshold beyond which the search probably destroyed information.
3. Pair PBO with the deflated Sharpe ratio for a complete overfitting picture.
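The deflated Sharpe ratio mentioned in takeaway 3 asks a complementary question: given how many strategies were tried, what is the probability that the observed Sharpe beats the best Sharpe a pure-noise search of that size would produce? A minimal stdlib-only sketch follows, based on Bailey and Lopez de Prado's "The Deflated Sharpe Ratio" (2014); the function names and default arguments are illustrative, and `sr_var` is the variance of the Sharpe estimates across the trials in the search:

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_N = NormalDist()                 # standard normal

def expected_max_sharpe(n_trials, sr_var):
    """Expected maximum Sharpe among n_trials skill-free strategies
    whose Sharpe estimates have cross-trial variance sr_var."""
    return sqrt(sr_var) * (
        (1 - EULER_GAMMA) * _N.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _N.inv_cdf(1 - 1 / (n_trials * e))
    )

def deflated_sharpe(sr, n_trials, sr_var, n_obs, skew=0.0, kurt=3.0):
    """Probability that the observed per-period Sharpe sr exceeds the
    expected max Sharpe of a noise-only search (kurt is non-excess,
    so kurt=3.0 corresponds to Gaussian returns)."""
    sr0 = expected_max_sharpe(n_trials, sr_var)
    z = (sr - sr0) * sqrt(n_obs - 1) / sqrt(
        1 - skew * sr + (kurt - 1) / 4 * sr ** 2
    )
    return _N.cdf(z)
```

Read the result like a confidence level: values near 1 mean the observed Sharpe is hard to explain as the lucky winner of the search, while values below 0.5 mean the winner looks no better than the expected maximum under pure noise.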


FAQ

What counts as a good PBO?

Below 0.3, the in-sample winner is more likely than not to beat the out-of-sample median as well. Above 0.5, the search is overfit. In between is a yellow zone: diagnose before deploying.


