Probability of Backtest Overfitting (PBO) Formula
The Probability of Backtest Overfitting (PBO) estimates how often the strategy that looks best in-sample fails to stay above the median out-of-sample. Using combinatorially symmetric cross-validation (CSCV), it splits the data into many in-sample/out-of-sample pairs, ranks the chosen strategy out-of-sample each time, and reports the fraction of splits where it lands in the bottom half. A high PBO means the in-sample winner is likely a product of overfitting.
Formula
PBO = (1/S) x sum over splits of 1{ w_c <= 0.5 }
logit: lambda_c = ln( w_c / (1 - w_c) ), PBO = P(lambda_c <= 0)
where w_c is the out-of-sample relative rank of the in-sample best
Variables
S
Number of CSCV splits
How many in-sample/out-of-sample partitions the combinatorially symmetric cross-validation generates. Splitting the time series into M blocks (M even) gives M-choose-M/2 symmetric partitions, each used once.
w_c
Out-of-sample relative rank
For each split, the percentile rank (0 to 1) achieved out-of-sample by the strategy that ranked best in-sample. A value above 0.5 means the in-sample winner stayed above the out-of-sample median; at or below 0.5 it failed.
lambda_c
Logit of the rank
The log-odds transform of the relative rank, which spreads the distribution onto the real line. PBO is the probability this logit is at or below zero, equivalent to w_c at or below 0.5.
1{ . }
Indicator function
Equals 1 when the condition holds and 0 otherwise. Averaging the indicator across splits gives the empirical PBO.
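The two forms of the formula can be checked against each other with a short Python sketch. The function names and the five sample ranks are made up for illustration, and the ranks are assumed to lie strictly between 0 and 1 so the logit is defined.

```python
import math

def logit(w):
    """Log-odds of a relative rank w, assumed to satisfy 0 < w < 1."""
    return math.log(w / (1.0 - w))

def pbo_from_ranks(ranks):
    """Indicator form: share of splits with w_c <= 0.5."""
    return sum(1 for w in ranks if w <= 0.5) / len(ranks)

def pbo_from_logits(ranks):
    """Logit form: share of splits with lambda_c <= 0."""
    return sum(1 for w in ranks if logit(w) <= 0.0) / len(ranks)

# Hypothetical out-of-sample relative ranks from five splits.
ranks = [0.40, 0.75, 0.50, 0.20, 0.90]

print(pbo_from_ranks(ranks))   # 0.6
print(pbo_from_logits(ranks))  # 0.6
```

Three of the five ranks are at or below 0.5, so both forms give 0.6. A rank of exactly 0.5 maps to a logit of exactly 0 and counts as a failure in both.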
Step By Step
- 1. Split the performance matrix (returns of every trial strategy) into M equal time blocks, with M even.
  Example: cut the backtest into M = 16 blocks.
- 2. Form every split that assigns M/2 blocks to the in-sample set and the complementary M/2 blocks to the out-of-sample set.
  Example: 16 blocks taken 8 at a time give 12,870 symmetric splits.
- 3. In each split, pick the strategy with the best in-sample performance, then record its relative rank out-of-sample.
  Example: in one split the in-sample best ranks at the 40th percentile out-of-sample, so w_c = 0.40.
- 4. Compute the fraction of splits where the in-sample best lands at or below the out-of-sample median.
  Example: if 7,700 of 12,870 splits show w_c at or below 0.5, PBO = 7,700 / 12,870 ≈ 0.60.
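The four steps above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation. It makes three assumptions the text does not fix: performance is measured by an unannualized Sharpe ratio, the relative rank is computed as rank / (N + 1) so that it stays strictly inside (0, 1), and trailing rows that do not fill a whole block are dropped. The names `sharpe` and `cscv_pbo` and the random input data are hypothetical.

```python
from itertools import combinations

import numpy as np

def sharpe(returns):
    """Per-strategy mean / std of returns (assumed performance metric)."""
    return returns.mean(axis=0) / returns.std(axis=0, ddof=1)

def cscv_pbo(returns, n_blocks):
    """Estimate PBO from a (T x N) matrix of per-period strategy returns."""
    if n_blocks % 2 != 0:
        raise ValueError("n_blocks must be even")
    n_rows, n_strategies = returns.shape
    usable = n_rows - n_rows % n_blocks           # drop rows that do not fit
    blocks = np.split(np.arange(usable), n_blocks)  # step 1: M equal blocks

    ranks = []
    # Step 2: every choice of M/2 blocks as in-sample, the rest out-of-sample.
    for is_ids in combinations(range(n_blocks), n_blocks // 2):
        oos_ids = [b for b in range(n_blocks) if b not in is_ids]
        is_rows = np.concatenate([blocks[b] for b in is_ids])
        oos_rows = np.concatenate([blocks[b] for b in oos_ids])

        # Step 3: in-sample winner and its out-of-sample relative rank.
        best = int(np.argmax(sharpe(returns[is_rows])))
        oos_perf = sharpe(returns[oos_rows])
        rank = int(np.sum(oos_perf <= oos_perf[best]))  # 1 = worst, N = best
        ranks.append(rank / (n_strategies + 1))

    ranks = np.array(ranks)
    # Step 4: share of splits where the winner is at or below the median.
    return float(np.mean(ranks <= 0.5)), ranks

# Synthetic example: 20 strategies with no real edge, 240 periods, 6 blocks.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(240, 20))
pbo, ranks = cscv_pbo(returns, n_blocks=6)
print(len(ranks), pbo)  # 20 splits, PBO between 0 and 1
```

Six blocks keep the example fast (20 splits). With 16 blocks the same loop runs 12,870 times, so the Sharpe computation inside it dominates the runtime. A strategy with zero return variance inside a split would produce a division by zero in `sharpe`, which this sketch does not guard against.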
Worked Example
PBO from combinatorial cross-validation of a strategy search
Total symmetric splits S: 12,870
Splits with rank <= median: 7,700
PBO = (1/S) x (count of splits where w_c <= 0.5) = 7,700 / 12,870 = 0.5983.
PBO of about 0.60. The strategy that looked best in-sample fell into the bottom half out-of-sample in about 60% of splits. That is worse than the 50% expected when in-sample rank carries no information about out-of-sample rank, and a selection process with a real edge would score well below 50%. This is evidence that the in-sample selection was driven by overfitting, and the discovered edge is unlikely to survive live trading. A PBO below 0.5 is only a minimum bar: the closer it is to zero, the more the selection process can be trusted.
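The arithmetic in the worked example can be reproduced with the Python standard library. The variable names are illustrative; the table of split counts for other block sizes is added only to show how quickly S grows with M.

```python
import math

n_blocks = 16
splits = math.comb(n_blocks, n_blocks // 2)  # symmetric splits S
below_median = 7_700                         # splits with w_c <= 0.5

pbo = below_median / splits
print(splits, round(pbo, 4))  # 12870 0.5983

# Number of symmetric splits for a few even block counts M.
split_counts = {m: math.comb(m, m // 2) for m in (8, 10, 12, 14, 16)}
print(split_counts)  # {8: 70, 10: 252, 12: 924, 14: 3432, 16: 12870}
```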
Sources & References
- The Probability of Backtest Overfitting — Bailey, Borwein, Lopez de Prado, Zhu, Journal of Computational Finance (2017)
- Pseudo-Mathematics and Financial Charlatanism — Bailey, Borwein, Lopez de Prado, Zhu, Notices of the AMS (2014)