aifinhub
Backtesting & Validation Formula

Probability of Backtest Overfitting (PBO) Formula

The Probability of Backtest Overfitting (PBO) estimates how often the strategy that looks best in-sample fails to stay above the median out-of-sample. It uses combinatorially symmetric cross-validation (CSCV) to split the data into many in-sample/out-of-sample pairs. In each pair it ranks the in-sample winner out-of-sample, and it reports the fraction of splits where that winner lands in the bottom half. A high PBO means the in-sample winner is likely a product of overfitting.

By AI Fin Hub Research · AI Fin Hub Team


Formula


$$\mathrm{PBO} = \frac{1}{S}\sum_{c=1}^{S}\mathbf{1}\{w_c \le 0.5\}$$

$$\lambda_c = \ln\!\left(\frac{w_c}{1-w_c}\right), \qquad \mathrm{PBO} = P(\lambda_c \le 0)$$

Here $w_c$ is the out-of-sample relative rank, in split $c$, of the strategy that ranked best in-sample.
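Below is a minimal Python sketch that evaluates both forms of the formula on a list of out-of-sample relative ranks w_c. It shows that the indicator form and the logit form give the same answer. The function name `pbo_from_ranks` and the sample ranks are hypothetical and serve only as an illustration.

```python
import math


def pbo_from_ranks(ranks):
    """Empirical PBO from the out-of-sample relative ranks w_c of the in-sample best.

    Each w_c must lie strictly between 0 and 1 so that the logit is finite.
    Returns (indicator-form PBO, logit-form PBO).
    """
    if not ranks:
        raise ValueError("need at least one split")
    s = len(ranks)
    # Indicator form: share of splits where w_c <= 0.5.
    pbo_indicator = sum(1 for w in ranks if w <= 0.5) / s
    # Logit form: lambda_c = ln(w_c / (1 - w_c)); share of splits with lambda_c <= 0.
    logits = [math.log(w / (1.0 - w)) for w in ranks]
    pbo_logit = sum(1 for lam in logits if lam <= 0.0) / s
    return pbo_indicator, pbo_logit


# Hypothetical ranks from five splits.
print(pbo_from_ranks([0.40, 0.55, 0.70, 0.30, 0.50]))  # (0.6, 0.6)
```

Because ln(w / (1 - w)) <= 0 exactly when w <= 0.5, the two counts always match. The logit is still useful for plotting how the ranks are distributed.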

Variables

S

Number of CSCV splits

How many in-sample/out-of-sample partitions combinatorially symmetric cross-validation generates. Splitting the time series into an even number M of blocks gives S = C(M, M/2) partitions ("M choose M/2"), and each is used once.

w_c

Out-of-sample relative rank

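For each split, this is the relative rank that the in-sample best strategy achieves out-of-sample. It lies strictly between 0 and 1 and is typically the strategy's rank among the N trials divided by N + 1. A value above 0.5 means the in-sample winner stayed above the out-of-sample median; a value at or below 0.5 counts as a failure.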

lambda_c

Logit of the rank

The log-odds transform of the relative rank, which maps values in (0, 1) onto the whole real line. PBO is the probability that this logit is at or below zero, which is equivalent to w_c being at or below 0.5.

1{ . }

Indicator function

Equals 1 when the condition holds and 0 otherwise. Averaging the indicator across splits gives the empirical PBO.
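Two of the quantities above can be sketched in a few lines of Python. The sketch assumes N trial strategies and defines the relative rank as the 1-based out-of-sample rank divided by N + 1, so w_c never reaches 0 or 1 and the logit stays finite. The helper names `num_splits` and `relative_rank` are hypothetical, and so is the tie-handling rule.

```python
import math


def num_splits(m_blocks):
    """S = C(M, M/2): the number of ways to choose half of the M blocks as in-sample."""
    if m_blocks <= 0 or m_blocks % 2:
        raise ValueError("M must be a positive even number")
    return math.comb(m_blocks, m_blocks // 2)


def relative_rank(oos_perf, chosen):
    """Out-of-sample relative rank w_c of strategy index `chosen` among N strategies.

    The 1-based rank runs from 1 (worst) to N (best). Dividing by N + 1 keeps
    w_c strictly inside (0, 1). Tied strategies share the lowest rank (an assumption).
    """
    n = len(oos_perf)
    rank = 1 + sum(1 for x in oos_perf if x < oos_perf[chosen])
    return rank / (n + 1)


print(num_splits(16))                                  # 12870
print(relative_rank([0.2, -0.1, 0.5, 0.1], chosen=3))  # 0.4
```

In this example there are four strategies with out-of-sample scores [0.2, -0.1, 0.5, 0.1]. The last one ranks 2nd of 4, which gives w_c = 2/5 = 0.40.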

Step By Step

  1. Split the performance matrix (returns of every trial strategy) into M equal time blocks.

    For example, cut the backtest into M = 16 blocks, which are later assigned to in-sample and out-of-sample halves.

  2. Form every combination of M/2 blocks as the in-sample set, with the complementary M/2 blocks as the out-of-sample set.

    16 blocks taken 8 at a time give 12,870 symmetric splits.

  3. In each split, pick the strategy with the best in-sample performance, then record its relative rank out-of-sample.

    In one split the in-sample best ranks at the 40th percentile out-of-sample, so w_c = 0.40.

  4. Compute the fraction of splits where the in-sample best lands at or below the out-of-sample median.

    If 7,700 of 12,870 splits show w_c at or below 0.5, PBO = 7,700 / 12,870 = 0.60.
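The four steps can be combined into one end-to-end sketch. It makes several assumptions: a NumPy array `perf` of per-period returns with shape T x N (T periods, N trials), the unannualized Sharpe ratio as the performance metric, and tied strategies receiving the lower rank. The function names, the metric and the synthetic data are illustrative choices, not a reference implementation.

```python
import itertools

import numpy as np


def sharpe(returns):
    """Unannualized Sharpe ratio of each column (the assumed performance metric)."""
    return returns.mean(axis=0) / returns.std(axis=0, ddof=1)


def cscv_pbo(perf, m_blocks=16):
    """Estimate PBO with CSCV on a T x N matrix of trial returns.

    Returns the PBO estimate and the array of relative ranks w_c, one per split.
    """
    t, n = perf.shape
    if m_blocks % 2 or m_blocks > t:
        raise ValueError("m_blocks must be even and no larger than T")
    # Step 1: cut the T rows into M contiguous time blocks
    # (equal when T is divisible by M, otherwise differing by one row).
    blocks = np.array_split(np.arange(t), m_blocks)
    ranks = []
    # Step 2: each choice of M/2 blocks is in-sample; the complement is out-of-sample.
    for is_ids in itertools.combinations(range(m_blocks), m_blocks // 2):
        oos_ids = [b for b in range(m_blocks) if b not in is_ids]
        is_rows = np.concatenate([blocks[b] for b in is_ids])
        oos_rows = np.concatenate([blocks[b] for b in oos_ids])
        is_perf = sharpe(perf[is_rows])
        oos_perf = sharpe(perf[oos_rows])
        # Step 3: select the in-sample best and rank it out-of-sample
        # (1 = worst, N = best; ties get the lower rank), then scale by N + 1.
        best = int(np.argmax(is_perf))
        rank = 1 + int(np.sum(oos_perf < oos_perf[best]))
        ranks.append(rank / (n + 1))
    ranks = np.asarray(ranks)
    # Step 4: PBO is the share of splits with w_c <= 0.5 (equivalently logit <= 0).
    return float(np.mean(ranks <= 0.5)), ranks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(0.0, 0.01, size=(800, 20))  # 20 synthetic, skill-less trials
    pbo, w = cscv_pbo(noise, m_blocks=10)          # C(10, 5) = 252 splits
    print(f"{len(w)} splits, PBO = {pbo:.2f}")
```

With M = 16 the loop visits all 12,870 splits and computes two Sharpe vectors per split. The Sharpe ratio is only one option: any per-strategy score can replace `sharpe`, provided higher values mean better performance.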

Worked Example

PBO from combinatorial cross-validation of a strategy search

Total symmetric splits S: 12,870
Splits with w_c <= 0.5 (at or below the out-of-sample median): 7,700

PBO = (1/S) x (count of splits where w_c <= 0.5) = 7,700 / 12,870 = 0.5983.
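The arithmetic in the worked example can be checked in a few lines of Python. The only inputs are the figures given above.

```python
import math

splits = math.comb(16, 8)  # S for M = 16 blocks taken 8 at a time
failures = 7_700           # splits with w_c <= 0.5, from the worked example
pbo = failures / splits
print(splits, round(pbo, 4))  # prints: 12870 0.5983
```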

PBO of about 0.60. The strategy that looked best in-sample fell into the bottom half out-of-sample in about 60% of splits. That is worse than the roughly 50% expected if in-sample selection carried no information at all. A PBO above 0.5 is strong evidence that the in-sample selection was driven by overfitting, and the discovered edge is unlikely to survive live trading. A PBO below 0.5 is the minimum bar for treating the selection process as trustworthy, and a genuinely skilled process should sit well below it.

Common Variations

Performance degradation: the regression slope of out-of-sample on in-sample performance across splits, which quantifies how much edge is lost rather than just how often.
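Probabilistic Sharpe ratio: the probability that a single strategy's true Sharpe ratio exceeds a benchmark, given its track-record length and the skewness and kurtosis of its returns. It is a single-strategy confidence measure rather than a search-level overfitting probability.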
Deflated Sharpe ratio: corrects the Sharpe bar for the number of trials instead of cross-validating ranks.
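For the performance-degradation variant, a minimal sketch is an ordinary least-squares fit of out-of-sample performance on in-sample performance. It assumes you have collected, for each split, the in-sample and out-of-sample score of the selected strategy, for example by extending a CSCV loop to store both values. The name `degradation_slope` and the sample numbers are hypothetical.

```python
import numpy as np


def degradation_slope(is_perf, oos_perf):
    """OLS fit oos = slope * is + intercept across splits; returns (slope, intercept)."""
    slope, intercept = np.polyfit(np.asarray(is_perf, dtype=float),
                                  np.asarray(oos_perf, dtype=float), deg=1)
    return float(slope), float(intercept)


# Hypothetical per-split Sharpe ratios of the selected strategy.
slope, intercept = degradation_slope([1.0, 1.5, 2.0, 2.5], [0.4, 0.3, 0.2, 0.1])
print(round(slope, 3), round(intercept, 3))  # -0.2 0.6
```

A negative slope means that selections which looked better in-sample tended to perform worse out-of-sample. The slope measures how much edge is lost, while PBO measures how often the selection fails.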


Planning estimates only — not financial, tax, or investment advice.