Backtesting & Validation Calculator Guide

How to use Backtest Overfitting Score

From an uploaded backtest trade log, the calculator computes the Probability of Backtest Overfitting (PBO), the Deflated Sharpe Ratio (DSR), and the Probabilistic Sharpe Ratio (PSR), so you can quantify how much of an apparent edge is real and how much is selection bias.

By Orbyd Editorial · AI Fin Hub Team

What It Does

Use the calculator with intent

The calculator is aimed at quants and retail backtesters who tried more than a handful of parameter combinations and need to know whether the best one is genuinely skillful or just lucky.

Interpreting Results

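A PBO above ~0.5 means the in-sample winner landed below the out-of-sample median in more than half of the cross-validation splits, so the strategy is more likely overfit than skillful. The Deflated Sharpe Ratio corrects the headline Sharpe for the number of trials: the more variants you tried, the larger the selection penalty. A value that stays positive after that penalty points to some remaining edge, and step 5 below asks for at least 1.65 before trusting it.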
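To make the trial-count correction concrete, here is a minimal sketch based on the published Probabilistic and Deflated Sharpe Ratio formulas of Bailey and López de Prado. It is not the calculator's own code. The function names, the `sr_std` input (the standard deviation of Sharpe ratios across the variants you tried), and the choice to return a z-statistic alongside the probability are all assumptions made for illustration. The published DSR is a probability between 0 and 1; the 1.65 cutoff used in this guide resembles the one-sided 95% critical value of a z-statistic, so the sketch returns both. Sharpe inputs are per observation (per trade), not annualized.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def psr_z(sr, n_obs, skew=0.0, kurt=3.0, sr_benchmark=0.0):
    """z-statistic behind the Probabilistic Sharpe Ratio.

    sr is a per-observation Sharpe ratio; skew and kurt describe the
    return distribution (kurt=3.0 is the normal case).
    """
    denom = sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return (sr - sr_benchmark) * sqrt(n_obs - 1) / denom


def expected_max_sharpe(n_trials, sr_std):
    """Approximate expected best Sharpe among n_trials zero-skill variants."""
    if n_trials < 2:
        return 0.0  # a single trial carries no selection penalty
    return sr_std * (
        (1.0 - EULER_GAMMA) * _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * e))
    )


def deflated_sharpe(sr, n_obs, n_trials, sr_std, skew=0.0, kurt=3.0):
    """Return (z, probability) after raising the benchmark to the
    Sharpe that pure luck would be expected to produce."""
    sr0 = expected_max_sharpe(n_trials, sr_std)
    z = psr_z(sr, n_obs, skew, kurt, sr_benchmark=sr0)
    return z, _STD_NORMAL.cdf(z)
```

Calling `deflated_sharpe(0.10, 500, 1, 0.05)` and `deflated_sharpe(0.10, 500, 200, 0.05)` with the same observed Sharpe shows the effect: the second call raises the benchmark from zero to the luck-driven maximum, so its z-statistic is lower.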

Input Steps

Field by field

  1. Upload data

    Upload your trade log as a returns matrix (rows = trades, columns = strategy variants). Use at least 16 variants for a stable PBO estimate.

  2. Set parameters

    Set the number of CSCV (combinatorially symmetric cross-validation) partitions (default 16). The method splits the rows into an even number of blocks and evaluates every half-and-half combination, so more partitions give a more stable estimate at the cost of a longer runtime.

  3. Read PBO

    Read PBO (probability of backtest overfitting). Values above 0.5 mean the in-sample winner is likely to underperform out-of-sample.

  4. Read DSR

    Read the Deflated Sharpe Ratio alongside PBO. PBO measures relative overfitting; DSR measures absolute statistical significance after the multiple-testing penalty.

  5. Apply the thresholds

    If PBO > 0.5 or DSR < 1.65, treat the backtest as curve-fit. Reduce the variant count, lengthen the sample, or test on truly fresh data before live deployment.
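The steps above can be sketched in code. What follows is a minimal CSCV-style PBO estimate, assuming the returns matrix is a NumPy array with rows as trades and columns as variants, and assuming the per-column Sharpe ratio is the performance measure. The function name `pbo_cscv` and its parameters are hypothetical and do not describe the calculator's internals.

```python
from itertools import combinations

import numpy as np


def pbo_cscv(returns, n_partitions=16):
    """Estimate PBO from a (rows x variants) returns matrix.

    For every way of choosing half of the row blocks as in-sample,
    pick the in-sample Sharpe winner and record where it ranks
    out-of-sample. PBO is the share of splits in which the winner
    falls below the out-of-sample median.
    """
    if n_partitions % 2 != 0:
        raise ValueError("n_partitions must be even")
    returns = np.asarray(returns, dtype=float)
    n_rows, n_variants = returns.shape
    blocks = np.array_split(np.arange(n_rows), n_partitions)

    def sharpe(block_ids):
        rows = np.concatenate([blocks[i] for i in block_ids])
        sample = returns[rows]
        return sample.mean(axis=0) / sample.std(axis=0, ddof=1)

    logits = []
    for train_ids in combinations(range(n_partitions), n_partitions // 2):
        test_ids = [i for i in range(n_partitions) if i not in train_ids]
        best = int(np.argmax(sharpe(train_ids)))
        oos = sharpe(test_ids)
        # Rank of the in-sample winner out-of-sample: 1 = worst, N = best.
        rank = int((oos <= oos[best]).sum())
        w = rank / (n_variants + 1)
        logits.append(np.log(w / (1.0 - w)))

    # A negative logit means the winner ranked below the OOS median.
    return float(np.mean(np.array(logits) < 0))
```

With the default of 16 partitions the loop visits 12,870 splits (16 choose 8), which is why the partition count drives runtime. A result from this sketch could then be checked against the step 5 rule, for example `pbo_cscv(matrix) > 0.5`.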

Common Scenarios

Use realistic starting points

Single backtest, no parameter sweep

Trade log rows: 500
Trials tried: 1
In-sample Sharpe: 1.4

PBO near zero, DSR ≈ raw Sharpe. With one trial there is no selection bias to deflate.

Heavy parameter sweep

Trade log rows: 500
Trials tried: 200
In-sample Sharpe: 2.1

DSR falls well below 2.1 once the trial count is honest; PBO above 0.5 means the chosen parameter set probably came from luck.
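The gap between the two scenarios can be illustrated with a small simulation. The sketch below generates variants that have no edge at all and reports the best Sharpe among them. The function name `best_noise_sharpe`, the seed, and the use of standard-normal returns are assumptions for illustration only. The Sharpe values it produces are per trade and unannualized, so they are not on the same scale as the 1.4 and 2.1 quoted above.

```python
import numpy as np


def best_noise_sharpe(n_trials, n_rows, seed=0):
    """Simulate n_trials zero-skill variants with n_rows trades each.

    Returns the best per-trade Sharpe found and the full array of
    Sharpe ratios, so the winner can be compared with a typical trial.
    """
    rng = np.random.default_rng(seed)
    returns = rng.normal(0.0, 1.0, size=(n_rows, n_trials))
    sharpes = returns.mean(axis=0) / returns.std(axis=0, ddof=1)
    return float(sharpes.max()), sharpes


if __name__ == "__main__":
    best, sharpes = best_noise_sharpe(n_trials=200, n_rows=500)
    print(f"median Sharpe across trials: {np.median(sharpes):.3f}")
    print(f"best Sharpe across trials:   {best:.3f}")
```

Every variant here is pure noise, yet the best of 200 still shows a clearly positive Sharpe. That luck-driven maximum is the quantity a trial-count correction subtracts before crediting a strategy with skill.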

FAQ

Questions people ask next

The short answers readers usually want after the first pass.

What is the Probability of Backtest Overfitting (PBO)?

PBO is the probability that the strategy with the best in-sample Sharpe ranks below the median out-of-sample. Bailey, Borwein, Lopez de Prado, and Zhu (2017) introduced the metric. A PBO above 0.5 means the in-sample winner is more likely to underperform than outperform in production; that is, the backtest is more curve-fit than predictive.

Planning estimates only — not financial, tax, or investment advice.