How to use Backtest Overfitting Score
From an uploaded backtest trade log, it computes the Probability of Backtest Overfitting (PBO), the Deflated Sharpe Ratio (DSR), and the Probabilistic Sharpe Ratio (PSR, the probability of skill), so you can quantify how much of an apparent edge is real and how much is selection bias.
What It Does
Use the calculator with intent
It is built for quants and retail backtesters who have tried more than a handful of parameter combinations and need to know whether the best one is genuinely skillful or just lucky.
Interpreting Results
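PBO is the share of in-sample/out-of-sample splits in which the strategy that ranked best in-sample lands at or below the median of all trials out-of-sample. A PBO above ~0.5 therefore means the in-sample winner more often than not underperforms its peers out-of-sample: the strategy is more likely overfit than skillful. The Deflated Sharpe corrects the headline Sharpe for the number of trials; a positive value indicates edge that survives the selection penalty.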
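The trial correction can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the calculator's code: it follows the Bailey & López de Prado formulation, in which the benchmark SR0 is the expected maximum Sharpe of N zero-skill trials and the DSR is the Probabilistic Sharpe Ratio measured against SR0. Sharpe ratios are per period (not annualized), `var_trials` (the variance of the Sharpe estimates across trials) is an assumed input, and the function names are hypothetical. It returns both the z-statistic and its normal probability; a probability of 0.95 corresponds to z ≈ 1.645.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
STD_NORMAL = NormalDist()  # mean 0, sigma 1


def expected_max_sharpe(n_trials: int, var_trials: float) -> float:
    """Hypothetical helper: benchmark SR0, the expected max Sharpe of n zero-skill trials."""
    if n_trials < 2:
        return 0.0  # a single trial involves no selection, so there is no haircut
    z_a = STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    z_b = STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * e))
    return sqrt(var_trials) * ((1.0 - EULER_GAMMA) * z_a + EULER_GAMMA * z_b)


def deflated_sharpe(sr: float, n_obs: int, skew: float, kurt: float,
                    n_trials: int, var_trials: float) -> tuple[float, float]:
    """Hypothetical helper: (z, probability) for an observed per-period Sharpe `sr`.

    `kurt` is raw (non-excess) kurtosis, so normally distributed returns have kurt = 3.
    The probability is the PSR evaluated against the deflated benchmark SR0.
    """
    sr0 = expected_max_sharpe(n_trials, var_trials)
    # Skew/fat-tail adjustment taken from the PSR's Sharpe standard error.
    denom = sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr0) * sqrt(n_obs - 1) / denom
    return z, STD_NORMAL.cdf(z)


# Made-up inputs: per-period Sharpe 0.1 over 500 observations with normal-looking
# returns; trial Sharpes vary across the sweep with variance 0.0025 (std 0.05).
z1, p1 = deflated_sharpe(0.1, 500, skew=0.0, kurt=3.0, n_trials=1, var_trials=0.0025)
z200, p200 = deflated_sharpe(0.1, 500, skew=0.0, kurt=3.0, n_trials=200, var_trials=0.0025)
print(f"1 trial:    z = {z1:.2f}, DSR = {p1:.3f}")
print(f"200 trials: z = {z200:.2f}, DSR = {p200:.3f}")
```

With these made-up numbers the same Sharpe clears the 1.65 bar as a single trial but turns negative once 200 trials are admitted, which is the selection penalty at work.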
Input Steps
Field by field
1. Upload data. Upload your trade log as a returns matrix (rows = trades, columns = strategy variants). Use at least 16 variants for a stable PBO estimate.
2. Set parameters. Set the number of CSCV (combinatorially symmetric cross-validation) partitions S (default 16; CSCV requires an even number). More partitions give a more stable estimate but a longer runtime, because CSCV evaluates every split of the S blocks into two halves: C(16, 8) = 12,870 splits at the default.
3. Read outputs. Read PBO (probability of backtest overfitting): values above 0.5 mean the in-sample winner is likely to underperform out-of-sample.
4. Read outputs. Read the Deflated Sharpe Ratio alongside it. PBO measures relative overfitting; DSR measures absolute statistical significance after a multiple-testing penalty.
5. If PBO > 0.5 or DSR < 1.65 (roughly the one-sided 95% critical value of a standard normal), treat the backtest as curve-fit. Reduce the variant count, lengthen the sample, or test on truly fresh data before live deployment.
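The PBO part of these steps can be sketched as follows. This is a minimal sketch of combinatorially symmetric cross-validation as described by Bailey, Borwein, López de Prado and Zhu, not the calculator's own implementation. It assumes the matrix rows are aligned in time across variants, scores each variant by its per-period Sharpe ratio, splits the rows into S contiguous blocks, and counts how often the in-sample winner ranks at or below the out-of-sample median. `pbo_cscv` and `sharpe` are hypothetical names, and the data in the usage lines are simulated.

```python
from itertools import combinations

import numpy as np


def sharpe(block: np.ndarray) -> np.ndarray:
    """Per-period Sharpe ratio of each column (assumes no constant columns)."""
    return block.mean(axis=0) / block.std(axis=0, ddof=1)


def pbo_cscv(returns: np.ndarray, n_partitions: int = 16) -> float:
    """Probability of Backtest Overfitting via CSCV on a (rows x variants) matrix."""
    if n_partitions % 2:
        raise ValueError("n_partitions must be even")
    n_rows, n_variants = returns.shape
    # S contiguous, near-equal blocks of row indices (time order is preserved).
    blocks = np.array_split(np.arange(n_rows), n_partitions)
    logits = []
    # Every way of choosing S/2 blocks as in-sample; the rest are out-of-sample.
    for is_ids in combinations(range(n_partitions), n_partitions // 2):
        oos_ids = [i for i in range(n_partitions) if i not in is_ids]
        is_rows = np.concatenate([blocks[i] for i in is_ids])
        oos_rows = np.concatenate([blocks[i] for i in oos_ids])
        best = np.argmax(sharpe(returns[is_rows]))  # in-sample winner
        oos_sr = sharpe(returns[oos_rows])
        # OOS rank of the IS winner: 1 = worst, n_variants = best.
        rank = int(np.sum(oos_sr < oos_sr[best])) + 1
        omega = rank / (n_variants + 1)  # relative rank in (0, 1)
        logits.append(np.log(omega / (1.0 - omega)))
    # PBO: share of splits where the IS winner lands at or below the OOS median.
    return float(np.mean(np.asarray(logits) <= 0.0))


# Simulated usage: 32 zero-skill variants, then the same data with one genuinely
# profitable variant. S = 8 keeps the demo fast (70 splits instead of 12,870).
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=(1000, 32))
skilled = noise.copy()
skilled[:, 0] += 0.005
pbo_noise = pbo_cscv(noise, n_partitions=8)
pbo_skilled = pbo_cscv(skilled, n_partitions=8)
print(f"PBO, pure noise: {pbo_noise:.2f}; PBO, one skilled variant: {pbo_skilled:.2f}")
```

On pure noise the in-sample winner is a coin flip out-of-sample, so PBO tends to land somewhere around 0.5; the variant with real drift keeps winning out-of-sample and drives PBO to zero.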
Common Scenarios
Use realistic starting points
Single backtest, no parameter sweep
Trade log rows: 500; trials tried: 1; in-sample Sharpe: 1.4
PBO near zero, DSR ≈ raw Sharpe. With one trial there is no selection bias to deflate.
Heavy parameter sweep
Trade log rows: 500; trials tried: 200; in-sample Sharpe: 2.1
DSR falls well below 2.1 once the trial count is honest, and a PBO above 0.5 would mean the chosen parameter set probably owes its performance to luck.
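The gap between the two scenarios can be made concrete with the expected-maximum benchmark used by the Deflated Sharpe construction (Bailey & López de Prado). Under their normal approximation, with N trials, V the variance of the trials' Sharpe estimates, γ ≈ 0.5772 the Euler-Mascheroni constant and Φ⁻¹ the standard normal quantile function, a worked evaluation for N = 200 is:

```latex
\widehat{SR}_0 = \sqrt{V}\left[(1-\gamma)\,\Phi^{-1}\!\left(1-\tfrac{1}{N}\right)
               + \gamma\,\Phi^{-1}\!\left(1-\tfrac{1}{N e}\right)\right]

N = 200:\quad \widehat{SR}_0 \approx \sqrt{V}\,\bigl[0.4228 \times 2.576 + 0.5772 \times 2.904\bigr]
          \approx 2.77\,\sqrt{V}
```

So the best of 200 zero-skill variants is expected to score about 2.8 cross-trial standard deviations above zero by luck alone, and the observed Sharpe is judged against that bar. For N = 1 the formula is undefined (Φ⁻¹(0) = -∞); with no selection the natural benchmark is zero, which is why the single-backtest scenario has nothing to deflate.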
Try These Tools
Run the numbers next
Walk-Forward Validator
Upload a returns CSV. Rolling or expanding IS/OOS windows, per-window Sharpe, walk-forward efficiency, and a concatenated OOS equity curve. Catches regime.
Deflated Sharpe Ratio Calculator
Bailey & López de Prado deflated Sharpe — corrects observed Sharpe for selection bias across K trials. Reports deflated Sharpe, PSR (probability of skill).
Synthetic Market Data Generator
Generate synthetic price series — geometric Brownian motion, GARCH(1,1) with volatility clustering, regime-switching bull/bear, or copula-linked.
Related Content
Keep the topic connected
Overfitting
Overfitting in trading-strategy backtests: how multiple-testing inflates apparent edges and the diagnostics that catch it.
Bailey-López de Prado PBO
Probability of Backtest Overfitting: a combinatorial test that estimates how likely your best in-sample strategy is to underperform out-of-sample.
Walk-Forward Optimization
Walk-forward optimization: rolling-window train/test that mimics live deployment. Why anchored vs sliding matters and the gotchas in window sizing.