Calculator

Backtest Overfitting Score

Author: AI Fin Hub Research

Calculator for the probability of backtest overfitting (PBO), estimated via combinatorially symmetric cross-validation (CSCV), alongside the deflated Sharpe ratio (DSR). Load a CSV of strategy returns; get a plain-English verdict.

Published: Apr 20, 2026

Inputs: Form inputs / CSV
Runtime: Instant
Privacy: Client-side · no upload
API key: Not required

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results.

1 · Upload your backtest returns

Wide-format CSV: one column per candidate strategy, one row per observation. An optional date column may come first. Returns are interpreted as simple (non-log) daily returns. All computation runs in your browser; nothing is sent to a server.
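To make the expected layout concrete, here is a minimal sketch of how a wide-format returns CSV could be turned into a numeric matrix. The helper name `parseReturnsCsv`, the column names in the demo, and the date-detection heuristic are all illustrative assumptions, not the tool's actual parser. It assumes comma separators, a header row, and no quoted fields.

```javascript
// Hypothetical helper (not the tool's parser): read a wide-format returns CSV.
// Assumes a header row, comma separators, and no quoted fields.
function parseReturnsCsv(text) {
  const lines = text.trim().split(/\r?\n/).filter((l) => l.trim() !== "");
  const header = lines[0].split(",").map((h) => h.trim());
  const body = lines.slice(1).map((l) => l.split(",").map((c) => c.trim()));

  // Heuristic: the first column holds dates if its first cell is not a number.
  const hasDates = body.length > 0 && !Number.isFinite(Number(body[0][0]));
  const offset = hasDates ? 1 : 0;

  const names = header.slice(offset);
  const dates = hasDates ? body.map((r) => r[0]) : null;
  const returns = body.map((r, i) =>
    r.slice(offset).map((c) => {
      const v = Number(c);
      if (c === "" || !Number.isFinite(v)) {
        throw new Error(`Non-numeric return at data row ${i + 1}`);
      }
      return v; // simple (non-log) return, e.g. 0.01 for +1%
    })
  );
  return { names, dates, returns };
}

// Two strategies, two daily observations, with a leading date column.
const demo = parseReturnsCsv(
  "date,fast_ma,slow_ma\n2024-01-02,0.010,-0.002\n2024-01-03,-0.004,0.003\n"
);
console.log(demo.names, demo.returns.length);
```

Each inner array of `returns` is one observation (row) across all strategies, which matches the rows-by-columns orientation described above.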

What this tool answers

If you backtested many candidate strategies and picked the best one, how likely is it that the winner has a genuine edge rather than having simply won a lottery? Two complementary signals:

PBO (Probability of Backtest Overfitting, via Combinatorially Symmetric Cross-Validation): fraction of CSCV splits in which the in-sample winner ranks below the median out-of-sample. High PBO = likely overfit.
DSR (Deflated Sharpe Ratio): probability that the Sharpe is statistically real, adjusted for how many strategies you tested and how non-normal the returns are. Low DSR = Sharpe probably a coincidence.

Load the synthetic demo for a working example, or upload your own CSV. See the methodology page for formulas and references.

How to use

Step-by-step

1. Provide your backtest as a returns matrix (rows = observations, columns = strategy variants). Use at least 16 variants for a stable PBO estimate.
2. Set the number of CSCV partitions (default 16). More partitions give a more stable estimate but a longer runtime.
3. Read PBO (probability of backtest overfitting): values above 0.5 mean the in-sample winner is more likely than not to rank below the median out-of-sample.
4. Read the Deflated Sharpe Ratio alongside it. PBO measures relative overfitting; DSR measures absolute statistical significance after a multiple-testing penalty.
5. If PBO > 0.5 or DSR < 0.95, treat the backtest as curve-fit. Reduce the variant count, lengthen the sample, or test on truly fresh data before live deployment.
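The runtime trade-off in step 2 can be made concrete. Assuming the partition setting corresponds to the number of blocks S in standard CSCV, the number of in-sample/out-of-sample splits evaluated is the binomial coefficient C(S, S/2), which grows quickly with S. The function name `cscvSplitCount` is hypothetical and used here only for illustration.

```javascript
// Number of ways to pick S/2 of S blocks as the in-sample set: C(S, S/2).
function cscvSplitCount(S) {
  if (!Number.isInteger(S) || S < 2 || S % 2 !== 0) {
    throw new Error("S must be an even integer >= 2");
  }
  const k = S / 2;
  let count = 1;
  for (let i = 1; i <= k; i++) {
    // After step i, count equals C(S - k + i, i), so it stays an exact integer.
    count = (count * (S - k + i)) / i;
  }
  return count;
}

for (const S of [8, 12, 16, 20]) {
  console.log(S, cscvSplitCount(S));
}
// 8 -> 70, 12 -> 924, 16 -> 12870, 20 -> 184756
```

Under that assumption, the default of 16 means 12,870 splits, and moving to 20 multiplies the work by roughly 14.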

For agents

Use in an agent

Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.

import { compute } from "https://aifinhub.io/engines/backtest-overfitting-score.js";

Contract: /contracts/backtest-overfitting-score.json

Questions people ask next

FAQ

What is Probability of Backtest Overfitting (PBO)?

PBO is the probability that the strategy with the best in-sample Sharpe ranks below the median out-of-sample. Bailey, Borwein, Lopez de Prado, and Zhu (2017) introduced the metric. A PBO above 0.5 means the in-sample winner is more likely than not to land in the bottom half out-of-sample, i.e., the backtest is more curve-fit than predictive.

How is PBO computed from a trade log?

Combinatorially Symmetric Cross-Validation (CSCV): split the rows of the returns matrix into S equal blocks (S even), then form every combination that assigns S/2 blocks to the in-sample set and the remaining S/2 to the out-of-sample set. For each split, find the in-sample top performer and check whether it ranks below the out-of-sample median. PBO = the number of splits where it does, divided by the total number of splits. The tool exposes intermediate ranks for inspection.
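The procedure can be sketched in a few dozen lines. This is an illustrative implementation under stated assumptions, not the tool's engine: the function name `pboCscv` is hypothetical, blocks are contiguous runs of rows, trailing rows that do not fill a block are dropped, performance is the unannualized Sharpe ratio, and a winner sitting exactly at the median is not counted as below it.

```javascript
// Sketch of CSCV-based PBO. `returns` is a T x N array (rows = observations,
// columns = strategies); S is an even number of contiguous row blocks.
function pboCscv(returns, S) {
  const N = returns[0].length;
  const blockLen = Math.floor(returns.length / S); // trailing rows are dropped
  if (S % 2 !== 0 || blockLen < 2 || N < 2) throw new Error("bad inputs");

  // Per-block sums and sums of squares, so each split is cheap to score.
  const sum = [], sq = [];
  for (let b = 0; b < S; b++) {
    sum.push(new Array(N).fill(0));
    sq.push(new Array(N).fill(0));
    for (let t = b * blockLen; t < (b + 1) * blockLen; t++) {
      for (let n = 0; n < N; n++) {
        sum[b][n] += returns[t][n];
        sq[b][n] += returns[t][n] ** 2;
      }
    }
  }

  // Unannualized Sharpe of every strategy over a set of blocks.
  const sharpe = (blocks) => {
    const m = blocks.length * blockLen;
    return Array.from({ length: N }, (_, n) => {
      let s = 0, q = 0;
      for (const b of blocks) { s += sum[b][n]; q += sq[b][n]; }
      const mean = s / m;
      const variance = (q - m * mean * mean) / (m - 1);
      return variance > 0 ? mean / Math.sqrt(variance) : 0;
    });
  };

  let splits = 0, belowMedian = 0;
  const choose = (start, picked) => {
    if (picked.length === S / 2) {
      const oosBlocks = [];
      for (let b = 0; b < S; b++) if (!picked.includes(b)) oosBlocks.push(b);
      const is = sharpe(picked), oos = sharpe(oosBlocks);
      const winner = is.indexOf(Math.max(...is));
      // Relative OOS rank in (0, 1): 1/(N+1) is worst, N/(N+1) is best.
      const rank = 1 + oos.filter((x) => x < oos[winner]).length;
      if (rank / (N + 1) < 0.5) belowMedian++;
      splits++;
      return;
    }
    for (let b = start; b < S; b++) choose(b + 1, [...picked, b]);
  };
  choose(0, []);
  return { pbo: belowMedian / splits, splits };
}

// Strategy 0 dominates in every block, so the in-sample winner never
// falls below the out-of-sample median.
const rows = [];
for (let i = 0; i < 4; i++) {
  rows.push([0.02, 0.01, -0.01, -0.02], [0.01, -0.01, 0.01, -0.01]);
}
console.log(pboCscv(rows, 4)); // { pbo: 0, splits: 6 }
```

Caching per-block sums means each of the C(S, S/2) splits costs O(S·N) rather than a full pass over the data, which matters once S reaches the default of 16.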

What's the Deflated Sharpe Ratio?

An adjustment to the standard Sharpe ratio, due to Bailey and Lopez de Prado, that accounts for skewness, kurtosis, and the number of trials run. A DSR above 0.95 (95% confidence) is the rough threshold for 'this Sharpe is unlikely to be from random search'. The tool reports both the raw and the deflated values.
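For reference, the deflated Sharpe ratio as published by Bailey and Lopez de Prado (2014) can be written as below. Whether this tool implements exactly this form is an assumption; the methodology page is the authoritative source. Here \widehat{SR} is the observed, unannualized Sharpe ratio of the selected strategy, T the number of return observations, \hat\gamma_3 the skewness and \hat\gamma_4 the (non-excess) kurtosis of its returns, N the number of trials, V[\widehat{SR}_n] the variance of Sharpe ratios across trials, \gamma the Euler-Mascheroni constant (about 0.5772), and \Phi the standard normal CDF.

```latex
\mathrm{DSR} = \Phi\!\left(
  \frac{\left(\widehat{SR} - SR_0\right)\sqrt{T - 1}}
       {\sqrt{1 - \hat\gamma_3\,\widehat{SR} + \frac{\hat\gamma_4 - 1}{4}\,\widehat{SR}^{2}}}
\right),
\qquad
SR_0 = \sqrt{V\!\left[\widehat{SR}_n\right]}
\left(
  (1 - \gamma)\,\Phi^{-1}\!\left[1 - \frac{1}{N}\right]
  + \gamma\,\Phi^{-1}\!\left[1 - \frac{1}{N e}\right]
\right)
```

SR_0 is the Sharpe ratio one would expect from the best of N unskilled trials, so it rises with N and with the dispersion of results across trials. For normally distributed returns (\hat\gamma_3 = 0, \hat\gamma_4 = 3) the denominator reduces to \sqrt{1 + \widehat{SR}^2/2}.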

How many strategy variants do I need to compute PBO?

At least 16 for stable estimates; ideally 32+. The tool will compute PBO with fewer but flags low-N estimates with a warning. If you only ran 2-3 variants, your PBO is more noise than signal.

Does a low PBO mean my strategy will work live?

No. PBO measures relative overfitting across the variants you tested, not absolute predictive power. A strategy can have low PBO (your best variant is genuinely better than the average variant) and still lose money live if all variants are weak. Combine PBO with deflated Sharpe and out-of-sample equity-curve inspection.

Used in

Decision workflows that use this tool

Goal-driven flows that bundle this tool with adjacent ones.

Validate Your Strategy
Pressure-test a quant or LLM-augmented strategy before paper-trading or production.

Complementary tools

Walk-Forward Validator

Upload a returns CSV. Rolling or expanding IS/OOS windows, per-window Sharpe, walk-forward efficiency, and a concatenated OOS equity curve. Catches regime.

Risk-Adjusted Returns Calculator

Paste a returns CSV. Sharpe, Sortino, Calmar, Omega, alpha, beta, tracking error, information ratio, max drawdown, and tail moments — plus.

Returns Distribution Analyzer

Paste a returns CSV. Histogram, normal-overlay, QQ plot, skewness, excess kurtosis, Jarque-Bera test, tail-weight index. See why Sharpe alone misleads.