Calculator
Backtest Overfitting Score
Probability of backtest overfitting (PBO) calculator via CSCV, plus the deflated Sharpe ratio. Load a strategy returns CSV; get a plain-English verdict.
- Inputs: Form inputs / CSV
- Runtime: Instant
- Privacy: Client-side · no upload
- API key: Not required
- Methodology: Open →
1 · Upload your backtest returns
Wide-format CSV: one column per candidate strategy, one row per observation. Optional date column as the first column. Returns are interpreted as simple (non-log) daily returns. All computation runs in your browser — nothing uploaded.
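As a rough illustration of the expected layout, the sketch below parses a wide-format CSV into a returns matrix. The function name `parseReturnsCsv` is hypothetical and is not the tool's own parser; it assumes a header row, comma delimiters, no quoted fields, and that a non-numeric first cell marks a date column.

```javascript
// Minimal sketch: wide-format CSV text -> { names, returns }.
// Assumptions (not from the tool): header row present, comma-delimited, no quoted fields.
function parseReturnsCsv(text) {
  const rows = text.trim().split(/\r?\n/).map((line) => line.split(","));
  const header = rows[0].map((h) => h.trim());
  const body = rows.slice(1);
  // Treat the first column as a date column when its first value is not numeric.
  const hasDate = body.length > 0 && Number.isNaN(Number(body[0][0]));
  const start = hasDate ? 1 : 0;
  const names = header.slice(start);
  // returns[t][n] = simple return of strategy n at observation t
  const returns = body.map((r) => r.slice(start).map(Number));
  return { names, returns };
}
```

Each row of `returns` is one observation and each column one candidate strategy, matching the layout described above.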
What this tool answers
If you backtested many candidate strategies and picked the best one, how likely is it that the winner has a real edge rather than being the luckiest draw? Two complementary signals:
- PBO (Probability of Backtest Overfitting, via Combinatorially-Symmetric Cross-Validation): fraction of splits where the in-sample winner ranks below median out-of-sample. High PBO = likely overfit.
- DSR (Deflated Sharpe Ratio): probability that the Sharpe is statistically real, adjusted for how many strategies you tested and how non-normal the returns are. Low DSR = Sharpe probably a coincidence.
Load the synthetic demo for a working example, or upload your own CSV. See the methodology page for formulas and references.
How to use
Step-by-step
1. Upload your backtest returns as a matrix (rows = return observations, columns = strategy variants). Use at least 16 variants for a stable PBO estimate.
2. Set the number of CSCV partitions (default 16). More partitions give a more stable estimate but a longer runtime.
3. Read PBO (probability of backtest overfitting). Values above 0.5 mean the in-sample winner is likely to underperform out-of-sample.
4. Read the Deflated Sharpe Ratio alongside. PBO measures relative overfitting; DSR measures absolute statistical significance after the multiple-testing penalty.
5. If PBO > 0.5 or DSR < 0.95 (a one-sided z-score below roughly 1.65), treat the backtest as curve-fit. Reduce the variant count, lengthen the sample, or test on truly fresh data before live deployment.
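The decision rule in the last step can be sketched as a small function. The name `overfitVerdict` and its options are hypothetical and not part of the tool's API. It assumes `dsr` is on the probability scale (0 to 1), with the 0.95 default taken from the 95% confidence level quoted in the FAQ.

```javascript
// Minimal sketch of the final decision step; thresholds are parameters so they can be tuned.
// Assumption: dsr is a probability in [0, 1], not a z-score.
function overfitVerdict(pbo, dsr, { pboMax = 0.5, dsrMin = 0.95 } = {}) {
  const reasons = [];
  if (pbo > pboMax) reasons.push(`PBO ${pbo.toFixed(2)} is above ${pboMax}`);
  if (dsr < dsrMin) reasons.push(`DSR ${dsr.toFixed(2)} is below ${dsrMin}`);
  // Either signal alone is enough to treat the backtest as curve-fit.
  return { curveFit: reasons.length > 0, reasons };
}
```

Returning the reasons alongside the flag keeps the verdict explainable: a caller can show which of the two signals tripped.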
For agents
Use in an agent
Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.
import { compute } from "https://aifinhub.io/engines/backtest-overfitting-score.js";
Contract: /contracts/backtest-overfitting-score.json
Full agent guide →
Questions people ask next
FAQ
What is Probability of Backtest Overfitting (PBO)?
PBO is the probability that the strategy with the best in-sample Sharpe ranks below median out-of-sample. Bailey, Borwein, Lopez de Prado, and Zhu (2017) introduced the metric. A PBO above 0.5 means the in-sample winner is more likely to underperform than outperform in production — i.e., the backtest is more curve-fit than predictive.
How is PBO computed from a trade log?
Combinatorially Symmetric Cross-Validation (CSCV): split the rows of the returns matrix into S equal-sized blocks (S even), take every combination of S/2 blocks as the in-sample set with the remaining S/2 blocks as the out-of-sample set, and count how often the strategy with the best in-sample Sharpe ranks below the out-of-sample median. PBO is that count divided by the total number of combinations, C(S, S/2). The tool exposes intermediate ranks for inspection.
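The CSCV procedure described above can be sketched in a few dozen lines. This is an illustrative implementation under stated assumptions, not the tool's engine: the helper names (`sharpe`, `combinations`, `pbo`) are hypothetical, Sharpe is computed per observation without annualizing (only ranks matter), trailing rows that do not fill a block are dropped, and the relative rank is taken as rank / (N + 1) with "below median" meaning strictly less than 0.5.

```javascript
// Per-observation Sharpe: mean / sample standard deviation (0 when variance is not positive).
function sharpe(xs) {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  return variance > 0 ? mean / Math.sqrt(variance) : 0;
}

// All k-element index combinations drawn from 0..n-1.
function combinations(n, k) {
  const out = [];
  const walk = (start, picked) => {
    if (picked.length === k) { out.push(picked.slice()); return; }
    for (let i = start; i < n; i++) {
      picked.push(i);
      walk(i + 1, picked);
      picked.pop();
    }
  };
  walk(0, []);
  return out;
}

// returns[t][n]: T observations by N strategies. S is the (even) number of blocks.
function pbo(returns, S) {
  if (S < 2 || S % 2 !== 0) throw new Error("S must be an even number >= 2");
  const N = returns[0].length;
  const size = Math.floor(returns.length / S); // remainder rows are dropped
  const blocks = Array.from({ length: S }, (_, s) => returns.slice(s * size, (s + 1) * size));
  const combos = combinations(S, S / 2);
  let below = 0;
  for (const combo of combos) {
    const inSample = new Set(combo);
    const isRows = [];
    const oosRows = [];
    blocks.forEach((block, s) => {
      const target = inSample.has(s) ? isRows : oosRows;
      for (const row of block) target.push(row);
    });
    const isSharpe = [];
    const oosSharpe = [];
    for (let n = 0; n < N; n++) {
      isSharpe.push(sharpe(isRows.map((r) => r[n])));
      oosSharpe.push(sharpe(oosRows.map((r) => r[n])));
    }
    // In-sample winner, then its relative rank out-of-sample.
    const best = isSharpe.indexOf(Math.max(...isSharpe));
    const rank = oosSharpe.filter((v) => v <= oosSharpe[best]).length;
    if (rank / (N + 1) < 0.5) below++;
  }
  return below / combos.length;
}
```

With 16 blocks, `combinations(16, 8)` yields 12,870 in-sample/out-of-sample splits, which is why raising the partition count lengthens the runtime.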
What's the Deflated Sharpe Ratio?
Lopez de Prado's adjustment to the standard Sharpe ratio that accounts for skew, kurtosis, and the number of trials run. A DSR above 0.95 (95% confidence) is the rough threshold for 'this Sharpe is unlikely to be from random search'. The tool reports both the raw and the deflated Sharpe ratio.
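For reference, the adjustment is usually written as follows. This is a sketch of the commonly cited form, and the tool's exact implementation may differ in details; see the methodology page. The symbols are assumptions introduced here: the Sharpe estimate of the selected strategy is per observation (not annualized), T is the number of observations, N the number of trials, V the variance of the Sharpe estimates across trials, the skewness and the non-excess kurtosis (3 for a normal distribution) are those of the returns, and gamma is the Euler-Mascheroni constant.

```latex
% Expected maximum Sharpe across N unskilled trials (the benchmark to beat)
SR_0 = \sqrt{V\!\left[\{\widehat{SR}_n\}\right]}
       \left( (1-\gamma)\,\Phi^{-1}\!\left[1 - \tfrac{1}{N}\right]
            + \gamma\,\Phi^{-1}\!\left[1 - \tfrac{1}{N e}\right] \right),
\qquad \gamma \approx 0.5772

% Deflated Sharpe Ratio: probability that the true Sharpe exceeds SR_0
\widehat{DSR} = \Phi\!\left(
  \frac{\left(\widehat{SR} - SR_0\right)\sqrt{T-1}}
       {\sqrt{1 - \hat{\gamma}_3\,\widehat{SR} + \frac{\hat{\gamma}_4 - 1}{4}\,\widehat{SR}^{2}}}
\right)
```

More trials raise the benchmark, and negative skew or fat tails widen the denominator, so both push the DSR down.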
How many strategy variants do I need to compute PBO?
At least 16 for stable estimates; ideally 32+. The tool will compute PBO with fewer but flags low-N estimates with a warning. If you only ran 2-3 variants, your PBO is more noise than signal.
Does a low PBO mean my strategy will work live?
No. PBO measures relative overfitting across the variants you tested, not absolute predictive power. A strategy can have low PBO (your best variant is genuinely better than the average variant) and still lose money live if all variants are weak. Combine PBO with deflated Sharpe and out-of-sample equity-curve inspection.
Related deep dive
Read further
Long-form context behind the tool output.
- Pillar · Guide · 10 min
The 2026 Engineer's Guide to AI in Markets
An engineer's map of where LLMs, MCP servers, and market-data APIs fit into a 2026 trading stack — and where they still break. Direct, no hype, no grift.
- Tutorial · Runnable · 12 min
Did You Overfit? PBO and Deflated Sharpe
A practical tutorial on the two best-documented tests for backtest overfitting — PBO via CSCV and the Deflated Sharpe Ratio. Runnable Python + tool.
- Tutorial · Runnable · 7 min
Signal Orthogonality: Why Ensembles Become One Bet
A 10-signal ensemble with pairwise correlation 0.8 is effectively a 1.5-signal ensemble. The math, a two-minute diagnostic, and three axes that work.
Complementary tools
Users of this tool often explore
Walk-Forward Validator
Upload a returns CSV. Rolling or expanding IS/OOS windows, per-window Sharpe, walk-forward efficiency, and a concatenated OOS equity curve. Catches regime.
Risk-Adjusted Returns Calculator
Paste a returns CSV. Sharpe, Sortino, Calmar, Omega, alpha, beta, tracking error, information ratio, max drawdown, and tail moments — plus.
Returns Distribution Analyzer
Paste a returns CSV. Histogram, normal-overlay, QQ plot, skewness, excess kurtosis, Jarque-Bera test, tail-weight index. See why Sharpe alone misleads.