Methodology: Walk-Forward Validation Visualizer

Inputs

A CSV file with at least one column of strategy returns. Optional date and benchmark columns are recognised.
Number of windows N (2–20).
Train fraction per window, train_pct (0.5–0.9).
Mode: rolling (a fixed-length in-sample window slides forward) or anchored (the in-sample window always starts at t=0 and grows with each step).

Window construction

For N windows on n observations, with k = 0, …, N − 1 and all intervals given as half-open, zero-based row ranges:

window_span  = floor(n / N)
train_span   = floor(window_span · train_pct)
test_span    = window_span − train_span

Rolling window k:
  IS  = [k · test_span,  k · test_span + train_span)
  OOS = [k · test_span + train_span,  k · test_span + train_span + test_span)

Anchored window k:
  IS  = [0,  train_span + k · test_span)
  OOS = [train_span + k · test_span,  train_span + (k+1) · test_span)
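The index arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of the formulas as written, not the tool's actual source; the function name build_windows and the tuple layout are hypothetical choices made for the example.

```python
import math
from typing import List, Tuple

# ((is_start, is_end), (oos_start, oos_end)), all half-open row ranges
Window = Tuple[Tuple[int, int], Tuple[int, int]]


def build_windows(n: int, n_windows: int, train_pct: float,
                  mode: str = "rolling") -> List[Window]:
    """Return IS/OOS row ranges for each window k = 0 .. n_windows - 1."""
    if mode not in ("rolling", "anchored"):
        raise ValueError("mode must be 'rolling' or 'anchored'")
    window_span = n // n_windows                     # floor(n / N)
    train_span = math.floor(window_span * train_pct)
    test_span = window_span - train_span

    windows: List[Window] = []
    for k in range(n_windows):
        # The OOS slice is identical in both modes.
        oos_start = k * test_span + train_span
        oos_end = oos_start + test_span
        # Rolling: IS start slides by test_span. Anchored: IS always starts at 0.
        is_start = k * test_span if mode == "rolling" else 0
        windows.append(((is_start, oos_start), (oos_start, oos_end)))
    return windows
```

For example, build_windows(1000, 5, 0.75) gives window_span = 200, train_span = 150 and test_span = 50, so the first rolling window is IS [0, 150), OOS [150, 200) and the fifth is IS [200, 350), OOS [350, 400). In anchored mode the fifth window has the same OOS slice but IS [0, 350).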

Per-window Sharpe

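Each window's Sharpe ratio is annualized with a factor of √252, which assumes daily returns and 252 trading days per year. No risk-free rate is subtracted: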

SR_window = mean(returns) / stdev(returns) · √252
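A minimal sketch of this calculation using only the Python standard library follows. The function name annualized_sharpe is hypothetical, and the choice of the sample standard deviation (n − 1 denominator) is an assumption, since the formula above does not say which estimator is used.

```python
import math
import statistics
from typing import Sequence


def annualized_sharpe(returns: Sequence[float],
                      periods_per_year: int = 252) -> float:
    """mean / stdev * sqrt(periods_per_year); NaN when undefined."""
    if len(returns) < 2:
        return float("nan")          # stdev needs at least two points
    sd = statistics.stdev(returns)   # sample stdev: an assumption
    if sd == 0:
        return float("nan")          # constant returns: ratio undefined
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)
```

Returning NaN for very short or constant windows keeps a degenerate window from producing an infinite Sharpe that would dominate the cross-window mean; whether the tool handles these cases the same way is not stated here.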

IS → OOS drop

The headline number is the proportional loss in Sharpe ratio from in-sample to out-of-sample, where each mean is taken across the N windows:

drop = (mean(IS_Sharpe) − mean(OOS_Sharpe)) / |mean(IS_Sharpe)|

A drop above 50% (drop > 0.5) is a strong overfitting signal. Drops of 20–50% are typical for honest strategies once costs are included. Drops below 20% are uncommon and worth checking for look-ahead leakage.
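The drop formula and the thresholds above can be sketched as follows. The names sharpe_drop and classify_drop are hypothetical, and the treatment of the exact 20% and 50% boundaries is an assumption, since the text does not say which band they belong to.

```python
import statistics
from typing import Sequence


def sharpe_drop(is_sharpes: Sequence[float],
                oos_sharpes: Sequence[float]) -> float:
    """(mean IS - mean OOS) / |mean IS|, as a fraction (0.5 means 50%)."""
    mean_is = statistics.fmean(is_sharpes)
    mean_oos = statistics.fmean(oos_sharpes)
    if mean_is == 0:
        return float("nan")  # proportional loss undefined for zero IS Sharpe
    return (mean_is - mean_oos) / abs(mean_is)


def classify_drop(drop: float) -> str:
    """Map a drop to the bands described in the text."""
    if drop > 0.5:
        return "strong overfitting signal"
    if drop >= 0.2:          # boundary handling is an assumption
        return "typical"
    return "unusually small: check for look-ahead leakage"
```

With in-sample Sharpes of 2.0 and 1.0 and out-of-sample Sharpes of 1.0 and 0.5, the means are 1.5 and 0.75, so the drop is 0.75 / 1.5 = 0.5, which sits at the upper edge of the typical band.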

References

Pardo, R. (2008). The Evaluation and Optimization of Trading Strategies, 2nd ed., Wiley. ISBN: 978-0-470-12801-5.
Bailey, D. H., Borwein, J., López de Prado, M., Zhu, Q. J. (2017). "The probability of backtest overfitting." Journal of Computational Finance 20(4): 39–69. DOI: 10.21314/JCF.2016.322.

Limitations

The tool tests the stability of realised Sharpe ratios. It does not refit a model; you supply pre-computed strategy returns.
Overlapping returns (intraday holding periods, leveraged ETFs) violate the i.i.d. assumption behind the Sharpe calculation; consider Newey-West-adjusted variants.
Anchored mode reuses early in-sample rows in every window, so a single bad early sample biases all in-sample Sharpes.
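
One possible form of the Newey-West adjustment mentioned above replaces the plain variance in the Sharpe denominator with a long-run variance built from Bartlett-weighted autocovariances. The sketch below is an illustration under that assumption and is not part of the tool; the function name newey_west_sharpe and the lags parameter are hypothetical, and the number of lags has to be chosen by the user.

```python
import math
from typing import Sequence


def newey_west_sharpe(returns: Sequence[float], lags: int,
                      periods_per_year: int = 252) -> float:
    """Annualized Sharpe using a Newey-West (Bartlett) long-run variance."""
    n = len(returns)
    if n < 2 or lags < 0 or lags >= n:
        return float("nan")
    mu = sum(returns) / n
    dev = [r - mu for r in returns]

    def autocov(j: int) -> float:
        # Lag-j autocovariance with 1/n normalisation.
        return sum(dev[t] * dev[t - j] for t in range(j, n)) / n

    # Bartlett weights 1 - j / (lags + 1) keep the estimate non-negative.
    long_run_var = autocov(0) + 2.0 * sum(
        (1.0 - j / (lags + 1)) * autocov(j) for j in range(1, lags + 1)
    )
    if long_run_var <= 0:
        return float("nan")
    return mu / math.sqrt(long_run_var) * math.sqrt(periods_per_year)
```

With lags = 0 this reduces to the ordinary ratio computed with the population standard deviation. For positively autocorrelated returns with a positive mean, adding lags raises the long-run variance and lowers the reported Sharpe.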

External resources

Advances in Financial Machine Learning, Ch. 7 (López de Prado 2018, Wiley)