How to Run Walk-Forward Validation
A single train-test split throws away data and depends on one arbitrary cut point. Walk-forward validation fixes both by rolling the split across the whole history, mirroring how a live strategy is periodically re-fit and then traded forward. Done right, it produces a performance estimate that respects the arrow of time. Done wrong, it leaks future information and flatters the result. The choices that separate a clean walk-forward from a leaky one are laid out below.
Guide Steps
Move through it in order
Each step focuses on one decision so you can keep momentum without losing the thread.
- 1
Choose anchored or rolling windows
An anchored window keeps the start of the training set fixed and grows it over time, so the model always sees all history to date. A rolling window keeps the training length fixed and discards the oldest data as it advances, so the model adapts to recent regimes and forgets the distant past. Anchored suits stable relationships; rolling suits strategies whose edge depends on current conditions. Test both, because the difference reveals how regime-dependent the edge is.
If anchored and rolling give very different results, the strategy is sensitive to old data. That is information about the edge, not a nuisance to average away.
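The two window modes differ only in where each training block starts. A minimal sketch, assuming observations are indexed 0 to n_obs - 1 in time order and the test block advances by its own length each step; the function name `walk_forward_windows` and its parameters are illustrative, not taken from any library:

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_obs: int, train_len: int, test_len: int, anchored: bool = False
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in time order.

    anchored=True  -> training always starts at index 0 and grows.
    anchored=False -> training keeps a fixed length and drops the oldest data.
    """
    if train_len <= 0 or test_len <= 0:
        raise ValueError("window lengths must be positive")
    test_start = train_len
    while test_start + test_len <= n_obs:
        train_start = 0 if anchored else test_start - train_len
        yield range(train_start, test_start), range(test_start, test_start + test_len)
        test_start += test_len  # step forward by one test block


rolling = list(walk_forward_windows(10, train_len=4, test_len=2))
anchored = list(walk_forward_windows(10, train_len=4, test_len=2, anchored=True))

for train, test in rolling:
    print("rolling ", list(train), "->", list(test))
for train, test in anchored:
    print("anchored", list(train), "->", list(test))
```

Both modes produce identical test blocks, so any difference in out-of-sample results comes from the training history alone.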
Use the tool: Walk-Forward Validation Visualizer (Playgrounds)
Paste a strategy returns CSV, get per-window in-sample vs out-of-sample Sharpe and the IS→OOS drop. Rolling and anchored window modes. Browser-only.
- 2
Set the fit and test window lengths
The training window must be long enough to estimate the parameters stably. The test window must be long enough to produce a meaningful out-of-sample record but short enough that re-fitting stays realistic. A common pattern is a training window several times the length of the test window. The ratio matters: too short a test window and each out-of-sample slice is noise; too long and the strategy goes stale before the next re-fit.
Match the test window to how often you would actually re-optimize live. Validating with monthly re-fits while planning to re-fit yearly tests a strategy you will not run.
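Before committing to window lengths, it helps to check what they buy you. The sketch below assumes a rolling scheme where the test block advances by its own length; `window_budget` is a hypothetical helper, and the figures of 2,520 daily bars, a 756-bar training window, and a 252-bar test window are made-up inputs for illustration:

```python
def window_budget(n_obs: int, train_len: int, test_len: int) -> dict:
    """Summarize how many out-of-sample folds a window choice yields."""
    if train_len <= 0 or test_len <= 0:
        raise ValueError("window lengths must be positive")
    if n_obs < train_len + test_len:
        raise ValueError("history is too short for even one train/test pair")
    n_folds = (n_obs - train_len) // test_len
    oos_obs = n_folds * test_len
    return {
        "n_folds": n_folds,                       # number of re-fits
        "oos_obs": oos_obs,                       # total out-of-sample observations
        "train_test_ratio": train_len / test_len,
        "unused_tail": n_obs - train_len - oos_obs,  # bars after the last full test block
    }


budget = window_budget(n_obs=2520, train_len=756, test_len=252)
print(budget)
```

With these inputs the history supports seven re-fits at a 3:1 training-to-test ratio. If the live plan is a yearly re-fit on daily data, a 252-bar test window is the matching choice under the assumption of roughly 252 trading days per year.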
Use the tool: Walk-Forward Validator (Playgrounds)
Upload a returns CSV. Rolling or expanding IS/OOS windows, per-window Sharpe, walk-forward efficiency, and a concatenated OOS equity curve. Catches regime.
- 3
Re-optimize on each training window
Within each training block, run your full optimization to pick the parameters, then freeze them and apply them unchanged to the following test block. The key discipline is that the test block is never used to choose parameters. Repeat for every window. This reproduces the live process where you periodically re-fit on recent history and then trade the result forward without peeking at what comes next.
Log the chosen parameters for every window. If they swing wildly from window to window, the optimization is fitting noise rather than a stable edge.
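The fit-freeze-apply loop can be sketched as follows. Everything strategy-specific here is an assumption for illustration: a toy rule that trades the sign of the trailing mean return, a grid of candidate lookbacks, in-sample Sharpe as the selection criterion, and synthetic Gaussian returns. The part that carries over is the shape of the loop: pick on the training block, freeze, apply to the test block, log the choice.

```python
import random
from statistics import mean, pstdev


def strategy_returns(returns, lookback, start, stop):
    """Toy rule: hold the sign of the trailing mean over `lookback` bars.

    The signal at bar t reads returns[t - lookback:t] only, so it never
    sees bar t or anything after it.
    """
    out = []
    for t in range(start, stop):
        if t < lookback:
            out.append(0.0)  # not enough history yet: stay flat
            continue
        position = 1.0 if mean(returns[t - lookback:t]) > 0 else -1.0
        out.append(position * returns[t])
    return out


def sharpe(rets):
    sd = pstdev(rets)
    return mean(rets) / sd if sd > 0 else 0.0


def walk_forward(returns, train_len, test_len, candidates):
    """Rolling walk-forward: returns (parameter log, stitched OOS returns)."""
    log, oos = [], []
    test_start = train_len
    while test_start + test_len <= len(returns):
        train_start = test_start - train_len
        # Choose the parameter using the training block only.
        best = max(
            candidates,
            key=lambda lb: sharpe(strategy_returns(returns, lb, train_start, test_start)),
        )
        # Freeze it and apply it unchanged to the following test block.
        oos.extend(strategy_returns(returns, best, test_start, test_start + test_len))
        log.append({"test_start": test_start, "lookback": best})
        test_start += test_len
    return log, oos


random.seed(0)
returns = [random.gauss(0.0005, 0.01) for _ in range(500)]  # synthetic data
candidates = [5, 10, 20]
log, oos = walk_forward(returns, train_len=200, test_len=50, candidates=candidates)
for row in log:
    print(row)
```

A quick leakage check is to overwrite the data after the first test block and confirm that the first window's chosen parameter and out-of-sample returns do not change.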
- 4
Aggregate only the out-of-sample slices
Stitch together the test-block results into a single out-of-sample equity curve and compute performance from that, ignoring the in-sample training results entirely. This concatenated out-of-sample record is the honest estimate of how the strategy would have performed had you run it forward with periodic re-fitting. Reporting in-sample numbers, even alongside the out-of-sample ones, invites the temptation to quote the flattering ones.
Count the total number of out-of-sample observations. A walk-forward with only a handful of test periods has too little out-of-sample data to draw conclusions from.
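Stitching can be as simple as concatenating the test-block returns in time order. The sketch below assumes each block holds simple per-period returns that compound, and that 252 periods make a year; `aggregate_oos` and the sample numbers are illustrative only:

```python
import math
from statistics import mean, stdev


def aggregate_oos(oos_blocks, periods_per_year=252):
    """Concatenate out-of-sample blocks and summarize them.

    In-sample results are deliberately not an input to this function.
    """
    stitched = [r for block in oos_blocks for r in block]
    if len(stitched) < 2:
        raise ValueError("need at least two out-of-sample observations")
    equity, level = [], 1.0
    for r in stitched:
        level *= 1.0 + r  # compound simple returns
        equity.append(level)
    sd = stdev(stitched)
    ann_sharpe = mean(stitched) / sd * math.sqrt(periods_per_year) if sd > 0 else 0.0
    return {"n_obs": len(stitched), "equity": equity, "sharpe": ann_sharpe}


blocks = [[0.01, -0.02], [0.03, 0.00]]  # two tiny test blocks, made-up numbers
summary = aggregate_oos(blocks)
print(summary["n_obs"], round(summary["equity"][-1], 6), round(summary["sharpe"], 3))
```

Reporting `n_obs` next to the Sharpe makes it obvious when the out-of-sample record is too short to support a conclusion.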
- 5
Deflate the aggregated result for the search
Walk-forward gives an honest time-ordered estimate, but it does not by itself correct for how many strategies you searched to arrive at the one you walked forward. If you ran walk-forward on dozens of candidate strategies and kept the best, that selection still inflates the result. Feed the aggregated out-of-sample Sharpe and your trial count into a deflated Sharpe ratio calculation to close this remaining gap.
The trial count includes every strategy you walked forward and discarded, not just the parameters within the surviving one.
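A minimal sketch of the deflation step, following the deflated Sharpe ratio formula as published by Bailey and López de Prado as best I understand it: the observed Sharpe is compared against the expected maximum Sharpe among N unskilled trials, and the gap is scaled by sample length, skewness, and kurtosis. The function names are illustrative. The inputs are assumptions you must supply: a per-period (not annualized) Sharpe, the number of out-of-sample observations, the trial count, and the standard deviation of Sharpe estimates across the trials.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials: int, trial_sharpe_std: float) -> float:
    """Expected best Sharpe among n_trials strategies with no true edge."""
    if n_trials < 2:
        return 0.0  # a single trial involves no selection
    z = NormalDist()
    return trial_sharpe_std * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sharpe, n_obs, n_trials, trial_sharpe_std, skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe exceeds the selection-adjusted benchmark.

    `sharpe` is per-period; `kurtosis` is raw kurtosis (3.0 for a normal).
    """
    if n_obs < 2:
        raise ValueError("need at least two observations")
    benchmark = expected_max_sharpe(n_trials, trial_sharpe_std)
    variance_term = 1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe ** 2
    if variance_term <= 0:
        raise ValueError("skew/kurtosis inputs give a non-positive variance term")
    stat = (sharpe - benchmark) * math.sqrt(n_obs - 1) / math.sqrt(variance_term)
    return NormalDist().cdf(stat)


# Made-up inputs: daily Sharpe of 0.10 over 500 out-of-sample days.
one_trial = deflated_sharpe(0.10, n_obs=500, n_trials=1, trial_sharpe_std=0.05)
fifty_trials = deflated_sharpe(0.10, n_obs=500, n_trials=50, trial_sharpe_std=0.05)
print(round(one_trial, 4), round(fifty_trials, 4))
```

The same observed Sharpe looks far less convincing once fifty discarded candidates are counted, which is why the trial count has to include everything you tried.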
Use the tool: Deflated Sharpe Ratio Calculator (Calculators)
Bailey & López de Prado deflated Sharpe: corrects observed Sharpe for selection bias across K trials. Reports deflated Sharpe, PSR (probability of skill).
Common Mistakes
The misses that undo good inputs
Using the test window to pick parameters
If the test block influences parameter selection, it is no longer out of sample and the entire walk-forward becomes an elaborate in-sample fit. The test block must be touched exactly once, for measurement only.
Reporting the in-sample curve as the result
The in-sample equity curve reflects the optimization, not future performance. Only the concatenated out-of-sample slices estimate what live trading would have produced.
Leaking future data through restated or survivorship-biased inputs
Walk-forward respects time only if the data does too. Point-in-time errors, restated fundamentals, or a universe that excludes delisted names smuggle the future into the training window regardless of how the windows are arranged.
Sources & References
- Advances in Financial Machine Learning. Marcos López de Prado, Wiley (2018)
- The Probability of Backtest Overfitting. Bailey, Borwein, López de Prado, Zhu, Journal of Computational Finance (2017)
Related Content
Keep the topic connected
Walk-Forward Optimization
Walk-forward optimization: rolling-window train/test that mimics live deployment. Why anchored vs sliding matters and the gotchas in window sizing.
Walk-Forward vs K-Fold Cross-Validation
A decision matrix comparing walk-forward analysis and k-fold cross-validation for financial backtesting: leakage, regime handling, data efficiency, and when each fits.
Look-Ahead Bias
Look-ahead bias: when a backtest accidentally uses data the strategy wouldn't have had at decision time. The most common variants and how to catch them.
Survivorship Bias
Survivorship bias in backtests: why dropped tickers, delisted funds, and dead share classes systematically inflate historical returns.