Methodology: Walk-Forward Validator

Definitions

IS (in-sample) window: a contiguous slice of observations used to fit / select / validate a strategy's parameters.

OOS (out-of-sample) window: the contiguous slice immediately following IS, used to estimate how the IS-fitted strategy performs on data it has not seen.

Walk-forward: slide the IS and OOS windows forward by a fixed number of observations (the step), and repeat until the data runs out.

Rolling: IS window has fixed length; the start and end both slide forward each step. Useful when regime changes matter and the model should only remember recent history.
Expanding: IS always starts at t=0; only the end advances. Useful when more history is always better (e.g. risk model calibration).
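The rolling and expanding schemes above can be sketched as a small window generator. This is a minimal illustration, not the tool's actual code: the function name walk_forward_windows, the Window tuple, and the half-open index convention are assumptions made for the example. The default of step = OOS length follows the step-selection note further down.

```python
from typing import Iterator, NamedTuple, Optional


class Window(NamedTuple):
    # Half-open index ranges: IS is [is_start, is_end), OOS is [oos_start, oos_end).
    is_start: int
    is_end: int
    oos_start: int
    oos_end: int


def walk_forward_windows(
    n_obs: int,
    is_len: int,
    oos_len: int,
    step: Optional[int] = None,
    expanding: bool = False,
) -> Iterator[Window]:
    """Yield IS/OOS index windows (hypothetical helper, for illustration)."""
    if is_len <= 0 or oos_len <= 0:
        raise ValueError("is_len and oos_len must be positive")
    step = oos_len if step is None else step  # default: non-overlapping OOS slices
    if step <= 0:
        raise ValueError("step must be positive")

    is_end = is_len
    while is_end + oos_len <= n_obs:
        # Expanding: IS always starts at 0. Rolling: IS keeps a fixed length.
        is_start = 0 if expanding else is_end - is_len
        yield Window(is_start, is_end, is_end, is_end + oos_len)
        is_end += step


rolling = list(walk_forward_windows(10, is_len=4, oos_len=2))
expanding = list(walk_forward_windows(10, is_len=4, oos_len=2, expanding=True))
```

With 10 observations, a 4-observation IS window, and a 2-observation OOS window, both schemes produce three windows; they differ only in where each IS slice starts.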

Per-window IS Sharpe: annualized Sharpe on the IS slice (for reference only — we don't optimize on it here).
Per-window OOS Sharpe: annualized Sharpe on the OOS slice.
Per-window OOS return: cumulative total return over the OOS slice.
Aggregate OOS Sharpe: Sharpe computed over the concatenation of all OOS slices. This is the single most-representative metric of what you'd see live.
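Walk-forward efficiency ratio: mean(OOS Sharpe) / mean(IS Sharpe), with both means taken across windows. Higher is better. Values below 0.4 are a strong overfitting signal. The ratio is only meaningful when mean IS Sharpe is positive.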
OOS losing windows: count of windows with OOS Sharpe < 0.
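The metrics above can be computed along the following lines. This is a sketch under stated assumptions rather than the tool's implementation: it assumes simple per-period returns, 252 periods per year, a zero risk-free rate, and the sample standard deviation (ddof=1). The function names and the (is_start, is_end, oos_start, oos_end) window tuples are hypothetical.

```python
import numpy as np


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe with zero risk-free rate (assumption)."""
    r = np.asarray(returns, dtype=float)
    if r.size < 2:
        return float("nan")
    sd = r.std(ddof=1)
    if sd == 0:
        return float("nan")
    return float(r.mean() / sd * np.sqrt(periods_per_year))


def walk_forward_metrics(returns, windows, periods_per_year=252):
    """windows: iterable of (is_start, is_end, oos_start, oos_end), half-open."""
    r = np.asarray(returns, dtype=float)
    windows = list(windows)
    is_sharpes = [annualized_sharpe(r[a:b], periods_per_year) for a, b, _, _ in windows]
    oos_slices = [r[c:d] for _, _, c, d in windows]
    oos_sharpes = [annualized_sharpe(s, periods_per_year) for s in oos_slices]
    # Cumulative total return per OOS slice, compounding simple returns.
    oos_returns = [float(np.prod(1.0 + s) - 1.0) for s in oos_slices]
    # Aggregate: one Sharpe over the concatenation of all OOS slices.
    aggregate = annualized_sharpe(np.concatenate(oos_slices), periods_per_year)
    mean_is = float(np.mean(is_sharpes))
    # The ratio is left undefined (NaN) unless mean IS Sharpe is positive.
    efficiency = float(np.mean(oos_sharpes)) / mean_is if mean_is > 0 else float("nan")
    return {
        "is_sharpes": is_sharpes,
        "oos_sharpes": oos_sharpes,
        "oos_returns": oos_returns,
        "aggregate_oos_sharpe": aggregate,
        "wf_efficiency": efficiency,
        "oos_losing_windows": sum(s < 0 for s in oos_sharpes),
    }
```

Note that the aggregate OOS Sharpe is not the mean of the per-window OOS Sharpes: it pools every OOS observation into one series first, so a single volatile window affects it differently than it affects the per-window average.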

Signal	Interpretation
Aggregate OOS Sharpe < 0.3	Weak OOS — edge does not persist.
WF efficiency < 0.4	IS/OOS degradation — likely overfit.
0.4 ≤ WF efficiency < 0.7	Some decay; inspect per-window consistency.
WF efficiency ≥ 0.7	Strong walk-forward.
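The thresholds in the table translate directly into a lookup. The sketch below is illustrative only: the function name interpret_walk_forward is hypothetical, and the handling of undefined (NaN) inputs is an assumption, since the table does not cover that case.

```python
import math


def interpret_walk_forward(aggregate_oos_sharpe, wf_efficiency):
    """Map the two headline metrics to the table's interpretations."""
    # Assumption: NaN inputs get no interpretation rather than a default label.
    if math.isnan(aggregate_oos_sharpe) or math.isnan(wf_efficiency):
        return ["Undefined metric; no interpretation."]

    notes = []
    if aggregate_oos_sharpe < 0.3:
        notes.append("Weak OOS: edge does not persist.")
    if wf_efficiency < 0.4:
        notes.append("IS/OOS degradation: likely overfit.")
    elif wf_efficiency < 0.7:
        notes.append("Some decay; inspect per-window consistency.")
    else:
        notes.append("Strong walk-forward.")
    return notes
```

The two signals are independent, so a strategy can show strong efficiency and still trigger the weak-OOS note if its IS Sharpe was low to begin with.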

No purging or embargo. For strategies whose features carry lagged information across the IS/OOS boundary (for example, moving averages), purged K-fold cross-validation or an embargo is more appropriate. This tool does a pure sequential walk; see Chapter 7 of Lopez de Prado's Advances in Financial Machine Learning for proper purging.
Assumes the returns series already reflects all model selection. If you re-optimize parameters per window, this tool cannot see that; the returns you provide should reflect actual walk-forward trading.
Step selection. Step sizes smaller than the OOS length produce overlapping OOS windows, so the same observations are counted more than once and the apparent sample size is inflated. The default is step = OOS length, which gives non-overlapping OOS slices.
Transaction costs. The returns series is used as-is. If you upload gross returns, efficiency will overstate live performance.
Non-stationary markets. If the underlying process changes, even a perfect walk-forward will show degradation. That's a feature, not a bug — but don't confuse regime change with overfitting.
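To make the step-selection caveat concrete, the sketch below counts how many OOS slices each observation lands in for a given step. It is a standalone illustration; the helper name oos_coverage and its parameters are hypothetical and not part of the tool.

```python
from collections import Counter


def oos_coverage(n_obs, is_len, oos_len, step):
    """Count how many OOS slices contain each observation index."""
    if step <= 0:
        raise ValueError("step must be positive")
    counts = Counter()
    is_end = is_len
    while is_end + oos_len <= n_obs:
        # OOS slice is the half-open range [is_end, is_end + oos_len).
        counts.update(range(is_end, is_end + oos_len))
        is_end += step
    return counts


# step == oos_len: every OOS observation is used exactly once.
non_overlapping = oos_coverage(12, is_len=4, oos_len=4, step=4)
# step < oos_len: interior observations appear in more than one OOS slice.
overlapping = oos_coverage(12, is_len=4, oos_len=4, step=2)
```

Any count above one means the concatenated OOS series repeats observations, which is why overlapping steps make the sample look larger than it is.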

Backtest Overfitting Score — PBO + Deflated Sharpe are complementary statistical tests. WF + PBO agreeing gives the strongest signal.
Risk-Adjusted Returns — deeper per-series risk metrics on the full or OOS-concatenated curve.
Did You Overfit? Running PBO + DSR — companion article with runnable Python.