A GBM-generated 757-day path with $\mu = 0.08$ and $\sigma = 0.24$ (seed 42) produces Sharpe 0.65 and max drawdown -35.5%, close to the long-run US equity index profile by design. The Synthetic Market Data Generator reports those numbers from one path; running many paths produces the Sharpe distribution. Bootstrap resampling of an empirical return series produces a different distribution: it can share the same mean and volatility, but it also preserves the realised skew and kurtosis and, when resampled in blocks, the short-range autocorrelation. For backtests, the two methods are complementary diagnostics, not substitutes [1].
TL;DR
- GBM on $\mu=0.08$, $\sigma=0.24$, 3 years (~757 trading days), seed 42: Sharpe 0.65, max drawdown -35.5%.
- Bootstrap preserves observed distribution shape (skew, kurtosis, autocorrelation if block-bootstrapped).
- GBM is fast and parametric; bootstrap is non-parametric and slower.
- For testing IID-violation sensitivity: bootstrap with block size 5-20 days.
- For testing distributional-assumption sensitivity: GBM and GARCH.
- For published backtests, run both and report the wider Sharpe band.
The scenarios
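The Synthetic Market Data Generator on GBM with US-equity-like parameters ($\mu = 0.08$, $\sigma = 0.24$, 757 days, seed 42) returns the single-path statistics below. The realised annualised return of 15.5% sits well above the 8% drift because one 3-year path carries substantial sampling noise: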
| Stat | GBM single path |
|---|---|
| Annualised return | 15.5% |
| Annualised volatility | 24.0% |
| Sharpe ratio | 0.65 |
| Max drawdown | -35.5% |
A bootstrap on an empirical underlying, say 757 historical days of an S&P 500 ETF, would instead return statistics matching the empirical distribution: similar mean and volatility, but with realised left skew (~-0.5), excess kurtosis (~3-5), and, when resampled in blocks, visible volatility clustering.
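As a minimal sketch of how such a single-path report could be computed, the snippet below simulates one GBM path and derives the four statistics. It assumes the exact log-normal step, an arithmetic annualised return, and a zero risk-free rate; the function names are hypothetical and the generator's own discretisation and random number stream are not documented here, so the numbers will not match the table. The reported figures are at least consistent with Sharpe = annualised return / annualised volatility, since 0.1547 / 0.2396 ≈ 0.6457.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, n_days, seed, trading_days=252):
    """Simulate one GBM price path using the exact log-normal daily step."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / trading_days
    z = rng.standard_normal(n_days)
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    # Prepend 0 so the path starts at s0 and holds n_days + 1 prices.
    return s0 * np.exp(np.concatenate(([0.0], np.cumsum(log_returns))))

def path_stats(prices, trading_days=252):
    """Annualised return and vol, Sharpe (risk-free rate assumed 0), max drawdown."""
    simple = prices[1:] / prices[:-1] - 1.0
    ann_return = simple.mean() * trading_days
    ann_vol = simple.std(ddof=1) * np.sqrt(trading_days)
    running_peak = np.maximum.accumulate(prices)
    max_drawdown = (prices / running_peak - 1.0).min()
    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": ann_return / ann_vol,
        "max_drawdown": max_drawdown,
    }

prices = simulate_gbm(s0=100.0, mu=0.08, sigma=0.24, n_days=757, seed=42)
stats = path_stats(prices)
print(stats)
```

NumPy's generator seeded with 42 is not the generator's seed 42, so treat the output as one more draw from the same distribution rather than a reproduction.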
What GBM gets right
- Mean and standard deviation can be specified to match any target empirical regime.
- The math is closed-form; paths are generated in O(N) time.
- Easy to vary parameters and study sensitivity.
What GBM gets wrong
GBM assumes IID returns with lognormally distributed prices, so log returns are Gaussian. Real returns violate this assumption in at least three ways [2]:
- Volatility clustering. High-vol days cluster. GBM has no mechanism for this; GARCH does.
- Fat tails. Realised equity returns have excess kurtosis of 3-5. GBM log returns have excess kurtosis 0.
- Skewness. Equity returns are left-skewed. GBM log returns are symmetric.
A strategy that targets vol-regime exposure (vol targeting, vol-aware sizing, vol-managed stop-losses) cannot be tested on GBM — the GBM environment does not exercise the vol-aware logic.
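The three properties above can be checked numerically. The sketch below, with hypothetical helper names, measures skewness, excess kurtosis, and the lag-1 autocorrelation of squared returns (a simple proxy for volatility clustering) on simulated GBM log returns. All three should come out near zero, which is the point: the same helpers applied to an empirical series would be expected to show the departures listed above.

```python
import numpy as np

def skew_and_excess_kurtosis(x):
    """Sample skewness and excess kurtosis (population moments)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3)), float(np.mean(z**4) - 3.0)

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# GBM daily log returns with mu = 0.08, sigma = 0.24 (large sample for stable estimates).
rng = np.random.default_rng(0)
dt = 1.0 / 252
gbm_log_returns = (0.08 - 0.5 * 0.24**2) * dt + 0.24 * np.sqrt(dt) * rng.standard_normal(100_000)

gbm_skew, gbm_kurt = skew_and_excess_kurtosis(gbm_log_returns)
gbm_clustering = lag1_autocorr(gbm_log_returns**2)
print(gbm_skew, gbm_kurt, gbm_clustering)
```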
What bootstrap gets right
Block bootstrap preserves:
- The empirical distribution shape (mean, variance, skew, kurtosis).
- Short-range autocorrelation when block size > 1.
- Realised regime shifts within the historical window.
The block bootstrap [3] resamples contiguous blocks of length $L$ with replacement; it is the canonical resampling method for dependent data [4]. (The stationary bootstrap of [3] draws random block lengths with mean $L$ rather than a fixed $L$.) For US daily equity data, $L$ of 5 to 20 days captures the typical short-range autocorrelation; $L$ of 60 or more preserves longer-range dependence at the cost of less variation between resamples.
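A minimal sketch of the resampling step follows, assuming the fixed-length (moving) block variant. The stationary bootstrap in the cited Politis and Romano paper randomises block lengths, which this sketch does not do. The function name is hypothetical, and the `historical` array is a synthetic placeholder standing in for real daily returns.

```python
import numpy as np

def block_bootstrap(returns, block_len, n_out, rng):
    """Resample contiguous fixed-length blocks with replacement."""
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(returns)")
    n_blocks = -(-n_out // block_len)  # ceiling division
    # Each start index is drawn uniformly; blocks may overlap and repeat.
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = starts[:, None] + np.arange(block_len)[None, :]
    # Trim the final block so the output has exactly n_out observations.
    return returns[idx.ravel()[:n_out]]

rng = np.random.default_rng(7)
# Placeholder history: heavy-tailed synthetic returns, NOT real market data.
historical = 0.01 * rng.standard_t(df=4, size=2520)
resampled = block_bootstrap(historical, block_len=10, n_out=757, rng=rng)
```

Setting `block_len=1` reduces this to the ordinary IID bootstrap, which is a convenient way to see how much of a strategy's result depends on the preserved short-range structure.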
What bootstrap gets wrong
- Cannot extrapolate beyond the observed range. A strategy that would have worked in a never-realised regime is untestable.
- Requires substantial historical data to seed the resampler. Bootstrap on 5 years of data tests robustness against 5 years of regimes; it does not test robustness against regimes from a 30-year history that the window did not include.
- Block-size choice is somewhat arbitrary and affects results.
Side-by-side
| Property | GBM | Block bootstrap |
|---|---|---|
| Vol clustering | No | Yes (with $L \geq 5$) |
| Fat tails | No | Yes (preserves empirical) |
| Skewness | No | Yes (preserves empirical) |
| Parametric | Yes | No |
| Extrapolates beyond observed | Yes | No |
| Sample-size dependent | No | Yes |
| Speed (10,000 paths) | < 5 seconds | 10-60 seconds |
For testing IID assumptions of strategy logic: bootstrap. For testing distributional sensitivity in a controlled environment: GBM.
A worked comparison
Consider a vol-targeting strategy that scales position size inversely with 20-day realised volatility. On the GBM path above:
- True vol is constant at 24%, so realised vol moves only with 20-day estimation noise. Assuming a 24% target, the vol target multiplier stays in a band around 1.
- Strategy P&L on GBM ≈ buy-and-hold P&L. The vol-aware logic is never exercised.
On a bootstrap of empirical S&P 500 daily returns:
- Realised vol clusters: 12% during calm 2017, 35% during March 2020, 20% during 2021.
- Vol target multiplier swings from ~2x in calm to ~0.5x in storms.
- Strategy P&L differs materially from buy-and-hold: smaller drawdowns during high-vol periods, slightly lower upside in calm periods.
The Sharpe ratio of the vol-targeting strategy will look near-identical to buy-and-hold under GBM and materially different under bootstrap. The bootstrap result is the honest one for this strategy.
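The contrast can be sketched with a trailing-vol multiplier. Everything here is an illustrative assumption: the function name, the 24% target, and especially the two-regime toy series (12% vol followed by 35% vol), which is a stand-in for clustered volatility and not a bootstrap of real S&P 500 data.

```python
import numpy as np

def vol_target_multiplier(returns, target_vol, window=20, trading_days=252):
    """Position multiplier = target vol / trailing realised vol (annualised).

    multiplier[i] is computed from returns[i : i + window], so it should be
    applied to returns[i + window] to avoid look-ahead.
    """
    returns = np.asarray(returns, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(returns, window)
    realised = windows.std(axis=1, ddof=1) * np.sqrt(trading_days)
    return target_vol / realised

rng = np.random.default_rng(1)
daily = 1.0 / np.sqrt(252)
# Constant-vol series, as under GBM.
gbm_returns = 0.24 * daily * rng.standard_normal(757)
# Toy two-regime series: 400 calm days at 12% vol, then 357 stormy days at 35% vol.
toy_returns = np.concatenate([
    0.12 * daily * rng.standard_normal(400),
    0.35 * daily * rng.standard_normal(357),
])

gbm_mult = vol_target_multiplier(gbm_returns, target_vol=0.24)
toy_mult = vol_target_multiplier(toy_returns, target_vol=0.24)
print(np.percentile(gbm_mult, [5, 50, 95]))
print(np.percentile(toy_mult, [5, 50, 95]))

# Strategy returns without look-ahead: yesterday's multiplier times today's return.
toy_strategy = toy_mult[:-1] * toy_returns[20:]
```

Under constant vol the multiplier wanders around 1 purely from estimation noise, while on the regime series it sits near 2 in the calm stretch and well below 1 in the stormy one.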
When each is the right call
Use GBM when:
- The strategy logic does not depend on vol regimes (constant-size momentum, fixed-position pairs).
- You need to parametrically vary $\mu$ and $\sigma$ to study sensitivity.
- You want a clean null hypothesis ("under IID lognormal, does the strategy edge survive?").
Use bootstrap when:
- The strategy logic depends on realised distribution properties (vol-aware sizing, drawdown-aware stops).
- The empirical sample is long enough (5+ years of daily data, 10+ years of monthly).
- You want to test against the realised regimes specifically, not against an idealised model.
For most production backtests, run both. The Sharpe distribution under GBM tells you "is the strategy edge stable under assumptions about returns?" The Sharpe distribution under bootstrap tells you "is the strategy edge stable under the regimes we actually observed?"
GARCH as the bridge
A third option, between GBM and bootstrap, is GARCH-augmented synthetic data. GARCH(1,1) fits volatility clustering parametrically. The resulting paths have:
- Specified $\mu$ and unconditional $\sigma$.
- Volatility clustering matching the fit.
- Fat tails (typically excess kurtosis 1-3, less than empirical but more than GBM).
- Symmetric returns (GARCH does not naturally produce skew).
For vol-sensitive strategies, GARCH is closer to honest than GBM but does not fully replace bootstrap for skew-sensitive logic.
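A minimal GARCH(1,1) simulator is sketched below, assuming Gaussian innovations and the standard recursion $h_{t+1} = \omega + \alpha \epsilon_t^2 + \beta h_t$, with $\omega$ chosen so the unconditional volatility matches the target $\sigma$. The function name is hypothetical, and $\alpha = 0.08$, $\beta = 0.90$ are illustrative values, not a fit to any data set.

```python
import numpy as np

def simulate_garch_returns(n_days, mu, sigma, alpha, beta, seed, trading_days=252):
    """Daily returns from GARCH(1,1) with Gaussian innovations."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = np.random.default_rng(seed)
    uncond_var = sigma**2 / trading_days       # daily unconditional variance
    omega = uncond_var * (1.0 - alpha - beta)  # pins the long-run variance
    z = rng.standard_normal(n_days)
    eps = np.empty(n_days)
    h = uncond_var                             # start at the unconditional level
    for t in range(n_days):
        eps[t] = np.sqrt(h) * z[t]
        h = omega + alpha * eps[t] ** 2 + beta * h
    return mu / trading_days + eps

garch_returns = simulate_garch_returns(
    n_days=757, mu=0.08, sigma=0.24, alpha=0.08, beta=0.90, seed=42
)
```

Because the innovations are symmetric, the simulated returns have no skew, which matches the limitation noted above.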
Sample-size requirements
| Method | Minimum historical data |
|---|---|
| GBM | None (parametric) |
| GARCH | ~500 daily observations for stable fit |
| Block bootstrap | Same as test horizon, ideally 3x test horizon |
For a 3-year backtest, that means at least 3 years of history and ideally ~9 years. Shorter histories produce resampled paths that are too similar to the original, and the bootstrap loses its diagnostic power.
Failure modes
- Quoting GBM Sharpe as a measurement. It is a simulation. Report it as a distribution across many paths, not as a single number.
- Bootstrap on insufficient history. Resampling 2 years of data 1,000 times produces 1,000 paths that all look like the same 2 years. The diagnostic value collapses.
- Bootstrap without blocking on autocorrelated returns. Independent draws break realised autocorrelation. Use block bootstrap with $L \geq 5$ for daily equity data.
- Trusting GBM on vol-sensitive strategies. The GBM environment does not exercise vol-aware logic. Move to GARCH or bootstrap.
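To report a GBM Sharpe as a distribution rather than a single number, many paths can be simulated at once. The sketch below is one way to do that under the same assumptions as the headline run ($\mu = 0.08$, $\sigma = 0.24$, 757 days, zero risk-free rate); the function name and the path count of 2,000 are illustrative choices.

```python
import numpy as np

def gbm_sharpe_distribution(mu, sigma, n_days, n_paths, seed, trading_days=252):
    """Annualised Sharpe ratio of each of n_paths independent GBM paths."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / trading_days
    z = rng.standard_normal((n_paths, n_days))
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    simple = np.expm1(log_returns)  # convert log returns to simple returns
    ann_return = simple.mean(axis=1) * trading_days
    ann_vol = simple.std(axis=1, ddof=1) * np.sqrt(trading_days)
    return ann_return / ann_vol

sharpes = gbm_sharpe_distribution(mu=0.08, sigma=0.24, n_days=757, n_paths=2000, seed=0)
p5, p50, p95 = np.percentile(sharpes, [5, 50, 95])
print(p5, p50, p95)
```

The median should land near $\mu / \sigma \approx 0.33$, and the band is wide: over roughly 3 years the standard error of an annualised Sharpe is on the order of $1/\sqrt{3} \approx 0.58$, so a single-path value such as 0.65 is unremarkable.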
FAQ
Which gives the more pessimistic Sharpe?
Depends on the strategy and the empirical regime. For vol-sensitive strategies on equity data, bootstrap typically gives a wider Sharpe band (more storms in the resample) than GBM. For pure trend strategies on commodity data, GBM may be wider because the empirical sample may have been benign.
How many bootstrap paths do I need?
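For Sharpe distribution: 1,000-5,000 paths. For drawdown distribution at the 1-5% tail: 10,000+. The compute is bounded: even at 1 second per path (a pessimistic figure that would include running the strategy on each path, not just generating it), 10,000 paths take under 3 hours on a Mac Mini.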
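The path counts follow from simple tail arithmetic, sketched here with $N$ paths and tail probability $q$. Only about $qN$ paths fall in the tail, so a 1% tail is estimated from roughly 10 observations at 1,000 paths and roughly 100 at 10,000:

$$
qN:\quad 0.01 \times 1{,}000 = 10, \qquad 0.01 \times 10{,}000 = 100
$$

$$
10{,}000\ \text{paths} \times 1\ \text{s/path} = 10{,}000\ \text{s} \approx 2.8\ \text{h}
$$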
Can I combine the methods?
Yes. Run GBM and bootstrap in parallel; report the union of the two Sharpe distributions, or the worst Sharpe across both. The combined report makes fewer assumptions than either alone.
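One way to assemble such a combined report is sketched below. The function name and the 5th/95th percentile band are assumptions, and pooling the two samples weights each method by its path count, so equal path counts are assumed for the pooled band to be balanced. The input arrays in the usage example are placeholders, not results.

```python
import numpy as np

def combined_sharpe_report(sharpe_gbm, sharpe_bootstrap, lower_pct=5.0, upper_pct=95.0):
    """Summarise two Sharpe distributions, their union, and the worst lower bound."""
    samples = {
        "gbm": np.asarray(sharpe_gbm, dtype=float),
        "bootstrap": np.asarray(sharpe_bootstrap, dtype=float),
    }
    bands = {
        name: tuple(np.percentile(values, [lower_pct, upper_pct]))
        for name, values in samples.items()
    }
    pooled = np.concatenate(list(samples.values()))
    return {
        "bands": bands,
        # Method whose percentile band is wider.
        "wider_method": max(bands, key=lambda name: bands[name][1] - bands[name][0]),
        # Band of the union of both distributions.
        "pooled_band": tuple(np.percentile(pooled, [lower_pct, upper_pct])),
        # Most pessimistic lower bound across the two methods.
        "worst_lower": min(low for low, _ in bands.values()),
    }

# Placeholder inputs for illustration only.
rng = np.random.default_rng(3)
report = combined_sharpe_report(
    sharpe_gbm=rng.normal(0.33, 0.58, size=2000),
    sharpe_bootstrap=rng.normal(0.30, 0.70, size=2000),
)
print(report["wider_method"], report["pooled_band"], report["worst_lower"])
```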
Connects to
- Synthetic Data: GARCH vs GBM for Backtesting: the GBM-vs-GARCH side.
- Walk-Forward Validation Cookbook: alternative robustness test.
- How to Read a Backtest Report: what to expect in published reports.
- Backtest Overfitting LLM Strategies: PBO Explained: multi-strategy selection test.
- Synthetic Market Data Generator: generate paths yourself.
- Synthetic Market Data Generator methodology: full input/output specification.
References
1. Lopez de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter on Monte Carlo and bootstrap for backtesting.
2. Cont, R. (2001). "Empirical properties of asset returns: stylized facts and statistical issues." Quantitative Finance 1(2), 223–236. arxiv.org
3. Politis, D. N., & Romano, J. P. (1994). "The Stationary Bootstrap." Journal of the American Statistical Association 89(428), 1303–1313. tandfonline.com
4. Hansen, B. E. (2022). Econometrics. Princeton University Press. Chapter on resampling methods. press.princeton.edu
Verified engine output
| Input | Value |
|---|---|
| seed | 42 |
| s0 | 100 |
| drift | 8 (percent) |
| vol | 24 (percent) |
| days | 757 |
| trading_days_per_year | 252 |

| Output | Value |
|---|---|
| model | gbm |
| prices (758 items) | [...] |
| stats › ann return | 0.15473617171459414 |
| stats › ann vol | 0.23964152679323195 |
| stats › sharpe | 0.6456984888437302 |
| stats › max drawdown | -0.35503595980976105 |
Computed live at build time.