Synthetic Data vs Bootstrap Resampling for Backtests

Q: Which gives the more pessimistic Sharpe?

Depends on the strategy and the empirical regime. For vol-sensitive strategies on equity data, bootstrap typically gives a wider Sharpe band (more storms in the resample) than GBM. For pure trend strategies on commodity data, GBM may be wider because the empirical sample may have been benign.

Q: How many bootstrap paths do I need?

For Sharpe distribution: 1,000-5,000 paths. For drawdown distribution at the 1-5% tail: 10,000+. Compute stays manageable: at 1 second per path, 10,000 paths run sequentially in under 3 hours on a Mac Mini.
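The tail requirement comes down to counting: the number of paths that land beyond a quantile is roughly the path count times the tail probability, so a 1% tail estimated from 1,000 paths rests on about 10 observations. A minimal sketch of that arithmetic follows; the helper name `tail_budget` is illustrative, and the 1 second per path figure is the assumption quoted above.

```python
def tail_budget(n_paths, tail_prob, seconds_per_path=1.0):
    """Return (expected number of paths in the tail, sequential run time in hours)."""
    expected_tail_paths = n_paths * tail_prob
    hours = n_paths * seconds_per_path / 3600.0
    return expected_tail_paths, hours


for n in (1_000, 5_000, 10_000):
    tail, hours = tail_budget(n, tail_prob=0.01)
    print(f"{n:>6} paths: ~{tail:.0f} in the 1% tail, {hours:.2f} h sequential")
```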

Q: Can I combine the methods?

Yes. Run GBM and bootstrap in parallel; report the union of the two Sharpe distributions, or the worst Sharpe across both. The combined report leans less on any single method's assumptions than either result alone.
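One way to sketch such a combined report, assuming each method has already produced a list of per-path Sharpe ratios. The function names, the 5%/95% band, and the toy inputs are all illustrative choices, not something the generator prescribes.

```python
import statistics


def sharpe_band(sharpes, lo=0.05, hi=0.95):
    """Empirical (lo, hi) quantile band of a list of per-path Sharpe ratios."""
    s = sorted(sharpes)

    def pick(q):
        return s[min(len(s) - 1, int(q * len(s)))]

    return pick(lo), pick(hi)


def combined_report(gbm_sharpes, boot_sharpes):
    """Pool both distributions and keep the more pessimistic lower bound."""
    pooled = list(gbm_sharpes) + list(boot_sharpes)
    gbm_lo, _ = sharpe_band(gbm_sharpes)
    boot_lo, _ = sharpe_band(boot_sharpes)
    return {
        "pooled_band": sharpe_band(pooled),
        "worst_lower_bound": min(gbm_lo, boot_lo),
        "pooled_median": statistics.median(pooled),
    }


# Made-up Sharpe values, only to show the shape of the output.
gbm = [0.2, 0.5, 0.6, 0.7, 0.9]
boot = [-0.1, 0.3, 0.6, 0.8, 1.1]
report = combined_report(gbm, boot)
print(report)
```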

A GBM-generated 757-day path with $\mu = 0.08$ and $\sigma = 0.24$ (seed 42) produces Sharpe 0.65 and max drawdown -35.5%; the parameters are chosen to resemble a long-run US equity index profile. The Synthetic Market Data Generator reports those numbers from one path; running many paths produces the Sharpe distribution. Bootstrap resampling of an empirical series produces a different distribution: similar mean and volatility, but with the realised skew and kurtosis preserved, plus short-range autocorrelation when the resampling is done in blocks. For backtests, the two methods are complementary diagnostics, not substitutes¹.

TL;DR

GBM on $\mu=0.08$, $\sigma=0.24$, 3 years (~757 trading days), seed 42: Sharpe 0.65, max drawdown -35.5%.
Bootstrap preserves observed distribution shape (skew, kurtosis, autocorrelation if block-bootstrapped).
GBM is fast and parametric; bootstrap is non-parametric and slower.
For testing IID-violation sensitivity: bootstrap with block size 5-20 days.
For testing distributional-assumption sensitivity: GBM and GARCH.
For published backtests, run both and report the wider Sharpe band.

The scenarios

The Synthetic Market Data Generator on GBM with US-equity-like parameters ($\mu = 0.08$, $\sigma = 0.24$, 757 days, seed 42) returns:

Stat	GBM single path
Annualised return	15.5%
Annualised volatility	24.0%
Sharpe ratio	0.65
Max drawdown	-35.5%
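The table is one draw from a distribution. The sketch below shows how such a path and its statistics could be produced and then repeated across seeds. It is not the Synthetic Market Data Generator's code and will not reproduce the seed-42 numbers. It assumes 252 trading days per year, treats $\mu$ as the arithmetic drift (so log returns drift at $\mu - \sigma^2/2$), computes Sharpe on daily log returns, and takes the risk-free rate as zero.

```python
import math
import random
import statistics


def gbm_log_returns(mu, sigma, n_days, seed, days_per_year=252):
    """Daily log returns of a GBM with annualised drift mu and volatility sigma."""
    rng = random.Random(seed)
    dt = 1.0 / days_per_year
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    return [drift + vol * rng.gauss(0.0, 1.0) for _ in range(n_days)]


def annualised_sharpe(returns, days_per_year=252):
    """Mean over standard deviation of daily returns, annualised (risk-free = 0)."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(days_per_year)


def max_drawdown(log_returns):
    """Largest peak-to-trough loss of the price path, as a negative fraction."""
    price, peak, worst = 1.0, 1.0, 0.0
    for r in log_returns:
        price *= math.exp(r)
        peak = max(peak, price)
        worst = min(worst, price / peak - 1.0)
    return worst


# Repeat over 200 seeds to get a Sharpe distribution instead of a single number.
sharpes = []
for seed in range(200):
    path = gbm_log_returns(mu=0.08, sigma=0.24, n_days=757, seed=seed)
    sharpes.append(annualised_sharpe(path))
sharpes.sort()
print("approx. 5% / 50% / 95% Sharpe:", sharpes[10], sharpes[100], sharpes[190])
print("max drawdown of the last path:", max_drawdown(path))
```

With only about three years per path, the spread between the lower and upper quantiles is large relative to the median, which is why a single-path Sharpe should not be read as a measurement.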

Bootstrap on an empirical underlying, say 757 historical days of an S&P 500 ETF, would return statistics matching the empirical distribution: similar mean and volatility but with realised left skew (~-0.5), excess kurtosis (~3-5), and, when resampled in blocks, visible volatility clustering.

What GBM gets right

Mean and standard deviation can be specified to match any target empirical regime.
The math is closed-form; paths are generated in O(N) time.
Easy to vary parameters and study sensitivity.

What GBM gets wrong

GBM assumes IID, normally distributed log returns (that is, lognormal prices). Real returns violate at least three properties this implies²:

Volatility clustering. High-vol days cluster. GBM has no mechanism for this; GARCH does.
Fat tails. Realised equity returns have excess kurtosis 3-5. GBM log returns have excess kurtosis 0.
Skewness. Equity returns are left-skewed. GBM log returns are symmetric.

A strategy that targets vol-regime exposure (vol targeting, vol-aware sizing, vol-managed stop-losses) cannot be meaningfully tested on GBM: volatility is constant by construction, so the vol-aware logic is never exercised.
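These three gaps can be checked numerically on simulated GBM log returns. The sketch below uses simple population-moment estimators and the lag-1 autocorrelation of absolute returns as a rough clustering proxy; the helper names, the sample size, and the choice of proxy are illustrative assumptions.

```python
import math
import random
import statistics


def skew_and_excess_kurtosis(xs):
    """Sample skewness and excess kurtosis from standardised moments."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    z = [(x - m) / sd for x in xs]
    skew = statistics.fmean([v ** 3 for v in z])
    kurt = statistics.fmean([v ** 4 for v in z]) - 3.0
    return skew, kurt


def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


rng = random.Random(0)
mu, sigma, dt = 0.08, 0.24, 1.0 / 252
gbm = [(mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
       for _ in range(100_000)]

skew, kurt = skew_and_excess_kurtosis(gbm)
clustering = lag1_autocorr([abs(r) for r in gbm])
print(f"skew={skew:.3f}  excess kurtosis={kurt:.3f}  |r| lag-1 autocorr={clustering:.3f}")
```

All three values should sit close to zero for GBM; running the same helpers on an empirical daily return series is a quick way to see how far the data departs from the model.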

What bootstrap gets right

Block bootstrap preserves:

The empirical distribution shape (mean, variance, skew, kurtosis).
Short-range autocorrelation when block size > 1.
Realised regime shifts within the historical window.

The block bootstrap³ resamples contiguous blocks of length $L$ with replacement; it is the canonical resampling method for dependent data⁴. For US daily equity data, $L = 5$-$20$ captures the typical short-range autocorrelation; $L = 60$+ preserves longer-range dependence at the cost of less variation between resamples.
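A minimal sketch of the resampling step, written as a moving-block bootstrap with overlapping candidate blocks of fixed length. This is one common variant (circular and stationary bootstraps are alternatives), and the function name and the toy history are illustrative.

```python
import random


def moving_block_bootstrap(returns, block_len, n_out, rng):
    """Draw contiguous blocks with replacement until n_out returns are collected."""
    if not 1 <= block_len <= len(returns):
        raise ValueError("block_len must be between 1 and len(returns)")
    last_start = len(returns) - block_len
    path = []
    while len(path) < n_out:
        start = rng.randint(0, last_start)  # randint is inclusive on both ends
        path.extend(returns[start:start + block_len])
    return path[:n_out]  # trim the final block to the requested length


# Toy history: the integers 0..99 stand in for returns so the blocks are easy to see.
history = list(range(100))
sample = moving_block_bootstrap(history, block_len=10, n_out=35, rng=random.Random(1))
print(sample)
```

In the printed sample, every run of ten consecutive integers is one block: order inside a block is preserved, while the order of blocks is random.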

What bootstrap gets wrong

Cannot extrapolate beyond the observed range. A strategy that would have worked in a never-realised regime is untestable.
Requires substantial historical data to seed the resampler. Bootstrap on 5 years of data tests robustness against 5 years of regimes; it does not test robustness against a 30-year regime that the historical window did not include.
Block-size choice is somewhat arbitrary and affects results.

Side-by-side

Property	GBM	Block bootstrap
Vol clustering	No	Yes (with $L \geq 5$)
Fat tails	No	Yes (preserves empirical)
Skewness	No	Yes (preserves empirical)
Parametric	Yes	No
Extrapolates beyond observed	Yes	No
Sample-size dependent	No	Yes
Speed (10,000 paths)	< 5 seconds	10-60 seconds

For testing how strategy logic responds to violations of the IID assumption: block bootstrap. For testing distributional sensitivity in a controlled environment: GBM.

A worked comparison

Consider a vol-targeting strategy that scales position size inversely with 20-day realised volatility. On the GBM path above:

Realised vol is roughly constant at 24%. Vol target multiplier varies in a narrow band around 1.
Strategy P&L on GBM ≈ buy-and-hold P&L. The vol-aware logic is never exercised.

On a bootstrap of empirical S&P 500 daily returns:

Realised vol clusters: 12% during calm 2017, 35% during March 2020, 20% during 2021.
Vol target multiplier swings from ~2x in calm to ~0.5x in storms.
Strategy P&L differs materially from buy-and-hold: smaller drawdowns during high-vol periods, slightly lower upside in calm periods.

The Sharpe ratio of the vol-targeting strategy will look near-identical to buy-and-hold under GBM and materially different under bootstrap. The bootstrap result is the honest one for this strategy.
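The contrast can be illustrated with the sizing rule alone. The sketch below computes a position multiplier as target vol divided by trailing 20-day realised vol, once on constant-vol returns and once on a synthetic three-regime series that reuses the 12% / 35% / 20% levels quoted above. The regime series is a stand-in for clustered data, not S&P 500 returns, and the 24% target and the 3x cap are assumptions.

```python
import math
import random
import statistics


def vol_target_multipliers(returns, target_vol, window=20, days_per_year=252, cap=3.0):
    """Daily position multiplier: target vol / trailing realised vol (past data only)."""
    mults = []
    for t in range(window, len(returns)):
        realised = statistics.stdev(returns[t - window:t]) * math.sqrt(days_per_year)
        mults.append(min(cap, target_vol / realised))
    return mults


def daily_vol(annual_vol, days_per_year=252):
    return annual_vol / math.sqrt(days_per_year)


rng = random.Random(0)
constant_vol = [rng.gauss(0.0, daily_vol(0.24)) for _ in range(750)]
regimes = ([rng.gauss(0.0, daily_vol(0.12)) for _ in range(250)]
           + [rng.gauss(0.0, daily_vol(0.35)) for _ in range(250)]
           + [rng.gauss(0.0, daily_vol(0.20)) for _ in range(250)])

m_const = vol_target_multipliers(constant_vol, target_vol=0.24)
m_regime = vol_target_multipliers(regimes, target_vol=0.24)
print("constant vol: median", statistics.median(m_const), "spread", statistics.stdev(m_const))
print("regimes     : median", statistics.median(m_regime), "spread", statistics.stdev(m_regime))
```

Even under constant volatility the multiplier moves somewhat, because a 20-day estimate is noisy; the point is that the regime series moves it far more, and only that movement gives the strategy something to react to.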

When each is the right call

Use GBM when:

The strategy logic does not depend on vol regimes (constant-size momentum, fixed-position pairs).
You need to parametrically vary $\mu$ and $\sigma$ to study sensitivity.
You want a clean null hypothesis ("under IID lognormal, does the strategy edge survive?").

Use bootstrap when:

The strategy logic depends on realised distribution properties (vol-aware sizing, drawdown-aware stops).
The empirical sample is long enough (5+ years of daily data, 10+ years of monthly).
You want to test against the realised regimes specifically, not against an idealised model.

For most production backtests, run both. The Sharpe distribution under GBM tells you "is the strategy edge stable under assumptions about returns?" The Sharpe distribution under bootstrap tells you "is the strategy edge stable under the regimes we actually observed?"

GARCH as the bridge

A third option, between GBM and bootstrap, is GARCH-augmented synthetic data. GARCH(1,1) fits volatility clustering parametrically. The resulting paths have:

Specified $\mu$ and unconditional $\sigma$.
Volatility clustering matching the fit.
Fat tails (typically excess kurtosis 1-3, less than empirical but more than GBM).
Symmetric returns (standard GARCH with symmetric innovations does not produce skew).

For vol-sensitive strategies, GARCH is closer to honest than GBM but does not fully replace bootstrap for skew-sensitive logic.
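A minimal GARCH(1,1) simulator, to make the recursion concrete. The parameters $\alpha = 0.08$ and $\beta = 0.90$ are hypothetical rather than fitted to any data, innovations are assumed Gaussian, and $\omega$ is chosen so that the unconditional annualised volatility is 24%, using the standard identity $\sigma^2 = \omega / (1 - \alpha - \beta)$.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, mu_daily=0.0, seed=0):
    """Simulate r_t = mu + e_t, e_t = sqrt(h_t) * z_t,
    with h_t = omega + alpha * e_{t-1}^2 + beta * h_{t-1} and z_t ~ N(0, 1)."""
    if alpha + beta >= 1.0:
        raise ValueError("need alpha + beta < 1 for a finite unconditional variance")
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        e = math.sqrt(h) * rng.gauss(0.0, 1.0)
        returns.append(mu_daily + e)
        h = omega + alpha * e * e + beta * h  # variance for the next day
    return returns


# Hypothetical parameters; omega is backed out from the 24% unconditional vol target.
alpha, beta = 0.08, 0.90
daily_var = 0.24 ** 2 / 252
omega = daily_var * (1.0 - alpha - beta)
path = simulate_garch11(20_000, omega, alpha, beta, mu_daily=0.08 / 252)
print("annualised vol of the simulated path:",
      math.sqrt(sum(r * r for r in path) / len(path) * 252))
```

For a fitted model rather than hand-picked parameters, a dedicated library would be the usual route; the loop above is only meant to show where the clustering comes from.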

Sample-size requirements

Method	Minimum historical data
GBM	None (parametric)
GARCH	~500 daily observations for stable fit
Block bootstrap	Same as test horizon, ideally 3x test horizon

For a 3-year backtest, the bootstrap needs at least 3 years of history and should ideally draw from ~9 years. Shorter histories produce resampled paths that are too similar to the original, and the bootstrap loses its diagnostic power.

Failure modes

Quoting GBM Sharpe as a measurement. It is a simulation. Report it as a distribution across many paths, not as a single number.
Bootstrap on insufficient history. Resampling 2 years of data 1,000 times produces 1,000 paths that all look like 2 years. The diagnostic value collapses.
Bootstrap without blocking on autocorrelated returns. Independent draws break realised autocorrelation. Use block bootstrap with $L \geq 5$ for daily equity data.
Trusting GBM on vol-sensitive strategies. The GBM environment does not exercise vol-aware logic. Move to GARCH or bootstrap.
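The third failure mode is easy to demonstrate. The sketch below resamples a synthetic series with alternating calm and stormy regimes, once with IID draws ($L = 1$) and once in blocks ($L = 20$), and compares the lag-1 autocorrelation of absolute returns. The series, the regime lengths, and the helper names are illustrative assumptions.

```python
import random
import statistics


def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


def resample(returns, block_len, rng):
    """Block bootstrap of the same length; block_len=1 reduces to IID draws."""
    last_start = len(returns) - block_len
    out = []
    while len(out) < len(returns):
        start = rng.randint(0, last_start)
        out.extend(returns[start:start + block_len])
    return out[:len(returns)]


def abs_acf(xs):
    return lag1_autocorr([abs(x) for x in xs])


rng = random.Random(7)
# Synthetic stand-in for clustered data: alternating 250-day calm and stormy regimes.
history = [rng.gauss(0.0, 0.005 if (t // 250) % 2 == 0 else 0.02) for t in range(1000)]

print("original   :", round(abs_acf(history), 3))
print("IID, L=1   :", round(abs_acf(resample(history, 1, rng)), 3))
print("block, L=20:", round(abs_acf(resample(history, 20, rng)), 3))
```

The IID resample should show autocorrelation near zero while the block resample stays close to the original, which is the property a vol-sensitive strategy needs the test data to keep.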
