A single synthetic GBM price path here produces a Sharpe of 1.91 and a max drawdown of -17.6% over a 504-day run, with a realised annualised return of about 40% and annualised volatility of 20.9%. Those are exactly the numbers a buy-and-hold backtest fed that path will report. That is the entire point: the Synthetic Market Data Generator output is a stress-test scaffold, not a forecast. A GARCH model reproduces volatility clustering; GBM does not. For backtesting strategies sensitive to vol regimes (anything with stop-losses, position sizing, or vol targeting), the GARCH path is the more honest test.

TL;DR

  • GBM 504-day path, fixed seed: realised annualised return 39.9%, annualised vol 20.9%, Sharpe 1.91, max drawdown -17.6%.
  • GBM assumes constant volatility. Real markets cluster volatility — GARCH catches this, GBM does not.
  • Use GBM paths to stress-test logic; use GARCH paths to stress-test sizing.
  • Bootstrap resampling of empirical returns is the third option — preserves the realised distribution shape.
  • For published backtests, run all three and compare the strategy's Sharpe across the three regimes.

The scenario

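The Synthetic Market Data Generator on GBM, starting price 100, 504 trading days (two years), random seed fixed. The verified engine output at the end of this article lists the inputs as drift 6 and vol 22; the realised annualised return and volatility of this particular path come out near 0.40 and 0.21: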

| Stat | Value |
| --- | --- |
| Final price (approximate) | 222 |
| Annualised return | 39.9% |
| Annualised volatility | 20.9% |
| Sharpe ratio | 1.91 |
| Max drawdown | -17.6% |

A strategy that simply holds the synthetic asset for the full 504 days realises this Sharpe and this drawdown; running many such paths is the Monte Carlo approach to backtesting [1]. Any deviation from these figures is the strategy's contribution.
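As a minimal sketch of how such a path and its summary statistics could be produced: the code below assumes the exact log-return discretisation of GBM, a zero risk-free rate, and annualisation of the mean daily log return by 252. Those conventions are consistent with the table (0.3994 / 0.2086 ≈ 1.914, and 100 · exp(2 × 0.3994) ≈ 222), but they are inferred, not confirmed. The names `simulate_gbm` and `path_stats` are hypothetical, the parameter values are illustrative, and NumPy's generator will not reproduce the engine's seeded path.

```python
import numpy as np

TRADING_DAYS = 252

def simulate_gbm(s0, mu, sigma, days, seed=None):
    """Return days + 1 prices from a GBM with annualised drift mu and vol sigma."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / TRADING_DAYS
    z = rng.standard_normal(days)
    # Exact solution of the GBM SDE over one step of length dt.
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.concatenate(([0.0], np.cumsum(log_ret))))

def path_stats(prices):
    """Annualised return, vol, Sharpe (zero risk-free rate) and max drawdown."""
    log_ret = np.diff(np.log(prices))
    ann_return = log_ret.mean() * TRADING_DAYS
    ann_vol = log_ret.std(ddof=1) * np.sqrt(TRADING_DAYS)
    running_peak = np.maximum.accumulate(prices)
    max_drawdown = (prices / running_peak - 1.0).min()
    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": ann_return / ann_vol,
        "max_drawdown": max_drawdown,
    }

# Illustrative parameters only, not the engine's inputs.
prices = simulate_gbm(s0=100.0, mu=0.10, sigma=0.20, days=504, seed=0)
stats = path_stats(prices)
```

Changing the seed changes every statistic, which is why a single path is a scaffold and a distribution over many paths is the actual test.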

Why GBM is wrong by default

Geometric Brownian Motion assumes log returns are IID normal (so prices are lognormal). Real return series violate at least three of the GBM assumptions [2]:

  1. Volatility clustering. High-vol periods follow high-vol periods. GBM has no mechanism for this; GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) does.
  2. Fat tails. Empirical equity return distributions have excess kurtosis 3-5; GBM produces excess kurtosis ~0.
  3. Skewness. Equity returns are typically left-skewed; GBM is symmetric.

A strategy with vol-aware position sizing tested on GBM paths will look identical to one without — the GBM environment does not exercise the vol-aware logic. The strategy will then fail in production when realised vol clusters and the position sizer scales down at the worst moment.
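The three violations listed above can be measured directly. The sketch below computes sample skewness, excess kurtosis, and the lag-1 autocorrelation of squared returns (a simple proxy for volatility clustering). The function name `stylized_facts` and the 20% volatility are assumptions for illustration. On IID normal log returns, which is what GBM generates, all three should sit near zero.

```python
import numpy as np

def stylized_facts(returns):
    """Skewness, excess kurtosis, and lag-1 autocorrelation of squared returns."""
    r = np.asarray(returns, dtype=float)
    z = (r - r.mean()) / r.std()
    sq = r**2
    acf1_sq = np.corrcoef(sq[:-1], sq[1:])[0, 1]
    return {
        "skew": float(np.mean(z**3)),
        "excess_kurtosis": float(np.mean(z**4) - 3.0),
        "acf1_squared": float(acf1_sq),
    }

# GBM daily log returns are IID normal; drift is omitted because it does not
# affect skewness or kurtosis.
rng = np.random.default_rng(0)
gbm_log_returns = 0.20 / np.sqrt(252) * rng.standard_normal(100_000)
facts = stylized_facts(gbm_log_returns)
```

Running the same function on a historical return series gives a quick read on how far the GBM environment is from the data the strategy will actually face.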

GARCH-augmented paths

The same generator supports a GARCH(1,1) model. The path is generated by sampling returns from a normal distribution whose variance follows:

$$\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$$

A typical GARCH(1,1) fit on US equity index data has $\omega \approx 1\times 10^{-6}$, $\alpha \approx 0.08$, $\beta \approx 0.90$, satisfying $\alpha + \beta < 1$ for stationarity [3]. The realised paths show vol clustering: calm periods followed by storms followed by calm.
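A minimal sketch of that recursion, using the parameter values quoted above and assuming zero-mean daily returns with standard normal innovations. The function name `simulate_garch_returns` is hypothetical, and starting the recursion at the long-run variance $\omega / (1 - \alpha - \beta)$ is a choice made here, not something the generator is known to do.

```python
import numpy as np

def simulate_garch_returns(n, omega=1e-6, alpha=0.08, beta=0.90, seed=None):
    """Simulate n zero-mean daily returns from a GARCH(1,1) process."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    eps = np.empty(n)
    for t in range(n):
        eps[t] = np.sqrt(var) * z[t]
        # sigma^2_{t+1} = omega + alpha * eps_t^2 + beta * sigma^2_t
        var = omega + alpha * eps[t] ** 2 + beta * var
    return eps

returns = simulate_garch_returns(50_000, seed=1)

# Long-run daily variance implied by the quoted parameters: 5e-5,
# i.e. roughly 11% annualised volatility.
long_run_var = 1e-6 / (1.0 - 0.08 - 0.90)

# Positive autocorrelation in squared returns is the clustering signature.
sq = returns**2
acf1_squared = np.corrcoef(sq[:-1], sq[1:])[0, 1]
```

Cumulating these returns (plus any drift) and exponentiating gives a price path comparable to the GBM one, with the difference concentrated in how the volatility evolves.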

For a strategy that uses a 20-day rolling vol estimate for position sizing, the GARCH path produces materially different P&L than the GBM path. The strategy reduces position size during high-vol regimes, captures more of the calm-period upside, and gives back less in the storms. On GBM, the size adjustment is noise; on GARCH, it is signal.
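A sketch of the kind of sizing rule described here, assuming a simple volatility-targeting scheme: the position weight is a target volatility divided by the trailing 20-day realised volatility, capped at a maximum leverage. The function name `vol_target_weights`, the 10% target, and the 2x cap are illustrative assumptions, not the article's strategy.

```python
import numpy as np

def vol_target_weights(returns, window=20, target_ann_vol=0.10,
                       max_leverage=2.0, periods=252):
    """Weight for day t from realised vol over days t-window .. t-1 only."""
    r = np.asarray(returns, dtype=float)
    weights = np.zeros(len(r))  # zero weight during the warm-up window
    for t in range(window, len(r)):
        realised = r[t - window:t].std(ddof=1) * np.sqrt(periods)
        if realised > 0:
            weights[t] = min(target_ann_vol / realised, max_leverage)
    return weights

# Placeholder IID returns; substitute a GBM or GARCH return series here.
rng = np.random.default_rng(2)
daily = 0.01 * rng.standard_normal(300)
w = vol_target_weights(daily)
strategy_returns = w * daily
```

On constant-vol input the weights wander only because of estimation noise in the 20-day window; on clustered-vol input they track genuine regime changes.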

Bootstrap resampling

The third option is to resample observed returns with replacement. The realised distribution shape (skew, kurtosis) is preserved without parametric assumptions. Plain IID resampling destroys serial dependence, whereas a block bootstrap (resampling contiguous blocks) preserves short-range dependence, including volatility clustering within each block [4].
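A minimal sketch of a block bootstrap, assuming the simplest variant: a moving-block bootstrap with a fixed block length. This is not the stationary bootstrap of the cited paper, which draws random block lengths. The function name `block_bootstrap` and the 20-day block are illustrative choices.

```python
import numpy as np

def block_bootstrap(returns, block_size=20, n_out=None, seed=None):
    """Resample contiguous blocks of returns, with replacement."""
    r = np.asarray(returns, dtype=float)
    n = len(r)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(returns)")
    n_out = n if n_out is None else n_out
    rng = np.random.default_rng(seed)
    n_blocks = -(-n_out // block_size)  # ceiling division
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    # Each row holds the indices of one contiguous block.
    idx = (starts[:, None] + np.arange(block_size)[None, :]).ravel()[:n_out]
    return r[idx]

# Placeholder "observed" returns; use the historical series in practice.
rng = np.random.default_rng(3)
observed = 0.01 * rng.standard_normal(504)
resampled = block_bootstrap(observed, block_size=20, seed=3)
```

Longer blocks keep more of the dependence structure but leave fewer distinct blocks to draw from, so the block length is a tuning choice.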

Trade-offs:

| Method | Pro | Con |
| --- | --- | --- |
| GBM | Clean math, fast | Constant vol, thin tails |
| GARCH | Vol clustering, realistic | Parameter fitting required |
| Bootstrap | Preserves observed shape | Cannot extrapolate beyond observed range |

For backtests of vol-sensitive strategies, GARCH and bootstrap both belong in the test suite. The Synthetic Data vs Bootstrap Resampling for Backtests piece walks through the head-to-head comparison.

What synthetic data is for

The right framing: synthetic paths are stress tests, not forecasts. The strategy's logic should hold across the three regimes; the strategy's reported Sharpe should be the worst of the three, not the best, not the average.

A defensible backtest report includes:

  1. Empirical run. Strategy applied to historical data. The headline Sharpe.
  2. GBM stress. Same parameters $(\mu, \sigma)$, 10,000 paths. Distribution of strategy's Sharpe.
  3. GARCH stress. Same volatility process, 10,000 paths. Distribution of Sharpe under realistic vol clustering.
  4. Bootstrap stress. Block-resampled empirical returns. Distribution preserving observed shape.

A Sharpe of 1.5 on empirical data, 1.6 on GBM, 1.2 on GARCH, 1.4 on bootstrap is a defensible result — vol regime sensitivity is acknowledged. A Sharpe of 1.5 on empirical and 2.0 on GBM but the strategy was never tested on GARCH is a red flag.

The lookahead trap

Synthetic data does not eliminate look-ahead bias. If the strategy uses information that is not contemporaneous (e.g., normalising by future variance), running it on synthetic data does not catch the bug — the synthetic path also has "future" data the strategy can peek at. See Price-Blind LLM Research Harness for the structural fix.
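Independently of the data source, one simple check for this class of bug is a perturbation test: replace everything after time $t$ and confirm the signal up to $t$ is unchanged. The sketch below is an illustration of that idea only, not the structural fix referenced above; all function names are hypothetical.

```python
import numpy as np

def peeks_ahead(signal_fn, series, t, seed=0):
    """True if signal_fn(series)[:t + 1] changes when data after t is replaced."""
    rng = np.random.default_rng(seed)
    x = np.asarray(series, dtype=float)
    tampered = x.copy()
    tampered[t + 1:] = rng.standard_normal(len(x) - t - 1)
    before = signal_fn(x)[:t + 1]
    after = signal_fn(tampered)[:t + 1]
    return not np.allclose(before, after, equal_nan=True)

def causal_zscore(x, window=20):
    """Normalise x[i] by the mean and std of the previous `window` values."""
    out = np.full(len(x), np.nan)
    for i in range(window, len(x)):
        past = x[i - window:i]
        out[i] = (x[i] - past.mean()) / past.std()
    return out

def leaky_zscore(x):
    """Normalise by full-sample mean and std, which include future values."""
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(4)
returns = rng.standard_normal(250)
causal_leaks = peeks_ahead(causal_zscore, returns, t=100)
leaky_leaks = peeks_ahead(leaky_zscore, returns, t=100)
```

The check works equally on historical or synthetic series, which is the point: the bug lives in the strategy code, not in the data.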

Calibrating to a target backtest

The Synthetic Market Data Generator accepts user-specified $\mu$ and $\sigma$. To match an empirical backtest period:

  • Estimate $\mu$ and $\sigma$ on the empirical window.
  • Generate 10,000 GBM paths with those parameters.
  • Run the strategy on each path. The distribution of outcomes is the strategy's robustness on the GBM-implied null.

If the strategy's empirical Sharpe is at the 99th percentile of the GBM distribution, the strategy either has real edge or was selected from a much larger pool. The Deflated Sharpe Ratio quantifies the second possibility directly.
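The calibration steps above can be sketched as follows, with buy-and-hold standing in for the strategy and a zero risk-free rate assumed. The function names, the parameter values, and the empirical Sharpe of 1.5 are illustrative placeholders; a real run would apply the actual strategy to each path.

```python
import numpy as np

def gbm_null_sharpes(mu, sigma, days, n_paths=10_000, periods=252, seed=None):
    """Annualised Sharpe of buy-and-hold on each of n_paths GBM paths."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / periods
    shocks = rng.standard_normal((n_paths, days))
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * shocks
    daily_sharpe = log_ret.mean(axis=1) / log_ret.std(axis=1, ddof=1)
    return daily_sharpe * np.sqrt(periods)

def percentile_rank(value, distribution):
    """Percentage of the distribution at or below value."""
    return 100.0 * np.mean(np.asarray(distribution) <= value)

# mu and sigma would be estimated on the empirical window; these are placeholders.
null = gbm_null_sharpes(mu=0.10, sigma=0.20, days=504, seed=5)
rank = percentile_rank(1.5, null)
```

Over a two-year window the spread of this null distribution is wide, so a high percentile rank on its own says little until the number of strategies tried is accounted for.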

Production usage

For an in-development strategy, the synthetic-data test cycle:

  1. Run the strategy on historical data. Get the headline metrics.
  2. Generate 10,000 GBM paths with matched $(\mu, \sigma)$. Run the strategy. Distribution of Sharpe.
  3. Repeat with GARCH(1,1) fit on the historical residuals.
  4. Repeat with block bootstrap.
  5. Report empirical Sharpe alongside the worst-case Sharpe across the three synthetic regimes.

All four numbers (the empirical Sharpe and one per synthetic regime) belong in any backtest report that an external reader is expected to trust. Skipping them is the same failure as skipping standard deviation on a mean.
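A small sketch of the final reporting step, assuming each synthetic regime is summarised by the median of its Sharpe distribution (the summary statistic is a choice made here; the article does not fix one). The distributions below are random placeholders centred on the illustrative values used earlier (1.6, 1.2, 1.4), not results.

```python
import numpy as np

def sharpe_report(empirical_sharpe, synthetic_sharpes):
    """Median Sharpe per synthetic regime plus the worst case across regimes."""
    medians = {name: float(np.median(d)) for name, d in synthetic_sharpes.items()}
    worst = min(medians, key=medians.get)
    return {
        "empirical": empirical_sharpe,
        **medians,
        "worst_regime": worst,
        "worst_case": medians[worst],
    }

# Placeholder distributions; in practice these come from the 10,000-path runs.
rng = np.random.default_rng(6)
synthetic = {
    "gbm": rng.normal(1.6, 0.5, 10_000),
    "garch": rng.normal(1.2, 0.5, 10_000),
    "bootstrap": rng.normal(1.4, 0.5, 10_000),
}
report = sharpe_report(1.5, synthetic)
```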

Failure modes

  • Using GBM Sharpe as if it were an empirical Sharpe. GBM is a simulation, not a measurement. Report it as a distribution, not a point.
  • Treating high GBM Sharpe as edge confirmation. A strategy that looks great on GBM and bad on GARCH has vol-regime exposure that GBM cannot test.
  • Sample size too small. Single-path GBM Sharpes are noisy. Use 10,000 paths minimum to get a stable distribution.
  • Forgetting that synthetic data starts at $t=0$. A strategy that requires a 200-day warm-up needs synthetic paths of at least 200 + test-window days.

FAQ

Should I use GBM or GARCH for stress-testing?

Both. GBM tests the strategy logic under simple conditions; GARCH tests the strategy's vol-regime sensitivity. A strategy that passes GBM but fails GARCH has a vol-exposure bug; a strategy that passes both is defensible.

How many synthetic paths do I need?

For a stable Sharpe distribution, 10,000 paths minimum. For tail estimates (1st percentile drawdown, ruin probability), 50,000-100,000 paths. The compute is cheap; the precision is real.
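A rough worked example of why tail estimates need more paths, assuming independent paths and an event with true probability $p$ (for instance, ruin). The standard error of the Monte Carlo estimate $\hat p$ from $N$ paths is:

```latex
\mathrm{SE}(\hat p) = \sqrt{\frac{p\,(1-p)}{N}}
\qquad
p = 0.01:\quad
N = 10{,}000 \;\Rightarrow\; \mathrm{SE} \approx 0.0010 \;(\approx 10\%\ \text{of } p),
\quad
N = 100{,}000 \;\Rightarrow\; \mathrm{SE} \approx 0.0003 \;(\approx 3\%\ \text{of } p)
```

With 10,000 paths only about 100 of them land in a 1% tail, so the estimate is visibly noisy; ten times as many paths cuts the error by a factor of $\sqrt{10}$.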

Does this replace a walk-forward test?

No. Synthetic data tests strategy robustness to distributional assumptions; walk-forward tests strategy robustness to regime change. They catch different failures; both belong in the test stack.

References

Footnotes

  1. Lopez de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter on Monte Carlo for backtesting.

  2. Cont, R. (2001). "Empirical properties of asset returns: stylized facts and statistical issues." Quantitative Finance 1(2), 223–236.

  3. Engle, R. F. (1982). "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation." Econometrica 50(4), 987–1007.

  4. Politis, D. N., & Romano, J. P. (1994). "The Stationary Bootstrap." Journal of the American Statistical Association 89(428), 1303–1313.

Verified engine output

GBM, 504 trading days, seed 13, realised return ~40% / vol ~21%

Inputs

  • seed: 13
  • s0: 100
  • drift: 6
  • vol: 22
  • days: 504
  • trading_days_per_year: 252

Result

  • model: gbm
  • prices: (505 items) [...]
  • stats › ann return: 0.39940867583646555
  • stats › ann vol: 0.2086438974819357
  • stats › sharpe: 1.914307969975715
  • stats › max drawdown: -0.17610786935663184

Computed live at build time.