A Markov recovery model can warn you about drawdowns your historical tape never sampled, which is the whole reason to run it alongside a bootstrap. On a mid-Sharpe strategy (monthly Sharpe 0.5, skew −0.3, excess kurt 2.5, 20% drawdown threshold) over 4,000 simulated paths × 60 months, seed 42, the Drawdown Recovery Markov returns median recovery 11 months, p25 = 8, p75 = 16, and p95 = 28 months. pctRecoveredIn12Mo = 57.75%, pctNeverRecovered = 0.125%. The same statistics computed from a historical bootstrap on a real 60-month tape would be bounded by the tape's worst observed recovery; Markov can extrapolate beyond it. Which is more trustworthy depends on tape length, regime stability, and whether the tape has already sampled the strategy's bad regime.

TL;DR

Two approaches to drawdown-distribution estimation, two failure modes:

Approach | What it does | Strength | Failure mode
Markov simulation | Moment-match a skew-t to (Sharpe, skew, kurt), simulate 4,000 paths | Extrapolates beyond the tape's worst sequence | Assumes return distribution is stationary; misses regime changes
Historical bootstrap | Resample blocks from the empirical tape | Faithful to observed dependencies | Bounded by the tape's worst sequence; can't sample what hasn't happened

For tapes under 60 months the Markov simulation dominates; for tapes over 240 months the historical bootstrap dominates; in between the choice depends on whether the tape captures the strategy's bad regime.

The Markov engine's output, decomposed

The canonical input (monthly Sharpe 0.5, skew −0.3, excess kurt 2.5, threshold 20%) produces:

  • Median recovery 11 months. Half of paths recover within 11 months of hitting the 20% drawdown threshold.
  • p25 = 8 months. A quarter of paths recover in 8 months or less, the "good outcome" tail.
  • p75 = 16 months. Three-quarters of paths recover within 16 months; the slowest quarter takes longer.
  • p95 = 28 months. The 95th-percentile recovery time, and the figure to set a kill-switch patience constant against.
  • 57.75% of paths recover within 12 months. An operator with 12 months of patience sees recovery on most paths, but about 42% of paths take longer than that.
  • 0.125% never recovered in 60 months. A small but non-zero fraction of paths exceed the simulation horizon.

These six numbers describe the recovery-time distribution under the engine's stationary-skew-t assumption. They are not estimates of historical drawdown recovery; they are estimates of future drawdown recovery under the assumed return-distribution moments.
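As a rough illustration of how six such numbers can be derived from raw recovery times, here is a minimal sketch. The function name `summarize_recovery`, the choice of the "inclusive" quantile method, and the convention that percentiles are taken over recovered paths only are assumptions made for this example; the engine's actual conventions are not stated here. The one thing taken from the reported output is that the recovery-time list holds only the paths that recovered (3,995 of 4,000).

```python
import statistics

def summarize_recovery(recovery_times, n_paths, patience_months=12):
    """Summarize recovery times (in months).

    recovery_times: one entry per path that recovered (assumed convention).
    n_paths: total number of simulated paths, including non-recoveries.
    """
    # 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(recovery_times, n=100, method="inclusive")
    within = sum(1 for t in recovery_times if t <= patience_months)
    return {
        "median": statistics.median(recovery_times),
        "p25": cuts[24],
        "p75": cuts[74],
        "p95": cuts[94],
        "pct_recovered_in_patience": within / n_paths,
        "pct_never_recovered": (n_paths - len(recovery_times)) / n_paths,
    }

# Synthetic recovery times, NOT engine output: 3,995 recovered paths,
# of which 2,310 recover within 12 months.
synthetic_times = [6] * 1000 + [11] * 1310 + [20] * 1685
summary = summarize_recovery(synthetic_times, n_paths=4000)
```

The synthetic list is built only so that the two fractions line up with the reported ones: 2,310 / 4,000 = 0.5775 and 5 / 4,000 = 0.00125.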

What the bootstrap would produce on a real tape

Imagine a real 60-month strategy tape: 60 monthly returns. A standard block-bootstrap procedure:

  1. Pick block size b (typically 3–6 months to preserve serial dependence).
  2. Resample blocks with replacement to construct an N-month path. Choose N = 60 to match the simulation horizon.
  3. Compute drawdown statistics on each resampled path.
  4. Aggregate across 4,000 resampled paths.
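The four steps above can be sketched in plain Python. This is one possible reading, not a reference implementation: it uses a moving-block bootstrap with a fixed block size, measures recovery as the number of months from the first breach of the drawdown threshold until the path regains its prior peak, and all function names and the default block size of 4 are hypothetical choices for illustration.

```python
import math
import random

def block_bootstrap_path(tape, block_size, n_months, rng):
    """Moving-block bootstrap: glue together randomly chosen contiguous blocks."""
    last_start = len(tape) - block_size
    path = []
    while len(path) < n_months:
        start = rng.randint(0, last_start)  # randint includes both endpoints
        path.extend(tape[start:start + block_size])
    return path[:n_months]  # trim the final block to the requested horizon

def recovery_months(returns, threshold):
    """Months from the first breach of `threshold` drawdown to regaining the prior peak.

    Returns None if the threshold is never breached and math.inf if the path
    breaches but does not recover before it ends (assumed conventions).
    """
    wealth, peak = 1.0, 1.0
    breach_month = None
    for month, r in enumerate(returns, start=1):
        wealth *= 1.0 + r
        if breach_month is None:
            if wealth >= peak:
                peak = wealth
            elif 1.0 - wealth / peak >= threshold:
                breach_month = month
        elif wealth >= peak:
            return month - breach_month
    return None if breach_month is None else math.inf

def bootstrap_recovery_times(tape, block_size=4, n_months=60,
                             threshold=0.20, n_paths=4000, seed=42):
    """Steps 2 to 4: resample paths, measure each one, collect the outcomes."""
    rng = random.Random(seed)
    return [
        recovery_months(block_bootstrap_path(tape, block_size, n_months, rng), threshold)
        for _ in range(n_paths)
    ]
```

Calling `bootstrap_recovery_times(tape)` on a list of 60 monthly returns yields one outcome per resampled path; paths that never breach the threshold come back as `None` and would normally be excluded before computing percentiles.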

The bootstrap's recovery-time distribution is bounded by the tape's contents: every resampled month is a month that actually occurred, and within a block no sequence can be worse than the worst b-month sequence on the tape. Deeper drawdowns arise only when bad blocks happen to be drawn back to back, so if the tape's worst 12-month rolling return is −15%, a 30% drawdown is a rare resampling accident rather than a routinely sampled outcome. The Markov simulation produces such paths more readily, because it samples from the moment-matched distribution, whose tails extend beyond any return on the tape.

For a strategy whose live regime is similar to its tape regime, the bootstrap's tape-bounded estimate is the honest forecast. For a strategy that will face a regime not yet on the tape, the Markov simulation's parametric extrapolation is the necessary forecast, at the cost of accepting the moment-matched distributional assumption.

When tape length is short

For tapes under 60 months (5 years), the bootstrap's worst case is too tame. A typical retail strategy backtest over 5 years has perhaps one 6-month drawdown, so the bootstrap's worst resampled 12-month sequence is only modestly worse. The 95th-percentile recovery under bootstrap would then typically land around 12–18 months, well short of the Markov's 28 months on the canonical input.

The discrepancy is the forecast horizon problem. The bootstrap estimates what would happen if the future tape were structurally identical to the past tape. The Markov simulation estimates what would happen under a re-draw from the same distribution. The Markov is more pessimistic because it can sample tail events the bootstrap has never seen.
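To make "a re-draw from the same distribution" concrete, here is a hedged sketch of drawing i.i.d. heavy-tailed monthly returns. It is deliberately simpler than the engine: it uses a symmetric Student-t matched to the excess kurtosis only and ignores skew, so it is not the skew-t moment match described above and will not reproduce the reported figures. The `monthly_vol` argument is a hypothetical input (the canonical inputs do not include a volatility), and monthly Sharpe is assumed to mean mean return divided by volatility.

```python
import math
import random

def t_df_from_excess_kurtosis(excess_kurt):
    """Student-t excess kurtosis is 6 / (df - 4) for df > 4; invert for df."""
    return 4.0 + 6.0 / excess_kurt

def draw_monthly_returns(n_months, monthly_sharpe, monthly_vol, excess_kurt, rng):
    """Draw i.i.d. returns from a Student-t rescaled to the requested mean and vol."""
    df = t_df_from_excess_kurtosis(excess_kurt)
    unit_scale = math.sqrt((df - 2.0) / df)  # a raw t draw has variance df / (df - 2)
    mean = monthly_sharpe * monthly_vol
    returns = []
    for _ in range(n_months):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        t = z / math.sqrt(chi2 / df)
        returns.append(mean + monthly_vol * unit_scale * t)
    return returns

# Example draw: 60 months, excess kurtosis 2.5 (df = 6.4), hypothetical 5% monthly vol.
example_path = draw_monthly_returns(60, 0.5, 0.05, 2.5, random.Random(42))
```

Because every month is an independent draw from a distribution with unbounded support, a path can contain months worse than anything on a finite tape, which is the source of the extra pessimism.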

For short tapes the practitioner has to decide: is the tape representative of the strategy's long-run distribution (in which case Markov overstates risk), or is it a benign sample of a riskier underlying process (in which case Markov is correct and bootstrap is dangerous)? Without external information, the safer default is Markov: understating drawdown risk causes blow-ups, while overstating it only causes overly conservative position sizing.

When tape length is long

For tapes over 240 months (20 years), the bootstrap has likely sampled the strategy's bad regime at least once. The empirical worst 24-month sequence is closer to the true worst-case. The bootstrap's p95 recovery converges to the empirical reality.

In this regime the Markov assumption (i.i.d. draws from a moment-matched distribution) is the limiting factor. Real returns have regime persistence: bad months cluster, which produces longer drawdowns than i.i.d. draws would. By assuming independence, the Markov simulation underestimates the tail of the drawdown distribution when returns are autocorrelated.

For long-tape regimes, prefer the bootstrap (or a regime-switching simulation that captures the autocorrelation). The Markov simulation in its basic form is conservative on tail thickness but optimistic on tail clustering.

The 60-to-240 month gray zone

The most common retail regime is 60–240 months of tape. Here the choice depends on a substantive judgment:

  • Use Markov when: The tape is short relative to the strategy's full regime distribution, regime-fragility is plausible, and the cost of an unexpected drawdown is high. The asymmetric loss function (drawdown causes more damage than capital underuse) tilts toward Markov.
  • Use bootstrap when: The tape is long enough to have sampled a recession or volatility regime, and the strategy's risk profile is well-understood. The empirical fidelity favours bootstrap.
  • Use both: Run both engines, compare p95 recoveries. If they differ by more than 50%, the discrepancy is the uncertainty band on the forecast. Report both numbers.
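The "use both" rule can be written down in a few lines. This is a sketch under one assumption the text leaves open: the 50% gap is measured relative to the smaller of the two p95 values. The function names are hypothetical, and the bootstrap p95 of 16 months in the example is an illustrative value, not a computed result.

```python
def p95_discrepancy(p95_markov, p95_bootstrap):
    """Relative gap between the two p95 recoveries, measured against the smaller one."""
    low, high = sorted((p95_markov, p95_bootstrap))
    return (high - low) / low

def p95_report(p95_markov, p95_bootstrap, tolerance=0.50):
    """Return a one-line report; flag the pair as an uncertainty band if the gap is large."""
    gap = p95_discrepancy(p95_markov, p95_bootstrap)
    line = f"Markov p95 = {p95_markov} mo, bootstrap p95 = {p95_bootstrap} mo (gap {gap:.0%})"
    if gap > tolerance:
        return "uncertainty band, report both: " + line
    return "engines broadly agree: " + line

# Markov p95 from the canonical run versus a hypothetical bootstrap p95 of 16 months.
message = p95_report(28, 16)
```

With 28 against 16 months the gap is 12 / 16 = 75%, so the pair would be reported as a band rather than collapsed into a single number.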

The Markov simulation in the engine is a parametric stationary skew-t. A more sophisticated extension is a regime-switching skew-t (two regimes with a persistence parameter), which captures the autocorrelation the basic Markov misses. The engine does not implement this; a user can approximate it by running the basic Markov with two parameter sets (good-regime and bad-regime moments) and combining the results, although two separate runs do not capture switching within a single path.
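For readers who want to see what a two-regime process with persistence looks like, here is a minimal sketch. It is not part of the engine, and it swaps the skew-t for Gaussian noise within each regime to stay short. The function name, the regime means and volatilities, and the stay probabilities are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical (mean, vol) per regime: index 0 = good regime, index 1 = bad regime.
EXAMPLE_PARAMS = [(0.02, 0.03), (-0.03, 0.06)]

def simulate_regime_path(n_months, params, stay_prob, rng, start_regime=0):
    """Two-state regime-switching returns.

    params: list of (mean, vol) for each regime.
    stay_prob: probability of remaining in each regime next month (persistence).
    Returns (returns, regimes) so the regime sequence can be inspected.
    """
    regime = start_regime
    returns, regimes = [], []
    for _ in range(n_months):
        mean, vol = params[regime]
        returns.append(rng.gauss(mean, vol))
        regimes.append(regime)
        # Switch with probability 1 - stay_prob[regime].
        if rng.random() >= stay_prob[regime]:
            regime = 1 - regime
    return returns, regimes

# High persistence in both regimes: bad months arrive in runs rather than singly.
example_returns, example_regimes = simulate_regime_path(
    60, EXAMPLE_PARAMS, stay_prob=[0.95, 0.85], rng=random.Random(42)
)
```

The persistence parameters are what an i.i.d. simulation lacks: with `stay_prob` near 1 for the bad regime, losses cluster across consecutive months, which is the mechanism that lengthens drawdowns.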

The threshold-sensitivity check

The 20% threshold is one of many possible kill-switch levels. Sweeping the engine across {15%, 20%, 25%, 30%} (4,000 paths, 60 months, seed 42) gives a near-linear shift in recovery time:

  • 15% threshold: median 8, p95 = 23 months.
  • 20% threshold: median 11, p95 = 28 months.
  • 25% threshold: median 15, p95 = 33 months.
  • 30% threshold: median 18, p95 = 38 months.

A deeper drawdown threshold means a longer climb back: in this sweep the p95 recovery rises 5 months per 5-percentage-point step, and the median rises 3–4 months per step. A kill-switch defined at one threshold therefore has materially different implications from the same kill-switch at a different threshold. The defensible procedure is to sweep the threshold, report the (threshold, p95) pairs, and then pick the kill-switch based on operator patience.
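The sweep lends itself to a small lookup. The sketch below stores the four reported (median, p95) pairs, reads the per-step changes straight off them, and picks the deepest threshold whose p95 fits a given patience. The function names and the selection rule (deepest threshold that still fits) are assumptions for illustration, not part of the engine.

```python
# Reported sweep: threshold -> (median recovery, p95 recovery), in months.
SWEEP = {0.15: (8, 23), 0.20: (11, 28), 0.25: (15, 33), 0.30: (18, 38)}

def step_changes(sweep, column):
    """Change in one column (0 = median, 1 = p95) between adjacent thresholds."""
    thresholds = sorted(sweep)
    return [sweep[b][column] - sweep[a][column]
            for a, b in zip(thresholds, thresholds[1:])]

def deepest_threshold_within_patience(sweep, patience_months):
    """Deepest drawdown threshold whose p95 recovery does not exceed the patience."""
    fits = [t for t, (_, p95) in sweep.items() if p95 <= patience_months]
    return max(fits) if fits else None

median_steps = step_changes(SWEEP, 0)   # [3, 4, 3]
p95_steps = step_changes(SWEEP, 1)      # [5, 5, 5]
choice = deepest_threshold_within_patience(SWEEP, patience_months=30)  # 0.2
```

With 30 months of patience the 20% threshold is the deepest one that fits (p95 = 28); the 25% threshold, at 33 months, does not.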

Verified engine output

Inputs
  monthly_sharpe: 0.5
  monthly_skew: -0.3
  monthly_excess_kurt: 2.5
  recovery_threshold: 0.2
  paths: 4000
  max_months: 60
  seed: 42
Result
  median: 11
  p25: 8
  p75: 16
  p95: 28
  pct recovered in 12 mo: 0.5775
  pct never recovered: 0.00125
  recovery times (3995 items): [...]

Computed live at build time.

Frequently asked questions

When should I prefer the Markov simulation over a historical bootstrap?
When the tape is short (under 60 months) relative to the strategy lifecycle, or regime-fragility is plausible. For long-tape (>240 months) well-understood strategies, prefer bootstrap.
Does the engine support a block-bootstrap option?
Not directly. For bootstrap analysis run a standard block-bootstrap externally and compare the two outputs side-by-side.
Why does the Markov assume i.i.d. draws when real returns are autocorrelated?
Because the closed-form skew-t moment-match is tractable. Capturing autocorrelation requires a richer model, such as the regime-switching skew-t described above, which is the natural next step for a more conservative forecast.
What about Vine copulas or other dependency models?
Vine copulas capture asymmetric dependencies for multi-asset drawdown. The single-asset Markov engine here does not need them because the dependency structure is intra-asset.
How does this connect to my actual kill-switch design?
Compute p95 recovery from the engine and sweep across drawdown thresholds. If p95 at your chosen threshold exceeds operator patience, either pick a tighter threshold (which shortens p95) or extend the patience constant to cover it.