A drawdown that takes a median 16 months to recover can take 39 in the tail, and the tail is what gets a strategy switched off. For a moderate-Sharpe strategy (monthly Sharpe 0.45, monthly skew −0.5, monthly excess kurtosis 3.2, 25% drawdown threshold), the Drawdown Recovery Markov engine over 4,000 simulated paths returns median recovery 16 months, p25 = 11, p75 = 22, and p95 = 39 months. The fraction of paths recovering within 12 months is 0.3245. The fraction never recovering inside the 96-month cap is 0. That p95 = 39 months is the kill-switch parameter: an operator who would not tolerate a 39-month drawdown should not deploy this strategy, regardless of how appealing the max-drawdown line in the backtest looks.

TL;DR

Three numbers belong on the same line of a backtest report: max drawdown, p50 recovery, p95 recovery. Max drawdown is the depth; recovery time is the duration; p95 is the duration most operators actually care about. For the canonical input the median recovery is 16 months and the 95th-percentile recovery is 39 months, a 2.4× gap that no point-estimate backtest reveals.

Why max drawdown is the wrong headline

Max drawdown is the deepest peak-to-trough loss in a backtested path. Two strategies with identical max drawdown can have wildly different recovery distributions: one recovers in 6 months, the other takes 4 years. The depth of the hole is a property of the worst losing streak; the duration of the hole is a property of the strategy's regime persistence.
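To make the depth/duration split concrete, here is a minimal Python sketch that measures both on a single return path. The function name `drawdown_profile` is hypothetical and not part of the engine; it returns the max drawdown and the longest run of consecutive months spent below a prior peak.

```python
def drawdown_profile(monthly_returns):
    """Return (max_drawdown, longest_underwater_months) for one return path.

    max_drawdown is the deepest peak-to-trough loss as a fraction of the peak.
    longest_underwater_months is the longest run of consecutive months spent
    below a previous wealth peak (a run still open at the end is counted).
    """
    wealth = peak = 1.0
    max_dd = 0.0
    run = longest = 0
    for r in monthly_returns:
        wealth *= 1.0 + r
        if wealth >= peak:
            peak = wealth          # new high (or back at the old one): run ends
            run = 0
        else:
            run += 1
            longest = max(longest, run)
            max_dd = max(max_dd, 1.0 - wealth / peak)
    return max_dd, longest


fast = drawdown_profile([-0.20, 0.30])            # -20%, then a sharp rebound
slow = drawdown_profile([-0.20] + [0.01] * 30)    # -20%, then a slow 1%/month grind
print(fast, slow)
```

Both toy paths have a 20% max drawdown; the first is underwater for 1 month, the second for 23.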

For a path-dependent metric the right object is a distribution, not a number. The Drawdown Markov engine simulates 4,000 paths of monthly returns from the user-supplied monthly (Sharpe, skew, excess kurtosis) using moment-matched skew-Student-t innovations, then measures the time to recovery for every path that hits the drawdown threshold. The output on the canonical input:

Statistic                            Value
Median recovery (p50)                16 months
p25 recovery                         11 months
p75 recovery                         22 months
p95 recovery                         39 months
Recovered within 12 months           32.45% of paths
Never recovered within 96 months     0% of paths

The p95 = 39 months is the load-bearing number. It is the recovery time the operator should plan for if they trade the strategy long enough to draw down to 25%. The median (16 months) is roughly the duration they will see in a typical drawdown; the p95 is the duration they will see in a bad one, roughly one drawdown in twenty.
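A sketch of how the table's statistics could be computed from per-path recovery times. Two conventions here are assumptions, not the engine's documented behaviour: percentiles use the nearest-rank rule (which returns whole months), and a path still underwater at the cap is ranked as infinitely long so it lands in the tail instead of dropping out. The function names are hypothetical.

```python
import math

def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile (integer pct) of an ascending list."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # ceil(pct/100 * n)
    return sorted_values[rank - 1]

def summarize_recoveries(recovery_months, fast_months=12):
    """Summarize recovery times of the paths that hit the drawdown threshold.

    Each entry is a whole number of months, or None for a path still
    underwater at the simulation cap. None is ranked as math.inf (an
    assumption) so unrecovered paths sit in the tail instead of vanishing.
    """
    times = sorted(math.inf if m is None else m for m in recovery_months)
    n = len(times)
    return {
        "p25": nearest_rank(times, 25),
        "median": nearest_rank(times, 50),
        "p75": nearest_rank(times, 75),
        "p95": nearest_rank(times, 95),
        "pct_within_fast": sum(t <= fast_months for t in times) / n,
        "pct_never": sum(t == math.inf for t in times) / n,
    }
```

Ranking unrecovered paths as infinite keeps p95 honest when pctNeverRecovered is non-zero: once more than roughly 5% of threshold-hitting paths never recover, p95 itself becomes unbounded, which is the right signal for a kill-switch.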

The kill-switch threshold

A defensible kill-switch is a function of two operator inputs:

  1. The patience constant: the maximum recovery time the operator is willing to wait before pulling the strategy. For a solo retail trader this is typically 12 months; for an institutional desk it can be 24–36 months.
  2. The desired tail probability: the percentile of the recovery distribution the operator is willing to bet against. p95 is the conventional choice for a strategy expected to draw down a handful of times in a career.

The kill-switch fires if simulated p95 recovery exceeds patience. For the canonical input: p95 = 39 months, patience = 12 months → strategy fails the gate. A patience = 36 months operator would still reject this strategy (39 > 36); a patience = 48 months operator would accept it.
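The gate itself is a one-line comparison. A minimal sketch, with hypothetical function names and the nearest-rank percentile convention as an assumption; the toy sample is built so that its p95 is 39 months, matching the canonical result.

```python
def tail_recovery(recovery_months, tail_pct=95):
    """Nearest-rank percentile (integer percent) of simulated recovery times."""
    xs = sorted(recovery_months)
    rank = max(1, (tail_pct * len(xs) + 99) // 100)   # ceil(tail_pct/100 * n)
    return xs[rank - 1]

def kill_switch_fires(recovery_months, patience_months, tail_pct=95):
    """True when the tail recovery time exceeds the operator's patience."""
    return tail_recovery(recovery_months, tail_pct) > patience_months


sample = [16] * 18 + [39, 60]     # toy sample whose nearest-rank p95 is 39
for patience in (12, 36, 48):
    print(patience, kill_switch_fires(sample, patience))   # True, True, False
```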

Most retail operators get this backwards: they pick a max-drawdown threshold (say 30%) and a recovery time they hope for (say 12 months) without checking whether the joint distribution supports both. The Markov simulation makes the joint distribution explicit.

What the simulation actually does

The engine implements a three-step procedure:

  1. Moment-match a skew-Student-t to the user's (monthly Sharpe, skew, excess kurtosis). The fit uses the Azzalini skew-t parameterization, which can represent nonzero skew and kurtosis above the Gaussian value of 3 (excess kurtosis > 0) in a single, consistent distribution.
  2. Draw max_months × paths monthly returns from the fitted distribution. For the canonical input that is 96 × 4,000 = 384,000 draws.
  3. Trace cumulative wealth along each path; record the first month at which the drawdown from the running peak reaches the threshold (recovery_threshold, 25%), then count the months from that point until wealth regains its pre-drawdown peak.
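The three steps can be sketched in plain Python as follows. This is not the engine's code, and it swaps in a simpler technique: instead of fitting an Azzalini skew-t, it pushes Gaussian draws through a Cornish-Fisher expansion, which only approximately hits the target skew and excess kurtosis. The inputs also do not pin down volatility, so `monthly_vol` is a hypothetical scale parameter; all function names are illustrative, and the output will not reproduce the 16/39-month figures above.

```python
import math
import random

def cornish_fisher(z, skew, excess_kurt):
    """Cornish-Fisher expansion: skews and fattens a standard normal draw z."""
    s, k = skew, excess_kurt
    return (z + (z * z - 1) * s / 6 + (z ** 3 - 3 * z) * k / 24
            - (2 * z ** 3 - 5 * z) * s * s / 36)

def cornish_fisher_std(skew, excess_kurt):
    """Exact std of cornish_fisher(z) for z ~ N(0, 1), via Hermite orthogonality."""
    s, k = skew, excess_kurt
    a1 = 1 - s * s / 36          # coefficient on He1 = z
    a2 = s / 6                   # coefficient on He2 = z^2 - 1
    a3 = k / 24 - s * s / 18     # coefficient on He3 = z^3 - 3z
    return math.sqrt(a1 * a1 + 2 * a2 * a2 + 6 * a3 * a3)

def first_hit_recovery(returns, threshold):
    """Months from the first threshold breach until the prior peak is regained.

    None: the path never breached. math.inf: breached but still underwater
    at the end of the path (the max_months cap).
    """
    wealth = peak = 1.0
    hit_month = None
    for t, r in enumerate(returns):
        wealth *= max(1 + r, 0.0)                  # a month cannot lose more than 100%
        if hit_month is None:
            peak = max(peak, wealth)
            if 1 - wealth / peak >= threshold:
                hit_month = t                      # first breach
        elif wealth >= peak:
            return t - hit_month                   # back at the pre-drawdown peak
    return None if hit_month is None else math.inf

def simulate_recoveries(sharpe, skew, excess_kurt, threshold=0.25, paths=4000,
                        max_months=96, seed=31, monthly_vol=0.05):
    """Recovery times for every simulated path that breaches `threshold`.

    ASSUMPTION: Sharpe, skew and kurtosis do not fix the return scale, so
    monthly_vol is a hypothetical volatility; the mean is sharpe * monthly_vol.
    Months are drawn i.i.d., as in the engine's stated model.
    """
    rng = random.Random(seed)
    mu = sharpe * monthly_vol
    scale = monthly_vol / cornish_fisher_std(skew, excess_kurt)  # std -> monthly_vol
    recoveries = []
    for _ in range(paths):
        rets = [mu + scale * cornish_fisher(rng.gauss(0.0, 1.0), skew, excess_kurt)
                for _ in range(max_months)]
        rec = first_hit_recovery(rets, threshold)
        if rec is not None:
            recoveries.append(rec)
    return recoveries
```

The `max(1 + r, 0.0)` clamp is there because the Cornish-Fisher polynomial has a cubic tail that can, very rarely, produce a simulated month below −100%.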

Paths that never hit the threshold do not contribute to the recovery distribution. Paths that hit it but never recover within max_months are counted in pctNeverRecovered. For the canonical input both numbers are well-behaved: most paths do touch the threshold (because monthly Sharpe = 0.45 is modest), and on this run every path that touches it recovers within 96 months (the positive expected return pulls paths back to their peaks well inside the cap).

The simulation does not assume Gaussian returns. The (skew = −0.5, excess kurtosis = 3.2) input embeds the negative-skew, fat-tail empirical pattern most equity strategies exhibit. Replacing the empirical moments with Gaussian (skew = 0, excess kurtosis = 0) under-estimates the p95 by 25–40%, exactly the discount that retail backtests bake into their reports without noticing.

The fat-tail premium

Two strategies with identical Sharpe but different (skew, kurt):

Strategy        Monthly Sharpe   Skew   Excess kurtosis   p95 recovery (25% threshold)
Gaussian-ish    0.45             0      0                 shorter (illustrative)
Canonical       0.45             −0.5   3.2               39 months

The (skew, kurt) inputs determine how often the strategy produces large monthly losses, and the recovery distribution is dominated by how those losses accumulate along a path. A strategy with mild negative skew and modest excess kurtosis (the canonical input) has a measurably longer p95 recovery than a Gaussian strategy with the same Sharpe. The size of the gap depends on the threshold; at the 25% threshold used here, the simulation puts the fat-tailed p95 at 39 months.
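One way to see the "how often" part is to compare the 1% quantile of the standardized monthly innovation with and without the canonical (skew, kurt). The sketch below uses a Cornish-Fisher approximation as a stand-in for the engine's skew-t (an assumption, and the name `cf_quantile` is hypothetical); mapping quantiles through the polynomial is valid here because it is increasing in z for these inputs.

```python
import math
from statistics import NormalDist

def cf_quantile(p, skew, excess_kurt):
    """Standardized Cornish-Fisher quantile (mean 0, std 1) at probability p.

    Mapping z_p through the polynomial is valid only while the polynomial is
    increasing in z, which holds for skew = -0.5, excess kurtosis = 3.2.
    """
    z = NormalDist().inv_cdf(p)
    s, k = skew, excess_kurt
    x = (z + (z * z - 1) * s / 6 + (z ** 3 - 3 * z) * k / 24
         - (2 * z ** 3 - 5 * z) * s * s / 36)
    a1, a2, a3 = 1 - s * s / 36, s / 6, k / 24 - s * s / 18   # Hermite coefficients
    return x / math.sqrt(a1 * a1 + 2 * a2 * a2 + 6 * a3 * a3)

for label, s, k in [("Gaussian-ish", 0.0, 0.0), ("Canonical", -0.5, 3.2)]:
    print(f"{label:13s} 1% quantile {cf_quantile(0.01, s, k):+.2f}  "
          f"99% quantile {cf_quantile(0.99, s, k):+.2f}")
# Gaussian-ish: -2.33 / +2.33; Canonical: -3.21 / +2.51
```

Under this approximation the canonical 1-in-100 bad month is about 3.2 standard deviations deep instead of 2.3, while the 1-in-100 good month shrinks; that asymmetry is what stretches the recovery tail.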

The implication for retail LLM-driven strategies is sharp: LLM signals tend to be regime-dependent (good in one tape, dead in another), which empirically presents as negative skew at the monthly horizon. Backtests that report only max drawdown without the recovery distribution understate the regime-fragility of the underlying signal.

Decision: when to widen the threshold

The threshold input (25% in the canonical run) is operator-set, not strategy-set. Widening it from 25% to 30% does two things:

  • Fewer paths hit the threshold: the strategy registers a drawdown event less often.
  • The paths that do hit recover more slowly: conditional on hitting a 30% drawdown, the recovery is longer than the recovery from a 25% drawdown.

These offset in different proportions for different (Sharpe, skew, kurt) inputs. The defensible procedure is to sweep the threshold across {15%, 20%, 25%, 30%} and report the p95 recovery at each. The kill-switch is then defined in (threshold, p95) joint space: "shut down at 20% drawdown if p95 recovery exceeds 24 months."
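A threshold sweep is cheap once recovery is computed per path. The sketch below (hypothetical names) evaluates every threshold on the same set of paths, so differences between rows reflect the threshold rather than simulation noise. The i.i.d. Gaussian paths with a 1% mean and 6% volatility are placeholders for the engine's fitted draws, not its model.

```python
import math
import random

def first_hit_recovery(returns, threshold):
    """Months from first breach of `threshold` until the prior peak is regained
    (None if never breached, math.inf if still underwater at the end)."""
    wealth = peak = 1.0
    hit = None
    for t, r in enumerate(returns):
        wealth *= max(1 + r, 0.0)
        if hit is None:
            peak = max(peak, wealth)
            if 1 - wealth / peak >= threshold:
                hit = t
        elif wealth >= peak:
            return t - hit
    return None if hit is None else math.inf

def sweep_thresholds(paths, thresholds=(0.15, 0.20, 0.25, 0.30), tail_pct=95):
    """Map each threshold to (paths that hit it, nearest-rank tail recovery)."""
    table = {}
    for th in thresholds:
        recs = sorted(r for r in (first_hit_recovery(p, th) for p in paths)
                      if r is not None)
        tail = recs[max(1, (tail_pct * len(recs) + 99) // 100) - 1] if recs else None
        table[th] = (len(recs), tail)
    return table

# Placeholder paths: i.i.d. Gaussian months with a hypothetical 1% mean and
# 6% volatility, standing in for the engine's fitted skew-t draws.
rng = random.Random(7)
paths = [[rng.gauss(0.01, 0.06) for _ in range(96)] for _ in range(500)]
for th, (n_hit, tail) in sweep_thresholds(paths).items():
    print(f"threshold {th:.0%}: {n_hit} paths hit, p95 recovery {tail} months")
```

On a fixed set of paths the hit count can only fall as the threshold widens, since any path that breaches 30% has already breached 15%; the p95 column is where the two effects above trade off.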

What this does not catch

The Markov simulation assumes monthly returns are i.i.d. draws from the moment-matched distribution. Real returns have autocorrelation, regime switches, and structural breaks, all of which lengthen the empirical recovery tail beyond the simulation's prediction. The engine documents this gap in its methodology page; the conservative read is to treat the simulated p95 as a lower bound on the empirical p95.

The simulation also assumes the recovery threshold is hit once and tracked once per path. Real strategies can hit the threshold, partially recover, then hit it again: a "double bottom" pattern that the engine's first-hit logic does not model. The fix is a second pass that tracks every threshold crossing and reports the longest recovery per path; for most retail operators the first-hit p95 is already restrictive enough that the refinement does not change the kill-switch decision.
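A minimal sketch of that second pass, assuming an episode runs from a wealth peak until the peak is regained (the function names are hypothetical). Within one episode a partial recovery followed by a new low simply extends the count; each separate breaching episode gets its own entry, and the per-path input to the kill-switch is the longest one.

```python
import math

def episode_recoveries(returns, threshold):
    """Recovery time for every drawdown episode that breaches `threshold`.

    An episode runs from a wealth peak until that peak is regained. Its
    recovery time is counted from the first breach month; an episode still
    open at the end of the path is reported as math.inf.
    """
    wealth = peak = 1.0
    hit = None
    recoveries = []
    for t, r in enumerate(returns):
        wealth *= max(1 + r, 0.0)
        if wealth >= peak:
            if hit is not None:            # a breaching episode just closed
                recoveries.append(t - hit)
                hit = None
            peak = wealth
        elif hit is None and 1 - wealth / peak >= threshold:
            hit = t                        # first breach in this episode
    if hit is not None:
        recoveries.append(math.inf)
    return recoveries

def worst_recovery(returns, threshold):
    """Longest recovery across all episodes (None if no episode breached)."""
    recs = episode_recoveries(returns, threshold)
    return max(recs) if recs else None

# Two separate 30% drawdowns: first-hit logic would report only the 1-month one.
path = [-0.30, 0.50, -0.30, 0.0, 0.0, 0.60]
print(episode_recoveries(path, 0.25), worst_recovery(path, 0.25))   # [1, 3] 3
```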


References

  • Magdon-Ismail, M., & Atiya, A. F. (2004). "Maximum Drawdown." Risk Magazine, October 2004. https://www.risk.net/risk-magazine
  • Magdon-Ismail, M., Atiya, A. F., Pratap, A., & Abu-Mostafa, Y. S. (2004). "On the Maximum Drawdown of a Brownian Motion." Journal of Applied Probability 41(1), 147–161. https://www.cambridge.org/core/journals/journal-of-applied-probability
  • Azzalini, A., & Capitanio, A. (2003). "Distributions Generated by Perturbation of Symmetry with Emphasis on a Multivariate Skew t-Distribution." Journal of the Royal Statistical Society B 65(2), 367–389. The skew-t distribution underlying the simulation.
  • Young, T. W. (1991). "Calmar Ratio: A Smoother Tool." Futures Magazine 20(1). Original definition of the annual-return / max-drawdown ratio.
  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter 14, "Backtest Statistics", covers drawdown and time under water. https://www.wiley.com/en-us/Advances+in+Financial+Machine+Learning-p-9781119482086

Verified engine output

Inputs
monthly_sharpe            0.45
monthly_skew              -0.5
monthly_excess_kurt       3.2
recovery_threshold        0.25
paths                     4000
max_months                96
seed                      31
Result
median                    16
p25                       11
p75                       22
p95                       39
pct recovered in 12 mo    0.3245
pct never recovered       0
recovery times (4000 items)  [...]

Computed live at build time.

Frequently asked questions

Why simulate 4,000 paths and not 10,000?
Standard error of p95 scales as 1/√paths. At 4,000 paths the standard error is around 0.7 months on a 39-month estimate, enough for month-granularity kill-switches. Going to 10,000 paths cuts it by √2.5 ≈ 1.6×, to roughly 0.45 months, so the extra paths are worth it only if sub-month precision matters.
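A bootstrap over the engine's per-path recovery times is one way to check that standard error on a given run. The function names and the nearest-rank convention are assumptions; note that the effective sample size is the number of paths that hit the threshold, which is at most `paths`.

```python
import random
import statistics

def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile (integer pct) of an ascending list."""
    return sorted_values[max(1, (pct * len(sorted_values) + 99) // 100) - 1]

def bootstrap_percentile_se(recovery_months, pct=95, n_boot=1000, seed=0):
    """Bootstrap standard error of a percentile of recovery times.

    Resamples the per-path recovery times with replacement, recomputes the
    percentile each time, and returns the spread of those estimates.
    Entries must be finite: drop or cap never-recovered paths first.
    """
    rng = random.Random(seed)
    n = len(recovery_months)
    estimates = [nearest_rank(sorted(rng.choices(recovery_months, k=n)), pct)
                 for _ in range(n_boot)]
    return statistics.stdev(estimates)
```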
Why is pctNeverRecovered = 0 on the canonical input?
Monthly Sharpe = 0.45 implies a positive expected return strong enough that, on this run, every path that hit the threshold compounded back to its peak within the 96-month cap. Strategies with monthly Sharpe ≤ 0 produce non-zero pctNeverRecovered.
What's the relationship between recovery threshold and p95 recovery?
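Positive and non-linear: a wider threshold selects paths that sit deeper underwater, and the gain needed to regain the peak from a drawdown d is d / (1 − d), which grows faster than d itself (about 33% from a 25% drawdown, 43% from a 30% one). Sweep the threshold to characterize the joint distribution.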
Does the engine handle daily returns?
Yes, by rescaling the inputs: with about 21 trading days per month, daily Sharpe ≈ monthly Sharpe / √21. The engine simulates at the time scale of the input moments.
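Under the same i.i.d. assumption, skew and excess kurtosis also need rescaling when moving to daily inputs. A minimal sketch (the function name is hypothetical), assuming a month is the sum of about 21 i.i.d. daily log returns, so the j-th cumulant scales linearly with the number of days:

```python
import math

def monthly_to_daily(sharpe, skew, excess_kurt, days_per_month=21):
    """Rescale monthly moments to daily, assuming i.i.d. daily (log) returns.

    A month is a sum of n daily returns, so every cumulant scales by n:
    Sharpe scales by sqrt(n), skewness by 1/sqrt(n), excess kurtosis by 1/n.
    Going from monthly to daily inverts those factors.
    """
    n = days_per_month
    return sharpe / math.sqrt(n), skew * math.sqrt(n), excess_kurt * n

print(monthly_to_daily(0.45, -0.5, 3.2))   # Sharpe ~0.098, skew ~-2.29, kurt ~67.2
```

The daily skew and kurtosis come out far more extreme than the monthly ones, which is worth knowing before feeding them to any moment-matched fit.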
Why use Markov simulation instead of historical bootstrap?
Markov simulation extrapolates beyond the empirical tape; historical bootstrap is bounded by the tape's worst observed sequence. For short tapes Markov dominates; for long tapes bootstrap is the safer default.
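For comparison, a historical bootstrap might look like the sketch below. The moving-block variant, the block length, the function name and the toy tape are all assumptions, chosen for illustration; the point is the bound: every simulated month is drawn from the tape, so no simulated month can be worse than the tape's worst month.

```python
import random

def block_bootstrap_path(tape, n_months, block=6, rng=None):
    """One synthetic path built from random contiguous blocks of a historical tape.

    This is a moving-block bootstrap, used here for illustration. Every
    simulated month is a month the tape actually contained.
    """
    if not 1 <= block <= len(tape):
        raise ValueError("block must be between 1 and len(tape)")
    rng = rng or random.Random()
    path = []
    while len(path) < n_months:
        start = rng.randrange(len(tape) - block + 1)
        path.extend(tape[start:start + block])   # keep short-range ordering
    return path[:n_months]


# Toy 12-month tape of monthly returns (illustrative numbers, not real data).
tape = [0.03, -0.02, 0.01, -0.08, 0.04, 0.02, -0.01, 0.05, -0.03, 0.02, 0.01, 0.00]
path = block_bootstrap_path(tape, 96, block=3, rng=random.Random(5))
print(min(path) >= min(tape))   # True by construction
```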