Position sizing for LLM signals fails at the deterministic-Kelly step. Plug a $\widehat{p}$ from a 200-call eval into the formula and you over-bet by 3-5x. The Position Sizing under Edge Variance tool ran on edge_mean = 2.5%, edge_stddev = 1.8%, outcome variance = 4.5%, fractional Kelly = 0.2. Deterministic Kelly is 55.6% of bankroll, Bayesian Kelly barely moves to 55.2%, and the conservative variant (one-sigma-lower edge) collapses to 15.6%. The engine's 5% CVaR, 4.55% of bankroll, matches the 0.2× Bayesian bet (11.0% of bankroll), not the much smaller conservative bet. Edge uncertainty is the binding constraint for LLM-derived signals; the formula has to know that.

TL;DR

  • Deterministic Kelly on $\mu_{\text{edge}} = 2.5\%$, $\sigma_{\text{edge}} = 1.8\%$, $\sigma^2_{\text{outcome}} = 4.5\%$: 55.6% of bankroll.
  • Fractional Kelly at 0.2× (one-fifth Kelly, slightly below quarter-Kelly): 11.1% of bankroll deterministic.
  • Bayesian Kelly using the posterior mean: 55.2% (barely different, because the posterior mean sits close to the point estimate).
  • Conservative Kelly using a one-sigma-lower edge (edge_mean − edge_stddev): 15.6%, a collapse by a factor of 3.6.
  • 5% CVaR reported by the engine: 4.55% of bankroll, which matches the 0.2× Bayesian bet (11.0%) rather than the 3.1% conservative bet.
  • For LLM signals, deterministic Kelly is wrong. Conservative Kelly is the right default.

The scenario

An LLM signal generator produces a binary direction call with a self-reported edge of 2.5 percentage points above coin-flip and an estimated standard deviation of that edge of 1.8 percentage points. The outcome variance is 0.045 (quoted as 4.5% throughout), a standard deviation of about 21%. Fractional Kelly scaling is the practitioner's default for noisy edges; this run uses a 0.2 multiplier, slightly more conservative than quarter-Kelly. The Position Sizing under Edge Variance engine computes four numbers on this scenario.

The three sizing approaches come out as follows (the last column applies the 0.2 Kelly multiplier):

| Approach | Raw Kelly fraction | 0.2× Kelly fraction |
|---|---|---|
| Deterministic Kelly (treats edge as known) | 55.6% | 11.1% |
| Bayesian Kelly (posterior mean) | 55.2% | 11.0% |
| Conservative Kelly (lower-bound posterior) | 15.6% | 3.1% |

The engine's 5% CVaR is 4.55% of bankroll: the average loss across the worst 5% of outcomes. The figure matches the 0.2× Bayesian bet of 11.0%; the 3.1% conservative bet carries a proportionally smaller tail.
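A minimal sketch that reproduces the three sizing rules on the scenario inputs. The deterministic and conservative formulas are the ones stated in this piece. The Bayesian line is an assumption: dividing the mean edge by the predictive variance (outcome variance plus edge variance) reproduces the engine's 0.5516 to every printed digit, but the engine's source is not shown here. The zero floor on the conservative edge is also an assumption, and `size_position` and `SizingResult` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SizingResult:
    deterministic: float
    bayesian: float
    conservative: float
    kelly_fraction: float

    def fractional(self) -> dict:
        # Scale every raw Kelly fraction by the fractional-Kelly multiplier.
        return {
            "deterministic": self.kelly_fraction * self.deterministic,
            "bayesian": self.kelly_fraction * self.bayesian,
            "conservative": self.kelly_fraction * self.conservative,
        }


def size_position(edge_mean: float, edge_stddev: float,
                  outcome_variance: float, kelly_fraction: float) -> SizingResult:
    # Deterministic Kelly: treat the point estimate of the edge as exact.
    deterministic = edge_mean / outcome_variance
    # Bayesian Kelly (assumed form): mean edge over the predictive variance,
    # i.e. outcome variance plus the variance of the edge estimate.
    bayesian = edge_mean / (outcome_variance + edge_stddev ** 2)
    # Conservative Kelly: one-sigma-lower edge; the zero floor (no bet) is an assumption.
    conservative = max(edge_mean - edge_stddev, 0.0) / outcome_variance
    return SizingResult(deterministic, bayesian, conservative, kelly_fraction)


result = size_position(edge_mean=0.025, edge_stddev=0.018,
                       outcome_variance=0.045, kelly_fraction=0.2)
for name, raw in [("deterministic", result.deterministic),
                  ("bayesian", result.bayesian),
                  ("conservative", result.conservative)]:
    print(f"{name:14s} raw={raw:.1%}  0.2x={result.fractional()[name]:.1%}")
```

The printed rows should line up with the table: 55.6/11.1, 55.2/11.0 and 15.6/3.1.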

Why deterministic Kelly is wrong here

The formula

f_kelly = edge / outcome_variance = 0.025 / 0.045 = 0.556

assumes the 2.5% edge estimate is exact. It is not. A 1.8% standard deviation puts the 95% credible interval at roughly 2.5% ± 1.96 × 1.8%, or about [−1.0%, 6.0%], which includes zero. Sizing at 55.6% of bankroll on an edge whose interval includes zero is sizing on belief, not on data.
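The interval arithmetic as a short sketch, assuming the edge posterior is approximately normal with the stated mean and standard deviation:

```python
from statistics import NormalDist

edge_mean, edge_stddev = 0.025, 0.018
# Two-sided 95% interval under a normal approximation to the edge posterior.
z = NormalDist().inv_cdf(0.975)  # about 1.96
lower = edge_mean - z * edge_stddev
upper = edge_mean + z * edge_stddev
print(f"95% interval for the edge: [{lower:.2%}, {upper:.2%}]")
print("interval includes zero:", lower <= 0.0 <= upper)
```

The lower end lands near −1.03%, so the interval straddles zero.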

Bayesian Kelly, computed against the posterior mean of the edge distribution, comes out at 55.2%, barely different from deterministic. That is the trap. The posterior mean is close to the point estimate when the prior is uninformative, and the only other place edge uncertainty enters is the denominator: the engine's 55.2% matches 0.025 / (0.045 + 0.018²), and 0.018² ≈ 0.0003 is negligible next to 0.045. The lower bound of the edge posterior is what drives the conservative variant down to 15.6%, the bet-sizing-under-uncertainty correction [1]. Reporting only the posterior mean tells the trader nothing they didn't already know.
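To see why the posterior-mean variant barely moves, here is a sketch that sweeps the edge standard deviation. It assumes the Bayesian fraction has the form mean edge over predictive variance (outcome variance plus edge variance), which reproduces the engine's 0.5516 but is inferred from the output, not taken from the engine; `bayesian_kelly` is a hypothetical name.

```python
def bayesian_kelly(edge_mean: float, edge_stddev: float, outcome_variance: float) -> float:
    # Assumed form: posterior-mean edge over predictive variance (outcome + edge variance).
    return edge_mean / (outcome_variance + edge_stddev ** 2)


deterministic = 0.025 / 0.045
for s in (0.018, 0.05, 0.10, 0.20):
    f = bayesian_kelly(0.025, s, 0.045)
    print(f"edge_stddev={s:.3f}  bayesian={f:.1%}  vs deterministic: {f / deterministic:.2f}x")
```

Under this form the Bayesian fraction only falls substantially once edge_stddev becomes comparable to the roughly 21% outcome standard deviation, far above the 1.8% here.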

The conservative variant

The 15.6% conservative Kelly comes from using a one-sigma-lower edge estimate: the engine computes the conservative edge as edge_mean − edge_stddev = 2.5% − 1.8% = 0.7%, then runs Kelly on that. The size shrinks by the same factor the edge estimate does (0.7% / 2.5% ≈ 0.28, so 55.6% × 0.28 ≈ 15.6%) — Kelly is linear in edge. A one-sigma haircut is less aggressive than a full 5th-percentile (1.645σ) cut, which on this scenario would push the edge estimate negative; the one-sigma rule keeps the conservative bet positive while still collapsing it by a factor of 3.6.
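A sketch of the haircut with the number of sigmas as a parameter, so the one-sigma rule and the 5th-percentile cut can be compared directly. The zero floor (no bet when the haircut edge is not positive) and the name `conservative_kelly` are assumptions for illustration.

```python
from statistics import NormalDist


def conservative_kelly(edge_mean: float, edge_stddev: float,
                       outcome_variance: float, k: float = 1.0) -> float:
    """Kelly fraction on a k-sigma-lower edge; returns 0 (no bet) if the
    haircut edge is not positive."""
    haircut_edge = edge_mean - k * edge_stddev
    return max(haircut_edge, 0.0) / outcome_variance


one_sigma = conservative_kelly(0.025, 0.018, 0.045, k=1.0)
fifth_pct = conservative_kelly(0.025, 0.018, 0.045, k=NormalDist().inv_cdf(0.95))  # k ~ 1.645
print(f"one-sigma haircut: {one_sigma:.1%}, 5th-percentile haircut: {fifth_pct:.1%}")
```

With k ≈ 1.645 the haircut edge is about −0.46%, so the floor sends the bet to zero, while k = 1 keeps the 15.6% raw fraction.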

For LLM signals, this is the right default. Three reasons:

  1. The eval sample is small. A 200-call eval with a 54% accuracy reading has a Beta posterior 95% CI of roughly [47%, 61%]. The posterior lower bound is what governs survival.
  2. LLM signals decay. The edge that was true on the training corpus is rarely the edge on the production corpus six months later. Conservative sizing absorbs this decay without code changes.
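  3. The penalty for over-betting is asymmetric. Under the usual small-edge approximation, betting c times the true Kelly fraction earns 2c − c² of the maximum expected log-growth. Half-Kelly gives up 25%; the 0.28× conservative bet here gives up roughly 50%; double-Kelly gives up all of it, and anything larger has negative expected log-growth. If the true edge sits at the one-sigma lower bound, the deterministic bet is 3.6× the right size. The downside-protected variant pays back as soon as the eval sample grows or the edge decays, because the lower bound moves with the data.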
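The asymmetry follows from the quadratic approximation $G(f) \approx \mu f - \sigma^2 f^2 / 2$: the optimum is $f^* = \mu / \sigma^2$ and $G(c f^*) / G(f^*) = 2c - c^2$. A minimal sketch that evaluates it at the multiples discussed above (the approximation, not an exact log-utility calculation, is the assumption here):

```python
def growth_share(c: float) -> float:
    """Share of maximum expected log-growth earned by betting c times the true
    Kelly fraction, under the small-edge quadratic approximation."""
    return 2 * c - c * c


cases = [
    ("conservative, 0.28x", 0.007 / 0.025),
    ("half Kelly, 0.5x", 0.5),
    ("full Kelly, 1x", 1.0),
    ("double Kelly, 2x", 2.0),
    ("deterministic when true edge is 0.7%", 0.025 / 0.007),
]
for label, c in cases:
    print(f"{label:38s} c={c:.2f}  growth share={growth_share(c):+.2f}")
```

The last row comes out strongly negative: if the edge is really 0.7%, the deterministic bet does worse than not betting at all.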

The CVaR number

The engine's 5% CVaR of 4.55% says: in the worst 5% of outcomes, the average bankroll loss is 4.55%. That figure matches the 0.2× Bayesian bet (11.0% of bankroll) under a normal outcome model with mean 2.5% and variance 4.5%, which puts the tail loss at about 41% of the stake. That is the right shape: a sized bet should not be capable of a single-event loss many times its own size.

Compare this to deterministic 0.2× Kelly: an 11.1% bet whose tail loss under the same model is almost identical, about 4.6%. One standard deviation of its P&L is roughly 11.1% × √0.045 ≈ 2.4%, so the worst 5% averages close to twice a one-sigma loss. The conservative 3.1% bet has a proportionally tighter loss distribution, about 1.3% of bankroll in the same tail. Tail loss scales linearly with bet size in this model; survival probability scales nonlinearly.
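A sketch of the tail calculation. The model is inferred, not taken from the engine: treating the per-unit outcome as Normal(edge_mean, outcome_variance) and applying it to the 0.2× Bayesian bet reproduces the engine's cvar5 to at least six digits. `cvar_normal` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def cvar_normal(bet_fraction: float, edge_mean: float,
                outcome_variance: float, alpha: float = 0.05) -> float:
    """Average bankroll loss in the worst `alpha` share of outcomes, assuming the
    per-unit outcome is Normal(edge_mean, outcome_variance) and the bet's P&L is
    bet_fraction * outcome."""
    std = NormalDist()
    sigma = sqrt(outcome_variance)
    z = std.inv_cdf(1 - alpha)
    # Mean of a normal below its alpha-quantile is mu - sigma * pdf(z) / alpha;
    # the loss is the negative of that tail mean.
    tail_loss_per_unit = sigma * std.pdf(z) / alpha - edge_mean
    return bet_fraction * tail_loss_per_unit


bayes_bet = 0.2 * 0.025 / (0.045 + 0.018 ** 2)      # 0.2x Bayesian Kelly, about 11.0%
conservative_bet = 0.2 * (0.025 - 0.018) / 0.045    # 0.2x conservative Kelly, about 3.1%
print(f"CVaR5 on the Bayesian bet:     {cvar_normal(bayes_bet, 0.025, 0.045):.2%}")
print(f"CVaR5 on the conservative bet: {cvar_normal(conservative_bet, 0.025, 0.045):.2%}")
```

Because P&L is linear in the bet fraction here, the conservative bet's CVaR is the Bayesian one scaled by 3.1 / 11.0, about 1.28%.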

Comparison to fixed-fractional

A common alternative is fixed fractional: bet a constant fraction (1-2%) of bankroll regardless of edge or variance estimates. Compared to the conservative Kelly above:

| Metric | Conservative Kelly | Fixed 2% |
|---|---|---|
| Bet fraction | 3.1% | 2.0% |
| Sensitive to edge estimate | Yes | No |
| Sensitive to outcome variance | Yes | No |
| Scales with eval sample size | Yes | No |

Fixed fractional is insensitive to mis-estimation of edge because the trader simply does not use the estimate. It is likewise insensitive to mis-estimation of variance. The cost is that fixed fractional cannot grow with conviction: a strategy that tightens its edge estimate with another 100 calls cannot size up without changing the policy. Conservative Kelly with a transparent lower bound does this mechanically.
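A sketch of the mechanical up-sizing, under two loudly hypothetical assumptions: the 1.8% edge standard deviation came from the 200-call eval and shrinks like 1/√n, and the mean edge stays at 2.5% as the sample grows. `conservative_fraction` and `FIXED_FRACTION` are illustrative names.

```python
from math import sqrt

FIXED_FRACTION = 0.02  # fixed-fractional policy: ignores every estimate


def conservative_fraction(n_calls, edge_mean=0.025, base_stddev=0.018, base_n=200,
                          outcome_variance=0.045, kelly_fraction=0.2):
    """0.2x conservative Kelly when the edge stddev is assumed to shrink like
    1/sqrt(n) from a 1.8% reading at 200 calls (hypothetical scaling)."""
    edge_stddev = base_stddev * sqrt(base_n / n_calls)
    return kelly_fraction * max(edge_mean - edge_stddev, 0.0) / outcome_variance


for n in (200, 300, 500, 1000):
    print(f"n={n:5d}  conservative={conservative_fraction(n):.2%}  fixed={FIXED_FRACTION:.2%}")
```

The conservative column rises from 3.1% as the lower bound tightens, while the fixed column never moves; no policy change is needed for the first to respond.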

The integration with calibration

LLM probability outputs are not calibrated by default. Whether a model that says "70% confident" is right 70% of the time is something you only learn by binning the outputs and checking [2]. The Calibration Dojo measures the gap between stated and realised frequencies. If the model is overconfident by 15 percentage points on average, the edge estimate going into the sizer must be deflated by that same amount before sizing.
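A minimal binning check, assuming the stated probabilities are for the called direction (so they sit at or above 0.5) and that edge is $(p - 0.5) \cdot \text{payoff}$ as defined in the pipeline. The function names and the 20-call example are hypothetical; this is a sketch of the idea, not the Calibration Dojo's implementation.

```python
from collections import defaultdict


def reliability_table(stated_probs, outcomes, n_bins=10):
    """Bin stated probabilities into equal-width bins; return
    (mean stated probability, realised hit rate, count) per non-empty bin."""
    bins = defaultdict(lambda: [0.0, 0.0, 0])      # bin -> [sum_stated, sum_hits, count]
    for p, y in zip(stated_probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)
        bins[b][0] += p
        bins[b][1] += y
        bins[b][2] += 1
    return [(s / c, h / c, c) for _, (s, h, c) in sorted(bins.items())]


def mean_overconfidence(table):
    """Count-weighted average of (stated - realised); positive = overconfident."""
    total = sum(c for _, _, c in table)
    return sum((stated - realised) * c for stated, realised, c in table) / total


def deflated_edge(stated_prob, gap, payoff=1.0):
    """Edge = (p - 0.5) * payoff after subtracting the measured overconfidence from p."""
    return (stated_prob - gap - 0.5) * payoff


# Hypothetical eval: the model says 0.70 on every call but is right on 11 of 20.
table = reliability_table([0.70] * 20, [1] * 11 + [0] * 9)
gap = mean_overconfidence(table)
print(table, f"gap = {gap:.2f}", f"edge = {deflated_edge(0.70, gap):.2f}")
```

In the example the 15-point overconfidence cuts a stated 20-point edge to 5 points before anything reaches the sizer.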

Production pipeline order: isotonic-calibrate the model's probability output, then compute edge as $(p_{\text{calibrated}} - 0.5) \cdot \text{payoff}$, then run through Position Sizing under Edge Variance with the conservative variant.
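The pipeline order as a sketch. Isotonic calibration is done here with a small pure-Python pool-adjacent-violators fit that returns a step function; scikit-learn's `IsotonicRegression` is the usual library choice, and this version only avoids the dependency. The edge_stddev and outcome_variance passed to the sizing step are the scenario's values reused as placeholders, and all function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators: least-squares non-decreasing step function
    mapping raw model scores to realised hit rates (outcomes are 0/1)."""
    agg = defaultdict(lambda: [0.0, 0])            # score -> [sum_outcomes, count]
    for s, y in zip(scores, outcomes):
        agg[s][0] += y
        agg[s][1] += 1
    blocks = []                                    # each: [sum_outcomes, count, upper_score]
    for s in sorted(agg):
        total, count = agg[s]
        blocks.append([total, count, s])
        # Merge backwards while a block's mean exceeds the next one (monotonicity violation).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            t, c, upper = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2] = upper
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(model, score):
    """Calibrated probability for a raw score (step function, clamped at the ends)."""
    uppers, values = model
    i = bisect_left(uppers, score)                 # first block whose upper score >= score
    return values[min(i, len(values) - 1)]


def conservative_size(p_calibrated, payoff, edge_stddev, outcome_variance, kelly_fraction=0.2):
    """Edge = (p - 0.5) * payoff, then fractional Kelly on a one-sigma-lower edge."""
    edge = (p_calibrated - 0.5) * payoff
    return kelly_fraction * max(edge - edge_stddev, 0.0) / outcome_variance


# Hypothetical eval: the model always says 0.70 but hits 11 of 20.
model = fit_isotonic([0.70] * 20, [1] * 11 + [0] * 9)
p_cal = calibrate(model, 0.70)
print(f"calibrated p = {p_cal:.2f}")
print(f"size = {conservative_size(p_cal, payoff=1.0, edge_stddev=0.018, outcome_variance=0.045):.1%}")
```

Sizing on the raw 0.70 would have used a 20-point edge; sizing after calibration uses 5 points, which is the ordering the failure-mode list insists on.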

BaFin context

BaFin's retail-trading guidance does not specify position-sizing rules directly, but the leverage-product warnings reference total-loss scenarios that match the deterministic-Kelly failure mode [3]. A 55.6% bankroll bet on a non-leverage-product equity edge with a wide posterior interval is, in practical terms, a leverage product on the trader's own conviction, so the regulatory framing applies.

For published trading content under MiFID II suitability guidelines, position-sizing recommendations must explicitly acknowledge edge uncertainty [4]. The conservative variant, with its visible lower-bound posterior, satisfies this better than the deterministic point estimate.

Failure modes

  • Reporting only the deterministic Kelly fraction. It is 3-5x too large on LLM signals. The number looks reasonable until the edge regresses.
  • Using the posterior mean as if it were the lower bound. They are not the same. The mean tracks the point estimate; the lower bound also tracks how uncertain that estimate is.
  • Ignoring outcome variance. A 2.5% edge on a 1% outcome variance is a different scenario than 2.5% on 4.5%: deterministic Kelly is 0.025 / 0.01 = 2.5× bankroll in the first case and 0.556 in the second. Kelly is the ratio of edge to variance, so either input moves the answer.
  • Compounding mis-calibrated edge into the sizer. Calibrate first, then size. The order matters.

Footnotes

  1. Lopez de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. Chapter on bet sizing under uncertainty.

  2. Bailey, D. H., & Lopez de Prado, M. (2014). "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality." Journal of Portfolio Management 40(5), 94–107. pm-research.com

  3. BaFin (2022). "Information for retail investors: leverage products and the risk of total loss." bafin.de

  4. ESMA (2023). "Guidelines on certain aspects of the MiFID II suitability requirements." esma.europa.eu

Verified engine output

Scenario: edge mean 2.5%, edge stddev 1.8%, outcome variance 4.5%, Kelly fraction 0.2
Inputs
edge_mean: 0.025
edge_stddev: 0.018
outcome_variance: 0.045
kelly_fraction: 0.2
Result
deterministic kelly: 0.5555555555555556
fractional deterministic: 0.11111111111111112
bayesian kelly: 0.551584149677875
fractional bayesian: 0.110316829935575
conservative kelly: 0.1555555555555556
fractional conservative: 0.031111111111111124
cvar5: 0.04551313477943839

Computed live at build time.

Frequently asked questions

Why is Bayesian Kelly so close to deterministic Kelly here?
The Bayesian variant uses the posterior mean, which under an uninformative prior is approximately the sample mean. When the edge estimate is the only signal and the prior is flat, posterior mean and point estimate agree closely; the remaining 0.4-point gap comes from adding the edge variance (0.018² ≈ 0.0003) to the 0.045 outcome variance in the denominator. The Bayesian value adds little on its own: the conservative lower-bound variant is what catches the uncertainty.
Should I always use the conservative variant?
For LLM signals from eval samples under 500 calls, yes. For signals with hundreds of validated live outcomes and stable edge, the gap between deterministic and conservative narrows and the Bayesian mean becomes defensible. The conservative variant never over-bets relative to the data, which is the load-bearing property.
How does this connect to quarter vs eighth Kelly?
The quarter-vs-eighth piece argues for eighth-Kelly when win-rate is a Beta posterior. The edge-variance approach here is the multi-parameter version: the edge posterior (mean and standard deviation) and the outcome variance jointly set the conservative bet. The conclusion is the same: lower-bound posteriors govern sizing when data is thin.