TL;DR
A 10-signal ensemble whose signals are all correlated at 0.8 is effectively a 1.5-signal ensemble — its effective degrees of freedom are much lower than the headline count. Most retail "ensemble" systems fail this test silently and compound miscalibration into conviction. The fix is an orthogonality floor: require at least one orthogonal signal among the firing set, where orthogonality is measured by rank-correlation on a fresh validation window. Below: the math, a two-minute diagnostic, and the three orthogonal axes that tend to actually work in practice.
The correlated-ensemble illusion
You build a "10-signal ensemble." Signal A fires when 5-day momentum is positive; Signal B when 10-day momentum is positive; Signal C when 20-day momentum is positive. When all three fire, you feel confident. But A, B, and C all measure the same underlying phenomenon: momentum at different lags. Their outputs are correlated at > 0.8. They are not three signals — they are one signal re-expressed three times.
This is the structural failure of naive ensembles: signal count ≠ signal dimensionality.
Effective degrees of freedom
For a set of binary signals with pairwise correlation matrix R:
dof_eff = N² / Σ_ij r_ij²
        = N² / (N + 2·Σ_{i<j} r_ij²) (the diagonal contributes N since r_ii = 1 and R is symmetric)
For 10 signals with average pairwise correlation 0.8: dof_eff ≈ 1.5. For 10 signals with average pairwise correlation 0.1: dof_eff ≈ 9.2. A 10-signal ensemble earns its headline count only when all pairwise correlations sit below roughly 0.1–0.15 (dof_eff ≈ 8–9).
The headline count does not matter. Effective dof does.
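For a constant-correlation matrix the formula collapses to a one-liner, which makes the numbers above easy to sanity-check (`dof_eff_constant_corr` is an illustrative helper, not part of any library):

```python
def dof_eff_constant_corr(n: int, rho: float) -> float:
    """Effective dof for n signals with constant pairwise correlation rho:
    N^2 / (N + 2 * C(N, 2) * rho^2)."""
    n_pairs = n * (n - 1) / 2
    return n ** 2 / (n + 2 * n_pairs * rho ** 2)

print(round(dof_eff_constant_corr(10, 0.8), 2))  # 1.48
print(round(dof_eff_constant_corr(10, 0.1), 2))  # 9.17
```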
The orthogonality floor
The production-simple fix: require at least one signal in the firing set to have rank-correlation below a threshold (e.g. 0.2) against the others measured on a held-out window.
```python
import numpy as np

def effective_dof(returns: np.ndarray) -> float:
    """
    Effective degrees of freedom: N^2 / sum of squared correlations.

    returns: shape (n_signals, n_observations), each row a signal's pnl series.
    """
    if returns.shape[0] == 1:
        return 1.0
    corr = np.corrcoef(returns)
    n = corr.shape[0]
    return n ** 2 / (corr ** 2).sum()

def has_orthogonal_signal(corr_matrix: np.ndarray, threshold: float = 0.2) -> bool:
    """
    True iff at least one signal is weakly correlated (<= threshold) with every other.
    """
    n = corr_matrix.shape[0]
    if n == 1:
        return True  # a lone signal is trivially orthogonal to the (empty) rest
    for i in range(n):
        others = [abs(corr_matrix[i, j]) for j in range(n) if j != i]
        if max(others) <= threshold:
            return True
    return False
```
Wire has_orthogonal_signal as a gate in your ensemble scorer. If an idea fires on 5 correlated signals but no orthogonal one, skip it.
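A minimal sketch of that gate, with `gated_score` as a hypothetical wrapper around your scorer (the `has_orthogonal_signal` logic is inlined so the snippet runs standalone):

```python
import numpy as np

def has_orthogonal_signal(corr: np.ndarray, threshold: float = 0.2) -> bool:
    """True iff some signal is weakly correlated (<= threshold) with every other."""
    n = corr.shape[0]
    if n == 1:
        return True
    return any(
        max(abs(corr[i, j]) for j in range(n) if j != i) <= threshold
        for i in range(n)
    )

def gated_score(firing_corr: np.ndarray, raw_score: float) -> float:
    """Hypothetical gate: zero the ensemble score when the firing set
    has no orthogonal member."""
    return raw_score if has_orthogonal_signal(firing_corr) else 0.0

# Three momentum clones, pairwise correlated at 0.85: the gate closes.
clones = np.full((3, 3), 0.85)
np.fill_diagonal(clones, 1.0)
print(gated_score(clones, 0.7))  # 0.0

# Two clones plus one flow signal at 0.1 correlation: the gate opens.
mixed = np.array([[1.0, 0.85, 0.1],
                  [0.85, 1.0, 0.1],
                  [0.1, 0.1, 1.0]])
print(gated_score(mixed, 0.7))  # 0.7
```

Zeroing the score is the bluntest choice; scaling it by effective dof is a softer variant of the same gate.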
The three axes that empirically work
From public quant research and practitioner writeups, signal families that tend to be genuinely orthogonal to each other:
- Price-based (momentum, mean-reversion, volatility breakout, trend persistence) — all variants of "what did price do recently."
- Flow-based (order-book imbalance, volume profile, options dealer positioning, fund-flow data) — "who is buying."
- Fundamental / event-based (earnings surprise, filings tone, analyst revision breadth, guidance change) — "what the company said."
A firing set with one price, one flow, one fundamental is a genuinely 3-dimensional ensemble. A firing set with three momentum variants is 1-dimensional. The orthogonality floor catches the difference.
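A cheap first-pass check on those axes, assuming a hypothetical family-tag scheme (`FAMILY` and `firing_dimensionality` are illustrative names, not a standard API) — this counts distinct families as an upper bound on dimensionality before you bother with the correlation matrix:

```python
# Hypothetical tag scheme: every signal is registered under one family.
FAMILY = {
    "mom_5d": "price", "mom_10d": "price", "mom_20d": "price",
    "orderbook_imbalance": "flow", "earnings_surprise": "fundamental",
}

def firing_dimensionality(firing: list[str]) -> int:
    """Distinct signal families in the firing set -- a coarse upper bound
    on the ensemble's dimensionality (no correlation data needed)."""
    return len({FAMILY[s] for s in firing})

print(firing_dimensionality(["mom_5d", "mom_10d", "mom_20d"]))  # 1
print(firing_dimensionality(["mom_5d", "orderbook_imbalance", "earnings_surprise"]))  # 3
```

Family tags are a prior, not a guarantee — two families can still be correlated in a given regime, so the rank-correlation floor remains the actual gate.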
The rank-correlation choice
Pearson correlation is sensitive to outliers and assumes linearity. For signals that emit -1, 0, +1 or noisy continuous scores, use Spearman rank correlation as the orthogonality metric. It captures monotonic relationships without being fooled by heavy-tailed events.
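A quick demonstration of the difference, using a rank-based Spearman computed with plain numpy (no tie handling, which is fine for continuous scores; the synthetic series and seed are illustrative):

```python
import numpy as np

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)  # independent of a
a[0], b[0] = 50.0, 50.0   # one shared heavy-tailed event

print(round(float(np.corrcoef(a, b)[0, 1]), 2))  # Pearson: inflated by the single outlier
print(round(spearman(a, b), 2))                   # Spearman: stays near zero
```

One shared tail event is enough to push Pearson well above 0.5 on two otherwise independent series; the rank transform caps that event's influence at a single rank pair.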
What breaks this
- Look-ahead in the validation window. The correlation matrix must be computed on a window with no overlap with the trading period. Retail setups often compute it on the full history, live trades included; that invalidates the gate.
- Regime change invalidates the correlation matrix. Signals orthogonal in 2019 can become correlated in 2025. Recompute quarterly.
- Signal definition drift. If "momentum signal A" changes its parameters over time, historical correlation with "flow signal B" is meaningless. Version your signal definitions.
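The first two failure modes reduce to one slicing rule. A sketch, with `validation_corr` as a hypothetical helper that only looks at a trailing window ending strictly before the trading period:

```python
import numpy as np

def validation_corr(pnl: np.ndarray, trade_start: int, window: int) -> np.ndarray:
    """
    Pairwise signal correlations on a held-out window that ends strictly
    before the trading period -- no overlap, no look-ahead.

    pnl: shape (n_signals, n_observations); trade_start indexes the first
    observation belonging to the live trading period.
    """
    if trade_start - window < 0:
        raise ValueError("not enough history before the trading period")
    return np.corrcoef(pnl[:, trade_start - window:trade_start])

rng = np.random.default_rng(1)
pnl = rng.normal(size=(3, 400))  # three synthetic signal pnl series
corr = validation_corr(pnl, trade_start=300, window=200)  # uses obs 100..299 only
```

The quarterly recompute is then just advancing `trade_start` each quarter and re-slicing, with the old matrix discarded rather than updated in place.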
Connects to
- Backtest Overfitting Score — evaluates whether the IS-best ensemble survives OOS. High orthogonality typically correlates with low PBO; low orthogonality is a prior on high PBO.
- Kelly Sizer — once you have a genuinely orthogonal ensemble, the combined probability is meaningfully different from the single-signal probability, and the Kelly size updates accordingly.
References
- Lopez de Prado, M. (2018). Advances in Financial Machine Learning (Chapter 8 on feature importance).
- Harvey, C. R., Liu, Y., & Zhu, H. (2016). "... and the Cross-Section of Expected Returns." Review of Financial Studies 29(1).
- Kirilenko, A., Kyle, A., Samadi, M., & Tuzun, T. (2017). "The Flash Crash: High-Frequency Trading in an Electronic Market." Journal of Finance 72(3).