TL;DR

LLMs reason well about fundamentals on a weekly or monthly horizon and poorly about microstructure on an intraday horizon. Mixing the two in a single prompt produces a muddle: the model averages regime-specific signals into a thesis that reflects neither. The production-sound architecture separates timescales. An expensive model produces a structured weekly thesis with explicit invalidation conditions. A cheap daily check asks only whether any invalidation tripped. Deterministic intraday gates handle entry timing without re-running research. When multiple signals coexist at the same timescale, equal-weighting is a hidden bet on the highest-volatility signal. Inverse-volatility, Sharpe-weighted, and Bayesian pooling each give the portfolio a different risk profile. Pick the method that matches the firing statistics, not the one that looks cleanest in backtest.

Why a weekly LLM plus intraday gates outperforms one prompt per decision

A single prompt that bundles a 10-K, yesterday's daily bar, and this morning's tape is not a multi-timeframe reasoner. It is a model spreading attention across incompatible regimes and emitting whichever signal dominated the last 2,000 tokens of its context. The architecture below separates the calls by timescale: one expensive weekly research pass, one cheap daily invalidation check, and deterministic intraday gates. The research thesis is cached. The intraday logic never calls an LLM.

What LLMs can reason about at each timescale

Language models trained on decades of written analysis can reproduce the structure of fundamental reasoning — accruals, guidance revisions, segment trends, macro mean-reversion. They have read almost nothing useful about the intraday book. Capability falls off sharply as the horizon shortens.

| Timescale | LLM can reason about | LLM cannot reason about |
| --- | --- | --- |
| Weekly / monthly | Fundamentals, macro narratives, earnings revisions, sector dispersion, filings tone | Intra-week microstructure, rebalancing flows, weekly options expiry mechanics |
| Daily | Daily narrative, earnings reactions, macro prints, analyst revisions, event calendars | Microstructure noise, opening-auction imbalance, block-trade prints |
| Intraday | Scheduled events (earnings release, FOMC print), qualitative news headlines | Order flow, book imbalance, queue dynamics, tick-by-tick tape reading |

A practitioner deploying LLM research into a trading stack should treat the right column as a non-goal. Intraday microstructure belongs to deterministic signals trained on book events, not to a transformer asked to speculate about them.1

Why mixing timescales in one prompt degrades output

Ask a model: given this 10-K for an example large-cap issuer, yesterday's price action, and today's pre-market tape, what is your thesis? The answer will look coherent and be incoherent. Attention distributes across all three blocks, and whichever block carries the most specific numbers tends to dominate the conclusion. A juicy 10-K narrative gets washed out by a single pre-market print. A boring 10-K lets a single overnight gap set the tone.

The failure is not a training deficit. It is a task-specification deficit. The weekly signal lives in a reference class of roughly 52 observations per year (one weekly reading). The intraday signal lives in a reference class of thousands of ticks per day. Asking a model to combine them into a single probability is asking it to pool two distributions with incompatible variance structures. Even a calibrated human analyst2 would refuse the task as posed and split it into two questions.

Sequential calls fix the muddle. The weekly research pass outputs a thesis plus invalidation conditions. The intraday layer asks a narrower question: has any invalidation tripped? The narrower prompt has a narrower answer, which is what calibration requires.

Architecture: weekly LLM + daily cheap + intraday deterministic

The design has three layers. Each layer runs on its own schedule and writes to a shared state store. A decision is the composition of the three most recent outputs.

Weekly layer. A research call to a reasoning model (Sonnet or Opus class) produces a structured thesis object with four fields: direction (long / short / flat), probability (a number the calibrator will process later), invalidation_conditions (a list of deterministic rules), and rationale (free text for the audit log). The call runs once per week or when material news forces a re-run. Cost: one call.

Daily layer. A cheap model (Haiku class) or, better, deterministic code checks the thesis object against yesterday's closing state. The only question is whether any invalidation condition tripped. If yes, the thesis is retired. If no, the thesis persists into the next day. Cost: one short call per ticker per day, or zero calls if the checks are pure rules.

Intraday layer. Pure deterministic gates. Volatility spike, volume anomaly, time-of-day filter, spread filter, scheduled-event blackout. No LLM involvement. The intraday layer's job is timing the entry once the weekly thesis has already decided the direction.

import json
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable

import anthropic

client = anthropic.Anthropic()

@dataclass
class Thesis:
    ticker: str
    direction: str          # "long", "short", "flat"
    probability: float
    invalidation_conditions: list[dict]
    rationale: str
    issued_on: date
    expires_on: date

def weekly_research(ticker: str, context_pack: str) -> Thesis:
    """Expensive call. Runs Monday 09:00 local. One per ticker per week."""
    prompt = (
        "Produce a structured thesis for the symbol described. "
        "Return JSON with keys: direction, probability, "
        "invalidation_conditions, rationale. "
        "invalidation_conditions is a list of dicts with fields "
        "metric, operator, threshold. "
        "Do not speculate on intraday price action."
    )
    msg = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt + "\n\n" + context_pack}],
    )
    payload = json.loads(msg.content[0].text)
    today = date.today()
    return Thesis(
        ticker=ticker,
        direction=payload["direction"],
        probability=payload["probability"],
        invalidation_conditions=payload["invalidation_conditions"],
        rationale=payload["rationale"],
        issued_on=today,
        expires_on=today + timedelta(days=7),
    )

def daily_invalidation(thesis: Thesis, eod_state: dict) -> bool:
    """Cheap deterministic check. Runs 17:00 local each weekday."""
    for cond in thesis.invalidation_conditions:
        metric = eod_state.get(cond["metric"])
        if metric is None:
            continue
        op, thr = cond["operator"], cond["threshold"]
        if op == ">" and metric > thr:
            return True
        if op == "<" and metric < thr:
            return True
        if op == "==" and metric == thr:
            return True
    return False

@dataclass
class IntradayGate:
    name: str
    check: Callable[[dict], bool]

def intraday_gates() -> list[IntradayGate]:
    return [
        IntradayGate("vol_spike",
            lambda s: s["realized_vol_5m"] > 3 * s["realized_vol_trailing"]),
        IntradayGate("spread_too_wide",
            lambda s: s["spread_bps"] > 25),
        IntradayGate("event_blackout",
            lambda s: s["minutes_to_next_event"] < 30),
    ]

def should_trade_now(thesis: Thesis, intraday_state: dict) -> bool:
    if thesis.direction == "flat":
        return False
    for g in intraday_gates():
        if g.check(intraday_state):
            return False
    return True

The composition is the whole trick: weekly_research runs once a week, daily_invalidation runs five times a week, should_trade_now runs on every tick and calls no model at all.3
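A minimal, self-contained sketch of that composition with the weekly call stubbed out; the ticker, thesis values, and state dicts below are invented for illustration, and the check logic mirrors the functions above:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Thesis:
    ticker: str
    direction: str
    invalidation_conditions: list[dict]
    issued_on: date
    expires_on: date

def daily_invalidation(thesis: Thesis, eod_state: dict) -> bool:
    # any tripped rule retires the thesis; unknown metrics are skipped
    for cond in thesis.invalidation_conditions:
        value = eod_state.get(cond["metric"])
        if value is None:
            continue
        op, thr = cond["operator"], cond["threshold"]
        if (op == ">" and value > thr) or (op == "<" and value < thr):
            return True
    return False

def should_trade_now(thesis: Thesis, s: dict) -> bool:
    # deterministic gates only; no model call at tick cadence
    if thesis.direction == "flat":
        return False
    vetoes = [
        s["realized_vol_5m"] > 3 * s["realized_vol_trailing"],
        s["spread_bps"] > 25,
        s["minutes_to_next_event"] < 30,
    ]
    return not any(vetoes)

# Monday: weekly_research() would run here; stubbed for the sketch
thesis = Thesis(
    ticker="XYZ",
    direction="long",
    invalidation_conditions=[
        {"metric": "close", "operator": "<", "threshold": 95.0},
    ],
    issued_on=date(2026, 4, 6),
    expires_on=date(2026, 4, 6) + timedelta(days=7),
)

# Tuesday 17:00: daily check against end-of-day state
assert daily_invalidation(thesis, {"close": 101.3}) is False  # thesis survives
assert daily_invalidation(thesis, {"close": 93.0}) is True    # rule trips

# Wednesday 10:12: tick-level gate check, zero LLM calls
calm_tape = {
    "realized_vol_5m": 0.8,
    "realized_vol_trailing": 1.0,
    "spread_bps": 6.0,
    "minutes_to_next_event": 240,
}
assert should_trade_now(thesis, calm_tape) is True
```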

Aggregation math when multiple signals exist at the same timescale

A single timescale often houses several signals. Three momentum variants at daily. Two flow indicators intraday. A fundamental and a news-sentiment thesis at weekly. Combining them is a pooling problem, and the pooling rule controls the risk profile more than practitioners realise.

Equal weighting assigns w_i = 1/N. This sounds fair and is not. The highest-volatility signal dominates the combined series because its contribution to portfolio variance scales with the square of its volatility. Three uncorrelated signals with vols of 5%, 10%, and 30% under equal weights hand the third signal roughly 88% of the combined variance (0.30^2 against 0.05^2 + 0.10^2 + 0.30^2).
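The arithmetic behind that claim can be checked directly, assuming uncorrelated signals:

```python
import numpy as np

vols = np.array([0.05, 0.10, 0.30])
# under equal weights and zero correlation, each signal's share of
# portfolio variance is sigma_i^2 / sum_j sigma_j^2
variance_share = vols**2 / (vols**2).sum()
print(variance_share.round(3))   # [0.024 0.098 0.878]
```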

Inverse-volatility weighting assigns w_i = (1/sigma_i) / sum(1/sigma_j). Each signal contributes the same amount of risk. This is the default for any ensemble that wants balanced risk contribution across members.

Sharpe-weighted pooling assigns w_i = SR_i / sum(SR_j) across signals with positive Sharpe. This tilts toward the signals that have historically paid for their volatility. It requires a reliable out-of-sample Sharpe estimate per signal, which is rarer than practitioners assume.4

Bayesian pooling treats each signal weight as a posterior: a prior belief about the signal's edge (from research, from theory) updated by observed track record. It handles cold-start cases where the Sharpe estimate is too noisy to trust.

import numpy as np

def equal_pool(signals: np.ndarray) -> np.ndarray:
    """signals: shape (n_signals, n_obs). Returns combined series."""
    return signals.mean(axis=0)

def inv_vol_pool(signals: np.ndarray) -> np.ndarray:
    vols = signals.std(axis=1, ddof=0)
    vols = np.where(vols == 0, 1e-9, vols)  # guard against constant signals
    w = (1.0 / vols) / (1.0 / vols).sum()
    return (w[:, None] * signals).sum(axis=0)

def sharpe_pool(signals: np.ndarray, freq: int = 252) -> np.ndarray:
    mus = signals.mean(axis=1)
    vols = signals.std(axis=1, ddof=0)
    sr = np.sqrt(freq) * mus / np.where(vols == 0, 1e-9, vols)
    sr_pos = np.clip(sr, 0, None)
    total = sr_pos.sum()
    w = sr_pos / total if total > 0 else np.ones_like(sr_pos) / len(sr_pos)
    return (w[:, None] * signals).sum(axis=0)

def bayes_pool(
    signals: np.ndarray,
    prior_mean: np.ndarray,
    prior_strength: float = 30.0,
) -> np.ndarray:
    """prior_mean: expected Sharpe per signal. prior_strength: pseudo-obs."""
    T = signals.shape[1]
    mus = signals.mean(axis=1)
    vols = signals.std(axis=1, ddof=0)
    sr_obs = np.sqrt(252) * mus / np.where(vols == 0, 1e-9, vols)
    post = (prior_strength * prior_mean + T * sr_obs) / (prior_strength + T)
    post_pos = np.clip(post, 0, None)
    total = post_pos.sum()
    w = post_pos / total if total > 0 else np.ones_like(post_pos) / len(post_pos)
    return (w[:, None] * signals).sum(axis=0)

rng = np.random.default_rng(42)
s1 = rng.normal(0.0005, 0.005, 500)   # low-vol, small edge
s2 = rng.normal(0.0008, 0.010, 500)   # mid-vol, medium edge
s3 = rng.normal(0.0010, 0.030, 500)   # high-vol, big-edge candidate
X = np.vstack([s1, s2, s3])

combined_eq  = equal_pool(X)
combined_iv  = inv_vol_pool(X)
combined_sr  = sharpe_pool(X)
combined_bay = bayes_pool(X, prior_mean=np.array([0.5, 1.0, 0.8]))

Running the four methods on the synthetic trio will produce four different Sharpe numbers and four different drawdown profiles. None is objectively correct. Inverse-volatility is the safest default when the per-signal Sharpe estimate is unreliable. Bayesian pooling is the honest choice when signals have short live track records but credible priors from research.
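To make the divergence concrete, here is a quick comparison of equal and inverse-volatility pooling on the same synthetic trio, regenerated so the snippet stands alone (annualized Sharpe computed the same way as in sharpe_pool):

```python
import numpy as np

def ann_sharpe(x: np.ndarray, freq: int = 252) -> float:
    # annualized Sharpe of a daily return series
    vol = x.std(ddof=0)
    return float(np.sqrt(freq) * x.mean() / vol) if vol > 0 else 0.0

rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0005, 0.005, 500),   # low-vol, small edge
    rng.normal(0.0008, 0.010, 500),   # mid-vol, medium edge
    rng.normal(0.0010, 0.030, 500),   # high-vol, big-edge candidate
])

eq = X.mean(axis=0)                        # equal weights
vols = X.std(axis=1, ddof=0)
w = (1.0 / vols) / (1.0 / vols).sum()
iv = (w[:, None] * X).sum(axis=0)          # inverse-volatility weights

print(f"equal-weight Sharpe: {ann_sharpe(eq):.2f}")
print(f"inverse-vol Sharpe:  {ann_sharpe(iv):.2f}")
```

The two numbers differ because equal weighting lets the 3%-vol signal set the risk profile, while inverse-volatility equalizes each signal's risk contribution.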

The conflict pattern: disagreement across timeframes

Weekly says bullish. Daily says mean-revert. Intraday flags overbought. What now? Three common resolutions, each with a different cost profile.

Weighted average across timeframes. Combine the three probabilities using inverse-volatility or Bayesian weights. The math is clean, but in practice the result is dominated by whichever timeframe carried the highest conviction number, not necessarily the highest accuracy. A confident daily contrarian signal can override a weaker weekly thesis even when the weekly is better-calibrated.

Gating. The weekly thesis is primary. Daily and intraday are veto-only. If the weekly says long and the daily flags a bearish invalidation, the trade is skipped, not reversed. If neither daily nor intraday objects, the weekly decides. This is the pattern used by most systematic discretionary funds because it matches how humans actually process multi-horizon information.5

Agree-else-pass. Trade only when all three layers agree. This is the strictest filter and the rarest fire. Trade frequency drops by roughly an order of magnitude. Expected Sharpe on the firing set rises. Expected profit in dollars often falls because the filter is too restrictive for trading-cost economics.

The framework for choosing: if expected trades per month exceed 20, gating is usually right. If expected trades are between 5 and 20, a weighted average with an orthogonality floor6 works. If trades are rare and high-conviction (under 5 per month), agree-else-pass has the best ratio of signal to transaction cost.

| Trades / month | Preferred resolution | Rationale |
| --- | --- | --- |
| > 20 | Gating (weekly primary, others veto) | Frequent firing means a forgiving filter dominates economics |
| 5 – 20 | Weighted average with orthogonality floor | Middle ground, needs honest weights |
| < 5 | Agree-else-pass | Rare trades, want maximum conviction per fire |
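The framework can be wired into code directly; resolution_for and gated_direction below are hypothetical helpers illustrating the thresholds and the veto-only gating pattern, not part of the architecture code above:

```python
def resolution_for(trades_per_month: float) -> str:
    # thresholds from the framework: >20 gating, 5-20 weighted, <5 agree-else-pass
    if trades_per_month > 20:
        return "gating"
    if trades_per_month >= 5:
        return "weighted_average_with_orthogonality_floor"
    return "agree_else_pass"

def gated_direction(weekly: str, daily_veto: bool, intraday_veto: bool) -> str:
    # gating: the weekly thesis is primary; other layers can veto, never reverse
    if weekly == "flat" or daily_veto or intraday_veto:
        return "flat"
    return weekly

print(resolution_for(30))                                            # gating
print(gated_direction("long", daily_veto=True, intraday_veto=False)) # flat
```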

Cost implications

Weekly research with a reasoning-grade model on a 15,000-token context pack costs roughly $0.20 per call at published 2026-04 Sonnet rates.7 Running that research daily on one ticker is approximately $6 per month. Running it on every intraday tick would be $300 per month per ticker at best, and far higher for active symbols. The gated architecture — weekly LLM, daily invalidation check (cheap model or deterministic), intraday deterministic gates — runs near $1 per month per ticker for the cheap-model version and close to $0.05 per month per ticker for the fully deterministic intraday path. A 50-ticker universe is $50 per month versus $15,000 for the naive architecture, for equivalent information freshness.
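A back-of-envelope check on those figures; the $0.20 research call is the text's estimate, while the $0.01 cheap daily check is an assumed Haiku-class figure:

```python
research_call = 0.20   # $ per 15K-token reasoning-grade call (text's estimate)
cheap_check = 0.01     # $ per short daily check (assumed Haiku-class figure)

daily_rerun = 30 * research_call               # research every day: about $6/month
gated = 4.3 * research_call + 21 * cheap_check # weekly pass + daily checks: about $1/month
universe_cost = 50 * gated                     # 50-ticker universe: about $50/month

print(f"daily re-research: ${daily_rerun:.2f}/month per ticker")
print(f"gated:             ${gated:.2f}/month per ticker")
print(f"50-ticker gated:   ${universe_cost:.0f}/month")
```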

Token cost for LLM finance workloads tracks closely with the architecture choices made here; the cost collapse is almost entirely driven by not asking the model the wrong timescale of question.

When this architecture fails

Three regimes where the weekly-thesis-plus-intraday-gate pattern is the wrong design.

True intraday edges. Market-making, statistical arbitrage on millisecond horizons, latency-sensitive liquidity provision. The edge lives in reaction time and queue position. A language model cannot compete at the timescale, and a weekly thesis has nothing to contribute. These strategies belong to deterministic systems written against book events directly.

Event-driven strategies where the event changes the thesis. A surprise Fed announcement, a merger break, a major earnings miss. The weekly thesis is obsolete the moment the event prints. Caching the prior thesis through the event is a structural failure. Event-driven strategies need a fast re-research path — not a cached weekly thesis.

High-frequency trading. HFT is not the subject of this architecture. Nothing that calls an LLM on any cadence belongs on the same time axis as HFT execution.

The honest summary: the architecture works for strategies whose edge lives in fundamental or macro reasoning expressed at weekly cadence, with intraday execution gates handling timing. Outside that envelope, pick a different design.


Footnotes

  1. Anthropic (2026). "Claude model capabilities and long-context benchmarks." Developer documentation on reasoning-grade model evaluation on needle-in-haystack and financial-document tasks.

  2. Tetlock, P., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown. The reference-class reasoning framework for separating forecasts by cadence and base rate.

  3. Hand-sketch of the cost composition: one research call per week per ticker, one cheap or zero-cost daily check, zero LLM calls at tick cadence.

  4. Harvey, C. R., & Liu, Y. (2015). "Backtesting." Journal of Portfolio Management 42(1). Frames the selection-bias problem in Sharpe-weighted pooling.

  5. Lo, A. W. (2004). "The Adaptive Markets Hypothesis: Market Efficiency from an Evolutionary Perspective." Journal of Portfolio Management 30(5). The multi-horizon edge framing used in the gating section.

  6. Aifinhub, "Signal Orthogonality: Why Ensembles Become One Bet." Orthogonality floor used as a pre-condition for weighted pooling.

  7. Anthropic (2026-04). Published per-token pricing for Claude Sonnet 4.6 on the Messages API, used for the $0.20 per 15K-token research call estimate.