TL;DR
In 2026, the gap between "AI trading" hype and an AI-native trading stack that actually works has never been wider. LLMs are excellent at structured extraction and reproducible research prompts, and worthless as direct trade-signal generators. MCP is real and production-ready for data access, early-stage for execution. The stack that works in April 2026 is: calibrated LLM research layer → conviction-scaled sizing → idempotent execution → heartbeat + circuit-breaker robustness. Everything else is a distraction.
What this guide is
A map of what actually composes in an AI-in-markets stack, maintained by working engineers, updated when the landscape moves. Six layers, in the order they break during a real outage:
- Data — bars, quotes, filings, alt data
- MCP / orchestration — how agents reach the data and execution layers
- Research — LLM-driven analysis that doesn't leak alpha back to itself
- Signals + sizing — converting probability to position
- Execution — idempotent, logged, auditable
- Robustness — the layer that determines whether you sleep at night
1. Data
The boring, load-bearing layer. Most AI traders under-invest here and pay for it later. A sound 2026 baseline:
- Bars + quotes: Alpaca (free IEX or paid SIP), Polygon.io, Databento, or Tiingo depending on use case and budget.
- Filings + fundamentals: FMP or SEC EDGAR direct for filings, Tiingo for fundamentals.
- News: Tiingo News API, Polygon.io news, or vendor-specific feeds.
The honest question isn't "which vendor is best" — it's "which tier at which vendor is cheapest for my exact workload." The Data-Vendor TCO Calculator answers that explicitly with tier selection per scenario. Expect costs from $10/month (Tiingo EOD) to $500+/month (full real-time SIP + options).
Anti-pattern: buying a tier larger than your research loop actually uses. A quarter-Kelly research loop that fires once per ticker per day does not need real-time tick data.
2. MCP / orchestration
MCP — the Model Context Protocol specified by Anthropic in November 2024 — is the biggest surface-area shift in how LLMs reach market data. In April 2026 the landscape looks like:
- Official servers: Alpaca (V2 shipped April 2026, 61 actions), Polygon.io (read-only, full equity surface).
- Community servers: Databento, IBKR via CLI wrapper, Tradier, Tiingo, NautilusTrader.
- Security baseline: there isn't one. The Finance MCP Directory is the first published attempt to grade servers on scope, auth, idempotency, transport, and schema quality.
The one rule: do not give an LLM an MCP server with execution scope plus a full-authority API key and expect things to go well. Scope your keys; prefer servers with idempotency on order submission (Alpaca's has it; Tradier's community server does not).
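As a concrete sketch of that rule, a thin gate in front of the tool-call path can enforce read-only scope by default and refuse execution calls that lack an idempotency key. The tool names and the gate's shape are illustrative, not any real server's API:

```python
# Hypothetical gate in front of an MCP client; tool names are illustrative.
READ_ONLY_TOOLS = {"get_bars", "get_quote", "list_positions"}

def gate_tool_call(tool: str, args: dict, allow_execution: bool = False) -> dict:
    """Allow read-only tools freely; require explicit opt-in plus an
    idempotency key for anything with execution scope."""
    if tool in READ_ONLY_TOOLS:
        return {"allowed": True}
    if not allow_execution:
        return {"allowed": False, "reason": f"{tool} requires execution scope"}
    if "client_order_id" not in args:
        return {"allowed": False, "reason": "execution call without idempotency key"}
    return {"allowed": True}
```

The point of the gate is that a scoped key plus a deny-by-default wrapper fails closed: a hallucinated order call is rejected before it ever reaches the broker.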
3. Research
This is where most attempts fail invisibly. The failure mode: you give an LLM a research task, the LLM sees the current price somewhere in the context, and every subsequent "analysis" retroactively rationalizes that price. Nobody notices because the outputs look fluent.
The fix is architectural: price-blind research. Split your research context into two halves. The market-visible half (prices, positions, your own PnL) goes only to the execution / sizing layer. The research half (filings, earnings, macro indicators, news summaries) goes only to the LLM. The LLM emits a structured probability and thesis. The execution layer decides what to do with it.
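A minimal sketch of that boundary, with illustrative type names (no real LLM or broker API is assumed):

```python
from dataclasses import dataclass

@dataclass
class ResearchContext:
    """Only what the LLM may see: no prices, positions, or PnL."""
    filings: list[str]
    news_summaries: list[str]
    macro_indicators: dict[str, float]

@dataclass
class MarketContext:
    """Only what the execution / sizing layer sees."""
    last_price: float
    position: int
    pnl: float

@dataclass
class Thesis:
    """What the LLM emits: a structured probability, never a position."""
    ticker: str
    p_up: float       # calibrated probability the thesis plays out
    rationale: str

def build_contexts(raw: dict) -> tuple[ResearchContext, MarketContext]:
    """Split raw inputs at the price-blind boundary."""
    research = ResearchContext(
        filings=raw.get("filings", []),
        news_summaries=raw.get("news", []),
        macro_indicators=raw.get("macro", {}),
    )
    market = MarketContext(
        last_price=raw["last_price"],
        position=raw.get("position", 0),
        pnl=raw.get("pnl", 0.0),
    )
    return research, market
```

The design point is that the leak becomes a type error: nothing of type `ResearchContext` carries a price, so the prompt builder physically cannot include one.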
The Prompt Regression Tester tests your research prompts across model versions; the Hallucination Detector catches fabricated numbers in structured extractions. Neither tool is a substitute for the architectural separation, but both are failure-mode-detection tools for the research layer.
4. Signals + sizing
Once the research layer emits a calibrated probability, sizing is a risk question, not a modeling question. The best-documented answer is fractional Kelly with a per-trade cap:
f_full = (b·p − q) / b
f_bet = min(f_full × fraction, per_trade_cap)
where p is the estimated win probability, q = 1 − p, and b is the net odds (payoff per unit risked).
Quarter-Kelly (f × 0.25) is the standard practical choice for strategies whose win rates are estimated rather than known. The Fractional Kelly Sizer lets you stress-test a given edge across thousands of Monte Carlo paths so the drawdown distribution is visible, not just the expected growth rate.
The mistake that sinks solo traders: using full Kelly on an over-estimated edge. Full Kelly sized for a 55% edge that's really 52% compounds into ruin within a few hundred trades. Fractional Kelly absorbs most of that miscalibration without sacrificing much growth.
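The two formulas above, with a floor at zero for negative edges, come out as a few lines of Python (the 5% per-trade cap is an illustrative default, not a recommendation):

```python
def fractional_kelly(p: float, b: float, fraction: float = 0.25,
                     per_trade_cap: float = 0.05) -> float:
    """Fraction of bankroll to bet.

    p: estimated win probability
    b: net odds (payoff per unit risked)
    fraction: Kelly multiplier (0.25 = quarter-Kelly)
    per_trade_cap: hard ceiling on any single position
    """
    q = 1.0 - p
    f_full = (b * p - q) / b                      # full-Kelly fraction
    f_bet = min(f_full * fraction, per_trade_cap)  # scale down, then cap
    return max(f_bet, 0.0)                         # never bet a negative edge
```

With even odds (b = 1), a 55% edge gives f_full = 0.10 and a quarter-Kelly bet of 2.5% of bankroll; the same call with p = 0.52 gives 1%, which is the miscalibration cushion the section describes.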
5. Execution
Three rules here. Violate any of them and you'll find out the hard way:
- Idempotency: every order carries a client-supplied key. If you retry, you don't double-fill.
- Atomic logging: every decision and every fill is written to an append-only log before the next state change.
- No look-ahead: the execution layer doesn't use any information not available at the decision timestamp. Easy to violate accidentally in backtests; see Backtest Overfitting Score for the statistical signatures of this failure.
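The first two rules can be sketched together: write the decision to an append-only log before acting, and key every submission with a client-supplied id so a retry returns the original order instead of producing a second fill. `ExecutionLayer` and the commented-out broker call are hypothetical; a real system would also rely on the broker's own client-order-id deduplication:

```python
import json
import time
import uuid

class ExecutionLayer:
    def __init__(self, log_path="data/decisions.log"):
        self.log_path = log_path
        self.sent = {}  # client_order_id -> order; in-process retry cache

    def _log(self, record: dict) -> None:
        # Append-only: the decision hits disk before any state change.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def place_order(self, symbol: str, qty: int, side: str,
                    client_order_id=None) -> dict:
        coid = client_order_id or str(uuid.uuid4())
        if coid in self.sent:        # retry path: no second log entry, no double-fill
            return self.sent[coid]
        order = {"client_order_id": coid, "symbol": symbol,
                 "qty": qty, "side": side, "ts": time.time()}
        self._log({"event": "order_submitted", **order})
        self.sent[coid] = order
        # broker.submit(order)  # real submission would pass coid to the broker
        return order
```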
6. Robustness
The layer that determines whether you can take a vacation. Three patterns:
- Heartbeat: every cycle writes `data/heartbeat.json` with a timestamp.
- Watchdog: an independent process reads the heartbeat; if it's stale beyond N minutes during market hours, it trips the circuit breaker and alerts via Telegram.
- Circuit breaker: a `data/circuit.json` file with `{ "paused": true, "reason": ... }`. Every layer honors it. Resume is deliberately manual.
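A minimal sketch of the three patterns, using the file paths above (the 5-minute staleness threshold is an illustrative default):

```python
import json
import os
import time

def write_heartbeat(path="data/heartbeat.json"):
    """Called once per cycle by the main loop."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump({"ts": time.time()}, f)

def heartbeat_stale(path="data/heartbeat.json", max_age_s=300):
    """Watchdog check: True if the heartbeat is missing, unreadable, or old."""
    try:
        with open(path) as f:
            ts = json.load(f)["ts"]
    except (OSError, KeyError, ValueError):
        return True
    return time.time() - ts > max_age_s

def trip_circuit(reason, path="data/circuit.json"):
    """Watchdog action: pause everything, record why."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump({"paused": True, "reason": reason}, f)

def circuit_open(path="data/circuit.json"):
    """Every layer checks this before acting. Resume = edit the file by hand."""
    try:
        with open(path) as f:
            return json.load(f).get("paused", False)
    except (OSError, ValueError):
        return False
```

Note the asymmetry: tripping the breaker is automatic, clearing it is not; there is deliberately no `resume()` function.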
The Trading System Blueprinter generates a starter scaffold that wires these three in by default.
What this costs
At a mid-range configuration — Claude Sonnet 4.6 as the research LLM, Polygon Starter ($29/mo) for data, Alpaca paper/live for execution, a Mac Mini running launchd — the all-in monthly cost for a research loop scanning ~500 tickers daily is roughly:
| Line | Amount |
|---|---|
| Data (Polygon Starter) | $29 |
| LLM research (~10 ideas/day × 5 calls × ~10K in/1.5K out) | ~$180 |
| Broker | $0 (Alpaca) |
| Infra | $0 (Mac Mini + Cloudflare Pages) |
| Total | ~$210 / month |
Use the Token-Cost Optimizer to calibrate for your specific loop. Cost per validated trade (total idea cost divided by the validation rate) is usually the right denominator, not cost per call.
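As a back-of-envelope sketch using the ~$180/month LLM line from the table, a hypothetical 10% validation rate, and 21 trading days per month (the function name and the validation rate are illustrative):

```python
def cost_per_validated_trade(monthly_llm_cost: float, ideas_per_day: float,
                             validation_rate: float, trading_days: int = 21) -> float:
    """Dollars spent per idea that survives validation."""
    ideas = ideas_per_day * trading_days          # total ideas generated
    validated = ideas * validation_rate           # ideas that become trades
    return monthly_llm_cost / validated

# 10 ideas/day, 10% survive: 210 ideas -> 21 trades, ~$8.57 each
cost_per_validated_trade(180.0, 10, 0.10)
```

At those assumed numbers the research layer costs under nine dollars per actionable trade, which is a far more decision-relevant figure than cents per API call.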
What to avoid
- Anything that claims LLMs can "read the market" without an explicit price-blind boundary.
- Execution-scope MCP servers without idempotency. Duplicate fills during a retry loop have broken more strategies than any other single failure in 2026.
- Backtest Sharpe without Deflated Sharpe. The Backtest Overfitting Score exists because the single biggest retail-algo failure mode is convincing yourself the best of many strategies is real when it isn't.
- Full Kelly on an estimated edge. Quarter-Kelly is the default; go smaller if the edge comes from LLM output whose calibration you haven't yet measured.