TL;DR
Real-time market data costs 20-50× more than end-of-day data, and most retail strategies that claim they need it don't. The decision rule: compare your signal's decay half-life to your realistic execution latency. If the signal decays on a scale of hours and you place orders on a scale of minutes, real-time data adds cost, not edge. Four signal classes genuinely require real-time: opening-auction strategies, event-driven minute-scale trades, pairs with sub-hour half-lives, and option market-making. Everything else — including almost all LLM-driven research — runs cleanly on end-of-day or 15-minute-delayed data.
The cost delta
As of April 2026, the honest numbers for a single US equities user:
- Alpha Vantage free tier: EOD bars, 25 calls/day — $0/month.
- Tiingo EOD: 1 year of history, 500 calls/hour — $0/month.
- Alpaca IEX real-time: partial coverage (IEX feed only) — $0/month, but not full market.
- Polygon Starter: 2 years historical bars, no real-time — $29/month.
- Polygon Advanced (full SIP real-time + L1): $199/month.
- Polygon Business (with L2 depth): $500+/month.
- Databento full-SIP tick feed: usage-based, typically $200-1000/month for retail volumes.
The ratio between "I can do EOD research for free" and "I need the real consolidated NBBO in real-time" is 20-50×. For anyone paying out of pocket, this is the single most consequential infrastructure decision in the stack.
See the Data-Vendor TCO Calculator for per-symbol, per-request breakdowns.
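To put those fees in strategy terms, it can help to express them as an annual drag on account equity. The sketch below is plain arithmetic; the function name `annual_data_drag_pct` and the $25,000 account size are illustrative assumptions, not vendor figures.

```python
def annual_data_drag_pct(monthly_fee_usd: float, account_equity_usd: float) -> float:
    """Annual data cost as a percentage of account equity."""
    if account_equity_usd <= 0:
        raise ValueError("account equity must be positive")
    return 100.0 * (12.0 * monthly_fee_usd) / account_equity_usd


# Monthly fees are the ones quoted in the list above; the account size is hypothetical.
fees = {"Polygon Starter": 29.0, "Polygon Advanced": 199.0, "Polygon Business": 500.0}
for name, fee in fees.items():
    print(f"{name}: {annual_data_drag_pct(fee, 25_000):.2f}% of equity per year")
```

On that hypothetical account, the $199/month tier works out to roughly 9.6% of equity per year, which the strategy has to earn back before real-time data contributes anything.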
The decision rule
The question "do I need real-time data" decomposes into two sub-questions:
- How fast does the signal decay?
- How fast can the operator actually act on it?
If the signal's decay time is much longer than the time it takes to act, real-time is wasted. If the time to act is longer than the decay time, real-time cannot help regardless of spend.
Signal decay half-life
Define the signal decay half-life as the time after signal generation at which the expected edge falls to 50% of its initial value. This is measurable from a backtest by decomposing trade P&L by time-held and fitting an exponential decay.
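import numpy as np

def estimate_half_life(trade_pnls_by_hold_time: dict[int, float]) -> float:
    """Estimate signal half-life in minutes from {minutes_held: average_pnl_bps}."""
    keys = sorted(trade_pnls_by_hold_time)
    minutes = np.array(keys, dtype=float)
    pnls = np.array([trade_pnls_by_hold_time[k] for k in keys], dtype=float)
    # Fit pnl(t) = pnl0 * exp(-t / tau) by linear regression on log(pnl).
    # Only strictly positive buckets have a defined log; clamping the rest to a
    # tiny value would drag the fit toward an artificially short half-life.
    mask = pnls > 0
    if mask.sum() < 2:
        raise ValueError("need at least two hold-time buckets with positive P&L")
    slope, _ = np.polyfit(minutes[mask], np.log(pnls[mask]), 1)
    if slope >= 0:
        raise ValueError("P&L does not decay with hold time; half-life is undefined")
    tau = -1.0 / slope  # decay constant, in minutes
    return float(tau * np.log(2))  # half-life in minutes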
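The input dictionary for `estimate_half_life` has to come from somewhere. One way to build it, sketched here under the assumption that each backtest trade is recorded as a `(minutes_held, pnl_bps)` pair, is to bucket trades by holding time and average within each bucket. The record layout, the function name `bucket_pnl_by_hold_time`, and the 5-minute bucket width are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def bucket_pnl_by_hold_time(
    trades: list[tuple[float, float]], bucket_minutes: int = 5
) -> dict[int, float]:
    """Average P&L (bps) per holding-time bucket, keyed by the bucket's start minute."""
    if bucket_minutes <= 0:
        raise ValueError("bucket_minutes must be positive")
    buckets: dict[int, list[float]] = defaultdict(list)
    for minutes_held, pnl_bps in trades:
        start = int(minutes_held // bucket_minutes) * bucket_minutes
        buckets[start].append(pnl_bps)
    return {start: mean(pnls) for start, pnls in sorted(buckets.items())}


# Hypothetical trades: (minutes held, P&L in bps).
trades = [(1.0, 10.0), (4.0, 8.0), (6.0, 5.0), (9.5, 3.0), (12.0, 2.0)]
print(bucket_pnl_by_hold_time(trades))  # {0: 9.0, 5: 4.0, 10: 2.0}
```

Thin buckets give noisy averages, so it is worth checking the trade count per bucket before trusting the fitted half-life.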
Realistic execution latency
Execution latency for retail operators typically runs:
- 1-5 minutes end-to-end, if the pipeline is: signal generation on a bar → human confirmation → manual order entry.
- 30-120 seconds for a well-written automated pipeline on home infrastructure: signal on a fresh bar → risk check → broker API call → fill confirmation.
- 1-10 seconds for a co-located setup with sub-millisecond broker API: this is institutional territory and out of scope for the retail stack.
The gap between "I have the tick" and "my order is at the exchange" is the binding latency, not the data-delivery latency.
The rule itself
- If signal half-life > 10× execution latency: EOD or 15-min delayed data is fine.
- If signal half-life is 1-10× execution latency: real-time may help at the margin; test it.
- If signal half-life < execution latency: real-time isn't enough — you're in a regime where you can't reliably act on your own signals.
For a typical retail stack with ~1 minute execution latency, signals with half-life > 10 minutes are comfortably end-of-day-compatible if they only need to rebalance daily. Signals with half-life > 4 hours are comfortably daily-bar-compatible regardless of rebalance cadence.
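The three bands above reduce to a ratio check. The following is a minimal sketch; the function name `data_tier_for` and the returned labels are illustrative, and treating a ratio of exactly 10 as part of the middle band is an assumption the rule leaves open.

```python
def data_tier_for(half_life_min: float, latency_min: float) -> str:
    """Apply the half-life / latency rule; both arguments are in minutes."""
    if half_life_min <= 0 or latency_min <= 0:
        raise ValueError("half-life and latency must be positive")
    ratio = half_life_min / latency_min
    if ratio > 10:
        return "EOD or 15-minute-delayed data is fine"
    if ratio >= 1:
        return "real-time may help at the margin; test it"
    return "too fast to act on at this latency"


print(data_tier_for(half_life_min=240, latency_min=1))  # ratio 240
print(data_tier_for(half_life_min=5, latency_min=1))    # ratio 5
print(data_tier_for(half_life_min=0.5, latency_min=1))  # ratio 0.5
```

Feeding it a measured half-life and a measured end-to-end latency, rather than guesses, is what makes the answer worth acting on.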
Where real-time is actually load-bearing
Four signal classes genuinely need real-time data. For each, the reason isn't speculative — it's structural.
1. Opening-auction strategies
The opening print on most exchanges is a discrete event that happens once per day over a window of seconds. A strategy that trades around the open (pre-open imbalance, auction-fade, gap continuation/reversion) needs to see the indicative matching price in the auction window and react before the print finalizes. This is not a half-life question — the entire alpha window is measured in seconds.
Neither end-of-day nor 15-minute-delayed data can participate. Real-time consolidated quotes plus the exchanges' auction imbalance data (for example NYSE Order Imbalances, or the net order imbalance messages carried in NASDAQ TotalView-ITCH) are the minimum.
2. Event-driven strategies at the minute scale
Earnings prints, FOMC statements, macro surprises (CPI, NFP) move prices in seconds. A strategy that reads the print and trades within 1-5 minutes of release needs real-time data by construction. There's a narrow retail-accessible edge here: by the time a news wire is fully parsed and the strategy fires a trade, typically 20-60 seconds have elapsed, and the largest first-order move is partially gone. The residual — the minute-scale overshoot and revert — is what retail can participate in, and it requires real-time data to see.
3. Pairs and mean-reversion with sub-hour half-lives
Classic pairs trading on cointegrated equities can have half-lives anywhere from minutes to days. Some HFT-adjacent pairs (ETFs vs their underlyings, depositary receipts vs the home listing during overlapping trading hours) have half-lives in the seconds-to-minutes range. These genuinely need real-time data and ideally co-location; retail cannot compete there.
Pairs with half-lives of hours to days (most cross-sectional factor spreads) are fine on 15-minute-delayed or EOD data.
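To see which side of that line a given pair falls on, a common approach (not necessarily the one used elsewhere in this series) is to fit an AR(1) model to the spread: regress the bar-to-bar change on the lagged level, take phi = 1 + slope, and compute the half-life as -ln(2) / ln(phi) bars. The function name `spread_half_life_bars` and the synthetic data below are illustrative assumptions.

```python
import numpy as np


def spread_half_life_bars(spread: np.ndarray) -> float:
    """Half-life of mean reversion, in bars, from an AR(1) fit to the spread."""
    spread = np.asarray(spread, dtype=float)
    lagged = spread[:-1]
    delta = np.diff(spread)
    # Regress delta_t = a + b * spread_{t-1}; b < 0 indicates mean reversion.
    b, _a = np.polyfit(lagged, delta, 1)
    phi = 1.0 + b  # implied AR(1) coefficient
    if not 0.0 < phi < 1.0:
        raise ValueError("spread is not mean-reverting under an AR(1) fit")
    return float(-np.log(2) / np.log(phi))


# Synthetic spread with a known AR(1) coefficient of 0.9 (hypothetical data).
rng = np.random.default_rng(0)
s = np.zeros(5000)
for t in range(1, len(s)):
    s[t] = 0.9 * s[t - 1] + rng.normal()
print(round(spread_half_life_bars(s), 1))  # true value is ln(2) / -ln(0.9), about 6.6 bars
```

The result is in bars, so multiply by the bar length to compare it with execution latency: about 6.6 one-minute bars is a sub-hour pair, while 6.6 daily bars is comfortably EOD territory.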
4. Option market-making
Quoting options means continuously updating bid/ask on the entire chain as the underlying moves, implied vol shifts, and order flow telegraphs informed trading. The bid-ask update cadence is by necessity sub-second, and that requires real-time L1 on the underlying, real-time chains, and real-time Greeks. This is the most data-hungry use case in the retail-adjacent space, and it costs $300+/month for the data alone, before any broker fees.
If you are not doing one of these four, you probably don't need real-time.
Where end-of-day is genuinely sufficient
Rebalancing strategies
A portfolio strategy that computes target weights from today's close and rebalances at tomorrow's open — trend-following, factor portfolios, risk-parity, cross-sectional momentum — operates entirely on EOD data. The execution happens at the open of the next trading day; real-time intraday data adds zero information to the decision.
This is where most retail quant strategies live.
Regime-filter overlays
A strategy that flips between two sub-strategies based on a slow regime signal (VIX regime, yield-curve inversion, trend regime from a long moving average) can compute the regime daily or weekly. The regime itself has a half-life measured in weeks or months. Real-time data here is pure waste.
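As an illustration of how little data such an overlay needs, here is a minimal sketch of a trend-regime flag computed from daily closes alone. The 200-day simple moving average, the function name `trend_regime`, and the 0/1 encoding are assumptions chosen for the example, not a recommended parameterization.

```python
import numpy as np


def trend_regime(closes: np.ndarray, window: int = 200) -> np.ndarray:
    """Return 1 where the close is above its trailing SMA, 0 otherwise.

    The first window - 1 entries are 0 because the average is not yet defined.
    """
    closes = np.asarray(closes, dtype=float)
    regime = np.zeros(len(closes), dtype=int)
    if len(closes) < window:
        return regime
    # Trailing simple moving average via a cumulative sum.
    csum = np.cumsum(np.insert(closes, 0, 0.0))
    sma = (csum[window:] - csum[:-window]) / window
    regime[window - 1:] = (closes[window - 1:] > sma).astype(int)
    return regime


# Hypothetical steadily rising price series: the flag switches on once the SMA exists.
closes = np.linspace(100.0, 200.0, 300)
print(trend_regime(closes)[[0, 198, 199, 299]])
```

The flag for day t uses the close of day t, so it should drive positions from the next session onward; that is exactly the EOD cadence described above.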
LLM research loops
An LLM reads a 10-K or an earnings transcript and returns a structured prediction with a multi-day-to-multi-week horizon. The input data is SEC EDGAR (free), earnings transcripts (various free or low-cost sources), and news (lagged by minutes at minimum through any retail-accessible wire). The decision cadence is minutes-to-hours. The execution is at the next market open or over the next few sessions. Real-time intraday data is not in the loop.
See The Price-Blind LLM Research Harness for the architectural reason this loop is deliberately not price-informed.
Calendar and spread trades
Futures roll trades, index-addition/deletion trades, ETF creation/redemption arb at the retail scale — all of these are calendar-driven, not tick-driven. The edge is in being right about the event; the data you need is the calendar and the EOD settlement prints.
A concrete example: 20-day momentum on US equities
Consider a simple strategy: rank the S&P 500 on 20-day return, long the top decile, short the bottom decile, rebalance daily at the open, hold one day.
- Signal generation uses 20 daily closes. EOD data only.
- Signal decay half-life: empirically ~2-5 trading days at this 20-day horizon. The underlying cross-sectional momentum effect is documented in Jegadeesh & Titman (1993) and Asness, Moskowitz & Pedersen (2013).
- Execution: market-on-open orders submitted pre-market.
- Real-time benefit: zero, because the signal is recomputed once per day from EOD data, and the execution is at the open through a market order.
A strategy like this paying for Polygon Advanced ($199/month) is paying about $2,400/year for nothing. It runs identically on Tiingo EOD ($0) or, within limits, Alpha Vantage EOD ($0, though at 25 calls/day a 500-symbol universe only fits if each call returns about 20 symbols).
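The signal step of this example can be sketched in a few lines, which makes the point concrete: nothing intraday enters the calculation. The function name `momentum_deciles`, the array layout, and the synthetic prices are illustrative assumptions.

```python
import numpy as np


def momentum_deciles(closes: np.ndarray, lookback: int = 20) -> tuple[np.ndarray, np.ndarray]:
    """Return (long_idx, short_idx): column indices of the top and bottom decile.

    closes has shape (n_days, n_symbols), oldest row first, and needs at least
    lookback + 1 rows.
    """
    closes = np.asarray(closes, dtype=float)
    if closes.shape[0] < lookback + 1:
        raise ValueError("not enough history for the lookback window")
    ret = closes[-1] / closes[-1 - lookback] - 1.0  # lookback-day return per symbol
    order = np.argsort(ret)                         # ascending by return
    k = max(1, closes.shape[1] // 10)               # decile size
    return order[-k:], order[:k]


# Hypothetical universe of 50 symbols; symbol j compounds at 0.1% * j per day.
days = np.arange(21).reshape(-1, 1)
growth = 1.0 + 0.001 * np.arange(50).reshape(1, -1)
closes = 100.0 * growth ** days
long_idx, short_idx = momentum_deciles(closes)
print(sorted(long_idx.tolist()), sorted(short_idx.tolist()))
```

The only inputs are 21 rows of daily closes per symbol, which any of the free EOD sources listed earlier can supply.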
The hidden cost of real-time on the wrong strategy
Beyond the subscription fee, real-time data imposes an architectural tax:
- Higher compute baseline. Tick processing is 100-1000× the data volume of bar processing.
- More failure modes. WebSocket disconnects, sequence gaps, heartbeat timeouts. See Heartbeats, Watchdogs, and Circuit Breakers.
- More surface area to backtest. Backtest infrastructure for tick data is 10× the complexity of bar-data backtesting. See Execution Simulation: The Slippage and Impact You Can't Ignore.
- Temptation to over-trade. A trader with real-time data trades more, which usually means worse results after costs.
A strategy that works fine on EOD data but is rebuilt on real-time data typically sees its backtest Sharpe fall rather than rise, because the signal-to-noise ratio is lower at finer time scales and the new failure modes bite.
The upgrade path
If you're on EOD data and genuinely need to graduate, the order of operations:
- Start with free intraday tiers (15-minute-delayed quotes such as Yahoo Finance, or Alpaca's real-time but IEX-only feed) to test whether intraday data even improves your strategy.
- If that test shows material improvement, upgrade to a real-time Level 1 feed ($29-$99/month for partial coverage; full-SIP real-time is $199/month at the Polygon Advanced tier quoted above).
- Add Level 2 only if you have a specific order-book-imbalance or market-making hypothesis. This is $300-$500/month; it should have a pre-identified edge, not be bought speculatively.
- Move to tick-level historical data only if your live-vs-simulation gap is dominated by within-bar execution variance. This is a one-time $300-$3000 purchase depending on universe and history length.
Each step should be driven by evidence from the previous step, not by anticipation.
The honest answer
For most retail AI-in-markets research in 2026, EOD data is sufficient. The LLM paths are slow (tokens, not ticks), the research cadence is daily-to-weekly, and the execution is end-of-day rebalancing or minute-scale after a news event that doesn't require pre-positioned tick infrastructure.
Real-time data is a specific tool for a specific set of strategies. Treat it like L2 depth or options data: pay for it when the signal requires it, not as a default.
Connects to
- Market Data APIs Compared (2026) — vendor-by-vendor detail for the pricing tiers quoted above.
- The $0/Month Trading Stack — what becomes possible when EOD is all you need.
- Execution Simulation: The Slippage and Impact You Can't Ignore — why backtest fidelity matters even more on bar data.
- The Price-Blind LLM Research Harness — the LLM research loop that lives in EOD-land.
- Data-Vendor TCO Calculator — price the real-time vs EOD decision for your specific workload.
- Execution Simulator — quantify slippage and impact at bar vs tick cadences.
References
- Jegadeesh, N., & Titman, S. (1993). "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency." Journal of Finance 48(1).
- Asness, C. S., Moskowitz, T. J., & Pedersen, L. H. (2013). "Value and Momentum Everywhere." Journal of Finance 68(3).
- Harris, L. (2003). Trading and Exchanges: Market Microstructure for Practitioners. Oxford University Press.
- SEC. "Consolidated Audit Trail and the National Market System." (Rules and technical specifications for SIP consolidated data.)
- Vendor pricing pages for Polygon.io, Databento, Tiingo, Alpaca (retrieved April 2026).