Research · 209 articles
Long-form pieces on LLMs, MCP, and the engineering around markets. Pillar guides, head-to-heads, runnable tutorials, opinionated methodology. Read-it-or-skim-it: there's a TL;DR on every one.
Best LLM for SEC 10-K and Earnings Extraction 2026
Best LLM for SEC 10-K and earnings extraction 2026: cost vs accuracy across Opus 4.7, GPT-5.5, Gemini 3.5 Flash. $0.0096 to $0.52 per document, engine-computed.
Interactive Brokers vs Alpaca vs Tradier Trading API 2026
Interactive Brokers vs Alpaca vs Tradier trading API 2026: rate limits, market-data cost, options support, and account requirements. Verified 2026-06-07.
Polygon.io vs Databento vs Alpaca Market Data API 2026
Polygon.io vs Databento vs Alpaca market-data API 2026: flat $199 SIP vs metered $/GB with L2 depth vs $99 broker-bundled. Pricing and coverage, verified.
Tiingo vs EODHD vs Financial Modeling Prep 2026
Tiingo vs EODHD vs Financial Modeling Prep 2026: fundamentals and EOD data API for a solo quant. Tiingo fundamentals now sales-gated; EODHD and FMP self-serve.
The Selection-Bias Sharpe Benchmark by Trials and Sample
A computed reference: the minimum annualized Sharpe needed to claim a real edge, by number of strategy trials (N) and backtest length (T). Short backtests punish search most.
Best Crypto Exchange API for Trading Bots 2026
Best crypto exchange API for trading bots 2026: pick by jurisdiction first. Binance for lowest fees, Coinbase for a US default, Kraken for a lower-fee US venue.
Best Free Stock Market Data API 2026
Best free stock market data API 2026: Alpaca's IEX tier for real-time US equities, SEC EDGAR for fundamentals, Twelve Data and EODHD for breadth.
Best Python Backtesting Framework 2026
Best Python backtesting framework 2026: VectorBT for research speed, NautilusTrader for live parity, Backtrader for simplicity, Zipline-Reloaded for factors.
Binance vs Coinbase vs Kraken API 2026
Binance vs Coinbase vs Kraken API 2026 for trading bots: Binance wins fees, Coinbase wins US compliance, Kraken balances low fees and WebSocket streaming.
Binance vs Kraken API 2026
Binance vs Kraken API 2026: Binance wins fees (~0.075% maker/taker) and ecosystem but is non-US; Kraken is US-regulated with low-latency streaming.
CCXT vs Native Exchange APIs 2026
CCXT vs native exchange APIs 2026: CCXT normalizes 100+ exchanges in one codebase for multi-venue bots; native APIs win feature access and lowest latency.
Cheapest LLM for SEC 10-K Extraction at 10,000 Filings a Month 2026
Cheapest LLM for SEC 10-K extraction at 10,000 filings/month: Gemini 2.5 Flash-Lite $161.70/mo vs GPT-5.5 $8,715. Engine-computed, with a two-stage path.
Coinbase vs Kraken API 2026
Coinbase vs Kraken API 2026 for trading bots: Kraken wins fees (~0.25%/0.40% base vs ~0.60%/0.80%); Coinbase wins a simpler per-second rate model.
DeepSeek V4 vs Gemini 3.5 Flash: SEC Extraction 2026
DeepSeek V4 vs Gemini 3.5 Flash for SEC filing extraction 2026: DeepSeek is ~10x cheaper on input and ~30x cheaper on output, with 1M context; Gemini wins on ecosystem.
EODHD vs Financial Modeling Prep (FMP) 2026
EODHD vs FMP 2026: EODHD wins transparent, cheap global EOD pricing ($19.99/mo); FMP wins fundamentals and filings depth with a 250-requests/day free tier.
Finance-Workload Cost per 1,000 Tasks: Gemini 3.5 Flash vs Opus 4.7 vs GPT-5.5 2026
Finance-workload cost per 1,000 tasks: Gemini 3.5 Flash $16.50, Opus 4.7 $41.70, GPT-5.5 $55.00, Flash-Lite $1.00. Engine-computed on a verified shape.
GPT-5.5 vs Claude Opus 4.7 for Finance 2026
GPT-5.5 vs Claude Opus 4.7 for finance 2026: same $5 input, Opus edges output at $25 vs $30, but its new tokenizer and 90% caching reshape the real cost.
GPT-5.5 vs Gemini 3.5 Flash for Finance 2026
GPT-5.5 vs Gemini 3.5 Flash for finance 2026: GPT-5.5 ($5/$30) is frontier reasoning; Gemini 3.5 Flash ($1.50/$9) is an agent-tier workhorse, ~3.3x cheaper.
Is Databento Worth It in 2026?
Is Databento worth it in 2026? For builders needing pro-grade, granular data without a contract, usually yes; for casual retail, often no. Usage-based pricing.
Is Polygon (Massive) Pricing Worth It in 2026?
Is Polygon (now Massive) pricing worth it in 2026? Stocks: free Basic, $29/$79 delayed, $199 Advanced for real-time SIP. Real-time costs more than expected.
NautilusTrader vs VectorBT 2026
NautilusTrader vs VectorBT 2026: NautilusTrader is a Rust-cored engine with research-to-live parity; VectorBT is the fastest vectorized research library.
Prompt-Caching ROI for Finance LLM Agents 2026
Prompt-caching ROI for finance LLM agents: caching cuts input cost only. Opus 4.7 at 90% cache ($348.48/mo) still loses to uncached Gemini 3.5 Flash.
Qdrant vs Pinecone for Finance RAG 2026
Qdrant vs Pinecone for financial RAG 2026: close at 10M vectors, but Qdrant self-hosted is far cheaper at 100M and keeps sensitive data under your control.
QuantConnect vs Backtrader 2026
QuantConnect vs Backtrader 2026: hosted platform with bundled data and live deployment versus a free local library with full control. Which backtester fits you.
Schwab API vs Alpaca 2026
Schwab API vs Alpaca 2026: Schwab offers a deeper options chain on a regulated incumbent with OAuth and a 7-day refresh token; Alpaca wins simple auth and an MCP server.
sec-api.io vs SEC EDGAR Free API 2026
sec-api.io vs the free SEC EDGAR API 2026: EDGAR is free, keyless, 10 req/sec, raw filings; sec-api.io ($55-$239/mo) adds search and section extractors.
Self-Hosted vs API LLM for Finance: Breakeven 2026
Self-hosted vs API LLM for finance 2026: a rented H100 (~$1.50-$7/hr) only beats per-token APIs at high volume. Idle time and ops make it 3-5x the raw GPU cost.
The LLM-in-Finance Economics Report 2026
LLM-in-finance cost report 2026: every frontier model priced on 10-K extraction, earnings calls, news sentiment, and agents. Engine-computed, verified rates.
Tradier vs Alpaca for Options 2026
Tradier vs Alpaca for options 2026: Tradier is options-first with real-time chains and multi-leg orders; Alpaca wins simple auth and an MCP server.
Twelve Data vs EODHD vs FMP 2026
Twelve Data vs EODHD vs FMP 2026: Twelve Data wins real-time streaming, EODHD wins cheap global history, FMP wins fundamentals and SEC filings depth.
Twelve Data vs Polygon.io (now Massive) 2026
Twelve Data vs Polygon.io (now Massive) 2026: Twelve Data wins multi-asset breadth (stocks/forex/crypto); Polygon wins granular US equities/options tick depth.
VectorBT vs Backtrader 2026
VectorBT vs Backtrader 2026: VectorBT wins research speed and parameter sweeps; Backtrader wins simplicity and the path to live. Which Python backtester fits you.
Zipline-Reloaded vs Backtrader 2026
Zipline-Reloaded vs Backtrader 2026: Zipline-Reloaded is the maintained fork with a Pipeline API for equity factor research; Backtrader is flexible but aging.
Alpaca API Free Tier: IEX vs SIP 2026
Alpaca's free tier is the IEX feed (~3% of US volume); the paid SIP feed (Algo Trader Plus, $99/mo) is the full tape. What each is valid for, verified.
Alpaca API Rate Limits 2026
Alpaca API rate limits 2026: Trading API 200 req/min (paper and live), Market Data 200 req/min free IEX or 10,000 on Algo Trader Plus. Verified, with geo notes.
Alpaca vs Interactive Brokers API Comparison 2026
Alpaca vs Interactive Brokers API comparison 2026: Alpaca wins on REST/JSON ergonomics, IBKR on multi-asset coverage. Verified rate limits and fees.
Alpaca vs Tradier API 2026
Alpaca vs Tradier API 2026: Alpaca wins on free IEX data and 200 req/min trading; Tradier on options-first depth at 60 req/min trading. Verified limits.
Alpha Vantage vs Twelve Data 2026
Alpha Vantage vs Twelve Data 2026: verified free-tier limits, paid plan prices, and per-minute rate caps. Twelve Data wins free use; paid pick depends on scan.
Benzinga News API Pricing 2026
Benzinga News API pricing 2026: sales-gated premium tiers, a fee-free Basic tier, and why Polygon.io's bundled Ticker News is the cheaper self-serve option.
Best Backtesting Data Sources 2026
Best backtesting data sources 2026: Alpaca free IEX, Polygon flat files, Databento tick history, Nasdaq Data Link, plus the survivorship trap.
Best Crypto Market Data APIs 2026
Best crypto market data APIs 2026: CoinGecko Demo for prototyping, CoinMarketCap Hobbyist for the cheapest paid step, Twelve Data for multi-asset agents.
Best Free LLM Token-Cost Calculator for Finance 2026
Best free LLM token-cost calculator for finance 2026: the Token-Cost Optimizer prices a research loop per validated trade, compared with alternatives.
Best Free Market-Data Cost Calculator 2026
Best free market-data cost calculator 2026: the Data-Vendor TCO tool gates vendors on coverage and ranks the qualifying ones cheapest-first.
Best Fundamental & Filings Data APIs for Finance 2026
Best fundamental and filings data APIs 2026: FMP for free-tier headroom, EODHD for global breadth, Alpha Vantage for clear pricing, Intrinio for feeds.
Best LLM APIs for SEC Filing Extraction 2026
Best LLM APIs for SEC filing extraction 2026 ranked on price, context window, and numeric fidelity: Gemini Flash for volume, Claude and GPT for tables.
Best LLM for Financial Analysis 2026
Best LLM for financial analysis in 2026 is task-tiered: Gemini Flash for extraction, Gemini Pro for long context, Opus 4.7 or GPT-5.5 for reasoning.
Best Real-Time Options Data APIs 2026
Best real-time options data APIs 2026: Tradier free to account holders, Alpaca at $99, Polygon, Databento OPRA, Cboe LiveVol. Verified vendor pricing.
Best Sentiment & News APIs for Trading 2026
Best sentiment and news APIs for trading 2026: Finnhub free 60 calls/min, Alpha Vantage AI scores, Polygon news, Benzinga wire. Verified vendor pricing.
Best Vector DBs for Financial RAG 2026
Best vector databases for financial RAG 2026: pgvector free on Postgres, Qdrant Cloud free-forever tier, Pinecone serverless, Weaviate. Verified pricing.
Binance vs Coinbase API 2026
Binance vs Coinbase Advanced Trade API 2026: Binance meters 6,000 request-weight/min per IP, Coinbase 30 req/s per user. Verified rate limits.
Charles Schwab thinkorswim API for Automated Trading 2026
Charles Schwab thinkorswim API for automated trading 2026: there is no standalone thinkorswim API; all automation routes through the Schwab Trader API.
Charles Schwab Trader API Status 2026
Charles Schwab Trader API status 2026: live for individual developers, 1-3 day approval, 120 req/min limit. TD Ameritrade shutdown and thinkorswim, verified.
Cheapest LLM for SEC Filings 2026
Cheapest LLM for SEC filings 2026: verified prices and cost per 10-K for Gemini Flash, DeepSeek, GPT-5.4 nano, and Haiku. Context fit and accuracy decide it.
Cheapest Stock Market Data API 2026
Cheapest stock market data API 2026: Alpaca IEX is the free floor, Finnhub the best free rate limit, Alpaca SIP $99 the cheapest real-time tape. Verified.
Claude vs GPT-5 vs Gemini for Financial Analysis 2026
Claude vs GPT-5 vs Gemini for financial analysis 2026: verified API list prices, context windows, caching discounts, and effective cost per 10-K filing.
CoinGecko vs CoinMarketCap API 2026
CoinGecko vs CoinMarketCap API 2026: free Demo 10k calls at 100/min vs Basic 15k credits at 50/min, plus paid plans. Calls vs credits, and who wins.
Databento vs Polygon.io 2026
Databento vs Polygon.io 2026: metered $/GB with $125 free credits vs Polygon's flat $199/mo real-time tier. Level-2 depth, predictability, and who wins.
DeepSeek vs Mistral for Financial Analysis 2026
DeepSeek vs Mistral for financial analysis 2026: verified DeepSeek V4 pricing and context, Mistral's lineup, and the per-loop agent economics that decide cost.
EODHD vs Marketstack 2026
EODHD vs Marketstack 2026: verified plan prices, request quotas, and global coverage. EODHD wins deep global EOD; Marketstack's Basic is the cheaper thin entry.
EODHD vs Tiingo API 2026
EODHD vs Tiingo 2026: EODHD covers 150,000+ global tickers from €19.99/mo with documented limits; Tiingo is a cheaper US-focused EOD bundle.
Financial Modeling Prep vs Alpha Vantage 2026
Financial Modeling Prep vs Alpha Vantage 2026: FMP's 250/day free tier and fundamentals depth vs Alpha Vantage's rate-tiered premium plans. Who wins.
Gemini 3.5 Flash for Financial Agents: The Cost Reality 2026
Gemini 3.5 Flash for financial agents 2026: at $1.50/$9.00 it is frontier agent-tier at Flash speed, not budget. Real loop cost vs Flash-Lite, Opus.
Gemini 3.5 Flash vs GPT-5.5 vs Claude Opus 4.7 for Finance Extraction 2026
Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7 for 10-K extraction 2026: verified per-filing cost from the engine. Cheapest frontier pick, not budget.
IBKR TWS API Rate Limits 2026
IBKR TWS API rate limits 2026: 50 messages/second client cap, historical pacing (60 per 10 min, 6 per 2s), 100 data lines, the option-chain throttle. Verified.
Interactive Brokers API Pricing 2026
Interactive Brokers API pricing 2026: the API is free (commission-based), market data $1-15/mo per exchange, TWS API capped at 50 messages/second. Verified.
Intrinio vs Polygon.io 2026
Intrinio vs Polygon.io 2026: institution-priced per-product feeds from $1,250/mo vs Polygon's flat $199/mo real-time retail tier. Who wins, and when.
Is Finnhub Free? Free Tier Limits 2026
Is Finnhub free in 2026? Yes, a free tier with a documented 60 calls/minute, real-time US quotes, and 50-symbol WebSocket. Free vs paid and commercial catch.
OpenAI Prompt Caching Pricing 2026
OpenAI prompt caching pricing 2026: cached input is 10% of standard (GPT-5.5 $0.50 vs $5.00), automatic and prefix-based. Anthropic compared. Verified.
Polygon.io Free Tier Limits 2026
Polygon.io free tier limits 2026: 5 API calls/minute, 15-minute delayed data, ~2 years of history, no WebSocket. Every Basic-plan constraint, verified.
Polygon.io Futures API Pricing 2026
Polygon.io futures API pricing 2026: Futures free Basic to $199 Advanced (real-time CME Group), Currencies $49 Starter for real-time forex. Verified.
Polygon.io News API Pricing 2026
Polygon.io news API pricing 2026: the Ticker News endpoint is free on every tier with sentiment fields; free Basic caps history at 2 years. Verified.
Polygon.io Options API Pricing 2026
Polygon.io options API pricing 2026: Starter $29, Developer $79, Advanced $199. Greeks and IV on every paid tier; only Advanced is real-time. Verified.
Polygon.io Pricing Plans 2026
Polygon.io pricing plans 2026: Basic is free (5 calls/min), Starter $29, Developer $79, Advanced $199 for real-time SIP. Every tier verified.
Polygon.io vs Alpaca Market Data 2026
Polygon.io vs Alpaca market data 2026: Alpaca's free IEX feed and $99/mo full SIP vs Polygon's $199/mo unlimited-call real-time tier. Who wins.
Robinhood API: Official or Unofficial 2026
Robinhood API official or unofficial 2026: the Crypto Trading API is real and key-based, but there is no official stock or options API. What to use instead.
Schwab Developer API vs IBKR 2026
Schwab Developer API vs IBKR 2026: Schwab's 120 req/min OAuth REST vs IBKR's free pacing-based TWS API. Cost, rate models, and multi-asset breadth.
SEC EDGAR API Rate Limits 2026
SEC EDGAR API rate limits 2026: 10 requests/second per requester, free with no API key, User-Agent header required. Endpoints, bulk data, and pipeline guidance.
The 2026 Gemini Cost Ladder for Finance: 3.5 Flash, Flash-Lite, 2.5 Pro
Gemini 3.5 Flash, Flash-Lite, 2.5 Pro for finance 2026: a verified three-rung cost ladder from the engine. 3.5 Flash sits at the top, not the floor.
Tiingo vs Polygon vs Finnhub 2026
Tiingo vs Polygon vs Finnhub 2026: Tiingo wins on fundamentals, Finnhub on its free tier (60 calls/min), Polygon on flat-rate intraday. Verified prices.
TradeStation API Pricing 2026
TradeStation API pricing 2026: a funded account plus an email request gets a WebAPI key; rate limits are per-category. Verified access and limits.
TradeStation API vs Alpaca 2026
TradeStation API vs Alpaca 2026: TradeStation's funded-account key gate and per-category limits vs Alpaca's instant free key and flat 200 req/min.
Tradier API Rate Limits 2026
Tradier API rate limits 2026: 120 req/min market data, 60 req/min order placement, 60 in sandbox. X-Ratelimit headers, options-first capabilities, verified.
Tradier vs Tastytrade API 2026
Tradier vs Tastytrade API 2026: Tradier's published 120/60 req/min limits vs Tastytrade's OAuth Open API, 24h sandbox, and DXLink streaming. Who wins.
10-K Token Estimator: Cache Hit Rate as the Cost Driver
Engine output across 10 models and 3 cache regimes: Sonnet at 60% cache costs $0.144 per 10-K with peers; cache architecture drives cost more than model.
Agent Cost Envelope: 150 Markets a Day, Three Configurations
Engine returns Sonnet at 60% convergence = $859/month for 150 markets × 5 steps — 1.72× over budget. Opus is 5× higher; steps are the binding constraint.
Alpaca vs Tradier: Options + MCP Coverage Compared
Comparator returns Alpaca as the only strict-filter fit (4/4); Tradier ties when auth relaxes. MCP directory confirms one official broker MCP.
Auditable LLM Decision-Making for Finance
Six load-bearing audit layers mapped to MiCA, MiFID II, SEC 17a-4, FINRA 4511, SR 11-7, BaFin MaRisk. Collapsing any layer destroys defensibility.
Batch vs Realtime Overnight Cost
A 12-hour deadline kills the batch discount. On 5,000 jobs at Sonnet 4.6, real-time is €210/day; batch would be €105 but the 24h SLA blocks it.
Calibration Dojo vs Platt vs Isotonic: The Right Tool
Engine: Brier 0.202, reliability 0.009 on a 200-forecast sample tape (well calibrated). Three calibration methods address different LLM-output failure modes.
Claude Opus 4.7 vs Gemini 2.5 Pro: 10-K Extraction Cost
Engine output: Opus $0.66 vs Gemini Pro $0.06 per 2-peer 10-K synthesis (10.6× ratio). Selector ties them at score 98; the eval-harness pivots on quality.
Claude Sonnet vs GPT-5 on Prompt Cache for Finance
Claude Sonnet's 5-minute TTL gives 90% off cached reads ($0.30/M); GPT-5's token cache gives 50% off a $10/M base ($5/M). Sonnet wins on cost; GPT-5's edge is cache survival.
Data Vendor TCO for EU Retail
Alpaca Algo Trader Plus is €1,188/year on headline. OPRA pass-through, exchange fees, and BaFin record-keeping push true TCO to €2,400-3,800.
Deflated Sharpe in Low-Trial Research Programmes
Engine output on a 1.8 Sharpe at trial counts 1, 40, 100, 1000 — selection-bias benchmark grows fast; low-trial regimes do not protect raw Sharpe.
Deflated Sharpe vs PBO on the Same Tape
Engine output on 8 strategies: PBO = 0.726 (over the 0.5 overfit threshold) and DSR PSR = 0. Two tests reject the tape for different reasons.
Drawdown Markov vs Historical Bootstrap: Tape-Length Decision
Engine returns p95 recovery 28 months on canonical input; Markov beats bootstrap on tapes under 60 months and loses on tapes over 240 months.
Drawdown Markov: Recovery Tails Explained
Canonical input yields p95 = 39 months recovery on a 25% drawdown threshold — the kill-switch parameter most backtest reports omit entirely.
Earnings Call Summarization: 250 Tickers, Nine Models
Engine returns $0.91/year (Gemini 2.5 Flash-Lite) to $42.80/year (Opus 4.7) for 250-ticker quarterly coverage. Operator review time dominates API spend by 200×.
Edge-Variance Sizing for LLM Signals
Deterministic Kelly on a 2.5% edge says 55.6% of bankroll. The conservative lower-bound posterior says 15.6%. For LLM signals, conservative wins.
Execution Simulator vs Order Book Replay
Simulator: parametric impact, fast, sometimes wrong. Replay: actual book snapshots, true slippage, needs data. Replay is the honest tool at scale.
Execution Simulator: 50k Shares at 10% Participation
Engine returns 12.59 bps total cost ($2,833 on $2.25M notional) for a 50k-share buy at 10% participation. Splitting across days makes impact worse.
Fallback Chain Simulation: When Claude Is Down
Single-provider uptime ~99.5%. A two-stage Anthropic→OpenAI chain lifts it to 99.99%. The fallback returns an answer, not a matching answer.
First Principles: LLMs and Market Microstructure
LLMs cannot reason about sub-millisecond mechanics. They are useful for literature synthesis, slippage post-mortems, and hypothesis generation only.
Forecast Scoring Sandbox: Reading the Reliability Curve
15 forecasts: Brier 0.154, log-loss 0.470. Murphy decomposition shows reliability equals uncertainty: an informative but miscalibrated forecaster.
FTC vs BaFin: Publishing Rule Cost Compared
The regulatory-cost engine returns US-FTC $42,360/year vs EU MiCA+DORA $179,600 at the same operating shape. BaFin, MAR, and AI Act add publishing layers the engine does not price.
FTC vs NLT Regulatory Cost for EU Publishing
FTC-supervised finance publishing runs €88,440/year. The no-licence alternative cuts that by half. EU adds BaFin and MiFID II overhead on top.
Hallucination Detector: Numeric Source Grounding in Practice
Engine flags an EPS substitution ($0.81 vs source $0.78) at groundingRate 0.25 — the substitution failure semantic-similarity checks systematically miss.
Kelly vs Fixed Fractional on a Noisy Edge
On a p=0.55, b=1.40 edge, capped Kelly (3%) returns a 14.5x median vs fixed 2%'s 6.2x; fixed fractional wins on drawdown and noisy-edge robustness.
Kelly with Uncertain Edge: Quarter vs Eighth
Quarter-Kelly assumes you know the win rate. When the win rate is a Beta posterior, eighth-Kelly often dominates on long-run growth and on path-of-ruin probability.
Kupiec vs Bootstrap for VaR Validation
30-day VaR backtest: 7 exceptions vs 0.3 expected, Kupiec LR 32.34, p 1.3e-8. Bootstrap agrees but converges slower. Basel III names Kupiec.
LLM Finance Error Taxonomy: The Bond-Yield Trap
A 12-mode taxonomy of LLM finance errors. Coupon-yield conflation is the textbook case. A 30-line classifier catches 70-80% before they ship.
Model Risk Management for Solo LLM Research Loops
Federal Reserve SR 11-7's three-lines-of-defence framework adapted for the single-operator LLM workflow. Ignoring it means repeating failures the framework was built to prevent.
Model Selector: Extraction Tier at $50 / sub-5s / Medium
Engine returns Gemini 2.5 Flash-Lite (score 96.0, $3/mo) as the top qualifying model, with Gemini 2.5 Flash second; Haiku 4.5 disqualified on context window, Sonnet+ on budget.
Operational Risk for Solo Quant Stacks
Basel's seven operational-risk categories applied to a single-operator launchd stack. Categories 6 and 7 dominate solo-quant losses by an order of magnitude.
Options Greeks: 30-DTE OTM Call, Worked End to End
Engine returns delta 0.301, gamma 0.0217, theta −$0.10/day, vega $0.20/IV-point for a 30-DTE 5% OTM call on $200 spot at 28% IV — the LLM-confounder case.
PBO Score on an Eight-Strategy Matrix
Eight synthetic strategies on 80 observations: PBO = 0, deflated Sharpe = 0 for every candidate. The combination is uninformative when the strategies are correlated.
Prompt Injection Tester: News Feed Agents
23 attack payloads against a finance research agent reading attacker-controlled news. Typical first run: 4-7 vulnerable patterns. Four mitigations.
Publishing Finance Content with LLMs in the EU
An EU publisher using LLMs sits between BaFin/WpHG, MiCA, EU AI Act, and ESMA. The defensible posture: no advice, disclose AI, publish methodology.
Returns Distribution: Fat Tails in an Equity Portfolio
55-month sample. Skew -0.54, JB p-value 0.24. Normality test fails to reject — the QQ tail still shows the distribution is fat enough to break VaR.
Risk Parity vs Efficient Frontier: 3-Asset Portfolio Build
Engine returns a reported tangency [12%, −40%, 85%, 42%] and min-variance [8%, 8%, 79%, 6%] on a four-asset macro tape; risk parity refuses the short. Decision pivot.
Risk-Adjusted Returns: Benchmark Choice Drives the Report
Engine returns IR 0.708 vs 0.433 and beta 1.246 vs 3.054 on the same returns against two benchmarks. Sharpe is invariant; alpha and IR are not.
Schema Validator: Trade Decision Strictness in Practice
Engine: 4 payloads against trade_decision — pass, missing-fields fail, sanity-banded warning, enum-mismatch fail. Strict mode is the safe default.
SEC Filing Chunking: Strategy, Size, and Embedding Cost
Engine returns 131 chunks at 1024-tok/10% overlap on a 10-K body; structural chunking preserves the table boundaries that recursive chunking splits across six runs.
Selection Bias in LLM Strategy Research
When an LLM proposes ten strategies and you pick the best, the apparent Sharpe is 1.5-2x the real edge. Deflated Sharpe and four discipline rules.
Skill as Contract, Not Prompt: The Methodology Shift
Contract-form skills (typed I/O, runtime invariants, verification harness) survive drift and adversarial inputs that prompt-form skills cannot.
Sortino vs Sharpe: The Tail-Skew Tradeoff
Same series, Sharpe 2.08 vs Sortino 2.90. The 39% gap is downside-only deviation discarding upside variance. When that read is honest and when it lies.
Stat-Arb Capacity: Half-Life Sets the Ceiling
A 7-day half-life pair returns $1B engine maxAum, $400M at 10bp slippage. Practical retail capacity sits orders of magnitude below. Half-life binds.
Structural vs Fixed Chunking for SEC Filings
On a 14k-token earnings transcript, fixed chunking returns 8 chunks of 1,839 tokens; structural returns 30 of 565. Workload-bound, not cost-bound.
Synthetic Data vs Bootstrap Resampling for Backtests
GBM gives clean parametric paths; bootstrap preserves observed skew and vol clustering. For vol-sensitive strategies, run both, report the worst.
Synthetic Data: GARCH vs GBM for Backtesting
GBM Sharpe 1.91 on 504 days is the strategy null. GARCH paths add vol clustering — the regime that breaks vol-sensitive strategies. Run both.
Token Cost Optimizer: Cache Amortization, Three Regimes
Engine returns $46.94/month at 55% cache hit rate vs $74.65 at 0% — a 37% saving. The 13% break-even is approximate; real break-even sits closer to 8–10%.
Trading System Blueprint: Alpaca + Claude Opus
A solo retail trading system in ~1,000 lines of Python, four launchd plists, one DuckDB. Seven decisions and three tests that catch 80% of failures.
VaR Backtest: Kupiec vs Christoffersen on the Same Tape
Engine output on two tapes with 4 breaches each: Kupiec passes both (p=0.572); Christoffersen rejects clustering (p=0.011) and passes isolation.
Walk-Forward Window Sizing: A Decision Rule
On the canonical 50-bar tape the engine returns mean OOS Sharpe 0.653, efficiency 0.600 — but the four windows hide a regime flip the half-life rule fixes.
Bond Yield Curve Parsing with Claude Haiku 4.5
Extract US Treasury yield curves from auction announcements using Claude Haiku 4.5 — full prompt, deterministic verifier-in-the-loop pattern, evaluation.
Deflated Sharpe Ratio
Bailey-López de Prado (2014) deflated Sharpe ratio, derived from extreme-value statistics, with a Monte Carlo confirmation and the full deflation table.
Earnings Call Summarisation
Cost and architecture guide for earnings-call summarisation across eight production LLMs: verifiable 2026 vendor pricing and qualitative failure modes.
MCP vs Custom HTTP: A Six-Scenario Decision
When does MCP beat a hand-rolled HTTP integration in agent finance stacks? Six concrete scenarios — multi-agent registry, auth scope, idempotent retries.
Quant Interview: 50 Questions, LLM-Graded
Fifty quant interview questions — probability, statistics, derivatives, microstructure, regression — with answer keys and a Claude Sonnet 4.6 grading.
Retail PnL vs Backtest
The eleven-point gap between a 14% backtest and 3.2% live PnL, decomposed across eight mechanisms with dollar examples — slippage, latency, fees, fills.
Risk Parity vs Kelly: When Each Sizing Framework
Risk parity and Kelly solve different problems. Risk parity wins when correlations are stable and edge is noisy; Kelly wins when edge is concentrated.
Walk-Forward Validation Pitfalls in LLM-Generated
Eight pitfalls that quietly inflate walk-forward Sharpe in LLM-generated trading strategies — leakage, regime blindness, micro vs macro re-estimation.
Why LLMs Fail Options Greeks
LLMs misfire on theta sign, vega-vs-gamma conflation, and ITM-vs-ATM gamma ranking. The three reproducible error categories, plus a verifier fix.
Backtest Overfitting in LLM Trading Strategies
The Probability of Backtest Overfitting, applied to LLM-augmented research. Why LLM strategies inflate PBO, how to compute it, and the three-gate.
Caching Strategies for Production LLM Pipelines
Three caching layers for production LLM pipelines: provider-side prompt caching, application response cache, semantic similarity cache. Decision matrix +.
Calibration Drift: Why Your LLM's Confidence Score
LLM-reported confidence calibration drifts as the model is updated. Detection patterns and re-calibration math.
Compliance Audit Trails for LLM-Driven Trade
Schema, append-only log design, and reproducibility patterns for SEC/FINRA-compliant LLM trade audit trails.
Cost-Per-Validated-Trade
Tokens spent don't equal value. The cost-per-validated-trade metric and how to instrument it.
Hallucination Detection at Scale
Production-grade LLM hallucination detection in four layers: source grounding, self-consistency, deterministic verification, and adversarial probes.
MCP Server Latency: The Hidden Cost of Tool-Call
Each MCP tool call adds latency. For multi-step agents, the total roundtrip cost dominates. Architecture patterns to amortize it.
MCP Servers for Financial Data
A five-grade A-E rubric for finance MCP servers across auth, egress, audit, rotation, and vendor posture. 12-server walkthrough plus anti-patterns.
Model Selection in Finance: Surviving Benchmarks
Model selection finance methodology: a five-axis rubric, quarterly rebench cadence, version-pinning, and shadow A/B that survive 3-6 month.
Production LLM Latency Budgets
Trading apps with LLM-augmented research need explicit latency budgets per call. The P50/P95/P99 math, queue-theory bounds, and architecture patterns.
Prompt Version Control
Prompts are code. Version them, diff them, regression-test them. Three workflow patterns that survive team scaling.
RAG vs Fine-Tuning: A Cost Model
RAG looks cheaper at low query volume. Above 100k queries/month with stable knowledge, fine-tuning wins. The break-even math.
Temperature, Top-P, and Top-K
Sampling parameters affect both quality and cost. The decision rules for each parameter, with worked examples on production cost impact.
Token-Cost Optimization
Three token-cost reduction strategies. Decision rules for when each pays back, and the math when they compound.
Vendor Lock-In Risk: How to Architect Cross-Provider
Anthropic, OpenAI, and Google can all break or price-jump in one quarter. The fallback-chain architecture that survives a single-vendor outage.
After-Hours, 24-7, and Pre-Market Asymmetries
Three boundaries where LLM research built on equity's 9:30–16:00 clock breaks — earnings after close, 24-7 crypto, pre-market Asia/Europe action. Decision rule.
Agent Memory Patterns for Finance Research
Three memory tiers for finance agents — working, episodic, long-term lesson library — with retention policies and runnable Python for each.
Batch API Economics for Finance Loops
When Anthropic Message Batches or OpenAI Batch cut cost by half on finance workloads — and the soft-deadline rule for when batch is not a valid choice.
Bayesian Updating for LLM-Assisted Forecasts
Turn LLM probability outputs into calibrated posteriors — Beta-Binomial for binary forecasts, Normal-Inverse-Gamma for continuous — with runnable Python.
Bounded-Cost Agentic Research
Three gates stop runaway agent loops: hard token budget, step-count cap, and a cost-convergence check that halts when belief stops moving.
Brier Scores and Log Loss for Forecasters
Two proper scoring rules for probabilistic forecasts, why Brier decomposes into reliability plus resolution, and why log loss punishes overconfident wrongness.
Context Hygiene for Multi-Step Research
Three-tier layered summary — leaf documents, intermediate briefs, working memory — with per-tier retention rules that keep long research loops cheap and sharp.
Evaluation Harness for Finance LLM Tasks
Why public benchmarks are a signal, not a decision; how to source ground truth from EDGAR; and a runnable eval-harness skeleton with bootstrap confidence.
Fine-Tuning vs RAG vs Long-Context for Filings
Decision matrix for finance LLMs: when RAG wins, when long-context wins, and when fine-tuning makes sense. Cost math from published 2026-04 vendor rates.
Inference Cost Attribution per Idea and Trade
Append-only cost-event schema plus two canonical SQL queries — cost per idea, cost per validated trade — with cache-write amortization built in.
MCP vs Function Calling for Finance Agents
Where MCP wins, where function calling wins, and why the right answer is almost always a hybrid — data layer on MCP, decision code on function calling.
Model Selection Framework for Finance Tasks
A task × latency × cost × context decision tree for finance LLM workloads. Ten concrete scenarios mapped to tier bands. Grounded in published pricing, not.
Multi-Timeframe Signal Integration With LLMs
LLMs belong on weekly fundamentals, not intraday microstructure. A two-layer architecture: weekly LLM thesis plus rule-based intraday invalidation gates.
News Feed Integration for Finance Agents
Four patterns — source vetting, injection sanitization, timestamp discipline, dedup across reporters — make news safe for an LLM finance agent. Runnable.
Numeric Precision in LLM Filing Extraction
Six precision traps — units, currency, GAAP vs non-GAAP, diluted vs basic shares, restatements, rounding — and the structured-output pattern that fixes them.
Observability Patterns for LLM Trading Agents
Three patterns that stop silent failure: trace-ID propagation, structured log schema with per-step cost and confidence, and a deterministic replay harness.
Postmortem Template for LLM Trading Systems
A blameless, append-only postmortem template plus a 20-mode failure checklist — price-blind leaks to cache poisoning — keyed to the trace-ID log.
Prompt Caching Economics for Finance
How Anthropic, OpenAI, and Gemini prompt caching works on finance workloads: 5-minute TTL, hit-rate patterns, and 50-90% input savings with the right design.
Prompt Injection Defenses for Finance Agents
Five stacked defenses: input fencing, output validation, tool allow-list, bounded-cost circuit, dual-model cross-check. No single defense is sufficient.
Prompt Patterns for Earnings Calls
Five copy-paste patterns — speaker attribution, hedged-guidance confidence, multi-quarter delta, risk aggregator, forward-outlook separator.
Rate Limit Design for LLM Research Loops
Three primitives that turn bursty finance workloads into stable loops: per-provider token bucket, cross-provider fallback chain, and graceful degradation.
Reading Financial Filings With LLMs: 2026 Playbook
A map of eight filing tasks — extraction, summarization, peer comparison, Q&A, classification, sentiment, forecasting input, compliance — with model.
Research Diary Schema: Auditable LLM Research
A 12-field append-only schema that captures every idea — including rejected ones — to unlock calibration, proper scoring, and post-hoc overfitting analysis.
Thinking Tokens for Finance Tasks
When extended-thinking and reasoning-effort modes earn their 3-10x cost tax on finance workloads — and when they are a silent drain on the budget.
Backtest to Paper to Live: Deployment Playbook
Backtest to paper to live — the gates that separate each stage, the metrics that trigger rollback, and the kill-switch you should already have.
Choosing a Broker API 2026: Rate Limits, Fees, Auth
Choosing a broker API 2026 — Alpaca vs IBKR vs Tradier vs Schwab vs Robinhood on the axes that bite: auth, order types, rate limits, and fees.
Execution Simulation: Slippage and Impact
The math of market impact — why it scales as the square root of trade size, when linear impact dominates, and the fix that keeps backtests honest.
Options Greeks for LLM-Driven Trading
Options Greeks for LLM-driven trading: delta, gamma, theta, vega, rho — what each costs, three rules, plus a prompt template for multi-leg positions.
Prompt Injection Attack Catalog for Finance Agents
Prompt injection attacks on finance agents — indirect injection via news feeds, tool-result poisoning, prompt exfiltration, unit confusion — plus defenses.
Rate-Limited, Resumable Market-Data Ingestion
Four primitives that turn a weekend ingestion script into a six-month loop: token-bucket limits, resumable checkpoints, idempotent writes, DLQs.
Real-Time vs End-of-Day Trading Systems
Real-time vs end-of-day trading systems — the decision rule, the 20-50x cost delta, and the four signal types where real-time is genuinely load-bearing.
Synthetic Market Data for Backtests: Beyond GBM
Synthetic market data beyond GBM — when GARCH(1,1), regime-switching, or copula-linked pairs are the right next step. Trade-offs plus a Python template.
The $0/Month Trading Stack in 2026
Zero-cost solo trading stack: launchd + free market data tiers + local LLMs on cheap paths + BYO API keys — plus where paid tiers become unavoidable.
The Sharpe Ratio Trap
Sharpe ignores tail risk, assumes Gaussian returns, and is trivially gameable. Four metrics to report alongside it: Sortino, Calmar, tail, deflated Sharpe.
Walk-Forward Validation: A Cookbook
Walk-forward is the cheapest honest backtest you can run. Anchored vs rolling windows, the four parameters that matter, and a 60-line Python template.
Broker APIs for AI Agents 2026: MCP Coverage
Broker APIs for AI agents 2026: Alpaca, IBKR, Tradier ranked on MCP server coverage, order idempotency, and retry safety for autonomous trading.
Building a Production Claude Agent for Finance
Production Claude agent for finance: price-blind research, idempotent execution, heartbeat + watchdog + circuit breaker, under $225/month at small scale.
Calibrating LLM Forecasts with Isotonic Regression
LLM probabilities are systematically miscalibrated. Isotonic regression via PAV is the cheapest robust fix: 40 lines of Python, no distributional priors.
Conviction-Scaled Kelly Bet Sizing
Full Kelly is brutally unforgiving of over-estimation. Quarter-Kelly with a conviction-tier mapping and a per-trade cap is the defensible default.
Did You Overfit? PBO and Deflated Sharpe
A practical tutorial on the two best-documented tests for backtest overfitting — PBO via CSCV and the Deflated Sharpe Ratio. Runnable Python + tool.
Finance MCP Servers: The Security Baseline
An opinionated rubric for grading 2026 finance MCP servers on scope, auth, idempotency, transport, and schema — plus the failure modes that kill agents.
Heartbeats, Watchdogs, Circuit Breakers for Trading
Silent failure is the worst failure mode. Three patterns prevent it — heartbeat, watchdog, circuit breaker — in under 100 lines of Python on launchd.
How to Read a Backtest Report: 2026 Cheat Sheet
Five questions a backtest report must answer — edge real, persistent, cheap to trade, bearable, explainable — with the statistics that verify each.
LLM Prompt Patterns for 10-K and 8-K Extraction
Three structured patterns for auditable 10-K extractions: field-by-field JSON, citation-required verbatim quotes, and contradiction-triangle cross-check.
Market Data APIs Compared: Databento vs Polygon 2026
Market data APIs compared: six retail providers on pricing, tier coverage, real-time access, options and futures coverage, and who wins for each profile.
Signal Orthogonality: Why Ensembles Become One Bet
A 10-signal ensemble with pairwise correlation 0.8 is effectively a 1.5-signal ensemble. The math, a two-minute diagnostic, and three axes that work.
The 2026 Engineer's Guide to AI in Markets
An engineer's map of where LLMs, MCP servers, and market-data APIs fit into a 2026 trading stack — and where they still break. Direct, no hype, no grift.
The 5 Failure Modes of LLM Trading Agents (2026)
The 5 recurring failure modes in retail LLM trading agents: price-blind leaks, numeric fabrication, prompt drift, token runaway, audit amnesia.
The 8-Step LLM Research Prompt Template
Free-form prompts yield uncalibrated LLM output. An 8-step template makes research reproducible and better calibrated across model versions.
The BaFin + EU Guide for Retail AI Traders (2026)
BaFin and EU rules for retail AI trading, publishing finance content, and automated strategies. Education-safe phrasing and the minimum compliance stack.
The Price-Blind LLM Research Harness
Price-blind LLM research — most harnesses leak the current price and the model confabulates. The architectural fix and a 30-line Python scaffold.
The Token-Cost Reality of LLM Trading Research
What LLM trading research costs per idea and per validated trade across Claude, GPT-5, and Gemini 2.5. Pricing, caching, model-mix under $200/month.
Publication standards