Research · 52 articles
Research
Long-form pieces on LLMs, MCP, and the engineering around markets: pillar guides, head-to-heads, runnable tutorials, and opinionated methodology. Read it or skim it: there's a TL;DR on every one.
After-Hours, 24-7, and Pre-Market Asymmetries
Three boundaries where LLM research built on equity's 9:30–16:00 clock breaks — earnings after the close, 24-7 crypto, pre-market Asia/Europe action — plus a decision rule for each.
Agent Memory Patterns for Finance Research
Three memory tiers for finance agents — working, episodic, long-term lesson library — with retention policies and runnable Python for each.
Batch API Economics for Finance Loops
When Anthropic Message Batches or OpenAI Batch cut costs in half on finance workloads — and the soft-deadline rule for when batch is not a valid choice.
Bayesian Updating for LLM-Assisted Forecasts
Turn LLM probability outputs into calibrated posteriors — Beta-Binomial for binary forecasts, Normal-Inverse-Gamma for continuous — with runnable Python.
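The Beta-Binomial update the teaser names can be sketched in a few lines — a minimal illustration of the conjugate update, not the article's code; the function name is ours:

```python
def beta_binomial_update(a: float, b: float, hits: int, misses: int):
    """Conjugate update: Beta(a, b) prior + binomial evidence -> Beta posterior."""
    a_post, b_post = a + hits, b + misses
    mean = a_post / (a_post + b_post)  # posterior mean = calibrated probability
    return a_post, b_post, mean

# Uniform Beta(1, 1) prior, 7 correct binary calls out of 10:
print(beta_binomial_update(1, 1, 7, 3))  # → (8, 4, 0.666...)
```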
Bounded-Cost Agentic Research
Three gates stop runaway agent loops: hard token budget, step-count cap, and a cost-convergence check that halts when belief stops moving.
Brier Scores and Log Loss for Forecasters
Two proper scoring rules for probabilistic forecasts, why Brier decomposes into reliability plus resolution, and why log loss punishes overconfident wrongness.
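Both scoring rules are small enough to sketch inline — an illustrative version under our own function names, with clipping added so log loss stays finite:

```python
import math

def brier(p, y):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)

def log_loss(p, y, eps=1e-12):
    """Negative mean log-likelihood; punishes confident wrong calls hardest."""
    return -sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                for pi, yi in zip(p, y)) / len(p)

p, y = [0.9, 0.7, 0.2], [1, 1, 0]
print(round(brier(p, y), 4))  # → 0.0467
```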
Context Hygiene for Multi-Step Research
Three-tier layered summary — leaf documents, intermediate briefs, working memory — with per-tier retention rules that keep long research loops cheap and sharp.
Evaluation Harness for Finance LLM Tasks
Why public benchmarks are a signal, not a decision; how to source ground truth from EDGAR; and a runnable eval-harness skeleton with bootstrap confidence intervals.
Fine-Tuning vs RAG vs Long-Context for Filings
Decision matrix for finance LLMs: when RAG wins, when long-context wins, and when fine-tuning makes sense. Cost math from published 2026-04 vendor rates.
Inference Cost Attribution per Idea and Trade
Append-only cost-event schema plus two canonical SQL queries — cost per idea, cost per validated trade — with cache-write amortization built in.
MCP vs Function Calling for Finance Agents
Where MCP wins, where function calling wins, and why the right answer is almost always a hybrid — data layer on MCP, decision code on function calling.
Model Selection Framework for Finance Tasks
A task × latency × cost × context decision tree for finance LLM workloads. Ten concrete scenarios mapped to tier bands. Grounded in published pricing, not benchmarks.
Multi-Timeframe Signal Integration With LLMs
LLMs belong on weekly fundamentals, not intraday microstructure. A two-layer architecture: weekly LLM thesis plus rule-based intraday invalidation gates.
News Feed Integration for Finance Agents
Four patterns — source vetting, injection sanitization, timestamp discipline, dedup across reporters — make news safe for an LLM finance agent. Runnable scaffold.
Numeric Precision in LLM Filing Extraction
Six precision traps — units, currency, GAAP vs non-GAAP, diluted vs basic shares, restatements, rounding — and the structured-output pattern that fixes them.
Observability Patterns for LLM Trading Agents
Three patterns that stop silent failure: trace-ID propagation, structured log schema with per-step cost and confidence, and a deterministic replay harness.
Postmortem Template for LLM Trading Systems
A blameless, append-only postmortem template plus a 20-mode failure checklist — from price-blind leaks to cache poisoning — keyed to the trace-ID log.
Prompt Caching Economics for Finance
How Anthropic, OpenAI, and Gemini prompt caching works on finance workloads — 5-minute TTL, hit-rate patterns, and 50-90% input savings with the right design.
Prompt Injection Defenses for Finance Agents
Five stacked defenses: input fencing, output validation, tool allow-list, bounded-cost circuit, dual-model cross-check. No single defense is sufficient.
Prompt Patterns for Earnings Calls
Five copy-paste patterns — speaker attribution, hedged-guidance confidence, multi-quarter delta, risk aggregator, forward-outlook separator — with runnable code.
Rate Limit Design for LLM Research Loops
Three primitives that turn bursty finance workloads into stable loops: per-provider token bucket, cross-provider fallback chain, and graceful degradation.
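The token-bucket primitive is generic enough to sketch — a minimal illustrative version (class and parameter names are ours, not the article's):

```python
import time

class TokenBucket:
    """Per-provider rate limiter: holds up to `capacity` tokens, refilled at `rate`/sec."""
    def __init__(self, capacity: float, rate: float):
        self.capacity, self.rate = capacity, rate
        self.tokens, self.last = capacity, time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller falls back to the next provider or degrades
```

A `False` return is where the other two primitives attach: try the next provider in the fallback chain, or degrade to a cheaper model.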
Reading Financial Filings With LLMs: 2026 Playbook
A map of eight filing tasks — extraction, summarization, peer comparison, Q&A, classification, sentiment, forecasting input, compliance — with model, pattern, and cost.
Research Diary Schema: Auditable LLM Research
A 12-field append-only schema that captures every idea — including rejected ones — to unlock calibration, proper scoring, and post-hoc overfitting analysis.
Thinking Tokens for Finance Tasks
When extended-thinking and reasoning-effort modes earn their 3-10x cost tax on finance workloads — and when they are a silent drain on the budget.
Backtest to Paper to Live: Deployment Playbook
Backtest to paper to live — the gates that separate each stage, the metrics that trigger rollback, and the kill-switch you should already have.
Choosing a Broker API in 2026
Choosing a broker API in 2026 — Alpaca vs IBKR vs Tradier vs Schwab vs Robinhood on the axes that bite: auth, order types, rate limits, and fees.
Execution Simulation: Slippage and Impact
The math of market impact — why it scales as the square root of trade size, when linear impact dominates, and the fix that keeps backtests honest.
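The square-root law the teaser cites reduces to a one-line estimate — a sketch under the usual form impact ≈ Y·σ·√(Q/ADV), with our own parameter names and Y left as a tunable coefficient:

```python
def sqrt_impact_bps(q_shares: float, adv_shares: float,
                    daily_vol_bps: float, y: float = 1.0) -> float:
    """Square-root market-impact estimate in basis points: Y * sigma * sqrt(Q / ADV)."""
    return y * daily_vol_bps * (q_shares / adv_shares) ** 0.5

# Trading 1% of average daily volume at 200 bps daily vol:
print(sqrt_impact_bps(10_000, 1_000_000, 200.0))  # → 20.0 bps
```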
Options Greeks for LLM-Driven Trading
Options Greeks for LLM-driven trading: delta, gamma, theta, vega, rho — what each costs, three rules, plus a prompt template for multi-leg positions.
Prompt Injection Attack Catalog for Finance Agents
Prompt injection attacks on finance agents — indirect injection via news feeds, tool-result poisoning, prompt exfiltration, unit confusion — plus defenses.
Rate-Limited, Resumable Market-Data Ingestion
Four primitives that turn a weekend ingestion script into a six-month loop: token-bucket limits, resumable checkpoints, idempotent writes, DLQs.
Real-Time vs End-of-Day Trading Systems
Real-time vs end-of-day trading systems — the decision rule, the 20-50x cost delta, and the four signal types where real-time is genuinely load-bearing.
Synthetic Market Data for Backtests: Beyond GBM
Synthetic market data beyond GBM — when GARCH(1,1), regime-switching, or copula-linked pairs are the right next step. Trade-offs plus a Python template.
The $0/Month Trading Stack in 2026
Zero-cost solo trading stack: launchd + free market data tiers + local LLMs on cheap paths + BYO API keys — plus where paid tiers become unavoidable.
The Sharpe Ratio Trap
Sharpe ignores tail risk, assumes Gaussian returns, and is trivially gameable. Four metrics to report alongside it: Sortino, Calmar, tail risk, and deflated Sharpe.
Walk-Forward Validation: A Cookbook
Walk-forward is the cheapest honest backtest you can run. Anchored vs rolling windows, the four parameters that matter, and a 60-line Python template.
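The anchored-vs-rolling distinction fits in a short generator — an illustrative sketch with our own names, not the article's 60-line template:

```python
def walk_forward_windows(n: int, train: int, test: int, anchored: bool = False):
    """Yield (train_slice, test_slice) index pairs over n observations.

    Rolling: the train window slides forward with the test window.
    Anchored: the train window always starts at observation 0 and grows.
    """
    t0 = train
    while t0 + test <= n:
        tr = slice(0 if anchored else t0 - train, t0)
        yield tr, slice(t0, t0 + test)
        t0 += test

for tr, te in walk_forward_windows(10, train=4, test=2):
    print(tr, te)
```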
Broker APIs Compared: Alpaca vs IBKR vs Tradier 2026
Broker APIs for retail AI trading 2026: Alpaca for solo ops (official MCP), IBKR for multi-asset depth, Tradier for options. Head-to-head + MCP angle.
Building a Production Claude Agent for Finance
Production Claude agent for finance: price-blind research, idempotent execution, heartbeat + watchdog + circuit breaker, under $225/month at small scale.
Calibrating LLM Forecasts with Isotonic Regression
LLM probabilities are systematically miscalibrated. Isotonic regression via PAV is the cheapest robust fix: 40 lines of Python, no distributional priors.
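Pool Adjacent Violators is compact enough to sketch here — a minimal illustrative version (names ours), fitting a nondecreasing sequence by weighted-mean pooling:

```python
def pav(y, w=None):
    """Pool Adjacent Violators: nondecreasing fit minimizing weighted squared error."""
    w = w or [1.0] * len(y)
    # Each block: [pooled weighted mean, total weight, count of points pooled]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    out = []
    for m, _, n in blocks:
        out.extend([m] * n)
    return out

# Raw bucket accuracies sorted by model confidence -> monotone calibration map
print(pav([0.1, 0.5, 0.3, 0.8]))
```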
Conviction-Scaled Kelly Bet Sizing
Full Kelly is brutally unforgiving of over-estimation. Quarter-Kelly with a conviction-tier mapping and a per-trade cap is the defensible default.
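The conviction-scaled rule reduces to a few lines — a sketch of fractional Kelly for a binary bet (f* = p − (1−p)/b), with fraction and per-trade cap as illustrative defaults, not the article's recommendation:

```python
def fractional_kelly(p_win: float, win_loss_ratio: float,
                     fraction: float = 0.25, cap: float = 0.05) -> float:
    """Kelly fraction f* = p - (1-p)/b, scaled down and capped per trade."""
    f_star = p_win - (1 - p_win) / win_loss_ratio
    # Never short the bet on a negative edge; never exceed the per-trade cap
    return max(0.0, min(fraction * f_star, cap))

# 55% win rate, 1.5:1 payoff -> f* = 0.25, quarter-Kelly 0.0625, capped at 5%:
print(fractional_kelly(0.55, 1.5))  # → 0.05
```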
Did You Overfit? PBO and Deflated Sharpe
A practical tutorial on the two best-documented tests for backtest overfitting — PBO via CSCV and the Deflated Sharpe Ratio. Runnable Python + tool.
Finance MCP Servers: The Security Baseline
An opinionated rubric for grading 2026 finance MCP servers on scope, auth, idempotency, transport, and schema — plus the failure modes that kill agents.
Heartbeats, Watchdogs, Circuit Breakers for Trading
Silent failure is the worst failure mode. Three patterns prevent it — heartbeat, watchdog, circuit breaker — in under 100 lines of Python on launchd.
How to Read a Backtest Report: 2026 Cheat Sheet
Five questions a backtest report must answer — edge real, persistent, cheap to trade, bearable, explainable — with the statistics that verify each.
LLM Prompt Patterns for 10-K and 8-K Extraction
Three structured patterns for auditable 10-K extractions: field-by-field JSON, citation-required verbatim quotes, and contradiction-triangle cross-check.
Market Data APIs Compared: Databento vs Polygon 2026
Market data APIs compared: six retail providers on pricing, tier coverage, real-time access, options and futures coverage, and who wins for each profile.
Signal Orthogonality: Why Ensembles Become One Bet
A 10-signal ensemble with pairwise correlation 0.8 is effectively a 1.5-signal ensemble. The math, a two-minute diagnostic, and three axes that work.
The 2026 Engineer's Guide to AI in Markets
An engineer's map of where LLMs, MCP servers, and market-data APIs fit into a 2026 trading stack — and where they still break. Direct, no hype, no grift.
The 5 Failure Modes of LLM Trading Agents (2026)
The 5 recurring failure modes in retail LLM trading agents: price-blind leaks, numeric fabrication, prompt drift, token runaway, audit amnesia.
The 8-Step LLM Research Prompt Template
Free-form prompts yield uncalibrated output. An 8-step template — reference class, decomposition, pre-mortem, invalidation, JSON — fixes that.
The BaFin + EU Guide for Retail AI Traders (2026)
BaFin and EU rules for retail AI trading, publishing finance content, and automated strategies. Education-safe phrasing and the minimum compliance stack.
The Price-Blind LLM Research Harness
Price-blind LLM research — most harnesses leak the current price and the model confabulates. The architectural fix and a 30-line Python scaffold.
The Token-Cost Reality of LLM Trading Research
What LLM trading research costs per idea and per validated trade across Claude, GPT-5, and Gemini 2.5. Pricing, caching, and a model mix under $200/month.