Research · 52 articles
Research
Long-form pieces on LLMs, MCP, and the engineering around markets: pillar guides, head-to-heads, runnable tutorials, and opinionated methodology. Read it or skim it: there's a TL;DR on every one.
After-Hours, 24-7, and Pre-Market Asymmetries
Three boundaries where LLM research built on equity's 9:30–16:00 clock breaks — earnings after the close, 24-7 crypto, pre-market Asia/Europe action — plus a decision rule for each.
Agent Memory Patterns for Finance Research
Three memory tiers for finance agents — working, episodic, long-term lesson library — with retention policies and runnable Python for each.
Batch API Economics for Finance Loops
When Anthropic Message Batches or OpenAI Batch cut costs in half on finance workloads — and the soft-deadline rule for when batch is not a valid choice.
Bayesian Updating for LLM-Assisted Forecasts
Turn LLM probability outputs into calibrated posteriors — Beta-Binomial for binary forecasts, Normal-Inverse-Gamma for continuous — with runnable Python.
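The Beta-Binomial update the teaser names can be sketched in a few lines — a minimal illustration of the conjugate update, not the article's code; the function name is ours:

```python
def beta_binomial_update(a: float, b: float, hits: int, misses: int):
    """Conjugate update: Beta(a, b) prior + binomial evidence -> Beta posterior."""
    a_post, b_post = a + hits, b + misses
    mean = a_post / (a_post + b_post)  # posterior mean = calibrated probability
    return a_post, b_post, mean

# Uniform Beta(1, 1) prior, 7 correct binary calls out of 10:
print(beta_binomial_update(1, 1, 7, 3))  # → (8, 4, 0.666...)
```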
Bounded-Cost Agentic Research
Three gates stop runaway agent loops: hard token budget, step-count cap, and a cost-convergence check that halts when belief stops moving.
Brier Scores and Log Loss for Forecasters
Two proper scoring rules for probabilistic forecasts, why Brier decomposes into reliability plus resolution, and why log loss punishes overconfident wrongness.
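Both scoring rules are small enough to sketch inline — an illustrative version under our own function names, with clipping added so log loss stays finite:

```python
import math

def brier(p, y):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)

def log_loss(p, y, eps=1e-12):
    """Negative mean log-likelihood; punishes confident wrong calls hardest."""
    return -sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                for pi, yi in zip(p, y)) / len(p)

p, y = [0.9, 0.7, 0.2], [1, 1, 0]
print(round(brier(p, y), 4))  # → 0.0467
```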
Context Hygiene for Multi-Step Research
Three-tier layered summary — leaf documents, intermediate briefs, working memory — with per-tier retention rules that keep long research loops cheap and sharp.
Evaluation Harness for Finance LLM Tasks
Why public benchmarks are a signal, not a decision; how to source ground truth from EDGAR; and a runnable eval-harness skeleton with bootstrap confidence intervals.
Fine-Tuning vs RAG vs Long-Context for Filings
Decision matrix for finance LLMs: when RAG wins, when long-context wins, and when fine-tuning makes sense. Cost math from published 2026-04 vendor rates.
Inference Cost Attribution per Idea and Trade
Append-only cost-event schema plus two canonical SQL queries — cost per idea, cost per validated trade — with cache-write amortization built in.
MCP vs Function Calling for Finance Agents
Where MCP wins, where function calling wins, and why the right answer is almost always a hybrid — data layer on MCP, decision code on function calling.
Model Selection Framework for Finance Tasks
A task × latency × cost × context decision tree for finance LLM workloads. Ten concrete scenarios mapped to tier bands. Grounded in published pricing, not benchmarks.
Multi-Timeframe Signal Integration With LLMs
LLMs belong on weekly fundamentals, not intraday microstructure. A two-layer architecture: weekly LLM thesis plus rule-based intraday invalidation gates.
News Feed Integration for Finance Agents
Four patterns — source vetting, injection sanitization, timestamp discipline, dedup across reporters — make news safe for an LLM finance agent. Runnable scaffold.
Numeric Precision in LLM Filing Extraction
Six precision traps — units, currency, GAAP vs non-GAAP, diluted vs basic shares, restatements, rounding — and the structured-output pattern that fixes them.
Observability Patterns for LLM Trading Agents
Three patterns that stop silent failure: trace-ID propagation, structured log schema with per-step cost and confidence, and a deterministic replay harness.
Postmortem Template for LLM Trading Systems
A blameless, append-only postmortem template plus a 20-mode failure checklist — from price-blind leaks to cache poisoning — keyed to the trace-ID log.
Prompt Caching Economics for Finance
How Anthropic, OpenAI, and Gemini prompt caching works on finance workloads — 5-minute TTL, hit-rate patterns, and 50-90% input savings with the right design.
Prompt Injection Defenses for Finance Agents
Five stacked defenses: input fencing, output validation, tool allow-list, bounded-cost circuit, dual-model cross-check. No single defense is sufficient.
Prompt Patterns for Earnings Calls
Five copy-paste patterns — speaker attribution, hedged-guidance confidence, multi-quarter delta, risk aggregator, forward-outlook separator — with runnable code.
Rate Limit Design for LLM Research Loops
Three primitives that turn bursty finance workloads into stable loops: per-provider token bucket, cross-provider fallback chain, and graceful degradation.
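The token-bucket primitive is generic enough to sketch — a minimal illustrative version (class and parameter names are ours, not the article's):

```python
import time

class TokenBucket:
    """Per-provider rate limiter: holds up to `capacity` tokens, refilled at `rate`/sec."""
    def __init__(self, capacity: float, rate: float):
        self.capacity, self.rate = capacity, rate
        self.tokens, self.last = capacity, time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller falls back to the next provider or degrades
```

A `False` return is where the other two primitives attach: try the next provider in the fallback chain, or degrade to a cheaper model.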
Reading Financial Filings With LLMs: 2026 Playbook
A map of eight filing tasks — extraction, summarization, peer comparison, Q&A, classification, sentiment, forecasting input, compliance — with model, pattern, and cost.
Research Diary Schema: Auditable LLM Research
A 12-field append-only schema that captures every idea — including rejected ones — to unlock calibration, proper scoring, and post-hoc overfitting analysis.
Thinking Tokens for Finance Tasks
When extended-thinking and reasoning-effort modes earn their 3-10x cost tax on finance workloads — and when they are a silent drain on the budget.
Backtest to Paper to Live: Deployment Playbook
Backtest to paper to live — the gates that separate each stage, the metrics that trigger rollback, and the kill-switch you should already have.
Choosing a Broker API in 2026
Choosing a broker API in 2026 — Alpaca vs IBKR vs Tradier vs Schwab vs Robinhood on the axes that bite: auth, order types, rate limits, and fees.
Execution Simulation: Slippage and Impact
The math of market impact — why it scales as the square root of trade size, when linear impact dominates, and the fix that keeps backtests honest.
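The square-root law the teaser cites reduces to a one-line estimate — a sketch under the usual form impact ≈ Y·σ·√(Q/ADV), with our own parameter names and Y left as a tunable coefficient:

```python
def sqrt_impact_bps(q_shares: float, adv_shares: float,
                    daily_vol_bps: float, y: float = 1.0) -> float:
    """Square-root market-impact estimate in basis points: Y * sigma * sqrt(Q / ADV)."""
    return y * daily_vol_bps * (q_shares / adv_shares) ** 0.5

# Trading 1% of average daily volume at 200 bps daily vol:
print(sqrt_impact_bps(10_000, 1_000_000, 200.0))  # → 20.0 bps
```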
Options Greeks for LLM-Driven Trading
Options Greeks for LLM-driven trading: delta, gamma, theta, vega, rho — what each costs, three rules, plus a prompt template for multi-leg positions.
Prompt Injection Attack Catalog for Finance Agents
Prompt injection attacks on finance agents — indirect injection via news feeds, tool-result poisoning, prompt exfiltration, unit confusion — plus defenses.
Rate-Limited, Resumable Market-Data Ingestion
Four primitives that turn a weekend ingestion script into a six-month loop: token-bucket limits, resumable checkpoints, idempotent writes, DLQs.
Real-Time vs End-of-Day Trading Systems
Real-time vs end-of-day trading systems — the decision rule, the 20-50x cost delta, and the four signal types where real-time is genuinely load-bearing.
Synthetic Market Data for Backtests: Beyond GBM
Synthetic market data beyond GBM — when GARCH(1,1), regime-switching, or copula-linked pairs are the right next step. Trade-offs plus a Python template.
The $0/Month Trading Stack in 2026
Zero-cost solo trading stack: launchd + free market data tiers + local LLMs on cheap paths + BYO API keys — plus where paid tiers become unavoidable.
The Sharpe Ratio Trap
Sharpe ignores tail risk, assumes Gaussian returns, and is trivially gameable. Four metrics to report alongside it: Sortino, Calmar, tail risk, and deflated Sharpe.
Walk-Forward Validation: A Cookbook
Walk-forward is the cheapest honest backtest you can run. Anchored vs rolling windows, the four parameters that matter, and a 60-line Python template.
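The anchored-vs-rolling distinction fits in a short generator — an illustrative sketch with our own names, not the article's 60-line template:

```python
def walk_forward_windows(n: int, train: int, test: int, anchored: bool = False):
    """Yield (train_slice, test_slice) index pairs over n observations.

    Rolling: the train window slides forward with the test window.
    Anchored: the train window always starts at observation 0 and grows.
    """
    t0 = train
    while t0 + test <= n:
        tr = slice(0 if anchored else t0 - train, t0)
        yield tr, slice(t0, t0 + test)
        t0 += test

for tr, te in walk_forward_windows(10, train=4, test=2):
    print(tr, te)
```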
Broker APIs Compared: Alpaca vs IBKR vs Tradier 2026
Broker APIs for retail AI trading 2026: Alpaca for solo ops (official MCP), IBKR for multi-asset depth, Tradier for options. Head-to-head + MCP angle.
Building a Production Claude Agent for Finance
Production Claude agent for finance: price-blind research, idempotent execution, heartbeat + watchdog + circuit breaker, under $225/month at small scale.
Calibrating LLM Forecasts with Isotonic Regression
LLM probabilities are systematically miscalibrated. Isotonic regression via PAV is the cheapest robust fix: 40 lines of Python, no distributional priors.
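Pool Adjacent Violators is compact enough to sketch here — a minimal illustrative version (names ours), fitting a nondecreasing sequence by weighted-mean pooling:

```python
def pav(y, w=None):
    """Pool Adjacent Violators: nondecreasing fit minimizing weighted squared error."""
    w = w or [1.0] * len(y)
    # Each block: [pooled weighted mean, total weight, count of points pooled]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    out = []
    for m, _, n in blocks:
        out.extend([m] * n)
    return out

# Raw bucket accuracies sorted by model confidence -> monotone calibration map
print(pav([0.1, 0.5, 0.3, 0.8]))
```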
Conviction-Scaled Kelly Bet Sizing
Full Kelly is brutally unforgiving of over-estimation. Quarter-Kelly with a conviction-tier mapping and a per-trade cap is the defensible default.
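The conviction-scaled rule reduces to a few lines — a sketch of fractional Kelly for a binary bet (f* = p − (1−p)/b), with fraction and per-trade cap as illustrative defaults, not the article's recommendation:

```python
def fractional_kelly(p_win: float, win_loss_ratio: float,
                     fraction: float = 0.25, cap: float = 0.05) -> float:
    """Kelly fraction f* = p - (1-p)/b, scaled down and capped per trade."""
    f_star = p_win - (1 - p_win) / win_loss_ratio
    # Never short the bet on a negative edge; never exceed the per-trade cap
    return max(0.0, min(fraction * f_star, cap))

# 55% win rate, 1.5:1 payoff -> f* = 0.25, quarter-Kelly 0.0625, capped at 5%:
print(fractional_kelly(0.55, 1.5))  # → 0.05
```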
Did You Overfit? PBO and Deflated Sharpe
A practical tutorial on the two best-documented tests for backtest overfitting — PBO via CSCV and the Deflated Sharpe Ratio. Runnable Python + tool.
Finance MCP Servers: The Security Baseline
An opinionated rubric for grading 2026 finance MCP servers on scope, auth, idempotency, transport, and schema — plus the failure modes that kill agents.
Heartbeats, Watchdogs, Circuit Breakers for Trading
Silent failure is the worst failure mode. Three patterns prevent it — heartbeat, watchdog, circuit breaker — in under 100 lines of Python on launchd.
How to Read a Backtest Report: 2026 Cheat Sheet
Five questions a backtest report must answer — edge real, persistent, cheap to trade, bearable, explainable — with the statistics that verify each.
LLM Prompt Patterns for 10-K and 8-K Extraction
Three structured patterns for auditable 10-K extractions: field-by-field JSON, citation-required verbatim quotes, and contradiction-triangle cross-check.
Market Data APIs Compared: Databento vs Polygon 2026
Market data APIs compared: six retail providers on pricing, tier coverage, real-time access, options and futures coverage, and who wins for each profile.
Signal Orthogonality: Why Ensembles Become One Bet
A 10-signal ensemble with pairwise correlation 0.8 is effectively a 1.5-signal ensemble. The math, a two-minute diagnostic, and three axes that work.
The 2026 Engineer's Guide to AI in Markets
An engineer's map of where LLMs, MCP servers, and market-data APIs fit into a 2026 trading stack — and where they still break. Direct, no hype, no grift.
The 5 Failure Modes of LLM Trading Agents (2026)
The 5 recurring failure modes in retail LLM trading agents: price-blind leaks, numeric fabrication, prompt drift, token runaway, audit amnesia.
The 8-Step LLM Research Prompt Template
Free-form prompts yield uncalibrated output. An 8-step template — reference class, decomposition, pre-mortem, invalidation, JSON — fixes that.
The BaFin + EU Guide for Retail AI Traders (2026)
BaFin and EU rules for retail AI trading, publishing finance content, and automated strategies. Education-safe phrasing and the minimum compliance stack.
The Price-Blind LLM Research Harness
Price-blind LLM research — most harnesses leak the current price and the model confabulates. The architectural fix and a 30-line Python scaffold.
The Token-Cost Reality of LLM Trading Research
What LLM trading research costs per idea and per validated trade across Claude, GPT-5, and Gemini 2.5. Pricing, caching, and a model mix under $200/month.