aifinhub

Research · 52 articles

Research

Long-form pieces on LLMs, MCP, and the engineering around markets. Pillar guides, head-to-heads, runnable tutorials, opinionated methodology. Read-it-or-skim-it: there's a TL;DR on every one.

Methodology · Opinion 10-min read

After-Hours, 24-7, and Pre-Market Asymmetries

Three boundaries where LLM research built on the equity market's 9:30–16:00 clock breaks: earnings after the close, 24-7 crypto, and pre-market Asia/Europe action. Plus a decision rule.

Tutorial · Runnable 11-min read

Agent Memory Patterns for Finance Research

Three memory tiers for finance agents — working, episodic, long-term lesson library — with retention policies and runnable Python for each.

Methodology · Opinion 11-min read

Batch API Economics for Finance Loops

When Anthropic Message Batches or OpenAI Batch cut cost by half on finance workloads — and the soft-deadline rule for when batch is not a valid choice.

Tutorial · Runnable 11-min read

Bayesian Updating for LLM-Assisted Forecasts

Turn LLM probability outputs into calibrated posteriors — Beta-Binomial for binary forecasts, Normal-Inverse-Gamma for continuous — with runnable Python.
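
A minimal sketch of the Beta-Binomial half, assuming forecasts have already been bucketed by the model's stated confidence and resolved. The flat Beta(1, 1) prior, the `Beta` class, and `update` are illustrative names, not the article's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) belief over a hit rate in [0, 1]."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(prior: Beta, hits: int, misses: int) -> Beta:
    """Conjugate Beta-Binomial update: add successes and failures to the counts."""
    if hits < 0 or misses < 0:
        raise ValueError("counts must be non-negative")
    return Beta(prior.alpha + hits, prior.beta + misses)


# Hypothetical bucket: the model said "70%" on 10 questions and 7 resolved true.
posterior = update(Beta(1.0, 1.0), hits=7, misses=3)
print(round(posterior.mean, 4))  # 8 / 12
```

The posterior mean (about 0.67) is the calibrated estimate for that confidence bucket; with more resolved questions the prior's influence shrinks.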

Methodology · Opinion 10-min read

Bounded-Cost Agentic Research

Three gates stop runaway agent loops: hard token budget, step-count cap, and a cost-convergence check that halts when belief stops moving.
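
One way the three gates could be wired together, as a sketch: `should_halt` and its default limits are hypothetical, and "belief" is assumed to be the agent's probability estimate recorded after each step.

```python
from typing import List, Optional


def should_halt(tokens_used: int, steps: int, beliefs: List[float],
                token_budget: int = 200_000, max_steps: int = 20,
                window: int = 3, tolerance: float = 0.02) -> Optional[str]:
    """Return the name of the first gate that fires, or None to keep going.

    All default limits are placeholders; tune them to your own workload.
    """
    # Gate 1: hard token budget across the whole loop.
    if tokens_used >= token_budget:
        return "token_budget"
    # Gate 2: step-count cap, independent of cost.
    if steps >= max_steps:
        return "step_cap"
    # Gate 3: belief has stopped moving over the last `window` updates.
    if len(beliefs) > window:
        recent = beliefs[-(window + 1):]
        moves = [abs(b - a) for a, b in zip(recent, recent[1:])]
        if max(moves) < tolerance:
            return "converged"
    return None
```

Checking the cheap gates first means a loop that blows its budget stops even if its belief is still moving.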

Tutorial · Runnable 10-min read

Brier Scores and Log Loss for Forecasters

Two proper scoring rules for probabilistic forecasts, why Brier decomposes into reliability, resolution, and uncertainty, and why log loss punishes overconfident wrongness.
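
A short sketch of both scores plus the Murphy decomposition (Brier = reliability - resolution + uncertainty). It groups by exact forecast value, which makes the identity exact; binned variants are only approximate. The sample forecasts are made up.

```python
import math
from collections import defaultdict


def brier(p, y):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)


def log_loss(p, y, eps=1e-15):
    """Mean negative log-likelihood; clipping avoids log(0) on 0% / 100% calls."""
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(p)


def murphy(p, y):
    """Return (reliability, resolution, uncertainty), grouping by forecast value."""
    n, base = len(p), sum(y) / len(y)
    groups = defaultdict(list)
    for pi, yi in zip(p, y):
        groups[pi].append(yi)
    rel = sum(len(v) * (f - sum(v) / len(v)) ** 2 for f, v in groups.items()) / n
    res = sum(len(v) * (sum(v) / len(v) - base) ** 2 for v in groups.values()) / n
    return rel, res, base * (1 - base)


p, y = [0.9, 0.9, 0.2, 0.2], [1, 0, 0, 0]
rel, res, unc = murphy(p, y)
print(brier(p, y), rel - res + unc)               # equal up to float rounding
print(brier([0.99], [0]), log_loss([0.99], [0]))  # Brier stays below 1; log loss ~4.6
```

The last line shows the asymmetry: a 99% call that resolves false costs under 1 in Brier but more than 4.6 in log loss.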

Tutorial · Runnable 10-min read

Context Hygiene for Multi-Step Research

Three-tier layered summary — leaf documents, intermediate briefs, working memory — with per-tier retention rules that keep long research loops cheap and sharp.

Tutorial · Runnable 12-min read

Evaluation Harness for Finance LLM Tasks

Why public benchmarks are a signal rather than a decision, how to source ground truth from EDGAR, and a runnable eval-harness skeleton with bootstrap confidence intervals.
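
A percentile-bootstrap confidence interval over per-question scores, sketched with the standard library only. The 40-question sample and the function name are hypothetical; the resample count and seed are arbitrary choices.

```python
import random


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for the mean of per-item scores."""
    rng = random.Random(seed)  # fixed seed keeps eval reports reproducible
    n = len(scores)
    # Resample questions with replacement and record each resample's mean score.
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    k = int(n_resamples * alpha / 2)
    return sum(scores) / n, means[k], means[n_resamples - 1 - k]


# Hypothetical: one model scored 1 (correct) or 0 on 40 EDGAR-sourced questions.
point, lo, hi = bootstrap_ci([1] * 31 + [0] * 9)
print(f"accuracy {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

On a sample this small the interval spans well over ten points of accuracy, which is the practical argument for not ranking models on a single headline number.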

Methodology · Opinion 12-min read

Fine-Tuning vs RAG vs Long-Context for Filings

Decision matrix for finance LLMs: when RAG wins, when long-context wins, and when fine-tuning makes sense. Cost math from published 2026-04 vendor rates.

Tutorial · Runnable 11-min read

Inference Cost Attribution per Idea and Trade

Append-only cost-event schema plus two canonical SQL queries — cost per idea, cost per validated trade — with cache-write amortization built in.

Methodology · Opinion 10-min read

MCP vs Function Calling for Finance Agents

Where MCP wins, where function calling wins, and why the right answer is almost always a hybrid — data layer on MCP, decision code on function calling.

Comparison · Benchmark 12-min read

Model Selection Framework for Finance Tasks

A task × latency × cost × context decision tree for finance LLM workloads. Ten concrete scenarios mapped to tier bands. Grounded in published pricing, not benchmarks.

Methodology · Opinion 11-min read

Multi-Timeframe Signal Integration With LLMs

LLMs belong on weekly fundamentals, not intraday microstructure. A two-layer architecture: weekly LLM thesis plus rule-based intraday invalidation gates.

Tutorial · Runnable 11-min read

News Feed Integration for Finance Agents

Four patterns — source vetting, injection sanitization, timestamp discipline, dedup across reporters — make news safe for an LLM finance agent. Runnable scaffold.

Tutorial · Runnable 11-min read

Numeric Precision in LLM Filing Extraction

Six precision traps — units, currency, GAAP vs non-GAAP, diluted vs basic shares, restatements, rounding — and the structured-output pattern that fixes them.

Tutorial · Runnable 11-min read

Observability Patterns for LLM Trading Agents

Three patterns that stop silent failure: trace-ID propagation, structured log schema with per-step cost and confidence, and a deterministic replay harness.

Methodology · Opinion 10-min read

Postmortem Template for LLM Trading Systems

A blameless, append-only postmortem template and a 20-mode failure checklist (from price-blind leaks to cache poisoning), keyed to the trace-ID log.

Methodology · Opinion 11-min read

Prompt Caching Economics for Finance

How Anthropic, OpenAI, and Gemini prompt caching works on finance workloads — 5-minute TTL, hit-rate patterns, and 50-90% input savings at the right design.

Methodology · Opinion 11-min read

Prompt Injection Defenses for Finance Agents

Five stacked defenses: input fencing, output validation, tool allow-list, bounded-cost circuit, dual-model cross-check. No single defense is sufficient.

Tutorial · Runnable 12-min read

Prompt Patterns for Earnings Calls

Five copy-paste patterns — speaker attribution, hedged-guidance confidence, multi-quarter delta, risk aggregator, forward-outlook separator — with runnable code.

Tutorial · Runnable 10-min read

Rate Limit Design for LLM Research Loops

Three primitives that turn bursty finance workloads into stable loops: per-provider token bucket, cross-provider fallback chain, and graceful degradation.
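
A sketch of the first two primitives, assuming request cost is measured in estimated tokens and providers are tried in a fixed preference order. `TokenBucket`, `pick_provider`, and the injectable clock are illustrative, not part of any provider SDK.

```python
import time


class TokenBucket:
    """Per-provider limiter: `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, cost: float) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def pick_provider(buckets: dict, cost: float):
    """Fallback chain: first provider (in dict order) whose bucket admits the call."""
    for name, bucket in buckets.items():
        if bucket.try_acquire(cost):
            return name
    return None  # all saturated: the caller degrades (queue, skip, or serve cache)
```

Returning `None` instead of sleeping leaves the graceful-degradation decision to the caller, which knows whether the request can wait.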

Pillar · Guide 14-min read

Reading Financial Filings With LLMs: 2026 Playbook

A map of eight filing tasks — extraction, summarization, peer comparison, Q&A, classification, sentiment, forecasting input, compliance — with model, pattern, and cost.

Methodology · Opinion 10-min read

Research Diary Schema: Auditable LLM Research

A 12-field append-only schema that captures every idea — including rejected ones — to unlock calibration, proper scoring, and post-hoc overfitting analysis.

Methodology · Opinion 11-min read

Thinking Tokens for Finance Tasks

When extended-thinking and reasoning-effort modes earn their 3-10x cost tax on finance workloads — and when they are a silent drain on the budget.

Tutorial · Runnable 12-min read

Backtest to Paper to Live: Deployment Playbook

The gates that separate backtest, paper, and live trading, the metrics that trigger rollback, and the kill switch you should already have.

Comparison · Benchmark 10-min read

Choosing a Broker API in 2026

Alpaca vs IBKR vs Tradier vs Schwab vs Robinhood, compared on the axes that bite: auth, order types, rate limits, and fees.

Methodology · Opinion 11-min read

Execution Simulation: Slippage and Impact

The math of market impact — why it scales as the square root of trade size, when linear impact dominates, and the fix that keeps backtests honest.
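
The square-root law in its usual form, impact ≈ Y · σ · √(Q/V), where σ is daily volatility, Q the order size, V average daily volume, and Y an empirical constant of order one. A sketch only; Y = 1 is a placeholder you would fit to your own fills.

```python
import math


def sqrt_impact_bps(order_shares: float, adv_shares: float,
                    daily_vol: float, y: float = 1.0) -> float:
    """Square-root impact estimate in basis points: Y * sigma * sqrt(Q / V).

    daily_vol is daily return volatility (0.02 = 2%); y is an empirical
    order-one constant that should be calibrated, not trusted as given.
    """
    return y * daily_vol * math.sqrt(order_shares / adv_shares) * 1e4


# Trading 1% of average daily volume in a stock with 2% daily vol:
print(sqrt_impact_bps(10_000, 1_000_000, 0.02))  # about 20 bps
print(sqrt_impact_bps(40_000, 1_000_000, 0.02))  # 4x the size, 2x the impact
```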

Tutorial · Runnable 10-min read

Options Greeks for LLM-Driven Trading

Delta, gamma, theta, vega, and rho: what each costs, three rules, and a prompt template for multi-leg positions.

Methodology · Opinion 10-min read

Prompt Injection Attack Catalog for Finance Agents

Indirect injection via news feeds, tool-result poisoning, prompt exfiltration, and unit confusion, plus the defenses against them.

Tutorial · Runnable 11-min read

Rate-Limited, Resumable Market-Data Ingestion

Four primitives that turn a weekend ingestion script into a six-month loop: token-bucket limits, resumable checkpoints, idempotent writes, and dead-letter queues (DLQs).

Methodology · Opinion 9-min read

Real-Time vs End-of-Day Trading Systems

The decision rule, the 20-50x cost delta, and the four signal types where real-time is genuinely load-bearing.

Tutorial · Runnable 10-min read

Synthetic Market Data for Backtests: Beyond GBM

When GARCH(1,1), regime-switching, or copula-linked pairs are the right step beyond GBM. Trade-offs plus a Python template.

Methodology · Opinion 10-min read

The $0/Month Trading Stack in 2026

Zero-cost solo trading stack: launchd + free market data tiers + local LLMs on cheap paths + BYO API keys — plus where paid tiers become unavoidable.

Methodology · Opinion 8-min read

The Sharpe Ratio Trap

Sharpe ignores tail risk, assumes Gaussian returns, and is trivially gameable. Four metrics to report alongside it: Sortino, Calmar, tail, deflated Sharpe.
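
Sketches of two of the four companion metrics, with max drawdown as the Calmar denominator. Assumptions: simple periodic returns, 252 periods per year, and a zero target return for Sortino; the tail metric and deflated Sharpe are not shown.

```python
import math


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def sortino(returns, periods_per_year=252, target=0.0):
    """Annualized mean excess return over downside deviation below `target`."""
    excess = [r - target for r in returns]
    # Downside deviation: RMS of the negative excess returns, over all periods.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0:
        return math.inf
    return sum(excess) / len(excess) / downside * math.sqrt(periods_per_year)


def calmar(returns, periods_per_year=252):
    """Compound annual growth rate divided by max drawdown."""
    growth = math.prod(1 + r for r in returns)
    cagr = growth ** (periods_per_year / len(returns)) - 1
    mdd = max_drawdown(returns)
    return math.inf if mdd == 0 else cagr / mdd
```

Unlike Sharpe, Sortino only penalizes downside variance, and Calmar ties the result to the worst loss an investor would actually have sat through.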

Tutorial · Runnable 11-min read

Walk-Forward Validation: A Cookbook

Walk-forward is the cheapest honest backtest you can run. Anchored vs rolling windows, the four parameters that matter, and a 60-line Python template.
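
A split generator covering the anchored-vs-rolling distinction, assuming integer-indexed, time-ordered observations. It exposes only three knobs (train length, test length, anchoring), so it makes no claim about which four parameters the article means.

```python
def walk_forward_splits(n, train, test, anchored=False):
    """Yield (train_range, test_range) index pairs over n time-ordered observations.

    Rolling: a fixed-length training window slides forward by `test` each fold.
    Anchored: training always starts at 0 and grows by `test` each fold.
    """
    start = train
    while start + test <= n:
        train_start = 0 if anchored else start - train
        yield range(train_start, start), range(start, start + test)
        start += test


for tr, te in walk_forward_splits(10, train=4, test=2):
    print(list(tr), list(te))
```

Test windows never overlap and always sit strictly after their training window, which is the property that keeps the backtest honest.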

Comparison · Benchmark 8-min read

Broker APIs Compared: Alpaca vs IBKR vs Tradier 2026

Broker APIs for retail AI trading 2026: Alpaca for solo ops (official MCP), IBKR for multi-asset depth, Tradier for options. Head-to-head + MCP angle.

Tutorial · Runnable 14-min read

Building a Production Claude Agent for Finance

Production Claude agent for finance: price-blind research, idempotent execution, heartbeat + watchdog + circuit breaker, under $225/month at small scale.

Tutorial · Runnable 10-min read

Calibrating LLM Forecasts with Isotonic Regression

LLM probabilities are systematically miscalibrated. Isotonic regression via PAV is the cheapest robust fix: 40 lines of Python, no distributional priors.
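
A pool-adjacent-violators sketch of isotonic calibration using only the standard library. `pav`, `calibrate`, and `predict` are hypothetical names; ties in raw probability are not pooled, which a production version should handle.

```python
import bisect


def pav(y, w=None):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y.

    y must already be ordered by the raw model probability each outcome belongs to.
    """
    w = [1.0] * len(y) if w is None else list(w)
    blocks = []  # each block: [pooled value, total weight, member count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for value, _, count in blocks:
        fitted.extend([value] * count)
    return fitted


def calibrate(raw_probs, outcomes):
    """Sort by raw LLM probability, then fit the isotonic map to 0/1 outcomes."""
    order = sorted(range(len(raw_probs)), key=lambda i: raw_probs[i])
    xs = [raw_probs[i] for i in order]
    return xs, pav([outcomes[i] for i in order])


def predict(xs, fitted, p):
    """Step-function lookup: calibrated value at the largest fitted x <= p."""
    i = bisect.bisect_right(xs, p) - 1
    return fitted[max(i, 0)]
```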

Tutorial · Runnable 8-min read

Conviction-Scaled Kelly Bet Sizing

Full Kelly is brutally unforgiving of over-estimation. Quarter-Kelly with a conviction-tier mapping and a per-trade cap is the defensible default.
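
A sketch of the sizing rule for a binary bet paying b-to-1: full Kelly is f* = p - (1 - p)/b, then a quarter of it, scaled by conviction tier and capped. The tier multipliers and the 5% cap are placeholder values, not the article's.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full Kelly for a binary bet paying b-to-1 with win probability p."""
    return p - (1 - p) / b


# Hypothetical conviction tiers; the multipliers are illustrative only.
CONVICTION = {"low": 0.25, "medium": 0.5, "high": 1.0}


def position_size(p: float, b: float, tier: str,
                  kelly_mult: float = 0.25, cap: float = 0.05) -> float:
    """Quarter-Kelly, scaled by conviction tier, clipped to [0, cap] of equity."""
    f = kelly_fraction(p, b)
    if f <= 0:
        return 0.0  # no estimated edge: no bet
    return min(cap, kelly_mult * CONVICTION[tier] * f)
```

At even odds, overstating p from 0.55 to 0.65 triples full Kelly (0.1 to 0.3); the quarter multiplier and the hard cap are what absorb that kind of estimation error.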

Tutorial · Runnable 12-min read

Did You Overfit? PBO and Deflated Sharpe

A practical tutorial on the two best-documented tests for backtest overfitting — PBO via CSCV and the Deflated Sharpe Ratio. Runnable Python + tool.

Methodology · Opinion 11-min read

Finance MCP Servers: The Security Baseline

An opinionated rubric for grading 2026 finance MCP servers on scope, auth, idempotency, transport, and schema — plus the failure modes that kill agents.

Tutorial · Runnable 9-min read

Heartbeats, Watchdogs, Circuit Breakers for Trading

Silent failure is the worst failure mode. Three patterns prevent it — heartbeat, watchdog, circuit breaker — in under 100 lines of Python on launchd.
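
A sketch of the third pattern only: a consecutive-failure circuit breaker with a cooldown and a single half-open retry. The thresholds and the injectable clock are assumptions; the heartbeat and watchdog halves are not shown.

```python
import time


class CircuitBreaker:
    """Trips open after `max_failures` consecutive errors; stays open `cooldown` s."""

    def __init__(self, max_failures=3, cooldown=300.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Half-open: let one attempt through; a failure re-opens immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A trading loop would call `allow()` before each order attempt and `record()` after it, so a broken dependency halts trading loudly instead of failing silently on every tick.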

Tutorial · Runnable 9-min read

How to Read a Backtest Report: 2026 Cheat Sheet

Five questions a backtest report must answer: is the edge real, persistent, cheap to trade, bearable, and explainable? Plus the statistics that verify each.

Tutorial · Runnable 9-min read

LLM Prompt Patterns for 10-K and 8-K Extraction

Three structured patterns for auditable 10-K extractions: field-by-field JSON, citation-required verbatim quotes, and contradiction-triangle cross-check.

Comparison · Benchmark 9-min read

Market Data APIs Compared: Databento vs Polygon 2026

Six retail market-data providers compared on pricing, tier coverage, real-time access, and options and futures coverage, plus who wins for each profile.

Tutorial · Runnable 7-min read

Signal Orthogonality: Why Ensembles Become One Bet

A 10-signal ensemble with pairwise correlation 0.8 is effectively a 1.5-signal ensemble. The math, a two-minute diagnostic, and three axes that work.
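
One diagnostic that reproduces the ~1.5 figure is the participation ratio of the correlation matrix, (Σλ)²/Σλ². Whether the article uses this exact measure is an assumption; for ten signals at pairwise ρ = 0.8 it gives about 1.48.

```python
def effective_bets(corr):
    """Participation ratio of a correlation matrix: (trace C)^2 / trace(C^2).

    Equals sum(eig)^2 / sum(eig^2) without an eigendecomposition, because for a
    symmetric matrix trace(C^2) is the sum of its squared entries.
    """
    n = len(corr)
    sum_sq = sum(c * c for row in corr for c in row)
    return n * n / sum_sq  # trace of a correlation matrix is n (unit diagonal)


def equicorrelated(n, rho):
    """n x n correlation matrix with every off-diagonal entry equal to rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


print(round(effective_bets(equicorrelated(10, 0.8)), 2))  # 1.48
print(effective_bets(equicorrelated(10, 0.0)))            # 10.0
```

Feed it the pairwise correlations of your signals' historical outputs; a result far below the signal count means the ensemble is mostly one bet.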

Pillar · Guide 10-min read

The 2026 Engineer's Guide to AI in Markets

An engineer's map of where LLMs, MCP servers, and market-data APIs fit into a 2026 trading stack — and where they still break. Direct, no hype, no grift.

Methodology · Opinion 8-min read

The 5 Failure Modes of LLM Trading Agents (2026)

The 5 recurring failure modes in retail LLM trading agents: price-blind leaks, numeric fabrication, prompt drift, token runaway, audit amnesia.

Methodology · Opinion 9-min read

The 8-Step LLM Research Prompt Template

Free-form prompts yield uncalibrated output. An 8-step template — reference class, decomposition, pre-mortem, invalidation, JSON — fixes that.

Methodology · Opinion 12-min read

The BaFin + EU Guide for Retail AI Traders (2026)

BaFin and EU rules for retail AI trading, publishing finance content, and automated strategies. Education-safe phrasing and the minimum compliance stack.

Methodology · Opinion 8-min read

The Price-Blind LLM Research Harness

Most LLM research harnesses leak the current price, and the model confabulates around it. The architectural fix and a 30-line Python scaffold.

Methodology · Opinion 8-min read

The Token-Cost Reality of LLM Trading Research

What LLM trading research costs per idea and per validated trade across Claude, GPT-5, and Gemini 2.5. Pricing, caching, model-mix under $200/month.


Publication standards

Planning estimates only — not financial, tax, or investment advice.