How to use the Fallback Chain Simulator
Define a provider fallback chain. The page simulates rate-limit and latency failures across a configurable load profile and reports p50/p95/p99 latency, success rate, total cost, and degradation events so you can size the chain before deploying it.
What It Does
Use the calculator with intent
Built for reliability engineers designing multi-provider LLM stacks who have learned that a single-provider outage takes the agent down, and who need to size the fallback before it matters.
Interpreting Results
p99 latency is the headline number: most agents need p99 under a fixed budget for interactive use. A success rate close to 100% means the chain is robust enough; below 99% suggests adding another fallback. Cost is the trade-off: more robust chains cost more.
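These metrics are straightforward to compute from per-call samples. The sketch below is illustrative, not the page's actual engine: the latency distribution (Gaussian around 800 ms) and the 0.4% failure probability are assumptions chosen only to produce plausible numbers.

```python
import random

random.seed(0)

def percentile(sorted_vals, p):
    """Nearest-rank percentile of a pre-sorted list."""
    idx = min(len(sorted_vals) - 1, int(p / 100 * len(sorted_vals)))
    return sorted_vals[idx]

# Hypothetical per-call outcomes: (latency_ms, succeeded)
calls = [(random.gauss(800, 200), random.random() > 0.004) for _ in range(10_000)]
latencies = sorted(l for l, _ in calls)
success_rate = sum(ok for _, ok in calls) / len(calls)

print(f"p50={percentile(latencies, 50):.0f}ms "
      f"p95={percentile(latencies, 95):.0f}ms "
      f"p99={percentile(latencies, 99):.0f}ms "
      f"success={success_rate:.2%}")
```

Note how far p99 sits above p50 even with a single well-behaved provider; a fallback that fires on the slow tail pushes p99 out further still.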
Input Steps
Field by field
1. Add chain links in priority order: primary endpoint, fallback 1, fallback 2, ...
2. For each link, set: model, per-call cost, expected latency, and failure rate.
3. Run the simulator. Read total cost, end-to-end latency distribution, and overall chain success rate.
4. Toggle 'retry on timeout' vs. 'fall through on timeout'. Retry preserves the primary's cost; fall-through escalates to the next link.
5. Compare chains: a 2-link primary + fallback vs. a 3-link chain with an extra cache. The marginal gain in success rate often diminishes after the second fallback.
Common Scenarios
Use realistic starting points
Two-provider chain (Claude → GPT)
- Primary: Sonnet
- Fallback: GPT-4o
- Expected result: p99 dominated by fallback latency; success rate ~99.5%. Cost slightly higher than primary-only when the fallback fires.

Three-provider chain (Claude → GPT → Gemini)
- Primary: Sonnet
- Fallbacks: GPT-4o, Gemini-2.5
- Expected result: success rate ~99.9%; p99 driven by the slowest provider in the chain. Cost steps up further when both fallbacks fire on the same call.
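The diminishing return after the second fallback falls out of the arithmetic: if link failures are independent, the chain fails only when every link fails, so chain success is 1 minus the product of the per-link failure rates. The 2% per-link rates below are illustrative placeholders.

```python
def chain_success(fail_rates):
    """Chain succeeds unless every link fails (independence assumed)."""
    p_all_fail = 1.0
    for f in fail_rates:
        p_all_fail *= f
    return 1.0 - p_all_fail

two_link = chain_success([0.02, 0.02])          # primary + one fallback
three_link = chain_success([0.02, 0.02, 0.02])  # add a second fallback
print(f"2-link: {two_link:.4%}  3-link: {three_link:.6%}")
```

Going from one link to two recovers ~2% of calls; going from two to three recovers only the 0.04% of calls that both earlier links drop, which is why the third link is often hard to justify on reliability grounds alone.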
Try These Tools
Run the numbers next
- Agent Cost Envelope Calculator: Model an LLM research loop end-to-end — steps, tool calls, convergence checks, markets per day — and see per-loop, daily, and monthly cost with cost-cap.
- Token-Cost Optimizer: Compute the dollar cost of a trading research loop across Claude, GPT, and Gemini. Prompt length × model × retry × call volume → cost per idea and per.
- Model Selector for Finance: Input task, latency budget, cost budget, context size, and quality sensitivity; get ranked model recommendations with rationale — grounded in published.
Related Content
Keep the topic connected
- Agent-Cost Envelope: the loop of (calls × tokens × retries × model_price) that determines the dollar cost of an LLM-driven trading agent per decision.
- Model Drift: when an LLM's behavior changes between calls, versions, or weeks, and the monitoring stack that catches it before production breaks.