Fallback Chain Simulator
LLM fallback chain simulator: runs a Monte Carlo simulation of a primary plus two fallbacks across Anthropic, OpenAI, and Google. Reports success rate, p50/p95/p99 latency, cost, and degradations.
Transparent by design: computed in your browser from a published formula and sourced rates, not a black box. Data verified May 25, 2026.
Sources: Anthropic pricing · OpenAI pricing · Google AI / Gemini pricing. Full methodology →
- Inputs: Paste + configure
- Runtime: 1–15 s
- Privacy: Client-side · no upload
- API key: Not required
- Methodology: Open →
1 · Configure the chain
ref p50 = 900ms
ref p50 = 700ms
Success rate
70.0%
700 / 1000 trials cleared the 1.00s deadline · p95 1.00s · p99 1.00s
p50: 571ms · Cost: $20.604 total ($0.021/call) · Degradations: F1 55 · F2 5
2 · Provider utilization (share of successful trials)
Recommendation
Gemini 2.5 Flash has a better cost-per-successful-call than the current primary (0.0316 → 0.0030). Consider swapping.
How the trial is simulated
for each trial:
    elapsed = 0
    for leg in [primary, fallback1, fallback2?]:
        if uniform() < rate_429:        # throttled
            elapsed += 50ms; continue
        latency ~ Exponential(mean = max(p50, p99 / ln(100)))
        elapsed += latency
        if elapsed <= deadline:         # success on this leg
            return success
    return failure

Failure modes are modeled independently per leg (rate-limit + latency tail). See methodology for assumptions and limits: real outages are correlated and bursty.
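The trial loop above can be sketched as runnable Python. This is a minimal illustration, not the tool's actual engine: the `Leg` dataclass, the function names, and the example p99 and throttle values are assumptions made for the sketch. For an exponential distribution the 99th percentile equals mean × ln(100), which is where the `p99 / ln(100)` term in the pseudocode comes from.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Leg:
    """One link in the chain (hypothetical container, not the tool's schema)."""
    name: str
    p50_s: float     # reference median latency, seconds
    p99_s: float     # reference 99th-percentile latency, seconds
    rate_429: float  # probability that a call is throttled


THROTTLE_PENALTY_S = 0.050  # 50 ms charged for a throttled leg, as in the pseudocode


def run_trial(legs, deadline_s, rng):
    """Return the index of the leg that succeeded, or None if the chain failed."""
    elapsed = 0.0
    for i, leg in enumerate(legs):
        if rng.random() < leg.rate_429:  # throttled: pay the penalty, move on
            elapsed += THROTTLE_PENALTY_S
            continue
        # Exponential latency whose mean is the larger of p50 and p99 / ln(100).
        mean = max(leg.p50_s, leg.p99_s / math.log(100))
        elapsed += rng.expovariate(1.0 / mean)
        if elapsed <= deadline_s:        # success on this leg
            return i
    return None


def simulate(legs, deadline_s, trials=1000, seed=0):
    """Run independent trials and count which leg served each success."""
    rng = random.Random(seed)
    served = [0] * len(legs)
    for _ in range(trials):
        idx = run_trial(legs, deadline_s, rng)
        if idx is not None:
            served[idx] += 1
    return {"success_rate": sum(served) / trials, "served_by": served}


# Example chain: the p50 values echo the reference figures shown on the page;
# the p99 and rate_429 values are made up for illustration.
chain = [Leg("primary", 0.9, 3.0, 0.02), Leg("fallback1", 0.7, 2.5, 0.02)]
print(simulate(chain, deadline_s=1.0))
```

Successes counted at index 1 or later correspond to the degradation counts (F1, F2) in the results panel. Like the pseudocode, the sketch treats each leg independently, so it shares the same blind spot for correlated outages.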
How to use
Step-by-step
1. Add chain links in priority order: primary endpoint, fallback 1, fallback 2, ...
2. For each link, set: model, per-call cost, expected latency, and failure rate.
3. Run the simulator. Read total cost, end-to-end latency distribution, and overall chain success rate.
4. Toggle 'retry on timeout' vs. 'fall through on timeout'. Retry preserves the primary's cost; fall-through escalates to the next link.
5. Compare chains: 2-link primary+fallback vs. 3-link with extra cache. The marginal gain in success rate often diminishes after the second fallback.
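The diminishing return from extra links can be seen with a back-of-the-envelope calculation. The sketch below assumes link failures are independent and ignores the deadline, so it is an optimistic bound rather than what the simulator computes; `chain_success` is a hypothetical helper name.

```python
def chain_success(per_link_success):
    """Probability that at least one link succeeds, assuming independent failures."""
    all_fail = 1.0
    for s in per_link_success:
        all_fail *= 1.0 - s
    return 1.0 - all_fail


# Marginal gain from each added link when every link succeeds 99.5% of the time.
previous = 0.0
for n in (1, 2, 3):
    total = chain_success([0.995] * n)
    print(n, total, total - previous)
    previous = total
```

With 99.5% per link, one link gives 0.995, two give 0.999975, and three give 0.999999875, so the third link adds far less than the second. Correlated outages shrink the benefit of every added link further.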
For agents
Use in an agent
Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.
import { compute } from "https://aifinhub.io/engines/fallback-chain-simulator.js";
Contract: /contracts/fallback-chain-simulator.json
Full agent guide →
Questions people ask next
FAQ
What's a fallback chain?
A sequence of LLM endpoints to try in order when an upstream call fails: e.g., primary Anthropic, fallback OpenAI, last-resort cache. Production agent systems use chains to maintain availability when individual providers go down or rate-limit. The simulator models cost, latency, and success rate across the chain.
How are failure rates calibrated?
From two sources documented on the methodology page: provider status pages (incident frequency over the last 12 months) and OpenAI's published reliability targets. Defaults are conservative: a 99.5% success rate (0.5% failure rate) for production-tier APIs, and a lower success rate for free tiers. You can override the rate per link to model your deployment's observed reliability.
What's the cost of a fallback path?
Cumulative cost of every link tried before success. The primary is billed on every request, and the fallback only when the primary fails. If the primary fails 0.5% of the time and the fallback costs 3× the primary, your effective cost is 1 × primary + 0.005 × 3·primary = 1.015× primary. In the worst case for a three-link chain (primary AND first fallback both down), you pay all three providers.
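The expected cost of a chain can be worked out by weighting each link's cost by the probability that the chain reaches it. The sketch below assumes every attempted link is billed, including failed attempts, and that failures are independent; `expected_chain_cost` is a hypothetical helper, not part of the tool's module.

```python
def expected_chain_cost(costs, fail_probs):
    """Expected cost per request when every attempted link is billed.

    costs[i] is the per-call cost of link i; fail_probs[i] is its failure probability.
    """
    expected = 0.0
    reach_prob = 1.0  # probability that the chain gets as far as this link
    for cost, p_fail in zip(costs, fail_probs):
        expected += reach_prob * cost
        reach_prob *= p_fail  # the next link is tried only if this one fails
    return expected


# Primary at unit cost, fallback at 3x, each failing 0.5% of the time:
# 1.0 * 1 + 0.005 * 3 = 1.015 times the primary's cost.
print(expected_chain_cost([1.0, 3.0], [0.005, 0.005]))
```

The same function extends to three links: the third link's cost is weighted by the probability that both earlier links fail, which is why it barely moves the average even though it dominates the worst case.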
Should the fallback always be a different provider?
Multi-provider fallback gives correlation-of-failure protection (one provider's outage doesn't take down both). Same-provider fallback (different model tier) gives capacity-overflow protection (rate-limit on Opus, retry on Sonnet). The simulator lets you mix both strategies.
Does the simulator include retry-on-timeout vs. retry-on-error?
Yes, configurable per link. Retry-on-timeout adds latency but keeps cost low. Retry-on-error after a rate-limit is wasteful — better to fall back. The tool surfaces both metrics so you can see the cost-vs-latency tradeoff.
Related deep dive
Read further
Long-form context behind the tool output. All articles →
- Tutorial · Runnable · 10 min
  Rate Limit Design for LLM Research Loops
  Three primitives that turn bursty finance workloads into stable loops: per-provider token bucket, cross-provider fallback chain, and graceful degradation.
- Pillar · Guide · 13 min
  Model Selection in Finance: Surviving Benchmarks
  Model selection finance methodology: a five-axis rubric, quarterly rebench cadence, version-pinning, and shadow A/B that survive 3-6 month.
- Pillar · Guide · 11 min
  Vendor Lock-In Risk: How to Architect Cross-Provider
  Anthropic, OpenAI, and Google can all break or price-jump in one quarter. The fallback-chain architecture that survives a single-vendor outage.
Complementary tools
Users of this tool often explore
Agent Cost Envelope Calculator
Model an LLM research loop end-to-end — steps, tool calls, convergence checks, markets per day — and see per-loop, daily, and monthly cost with cost-cap.
Trading System Blueprinter
Pick your data source, LLM, broker, storage, risk engine, and logger. Get a Mermaid architecture diagram, a starter repo scaffold (ZIP), and a list.
Token-Cost Optimizer
Compute the dollar cost of a trading research loop across Claude, GPT, and Gemini. Prompt length × model × retry × call volume → cost per idea and per.