
Fallback Chain Simulator

Name: Fallback Chain Simulator
Author: AI Fin Hub Research

Monte Carlo simulation of an LLM fallback chain: a primary plus up to two fallbacks across Anthropic, OpenAI, and Google. Reports success rate, p50/p95/p99 latency, cost, and degradations.

AI Fin Hub Research · Published Apr 23, 2026

Transparent by design: computed in your browser from a published formula and sourced rates, not a black box. Data verified May 25, 2026. Sources: Anthropic pricing · OpenAI pricing · Google AI / Gemini pricing.

Inputs: Paste + configure
Runtime: 1–15 s
Privacy: Client-side · no upload
API key: Not required

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results.

1 · Configure the chain

Per-leg fields: Provider, Model, 429 rate, p99 latency (ms)

Primary: 429 rate 8.0% · p99 latency 3500 ms · ref p50 = 900 ms

Fallback 1: 429 rate 5.0% · p99 latency 3000 ms · ref p50 = 700 ms

Fallback 2 (optional, enabled by toggle): 429 rate 3.0% · p99 latency 2500 ms

Chain-wide fields: Deadline (ms), Monte Carlo trials, Avg input tokens / call, Avg output tokens / call

Success rate: 70.0%

700 / 1000 trials cleared the 1.00s deadline · p95 1.00s · p99 1.00s

p50: 571ms · Cost: $20.604 total ($0.021/call) · Degradations: F1 55 · F2 5

2 · Provider utilization (share of successful trials)

Claude Sonnet 4.6 (primary): 91.4% · 640 trials

GPT-5.4 mini (fallback 1): 7.9% · 55 trials

Gemini 2.5 Flash (fallback 2): 0.7% · 5 trials

Failed (exceeded deadline): 30.0% of all trials · 300 trials
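
The panel mixes two denominators: each leg's share is taken over successful trials, while the failed share is taken over all trials. A minimal Python sketch of that arithmetic, using the counts shown in the panel (the function name `utilization_shares` is illustrative and not part of the tool):

```python
def utilization_shares(leg_successes, total_trials):
    """Return (per-leg share of successful trials, failed share of all trials)."""
    successes = sum(leg_successes.values())
    if successes == 0:
        # No leg ever succeeded, so every share is zero.
        shares = {leg: 0.0 for leg in leg_successes}
    else:
        shares = {leg: count / successes for leg, count in leg_successes.items()}
    failed_share = (total_trials - successes) / total_trials
    return shares, failed_share


# Counts taken from the utilization panel.
counts = {"primary": 640, "fallback 1": 55, "fallback 2": 5}
shares, failed_share = utilization_shares(counts, total_trials=1000)
for leg, share in shares.items():
    print(f"{leg}: {share:.1%}")
print(f"failed: {failed_share:.1%}")
```

This is why the four percentages do not sum to 100%: the three leg shares sum to 100% of the 700 successes, and the failed share is a separate fraction of the 1000 trials.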

Recommendation

Gemini 2.5 Flash has a lower cost per successful call than the current primary ($0.0316 → $0.0030). Consider swapping them.

How the trial is simulated

for each trial:
  elapsed = 0
  for leg in [primary, fallback1, fallback2?]:
    if uniform() < rate_429:   # throttled
      elapsed += 50ms; continue
    latency ~ Exponential(mean = max(p50, p99 / ln(100)))
    elapsed += latency
    if elapsed <= deadline:    # success on this leg
      return success
  return failure

Failure modes (rate limiting and the latency tail) are modeled independently per leg. See the methodology for assumptions and limits: real outages are correlated and bursty.
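
The pseudocode can be turned into a small runnable sketch. The version below is a literal Python reading of it, not the tool's actual ES module, whose internals are not shown here. The names `Leg`, `run_trial`, and `simulate`, and the fixed seed, are assumptions made for illustration. The 50 ms throttle penalty and the exponential mean come from the pseudocode.

```python
import math
import random
from dataclasses import dataclass

THROTTLE_PENALTY_MS = 50.0  # time charged for a 429 before moving to the next leg


@dataclass
class Leg:
    name: str
    rate_429: float  # probability that a call to this leg is throttled
    p50_ms: float    # reference median latency
    p99_ms: float    # configured p99 latency


def mean_latency_ms(leg):
    # For an exponential distribution, p99 = mean * ln(100); the pseudocode
    # floors the mean at the reference p50.
    return max(leg.p50_ms, leg.p99_ms / math.log(100))


def run_trial(legs, deadline_ms, rng):
    """Return (index of the leg that succeeded, or None; elapsed ms)."""
    elapsed = 0.0
    for index, leg in enumerate(legs):
        if rng.random() < leg.rate_429:  # throttled: pay the penalty, try next leg
            elapsed += THROTTLE_PENALTY_MS
            continue
        elapsed += rng.expovariate(1.0 / mean_latency_ms(leg))
        if elapsed <= deadline_ms:       # success on this leg
            return index, elapsed
    return None, elapsed


def simulate(legs, deadline_ms, trials, seed=0):
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    wins = [0] * len(legs)
    for _ in range(trials):
        index, _elapsed = run_trial(legs, deadline_ms, rng)
        if index is not None:
            wins[index] += 1
    return {"success_rate": sum(wins) / trials, "wins_per_leg": wins}


# Primary and fallback 1 with the default values from the configuration panel.
chain = [
    Leg("primary", rate_429=0.08, p50_ms=900, p99_ms=3500),
    Leg("fallback 1", rate_429=0.05, p50_ms=700, p99_ms=3000),
]
print(simulate(chain, deadline_ms=1000, trials=1000))
```

One consequence of reading the pseudocode literally: `elapsed` only grows, so once a leg's latency overshoots the deadline, no later leg can succeed in that trial. Later legs mostly rescue trials in which an earlier leg was throttled.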

How to use

Step-by-step


1. Add chain links in priority order: primary endpoint, fallback 1, fallback 2, ...
2. For each link, set the model, per-call cost, expected latency, and failure rate.
3. Run the simulator. Read total cost, end-to-end latency distribution, and overall chain success rate.
4. Toggle 'retry on timeout' vs. 'fall through on timeout'. Retry preserves the primary's cost; fall-through escalates to the next link.
5. Compare chains: 2-link primary+fallback vs. 3-link with extra cache. The marginal gain in success rate often diminishes after the second fallback.
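
The diminishing return from extra links can be illustrated with a back-of-the-envelope calculation. The sketch below assumes failures are independent across links and ignores latency and the deadline entirely, so it is an upper bound on what the simulator reports. The function name `chain_success_rates` is hypothetical.

```python
def chain_success_rates(failure_rates):
    """Cumulative success probability after each link, assuming independent failures."""
    rates = []
    all_failed = 1.0
    for p_fail in failure_rates:
        all_failed *= p_fail            # every link so far has failed
        rates.append(1.0 - all_failed)  # at least one link so far has succeeded
    return rates


# Default 429 rates from the configuration panel: primary, fallback 1, fallback 2.
for links, rate in enumerate(chain_success_rates([0.08, 0.05, 0.03]), start=1):
    print(f"{links}-link chain: {rate:.2%}")
```

With these rates the first fallback lifts success from 92.00% to 99.60%, while the second adds less than half a percentage point (to 99.99%).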

For agents

Use in an agent

Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.

import { compute } from "https://aifinhub.io/engines/fallback-chain-simulator.js";

Contract: /contracts/fallback-chain-simulator.json


Questions people ask next

FAQ

What's a fallback chain?

A sequence of LLM endpoints to try in order when an upstream call fails: e.g., primary Anthropic, fallback OpenAI, last-resort cache. Production agent systems use chains to maintain availability when individual providers go down or rate-limit. The simulator models cost, latency, and success rate across the chain.

How are failure rates calibrated?

From two sources documented on the methodology page: provider status pages (incident frequency over the last 12 months) and OpenAI's published reliability targets. Defaults are conservative: 99.5% availability (a 0.5% failure rate) for production-tier APIs, lower for free tiers. You can override per-link to model your specific deployment's observed reliability.

What's the cost of a fallback path?

The cumulative cost of every link tried before success. You always pay for the primary attempt, and you pay for the fallback only when the primary fails. If the primary fails 0.5% of the time and the fallback costs 3× the primary, your expected cost is 1 × primary + 0.005 × 3 × primary = 1.015 × primary. In the worst case for a three-link chain (primary and first fallback both down), you pay all three providers.
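
The expected-cost reasoning generalizes to any number of links. A minimal sketch, assuming every attempted link is billed whether or not it succeeds, and that a link is attempted only when all earlier links failed (the function name `expected_chain_cost` is hypothetical):

```python
def expected_chain_cost(costs, failure_rates):
    """Expected cost per call: each link's cost weighted by the chance it is reached."""
    expected = 0.0
    reach_probability = 1.0  # the primary is always attempted
    for cost, p_fail in zip(costs, failure_rates):
        expected += reach_probability * cost
        reach_probability *= p_fail  # the next link is reached only if this one fails
    return expected


# Illustrative numbers: primary costs 1.0 and fails 8% of the time;
# the fallback costs 0.5 and is the last link in the chain.
print(expected_chain_cost(costs=[1.0, 0.5], failure_rates=[0.08, 0.05]))
```

Here the fallback is reached on 8% of calls, so the expected cost is 1.0 + 0.08 × 0.5, or about 1.04 in units of the primary's cost. The last link's failure rate does not affect cost, only whether the call ultimately succeeds.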

Should the fallback always be a different provider?

Multi-provider fallback gives correlation-of-failure protection (one provider's outage doesn't take down both). Same-provider fallback (different model tier) gives capacity-overflow protection (rate-limit on Opus, retry on Sonnet). The simulator lets you mix both strategies.
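
One hypothetical way to put numbers on correlated failure is a shared-outage mixture: with some probability both links are down together, and otherwise they fail independently. This is an illustrative model only. `shared_outage` is an assumed parameter that the simulator does not expose, and the rates below are made up for the example.

```python
def both_fail_probability(p_primary, p_fallback, shared_outage=0.0):
    """P(primary and fallback both fail) under a simple shared-outage model."""
    independent = p_primary * p_fallback
    # Either the shared outage hits, or it does not and both fail on their own.
    return shared_outage + (1.0 - shared_outage) * independent


# Made-up rates: each link fails 0.5% of the time on its own.
cross_provider = both_fail_probability(0.005, 0.005, shared_outage=0.0)
same_provider = both_fail_probability(0.005, 0.005, shared_outage=0.003)
print(f"different providers: {cross_provider:.6f}")
print(f"same provider:       {same_provider:.6f}")
```

Under this model even a small shared-outage probability dominates the joint failure rate, which is the intuition behind preferring a different provider when the goal is outage protection rather than capacity overflow.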

Does the simulator include retry-on-timeout vs. retry-on-error?

Yes, configurable per link. Retry-on-timeout adds latency but keeps cost low. Retry-on-error after a rate limit is wasteful; it is better to fall back. The tool surfaces both cost and latency so you can see the tradeoff.


Complementary tools

Agent Cost Envelope Calculator

Model an LLM research loop end-to-end — steps, tool calls, convergence checks, markets per day — and see per-loop, daily, and monthly cost with cost-cap.


Trading System Blueprinter

Pick your data source, LLM, broker, storage, risk engine, and logger. Get a Mermaid architecture diagram, a starter repo scaffold (ZIP), and a list.


Token-Cost Optimizer

Compute the dollar cost of a trading research loop across Claude, GPT, and Gemini. Prompt length × model × retry × call volume → cost per idea and per.

