Fallback Chain Simulator

LLM fallback chain simulator: a Monte Carlo simulation of a primary plus up to two fallbacks across Anthropic, OpenAI, and Google. Reports success rate, p50/p95/p99 latency, cost, and degradations.

Transparent by design: computed in your browser from a published formula and sourced rates, not a black box. Data verified May 25, 2026. Sources: Anthropic pricing · OpenAI pricing · Google AI / Gemini pricing.

Inputs: Paste + configure
Runtime: 1–15 s
Privacy: Client-side · no upload
API key: Not required

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results.

1 · Configure the chain

Primary: 8.0% · 3500ms · ref p50 = 900ms
Fallback 1: 5.0% · 3000ms · ref p50 = 700ms
Fallback 2: 3.0% · 2500ms

Success rate

70.0%

700 / 1000 trials cleared the 1.00s deadline · p95 1.00s · p99 1.00s

p50: 571ms  ·  Cost: $20.604 total ($0.021/call)  ·  Degradations: F1 55 · F2 5

2 · Provider utilization (share of successful trials)

Claude Sonnet 4.6 (primary): 91.4% · 640 trials
GPT-5.4 mini (fallback 1): 7.9% · 55 trials
Gemini 2.5 Flash (fallback 2): 0.7% · 5 trials
Failed (exceeded deadline): 30.0% · 300 trials
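
The provider shares are fractions of the 700 successful trials, while the failed share is a fraction of all 1000 trials. A short Python check of that arithmetic, using only the counts shown in this run (the variable names are illustrative):

```python
# Trial counts taken from the run shown above.
successes_by_leg = {"primary": 640, "fallback 1": 55, "fallback 2": 5}
total_trials = 1000

successes = sum(successes_by_leg.values())

# Provider utilization is measured against successful trials only.
shares = {leg: n / successes for leg, n in successes_by_leg.items()}

# The failure share is measured against all trials.
failed_share = (total_trials - successes) / total_trials

for leg, share in shares.items():
    print(f"{leg}: {share:.1%}")
print(f"failed: {failed_share:.1%}")
```

Because the two denominators differ, the four percentages are not expected to sum to 100%.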

Recommendation

Gemini 2.5 Flash has a better cost-per-successful-call than the current primary (0.0316 → 0.0030). Consider swapping.

How the trial is simulated

for each trial:
  elapsed = 0
  for leg in [primary, fallback1, fallback2?]:
    if uniform() < rate_429:   # throttled
      elapsed += 50ms; continue
    latency ~ Exponential(mean = max(p50, p99 / ln(100)))
    elapsed += latency
    if elapsed <= deadline:    # success on this leg
      return success
  return failure

Failure modes modeled independently per leg (rate-limit + latency tail). See methodology for assumptions and limits — real outages are correlated and bursty.
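
The pseudocode above can be sketched in runnable Python. This is a minimal illustration that follows the pseudocode literally, not the tool's actual engine: the names `Leg`, `run_trial`, `simulate`, and `THROTTLE_PENALTY_MS` are hypothetical, and the example legs read the configured percentage as the 429 rate and the millisecond value as p99, which is an assumption. The mean uses the exponential quantile relation p99 = mean × ln(100).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Leg:
    rate_429: float  # probability the call is throttled (assumed meaning)
    p50_ms: float    # reference median latency
    p99_ms: float    # tail latency (assumed meaning)


THROTTLE_PENALTY_MS = 50.0  # time lost on a throttled leg, per the pseudocode


def mean_latency_ms(leg: Leg) -> float:
    # For an exponential distribution, p99 = mean * ln(100).
    return max(leg.p50_ms, leg.p99_ms / math.log(100))


def run_trial(legs, deadline_ms, rng):
    """Return (succeeded, index of the serving leg or None, elapsed ms)."""
    elapsed = 0.0
    for i, leg in enumerate(legs):
        if rng.random() < leg.rate_429:  # throttled: pay the penalty, move on
            elapsed += THROTTLE_PENALTY_MS
            continue
        elapsed += rng.expovariate(1.0 / mean_latency_ms(leg))
        if elapsed <= deadline_ms:       # success on this leg
            return True, i, elapsed
    return False, None, elapsed


def simulate(legs, deadline_ms, trials=1000, seed=0):
    """Return (success rate, successful trials served by each leg)."""
    rng = random.Random(seed)
    served = [0] * len(legs)
    for _ in range(trials):
        ok, i, _ = run_trial(legs, deadline_ms, rng)
        if ok:
            served[i] += 1
    return sum(served) / trials, served


# Illustrative two-leg chain; values are assumptions, not the tool's defaults.
chain = [Leg(0.08, 900.0, 3500.0), Leg(0.05, 700.0, 3000.0)]
rate, served = simulate(chain, deadline_ms=1000.0)
print(f"success rate: {rate:.1%}, served per leg: {served}")
```

Elapsed time accumulates across legs, so a slow primary leaves less of the deadline for the fallbacks. Results depend on the seed and will not match the UI numbers exactly.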

How to use

Step-by-step

  1. Add chain links in priority order: primary endpoint, fallback 1, fallback 2, ...

  2. For each link, set: model, per-call cost, expected latency, and failure rate.

  3. Run the simulator. Read total cost, end-to-end latency distribution, and overall chain success rate.

  4. Toggle 'retry on timeout' vs. 'fall through on timeout'. Retry preserves the primary's cost; fall-through escalates to the next link.

  5. Compare chains: 2-link primary+fallback vs. 3-link with extra cache. The marginal gain in success rate often diminishes after the second fallback.
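
The diminishing gain in step 5 can be seen with a back-of-the-envelope calculation. The sketch below assumes links fail independently and ignores the deadline, so a chain fails only when every link fails; the function name and the per-link failure rates are hypothetical.

```python
from itertools import accumulate
from operator import mul


def chain_success_rates(failure_rates):
    """Success rate after each added link, assuming independent failures."""
    # Running product of failure rates = probability that all links so far fail.
    cumulative_failure = accumulate(failure_rates, mul)
    return [1.0 - f for f in cumulative_failure]


# Hypothetical per-link failure rates for primary, fallback 1, fallback 2.
rates = chain_success_rates([0.08, 0.05, 0.03])

previous = 0.0
for n, rate in enumerate(rates, start=1):
    print(f"{n}-link chain: {rate:.3%} success (gain {rate - previous:.3%})")
    previous = rate
```

With these inputs the second link adds about 7.6 percentage points and the third adds under 0.4. Correlated outages, which this sketch ignores, would shrink the later gains further.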

For agents

Use in an agent

Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.

import { compute } from "https://aifinhub.io/engines/fallback-chain-simulator.js";

Contract: /contracts/fallback-chain-simulator.json

Questions people ask next

FAQ

What's a fallback chain?

A sequence of LLM endpoints to try in order when an upstream call fails: e.g., primary Anthropic, fallback OpenAI, last-resort cache. Production agent systems use chains to maintain availability when individual providers go down or rate-limit. The simulator models cost, latency, and success rate across the chain.

How are failure rates calibrated?

From two sources documented on the methodology page: provider status pages (incident frequency over the last 12 months) and OpenAI's published reliability targets. Defaults are conservative: 99.5% availability (a 0.5% failure rate) for production-tier APIs, and lower availability for free tiers. You can override the rate per link to model your specific deployment's observed reliability.

What's the cost of a fallback path?

Cumulative cost of every link tried before success. If the primary fails 0.5% of the time, the fallback is 3× more expensive, and failed attempts are still billed, your expected cost is 1 × primary + 0.005 × 3·primary = 1.015× primary: you always pay for the primary, and you pay for the fallback on the 0.5% of calls that reach it. Worst-case (primary AND fallback both down) you pay all three providers.
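
A minimal sketch of the cumulative-cost rule, assuming every attempted link is billed in full (including failed attempts) and links fail independently. The function name `expected_cost` is illustrative, not part of the tool.

```python
def expected_cost(costs, failure_rates):
    """Expected cost per request when each attempted link is billed.

    costs[i] is the per-call cost of link i; failure_rates[i] is the
    probability that link i fails, assumed independent of the others.
    """
    total = 0.0
    reach_probability = 1.0  # probability that link i is attempted at all
    for cost, failure in zip(costs, failure_rates):
        total += reach_probability * cost
        reach_probability *= failure  # the next link runs only if this one fails
    return total


primary = 1.0  # normalize the primary's cost to 1
two_link = expected_cost([primary, 3 * primary], [0.005, 0.0])
print(f"expected cost: {two_link:.3f}x primary")

# Worst case: every link is attempted, so every link is billed.
worst_case = sum([primary, 3 * primary])
print(f"worst case: {worst_case:.3f}x primary")
```

Under those assumptions the 0.5% / 3× example works out to about 1.015× the primary's cost; other billing assumptions (for example, failed calls not being charged) give slightly different multipliers.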

Should the fallback always be a different provider?

Multi-provider fallback gives correlation-of-failure protection (one provider's outage doesn't take down both). Same-provider fallback (different model tier) gives capacity-overflow protection (rate-limit on Opus, retry on Sonnet). The simulator lets you mix both strategies.
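
To illustrate why provider diversity matters, the sketch below compares the chance that primary and fallback fail on the same request with and without a shared outage. The model (a single shared-outage probability layered on independent failures), the function name, and all numbers are hypothetical simplifications.

```python
def p_both_fail(f_primary, f_fallback, p_shared_outage=0.0):
    """Probability that primary and fallback both fail on one request.

    p_shared_outage is the chance of an event that takes both links down
    at once (for example, a provider-wide incident when both links use the
    same provider); otherwise the links fail independently.
    """
    independent = f_primary * f_fallback
    return p_shared_outage + (1.0 - p_shared_outage) * independent


# Different providers: no shared outage assumed.
cross_provider = p_both_fail(0.005, 0.005)

# Same provider: a hypothetical 0.2% chance of a provider-wide outage.
same_provider = p_both_fail(0.005, 0.005, p_shared_outage=0.002)

print(f"cross-provider: {cross_provider:.6f}")
print(f"same-provider:  {same_provider:.6f}")
```

Even a small shared-outage probability dominates the joint failure rate, which is why a same-provider fallback mainly helps with capacity overflow, not outages.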

Does the simulator include retry-on-timeout vs. retry-on-error?

Yes, configurable per link. Retry-on-timeout adds latency but keeps cost low. Retry-on-error after a rate-limit is wasteful; it is better to fall back. The tool reports both cost and latency so you can see the tradeoff between them.

Planning estimates only — not financial, tax, or investment advice.