Fallback Chain Simulator

An LLM fallback chain simulator: Monte Carlo simulation of a primary model plus two fallbacks across Anthropic, OpenAI, and Google. Reports success rate, p50/p95/p99 latency, cost, and degradations.

Inputs: Paste + configure
Runtime: 1–15 s
Privacy: Client-side · no upload
API key: Not required

1 · Configure the chain

Leg          429 rate   p99        ref p50
Primary      8.0%       3500 ms    900 ms
Fallback 1   5.0%       3000 ms    700 ms
Fallback 2   3.0%       2500 ms    –
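The leg cards above amount to a small configuration object. One possible encoding in Python (field names are illustrative, not the tool's schema; the ms figure on each card is interpreted here as that leg's p99, and fallback 2's reference p50 is not shown, so it is left as `None`):

```python
# Chain configuration read off the cards above. Field names are illustrative.
legs = [
    {"name": "primary",    "rate_429": 0.08, "p99_ms": 3500, "ref_p50_ms": 900},
    {"name": "fallback 1", "rate_429": 0.05, "p99_ms": 3000, "ref_p50_ms": 700},
    {"name": "fallback 2", "rate_429": 0.03, "p99_ms": 2500, "ref_p50_ms": None},  # p50 not shown
]
```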

Success rate: 70.0% (700 / 1000 trials)
p50 latency: 571 ms
p95 latency: 1.00 s
p99 latency: 1.00 s
Total cost: $22.906 (avg $0.023/call)
Degradations: 60 (fallback 1: 55 · fallback 2: 5)

2 · Provider utilization (share of successful trials)

Claude Sonnet 4.6 (primary): 91.4% · 640 trials
GPT-5 mini (fallback 1): 7.9% · 55 trials
Gemini 2.5 Flash (fallback 2): 0.7% · 5 trials
Failed (exceeded deadline): 30.0% of all trials · 300 trials

Recommendation

Gemini 2.5 Flash has a lower cost per successful call than the current primary ($0.0316 → $0.0030). Consider swapping.
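The comparison rests on cost per successful call: each leg's spend divided by the successful trials it served. A sketch with hypothetical spend figures, chosen only to reproduce the two numbers above:

```python
# Hypothetical per-leg spend ($) and successes served; the formula is the point.
spend = {"primary": 20.22, "fallback 2": 0.015}
served = {"primary": 640, "fallback 2": 5}

cost_per_success = {leg: spend[leg] / served[leg] for leg in spend}
```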

How the trial is simulated

for each trial:
  elapsed = 0
  for leg in [primary, fallback1, fallback2?]:
    if uniform() < rate_429:   # throttled
      elapsed += 50ms; continue
    latency ~ Exponential(mean = max(p50, p99 / ln(100)))
    elapsed += latency
    if elapsed <= deadline:    # success on this leg
      return success
  return failure

Failure modes are modeled independently per leg (rate limiting plus the latency tail). See the methodology for assumptions and limits: real outages are correlated and bursty.
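The pseudocode above maps directly onto Python's `random` module. A minimal runnable sketch, assuming the panel's per-leg numbers, a placeholder p50 for fallback 2 (not shown above), and an assumed 1 s overall deadline:

```python
import math
import random

# Per-leg parameters from the configuration panel; fallback 2's reference p50
# is not shown there, so 0.5 s is a placeholder assumption.
LEGS = [
    {"rate_429": 0.08, "p50_s": 0.900, "p99_s": 3.500},  # primary
    {"rate_429": 0.05, "p50_s": 0.700, "p99_s": 3.000},  # fallback 1
    {"rate_429": 0.03, "p50_s": 0.500, "p99_s": 2.500},  # fallback 2 (p50 assumed)
]
DEADLINE_S = 1.0  # assumed overall deadline; not shown in this panel

def run_trial(rng):
    """One Monte Carlo trial; returns (succeeded, leg_index, elapsed_s)."""
    elapsed = 0.0
    for i, leg in enumerate(LEGS):
        if rng.random() < leg["rate_429"]:   # throttled: 50 ms hop, next leg
            elapsed += 0.050
            continue
        # Exponential latency; mean chosen so the tail roughly matches p99.
        mean = max(leg["p50_s"], leg["p99_s"] / math.log(100))
        elapsed += rng.expovariate(1.0 / mean)
        if elapsed <= DEADLINE_S:
            return True, i, elapsed          # success on this leg
    return False, None, elapsed              # chain exhausted or deadline blown

rng = random.Random(0)                       # fixed seed for reproducibility
results = [run_trial(rng) for _ in range(1000)]
successes = [r for r in results if r[0]]
success_rate = len(successes) / len(results)
degradations = sum(1 for _ok, i, _e in successes if i > 0)
```

Percentiles of the successful `elapsed` values give the p50/p95/p99 figures; cost needs per-call prices per leg, which this sketch omits.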
