Worked example

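Running the shipped fallback-chain-simulator engine on the input below produces exactly the output shown. Because this input leaves provider_id and model_id empty and carries no chain, the engine responds with its provider catalog and a usage hint rather than simulation results. Continuous integration recomputes the output against the engine bundle on every build, so it cannot drift from the code.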

Input

{
  "tool": "fallback_chain_simulator",
  "provider_id": "",
  "model_id": "",
  "rate_429": 0,
  "p99_latency_ms": 0,
  "deadline_ms": 0,
  "input_tokens": 0,
  "output_tokens": 0,
  "trials": 1000,
  "seed": 1
}

Output

{
  "providers": [
    {
      "id": "anthropic",
      "name": "Anthropic",
      "models": [
        "claude-opus-4-7",
        "claude-sonnet-4-6",
        "claude-haiku-4-5"
      ]
    },
    {
      "id": "openai",
      "name": "OpenAI",
      "models": [
        "gpt-5",
        "gpt-5-mini",
        "o4-mini"
      ]
    },
    {
      "id": "google",
      "name": "Google",
      "models": [
        "gemini-2-5-pro",
        "gemini-3-5-flash",
        "gemini-2-5-flash",
        "gemini-2-5-flash-lite"
      ]
    }
  ],
  "hint": "Send a POST with a `chain` body to simulate. See OpenAPI spec for schema."
}

Frequently asked questions

What does the Fallback Chain Simulator methodology page document?
It documents how the Fallback Chain Simulator computes success rate, latency percentiles, cost, and degradation events for primary-plus-fallback LLM provider chains. It states the formulas, assumptions, data sources, limitations, and reproducibility steps behind the tool, which is listed in the Finance category.

When was the Fallback Chain Simulator methodology last reviewed?
This methodology was last reviewed on 2026-04-24. The matching tool is at https://aifinhub.io/fallback-chain-simulator/.

Are the Fallback Chain Simulator numbers reproducible?
Yes. This page embeds a worked example whose output is the verbatim result of running the shipped fallback-chain-simulator engine on a fixed input. The embedded JSON is recomputed and diffed against the engine in CI, so it cannot drift from the code.

How Fallback Chain Simulator works

The Fallback Chain Simulator models the pattern most production LLM agents eventually reach for: a primary provider that handles the happy path, a first fallback that catches rate limits and latency spikes, and an optional second fallback for full-provider outages. It reports how often the whole chain completes within your deadline, how much you pay per call on average, and which leg ends up serving the requests.

What the tool computes

For each Monte Carlo trial (default 1,000), the simulator walks the chain in order. For every leg it rolls two independent outcomes: a 429 / throttle response, and a latency sample drawn from an exponential distribution. A leg "succeeds" when it does not throttle and the cumulative elapsed time stays at or under your deadline. If a leg fails, control passes to the next leg and the already-spent time is carried forward. If no leg succeeds in time, the trial is a failure and contributes the deadline value to the latency distribution.
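The exponential latency model ties the sampling step and the p99 input together. The short derivation below is a sketch based on the standard exponential distribution, using \mu for the mean latency and q_p for the p-th quantile (both symbols are introduced here for illustration). It shows why the pseudocode under Formulas draws latency as -ln(1 - uniform()) * mean and why it divides p99 by ln(100).

```latex
\begin{align*}
U &\sim \mathrm{Uniform}(0,1), \qquad X = -\mu \ln(1 - U) \\
\Pr(X \le x) &= \Pr\bigl(U \le 1 - e^{-x/\mu}\bigr) = 1 - e^{-x/\mu} \\
q_p &= -\mu \ln(1 - p) \\
q_{0.99} &= \mu \ln 100 \approx 4.605\,\mu
  \quad\Longrightarrow\quad \mu = \frac{q_{0.99}}{\ln 100} \\
q_{0.50} &= \mu \ln 2 \approx 0.693\,\mu
\end{align*}
```

For example, a p99 of 2,000 ms implies a mean of roughly 434 ms and a simulated median of roughly 301 ms. One consequence worth keeping in mind: when the max(model.p50, ...) floor in the pseudocode applies, the mean is set to the model's p50, so the simulated median is about 0.693 times that p50 rather than the p50 itself.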

The outputs are the overall success rate, p50 / p95 / p99 latency, aggregate and per-call cost across the trial set, the number of trials that degraded to fallback 1 or fallback 2, and a share-of-successes bar chart per provider. A short recommendation flags whether the current primary is the best cost-per-successful-call leg given the inputs. A common configuration error is picking a cheap but unreliable primary whose failures push most traffic (and most cost) onto the fallback.
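To make the output list concrete, here is a minimal sketch of how such a summary could be derived from raw trial results. Everything in it is an assumption for illustration, not the engine's actual code: the tuple layout (success, latency_ms, cost, winning_leg_index), the function names, the nearest-rank percentile rule, and the reading of "degraded to fallback N" as "the trial was ultimately served by fallback N".

```python
import math
from collections import Counter


def percentile(sorted_values, p):
    """Nearest-rank percentile on an already-sorted list (p in 0..100)."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(trials):
    """Summarize trials given as (success, latency_ms, cost, winning_leg_index).

    winning_leg_index is 0 for the primary, 1 or 2 for a fallback, and
    None for a failed trial (hypothetical layout, chosen for this sketch).
    """
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    latencies = sorted(t[1] for t in trials)
    winners = Counter(t[3] for t in trials if t[0])
    successes = sum(winners.values())
    total_cost = sum(t[2] for t in trials)
    return {
        "success_rate": successes / n,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "total_cost": total_cost,
        "cost_per_call": total_cost / n,
        # Assumption: "degraded" counts trials served by that fallback.
        "served_by_fallback1": winners.get(1, 0),
        "served_by_fallback2": winners.get(2, 0),
        # Fraction of successful trials served by each leg index.
        "share_of_successes": (
            {leg: c / successes for leg, c in sorted(winners.items())}
            if successes else {}
        ),
    }
```

Failed trials stay in the latency list at whatever value they were recorded with, which matches the rule above that a failure contributes the deadline value to the latency distribution.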

Inputs and assumptions

Formulas

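for each trial:
  elapsed = 0; cost = 0
  for leg in [primary, fallback1, fallback2?]:
    if uniform() < leg.rate_429:           # throttled: fixed 50 ms penalty, no token cost
      elapsed += 50ms
      continue
    mean      = max(model.p50, leg.p99 / ln(100))   # exponential mean implied by p99, floored at model p50
    latency   = -ln(1 - uniform()) * mean   # exponential
    elapsed  += latency
    cost     += leg.input_tokens * price_in
              + leg.output_tokens * price_out       # charged even if this leg misses the deadline
    if elapsed <= deadline:
      return success
  return failure                            # latency is recorded as the deadline value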
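The pseudocode above can be turned into runnable Python fairly directly. The version below is a sketch under stated assumptions, not the shipped engine: the Leg dataclass, its field names, the per-token price units, and the use of Python's random.Random for seeding are all hypothetical choices made for illustration. The 50 ms throttle penalty, the mean formula, and the cost rule come from the pseudocode.

```python
import math
import random
from dataclasses import dataclass

THROTTLE_PENALTY_MS = 50.0  # time charged for a 429 before moving on


@dataclass
class Leg:
    name: str
    rate_429: float     # probability of a throttle response, 0..1
    p50_ms: float       # model-level p50, used as a floor on the mean
    p99_ms: float       # user-supplied p99 latency for this leg
    input_tokens: int
    output_tokens: int
    price_in: float     # assumed unit: currency per input token
    price_out: float    # assumed unit: currency per output token


def run_trial(chain, deadline_ms, rng):
    """Walk the chain once; return (success, latency_ms, cost, leg_index)."""
    elapsed = 0.0
    cost = 0.0
    for index, leg in enumerate(chain):
        if rng.random() < leg.rate_429:
            elapsed += THROTTLE_PENALTY_MS  # throttled: no tokens billed
            continue
        mean = max(leg.p50_ms, leg.p99_ms / math.log(100))
        # Inverse-CDF sample from an exponential with the mean above.
        elapsed += -math.log(1.0 - rng.random()) * mean
        cost += (leg.input_tokens * leg.price_in
                 + leg.output_tokens * leg.price_out)
        if elapsed <= deadline_ms:
            return True, elapsed, cost, index
    # No leg finished in time: record the deadline as the latency.
    return False, deadline_ms, cost, None


def simulate(chain, deadline_ms, trials=1000, seed=1):
    """Run independent trials with a seeded generator for repeatability."""
    rng = random.Random(seed)
    return [run_trial(chain, deadline_ms, rng) for _ in range(trials)]
```

Note that a leg which responds too late still adds its token cost before control passes to the next leg, so slow primaries raise the per-call cost even when a fallback rescues the request.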

Recommendation logic

Each leg gets an expected cost-per-successful-call score: cost_per_success = tokens × price / P(success), where tokens × price is the leg's combined input and output token cost and P(success) = (1 − rate_429) × P(latency ≤ deadline). The recommendation flags the leg with the lowest score; if that leg is not the current primary, it suggests swapping. This is a per-leg comparison, not an end-to-end optimization: the Monte Carlo result above still reflects the chain you actually configured.
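The score can be evaluated in closed form, because an exponential latency with mean m gives P(latency ≤ deadline) = 1 − exp(−deadline / m). The sketch below works through that calculation. The function names, the dict keys, and the choice to return infinity for a leg that can never succeed are assumptions made for illustration, not the tool's confirmed behavior.

```python
import math


def cost_per_success(rate_429, mean_latency_ms, deadline_ms, call_cost):
    """Expected cost per successful call for a single leg in isolation.

    Assumes mean_latency_ms > 0 and exponential latency, so the chance
    of answering within the deadline is 1 - exp(-deadline / mean).
    """
    p_in_time = 1.0 - math.exp(-deadline_ms / mean_latency_ms)
    p_success = (1.0 - rate_429) * p_in_time
    return math.inf if p_success == 0 else call_cost / p_success


def recommend(legs, deadline_ms):
    """Return (best_leg_name, should_swap, scores).

    legs is a list of dicts with hypothetical keys name, rate_429,
    mean_ms, and call_cost; the first entry is the current primary.
    """
    scores = {
        leg["name"]: cost_per_success(
            leg["rate_429"], leg["mean_ms"], deadline_ms, leg["call_cost"]
        )
        for leg in legs
    }
    best = min(scores, key=scores.get)
    return best, best != legs[0]["name"], scores


legs = [
    {"name": "cheap", "rate_429": 0.9, "mean_ms": 200.0, "call_cost": 0.001},
    {"name": "solid", "rate_429": 0.0, "mean_ms": 200.0, "call_cost": 0.005},
]
best, should_swap, scores = recommend(legs, deadline_ms=2000.0)
```

In this made-up example the "cheap" primary throttles 90% of the time, so its score is about 0.010 per successful call against about 0.005 for "solid", and the sketch suggests swapping even though "cheap" has the lower list price.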

Limitations

Pricing sources

Changelog

Planning estimates only — not financial, tax, or investment advice.