Fallback Chain Simulation: When Claude Is Down

A finance research stack that depends on a single provider has roughly 99.5% effective uptime in a good year. A two-stage fallback chain (Anthropic → OpenAI) lifts that to about 99.99%. The Fallback Chain Simulator catalogue lists three providers and ten models: claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5, gpt-5, gpt-5-mini, o4-mini, gemini-2-5-pro, gemini-3-5-flash, gemini-2-5-flash, gemini-2-5-flash-lite. The simulator's job is not to compute the same answer twice; it is to compute an answer when the primary provider returns 503.

TL;DR

Single-provider uptime: ~99.5% in a typical year. Anthropic, OpenAI, and Google all publish status pages with comparable historic figures¹²³.
Two-stage chain (primary → fallback): ~99.99% effective uptime, assuming independent outages.
The fallback model does not need to match the primary on capability. It needs to return a defensible answer.
Cost overhead of a fallback chain: 5-15% on average, dominated by occasional retry duplication.
Implementation pattern: try primary with timeout, on failure attempt fallback, log both attempts.

The providers

The Fallback Chain Simulator ships with three providers and ten models:

Provider	Models
Anthropic	claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5
OpenAI	gpt-5, gpt-5-mini, o4-mini
Google	gemini-2-5-pro, gemini-3-5-flash, gemini-2-5-flash, gemini-2-5-flash-lite

The simulator takes a chain spec (ordered list of {provider, model, timeout}) and a synthetic failure model, and returns the expected effective uptime and the blended cost per request.
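As a rough illustration of the input described above, a chain spec could be represented as follows. The `Stage` class, its field names, and the `distinct_providers` helper are assumptions made for this sketch, not the simulator's actual schema; the OpenAPI spec is the authority on that.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Stage:
    provider: str     # provider id, e.g. "anthropic"
    model: str        # a model id from the catalogue
    timeout_s: float  # hypothetical field name for the per-stage timeout


# Ordered list: the first stage is the primary, later stages are fallbacks.
chain = [
    Stage("anthropic", "claude-sonnet-4-6", 30.0),
    Stage("openai", "gpt-5-mini", 30.0),
]


def distinct_providers(stages):
    """Return True if no provider appears twice in the chain."""
    providers = [s.provider for s in stages]
    return len(providers) == len(set(providers))


# Assumed request body shape: a top-level "chain" key holding the stages.
payload = json.dumps({"chain": [asdict(s) for s in chain]})
```

Checking `distinct_providers(chain)` before simulating is a cheap way to catch a chain whose fallback sits on the same provider as its primary.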

The single-provider failure mode

Provider outages are rare but not negligible. Anthropic's public status page lists a small number of multi-minute incidents per quarter¹. OpenAI's incident history is similar². Google's Gemini API has a comparable pattern³. None of the three has demonstrated 99.99% uptime over a multi-year period.

A retail research loop that runs every four hours through a single provider hits provider downtime roughly 8-12 times per year on average, and each incident lasts 5-30 minutes. Those incidents alone add up to roughly 40 minutes to 6 hours per year; the 99.5% uptime figure corresponds to about 44 hours, or just under two days, of effective unavailability per year.
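The downtime arithmetic can be checked directly. The sketch below uses only the figures quoted in this article (8-12 incidents of 5-30 minutes, and 99.5% uptime); the function names are made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def incident_hours(incidents_per_year, minutes_per_incident):
    """Total yearly downtime, in hours, from discrete incidents."""
    return incidents_per_year * minutes_per_incident / 60


def unavailability_hours(uptime):
    """Yearly hours of unavailability implied by an uptime fraction."""
    return (1 - uptime) * HOURS_PER_YEAR


low = incident_hours(8, 5)            # best case: few, short incidents
high = incident_hours(12, 30)         # worst case: many, long incidents
budget = unavailability_hours(0.995)  # what 99.5% uptime allows per year

print(f"incidents: {low:.2f} h to {high:.2f} h per year")
print(f"99.5% uptime: {budget:.1f} h per year")
```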

For a finance research stack, the cost of downtime is bounded but real. A morning research review that requires the previous night's batch is delayed; a trading decision that requires real-time LLM analysis is postponed. The loss is not catastrophic but it is recurring.

The two-stage chain

A two-stage chain switches to a different provider on primary failure. The mechanics:

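import logging

log = logging.getLogger(__name__)

# Placeholder exception types; map them to whatever your client wrappers raise.
class RateLimit(Exception): pass
class ServerError(Exception): pass
class Timeout(Exception): pass

def fallback_call(prompt, primary, fallback, timeout_s=30):
    try:
        return primary.call(prompt, timeout=timeout_s)
    except (RateLimit, ServerError, Timeout) as e:
        log.warning("primary %s failed: %s, falling back", primary.model, e)
        return fallback.call(prompt, timeout=timeout_s)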

The fallback is a different provider, not a different model from the same provider. Single-provider outages cascade across all models from that provider; a fallback to claude-haiku when claude-opus is down does not help if Anthropic's API itself is unreachable.

The effective uptime under independent outages is:

P(up) = 1 - P(primary down) × P(fallback down)
      = 1 - 0.005 × 0.005
      = 0.999975

99.9975% effective uptime. The independence assumption is the load-bearing one; see "When the chain doesn't help" below.
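The same formula generalises to any chain length. This is a minimal sketch under the independence assumption stated above; `chain_uptime` is a name invented for the example and 0.005 is the per-provider downtime probability used in this article.

```python
import math


def chain_uptime(p_down):
    """Effective uptime of a chain whose stages fail independently.

    p_down: per-stage probabilities of being down, in chain order.
    The chain is down only when every stage is down at once.
    """
    return 1 - math.prod(p_down)


two_stage = chain_uptime([0.005, 0.005])
three_stage = chain_uptime([0.005, 0.005, 0.005])

print(f"two-stage:   {two_stage:.6%}")
print(f"three-stage: {three_stage:.7%}")
```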

Cost overhead

The fallback chain costs more than the primary alone for three reasons:

Failed primary calls still cost. Most providers do not charge for 5xx errors, but a call that the caller abandons on timeout after generation has started can still be billed, and whether rejected 4xx requests are billed depends on the provider. Retrying a partially completed call pays for the prompt tokens a second time.
Fallback provider may be more expensive. If the primary is Haiku and the fallback is GPT-5, the fallback's per-call cost is higher. The blend depends on the failure rate of the primary.
Retry logic adds latency on every failed call. With a 30-second primary timeout, a primary call that hangs adds the full 30 s before the fallback engages. For real-time workloads, the timeout choice eats into the latency budget.

Empirically, a well-tuned two-stage chain runs 5-15% over single-provider cost. The variance is dominated by the primary's failure rate during the measurement window.
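One way to reason about the blend is an expected-cost model. The sketch below is an assumption-laden illustration, not the simulator's cost model: prices are in arbitrary units, and `billed_fraction_on_fail` is a hypothetical parameter for the share of the primary's cost still billed when a call fails (for example, on a caller-side timeout).

```python
def blended_cost(primary_cost, fallback_cost, p_primary_fail,
                 billed_fraction_on_fail=0.0):
    """Expected cost per request for a two-stage chain."""
    on_success = (1 - p_primary_fail) * primary_cost
    # On failure we pay any billed part of the failed call plus the fallback.
    on_failure = p_primary_fail * (
        billed_fraction_on_fail * primary_cost + fallback_cost
    )
    return on_success + on_failure


def overhead(primary_cost, fallback_cost, p_primary_fail,
             billed_fraction_on_fail=0.0):
    """Relative cost increase over calling the primary alone."""
    blended = blended_cost(primary_cost, fallback_cost, p_primary_fail,
                           billed_fraction_on_fail)
    return blended / primary_cost - 1


# Illustrative inputs only: fallback 3x the primary's price, half of each
# failed primary call still billed.
busy_window = overhead(1.0, 3.0, 0.05, 0.5)    # 5% of primary calls fail
quiet_window = overhead(1.0, 3.0, 0.005, 0.5)  # 0.5% of primary calls fail
```

With these inputs the overhead is 12.5% in the busy window and 1.25% in the quiet one, which shows how strongly the result depends on the primary's failure rate during the measurement window.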

When the chain doesn't help

The independence assumption fails in three scenarios:

Correlated provider outages. A network-level outage (Cloudflare, AWS region) can take down multiple providers at once. The fallback is only useful if it lives on a different network path.
Rate-limit cascades. A user who saturates their rate limit on the primary will saturate it on the fallback if they fail over the entire workload. The fallback needs its own rate budget.
Single-vendor compliance. A pipeline that requires BaFin-approved data residency may be constrained to one provider's EU region. Falling over to a US-only provider violates the compliance constraint. See BaFin + EU Guide for Retail AI Traders.

In each case, the fallback is partial cover at best, not full cover.
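To see how much a correlated failure erodes the headline figure, a simple common-cause model helps. This is a sketch under stated assumptions: a shared event (network or cloud-region outage) takes both providers down with probability `q_shared`, and otherwise each provider is down independently with probability `p_down`. Both names and the 0.0005 value are illustrative, not measured.

```python
def two_stage_uptime(p_down, q_shared=0.0):
    """Uptime of a two-stage chain with a common-cause failure term.

    p_down:   probability that a provider is down on its own
    q_shared: probability of a shared event that downs both at once
    """
    both_down = q_shared + (1 - q_shared) * p_down ** 2
    return 1 - both_down


independent = two_stage_uptime(0.005)         # no shared failure mode
correlated = two_stage_uptime(0.005, 0.0005)  # 0.05% shared-outage chance

print(f"independent: {independent:.5%}")
print(f"correlated:  {correlated:.5%}")
```

Even a small shared-failure probability dominates the result: the independent term contributes 0.0025% of downtime, while the shared event alone contributes 0.05%.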

Three-stage chains

For mission-critical loops, a three-stage chain adds a third provider:

Anthropic Claude Sonnet → OpenAI GPT-5 mini → Google Gemini 2.5 Flash

Under independent outages: about 99.99999% effective uptime (1 - 0.005³ = 0.999999875). The marginal benefit over a two-stage chain is small: the gap is bounded by the probability of a correlated outage across all three providers, which is dominated by infrastructure failures, not LLM provider issues.

The cost overhead climbs to 10-25% over single-provider. For most retail research loops, the two-stage chain is the right balance. For pipelines whose downtime cost exceeds €1k/hour, a three-stage chain is defensible.

Implementation rules

A fallback chain that does what it says requires four guardrails:

Different network paths. Calling the primary through Anthropic's API and the fallback through OpenAI's API satisfies this; routing both through the same cloud, such as AWS, does not if AWS is the failure mode.
Independent rate budgets. Don't share rate-limit headroom across the chain.
Log both attempts. Without telemetry on fallback rate, you can't tell when the primary is degrading.
Test the fallback periodically. A fallback that has not run in 6 months may be misconfigured. Synthetic monthly checks against the fallback alone catch silent rot.
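The logging and periodic-test guardrails above can be sketched together. In this illustration each stage is a `(name, call)` pair where `call` is any function that takes a prompt and returns text; `run_chain` and `synthetic_check` are hypothetical helpers, and real code would catch the provider SDK's specific error types rather than `Exception`.

```python
import logging
import time

log = logging.getLogger("fallback_chain")


def run_chain(prompt, stages):
    """Try each (name, call) stage in order; return (result, attempts)."""
    attempts = []
    for name, call in stages:
        start = time.monotonic()
        try:
            result = call(prompt)
        except Exception as exc:  # narrow this to provider errors in practice
            attempts.append({"stage": name, "ok": False,
                             "latency_s": time.monotonic() - start,
                             "error": repr(exc)})
            log.warning("stage %s failed: %r", name, exc)
            continue
        attempts.append({"stage": name, "ok": True,
                         "latency_s": time.monotonic() - start,
                         "error": None})
        log.info("stage %s served the request", name)
        return result, attempts
    raise RuntimeError(f"all {len(stages)} stages failed")


def synthetic_check(name, call, probe="ping"):
    """Exercise one stage on its own to catch a stale key or config."""
    try:
        run_chain(probe, [(name, call)])
        return True
    except RuntimeError:
        return False
```

Running `synthetic_check` against the fallback alone on a schedule exercises the same code path that production traffic would take on a real failover.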

Compliance considerations

For published trading content under MiFID II and BaFin frameworks, the fallback's output must meet the same disclosure standards as the primary. If the chain falls back to a provider whose output behaviour differs materially, the published research may carry inconsistent framing across requests. Document the fallback policy and the comparative output behaviour in the methodology disclosure⁴.

Failure modes

Correlated rate limits across providers. A user who saturates one provider will likely saturate another if the workload fails over wholesale. Pre-allocate quota separately.
Silent fallback rot. A fallback that has not run in months may be misconfigured or have a stale API key. Run synthetic test traffic against the fallback.
Treating the fallback as identical to the primary. Output formatting, structured-output adherence, and citation behaviour differ across providers. The fallback's output may need a different downstream parser.
Skipping the timeout. A primary that hangs (rather than fails) blocks the fallback. Set a hard timeout on every primary call.
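For the last point, a hard timeout can be enforced from the caller's side even when a client call hangs. The sketch below uses Python's standard `concurrent.futures`; `call_with_hard_timeout` and `with_fallback` are names invented here, and where a client library offers its own timeout parameter, that is usually the better first choice.

```python
import concurrent.futures
import time


def call_with_hard_timeout(fn, prompt, timeout_s):
    """Run fn(prompt) in a worker thread and stop waiting after timeout_s."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, prompt)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Do not block on a hung worker. The thread cannot be killed and
        # keeps running in the background until the call returns.
        executor.shutdown(wait=False)


def with_fallback(prompt, primary, fallback, timeout_s):
    """Fall back when the primary exceeds its hard timeout."""
    try:
        return call_with_hard_timeout(primary, prompt, timeout_s)
    except concurrent.futures.TimeoutError:
        return call_with_hard_timeout(fallback, prompt, timeout_s)
```

The trade-off is that an abandoned primary call may still complete and be billed, which is one source of the duplicated cost discussed earlier.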

FAQ

Should the fallback be a cheaper model?

Usually yes. The fallback runs rarely, so a cheaper model trades a small quality drop on the rare fallback case for a lower blended cost. For high-stakes decision loops where the fallback's output drives money, match the primary's tier.

How do I monitor the chain's effective uptime?

Log every primary attempt, every fallback attempt, and every request where both fail. Compute monthly: (requests served by the primary + requests served by the fallback) / total requests. Anything under 99.95% is a chain that is not adding the uptime it promises.
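The monthly computation might look like the sketch below. The record shape is an assumption for illustration: one dict per request with a boolean `primary_ok` and a `fallback_ok` that is `None` when the fallback was never attempted.

```python
def effective_uptime(records):
    """Share of requests served by either leg; None if there is no data."""
    if not records:
        return None
    served = sum(1 for r in records if r["primary_ok"] or r["fallback_ok"])
    return served / len(records)


def fallback_rate(records):
    """Share of requests where the primary failed and the fallback was needed."""
    if not records:
        return None
    needed = sum(1 for r in records if not r["primary_ok"])
    return needed / len(records)


# Synthetic month: 9,990 primary successes, 8 rescued by the fallback,
# 2 requests where both legs failed.
month = (
    [{"primary_ok": True, "fallback_ok": None}] * 9990
    + [{"primary_ok": False, "fallback_ok": True}] * 8
    + [{"primary_ok": False, "fallback_ok": False}] * 2
)

uptime = effective_uptime(month)
rate = fallback_rate(month)
meets_target = uptime >= 0.9995
```

Tracking the fallback rate alongside uptime matters because a rising rate signals primary degradation even while effective uptime still looks healthy.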

Does the chain affect prompt caching?

Each provider caches independently. A fallback that runs once per 200 calls does not get cache hits — the cache requires repeated calls to the same provider. Cost models should not credit cache savings to the fallback leg.

Connects to

Vendor Lock-in: Cross-Provider Fallback: strategic side of the same decision.
Production Claude Agent for Finance: primary-side reliability patterns.
Heartbeats, Watchdogs, and Circuit Breakers: observability for the chain.
Rate-Limit Design for LLM Research: per-provider quota planning.
Fallback Chain Simulator: model your own chain.
Fallback Chain Simulator methodology: full input/output specification.

References

1. Anthropic Status (2026). "Historical incident reports." status.anthropic.com
2. OpenAI Status (2026). "Historical incident reports." status.openai.com
3. Google Cloud Status (2026). "Vertex AI incident history." status.cloud.google.com
4. ESMA (2023). "Guidelines on certain aspects of the MiFID II suitability requirements." esma.europa.eu

Verified engine output


Provider and model catalogue (3 providers, 10 models)

Result

id	name	models
anthropic	Anthropic	claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5
openai	OpenAI	gpt-5, gpt-5-mini, o4-mini
google	Google	gemini-2-5-pro, gemini-3-5-flash, gemini-2-5-flash, gemini-2-5-flash-lite

Hint: Send a POST with a `chain` body to simulate. See OpenAPI spec for schema.

Computed live at build time.