Risk & Portfolio Construction Guide

How to Backtest a Value-at-Risk Model

A value-at-risk number is a prediction: losses should exceed it only as often as the confidence level allows, and the breaches should be scattered, not bunched. Backtesting checks both. A VaR model that passes the frequency test but fails the independence test understates tail risk in stressed periods, which is the worst time to be wrong. This guide covers the two standard tests, what each catches, and why the independence check matters more than the frequency count.

By AI Fin Hub Research · AI Fin Hub Team

Before You Start

Set up the inputs that make the next steps easier

A series of VaR forecasts at a stated confidence level, one per period.
The realized portfolio returns or losses for the same periods.
Enough observations that the expected number of breaches is large enough to test meaningfully.
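
As a rough illustration of the sample-size point above, the expected breach count under a correctly calibrated model is the number of observations times one minus the confidence level. The helper name `expected_breaches` and the sample lengths are illustrative assumptions, not part of the guide's tooling.

```python
def expected_breaches(n_obs: int, confidence: float) -> float:
    """Expected number of breaches if the VaR model is correctly calibrated."""
    return n_obs * (1.0 - confidence)


# Hypothetical sample lengths: about one and four trading years of daily data.
one_year = expected_breaches(250, 0.99)     # roughly 2.5 breaches
four_years = expected_breaches(1000, 0.99)  # roughly 10 breaches
print(f"250 days: {one_year:.1f}, 1000 days: {four_years:.1f}")
```

With only two or three breaches expected, a single extra breach moves the observed rate a lot, so longer histories give the tests more to work with.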

Guide Steps

Move through it in order

Each step focuses on one decision so you can keep momentum without losing the thread.

  1. Define the breach indicator

    For each period, mark a breach when the realized loss exceeds the VaR forecast for that period. This produces a sequence of zeros and ones, the breach indicator, which is the raw material for every VaR backtest. The VaR confidence level sets the expected breach rate: a 99 percent one-day VaR should be breached on about one percent of days. Everything downstream tests whether this sequence behaves the way the model claims.

    Use out-of-sample VaR forecasts. Backtesting a VaR model on the same data it was fit on overstates how well it works, just like any other backtest.
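
A minimal sketch of the breach indicator described in this step. It assumes losses and VaR forecasts are both quoted as positive numbers in the same units, which is a sign convention chosen here for illustration; the function name and sample figures are hypothetical.

```python
def breach_indicator(losses, var_forecasts):
    """Return 1 where the realized loss exceeds that period's VaR, else 0.

    Assumed convention: a loss is a positive number (a gain is negative),
    and each VaR forecast is a positive loss threshold.
    """
    if len(losses) != len(var_forecasts):
        raise ValueError("losses and var_forecasts must have the same length")
    return [1 if loss > var else 0 for loss, var in zip(losses, var_forecasts)]


# Made-up daily losses and 99% VaR forecasts, in percent of portfolio value.
losses = [0.4, 2.9, 1.1, -0.3, 3.5]
var_99 = [2.5, 2.5, 2.6, 2.6, 2.7]
hits = breach_indicator(losses, var_99)
print(hits)  # [0, 1, 0, 0, 1]
```

If your data stores returns rather than losses, flip the sign before comparing so that a breach still means "lost more than the forecast".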

  2. Run the Kupiec frequency test

    The Kupiec proportion-of-failures (POF) test checks whether the observed breach rate matches the expected rate implied by the confidence level. Too many breaches means the model understates risk; too few means it overstates risk and ties up capital needlessly. The test produces a likelihood-ratio statistic, compared against a chi-square distribution with one degree of freedom, and a p-value for the null hypothesis that the breach frequency is correct. It is the first gate: a model that fails frequency is mis-scaled before you even look at timing.

    Too few breaches is a real failure too, not a free pass. An overly conservative VaR wastes capital and signals the model is not capturing the actual distribution.
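
The frequency test in this step can be sketched with the standard library alone. This is an illustrative implementation of the Kupiec likelihood ratio, not the code behind any particular tool; the function names and the sample breach sequences are assumptions made for the example.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_pof(hits, confidence):
    """Kupiec proportion-of-failures test.

    hits: sequence of 0/1 breach flags; confidence: VaR level, e.g. 0.99.
    Returns (lr_statistic, p_value) against a chi-square(1) reference.
    """
    n = len(hits)
    x = sum(hits)
    p = 1.0 - confidence   # breach rate the model claims
    pi_hat = x / n         # breach rate actually observed
    log_l_null = _xlogy(n - x, 1.0 - p) + _xlogy(x, p)
    log_l_alt = _xlogy(n - x, 1.0 - pi_hat) + _xlogy(x, pi_hat)
    lr = max(-2.0 * (log_l_null - log_l_alt), 0.0)  # clip float noise
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# 500 days at 99% confidence: 5 breaches expected.
calibrated = [1] * 5 + [0] * 495
too_many = [1] * 15 + [0] * 485
too_few = [0] * 500

lr_ok, p_ok = kupiec_pof(calibrated, 0.99)
lr_many, p_many = kupiec_pof(too_many, 0.99)
lr_few, p_few = kupiec_pof(too_few, 0.99)
```

The test is two-sided: in this made-up sample both the 15-breach and the zero-breach sequences are rejected at the 5 percent level, while the 5-breach sequence is not. The order of the flags does not matter here, which is why the frequency test cannot see clustering.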

  3. Run the Christoffersen independence test

    Passing the frequency test is not enough, because breaches can occur at the right rate but bunch together in stressed periods. The Christoffersen test checks whether a breach today is independent of a breach yesterday, by comparing the breach rate on days that follow a breach with the rate on days that do not. If breaches cluster, the model has the right average but the wrong dynamics: it underestimates risk precisely when volatility spikes. Independence is the property that tells you whether the model holds up in the conditions where you most need it.

    Clustered breaches mean your VaR is calm until it is suddenly very wrong, all at once. That is the failure mode that causes blowups.
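
A hedged sketch of the independence test, using the usual first-order Markov formulation: count the four day-to-day transitions and compare a single breach probability against separate probabilities after a breach and after a non-breach. The function names and the two sample sequences are illustrative assumptions.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def christoffersen_independence(hits):
    """Christoffersen first-order independence test on 0/1 breach flags.

    Returns (lr_statistic, p_value) against a chi-square(1) reference.
    """
    if len(hits) < 2:
        raise ValueError("need at least two observations")
    # n[i][j] counts transitions from state i yesterday to state j today.
    n = [[0, 0], [0, 0]]
    for prev, curr in zip(hits[:-1], hits[1:]):
        n[prev][curr] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0  # breach after calm day
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0  # breach after breach
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)        # pooled breach rate
    log_l_null = _xlogy(n00 + n10, 1.0 - pi) + _xlogy(n01 + n11, pi)
    log_l_alt = (_xlogy(n00, 1.0 - pi01) + _xlogy(n01, pi01)
                 + _xlogy(n10, 1.0 - pi11) + _xlogy(n11, pi11))
    lr = max(-2.0 * (log_l_null - log_l_alt), 0.0)  # clip float noise
    return lr, math.erfc(math.sqrt(lr / 2.0))


# Two made-up 500-day sequences with the same breach count (5).
scattered = [0] * 500
for day in (50, 150, 250, 350, 450):
    scattered[day] = 1
clustered = [0] * 500
for day in range(200, 205):
    clustered[day] = 1

lr_s, p_s = christoffersen_independence(scattered)
lr_c, p_c = christoffersen_independence(clustered)
```

Both sequences have exactly the expected five breaches, so a frequency test treats them identically. Here the scattered sequence stays well below the 5 percent critical value of 3.84 and the clustered one lands far above it. Note that this formulation only looks one day back, so it can miss clustering that skips a day.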

  4. Combine frequency and independence

    The Christoffersen conditional-coverage test combines both properties into one test: correct breach frequency and independent breaches. Its likelihood-ratio statistic is the sum of the frequency and independence statistics, compared against a chi-square distribution with two degrees of freedom. A model must pass both to be trustworthy. Passing only frequency means the average is right but the timing is dangerous; passing only independence is meaningless if the rate is wrong. Read the combined test as the headline result, then use the individual tests to diagnose which property failed when it does.

    When the combined test fails, look at the components to localize the problem. Frequency failure is a scaling issue; independence failure is a dynamics issue, and they need different fixes.
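
One way to sketch the combined test is to compute both likelihood ratios and add them, then report all three p-values so a failure can be traced to its component. This is a self-contained illustration under the same assumptions as the individual tests; the function name, the returned dictionary keys, and the sample sequence are hypothetical.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _loglik(zeros, ones, prob):
    """Bernoulli log-likelihood of `ones` breaches and `zeros` non-breaches."""
    return _xlogy(zeros, 1.0 - prob) + _xlogy(ones, prob)


def conditional_coverage(hits, confidence):
    """Frequency, independence, and joint likelihood-ratio tests."""
    p = 1.0 - confidence
    n, x = len(hits), sum(hits)
    lr_pof = -2.0 * (_loglik(n - x, x, p) - _loglik(n - x, x, x / n))

    # c[i][j] counts transitions from state i yesterday to state j today.
    c = [[0, 0], [0, 0]]
    for prev, curr in zip(hits[:-1], hits[1:]):
        c[prev][curr] += 1
    pi01 = c[0][1] / sum(c[0]) if sum(c[0]) else 0.0
    pi11 = c[1][1] / sum(c[1]) if sum(c[1]) else 0.0
    zeros, ones = c[0][0] + c[1][0], c[0][1] + c[1][1]
    pi = ones / (zeros + ones)
    lr_ind = -2.0 * (_loglik(zeros, ones, pi)
                     - _loglik(c[0][0], c[0][1], pi01)
                     - _loglik(c[1][0], c[1][1], pi11))

    lr_pof, lr_ind = max(lr_pof, 0.0), max(lr_ind, 0.0)  # clip float noise
    lr_cc = lr_pof + lr_ind
    return {
        "lr_pof": lr_pof, "p_pof": math.erfc(math.sqrt(lr_pof / 2.0)),  # chi2(1)
        "lr_ind": lr_ind, "p_ind": math.erfc(math.sqrt(lr_ind / 2.0)),  # chi2(1)
        "lr_cc": lr_cc, "p_cc": math.exp(-lr_cc / 2.0),                 # chi2(2)
    }


# Made-up case: the right number of breaches (5 in 500), all in one week.
hits = [0] * 500
for day in range(200, 205):
    hits[day] = 1
result = conditional_coverage(hits, 0.99)
failed = [name for name in ("pof", "ind", "cc") if result[f"p_{name}"] < 0.05]
```

In this example `failed` contains the independence and combined tests but not the frequency test, which localizes the problem to dynamics rather than scaling.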

  5. Act on the diagnosis

    A frequency failure usually means recalibrating the VaR level or the distributional assumption. An independence failure usually means the model is not capturing volatility clustering, which calls for a model that updates its risk estimate as volatility changes rather than a static one. Either way, the backtest does not just pass or fail; it points at what to fix. Re-run the tests after the fix on fresh data to confirm the correction held.

    Independence failures point toward a volatility-aware model. A static VaR that ignores changing volatility will keep clustering breaches no matter how you rescale it.
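
To make "a model that updates its risk estimate as volatility changes" concrete, here is one possible sketch: a parametric VaR scaled by an exponentially weighted (EWMA) variance. The guide does not prescribe this model; the normal-distribution assumption, the decay factor `lam=0.94`, the seed variance, and the sample returns are all illustrative choices.

```python
from statistics import NormalDist


def ewma_var_forecasts(returns, init_variance, confidence=0.99, lam=0.94):
    """One-step-ahead VaR forecasts from an EWMA variance (assumes normal returns).

    Each forecast is made before that period's return is seen, so the
    resulting series is out-of-sample with respect to the return it is
    compared against. VaR is returned as a positive loss threshold.
    """
    z = NormalDist().inv_cdf(confidence)  # about 2.33 at 99%
    variance = init_variance              # seed from a prior window (assumed)
    forecasts = []
    for r in returns:
        forecasts.append(z * variance ** 0.5)
        # Update the variance only after the forecast has been recorded.
        variance = lam * variance + (1.0 - lam) * r * r
    return forecasts


# Hypothetical returns: a calm stretch, then a volatility spike.
returns = [0.01] * 5 + [0.05] * 5
forecasts = ewma_var_forecasts(returns, init_variance=0.0001)
print([round(v, 4) for v in forecasts])
```

The forecast stays flat through the calm stretch and steps up after each large return, which is the behavior a static VaR lacks. Whether it actually removes the clustering is an empirical question, so the independence test still needs to be re-run on fresh data.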

Common Mistakes

The misses that undo good inputs

1. Testing only breach frequency

A model can breach at exactly the right rate while bunching all the breaches into stressed periods. Frequency alone misses this, and clustered breaches are the failure mode that causes losses to arrive all at once.

2. Treating too few breaches as success

An overly conservative VaR that rarely breaches is not capturing the real distribution; it ties up capital and signals the model is mis-specified, which the frequency test correctly flags as a failure.

3. Backtesting VaR on in-sample data

Evaluating a VaR model on the data it was calibrated to overstates its accuracy. The breach behavior must be tested on out-of-sample forecasts to reflect how the model performs on unseen markets.

FAQ

Questions people ask next

The short answers readers usually want after the first pass.

Why does the independence check matter more than the frequency count?

Because a model with the right breach frequency but clustered breaches fails exactly when it matters. If breaches bunch during a volatility spike, the model is calm and reassuring right up until it is very wrong all at once, which is the dynamic behind risk-model blowups. A correct average breach rate offers little comfort if the breaches arrive together, so independence is the property that determines whether the model holds in stress.

Planning estimates only — not financial, tax, or investment advice.