What does the Kupiec test actually measure?

It measures whether the proportion of VaR breaches matches the rate implied by the confidence level. For a 99 percent VaR, roughly one percent of periods should breach. The Kupiec proportion-of-failures test compares the observed count to this expectation and returns a statistic and p-value for the null that the frequency is correct. It catches a mis-scaled model but says nothing about whether the breaches are independent over time.

How many observations do I need to backtest VaR?

Enough that the expected number of breaches gives the tests real power. At a 99 percent confidence level only about one percent of periods breach, so one year of daily data (roughly 250 observations) yields only two or three expected breaches. You typically need several years of daily history before the breach count is large enough to distinguish a good model from a bad one. With too few expected breaches, both the frequency and independence tests lose the ability to reject a faulty model.

My VaR passes frequency but fails independence. What should I change?

An independence failure means breaches cluster, which usually indicates the model is not capturing volatility clustering. The fix is typically a volatility-aware model that updates its risk estimate as conditions change, rather than a static VaR that assumes constant volatility. Rescaling the level will not help, because the problem is the dynamics, not the average. After switching to a conditional model, re-run both tests on fresh data to confirm the clustering is gone.

Risk & Portfolio Construction Guide

How to Backtest a Value-at-Risk Model

A value-at-risk number is a prediction: losses should exceed it only as often as the confidence level allows, and the breaches should be scattered, not bunched. Backtesting checks both. A VaR model that passes the frequency test but fails independence understates tail risk in stressed periods, which is the worst time to be wrong. This guide covers the two standard tests, what each catches, and why the independence check matters more than the frequency count.

8 MIN READ · Published May 26, 2026

By AI Fin Hub Research · AI Fin Hub Team

Before You Start

Set up the inputs that make the next steps easier

A series of VaR forecasts at a stated confidence level, one per period.

The realized portfolio returns or losses for the same periods.

Enough observations that the expected number of breaches is large enough to test meaningfully.
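As a rough check on the last point, the expected breach count is just the number of observations times the breach probability. The sketch below is illustrative only; the function name and the sample sizes are assumptions, not part of the guide.

```python
def expected_breaches(n_obs: int, confidence: float) -> float:
    """Expected number of VaR breaches over n_obs periods.

    The breach probability is one minus the VaR confidence level.
    """
    return n_obs * (1.0 - confidence)


# Illustrative sample sizes: roughly one, two, and four years of daily data.
for n_obs in (250, 500, 1000):
    print(n_obs, round(expected_breaches(n_obs, 0.99), 1))
```

With only a handful of expected breaches, a single extra breach moves the observed rate a lot, which is why short samples give the tests little power.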

Guide Steps

Move through it in order

Each step focuses on one decision so you can keep momentum without losing the thread.

1. Define the breach indicator

For each period, mark a breach when the realized loss exceeds the VaR forecast for that period. This produces a sequence of zeros and ones, the breach indicator, which is the raw material for every VaR backtest. The VaR confidence level sets the expected breach rate: a 99 percent one-day VaR should be breached on about one percent of days. Everything downstream tests whether this sequence behaves the way the model claims.

Use out-of-sample VaR forecasts. Backtesting a VaR model on the same data it was fit on overstates how well it works, just like any other backtest.
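A minimal sketch of the breach indicator, assuming losses and VaR forecasts are both expressed as positive loss magnitudes and are aligned period by period. The function name and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def breach_indicator(losses: Sequence[float],
                     var_forecasts: Sequence[float]) -> List[int]:
    """Return 1 for each period where the realized loss exceeds the VaR forecast.

    Assumption: both series are positive loss magnitudes (a gain is a
    negative loss) and var_forecasts[t] was produced before period t.
    """
    if len(losses) != len(var_forecasts):
        raise ValueError("losses and var_forecasts must have the same length")
    return [1 if loss > var else 0 for loss, var in zip(losses, var_forecasts)]


# Made-up numbers for illustration only.
losses = [0.4, 2.3, -0.1, 1.9, 3.1]
var_forecasts = [2.0, 2.0, 2.1, 2.1, 2.2]
hits = breach_indicator(losses, var_forecasts)
print(hits)  # [0, 1, 0, 0, 1]
```

If your data stores returns rather than losses, flip the sign before comparing, otherwise the indicator will flag gains instead of losses.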

2. Run the Kupiec frequency test

The Kupiec proportion-of-failures test checks whether the observed breach rate matches the expected rate implied by the confidence level. Too many breaches means the model understates risk; too few means it overstates risk and ties up capital needlessly. The test produces a statistic and a p-value for the null hypothesis that the breach frequency is correct. It is the first gate: a model that fails frequency is mis-scaled before you even look at timing.

Too few breaches is a real failure too, not a free pass. An overly conservative VaR wastes capital and signals the model is not capturing the actual distribution.
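The frequency test can be sketched as a likelihood-ratio comparison between the expected breach rate and the observed one, referred to a chi-square distribution with one degree of freedom. This is a standard-library sketch under that textbook formulation, not the implementation behind any particular tool; the function names are assumptions.

```python
import math
from typing import Sequence, Tuple


def _xlogy(x: float, y: float) -> float:
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_pof(hits: Sequence[int], confidence: float) -> Tuple[float, float]:
    """Kupiec proportion-of-failures test on a 0/1 breach sequence.

    Returns (LR statistic, p-value) for the null that the true breach
    probability equals 1 - confidence.
    """
    n = len(hits)
    if n == 0:
        raise ValueError("hits must not be empty")
    x = sum(hits)              # observed number of breaches
    p = 1.0 - confidence       # expected breach probability
    pi_hat = x / n             # observed breach rate
    log_lik_null = _xlogy(n - x, 1.0 - p) + _xlogy(x, p)
    log_lik_alt = _xlogy(n - x, 1.0 - pi_hat) + _xlogy(x, pi_hat)
    lr = max(-2.0 * (log_lik_null - log_lik_alt), 0.0)  # guard float noise
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Illustrative case: 25 breaches in 1000 days against a 99 percent VaR.
lr, p_value = kupiec_pof([1] * 25 + [0] * 975, confidence=0.99)
print(f"LR = {lr:.2f}, p-value = {p_value:.5f}")
```

The test is two-sided by construction: a breach count far below the expected number raises the statistic just as a count far above it does.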

3. Run the Christoffersen independence test

Passing the frequency test is not enough, because breaches can occur at the right rate but bunch together in stressed periods. The Christoffersen test checks whether a breach today is independent of a breach yesterday. If breaches cluster, the model has the right average but the wrong dynamics: it underestimates risk precisely when volatility spikes. Independence is the property that tells you the model holds up in the conditions you most need it to.

Clustered breaches mean your VaR is calm until it is suddenly very wrong, all at once. That is the failure mode that causes blowups.
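One way to sketch the independence check is to count transitions between consecutive periods and compare a model where the breach probability depends on yesterday's outcome against one where it does not. The code below follows that first-order Markov formulation using only the standard library; the function names and the two sample sequences are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def _xlogy(x: float, y: float) -> float:
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def christoffersen_independence(hits: Sequence[int]) -> Tuple[float, float]:
    """Likelihood-ratio test that today's breach is independent of yesterday's.

    Returns (LR statistic, p-value), chi-square with 1 degree of freedom.
    """
    if len(hits) < 2:
        raise ValueError("need at least two observations")
    # counts[i][j] = number of transitions from state i to state j.
    counts = [[0, 0], [0, 0]]
    for prev, cur in zip(hits[:-1], hits[1:]):
        counts[prev][cur] += 1
    n00, n01 = counts[0]
    n10, n11 = counts[1]
    # Breach probability after a quiet day, after a breach, and overall.
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    log_lik_null = _xlogy(n00 + n10, 1.0 - pi) + _xlogy(n01 + n11, pi)
    log_lik_alt = (_xlogy(n00, 1.0 - pi01) + _xlogy(n01, pi01)
                   + _xlogy(n10, 1.0 - pi11) + _xlogy(n11, pi11))
    lr = max(-2.0 * (log_lik_null - log_lik_alt), 0.0)  # guard float noise
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Same breach count (5 in 100), different timing.
scattered = [1 if i in (10, 30, 50, 70, 90) else 0 for i in range(100)]
clustered = [1 if 50 <= i < 55 else 0 for i in range(100)]
print(christoffersen_independence(scattered))
print(christoffersen_independence(clustered))
```

Both sequences have the same breach rate, so a frequency test cannot tell them apart; only the transition counts differ. Note that this version only looks one period back, so it would not flag clustering that shows up at longer lags.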

4. Combine frequency and independence

The Christoffersen conditional-coverage test combines both properties into one: correct breach frequency and independent breaches. A model must pass both to be trustworthy. Passing only frequency means the average is right but the timing is dangerous; passing only independence is meaningless if the rate is wrong. Read the combined test as the headline result, then use the individual tests to diagnose which property failed when it does.

When the combined test fails, look at the components to localize the problem. Frequency failure is a scaling issue; independence failure is a dynamics issue, and they need different fixes.
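The joint statistic is commonly computed as the sum of the frequency and independence likelihood ratios and compared against a chi-square distribution with two degrees of freedom. The sketch below assumes you already have the two component statistics; the function names, the 5 percent significance level, and the input values are illustrative assumptions.

```python
import math
from typing import Dict


def chi2_sf_1df(x: float) -> float:
    """Survival function of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x: float) -> float:
    """Survival function of chi-square with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def joint_backtest(lr_pof: float, lr_ind: float,
                   alpha: float = 0.05) -> Dict[str, object]:
    """Combine the frequency and independence statistics and localize failures."""
    lr_cc = lr_pof + lr_ind  # conditional-coverage statistic
    return {
        "lr_cc": lr_cc,
        "p_cc": chi2_sf_2df(lr_cc),
        "joint_pass": chi2_sf_2df(lr_cc) > alpha,
        "frequency_pass": chi2_sf_1df(lr_pof) > alpha,      # scaling
        "independence_pass": chi2_sf_1df(lr_ind) > alpha,   # dynamics
    }


# Illustrative inputs: frequency looks fine, independence does not.
result = joint_backtest(lr_pof=0.5, lr_ind=23.5)
print(result)
```

Reading the three flags together mirrors the step above: the joint flag is the headline, and the two component flags say whether the fix is a rescaling or a change in dynamics.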

5. Act on the diagnosis

A frequency failure usually means recalibrating the VaR level or the distributional assumption. An independence failure usually means the model is not capturing volatility clustering, which calls for a model that updates its risk estimate as volatility changes rather than a static one. Either way, the backtest does not just pass or fail; it points at what to fix. Re-run the tests after the fix on fresh data to confirm the correction held.

Independence failures point toward a volatility-aware model. A static VaR that ignores changing volatility will keep clustering breaches no matter how you rescale it.
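As one example of a volatility-aware forecast, the sketch below uses an exponentially weighted moving average (EWMA) of squared returns with a normal quantile. These are assumptions chosen for illustration, not the guide's prescribed model: the decay of 0.94, the zero-mean normal distribution, and the function name are all placeholders you would replace with your own choices.

```python
import math
from statistics import NormalDist
from typing import List, Sequence


def ewma_var_forecasts(returns: Sequence[float], init_var: float,
                       confidence: float = 0.99,
                       lam: float = 0.94) -> List[float]:
    """One-step-ahead VaR forecasts as positive loss magnitudes.

    Assumptions (illustrative): zero-mean normal returns and an EWMA
    variance with decay lam. init_var should come from a window that
    precedes the backtest, so every forecast stays out of sample.
    """
    z = NormalDist().inv_cdf(confidence)
    variance = init_var
    forecasts = []
    for r in returns:
        # Forecast for this period uses only information up to the previous one.
        forecasts.append(z * math.sqrt(variance))
        # Update the variance estimate after the return is observed.
        variance = lam * variance + (1.0 - lam) * r * r
    return forecasts


# Made-up series: a calm stretch followed by a volatile one.
returns = [0.01] * 5 + [-0.05] * 5
forecasts = ewma_var_forecasts(returns, init_var=0.01 ** 2)
print([round(f, 4) for f in forecasts])
```

The forecast rises once large returns start arriving, which is the behavior a static VaR lacks. Whether it rises fast enough is exactly what re-running the independence test on fresh data would tell you.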

Common Mistakes

The misses that undo good inputs

Testing only breach frequency

A model can breach at exactly the right rate while bunching all the breaches into stressed periods. Frequency alone misses this, and clustered breaches are the failure mode that causes losses to arrive all at once.

Treating too few breaches as success

An overly conservative VaR that rarely breaches is not capturing the real distribution; it ties up capital and signals the model is mis-specified, which the frequency test correctly flags as a failure.

Backtesting VaR on in-sample data

Evaluating a VaR model on the data it was calibrated to overstates its accuracy. The breach behavior must be tested on out-of-sample forecasts to reflect how the model performs on unseen markets.

FAQ

Questions people ask next

The short answers readers usually want after the first pass.

Why does independence matter more than breach frequency?

Because a model with the right breach frequency but clustered breaches fails exactly when it matters. If breaches bunch during a volatility spike, the model is calm and reassuring right up until it is very wrong all at once, which is the dynamic behind risk-model blowups. A correct average breach rate offers little comfort if the breaches arrive together, so independence is the property that determines whether the model holds in stress.

Sources & References

Techniques for Verifying the Accuracy of Risk Measurement Models — Paul H. Kupiec, Federal Reserve (1995)
Evaluating Interval Forecasts — Peter F. Christoffersen, International Economic Review (1998)
