VaR Backtest — Kupiec & Christoffersen
Paste P&L and VaR series, then run the Kupiec POF, Christoffersen independence, and joint conditional-coverage tests. Results are reported as likelihood-ratio statistics with χ² p-values.
- Inputs: Paste + configure
- Runtime: 1–15 s
- Privacy: Client-side · no upload
- API key: Not required
- Methodology: Open →
Inputs
VaR confidence level
Kupiec p-value: 0.034
36/500 exceptions (7.20%) vs expected 5.00%. p < 0.05 ⇒ reject the null hypothesis of correct unconditional coverage.
Test results
- Kupiec POF (LR_uc): LR = 4.51, p = 0.034
- Christoffersen Ind.: LR = 0.10, p = 0.754
- Joint Cond. Coverage: LR = 4.61, p = 0.100
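The Kupiec figures above can be reproduced by hand. The following is a minimal sketch, not the tool's own code: the function names `kupiec_lr` and `chi2_sf` are illustrative, and the p-values use the closed-form χ² tail probabilities for 1 and 2 degrees of freedom so that no external library is needed.

```python
import math

def xlogy(x, y):
    """Return x * ln(y), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)

def kupiec_lr(exceptions, n_obs, expected_rate):
    """Kupiec proportion-of-failures likelihood ratio (1 degree of freedom)."""
    observed_rate = exceptions / n_obs
    non_exceptions = n_obs - exceptions
    # Log-likelihood under the null (expected rate) and under the observed rate.
    log_l_null = xlogy(non_exceptions, 1 - expected_rate) + xlogy(exceptions, expected_rate)
    log_l_alt = xlogy(non_exceptions, 1 - observed_rate) + xlogy(exceptions, observed_rate)
    return -2.0 * (log_l_null - log_l_alt)

def chi2_sf(stat, df):
    """Chi-square upper-tail probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")

lr_uc = kupiec_lr(36, 500, 0.05)
print(round(lr_uc, 2), round(chi2_sf(lr_uc, 1), 3))  # 4.51 0.034
print(round(chi2_sf(4.61, 2), 3))                    # 0.1 (joint test, 2 degrees of freedom)
```

Note that the joint statistic, 4.61, is the sum of the two component statistics (4.51 + 0.10) and is compared against a χ² distribution with 2 degrees of freedom, which is why it can fail to reject at 5% even though the Kupiec test alone rejects.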
Interpretation
- Kupiec POF: tests whether the count of exceptions matches the rate expected at the chosen confidence level.
- Christoffersen independence: tests whether exceptions fall on consecutive days more often than chance allows.
- Joint LR: combines both, because a model can pass on count yet fail on clustering, or vice versa.
See methodology for likelihood derivations.
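For orientation, the three statistics are usually written in the following standard textbook forms (Kupiec 1995; Christoffersen 1998). This is a summary under stated notation, not a transcription of the methodology page: T is the number of observations, x the number of exceptions, p the expected exception rate, and n_ij the number of days in state j (1 = exception) that follow a day in state i.

```latex
\begin{aligned}
\hat{p} &= \frac{x}{T}, \qquad
LR_{uc} = -2 \ln \frac{(1-p)^{T-x}\, p^{x}}{(1-\hat{p})^{T-x}\, \hat{p}^{x}} \;\sim\; \chi^2(1) \\[4pt]
\pi_{01} &= \frac{n_{01}}{n_{00}+n_{01}}, \qquad
\pi_{11} = \frac{n_{11}}{n_{10}+n_{11}}, \qquad
\pi = \frac{n_{01}+n_{11}}{n_{00}+n_{01}+n_{10}+n_{11}} \\[4pt]
LR_{ind} &= -2 \ln \frac{(1-\pi)^{n_{00}+n_{10}}\, \pi^{\,n_{01}+n_{11}}}
{(1-\pi_{01})^{n_{00}}\, \pi_{01}^{\,n_{01}}\, (1-\pi_{11})^{n_{10}}\, \pi_{11}^{\,n_{11}}} \;\sim\; \chi^2(1) \\[4pt]
LR_{cc} &= LR_{uc} + LR_{ind} \;\sim\; \chi^2(2)
\end{aligned}
```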
How to use
Step-by-step
1. Paste your VaR forecasts and realized returns (daily granularity is standard).
2. Set the confidence level (95% or 99%) and the test window length (≥ 250 days for Basel-style validation).
3. Run Kupiec's POF test for unconditional coverage. Reject = wrong number of violations.
4. Run Christoffersen's independence test, which together with Kupiec gives the joint conditional-coverage test. Reject = violations cluster (the model fails during volatile regimes).
5. Read the green/yellow/red Basel zone classification. The yellow zone triggers capital multiplier increases; the red zone rejects the model.
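All of the tests in these steps operate on a 0/1 exception series derived from the two pasted inputs. A minimal sketch of that first step follows. It assumes VaR is quoted as a positive loss amount, so that an exception is a day on which P&L falls below `-VaR`; the tool's actual sign convention is not stated here, and the function name `exception_flags` is hypothetical.

```python
def exception_flags(pnl, var_forecasts):
    """Flag each day whose loss exceeded the VaR forecast (1 = exception).

    Assumption: VaR is a positive number, so a breach means pnl < -VaR.
    """
    if len(pnl) != len(var_forecasts):
        raise ValueError("P&L and VaR series must have the same length")
    return [1 if p < -v else 0 for p, v in zip(pnl, var_forecasts)]

# Illustrative data only: five days of P&L against a constant VaR of 2.0.
pnl = [0.4, -1.2, -2.5, 0.1, -3.1]
var_forecasts = [2.0] * 5
flags = exception_flags(pnl, var_forecasts)
print(flags, sum(flags))  # [0, 0, 1, 0, 1] 2
```

The sum of the flags feeds the Kupiec test and the Basel zone count, while the ordering of the flags is what the independence test examines.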
Questions people ask next
FAQ
What do the Kupiec and Christoffersen tests check?
Kupiec's POF test (1995) checks whether the realized rate of VaR violations matches the rate implied by the confidence level (5% for 95% VaR). If you ran 95% VaR and observed 12% violations, Kupiec rejects. Christoffersen (1998) extends this by also testing whether violations are independent over time: clustered violations fail the independence test even if the unconditional rate is correct.
Why do both tests matter?
A model can pass Kupiec (right number of violations on average) but fail Christoffersen (violations cluster during volatile periods). Clustered violations mean the model is mis-specified during regime changes. A VaR model should pass both tests before it is declared adequate.
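To make the distinction concrete, here is a minimal sketch that compares two made-up exception series with the same count (5 in 100 days), so a count-based test treats them identically while the independence statistic does not. The function name `christoffersen_lr_ind` and the data are illustrative assumptions; the statistic follows the standard first-order Markov form and is compared with 3.841, the 95th percentile of χ² with 1 degree of freedom.

```python
import math

def xlogy(x, y):
    """Return x * ln(y), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)

def christoffersen_lr_ind(hits):
    """Independence likelihood ratio from day-to-day transitions in a 0/1 series."""
    n = [[0, 0], [0, 0]]  # n[i][j]: days in state j that follow a day in state i
    for prev, cur in zip(hits, hits[1:]):
        n[prev][cur] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    log_l_null = xlogy(n00 + n10, 1 - pi) + xlogy(n01 + n11, pi)
    log_l_alt = (xlogy(n00, 1 - pi01) + xlogy(n01, pi01)
                 + xlogy(n10, 1 - pi11) + xlogy(n11, pi11))
    return -2.0 * (log_l_null - log_l_alt)

# Same number of exceptions, different timing (illustrative data).
spread = [1 if i in {10, 30, 50, 70, 90} else 0 for i in range(100)]
clustered = [1 if 40 <= i < 45 else 0 for i in range(100)]

lr_spread = christoffersen_lr_ind(spread)
lr_clustered = christoffersen_lr_ind(clustered)
print(lr_spread < 3.841, lr_clustered > 3.841)  # True True
```

Only the clustered series is rejected, even though both series have exactly the same exception rate.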
How many days of backtest do I need?
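Basel rules require a minimum of 250 trading days. Below that, test power is too low to distinguish a good model from a bad one, and the tool flags low-N estimates. For 95% VaR over 250 days the expected count is about 12.5 violations. Even at that length the Kupiec test at the 5% significance level accepts roughly 7 to 19 violations, so it reliably catches only large miscalibration; a model that is off by 30% can still pass.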
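One way to judge what a 250-day window can and cannot detect is to enumerate the violation counts that the Kupiec test does not reject. The sketch below does this for 95% VaR at the 5% significance level (critical value 3.841 for χ² with 1 degree of freedom). The function names are illustrative and this is not the tool's implementation.

```python
import math

CHI2_1DF_CRITICAL_95 = 3.841  # 95th percentile of chi-square with 1 degree of freedom

def xlogy(x, y):
    """Return x * ln(y), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)

def kupiec_lr(exceptions, n_obs, expected_rate):
    """Kupiec proportion-of-failures likelihood ratio."""
    observed_rate = exceptions / n_obs
    non_exceptions = n_obs - exceptions
    log_l_null = xlogy(non_exceptions, 1 - expected_rate) + xlogy(exceptions, expected_rate)
    log_l_alt = xlogy(non_exceptions, 1 - observed_rate) + xlogy(exceptions, observed_rate)
    return -2.0 * (log_l_null - log_l_alt)

def kupiec_acceptance_band(n_obs, expected_rate, critical=CHI2_1DF_CRITICAL_95):
    """Smallest and largest violation counts that the test does not reject."""
    accepted = [x for x in range(n_obs + 1)
                if kupiec_lr(x, n_obs, expected_rate) < critical]
    return min(accepted), max(accepted)

print(kupiec_acceptance_band(250, 0.05))  # (7, 19)
```

With an expected count of 12.5, any count from 7 to 19 passes, which shows how wide the band still is at 250 observations.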
What's the 'green/yellow/red' zone?
Basel Committee classification of a 99% VaR model based on observed violations over the most recent 250 trading days: green (0-4 violations, model OK), yellow (5-9, model questionable, capital multiplier increases), red (10+, model rejected). The tool shows the band, which is useful for checking compliance posture.
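The thresholds above map directly to a small lookup. This is a minimal sketch with a hypothetical function name, `basel_zone`, and it assumes the count comes from a 250-day window, since the cut-offs are defined for that window length.

```python
def basel_zone(violations):
    """Classify a 250-day violation count into the Basel traffic-light zones."""
    if violations < 0:
        raise ValueError("violation count cannot be negative")
    if violations <= 4:
        return "green"
    if violations <= 9:
        return "yellow"
    return "red"

for count in (3, 7, 12):
    print(count, basel_zone(count))
# 3 green
# 7 yellow
# 12 red
```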
Does it handle Conditional VaR (Expected Shortfall)?
Yes, in a separate mode. ES is harder to backtest than VaR because there's no exact small-sample distribution for the test statistic. The tool uses Acerbi-Szekely's joint test for ES backtesting. The methodology page documents the limitations.