Last updated 2026-04-20
How Calibration Dojo works
How the Calibration Dojo tool actually works — assumptions, algorithms, limitations.
Scoring
Brier score is the squared error between predicted probability and outcome, averaged across all answers:
Brier = (1/N) · Σᵢ (pᵢ − yᵢ)²
where N is the number of answers, pᵢ is the user's stated probability that statement i is true, and yᵢ ∈ {0, 1} is the actual outcome (1 if true, 0 if false). Perfect prediction → 0. Constant 50% on everything → 0.25. Scores above 0.25, i.e. worse than always answering 50%, are possible when the user is confident and wrong; the worst possible score is 1.
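The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code; the function name brier_score and the list-based inputs are assumptions made for the example.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between stated probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equally long")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical answer history: 1 = statement was true, 0 = false.
outcomes = [1, 0, 1, 1]
print(brier_score([1.0, 0.0, 1.0, 1.0], outcomes))  # perfect prediction: 0.0
print(brier_score([0.5, 0.5, 0.5, 0.5], outcomes))  # constant 50%: 0.25
print(brier_score([0.0, 1.0, 0.0, 0.0], outcomes))  # confidently wrong: 1.0
```

The three calls reproduce the reference points named above: 0 for perfect prediction, 0.25 for constant 50%, and 1 as the worst case.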
Reliability curve
Predictions are binned by stated probability into 10 bins (deciles).
For each bin, the X axis plots the average predicted probability
and the Y axis plots the fraction of statements that were actually true.
A perfectly calibrated forecaster's bins fall on the diagonal y = x.
Dot size on the chart is proportional to the number of answers in that decile.
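The binning described above could look roughly like the following. This sketch assumes equal-width bins (0 to 10%, 10 to 20%, and so on) with a stated probability of exactly 100% placed in the last bin; the text does not spell out the bin edges, and the names BinPoint and reliability_points are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class BinPoint(NamedTuple):
    mean_predicted: float  # X axis: average stated probability in the bin
    fraction_true: float   # Y axis: share of statements that were true
    count: int             # number of answers; drives dot size


def reliability_points(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[BinPoint]:
    """Group answers into equal-width probability bins and summarize each one."""
    buckets: List[list] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Assumption: p == 1.0 is clamped into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        buckets[idx].append((p, y))

    points = []
    for bucket in buckets:
        if not bucket:
            continue  # empty bins produce no dot
        n = len(bucket)
        mean_p = sum(p for p, _ in bucket) / n
        frac_true = sum(y for _, y in bucket) / n
        points.append(BinPoint(mean_p, frac_true, n))
    return points


for point in reliability_points([0.05, 0.05, 0.95, 1.0], [0, 0, 1, 1]):
    print(point)
```

A point lies on the diagonal when mean_predicted equals fraction_true; points below the diagonal at high stated probabilities indicate overconfidence.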
Question bank
20 evergreen binary statements covering physics, history, economics,
geography, technology, and generic finance concepts. Each question has
a public source and a short rationale, both revealed after the user
commits a probability. No real-time market questions or specific-ticker
prompts (for compliance reasons and to keep the focus on education).
Questions are drawn without replacement until all have been seen; after that, selection switches to random draws with replacement for continued practice.
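One way to picture this selection rule is a generator that first yields a shuffled pass over the whole bank and then falls back to independent random draws. This is an illustrative sketch only; question_stream and the string question IDs are assumptions, and the tool's real selection code may differ.

```python
import itertools
import random
from typing import Iterator, Optional, Sequence


def question_stream(
    bank: Sequence[str], rng: Optional[random.Random] = None
) -> Iterator[str]:
    """Yield every question once in random order, then sample with replacement."""
    if not bank:
        raise ValueError("question bank must not be empty")
    if rng is None:
        rng = random.Random()

    first_pass = list(bank)
    rng.shuffle(first_pass)
    yield from first_pass  # phase 1: without replacement

    while True:
        yield rng.choice(bank)  # phase 2: with replacement, repeats allowed


# Hypothetical 20-question bank identified by placeholder IDs.
bank = [f"q{i}" for i in range(20)]
session = list(itertools.islice(question_stream(bank, random.Random(0)), 30))
print(len(set(session[:20])))  # 20: the first pass covers the whole bank
```

Passing a seeded random.Random makes a session reproducible, which is convenient for testing; omitting it gives a fresh order each time.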
Privacy
All answer history is stored in your browser's localStorage under a
single key (aifinhub.calibration-dojo.v1). Nothing is transmitted
to any server. Clearing your browser data, or pressing the "Clear local
history" button, resets all progress. The site sets no tracking cookies.
Limitations
- Small question bank. With only 20 questions, early Brier scores have high variance. Answer at least 30–50 questions (including re-randomized repeats once the bank is exhausted) for a stable personal baseline.
- Domain coverage is deliberately generic, not finance-specific. For finance-specific calibration training, keep your own journal of real forecasts and their outcomes; the tool's pattern (probability → outcome → Brier) transfers directly.
- No overconfidence detection beyond the Brier score. Full reliability diagnostics (resolution, refinement, expected calibration error (ECE)) are not surfaced in the UI but can be derived from the stored answer log if you export it.
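As a rough illustration of deriving such diagnostics from an exported log, the sketch below computes ECE and the resolution term of Murphy's (1973) partition. It assumes the log has already been parsed into (stated probability, outcome) pairs and that equal-width bins are used; the export format, the pair layout, and all function names here are assumptions, not part of the tool.

```python
from typing import List, Sequence, Tuple

Answer = Tuple[float, int]  # (stated probability, outcome 0 or 1); assumed layout


def bin_answers(log: Sequence[Answer], n_bins: int = 10) -> List[List[Answer]]:
    """Split answers into equal-width probability bins, dropping empty bins."""
    bins: List[List[Answer]] = [[] for _ in range(n_bins)]
    for p, y in log:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [b for b in bins if b]


def expected_calibration_error(log: Sequence[Answer], n_bins: int = 10) -> float:
    """Count-weighted average gap between mean stated probability and hit rate."""
    total = len(log)
    ece = 0.0
    for b in bin_answers(log, n_bins):
        mean_p = sum(p for p, _ in b) / len(b)
        hit_rate = sum(y for _, y in b) / len(b)
        ece += len(b) / total * abs(mean_p - hit_rate)
    return ece


def resolution(log: Sequence[Answer], n_bins: int = 10) -> float:
    """Count-weighted squared distance of each bin's hit rate from the base rate."""
    total = len(log)
    base_rate = sum(y for _, y in log) / total
    return sum(
        len(b) / total * (sum(y for _, y in b) / len(b) - base_rate) ** 2
        for b in bin_answers(log, n_bins)
    )


# Hypothetical log: overconfident at 80%, slightly underconfident at 20%.
log = [(0.8, 1), (0.8, 1), (0.8, 0), (0.8, 0)] + [(0.2, 0)] * 4
print(expected_calibration_error(log), resolution(log))
```

Lower ECE means stated probabilities track observed frequencies more closely; higher resolution means the bins separate true from false statements better than the overall base rate does. Both functions expect a non-empty log.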
References
- Brier, G. W. (1950). "Verification of Forecasts Expressed in Terms of Probability." Monthly Weather Review 78(1).
- Murphy, A. H. (1973). "A New Vector Partition of the Probability Score." Journal of Applied Meteorology 12(4).
- Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction.