aifinhub

Methodology · Playground · Last updated 2026-04-20

How Calibration Dojo works

How the Calibration Dojo tool actually works — assumptions, algorithms, limitations.

Scoring

Brier score is the squared error between predicted probability and outcome, averaged across all answers:

Brier = (1/N) · Σᵢ (pᵢ − yᵢ)²

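where pᵢ is the user's stated probability that statement i is true and yᵢ ∈ {0, 1} is the actual outcome (1 if true, 0 if false). Perfect prediction → 0. Constant 50% on everything → 0.25. Scores worse than this 0.25 baseline are possible, up to a maximum of 1, if the user is systematically miscalibrated, for example by assigning high probabilities to statements that turn out to be false.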
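As an illustration of the formula above, here is a minimal Python sketch. The function name brier_score and the sample answers are assumptions made for this example; they are not taken from the tool's source code.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    if len(probs) == 0 or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equally long")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical outcomes for four statements (1 = true, 0 = false).
outcomes = [1, 0, 1, 1]

perfect = brier_score([1.0, 0.0, 1.0, 1.0], outcomes)        # always right, fully confident
hedged = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)         # constant 50% on everything
overconfident = brier_score([0.1, 0.9, 0.9, 0.2], outcomes)  # confident and mostly wrong

print(perfect, hedged, overconfident)
```

The constant-50% forecaster scores 0.25 regardless of the outcomes because (0.5 − y)² equals 0.25 for both y = 0 and y = 1. The third forecaster scores well above 0.25, which is the "worse than the baseline" case.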

Reliability curve

Predictions are grouped into 10 bins (deciles) by stated probability. For each bin, the x-axis shows the average predicted probability and the y-axis shows the fraction of statements that were actually true. A perfectly calibrated forecaster's bins fall on the diagonal y = x.

Dot size on the chart is proportional to the number of answers in that decile.
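The binning step can be sketched as follows. This assumes equal-width bins (0–10%, 10–20%, and so on) with a stated probability of exactly 100% placed in the top bin; the page does not specify the tool's edge handling, and the function name reliability_bins is hypothetical.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed frequency, count) per non-empty bin."""
    sum_p = [0.0] * n_bins
    sum_y = [0] * n_bins
    count = [0] * n_bins
    for p, y in zip(probs, outcomes):
        # Assumption: equal-width bins; p == 1.0 is clamped into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        sum_p[idx] += p
        sum_y[idx] += y
        count[idx] += 1
    return [
        (sum_p[i] / count[i], sum_y[i] / count[i], count[i])
        for i in range(n_bins)
        if count[i] > 0
    ]


# Hypothetical answer history: four high-confidence answers, two low-confidence ones.
points = reliability_bins(
    [0.92, 0.95, 0.98, 1.0, 0.15, 0.12],
    [1, 1, 0, 1, 0, 0],
)
for mean_p, observed, n in points:
    print(f"x={mean_p:.3f}  y={observed:.2f}  dot size ~ {n}")
```

Each tuple maps to one dot: the first value is the x coordinate, the second the y coordinate, and the count can drive the dot size. Empty bins are skipped rather than plotted at zero.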

Question bank

20 evergreen binary statements covering physics, history, economics, geography, technology, and generic finance concepts. Each question has a public source and a short rationale, both revealed after the user commits a probability. No real-time market questions or specific-ticker prompts, for compliance reasons and to keep the focus on education.

Questions are drawn without replacement until all have been seen; after that, selection reverts to random with replacement for continued practice.
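One way to implement this selection rule is sketched below. The function next_question, the use of integer question indices, and the seen set are assumptions made for illustration; the tool's actual implementation may differ.

```python
import random
from typing import Optional, Set


def next_question(
    bank_size: int, seen: Set[int], rng: Optional[random.Random] = None
) -> int:
    """Pick an unseen question index if any remain, otherwise any index at random."""
    rng = rng or random.Random()
    unseen = [i for i in range(bank_size) if i not in seen]
    if unseen:
        choice = rng.choice(unseen)  # phase 1: without replacement
        seen.add(choice)
        return choice
    return rng.randrange(bank_size)  # phase 2: with replacement


rng = random.Random(0)  # seeded only to make the example repeatable
seen: Set[int] = set()
first_pass = [next_question(20, seen, rng) for _ in range(20)]
extra = [next_question(20, seen, rng) for _ in range(30)]
```

In this sketch the first 20 draws cover every question exactly once, and later draws may repeat questions.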

Privacy

All answer history is stored in your browser's localStorage under a single key (aifinhub.calibration-dojo.v1). Nothing is transmitted to any server. Clearing your browser data, or pressing the "Clear local history" button, resets all progress. The site sets no tracking cookies.

Limitations

  1. Small question bank. With only 20 questions, early Brier scores have high variance. Answer at least 30–50 questions (including re-randomized repeats) for a stable personal baseline.
  2. Domain coverage is deliberately generic, not finance-specific. For finance-specific calibration training, use your own journal of real forecasts + outcomes; the tool's pattern (probability → outcome → Brier) transfers directly.
  3. No overconfidence diagnostics beyond the Brier score and the reliability curve. Full reliability diagnostics (resolution, refinement, ECE) are not surfaced in the UI but can be derived from the stored answer log if you export it.
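As a rough sketch of the diagnostics mentioned in the last item, the following Python computes expected calibration error (ECE) and the reliability, resolution, and uncertainty terms of Murphy's partition. It assumes the exported log has already been reduced to two parallel lists of probabilities and 0/1 outcomes (the export format is not documented here), and that bins are equal-width. The function name calibration_diagnostics is hypothetical.

```python
from typing import Dict, List, Sequence


def calibration_diagnostics(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> Dict[str, float]:
    """ECE plus the reliability, resolution, and uncertainty terms (binned)."""
    n = len(probs)
    if n == 0 or n != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equally long")
    base_rate = sum(outcomes) / n
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(probs):
        # Assumption: equal-width bins; p == 1.0 is clamped into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append(i)
    ece = reliability = resolution = 0.0
    for members in bins:
        if not members:
            continue
        weight = len(members) / n
        mean_p = sum(probs[i] for i in members) / len(members)
        freq = sum(outcomes[i] for i in members) / len(members)
        ece += weight * abs(mean_p - freq)            # average calibration gap
        reliability += weight * (mean_p - freq) ** 2  # lower is better
        resolution += weight * (freq - base_rate) ** 2  # higher is better
    return {
        "ece": ece,
        "reliability": reliability,
        "resolution": resolution,
        "uncertainty": base_rate * (1 - base_rate),
    }


# Hypothetical log: four answers at 85% (three true), four at 25% (one true).
probs = [0.85, 0.85, 0.85, 0.85, 0.25, 0.25, 0.25, 0.25]
outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
diag = calibration_diagnostics(probs, outcomes)
brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
print(diag, brier)
```

When every answer in a bin uses the same stated probability, as in this example, the Brier score equals reliability − resolution + uncertainty. With mixed probabilities inside a bin the identity holds only approximately.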

References

  • Brier, G. W. (1950). "Verification of Forecasts Expressed in Terms of Probability." Monthly Weather Review 78(1).
  • Murphy, A. H. (1973). "A New Vector Partition of the Probability Score." Journal of Applied Meteorology 12(4).
  • Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction.
Planning estimates only — not financial, tax, or investment advice.