aifinhub

Calibration Dojo

Train probabilistic intuition. Answer binary forecasting questions at any confidence level and track your Brier score and reliability curve over time. Browser-only. Free.

Inputs: Paste + configure
Runtime: 1–15 s
Privacy: Client-side · no upload
API key: Not required
Methodology: Open →

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results.
Editorial standards · Sponsor disclosure · Corrections

Brier score

Lower is better · perfect foresight (every forecast at 0% or 100% and correct) = 0.000 · always answering 50% = 0.250

1 · Read the statement

Reliability curve

Dot size ∝ number of answers in that decile. Diagonal = perfect calibration.

History persists in your browser's localStorage only.

How to use

Step-by-step

Full calculator guide →
  1. Make probabilistic forecasts on questions with eventual ground truth (binary outcomes work best for entry-level practice).

  2. Resolve forecasts as outcomes become known. The dojo computes Brier score, log score, calibration curve, and resolution.

  3. Read the calibration plot: do your 70%-confident forecasts hit at 70%? If they hit at 50%, you're overconfident.

  4. Compare to the base-rate benchmark. Beating the always-50% score of 0.250 is necessary but not sufficient; the gap between your score and the base-rate benchmark is your skill.

  5. Make at least 50 forecasts before drawing conclusions. Calibration estimates are noisy below that.
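The calibration plot from step 3 can be sketched in a few lines. This is a minimal illustration, not the dojo's engine: the `{ p, outcome }` record shape, the function name `reliabilityCurve`, and the sample data are all assumptions made for the example.

```javascript
// Group resolved forecasts into ten probability bins (deciles) and compare
// the mean stated probability in each bin with the observed hit rate.
// Assumed record shape: p is the stated probability in [0, 1],
// outcome is 1 if the event happened and 0 if it did not.
function reliabilityCurve(forecasts, binCount = 10) {
  const bins = Array.from({ length: binCount }, () => ({ n: 0, sumP: 0, hits: 0 }));
  for (const { p, outcome } of forecasts) {
    // The tiny epsilon guards against floating-point rounding at bin edges;
    // Math.min clamps p = 1 into the last bin.
    const i = Math.min(binCount - 1, Math.floor(p * binCount + 1e-9));
    bins[i].n += 1;
    bins[i].sumP += p;
    bins[i].hits += outcome;
  }
  // Keep only bins that contain at least one answer.
  return bins
    .map((b, i) => ({
      bin: i,
      n: b.n,
      meanForecast: b.n ? b.sumP / b.n : null,
      hitRate: b.n ? b.hits / b.n : null,
    }))
    .filter((b) => b.n > 0);
}

// Made-up sample: four forecasts in the 70-80% bin, two of which came true.
const calibrationSample = [
  { p: 0.72, outcome: 1 },
  { p: 0.74, outcome: 0 },
  { p: 0.76, outcome: 1 },
  { p: 0.78, outcome: 0 },
  { p: 0.25, outcome: 0 },
];
const calibrationCurve = reliabilityCurve(calibrationSample);
console.log(calibrationCurve);
```

In this sample the 70-80% bin has a mean forecast near 0.75 but a hit rate of 0.5, which is the overconfident pattern step 3 describes. With only four answers in the bin, that reading is far too noisy to act on.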

For agents

Use in an agent

Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.

import { compute } from "https://aifinhub.io/engines/calibration-dojo.js";

Contract: /contracts/calibration-dojo.json
Full agent guide →

Questions people ask next

FAQ

What's calibration in this context?

How well a forecaster's stated confidence matches the empirical hit rate. Forecasts made at 70% confidence that come true 70% of the time are well-calibrated; forecasts made at 90% confidence that come true 60% of the time are overconfident. The dojo lets you self-train calibration using forecast feedback loops.

How is the Brier score calculated?

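Brier score is the mean squared error between the probability forecast and the outcome, with the outcome coded as 1 if the event happened and 0 if it did not. For a single forecast: (probability − outcome)². For a series: the average across forecasts. Lower is better. A forecaster who always answers 50% scores exactly 0.25 whatever the outcomes; a well-calibrated forecaster who also discriminates between likely and unlikely events scores below that, and how far below depends on how predictable the questions are.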
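As a minimal sketch of that formula (the `{ p, outcome }` record shape and the function name `brierScore` are assumptions for illustration, not the dojo's published contract):

```javascript
// Brier score: mean of (probability - outcome)^2 over resolved forecasts.
// Assumed record shape: p in [0, 1], outcome is 1 (happened) or 0 (did not).
function brierScore(forecasts) {
  if (forecasts.length === 0) return NaN; // nothing resolved yet
  let sum = 0;
  for (const { p, outcome } of forecasts) {
    sum += (p - outcome) ** 2;
  }
  return sum / forecasts.length;
}

// Always answering 50% scores 0.25 regardless of what happens.
const alwaysFifty = [
  { p: 0.5, outcome: 1 },
  { p: 0.5, outcome: 0 },
  { p: 0.5, outcome: 0 },
  { p: 0.5, outcome: 1 },
];
console.log(brierScore(alwaysFifty)); // 0.25

// Made-up sharper forecasts: three good calls and one miss at 30%.
const sharperForecasts = [
  { p: 0.9, outcome: 1 },
  { p: 0.8, outcome: 1 },
  { p: 0.2, outcome: 0 },
  { p: 0.3, outcome: 1 },
];
console.log(brierScore(sharperForecasts)); // about 0.145
```

The single miss at 30% contributes 0.49 of the 0.58 total squared error, which shows how heavily the score penalizes confident forecasts that turn out wrong.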

What's the difference between calibration and resolution?

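Calibration is whether your stated probabilities match observed frequencies. Resolution is whether you give different probabilities to questions that turn out differently (vs. the same number on everything). A forecaster who always states the base rate (50% when half the events occur) is perfectly calibrated but has zero resolution, so their forecasts are useless. Brier score combines both: it decomposes into a calibration error term minus a resolution term plus the irreducible uncertainty of the outcomes.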
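One standard way to make "combines both" concrete is the Murphy decomposition, Brier = reliability − resolution + uncertainty, where the reliability term is the calibration error (0 is perfect). The sketch below groups forecasts by their exact stated probability so the identity holds exactly. The function names, the record shape, and the sample data are assumptions for illustration; the dojo's own binning may differ.

```javascript
// Murphy decomposition of the Brier score for binary outcomes.
// Assumed record shape: p in [0, 1], outcome is 1 or 0.
function murphyDecomposition(forecasts) {
  const n = forecasts.length;
  if (n === 0) return null;
  const baseRate = forecasts.reduce((s, f) => s + f.outcome, 0) / n;

  // Group by the exact stated probability.
  const groups = new Map();
  for (const { p, outcome } of forecasts) {
    const g = groups.get(p) ?? { n: 0, hits: 0 };
    g.n += 1;
    g.hits += outcome;
    groups.set(p, g);
  }

  let reliability = 0; // calibration error: stated probability vs. hit rate
  let resolution = 0;  // how far group hit rates move away from the base rate
  for (const [p, g] of groups) {
    const hitRate = g.hits / g.n;
    reliability += g.n * (p - hitRate) ** 2;
    resolution += g.n * (hitRate - baseRate) ** 2;
  }
  reliability /= n;
  resolution /= n;
  const uncertainty = baseRate * (1 - baseRate);
  return { reliability, resolution, uncertainty, brier: reliability - resolution + uncertainty };
}

// Hypothetical helper: `hits` forecasts at p that came true, `misses` that did not.
function resolvedAt(p, hits, misses) {
  return [
    ...Array.from({ length: hits }, () => ({ p, outcome: 1 })),
    ...Array.from({ length: misses }, () => ({ p, outcome: 0 })),
  ];
}

// Always states the 30% base rate: calibrated, but zero resolution.
const baseRateOnly = murphyDecomposition(resolvedAt(0.3, 3, 7));
// Splits questions into 80% and 20% groups that hit at 80% and 20%.
const discriminating = murphyDecomposition([...resolvedAt(0.8, 4, 1), ...resolvedAt(0.2, 1, 4)]);
console.log(baseRateOnly, discriminating);
```

Both made-up forecasters have a reliability term of 0, so both are perfectly calibrated on this data. Only the second earns a resolution term (about 0.09), which pulls its Brier score down to about 0.16 from an uncertainty of 0.25.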

How many forecasts before I get a stable calibration estimate?

At least 50, ideally 100+. The dojo shows the calibration curve (predicted probability vs. observed frequency in bins), and it's noisy below 50 forecasts: spread across ten decile bins, 50 answers leave only about five per bin on average. The methodology page documents the bootstrap CI on the curve.
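To get a feel for how noisy a single bin is, the sketch below computes a Wilson score interval for a bin's hit rate. This is a swapped-in closed-form technique chosen because it is deterministic and short; it is not the bootstrap CI that the methodology page documents, and the function name `wilsonInterval` is an assumption for illustration.

```javascript
// Wilson score interval for a binomial proportion.
// z = 1.96 corresponds to roughly 95% coverage.
function wilsonInterval(hits, n, z = 1.96) {
  if (n === 0) return { low: 0, high: 1 }; // no data: anything is possible
  const pHat = hits / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (pHat + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((pHat * (1 - pHat)) / n + z2 / (4 * n * n));
  return { low: center - half, high: center + half };
}

// Same observed 60% hit rate, very different certainty.
const fiveAnswerBin = wilsonInterval(3, 5);    // roughly 0.23 to 0.88
const fiftyAnswerBin = wilsonInterval(30, 50); // roughly 0.46 to 0.72
console.log(fiveAnswerBin, fiftyAnswerBin);
```

A bin with five answers cannot distinguish a 30% forecaster from an 85% one, which is why a curve built from fewer than 50 forecasts in total should be read loosely.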

Why does a low Brier score not always mean I'm right?

A Brier score is meaningful only relative to a benchmark. Forecasting 'AAPL up tomorrow' at 50% gives Brier 0.25 with zero skill. The dojo always reports your score against the benchmark of forecasting the base rate, so you can see whether your work is adding signal or just hitting the average.
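A common way to express "score against the base-rate benchmark" is the Brier skill score, 1 − Brier / benchmark. The sketch below takes the benchmark to be the score of always forecasting the base rate observed in the same set of questions. That choice, the function name `brierSkillScore`, and the sample data are assumptions for illustration; the dojo's exact benchmark definition lives on its methodology page.

```javascript
// Brier skill score relative to always forecasting the observed base rate.
// Assumed record shape: p in [0, 1], outcome is 1 or 0.
function brierSkillScore(forecasts) {
  const n = forecasts.length;
  if (n === 0) return null;
  const baseRate = forecasts.reduce((s, f) => s + f.outcome, 0) / n;
  let brier = 0;
  let benchmark = 0;
  for (const { p, outcome } of forecasts) {
    brier += (p - outcome) ** 2;
    benchmark += (baseRate - outcome) ** 2;
  }
  brier /= n;
  benchmark /= n;
  // If every outcome is identical the benchmark is 0 and skill is undefined.
  const skill = benchmark === 0 ? NaN : 1 - brier / benchmark;
  return { brier, benchmark, skill };
}

// Made-up rare-event set: ten questions, only the last one happens.
const rareOutcomes = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1];

// Forecaster A answers 10% on everything.
const flatTen = brierSkillScore(rareOutcomes.map((outcome) => ({ p: 0.1, outcome })));
// Forecaster B answers 5% on the misses and 60% on the one that happens.
const informed = brierSkillScore(
  rareOutcomes.map((outcome) => ({ p: outcome === 1 ? 0.6 : 0.05, outcome }))
);
console.log(flatTen, informed);
```

Forecaster A's Brier score of about 0.09 looks far better than 0.25, yet its skill is 0 because it only restates the base rate. Forecaster B scores about 0.018, which is a skill of roughly 0.80 against the same benchmark.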

Planning estimates only — not financial, tax, or investment advice.