
How to use Forecast Scoring Sandbox

Paste a forecast stream of (probability, outcome) pairs. The page computes Brier score with full decomposition, log loss, reliability diagram, and bootstrap confidence intervals — the diagnostics for whether a model is calibrated, lucky, or both.

By Orbyd Editorial · AI Fin Hub Team

What It Does

Use the calculator with intent


Built for quants and PMs running forecasting models who need to score them on calibration, not just hit rate, and who need bootstrap confidence intervals to tell skill from luck.

Interpreting Results

Brier decomposition is the headline: reliability (calibration error), resolution (informativeness), and uncertainty (base-rate variance) are the three components, with Brier = reliability − resolution + uncertainty. A high Brier score is bad; check whether poor reliability or weak resolution is the bigger contributor.
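As a sketch of how the decomposition works, here is a minimal binned Murphy decomposition in Python. It assumes binary outcomes; the 10 equal-width bins are an illustrative choice, not necessarily the page's exact binning, and with continuous forecast values the binned identity is approximate.

```python
import numpy as np

def brier_decomposition(p, y, n_bins=10):
    """Murphy decomposition: Brier = reliability - resolution + uncertainty.
    p: forecast probabilities in [0, 1]; y: binary outcomes in {0, 1}.
    Lower Brier is better; low reliability and high resolution are good."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    n = len(p)
    base_rate = y.mean()
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    reliability = resolution = 0.0
    for b in range(n_bins):
        mask = bins == b
        n_b = mask.sum()
        if n_b == 0:
            continue
        p_b = p[mask].mean()   # mean forecast in this bin
        o_b = y[mask].mean()   # observed frequency in this bin
        reliability += n_b / n * (p_b - o_b) ** 2
        resolution += n_b / n * (o_b - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    brier = float(np.mean((p - y) ** 2))
    return brier, reliability, resolution, uncertainty
```

A perfectly sharp, perfectly calibrated stream (forecasting 1 for every hit and 0 for every miss) gives reliability 0 and resolution equal to uncertainty, so the Brier score nets out to 0.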

Input Steps

Field by field

  1. Upload data. Upload or enter probabilistic forecasts paired with realized outcomes (binary, categorical, or continuous).

  2. Pick a scoring rule. Choose Brier (squared error), log (penalizes overconfidence harshly), or CRPS (continuous distributions).

  3. Read the aggregate score. Compare it against the base-rate benchmark; beating the benchmark indicates skill.

  4. Read the calibration plot. It shows where you are miscalibrated (overconfident or underconfident at specific probability ranges).

  5. Run enough forecasts. Score at least 50 forecasts before drawing conclusions; below 50, results are dominated by sample noise.
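The scoring and benchmark steps above can be sketched in a few lines of Python. This is a minimal version for binary outcomes; the data values are made up for illustration, and the skill measure shown is the standard Brier skill score against a base-rate (climatology) benchmark.

```python
import numpy as np

def brier(p, y):
    """Mean squared error between forecast probabilities and outcomes."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    return float(np.mean((p - y) ** 2))

def log_loss(p, y, eps=1e-12):
    """Negative mean log-likelihood; harsh on confident misses."""
    p = np.clip(np.asarray(p, float), eps, 1 - eps)
    y = np.asarray(y, float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Illustrative forecast stream (probabilities) and realized outcomes.
y = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])
p_model = np.array([.8, .2, .3, .7, .6, .1, .9, .4, .2, .3])

# Base-rate benchmark: always forecast the historical frequency.
p_bench = np.full_like(p_model, y.mean())

# Brier skill score: > 0 means beating the base-rate benchmark.
skill = 1 - brier(p_model, y) / brier(p_bench, y)
```

Here the model's Brier score (0.073) is well below the benchmark's (0.24), so the skill score is positive.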

Common Scenarios

Use realistic starting points

Earnings beat/miss predictions (100 predictions, binary outcomes)

Resolution is usually the weak component: the model is roughly calibrated but not very informative. Work on resolution before fine-tuning calibration.

Macro event predictions (50 predictions, binary outcomes)

The sample is too small to distinguish skill from luck, and the bootstrap CI on the Brier score is correspondingly wide. Add more predictions before claiming an edge.
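To see why 50 forecasts leave a wide interval, here is a minimal percentile-bootstrap CI on the Brier score in Python. The 2,000 resamples, 95% level, and fixed seed are illustrative defaults, not necessarily what the sandbox uses.

```python
import numpy as np

def bootstrap_brier_ci(p, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the Brier score.
    Resamples (forecast, outcome) pairs with replacement."""
    rng = np.random.default_rng(seed)
    p, y = np.asarray(p, float), np.asarray(y, float)
    n = len(p)
    scores = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)   # resample pairs with replacement
        scores[i] = np.mean((p[idx] - y[idx]) ** 2)
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

If the interval for your model overlaps the interval you would get from the base-rate benchmark, you cannot yet claim skill.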


FAQ

Questions people ask next

The short answers readers usually want after the first pass.

Which scoring rules does the sandbox support?

Brier score (squared error), log score (log-likelihood penalty for overconfidence), and the continuous ranked probability score (CRPS, for full distributional forecasts). All three are proper scoring rules: they reward reporting your honest, calibrated probabilities.
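To see how differently the rules penalize overconfidence, compare one confident miss: a made-up forecast of 0.99 on an event that did not happen.

```python
import math

# One confident wrong forecast: p = 0.99, outcome y = 0.
p, y = 0.99, 0

# Brier penalty is bounded: it can never exceed 1.
brier_penalty = (p - y) ** 2                              # 0.9801

# Log penalty is unbounded as p -> 1 on a miss.
log_penalty = -math.log(1 - p) if y == 0 else -math.log(p)  # ~4.61
```

This is why the guide describes the log rule as penalizing overconfidence harshly: pushing the forecast from 0.99 toward 1 barely moves the Brier penalty but sends the log penalty toward infinity.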


Planning estimates only — not financial, tax, or investment advice.