Forecast Scoring Sandbox

Paste a CSV of predictions to get the Brier score, log loss, Murphy decomposition, bootstrap CIs, and a reliability diagram.

Inputs: Paste + configure
Runtime: 1–15 s
Privacy: Client-side · no upload
API key: Not required
Methodology: Open →

1 · Configure — paste or upload a forecast stream

Parsed 100 rows

Bootstrap resamples (B): used for the 95% CI on every metric.

Bins (K): bin count for the decomposition and the reliability diagram.

Prediction range

0.071 – 0.947

Base rate (observed)

46.0%

2 · Results — scores, decomposition, 95% bootstrap CIs

Brier score

0.2084

95% CI [0.1698, 0.2521]

Lower is better · 0 = perfect

Log loss

0.6072

95% CI [0.5063, 0.7071]

Natural log · lower is better

Forecasts

100

Bins populated: 10/10

Uncertainty

0.2484

ȳ(1 − ȳ), where ȳ is the observed base rate: the Brier score of always forecasting the base rate

Component                Value    95% CI             Reading
Reliability              0.0090   [0.0085, 0.0556]   Lower = better calibrated
Resolution               0.0494   [0.0320, 0.1033]   Higher = more discriminating
Uncertainty              0.2484                      Irreducible from base rate
Check: rel − res + unc   0.2080                      Should equal Brier (0.2084)

BS = reliability − resolution + uncertainty (Murphy, 1973)
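As a rough cross-check, the three components can be recomputed from the per-bin values listed under the reliability diagram in section 3. The sketch below is not the sandbox's own code; the name `murphy_from_bins` is hypothetical, and the inputs are the rounded per-bin figures shown on this page, so the results should match the table only to about three decimal places.

```python
# Per-bin summary as displayed in the reliability diagram:
# (n_k, mean predicted probability, observed frequency)
BINS = [
    (6, 0.081, 0.167), (10, 0.140, 0.200), (14, 0.245, 0.357),
    (11, 0.353, 0.364), (15, 0.440, 0.267), (11, 0.551, 0.545),
    (14, 0.658, 0.643), (8, 0.759, 0.750), (6, 0.831, 0.667),
    (5, 0.935, 1.000),
]


def murphy_from_bins(bins):
    """Return (reliability, resolution, uncertainty) from per-bin summaries.

    Hypothetical helper for illustration; it works from rounded display
    values, not from the raw forecast stream.
    """
    total = sum(n for n, _, _ in bins)
    # Base rate recovered as the count-weighted mean of observed frequencies.
    base_rate = sum(n * obs for n, _, obs in bins) / total
    reliability = sum(n * (pred - obs) ** 2 for n, pred, obs in bins) / total
    resolution = sum(n * (obs - base_rate) ** 2 for n, _, obs in bins) / total
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


rel, res, unc = murphy_from_bins(BINS)
print(f"reliability={rel:.4f} resolution={res:.4f} uncertainty={unc:.4f}")
print(f"rel - res + unc = {rel - res + unc:.4f}")
```

The recombined value lands near the table's check row rather than exactly on the Brier score, because binning discards the spread of forecasts inside each bin.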

3 · Reliability diagram

Figure: reliability diagram. X axis: predicted probability (0.00 to 1.00). Y axis: observed frequency (0.00 to 1.00). Legend: populated bin (n ≥ 5); sparse bin (low weight).

Bin         n    Mean pred   Observed
0.00–0.10   6    0.081       0.167
0.10–0.20   10   0.140       0.200
0.20–0.30   14   0.245       0.357
0.30–0.40   11   0.353       0.364
0.40–0.50   15   0.440       0.267
0.50–0.60   11   0.551       0.545
0.60–0.70   14   0.658       0.643
0.70–0.80   8    0.759       0.750
0.80–0.90   6    0.831       0.667
0.90–1.00   5    0.935       1.000

X axis: predicted probability bin. Y axis: observed frequency of outcome=1 in that bin. Dots on the diagonal = perfectly calibrated. Dots below the diagonal = overconfident (predictions exceed observed). Dots above = underconfident. Sparse bins are drawn lighter: with few observations, their y position is noisy and the deviation from the diagonal is not informative on its own.
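The points in the diagram come from bucketing forecasts by predicted probability. A minimal sketch of that bucketing follows, assuming equal-width bins and the n ≥ 5 threshold from the legend. The function name `bin_forecasts`, the dictionary keys, and the choice to place a prediction of exactly 1.0 in the last bin are assumptions for illustration, not confirmed details of the sandbox.

```python
def bin_forecasts(preds, outcomes, k=10, min_n=5):
    """Group forecasts into k equal-width bins over [0, 1].

    Returns one dict per non-empty bin with its count, mean prediction,
    observed frequency, and a 'sparse' flag for bins with n < min_n.
    """
    # Per-bin accumulators: [count, sum of predictions, sum of outcomes]
    acc = [[0, 0.0, 0] for _ in range(k)]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * k), k - 1)  # assumption: p == 1.0 goes in the last bin
        acc[idx][0] += 1
        acc[idx][1] += p
        acc[idx][2] += y

    rows = []
    for i, (n, sum_p, sum_y) in enumerate(acc):
        if n == 0:
            continue  # empty bins produce no point on the diagram
        rows.append({
            "lo": i / k,
            "hi": (i + 1) / k,
            "n": n,
            "mean_pred": sum_p / n,
            "observed": sum_y / n,
            "sparse": n < min_n,
        })
    return rows


if __name__ == "__main__":
    demo = bin_forecasts([0.05, 0.08, 0.95, 1.0], [0, 1, 1, 1])
    for row in demo:
        print(row)
```

Each row maps to one dot: `mean_pred` is the x position, `observed` is the y position, and `sparse` marks the dots that would be drawn lighter.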

Formulas

Brier(p, y)     = mean( (p_i - y_i)^2 )
LogLoss(p, y)   = -mean( y_i·log(p_i) + (1-y_i)·log(1-p_i) )

Bin forecasts into K equal-width buckets of predicted probability.
For bin k with n_k observations, mean pred p̄_k, observed freq ō_k:

reliability  = Σ_k (n_k / N) · (p̄_k − ō_k)²
resolution   = Σ_k (n_k / N) · (ō_k − ȳ)²
uncertainty  = ȳ · (1 − ȳ)             where ȳ = mean(y_i)

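Identity:   Brier = reliability − resolution + uncertainty
            (exact when every forecast in a bin equals p̄_k; otherwise
            approximate, because the spread of forecasts inside a bin is ignored)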

95% CI:  bootstrap-percentile over B resamples with replacement.
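The two scores and the percentile bootstrap above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the sandbox's implementation: the function names, the clipping constant `eps` (added so that log(0) cannot occur), the default `b=1000`, the fixed seed, and the simple index-based percentile rule are all choices made for this example.

```python
import math
import random


def brier(preds, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


def log_loss(preds, outcomes, eps=1e-12):
    """Mean negative log-likelihood, natural log.

    Predictions are clipped to [eps, 1 - eps]; the clipping is an
    assumption of this sketch.
    """
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(preds)


def bootstrap_ci(metric, preds, outcomes, b=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample (p, y) pairs with replacement b times."""
    rng = random.Random(seed)
    n = len(preds)
    stats = []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([preds[i] for i in idx],
                            [outcomes[i] for i in idx]))
    stats.sort()
    # Simple index-based percentiles of the sorted resampled statistics.
    lo = stats[int((alpha / 2) * (b - 1))]
    hi = stats[int((1 - alpha / 2) * (b - 1))]
    return lo, hi


if __name__ == "__main__":
    p = [0.9, 0.1, 0.8, 0.3]
    y = [1, 0, 1, 0]
    print("Brier:", brier(p, y))
    print("Log loss:", log_loss(p, y))
    print("95% CI for Brier:", bootstrap_ci(brier, p, y))
```

Resampling whole (prediction, outcome) pairs keeps each forecast attached to its outcome, which is what lets the same routine produce an interval for any metric passed in as `metric`.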

See the methodology page for derivations, references, and why sparse bins are shown lighter.

Planning estimates only — not financial, tax, or investment advice.