Why can log loss be infinite?

Log loss takes the negative logarithm of the probability assigned to the outcome that occurred. If the model assigns probability zero to an event that then happens, the log of zero is negative infinity, so the penalty is infinite. This is mathematically defensible, since claiming an event is impossible and being wrong is the worst possible forecast, but in practice it means you must clip predicted probabilities into a range such as epsilon to one minus epsilon for some tiny epsilon; otherwise a single such case makes the whole average infinite.
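A minimal sketch of the clipping step, assuming binary outcomes coded as 0 or 1. The function names and the default `eps=1e-15` are illustrative choices, not values taken from any particular library.

```python
import math

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood for binary outcomes, with clipping.

    eps is an assumed clipping threshold; pick it to suit your data.
    """
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # keep p inside [eps, 1 - eps]
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def unclipped_penalty(p, y):
    """Penalty for one forecast with no clipping; infinite when the
    probability assigned to the realized outcome is exactly zero."""
    p_outcome = p if y == 1 else 1.0 - p
    return math.inf if p_outcome == 0.0 else -math.log(p_outcome)

probs = [0.9, 0.8, 0.0]   # the last forecast says "impossible"
outcomes = [1, 1, 1]      # ...and the event happens anyway

raw_mean = sum(unclipped_penalty(p, y) for p, y in zip(probs, outcomes)) / len(probs)
clipped_mean = log_loss(probs, outcomes)

print(raw_mean)      # one zero-probability miss makes the average infinite
print(clipped_mean)  # finite, but still dominated by the clipped forecast
```

Clipping keeps the average finite, but the clipped case still dominates it, so the chosen epsilon effectively sets the maximum penalty.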

Can the Brier score and log loss disagree on which forecaster is better?

Yes. Because log loss punishes confident errors far more steeply, it can rank a cautious, well-hedged forecaster above a sharper one that is usually right but occasionally confidently wrong, while the bounded Brier score may favor the sharper forecaster. When the ranking matters, report both and inspect the cases that drive the difference, since the disagreement itself tells you whether the forecaster's errors are concentrated in overconfident predictions.
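A small illustration of such a disagreement, using made-up data: ten events, nine of which occur, scored for a hypothetical sharp forecaster who always says 0.99 and a hypothetical cautious one who always says 0.70. The numbers are invented for the sketch and are not drawn from any real forecast record.

```python
import math

def brier(probs, outcomes):
    """Mean squared difference between forecast and binary outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes):
    """Mean negative log-likelihood; no clipping needed here because
    no forecast in this example is exactly 0 or 1."""
    return -sum(
        math.log(p if y == 1 else 1.0 - p) for p, y in zip(probs, outcomes)
    ) / len(probs)

outcomes = [1] * 9 + [0]   # nine events occur, one does not
sharp = [0.99] * 10        # usually right, once confidently wrong
cautious = [0.70] * 10     # hedged on every forecast

for name, forecast in (("sharp", sharp), ("cautious", cautious)):
    print(name, round(brier(forecast, outcomes), 4), round(log_loss(forecast, outcomes), 4))
```

Under these assumptions the Brier score is lower (better) for the sharp forecaster, while log loss is lower for the cautious one, because the single 0.99 miss contributes most of the sharp forecaster's log loss.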

Brier Score vs Log Loss

When an LLM or model outputs a probability rather than a label, you need a scoring rule that rewards calibrated honesty and cannot be gamed by hedging. Both the Brier score and log loss are proper, meaning the expected score is optimized by reporting your true belief. They differ in the shape of the penalty as a prediction approaches certainty and is wrong. The Brier score grows quadratically and stays bounded; log loss grows without bound and explodes near the confident-and-wrong corner. That difference decides which is appropriate for a given forecasting task. This matrix compares them for scoring financial and agent forecasts.
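For reference, the two rules for binary outcomes can be written as follows. The notation is an assumption introduced here: p_i is the predicted probability of the event, y_i is 1 if it occurred and 0 otherwise, and N is the number of forecasts.

```latex
\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N} (p_i - y_i)^2,
\qquad
\mathrm{LL} = -\frac{1}{N}\sum_{i=1}^{N}
  \bigl[\, y_i \ln p_i + (1 - y_i)\ln(1 - p_i) \,\bigr]

% Confident-and-wrong corner: the event occurs (y_i = 1) while p_i \to 0
\lim_{p_i \to 0} (p_i - 1)^2 = 1,
\qquad
\lim_{p_i \to 0} \bigl(-\ln p_i\bigr) = \infty
```

The limits make the contrast concrete: the per-forecast Brier penalty tops out at 1, while the per-forecast log penalty has no ceiling.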

6 criteria · Published May 26, 2026

By AI Fin Hub Research · AI Fin Hub Team

Brier Score Option

The mean squared difference between predicted probabilities and outcomes. A bounded, proper scoring rule where lower is better and zero is perfect.

Pros

Bounded between zero and one for binary outcomes, so a single bad forecast cannot dominate the average
Interpretable as a mean squared error of probabilities, intuitive to reason about
Decomposes cleanly into calibration and refinement components for diagnostics
Robust to occasional overconfident errors, which it penalizes only quadratically

Cons

Penalizes confident wrong predictions relatively gently, which may understate their real cost
Less sensitive than log loss to differences among already-good probabilistic forecasts
Quadratic shape does not match the information-theoretic cost of being surprised
Can rate an overconfident model leniently when confident errors should be severe

Best for: bounded, interpretable scoring that is robust to outliers, and reporting forecast quality where one confident miss should not dominate
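The calibration and refinement decomposition mentioned among the pros can be sketched as below. This is a minimal version that groups forecasts by their exact stated value, which is an assumption that only suits forecasts issued on a small set of distinct probabilities; continuous forecasts would need binning, and the identity then holds only approximately. The function name and sample data are hypothetical.

```python
from collections import defaultdict

def brier_decomposition(probs, outcomes):
    """Split the Brier score into calibration and refinement terms.

    Forecasts are grouped by their exact stated probability (an
    assumption of this sketch; real data usually needs binning).
    """
    groups = defaultdict(list)
    for p, y in zip(probs, outcomes):
        groups[p].append(y)

    n = len(probs)
    calibration = 0.0
    refinement = 0.0
    for p, ys in groups.items():
        freq = sum(ys) / len(ys)                      # observed frequency in this group
        calibration += len(ys) * (p - freq) ** 2      # stated vs observed
        refinement += len(ys) * freq * (1.0 - freq)   # outcome variance left in the group
    return calibration / n, refinement / n

probs = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]

cal, ref = brier_decomposition(probs, outcomes)
brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
print(cal, ref, brier)  # cal + ref matches brier up to floating-point rounding
```

A low calibration term means stated probabilities match observed frequencies; a low refinement term means the forecasts sort events into groups whose outcomes are close to all-or-nothing.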

Log Loss (Cross-Entropy) Option

The negative log-likelihood of the observed outcomes under the predicted probabilities. A proper scoring rule, unbounded, that punishes confident errors severely.

Pros

Punishes confident wrong predictions extremely hard, matching risk-sensitive use cases
Information-theoretically grounded as the expected surprise, the standard ML training loss
Highly sensitive among good forecasts, sharply rewarding better-calibrated probabilities
Directly the objective most classifiers optimize, so it aligns metric with training

Cons

Unbounded: a single confident, wrong prediction can blow up the average score
Infinite penalty for assigning zero probability to an event that occurs, requiring clipping
Harder to interpret on an absolute scale than the bounded Brier score
Sensitive to outliers and to probabilities pushed near zero or one

Best for: risk-sensitive forecasting that must avoid confident errors, model training, and discriminating among already-good forecasters

Decision Table

See the tradeoffs side by side

Criterion	Brier Score	Log Loss (Cross-Entropy)
Penalty shape	Quadratic, bounded	Logarithmic, unbounded
Confident wrong prediction	Penalized gently	Penalized severely, can be infinite
Range (binary)	0 to 1	0 to infinity
Outlier robustness	High	Low
Proper scoring rule	Yes	Yes
Interpretability	Mean squared error of probabilities	Expected surprise, less direct

Verdict

Both are proper, so neither rewards hedging, and the choice is about how harshly you want to punish confident mistakes. Use the Brier score when you want a bounded, interpretable number that no single overconfident miss can dominate, which makes it the safer default for reporting forecast quality and for comparing forecasters when robustness matters. Use log loss when the cost of being confidently wrong is genuinely catastrophic, as in risk-sensitive financial forecasts where a model that says 99 percent and is wrong should be penalized far more than one that says 60 percent and is wrong; log loss encodes exactly that asymmetry. Two practical notes: log loss requires clipping probabilities away from zero and one or it returns infinity, and because the two metrics can rank forecasters differently, report both when the decision is important rather than trusting a single number.
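To put numbers on the 99 percent versus 60 percent comparison, the sketch below computes the per-forecast penalty each rule assigns when the predicted event does not occur. The helper names are illustrative.

```python
import math

def brier_penalty_when_wrong(confidence):
    """Squared error for forecasting `confidence` on an event that did not occur."""
    return confidence ** 2

def log_penalty_when_wrong(confidence):
    """Negative log of the probability left for the outcome that actually happened."""
    return -math.log(1.0 - confidence)

for c in (0.60, 0.99):
    print(c, round(brier_penalty_when_wrong(c), 4), round(log_penalty_when_wrong(c), 4))

brier_ratio = brier_penalty_when_wrong(0.99) / brier_penalty_when_wrong(0.60)
log_ratio = log_penalty_when_wrong(0.99) / log_penalty_when_wrong(0.60)
print(round(brier_ratio, 2), round(log_ratio, 2))
```

The 99 percent miss costs roughly 2.7 times the 60 percent miss under the Brier score and roughly 5 times under log loss, and the log loss ratio keeps growing as the confidence approaches one.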

FAQ

Questions people ask next

The short answers readers usually want after the first pass.

What is a proper scoring rule?

A proper scoring rule is one whose expected score is optimized when the forecaster reports their true probability, so there is no incentive to shade the forecast toward a hedge or toward overconfidence. Both the Brier score and log loss are proper. This matters because an improper rule can be gamed: a forecaster could improve their score by reporting something other than their honest belief, which corrupts the very thing you are trying to measure. Properness guarantees the metric rewards calibrated honesty.
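Properness can be checked numerically. The sketch below assumes a forecaster whose true belief is 0.7 and searches a grid of possible reports for the one with the lowest expected penalty. The absolute-error rule is included only as a contrast; it is not discussed in this article and is a standard example of an improper rule.

```python
import math

def expected_score(report, belief, penalty):
    """Expected penalty of reporting `report` when the event truly
    occurs with probability `belief`."""
    return belief * penalty(report, 1) + (1 - belief) * penalty(report, 0)

def brier_penalty(p, y):
    return (p - y) ** 2

def log_penalty(p, y):
    return -math.log(p if y == 1 else 1 - p)

def abs_penalty(p, y):
    return abs(p - y)  # improper rule, shown for contrast only

belief = 0.7                              # assumed true probability
grid = [i / 100 for i in range(1, 100)]   # candidate reports 0.01 .. 0.99

rules = {"brier": brier_penalty, "log loss": log_penalty, "absolute error": abs_penalty}
best = {
    name: min(grid, key=lambda p: expected_score(p, belief, fn))
    for name, fn in rules.items()
}
print(best)
```

On this grid the Brier score and log loss are both minimized by reporting 0.7, the honest belief, while the absolute-error rule is minimized at the largest available report, so it rewards overstating confidence.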

Sources & References

Verification of Forecasts Expressed in Terms of Probability — Glenn W. Brier, Monthly Weather Review (1950)
Strictly Proper Scoring Rules, Prediction, and Estimation — Gneiting and Raftery, Journal of the American Statistical Association (2007)
