Uniform calibration tests for forecasting systems with small lead time

Published 2022 in Statistics and Computing

ABSTRACT

A long-noted difficulty when assessing the calibration (or reliability) of forecasting systems is that calibration is, in general, a hypothesis not about a finite-dimensional parameter but about an entire functional relationship. For instance, a calibrated probability forecast for binary events should equal the conditional probability of the event given the forecast, whatever the value of the forecast. A new class of tests is presented that is based on estimating the cumulative deviations from calibration. The supremum of those deviations is taken as the test statistic, and the asymptotic distribution of the test statistic is established rigorously. This distribution turns out to be universal, provided the forecasts “look one step ahead” only, or in other words, verify at the next time step in the future. The new tests apply to a variety of forecasting problems and are compared with established approaches that work in a regression-based framework. In comparison with those approaches, the new tests develop power against a wider class of alternatives. Numerical experiments for both artificial data and operational weather forecasting systems are presented, and possible extensions to longer lead times are discussed.
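
The abstract describes the test statistic only in words. As a rough illustration, the following Python sketch implements one statistic in this spirit for binary probability forecasts; it was written for this page and is not the authors' code. It assumes one-step-ahead forecast/outcome pairs (p_k, Y_k) for which, under calibration, the increments Y_k - p_k have conditional mean zero and conditional variance p_k(1 - p_k). The pairs are ordered by forecast value, the supremum of the absolute cumulative deviations is normalized by the square root of the summed variances, and the p-value is taken from the distribution of sup |W(t)| over 0 <= t <= 1 for a standard Brownian motion W. The function names, this normalization and this limiting distribution are assumptions of the sketch; the statistic and the asymptotic result established in the paper may differ in detail.

```python
import math
import random
from typing import Sequence, Tuple

# Hypothetical helper names; an illustrative sketch, not the method from the paper.


def sup_brownian_pvalue(x: float) -> float:
    """P(sup_{0<=t<=1} |W(t)| >= x) for a standard Brownian motion W."""
    if x <= 0.0:
        return 1.0
    if x < 0.1:
        # P(sup |W| < 0.1) is below 1e-50, so the p-value is 1 in double precision.
        return 1.0
    if x < 1.0:
        # Theta-type series for the CDF; converges quickly for small x.
        cdf = 0.0
        for k in range(50):
            m = 2 * k + 1
            cdf += (-1) ** k / m * math.exp(-((m * math.pi / x) ** 2) / 8.0)
        return min(1.0, max(0.0, 1.0 - 4.0 / math.pi * cdf))
    # Reflection-principle series for the tail: 4 * sum_j (-1)^j * Q((2j+1) x).
    tail = 0.0
    for j in range(50):
        tail += (-1) ** j * 0.5 * math.erfc((2 * j + 1) * x / math.sqrt(2.0))
    return min(1.0, 4.0 * tail)


def cumulative_calibration_test(
    p: Sequence[float], y: Sequence[int]
) -> Tuple[float, float]:
    """Sup of normalized cumulative deviations Y_k - p_k, ordered by forecast value.

    Returns (statistic, approximate p-value) under the assumptions stated above.
    """
    if len(p) != len(y) or len(p) == 0:
        raise ValueError("p and y must be non-empty and of equal length")
    order = sorted(range(len(p)), key=lambda k: p[k])
    # Sum of the conditional variances p_k (1 - p_k) under the calibration hypothesis.
    variance = sum(pk * (1.0 - pk) for pk in p)
    if variance <= 0.0:
        raise ValueError("all forecasts are 0 or 1; statistic undefined")
    partial, sup_dev = 0.0, 0.0
    for i, k in enumerate(order):
        partial += y[k] - p[k]  # running sum of Y_j - p_j in increasing order of p_j
        # Evaluate only once a block of tied forecasts is complete, so that the
        # partial sum equals the sum over {j : p_j <= p_k}.
        if i + 1 == len(order) or p[order[i + 1]] != p[k]:
            sup_dev = max(sup_dev, abs(partial))
    stat = sup_dev / math.sqrt(variance)
    return stat, sup_brownian_pvalue(stat)


if __name__ == "__main__":
    rng = random.Random(0)
    p = [rng.random() for _ in range(5000)]
    y_cal = [int(rng.random() < pk) for pk in p]       # calibrated by construction
    y_bad = [int(rng.random() < pk ** 2) for pk in p]  # true probability is p**2, not p
    print(cumulative_calibration_test(p, y_cal))
    print(cumulative_calibration_test(p, y_bad))
```

The two series in sup_brownian_pvalue are equivalent representations of the same distribution, one converging quickly for small x and the other for large x. If the forecasts take only finitely many values, the supremum is evaluated at fewer points than the Brownian limit assumes, so under the stated assumptions the resulting p-value would tend to be conservative.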

PUBLICATION RECORD

Publication year: 2022
Venue: Statistics and Computing
Publication date: 2022-10-29
Fields of study: Mathematics, Computer Science
Identifiers: DOI 10.1007/s11222-022-10144-9
Source metadata: Semantic Scholar

REFERENCES

Honest calibration assessment for binary outcome predictions (2022)
Testing the reliability of forecasting systems (2021)
Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination (2021)
Stratified rank histograms for ensemble forecast verification under serial dependence (2020)
Stable reliability diagrams for probabilistic classifiers (2020)
Elicitability and backtesting: Perspectives for banking regulation (2016)
Higher order elicitability and Osband’s principle (2015)
Elicitation and Identification of Properties (2014)
The concept of exchangeability in ensemble forecasting (2011)
Making and Evaluating Point Forecasts (2009)
Evaluating Value-at-Risk Models via Quantile Regression (2008; influential reference)
Increasing the Reliability of Reliability Diagrams (2007)
Estimation of the reliability of ensemble-based probabilistic forecasts (2004)
CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles (1999)
The evaluation of economic forecasts (1997)
Forecast Evaluation and Combination (1996)
The Bierens Test under Data Dependence (1996; influential reference)
A consistent conditional moment test of functional form (1990; influential reference)
Economic Forecasts and Expectations: Analysis of Forecasting Behavior and Performance (1970)
On certain limit theorems of the theory of probability (1946)

CITED BY

Evaluating Probabilistic Classifiers: The Triptych (2023)