Scalable Utility-Aware Multiclass Calibration

M. Hegazy, Michael I. Jordan, Aymeric Dieuleveut

Published 2025 in arXiv.org

ABSTRACT

Ensuring that classifiers are well-calibrated, i.e., their predictions align with observed frequencies, is a minimal and fundamental requirement for classifiers to be viewed as trustworthy. Existing methods for assessing multiclass calibration often focus on specific aspects associated with prediction (e.g., top-class confidence, class-wise calibration) or utilize computationally challenging variational formulations. In this work, we study scalable evaluation of multiclass calibration. To this end, we propose utility calibration, a general framework that measures the calibration error relative to a specific utility function that encapsulates the goals or decision criteria relevant to the end user. We demonstrate how this framework can unify and re-interpret several existing calibration metrics, particularly allowing for more robust versions of the top-class and class-wise calibration metrics, and, going beyond such binarized approaches, toward assessing calibration for richer classes of downstream utilities.
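
To make "predictions align with observed frequencies" concrete, the sketch below computes a standard binned top-class calibration error, one of the existing binarized metrics the abstract mentions. It is not the utility calibration framework proposed in the paper. The function name top_class_ece, the equal-width binning, and the default of 10 bins are illustrative assumptions.

```python
from typing import List, Sequence


def top_class_ece(probs: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  n_bins: int = 10) -> float:
    """Binned top-class expected calibration error (illustrative sketch only)."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")

    # Per-bin accumulators: sample count, summed confidence, summed correctness.
    counts: List[int] = [0] * n_bins
    conf_sums: List[float] = [0.0] * n_bins
    hit_sums: List[float] = [0.0] * n_bins

    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda k: p[k])  # predicted class (first argmax)
        conf = p[pred]                                  # top-class confidence
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        hit_sums[b] += 1.0 if pred == y else 0.0

    n = len(labels)
    ece = 0.0
    for c, s_conf, s_hit in zip(counts, conf_sums, hit_sums):
        if c:
            # Bin weight times the gap between mean confidence and accuracy.
            ece += (c / n) * abs(s_conf / c - s_hit / c)
    return ece
```

For instance, two predictions that each put 0.9 on class 0, with true labels 0 and 1, land in the same bin with mean confidence 0.9 and accuracy 0.5, so the error is about 0.4. Because this estimator looks only at the top class, it ignores the remaining probability mass. The abstract identifies that focus on a single aspect of prediction as a limitation of existing metrics.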

PUBLICATION RECORD

Publication year
2025
Venue
arXiv.org
Publication date
2025-10-29
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.48550/arXiv.2510.25458 arXiv 2510.25458
Source metadata
Semantic Scholar
