Posterior calibration and exploratory analysis for natural language processing models

Published 2015 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Many models in natural language processing define probabilistic distributions over linguistic structures. We argue that (1) the quality of a model's posterior distribution can and should be directly evaluated, as to whether probabilities correspond to empirical frequencies, and (2) NLP uncertainty can be projected not only to pipeline components, but also to exploratory data analysis, telling a user when to trust and not trust the NLP analysis. We present a method to analyze calibration, and apply it to compare the miscalibration of several commonly used models. We also contribute a coreference sampling algorithm that can create confidence intervals for a political event extraction task.
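As a rough illustration of what "probabilities correspond to empirical frequencies" means in practice, the sketch below sorts binary predictions by predicted probability, groups them into equal-size bins, and compares each bin's mean prediction with the observed fraction of positives. This is a generic binning check written for this record, not the paper's own procedure; the function names `calibration_bins` and `calibration_error`, the bin count, and the root-mean-square summary are all assumptions made for the example.

```python
import math


def calibration_bins(probs, labels, n_bins=5):
    """Group predictions into equal-size bins ordered by predicted probability.

    Returns a list of (mean_predicted, empirical_frequency, count) tuples,
    one per non-empty bin. Hypothetical helper, not taken from the paper.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_p = sum(p for p, _ in chunk) / len(chunk)
        freq = sum(y for _, y in chunk) / len(chunk)
        bins.append((mean_p, freq, len(chunk)))
    return bins


def calibration_error(bins):
    """Square root of the count-weighted mean squared gap between
    mean prediction and empirical frequency across bins."""
    total = sum(c for _, _, c in bins)
    squared = sum(c * (p - f) ** 2 for p, f, c in bins)
    return math.sqrt(squared / total)


if __name__ == "__main__":
    # Toy data (made up): predictions of 0.1 and 0.9 that match their frequencies.
    probs = [0.1] * 10 + [0.9] * 10
    labels = [0] * 9 + [1] + [1] * 9 + [0]
    for mean_p, freq, count in calibration_bins(probs, labels, n_bins=2):
        print(f"predicted={mean_p:.2f} observed={freq:.2f} n={count}")
    print("calibration error:", calibration_error(calibration_bins(probs, labels, n_bins=2)))
```

A well-calibrated model gives bins whose mean prediction is close to the observed frequency, so the summary error is near zero; a model that always predicts 1.0 but is right only half the time would score 0.5 under this summary.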

PUBLICATION RECORD

Publication year
2015
Venue
Conference on Empirical Methods in Natural Language Processing
Publication date
2015-08-21
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.18653/v1/D15-1182 arXiv 1508.05154
Source metadata
Semantic Scholar

REFERENCES

The New York Times Annotated Corpus as a Large-Scale Summarization Resource
2015cited by this paper
Universal Stanford dependencies: A cross-linguistic typology
2014cited by this paper
A Joint Model for Entity Analysis: Coreference, Typing, and Linking
2014cited by this paper
Joint inference of entities, relations, and coreference
2013cited by this paper
Automatic Extraction of Events from Open Source Text for Predictive Forecasting
2013cited by this paper
Easy Victories and Uphill Battles in Coreference Resolution
2013cited by this paper
A Systematic Exploration of Diversity in Machine Translation
2013cited by this paper
Likelihood-ratio calibration using prior-weighted proper scoring rules
2013cited by this paper
Learning Latent Personas of Film Characters
2013cited by this paper
Learning to Extract International Relations from Political Context
2013cited by this paper
Deep parsing in Watson
2012cited by this paper
Large-scale machine learning at twitter
2012cited by this paper
Precedents, Progress, and Prospects in Political Event Data
2012cited by this paper
CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
2011cited by this paper
Bayesian Checking for Topic Models
2011cited by this paper
Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure
2011cited by this paper
Scikit-learn: Machine Learning in Python
2011cited by this paper
Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions
2010cited by this paper
Evaluating Dependency Representations for Event Extraction
2010cited by this paper
Bayesian data analysis.
2010cited by this paper
A Global Joint Model for Semantic Role Labeling
2008cited by this paper
Wider Pipelines: N-Best Alignments and Parses in MT Training
2008cited by this paper
Rich Source-Side Context for Statistical Machine Translation
2008cited by this paper
Reliability, Sufficiency, and the Decomposition of Proper Scores
2008cited by this paper
Strictly Proper Scoring Rules, Prediction, and Estimation
2007cited by this paper
Unsupervised Coreference Resolution in a Nonparametric Bayesian Model
2007cited by this paper
All of Nonparametric Statistics
2007cited by this paper
Minimum Risk Annealing for Training Log-Linear Models
2006cited by this paper
Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines
2006influential reference
Predicting good probabilities with supervised learning
2005influential reference
Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification
2005cited by this paper
Minimum Bayes-Risk Decoding for Statistical Machine Translation
2004cited by this paper
Transforming classifier scores into accurate multiclass probability estimates
2002cited by this paper
On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
2001cited by this paper
Assessing the Calibration of Naive Bayes Posterior Estimates
2000cited by this paper
Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
1999cited by this paper
Markov Chain Monte Carlo in Practice: A Roundtable Discussion
1998cited by this paper
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
1997cited by this paper
Parsing Algorithms and Metrics
1996cited by this paper
Political Science: KEDS—A Program for the Machine Coding of Event Data
1994cited by this paper
A General Framework for Forecast Verification
1987cited by this paper
The Comparison and Evaluation of Forecasters.
1983cited by this paper
Curves As Parameters, and Touch Estimation
1961cited by this paper
VERIFICATION OF FORECASTS EXPRESSED IN TERMS OF PROBABILITY
1950cited by this paper
of the Association for Computational Linguistics
year unknowncited by this paper
Distance Dependent Chinese Restaurant Processes (Journal of Machine Learning Research)
year unknowncited by this paper

CITED BY

TransTS: an adaptive post-hoc method for probability calibration under label noise
2026cites this paper
Semantic Consistency Interaction With Calibration Loss for Remote Sensing Image–Text Retrieval
2026cites this paper
Calibration and Discrimination Optimization Using Clusters of Learned Representation
2025cites this paper
Op-Fed: Opinion, Stance, and Monetary Policy Annotations on FOMC Transcripts Using Active Learning
2025cites this paper
LayerMix: Enhanced Data Augmentation for Robust Deep Learning
2025cites this paper
Deep Neural Network Calibration by Reducing Classifier Shift with Stochastic Masking
2025cites this paper
Fact-Level Calibration and Correction for Long-Form Generations
2025cites this paper
H-Calibration: Rethinking Classifier Recalibration With Probabilistic Error-Bounded Objective
2025cites this paper
Learning to Insert [PAUSE] Tokens for Better Reasoning
2025cites this paper
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability
2025cites this paper
Uniform convergence of the smooth calibration error and its relationship with functional gradient
2025cites this paper
Towards More Reliable Chinese Spelling Correction: Fine-Grained Confidence Estimation Against Suboptimal Corrections
2025cites this paper
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
2025cites this paper
Balancing Two Classifiers via A Simplex ETF Structure for Model Calibration
2025cites this paper
Uncertainty Weighted Gradients for Model Calibration
2025cites this paper
Heterogeneous Correlation Aware Regularization for Sequential Confidence Calibration
2025cites this paper
GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration
2025cites this paper
Multi-Objective Optimization for Deep Neural Network Calibration
2025cites this paper
Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling
2025cites this paper
LayerMix: Enhanced Data Augmentation through Fractal Integration for Robust Deep Learning
2025cites this paper
Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A
2024cites this paper
A PID Controller Approach for Adaptive Probability-dependent Gradient Decay in Model Calibration
2024cites this paper
LaSCal: Label-Shift Calibration without target labels
2024cites this paper
Evaluating System Responses Based On Overconfidence and Underconfidence
2024cites this paper
On Calibration of LLM-based Guard Models for Reliable Content Moderation
2024cites this paper
Calibrating Expressions of Certainty
2024cites this paper
Toward a Holistic Evaluation of Robustness in CLIP Models
2024cites this paper
Calibration of Network Confidence for Unsupervised Domain Adaptation Using Estimated Accuracy
2024cites this paper
DNN-GDITD: Out-of-distribution detection via Deep Neural Network based Gaussian Descriptor for Imbalanced Tabular Data
2024cites this paper
Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration
2024cites this paper
Towards Certification of Uncertainty Calibration under Adversarial Attacks
2024cites this paper
You Only Need Half: Boosting Data Augmentation by Using Partial Content
2024cites this paper
On Calibration of Pretrained Code Models
2024cites this paper
From Game Theory to Visual Recognition: Advancing DNN Robustness
2024cites this paper
API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access
2024cites this paper
An Empirical Study Into What Matters for Calibrating Vision-Language Models
2024cites this paper
A Closer Look at the Robustness of Contrastive Language-Image Pre-Training (CLIP)
2024cites this paper
Confidence Calibration of a Medical Imaging Classification System That is Robust to Label Noise
2024cites this paper
Neural Keyphrase Generation: Analysis and Evaluation
2023cites this paper
Reliability in Semantic Segmentation: Are we on the Right Track?
2023cites this paper
HyperMix: Out-of-Distribution Detection and Classification in Few-Shot Settings
2023cites this paper
Consistent and Asymptotically Unbiased Estimation of Proper Calibration Errors
2023cites this paper
Estimating calibration error under label shift without labels
2023cites this paper
Gatekeeper to save COGS and improve efficiency of Text Prediction
2023cites this paper
Understanding Data Augmentation From A Robustness Perspective
2023cites this paper
On the Calibration of Large Language Models and Alignment
2023cites this paper
LitCab: Lightweight Language Model Calibration over Short- and Long-form Responses
2023cites this paper
MaxEnt Loss: Constrained Maximum Entropy for Calibration under Out-of-Distribution Shift
2023cites this paper
Understanding Calibration of Deep Neural Networks for Medical Image Classification
2023cites this paper
Where's the Liability in Harmful AI Speech?
2023cites this paper
Model Calibration in Dense Classification with Adaptive Label Perturbation
2023cites this paper
Strength in Numbers: Estimating Confidence of Large Language Models by Prompt Agreement
2023influential citation
LitCab: Lightweight Calibration of Language Models on Outputs of Varied Lengths
2023cites this paper
Calibration of A Regression Network Based on the Predictive Variance with Applications to Medical Images
2023influential citation
Perception and Semantic Aware Regularization for Sequential Confidence Calibration
2023influential citation
Dual Focal Loss for Calibration
2023cites this paper
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
2023cites this paper
Extracting Victim Counts from Text
2023cites this paper
Minimum-Risk Recalibration of Classifiers
2023cites this paper
On Improving Automated Detection of Cyber-Bully in Social Networks with Constrained Datasets: A Hierarchical Deep Learning Approach
2022cites this paper
You Only Cut Once: Boosting Data Augmentation with a Single Cut
2022cites this paper
Concept-based Explanations for Out-Of-Distribution Detectors
2022cites this paper
On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency
2022cites this paper
Trustworthy Deep Learning via Proper Calibration Errors: A Unifying Approach for Quantifying the Reliability of Predictive Uncertainty
2022cites this paper
Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss
2022cites this paper
Learning Confidence for Transformer-based Neural Machine Translation
2022cites this paper
Platt-Bin: Efficient Posterior Calibrated Training for NLP Classifiers
2022cites this paper
Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
2022cites this paper
Calibration of Natural Language Understanding Models with Venn-ABERS Predictors
2022cites this paper
Revisiting Calibration for Question Answering
2022cites this paper
Teaching Models to Express Their Uncertainty in Words
2022cites this paper
CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping
2022cites this paper
Forecasting Future World Events with Neural Networks
2022cites this paper
Language Models (Mostly) Know What They Know
2022cites this paper
Design and Evaluation of Object Classifiers for Probabilistic Decision-Making in Autonomous Systems
2022cites this paper
Bridging the Gap between Training and Inference: Multi-Candidate Optimization for Diverse Neural Machine Translation
2022cites this paper
On the Relation between Sensitivity and Accuracy in In-context Learning
2022cites this paper
Better Uncertainty Calibration via Proper Scores for Classification and Beyond
2022cites this paper
Calibration of Medical Imaging Classification Systems with Weight Scaling
2022cites this paper
A Novel Gradient Accumulation Method for Calibration of Named Entity Recognition Models
2022cites this paper
A Consistent and Differentiable Lp Canonical Calibration Error Estimator
2022cites this paper
Network Calibration by Temperature Scaling based on the Predicted Confidence
2022cites this paper
Re-Examining Calibration: The Case of Question Answering
2022cites this paper
What Images are More Memorable to Machines?
2022cites this paper
Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction
2022cites this paper
AdaFocal: Calibration-aware Adaptive Focal Loss
2022cites this paper
A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks
2022cites this paper
MixBoost: Improving the Robustness of Deep Neural Networks by Boosting Data Augmentation
2022cites this paper
Calibrating Student Models for Emotion-related Tasks
2022cites this paper
Incremental Predictive Coding: A Parallel and Fully Automatic Learning Algorithm
2022cites this paper
Calibrated and Sharp Uncertainties in Deep Learning via Simple Density Estimation
2021cites this paper
Online Calibrated and Conformal Prediction Improves Bayesian Optimization
2021cites this paper
Calibration Improves Bayesian Optimization
2021cites this paper
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
2021cites this paper
Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework
2021cites this paper
Unsolved Problems in ML Safety
2021cites this paper
Generalization Self-distillation with Epoch-wise Regularization
2021cites this paper
Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets
2021cites this paper
Knowing More About Questions Can Help: Improving Calibration in Question Answering
2021cites this paper
Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning
2021cites this paper