Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity
Dennis Ulmer, J. Frellsen, Christian Hardmeier
Published 2022 in the Conference on Empirical Methods in Natural Language Processing (EMNLP)

ABSTRACT
We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pre-trained models and ensembles achieve the best results overall, the quality of uncertainty estimates can surprisingly suffer with more data. We also perform a qualitative analysis of uncertainties on sequences, discovering that a model's total uncertainty seems to be influenced to a large degree by its data uncertainty, not its model uncertainty. All model implementations are open-sourced in a software package.
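The abstract distinguishes a model's total predictive uncertainty from its data (aleatoric) and model (epistemic) components. As a minimal illustration only, the sketch below shows one common entropy-based decomposition for an ensemble of classifiers; the function names and the example probabilities are hypothetical and are not taken from the paper or its open-sourced package.

```python
# Minimal sketch (assumed, not the paper's implementation): entropy-based decomposition
# of an ensemble classifier's predictive uncertainty into data (aleatoric) and
# model (epistemic) components.
import numpy as np


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (in nats) along the class axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)


def decompose_uncertainty(member_probs):
    """member_probs: array of shape (n_members, n_classes) with softmax outputs.

    total = H(mean_k p_k)   -- predictive entropy of the averaged ensemble
    data  = mean_k H(p_k)   -- expected entropy of the members (aleatoric part)
    model = total - data    -- mutual information / disagreement (epistemic part)
    """
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)
    data = entropy(member_probs).mean(axis=0)
    model = total - data
    return total, data, model


# Illustrative example: three members that agree on a near-uniform prediction,
# i.e. an inherently ambiguous input.
probs = np.array([
    [0.55, 0.45],
    [0.50, 0.50],
    [0.52, 0.48],
])
total, data, model = decompose_uncertainty(probs)
print(f"total={total:.3f}  data={data:.3f}  model={model:.3f}")
```

In this toy example the members agree, so almost all of the total uncertainty is data uncertainty and the model (disagreement) term is close to zero, mirroring the kind of decomposition the abstract's qualitative finding refers to.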
PUBLICATION RECORD
- Publication year
2022
- Venue
Conference on Empirical Methods in Natural Language Processing
- Publication date
2022-10-20
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar
CITED BY
- 32 citing papers