Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity
Dennis Ulmer, J. Frellsen, Christian Hardmeier
Published 2022 in the Conference on Empirical Methods in Natural Language Processing (EMNLP)

ABSTRACT
We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pre-trained models and ensembles achieve the best results overall, the quality of uncertainty estimates can surprisingly suffer with more data. We also perform a qualitative analysis of uncertainties on sequences, discovering that a model's total uncertainty seems to be influenced to a large degree by its data uncertainty, not its model uncertainty. All model implementations are open-sourced in a software package.
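The abstract distinguishes a model's total predictive uncertainty from its data (aleatoric) and model (epistemic) components. As a minimal illustration only, the sketch below shows one common entropy-based decomposition for an ensemble of classifiers; the function names and the example probabilities are hypothetical and are not taken from the paper or its open-sourced package.

```python
# Minimal sketch (assumed, not the paper's implementation): entropy-based decomposition
# of an ensemble classifier's predictive uncertainty into data (aleatoric) and
# model (epistemic) components.
import numpy as np


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (in nats) along the class axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)


def decompose_uncertainty(member_probs):
    """member_probs: array of shape (n_members, n_classes) with softmax outputs.

    total = H(mean_k p_k)   -- predictive entropy of the averaged ensemble
    data  = mean_k H(p_k)   -- expected entropy of the members (aleatoric part)
    model = total - data    -- mutual information / disagreement (epistemic part)
    """
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)
    data = entropy(member_probs).mean(axis=0)
    model = total - data
    return total, data, model


# Illustrative example: three members that agree on a near-uniform prediction,
# i.e. an inherently ambiguous input.
probs = np.array([
    [0.55, 0.45],
    [0.50, 0.50],
    [0.52, 0.48],
])
total, data, model = decompose_uncertainty(probs)
print(f"total={total:.3f}  data={data:.3f}  model={model:.3f}")
```

In this toy example the members agree, so almost all of the total uncertainty is data uncertainty and the model (disagreement) term is close to zero, mirroring the kind of decomposition the abstract's qualitative finding refers to.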
PUBLICATION RECORD
- Publication year
2022
- Venue
Conference on Empirical Methods in Natural Language Processing
- Publication date
2022-10-20
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar
CITED BY
- 32 citing papers