Beyond Cross-Entropy: Discounted Least Information Theory of Entropy (DLITE) Loss and the Impact of Loss Functions on AI-Driven Named Entity Recognition
Sonia M. Pascua, Michael Pan, Weimao Ke
Published 2025 in Inf.
ABSTRACT
Loss functions play a significant role in shaping model behavior in machine learning, yet their design implications remain underexplored in natural language processing tasks such as Named Entity Recognition (NER). This study investigates the performance and optimization behavior of five loss functions—L1, L2, Cross-Entropy (CE), KL Divergence (KL), and the proposed DLITE (Discounted Least Information Theory of Entropy) Loss—within transformer-based NER models. DLITE introduces a bounded, entropy-discounting approach to penalization, prioritizing recall and training stability, especially under noisy or imbalanced data conditions. We conducted empirical evaluations across three benchmark NER datasets: Basic NER, CoNLL-2003, and the Broad Twitter Corpus. While CE and KL achieved the highest weighted F1-scores in clean datasets, DLITE Loss demonstrated distinct advantages in macro recall, precision–recall balance, and convergence stability—particularly in noisy environments. Our findings suggest that the choice of loss function should align with application-specific priorities, such as minimizing false negatives or managing uncertainty. DLITE adds a new dimension to model design by enabling more measured predictions, making it a valuable alternative in high-stakes or real-world NLP deployments.
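The abstract compares five loss functions (L1, L2, CE, KL, and the proposed DLITE) within transformer-based NER models. The paper's DLITE formulation is not reproduced in this record, so the sketch below only illustrates, under assumed tensor shapes and label conventions, how the four standard losses could be swapped into a token-classification training step; the function name token_loss, the BIO label count, and the toy tensors are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: swapping loss functions in a transformer token-classification step.
# The DLITE loss itself is not shown here (its formula is not given in this record);
# only the standard alternatives the paper compares (L1, L2, CE, KL) are illustrated.
import torch
import torch.nn.functional as F

def token_loss(logits, target_ids, kind="ce", num_labels=9, ignore_index=-100):
    """Loss over token logits of shape (batch, seq_len, num_labels) against gold label ids."""
    flat_logits = logits.view(-1, num_labels)   # (batch*seq_len, num_labels)
    flat_targets = target_ids.view(-1)          # (batch*seq_len,)
    mask = flat_targets != ignore_index         # drop padding / special tokens
    flat_logits, flat_targets = flat_logits[mask], flat_targets[mask]

    if kind == "ce":                            # standard cross-entropy on logits
        return F.cross_entropy(flat_logits, flat_targets)

    probs = F.softmax(flat_logits, dim=-1)
    one_hot = F.one_hot(flat_targets, num_classes=num_labels).float()
    if kind == "l1":                            # mean absolute error on class probabilities
        return (probs - one_hot).abs().mean()
    if kind == "l2":                            # mean squared error on class probabilities
        return ((probs - one_hot) ** 2).mean()
    if kind == "kl":                            # KL(one_hot || predicted), batch-averaged
        return F.kl_div(probs.clamp_min(1e-12).log(), one_hot, reduction="batchmean")
    raise ValueError(f"unknown loss kind: {kind}")

# Toy usage with random logits and gold labels (2 sentences, 4 tokens, 9 BIO labels):
logits = torch.randn(2, 4, 9)
labels = torch.randint(0, 9, (2, 4))
for kind in ("ce", "l1", "l2", "kl"):
    print(kind, token_loss(logits, labels, kind=kind).item())
```

Computing L1, L2, and KL on the softmax probabilities rather than on raw logits is one plausible way to keep the losses comparable on the same bounded scale; the paper's actual experimental setup may differ.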
PUBLICATION RECORD
- Publication year: 2025
- Venue: Inf.
- Publication date: 2025-09-02
- Fields of study: Computer Science
- Source metadata: Semantic Scholar