Feature Engineering in the Transformer Era: A Controlled Study on Toxic Comment Classification
Zhanyi Ding, Zijing Wei, Chaoqun Yang, Hailiang Wang, Shuo Xu, Yixiang Li, Xuanjie Chen
Published 2026 in Proceedings of the 2026 International Conference on Human-Computer Interaction, Neural Networks and Deep Learning
ABSTRACT
Detecting toxic language in user-generated text remains a critical challenge due to linguistic nuance, evolving expressions, and severe class imbalance. While Transformer-based models have established state-of-the-art performance, their significant computational costs pose scalability barriers for real-time moderation. We investigate whether integrating social and contextual metadata, such as user reactions and platform ratings, can bridge the performance gap between computationally efficient classical models and modern deep learning architectures. Using a 40,000-comment subset of the Jigsaw Toxic Comment Classification Challenge, we conduct a controlled, two-phase comparison. We evaluate a Baseline configuration (TF-IDF for classical ensembles vs. raw text for ALBERT) against an Enhanced configuration that fuses text representations with explicit social signals. Our investigation analyzes whether these high-fidelity metadata features allow lightweight models (e.g., LightGBM) to rival the discriminative power of deep Transformers. The findings challenge the prevailing assumption that deep semantic understanding is strictly necessary for high-performance toxicity detection, offering significant implications for the design of scalable, "Green AI" moderation systems.
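The Baseline versus Enhanced comparison described in the abstract can be illustrated with a minimal sketch of feature fusion for the classical models: a TF-IDF text vector is computed per comment, and metadata columns are appended to it. This is not the authors' pipeline. The toy comments, the metadata fields (`likes`, `rating`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, and the classifier itself is omitted.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, rows), where each row is a dense TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant), so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty comments
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows


def fuse(text_rows, meta_rows):
    """Enhanced configuration: append metadata columns to each text vector."""
    return [text + meta for text, meta in zip(text_rows, meta_rows)]


# Toy data; the metadata columns (likes, rating) are hypothetical stand-ins
# for the "social signals" mentioned in the abstract.
comments = ["you are great", "you are awful", "great point thanks"]
metadata = [[5.0, 0.9], [0.0, 0.1], [3.0, 0.8]]

vocab, baseline = tfidf_matrix(comments)   # Baseline: text features only
enhanced = fuse(baseline, metadata)        # Enhanced: text + metadata

print(len(baseline[0]), len(enhanced[0]))
```

Either feature matrix could then be passed to a classical learner such as a gradient-boosted tree ensemble, which keeps the comparison controlled: the only difference between the two runs is the presence of the appended metadata columns.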
PUBLICATION RECORD
- Publication year
2026
- Venue
Proceedings of the 2026 International Conference on Human-Computer Interaction, Neural Networks and Deep Learning
- Publication date
2026-01-09
- Fields of study
Not labeled
- Source metadata
Semantic Scholar