Feature Engineering in the Transformer Era: A Controlled Study on Toxic Comment Classification

Zhanyi Ding, Zijing Wei, Chaoqun Yang, Hailiang Wang, Shuo Xu, Yixiang Li, Xuanjie Chen

Published 2026 in Proceedings of the 2026 International Conference on Human-Computer Interaction, Neural Networks and Deep Learning

ABSTRACT

Detecting toxic language in user-generated text remains a critical challenge due to linguistic nuance, evolving expressions, and severe class imbalance. While Transformer-based models have established state-of-the-art performance, their significant computational costs pose scalability barriers for real-time moderation. We investigate whether integrating social and contextual metadata—such as user reactions and platform ratings—can bridge the performance gap between computationally efficient classical models and modern deep learning architectures. Using a 40,000-comment subset of the Jigsaw Toxic Comment Classification Challenge, we conduct a controlled, two-phase comparison. We evaluate a Baseline configuration (TF-IDF for classical ensembles vs. raw text for ALBERT) against an Enhanced configuration that fuses text representations with explicit social signals. Our investigation analyzes whether these high-fidelity metadata features allow lightweight models (e.g., LightGBM) to rival the discriminative power of deep Transformers. The findings challenge the prevailing assumption that deep semantic understanding is strictly necessary for high-performance toxicity detection, offering significant implications for the design of scalable, "Green AI" moderation systems.
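The Enhanced configuration described above can be illustrated with a minimal sketch: sparse TF-IDF text features are concatenated with dense social-signal columns before training a lightweight classifier. This is not the authors' code; the feature names (likes, dislikes, platform rating) and the use of logistic regression as a stand-in for LightGBM are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus; labels: 1 = toxic, 0 = non-toxic.
comments = [
    "thanks, this was really helpful",
    "you are an idiot and everyone hates you",
    "great point, well argued",
    "shut up, nobody wants you here",
]
labels = np.array([0, 1, 0, 1])

# Hypothetical social signals per comment: [likes, dislikes, platform_rating].
# The actual metadata features used in the study are not specified here.
social = np.array([
    [12, 0, 4.5],
    [1, 9, 1.2],
    [8, 1, 4.0],
    [0, 7, 1.5],
])

# Baseline configuration: sparse TF-IDF text representation only.
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(comments)

# Enhanced configuration: fuse text features with the social signals
# by horizontal concatenation of the sparse and dense blocks.
X_fused = hstack([X_text, csr_matrix(social)])

# Lightweight classifier as a stand-in for a LightGBM ensemble.
clf = LogisticRegression().fit(X_fused, labels)
probs = clf.predict_proba(X_fused)[:, 1]
```

In a faithful reproduction, `LogisticRegression` would be replaced by `lightgbm.LGBMClassifier`, and the social columns would typically be scaled before fusion so they do not dominate the unit-normed TF-IDF block.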

PUBLICATION RECORD

  • Publication year

    2026

  • Venue

    Proceedings of the 2026 International Conference on Human-Computer Interaction, Neural Networks and Deep Learning

  • Publication date

    2026-01-09




