Effective document summarization: a hybrid clustering approach using transformer model

Published 2026 in PeerJ Computer Science

ABSTRACT

The rapid growth of web documents has created a need for automatic document summarization so that readers can quickly find the information they are looking for. The challenge in extractive summarization is to extract the relevant information while ignoring redundant content, so that the resulting summary is precise. This article introduces a hybrid clustering algorithm that discards outliers to eliminate irrelevant content and groups similar sentences. The proposed methodology combines the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithms: DBSCAN forms dense clusters, and BIRCH splits these large clusters into sub-clusters. This hybrid approach facilitates effective sentence selection from each cluster and overcomes the limitations of using DBSCAN or BIRCH individually. The proposed method further addresses the arbitrary decision issues found in hierarchical clustering by carefully tuning the Epsilon (ε) and MinPts parameters of DBSCAN to manage larger datasets. The proposed summarization model was systematically evaluated on the CNN/DailyMail dataset using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The model achieved scores of 0.429, 0.213, and 0.371 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively. The proposed summarization model outperforms state-of-the-art unsupervised and supervised automatic text summarization models.
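The two-stage clustering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's DBSCAN and Birch classes, Euclidean distance, and a hypothetical rule that only DBSCAN clusters larger than max_cluster_size are split by BIRCH. The function names, the parameter values, the nearest-to-centroid selection rule, and the toy 2-D vectors (standing in for transformer sentence embeddings) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, Birch


def hybrid_cluster(embeddings, eps=1.0, min_pts=2,
                   max_cluster_size=4, birch_threshold=0.3):
    """Return one label per sentence; -1 marks sentences DBSCAN treats as noise."""
    # Stage 1: DBSCAN forms dense clusters and flags outliers as -1.
    coarse = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(embeddings)
    labels = np.full(len(embeddings), -1)
    next_label = 0
    for c in np.unique(coarse):
        if c == -1:
            continue  # outliers stay unlabeled and are never selected
        idx = np.flatnonzero(coarse == c)
        if len(idx) > max_cluster_size:
            # Stage 2 (assumed rule): split an oversized cluster with BIRCH.
            sub = Birch(threshold=birch_threshold,
                        n_clusters=None).fit_predict(embeddings[idx])
        else:
            sub = np.zeros(len(idx), dtype=int)
        for s in np.unique(sub):
            labels[idx[sub == s]] = next_label
            next_label += 1
    return labels


def select_sentences(embeddings, labels):
    """Pick the sentence closest to each cluster centroid, in document order."""
    chosen = []
    for c in np.unique(labels):
        if c == -1:
            continue
        idx = np.flatnonzero(labels == c)
        centroid = embeddings[idx].mean(axis=0)
        dists = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        chosen.append(int(idx[np.argmin(dists)]))
    return sorted(chosen)


# Toy stand-in for sentence embeddings: two tight groups that DBSCAN
# links into one cluster, plus one far-away outlier.
toy = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # group A
    [1.0, 0.0], [1.1, 0.0], [1.0, 0.1],   # group B
    [5.0, 5.0],                           # outlier
])
toy_labels = hybrid_cluster(toy)
toy_summary = select_sentences(toy, toy_labels)
print(toy_labels, toy_summary)
```

In this toy run, DBSCAN alone would merge groups A and B into a single cluster, so a one-sentence-per-cluster rule would cover only one of them; the BIRCH pass separates them again, which is the motivation the abstract gives for combining the two algorithms. In practice, eps, min_pts, and the BIRCH threshold would need tuning for real sentence embeddings.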
