Effective document summarization: a hybrid clustering approach using transformer model
No author metadata is attached to this paper.
Published 2026 in PeerJ Computer Science
ABSTRACT
The rapid growth of web documents has created a need for automatic document summarization so that readers can quickly scan documents for the information they need. The challenge in extractive summarization is to extract the relevant information while ignoring redundant content, producing a precise summary. This article introduces a hybrid clustering algorithm that groups similar sentences and ignores outliers to eliminate irrelevant content. The proposed methodology combines the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithms: DBSCAN forms dense clusters, and BIRCH splits these large clusters into sub-clusters. This hybrid approach enables effective sentence selection from each cluster and overcomes the limitations of using DBSCAN or BIRCH individually. The method further addresses the arbitrary decisions inherent in hierarchical clustering by carefully tuning the DBSCAN parameters Epsilon (ε) and MinPts so that it can handle larger datasets. The proposed summarization model was systematically evaluated on the CNN/DailyMail dataset using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, achieving scores of 0.429, 0.213, and 0.371 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively. The proposed model outperforms state-of-the-art unsupervised and supervised automatic text summarization models.
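The DBSCAN stage of the hybrid method can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes each sentence has already been encoded as an embedding vector (the title mentions a transformer model, but the abstract does not say how sentences are encoded), and it assumes cosine distance. The names `dbscan`, `cosine_distance`, and `NOISE` are hypothetical. Points labeled `NOISE` (-1) play the role of the outlier sentences that the method ignores.

```python
import math
from typing import List, Sequence

NOISE = -1  # hypothetical label for outlier sentences


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each sentence embedding with a cluster id (0, 1, ...) or NOISE.

    Assumptions: `points` are precomputed sentence embeddings, and the
    neighborhood count includes the point itself.
    """
    n = len(points)
    labels: list = [None] * n  # None = not yet visited

    def region(i: int) -> List[int]:
        # Brute-force eps-neighborhood query, O(n) per call.
        return [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:  # i is not a core point
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:  # border point reached from a core point
                labels[j] = cluster_id
                continue
            if labels[j] is not None:  # already assigned
                continue
            labels[j] = cluster_id
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(neighbors)
        cluster_id += 1
    return labels
```

For real workloads, scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps`, `min_samples`, `metric="cosine"`) implements the same algorithm; its `min_samples` count also includes the point itself, matching the convention above.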
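The abstract states that ε and MinPts are carefully tuned but does not describe the procedure. One common heuristic, proposed in the original DBSCAN paper, is the sorted k-distance graph: compute each point's distance to its k-th nearest neighbor, sort the values, and choose ε near the "knee" where the curve jumps. The sketch below assumes cosine distance over sentence embeddings and replaces visual knee inspection with a crude "largest gap" rule; both choices, and the names `sorted_k_distances` and `eps_at_largest_gap`, are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def sorted_k_distances(points: List[Sequence[float]], k: int) -> List[float]:
    """Distance from each point to its k-th nearest *other* point, sorted ascending."""
    kd = []
    for i, p in enumerate(points):
        dists = sorted(cosine_distance(p, q) for j, q in enumerate(points) if j != i)
        kd.append(dists[k - 1])
    return sorted(kd)


def eps_at_largest_gap(kd: List[float]) -> float:
    """Hypothetical knee rule: the k-distance just before the biggest jump."""
    gaps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    knee = max(range(len(gaps)), key=lambda i: gaps[i])
    return kd[knee]
```

When MinPts counts the point itself (the scikit-learn convention), use `k = MinPts - 1`, because a point is a core point exactly when its (MinPts - 1)-th nearest other point lies within ε. Outlier sentences produce the large values at the tail of the sorted list.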
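The BIRCH step that splits a large DBSCAN cluster into sub-clusters, followed by picking one sentence per sub-cluster, might look roughly like the following. This is a deliberately simplified sketch: it keeps a flat list of BIRCH clustering features (N, LS, SS) and absorbs a point into the nearest one only if the resulting radius stays within `threshold`, but it omits the CF-tree and BIRCH's global clustering phase. Euclidean distance, the nearest-to-centroid selection rule, and the names `split_cluster` and `pick_representatives` are assumptions, not details given in the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ClusteringFeature:
    """BIRCH clustering feature: point count N, linear sum LS, square sum SS."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius_if_added(self, x: Sequence[float]) -> float:
        # BIRCH radius R = sqrt(SS/N - ||LS/N||^2), evaluated as if x were absorbed.
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, x)]
        ss = self.ss + sum(v * v for v in x)
        return math.sqrt(max(ss / n - sum((v / n) ** 2 for v in ls), 0.0))

    def add(self, x: Sequence[float]) -> None:
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)


def split_cluster(points: List[Sequence[float]], threshold: float) -> Tuple[List[int], List[List[float]]]:
    """Split one large cluster into sub-clusters whose radius stays <= threshold.

    Simplification: flat list of CF entries, no CF-tree, no global phase.
    """
    entries: List[ClusteringFeature] = []
    labels: List[int] = []
    for x in points:
        if entries:
            best = min(range(len(entries)), key=lambda k: euclidean(entries[k].centroid(), x))
            if entries[best].radius_if_added(x) <= threshold:
                entries[best].add(x)
                labels.append(best)
                continue
        cf = ClusteringFeature(len(x))  # start a new sub-cluster
        cf.add(x)
        entries.append(cf)
        labels.append(len(entries) - 1)
    return labels, [cf.centroid() for cf in entries]


def pick_representatives(points, labels, centroids) -> List[int]:
    """Index of the sentence nearest each sub-cluster centroid (assumed selection rule)."""
    best: Dict[int, int] = {}
    for i, lab in enumerate(labels):
        d = euclidean(points[i], centroids[lab])
        if lab not in best or d < euclidean(points[best[lab]], centroids[lab]):
            best[lab] = i
    return [best[lab] for lab in sorted(best)]
```

The full algorithm, including the CF-tree, is available as scikit-learn's `sklearn.cluster.Birch` (parameters `threshold` and `n_clusters`). Ordering the selected sentences by their position in the source document is one plausible way to assemble the final extractive summary.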
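To make the reported metrics concrete, the sketch below computes ROUGE-N from clipped n-gram overlap and ROUGE-L from the longest common subsequence (LCS), each as precision, recall, and F1. It is a simplification: tokenization is lowercase whitespace splitting with no stemming, and ROUGE-L is computed over the whole text rather than with the summary-level variant. ROUGE scores in the literature are usually computed with the official ROUGE-1.5.5 script or a reimplementation such as Google's `rouge-score` package, and the abstract does not state whether the reported values are recall or F1. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def _tokens(text: str) -> List[str]:
    # Simplification: lowercase + whitespace split; no stemming or punctuation handling.
    return text.lower().split()


def _ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _prf(overlap: int, cand_total: int, ref_total: int) -> Dict[str, float]:
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f}


def rouge_n(candidate: str, reference: str, n: int) -> Dict[str, float]:
    cand, ref = _ngrams(_tokens(candidate), n), _ngrams(_tokens(reference), n)
    overlap = sum((cand & ref).values())  # clipped counts: min(count_cand, count_ref)
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a: List[str], b: List[str]) -> int:
    # Standard dynamic program, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> Dict[str, float]:
    cand, ref = _tokens(candidate), _tokens(reference)
    return _prf(_lcs_length(cand, ref), len(cand), len(ref))
```

For example, with candidate "the cat sat on the mat" and reference "the cat is on the mat", 5 of 6 unigrams overlap (ROUGE-1 recall 5/6), 3 of 5 bigrams overlap (ROUGE-2 = 0.6), and the LCS "the cat on the mat" has length 5 (ROUGE-L recall 5/6).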
PUBLICATION RECORD
- Publication year: 2026
- Venue: PeerJ Computer Science
- Publication date: 2026-02-27
- Source metadata: Semantic Scholar