Big data has become popular for processing, storing and managing massive volumes of data. The clustering of datasets has become a challenging issue in the field of big data analytics. The K-means algorithm is best suited for finding similarities between entities based on distance measures with small datasets. Existing clustering algorithms require scalable solutions to manage large datasets. This study presents two approaches to the clustering of large datasets using MapReduce. The first approach, K-Means Hadoop MapReduce (KM-HMR), focuses on the MapReduce implementation of standard K-means. The second approach, K-means modified inter and intra clustering (KM-I2C), enhances the quality of clusters to produce clusters with minimum intra-cluster and maximum inter-cluster distances for large datasets. The results of the proposed approaches show significant improvements in the efficiency of clustering in terms of execution times. Experiments conducted on standard K-means and the proposed solutions show that the KM-I2C approach is both effective and efficient.
Clustering large datasets using K-means modified inter and intra clustering (KM-I2C) in Hadoop
C. Sreedhar,N. Kasiviswanath,P. C. Reddy
Published 2017 in Journal of Big Data
PUBLICATION RECORD
- Publication year
2017
- Venue
Journal of Big Data
- Publication date
2017-12-01
- Fields of study
Computer Science
- Source metadata
Semantic Scholar