Noise detection has been widely discussed as data science has developed, yet so far there is no universally best method. In this paper, we propose a clustering-based solution to identify and remove noise from air pollution data collected with portable mobile sensors. The test dataset consists of air pollution data collected by the portable sensors over three seasons on a campus in Macao. To find the most appropriate algorithm for this task, we apply and compare six clustering algorithms: Simple K-means, Hierarchical Clustering, Cascading K-means, X-means, Expectation Maximization, and Self-Organizing Map. Their performance is evaluated by accuracy and by the best number of clusters, as calculated with the Silhouette Coefficient. Additionally, a J48 decision tree classifier can extract the key attributes and identify the noise cluster in future unlabeled data that may contain noise. The experimental results indicate that Expectation Maximization and Cascading K-means perform best. Moreover, temperature and carbon dioxide are vital attributes in identifying the noise cluster.
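The abstract's use of the Silhouette Coefficient to pick the best number of clusters can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain K-means with deterministic farthest-point seeding (an illustrative choice), and the `readings` values are hypothetical, already-scaled (temperature, carbon dioxide) pairs invented for the example. The function names `kmeans`, `silhouette`, and `choose_k` are likewise assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def init_centroids(points, k):
    """Deterministic farthest-point seeding (illustrative, not from the paper)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, candidates):
    """Return (k, score, labels) for the candidate k with the highest silhouette."""
    best = None
    for k in candidates:
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best

# Hypothetical scaled (temperature, CO2) readings: a dense group of ordinary
# readings and a small group of suspicious ones.
readings = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.1), (0.1, -0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
]
best_k, best_score, best_labels = choose_k(readings, [2, 3, 4])
print(best_k, round(best_score, 3), best_labels)
```

In this toy setting the small, distant group ends up in its own cluster, which is the kind of cluster one would then inspect as a noise candidate. Deciding which cluster is actually noise, as the paper does with labeled data and a J48 tree, is outside the scope of this sketch.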
Clustering Algorithms based Noise Identification from Air Pollution Monitoring Data
Xinyi Fang, Chak Fong Chong, Xu Yang, Yapeng Wang
Published 2022 in 2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)
PUBLICATION RECORD
- Publication year: 2022
- Venue: 2022 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)
- Publication date: 2022-12-18
- Source metadata: Semantic Scholar