Simultaneous feature selection and clustering using mixture models

Martin H. C. Law, Mário A. T. Figueiredo, Anil K. Jain

Published 2004 in IEEE Transactions on Pattern Analysis and Machine Intelligence

ABSTRACT

Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
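
The feature saliency idea can be made concrete with a small sketch. It assumes the formulation usually associated with this paper: features are conditionally independent given the cluster, and feature l follows a cluster-specific density with probability rho_l (its saliency) or a common density shared by all clusters with probability 1 - rho_l. Gaussian densities, the function names, and the parameter layout are illustrative assumptions, not the authors' code. The EM updates and the minimum message length penalty that pushes saliencies toward zero are not shown.

```python
import math
from typing import Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation); illustrative type alias


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Univariate Gaussian density evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def saliency_mixture_pdf(
    y: Sequence[float],
    alpha: Sequence[float],               # mixing weights, one per cluster (sum to 1)
    rho: Sequence[float],                 # feature saliencies in [0, 1], one per feature
    comp: Sequence[Sequence[Gaussian]],   # comp[j][l]: cluster j, feature l
    common: Sequence[Gaussian],           # common[l]: density shared by all clusters
) -> float:
    """Density of one point under a mixture with per-feature saliencies.

    p(y) = sum_j alpha_j * prod_l [rho_l * p(y_l | cluster j)
                                   + (1 - rho_l) * q(y_l)]
    """
    total = 0.0
    for a_j, params_j in zip(alpha, comp):
        prod = 1.0
        for y_l, rho_l, (m, s), (m0, s0) in zip(y, rho, params_j, common):
            relevant = normal_pdf(y_l, m, s)      # cluster-specific density
            irrelevant = normal_pdf(y_l, m0, s0)  # common density
            prod *= rho_l * relevant + (1.0 - rho_l) * irrelevant
        total += a_j * prod
    return total


# Two clusters, two features: feature 0 separates the clusters, feature 1 does not.
alpha = [0.3, 0.7]
comp = [[(-2.0, 1.0), (0.0, 1.0)], [(2.0, 1.0), (0.0, 1.0)]]
common = [(0.0, 3.0), (0.0, 1.0)]
density = saliency_mixture_pdf([0.5, -1.0], alpha, [1.0, 0.0], comp, common)
```

With every saliency equal to 1 the expression reduces to an ordinary diagonal-covariance Gaussian mixture; with every saliency equal to 0 the cluster structure disappears and the density is just the product of the common densities.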

PUBLICATION RECORD

Publication year
2004
Venue
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication date
2004-09-01
Fields of study
Medicine, Computer Science
Identifiers
DOI 10.1109/TPAMI.2004.71 PMID 15742891
Source metadata
Semantic Scholar, PubMed

CITED BY

Unsupervised feature selection using bidirectional fuzzy rough divergence metrics
2026cites this paper
Exposing Optimal Feature Sets for Enhancing Machine Learning Performance
2025cites this paper
Multivariate bounded support Kotz mixture model with semi-supervised projected model-based clustering
2025cites this paper
Comparative Analysis of Multivariate Mixture Models for Clustering Cancer Expression Data
2025cites this paper
A comprehensive survey on recent feature selection methods for mixed data: Challenges, solutions and future directions
2025cites this paper
Federated Variational Inference for Bayesian Mixture Models
2025cites this paper
Kotz Mixture Model with Semi-Supervised Projected Model-Based Clustering
2025cites this paper
Meta-level Experience Sharing for Autonomous Systems by Fusing Generative Hierarchical Dynamic Bayesian Networks
2025cites this paper
Real-Time Video Segmentation by Means of Finite GMMs and Background Subtraction
2025cites this paper
Bayesian Model Averaging with Diffused Priors for Model-Based Clustering Under a Cluster Forests Architecture
2025cites this paper
Multivariate Bounded Support Kotz Mixture Model: Addressing Financial Fraud and Network Security Challenges
2025cites this paper
Spatial Prediction of Soil Attributes from PRISMA Hyperspectral Imagery Using Wrapper Feature Selection and Ensemble Modeling
2024cites this paper
Improvements on Gaussian mixture model and its application in identifying aerosol types in two major cities in the Yangtze River Delta, China.
2024cites this paper
VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data
2024cites this paper
Spillovers in Europe: The role of ESG
2024cites this paper
A hierarchical count data clustering based on Multinomial Nested Dirichlet Mixture using the Minorization-Maximization framework
2024cites this paper
Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data
2024cites this paper
A multi-scale information fusion-based multiple correlations for unsupervised attribute selection
2024cites this paper
Discriminative Dimension Selection for Enhancing the Interpretability and Performance of Clustering Output
2024cites this paper
Clustering Functional Magnetic Resonance Imaging Time Series in Glioblastoma Characterization: A Review of the Evolution, Applications, and Potentials
2024cites this paper
HMOSHSSA: a novel framework for solving simultaneous clustering and feature selection problems
2024cites this paper
Unsupervised attribute reduction based on neighborhood dependency
2024cites this paper
Simultaneous count data feature selection and clustering using Multinomial Nested Dirichlet Mixture
2024cites this paper
Application of machine learning in delineating groundwater contamination in present and climate change scenarios
2024cites this paper
A Bayesian hierarchical hidden Markov model for clustering and gene selection: Application to kidney cancer gene expression data
2024cites this paper
Unsupervised attribute reduction based on variable precision weighted neighborhood dependency
2024cites this paper
A Robust TabNet-Based Multi-Classification Algorithm for Infrared Spectral Data of Chinese Herbal Medicine with High-Dimensional Small Samples.
2024cites this paper
Filter unsupervised spectral feature selection method for mixed data based on a new feature correlation measure
2023cites this paper
Feature Extraction with Differential Evolution Algorithm
2023cites this paper
A novel clustering-based resampling with cost-sensitive boosting method to model and map wildfire susceptibility
2023cites this paper
Second-Order Unsupervised Feature Selection via Knowledge Contrastive Distillation
2023cites this paper
Regularization and optimization in model-based clustering
2023cites this paper
Unsupervised Mixture Models on the Edge for Smart Energy Consumption Segmentation with Feature Saliency
2023influential citation
Fast Unsupervised Feature Selection With Bipartite Graph and $\ell_{2,0}$-Norm Constraint
2023cites this paper
Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine
2023cites this paper
Unsupervised feature selection via discrete spectral clustering and feature weights
2023cites this paper
Clustering Spatially Correlated Functional Data With Multiple Scalar Covariates
2022cites this paper
Feature Saliencies in Asymmetric Hidden Markov Models
2022influential citation
A Comprehensive Survey on the Process, Methods, Evaluation, and Challenges of Feature Selection
2022cites this paper
Fuzzy Theory in Fog Computing: Review, Taxonomy, and Open Issues
2022cites this paper
Exploring Various International Law and Its Classification
2022cites this paper
A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data
2022influential citation
An Optimized Gradient Dynamic-Neuro-Weighted-Fuzzy Clustering Method: Application in the Nutrition Field
2022cites this paper
Unsupervised modeling and feature selection of sequential spherical data through nonparametric hidden Markov models
2022cites this paper
MARK: Fill in the blanks through a JointGAN based data augmentation for network anomaly detection
2022cites this paper
A novel feature selection method using generalized inverted Dirichlet-based HMMs for image categorization
2022influential citation
Artificial intelligence-based analytics for impacts of COVID-19 and online learning on college students’ mental health
2022cites this paper
Compactness score: a fast filter method for unsupervised feature selection
2022cites this paper
Spatio-temporal mixture process estimation to detect dynamical changes in population
2022cites this paper
Economic Order Quantity Model-Based Optimized Fuzzy Nonlinear Dynamic Mathematical Schemes
2022cites this paper
Remodeling the tumor microenvironment via blockade of LAIR-1 and TGF-β signaling enables PD-L1–mediated tumor eradication
2022cites this paper
Bayesian profile regression for clustering analysis involving a longitudinal response and explanatory variables.
2021cites this paper
Why Stable Learning Works? A Theory of Covariate Shift Generalization
2021cites this paper
A Framework based on Finite Mixture Models and Adaptive Kriging for Characterizing Non-Smooth and Multimodal Failure Regions in a Nuclear Passive Safety System
2021cites this paper
A class-specific metaheuristic technique for explainable relevant feature selection
2021cites this paper
A Novel Multi-objective Differential Evolution Algorithm for Clustering Data Streams
2021cites this paper
Unsupervised and Semisupervised Learning
2021cites this paper
Reliability Assessment of Passive Safety Systems for Nuclear Energy Applications: State-of-the-Art and Open Issues
2021cites this paper
Cluster analysis of mixed data based on Feature Space Instance Cluster Closeness Metric
2021cites this paper
Analysis and modeling of air conditioner usage behavior in residential buildings using monitoring data during hot and humid season
2021cites this paper
Adapting Supervised Feature Selection Methods for Clustering Tasks
2021cites this paper
Machine learning algorithm for feature space clustering of mixed data with missing information based on molecule similarity
2021cites this paper
An hybrid particle swarm optimization with crow search algorithm for feature selection
2021cites this paper
Media Forensics Considerations on DeepFake Detection with Hand-Crafted Features
2021cites this paper
Introduction to Sentiment Analysis Covering Basics, Tools, Evaluation Metrics, Challenges, and Applications
2021cites this paper
Research on the Mental Health of College Students Based on Fuzzy Clustering Algorithm
2021cites this paper
Online mixture-based clustering for high dimensional count data using Neerchal-Morel distribution
2021cites this paper
Multiobjective optimization technique for gene selection and sample categorization
2021cites this paper
Hybrid optimization algorithm for security aware cluster head selection process to aid hierarchical routing in wireless sensor network
2021cites this paper
Simultaneous positive sequential vectors modeling and unsupervised feature selection via continuous hidden Markov models
2021cites this paper
Online variational inference on finite multivariate Beta mixture models for medical applications
2021cites this paper
Bayesian inference for infinite asymmetric Gaussian mixture with feature selection
2021cites this paper
Mixture-Based Unsupervised Learning for Positively Correlated Count Data
2021cites this paper
Maximum A Posteriori Approximation of Hidden Markov Models for Proportional Sequential Data Modeling With Simultaneous Feature Selection
2021cites this paper
Inferring latent heterogeneity using many feature variables supervised by survival outcome
2021cites this paper
A Review on Feature Selection and Ensemble Techniques for Intrusion Detection System
2021cites this paper
Efficient neural spike sorting using data subdivision and unification.
2021cites this paper
Utilising Flow Aggregation to Classify Benign Imitating Attacks
2021cites this paper
Dual space latent representation learning for unsupervised feature selection
2021cites this paper
An Experimental Evaluation of Clustering And Classification of High-Speed Dimensional Data Stream in Dynamic Feature Selection
2021cites this paper
A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization
2021cites this paper
SSA-CFS: An Effective Feature Selection Method for Intrusion Detection System
2021cites this paper
An evaluation of feature selection methods for environmental data
2021cites this paper
An experience selecting quality features of apps for people with disabilities using abductive approach to explanatory theory generation
2021cites this paper
Clustering Analysis in the Wireless Propagation Channel with a Variational Gaussian Mixture Model
2020cites this paper
Background subtraction using infinite asymmetric Gaussian mixture models with simultaneous feature selection
2020cites this paper
Multi-view feature selection via Nonnegative Structured Graph Learning
2020cites this paper
Preserving Ordinal Consensus: Towards Feature Selection for Unlabeled Data
2020cites this paper
On variable selection in matrix mixture modelling
2020cites this paper
Modeling and Clustering Positive Vectors via Nonparametric Mixture Models of Liouville Distributions
2020cites this paper
Multivariate bounded support Laplace mixture model
2020cites this paper
Unsupervised feature selection via adaptive hypergraph regularized latent representation learning
2020cites this paper
Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data
2020cites this paper
Large scale data mining for banking credit risk prediction
2020cites this paper
Gaussian mixture model with feature selection: An embedded approach
2020cites this paper
Mixture‐based clustering for count data using approximated Fisher Scoring and Minorization–Maximization approaches
2020cites this paper
Variational Inference of Infinite Generalized Gaussian Mixture Models with Feature Selection
2020cites this paper
Validity Based Approach for Feature Selection in Intrusion Detection Systems
2020cites this paper
A Doubly Enhanced EM Algorithm for Model-Based Tensor Clustering
2020cites this paper
A systematic evaluation of filter Unsupervised Feature Selection methods
2020cites this paper