Discriminative variable selection for clustering with the sparse Fisher-EM algorithm

Published 2012 in Computational Statistics (journal)

ABSTRACT

Interest in variable selection for clustering has increased recently due to the growing need to cluster high-dimensional data. In particular, variable selection eases both the clustering and the interpretation of the results. Existing approaches have demonstrated the importance of variable selection for clustering but turn out to be either very time-consuming or not sparse enough in high-dimensional spaces. This work proposes to select the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method was recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through ℓ1-type penalizations. Experimental comparisons with existing approaches on simulated and real-world data sets demonstrate the value of the proposed methodology. An application to the segmentation of hyperspectral images of the planet Mars is also presented.
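
As a rough illustration of why sparsity in a loading matrix amounts to variable selection, the sketch below applies the generic soft-thresholding operator (the proximal map of an ℓ1 penalty) to a small made-up loading matrix. It is not one of the three procedures proposed in the paper; the matrix `U`, the threshold `lam`, and all function names are hypothetical and chosen only for the example. A variable whose whole row is shrunk to zero no longer contributes to any discriminative axis and can be treated as discarded.

```python
def soft_threshold(value, lam):
    """Soft-thresholding: shrink value toward zero by lam, clipping at zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparsify_loadings(loadings, lam):
    """Apply soft-thresholding entrywise (rows = variables, columns = axes)."""
    return [[soft_threshold(v, lam) for v in row] for row in loadings]


def selected_variables(loadings):
    """Indices of variables that keep at least one nonzero loading."""
    return [j for j, row in enumerate(loadings) if any(v != 0.0 for v in row)]


# Hypothetical loading matrix: 4 observed variables, 2 discriminative axes.
U = [
    [0.80, -0.10],
    [0.05, 0.02],
    [-0.03, 0.90],
    [0.04, -0.06],
]

U_sparse = sparsify_loadings(U, lam=0.1)  # lam is an arbitrary illustrative value
kept = selected_variables(U_sparse)
print(kept)
```

Larger values of `lam` zero out more rows and therefore keep fewer variables; how the penalty level is actually chosen is a modelling decision that this sketch does not address.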

PUBLICATION RECORD

Publication year
2012
Venue
Computational Statistics (journal)
Publication date
2012-04-10
Fields of study
Mathematics, Computer Science
Identifiers
DOI: 10.1007/s00180-013-0433-6; arXiv: 1204.2067
Source metadata
Semantic Scholar

REFERENCES

Model-based clustering of high-dimensional data: A review
2014, cited by this paper
Theoretical and practical considerations on the convergence properties of the Fisher-EM algorithm
2012, cited by this paper
Letter to the Editor
2011, cited by this paper
Dimensionally reduced mixtures of regression models
2011, cited by this paper
Simultaneous model-based clustering and visualization in the Fisher discriminative subspace
2011, influential reference
Heteroscedastic factor mixture analysis
2010, cited by this paper
Penalized mixtures of factor analyzers with application to clustering high-dimensional microarray data
2010, cited by this paper
A Framework for Feature Selection in Clustering
2010, influential reference
Sparse Linear Discriminant Analysis with Applications to High Dimensional Low Sample Size Data
2009, influential reference
Variable Selection for Clustering with Gaussian Mixture Models
2009, cited by this paper
A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.
2009, influential reference
Penalized factor mixture analysis for variable selection in clustered data
2009, cited by this paper
Variable selection in model-based clustering: A general variable role modeling
2009, cited by this paper
A Flexible and Efficient Algorithm for Regularized Fisher Discriminant Analysis
2009, cited by this paper
Mixtures of Factor Analyzers with Common Factor Loadings for the Clustering and Visualisation of High-Dimensional Data
2008, influential reference
Parsimonious Gaussian mixture models
2008, cited by this paper
Variable Selection for Model‐Based High‐Dimensional Clustering and Its Application to Microarray Data
2008, cited by this paper
Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.
2008, cited by this paper
On the “degrees of freedom” of the lasso
2007, cited by this paper
Penalized Model-Based Clustering with Application to Variable Selection
2007, cited by this paper
The EM Algorithm for Factor Analyzers: An Extension with Latent Variable
2006, cited by this paper
Variable Selection for Model-Based Clustering
2006, cited by this paper
High-dimensional data clustering
2006, cited by this paper
Sparse Principal Component Analysis
2006, cited by this paper
Procrustes Problems
2005, cited by this paper
Addendum: Regularization and variable selection via the elastic net
2005, cited by this paper
High-Dimensional Discriminant Analysis
2005, cited by this paper
Mars Surface Diversity as Revealed by the OMEGA/Mars Express Observations
2005, cited by this paper
Review of: J.C. Gower & G.B. Dijksterhuis: Procrustes Problems, Oxford University Press.
2004, cited by this paper
Simultaneous feature selection and clustering using mixture models
2004, cited by this paper
A mixed factors model for dimension reduction and extraction of a group structure in gene expression data
2004, cited by this paper
Modelling high-dimensional data by mixtures of factor analyzers
2003, cited by this paper
Bayesian Clustering with Variable and Transformation Selections
2003, cited by this paper
Estimating the number of clusters in a dataset via the gap statistic
2000, cited by this paper
Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood
2000, cited by this paper
Mixtures of Probabilistic Principal Component Analyzers
1999, influential reference
Loading and correlations in the interpretation of principle components
1995, cited by this paper
Introduction to statistical pattern recognition (2nd ed.)
1990, cited by this paper
An Optimal Set of Discriminant Vectors
1975, cited by this paper
Introduction to Statistical Pattern Recognition
1972, cited by this paper
Letter to the editor.
1967, cited by this paper
THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS
1936, cited by this paper
Mixtures of Factor Analyzers with Common Factor Loadings: Applications to the Clustering and Visualisation of High-dimensional Data
year unknown, cited by this paper

CITED BY

Iterative Exploration-Driven Sparse SDP Clustering via Thompson Sampling
2025, cites this paper
A consensus-constrained parsimonious Gaussian mixture model for clustering hyperspectral images
2024, cites this paper
Limitations of clustering with PCA and correlated noise
2024, cites this paper
Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection
2023, cites this paper
Cluster Analysis for the Selection of Potential Discriminatory Variables and the Identification of Subgroups in Archaeometry
2023, cites this paper
Sparse GEMINI for Joint Discriminative Clustering and Feature Selection
2023, cites this paper
Quantile-based Clustering for Functional Data via Modelling Functional Principal Components Scores
2023, cites this paper
TLS-EM algorithm of Mixture Density Models for exponential families
2022, cites this paper
High-dimensional logistic entropy clustering
2021, cites this paper
Bayesian inference for infinite asymmetric Gaussian mixture with feature selection
2021, cites this paper
On variable selection in matrix mixture modelling
2020, cites this paper
A Bayesian Fisher-EM algorithm for discriminative Gaussian subspace clustering
2020, cites this paper
Cluster analysis with cellwise trimming and applications to robust clustering of curves
2020, cites this paper
Model-Based Clustering and Classification for Data Science: With Applications in R
2019, cites this paper
A hierarchical Bayesian approach for examining heterogeneity in choice decisions
2018, cites this paper
The VIMOS Public Extragalactic Redshift Survey (VIPERS)
2018, cites this paper
A survey of feature selection methods for Gaussian mixture models and hidden Markov models
2017, influential citation
Variable selection methods for model-based clustering
2017, influential citation
Variable selection in model-based clustering and discriminant analysis with a regularization approach
2017, cites this paper
The discriminative functional mixture model for a comparative analysis of bike sharing systems
2016, influential citation
Model-based clustering of high-dimensional data in Astrophysics
2016, cites this paper
Behavioural variability and motor performance: Effect of practice specialization in front crawl swimming.
2016, cites this paper
A statistical framework for modeling asthma and COPD biological heterogeneity, and a novel variable selection method for model-based clustering
2016, cites this paper
The Discriminative Functional Mixture Model for the Analysis of Bike Sharing Systems
2014, cites this paper
Model-based clustering of high-dimensional data: A review
2014, cites this paper
Key Point Selection and Clustering of Swimmer Coordination Through Sparse Fisher-EM
2014, cites this paper
Nonlinear Pedagogy: An Effective Approach to Cater for Individual Differences in Learning a Sports Skill
2014, cites this paper
Spatial modelling of plant diversity from high-throughput environmental DNA sequence data
2013, cites this paper
MLSA13 - Proceedings of "Machine Learning and Data Mining for Sports Analytics", workshop @ ECML/PKDD 2013
2013, cites this paper
Probabilistic model‐based discriminant analysis and clustering methods in chemometrics
2013, influential citation
Sparse matrices in data analysis
2013, cites this paper
Learning algorithms for sparse classification
2013, cites this paper
Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.
2013, cites this paper
Contributions à l'apprentissage statistique en grande dimension, adaptatif et sur données atypiques
2012, cites this paper
Sparse and discriminative clustering for complex data. An application to cytology.
2011, cites this paper