In sparse principal component analysis we are given noisy observations of a low-rank matrix of dimension n x p and seek to reconstruct it under additional sparsity assumptions. In particular, we assume here that the principal components v1,..., vr have at most k1, · · · , kq non-zero entries respectively, and study the high-dimensional regime in which p is of the same order as n. In an influential paper, Johnstone and Lu [JL04] introduced a simple algorithm that estimates the support of the principal vectors v1,..., vr by the largest entries in the diagonal of the empirical covariance. This method can be shown to succeed with high probability if kq ≤ C1 √n/ log p, and to fail with high probability if kq ≥ C2 √n/ log p for two constants 0 < C1, C2 < ∞. Despite a considerable amount of work over the last ten years, no practical algorithm exists with provably better support recovery guarantees. Here we analyze a covariance thresholding algorithm that was recently proposed by Krauthgamer, Nadler and Vilenchik [KNV13]. We confirm empirical evidence presented by these authors and rigorously prove that the algorithm succeeds with high probability for k of order √n. Recent conditional lower bounds [BR13] suggest that it might be impossible to do significantly better. The key technical component of our analysis develops new bounds on the norm of kernel random matrices, in regimes that were not considered before.
Sparse PCA via Covariance Thresholding
Published 2013 in Journal of machine learning research
ABSTRACT
PUBLICATION RECORD
- Publication year
2013
- Venue
Journal of machine learning research
- Publication date
2013-11-20
- Fields of study
Mathematics, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-45 of 45 references · Page 1 of 1