Outlier Detection Using Nonconvex Penalized Regression
Published 2010 in arXiv.org
ABSTRACT
This article studies the outlier detection problem from the standpoint of penalized regression. In the regression model, we add one mean shift parameter for each of the n data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1 penalty yields a convex criterion, but fails to deliver a robust estimator; the L1 penalty corresponds to soft thresholding. We introduce a thresholding (denoted by Θ) based iterative procedure for outlier detection (Θ–IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We describe the connection between Θ–IPOD and M-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients; a data-dependent choice can be made based on the Bayesian information criterion (BIC). The tuned Θ–IPOD shows outstanding performance in identifying outliers in various situations compared with other existing approaches. In addition, Θ–IPOD is much faster than iteratively reweighted least squares for large data, because each iteration costs at most O(np) (and sometimes much less), avoiding an O(np²) least squares estimate. This methodology can be extended to high-dimensional modeling with p ≫ n if both the coefficient vector and the outlier pattern are sparse.
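The abstract describes a mean-shift model y = Xβ + γ + ε, where the sparse vector γ carries one shift parameter per observation, and a thresholding-based iteration for estimating it. The following minimal Python sketch illustrates the idea with a simple alternating update (least squares on the shifted responses, then hard thresholding of the residuals); the function names and this particular update scheme are illustrative assumptions, not the authors' exact Θ–IPOD algorithm:

```python
import numpy as np

def hard_threshold(r, lam):
    # Hard thresholding Θ: keep entries with |r_i| > lam, zero out the rest.
    return np.where(np.abs(r) > lam, r, 0.0)

def theta_ipod(X, y, lam, n_iter=100, tol=1e-8):
    """Illustrative thresholding-based iteration for outlier detection.

    Alternates two steps until the outlier-shift vector gamma stabilizes:
      1. beta  <- least squares fit of (y - gamma) on X
      2. gamma <- hard threshold of the residuals y - X beta
    Nonzero entries of gamma flag the observations treated as outliers.
    """
    n, p = X.shape
    gamma = np.zeros(n)
    X_pinv = np.linalg.pinv(X)           # reused across iterations
    for _ in range(n_iter):
        beta = X_pinv @ (y - gamma)      # LS step on shifted responses
        gamma_new = hard_threshold(y - X @ beta, lam)
        if np.max(np.abs(gamma_new - gamma)) < tol:
            gamma = gamma_new
            break
        gamma = gamma_new
    return beta, gamma
```

On a toy line y = 2 + 3x with one response shifted upward, the nonzero entry of the returned gamma identifies the contaminated point while beta recovers the clean coefficients; the threshold lam plays the role of the single tuning parameter mentioned in the abstract.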
PUBLICATION RECORD
- Publication year
2010
- Venue
arXiv.org
- Publication date
2010-06-14
- Fields of study
Mathematics, Computer Science
- Source metadata
Semantic Scholar