We give two provably accurate feature-selection techniques for the linear SVM. The two algorithms are deterministic and randomized, respectively, and can be used in an unsupervised or supervised setting. The supervised approach is based on sampling features from support vectors. We prove that the margin in the sampled feature space is preserved to within ε-relative error of the margin in the full feature space in the worst case. In the unsupervised setting, we also provide worst-case guarantees on the radius of the minimum enclosing ball, thereby ensuring generalization comparable to that in the full feature space and resolving an open problem posed in Dasgupta et al. (2007). We present extensive experiments on real-world datasets to support our theory and to demonstrate that our method is competitive with, and often better than, prior state-of-the-art methods, for which there are no known provable guarantees.
HIGHLIGHTS
- We give two provably accurate feature-selection techniques for the linear SVM.
- The algorithms can be used in a supervised or unsupervised setting.
- We prove the margin is preserved to within ε-relative error of the margin in the full feature space.
- In the unsupervised case, we provide worst-case guarantees on the margin and the radius of the minimum enclosing ball.
- Extensive experiments demonstrate that our method is competitive with and often better than prior art.
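The supervised approach described above can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: it assumes per-feature importance scores come from the squared entries of a linear SVM weight vector `w` (the paper derives its sampling probabilities from the support vectors), then samples and rescales columns so that inner products are preserved in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight vector of an already-trained linear SVM.
# Treating its squared entries as feature importances is an
# illustrative simplification of the paper's support-vector-based scores.
w = np.array([3.0, 0.1, 2.0, 0.05, 1.0])
p = w ** 2 / np.sum(w ** 2)        # one sampling probability per feature

r = 3                               # number of features (columns) to keep
idx = rng.choice(len(w), size=r, replace=True, p=p)
scale = 1.0 / np.sqrt(r * p[idx])   # standard rescaling so that inner
                                    # products are preserved in expectation

def select_features(X):
    """Project data rows onto the sampled, rescaled coordinates."""
    return X[:, idx] * scale

X = rng.standard_normal((4, 5))     # 4 points, 5 original features
X_small = select_features(X)        # 4 points, 3 sampled features
```

High-importance features (large entries of `p`) are sampled more often, while the `1/sqrt(r * p)` rescaling keeps the sketch unbiased, which is the standard mechanism behind relative-error guarantees of this kind.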
Feature selection for linear SVM with provable guarantees
Saurabh Paul, M. Magdon-Ismail, P. Drineas
Published 2014 in Pattern Recognition
PUBLICATION RECORD
- Publication year: 2014
- Venue: Pattern Recognition
- Publication date: 2014-06-01
- Fields of study: Mathematics, Computer Science
- Source metadata: Semantic Scholar
REFERENCES
37 references
CITED BY
61 citing papers