Data Driven Resource Allocation for Distributed Learning

Travis Dick, Mu Li, Krishna Pillutla, Colin White, Maria-Florina Balcan, Alex Smola

Published 2015 in International Conference on Artificial Intelligence and Statistics

ABSTRACT

In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes and, more generally, that classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data-dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving that existing scalable heuristics perform well under natural non-worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality-sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real-world image and advertising datasets. We also demonstrate that our technique strongly scales with the available computing power.
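
To make the idea of data-dependent dispatching more concrete, here is a minimal Python sketch. It is not the paper's algorithm: farthest-first traversal and a greedy capacity-respecting assignment are swapped in as simple stand-ins, and the names `farthest_first_centers`, `dispatch`, `capacity`, and `replicas` are hypothetical. It only illustrates the three ingredients the abstract mentions: a rule built from a small sample and then applied to all points, a cap on per-machine load (balancedness), and copies of each point on several machines (fault tolerance).

```python
import math


def farthest_first_centers(sample, k):
    """Pick k centers from a small sample by farthest-first traversal."""
    if not 1 <= k <= len(sample):
        raise ValueError("k must be between 1 and len(sample)")
    centers = [sample[0]]
    while len(centers) < k:
        # Add the sample point that is farthest from its closest chosen center.
        centers.append(
            max(sample, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def dispatch(points, centers, capacity, replicas=1):
    """Greedy dispatch (illustrative stand-in, not the paper's method).

    Each point is sent to its `replicas` nearest centers that still have
    room, so nearby points tend to share a machine, no machine holds more
    than `capacity` points, and each point lives on `replicas` machines.
    """
    if replicas > len(centers):
        raise ValueError("need at least `replicas` machines")
    if len(points) * replicas > capacity * len(centers):
        raise ValueError("total capacity is too small")
    loads = [0] * len(centers)
    assignment = []
    for p in points:
        # Machines ordered from nearest to farthest center.
        order = sorted(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        chosen = [j for j in order if loads[j] < capacity][:replicas]
        if len(chosen) < replicas:
            # The greedy order can paint itself into a corner; a real
            # algorithm would rebalance instead of giving up.
            raise RuntimeError("greedy assignment ran out of distinct machines")
        for j in chosen:
            loads[j] += 1
        assignment.append(chosen)
    return assignment, loads


# Usage: build the rule from a 3-point sample, then dispatch 6 points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
centers = farthest_first_centers([(0, 0), (0, 1), (10, 10)], k=2)
assignment, loads = dispatch(points, centers, capacity=3)
# assignment == [[0], [0], [0], [1], [1], [1]], loads == [3, 3]
```

In the usage example the fourth point is closest to machine 0, but that machine is already full, so the point spills over to machine 1. This shows the tension between locality and balancedness that the abstract alludes to.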

PUBLICATION RECORD

Publication year: 2015
Venue: International Conference on Artificial Intelligence and Statistics
Publication date: 2015-12-01
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1512.04848
Source metadata: Semantic Scholar

REFERENCES

CA-SVM: Communication-Avoiding Support Vector Machines on Clusters
2016, cited by this paper
Approximation Algorithms for Clustering Problems with Lower Bounds and Outliers
2016, influential reference
k-center Clustering under Perturbation Resilience
2015, cited by this paper
Distributed Balanced Partitioning via Linear Embedding
2015, cited by this paper
Mixed Robust/Average Submodular Partitioning: Fast Algorithms, Guarantees, and Applications
2015, cited by this paper
Communication Efficient Distributed Machine Learning with the Parameter Server
2014, cited by this paper
Improved Distributed Principal Component Analysis
2014, cited by this paper
Balanced graph edge partition
2014, cited by this paper
An Improved Approximation Algorithm for the Hard Uniform Capacitated k-median Problem
2014, cited by this paper
Going deeper with convolutions
2014, cited by this paper
Approximating capacitated k-median with (1 + ε)k open facilities
2014, cited by this paper
Decompositions of triangle-dense graphs
2013, cited by this paper
Bi-Factor Approximation Algorithms for Hard Capacitated k-Median Problems
2013, cited by this paper
Centrality of Trees for Capacitated k-Center
2013, influential reference
Clustering under approximation stability
2013, cited by this paper
PLAL: Cluster-based active learning
2013, influential reference
Distributed k-means and k-median clustering on general communication topologies
2013, cited by this paper
k-Means++ under approximation stability
2013, influential reference
Approximation algorithms for hard capacitated k-facility location problems
2013, cited by this paper
Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small
2013, cited by this paper
Information-theoretic lower bounds for distributed statistical estimation with communication constraints
2013, cited by this paper
ImageNet classification with deep convolutional neural networks
2012, cited by this paper
LP Rounding for k-Centers with Non-uniform Hard Capacities
2012, cited by this paper
Communication-efficient algorithms for statistical optimization
2012, cited by this paper
Access to Unlabeled Data can Speed up Prediction Time
2011, cited by this paper
Graph Partitioning with Natural Cuts
2011, cited by this paper
The curse of dimension in nonparametric regression
2010, cited by this paper
Approximate Nash Equilibria under Stability Conditions
2010, cited by this paper
Learning Multiple Layers of Features from Tiny Images
2009, cited by this paper
Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions
2009, cited by this paper
PNUTS: Yahoo!'s hosted data serving platform
2008, cited by this paper
LIBLINEAR: A Library for Large Linear Classification
2008, cited by this paper
k-means++: the advantages of careful seeding
2007, cited by this paper
A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering
2007, cited by this paper
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
2006, cited by this paper
The Effectiveness of Lloyd-Type Methods for the k-Means Problem
2006, cited by this paper
Achieving anonymity via clustering
2006, cited by this paper
Training Invariant Support Vector Machines using Selective Sampling
2005, cited by this paper
Navigating nets: simple algorithms for proximity search
2004, cited by this paper
Locality-sensitive hashing scheme based on p-stable distributions
2004, cited by this paper
Universal Facility Location
2003, cited by this paper
Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP
2002, cited by this paper
Building Steiner trees with incomplete global knowledge
2000, cited by this paper
The Capacitated K-Center Problem
2000, influential reference
Hierarchical placement and network design problems
2000, cited by this paper
A constant-factor approximation algorithm for the k-median problem (extended abstract)
1999, cited by this paper
How to Allocate Network Centers
1993, cited by this paper
Local Algorithms for Pattern Recognition and Dependencies Estimation
1993, cited by this paper
Combinatorial optimization: Algorithms and complexity
1984, cited by this paper
Combinatorial Optimization: Algorithms and Complexity
1981, cited by this paper
Randomized partition trees for exact nearest neighbor search (JMLR: Workshop and Conference Proceedings vol 30 (2013) 1–21)
year unknown, influential reference
Distributed Learning, Communication Complexity and Privacy (25th Annual Conference on Learning Theory)
year unknown, cited by this paper

CITED BY

Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning
2025, cites this paper
Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams
2024, cites this paper
On parameterized approximation algorithms for balanced clustering
2023, influential citation
Evolutionary Deep Fusion Method and its Application in Chemical Structure Recognition
2021, cites this paper
New Aspects of Beyond Worst-Case Analysis
2021, cites this paper
Federated learning with superquantile aggregation for heterogeneous data
2020, cites this paper
Learning-Augmented Data Stream Algorithms
2020, cites this paper
An Analysis of Robustness of Non-Lipschitz Networks
2020, cites this paper
Distributed classification based on distances between probability distributions in feature space
2019, cites this paper
LEARNING-AUGMENTED DATA STREAM ALGO-
2019, cites this paper
Faster Balanced Clusterings in High Dimension
2018, influential citation
Proposal Draft Beyond Worst-Case Analysis in Combinatorial Optimization
2018, cites this paper
General and Robust Communication-Efficient Algorithms for Distributed Clustering
2017, cites this paper
Capacitated Center Problems with Two-Sided Bounds and Outliers
2017, cites this paper
Robust Communication-Optimal Distributed Clustering Algorithms
2017, cites this paper
Balanced k-Center Clustering When k Is A Constant
2017, cites this paper