Parallelizing MCMC with Random Partition Trees

Xiangyu Wang, Fangjian Guo, K. Heller, D. Dunson

Published 2015 in Neural Information Processing Systems

ABSTRACT

The modern scale of data has brought new challenges to Bayesian inference. In particular, conventional MCMC algorithms are computationally very expensive for large data sets. A promising approach to this problem is embarrassingly parallel MCMC (EP-MCMC), which first partitions the data into multiple subsets and then runs an independent sampling algorithm on each subset. The subset posterior draws are then aggregated by a combining rule to obtain the final approximation. Existing EP-MCMC algorithms are limited by their approximation accuracy and by the difficulty of resampling. In this article, we propose a new EP-MCMC algorithm, PART, that addresses these problems. The new algorithm applies random partition trees to combine the subset posterior draws; the resulting combination is distribution-free, easy to resample from, and able to adapt to multiple scales. We provide theoretical justification and extensive experiments illustrating empirical performance.
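The partition, sample, and combine workflow described in the abstract can be illustrated with a minimal sketch. This is not the PART algorithm: the combining rule below is a simple Gaussian product, swapped in only to show the overall EP-MCMC structure. The model (a normal mean with known variance and a normal prior), the function names `subset_posterior_draws` and `combine_gaussian`, and all parameter values are assumptions made for illustration. Because the model is conjugate, each subset posterior is sampled directly instead of by MCMC.

```python
import numpy as np


def subset_posterior_draws(x_subset, m, tau2, sigma2, n_draws, rng):
    """Draw from one subset posterior of theta.

    Assumed model: x ~ N(theta, sigma2) with prior theta ~ N(0, tau2).
    Each of the m subsets uses the prior raised to the power 1/m,
    i.e. N(0, m * tau2), so the product of subset posteriors matches
    the full posterior. Conjugacy allows direct sampling here; in a
    real application an MCMC sampler would run on each subset.
    """
    precision = 1.0 / (m * tau2) + len(x_subset) / sigma2
    mean = (x_subset.sum() / sigma2) / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=n_draws)


def combine_gaussian(draws_per_subset):
    """Illustrative combining rule (not PART): fit a Gaussian to each
    set of subset draws and multiply the fitted densities."""
    precisions = np.array([1.0 / d.var(ddof=1) for d in draws_per_subset])
    means = np.array([d.mean() for d in draws_per_subset])
    combined_var = 1.0 / precisions.sum()
    combined_mean = combined_var * (precisions * means).sum()
    return combined_mean, combined_var


rng = np.random.default_rng(0)
n, m, tau2, sigma2 = 10_000, 10, 100.0, 1.0
x = rng.normal(2.0, np.sqrt(sigma2), size=n)

# Step 1: partition the data. Step 2: sample each subset independently.
subsets = np.array_split(x, m)
draws = [subset_posterior_draws(s, m, tau2, sigma2, 5_000, rng) for s in subsets]

# Step 3: aggregate the subset draws into one approximation.
combined_mean, combined_var = combine_gaussian(draws)

# Exact full-data posterior, for comparison.
full_precision = 1.0 / tau2 + n / sigma2
full_mean = (x.sum() / sigma2) / full_precision
full_var = 1.0 / full_precision

print("combined:", combined_mean, combined_var)
print("full:    ", full_mean, full_var)
```

A Gaussian product is accurate only when the subset posteriors are close to Gaussian, which is the kind of limitation the abstract attributes to existing combining rules.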

PUBLICATION RECORD

Publication year
2015
Venue
Neural Information Processing Systems
Publication date
2015-06-10
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1506.03164
Source metadata
Semantic Scholar

REFERENCES

WASP: Scalable Bayes via barycenters of subset posteriors
2015cited by this paper
Firefly Monte Carlo: Exact MCMC with Subsets of Data
2014cited by this paper
Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment
2014cited by this paper
MULTIVARIATE DENSITY ESTIMATION BASED ON ADAPTIVE PARTITIONING: CONVERGENCE RATE, VARIABLE SELECTION AND SPATIAL ADAPTATION
2014cited by this paper
Scalable and Robust Bayesian Inference via the Median Posterior
2014cited by this paper
Asymptotically Exact, Embarrassingly Parallel MCMC
2013influential reference
Parallel MCMC via Weierstrass Sampler
2013cited by this paper
Bayesian Learning via Stochastic Gradient Langevin Dynamics
2011cited by this paper
DRAM: Efficient adaptive MCMC
2006influential reference
Boosted decision trees as an alternative to artificial neural networks for particle identification
2004cited by this paper
Random Forests
2001cited by this paper
An adaptive Metropolis algorithm
2001influential reference
Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables
1999cited by this paper
Regression Shrinkage and Selection via the Lasso
1996cited by this paper
Bagging Predictors
1996cited by this paper
Nonparametric Density Estimation with a Parametric Start
1995cited by this paper
Convergence Rate of Sieve Estimates
1994cited by this paper
Almost sure L1-norm convergence for data-based histogram density estimates
1987influential reference
Multidimensional binary search trees used for associative searching
1975cited by this paper
Time Bounds for Selection
1973cited by this paper

CITED BY

Stacking Variational Bayesian Monte Carlo
2025cites this paper
Robust and Scalable Variational Bayes
2025cites this paper
Physics-Inspired Single-Particle Tracking Accelerated with Parallelism
2025cites this paper
Client-only Distributed Markov Chain Monte Carlo Sampling over a Network
2025cites this paper
Recursive Adaptive Importance Sampling with Optimal Replenishment
2025cites this paper
Parallelizing MCMC with Machine Learning Classifier and Its Criterion Based on Kullback-Leibler Divergence
2024cites this paper
Bayes goes big: Distributed MCMC and the drivers of E-commerce conversion
2024cites this paper
Embarrassingly Parallel GFlowNets
2024cites this paper
Pigeons.jl: Distributed Sampling From Intractable Distributions
2023cites this paper
Efficiently analyzing large patient registries with Bayesian joint models for longitudinal and time-to-event data
2023cites this paper
Federated Variational Inference Methods for Structured Latent Variable Models
2023cites this paper
Machine Learning and the Future of Bayesian Computation
2023cites this paper
Spatial meshing for general Bayesian multivariate models
2022cites this paper
Federated Averaging Langevin Dynamics: Toward a unified theory and new algorithms
2022cites this paper
A communication-efficient method for ℓ0 regularization linear regression models
2022cites this paper
Parallel MCMC Without Embarrassing Failures
2022influential citation
Statistical and Computational Needs for Big Data Challenges
2022cites this paper
Distributed quantile regression for massive heterogeneous data
2021cites this paper
Bayesian Fusion: Scalable unification of distributed statistical analyses
2021cites this paper
Distributed Bayesian Kriging
2021cites this paper
Scalable Bayesian inference for time series via divide-and-conquer
2021cites this paper
An algorithm for distributed Bayesian inference
2021cites this paper
DG-LMC: A Turn-key and Scalable Synchronous Distributed MCMC Algorithm
2021cites this paper
QLSD: Quantised Langevin stochastic dynamics for Bayesian federated learning
2021cites this paper
PMBA: A Parallel MCMC Bayesian Computing Accelerator
2021cites this paper
Distributed Bayesian Inference in Linear Mixed-Effects Models
2021cites this paper
Markov Chain Monte Carlo Algorithms for Bayesian Computation, a Survey and Some Generalisation
2020cites this paper
A Decentralized Approach to Bayesian Learning
2020cites this paper
Decentralized Langevin Dynamics for Bayesian Learning
2020cites this paper
A survey of Monte Carlo methods for parameter estimation
2020cites this paper
Distributed Bayesian clustering using finite mixture of mixtures
2020cites this paper
Federated stochastic gradient Langevin dynamics
2020cites this paper
Variance reduction for distributed stochastic gradient MCMC
2020cites this paper
A High-Performance Implementation of Bayesian Matrix Factorization with Limited Communication
2020cites this paper
Distributed Bayesian clustering
2020cites this paper
Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo
2020cites this paper
Unbiased Markov chain Monte Carlo methods with couplings
2020cites this paper
Efficient Linear Fusion of Partial Estimators
2019cites this paper
Dynamic predictions in Bayesian functional joint models for longitudinal and time-to-event data: An application to Alzheimer’s disease
2019cites this paper
ProBO: a Framework for Using Probabilistic Programming in Bayesian Optimization
2019cites this paper
Embarrassingly Parallel MCMC using Deep Invertible Transformations
2019influential citation
Efficient posterior sampling for high-dimensional imbalanced logistic regression.
2019cites this paper
Efficient MCMC Sampling with Dimension-Free Convergence Rate using ADMM-type Splitting
2019cites this paper
ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language
2019cites this paper
Large-scale variational inference for Bayesian joint regression modelling of high-dimensional genetic data
2019cites this paper
A Distributed Algorithm for Polya-Gamma Data Augmentation
2019cites this paper
Parallelising MCMC via Random Forests
2019cites this paper
Dynamic prediction using joint models of longitudinal and recurrent event data: a Bayesian perspective
2019cites this paper
An Algorithm for Distributed Bayesian Inference in Generalized Linear Models
2019cites this paper
Communication Efficient Parallel Algorithms for Optimization on Manifolds
2018cites this paper
Parallel and Distributed MCMC via Shepherding Distributions
2018cites this paper
Accelerating MCMC algorithms
2018cites this paper
Method G: Uncertainty Quantification for Distributed Data Problems Using Generalized Fiducial Inference
2018cites this paper
Global Consensus Monte Carlo
2018influential citation
Quantile regression under memory constraint
2018cites this paper
Statistical Validity and Consistency of Big Data Analytics: A General Framework
2018cites this paper
Bayesian Bootstraps for Massive Data
2017cites this paper
A Divide-and-Conquer Bayesian Approach to Large-Scale Kriging
2017cites this paper
Parallelizing MCMC with Random Partition Trees-Application to Book Crossing Dataset
2017cites this paper
Double-Parallel Monte Carlo for Bayesian analysis of big data
2017cites this paper
Distributed Bayesian matrix factorization with limited communication
2017cites this paper
Unbiased Markov chain Monte Carlo with couplings
2017cites this paper
Comparing consensus Monte Carlo strategies for distributed Bayesian computation
2017cites this paper
Average of Recentered Parallel MCMC for Big Data
2017cites this paper
DECOrrelated feature space partitioning for distributed sparse regression
2016influential citation
Bayesian nonparametric modeling and its applications
2016cites this paper
Nonparametric Heterogeneity Testing For Massive Data
2016cites this paper
Merging MCMC Subposteriors through Gaussian-Process Approximations
2016influential citation
Distributed Feature Selection in Large n and Large p Regression Problems
2016cites this paper
Simple, scalable and accurate posterior interval estimation
2016cites this paper
Bayes and big data: the consensus Monte Carlo algorithm
2016cites this paper
Stochastic Gradient MCMC with Stale Gradients
2016cites this paper
Nonparametric Bayesian Aggregation for Massive Data
2015cites this paper
Scalable Bayes via Barycenter in Wasserstein Space
2015cites this paper
Statistical methods and computing for big data.
2015cites this paper
Orthogonal parallel MCMC methods for sampling and optimization
2015cites this paper
Efficient linear fusion of partial estimators
2014cites this paper
Bayesian Inference in Linear Mixed-Effects Models
year unknowncites this paper