Model-Based Clustering of Large Networks

Published 2012 in Annals of Applied Statistics

ABSTRACT

We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger data sets than those seen elsewhere in the literature. The more flexible framework is achieved through introducing novel parameterizations of the model, giving varying degrees of parsimony, using exponential family models whose structure may be exploited in various theoretical and algorithmic ways. The algorithms are based on variational generalized EM algorithms, where the E-steps are augmented by a minorization-maximization (MM) idea. The bootstrapped standard error estimates are based on an efficient Monte Carlo network simulation idea. Last, we demonstrate the usefulness of the model-based clustering framework by applying it to a discrete-valued network with more than 131,000 nodes and 17 billion edge variables.
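
As a rough check on the scale quoted in the abstract, the sketch below counts the edge variables of a network on n nodes. It assumes (the abstract does not say so) that the network is directed with no self-loops, so that each ordered pair of distinct nodes contributes one edge variable. The function name `count_edge_variables` and the round value n = 131,000 are illustrative, not taken from the paper.

```python
def count_edge_variables(n_nodes: int, directed: bool = True) -> int:
    """Count edge variables (dyads) in a network without self-loops.

    Assumption: one edge variable per ordered pair of distinct nodes when
    the network is directed, one per unordered pair when it is undirected.
    """
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    ordered_pairs = n_nodes * (n_nodes - 1)
    return ordered_pairs if directed else ordered_pairs // 2


# Lower bound on the node count from the abstract ("more than 131,000 nodes").
n = 131_000
print(count_edge_variables(n))                  # directed case
print(count_edge_variables(n, directed=False))  # undirected case
```

Under the directed assumption, 131,000 nodes give 131,000 × 130,999 ≈ 17.16 billion edge variables, which is consistent with the "17 billion" figure in the abstract; the undirected count would be about half of that.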

PUBLICATION RECORD

Publication year: 2012
Venue: Annals of Applied Statistics
Publication date: 2012-07-01
Fields of study: Mathematics, Physics, Computer Science, Medicine
Identifiers: DOI 10.1214/12-AOAS617; arXiv 1207.0188; PMID 26605002; PMCID PMC4655199
Source metadata: Semantic Scholar, PubMed

CITED BY

Balanced Stochastic Block Model for Community Detection in Signed Networks
2026, cites this paper
A Novel Approach for Biclustering Bipartite Networks: An Extension of Finite Mixtures of Latent Trait Analyzers
2025, cites this paper
Clustering Time-Evolving Networks Using Temporal Exponential-Family Random Graph Models with Conditional Dyadic Independence and Dynamic Latent Blocks
2025, cites this paper
Scalable Durational Event Models: Application to Physical and Digital Interactions
2025, cites this paper
Euclidean Ideal Point Estimation From Roll-Call Data via Distance-Based Bipartite Network Models
2025, cites this paper
Model-Based Co-Clustering in Customer Targeting Utilizing Large-Scale Online Product Rating Networks
2024, cites this paper
Biclustering bipartite networks via extended Mixture of Latent Trait Analyzers
2024, cites this paper
A Strategic Model of Software Dependency Networks
2024, cites this paper
A Latent Space Approach to Inferring Distance‐Dependent Reciprocity in Directed Networks
2024, cites this paper
Recent advances on mechanisms of network generation: Community, exchangeability, and scale‐free properties
2024, cites this paper
A Strategic Model of Software Dependency Networks
2023, influential citation
Restricted Tweedie Stochastic Block Models
2023, cites this paper
Statistical Clustering of Networks with Additional Information
2023, cites this paper
Homophily and Community Structure at Scale: An Application to a Large Professional Network
2023, cites this paper
Model‐based clustering of semiparametric temporal exponential‐family random graph models
2022, cites this paper
Spectral Estimation of Large Stochastic Blockmodels with Discrete Nodal Covariates
2022, cites this paper
Perfect Spectral Clustering with Discrete Covariates
2022, cites this paper
Hybrid maximum likelihood inference for stochastic block models
2022, cites this paper
A semiparametric Bayesian approach to epidemics, with application to the spread of the coronavirus MERS in South Korea in 2015
2021, cites this paper
Community detection in complex networks: From statistical foundations to data science applications
2021, cites this paper
Clustering of Longitudinal Trajectories Using Correlation-Based Distances
2021, cites this paper
Parameter Estimation Procedures for Exponential-Family Random Graph Models on Count-Valued Networks: A Comparative Simulation Study
2021, cites this paper
CUSUM multi-chart for detecting unknown abrupt changes under finite measure space for network observation sequences
2021, cites this paper
Mixture models and networks: The stochastic blockmodel
2021, cites this paper
A Structural Model of Business Card Exchange Networks
2021, influential citation
Large-scale estimation of random graph models with local dependence
2020, cites this paper
Mixture Models and Networks -- Overview of Stochastic Blockmodelling
2020, influential citation
On finite mixture modeling and model-based clustering of directed weighted multilayer networks
2020, cites this paper
Centered Partition Processes: Informative Priors for Clustering (with Discussion).
2019, cites this paper
A review of stochastic block models and extensions for graph clustering
2019, influential citation
Optimal sequential tests for detection of changes under finite measure space for finite sequences of networks
2019, cites this paper
Spectral Inference for Large Stochastic Blockmodels With Nodal Covariates
2019, cites this paper
missSBM: An R Package for Handling Missing Values in the Stochastic Block Model
2019, cites this paper
Adapting Stochastic Block Models to Power-Law Degree Distributions
2019, cites this paper
Marginal models with individual-specific effects for the analysis of longitudinal bipartite networks
2018, cites this paper
Dealing with reciprocity in dynamic stochastic block models
2018, cites this paper
A Social Network Analysis of Articles on Social Network Analysis
2018, cites this paper
hergm: Hierarchical Exponential-Family Random Graph Models
2018, cites this paper
Semiparametric Analysis of Network Formation
2018, cites this paper
On the role of latent variable models in the era of big data
2018, cites this paper
Model-Based Clustering of Time-Evolving Networks through Temporal Exponential-Family Random Graph Models
2017, cites this paper
Consistent structure estimation of exponential-family random graph models with block structure
2017, cites this paper
Model-Based Clustering of Nonparametric Weighted Networks With Application to Water Pollution Analysis
2017, influential citation
Modeling node incentives in directed networks
2017, cites this paper
Statistical modelling of a terrorist network
2017, cites this paper
LCN: a random graph mixture model for community detection in functional brain networks.
2017, cites this paper
Massive-scale estimation of exponential-family random graph models with local dependence
2017, influential citation
Topics at the Frontier of Statistics and Network Analysis: (Re)Visiting the Foundations
2017, cites this paper
Dynamic degree-corrected blockmodels for social networks: A nonparametric approach
2017, cites this paper
A Bayesian analysis of weighted stochastic block models with applications in brain functional connectomics
2016, cites this paper
Exponential-Family Random Graph Models with Time Varying Network Parameters
2016, cites this paper
Editorial of the Special Issue "Networks and Statistics"
2015, cites this paper
Recent Developments in Model-Based Clustering with Applications
2015, cites this paper
Bayesian stochastic blockmodels for community detection in networks and community-structured covariance selection
2015, cites this paper
Generalised stochastic blockmodels and their applications in the analysis of brain networks
2015, cites this paper
Variational algorithms for biclustering models
2015, cites this paper
Local dependence in random graph models: characterization, properties and statistical inference
2015, cites this paper
Detection of Epigenomic Network Community Oncomarkers
2015, cites this paper
Statistical modelling of the group structure of social networks
2014, cites this paper
Disaster response on September 11, 2001 through the lens of statistical network analysis
2014, cites this paper
Modeling heterogeneity in random graphs: a selective review
2014, cites this paper
Modeling heterogeneity in random graphs through latent space models: a selective review
2014, cites this paper
Stochastic Blockmodeling of the Modules and Core of the Caenorhabditis elegans Connectome
2014, cites this paper
Vertex nomination
2014, cites this paper
Estimating Tea Stock Values Using Cluster Analysis
2014, cites this paper
MODELING HETEROGENEITY IN RANDOM GRAPHS THROUGH LATENT SPACE MODELS: A SELECTIVE REVIEW
2014, cites this paper
Bayesian inference for protein signalling networks
2013, cites this paper
Bayesian Degree-Corrected Stochastic Block Models for Community Detection
2013, cites this paper
Joint estimation of multiple networks from time course data
2013, cites this paper
Computational Statistical Methods for Social Network Models
2012, cites this paper
An Introduction to Estimating Exponential Random Graph Models for Large Networks with bigergm
year unknown, cites this paper