Information-based clustering.

Published 2005 in Proceedings of the National Academy of Sciences of the United States of America

ABSTRACT

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster "prototype," does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures.
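Two of the properties named in the abstract, invariance to changes in the representation of the data and sensitivity to nonlinear relations, can be illustrated with a small sketch. It is not the paper's clustering algorithm. It assumes mutual information as the information-theoretic similarity measure and uses a simple plug-in estimator on discrete samples; the function names `mutual_information` and `pearson` and the toy data are hypothetical choices made for this illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Illustrative assumption: empirical frequencies stand in for the
    true probabilities, which is only reasonable for well-sampled data.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def pearson(xs, ys):
    """Ordinary Pearson correlation, for comparison with a linear measure."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data: y depends on x deterministically but nonlinearly (y = x^2).
xs = [-2, -1, 0, 1, 2] * 20
ys = [x * x for x in xs]

# An invertible change of representation of x (cube, then turn into a label).
relabeled = [f"state_{x ** 3}" for x in xs]

r = pearson(xs, ys)                               # linear measure sees nothing
mi = mutual_information(xs, ys)                   # about 1.52 bits
mi_relabeled = mutual_information(relabeled, ys)  # unchanged by relabeling

print(f"Pearson r = {r:.3f}")
print(f"I(X;Y) = {mi:.3f} bits, after relabeling = {mi_relabeled:.3f} bits")
```

In this toy case the correlation is zero even though y is fully determined by x, while the mutual information equals the entropy of y and is unaffected by relabeling x. A similarity measure with these properties needs neither a cluster prototype nor a hand-chosen metric, which is the motivation the abstract gives.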

PUBLICATION RECORD

Publication year: 2005
Venue: Proceedings of the National Academy of Sciences of the United States of America
Publication date: 2005-11-26
Fields of study: Biology, Mathematics, Computer Science, Medicine
Identifiers: DOI 10.1073/pnas.0507432102; arXiv q-bio/0511043; PMID 16352721; PMCID PMC1317937
Source metadata: Semantic Scholar, PubMed

CITED BY

A Tutorial on Discriminative Clustering and Mutual Information
2025, cites this paper
Early insight into social network structure predicts climbing the social ladder
2025, cites this paper
Rate-Adaptable Multitask-Oriented Semantic Communication: An Extended Rate–Distortion Theory-Based Scheme
2024, cites this paper
An unbiased method to partition diverse neuronal responses into functional ensembles reveals interpretable population dynamics during innate social behavior
2024, cites this paper
A Deterministic Information Bottleneck Method for Clustering Mixed-Type Data
2024, cites this paper
Information theoretic clustering via divergence maximization among clusters
2023, cites this paper
Energy-based clustering: Fast and robust clustering of data with known likelihood functions.
2023, cites this paper
Estimation of mutual information via quantum kernel methods
2023, cites this paper
Task-Oriented Semantic Communication with Semantic Reconstruction: An Extended Rate-Distortion Theory Based Scheme
2022, cites this paper
Bringing Bayes and Shannon to the Study of Behavioural and Neurobiological Timing and Associative Learning
2022, cites this paper
Automated Cancer Subtyping via Vector Quantization Mutual Information Maximization
2022, influential citation
Learning Deep Generative Clustering via Mutual Information Maximization
2022, cites this paper
The Moral Debater: A Study on the Computational Generation of Morally Framed Arguments
2022, influential citation
Task-Oriented Image Semantic Communication Based on Rate-Distortion Theory
2022, cites this paper
Quantifying information of intracellular signaling: progress with machine learning
2022, cites this paper
Task-Oriented Semantic Communication Systems Based on Extended Rate-Distortion Theory
2022, cites this paper
Integrated Use of Data Mining Techniques for Personality Structure Analysis
2021, cites this paper
An autonomous debating system
2021, cites this paper
Mapping the dynamic transfer functions of epigenome editing
2021, cites this paper
Mapping the dynamic transfer functions of eukaryotic gene regulation.
2021, cites this paper
Quantifying the compressibility of complex networks
2021, cites this paper
Synthetic cell–based materials extract positional information from morphogen gradients
2021, cites this paper
What Do We Gain When Tolerating Loss? The Information Bottleneck Wrings Out Recombination
2021, cites this paper
Neural Network Models for the Analysis and Visualization of Latent Dependencies: Examples of Psycho Diagnostic Data Processing
2021, cites this paper
A jackknife entropy-based clustering algorithm for probability density functions
2020, cites this paper
Compressibility of complex networks
2020, cites this paper
What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus
2020, cites this paper
Unsupervised ranking of clustering algorithms by INFOMAX
2020, cites this paper
A framework for studying behavioral evolution by reconstructing ancestral repertoires
2020, influential citation
Dynamic landscape of protein occupancy across the Escherichia coli chromosome
2020, cites this paper
An effective multi-level synchronization clustering method based on a linear weighted Vicsek model
2020, cites this paper
Disentangled Information Bottleneck
2020, cites this paper
A Large-scale Dataset for Argument Quality Ranking: Construction and Analysis
2019, cites this paper
Specialized coding of sensory, motor, and cognitive variables in VTA dopamine neurons
2019, cites this paper
Topological Information Data Analysis
2019, cites this paper
The Convex Information Bottleneck Lagrangian
2019, cites this paper
Discrete Infomax Codes for Meta-Learning
2019, cites this paper
AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data
2019, cites this paper
High-Resolution Raman Microscopic Detection of Follicular Thyroid Cancer Cells with Unsupervised Machine Learning.
2019, cites this paper
Deep Semi-Supervised Anomaly Detection
2019, cites this paper
Raman spectroscopic histology using machine learning for nonalcoholic fatty liver disease
2019, cites this paper
Empirical Estimation of Information Measures: A Literature Guide
2019, cites this paper
Error-based Extraction of States and Energy Landscapes from Experimental Single-Molecule Time-Series
2019, cites this paper
Discrete Neural Processes
2018, cites this paper
Deciphering hierarchical features in the energy landscape of adenylate kinase folding/unfolding.
2018, cites this paper
Information Bottleneck Methods for Distributed Learning
2018, cites this paper
Hierarchical Bayesian Modeling for Clustering Sparse Sequences in the Context of User Profiling in Customer Loyalty Program
2018, cites this paper
Learning Thematic Similarity Metric from Article Sections Using Triplet Networks
2018, cites this paper
Distributed Variational Representation Learning
2018, cites this paper
Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss
2018, cites this paper
Searching for structure in collective systems
2018, cites this paper
MSIGNET: a Metropolis sampling-based method for global optimal significant network identification
2018, cites this paper
Covariance-based dissimilarity measures applied to clustering wide-sense stationary ergodic processes
2018, cites this paper
Optimization of Mutual Information in Learning: Explorations in Science
2018, cites this paper
Genomic big data hitting the storage bottleneck
2018, cites this paper
Cortical parcellation based on structural connectivity: A case for generative models
2018, cites this paper
Clustering Analysis on Locally Asymptotically Self-similar Processes with Known Number of Clusters
2018, cites this paper
Self-organizing network for variable clustering
2017, cites this paper
Gaussian Lower Bound for the Information Bottleneck Limit
2017, cites this paper
Brain parcellation based on information theory
2017, cites this paper
Fixing a Broken ELBO
2017, cites this paper
Distributed Information Bottleneck Method for Discrete and Gaussian Sources
2017, cites this paper
Statistical mechanics for metabolic networks during steady state growth
2017, cites this paper
Deep Variational Information Bottleneck
2017, cites this paper
The Information Bottleneck and Geometric Clustering
2017, cites this paper
Anomaly Detection in Large Databases Using Behavioral Patterning
2017, cites this paper
Inverse statistical problems: from the inverse Ising problem to data science
2017, cites this paper
On the capacity of cloud radio access networks with oblivious relaying
2017, cites this paper
Multivariate statistical analysis for the assessment of groundwater quality under different hydrogeological regimes
2017, cites this paper
Automated long-term recording and analysis of neural activity in behaving animals
2016, cites this paper
A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures
2016, influential citation
Common modulation of limbic network activation underlies the unfolding of musical emotions and its temporal attributes
2016, cites this paper
Distributed information-theoretic clustering
2016, cites this paper
Comprehensive discovery of subsample gene expression components by information explanation: therapeutic implications in cancer
2016, cites this paper
Distributed information-theoretic biclustering
2016, cites this paper
Single Molecule Data Analysis: An Introduction
2016, influential citation
Common modulation of limbic network activation underlies musical emotions as they unfold
2016, cites this paper
Can We Group Storage? Statistical Techniques to Identify Predictive Groupings in Storage System Accesses
2016, cites this paper
Improving Quality of Hierarchical Clustering for Large Data Series
2016, cites this paper
Relationship representations and change in adolescents and emerging adults during psychodynamic psychotherapy
2016, cites this paper
Limits on information transduction through amplitude and frequency regulation of transcription factor activity
2015, cites this paper
Compression of Market Research Data Using Clustering
2015, cites this paper
Introducing co-clustering for hyperspectral image analysis
2015, cites this paper
Zombies Reading Segmented Graphene Articles On The Arxiv
2015, cites this paper
Improved model-based clustering performance using Bayesian initialization averaging
2015, cites this paper
Collective analysis of multiple high-throughput gene expression datasets
2015, cites this paper
Error-based Extraction of States and Energy Landscapes from Experimental Single-Molecule Time-Series
2015, cites this paper
Multilinear objective function-based clustering
2015, cites this paper
Promoter Decoding of Transcription Factor Translocation Dynamics
2015, cites this paper
Internal Representations of the Therapeutic Relationship Among Adolescents in Psychodynamic Psychotherapy.
2015, cites this paper
A Clustering Method Based on the Maximum Entropy Principle
2015, cites this paper
Statistical thermodynamics of transcription profiles in normal development and tumorigeneses in cohorts of patients
2015, cites this paper
Diffused Kernel DMMI Approach for Theoretic Clustering using Data Mining
2015, cites this paper
A Nonparametric Clustering Algorithm with a Quantile-Based Likelihood Estimator
2014, cites this paper
Cellular noise and information transmission.
2014, cites this paper
SMART: Unique Splitting-While-Merging Framework for Gene Clustering
2014, cites this paper
Metabolic Disorders in HIV-infected Children
2014, cites this paper
Information processing in living systems
2014, cites this paper
Mining the Modular Structure of Protein Interaction Networks
2014, cites this paper
Distributed Information Theoretic Clustering
2014, cites this paper