Online Correlation Clustering

Published 2010 in Symposium on Theoretical Aspects of Computer Science

ABSTRACT

We study the online clustering problem in which data items arrive one at a time. The algorithm maintains a clustering of the data items into similarity classes. When an item v arrives, its relation to every previously arrived item is revealed: for each such item u, we are told whether v is similar to u. The algorithm may create a new cluster for v and merge existing clusters. When the objective is to minimize disagreements between the clustering and the input, we prove that a natural greedy algorithm is O(n)-competitive, and that this is optimal. When the objective is to maximize agreements between the clustering and the input, we prove that the greedy algorithm is 0.5-competitive and that no online algorithm can be better than 0.834-competitive. We also prove that it is possible to do better than 1/2 by exhibiting a randomized algorithm with competitive ratio 0.5 + c for a small positive fixed constant c.
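The abstract does not spell out the greedy rule, so the following is only a minimal sketch of one natural reading of the model: each arriving item starts in its own cluster, and two clusters are merged whenever the pairs between them contain strictly more similar than dissimilar pairs. The class name `OnlineGreedyClustering`, the methods `arrive` and `score`, and the exact merge condition are assumptions made for illustration, not the authors' definition.

```python
from itertools import combinations


class OnlineGreedyClustering:
    """Illustrative online greedy clustering (hypothetical names, assumed merge rule)."""

    def __init__(self):
        self.clusters = []    # current clustering: a list of disjoint sets
        self.similar = set()  # unordered pairs revealed as "similar"

    def _gain(self, a, b):
        # Net change in agreements if clusters a and b were merged:
        # (# similar cross pairs) - (# dissimilar cross pairs).
        plus = sum(1 for u in a for v in b if frozenset((u, v)) in self.similar)
        return plus - (len(a) * len(b) - plus)

    def arrive(self, v, similar_to):
        # Reveal the relation between v and every earlier item:
        # items listed in similar_to are similar to v, all others are not.
        for u in similar_to:
            self.similar.add(frozenset((u, v)))
        self.clusters.append({v})  # v starts in a new singleton cluster

        # Assumed greedy rule: keep merging while some merge strictly helps.
        merged = True
        while merged:
            merged = False
            for i, j in combinations(range(len(self.clusters)), 2):
                if self._gain(self.clusters[i], self.clusters[j]) > 0:
                    self.clusters[i] |= self.clusters[j]
                    del self.clusters[j]  # safe: i < j, so index i is unaffected
                    merged = True
                    break
        return self.clusters

    def score(self):
        # Returns (agreements, disagreements) of the current clustering.
        label = {x: k for k, c in enumerate(self.clusters) for x in c}
        items = list(label)
        agree = 0
        for u, v in combinations(items, 2):
            same_cluster = label[u] == label[v]
            is_similar = frozenset((u, v)) in self.similar
            agree += same_cluster == is_similar
        total = len(items) * (len(items) - 1) // 2
        return agree, total - agree


g = OnlineGreedyClustering()
g.arrive("a", [])
g.arrive("b", ["a"])
g.arrive("c", ["a"])  # c is similar to a but not to b
```

In this small run, c stays in its own cluster: merging it with {a, b} would gain one agreement (the pair a, c) and lose one (the pair b, c), so the net gain is zero and the assumed rule does not merge. The two objectives in the abstract correspond to the two numbers returned by `score`.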

PUBLICATION RECORD

Publication year
2010
Venue
Symposium on Theoretical Aspects of Computer Science
Publication date
2010-01-06
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.4230/LIPIcs.STACS.2010.2486 arXiv 1001.0920
Source metadata
Semantic Scholar
