Clustering Words with the MDL Principle

Published 1996 in International Conference on Computational Linguistics

ABSTRACT

We address the problem of automatically constructing a thesaurus by clustering words based on corpus data. We view this problem as that of estimating a joint distribution over the Cartesian product of a partition of a set of nouns and a partition of a set of verbs, and propose a learning algorithm based on the Minimum Description Length (MDL) Principle for such estimation. We empirically compared the performance of our method based on the MDL Principle against the Maximum Likelihood Estimator in word clustering, and found that the former outperforms the latter. We also evaluated the method by conducting pp-attachment disambiguation experiments using an automatically constructed thesaurus. Our experimental results indicate that such a thesaurus can be used to improve accuracy in disambiguation.
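As a rough illustration of the MDL-versus-maximum-likelihood comparison described in the abstract, the Python sketch below scores two candidate noun partitions on a toy set of noun-verb pairs. Everything specific here is an assumption made for illustration and is not taken from this record: the hard-clustering model form P(n, v) = P(Cn, Cv) P(n | Cn) P(v | Cv), the two-part code with a parameter cost of (k/2) log2 N bits, the function name `description_length`, and the toy data. It is a sketch of the general idea, not the authors' algorithm.

```python
import math
from collections import Counter


def description_length(pairs, noun_cluster, verb_cluster):
    """Return (data_bits, param_bits) for a hard-clustering model of `pairs`.

    Assumed model: P(n, v) = P(Cn, Cv) * P(n | Cn) * P(v | Cv), with every
    probability set to its relative-frequency (maximum-likelihood) estimate.
    """
    total = len(pairs)
    n_count = Counter(n for n, _ in pairs)
    v_count = Counter(v for _, v in pairs)
    cn_count = Counter(noun_cluster[n] for n, _ in pairs)
    cv_count = Counter(verb_cluster[v] for _, v in pairs)
    joint = Counter((noun_cluster[n], verb_cluster[v]) for n, v in pairs)

    # Data cost: negative log-likelihood of the sample, in bits.
    data_bits = 0.0
    for n, v in pairs:
        cn, cv = noun_cluster[n], verb_cluster[v]
        p = (joint[cn, cv] / total) \
            * (n_count[n] / cn_count[cn]) \
            * (v_count[v] / cv_count[cv])
        data_bits -= math.log2(p)

    # Parameter cost: (k / 2) * log2(N), with k the number of free parameters.
    k_n = len(set(noun_cluster.values()))
    k_v = len(set(verb_cluster.values()))
    free = (k_n * k_v - 1) + (len(noun_cluster) - k_n) + (len(verb_cluster) - k_v)
    param_bits = 0.5 * free * math.log2(total)
    return data_bits, param_bits


# Toy corpus (hypothetical): each noun occurs three times with one verb.
pairs = ([("wine", "drink")] * 3 + [("beer", "drink")] * 3
         + [("bread", "eat")] * 3 + [("rice", "eat")] * 3)

verbs = {"drink": "drink", "eat": "eat"}  # each verb is its own cluster
fine_nouns = {n: n for n in ("wine", "beer", "bread", "rice")}
coarse_nouns = {"wine": "A", "beer": "A", "bread": "B", "rice": "B"}

fine_data, fine_param = description_length(pairs, fine_nouns, verbs)
coarse_data, coarse_param = description_length(pairs, coarse_nouns, verbs)

print("fine:   data bits", fine_data, "total", fine_data + fine_param)
print("coarse: data bits", coarse_data, "total", coarse_data + coarse_param)
```

On this toy data the two partitions fit the sample equally well, so likelihood alone gives no reason to merge nouns, while the parameter-cost term makes the coarser partition cheaper overall. That trade-off between fit and model size is the intuition behind preferring an MDL criterion over plain maximum likelihood for clustering.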

PUBLICATION RECORD

Publication year
1996
Venue
International Conference on Computational Linguistics
Publication date
1996-05-11
Fields of study
Mathematics, Computer Science
Identifiers
DOI 10.3115/992628.992633 arXiv cmp-lg/9605014
Source metadata
Semantic Scholar

REFERENCES

Elements of Information Theory
2005, cited by this paper
Automatic thesaurus construction based on grammatical relations
1995, cited by this paper
Generalizing Case Frames Using a Thesaurus and the MDL Principle
1995, influential reference
Inducing Probabilistic Grammars by Bayesian Model Merging
1994, cited by this paper
Building a Large Annotated Corpus of English: The Penn Treebank
1993, cited by this paper
Contextual Word Similarity and Estimation From Sparse Data
1993, cited by this paper
Class-Based n-gram Models of Natural Language
1992, cited by this paper
Structural Ambiguity and Lexical Relations
1991, cited by this paper
Minimum complexity density estimation
1991, cited by this paper
Poor Estimates of Context are Worse than None
1990, cited by this paper
Introduction to WordNet: An On-line Lexical Database
1990, cited by this paper
Noun Classification From Predicate-Argument Structures
1990, cited by this paper
Inferring Decision Trees Using the Minimum Description Length Principle
1989, cited by this paper
Stochastic Complexity in Statistical Inquiry
1989, cited by this paper
Stochastic Complexity and Modeling
1986, cited by this paper
Universal coding, information, prediction, and estimation
1984, cited by this paper
A Universal Prior for Integers and Estimation by Minimum Description Length
1983, cited by this paper
Modeling by Shortest Data Description
1978, influential reference

CITED BY

The minimum description length principle for pattern mining: a survey
2020, cites this paper
Inducing Word Clusters from Classical Chinese Poems
2018, cites this paper
A New Method to Build NLP Knowledge for Improving Term Disambiguation
2016, cites this paper
Word clustering for parallelism in Classical Chinese poems
2016, cites this paper
Semantic Parallelism in Classical Chinese Poems
2016, cites this paper
Online pattern recognition in subsequence time series clustering
2014, cites this paper
Image/Time Series Mining Algorithms: Applications to Developmental Biology, Document Processing and Data Streams
2013, cites this paper
Efficient Proper Length Time Series Motif Discovery
2013, cites this paper
Clustering of Symbols Using Minimal Description Length
2013, cites this paper
MDL-based time series clustering
2012, cites this paper
Determining provenance in phishing websites using automated conceptual analysis
2009, cites this paper
Superior and Efficient Fully Unsupervised Pattern-based Concept Acquisition Using an Unsupervised Parser
2009, cites this paper
MDL-Based Attribute Models in Naïve Bayes Classification
2009, cites this paper
Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task
2009, cites this paper
Unsupervised query segmentation using generative language models and wikipedia
2008, cites this paper
Ontology learning: state of the art and open issues
2007, cites this paper
Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words
2006, cites this paper
New Hierarchy Technique Using Co-occurrence Word Information
2004, cites this paper
Word classification and hierarchy using co-occurrence word information
2004, cites this paper
Anti-aliasing on the web
2004, cites this paper
A practical semantic representation for natural language parsing
2004, cites this paper
Statistical Techniques for Automatically Inferring the Semantics of Verb-Particle Constructions
2003, cites this paper
A new algorithm for construction specific field terms using co-occurrence words information
2003, cites this paper
Word clustering and disambiguation based on co-occurrence data
2002, cites this paper
Class-Based Probability Estimation Using a Semantic Hierarchy
2002, cites this paper
Clustering Co-occurrence Graph based on Transitivity
2002, cites this paper
Composicionalidad, cómputo de estructura y redes neuronales
2002, cites this paper
Unsupervised Language Acquisition: Theory and Practice
2002, cites this paper
Lexical acquisition at the syntax-semantics interface: diathesis alternations, subcategorization frames and selectional preferences
2001, influential citation
Class-Based Probability Estimation Using a Semantic Hierarchy
2001, cites this paper
Automatic Extraction of Semantic Relations from Specialized Corpora
2000, cites this paper
A Clustering Algorithm for Chinese Adjectives and Nouns
2000, cites this paper
Similarity measurement using term negative weight and its application to word similarity
2000, cites this paper
An Empirical Assessment of Semantic Interpretation
2000, cites this paper
A Probabilistic Approach to Lexical Semantic Knowledge Acquisition and Structural Disambiguation
1998, cites this paper
Can a Computer Really Model Cognition? A Case Study of Six Computational Models of Infant Word Discovery
1998, cites this paper
Word Clustering and Disambiguation Based on Co-occurrence Data
1998, influential citation
Using Case Prototypicality as a Semantic Primitive
1998, cites this paper
Clustering Co-occurrence Graph based on Transitivity
1997, cites this paper
Constructing semantic representations using the MDL principle
1997, cites this paper
Learning and Using Continuous Linguistic Representations
1996, cites this paper