Similarity-Based Models of Word Cooccurrence Probabilities

Ido Dagan, Lillian Lee, Fernando C Pereira

Published 1998 in Machine-mediated learning

ABSTRACT

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach” and “eat a beach” is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on “most similar” words.

We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error.

We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task.
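The idea in the abstract, estimating the probability of an unseen word pair from the behaviour of distributionally similar words, can be illustrated with a small sketch. This is not the authors' implementation: the toy corpus, the function names, the neighbourhood size `k`, the scale `beta`, and the choice of `exp(-beta * JS)` as the similarity weight are all assumptions made for illustration, and the sketch omits the back-off model that the abstract says the similarity-based estimate is combined with.

```python
import math
from collections import Counter, defaultdict


def conditional_mle(bigrams):
    """Maximum-likelihood P(w2 | w1) from a list of (w1, w2) pairs."""
    counts = defaultdict(Counter)
    for w1, w2 in bigrams:
        counts[w1][w2] += 1
    return {
        w1: {w2: c / sum(ctr.values()) for w2, c in ctr.items()}
        for w1, ctr in counts.items()
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two sparse distributions."""
    total = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw > 0:
            total += 0.5 * pw * math.log(pw / m)
        if qw > 0:
            total += 0.5 * qw * math.log(qw / m)
    return total


def similarity_estimate(w1, w2, cond, k=2, beta=4.0):
    """Similarity-weighted average of P(w2 | w1') over the k words w1' closest to w1.

    The weight exp(-beta * JS) and the values of k and beta are illustrative
    assumptions, not parameters taken from the paper.
    """
    neighbours = sorted(
        (w for w in cond if w != w1),
        key=lambda w: js_divergence(cond[w1], cond[w]),
    )[:k]
    weights = {
        w: math.exp(-beta * js_divergence(cond[w1], cond[w])) for w in neighbours
    }
    norm = sum(weights.values())
    return sum(weights[w] * cond[w].get(w2, 0.0) for w in neighbours) / norm


# Hypothetical toy corpus: "eat pear" never occurs, but "devour pear" does.
toy_bigrams = [
    ("eat", "peach"), ("eat", "apple"), ("eat", "bread"),
    ("devour", "apple"), ("devour", "bread"), ("devour", "pear"),
    ("drive", "car"), ("drive", "truck"),
]
cond = conditional_mle(toy_bigrams)

print("MLE  P(pear | eat):", cond["eat"].get("pear", 0.0))
print("SIM  P(pear | eat):", similarity_estimate("eat", "pear", cond))
print("SIM  P(car  | eat):", similarity_estimate("eat", "car", cond))
```

In this toy setting the maximum-likelihood estimate for the unseen pair is zero, while the similarity-based estimate borrows probability mass mainly from "devour", whose object distribution overlaps with that of "eat".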

PUBLICATION RECORD

Publication year
1998
Venue
Machine-mediated learning
Publication date
1998-09-27
Fields of study
Computer Science
Identifiers
DOI 10.1023/A:1007537716579 arXiv cs/9809110
Source metadata
Semantic Scholar

REFERENCES

Elements of Information Theory
2005, influential reference
Instance-based learning algorithms
2004, cited by this paper
Context Space
2001, cited by this paper
An Information-Theoretic Definition of Similarity
1998, cited by this paper
Exemplar-Based Word Sense Disambiguation: Some Recent Improvements
1997, cited by this paper
Memory-Based Learning: Using Similarity for Smoothing
1997, cited by this paper
Aggregate and mixed-order Markov models for statistical language processing
1997, cited by this paper
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
1997, cited by this paper
Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity
1997, cited by this paper
Locally Weighted Learning
1997, cited by this paper
Similarity-Based Approaches to Natural Language Processing
1997, cited by this paper
Similarity-Based Methods for Word Sense Disambiguation
1997, cited by this paper
An Empirical Study of Smoothing Techniques for Language Modeling
1996, cited by this paper
Hierarchical Clustering of Words and Application to NLP Tasks
1996, cited by this paper
Learning Similarity-based Word Sense Disambiguation from Sparse Data
1996, cited by this paper
Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach
1996, cited by this paper
Hierarchical Clustering of Words
1996, cited by this paper
A Probabilistic Theory of Pattern Recognition
1996, cited by this paper
Finding structure in language
1995, cited by this paper
Statistical Sense Disambiguation with Relatively Small Corpora Using Dictionary Definitions
1995, cited by this paper
Disambiguating Noun Groupings with Respect to Wordnet Senses
1995, influential reference
Explorations in automatic thesaurus discovery
1994, influential reference
Similarity-Based Estimation of Word Cooccurrence Probabilities
1994, cited by this paper
An extended clustering algorithm for statistical language models
1994, cited by this paper
Contextual Word Similarity and Estimation From Sparse Data
1993, cited by this paper
A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis
1993, cited by this paper
Improved clustering techniques for class-based statistical language modelling
1993, cited by this paper
Smoothing of Automatically Generated Selectional Constraints
1993, cited by this paper
Word Space
1992, cited by this paper
Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora
1992, cited by this paper
Dimensions of meaning
1992, cited by this paper
Use of syntactic context to produce term association lists for text retrieval
1992, influential reference
Cooccurrence smoothing for stochastic language modeling
1992, influential reference
Experiments on Linguistically-Based Term Associations
1992, cited by this paper
WordNet and Distributional Analysis: A Class-based Approach to Lexical Discovery
1992, cited by this paper
Work on Statistical Methods for Word Sense Disambiguation
1992, cited by this paper
Class-Based n-gram Models of Natural Language
1992, cited by this paper
The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression
1991, cited by this paper
A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams
1991, cited by this paper
Divergence measures based on the Shannon entropy
1991, influential reference
Experience with a Stack Decoder-Based HMM CSR and Back-Off N-Gram Language Models
1991, cited by this paper
Noun Classification From Predicate-Argument Structures
1990, cited by this paper
Introduction to WordNet: An On-line Lexical Database
1990, cited by this paper
A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text
1988, cited by this paper
Estimation of probabilities from sparse data for the language model component of a speech recognizer
1987, influential reference
Toward memory-based reasoning
1986, cited by this paper
Discovery Procedures for Sublanguage Selectional Patterns: Initial Experiments
1986, cited by this paper
Isolated word recognition using hidden Markov models
1985, influential reference
Isolated Word Recognition
1984, cited by this paper
Diversity: its measurement, decomposition, apportionment and analysis
1982, influential reference
Interpolated estimation of Markov source parameters from sparse data
1980, cited by this paper
To the Best of Our Knowledge
1979, cited by this paper
Pattern classification and scene analysis
1974, cited by this paper
Nearest neighbor pattern classification
1967, cited by this paper
The Population Frequencies of Species and the Estimation of Population Parameters
1953, cited by this paper

CITED BY

Heterogeneous co-occurrence embedding for visual information exploration
2025, cites this paper
Measuring Similarity in Causal Graphs: A Framework for Semantic and Structural Analysis
2025, cites this paper
A dynamic graph structural framework for implicit sentiment identification based on complementary semantic and structural information
2024, cites this paper
Complementary Roles of Inference and Language Models in QA
2023, influential citation
Align-then-Enhance: Multilingual Entailment Graph Enhancement with Soft Predicate Alignment
2023, cites this paper
Improving word similarity computation accuracy by multiple parameter optimization based on ontology knowledge
2023, cites this paper
Multimodal Language Data and Platform Construction for LCTLs Teaching
2023, cites this paper
Knowledge-Fusion-Based Iterative Graph Structure Learning Framework for Implicit Sentiment Identification
2023, cites this paper
Estimating word co-occurrence probabilities from pretrained static embeddings using a log-bilinear model
2022, cites this paper
College student expression on Twitter during the COVID-19 pandemic
2022, cites this paper
CADE: The Missing Benchmark in Evaluating Dataset Requirements of AI-enabled Software
2022, cites this paper
Framework for entity extraction with verification: application to inference of data set usage in research publications
2022, cites this paper
Context-aware incremental clustering of alerts in monitoring systems
2022, cites this paper
B-AIS: An Automated Process for Black-box Evaluation of Visual Perception in AI-enabled Software against Domain Semantics
2022, cites this paper
Better Language Model with Hypernym Class Prediction
2022, cites this paper
Trade co-occurrence, trade flow decomposition and conditional order imbalance in equity markets
2022, influential citation
Cross-lingual Inference with A Chinese Entailment Graph
2022, cites this paper
Incorporating Temporal Information in Entailment Graph Mining
2021, cites this paper
Revising the Curricula of Higher Education to Connect to the Job Market: An Approach Based on Job Description Mining
2021, cites this paper
Multivalent Entailment Graphs for Question Answering
2021, cites this paper
Single Document Viewpoint Summarization based on Triangle Identification in Dependency Graph
2021, cites this paper
COVID-19 Coverage By Cable and Broadcast Networks
2021, cites this paper
Graph Fusion Network for Text Classification
2021, cites this paper
Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model
2020, influential citation
Extracting Users’ Explicit Preferences from Free-text using Second Order Co-occurrence PMI in Indian Matrimony
2020, cites this paper
Copy from DBC Webarchive
2020, cites this paper
A survey of word embeddings based on deep learning
2019, cites this paper
Choosing Between Lexeme vs. Token in Russian Collocations
2019, cites this paper
Duality of Link Prediction and Entailment Graph Induction
2019, influential citation
Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies
2019, cites this paper
A Lexical Resource-Constrained Topic Model for Word Relatedness
2019, cites this paper
Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis
2019, cites this paper
Semantic Classification of Unknown Words based on Graph-based Semi-supervised Clustering
2018, cites this paper
Learning Typed Entailment Graphs with Global Soft Constraints
2018, cites this paper
Authorship Identification with Multi Sequence Word Selection Method
2018, cites this paper
Diachronic Variation of Temporal Expressions in Scientific Writing Through the Lens of Relative Entropy
2017, influential citation
Language Technologies for the Challenges of the Digital Age
2017, cites this paper
FRAPpuccino: Fault-detection through Runtime Analysis of Provenance
2017, cites this paper
Definition text's syntactic feature using stationarity control
2017, cites this paper
Efficient Clustering from Distributions over Topics
2017, cites this paper
Use Your Mind and Learn to Write: The Problem of Producing Coherent Text
2017, cites this paper
An Initial Analysis of Topic-based Similarity among Scientific Documents Based on their Rhetorical Discourse Parts
2017, cites this paper
Why Do Some Patents Get Licensed While Others Do Not
2017, cites this paper
Autoencoder and selectional preference
2017, cites this paper
Distributional Inclusion Hypothesis for Tensor-based Composition
2016, cites this paper
Induction, Semantic Validation and Evaluation of a Derivational Morphology Lexicon for German
2016, cites this paper
Adding Context to Concept Trees
2016, cites this paper
Word and Document Embeddings based on Neural Network Approaches
2016, cites this paper
Semi-supervised Learning with Induced Word Senses for State of the Art Word Sense Disambiguation
2016, cites this paper
Utilising Wikipedia for Text Mining Applications
2016, cites this paper
Clustering algorithm based on asymmetric similarity and paradigmatic features
2016, cites this paper
Indexation aléatoire et similarité inter-phrases appliquées au résumé automatique. (Random indexing and inter-sentences similarity applied to automatic summarization)
2016, cites this paper
“Mahoshadha”, the Sinhala Tagged Corpus Based Question Answering System
2016, cites this paper
A Compositional Distributional Inclusion Hypothesis
2016, cites this paper
Information-based Modeling of Diachronic Linguistic Change: from Typicality to Productivity
2016, cites this paper
Amélioration des modèles de repli par des sacs de mots et des n-grammes à variables
2016, cites this paper
A Hybrid Method for Domain Ontology Construction from the Web
2016, cites this paper
中文近義詞的偵測與判別 (Detection and Discrimination of Chinese Near-synonyms) [In Chinese]
2016, cites this paper
Exploiting Linguistic Knowledge in Lexical and Compositional Semantic Models
2016, cites this paper
Reducing Large Semantic Graphs to Improve Semantic Relatedness
2015, cites this paper
Adding New Words into a Language Model using Parameters of Known Words with Similar Behavior
2015, cites this paper
Linguistic Individuality Transformation for Spoken Language
2015, cites this paper
A link-based approach to semantic relation analysis
2015, cites this paper
Sentence entailment in compositional distributional semantics
2015, influential citation
Unilateral Weighted Jaccard Coefficient for NLP
2015, cites this paper
The corpus-based identification of cross-lectal synonyms in pluricentric languages
2015, cites this paper
Automatic Creation of a Semantic Network Encoding part_of Relations
2015, cites this paper
Analyse distributionnelle appliquée aux textes de spécialité - Réduction de la dispersion des données par abstraction des contextes [Distributional analysis applied to domain-specific texts - Data dispersion reduction by context abstraction]
2015, cites this paper
Sparsity and normalization in word similarity systems
2015, cites this paper
Word Semantic Similarity Measurement Based on Evidence Theory
2015, cites this paper
Detecting Singleton Review Spammers Using Semantic Similarity
2015, cites this paper
Peta Pikiran Otomatis Teks Berbahasa Indonesia Menggunakan Word Co-occurrence Dan Bobot Kalimat
2015, cites this paper
Book Reviews: Semantic Similarity from Natural Language and Ontology Analysis by Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain
2015, influential citation
A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity
2015, cites this paper
RESSIST: a research object-based recommender system
2015, cites this paper
GSB ’15 Graph Search and Beyond Workshop
2015, cites this paper
Word Sense Induction and Disambiguation Rivaling Supervised Methods
2015, cites this paper
Tuning a semantic relatedness algorithm using a multiscale approach
2015, cites this paper
Unilateral Jaccard Similarity Coefficient
2015, cites this paper
Towards Automatic Construction of Arabic Synonyms
2015, cites this paper
Natural Language Dialog Systems and Intelligent Assistants
2015, cites this paper
Meta-analysis of knowledge assets for continuous improvement of maintenance cost controlling
2014, cites this paper
A generic framework and methodology for extracting semantics from co-occurrences
2014, cites this paper
Predicting Fine-grained Social Roles with Selectional Preferences
2014, cites this paper
Scale-Free Distribution in Chinese Semantic Field Network: A Main Cause of Using the Shortest Path Length for Representing Semantic Distance Between Terms
2014, cites this paper
Semantic Processing of Semitic Languages
2014, cites this paper
Improving sparse word similarity models with asymmetric measures
2014, cites this paper
Evaluation of Automatic Updates of Roget's Thesaurus
2014, cites this paper
Continual Word Embedding Based for Matching Lightweight Ontologies
2014, cites this paper
Computational Modeling and Simulation of Language and Meaning: Similarity-Based Approaches
2014, cites this paper
SciRecSys: A Recommendation System for Scientific Publication by Discovering Keyword Relationships
2014, cites this paper
Extracting Clusters of Specialist Terms from Unstructured Text
2014, cites this paper
Looking for Hyponyms in Vector Space
2014, cites this paper
Computing Concept Relatedness Based on Ontology
2014, cites this paper
Improving in-domain data selection for small in-domain sets
2014, cites this paper
Probabilistic Distributional Semantics with Latent Variable Models
2014, cites this paper
Event knowledge and models of logical metonymy interpretation
2014, cites this paper
A Comparison of Selectional Preference Models for Automatic Verb Classification
2014, cites this paper
Automatic Generation of Association Thesaurus Based on Domain-Specific Text Collection
2014, cites this paper
Markov Random Fields and Mass Spectra Discrimination
2014, cites this paper