New Experiments in Distributional Representations of Synonymy

Dayne Freitag, Matthias Blume, John Byrnes, Edmond Chow, Sadik Kapadia, R. Rohwer, Zhiqiang Wang

Published 2005 in Conference on Computational Natural Language Learning

ABSTRACT

Recent work on the problem of detecting synonymy through corpus analysis has used the Test of English as a Foreign Language (TOEFL) as a benchmark. However, this test involves as few as 80 questions, prompting questions regarding the statistical significance of reported results. We overcome this limitation by generating a TOEFL-like test using WordNet, containing thousands of questions and composed only of words occurring with sufficient corpus frequency to support sound distributional comparisons. Experiments with this test lead us to a similarity measure which significantly outperforms the best proposed to date. Analysis suggests that a strength of this measure is its relative robustness against polysemy.
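The concern about an 80-question benchmark can be made concrete with a rough calculation. The sketch below is illustrative only and is not taken from the paper: it uses the normal approximation to the binomial to estimate the 95% margin of error on a measured accuracy. The accuracy of 0.80, the test size of 10,000 (standing in for "thousands of questions"), and the function name `accuracy_margin` are all assumptions chosen for the example.

```python
import math


def accuracy_margin(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval
    for an accuracy p measured on n independent questions."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Assumed accuracy of 0.80. The size 80 comes from the abstract;
# 10_000 is a hypothetical stand-in for "thousands of questions".
for n in (80, 10_000):
    print(n, round(accuracy_margin(0.80, n), 4))
```

Under these assumptions the margin on 80 questions is close to nine percentage points, against under one point for the larger test, so differences of a few points between measures cannot be separated reliably on the small benchmark.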

PUBLICATION RECORD

Publication year
2005
Venue
Conference on Computational Natural Language Learning
Publication date
2005-06-29
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.3115/1706543.1706548
Source metadata
Semantic Scholar

REFERENCES

Characterising Measures of Lexical Distributional Similarity
2004, cited by this paper
Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems
2003, cited by this paper
Frequency Estimates for Statistical Word Similarity Measures
2003, cited by this paper
Discovering word senses from text
2002, cited by this paper
Vector-based semantic analysis: representing word meanings based on random labels
2001, influential reference
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
2001, cited by this paper
Book Reviews: WordNet: An Electronic Lexical Database
1999, cited by this paper
Measures of Distributional Similarity
1999, influential reference
Information Geometry, Bayesian Inference, Ideal Estimates and Error Decomposition
1998, cited by this paper
Automatic Retrieval and Clustering of Similar Words
1998, cited by this paper
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge
1997, influential reference
Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity
1997, cited by this paper
I-Divergence Geometry of Probability Distributions and Minimization Problems
1975, cited by this paper
Mathematical structures of language
1968, cited by this paper

CITED BY

Modelling Intertextuality with N-gram Embeddings
2025, cites this paper
Building Static Embeddings from Contextual Ones: Is It Useful for Building Distributional Thesauri?
2022, cites this paper
Décontextualiser des plongements contextuels pour construire des thésaurus distributionnels (Decontextualizing contextual embeddings for building distributional thesauri)
2022, cites this paper
Paraphrase type identification for plagiarism detection using contexts and word embeddings
2021, cites this paper
Building and Comparing Lemma Embeddings for Latin. Classical Latin versus Thomas Aquinas
2020, cites this paper
Asymmetric Attributional Word Similarity Measures to Detect the Relations of Textual Generality
2020, cites this paper
Word Representation
2020, cites this paper
Unsupervised Compositionality Prediction of Nominal Compounds
2019, cites this paper
Extracting and Learning Semantics from Social Web Data
2019, cites this paper
Vir is to Moderatus as Mulier is to Intemperans - Lemma Embeddings for Latin
2019, cites this paper
Wordnet-based Evaluation of Large Distributional Models for Polish
2018, cites this paper
Application of Text Analytics to Extract and Analyze Material–Application Pairs from a Large Scientific Corpus
2018, cites this paper
A Survey on Portuguese Lexical Knowledge Bases: Contents, Comparison and Combination
2018, cites this paper
Knowing the Author by the Company His Words Keep
2018, cites this paper
Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases
2017, cites this paper
Extending Thesauri Using Word Embeddings and the Intersection Method
2017, cites this paper
Distributional models of multiword expression compositionality prediction
2017, cites this paper
Compositional Semantics using Feature-Based Models from WordNet
2017, cites this paper
Pattern-based methods for Improved Lexical Semantics and Word Embeddings
2017, cites this paper
Turkish synonym identification from multiple resources: monolingual corpus, mono/bilingual online dictionaries, and WordNet
2017, cites this paper
B2SG: a TOEFL-like Task for Portuguese
2016, influential citation
The Portuguese B2SG: A Semantic Test for Distributional Thesaurus
2016, cites this paper
Comparing explicit and predictive distributional semantic models endowed with syntactic contexts
2016, cites this paper
Data-driven natural language generation using statistical machine translation and discriminative learning. (L'approche discriminante à la génération de la parole)
2016, cites this paper
Predicting the Compositionality of Nominal Compounds: Giving Word Embeddings a Hard Time
2016, cites this paper
Automatic Corpus Extension for Data-driven Natural Language Generation
2016, cites this paper
Specializing Word Embeddings for Similarity or Relatedness
2015, cites this paper
Réordonnancer des thésaurus distributionnels en combinant différents critères [Reorder distributional thesauri by combining different criteria]
2015, cites this paper
Sparsity and normalization in word similarity systems
2015, cites this paper
Tesauros Distribucionais para o Português: avaliação de metodologias
2015, influential citation
Typing Relations in Distributional Thesauri
2015, cites this paper
Nothing like Good Old Frequency: Studying Context Filters for Distributional Thesauri
2014, influential citation
On the effect of word frequency on distributional similarity
2014, cites this paper
Exploring the neighbor graph to improve distributional thesauri (Explorer le graphe de voisinage pour améliorer les thésaurus distributionnels) [in French]
2014, cites this paper
Improving sparse word similarity models with asymmetric measures
2014, cites this paper
Improving distributional thesauri by exploring the graph of neighbors
2014, cites this paper
Calculating semantic relatedness for biomedical use in a knowledge-poor environment
2014, cites this paper
Comparing Similarity Measures for Distributional Thesauri
2014, influential citation
Partial Measure of Semantic Relatedness Based on the Local Feature Selection
2014, cites this paper
Predicting the relevance of distributional semantic similarity with contextual information
2014, cites this paper
Compounds and distributional thesauri
2014, influential citation
An Integrated Approach to Automatic Synonym Detection in Turkish Corpus
2014, cites this paper
Unsupervised selection of semantic relations for improving a distributional thesaurus (Sélection non supervisée de relations sémantiques pour améliorer un thésaurus distributionnel) [in French]
2013, cites this paper
Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
2013, cites this paper
Discovery of noun semantic relations based on sentential context analysis
2013, influential citation
Asymmetric Distributional Similarity Measures to Recognize Textual Entailment by Generality. (Mesures de similarité distributionnelle asymétrique pour la détection de l'implication textuelle par généralité)
2013, cites this paper
Disambiguating implicit temporal queries for temporal information retrieval applications
2013, cites this paper
Lexical Acquisition for Clinical Text Mining Using Distributional Similarity
2012, cites this paper
Enriching temporal query understanding through date identification: how to tag implicit temporal queries?
2012, cites this paper
GTE: a distributional second-order co-occurrence approach to improve the identification of top relevant dates in web snippets
2012, cites this paper
Distributional Semantics Approach to Detecting Synonyms in Croatian Language
2012, influential citation
Combining Bootstrapping and Feature Selection for Improving a Distributional Thesaurus
2012, cites this paper
Exploring the Semantic Meaning of Constructs that Lead to Human Decisions
2011, cites this paper
Comparaison d’une approche miroir et d’une approche distributionnelle pour l’extraction de mots sémantiquement reliés (Comparing a mirror approach and a distributional approach for extracting semantically related words)
2011, cites this paper
Exploring patterns in dictionary definitions for synonym extraction
2011, cites this paper
Comparing Distributional and Mirror Translation Similarities for Extracting Synonyms
2011, cites this paper
Automatic discovery of word semantic relations using paraphrase alignment and distributional lexical semantics analysis
2010, influential citation
Discovery of numerous specific topics via term co-occurrence analysis
2010, cites this paper
Paraphrase Alignment for Synonym Evidence Discovery
2010, cites this paper
Parallel, massive processing in SuperMatrix—A general tool for distributional semantic analysis of corpus
2010, cites this paper
Testing Semantic Similarity Measures for Extracting Synonyms from a Corpus
2010, influential citation
Simmered Greedy Optimization for Co-clustering
2010, cites this paper
Extraction of Polish noun senses from large corpora by means of clustering
2010, cites this paper
Similarité sémantique et extraction de synonymes à partir de corpus
2010, cites this paper
Morphosyntactic Constraints in the Acquisition of Linguistic Knowledge for Polish
2009, cites this paper
A Wordnet from the ground up
2009, cites this paper
Extracting Synonyms from Dictionary Definitions
2009, cites this paper
Relieving Polysemy Problem for Synonymy Detection
2009, influential citation
Normalized Web Distance and Word Similarity
2009, cites this paper
Extracting Synonyms from Dictionary Definitions
2009, cites this paper
Rank-Based Transformation in Measuring Semantic Relatedness
2009, cites this paper
Sense-based clustering of Polish nouns in the extraction of semantic relatedness
2008, cites this paper
SuperMatrix: a General tool for lexical semantic knowledge acquisition
2008, influential citation
Words, Concepts and Relations in the Construction of Polish WordNet
2008, cites this paper
Towards semi-automatic extraction of lexical semantics relations for Polish
2008, influential citation
The XTREEM Methods for Ontology Learning from Web Documents
2008, cites this paper
Corpus-based Semantic Relatedness for the Construction of Polish WordNet
2008, influential citation
Mapping nanosciences by citation flows: A preliminary analysis
2007, cites this paper
Semantic Similarity Measure of Polish Nouns Based on Linguistic Features
2007, influential citation
Statistical Translation, Heat Kernels and Expected Distances
2007, cites this paper
Extended Similarity Test for the Evaluation of Semantic Similarity Functions
2007, influential citation
One Sense Per Discourse for Synonym Detection
2007, influential citation
Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns
2007, cites this paper
Combination of Global and Local Attributional Similarities for Synonym Detection
2007, cites this paper
Model Unification in Support of Political Process
2006, cites this paper
Business Process Interoperability with Living Ontologies
2006, cites this paper
Synonym Extraction Using a Semantic Distance on a Dictionary
2006, cites this paper
Using context-window overlapping in synonym discovery and ontology extension
2005, cites this paper