On the Role of Seed Lexicons in Learning Bilingual Word Embeddings

Published 2016 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

A shared bilingual word embedding space (SBWES) is an indispensable resource in a variety of cross-language NLP and IR tasks. A common approach to the SBWES induction is to learn a mapping function between monolingual semantic spaces, where the mapping critically relies on a seed word lexicon used in the learning process. In this work, we analyze the importance and properties of seed lexicons for the SBWES induction across different dimensions (i.e., lexicon source, lexicon size, translation method, translation pair reliability). On the basis of our analysis, we propose a simple but effective hybrid bilingual word embedding (BWE) model. This model (HYBWE) learns the mapping between two monolingual embedding spaces using only highly reliable symmetric translation pairs from a seed document-level embedding space. We perform bilingual lexicon learning (BLL) with 3 language pairs and show that by carefully selecting reliable translation pairs our new HYBWE model outperforms benchmarking BWE learning models, all of which use more expensive bilingual signals. Effectively, we demonstrate that a SBWES may be induced by leveraging only a very weak bilingual signal (document alignments) along with monolingual data.
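
The two ideas named in the abstract, keeping only symmetric translation pairs and learning a mapping between monolingual spaces from them, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the form of the mapping, so the linear least-squares map below is an assumption, as are the mutual-nearest-neighbour reading of "symmetric" and all function names (`symmetric_pairs`, `learn_mapping`, `translate`).

```python
import numpy as np


def symmetric_pairs(sim):
    """Keep (source, target) index pairs that are mutual nearest neighbours.

    sim[i, j] is a similarity score between source word i and target word j,
    e.g. computed in some seed bilingual space (an assumption of this sketch).
    """
    fwd = sim.argmax(axis=1)  # best target for each source word
    bwd = sim.argmax(axis=0)  # best source for each target word
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]


def learn_mapping(X, Z):
    """Least-squares linear map W minimising ||X W - Z||_F (assumed form).

    Row k of X is the source-language vector of seed pair k and row k of Z
    is the target-language vector of the same pair.
    """
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W


def translate(x, W, target_vectors):
    """Map a source vector and return the index of the most cosine-similar target."""
    mapped = x @ W
    t = target_vectors / np.linalg.norm(target_vectors, axis=1, keepdims=True)
    m = mapped / np.linalg.norm(mapped)
    return int((t @ m).argmax())
```

In use, the pairs returned by `symmetric_pairs` would select the rows of `X` and `Z` passed to `learn_mapping`, and `translate` would then perform a nearest-neighbour lookup of the kind used in bilingual lexicon learning.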

PUBLICATION RECORD

Publication year
2016
Venue
Annual Meeting of the Association for Computational Linguistics
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.18653/v1/P16-1024
Source metadata
Semantic Scholar

REFERENCES

Cross-lingual Wikification Using Multilingual Embeddings
2016cited by this paper
Ten Pairs to Tag – Multilingual POS Tagging via Coarse Mapping between Embeddings
2016cited by this paper
Massively Multilingual Word Embeddings
2016cited by this paper
The Role of Context Types and Dimensionality in Learning Word Embeddings
2016cited by this paper
Multi-Modal Representations for Improved Bilingual Lexicon Learning
2016cited by this paper
A Dual Embedding Space Model for Document Ranking
2016cited by this paper
Cross-lingual Models of Word Embeddings: An Empirical Comparison
2016cited by this paper
Any-language frame-semantic parsing
2015cited by this paper
If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages
2015cited by this paper
Trans-gram, Fast Cross-lingual Word-embeddings
2015cited by this paper
Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings
2015cited by this paper
A Simple Word Embedding Model for Lexical Substitution
2015cited by this paper
Cross-lingual Dependency Parsing Based on Distributed Representations
2015cited by this paper
Improving Distributional Similarity with Lessons Learned from Word Embeddings
2015influential reference
Bilingual Word Representations with Monolingual Quality in Mind
2015influential reference
Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning
2015cited by this paper
Skip-Thought Vectors
2015cited by this paper
Learning Cross-lingual Word Embeddings via Matrix Co-factorization
2015cited by this paper
Judgment Language Matters: Multilingual Vector Space Models for Judgment Language Aware Lexical Semantics
2015cited by this paper
Bilingual Distributed Word Representations from Document-Aligned Comparable Data
2015cited by this paper
Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
2014cited by this paper
An Autoencoder Approach to Learning Bilingual Word Representations
2014cited by this paper
Neural Word Embedding as Implicit Matrix Factorization
2014cited by this paper
Dependency-Based Word Embeddings
2014cited by this paper
Improving Vector Space Word Representations Using Multilingual Correlation
2014cited by this paper
Learning Bilingual Word Representations by Marginalizing Alignments
2014cited by this paper
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
2014influential reference
Evaluating Neural Word Representations in Tensor-Based Compositional Settings
2014cited by this paper
Leveraging Monolingual Data for Crosslingual Compositional Word Representations
2014cited by this paper
A Fast and Accurate Dependency Parser using Neural Networks
2014cited by this paper
Multilingual Models for Compositional Distributed Semantics
2014cited by this paper
Improving zero-shot learning by mitigating the hubness problem
2014influential reference
Probabilistic Models of Cross-Lingual Semantic Similarity in Context Based on Latent Cross-Lingual Concepts Induced from Comparable Data
2014cited by this paper
Distributed Representations of Words and Phrases and their Compositionality
2013influential reference
Polyglot: Distributed Word Representations for Multilingual NLP
2013influential reference
A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
2013cited by this paper
Multilingual Distributed Representations without Word Alignment
2013cited by this paper
Exploiting Similarities among Languages for Machine Translation
2013influential reference
Bilingual Word Embeddings for Phrase-Based Machine Translation
2013influential reference
Improving Word Representations via Global Context and Multiple Word Prototypes
2012cited by this paper
Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
2012cited by this paper
Inducing Crosslingual Distributed Representations of Words
2012cited by this paper
Parallel Data, Tools and Interfaces in OPUS
2012cited by this paper
WSABIE: Scaling Up to Large Vocabulary Image Annotation
2011cited by this paper
Natural Language Processing (Almost) from Scratch
2011cited by this paper
From Frequency to Meaning: Vector Space Models of Semantics
2010cited by this paper
Word Representations: A Simple and General Method for Semi-Supervised Learning
2010cited by this paper
Cross-lingual Induction of Selectional Preferences with Bilingual Vector Spaces
2010cited by this paper
SemEval-2010 Task 2: Cross-Lingual Lexical Substitution
2009cited by this paper
Vector-based Models of Semantic Composition
2008cited by this paper
Bootstrapping Feature-Rich Dependency Parsers with Entropic Priors
2007cited by this paper
Creating bilingual lexica using reference wordlists for alignment of monolingual semantic vector spaces
2005cited by this paper
A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora
2004cited by this paper
Putting frequencies in the dictionary
1997cited by this paper
Learning Bilingual Lexicons Using the Visual Similarity of Labeled Web Images (Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence)
year unknowncited by this paper

CITED BY

Synonym Set for Indonesian WordNet Based on KBBI Using the DBSCAN Algorithm
2025cites this paper
A survey of neural-network-based methods utilising comparable data for finding translation equivalents
2024cites this paper
Enhancing bilingual lexicon induction via harnessing polysemous words
2024cites this paper
Data Driven Analysis of Semantic and Phraseo Semantic Fields Integrating Information Retrieval Systems for Cross Linguistic Comparative Studies
2024cites this paper
Unsupervised Neural Machine Translation between the Portuguese language and the Chinese and Korean languages
2023cites this paper
Topic-Based Unsupervised and Supervised Dictionary Induction
2022cites this paper
Quantized Wasserstein Procrustes Alignment of Word Embedding Spaces
2022cites this paper
Manipuri–English comparable corpus for cross-lingual studies
2022cites this paper
Learning Bilingual Word Embedding Mappings with Similar Words in Related Languages Using GAN
2022cites this paper
Aligning Word Vectors on Low-Resource Languages with Wiktionary
2022cites this paper
Improving Machine Translation of Rare and Unseen Word Senses
2021influential citation
Improving bilingual word embeddings mapping with monolingual context information
2021cites this paper
Do not neglect related languages: The case of low-resource Occitan cross-lingual word embeddings
2021cites this paper
The Cross-Lingual Arabic Information REtrieval (CLAIRE) System
2021cites this paper
Evaluating a Joint Training Approach for Learning Cross-lingual Embeddings with Sub-word Information without Parallel Corpora on Lower-resource Languages
2021cites this paper
Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation
2021cites this paper
Learning Cross-Lingual Word Embeddings from Twitter via Distant Supervision
2020cites this paper
Refinement of Unsupervised Cross-Lingual Word Embeddings
2020cites this paper
Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity
2020cites this paper
A Statistical Test for Legal Interpretation: Theory and Applications
2020cites this paper
Bilingual Lexicon Induction across Orthographically-distinct Under-Resourced Dravidian Languages
2020cites this paper
Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction
2020cites this paper
Target-Level Sentiment Analysis on Various Genres
2020cites this paper
Mono- and cross-lingual paraphrased text reuse and extrinsic plagiarism detection
2020cites this paper
Anchor-based Bilingual Word Embeddings for Low-Resource Languages
2020cites this paper
Exploiting Comparable Corpora to Enhance Bilingual Lexicon Induction from Monolingual Corpora
2020influential citation
Towards Handling Compositionality in Low-Resource Bilingual Word Induction
2020influential citation
Improving Bilingual Lexicon Induction with Unsupervised Post-Processing of Monolingual Word Vector Spaces
2020cites this paper
Non-Linear Instance-Based Cross-Lingual Mapping for Non-Isomorphic Embedding Spaces
2020cites this paper
Neural Machine Translation
2020cites this paper
Exploring Bilingual Word Embeddings for Hiligaynon, a Low-Resource Language
2020cites this paper
LMU Bilingual Dictionary Induction System with Word Surface Similarity Scores for BUCC 2020
2020cites this paper
A Call for More Rigor in Unsupervised Cross-lingual Learning
2020cites this paper
LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items
2020cites this paper
A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings
2019influential citation
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
2019cites this paper
Aligning Vector-spaces with Noisy Supervised Lexicon
2019cites this paper
Semantic Drift in Multilingual Representations
2019cites this paper
Learning Cross-lingual Embeddings from Twitter via Distant Supervision
2019cites this paper
A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity
2019cites this paper
Learning Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs
2019cites this paper
Learning Bilingual Word Embeddings Using Lexical Definitions
2019cites this paper
Best Practices for Learning Domain-Specific Cross-Lingual Embeddings
2019influential citation
Unsupervised Cross-Lingual Representation Learning
2019cites this paper
On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning
2019cites this paper
Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?
2019cites this paper
Comparing Unsupervised Word Translation Methods Step by Step
2019cites this paper
A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings: Making the Method Robustly Reproducible as Well
2019cites this paper
Improving Bilingual Lexicon Induction on Distant Language Pairs
2019cites this paper
Seeking robustness in a multilingual world: from pipelines to embeddings
2019cites this paper
Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations
2018influential citation
On the Limitations of Unsupervised Bilingual Dictionary Induction
2018cites this paper
Multilingual word embeddings and their utility in cross-lingual learning
2018cites this paper
Learning to Represent Bilingual Dictionaries
2018cites this paper
Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable
2018influential citation
An Empirical Study on Crosslingual Transfer in Probabilistic Topic Models
2018cites this paper
A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
2018cites this paper
A deep learning approach to bilingual lexicon induction in the biomedical domain
2018cites this paper
Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources
2018cites this paper
Harnessing sense-level information for semantically augmented knowledge extraction
2018cites this paper
A Discriminative Latent-Variable Model for Bilingual Lexicon Induction
2018cites this paper
Word and Phrase Dictionaries Generated with Multiple Translation Paths
2018cites this paper
Unsupervised Cross-lingual Transfer of Word Embedding Spaces
2018cites this paper
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
2018cites this paper
NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings
2018cites this paper
Cross-lingual Lexical Sememe Prediction
2018cites this paper
Understanding Crosslingual Transfer Mechanisms in Probabilistic Topic Modeling
2018cites this paper
Learning Translations via Images: A Large Multilingual Dataset and Comprehensive Study
2018cites this paper
Using Communities of Words Derived from Multilingual Word Vectors for Cross-Language Information Retrieval in Indian Languages
2018cites this paper
Cross-lingual Word Analogies using Linear Transformations between Semantic Spaces
2018cites this paper
Linear Transformations for Cross-lingual Semantic Textual Similarity
2018cites this paper
Characterizing Departures from Linearity in Word Translation
2018cites this paper
Orthographic Features for Bilingual Lexicon Induction
2018cites this paper
Evaluating bilingual word embeddings on the long tail
2018cites this paper
Phrase Table Induction Using Monolingual Data for Low-Resource Statistical Machine Translation
2018cites this paper
Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding
2018cites this paper
Bilingual Lexicon Induction by Learning to Combine Word-Level and Character-Level Representations
2017cites this paper
Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules
2017influential citation
A survey of cross-lingual embedding models
2017cites this paper
Enriching low resource Statistical Machine Translation using induced bilingual lexicons
2017cites this paper
Transfer learning for low-resource natural language analysis
2017cites this paper
Similarités Textuelles Sémantiques Translingues : vers la Détection Automatique du Plagiat par Traduction. (Cross Lingual Semantic Textual Similarity Detection : towards Cross-Language Plagiarism Detection)
2017cites this paper
Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
2017cites this paper
Learning Bilingual Lexicon for Low-Resource Language Pairs
2017cites this paper
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text
2017cites this paper
Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation
2017cites this paper
Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction
2017cites this paper
Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision
2017influential citation
Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings
2017cites this paper
Adversarial Training for Unsupervised Bilingual Lexicon Induction
2017influential citation
Learning Translations via Matrix Completion
2017cites this paper
A Survey of Cross-lingual Word Embedding Models
2017cites this paper
Knowledge Distillation for Bilingual Dictionary Induction
2017influential citation
Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints
2017influential citation
Negative Sampling Improves Hypernymy Extraction Based on Projection Learning
2017cites this paper
Learning bilingual word embeddings with (almost) no bilingual data
2017cites this paper
Cross-Lingual Syntactically Informed Distributed Word Representations
2017cites this paper
Graph-Based Bilingual Word Embedding for Statistical Machine Translation
2016cites this paper
Learning Indonesian-Chinese Lexicon with Bilingual Word Embedding Models and Monolingual Signals
2016influential citation
Learning Word Subsumption Projections for the Russian Language
2016cites this paper