Cross-lingual Models of Word Embeddings: An Empirical Comparison

Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth

Published 2016 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans four different tasks, including intrinsic evaluation on mono-lingual and cross-lingual similarity, and extrinsic evaluation on downstream semantic and syntactic applications. We show that models which require expensive cross-lingual knowledge almost always perform better, but cheaply supervised models often prove competitive on certain tasks.

PUBLICATION RECORD

Publication year
2016
Venue
Annual Meeting of the Association for Computational Linguistics
Publication date
2016-04-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.18653/v1/P16-1157 arXiv 1604.00425
Source metadata
Semantic Scholar

REFERENCES

A Representation Learning Framework for Multi-Source Transfer Parsing
2016cited by this paper
Massively Multilingual Word Embeddings
2016cited by this paper
Evaluation of Word Vector Representations by Subspace Alignment
2015cited by this paper
Cross-lingual Dependency Parsing Based on Distributed Representations
2015influential reference
Deep Multilingual Correlation for Improved Word Embeddings
2015cited by this paper
Bilingual Word Representations with Monolingual Quality in Mind
2015influential reference
Multiview LSA: Representation Learning via Generalized CCA
2015influential reference
Inverted indexing for cross-lingual NLP
2015cited by this paper
Trans-gram, Fast Cross-lingual Word-embeddings
2015cited by this paper
Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction
2015cited by this paper
Evaluation methods for unsupervised word embeddings
2015cited by this paper
Any-language frame-semantic parsing
2015cited by this paper
Simple task-specific bilingual word embeddings
2015influential reference
An Autoencoder Approach to Learning Bilingual Word Representations
2014cited by this paper
Multilingual Models for Compositional Distributed Semantics
2014influential reference
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
2014influential reference
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
2014cited by this paper
Improving Vector Space Word Representations Using Multilingual Correlation
2014influential reference
Bilingual Word Embeddings for Phrase-Based Machine Translation
2013cited by this paper
Universal Dependency Annotation for Multilingual Parsing
2013cited by this paper
Exploiting Similarities among Languages for Machine Translation
2013influential reference
Efficient Estimation of Word Representations in Vector Space
2013cited by this paper
A Simple, Fast, and Effective Reparameterization of IBM Model 2
2013cited by this paper
A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
2013cited by this paper
Polyglot: Distributed Word Representations for Multilingual NLP
2013cited by this paper
Cross-Lingual Semantic Similarity of Words as the Similarity of Their Semantic Word Responses
2013cited by this paper
Linking and Extending an Open Multilingual Wordnet
2013cited by this paper
Inducing Crosslingual Distributed Representations of Words
2012cited by this paper
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
2012cited by this paper
cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
2010cited by this paper
Visualizing Data using t-SNE
2008cited by this paper
CoNLL-X Shared Task on Multilingual Dependency Parsing
2006cited by this paper
A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005
2005cited by this paper
Europarl: A Parallel Corpus for Statistical Machine Translation
2005cited by this paper
Canonical Correlation Analysis: An Overview with Application to Learning Methods
2004cited by this paper
Placing search in context: the concept revisited
2002cited by this paper
Large Margin Classification Using the Perceptron Algorithm
1998cited by this paper
Research Design & Statistical Analysis
1995cited by this paper
Term-Weighting Approaches in Automatic Text Retrieval
1988cited by this paper
Tests for comparing elements of a correlation matrix
1980cited by this paper
Note on the sampling error of the difference between correlated proportions or percentages
1947cited by this paper
Relations Between Two Sets of Variates
1936cited by this paper

CITED BY

Marito: Structuring and Building Open Multilingual Terminologies for South African NLP
2025cites this paper
Can we Operationalize Conceptual Metaphor Cross-Lingually?
2025cites this paper
Low-Resource Language Models: Leveraging Transfer and Zero-Shot Learning for Underrepresented Languages
2025cites this paper
Mono-lingual text reuse detection for the Urdu language at lexical level
2024cites this paper
Ultra-Lightweight Neural Differential DSP Vocoder for High Quality Speech Synthesis
2024cites this paper
Improving BERTScore for Machine Translation Evaluation Through Contrastive Learning
2024cites this paper
Guiding ontology translation with hubness-aware translation memory
2024cites this paper
Learning Cross-Architecture Instruction Embeddings for Binary Code Analysis in Low-Resource Architectures
2024cites this paper
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning
2023cites this paper
Urdu Text Reuse Detection at Phrasal level using Sentence Transformer-based approach
2023cites this paper
Urdu Short Paraphrase Detection at Sentence Level
2023cites this paper
Cross-Lingual Text Reuse Detection at sentence level for English-Urdu language pair
2022cites this paper
Impact of Sentence Representation Matching in Neural Machine Translation
2022cites this paper
Transfer Learning Parallel Metaphor using Bilingual Embeddings
2022cites this paper
Multi-Stage Framework with Refinement Based Point Set Registration for Unsupervised Bi-Lingual Word Alignment
2022cites this paper
Develop corpora and methods for cross-lingual text reuse detection for English Urdu language pair at lexical, syntactical, and phrasal levels
2022cites this paper
Improving Multilingual Frame Identification by Estimating Frame Transferability
2022cites this paper
Cross-lingual Word Embeddings in Hyperbolic Space
2022cites this paper
A Comprehensive Understanding of Code-Mixed Language Semantics Using Hierarchical Transformer
2022cites this paper
FedKC: Federated Knowledge Composition for Multilingual Natural Language Understanding
2022cites this paper
Learning Bilingual Word Embedding Mappings with Similar Words in Related Languages Using GAN
2022cites this paper
Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph
2021cites this paper
Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimisation
2021cites this paper
Developing a Bilingual Model of Word Embedding for Detecting Indonesian English Plagiarism
2021cites this paper
A Survey on Green Deep Learning
2021cites this paper
Jointly learning bilingual word embeddings and alignments
2021cites this paper
HIT - A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation
2021cites this paper
Exploiting Transfer Learning and Hand-Crafted Features in a Unified Neural Model for Identifying Actionable Informative Tweets
2021cites this paper
Towards Learning Language Agnostic Features for NLP in Low-resource Languages
2021cites this paper
Multilingual Neural Translation
2020cites this paper
Towards End-to-End Multilingual Question Answering
2020cites this paper
Contextual Embeddings for Arabic-English Code-Switched Data
2020cites this paper
Unsupervised Word Translation Pairing using Refinement based Point Set Registration
2020cites this paper
Exhaustive Entity Recognition for Coptic: Challenges and Solutions
2020cites this paper
Target-Level Sentiment Analysis on Various Genres
2020cites this paper
Evaluating cross-lingual textual similarity on dictionary alignment problem
2020influential citation
Mono- and cross-lingual paraphrased text reuse and extrinsic plagiarism detection
2020cites this paper
DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries
2020cites this paper
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
2020cites this paper
From Brooklyn Barbers To Movie Stars: Using Introductions To Construct Embeddings Of People
2020cites this paper
TRADER: Trace Divergence Analysis and Embedding Regulation for Debugging Recurrent Neural Networks
2020cites this paper
Exploring Crosslinguistic Frame Alignment
2020cites this paper
MultiSeg: Parallel Data and Subword Information for Learning Bilingual Embeddings in Low Resource Scenarios
2020cites this paper
Sentiment Analysis for Hinglish Code-mixed Tweets by means of Cross-lingual Word Embeddings
2020cites this paper
Semantic Relation Detection based on Multi-task Learning and Cross-Lingual-View Embedding
2020cites this paper
Morphological Segmentation to Improve Crosslingual Word Embeddings for Low Resource Languages
2020cites this paper
Extending Multilingual BERT to Low-Resource Languages
2020cites this paper
From static to dynamic word representations: a survey
2020cites this paper
Learning Cross-Lingual Word Embeddings from Twitter via Distant Supervision
2020influential citation
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
2020cites this paper
Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity
2020cites this paper
Data Augmentation with Unsupervised Machine Translation Improves the Structural Similarity of Cross-lingual Word Embeddings
2020cites this paper
Massively Multilingual Sparse Word Representations
2020cites this paper
On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning
2019cites this paper
Funnelling
2019cites this paper
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
2019cites this paper
Polyglot Contextual Representations Improve Crosslingual Transfer
2019cites this paper
Context-Aware Crosslingual Mapping
2019cites this paper
Expanding the Text Classification Toolbox with Cross-Lingual Embeddings
2019cites this paper
Learning English-Chinese bilingual word representations from sentence-aligned parallel corpus
2019cites this paper
Context-Aware Cross-Lingual Mapping
2019cites this paper
Density Matching for Bilingual Word Embedding
2019cites this paper
Scalable Cross-Lingual Transfer of Neural Sentence Embeddings
2019cites this paper
Toward any-language zero-shot topic classification of textual documents
2019cites this paper
A model of synesthetic metaphor interpretation based on cross-modality similarity
2019cites this paper
Learning Cross-lingual Embeddings from Twitter via Distant Supervision
2019influential citation
A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity
2019cites this paper
Latent Space Cartography: Visual Analysis of Vector Space Embeddings
2019cites this paper
Learning Bilingual Word Embeddings Using Lexical Definitions
2019cites this paper
Unsupervised Joint Training of Bilingual Word Embeddings
2019influential citation
A conditional-probability zone transformation coding method for categorical features
2019cites this paper
Multilingual Open Information Extraction: Challenges and Opportunities
2019cites this paper
Unsupervised Cross-Lingual Representation Learning
2019cites this paper
Specializing Distributional Vectors of All Words for Lexical Entailment
2019cites this paper
Code-Switching Language Modeling with Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English
2019influential citation
Extracting and Learning Semantics from Social Web Data
2019cites this paper
Exploiting Cross-Lingual Representations For Natural Language Processing
2019cites this paper
Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
2019cites this paper
Weighted Compositional Vectors for Translating Collocations Using Monolingual Corpora
2019cites this paper
Neural Cross-Lingual Relation Extraction Based on Bilingual Word Embedding Mapping
2019cites this paper
Research on Cross-Language Retrieval Using Bilingual Word Vectors in Different Languages
2019cites this paper
Bridging Between Emojis and Kaomojis by Learning Their Representations from Linguistic and Visual Information
2019cites this paper
Representing Movie Characters in Dialogues
2019cites this paper
Cross-Lingual and Low-Resource Sentiment Analysis
2019cites this paper
Neural Machine Translation: A Review
2019cites this paper
A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings
2019cites this paper
Cross-Lingual Ability of Multilingual BERT: An Empirical Study
2019cites this paper
Seeking robustness in a multilingual world: from pipelines to embeddings
2019cites this paper
Neural models for information retrieval: towards asymmetry sensitive approaches based on attention models. (Modèles neuronaux pour la recherche d'information : vers des approches sensibles à l'asymétrie basées sur des modèles d'attention)
2019cites this paper
Cross-Language Text Alignment for Plagiarism Detection Based on Contextual and Context-Free Models
2019cites this paper
Cross-Lingual Learning With Distributed Representations
2018cites this paper
Neural Cross-Lingual Coreference Resolution And Its Application To Entity Linking
2018cites this paper
Characterizing Departures from Linearity in Word Translation
2018cites this paper
Multilingual Neural Machine Translation with Task-Specific Attention
2018cites this paper
Embedding Learning Through Multilingual Concept Induction
2018cites this paper
UMD at SemEval-2018 Task 10: Can Word Embeddings Capture Discriminative Attributes?
2018cites this paper
The Limitations of Cross-language Word Embeddings Evaluation
2018cites this paper
A neural generative autoencoder for bilingual word embeddings
2018cites this paper
Multilingual Embeddings Jointly Induced from Contexts and Concepts: Simple, Strong and Scalable
2018cites this paper
Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases
2018cites this paper