Multilingual Alignment of Contextual Word Representations

Published 2020 in International Conference on Learning Representations

ABSTRACT

We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in analyzing and improving multilingual BERT. In particular, after our proposed alignment procedure, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model, remarkably matching pseudo-fully-supervised translate-train models for Bulgarian and Greek. Further, to measure the degree of alignment, we introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer. Using this word retrieval task, we also analyze BERT and find that it exhibits systematic deficiencies, e.g. worse alignment for open-class parts-of-speech and word pairs written in different scripts, that are corrected by the alignment procedure. These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
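The contextual word retrieval measure mentioned in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: it assumes that each source-language word vector has exactly one aligned target-language vector at the same index, and that retrieval picks the nearest target vector by cosine similarity. The function names `cosine` and `retrieval_accuracy` and the toy vectors are hypothetical, and the retrieval criterion actually used in the paper may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieval_accuracy(source_vecs, target_vecs):
    """Fraction of source vectors whose nearest target vector is the aligned one.

    Assumption: source_vecs[i] and target_vecs[i] represent an aligned word pair.
    """
    hits = 0
    for i, src in enumerate(source_vecs):
        # Index of the target vector most similar to this source vector.
        best = max(range(len(target_vecs)),
                   key=lambda j: cosine(src, target_vecs[j]))
        if best == i:
            hits += 1
    return hits / len(source_vecs)


# Toy 3-dimensional "contextual embeddings" (made up for illustration).
source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.0, 0.3]]
target = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.7, 0.1, 0.6]]

accuracy = retrieval_accuracy(source, target)
print(f"retrieval accuracy: {accuracy:.3f}")
```

In this toy data the third source vector lies closer to the first target vector than to its own pair, so two of the three pairs are retrieved correctly. A higher score on real embeddings would indicate that aligned words in the two languages sit closer together in the representation space.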

PUBLICATION RECORD

Publication year
2020
Venue
International Conference on Learning Representations
Publication date
2020-02-10
Fields of study
Linguistics, Computer Science
Identifiers
arXiv 2002.03518
Source metadata
Semantic Scholar

CITED BY

The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models (2026, cites this paper)
LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them? (2026, cites this paper)
Enhancing Multilingual Embeddings via Multi-Way Parallel Text Alignment (2026, cites this paper)
Layer-Targeted Multilingual Knowledge Erasure in Large Language Models (2026, cites this paper)
A Subword Embedding Approach for Variation Detection in Luxembourgish User Comments (2026, cites this paper)
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders (2025, cites this paper)
Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages (2025, cites this paper)
Batch Effects Remain a Fundamental Barrier to Universal Embeddings in Single-Cell Foundation Models (2025, cites this paper)
Efficient Cross-Lingual Transfer for Language Models (2025, cites this paper)
False Friends Are Not Foes: Investigating Vocabulary Overlap in Multilingual Language Models (2025, cites this paper)
From Construction to Injection: Edit-Based Fingerprints for Large Language Models (2025, cites this paper)
ALGEN: Few-shot Inversion Attacks on Textual Embeddings using Alignment and Generation (2025, cites this paper)
AlignFreeze: Navigating the Impact of Realignment on the Layers of Multilingual Models Across Diverse Languages (2025, influential citation)
Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models (2025, cites this paper)
NeighXLM: Enhancing Cross-Lingual Transfer in Low-Resource Languages via Neighbor-Augmented Contrastive Pretraining (2025, influential citation)
Lost in Alignment: A Survey on Cross-Lingual Alignment Methods for Contextualized Representation (2025, cites this paper)
Cross-Domain Bilingual Lexicon Induction via Pretrained Language Models (2025, cites this paper)
A NER method based on location-aware multi-feature fusion (2025, cites this paper)
Kardeş-NLU: Transfer to Low-Resource Languages with the Help of a High-Resource Cousin – A Benchmark and Evaluation for Turkic Languages (2024, cites this paper)
Lens: Rethinking Multilingual Enhancement for Large Language Models (2024, cites this paper)
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment (2024, cites this paper)
Probing the Emergence of Cross-lingual Alignment during LLM Training (2024, cites this paper)
J-SNACS: Adposition and Case Supersenses for Japanese Joshi (2024, cites this paper)
Multilingual Sentence-T5: Scalable Sentence Encoders for Multilingual Applications (2024, cites this paper)
Multilingual Meta-Distillation Alignment for Semantic Retrieval (2024, cites this paper)
Cross-Lingual Transfer Learning for Speech Translation (2024, cites this paper)
How Transliterations Improve Crosslingual Alignment (2024, cites this paper)
LangSAMP: Language-Script Aware Multilingual Pretraining (2024, cites this paper)
Pruning Multilingual Large Language Models for Multilingual Inference (2024, cites this paper)
PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment (2024, cites this paper)
SeNSe: embedding alignment via semantic anchors selection (2024, cites this paper)
Wave to Interlingua: Analyzing Representations of Multilingual Speech Transformers for Spoken Language Translation (2024, cites this paper)
A survey on multilingual large language models: corpora, alignment, and bias (2024, influential citation)
Improving Zero-Shot Cross-Lingual Transfer via Progressive Code-Switching (2024, cites this paper)
A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives (2024, cites this paper)
Unknown Script: Impact of Script on Cross-Lingual Transfer (2024, cites this paper)
Unveiling Linguistic Regions in Large Language Models (2024, cites this paper)
Exploring Alignment in Shared Cross-lingual Spaces (2024, cites this paper)
Can Pretrained English Language Models Benefit Non-English NLP Systems in Low-Resource Scenarios? (2024, cites this paper)
LLaMA Beyond English: An Empirical Study on Language Capability Transfer (2024, cites this paper)
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models (2024, cites this paper)
Kardeş-NLU: Transfer to Low-Resource Languages with Big Brother’s Help – A Benchmark and Evaluation for Turkic Languages (2024, cites this paper)
Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning? (2024, cites this paper)
Tomato, Tomahto, Tomate: Do Multilingual Language Models Understand Based on Subword-Level Semantic Concepts? (2024, cites this paper)
Transferring BERT Capabilities from High-Resource to Low-Resource Languages Using Vocabulary Matching (2024, cites this paper)
WCC-EC 2.0: Enhancing Neural Machine Translation with a 1.6M+ Web-Crawled English-Chinese Parallel Corpus (2024, cites this paper)
Share What You Already Know: Cross-Language-Script Transfer and Alignment for Sentiment Detection in Code-Mixed Data (2024, cites this paper)
Understanding Cross-Lingual Alignment - A Survey (2024, influential citation)
Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide (2024, cites this paper)
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities (2024, cites this paper)
Domain Adaptation of Multilingual Semantic Search - Literature Review (2024, cites this paper)
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples (2024, cites this paper)
Do Llamas Work in English? On the Latent Language of Multilingual Transformers (2024, cites this paper)
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment (2024, cites this paper)
PMI-Align: Word Alignment With Point-Wise Mutual Information Without Requiring Parallel Training Data (2023, cites this paper)
WAD-X: Improving Zero-shot Cross-lingual Transfer via Adapter-based Word Alignment (2023, influential citation)
Multilingual Pre-training with Self-supervision from Global Co-occurrence Information (2023, cites this paper)
Enhancing Few-shot Cross-lingual Transfer with Target Language Peculiar Examples (2023, cites this paper)
PRAM: An End-to-end Prototype-based Representation Alignment Model for Zero-resource Cross-lingual Named Entity Recognition (2023, cites this paper)
Script, Language, and Labels: Overcoming Three Discrepancies for Low-Resource Language Specialization (2023, influential citation)
Macular: A Multi-Task Adversarial Framework for Cross-Lingual Natural Language Understanding (2023, cites this paper)
Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution (2023, cites this paper)
Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers (2023, influential citation)
Linear Cross-Lingual Mapping of Sentence Embeddings (2023, cites this paper)
Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging (2023, cites this paper)
Multilingual Text Representation (2023, cites this paper)
Instruct-Align: Teaching Novel Languages with to LLMs through Alignment-based Cross-Lingual Instruction (2023, cites this paper)
Implicit Cross-Lingual Word Embedding Alignment for Reference-Free Machine Translation Evaluation (2023, cites this paper)
KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation (2023, cites this paper)
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning (2023, cites this paper)
Multilingual BERT-based Word Alignment By Incorporating Common Chinese Characters (2023, cites this paper)
Take a Closer Look at Multilinguality! Improve Multilingual Pre-Training Using Monolingual Corpora Only (2023, cites this paper)
Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment (2023, cites this paper)
Isotropic Representation Can Improve Zero-Shot Cross-Lingual Transfer on Multilingual Language Models (2023, influential citation)
InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning (2023, cites this paper)
X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity (2023, cites this paper)
Human-Like Distractor Response in Vision-Language Model (2023, cites this paper)
Macedon: Minimizing Representation Coding Rate Reduction for Cross-Lingual Natural Language Understanding (2023, cites this paper)
Bilingual Terminology Alignment Using Contextualized Embeddings (2023, cites this paper)
A General-Purpose Multilingual Document Encoder (2023, cites this paper)
Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer (2023, cites this paper)
Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization (2023, cites this paper)
Massively Multilingual Lexical Specialization of Multilingual Transformers (2022, cites this paper)
BabelBERT: Massively Multilingual Transformers Meet a Massively Multilingual Lexical Resource (2022, cites this paper)
Measuring Social Solidarity During Crisis: The Role of Design Choices (2022, cites this paper)
The Geometry of Multilingual Language Model Representations (2022, cites this paper)
Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models (2022, cites this paper)
The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer (2022, influential citation)
Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval (2022, cites this paper)
Probing language identity encoded in pre-trained multilingual models: a typological view (2022, cites this paper)
Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages (2022, cites this paper)
CINO: A Chinese Minority Pre-trained Language Model (2022, cites this paper)
Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training (2022, cites this paper)
Analyzing Gender Representation in Multilingual Models (2022, cites this paper)
UScore: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation (2022, cites this paper)
Exploiting In-Domain Bilingual Corpora for Zero-Shot Transfer Learning in NLU of Intra-Sentential Code-Switching Chatbot Interactions (2022, cites this paper)
Constrained Density Matching and Modeling for Cross-lingual Alignment of Contextualized Representations (2022, influential citation)
Combining Static and Contextualised Multilingual Embeddings (2022, cites this paper)
Zero-shot language extension for dialogue state tracking via pre-trained models and multi-auxiliary-tasks fine-tuning (2022, cites this paper)
Do Transformers know symbolic rules, and would we know if they did? (2022, cites this paper)