Ten Pairs to Tag – Multilingual POS Tagging via Coarse Mapping between Embeddings

Yuan Zhang, David Gaddy, R. Barzilay, T. Jaakkola

Published 2016 in North American Chapter of the Association for Computational Linguistics

ABSTRACT

In the absence of annotations in the target language, multilingual models typically draw on extensive parallel resources. In this paper, we demonstrate that accurate multilingual part-of-speech (POS) tagging can be done with just a few (e.g., ten) word translation pairs. We use the translation pairs to establish a coarse linear isometric (orthonormal) mapping between monolingual embeddings. This enables the supervised source model expressed in terms of embeddings to be used directly on the target language. We further refine the model in an unsupervised manner by initializing and regularizing it to be close to the direct transfer model. Averaged across six languages, our model yields a 37.5% absolute improvement over the monolingual prototype-driven method (Haghighi and Klein, 2006) when using a comparable amount of supervision. Moreover, to highlight key linguistic characteristics of the generated tags, we use them to predict typological properties of languages, obtaining a 50% error reduction relative to the prototype model.
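As an illustration of the orthonormal mapping idea mentioned in the abstract, the sketch below fits an orthonormal matrix from a handful of paired embeddings using the closed-form orthogonal Procrustes solution (an SVD of the cross-covariance). This is an assumption made for illustration: the abstract does not say how the mapping is optimized, and the function name `fit_orthonormal_map`, the toy dimension of 5, and the synthetic "embeddings" are all hypothetical. With realistic embedding sizes, ten pairs alone would not pin down the mapping uniquely.

```python
import numpy as np


def fit_orthonormal_map(tgt_vecs, src_vecs):
    """Return the orthonormal P minimising ||tgt_vecs @ P - src_vecs||_F.

    Each row is the embedding of one word in a translation pair, so row i of
    tgt_vecs and row i of src_vecs describe the same concept.
    """
    # Closed-form orthogonal Procrustes: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(tgt_vecs.T @ src_vecs)
    return u @ vt


# Toy data (hypothetical): ten "translation pairs" in a 5-dimensional space.
rng = np.random.default_rng(0)
dim, n_pairs = 5, 10

# Q is a hidden orthonormal map that the sketch should recover.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src = rng.normal(size=(n_pairs, dim))  # source-language embeddings
tgt = src @ Q.T                        # target-language embeddings: tgt @ Q == src

P = fit_orthonormal_map(tgt, src)

# P is orthonormal, and it carries target vectors onto their source partners,
# so a model trained on source embeddings could read mapped target embeddings.
print(np.allclose(P.T @ P, np.eye(dim)))
print(np.allclose(tgt @ P, src))
```

In this noiseless toy setting the ten pairs span the whole 5-dimensional space, so the fitted `P` coincides with the hidden `Q`; real monolingual embeddings are only approximately related by such a map, which is consistent with the abstract calling it coarse.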

PUBLICATION RECORD

Publication year
2016
Venue
North American Chapter of the Association for Computational Linguistics
Publication date
2016-06-01
Fields of study
Computer Science
Identifiers
DOI 10.18653/v1/N16-1156
Source metadata
Semantic Scholar

REFERENCES

Universal Dependencies v1: A Multilingual Treebank Collection
2016influential reference
Building a shared world: mapping distributional to model-theoretic semantic spaces
2015cited by this paper
Deep Multilingual Correlation for Improved Word Embeddings
2015cited by this paper
Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction
2015cited by this paper
Part-of-speech Taggers for Low-resource Languages using CCA Features
2015cited by this paper
Cross-lingual Dependency Parsing Based on Distributed Representations
2015cited by this paper
Unsupervised POS Induction with Word Embeddings
2015cited by this paper
Bilingual Word Representations with Monolingual Quality in Mind
2015cited by this paper
Improving Vector Space Word Representations Using Multilingual Correlation
2014cited by this paper
GloVe: Global Vectors for Word Representation
2014cited by this paper
What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages
2014cited by this paper
Distributed Word Representation Learning for Cross-Lingual Dependency Parsing
2014cited by this paper
An Autoencoder Approach to Learning Bilingual Word Representations
2014cited by this paper
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
2014cited by this paper
Cross-Lingual Part-of-Speech Tagging through Ambiguous Learning
2014cited by this paper
Increasing the Quality and Quantity of Source Language Data for Unsupervised Cross-Lingual POS Tagging
2013cited by this paper
Efficient Estimation of Word Representations in Vector Space
2013influential reference
Exploiting Similarities among Languages for Machine Translation
2013influential reference
Polyglot: Distributed Word Representations for Multilingual NLP
2013cited by this paper
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
2013cited by this paper
Universal Dependency Annotation for Multilingual Parsing
2013cited by this paper
Bilingual Word Embeddings for Phrase-Based Machine Translation
2013cited by this paper
Vine Pruning for Efficient Multi-Pass Dependency Parsing
2012cited by this paper
Syntactic Transfer Using a Bilingual Lexicon
2012cited by this paper
Unsupervised Bilingual POS Tagging with Markov Random Fields
2011cited by this paper
A Universal Part-of-Speech Tagset
2011cited by this paper
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
2011cited by this paper
Generalized Inverse Matrices
2011cited by this paper
Painless Unsupervised Learning with Features
2010cited by this paper
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
2010cited by this paper
Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches
2009cited by this paper
Unsupervised Multilingual Learning for POS Tagging
2008cited by this paper
Steepest Descent Algorithms for Optimization Under Unitary Matrix Constraint
2008cited by this paper
Cross-Language Parser Adaptation between Related Languages
2008cited by this paper
The CoNLL 2007 Shared Task on Dependency Parsing
2007cited by this paper
CoNLL-X Shared Task on Multilingual Dependency Parsing
2006cited by this paper
Prototype-Driven Learning for Sequence Models
2006influential reference
Europarl: A Parallel Corpus for Statistical Machine Translation
2005cited by this paper
The World Atlas of Language Structures
2005cited by this paper
A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources
2004cited by this paper
Support vector machine learning for interdependent and structured output spaces
2004cited by this paper
Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora
2001cited by this paper
On the limited memory BFGS method for large scale optimization
1989cited by this paper

CITED BY

Part of speech (POS) tagging in Roman Urdu: datasets and models
2025cites this paper
Frame-Based Zero-Shot Semantic Channel Equalization for AI-Native Communications
2025cites this paper
Unfair clause detection in terms of service across multiple languages
2024cites this paper
Decipherment-Aware Multilingual Learning in Jointly Trained Language Models
2024cites this paper
Alignment of Multilingual Embeddings to Estimate Job Similarities in Online Labour Market
2024cites this paper
Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques
2024influential citation
SeNSe: embedding alignment via semantic anchors selection
2024cites this paper
On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Alignment of Embeddings
2024cites this paper
Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries. Association for Computational Linguistics, 2020
2023cites this paper
Prague to Penn Discourse Transformation
2023cites this paper
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning
2023cites this paper
Transferring Word-Formation Networks Between Languages
2023cites this paper
Cross-lingual Argument Mining in the Medical Domain
2023influential citation
Unsupervised Alignment of Distributional Word Embeddings
2022cites this paper
Overview of the 2022 BUCC Shared Task: Bilingual Term Alignment in Comparable Specialized Corpora
2022cites this paper
Toward Semantic History Compression for Reinforcement Learning
2022cites this paper
Multi-Stage Framework with Refinement Based Point Set Registration for Unsupervised Bi-Lingual Word Alignment
2022cites this paper
Building Comparable Corpora for Assessing Multi-Word Term Alignment
2022cites this paper
From unified phrase representation to bilingual phrase alignment in an unsupervised manner
2022cites this paper
Investigating Language Relationships in Multilingual Sentence Encoders Through the Lens of Linguistic Typology
2022cites this paper
Learning Bilingual Word Embedding Mappings with Similar Words in Related Languages Using GAN
2022cites this paper
Semantic Recommendation System for Bilingual Corpus of Academic Papers
2021cites this paper
Semi-supervised machine learning methods for developing derivational networks Ph.D. Thesis Proposal
2021cites this paper
A Chinese-Thai Cross-language Word Embedding Method Based on Unequal Corpus of Small Dictionaries
2021cites this paper
Broad Language Support for Automatic Translation Insight Extractors of Complex Language Patterns
2021influential citation
Preserving Cross-Linguality of Pre-trained Models via Continual Learning
2021cites this paper
FII_CROSS at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation
2021cites this paper
Filtered Inner Product Projection for Crosslingual Embedding Alignment
2021cites this paper
Combining Static Word Embeddings and Contextual Representations for Bilingual Lexicon Induction
2021cites this paper
Cross-lingual learning for text processing: A survey
2021cites this paper
On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Learning
2020cites this paper
Bi-Decoder Augmented Network for Neural Machine Translation
2020cites this paper
Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
2020cites this paper
Refinement of Unsupervised Cross-Lingual Word Embeddings
2020cites this paper
Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
2020cites this paper
XPersona: Evaluating Multilingual Personalized Chatbot
2020cites this paper
Massively Multilingual Sparse Word Representations
2020cites this paper
Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning
2020cites this paper
Why Overfitting Isn’t Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
2020cites this paper
Cross-Lingual Word Embeddings for Turkic Languages
2020cites this paper
Learning aligned embeddings for semi-supervised word translation using Maximum Mean Discrepancy
2020cites this paper
Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences
2020cites this paper
Modeling Code-Switch Languages Using Bilingual Parallel Corpus
2020cites this paper
Unsupervised Multilingual Alignment using Wasserstein Barycenter
2020cites this paper
Cross-lingual Spoken Language Understanding with Regularized Representation Alignment
2020cites this paper
Semi-Supervised Bilingual Lexicon Induction with Two-way Interaction
2020cites this paper
Exploiting Comparable Corpora to Enhance Bilingual Lexicon Induction from Monolingual Corpora
2020cites this paper
Unsupervised Word Translation Pairing using Refinement based Point Set Registration
2020cites this paper
Cross-lingual Annotation Projection in Legal Texts
2020cites this paper
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
2019cites this paper
Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
2019cites this paper
Cross-lingual Structure Transfer for Relation and Event Extraction
2019cites this paper
Exploiting languages proximity for part-of-speech tagging of three French regional languages
2019influential citation
A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings
2019cites this paper
Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?
2019cites this paper
Should All Cross-Lingual Embeddings Speak English?
2019cites this paper
Cross-Lingual Word Embeddings for Morphologically Rich Languages
2019cites this paper
Exploring Crosslingual Word Embeddings for Semantic Classification in Text and Dialogue
2019cites this paper
Reconstructed similarity for faster GANs-based word translation to mitigate hubness
2019cites this paper
Neural morphosyntactic tagging for Rusyn
2019cites this paper
Supervised Training on Synthetic Languages: A Novel Framework for Unsupervised Parsing
2019cites this paper
Cross-Lingual Vision-Language Navigation
2019cites this paper
Source ! Target Incorrect Predicted aunt ! тетя бабушка ( Grandmother ) uruguay ! уругвая аргентины ( Argentina ) regiments ! полков кавалерийские
2019cites this paper
Unsupervised Cross-Lingual Representation Learning
2019cites this paper
A Bilingual Adversarial Autoencoder for Unsupervised Bilingual Lexicon Induction
2019cites this paper
Learning Unsupervised Word Mapping via Maximum Mean Discrepancy
2019cites this paper
Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems
2019cites this paper
Evaluating Resource-Lean Cross-Lingual Embedding Models in Unsupervised Retrieval
2019cites this paper
Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces
2019cites this paper
Cross-Lingual Syntactic Transfer through Unsupervised Adaptation of Invertible Projections
2019influential citation
Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings
2019cites this paper
Towards Optimal Transport with Global Invariances
2018cites this paper
Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource
2018cites this paper
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
2018cites this paper
Multilingual word embeddings and their utility in cross-lingual learning
2018cites this paper
MASTER THESIS
2018cites this paper
A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
2018cites this paper
Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding
2018cites this paper
Étiquetage en parties du discours de langues peu dotées par spécialisation des plongements lexicaux (POS tagging for low-resource languages by adapting word embeddings)
2018cites this paper
Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations
2018cites this paper
Bilingual Embeddings with Random Walks over Multilingual Wordnets
2018cites this paper
Learning Unsupervised Word Mapping by Maximizing Mean Discrepancy
2018cites this paper
Unsupervised Bilingual Lexicon Induction via Latent Variable Models
2018influential citation
PD3: Better Low-Resource Cross-Lingual Transfer By Combining Direct Transfer and Annotation Projection
2018cites this paper
Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision
2018cites this paper
The impact of corpus domain on word representation: a study on Persian word embeddings
2018cites this paper
Zero-Resource Multilingual Model Transfer: Learning What to Share
2018cites this paper
XNLI: Evaluating Cross-lingual Sentence Representations
2018cites this paper
Unsupervised Cross-lingual Transfer of Word Embedding Spaces
2018cites this paper
Gromov-Wasserstein Alignment of Word Embedding Spaces
2018influential citation
Neural Cross-Lingual Named Entity Recognition with Minimal Resources
2018influential citation
A Discriminative Latent-Variable Model for Bilingual Lexicon Induction
2018cites this paper