Cross-Lingual Morphological Tagging for Low-Resource Languages

Published 2016 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

Morphologically rich languages often lack the annotated linguistic resources required to develop accurate natural language processing tools. We propose models suitable for training morphological taggers with rich tagsets for low-resource languages without using direct supervision. Our approach extends existing methods for projecting part-of-speech tags across languages, using bitext to infer constraints on the possible tags for a given word type or token. We propose a tagging model using Wsabie, a discriminative embedding-based model with rank-based learning. In our evaluation on 11 languages, this model performs on average on par with a baseline weakly-supervised HMM, while being more scalable. Multilingual experiments show that the method performs best when projecting between related language pairs. Despite the inherently lossy projection, we show that the morphological tags predicted by our models improve the downstream performance of a parser by +0.6 LAS on average.
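
The abstract's idea of using bitext to infer tag constraints for word types and tokens can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the alignment format (pairs of source and target indices), the first-alignment-wins rule, and the `min_share` threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def project_token_tags(source_tags, alignment, target_len):
    """Copy each source tag onto the target token it is aligned to.

    alignment is a list of (source_index, target_index) pairs (assumed format).
    Target tokens with no alignment keep None, i.e. no token constraint.
    If several source tokens align to one target token, the first one wins.
    """
    projected = [None] * target_len
    for src, tgt in alignment:
        if projected[tgt] is None:
            projected[tgt] = source_tags[src]
    return projected


def type_constraints(corpus, min_share=0.3):
    """Build a word type -> allowed tag set dictionary from projected tags.

    corpus is an iterable of (target_words, projected_tags) pairs.
    A tag is kept for a word type if it accounts for at least min_share
    of all projections onto that type (hypothetical threshold).
    """
    counts = defaultdict(Counter)
    for words, tags in corpus:
        for word, tag in zip(words, tags):
            if tag is not None:
                counts[word][tag] += 1
    allowed = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        allowed[word] = {t for t, n in tag_counts.items() if n / total >= min_share}
    return allowed


# Toy example: an English source sentence aligned to a three-token target.
source_tags = ["DET", "NOUN|Number=Plur"]
alignment = [(0, 0), (1, 1)]
token_tags = project_token_tags(source_tags, alignment, target_len=3)

toy_corpus = [
    (["a", "b"], ["X", "Y"]),
    (["a"], ["X"]),
    (["a"], ["Z"]),
    (["a"], ["X"]),
]
constraints = type_constraints(toy_corpus)
```

In this sketch the token-level projection gives a per-token constraint where an alignment exists, while the type-level dictionary filters out rare, likely noisy projections. A tagger trained on the target side could then be restricted to the allowed tags for each word.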

PUBLICATION RECORD

Publication year
2016
Venue
Annual Meeting of the Association for Computational Linguistics
Publication date
2016-06-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI: 10.18653/v1/P16-1184; arXiv: 1606.04279
Source metadata
Semantic Scholar

REFERENCES

Paradigm classification in supervised learning of morphology
2015 · cited by this paper
Universal Dependencies: A cross-linguistic typology
2015 · cited by this paper
Robust Morphological Tagging with Word Representations
2015 · cited by this paper
A Language-Independent Feature Schema for Inflectional Morphology
2015 · cited by this paper
Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning
2015 · cited by this paper
A Universal Feature Schema for Rich Morphological Annotation and Fine-Grained Cross-Lingual Part-of-Speech Tagging
2015 · cited by this paper
What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages
2014 · cited by this paper
Automatic speech recognition for under-resourced languages: A survey
2014 · cited by this paper
Cross-Lingual Part-of-Speech Tagging through Ambiguous Learning
2014 · cited by this paper
Universal Stanford dependencies: A cross-linguistic typology
2014 · cited by this paper
Distributed Representations of Words and Phrases and their Compositionality
2013 · cited by this paper
Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
2013 · cited by this paper
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
2013 · influential reference
Supervised Learning of Complete Morphological Paradigms
2013 · cited by this paper
A Simple, Fast, and Effective Reparameterization of IBM Model 2
2013 · cited by this paper
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
2012 · influential reference
Wiki-ly Supervised Part-of-Speech Tagging
2012 · influential reference
WSABIE: Scaling Up to Large Vocabulary Image Annotation
2011 · influential reference
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
2011 · influential reference
A Universal Part-of-Speech Tagset
2011 · cited by this paper
Transition-based Dependency Parsing with Rich Non-local Features
2011 · cited by this paper
Painless Unsupervised Learning with Features
2010 · cited by this paper
Statistical Parsing of Morphologically Rich Languages (SPMRL) What, How and Whither
2010 · cited by this paper
The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages
2009 · cited by this paper
Reusable Tagset Conversion Using Tagset Drivers
2008 · cited by this paper
Unsupervised Multilingual Learning for Morphological Segmentation
2008 · cited by this paper
Distributed Word Clustering for Large Scale Class-Based Language Modeling in Machine Translation
2008 · cited by this paper
Applying Morphology Generation Models to Machine Translation
2008 · cited by this paper
Unsupervised models for morpheme segmentation and morphology learning
2007 · cited by this paper
Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora
2005 · cited by this paper
Europarl: A Parallel Corpus for Statistical Machine Translation
2005 · cited by this paper
Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora
2001 · cited by this paper
Proceedings
1947 · cited by this paper

CITED BY

Computational Methods for Language Documentation and Description
2026 · cites this paper
Hybrid Neural-LLM Pipeline for Morphological Glossing in Endangered Language Documentation: A Case Study of Jungar Tuvan
2026 · cites this paper
A BERT-Based Approach for Part-of-Speech Tagging in the Low-Resource Context of Sardinian
2025 · cites this paper
A Systematic Comparison of Statistical and Neural Frameworks for Spanish POS Tagging
2025 · cites this paper
LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models
2024 · cites this paper
NCC: Neural concept compression for multilingual document recommendation
2023 · cites this paper
Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models
2023 · cites this paper
Unsupervised Stem-based Cross-lingual Part-of-Speech Tagging for Morphologically Rich Low-Resource Languages
2022 · cites this paper
Towards Unsupervised Morphological Analysis of Polysynthetic Languages
2022 · cites this paper
Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging
2022 · influential citation
Deciphering and Characterizing Out-of-Vocabulary Words for Morphologically Rich Languages
2022 · cites this paper
Morphological Processing of Low-Resource Languages: Where We Are and What’s Next
2022 · cites this paper
Optimal Size-Performance Tradeoffs: Weighing PoS Tagger Models
2021 · cites this paper
Unsupervised Morphological Segmentation and Part-of-Speech Tagging for Low-Resource Scenarios
2021 · cites this paper
Highly Efficient Parts of Speech Tagging in Low Resource Languages with Improved Hidden Markov Model and Deep Learning
2021 · cites this paper
Cross-Register Projection for Headline Part of Speech Tagging
2021 · cites this paper
Generalized Supervised Attention for Text Generation
2021 · cites this paper
Computational Morphology with Neural Network Approaches
2021 · influential citation
Cross-lingual learning for text processing: A survey
2021 · cites this paper
Low-resource Languages: A Review of Past Work and Future Challenges
2020 · cites this paper
Reconciling historical data and modern computational models in corpus creation
2020 · cites this paper
Morphological Disambiguation of South Sámi with FSTs and Neural Networks
2020 · cites this paper
Evaluating Neural Morphological Taggers for Sanskrit
2020 · cites this paper
Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages
2020 · cites this paper
KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi
2020 · cites this paper
Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios
2020 · influential citation
Lingual-Agnostic Meta-Learning for Low-Resource Part-of-Speech Tagging
2020 · cites this paper
Proceedings of the Society for Computation in Linguistics
2020 · cites this paper
NeuMorph
2019 · cites this paper
Neural morphosyntactic tagging for Rusyn
2019 · cites this paper
Morphosyntactic Disambiguation in an Endangered Language Setting
2019 · cites this paper
Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages
2019 · cites this paper
Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling
2019 · cites this paper
Crosslingual Document Embedding as Reduced-Rank Ridge Regression
2019 · cites this paper
Towards Combining Multitask and Multilingual Learning
2019 · cites this paper
Neural sequence-to-sequence models for low-resource morphology
2019 · cites this paper
Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation
2019 · cites this paper
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
2019 · cites this paper
Fast and Scalable Expansion of Natural Language Understanding Functionality for Intelligent Agents
2018 · cites this paper
Near or Far, Wide Range Zero-Shot Cross-Lingual Dependency Parsing
2018 · cites this paper
On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing
2018 · cites this paper
Automatic Glossing in a Low-Resource Setting for Language Documentation
2018 · cites this paper
Neural Factor Graph Models for Cross-lingual Morphological Tagging
2018 · cites this paper
Cross-lingual Character-Level Neural Morphological Tagging
2017 · cites this paper
Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary
2017 · cites this paper
A Rich Morphological Tagger for English: Exploring the Cross-Linguistic Tradeoff Between Morphology and Syntax
2017 · cites this paper
One-Shot Neural Cross-Lingual Transfer for Paradigm Completion
2017 · cites this paper
Multi-source morphosyntactic tagging for spoken Rusyn
2017 · cites this paper