Synthetic Data Made to Order: The Case of Parsing

Published 2018 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

To approximately parse an unfamiliar language, it helps to have a treebank of a similar language. But what if the closest available treebank still has the wrong word order? We show how to (stochastically) permute the constituents of an existing dependency treebank so that its surface part-of-speech statistics approximately match those of the target language. The parameters of the permutation model can be evaluated for quality by dynamic programming and tuned by gradient descent (up to a local optimum). This optimization procedure yields trees for a new artificial language that resembles the target language. We show that delexicalized parsers for the target language can be successfully trained using such “made to order” artificial languages.
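The core loop the abstract describes is: permute the constituents of each tree, measure the surface part-of-speech statistics of the result, and compare them with the target language's. The Python sketch below illustrates that loop under our own assumptions. It is not the paper's model. The toy dict-based tree format, the Plackett-Luce ordering of each head and its dependents with one hypothetical score per relation, the POS-bigram statistic, the KL objective, and the helper names `linearize`, `bigram_dist`, `kl` and `divergence` are all illustrative choices. The sketch also estimates the statistics by sampling instead of by dynamic programming, and it does not implement gradient-based tuning.

```python
import math
import random
from collections import Counter

# ASSUMPTION: toy tree format (not CoNLL-U); each node has a POS tag, its
# relation to the parent, and a list of child nodes.

def linearize(node, weights, rng):
    """Sample a projective order for the subtree at `node`; return POS tags.

    Hypothetical permutation model: the head has score 0 and a dependent with
    relation r has score weights.get(r, 0.0). Constituents are drawn one at a
    time with probability proportional to exp(score) (Plackett-Luce), so a
    large positive weight tends to place a dependent before its head.
    """
    pool = [(0.0, None)] + [(weights.get(c["rel"], 0.0), c) for c in node["children"]]
    out = []
    while pool:
        probs = [math.exp(score) for score, _ in pool]
        i = rng.choices(range(len(pool)), weights=probs)[0]
        _, child = pool.pop(i)
        if child is None:
            out.append(node["pos"])  # the head word itself
        else:
            out.extend(linearize(child, weights, rng))  # whole subtree stays contiguous
    return out

def bigram_dist(sentences):
    """Relative frequencies of POS bigrams, with sentence-boundary markers."""
    counts = Counter()
    for tags in sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()}

def kl(p, q, eps=1e-9):
    """KL(p || q); eps keeps bigrams unseen in q from producing log(0)."""
    return sum(pv * math.log(pv / (q.get(k, 0.0) + eps)) for k, pv in p.items())

def divergence(trees, weights, target, n_samples=20, seed=0):
    """Monte Carlo estimate of how far the permuted treebank's POS bigram
    statistics are from the target's (lower means a closer match)."""
    rng = random.Random(seed)
    sampled = [linearize(t, weights, rng) for t in trees for _ in range(n_samples)]
    return kl(target, bigram_dist(sampled))

# One-tree "treebank" for "the dog barks": VERB head, NOUN subject, DET determiner.
tree = {"pos": "VERB", "rel": "root", "children": [
    {"pos": "NOUN", "rel": "nsubj", "children": [
        {"pos": "DET", "rel": "det", "children": []}]}]}

# Hypothetical target-language statistics: verb-initial sentences.
target = bigram_dist([["VERB", "DET", "NOUN"]] * 10)

for w in (-5.0, 0.0, 5.0):
    print(w, round(divergence([tree], {"nsubj": w, "det": 5.0}, target), 3))
```

Sweeping the hypothetical `nsubj` weight in this way is only a crude stand-in for the gradient-based tuning the abstract mentions. Because the sampled estimate is noisy, computing the expected statistics exactly may be preferable when the model allows it.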

PUBLICATION RECORD

Fields of study: Linguistics, Computer Science
DOI: 10.18653/v1/D18-1163
Source metadata: Semantic Scholar

REFERENCES

Surface Statistics of an Unknown Language Indicate How to Parse It
2018 · cited by this paper
BinLin: A Simple Method of Dependency Tree Linearization
2018 · cited by this paper
The First Multilingual Surface Realisation Shared Task (SR’18): Overview and Evaluation Results
2018 · cited by this paper
Fine-Grained Prediction of Syntactic Typology: Discovering Latent Structure with Supervised Learning
2017 · cited by this paper
Prague Dependency Treebank
2017 · cited by this paper
A Survey of Cross-lingual Word Embedding Models
2017 · cited by this paper
Multi-lingual Dependency Parsing Evaluation: a Large-scale Analysis of Word Order Properties using Artificial Data
2016 · cited by this paper
Multilingual Projection for Parsing Truly Low-Resource Languages
2016 · cited by this paper
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations
2016 · cited by this paper
Cross-Lingual Syntactic Transfer with Limited Resources
2016 · cited by this paper
Many Languages, One Parser
2016 · influential reference
A Representation Learning Framework for Multi-Source Transfer Parsing
2016 · influential reference
Twelve Years of Unsupervised Dependency Parsing
2016 · cited by this paper
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
2016 · cited by this paper
The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages
2016 · influential reference
Frustratingly Easy Cross-Lingual Transfer for Transition-Based Dependency Parsing
2016 · cited by this paper
SQuAD: 100,000+ Questions for Machine Comprehension of Text
2016 · cited by this paper
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
2015 · cited by this paper
Universal Dependencies 1.4
2015 · influential reference
Teaching Machines to Read and Comprehend
2015 · cited by this paper
Cross-lingual Transfer for Unsupervised Dependency Parsing Without Parallel Data
2015 · influential reference
KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer
2015 · cited by this paper
Experiments with Generative Models for Dependency Tree Linearization
2015 · cited by this paper
Hierarchical Low-Rank Tensors for Multilingual Transfer Parsing
2015 · influential reference
Yara Parser: A Fast and Accurate Dependency Parser
2015 · influential reference
MSTParser Model Interpolation for Multi-Source Delexicalized Transfer
2015 · cited by this paper
Density-Driven Cross-Lingual Transfer of Dependency Parsers
2015 · cited by this paper
Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
2014 · cited by this paper
Adam: A Method for Stochastic Optimization
2014 · influential reference
Treebank Translation for Cross-Lingual Parser Induction
2014 · cited by this paper
Rediscovering Annotation Projection for Cross-Lingual Parser Induction
2014 · cited by this paper
Nonconvex Global Optimization for Latent-Variable Models
2013 · cited by this paper
The World Atlas of Language Structures Online
2013 · cited by this paper
Target Language Adaptation of Discriminative Transfer Parsers
2013 · cited by this paper
Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
2013 · cited by this paper
Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
2013 · cited by this paper
Universal Dependency Annotation for Multilingual Parsing
2013 · cited by this paper
Selective Sharing for Multilingual Dependency Parsing
2012 · cited by this paper
Concavity and Initialization for Unsupervised Dependency Parsing
2012 · cited by this paper
Three Dependency-and-Boundary Models for Grammar Induction
2012 · cited by this paper
Multi-Source Transfer of Delexicalized Dependency Parsers
2011 · influential reference
A Universal Part-of-Speech Tagset
2011 · cited by this paper
Data point selection for cross-language adaptation of dependency parsers
2011 · cited by this paper
Sparsity in Dependency Grammar Induction
2010 · cited by this paper
Using Universal Linguistic Knowledge to Guide Grammar Induction
2010 · cited by this paper
Tree Linearization in English: Improving Language Model Based Approaches
2009 · cited by this paper
Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
2009 · cited by this paper
Parser Adaptation and Projection with Quasi-Synchronous Grammar Features
2009 · cited by this paper
Dependency Grammar Induction via Bitext Projection Constraints
2009 · cited by this paper
Grammar Induction
2009 · cited by this paper
Semi-Supervised Convex Training for Dependency Parsing
2008 · cited by this paper
Algorithms for Deterministic Incremental Dependency Parsing
2008 · cited by this paper
Cross-Language Parser Adaptation between Related Languages
2008 · cited by this paper
Hierarchical Phrase-Based Translation
2007 · influential reference
Discriminative learning and spanning tree algorithms for dependency parsing
2006 · cited by this paper
Local Search with Very Large-Scale Neighborhoods for Optimal Permutations in Machine Translation
2006 · cited by this paper
Annealing Structural Bias in Multilingual Weighted Grammar Induction
2006 · cited by this paper
Bootstrapping parsers via syntactic projection across parallel texts
2005 · cited by this paper
Guiding Unsupervised Grammar Induction Using Contrastive Estimation
2005 · cited by this paper
Semi-supervised Learning by Entropy Minimization
2004 · cited by this paper
Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency
2004 · cited by this paper
The Prague Dependency Treebank
2003 · cited by this paper
The estimation of stochastic context-free grammars using the Inside-Outside algorithm
2003 · cited by this paper
On the effectiveness of the skew divergence for statistical language analysis
2001 · cited by this paper
Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora
2001 · cited by this paper
Measures of Distributional Similarity
1999 · cited by this paper
Two Experiments on Learning Probabilistic Dependency Grammars from Corpora
1992 · cited by this paper
Applications of stochastic context-free grammars using the Inside-Outside algorithm
1990 · cited by this paper
Permutation Generation Methods
1977 · cited by this paper

CITED BY

Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
2024 · cites this paper
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
2024 · cites this paper
Improving Cross-Lingual Transfer through Subtree-Aware Word Reordering
2023 · influential citation
Synthetic Data in Healthcare
2023 · cites this paper
Cross-lingual Inflection as a Data Augmentation Method for Parsing
2022 · cites this paper
PPT: Parsimonious Parser Transfer for Unsupervised Cross-Lingual Adaptation
2021 · cites this paper
Generating Synthetic Text Data to Evaluate Causal Inference Methods
2021 · cites this paper
Word Reordering for Zero-shot Cross-lingual Structured Prediction
2021 · influential citation
On the Relation between Syntactic Divergence and Zero-Shot Performance
2021 · cites this paper
Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing
2021 · cites this paper
Linear-Time Calculation of the Expected Sum of Edge Lengths in Random Projective Linearizations of Trees
2021 · cites this paper
A Survey of the Model Transfer Approaches to Cross-Lingual Dependency Parsing
2020 · influential citation
Improving cross-lingual model transfer by chunking
2020 · cites this paper
On understanding character-level models for representing morphology
2020 · cites this paper
Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers
2020 · cites this paper
A survey of syntactic-semantic parsing based on constituent and dependency structures
2020 · cites this paper
Cross-Lingual Dependency Parsing by POS-Guided Word Reordering
2020 · cites this paper
Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages
2020 · cites this paper
Supervised Training on Synthetic Languages: A Novel Framework for Unsupervised Parsing
2019 · cites this paper
Low-Resource Syntactic Transfer with Unsupervised Source Reordering
2019 · cites this paper
How to Parse Low-Resource Languages: Cross-Lingual Parsing, Target Language Annotation, or Both?
2019 · cites this paper
Long-Distance Dependencies Don’t Have to Be Long: Simplifying through Provably (Approximately) Optimal Permutations
2019 · cites this paper
Multilingual Abstractions: Abstract Syntax Trees and Universal Dependencies
2019 · cites this paper
Cross-Lingual Syntactic Transfer through Unsupervised Adaptation of Invertible Projections
2019 · cites this paper
Bootstrapping UD treebanks for Delexicalized Parsing
2019 · cites this paper
Low-Resource Parsing with Crosslingual Contextualized Representations
2019 · cites this paper
A little perturbation makes a difference: Treebank augmentation by perturbation improves transfer parsing
2019 · cites this paper
Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing
2019 · cites this paper
Perturbation Based Learning for Structured NLP Tasks with Application to Dependency Parsing
2019 · cites this paper
On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing
2018 · cites this paper
Surface Statistics of an Unknown Language Indicate How to Parse It
2018 · cites this paper
Near or Far, Wide Range Zero-Shot Cross-Lingual Dependency Parsing
2018 · cites this paper
A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages
Year unknown · cites this paper