Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

Valentin I. Spitkovsky, H. Alshawi, Dan Jurafsky

Published 2013 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Many statistical learning problems in NLP call for local model search methods. But accuracy tends to suffer with current techniques, which often explore either too narrowly or too broadly: hill-climbers can get stuck in local optima, whereas samplers may be inefficient. We propose to arrange individual local optimizers into organized networks. Our building blocks are operators of two types: (i) transform, which suggests new places to search, via non-random restarts from already-found local optima; and (ii) join, which merges candidate solutions to find better optima. Experiments on grammar induction show that pursuing different transforms (e.g., discarding parts of a learned model or ignoring portions of training data) results in improvements. Groups of locally-optimal solutions can be further perturbed jointly, by constructing mixtures. Using these tools, we designed several modular dependency grammar induction networks of increasing complexity. Our complete system achieves 48.6% accuracy (directed dependency macro-average over all 19 languages in the 2006/7 CoNLL data) — more than 5% higher than the previous state-of-the-art.
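The two operator types named in the abstract can be illustrated with a toy sketch. This is not the authors' system: the objective, the greedy hill-climber, and the function names `objective`, `hill_climb`, `transform`, and `join` are all hypothetical stand-ins chosen for illustration. Here "transform" restarts from an already-found optimum after discarding part of it, and "join" builds an equal mixture of two candidates, re-optimizes it, and keeps the best of the three.

```python
import math

def objective(x):
    """Toy multimodal score over a vector (hypothetical stand-in for model likelihood)."""
    return sum(math.sin(3 * v) - 0.1 * (v - 2) ** 2 for v in x)

def hill_climb(x, step=0.05, iters=2000):
    """Greedy coordinate-wise local search: accept a move only if it improves the score."""
    x = list(x)
    best = objective(x)
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = x[:i] + [x[i] + delta] + x[i + 1:]
                score = objective(cand)
                if score > best + 1e-12:
                    x, best, improved = cand, score, True
        if not improved:
            break  # no neighbor improves: a local optimum on this step grid
    return x

def transform(x):
    """Non-random restart (assumed form): reset the last coordinate, then re-optimize."""
    return hill_climb(x[:-1] + [0.0])

def join(a, b):
    """Merge two candidates: re-optimize their equal mixture, return the best of the three."""
    mix = hill_climb([(u + v) / 2 for u, v in zip(a, b)])
    return max((a, b, mix), key=objective)

initial = [-3.0, 5.0]
start = hill_climb(initial)   # first local optimum
alt = transform(start)        # restart from a partially discarded solution
best = join(start, alt)       # combine the two candidates
```

Because `join` returns the best of its inputs and the re-optimized mixture, it can never score below either input in this sketch; whether a transform helps depends on the objective and on which part of the solution is discarded.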

PUBLICATION RECORD

Publication year
2013
Venue
Conference on Empirical Methods in Natural Language Processing
Publication date
2013-10-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.18653/v1/d13-1204
Source metadata
Semantic Scholar

REFERENCES

Differential Evolution: A Practical Approach to Global Optimization
2014 (cited by this paper)
Nonconvex Global Optimization for Latent-Variable Models
2013 (cited by this paper)
Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
2013 (influential reference)
Unsupervised Word Sense Disambiguation
2013 (cited by this paper)
Fast dropout training
2013 (cited by this paper)
Improving neural networks by preventing co-adaptation of feature detectors
2012 (cited by this paper)
Simple Robust Grammar Induction with Combinatory Categorial Grammars
2012 (influential reference)
Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models
2012 (influential reference)
Concavity and Initialization for Unsupervised Dependency Parsing
2012 (influential reference)
Exploiting Reducibility in Unsupervised Dependency Parsing
2012 (cited by this paper)
Grammar Induction: Beyond Local Search
2012 (cited by this paper)
A Feature-Rich Constituent Context Model for Grammar Induction
2012 (influential reference)
Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
2012 (influential reference)
Improved Constituent Context Model with Features
2012 (influential reference)
Three Dependency-and-Boundary Models for Grammar Induction
2012 (influential reference)
Posterior Sparsity in Unsupervised Dependency Parsing
2011 (influential reference)
Gibbs Sampling with Treeness Constraint in Unsupervised Dependency Parsing
2011 (cited by this paper)
Semi-supervised Relation Extraction with Large-scale Word Clustering
2011 (cited by this paper)
Punctuation: Making a Point in Unsupervised Dependency Parsing
2011 (influential reference)
Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
2011 (influential reference)
Using Semantic Cues to Learn Syntax
2011 (influential reference)
Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
2011 (influential reference)
Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
2011 (influential reference)
On the Utility of Curricula in Unsupervised Learning of Probabilistic Grammars
2011 (cited by this paper)
Boosting-Based System Combination for Machine Translation
2010 (cited by this paper)
Generative Alignment and Semantic Parsing for Learning from Ambiguous Supervision
2010 (cited by this paper)
Inducing Tree-Substitution Grammars
2010 (cited by this paper)
Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing
2010 (influential reference)
An Exact A* Method for Deciphering Letter-Substitution Ciphers
2010 (cited by this paper)
Viterbi Training Improves Unsupervised Dependency Parsing
2010 (cited by this paper)
Plot Induction and Evolutionary Search for Story Generation
2010 (cited by this paper)
Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
2010 (cited by this paper)
Instance Sense Induction from Attribute Sets
2010 (cited by this paper)
Ensemble Models for Dependency Parsing: Cheap and Good?
2010 (cited by this paper)
Products of Random Latent Variable Grammars
2010 (cited by this paper)
Curriculum learning
2009 (cited by this paper)
Baby Steps: How “Less is More” in Unsupervised Dependency Parsing
2009 (influential reference)
Bounding and Comparing Methods for Correlation Clustering Beyond ILP
2009 (cited by this paper)
Random Restarts in Global Optimization
2009 (cited by this paper)
Flexible shaping: how learning in small steps helps.
2009 (cited by this paper)
Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning
2009 (influential reference)
Online EM for Unsupervised Models
2009 (cited by this paper)
Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
2009 (cited by this paper)
Multiple Word Alignment with Profile Hidden Markov Models
2009 (cited by this paper)
Minimized Models for Unsupervised Part-of-Speech Tagging
2009 (cited by this paper)
Random Restarts in Minimum Error Rate Training for Statistical Machine Translation
2008 (cited by this paper)
Semi-Supervised Convex Training for Dependency Parsing
2008 (cited by this paper)
Fast Unsupervised Incremental Parsing
2007 (cited by this paper)
Convex Clustering with Exemplar-Based Models
2007 (cited by this paper)
Cubic-time Parsing and Learning
2007 (cited by this paper)
The CoNLL 2007 Shared Task on Dependency Parsing
2007 (cited by this paper)
CoNLL-X Shared Task on Multilingual Dependency Parsing
2006 (cited by this paper)
Axon pruning: an essential step underlying the developmental plasticity of neuronal connections
2006 (cited by this paper)
Cognition through the lifespan: mechanisms of change.
2006 (cited by this paper)
Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling
2005 (cited by this paper)
Optimization of HMM by the Tabu Search Algorithm
2004 (cited by this paper)
Automatic Learning of Language Model Structure
2004 (cited by this paper)
Stochastic Local Search: Foundations & Applications
2004 (cited by this paper)
Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency
2004 (cited by this paper)
Head-Driven Statistical Models for Natural Language Parsing
2003 (cited by this paper)
Optimization Models of Sound Systems Using Genetic Algorithms
2003 (cited by this paper)
Combining Distributional and Morphological Information for Part of Speech Induction
2003 (cited by this paper)
A Generative Constituent-Context Model for Improved Grammar Induction
2002 (influential reference)
Stochastic Neighbor Embedding
2002 (cited by this paper)
Data perturbation for escaping local maxima in learning
2002 (cited by this paper)
Grammatical Bigrams
2001 (cited by this paper)
Cubic-time Parsing and Learning Algorithms for Grammatical Bigram
2001 (cited by this paper)
Optimization of HMM by the Tabu Search Algorithm
2001 (cited by this paper)
Inducing Syntactic Categories by Context Distribution Clustering
2000 (cited by this paper)
Unsupervised Models for Named Entity Classification
1999 (cited by this paper)
A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants
1998 (cited by this paper)
Experiments Using Stochastic Search for Text Planning
1998 (cited by this paper)
Deterministic annealing for clustering, compression, classification, regression, and related optimization problems
1998 (influential reference)
Discovering Phonotactic Finite-State Automata by Genetic Search
1998 (cited by this paper)
Tabu Search
1997 (cited by this paper)
Head automata for speech translation
1996 (cited by this paper)
Comparison of genetic algorithms, random restart and two-opt switching for solving large location-allocation problems
1996 (cited by this paper)
Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
1995 (cited by this paper)
Lexical Heads, Phrase Structure and the Induction of Grammar
1995 (cited by this paper)
Noise Strategies for Improving Local Search
1994 (cited by this paper)
Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, by John H. Holland MIT Press (Bradford Books), Cambridge, Mass., 1992, xiv+211 pp. (Paperback £13.50, cloth £26.95)
1993 (cited by this paper)
Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization
1993 (cited by this paper)
Learning and development in neural networks: the importance of starting small.
1993 (cited by this paper)
Building a Large Annotated Corpus of English: The Penn Treebank
1993 (cited by this paper)
Inside-Outside Reestimation From Partially Bracketed Corpora
1992 (cited by this paper)
A New Method for Solving Hard Satisfiability Problems
1992 (cited by this paper)
Markov Chain Monte Carlo Maximum Likelihood
1991 (cited by this paper)
Tabu Search - Part I
1989 (cited by this paper)
Genetic Algorithms in Search, Optimization & Machine Learning
1989 (cited by this paper)
Tabu Search - Part II
1989 (cited by this paper)
Multicriteria Optimization in Engineering and in the Sciences
1988 (cited by this paper)
How Easy is Local Search?
1985 (cited by this paper)
Judgment Under Uncertainty: Heuristics and Biases.
1984 (cited by this paper)
Optimization by Simulated Annealing
1983 (cited by this paper)
Judgment under uncertainty: On the psychology of prediction
1982 (cited by this paper)
Evidential impact of base rates
1981 (cited by this paper)
Minimization by Random Search Techniques
1981 (cited by this paper)
The base-rate fallacy in probability judgments.
1980 (cited by this paper)
Interpolated estimation of Markov source parameters from sparse data
1980 (cited by this paper)
1980cited by this paper

CITED BY

Step-wise discriminative learning on uncertain annotations for word sense disambiguation
2023 (cites this paper)
Indução Gramatical para o Português: a Contribuição da Informação Mútua para Descoberta de Relações de Dependência
2023 (cites this paper)
Co-training an Unsupervised Constituency Parser with Weak Supervision
2021 (cites this paper)
Second-Order Unsupervised Neural Dependency Parsing
2020 (cites this paper)
Clustering Contextualized Representations of Text for Unsupervised Syntax Induction
2020 (cites this paper)
On the Role of Supervision in Unsupervised Constituency Parsing
2020 (cites this paper)
The Return of Lexical Dependencies: Neural Lexicalized PCFGs
2020 (cites this paper)
A Survey of Unsupervised Dependency Parsing
2020 (cites this paper)
Deep Clustering of Text Representations for Supervision-Free Probing of Syntax
2020 (cites this paper)
StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
2020 (influential citation)
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders
2019 (cites this paper)
Supervised Training on Synthetic Languages: A Novel Framework for Unsupervised Parsing
2019 (cites this paper)
Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders
2019 (cites this paper)
Enhancing Unsupervised Generative Dependency Parser with Contextual Information
2019 (cites this paper)
Compound Probabilistic Context-Free Grammars for Grammar Induction
2019 (influential citation)
Lexicalized Neural Unsupervised Dependency Parsing
2019 (cites this paper)
Unsupervised Recurrent Neural Network Grammars
2019 (cites this paper)
Cross-Lingual Transfer of Natural Language Processing Systems
2019 (cites this paper)
Unsupervised Learning of Syntactic Structure with Invertible Neural Projections
2018 (cites this paper)
Synthetic Data Made to Order: The Case of Parsing
2018 (cites this paper)
Surface Statistics of an Unknown Language Indicate How to Parse It
2018 (cites this paper)
Deep Latent Variable Models of Natural Language
2018 (influential citation)
Dependency Grammar Induction with Neural Lexicalization and Big Training Data
2017 (cites this paper)
CRF Autoencoder for Unsupervised Dependency Parsing
2017 (influential citation)
Fine-Grained Prediction of Syntactic Typology: Discovering Latent Structure with Supervised Learning
2017 (cites this paper)
12 years of Unsupervised Dependency Parsing
2016 (cites this paper)
Learning vector representations for sentences: The recursive deep learning approach
2016 (cites this paper)
Left-corner Methods for Syntactic Modeling with Universal Structural Constraints
2016 (influential citation)
Computational learning of construction grammars
2016 (cites this paper)
Unsupervised Neural Dependency Parsing
2016 (cites this paper)
The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages
2016 (cites this paper)
Twelve Years of Unsupervised Dependency Parsing
2016 (cites this paper)
Building Powerful Dependency Parsers for Resource-Poor Languages
2016 (cites this paper)
Graphical Models with Structured Factors, Neural Factors, and Approximation-aware Training
2015 (cites this paper)
A convex and feature-rich discriminative approach to dependency grammar induction
2015 (cites this paper)
Labeled Grammar Induction with Minimal Supervision
2015 (cites this paper)
Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction
2015 (cites this paper)
Spectral Probabilistic Modeling and Applications to Natural Language Processing
2015 (cites this paper)
Multilingual Unsupervised Dependency Parsing with Unsupervised POS Tags
2015 (cites this paper)
Density-Driven Cross-Lingual Transfer of Dependency Parsers
2015 (cites this paper)
Joint Learning of Constituency and Dependency Grammars by Decomposed Cross-Lingual Induction
2015 (cites this paper)
Unsupervised grammar induction with Combinatory Categorial Grammars
2015 (cites this paper)
Unsupervised Dependency Parsing: Let’s Use Supervised Parsers
2015 (cites this paper)
A visualized framework for representing uncertain and incomplete temporal knowledge
2014 (cites this paper)
A new parsing algorithm
2014 (cites this paper)
Linguistically motivated models for lightly-supervised dependency parsing
2014 (influential citation)
Qualitative: Open Source Python Tool for Quality Estimation over Multiple Machine Translation Outputs
2014 (cites this paper)
Low-Resource Semantic Role Labeling
2014 (cites this paper)
Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization
2014 (cites this paper)
Weakly-Supervised Learning with Cost-Augmented Contrastive Estimation
2014 (cites this paper)
Multilingual Dependency Parsing: Using Machine Translated Texts Instead of Parallel Corpora
2014 (cites this paper)
Dealing with Function Words in Unsupervised Dependency Parsing
2014 (cites this paper)
Spectral Unsupervised Parsing with Additive Tree Metrics
2014 (cites this paper)
Grammar induction and parsing with dependency-and-boundary models
2013 (cites this paper)
Spectral Probabilistic Modeling and Applications to Natural Language Processing
2013 (cites this paper)