Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser

A. Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah A. Smith

Published 2016 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a "distillation" of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German.
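The consensus idea in the abstract can be illustrated with a small sketch. Under the Hamming cost, minimum Bayes risk decoding over an ensemble amounts to choosing the arcs that receive the most votes. The code below is an illustration under stated assumptions, not the authors' implementation: the function names (arc_vote_fractions, consensus_heads, soft_cost) are hypothetical, the per-token argmax stands in for the spanning-tree decoder a first-order graph-based parser would use to guarantee a well-formed tree, and the cost 1 - vote fraction is only one plausible way to turn ensemble uncertainty into a per-attachment cost.

```python
from collections import Counter


def arc_vote_fractions(ensemble_heads):
    """Return, for each token, a dict mapping head index -> fraction of votes.

    ensemble_heads: list of head sequences, one per ensemble member.
    heads[m] is the predicted head of token m+1 (0 denotes ROOT).
    """
    n_parsers = len(ensemble_heads)
    n_tokens = len(ensemble_heads[0])
    fractions = []
    for m in range(n_tokens):
        counts = Counter(heads[m] for heads in ensemble_heads)
        fractions.append({h: c / n_parsers for h, c in counts.items()})
    return fractions


def consensus_heads(fractions):
    """Pick the most-voted head per token (ties go to the lowest head index).

    Assumption: this ignores the tree constraint. A real consensus parser
    would feed the vote fractions as arc scores to an MST decoder instead.
    """
    return [max(f, key=lambda h: (f[h], -h)) for f in fractions]


def soft_cost(fractions, m, h):
    """Hypothetical uncertainty-aware cost of attaching token m to head h:
    zero when every ensemble member agrees on h, one when none voted for it."""
    return 1.0 - fractions[m].get(h, 0.0)


# Three hypothetical parsers disagreeing on the head of the first token.
votes = [
    [2, 0, 2],
    [2, 0, 2],
    [3, 0, 2],
]
fractions = arc_vote_fractions(votes)
consensus = consensus_heads(fractions)
```

In this toy input the first token has weaker consensus (two votes against one), so its alternative attachment receives a cost below one, whereas attachments that no ensemble member proposed receive the full cost. A cost of this kind could be plugged into a structured hinge loss in place of the usual 0/1 Hamming cost.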

PUBLICATION RECORD

Publication date
2016-09-24
Fields of study
Computer Science
Identifiers
DOI: 10.18653/v1/D16-1180; arXiv: 1609.07561
Source metadata
Semantic Scholar

REFERENCES

Sequence-Level Knowledge Distillation (2016, cited by this paper)
Globally Normalized Transition-Based Neural Networks (2016, cited by this paper)
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations (2016, influential reference)
Training with Exploration Improves a Greedy Stack LSTM Parser (2016, cited by this paper)
Graph-based Dependency Parsing with Bidirectional LSTM (2016, cited by this paper)
Two/Too Simple Adaptations of Word2Vec for Syntax Problems (2015, cited by this paper)
Transition-Based Dependency Parsing with Stack Long Short-Term Memory (2015, influential reference)
Distilling the Knowledge in a Neural Network (2015, influential reference)
Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs (2015, cited by this paper)
Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training (2015, cited by this paper)
Structured Training for Neural Network Transition-Based Parsing (2015, cited by this paper)
A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network (2015, cited by this paper)
A Fast and Accurate Dependency Parser using Neural Networks (2014, cited by this paper)
Sequence to Sequence Learning with Neural Networks (2014, cited by this paper)
The Inside-Outside Recursive Neural Network model for Dependency Parsing (2014, cited by this paper)
Grammar as a Foreign Language (2014, cited by this paper)
Target Language Adaptation of Discriminative Transfer Parsers (2013, cited by this paper)
Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers (2013, cited by this paper)
Do Deep Nets Really Need to be Deep? (2013, cited by this paper)
Fourth-Order Dependency Parsing (2012, cited by this paper)
A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing (2012, cited by this paper)
Transition-based Dependency Parsing with Rich Non-local Features (2011, cited by this paper)
Stanford typed dependencies manual (2010, cited by this paper)
Ensemble Models for Dependency Parsing: Cheap and Good? (2010, cited by this paper)
Efficient Third-Order Dependency Parsers (2010, cited by this paper)
Products of Random Latent Variable Grammars (2010, cited by this paper)
Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions (2010, cited by this paper)
Polyhedral outer approximations with application to natural language parsing (2009, cited by this paper)
The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages (2009, cited by this paper)
A Tale of Two Parsers: Investigating and Combining Graph-based and Transition-based Dependency Parsing (2008, cited by this paper)
Integrating Graph-Based and Transition-Based Dependency Parsers (2008, cited by this paper)
Stacking Dependency Parsers (2008, cited by this paper)
Parser Combination by Reparsing (2006, influential reference)
Model compression (2006, cited by this paper)
Minimum Risk Annealing for Training Log-Linear Models (2006, cited by this paper)
Bayes Risk Minimization in Natural Language Parsing (2006, cited by this paper)
Online Large-Margin Training of Dependency Parsers (2005, cited by this paper)
Non-Projective Dependency Parsing using Spanning Tree Algorithms (2005, cited by this paper)
Learning structured prediction models: a large margin approach (2005, cited by this paper)
Large Margin Methods for Structured and Interdependent Output Variables (2005, cited by this paper)
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network (2003, cited by this paper)
Building a Large-Scale Annotated Chinese Corpus (2002, cited by this paper)
Three New Probabilistic Models for Dependency Parsing: An Exploration (1996, cited by this paper)

CITED BY

Encoding and Decoding Graph Representations of Natural Language (2024, cites this paper)
Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation (2024, cites this paper)
R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents (2023, cites this paper)
Transformers as Graph-to-Graph Models (2023, cites this paper)
CPTAM: Constituency Parse Tree Aggregation Method (2022, cites this paper)
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding (2022, cites this paper)
Neural Character-Level Syntactic Parsing for Chinese (2022, cites this paper)
Ensembling Graph Predictions for AMR Parsing (2021, cites this paper)
Maximum Bayes Smatch Ensemble Distillation for AMR Parsing (2021, cites this paper)
Deep Graph-Based Character-Level Chinese Dependency Parsing (2021, cites this paper)
Multilingual AMR Parsing with Noisy Knowledge Distillation (2021, cites this paper)
Learning Energy-Based Approximate Inference Networks for Structured Applications in NLP (2021, cites this paper)
A Modest Pareto Optimisation Analysis of Dependency Parsers in 2021 (2021, cites this paper)
Second-Order Neural Dependency Parsing with Message Passing and End-to-End Training (2020, cites this paper)
Advancing neural language modeling in automatic speech recognition (2020, cites this paper)
Multitask Pointer Network for Multi-Representational Parsing (2020, cites this paper)
Noisy Self-Knowledge Distillation for Text Summarization (2020, cites this paper)
Autoregressive Knowledge Distillation through Imitation Learning (2020, cites this paper)
Teacher-Student Networks with Multiple Decoders for Solving Math Word Problem (2020, cites this paper)
Integrating Graph-Based and Transition-Based Dependency Parsers in the Deep Contextualized Era (2020, influential citation)
Knowledge Distillation: A Survey (2020, cites this paper)
AMALGUM – A Free, Balanced, Multilayer English Web Corpus (2020, cites this paper)
Efficient EUD Parsing (2020, cites this paper)
Distilling Neural Networks for Greener and Faster Dependency Parsing (2020, cites this paper)
Structure-Level Knowledge Distillation For Multilingual Sequence Labeling (2020, cites this paper)
Self Attended Stack-Pointer Networks for Learning Long Term Dependencies (2020, cites this paper)
Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement (2020, cites this paper)
Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor (2020, cites this paper)
Ensemble Policy Distillation in Deep Reinforcement Learning (2020, cites this paper)
Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation (2020, cites this paper)
Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast—Choose Three (2020, cites this paper)
Automated Concatenation of Embeddings for Structured Prediction (2020, cites this paper)
An Empirical Investigation of Structured Output Modeling for Graph-based Neural Dependency Parsing (2019, cites this paper)
Left-to-Right Dependency Parsing with Pointer Networks (2019, cites this paper)
BAM! Born-Again Multi-Task Networks for Natural Language Understanding (2019, cites this paper)
Bayesian Learning for Neural Dependency Parsing (2019, cites this paper)
Massively Multilingual Transfer for NER (2019, cites this paper)
Lijunyi at SemEval-2019 Task 9: An attention-based LSTM and ensemble of different models for suggestion mining from online reviews and forums (2019, cites this paper)
Head-Driven Phrase Structure Grammar Parsing on Penn Treebank (2019, cites this paper)
Graph-based Dependency Parsing with Graph Neural Networks (2019, cites this paper)
Self-attentive Biaffine Dependency Parsing (2019, cites this paper)
Perturbation Based Learning for Structured NLP Tasks with Application to Dependency Parsing (2019, influential citation)
Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing (2019, cites this paper)
Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System (2019, cites this paper)
Rethinking Self-Attention: Towards Interpretability in Neural Parsing (2019, cites this paper)
State-of-the-art Italian dependency parsers based on neural and ensemble systems (2019, cites this paper)
Model Distillation for Deep Learning Based Gaze Estimation (2019, cites this paper)
Parsing Italian Texts Together is Better Than Parsing Them Alone! (2018, cites this paper)
Comparing decoding mechanisms for parsing argumentative structures (2018, cites this paper)
Effective Subtree Encoding for Easy-First Dependency Parsing (2018, cites this paper)
IBM Research at the CoNLL 2018 Shared Task on Multilingual Parsing (2018, cites this paper)
Semantics as a Foreign Language (2018, cites this paper)
Frustratingly Easy Model Ensemble for Abstractive Summarization (2018, cites this paper)
Attention-Guided Answer Distillation for Machine Reading Comprehension (2018, cites this paper)
An Improved Neural Network Model for Joint POS Tagging and Dependency Parsing (2018, cites this paper)
YNU_Deep at SemEval-2018 Task 11: An Ensemble of Attention-based BiLSTM Models for Machine Comprehension (2018, cites this paper)
Distilling Knowledge for Search-based Structured Prediction (2018, cites this paper)
Stack-Pointer Networks for Dependency Parsing (2018, influential citation)
Scheduled Multi-Task Learning: From Syntax to Translation (2018, cites this paper)
Parsing Tweets into Universal Dependencies (2018, cites this paper)
Prediction of LSTM-RNN Full Context States as a Subtask for N-Gram Feedforward Language Models (2018, cites this paper)
Learning Approximate Inference Networks for Structured Prediction (2018, cites this paper)
Effective Representation for Easy-First Dependency Parsing (2018, cites this paper)
Arc-Standard Spinal Parsing with Stack-LSTMs (2017, cites this paper)
Exploring global sentence representation for graph-based dependency parsing using BLSTM-SCNN (2017, cites this paper)
Improving a Strong Neural Parser with Conjunction-Specific Features (2017, cites this paper)
Center-shared sliding ensemble of neural networks for syntax analysis of natural language (2017, cites this paper)
Fast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set (2017, cites this paper)
DRAGNN: A Transition-based Framework for Dynamically Connected Neural Networks (2017, cites this paper)
Deep Multitask Learning for Semantic Dependency Parsing (2017, cites this paper)
Stronger Baselines for Trustable Results in Neural Machine Translation (2017, cites this paper)
Dependency Parsing with Dilated Iterated Graph CNNs (2017, cites this paper)
Neural Probabilistic Model for Non-projective MST Parsing (2017, influential citation)
Sequence-Level Knowledge Distillation (2016, cites this paper)
What Do Recurrent Neural Network Grammars Learn About Syntax? (2016, influential citation)
Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling (2016, cites this paper)
Right-to-left LSTM Summarization Multitask Encoder / Decoder Dependency Trees Right-to-left LSTM Summarization (2016, cites this paper)