Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders

Andrew Drozdov, Pat Verga, Yi-Pei Chen, Mohit Iyyer, A. McCallum

Published 2019 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Understanding text often requires identifying meaningful constituent spans such as noun phrases and verb phrases. In this work, we show that we can effectively recover these types of labels using the learned phrase vectors from deep inside-outside recursive autoencoders (DIORA). Specifically, we cluster span representations to induce span labels. Additionally, we improve the model’s labeling accuracy by integrating latent code learning into the training procedure. We evaluate this approach empirically through unsupervised labeled constituency parsing. Our method outperforms ELMo and BERT on two versions of the Wall Street Journal (WSJ) dataset and is competitive with prior work that requires additional human annotations, improving over a previous state-of-the-art system that depends on ground-truth part-of-speech tags by 5 absolute F1 points (19% relative error reduction).
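The clustering step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the span vectors are synthetic Gaussian blobs rather than DIORA phrase vectors, plain k-means stands in for whatever clustering procedure the paper actually uses, and the helper names (`kmeans`, `nearest_centroid`, `name_clusters`) are hypothetical. Naming each cluster after its most frequent gold label is one common way to score induced labels, not necessarily the paper's protocol, and the latent code learning mentioned in the abstract is not reproduced.

```python
from collections import Counter

import numpy as np


def nearest_centroid(vectors, centroids):
    """Index of the closest centroid (squared Euclidean) for each vector."""
    diffs = vectors[:, None, :] - centroids[None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=1)


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per input vector."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct input vectors.
    start = rng.choice(len(vectors), size=k, replace=False)
    centroids = vectors[start].copy()
    for _ in range(iters):
        assign = nearest_centroid(vectors, centroids)
        for c in range(k):
            members = vectors[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return nearest_centroid(vectors, centroids)


def name_clusters(assign, gold):
    """Name each cluster after the most frequent gold label among its members."""
    mapping = {}
    for c in np.unique(assign):
        members = [g for a, g in zip(assign, gold) if a == c]
        mapping[int(c)] = Counter(members).most_common(1)[0][0]
    return mapping


# Synthetic stand-ins for span vectors (assumption: NOT real DIORA output).
rng = np.random.default_rng(0)
dim = 8
np_spans = rng.normal(loc=3.0, scale=0.5, size=(20, dim))
vp_spans = rng.normal(loc=-3.0, scale=0.5, size=(20, dim))
vectors = np.vstack([np_spans, vp_spans])
gold = ["NP"] * 20 + ["VP"] * 20

assign = kmeans(vectors, k=2)          # induce unlabeled clusters
mapping = name_clusters(assign, gold)  # gold labels used for evaluation only
predicted = [mapping[int(a)] for a in assign]
accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
print(mapping, accuracy)
```

Gold labels enter only when naming clusters for scoring, so the induction itself stays unsupervised. The value printed is a per-span labeling accuracy on toy data, which is a simpler quantity than the labeled F1 reported in the abstract.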

PUBLICATION RECORD

Publication year: 2019
Venue: Conference on Empirical Methods in Natural Language Processing
Publication date: 2019-11-01
Fields of study: Computer Science
DOI: 10.18653/v1/D19-1161
Source metadata: Semantic Scholar

REFERENCES

Auto-Encoding Variational Bayes
2020, cited by this paper
BERT Rediscovers the Classical NLP Pipeline
2019, cited by this paper
Unsupervised Recurrent Neural Network Grammars
2019, cited by this paper
A Structural Probe for Finding Syntax in Word Representations
2019, cited by this paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019, cited by this paper
Cross-lingual Language Model Pretraining
2019, cited by this paper
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders
2019, influential reference
Theory and Experiments on Vector Quantized Autoencoders
2018, cited by this paper
Deep Contextualized Word Representations
2018, influential reference
Fast Decoding in Sequence Models using Discrete Latent Variables
2018, cited by this paper
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
2018, cited by this paper
Constituency Parsing with a Self-Attentive Encoder
2018, cited by this paper
Grammar Induction with Neural Language Models: An Unusual Replication
2018, cited by this paper
Dissecting Contextual Word Embeddings: Architecture and Representation
2018, influential reference
Neural Language Modeling by Jointly Learning Syntax and Lexicon
2017, cited by this paper
Do latent tree learning models identify meaningful structure in sentences?
2017, cited by this paper
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
2017, cited by this paper
Neural Discrete Representation Learning
2017, cited by this paper
A large annotated corpus for learning natural language inference
2015, cited by this paper
Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
2013, cited by this paper
Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
2011, cited by this paper
On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing
2010, cited by this paper
Unsupervised Induction of Labeled Parse Trees by Clustering with Syntactic Features
2008, cited by this paper
Fast Unsupervised Incremental Parsing
2007, cited by this paper
Prototype-Driven Grammar Induction
2006, influential reference
A Generative Constituent-Context Model for Improved Grammar Induction
2002, cited by this paper
Pattern Recognition with Fuzzy Objective Function Algorithms
1981, cited by this paper
Trainable grammars for speech recognition
1979, cited by this paper
A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters
1973, cited by this paper
Recognition and Parsing of Context-Free Languages in Time n^3
1967, cited by this paper
An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages
1965, cited by this paper
Bayesian Model Merging for Unsupervised Constituent Labeling and Grammar Induction
year unknown, cited by this paper

CITED BY

On Eliciting Syntax from Language Models via Hashing
2024, cites this paper
Simple Hardware-Efficient PCFGs with Independent Left and Right Productions
2023, cites this paper
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
2023, influential citation
Unsupervised Slot Schema Induction for Task-oriented Dialog
2022, cites this paper
Word Segmentation as Unsupervised Constituency Parsing
2022, cites this paper
RL-GRIT: Reinforcement Learning for Grammar Inference
2021, cites this paper
Extracting Grammars from a Neural Network Parser for Anomaly Detection in Unknown Formats
2021, cites this paper
Self-supervised Schema Induction for Task-oriented Dialog
2021, cites this paper
Anomaly Detection with Neural Parsers That Never Reject
2021, cites this paper
Co-training an Unsupervised Constituency Parser with Weak Supervision
2021, influential citation
Systematic Generalization with Edge Transformers
2021, cites this paper
Deep Clustering of Text Representations for Supervision-Free Probing of Syntax
2020, influential citation
Montague Grammar Induction
2020, cites this paper
Clustering Contextualized Representations of Text for Unsupervised Syntax Induction
2020, influential citation
Grounded PCFG Induction with Images
2020, cites this paper
Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders
2020, influential citation