Efficient Algorithms for Parsing the DOP Model

Published 1996 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct constituents, rather than the probability of a correct parse tree. Using the optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is comparable to results from a duplication of Pereira and Schabes's (1992) experiment on the same data. We show that Bod's results are at least partially due to an extremely fortuitous choice of test data, and partially due to using cleaner data than other researchers.
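The second idea in the abstract, choosing the parse that maximizes the expected number of correct constituents instead of the single most probable tree, can be illustrated with a small dynamic program. The sketch below is only an illustration under stated assumptions: the function name `max_expected_brackets`, the unlabeled binary bracketing, and the `posterior` table of span probabilities are all hypothetical inputs chosen for this example, and this record does not show how the paper itself computes such quantities.

```python
from functools import lru_cache


def max_expected_brackets(n, posterior):
    """Pick the binary bracketing of n words with the highest summed span posterior.

    posterior maps a span (i, j), with 0 <= i < j <= n, to an assumed probability
    that the span is a constituent. Missing spans count as probability 0.
    Returns (expected number of correct brackets, tuple of chosen spans).
    """

    @lru_cache(maxsize=None)
    def best(i, j):
        own = posterior.get((i, j), 0.0)
        if j - i == 1:
            # A single word has no internal structure to choose.
            return own, ((i, j),)
        top = None
        for k in range(i + 1, j):
            left_score, left_spans = best(i, k)
            right_score, right_spans = best(k, j)
            total = left_score + right_score
            if top is None or total > top[0]:
                top = (total, left_spans + right_spans)
        # The span itself plus the best way of splitting it.
        return own + top[0], ((i, j),) + top[1]

    return best(0, n)


if __name__ == "__main__":
    # Hypothetical posteriors for a three-word sentence.
    example = {
        (0, 3): 1.0,
        (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0,
        (0, 2): 0.3, (1, 3): 0.6,
    }
    score, spans = max_expected_brackets(3, example)
    print(score, spans)
```

Because the objective is a sum over spans, the choice at each split point depends only on the two sub-spans, which is what makes a deterministic chart-style search possible in place of Monte Carlo sampling.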

PUBLICATION RECORD

Publication year: 1996
Venue: Conference on Empirical Methods in Natural Language Processing
Publication date: 1996-04-01
Fields of study: Computer Science
Identifiers: arXiv cmp-lg/9604008
Source metadata: Semantic Scholar

REFERENCES

Parsing Algorithms and Metrics
1996, cited by this paper
The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars
1995, influential reference
Natural Language Parsing as Statistical Pattern Recognition
1994, cited by this paper
Efficient Disambiguation by means of Stochastic Tree Substitution Grammars
1994, cited by this paper
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
1994, cited by this paper
Using an Annotated Corpus as a Stochastic Grammar
1993, influential reference
Monte Carlo Parsing
1993, influential reference
Parsing the Wall Street Journal with the Inside-Outside Algorithm
1993, cited by this paper
A corpus-based approach to language learning
1993, cited by this paper
Inside-Outside Reestimation From Partially Bracketed Corpora
1992, cited by this paper
Stochastic Lexicalized Tree-adjoining Grammars
1992, cited by this paper
Applications of stochastic context-free grammars using the Inside-Outside algorithm
1990, cited by this paper
The ATIS Spoken Language Systems Pilot Corpus
1990, cited by this paper
A tutorial on hidden Markov models and selected applications in speech recognition
1989, cited by this paper
Trainable grammars for speech recognition
1979, cited by this paper

CITED BY

It’s MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk
2023, cites this paper
Preprocessing Does Matter: Parsing Non-Segmented Arabic
2018, cites this paper
Neural Factor Graph Models for Cross-lingual Morphological Tagging
2018, cites this paper
Automatic discovery of Latin syntactic changes
2016, cites this paper
Inside-Outside and Forward-Backward Algorithms Are Just Backprop (tutorial paper)
2016, cites this paper
Approximation-Aware Dependency Parsing by Belief Propagation
2015, cites this paper
Graphical Models with Structured Factors, Neural Factors, and Approximation-aware Training
2015, cites this paper
Data-oriented Parsing
2013, cites this paper
Bayesian Tree Substitution Grammars as a Usage-based Approach
2013, cites this paper
Decomposing and regenerating syntactic trees
2012, cites this paper
Toward Tree Substitution Grammars with Latent Annotations
2012, cites this paper
A New DOP Model for Phrase-structure Parsing of Persian Sentences
2012, cites this paper
Judging Grammaticality with Count-Induced Tree Substitution Grammars
2012, cites this paper
An All-Fragments Grammar for Simple and Accurate Parsing
2012, influential citation
24th International Conference on Computational Linguistics Proceedings of the 10th Workshop on Asian Language Resources
2012, cites this paper
Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
2011, cites this paper
Discontinuous Data-Oriented Parsing through Mild Context-Sensitivity
2011, cites this paper
The Second Workshop on Statistical Parsing of Morphologically Rich Languages ( SPMRL
2011, cites this paper
Incorporating Translation Quality-Oriented Features into Log-Linear Models of Machine Translation
2011, cites this paper
The Surprising Variance in Shortest-Derivation Parsing
2011, cites this paper
Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
2011, influential citation
Judging Grammaticality with Tree Substitution Grammar Derivations
2011, cites this paper
Simple, Accurate Parsing with an All-Fragments Grammar
2010, influential citation
Discriminative training and variational decoding in machine translation via novel algorithms for weighted hypergraphs
2010, cites this paper
Syntax-based language models for statistical machine translation
2010, cites this paper
Enriching Data-Oriented Parsing by blending morphology and syntax
2010, cites this paper
Unsupervised Parsing of the Russian Sentence
2009, cites this paper
Accuracy-Based Scoring for DOT: Towards Direct Error Minimization for Data-Oriented Translation
2009, cites this paper
Variational Decoding for Statistical Machine Translation
2009, influential citation
Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition
2009, cites this paper
From Exemplar to Grammar: A Probabilistic Analogy-Based Model of Language Learning
2009, cites this paper
Darwinised Data-Oriented Parsing - Statistical NLP with Added Sex and Death
2009, cites this paper
The great number crunch
2008, cites this paper
Grammar Induction & Language Evolution
2008, cites this paper
The Data-Oriented Parsing Approach: Theory and Application
2008, influential citation
From Exemplar to Grammar: Integrating Analogy and Probability in Language Learning
2008, influential citation
Simulating & Reranking Data-oriented Parsing
2008, cites this paper
Chapter 1 Polynomial Tree Substitution Grammars: Characterization and New Examples
2007, cites this paper
Data Oriented Parsing Literature Review
2007, influential citation
What is Data Oriented Parsing? A Response to Michael Collins's Review
2007, influential citation
Parsimonious Data-Oriented Parsing
2007, cites this paper
Framework and Resources for Natural Language Parser Evaluation
2007, influential citation
GF-DOP: grammatical feature data-oriented parsing
2006, cites this paper
Linguistic and Statistical Extensions of Data Oriented Parsing
2006, influential citation
What are the Productive Units of Natural Language Grammar? A DOP Approach to the Automatic Identification of Constructions.
2006, cites this paper
Statistical parsing with non-local dependencies
2005, cites this paper
Towards Unifying Perception and Cognition: The Ubiquity of Trees
2005, cites this paper
Probabilistic CFG with Latent Annotations
2005, cites this paper
Data-oriented models of parsing and translation
2005, cites this paper
A LaTeX Package for CSLI Collections
2004, cites this paper
Extracting stochastic grammars from treebanks
2003, cites this paper
An efficient implementation of a new DOP model
2003, influential citation
Grammaires à substitution d'arbres polynomiales et discriminantes : Évolutions en analyse syntaxique
2003, cites this paper
Seeing the wood for the trees: data-oriented translation
2003, cites this paper
Statistical parsing and language modeling based on constraint dependency grammar
2003, cites this paper
Apprentissage discriminant pour les Grammaires à Substitution d’Arbres
2003, cites this paper
Unsupervised Language Acquisition: Theory and Practice
2002, influential citation
A Unified Model of Structural Organization in Language and Music
2002, cites this paper
A Data-Oriented Parsing Model for Lexical-Functional Grammar
2002, cites this paper
New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron
2002, influential citation
Polynomial Tree Substitution Grammars: Characterization and New Examples
2002, cites this paper
Convolution Kernels for Natural Language
2001, influential citation
Efficient parsing of DOP with PCFG-reductions – DRAFT
2001, influential citation
Data-oriented Parsing
2001, cites this paper
Grammaire à substitution d’arbre de complexité polynomiale : un cadre efficace pour DOP
2001, cites this paper
What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy?
2001, cites this paper
Polynomial Tree Substitution Grammars: an efficient framework for Data-Oriented Parsing
2001, cites this paper
Aspects of Pattern-matching in Data-Oriented Parsing
2000, cites this paper
Do all fragments count?
2000, influential citation
Context-sensitive spoken dialogue processing with the DOP model
1999, cites this paper
A memory-based model of syntactic analysis: data-oriented parsing
1999, cites this paper
Semiring Parsing
1999, cites this paper
Tagging and parsing with cascaded Markov models: automation of corpus annotation
1999, cites this paper
Learning Efficient Disambiguation
1999, influential citation
Towards Probabilistic Unification-Based Parsing
1999, cites this paper
A Probabilistic Corpus-Driven Model for Lexical-Functional Analysis
1998, cites this paper
Parsing Inside-Out
1998, influential citation
Spoken Dialogue Interpretation with the DOP Model
1998, cites this paper
Statistical Techniques for Natural Language Parsing
1997, cites this paper
A DOP Model for Semantic Interpretation
1997, cites this paper
An Inductive Logic Programming Method for Corpus-based Parser Construction
1997, cites this paper
Two Questions about Data-Oriented Parsing
1996, influential citation
Data-Oriented Language Processing. An Overview
1996, influential citation
Robust Parsing: A Brief Overview
1996, cites this paper
Efficient Algorithms for Parsing the DOP Model? A Reply to Joshua Goodman
1996, influential citation
Learning and Using Continuous Linguistic Representations
1996, cites this paper
Parsing Algorithms and Metrics
1996, cites this paper
A discriminant probabilistic model for TSGs
year unknown, cites this paper
Towards Simpler Tree Substitution Grammars Msc in Logic
year unknown, cites this paper