Exploiting Heterogeneous Treebanks for Parsing

Published 2009 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

We address the problem of using heterogeneous treebanks for parsing by breaking it into two sub-problems: converting the treebanks' grammar formalisms into a single formalism, and parsing on the resulting homogeneous treebanks. First, we propose using an iteratively trained target-grammar parser to perform grammar formalism conversion, which eliminates the predefined heuristic rules required by previous methods. We then provide two strategies for refining the conversion results and adopt a corpus weighting technique for parsing on homogeneous treebanks. Results on the Penn Treebank show that our conversion method achieves a 42% error reduction over the previous best result. Evaluation on the Penn Chinese Treebank indicates that a converted dependency treebank helps constituency parsing, and that using unlabeled data through self-training further raises the parsing F-score to 85.2%, a 6% error reduction over the previous best result.
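The error-reduction figures in the abstract are relative, not absolute. The following is a minimal sketch of that arithmetic, assuming error is defined as 100 minus the score in percent; the function names are hypothetical and do not come from the paper. The abstract does not state the previous best F-score, and the 6% figure is rounded, so any baseline derived this way is only approximate.

```python
def relative_error_reduction(baseline_score: float, new_score: float) -> float:
    """Fraction of the baseline's error removed by the new system.

    Assumption: scores are percentages and error = 100 - score.
    """
    baseline_error = 100.0 - baseline_score
    new_error = 100.0 - new_score
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error left to reduce")
    return (baseline_error - new_error) / baseline_error


def implied_baseline(new_score: float, reduction: float) -> float:
    """Baseline score implied by a new score and a relative error reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    new_error = 100.0 - new_score
    # new_error = baseline_error * (1 - reduction), solved for baseline_error
    return 100.0 - new_error / (1.0 - reduction)


if __name__ == "__main__":
    # F-score and reduction as reported in the abstract (85.2%, 6%).
    baseline = implied_baseline(85.2, 0.06)
    print(f"implied previous best F-score: {baseline:.1f}%")
    print(f"round-trip reduction: {relative_error_reduction(baseline, 85.2):.2%}")
```

Under these assumptions, a 6% relative error reduction that ends at 85.2% corresponds to a previous best of roughly 84.3%, which is a gain of about one absolute F-score point.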

PUBLICATION RECORD

Publication year
2009
Venue
Annual Meeting of the Association for Computational Linguistics
Publication date
2009-08-02
Fields of study
Computer Science
Identifiers
DOI 10.3115/1687878.1687887
Source metadata
Semantic Scholar

REFERENCES

Two Languages are Better than One (for Syntactic Parsing)
2008, influential reference
Towards a Multi-Representational Treebank
2008, influential reference
Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets
2007, influential reference
Improved Inference for Unlexicalized Parsing
2007, influential reference
Building a Dependency Treebank for Improving Chinese Parser
2006, influential reference
Effective Self-Training for Parsing
2006, cited by this paper
A Fast, Accurate Deterministic Parser for Chinese
2006, cited by this paper
Reranking and Self-Training for Parser Adaptation
2006, cited by this paper
The Penn Chinese TreeBank: Phrase structure annotation of a large corpus
2005, influential reference
A corrigendum to Sun and Jurafsky (2004) “Shallow Semantic Parsing of Chinese” TR-CSLR-2005-01
2005, cited by this paper
Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking
2005, cited by this paper
Parsing the Penn Chinese Treebank with Semantic Knowledge
2005, cited by this paper
On the parameter space of generative lexicalized statistical parsing models
2004, cited by this paper
Shallow Semantic Parsing of Chinese
2004, cited by this paper
Syntactic Annotation of a German Newspaper Corpus
2003, cited by this paper
Treebank Conversion - Establishing a testsuite for a broad-coverage LFG from the TIGER treebank
2003, cited by this paper
Supervised and unsupervised PCFG adaptation to novel domains
2003, cited by this paper
Is it Harder to Parse Chinese, or the Chinese Treebank?
2003, cited by this paper
Developing a Syntactic Annotation Scheme and Tools for a Spanish Treebank
2003, cited by this paper
Development and Evaluation of a Korean Treebank and its Application to NLP
2002, cited by this paper
Recovering Latent Information in Treebanks
2002, cited by this paper
Converting Dependency Structures to Phrase Structures
2001, influential reference
Translating Treebank Annotation for Evaluation
2001, cited by this paper
Building a Treebank for French
2000, cited by this paper
A Maximum-Entropy-Inspired Parser
2000, cited by this paper
Two Statistical Parsing Models Applied to the Chinese Treebank
2000, cited by this paper
A Statistical Parser for Czech
1999, influential reference
Building a Japanese parsed corpus while improving the parsing system
1998, cited by this paper
An Automatic Treebank Conversion Algorithm for Corpus Sharing
1994, cited by this paper
Building a Large Annotated Corpus of English: The Penn Treebank
1993, cited by this paper
GB theory as dependency grammar
1992, cited by this paper

CITED BY

Investigating interoperable event corpora: limitations of reusability of resources and portability of models
2023, influential citation
句式结构树库的自动构建研究 (Automatic Construction of Sentence Pattern Structure Treebank)
2022, cites this paper
Conversion and Exploitation of Dependency Treebanks with Full-Tree LSTM
2019, cites this paper
Supervised Treebank Conversion: Data and Approaches
2018, influential citation
Language Independent Dependency to Constituent Tree Conversion
2016, cites this paper
A Universal Framework for Inductive Transfer Parsing across Multi-typed Treebanks
2016, influential citation
Exploiting Multi-typed Treebanks for Parsing with Deep Multi-task Learning
2016, cites this paper
Hybrid Dependency Parser with Segmented Treebanks and Reparsing
2015, cites this paper
Iterative Transformation of Annotation Guidelines for Constituency Parsing
2013, cites this paper
Joint Inference for Heterogeneous Dependency Parsing
2013, cites this paper
A feature-based approach to better automatic treebank conversion
2013, cites this paper
Training Parsers on Incompatible Treebanks
2013, cites this paper
Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
2012, influential citation
Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
2012, cites this paper
Phrase Parses Reranking Based on Higher-Order Lexical Dependencies
2012, influential citation
Pushing the boundaries of deep parsing
2012, cites this paper
Automatic Treebank Conversion via Informed Decoding - A Case Study on Chinese Treebanks
2011, cites this paper
Parse Reranking Based on Higher-Order Lexical Dependencies
2011, cites this paper
Treeblazing: Using External Treebanks to Filter Parse Forests for Parse Selection and Treebanking
2011, cites this paper
Better Automatic Treebank Conversion Using A Feature-Based Approach
2011, cites this paper
Any Domain Parsing: Automatic Domain Adaptation for Natural Language Parsing
2010, cites this paper
A reranking method for syntactic parsing with heterogeneous treebanks
2010, cites this paper
Automatic Treebank Conversion via Informed Decoding
2010, cites this paper
Treebank Conversion based Self-training Strategy for Parsing
2010, cites this paper
Heterogeneous Parsing via Collaborative Decoding
2010, influential citation