Exploiting Heterogeneous Treebanks for Parsing

Zheng-Yu Niu,Haifeng Wang,Hua Wu

Published 2009 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

We address the issue of using heterogeneous treebanks for parsing by breaking it down into two sub-problems, converting grammar formalisms of the treebanks to the same one, and parsing on these homogeneous treebanks. First we propose to employ an iteratively trained target grammar parser to perform grammar formalism conversion, eliminating predefined heuristic rules as required in previous methods. Then we provide two strategies to refine conversion results, and adopt a corpus weighting technique for parsing on homogeneous treebanks. Results on the Penn Treebank show that our conversion method achieves 42% error reduction over the previous best result. Evaluation on the Penn Chinese Treebank indicates that a converted dependency treebank helps constituency parsing and the use of unlabeled data by self-training further increases parsing f-score to 85.2%, resulting in 6% error reduction over the previous best result.

PUBLICATION RECORD

  • Publication year

    2009

  • Venue

    Annual Meeting of the Association for Computational Linguistics

  • Publication date

    2009-08-02

  • Fields of study

    Computer Science

  • Identifiers
  • External record

    Open on Semantic Scholar

  • Source metadata

    Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

REFERENCES

Showing 1-32 of 32 references · Page 1 of 1

CITED BY

Showing 1-25 of 25 citing papers · Page 1 of 1