Synthetic Data Made to Order: The Case of Parsing

D. Wang, Jason Eisner

Published 2018 in the Conference on Empirical Methods in Natural Language Processing

ABSTRACT

To approximately parse an unfamiliar language, it helps to have a treebank of a similar language. But what if the closest available treebank still has the wrong word order? We show how to (stochastically) permute the constituents of an existing dependency treebank so that its surface part-of-speech statistics approximately match those of the target language. The parameters of the permutation model can be evaluated for quality by dynamic programming and tuned by gradient descent (up to a local optimum). This optimization procedure yields trees for a new artificial language that resembles the target language. We show that delexicalized parsers for the target language can be successfully trained using such “made to order” artificial languages.
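The core idea in the abstract (reorder the dependents in each tree, then compare surface part-of-speech statistics with the target language) can be illustrated with a toy sketch. This is not the authors' model: the `Node` class, the per-pair left-placement probabilities `p_left`, the bigram statistics, and the smoothed KL divergence are all assumptions chosen for brevity, and the parameters are set by hand here rather than tuned by gradient descent.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One word of a delexicalized dependency tree (hypothetical structure)."""
    pos: str
    children: list = field(default_factory=list)  # dependents of this word


def linearize(node, p_left, rng):
    """Return the POS sequence of a subtree after stochastic reordering.

    Each dependent subtree goes to the left of its head with probability
    p_left[(head POS, dependent POS)] (0.5 if unspecified), otherwise to
    the right. Dependents on the same side keep their original order.
    """
    left, right = [], []
    for child in node.children:
        prob = p_left.get((node.pos, child.pos), 0.5)
        side = left if rng.random() < prob else right
        side.extend(linearize(child, p_left, rng))
    return left + [node.pos] + right


def bigram_dist(sentences):
    """Relative frequencies of POS bigrams, with sentence boundary markers."""
    counts = Counter()
    for tags in sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    total = sum(counts.values())
    return {bigram: c / total for bigram, c in counts.items()}


def divergence(target, synthetic, eps=1e-6):
    """KL(target || synthetic), flooring unseen synthetic bigrams at eps."""
    return sum(p * math.log(p / synthetic.get(bg, eps))
               for bg, p in target.items())


# A verb with a pronoun dependent and a noun dependent that has a determiner.
tree = Node("VERB", [Node("PRON"), Node("NOUN", [Node("DET")])])
rng = random.Random(0)

# Two hand-picked parameter settings: verb-final and verb-medial.
verb_final = {("VERB", "PRON"): 1.0, ("VERB", "NOUN"): 1.0, ("NOUN", "DET"): 1.0}
verb_medial = {("VERB", "PRON"): 1.0, ("VERB", "NOUN"): 0.0, ("NOUN", "DET"): 1.0}

# Pretend the target language is verb-final.
target = bigram_dist([["PRON", "DET", "NOUN", "VERB"]])
for name, params in [("verb_final", verb_final), ("verb_medial", verb_medial)]:
    sentence = linearize(tree, params, rng)
    print(name, sentence, divergence(target, bigram_dist([sentence])))
```

In this toy setting the verb-final parameters reproduce the target bigram statistics exactly, while the verb-medial ones are penalized. A fuller treatment would estimate the expected statistics over all permutations and adjust the parameters to reduce the divergence, as the abstract describes.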

PUBLICATION RECORD

  • Publication year

    2018

  • Venue

    Conference on Empirical Methods in Natural Language Processing

  • Fields of study

    Linguistics, Computer Science

  • Source metadata

    Semantic Scholar

REFERENCES

69 references

CITED BY

33 citing papers