Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification
Published 1998 in Annual Meeting of the Association for Computational Linguistics
ABSTRACT
Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal.
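The two training steps named in the abstract (read a grammar of part-of-speech tag sequences from a treebank, then keep only rules with high benefit) can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not define the benefit score or the matching heuristic, so this sketch assumes that benefit is the number of correct bracketings a rule produces minus the number of incorrect ones on held-out data, and that matching is greedy, left to right, longest rule first. The function names, the threshold, and the toy corpora are all hypothetical.

```python
from collections import Counter

def extract_rules(corpus):
    """Collect every POS tag sequence that is bracketed as a base NP."""
    rules = set()
    for tags, spans in corpus:
        for start, end in spans:  # spans are half-open: [start, end)
            rules.add(tuple(tags[start:end]))
    return rules

def match(tags, rules):
    """Assumed heuristic: greedy left-to-right scan, longest rule first."""
    max_len = max((len(r) for r in rules), default=0)
    found, i = [], 0
    while i < len(tags):
        for n in range(min(max_len, len(tags) - i), 0, -1):
            candidate = tuple(tags[i:i + n])
            if candidate in rules:
                found.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1  # no rule starts at this position
    return found

def prune(rules, held_out, threshold=1):
    """Assumed benefit: correct minus incorrect bracketings on held-out data."""
    benefit = Counter()
    for tags, spans in held_out:
        gold = set(spans)
        for start, end, rule in match(tags, rules):
            benefit[rule] += 1 if (start, end) in gold else -1
    return {r for r in rules if benefit[r] >= threshold}

# Invented toy data: (POS tags, gold base NP spans) per sentence.
train = [
    (["DT", "NN", "VBD", "DT", "JJ", "NN"], [(0, 2), (3, 6)]),
    (["VBG", "NNS", "VBP", "NN"], [(0, 2), (3, 4)]),
]
held_out = [
    (["DT", "NN", "VBZ", "VBG", "NNS"], [(0, 2), (4, 5)]),
    (["DT", "JJ", "NN", "VBD", "NN"], [(0, 3), (4, 5)]),
]

grammar = extract_rules(train)
pruned = prune(grammar, held_out)
```

In this toy run the rule `("VBG", "NNS")` is read from the training data but brackets a verb together with its object in the held-out sentence, so its benefit is negative and it is dropped. The remaining three rules each produce one correct bracketing and survive.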
PUBLICATION RECORD
- Publication year
1998
- Venue
Annual Meeting of the Association for Computational Linguistics
- Publication date
1998-08-10
- Fields of study
Computer Science
- Source metadata
Semantic Scholar