Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution
Published 2005 in Human Language Technology - The Baltic Perspective
ABSTRACT
Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precision on PP-attachment and 80% on noun compound coordination.
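As a rough illustration of how web counts can stand in for labeled training data, the Python sketch below decides PP attachment for a (verb, noun1, preposition, noun2) quadruple. It compares how often the prepositional phrase directly follows the verb versus the noun, each normalized by the head word's own frequency. This co-occurrence-ratio heuristic is an assumption chosen for illustration, not the paper's method, which also draws on surface features and paraphrases. The names `pp_attachment`, `precision`, `toy_hits`, and `TOY_COUNTS` are hypothetical, the counts are invented, and queries are assumed to be lemmatized phrases without determiners; a real system would replace `toy_hits` with calls to a search engine. Because the heuristic abstains on ties, the helper reports precision over answered items together with coverage.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A hit counter maps an exact-phrase query to a number of matching pages.
# Hypothetical stand-in for a real search-engine call.
HitCounter = Callable[[str], int]


def ratio(numerator: int, denominator: int) -> float:
    """Co-occurrence ratio; 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def pp_attachment(verb: str, noun1: str, prep: str, noun2: str,
                  hits: HitCounter) -> Optional[str]:
    """Return "verb", "noun", or None (abstain) for a PP-attachment quadruple."""
    # How often the PP directly follows the verb, relative to the verb's frequency.
    verb_score = ratio(hits(f"{verb} {prep} {noun2}"), hits(verb))
    # How often the PP directly follows noun1, relative to noun1's frequency.
    noun_score = ratio(hits(f"{noun1} {prep} {noun2}"), hits(noun1))
    if verb_score == noun_score:
        return None  # no evidence either way (e.g. both phrases unseen)
    return "verb" if verb_score > noun_score else "noun"


def precision(predictions: List[Optional[str]],
              gold: List[str]) -> Tuple[float, float]:
    """Precision over answered items, plus coverage (fraction answered)."""
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    if not answered:
        return 0.0, 0.0
    correct = sum(p == g for p, g in answered)
    return correct / len(answered), len(answered) / len(gold)


# Invented toy counts, for illustration only (not real search-engine hits).
TOY_COUNTS: Dict[str, int] = {
    "ate": 1000, "spaghetti": 200,
    "ate with fork": 50, "spaghetti with fork": 2,
    "bought": 1000, "house": 500,
    "bought with garden": 1, "house with garden": 30,
}


def toy_hits(query: str) -> int:
    """Look up a phrase in the toy table; unseen phrases get zero hits."""
    return TOY_COUNTS.get(query, 0)


if __name__ == "__main__":
    items = [("ate", "spaghetti", "with", "fork"),
             ("bought", "house", "with", "garden"),
             ("read", "book", "on", "trains")]
    gold = ["verb", "noun", "verb"]
    preds = [pp_attachment(*q, hits=toy_hits) for q in items]
    print(preds)  # ['verb', 'noun', None]
    print(precision(preds, gold))
```

With these toy counts the third quadruple is never seen, so the sketch abstains on it; that is why precision and coverage are reported as separate numbers.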
PUBLICATION RECORD
- Publication year: 2005
- Venue: Human Language Technology - The Baltic Perspective
- Publication date: 2005-10-06
- Fields of study: Linguistics, Computer Science
- Source metadata: Semantic Scholar
REFERENCES
27 references
CITED BY
85 citing papers