Letter Sequence Labeling for Compound Splitting
Jianqiang Ma, Verena Henrich, E. Hinrichs
Published 2016 in Special Interest Group on Computational Morphology and Phonology Workshop
ABSTRACT
For languages such as German where compounds occur frequently and are written as single tokens, a wide variety of NLP applications benefits from recognizing and splitting compounds. As the traditional word frequency-based approach to compound splitting has several drawbacks, this paper introduces a letter sequence labeling approach, which can utilize rich word form features to build discriminative learning models that are optimized for splitting. Experiments show that the proposed method significantly outperforms state-of-the-art compound splitters.
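The idea of treating compound splitting as letter sequence labeling can be illustrated with a minimal sketch. It assumes a simple two-label scheme ("B" for a letter that begins a constituent, "I" for any other letter), which may differ from the tag set used in the paper. The names `encode`, `decode`, and `letter_features` are hypothetical, and no trained model is included; a discriminative sequence labeler would predict the labels from features such as the letter windows shown here.

```python
from typing import Dict, List, Tuple


def encode(parts: List[str]) -> Tuple[str, List[str]]:
    """Turn a split compound into (word, per-letter labels).

    Assumed scheme: "B" marks the first letter of a constituent,
    "I" marks every other letter.
    """
    labels: List[str] = []
    for part in parts:
        if not part:
            continue  # ignore empty constituents
        labels.extend(["B"] + ["I"] * (len(part) - 1))
    return "".join(parts), labels


def decode(word: str, labels: List[str]) -> List[str]:
    """Recover the constituents from a word and its letter labels."""
    parts: List[str] = []
    for letter, label in zip(word, labels):
        if label == "B" or not parts:
            parts.append(letter)  # start a new constituent
        else:
            parts[-1] += letter  # extend the current constituent
    return parts


def letter_features(word: str, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical word-form features for the letter at position i:
    the letter itself plus its left and right letter context."""
    padded = "<" * window + word + ">" * window
    j = i + window
    return {
        "letter": word[i],
        "left": padded[j - window:j],
        "right": padded[j + 1:j + 1 + window],
    }


word, labels = encode(["apfel", "baum"])
print(word, labels)
print(decode(word, labels))
print(letter_features(word, 5))
```

Under this framing, splitting a new word amounts to predicting one label per letter and then cutting before every "B", so the model is trained directly on the splitting decision rather than on word frequencies.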
PUBLICATION RECORD
- Publication year
2016
- Venue
Special Interest Group on Computational Morphology and Phonology Workshop
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar
REFERENCES
27 references
CITED BY
13 citing papers