A Naive Theory of Affixation and an Algorithm for Extraction
Published 2006 in Special Interest Group on Computational Morphology and Phonology Workshop
ABSTRACT
We present a novel approach to the unsupervised detection of affixes, that is, to extracting a set of salient prefixes and suffixes from an unlabeled corpus of a language. The underlying theory makes no assumptions about whether the language uses much morphology, whether it is prefixing or suffixing, or whether affixes are long or short. It does, however, assume that (1) salient affixes are frequent, i.e., they occur much more often than random segments of the same length, and that (2) words are essentially variable-length sequences of random characters, e.g., a character should not occur in far more words than chance predicts without a reason, such as being part of a very frequent affix. The affix extraction algorithm uses only information from fluctuations of frequencies, runs in linear time, and is free from thresholds and opaque iterations. We demonstrate the usefulness of the approach with case studies on typologically distant languages.
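The first assumption in the abstract (salient affixes occur much more often than random segments of the same length) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `suffix_excess`, the unigram character model used as the "random" baseline, and the fixed segment length are all assumptions made here for illustration only.

```python
from collections import Counter
from math import prod


def suffix_excess(words, length):
    """Observed minus expected count for each word-final segment of `length`.

    Hypothetical helper: the expected count comes from a simple unigram
    character model, which is an assumption of this sketch, not the
    scoring used in the paper.
    """
    # Work on word types, and only on words longer than the segment.
    types = {w for w in words if len(w) > length}
    if not types:
        return {}

    # Unigram character probabilities serve as the "random characters" baseline.
    char_counts = Counter(ch for w in types for ch in w)
    total_chars = sum(char_counts.values())

    # How many word types end in each segment.
    observed = Counter(w[-length:] for w in types)

    scores = {}
    for segment, count in observed.items():
        p_random = prod(char_counts[ch] / total_chars for ch in segment)
        expected = p_random * len(types)
        scores[segment] = count - expected
    return scores


if __name__ == "__main__":
    corpus = ["walking", "talking", "jumping", "reading",
              "cat", "dog", "tree", "house"]
    scores = suffix_excess(corpus, 3)
    for segment, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(segment, round(score, 3))
```

On a toy word list like the one above, a segment shared by many word types ends up with a much larger excess than segments that occur once. Prefixes could be scored the same way by slicing `w[:length]` instead of `w[-length:]`.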
PUBLICATION RECORD
- Publication year
2006
- Venue
Special Interest Group on Computational Morphology and Phonology Workshop
- Publication date
2006-06-08
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar
REFERENCES
52 references
CITED BY
16 citing papers