A Naive Theory of Affixation and an Algorithm for Extraction

H. Hammarström

Published 2006 in Special Interest Group on Computational Morphology and Phonology Workshop

ABSTRACT

We present a novel approach to the unsupervised detection of affixes, that is, to extracting a set of salient prefixes and suffixes from an unlabeled corpus of a language. The underlying theory makes no assumptions about whether the language uses a lot of morphology, whether it is prefixing or suffixing, or whether affixes are long or short. It does, however, assume that (1) salient affixes have to be frequent, i.e., occur much more often than random segments of the same length, and that (2) words are essentially variable-length sequences of random characters, e.g., a character should not occur in far more words than chance would predict without a reason, such as being part of a very frequent affix. The affix extraction algorithm uses only information from fluctuations of frequencies, runs in linear time, and is free from thresholds and opaque iterations. We demonstrate the usefulness of the approach with case studies on typologically distant languages.
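
Assumption (1) in the abstract can be illustrated with a small sketch. The abstract does not describe the extraction algorithm itself, so this is not the paper's method. It is a naive illustration that compares how often a word-final segment occurs with how often an arbitrary segment of the same length occurs on average. The function names (`segment_counts`, `suffix_counts`, `suffix_salience`) and the ratio used as a score are assumptions made for this example only.

```python
from collections import Counter


def segment_counts(words, n):
    """Count every contiguous length-n segment, at any position, over word types."""
    counts = Counter()
    for w in set(words):
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts


def suffix_counts(words, n):
    """Count word-final length-n segments over word types (whole words excluded)."""
    counts = Counter()
    for w in set(words):
        if len(w) > n:
            counts[w[-n:]] += 1
    return counts


def suffix_salience(words, n):
    """Illustrative score (an assumption, not the paper's measure):
    suffix count divided by the mean count of all length-n segments."""
    segs = segment_counts(words, n)
    if not segs:
        return {}
    mean = sum(segs.values()) / len(segs)
    return {s: c / mean for s, c in suffix_counts(words, n).items()}
```

On a toy word list such as `["walking", "talking", "jumping", "singing", "walked", "talked", "jumped", "table", "garden"]`, calling `suffix_salience(words, 3)` scores `ing` above `ked`, and both above one-off endings such as `ble`. Unlike the approach in the abstract, this sketch fixes the segment length and handles only suffixes.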

PUBLICATION RECORD

  • Publication year

    2006

  • Venue

    Special Interest Group on Computational Morphology and Phonology Workshop

  • Publication date

    2006-06-08

  • Fields of study

    Linguistics, Computer Science

  • Source metadata

    Semantic Scholar

REFERENCES

52 references

CITED BY

16 citing papers