An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation

M. Adler, Michael Elhadad

Published 2006 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both agglutinative and fusional ways. We present an unsupervised stochastic model (the only resource we use is a morphological analyzer) which deals with the data sparseness problem caused by the affixational morphology of the Hebrew language. We present a text encoding method for languages with affixational morphology in which the knowledge of word formation rules (which are quite restricted in Hebrew) helps in the disambiguation. We adapt HMM algorithms for learning and searching this text representation, in such a way that segmentation and tagging can be learned in parallel in one step. Results on a large-scale evaluation indicate that this learning improves disambiguation for complex tag sets. Our method is applicable to other languages with affix morphology.
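To make the idea of choosing segmentation and tags in one step concrete, the following is a minimal sketch and not the authors' encoding or implementation. It assumes a morphological analyzer has already produced, for each word, a list of candidate analyses, each a sequence of (morpheme, tag) pairs, and it runs a first-order Viterbi search over morpheme tags. The function names, the toy tokens, the tag labels, and all probabilities are hypothetical, and the unsupervised learning step described in the abstract is not shown; the parameters here are simply fixed by hand.

```python
import math

def score_analysis(prev_tag, analysis, trans, emit, floor=1e-6):
    """Log-probability of one candidate analysis given the preceding tag.

    `analysis` is a list of (morpheme, tag) pairs. Unseen transitions or
    emissions fall back to a small `floor` value (an assumption, not a
    smoothing scheme from the paper).
    """
    logp = 0.0
    for morpheme, tag in analysis:
        logp += math.log(trans.get((prev_tag, tag), floor))
        logp += math.log(emit.get((tag, morpheme), floor))
        prev_tag = tag
    return logp, prev_tag

def disambiguate(lattice, trans, emit, start="<s>"):
    """Pick one analysis per word by Viterbi search over morpheme tags."""
    # Maps the last morpheme tag to (best log-prob, analyses chosen so far).
    best = {start: (0.0, [])}
    for candidates in lattice:
        new_best = {}
        for prev_tag, (prev_lp, path) in best.items():
            for analysis in candidates:
                lp, last_tag = score_analysis(prev_tag, analysis, trans, emit)
                total = prev_lp + lp
                if last_tag not in new_best or total > new_best[last_tag][0]:
                    new_best[last_tag] = (total, path + [analysis])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]

# Hypothetical toy input: the second word has two competing analyses,
# one segmented into two morphemes and one left as a single morpheme.
lattice = [
    [[("h", "DET"), ("ild", "NOUN")]],
    [[("b", "PREP"), ("bit", "NOUN")], [("bbit", "VERB")]],
]
# Hand-set, made-up parameters for illustration only.
trans = {("<s>", "DET"): 0.5, ("DET", "NOUN"): 0.9, ("NOUN", "PREP"): 0.4,
         ("PREP", "NOUN"): 0.7, ("NOUN", "VERB"): 0.3}
emit = {("DET", "h"): 0.9, ("NOUN", "ild"): 0.1, ("PREP", "b"): 0.5,
        ("NOUN", "bit"): 0.1, ("VERB", "bbit"): 0.01}

result = disambiguate(lattice, trans, emit)
```

Because the states are morpheme tags rather than word tags, selecting the highest-scoring path fixes both where each word is split and which tag each morpheme receives, which is the sense in which the two decisions are made together in this sketch.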

PUBLICATION RECORD

  • Publication year

    2006

  • Venue

    Annual Meeting of the Association for Computational Linguistics

  • Publication date

    2006-07-17

  • Fields of study

    Linguistics, Computer Science

  • Source metadata

    Semantic Scholar

CITED BY

Showing 1-80 of 80 citing papers