An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation

Published 2006 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both agglutinative and fusional ways. We present an unsupervised stochastic model (the only resource we use is a morphological analyzer) which deals with the data sparseness problem caused by the affixational morphology of the Hebrew language. We present a text encoding method for languages with affixational morphology in which the knowledge of word formation rules (which are quite restricted in Hebrew) helps in the disambiguation. We adapt HMM algorithms for learning and searching this text representation, in such a way that segmentation and tagging can be learned in parallel in one step. Results on a large-scale evaluation indicate that this learning improves disambiguation for complex tag sets. Our method is applicable to other languages with affix morphology.
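The idea of searching over segmentation and tags in one step can be illustrated with a small Viterbi decoder over a lattice of candidate analyses. This is a minimal sketch and not the paper's implementation: the transliterated tokens, the tag names, the probability tables, and the helper names (`analysis_score`, `viterbi`) are all invented for illustration, and the probabilities are set by hand rather than learned without supervision. Each word has several candidate analyses, assumed to come from a morphological analyzer, and each analysis is a sequence of (morpheme, tag) pairs scored by a first-order HMM over morphemes.

```python
import math

START = "<s>"

# Toy lattice (invented): one list of candidate analyses per word.
# Each analysis is a list of (morpheme, tag) pairs.
LATTICE = [
    [[("b", "PREP"), ("bit", "NOUN")], [("bbit", "NOUN")]],
    [[("gdwl", "ADJ")], [("gdwl", "VERB")]],
]

# Hand-set toy probabilities (assumptions, not learned values).
TRANS = {
    (START, "PREP"): 0.3, (START, "NOUN"): 0.5,
    ("PREP", "NOUN"): 0.8, ("NOUN", "ADJ"): 0.5, ("NOUN", "VERB"): 0.2,
}
EMIT = {
    ("PREP", "b"): 0.5, ("NOUN", "bit"): 0.1, ("NOUN", "bbit"): 0.001,
    ("ADJ", "gdwl"): 0.1, ("VERB", "gdwl"): 0.05,
}


def log_p(table, key, floor=1e-6):
    """Log probability with a small floor for unseen events."""
    return math.log(table.get(key, floor))


def analysis_score(analysis, prev_tag, trans, emit):
    """Log score of one analysis, morpheme by morpheme, given the previous tag."""
    score = 0.0
    for morph, tag in analysis:
        score += log_p(trans, (prev_tag, tag)) + log_p(emit, (tag, morph))
        prev_tag = tag
    return score


def viterbi(lattice, trans, emit):
    """Pick one analysis per word so that the morpheme-level HMM score is maximal."""
    best = []  # best[i][j] = (log score, backpointer) for analysis j of word i
    for i, candidates in enumerate(lattice):
        column = []
        for analysis in candidates:
            if i == 0:
                column.append((analysis_score(analysis, START, trans, emit), None))
                continue
            options = []
            for k, prev in enumerate(lattice[i - 1]):
                prev_last_tag = prev[-1][1]  # only the last tag matters (first order)
                score = best[i - 1][k][0] + analysis_score(
                    analysis, prev_last_tag, trans, emit
                )
                options.append((score, k))
            column.append(max(options))
        best.append(column)

    # Follow backpointers from the best final analysis.
    j = max(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    path = []
    for i in range(len(lattice) - 1, -1, -1):
        path.append(lattice[i][j])
        j = best[i][j][1]
    return path[::-1]


decoded = viterbi(LATTICE, TRANS, EMIT)
```

With these toy numbers the decoder prefers the segmented reading of the first word and the adjective reading of the second, so segmentation and tags are chosen together rather than in separate passes. The unsupervised estimation of the probabilities is not shown here.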

PUBLICATION RECORD

Publication year: 2006
Venue: Annual Meeting of the Association for Computational Linguistics
Publication date: 2006-07-17
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.3115/1220175.1220259

CITED BY

Are All Spanish Doctors Male? Evaluating Gender Bias in German Machine Translation
2025cites this paper
A Truly Joint Neural Architecture for Segmentation and Parsing
2024cites this paper
Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?
2024cites this paper
Masking Morphosyntactic Categories to Evaluate Salience for Schizophrenia Diagnosis
2022cites this paper
A Second Wave of UD Hebrew Treebanking and Cross-Domain Parsing
2022cites this paper
Neural Token Segmentation for High Token-Internal Complexity
2022cites this paper
Latest Developments in Morphological Disambiguation Strategies of Modern Hebrew
2021cites this paper
Modeling Repressive Policing: Computational Analysis of Protocols from the Israeli State Commission of Inquiry into the October 2000 Events
2021cites this paper
Neural Machine Translation without Embeddings
2020cites this paper
From SPMRL to NMRL: What Did We Learn (and Unlearn) in a Decade of Parsing Morphologically-Rich Languages (MRLs)?
2020cites this paper
Nakdan: Professional Hebrew Diacritizer
2020cites this paper
Automatic Construction of Aramaic-Hebrew Translation Lexicon
2020cites this paper
A Pointer Network Architecture for Joint Morphological Segmentation and Tagging
2020influential citation
A Novel Challenge Set for Hebrew Morphological Disambiguation and Diacritics Restoration
2020cites this paper
What’s Wrong with Hebrew NLP? And How to Make it Right
2019cites this paper
Historical corpora meet the digital humanities: the Jerusalem Corpus of Emergent Modern Hebrew
2019cites this paper
An Algorithmic Scheme for Statistical Thesaurus Construction in a Morphologically Rich Language
2019cites this paper
Evaluating Gender Bias in Machine Translation
2019cites this paper
Joint Transition-Based Models for Morpho-Syntactic Parsing: Parsing Strategies for MRLs and a Case Study from Modern Hebrew
2019cites this paper
A Characterwise Windowed Approach to Hebrew Morphological Segmentation
2018cites this paper
Automatic Opinion Extraction from Short Hebrew Texts using Machine Learning Techniques
2018cites this paper
Universal Morpho-Syntactic Parsing and the Contribution of Lexica: Analyzing the ONLP Lab Submission to the CoNLL 2018 Shared Task
2018influential citation
The Hebrew FrameNet Project
2016cites this paper
Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies
2016cites this paper
An NLP Pipeline for Coptic
2016influential citation
Transition-Based Morphological Disambiguation
2015cites this paper
Morphological Disambiguation of Classical Sanskrit
2015cites this paper
Context-dependent type-level models for unsupervised morpho-syntactic induction
2015cites this paper
The diminishing role of inalienability in the Hebrew possessive dative
2015cites this paper
Parallels between cross-linguistic and language-internal variation in Hebrew possessive constructions
2014cites this paper
The Responsa Project: Some Promising Future Directions
2014cites this paper
Methodology for Connecting Nouns to Their Modifying Adjectives
2014cites this paper
The phonetics of intonation in learner varieties of French
2014cites this paper
Morphological Tagging of Ugaritic
2014cites this paper
Morphological Processing of Semitic Languages
2014influential citation
Linguistic category induction and tagging using the paradigmatic context representations with substitute words (Düşey kelime bağlamlarını olası kelimeler ile temsil ederek dil bilimsel sözcük kümeleri ve etikletlerinin bulunması)
2014influential citation
Syntax and Parsing of Semitic Languages
2014cites this paper
Statistical Machine Translation
2014cites this paper
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
2013cites this paper
Effect of Out Of Vocabulary Terms on Inferring Eligibility Criteria for a Retrospective Study in Hebrew EHR
2013cites this paper
A Unified Morpho-Syntactic Scheme of Stanford Dependencies
2013cites this paper
Word Segmentation, Unknown-word Resolution, and Morphological Agreement in a Hebrew Parsing System
2013influential citation
A rule-based approach to unknown word recognition in Arabic
2012cites this paper
Statistical Thesaurus Construction for a Morphologically Rich Language
2012cites this paper
Morphological disambiguation of Hebrew: a case study in classifier combination
2012influential citation
Joint Evaluation of Morphological Segmentation and Syntactic Parsing
2012cites this paper
Extraction of linguistic resources from multilingual corpora and their exploitation
2012cites this paper
Multilingual text generation from structured formal representations
2012cites this paper
Universal Morphological Analysis using Structured Nearest Neighbor Prediction
2011cites this paper
Part of speech tagging for Arabic
2011cites this paper
Modeling Syntactic Context Improves Morphological Segmentation
2011cites this paper
Updating of the "Contemporary Chinese Language Word Segmentation Specification for Information Processing"
2011cites this paper
Transliterated Pairs Acquisition in Medical Hebrew
2010cites this paper
A New Approach to Lexical Disambiguation of Arabic Text
2010cites this paper
Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
2010cites this paper
Easy-First Dependency Parsing of Modern Hebrew
2010cites this paper
Unsupervised Concept Discovery In Hebrew Using Simple Unsupervised Word Prefix Segmentation for Hebrew and Arabic
2009cites this paper
Enhancing Unlexicalized Parsing Performance Using a Wide Coverage Lexicon, Fuzzy Tag-Set Mapping, and EM-HMM-Based Lexical Probabilities
2009cites this paper
Transformation-Based Error-Driven Learning
2009cites this paper
Hebrew possessive datives: corpus evidence for the role of affectedness
2009cites this paper
Unsupervised Morphological Disambiguation using Statistical Language Models
2009cites this paper
Language resources for Hebrew
2008cites this paper
The Fast and the Numerous - Combining Machine and Community Intelligence for Semantic Annotation
2008cites this paper
Using Wikipedia Links to Construct Word Segmentation Corpora
2008cites this paper
Tagging a Hebrew Corpus: the Case of Participles
2008cites this paper
Swordfish2: Using Kernel Density Estimation to Smooth N-gram Histograms for Morphological Analysis
2008cites this paper
Part-of-speech tagging of Modern Hebrew text
2008cites this paper
Word-Based or Morpheme-Based? Annotation Strategies for Modern Hebrew Clitics
2008cites this paper
A Single Generative Model for Joint Morphological Segmentation and Syntactic Parsing
2008influential citation
Unsupervised Multilingual Learning for Morphological Segmentation
2008cites this paper
Stat-XFER: A General Search-Based Syntax-Driven Framework for Machine Translation
2008cites this paper
Identification of Transliterated Foreign Words in Hebrew Script
2008cites this paper
Unsupervised Approaches to Sequence Tagging, Morphology Induction, and Lexical Resource Acquisition
2008cites this paper
Morphological Disambiguation of Hebrew: A Case Study in Classifier Combination
2007influential citation
Joint Morphological and Syntactic Disambiguation
2007cites this paper
Morphological Disambiguation of Hebrew
2007cites this paper
SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking
2007cites this paper
Noun Phrase Chunking in Hebrew: Influence of Lexical and Morphological Features
2006cites this paper
Rapid prototyping of a transfer-based Hebrew-to-English machine translation system
2004cites this paper