A Naive Theory of Affixation and an Algorithm for Extraction

Published 2006 in Special Interest Group on Computational Morphology and Phonology Workshop

ABSTRACT

We present a novel approach to the unsupervised detection of affixes, that is, to extracting a set of salient prefixes and suffixes from an unlabeled corpus of a language. The underlying theory makes no assumptions about whether the language uses a lot of morphology, whether it is prefixing or suffixing, or whether affixes are long or short. It does, however, assume that (1) salient affixes have to be frequent, i.e., occur much more often than random segments of the same length, and that (2) words are essentially variable-length sequences of random characters, e.g., a character should not occur in far more words than chance predicts without a reason, such as being part of a very frequent affix. The affix extraction algorithm uses only information from the fluctuation of frequencies, runs in linear time, and is free from thresholds and opaque iterations. We demonstrate the usefulness of the approach with example case studies on typologically distant languages.
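The first assumption in the abstract (a salient affix occurs much more often than a random segment of the same length) can be illustrated with a small sketch. This is not the paper's extraction algorithm: the function name `suffix_salience`, the `max_len` parameter, and the independent-character baseline are all assumptions made here for illustration only.

```python
from collections import Counter
from math import prod


def suffix_salience(words, max_len=4):
    """Score each word-final segment by observed count / expected count.

    The expected count assumes characters are drawn independently from the
    corpus-wide character distribution (a hypothetical baseline, not the
    paper's model).
    """
    vocab = set(words)  # work on word types, not tokens
    char_counts = Counter(c for w in vocab for c in w)
    total_chars = sum(char_counts.values())
    char_prob = {c: n / total_chars for c, n in char_counts.items()}

    observed = Counter()  # how many word types end in each segment
    eligible = Counter()  # how many word types can carry a suffix of length k
    for w in vocab:
        # Leave at least one character for the stem.
        for k in range(1, min(max_len, len(w) - 1) + 1):
            observed[w[-k:]] += 1
            eligible[k] += 1

    scores = {}
    for seg, obs in observed.items():
        expected = eligible[len(seg)] * prod(char_prob[c] for c in seg)
        scores[seg] = obs / expected
    return scores


if __name__ == "__main__":
    toy = ["walking", "talking", "jumping", "walked", "talked",
           "jumped", "walks", "talks", "jumps"]
    ranked = sorted(suffix_salience(toy).items(), key=lambda kv: -kv[1])
    for seg, score in ranked[:5]:
        print(seg, round(score, 1))
```

A ratio well above 1 marks a segment that ends more word types than the random-character baseline would predict. Applying the same function to reversed words gives the corresponding scores for prefixes. Unlike the approach described in the abstract, this sketch caps segment length with `max_len` and makes no claim about running time.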

PUBLICATION RECORD

Publication year
2006
Venue
Special Interest Group on Computational Morphology and Phonology Workshop
Publication date
2006-06-08
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.3115/1622165.1622175
Source metadata
Semantic Scholar

REFERENCES

Ph.D. Thesis
2008cited by this paper
Indigenous Governance : The Harvard Project on Native American Economic Development and appropriate principles of governance for Aboriginal Australia
2006cited by this paper
Refining the SED Heuristic for Morpheme Discovery: Another Look at Swahili
2005cited by this paper
Unsupervised Morpheme Segmentation and Morphology Induction from Text Corpora Using Morfessor 1.0
2005cited by this paper
Using Morphology and Syntax Together in Unsupervised Learning
2005cited by this paper
Adquisició d'informació lèxica i morfosintàctica a partir de corpus sense anotar: aplicació al rus i al croat
2005cited by this paper
On Induction of Morphology Grammars and its Role in Bootstrapping
2004influential reference
Efficient Unsupervised Recursive Word Segmentation Using Minimum Description Length
2004influential reference
Induction of a Simple Morphology for Highly-Inflecting Languages
2004cited by this paper
Issues in the study of Pidgin and Creole languages
2004cited by this paper
From Signatures to Finite State Automata
2004cited by this paper
Multilingual Noise-Robust Supervised Morphological Analysis using the WordFrame Model
2004cited by this paper
Morpheme Segmentation Gold Standards for Finnish and English
2004influential reference
Unlimited vocabulary speech recognition based on morphs discovered in an unsupervised manner
2003cited by this paper
Unsupervised Learning of Morphology for English and Inuktitut
2003cited by this paper
Distribution-driven morpheme discovery: a computational/experimental study
2003cited by this paper
Modeling and learning multilingual inflectional morphology in a minimally supervised framework
2003cited by this paper
Unsupervised Learning of Morphology Without Morphemes
2002cited by this paper
Unsupervised Learning of Morphology Using a Novel Directed Search Algorithm: Taking the First Step
2002influential reference
A Probabilistic Model for Learning Concatenative Morphology
2002cited by this paper
Unsupervised Discovery of Morphemes
2002cited by this paper
The identification of bases in morphological paradigms
2002cited by this paper
Unsupervised Learning of Morphology for Building Lexicon for a Highly Inflectional Language
2002influential reference
An unsupervised knowledge free algorithm for the learning of morphology in natural languages
2002cited by this paper
Unsupervised discovery of morphologically related words based on orthographic and semantic similarity
2002cited by this paper
Unsupervised Learning of Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming
2001cited by this paper
Linguistica: An automatic morphological analyzer
2001cited by this paper
Unsupervised Learning of the Morphology of a Natural Language
2001cited by this paper
Knowledge-Free Induction of Inflectional Morphologies
2001cited by this paper
A Bayesian Model For Morpheme and Paradigm Identification
2001cited by this paper
Learning Morphology with Pair Hidden Markov Models
2001cited by this paper
Automatic Language-Specific Stemming in Information Retrieval
2000cited by this paper
Vowels and Consonants
2000cited by this paper
Minimally Supervised Morphological Analysis by Multimodal Alignment
2000cited by this paper
Unsupervised learning of derivational morphology from inflectional lexicons
1999influential reference
Concepts et algorithmes pour la découverte des structures formelles des langues. (Concepts and Algorithms for Discovering Formal Structures of Languages)
1998influential reference
A Hybrid Approach to Word Segmentation
1998cited by this paper
The segmentation problem in morphology learning
1998cited by this paper
Guessing morphology from terms and corpora
1997cited by this paper
Quantitative Morphsegmentierung im Spanischen auf phonologischer Basis
1995cited by this paper
Ein automatisches Morphsegmentierungsverfahren für deutsche Wortformen
1991cited by this paper
In Advances in Neural Information Processing Systems 12
1991cited by this paper
Morphological Segmentation Without a Lexicon
1989cited by this paper
Topics in Warlpiri grammar
1980cited by this paper
Word segmentation by letter successor varieties
1974cited by this paper
Linguistische Modellbildung und Methodologie
1973cited by this paper
Morpheme Boundaries within Words: Report on a Computer Test
1970cited by this paper
A grammar of Biblical Aramaic
1961cited by this paper
From Phoneme to Morpheme
1955cited by this paper

CITED BY

2 Hilbert and Hilpert Problems
2019cites this paper
Unity and disunity in evolutionary sciences: process-based analogies open common research avenues for biology and linguistics
2016cites this paper
Cracking the Voynich Manuscript : Using basic statistics and analyses to determine linguistic relationships
2015cites this paper
From Phoneme to Morpheme: A Computational Model
2015cites this paper
Unsupervised learning of Arabic non-concatenative morphology
2015cites this paper
An unsupervised approach for morphological segmentation of highly agglutinative Tamil language
2015cites this paper
A Survey of Language Identification Techniques and Applications
2014cites this paper
Little by Little: Semi Supervised Stemming through Stem Set Minimization
2013cites this paper
Optimal Stem Identification in Presence of Suffix List
2012cites this paper
Improving word coverage using unsupervised morphological analyser
2009cites this paper
Unsupervised morphological segmentation and clustering with document boundaries
2009cites this paper
Language Identification of Search Engine Queries
2009cites this paper
A Novel approach to improve rule based Telugu morphological analyzer
2009cites this paper
Paramor: from paradigm structure to natural language morphology induction
2008cites this paper
Minimally supervised induction of morphology through bitexts
2008cites this paper
Poor Man's Stemming: Unsupervised Recognition of Same-Stem Words
2006cites this paper