Unsupervised Language Acquisition: Theory and Practice

Published 2002 in arXiv.org

ABSTRACT

In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument, based on Gold's theorem, that purports to prove that natural languages cannot be learned, along with some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, which can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with non-concatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.

PUBLICATION RECORD

Publication year: 2002
Venue: arXiv.org
Publication date: 2002-12-10
Fields of study: Linguistics, Computer Science
Identifiers: arXiv:cs/0212024
Source metadata: Semantic Scholar
