Categorial Grammar Induction with Stochastic Category Selection

Published 2024 in International Conference on Language Resources and Evaluation

ABSTRACT

Grammar induction, the task of learning a set of syntactic rules from minimally annotated training data, provides a means of exploring the longstanding question of whether humans rely on innate knowledge to acquire language. Of the various formalisms available for grammar induction, categorial grammars provide an appealing option due to their transparent interface between syntax and semantics. However, to obtain competitive results, previous categorial grammar inducers have relied on shortcuts such as part-of-speech annotations or an ad hoc bias term in the objective function to ensure desirable branching behavior. We present a categorial grammar inducer that eliminates both shortcuts: it learns from raw data, and does not rely on a biased objective function. This improvement is achieved through a novel stochastic process used to select the set of available syntactic categories. On a corpus of English child-directed speech, the model attains a recall-homogeneity of 0.48, a large improvement over previous categorial grammar inducers.

PUBLICATION RECORD

Publication date
2024-05-01
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.63317/3y3r87q75ad4
