High-accuracy Annotation and Parsing of CHILDES Transcripts

Kenji Sagae, Eric Davis, A. Lavie, B. MacWhinney, S. Wintner

Published 2007 in Unknown venue

ABSTRACT

Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To date, we have produced a corpus of over 65,000 words with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for English CHILDES data. The parser and the manually annotated data are freely available for research purposes.

PUBLICATION RECORD

Publication year: 2007
Venue: Unknown venue
Publication date: 2007-06-29
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.3115/1629795.1629799
Source metadata: Semantic Scholar

REFERENCES

Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles (2007, cited by this paper)
Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines (2006, cited by this paper)
CoNLL-X Shared Task on Multilingual Dependency Parsing (2006, cited by this paper)
A Best-First Probabilistic Shift-Reduce Parser (2006, cited by this paper)
Automatic Measurement of Syntactic Development in Child Language (2005, influential reference)
Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs (2004, influential reference)
A Maximum-Entropy-Inspired Parser (2000, cited by this paper)
Parser evaluation: a survey and a new proposal (1998, cited by this paper)
A Maximum Entropy Approach to Natural Language Processing (1996, cited by this paper)
The CHILDES project: tools for analyzing talk (1992, cited by this paper)
What Are You Cookin' on a Hot?: Movement Constraints in the Speech of A Three-Year-Old Blind Child (1988, cited by this paper)
The role of Imitation in the developing syntax of a blind child (1987, cited by this paper)
The Units of Language Acquisition (1983, cited by this paper)
A First Language: The Early Stages (1975, cited by this paper)
On the Translation of Languages from Left to Right (1965, cited by this paper)

CITED BY

Automated Defect Identification System in Printed Circuit Boards Using Region-Based Convolutional Neural Networks (2025, cites this paper)
Compositional Syntactico-SemBanking for English as a Second or Foreign Language (2025, cites this paper)
UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions (2025, cites this paper)
Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models? (2025, cites this paper)
Using video calls to study children's conversational development: The case of backchannel signaling (2023, cites this paper)
Lexicalization in the developing parser (2022, cites this paper)
Data-driven Parsing Evaluation for Child-Parent Interactions (2022, cites this paper)
Dependency Parsing Evaluation for Low-resource Spontaneous Speech (2021, cites this paper)
From the world to word order: Deriving biases in noun phrase order from statistical properties of the world (2020, cites this paper)
Children's Sentential Complement Use Leads the Theory of Mind Development Period: Evidence from the CHILDES Corpus (2019, influential citation)
Gold Standard Annotations for Preposition and Verb Sense with Semantic Role Labels in Adult-Child Interactions (2018, cites this paper)
Extensions to the GrETEL Treebank Query Application (2018, cites this paper)
Fluency Bank: A new resource for fluency research and practice. (2018, cites this paper)
The AnnCor CHILDES Treebank (2018, cites this paper)
Manual Versus Automated Narrative Analysis of Agrammatic Production Patterns: The Northwestern Narrative Language Analysis and Computerized Language Analysis. (2018, cites this paper)
Word learning and the acquisition of syntactic-semantic overhypotheses (2018, cites this paper)
TalkBank and CLARIN (2017, cites this paper)
Thinking About Multiword Constructions: Usage-Based Approaches to Acquisition and Processing (2017, cites this paper)
AphasiaBank as BigData (2016, cites this paper)
The contextual modulation of semantic information (2016, cites this paper)
A Data-driven Investigation of Corrective Feedback on Subject Omission Errors in First Language Acquisition (2016, cites this paper)
An Evaluation of POS Taggers for the CHILDES Corpus (2016, cites this paper)
Introduction: Cognitive Issues in Natural Language Processing (2016, cites this paper)
The Cambridge Handbook of Learner Corpus Research: Learner corpora and natural language processing (2015, cites this paper)
Foreebank: Syntactic Analysis of Customer Support Forums (2015, cites this paper)
Dependency annotation of coordination for learner language (2014, cites this paper)
Word categorization from distributional information: frames confer more than the sum of their (Bigram) parts. (2014, cites this paper)
1 Personal Details (2014, cites this paper)
Child Acquisition of Multiword Verbs: A Computational Investigation (2013, cites this paper)
On the Automatic Analysis of Learner Language: Introduction to the Special Issue (2013, cites this paper)
Computational Modeling as a Methodology for Studying Human Language Learning (2013, cites this paper)
Annotation for Learner English Guidelines, v. 0.1 (2013, cites this paper)
Complexity in Language Acquisition (2013, cites this paper)
The Syntax Parser GRASP for CHILDES (2013, cites this paper)
The Hebrew CHILDES corpus: transcription and morphological analysis (2013, cites this paper)
The PASCAL Challenge on Grammar Induction (2012, cites this paper)
Hierarchical Bayesian Models of Verb Learning in Children (2012, influential citation)
Automatically Learning Measures of Child Language Development (2012, cites this paper)
Defining Syntax for Learner Language Annotation (2012, cites this paper)
Combining the Sparsity and Unambiguity Biases for Grammar Induction (2012, influential citation)
Transferring Frames: Utilization of Linked Lexical Resources (2012, cites this paper)
Statistical construction learning: Does a Zipfian problem space ensure robust language learning? (2012, cites this paper)
The learnability of abstract syntactic principles. (2011, cites this paper)
Avoiding the Comparative Fallacy in the Annotation of Learner Corpora (2011, cites this paper)
Generalizing between form and meaning using learned verb classes (2011, cites this paper)
Measuring Language Development in Early Childhood Education: A Case Study of Grammar Checking in Child Language Transcripts (2011, cites this paper)
Factors Facilitating Implicit Learning: The Case of the Sesotho Passive (2010, cites this paper)
Learning verb alternations in a usage-based Bayesian model (2010, influential citation)
Mejora de la precisión para el análisis de dependencias usando Maltparser para el castellano (2010, cites this paper)
Automated analysis of the Cinderella story (2010, cites this paper)
From linear sequences to abstract structures: Distributional information in infant-direct speech (2010, cites this paper)
Analyzing language samples of Spanish–English bilingual children for the automated prediction of language dominance (2010, influential citation)
Syntactic Annotation of Learner Corpora (2010, cites this paper)
A Morphologically-Analyzed CHILDES Corpus of Hebrew (2010, influential citation)
Computational Models of Language Acquisition (2010, cites this paper)
Modelling the acquisition of verb polysemy in children (2009, cites this paper)
Dependency Annotation for Learner Corpora (2009, cites this paper)
Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories (TLT 7) (2009, cites this paper)
The emergence of linguistic complexity (2009, cites this paper)
The CHILDES Project Part 1: The CHAT Transcription Format (2009, cites this paper)
Wide-coverage parsing of speech transcripts (2009, cites this paper)
From Exemplar to Grammar: A Probabilistic Analogy-Based Model of Language Learning (2009, cites this paper)
Speaker Choice in Children’s Spontaneous Relative Clauses (2009, cites this paper)
Acquiring Multiword Verbs: The Role of Statistical Evidence (2009, cites this paper)
Enriching CHILDES for Morphosyntactic Analysis (2008, cites this paper)
of Early Argument Structure Acquisition (2008, cites this paper)
Grammar Induction & Language Evolution (2008, cites this paper)
From Exemplar to Grammar: Integrating Analogy and Probability in Language Learning (2008, influential citation)
A Probabilistic Model of Early Argument Structure Acquisition (2008, cites this paper)
The Talkbank Project (2007, cites this paper)
Explorations in Language Learning by Integrating Analogy and Probability (2007, influential citation)
Computational Grammar Acquisition from CHILDES Data Using a Probabilistic Parsing Model (year unknown, cites this paper)