Self-Training PCFG Grammars with Latent Annotations Across Languages

Published 2009 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

We investigate the effectiveness of self-training PCFG grammars with latent annotations (PCFG-LA) for parsing languages with different amounts of labeled training data. Compared to Charniak's lexicalized parser, the PCFG-LA parser was more effectively adapted to a language for which parsing has been less well developed (i.e., Chinese) and benefited more from self-training. We show for the first time that self-training is able to significantly improve the performance of the PCFG-LA parser, a single generative parser, on both small and large amounts of labeled training data. Our approach achieves state-of-the-art parsing accuracies for a single parser on both English (91.5%) and Chinese (85.2%).

PUBLICATION RECORD

Publication year
2009
Venue
Conference on Empirical Methods in Natural Language Processing
Publication date
2009-08-06
Fields of study
Linguistics, Computer Science
Identifiers
DOI 10.3115/1699571.1699621
Source metadata
Semantic Scholar

REFERENCES

Chinese Statistical Parsing
2009, cited by this paper
Optimizing Chinese Word Segmentation for Machine Translation Performance
2008, cited by this paper
Semi-Supervised Convex Training for Dependency Parsing
2008, cited by this paper
Simple Semi-supervised Dependency Parsing
2008, cited by this paper
Forest Reranking: Discriminative Parsing with Non-Local Features
2008, cited by this paper
Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing
2008, cited by this paper
When is Self-Training Effective for Parsing?
2008, cited by this paper
Mandarin Part-of-Speech Tagging and Discriminative Reranking
2007, cited by this paper
Improved Inference for Unlexicalized Parsing
2007, influential reference
Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets
2007, cited by this paper
Effective Self-Training for Parsing
2006, influential reference
Learning Accurate, Compact, and Interpretable Tree Annotation
2006, influential reference
Tregex and Tsurgeon: tools for querying and manipulating tree data structures
2006, cited by this paper
Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking
2005, cited by this paper
Probabilistic CFG with Latent Annotations
2005, cited by this paper
The Penn Chinese TreeBank: Phrase structure annotation of a large corpus
2005, influential reference
Bootstrapping statistical parsers from small datasets
2003, cited by this paper
Head-Driven Statistical Models for Natural Language Parsing
2003, cited by this paper
Is it Harder to Parse Chinese, or the Chinese Treebank?
2003, influential reference
A Maximum-Entropy-Inspired Parser
2000, cited by this paper
Two Statistical Parsing Models Applied to the Chinese Treebank
2000, cited by this paper
Statistical Parsing with a Context-Free Grammar and Word Statistics
1997, cited by this paper

CITED BY

Transforming Human-Machine Interaction: Generative AI Virtual Asst
2024, cites this paper
Curriculum-Style Fine-Grained Adaption for Unsupervised Cross-Lingual Dependency Transfer
2023, cites this paper
Out-of-Domain Discourse Dependency Parsing via Bootstrapping: An Empirical Analysis on Its Effectiveness and Limitation
2022, cites this paper
Improving Low-resource RRG Parsing with Cross-lingual Self-training
2022, cites this paper
Self-training For Pre-training Language Models
2021, cites this paper
FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning
2021, cites this paper
Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference
2020, cites this paper
Reciprocal Supervised Learning Improves Neural Machine Translation
2020, cites this paper
Predictions For Pre-training Language Models
2020, cites this paper
Towards Scalable Image Classifier Learning with Noisy Labels via Domain Adaptation
2020, cites this paper
slimIPL: Language-Model-Free Iterative Pseudo-Labeling
2020, cites this paper
Zero-shot Text Classification via Reinforced Self-training
2020, cites this paper
Capturing document context inside sentence-level neural machine translation models with self-training
2020, cites this paper
RBN: enhancement in language attribute prediction using global representation of natural language transfer learning technology like Google BERT
2019, cites this paper
Revisiting Self-Training for Neural Sequence Generation
2019, cites this paper
Dynamic Self-training Framework for Graph Convolutional Networks
2019, cites this paper
Smooth Methods in Head-Driven Statistical Models for Parsing
2019, cites this paper
Two Local Models for Neural Constituent Parsing
2018, influential citation
Learning How to Self-Learn: Enhancing Self-Training Using Neural Reinforcement Learning
2018, cites this paper
Adaptive Semantic Segmentation with a Strategic Curriculum of Proxy Labels
2018, cites this paper
The Devil is in the Details: Parsing Unknown German Words
2017, cites this paper
Attention is All you Need
2017, cites this paper
Latent-Variable PCFGs: Background and Applications
2017, cites this paper
Building a Treebank for Vietnamese Syntactic Parsing
2017, cites this paper
Improving Shift‐Reduce Phrase‐Structure Parsing with Constituent Boundary Information
2017, cites this paper
Shift-Reduce Constituent Parsing with Neural Lookahead Features
2016, cites this paper
Recurrent Neural Network Grammars
2016, cites this paper
Enhancing Shift-Reduce Constituent Parsing with Action N-Gram Model
2016, cites this paper
The Role of Intended Audience in Determining Modality Type: A Study in Relation to the Iranian Constitution
2016, cites this paper
Irish dependency treebanking and parsing
2016, cites this paper
Building the Vietnamese Phrase Treebank by Improved Probabilistic Context-Free Grammars
2016, cites this paper
Iterative parameter mixing for distributed large-margin training of structured predictors for natural language processing
2015, cites this paper
POS tagging of Chinese Buddhist texts using Recurrent Neural Networks
2015, cites this paper
Modeling Reportable Events as Turning Points in Narrative
2015, cites this paper
Parser self-training for syntax-based machine translation
2015, cites this paper
The role of syntax and semantics in machine translation and quality estimation of machine-translated user-generated content
2015, cites this paper
Graph-Based Lexicon Regularization for PCFG With Latent Annotations
2015, cites this paper
Effective Use of Cross-Domain Parsing in Automatic Speech Recognition and Error Detection
2015, cites this paper
Mapping Unseen Words to Task-Trained Embedding Spaces
2015, cites this paper
Training with Auto-parsed Whole Trees
2015, cites this paper
Structured Training for Neural Network Transition-Based Parsing
2015, cites this paper
Domain adaptation for parsing in automatic speech recognition
2014, cites this paper
Chinese Unknown Word Recognition for PCFG-LA Parsing
2014, cites this paper
Efficient Latent-variable Grammars: Learning and Inference
2014, cites this paper
A word clustering approach to domain adaptation: Robust parsing of source and target domains
2014, cites this paper
Parsing low-resource languages using Gibbs sampling for PCFGs with latent annotations
2014, cites this paper
Grammar as a Foreign Language
2014, influential citation
Joint POS Tagging and Transition-based Constituent Parsing in Chinese with Non-local Features
2014, influential citation
Learning latent variable grammars from complementary perspectives
2014, cites this paper
Lexicon expansion for latent variable grammars
2014, cites this paper
Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
2014, influential citation
Exploiting limited data for parsing
2014, cites this paper
Fast and Accurate Shift-Reduce Constituent Parsing
2013, cites this paper
Iterative Transformation of Annotation Guidelines for Constituency Parsing
2013, cites this paper
Word Segmentation, Unknown-word Resolution, and Morphological Agreement in a Hebrew Parsing System
2013, influential citation
Semi-Supervised Learning and Domain Adaptation in Natural Language Processing
2013, cites this paper
Augmented Parsing of Unknown Word by Graph-Based Semi-Supervised Learning
2013, cites this paper
Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
2013, cites this paper
Working with a small dataset - semi-supervised dependency parsing for Irish
2013, cites this paper
Improving shift-reduce constituency parsing with large-scale unlabeled data
2013, cites this paper
Embedding epistemic modals in English: A corpus-based study
2012, cites this paper
Practical and efficient incorporation of syntactic features into statistical language models
2012, cites this paper
Self-Training Tree Substitution Grammars for Domain Adaptation
2012, influential citation
Phrase Parses Reranking Based on Higher-Order Lexical Dependencies
2012, influential citation
SELECTIVE LEARNING IN THE ACQUISITION OF KANNADA
2012, cites this paper
Revisiting the Case for Explicit Syntactic Information in Language Models
2012, cites this paper
NAACL-HLT 2012 WLM 2012: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT Workshop Notes
2012, cites this paper
Exploiting Lexical Dependencies from Large-Scale Data for Better Shift-Reduce Constituency Parsing
2012, influential citation
Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT - Workshop Notes
2012, cites this paper
Intégration de ressources lexicales riches dans un analyseur syntaxique probabiliste. (Integration of lexical resources in a probabilistic parser)
2012, cites this paper
Learning sub-word units and exploiting contextual information for open vocabulary speech recognition
2011, cites this paper
Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
2011, influential citation
Efficient discriminative training of long-span language models
2011, cites this paper
Parse Reranking Based on Higher-Order Lexical Dependencies
2011, cites this paper
Generalized Interpolation in Decision Tree LM
2011, cites this paper
Coarse-to-Fine Natural Language Processing
2011, influential citation
Data point selection for self-training
2011, cites this paper
The Second Workshop on Statistical Parsing of Morphologically Rich Languages ( SPMRL
2011, cites this paper
From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0
2011, cites this paper
Feature-Rich Log-Linear Lexical Model for Latent Variable PCFG Grammars
2011, influential citation
Modeling Dependencies in Natural Languages with Latent Variables
2011, cites this paper
OOV Sensitive Named-Entity Recognition in Speech
2011, cites this paper
Comparing the Use of Edited and Unedited Text in Parser Self-Training
2011, cites this paper
Selective learning in the acquisition of Kannada ditransitives
2011, cites this paper
Syntactic Decision Tree LMs: Random Selection or Intelligent Design?
2011, cites this paper
Improving Part-of-speech Tagging for Context-free Parsing
2011, cites this paper
Self-Training with Products of Latent Variable Grammars
2010, influential citation
Contextual Information Improves OOV Detection in Speech
2010, cites this paper
Handling Unknown Words in Statistical Latent-Variable Parsing Models for Arabic, English and French
2010, cites this paper
Appropriately Handled Prosodic Breaks Help PCFG Parsing
2010, cites this paper
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics and Writing: Writing Process
2010, cites this paper
Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions
2010, cites this paper
Learning Simple Wikipedia: A Cogitation in Ascertaining Abecedarian Language
2010, cites this paper
Uptraining for Accurate Deterministic Question Parsing
2010, cites this paper
Better Arabic Parsing: Baselines, Evaluations, and Analysis
2010, cites this paper
Treebank Conversion based Self-training Strategy for Parsing
2010, cites this paper
Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids (CL&W 2010)
2010, cites this paper
Products of Random Latent Variable Grammars
2010, cites this paper
Phrase Structure Parsing with Dependency Structure
2010, cites this paper
A Parse-and-Trim Approach with Information Significance for Chinese Sentence Compression
2009, cites this paper