Effective Self-Training for Parsing

David McClosky, Eugene Charniak, Mark Johnson

Published 2006 in North American Chapter of the Association for Computational Linguistics

ABSTRACT

We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
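The two improvement figures in the abstract are related by simple arithmetic. The sketch below is an illustration, not taken from the paper: it assumes the previous best f-score is the reported 92.1% minus the reported 1.1-point gain, and that "error" means 100 minus the f-score. The function name is hypothetical.

```python
def relative_error_reduction(old_f: float, new_f: float) -> float:
    """Fraction of the remaining error (100 - f-score) removed by the new model."""
    old_error = 100.0 - old_f
    new_error = 100.0 - new_f
    return (old_error - new_error) / old_error


new_f = 92.1            # f-score reported in the abstract
absolute_gain = 1.1     # absolute improvement reported in the abstract
old_f = new_f - absolute_gain  # assumed previous best, about 91.0

reduction = relative_error_reduction(old_f, new_f)
print(f"{reduction:.1%}")  # roughly 12%, in line with the abstract's figure
```

Under these assumptions, a 1.1-point gain on a baseline with about 9 points of remaining error removes roughly one eighth of that error, which matches the 12% the abstract states.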

PUBLICATION RECORD

Publication year
2006
Venue
North American Chapter of the Association for Computational Linguistics
Publication date
2006-06-04
Fields of study
Computer Science
Identifiers
DOI 10.3115/1220835.1220855
Source metadata
Semantic Scholar

REFERENCES

MAP adaptation of stochastic grammars
2006cited by this paper
Better k-best Parsing
2005cited by this paper
Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking
2005influential reference
Discriminative Training of a Neural Network Statistical Parser
2004cited by this paper
Bootstrapping statistical parsers from small datasets
2003influential reference
Bootstrapping POS-taggers using unlabelled data
2003cited by this paper
A Generative Constituent-Context Model for Improved Grammar Induction
2002cited by this paper
Corpus Variation and Parser Performance
2001cited by this paper
PAC Generalization Bounds for Co-training
2001cited by this paper
Applying Co-Training Methods to Statistical Parsing
2001cited by this paper
A Maximum-Entropy-Inspired Parser
2000cited by this paper
Discriminative Reranking for Natural Language Parsing
2000cited by this paper
Computation of the N Best Parse Trees for Weighted and Stochastic Context-Free Grammars
2000cited by this paper
Estimators for Stochastic “Unification-Based” Grammars
1999cited by this paper
Statistical Parsing with a Context-Free Grammar and Word Statistics
1997cited by this paper
An Empirical Study of Smoothing Techniques for Language Modeling
1996cited by this paper
Building a Large Annotated Corpus of English: The Penn Treebank
1993cited by this paper
Structural Ambiguity and Lexical Relations
1991cited by this paper
of the Association for Computational Linguistics
year unknowncited by this paper

CITED BY

PLLM: Pseudo-Labeling Large Language Models for CAD Program Synthesis
2026cites this paper
Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
2026cites this paper
Semi-supervised learning for dose prediction in targeted radionuclide therapy: a synthetic data study
2026cites this paper
Dual-view cross attention enhanced semi-supervised learning method for discourse cognitive engagement classification in online course discussions
2025cites this paper
An open-set few-shot face recognition framework with balanced adaptive cohesive mixing
2025cites this paper
Manod: A multi-modal anomaly detection framework for distributed system
2025cites this paper
Semi-supervised federated learning for collaborative security threat detection in control system for distributed power generation
2025cites this paper
Revisiting semi-supervised learning in the era of foundation models
2025cites this paper
A pseudo-labeling approach based on knowledge distillation for graph few-shot learning
2025cites this paper
A Dual-Channel Iterative Method Integrating Semi-supervised Self-Training and Interpretable Deep Learning Models for Mineral Prospectivity Prediction
2025cites this paper
Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing
2025cites this paper
CACE: Sim-to-Real Indoor 3D Semantic Segmentation via Context-Aware Augmentation and Consistency Enforcement
2025cites this paper
PropitterX: a Twitter-based propaganda corpus extended with multiple contextual features
2025cites this paper
Progressive low-confidence pseudolabeling for semisupervised node classification
2025cites this paper
CGMatch: A Different Perspective of Semi-supervised Learning
2025cites this paper
Semi-Supervised Learning for Dose Prediction in Targeted Radionuclide: A Synthetic Data Study
2025cites this paper
Cross-Cloud Consistency for Weakly Supervised Point Cloud Semantic Segmentation
2025cites this paper
Semi-supervised medical image classification via distance correlation minimization and graph attention regularization
2024cites this paper
Cross-Domain Learning for Video Anomaly Detection with Limited Supervision
2024cites this paper
PL-MCT: pseudo-labeling and multi-frame consistency training for semi-supervised visual tracking
2024cites this paper
Named entity recognition using transfer learning and small human‐ and meta‐pseudo‐labeled datasets
2024cites this paper
Leveraging the Structure of Pre-trained Embeddings to Minimize Annotation Effort
2024cites this paper
Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale
2024cites this paper
Training-Free Unsupervised Prompt for Vision-Language Models
2024cites this paper
Hierarchical Differential Amplifier Contrastive Learning for Semi-supervised Extractive Summarization
2024cites this paper
A Semi-Supervised Method for Grain Boundary Segmentation: Teacher–Student Knowledge Distillation and Pseudo-Label Repair
2024cites this paper
Depression symptoms modelling from social media text: an LLM driven semi-supervised learning approach
2024cites this paper
Graph augmentation for node-level few-shot learning
2024cites this paper
Deep Confident Steps to New Pockets: Strategies for Docking Generalization
2024cites this paper
Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing
2024cites this paper
Unsupervised Morphological Tree Tokenizer
2024cites this paper
Exploring Inherent Consistency for Semi-Supervised Anatomical Structure Segmentation in Medical Imaging
2024cites this paper
Extraction of Event Structures from Text
2024cites this paper
Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
2024cites this paper
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
2024cites this paper
Cross-domain Constituency Parsing by Leveraging Heterogeneous Data
2024cites this paper
Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation
2024cites this paper
Generalized Uncertainty of Deep Neural Networks: Taxonomy and Applications
2023cites this paper
ReFixMatch-LS: reusing pseudo-labels for semi-supervised skin lesion classification
2023cites this paper
Toolformer: Language Models Can Teach Themselves to Use Tools
2023cites this paper
Unsupervised domain adaptation for object detection through mixed-domain and co-training learning
2023cites this paper
Data-driven dependency parsing of Vedic Sanskrit
2023cites this paper
Exploiting Censored Information in Self-Training for Time-to-Event Prediction
2023cites this paper
Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity
2023cites this paper
Multistage Collaborative Knowledge Distillation from Large Language Models
2023cites this paper
Approximately Bayes-optimal pseudo-label selection
2023cites this paper
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback
2023cites this paper
Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation
2023cites this paper
Student as an Inherent Denoiser of Noisy Teacher
2023cites this paper
SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning
2023cites this paper
KD-Fixmatch: Knowledge Distillation Siamese Neural Networks
2023cites this paper
A Semi-Supervised Approach for Power System Event Identification
2023cites this paper
In all likelihoods: robust selection of pseudo-labeled data
2023cites this paper
Speech Emotion Recognition based on Semi-Supervised Adversarial Variational Autoencoder
2023cites this paper
MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins
2023cites this paper
MARRS: Modern Backbones Assisted Co-training for Rapid and Robust Semi-Supervised Domain Adaptation
2023cites this paper
Class-aware progressive self-training for learning convolutional networks on graphs
2023cites this paper
PHM-IRNET: Self-training thermal segmentation approach for thermographic inspection of industrial components
2023cites this paper
Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training
2023cites this paper
Progressive cross-domain knowledge distillation for efficient unsupervised domain adaptive object detection
2023cites this paper
Extrinsic Factors Affecting the Accuracy of Biomedical NER
2023cites this paper
Friend-training: Learning from Models of Different but Related Tasks
2023cites this paper
FixMatch-LS: Semi-supervised skin lesion classification with label smoothing
2023cites this paper
In all LikelihoodS: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning
2023cites this paper
Reducing cohort bias in natural language understanding systems with targeted self-training scheme
2023cites this paper
Don't Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner
2023cites this paper
Self-Healing Through Error Detection, Attribution, and Retraining
2023cites this paper
SCENE: Self-Labeled Counterfactuals for Extrapolating to Negative Examples
2023cites this paper
Rethinking Semi-supervised Learning with Language Models
2023cites this paper
ASPER: Answer Set Programming Enhanced Neural Network Models for Joint Entity-Relation Extraction
2023cites this paper
Approximate Bayes Optimal Pseudo-Label Selection
2023cites this paper
Progressive Feature Upgrade in Semi-supervised Learning on Tabular Domain
2022cites this paper
A Semi-supervised Deep Learning Model with Consistency Regularization of Augmented Samples for Imbalanced Fault Detection
2022cites this paper
Source-Free Domain Adaptation for Question Answering with Masked Self-training
2022cites this paper
On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey
2022cites this paper
Zero-Label Prompt Selection
2022cites this paper
Weakly Supervised Training of Universal Visual Concepts for Multi-domain Semantic Segmentation
2022cites this paper
Predicting Survival Outcomes in the Presence of Unlabeled Data
2022cites this paper
Deep semi-supervised multiple instance learning with self-correction for DME classification from OCT images
2022cites this paper
Automatic Rule Induction for Efficient and Interpretable Semi-Supervised Learning
2022cites this paper
Dual-feature-embeddings-based semi-supervised learning for cognitive engagement classification in online course discussions
2022cites this paper
Investigating Semi-Supervised Learning Algorithms in Text Datasets
2022cites this paper
Improving Low-resource RRG Parsing with Cross-lingual Self-training
2022cites this paper
A Comparison of Strategies for Source-Free Domain Adaptation
2022cites this paper
Depression Symptoms Modelling from Social Media Text: A Semi-supervised Learning Approach
2022cites this paper
Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent
2022influential citation
DuIVA: An Intelligent Voice Assistant for Hands-free and Eyes-free Voice Interaction with the Baidu Maps App
2022cites this paper
Coordination Generation via Synchronized Text-Infilling
2022cites this paper
STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction
2022cites this paper
Cross-domain Aspect-based Sentiment Analysis with Multimodal Sources
2022cites this paper
Generating unlabelled data for a tri-training approach in a low resourced NER task
2022cites this paper
Unsupervised Domain Adaptation for Question Generation with Domain Data Selection and Self-training
2022cites this paper
HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization
2022cites this paper
Automatic Rule Induction for Efficient Semi-Supervised Learning
2022cites this paper
Semi-supervised Domain Adaptation for Dependency Parsing with Dynamic Matching Network
2022cites this paper
SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser
2022cites this paper
Improve Event Extraction via Self-Training with Gradient Guidance
2022cites this paper
Learning with Limited Text Data
2022cites this paper
Improving Code-Switching Dependency Parsing with Semi-Supervised Auxiliary Tasks
2022influential citation
Measuring Annotator Agreement Generally across Complex Structured, Multi-object, and Free-text Annotation Tasks
2022cites this paper