We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g. obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supervised CRF in this case.
Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling
Feng Jiao,Shaojun Wang,Chi-Hoon Lee,R. Greiner,Dale Schuurmans
Published 2006 in Annual Meeting of the Association for Computational Linguistics
ABSTRACT
PUBLICATION RECORD
- Publication year
2006
- Venue
Annual Meeting of the Association for Computational Linguistics
- Publication date
2006-07-17
- Fields of study
Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-22 of 22 references · Page 1 of 1