Text Chunking using Regularized Winnow
Tong Zhang, Fred J. Damerau, David E. Johnson
Published 2001 in Annual Meeting of the Association for Computational Linguistics
ABSTRACT
Many machine learning methods have recently been applied to natural language processing tasks. Among them, the Winnow algorithm has been argued to be particularly suitable for NLP problems, due to its robustness to irrelevant features. However, in theory, Winnow may not converge on non-separable data. To remedy this problem, a modification called regularized Winnow has been proposed. In this paper, we apply this new method to text chunking. We show that it achieves state-of-the-art performance with significantly less computation than previous approaches.
PUBLICATION RECORD
- Publication year
2001
- Venue
Annual Meeting of the Association for Computational Linguistics
- Publication date
2001-07-06
- Fields of study
Computer Science
- Source metadata
Semantic Scholar
CONCEPTS
- computation
The amount of processing required to train or apply a method on the text chunking task.
Aliases: computational cost
- non-separable data
Training data that cannot be perfectly separated by a linear classifier.
- previous approaches
Earlier text chunking methods used as comparison baselines in the paper.
Aliases: prior approaches, earlier approaches
- regularized winnow
A modified Winnow variant introduced to handle non-separable data more robustly.
- state-of-the-art performance
The highest reported level of effectiveness on the evaluated text chunking task.
Aliases: SOTA performance, SOTA
- text chunking
The natural language processing task of segmenting text into syntactic chunks.
Aliases: chunking
- winnow algorithm
A linear learning algorithm used for feature-based classification and discussed here as the method being regularized.
Aliases: Winnow
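The concept entries above describe Winnow as a linear learning algorithm with multiplicative weight updates. As a rough illustration only (the classic mistake-driven Winnow rule, not the paper's regularized variant), a minimal sketch on binary features might look like the following; the function names, promotion factor `alpha`, and threshold choice `theta = d` are illustrative assumptions, not details taken from the paper:

```python
def winnow_train(X, y, alpha=2.0, epochs=10):
    """Classic Winnow: multiplicative updates on mistakes.

    X: list of binary feature vectors (0/1 entries).
    y: labels in {+1, -1}.
    alpha: promotion/demotion factor (assumed value, commonly 2).
    """
    d = len(X[0])
    w = [1.0] * d        # all weights start at 1
    theta = float(d)     # a common textbook threshold choice
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x))
            pred = 1 if score >= theta else -1
            if pred != label:
                if label == 1:   # promotion: scale up active weights
                    w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
                else:            # demotion: scale down active weights
                    w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return w, theta

def winnow_predict(w, theta, x):
    """Predict +1 if the weighted sum of active features reaches theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1
```

Because updates are multiplicative, weights on irrelevant features stay small, which is the robustness property the abstract cites; the regularized variant studied in the paper additionally handles data this plain rule may never converge on.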