On the Role of Lexical Features in Sequence Labeling

Published 2009 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

We use the technique of SVM anchoring to demonstrate that lexical features extracted from a training corpus are not necessary to obtain state-of-the-art results on tasks such as Named Entity Recognition and Chunking. While standard models require as many as 100K distinct features, we derive models with as few as 1K features that perform as well as or better than the standard models on different domains. These robust reduced models indicate that the way rare lexical features contribute to classification in NLP is not fully understood. Contrastive error analysis (with and without lexical features) indicates that lexical features do contribute to resolving some semantic and complex syntactic ambiguities, but we find that this contribution does not generalize outside the training corpus. As a general strategy, we believe lexical features should not be derived directly from a training corpus but should instead be carefully inferred and selected from other sources.
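The gap between 100K and 1K distinct features can be pictured with a toy feature extractor. The sketch below is only an illustration, not the authors' feature set or their SVM anchoring procedure: the window size, the feature templates, the function names `token_features` and `feature_inventory`, and the two-sentence corpus are all assumptions made for the example. It shows why word-identity (lexical) features dominate the feature inventory: their number grows with the vocabulary, while part-of-speech features draw on a small closed tag set.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def token_features(sent: List[Token], i: int, lexical: bool) -> Set[str]:
    """Features for token i from a +/-1 window (hypothetical templates).

    POS features are always emitted; word-identity features only when
    `lexical` is True.
    """
    feats: Set[str] = set()
    for off in (-1, 0, 1):
        j = i + off
        if 0 <= j < len(sent):
            word, pos = sent[j]
            feats.add(f"pos[{off}]={pos}")
            if lexical:
                feats.add(f"word[{off}]={word.lower()}")
    return feats


def feature_inventory(corpus: List[List[Token]], lexical: bool) -> Set[str]:
    """Set of distinct features observed anywhere in the corpus."""
    inventory: Set[str] = set()
    for sent in corpus:
        for i in range(len(sent)):
            inventory |= token_features(sent, i, lexical)
    return inventory


# Invented toy data: same tag sequence, entirely different words.
TOY_CORPUS: List[List[Token]] = [
    [("The", "DT"), ("bank", "NN"), ("raised", "VBD"), ("rates", "NNS")],
    [("A", "DT"), ("broker", "NN"), ("sold", "VBD"), ("shares", "NNS")],
]

with_words = feature_inventory(TOY_CORPUS, lexical=True)
without_words = feature_inventory(TOY_CORPUS, lexical=False)
print(len(with_words), len(without_words))
```

In this toy setting every new sentence adds new word features but no new POS features, which mirrors, in miniature, how a lexicalized model's feature count scales with corpus vocabulary while a reduced model's does not.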

PUBLICATION RECORD

Publication year: 2009
Venue: Conference on Empirical Methods in Natural Language Processing
Publication date: 2009-08-06
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.3115/1699648.1699660
Source metadata: Semantic Scholar
