Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools.

Methods: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts.

Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation.

Conclusion: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
Exploring the boundaries: gene and protein identification in biomedical text
J. Finkel, Shipra Dingare, Christopher D. Manning, M. Nissim, Beatrice Alex, Claire Grover
Published 2005 in BMC Bioinformatics
PUBLICATION RECORD
- Publication year: 2005
- Venue: BMC Bioinformatics
- Publication date: 2005-05-24
- Fields of study: Biology, Medicine, Computer Science
- Source metadata: Semantic Scholar, PubMed