Deep learning with word embeddings improves biomedical named entity recognition

Maryam Habibi, Leon Weber, M. Neves, D. Wiegandt, U. Leser

Published 2017 in Bioinform.

ABSTRACT

Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult.

Results: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall.

Availability and implementation: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/.

Contact: habibima@informatik.hu-berlin.de
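The abstract compares tools by F1-score and attributes the gain mostly to recall. As a rough illustration of how such scores are commonly computed for NER, the following is a minimal sketch that assumes BIO-style tags and exact-match scoring of entity spans. The tagging scheme, the matching criterion, and the helper names `bio_to_spans` and `prf1` are assumptions made for illustration; this record does not state the evaluation protocol the paper used.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # The trailing "O" is a sentinel that closes an entity ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over two sets of spans."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: the tagger finds the gene but misses the disease.
gold_tags = ["B-Gene", "I-Gene", "O", "B-Disease", "O"]
pred_tags = ["B-Gene", "I-Gene", "O", "O", "O"]
precision, recall, f1 = prf1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
```

In this toy case precision is perfect while recall is halved by the missed entity, which is the kind of gap a recall-driven F1 improvement would close.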

PUBLICATION RECORD

Publication year
2017
Venue
Bioinform.
Publication date
2017-07-12
Fields of study
Medicine, Computer Science
Identifiers
DOI 10.1093/bioinformatics/btx228 PMID 28881963 PMCID 5870729
Source metadata
Semantic Scholar, PubMed

REFERENCES

Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task
2016, cited by this paper
BioCreative V CDR task corpus: a resource for chemical disease relation extraction
2016, cited by this paper
Bidirectional LSTM-CRF for Clinical Concept Extraction
2016, cited by this paper
Exploring the Limits of Language Modeling
2016, cited by this paper
TaggerOne: joint named entity recognition and normalization with semi-Markov Models
2016, influential reference
vSDC: a method to improve early recognition in virtual screening when limited experimental resources are available
2016, cited by this paper
Neural Architectures for Named Entity Recognition
2016, cited by this paper
Recognizing chemicals in patents: a comparative analysis
2016, cited by this paper
An Investigation of Recurrent Neural Architectures for Drug Name Recognition
2016, cited by this paper
Drug Name Recognition: Approaches and Resources
2015, cited by this paper
The CHEMDNER corpus of chemicals and drugs and its annotation principles
2015, cited by this paper
Combining Conditional Random Fields and Word Embeddings for the CHEMDNER-patents task
2015, cited by this paper
Linked annotations: a middle ground for manual curation of biomedical databases and text corpora
2015, cited by this paper
Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling
2015, cited by this paper
CHEMDNER: The drugs and chemical names extraction challenge
2015, cited by this paper
Semi-supervised Sequence Learning
2015, cited by this paper
A Computational Framework for 3D Mechanical Modeling of Plant Morphogenesis with Cellular Resolution
2015, cited by this paper
Overview of the CHEMDNER patents task
2015, cited by this paper
miRTex: A Text Mining System for miRNA-Gene Relation Extraction
2015, cited by this paper
Cell line name recognition in support of the identification of synthetic lethality in cancer from text
2015, cited by this paper
tmChem: a high performance approach for chemical named entity recognition and normalization
2015, cited by this paper
Effects of Semantic Features on Machine Learning-Based Drug Name Recognition Systems: Word Embeddings vs. Manually Constructed Dictionaries
2015, cited by this paper
Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics
2015, cited by this paper
Surgical treatment of CRPS
2015, cited by this paper
Chemical named entities recognition: a review on approaches and applications
2014, cited by this paper
Evaluation and Integration of Genetic Signature for Prediction Risk of Nasopharyngeal Carcinoma in Southern China
2014, cited by this paper
Detecting miRNA Mentions and Relations in Biomedical Literature
2014, cited by this paper
Human symptoms–disease network
2014, cited by this paper
Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
2014, cited by this paper
Annotated Chemical Patent Corpus: A Gold Standard for Text Mining
2014, cited by this paper
NCBI disease corpus: A resource for disease name recognition and concept normalization
2014, cited by this paper
Natural History of Malignant Bone Disease in Hepatocellular Carcinoma: Final Results of a Multicenter Bone Metastasis Survey
2014, cited by this paper
Reply to Science-based risk assessment requires careful evaluation of all studies
2013, cited by this paper
Gimli: open source and high-performance biomedical name recognition
2013, cited by this paper
Distributional Semantics Resources for Biomedical Text Processing
2013, cited by this paper
How to Construct Deep Recurrent Neural Networks
2013, cited by this paper
DNorm: disease name normalization with pairwise learning to rank
2013, cited by this paper
Overview of the chemical compound and drug name recognition (CHEMDNER) task
2013, cited by this paper
The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text
2013, cited by this paper
Named Entity Recognition: A Survey of Machine-Learning Tools
2012, influential reference
Annotating and Evaluating Text for Stem Cell Research
2012, cited by this paper
ChemSpot: a hybrid system for chemical named entity recognition
2012, cited by this paper
Size (and Domain) Matters: Evaluating Semantic Word Space Representations for Biomedical Text
2012, cited by this paper
Theory and Applications for Advanced Text Mining
2012, cited by this paper
A novel method for assigning functional linkages to proteins using enhanced phylogenetic trees
2011, cited by this paper
Evidence for classification of c.1852_1853AA>GC in MLH1 as a neutral variant for Lynch syndrome
2011, cited by this paper
2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text
2011, cited by this paper
Disambiguating the species of biomedical named entities using natural language parsers
2010, cited by this paper
A Proposal for a Configurable Silver Standard
2010, cited by this paper
LINNAEUS: A species name identification system for biomedical literature
2010, cited by this paper
An Empirical Evaluation of Resources for the Identification of Diseases and Adverse Effects in Biomedical Literature
2010, cited by this paper
A dictionary to identify small molecules and drugs in free text
2009, cited by this paper
OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature
2008, cited by this paper
Overview of BioCreative II gene mention recognition
2008, cited by this paper
BioInfer: a corpus for information extraction in the biomedical domain
2007, cited by this paper
BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition
2007, influential reference
Gene prioritization through genomic data fusion
2006, cited by this paper
ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text
2005, cited by this paper
What makes a gene name? Named entity recognition in the biomedical literature
2005, cited by this paper
2005 Special Issue: Framewise phoneme classification with bidirectional LSTM and other neural network architectures
2005, influential reference
Introduction to the Bio-entity Recognition Task at JNLPBA
2004, cited by this paper
Confidence Estimation for Information Extraction
2004, cited by this paper
Integrated Annotation for Biomedical Information Extraction
2004, cited by this paper
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
2003, cited by this paper
A Biological Named Entity Recognizer
2002, cited by this paper
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
2001, cited by this paper
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
2001, cited by this paper
Mining MEDLINE: Abstracts, Sentences, or Phrases?
2001, cited by this paper
Long Short-Term Memory
1997, influential reference
A tutorial on hidden Markov models and selected applications in speech recognition
1989, cited by this paper
On the suitability of minimum and product operators for the intersection of fuzzy sets
1979, cited by this paper

CITED BY

Real-Time Named Entity Recognition from Textual Electronic Clinical Records in Cancer Therapy Using Low-Latency Neural Networks
2026, cites this paper
OpenBioNER-v2: A Suite of Lightweight Models for Zero-Shot Medical Named Entity Recognition via Type Descriptions
2026, cites this paper
A Study on Building Efficient Zero-Shot Relation Extraction Models
2026, cites this paper
From Patient Emotion Recognition to Provider Understanding: A Multimodal Data Mining Framework for Emotion-Aware Clinical Counseling Systems
2026, cites this paper
Chinese medical named entity recognition integrating adversarial training and feature enhancement
2025, cites this paper
iEnhancer-GDM: A Deep Learning Framework Based on Generative Adversarial Network and Multi-head Attention Mechanism to Identify Enhancers and Their Strength
2025, cites this paper
A refined set of RxNorm drug names for enhancing unstructured data analysis in drug safety surveillance
2025, cites this paper
Optimized Domain-Specific Text Processing with Keyword Knowledge Distillation (KKD)
2025, cites this paper
DREaM: Drug-Drug Relation Extraction via Transfer Learning Method
2025, cites this paper
Med-VLM: Enhancing Medical Image Segmentation Accuracy Through Vision-Language Model
2025, cites this paper
BERT applications in natural language processing: a review
2025, cites this paper
OpenBioNER: Lightweight Open-Domain Biomedical Named Entity Recognition Through Entity Type Description
2025, cites this paper
Psychomedical named entity recognition method based on multi-level feature extraction and multi-granularity embedding fusion
2025, cites this paper
MedicalBERT: enhancing biomedical natural language processing using pretrained BERT-based model
2025, cites this paper
A study on the methods of terminology expansion based on inclusion and exclusion criteria
2025, cites this paper
Effective Multi-Task Learning for Biomedical Named Entity Recognition
2025, cites this paper
Context-Aware Multimodal Representation Learning for Spatio-Temporally Explicit Environmental Modelling
2025, cites this paper
MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graph
2025, cites this paper
Automated SNOMED CT Concept Annotation in Clinical Text Using Bi-GRU Neural Networks
2025, cites this paper
Deep learning for named entity recognition in extracting critical information from struck-by accidents in construction
2025, cites this paper
Joint Extraction of Uygur Medicine Knowledge with Edge Computing
2025, cites this paper
Automated Data Harmonization in Clinical Research: Natural Language Processing Approach
2025, cites this paper
lasigeBioTM at BioASQ25 Task GutBrainIE - Lean Large Language Models with Syntactic Features
2025, cites this paper
CORE-NER: LLM-Based Character-Oriented Reference Enhancement for Chemical Named Entity Recognition
2025, cites this paper
Explore the Chinese Named Entity Recognition of goods categories applied to Freight Network Data
2025, cites this paper
A Comprehensive Study on the Use of Word Embedding Models in Software Engineering Domain
2025, cites this paper
Named entity recognition and relationship extraction for biomedical text: A comprehensive survey, recent advancements, and future research directions
2024, cites this paper
One-shot Biomedical Named Entity Recognition via Knowledge-Inspired Large Language Model
2024, cites this paper
GRU-SCANET: unleashing the power of GRU-based sinusoidal capture network for precision-driven named entity recognition
2024, cites this paper
TransformDDI: The Transformer-Based Joint Multi-Task Model for End-to-End Drug-Drug Interaction Extraction
2024, cites this paper
Joint Extraction of Uyghur Medicine Knowledge with Edge Computing
2024, cites this paper
Data governance and Gensini score automatic calculation for coronary angiography with deep-learning-based natural language extraction
2024, cites this paper
Evaluation of Natural Language Processing Techniques for Information Retrieval
2024, cites this paper
A clinical named entity recognition model using pretrained word embedding and deep neural networks
2024, cites this paper
Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA
2024, cites this paper
BioBBC: a multi-feature model that enhances the detection of biomedical entities
2024, cites this paper
High-Throughput Phenotyping of Clinical Text Using Large Language Models
2024, cites this paper
Advancing language models through domain knowledge integration: a comprehensive approach to training, evaluation, and optimization of social scientific neural word embeddings
2024, cites this paper
HunFlair2 in a cross-corpus evaluation of biomedical named entity recognition and normalization tools
2024, cites this paper
VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition
2024, cites this paper
A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes
2024, cites this paper
An adaptive multi-neural network model for named entity recognition of Chinese mechanical equipment corpus
2024, cites this paper
Enhanced Identification of Care Preference Documentation in Patients' Discharge Summaries Using Pre-Trained Large Language Models
2024, cites this paper
Language model based on deep learning network for biomedical named entity recognition
2024, cites this paper
Large Language Models in Biomedical and Health Informatics: A Bibliometric Review
2024, cites this paper
Synergizing Knowledge Graphs with Large Language Models: A Comprehensive Review and Future Prospects
2024, cites this paper
Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis
2024, cites this paper
GPDminer: a tool for extracting named entities and analyzing relations in biological literature
2024, cites this paper
Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes
2024, cites this paper
Research on Medical Text Named Entity Recognition Model Based on Prompt Contrastive Learning
2024, cites this paper
A comprehensive survey and taxonomy on privacy-preserving deep learning
2024, cites this paper
High Throughput Phenotyping of Physician Notes with Large Language and Hybrid NLP Models
2024, cites this paper
Enhancing aviation safety and mitigating accidents: A study on aviation safety hazard identification
2024, cites this paper
A Semantically Enhanced Label Prediction Method for Imbalanced POI Data Category Distribution
2024, cites this paper
Information Extraction: An application to the domain of hyper-local financial data on developing countries
2024, cites this paper
Towards discovery: an end-to-end system for uncovering novel biomedical relations
2024, cites this paper
Enhancing quality control in bioprinting through machine learning
2024, cites this paper
LADA-Trans-NER: Adaptive Efficient Transformer for Chinese Named Entity Recognition Using Lexicon-Attention and Data-Augmentation
2023, cites this paper
Umami-BERT: An interpretable BERT-based model for umami peptides prediction
2023, cites this paper
Towards Extracting and Utilising Entities in Task Specific Low Resource Settings
2023, cites this paper
Extraction and linking of motivation, specification and structure of inventions for early design use
2023, cites this paper
iSyn: Semi-automated Smart Contract Synthesis from Legal Financial Agreements
2023, influential citation
Can Race-sensitive Biomedical Embeddings Improve Healthcare Predictive Models?
2023, cites this paper
DMNER: Biomedical Named Entity Recognition by Detection and Matching
2023, influential citation
Consistency enhancement of model prediction on document-level named entity recognition
2023, cites this paper
Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization
2023, cites this paper
A Review on Clinical Named Entity Recognition
2023, cites this paper
Measuring the interdisciplinary characteristics of Chinese research in library and information science based on knowledge elements
2023, cites this paper
A Review on Electronic Health Record Text-Mining for Biomedical Name Entity Recognition in Healthcare Domain
2023, cites this paper
From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts
2023, cites this paper
DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists
2023, cites this paper
Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition
2023, cites this paper
Adversarial Transfer Learning for Biomedical Named Entity Recognition
2023, cites this paper
Embeddings for Automatic Short Answer Grading: A Scoping Review
2023, cites this paper
Research on Named Entity Recognition for Spoken Language Understanding Using Adversarial Transfer Learning
2023, cites this paper
AI-Based Knowledge Extraction from the Bioprinting Literature for Identifying Technology Trends
2023, influential citation
Exploring Partial Knowledge Base Inference in Biomedical Entity Linking
2023, cites this paper
Adversarial Adaptation for French Named Entity Recognition
2023, cites this paper
Knowledge Adaptive Multi-Way Matching Network for Biomedical Named Entity Recognition via Machine Reading Comprehension
2023, cites this paper
Transferring From Textual Entailment to Biomedical Named Entity Recognition
2023, cites this paper
Extraction of knowledge graph of Covid-19 through mining of unstructured biomedical corpora
2023, cites this paper
Improving Feature Extraction Using a Hybrid of CNN and LSTM for Entity Identification
2023, cites this paper
A text mining-based approach for understanding Chinese railway incidents caused by electromagnetic interference
2023, cites this paper
Named Entity Recognition From Biomedical Data
2023, cites this paper
Advancing COVID-19 Inquiry Responses Through Transfer Learning-Based Question Entailment Methodology
2023, cites this paper
Biomedical Named Entity Recognition Through Deep Reinforcement Learning
2023, cites this paper
Improving biomedical named entity recognition through transfer learning and asymmetric tri-training
2023, cites this paper
A prefix and attention map discrimination fusion guided attention for biomedical named entity recognition
2023, influential citation
Advanced Privacy Preserving Model for Smart Healthcare Using Deep Learning
2023, cites this paper
TaughtNet: Learning Multi-Task Biomedical Named Entity Recognition From Single-Task Teachers
2023, cites this paper
A survey on Relation Extraction
2023, cites this paper
Ontology-Powered Boosting for Improved Recognition of Ontology Concepts from Biological Literature
2023, cites this paper
Automatic Knowledge Graph Construction over Efficient Information Extraction Networks
2023, cites this paper
A Named Entity Recognition Approach for Electronic Medical Records Using BERT Semantic Enhancement and BiLSTM
2023, cites this paper
Web Interface of NER and RE with BERT for Biomedical Text Mining
2023, cites this paper
Class-Imbalanced-Aware Distantly Supervised Named Entity Recognition
2023, cites this paper
A transformer-based method for zero and few-shot biomedical named entity recognition
2023, cites this paper
A Hybrid Named Entity Recognition System for Aviation Text
2023, cites this paper