PPPred: Classifying Protein-phenotype Co-mentions Extracted from Biomedical Literature

Morteza Pourreza Shahri, G. Reynolds, Mandi M. Roe, Indika Kahanda

Published 2019 in bioRxiv

ABSTRACT

The MEDLINE database provides an extensive source of scientific articles and heterogeneous biomedical information in the form of unstructured text. Among the most important knowledge contained in these articles are the relations between human proteins and their phenotypes, which can remain hidden because of the exponential growth of publications. This has created a range of opportunities for developing computational methods that extract these biomedical relations from the articles. However, no such method currently exists for the automated extraction of relations involving human proteins and Human Phenotype Ontology (HPO) terms. In our previous work, we developed a comprehensive database composed of all co-mentions of proteins and phenotypes. In this study, we present a supervised machine learning approach called PPPred (Protein-Phenotype Predictor) for classifying the validity of a given sentence-level co-mention. Using an in-house developed gold standard dataset, we demonstrate that PPPred significantly outperforms several baseline methods. This two-step approach of co-mention extraction and classification constitutes a complete biomedical relation extraction pipeline for extracting protein-phenotype relations.

CCS CONCEPTS

• Computing methodologies → Information extraction; Supervised learning by classification
• Applied computing → Bioinformatics
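The first step of the pipeline described in the abstract, sentence-level co-mention extraction, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the naive sentence splitter, the exact-match dictionary lookup, and the toy sentences and term lists below are all assumptions made for illustration only.

```python
import re
from itertools import product

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_terms(sentence, terms):
    """Return the terms that occur in the sentence as whole words, ignoring case."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(term)
    return found

def extract_comentions(text, proteins, phenotypes):
    """List every (sentence, protein, phenotype) triple that shares one sentence."""
    comentions = []
    for sentence in split_sentences(text):
        pairs = product(find_terms(sentence, proteins),
                        find_terms(sentence, phenotypes))
        for protein, phenotype in pairs:
            comentions.append((sentence, protein, phenotype))
    return comentions

# Hypothetical toy inputs, invented for this sketch (not from the paper's dataset).
text = ("Mutations in FBN1 cause aortic dilatation. "
        "FBN1 was sequenced in all patients. "
        "Scoliosis was not associated with TGFBR2 in this cohort.")
proteins = ["FBN1", "TGFBR2"]
phenotypes = ["aortic dilatation", "scoliosis"]

candidates = extract_comentions(text, proteins, phenotypes)
for sentence, protein, phenotype in candidates:
    print(protein, "|", phenotype, "|", sentence)
```

In this toy input, the third sentence yields a co-mention even though it negates the relation. That kind of case is why a second, classification step is needed: each extracted triple is only a candidate, and a supervised classifier trained on labeled co-mentions would decide whether it expresses a valid relation.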

PUBLICATION RECORD

Publication year: 2019
Venue: bioRxiv
Publication date: 2019-05-31
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1101/654475
Source metadata: Semantic Scholar

REFERENCES

Ontology based text mining of gene-phenotype associations: application to candidate gene prediction
2019, influential reference
Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
2019, cited by this paper
ProPheno 1.0: An Online Dataset for Accelerating the Complete Characterization of the Human Protein-Phenotype Landscape in Biomedical Literature
2019, cited by this paper
Extracting Co-mention Features from Biomedical Literature for Automated Protein Phenotype Prediction using PHENOstruct
2018, cited by this paper
A population genetic interpretation of GWAS findings for human quantitative traits
2018, cited by this paper
A hybrid model based on neural networks for biomedical relation extraction
2018, cited by this paper
Extracting chemical–protein relations with ensembles of SVM and deep learning models
2018, cited by this paper
Biocuration: Distilling data into knowledge
2018, cited by this paper
SNPPhenA: a corpus for extracting ranked associations of single-nucleotide polymorphisms and phenotypes from literature
2017, cited by this paper
Extracting microRNA-gene relations from biomedical literature using distant supervision
2017, cited by this paper
Identifying genotype-phenotype relationships in biomedical text
2017, influential reference
What is precision medicine?
2017, cited by this paper
Proteome-Scale Investigation of Protein Allosteric Regulation Perturbed by Somatic Mutations in 7,000 Cancer Genomes
2017, cited by this paper
Protein Misfolding Diseases
2017, cited by this paper
BELMiner: adapting a rule-based relation extraction system to extract biological expression language statements from bio-medical literature evidence sentences
2017, cited by this paper
Protein Misfolding, Amyloid Formation, and Human Disease: A Summary of Progress Over the Last Decade
2017, cited by this paper
DiMeX: A Text Mining System for Mutation-Disease Association Extraction
2016, cited by this paper
Molecular interaction between type 2 diabetes and Alzheimer’s disease through cross-seeding of protein misfolding
2016, cited by this paper
Protein misfolding and aggregation: Mechanism, factors and detection
2016, cited by this paper
Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature
2016, cited by this paper
Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine
2016, cited by this paper
Multidimensional proteomics for cell biology
2015, cited by this paper
Methods of integrating data to uncover genotype–phenotype interactions
2015, cited by this paper
The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data
2014, cited by this paper
The evolution of gene expression and the transcriptome-phenotype relationship
2012, cited by this paper
Interrater reliability: the kappa statistic
2012, cited by this paper
Deep phenotyping for precision medicine
2012, cited by this paper
A hybrid approach to extract protein-protein interactions
2011, cited by this paper
Using text to build semantic networks for pharmacogenomics
2010, cited by this paper
Bayesian inference of protein-protein interactions from biological literature
2009, cited by this paper
Automated acquisition of disease drug knowledge from biomedical and clinical documents: an initial study
2008, cited by this paper
A fast SCOP fold classification system using content-based E-Predict algorithm
2006, cited by this paper
Learning Relations from Biomedical Corpora Using Dependency Trees
2006, cited by this paper
Clustering microarray-derived gene lists through implicit literature relationships
2006, cited by this paper
Systematic Association of Genes to Phenotypes by Genome and Literature Mining
2005, cited by this paper
Discovering patterns to extract protein-protein interactions from full texts
2004, cited by this paper
Classifying Semantic Relations in Bioscience Texts
2004, cited by this paper
Extraction of protein interaction information from unstructured text using a context-free grammar
2003, cited by this paper
Semantic Relations Asserting the Etiology of Genetic Diseases
2003, cited by this paper
Mining literature for protein-protein interactions
2001, cited by this paper
Event Extraction from Biomedical Papers Using a Full Parser
2000, cited by this paper
Learning to Extract Relations from MEDLINE
1999, cited by this paper
Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts
1999, cited by this paper
Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts
1998, cited by this paper

CITED BY

A review of semi-supervised learning for text classification
2023, influential citation
Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes
2021, influential citation
DeepPPPred: An Ensemble of BERT, CNN, and RNN for Classifying Co-mentions of Proteins and Phenotypes
2020, influential citation
ProPheno 1.0: An Online Dataset for Accelerating the Complete Characterization of the Human Protein-Phenotype Landscape in Biomedical Literature
2019, cites this paper