Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning

Published 2014 in PLoS Comput. Biol.

ABSTRACT

Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside protein-protein interaction (PPI) interfaces. Determining whether such an nsSNP will disrupt or preserve a PPI is a challenging task, both experimentally and computationally. Here, we present this task as three related classification problems and develop a new computational method, the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class problem (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed, resulting in promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of the SNP-IN tool, we applied it to study the nsSNP-induced rewiring of two disease-centered networks.
The accurate and balanced performance of the SNP-IN tool makes it well suited to studying the rewiring of large-scale protein-protein interaction networks and useful for functional annotation of disease-associated SNPs. The SNP-IN tool is freely accessible as a web server at http://korkinlab.org/snpintool/.
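
The abstract reports performance as a weighted f-measure. As a minimal sketch, assuming the common support-weighted definition (each class's F1 score weighted by that class's share of the true labels), the metric could be computed as below. The function name `weighted_f_measure` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's exact evaluation procedure may differ.

```python
from collections import Counter


def weighted_f_measure(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted `label` and actually `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Made-up labels for the 3-class problem (not data from the paper).
y_true = ["detrimental"] * 4 + ["neutral"] * 4 + ["beneficial"] * 2
y_pred = [
    "detrimental", "detrimental", "detrimental", "neutral",
    "neutral", "neutral", "neutral", "detrimental",
    "beneficial", "neutral",
]
toy_score = weighted_f_measure(y_true, y_pred)
```

Weighting by class support keeps a rare class, such as beneficial mutations in this toy example, from dominating or vanishing in the summary score, which is one reason a weighted average is often preferred for imbalanced multi-class problems.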

PUBLICATION RECORD

Publication year: 2014
Venue: PLoS Comput. Biol.
Publication date: 2014-05-01
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1371/journal.pcbi.1003592; PMID 24784581; PMCID 4006705
Source metadata: Semantic Scholar, PubMed

CITED BY

Advances in MRGPRX2-mediated anaphylactoid reactions to traditional Chinese medicine injections (2025)
A quantitative comparison of the deleteriousness of missense and nonsense mutations using the structurally resolved human protein interactome (2025)
ProBASS – a language model with sequence and structural features for predicting the effect of mutations on binding affinity (2025)
Mutations of MRGPRX2, drug sensitivity, and genetic markers related to disease (2025)
Detection of autism spectrum disorder-related pathogenic trio variants by a novel structure-based approach (2024)
Toward Robust Self-Training Paradigm for Molecular Prediction Tasks (2024)
Gene expression networks regulated by human personality (2024)
Cognitive Brain of Homo sapiens: Stress, Emotions, Health, Hormones, Longevity (2024)
ProBASS – a language model with sequence and structural features for predicting the effect of mutations on binding affinity (2024)
Unraveling Extremely Damaging IRAK4 Variants and Their Potential Implications for IRAK4 Inhibitor Efficacy (2023)
Utilizing Semi-supervised Method in Predicting BRCA1 Pathogenicity Variants (2023)
Genetic prediction of quantitative traits: a machine learner's guide focused on height (2023)
Unraveling the impact of ORF3a Q57H mutation on SARS-CoV-2: insights from molecular dynamics (2023)
The Extent of Edgetic Perturbations in the Human Interactome Caused by Population-Specific Mutations (2023, influential citation)
PSnpBind-ML: predicting the effect of binding site mutations on protein-ligand binding affinity (2023)
New GO-based measures in multiple network alignment (2023)
Assessment of 13 in silico pathogenicity methods on cancer-related variants (2022)
HBD-2 variants and SARS-CoV-2: New insights into inter-individual susceptibility (2022)
Robust self-training strategy for various molecular biology prediction tasks (2022)
SANA: cross-species prediction of Gene Ontology GO annotations via topological network alignment (2022)
Determination of key residues in MRGPRX2 to enhance pseudo-allergic reactions induced by fluoroquinolones (2022)
Are transient protein-protein interactions more dispensable? (2022)
Predicting protein interaction network perturbation by alternative splicing with semi-supervised learning (2021)
Structure analysis of deleterious nsSNPs in human PALB2 protein for functional inference (2021)
Using collections of structural models to predict changes of binding affinity caused by mutations in protein–protein interactions (2020)
Machine learning, an impetus approach for molecular functional annotation in plants (2020)
Prognostic outcome prediction by semi-supervised least squares classification (2020)
In Silico Tools and Approaches for the Prediction of Functional and Structural Effects of Single-Nucleotide Polymorphisms on Proteins: An Expert Review (2020)
MutaBind2: Predicting the Impacts of Single and Multiple Mutations on Protein-Protein Interactions (2020)
Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome (2019)
Pharmacogenes (PGx-genes): Current understanding and future directions (2019)
Enriching Human Interactome with Functional Mutations to Detect High-Impact Network Modules Underlying Complex Diseases (2019)
Immune responses induced by different genotypes of the disease-specific protein of Rice stripe virus in the vector insect (2019)
Detection of protein complexes from multiple protein interaction networks using graph embedding (2019)
Methods to improve short fragment NGS analysis - with a focus on ancient DNA (2019)
Analysis of single amino acid variations in singlet hot spots of protein‐protein interfaces (2018)
Probability of phenotypically detectable protein damage by ENU-induced mutations in the Mutagenetix database (2018)
Epigenetic versus Genetic Deregulation of the KEAP1/NRF2 Axis in Solid Tumors: Focus on Methylation and Noncoding RNAs (2018)
Recent advances in automated protein design and its future challenges (2018)
Multilayer view of pathogenic SNVs in human interactome through in-silico edgetic profiling (2018, influential citation)
SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation (2018)
Computational Approaches to Prioritize Cancer Driver Missense Mutations (2018)
Genome Wide Identification of Mutational Hotspots in the Apicomplexan Parasite Neospora caninum and the Implications for Virulence (2018)
Do environmentally induced DNA variations mediate adaptation in Aspergillus flavus exposed to chromium stress in tannery sludge? (2018)
Mas-Related G Protein-Coupled Receptor-X2 (MRGPRX2) in Drug Hypersensitivity Reactions (2018)
Individualized screening for chaperone activity in Gaucher disease using multiple patient derived primary cell lines (2018)
Determining rewiring effects of alternatively spliced isoforms on protein-protein interactions using a computational approach (2018)
Genetics of migration timing in bar-tailed godwits: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Zoology at Massey University, Manawatū, New Zealand (2018)
Data integration and predictive modeling methods for multi-omics datasets (2018)
Mass spectrometric characterization of protein structures and protein complexes in condensed and gas phase (2017)
Identification of Sequence Variants within Experimentally Validated Protein Interaction Sites Provides New Insights into Molecular Mechanisms of Disease Development (2017)
Apparent activation energies of protein–protein complex dissociation in the gas–phase determined by electrospray mass spectrometry (2017)
SNPViz - Visualization of SNPs in proteins (2017)
Mutations at protein-protein interfaces: Small changes over big surfaces have large impacts on human health (2017)
GenProBiS: web server for mapping of sequence variants to protein binding sites (2017)
MSDC-0160 and MSDC-0602 Binding with Human Mitochondrial Pyruvate Carrier (MPC) 1 and 2 Heterodimer: PPARγ Activating and Sparing TZDs as Therapeutics (2017)
Mutation-Structure-Function Relationship Based Integrated Strategy Reveals the Potential Impact of Deleterious Missense Mutations in Autophagy Related Proteins on Hepatocellular Carcinoma (HCC): A Comprehensive Informatics Approach (2017)
Genetic and genomic technologies and diagnosis of aggressive prostate cancer (2017)
Predicting nsSNPs that Disrupt Protein-Protein Interactions Using Docking (2017)
Annotating Mutational Effects on Proteins and Protein Interactions: Designing Novel and Revisiting Existing Protocols (2017)
Identification of protein complexes by integrating multiple alignment of protein interaction networks (2017)
DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions (2016)
Protein Conformational Dynamics In Genomic Analysis (2016)
Systems Pharmacology: An Overview (2016)
Systems Pharmacology: A Network-Based View of Drug Action (2016)
Effect-specific analysis of pathogenic SNVs in human interactome: Leveraging edge-based network robustness (2016)
Keap1/Nrf2 impairing revised: are we missing the single nucleotide polymorphisms? (2016)
Structural impact analysis of missense SNPs present in the uroguanylin gene by long-term molecular dynamics simulations (2016)
Incorporation of protein binding effects into likelihood ratio test for exome sequencing data (2016)
Analysis of Genetic Variation and Potential Applications in Genome-Scale Metabolic Modeling (2015)
Reasoning over genetic variance information in cause-and-effect models of neurodegenerative diseases (2015)
The variation game: Cracking complex genetic disorders with NGS and omics data (2015)
Integration of structural dynamics and molecular evolution via protein interaction networks: a new era in genomic medicine (2015)
GESPA: classifying nsSNPs to predict disease association (2015)
In Silico Prediction of the Effects of Mutations in the Human Mevalonate Kinase Gene: Towards a Predictive Framework for Mevalonate Kinase Deficiency (2015)
The potential relationship discovery model based on result fusion for biomedical medicine research (2015)
eQuant - A Server for Fast Protein Model Quality Assessment by Integrating High-Dimensional Data and Machine Learning (2015)
Childhood cancer: an emerging public health issue in China (2015)
Docking features for predicting binding loss due to protein mutation (2014)
The effect of sequence variation on essential protein-protein interactions of pathogens -- A computational analysis (2014)
Machine learning for Big Data analytics in plants (2014)