Sequence-based predictive modeling to identify cancerlectins

Hong-Yan Lai, Xin-Xin Chen, Hua Tang, Hao Lin

Published 2017 in OncoTarget

ABSTRACT

Lectins are a diverse class of glycoproteins or carbohydrate-binding proteins that are widely distributed across species. They specifically recognize and bind to particular saccharide groups. Cancerlectins are lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumors. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promotes the study of the pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracy of most of these techniques is limited. In this work, we constructed a benchmark dataset from the CancerLectinDB database, developed a new amino acid sequence-based strategy for feature description, and applied the binomial distribution to screen for the optimal feature set. An SVM-based predictor was then built to distinguish cancerlectins from non-cancerlectins; it achieved an accuracy of 77.48% and an AUC of 85.52% in jackknife cross-validation. These results indicate that our model performs better than published predictive tools.
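
The abstract does not state how the binomial distribution is used for feature screening. The sketch below assumes one formulation common in related sequence-based predictors: for each feature, compute the probability of seeing at least its observed count in a class by chance, and rank features by the confidence level CL = 1 - (smallest such probability). The function names, the two-class count layout, and the tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
from math import comb


def binomial_tail(n_total, n_obs, q):
    """Return P(X >= n_obs) for X ~ Binomial(n_total, q)."""
    return sum(
        comb(n_total, m) * q**m * (1 - q) ** (n_total - m)
        for m in range(n_obs, n_total + 1)
    )


def rank_features(counts_pos, counts_neg):
    """Rank feature indices by binomial confidence level, highest first.

    counts_pos[i] / counts_neg[i] are the occurrence counts of feature i
    in the positive (cancerlectin) and negative classes. This layout is
    an assumption made for illustration; it suits modest counts only.
    """
    total_pos, total_neg = sum(counts_pos), sum(counts_neg)
    grand_total = total_pos + total_neg
    # q_j: chance that a random feature occurrence falls in class j.
    q_pos, q_neg = total_pos / grand_total, total_neg / grand_total

    scored = []
    for i, (n_pos, n_neg) in enumerate(zip(counts_pos, counts_neg)):
        n_i = n_pos + n_neg  # occurrences of feature i over both classes
        tail = min(
            binomial_tail(n_i, n_pos, q_pos),
            binomial_tail(n_i, n_neg, q_neg),
        )
        scored.append((1.0 - tail, i))  # confidence level CL for feature i

    # Highest confidence first; ties broken by feature index.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    # Features 0 and 2 are class-skewed; feature 1 is evenly spread.
    print(rank_features([30, 10, 5], [5, 10, 30]))
```

In a pipeline like the one the abstract outlines, the top-ranked features would then be passed to the classifier, with the cutoff chosen by cross-validated accuracy.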

PUBLICATION RECORD

Publication year
2017
Venue
OncoTarget
Publication date
2017-03-07
Fields of study
Biology, Medicine, Computer Science
Identifiers
DOI 10.18632/oncotarget.15963 PMID 28423655 PMCID 5438640
Source metadata
Semantic Scholar, PubMed

REFERENCES

PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition
2017cited by this paper
iRSpot-EL: identify recombination spots with an ensemble learning approach
2017cited by this paper
[MicroRNA Target Prediction Based on Support Vector Machine Ensemble Classification Algorithm of Under-sampling Technique].
2016cited by this paper
iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition
2016cited by this paper
QAcon: single model quality assessment using protein structural and contact information with machine learning techniques
2016cited by this paper
DeepQA: improving the estimation of single protein model quality with deep belief networks
2016cited by this paper
A novel features ranking metric with application to scalable visual and bioinformatics data classification
2016cited by this paper
Multiclass Classification for the Differential Diagnosis on the ADHD Subtypes Using Recursive Feature Elimination and Hierarchical Extreme Learning Machine: Structural MRI Study
2016cited by this paper
Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology
2016cited by this paper
iPTM-mLys: identifying multiple lysine PTM sites and their different types
2016cited by this paper
Role of lectin microarrays in cancer diagnosis
2016cited by this paper
Prediction of phosphothreonine sites in human proteins by fusing different features
2016cited by this paper
Impacts of bioinformatics to medicinal chemistry.
2015cited by this paper
Lectins with Potential for Anti-Cancer Therapy
2015cited by this paper
PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions
2015cited by this paper
Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition.
2015cited by this paper
Predicting cancerlectins by the optimal g-gap dipeptides
2015influential reference
iDPF-PseRAAAC: A Web-Server for Identifying the Defensin Peptide Family and Subfamily Using Pseudo Reduced Amino Acid Alphabet Composition
2015cited by this paper
Accurate prediction of nuclear receptors with conjoint triad feature
2015influential reference
SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines
2014cited by this paper
A survey on feature selection methods
2014cited by this paper
Engagement of myelomonocytic Siglecs by tumor-associated ligands modulates the innate immune response to cancer
2014cited by this paper
Predicting peroxidase subcellular location by hybridizing different descriptors of Chou' pseudo amino acid patterns.
2014cited by this paper
Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis.
2014cited by this paper
Sequence-specific flexibility organization of splicing flanking sequence and prediction of splice sites in the human genome
2014cited by this paper
iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition.
2013cited by this paper
A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins
2013cited by this paper
SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling
2013cited by this paper
Synthetic lectin arrays for the detection and discrimination of cancer associated glycans and cell lines.
2012cited by this paper
CD-HIT: accelerated for clustering the next-generation sequencing data
2012cited by this paper
Identification of mycobacterial membrane proteins and their types using over-represented tripeptide compositions.
2012cited by this paper
Lectin microarray profiling of metastatic breast cancers.
2011cited by this paper
SSPred: A prediction server based on SVM for the identification and classification of proteins involved in bacterial secretion systems
2011cited by this paper
Analysis and prediction of cancerlectins using evolutionary and domain information
2011cited by this paper
A glycobiology review: carbohydrates, lectins and implications in cancer therapeutics.
2011cited by this paper
Prediction of thermophilic proteins using feature selection technique.
2011influential reference
Using K-minimum increment of diversity to predict secretory proteins of malaria parasite based on groupings of amino acids
2010cited by this paper
Some remarks on protein attribute prediction and pseudo amino acid composition
2010cited by this paper
Using the nonlinear dimensionality reduction method for the prediction of subcellular localization of Gram-negative bacterial proteins
2009cited by this paper
Use of tetrapeptide signals for protein secondary-structure prediction
2008cited by this paper
CancerLectinDB: a database of lectins relevant to cancer
2008influential reference
BMC Bioinformatics BioMed Central Methodology article
2006cited by this paper
Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data
2006cited by this paper
Lectins as Bioactive Plant Proteins: A Potential in Cancer Treatment
2005cited by this paper
Galectins as modulators of tumour progression
2005cited by this paper
On the Role of Cell Surface Carbohydrates and their Binding Proteins (lectins) in Tumor Metastasis
2004cited by this paper
Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy
2003cited by this paper
Characterization of lectins and their specificity in carcinomas—An appraisal
2003cited by this paper
How proteins bind carbohydrates: lessons from legume lectins.
2002cited by this paper
Lectins: Carbohydrate-Specific Proteins That Mediate Cellular Recognition.
1998cited by this paper
Lectins: from basic science to clinical application in cancer prevention.
1998cited by this paper
Cell surface carbohydrates and lectins in early development.
1997cited by this paper
REVIEW ARTICLE. CELL SURFACE CARBOHYDRATES AS PROGNOSTIC MARKERS IN HUMAN CARCINOMAS
1996cited by this paper
Prediction of protein structural classes.
1995cited by this paper
Support-Vector Networks
1995cited by this paper
Use of lectins as diagnostic and therapeutic tools for cancer.
1995cited by this paper
Cell surface carbohydrates in cell adhesion.
1991cited by this paper
Lectins in Cancer Cells
1988cited by this paper

CITED BY

Predlectins-MLP: an improved predictor of cancer-lectins using mixed features
2024cites this paper
Identification of cancerlectin proteins using hyperparameter optimization in deep learning and DDE profiles
2023cites this paper
IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions
2022cites this paper
A GHKNN model based on the physicochemical property extraction method to identify SNARE proteins
2022cites this paper
Deep-PCL: A deep learning model for prediction of cancerlectins and non cancerlectins using optimized integrated features
2021cites this paper
The accurate prediction and characterization of cancerlectin by a combined machine learning and GO analysis
2021cites this paper
mLoc-mRNA: predicting multiple sub-cellular localization of mRNAs using random forest algorithm coupled with feature selection via elastic net
2021cites this paper
miRNALoc: predicting miRNA subcellular localizations based on principal component scores of physico-chemical properties and pseudo compositions of di-nucleotides
2020cites this paper
LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion
2020cites this paper
Comparison and Analysis of Computational Methods for Identifying N6-methyladenosine Sites in Saccharomyces Cerevisiae.
2020cites this paper
PredAmyl-MLP: Prediction of Amyloid Proteins Using Multilayer Perceptron
2020cites this paper
Identification of Cancerlectins By Using Cascade Linear Discriminant Analysis and Optimal g-gap Tripeptide Composition
2020cites this paper
Multi-feature fusion and dimensional reduction based on the two-step deep ontology and the conjoint triad for the identification of cancerlectins
2020cites this paper
Identifying FL11 subtype by characterizing tumor immune microenvironment in prostate adenocarcinoma via Chou's 5-steps rule.
2020cites this paper
Recent advances of computational methods for identifying bacteriophage virion proteins.
2020cites this paper
Applications of machine learning methods in predicting nuclear receptors and their families.
2020cites this paper
CanLect-Pred: A Cancer Therapeutics Tool for Prediction of Target Cancerlectins Using Experiential Annotated Proteomic Sequences
2020cites this paper
Sequence-based Detection of DNA-binding Proteins using Multiple-View Features Allied with Feature Selection.
2020cites this paper
iterb-PPse: Identification of transcriptional terminators in bacterial by incorporating nucleotide properties into PseKNC
2020cites this paper
Identification of Cancerlectins Using Support Vector Machines With Fusion of G-Gap Dipeptide
2020influential citation
Characterization of the relationship between FLI1 and immune infiltrate level in tumour immune microenvironment for breast cancer
2020cites this paper
Remarks on computational method for identifying acid and alkaline enzymes.
2020cites this paper
Recent Advances on Prediction of Human Papillomaviruses Risk Types.
2019cites this paper
A Review of Recent Advances and Research on Drug Target Identification Methods.
2019cites this paper
Identification of D Modification Sites by Integrating Heterogeneous Features in Saccharomyces cerevisiae
2019cites this paper
Modulation of CD44, EGFR and RAC pathway genes (WAVE complex) in Epithelial cancers.
2019cites this paper
AngularQA: Protein Model Quality Assessment with LSTM Networks
2019cites this paper
A Review of DNA-binding Proteins Prediction Methods
2019cites this paper
Identification of Mitochondrial Proteins of Malaria Parasite Adding the New Parameter
2019cites this paper
iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins
2019cites this paper
Prediction of Nitrosocysteine Sites Using Position and Composition Variant Features
2019cites this paper
Protein Structural Class Prediction Based on Distance-related Statistical Features from Graphical Representation of Predicted Secondary Structure
2019cites this paper
Combining Support Vector Machine with Dual g-gap Dipeptides to Discriminate between Acidic and Alkaline Enzymes
2019cites this paper
Identification of Phage Virion Proteins by Using the g-gap Tripeptide Composition
2019cites this paper
PEPred-Suite: improved and robust prediction of therapeutic peptides using adaptive feature representation learning
2019cites this paper
Identification of hormone binding proteins based on machine learning methods.
2019cites this paper
A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods.
2019cites this paper
The Development of Machine Learning Methods in Cell-Penetrating Peptides Identification: A Brief Review.
2019cites this paper
Predicting Ion Channels Genes and Their Types With Machine Learning Techniques
2019cites this paper
Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation
2019cites this paper
The Application of Machine Learning Techniques in Protein Drugs and Drug Targets Recognition.
2019cites this paper
A Brief Review of the Computational Identification of Antifreeze Protein
2019cites this paper
SDM6A: A Web-Based Integrative Machine-Learning Framework for Predicting 6mA Sites in the Rice Genome
2019cites this paper
A Mendelian Randomization Study of Infant Length and Type 2 Diabetes Mellitus Risk.
2019cites this paper
A Computational Method for the Identification of Endolysins and Autolysins.
2019cites this paper
Recent Advance in Predicting Subcellular Localization of Mycobacterial Protein with Machine Learning Methods.
2019cites this paper
A comparison and assessment of computational method for identifying recombination hotspots in Saccharomyces cerevisiae
2019cites this paper
Recent Development of Computational Predicting Bioluminescent Proteins.
2019cites this paper
A Linear Regression Predictor for Identifying N6-Methyladenosine Sites Using Frequent Gapped K-mer Pattern
2019cites this paper
iPredCNC: Computational prediction model for cancerlectins and non-cancerlectins using novel cascade features subset selection
2019cites this paper
Survey of Machine Learning Techniques in Drug Discovery.
2019cites this paper
Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches.
2019cites this paper
Recent Advances in Computational Methods for Identifying Anticancer Peptides.
2019cites this paper
Identification of transcription factor-miRNA-lncRNA feed-forward loops in breast cancer subtypes
2019cites this paper
Understanding Membrane Protein Drug Targets in Computational Perspective.
2019cites this paper
Predicting protein structural classes for low-similarity sequences by evaluating different features
2019cites this paper
Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods
2019cites this paper
Prediction of bacteriophage proteins located in the host cell using hybrid features
2018cites this paper
Multistage inhibitors of the malaria parasite: Emerging hope for chemoprotection and malaria eradication
2018cites this paper
iRNA-3typeA: Identifying Three Types of Modification at RNA’s Adenosine Sites
2018cites this paper
Machine-Learning-Based Prediction of Cell-Penetrating Peptides and Their Uptake Efficiency with Improved Accuracy.
2018cites this paper
PrESOgenesis: A two-layer multi-label predictor for identifying fertility-related proteins using support vector machine and pseudo amino acid composition approach
2018cites this paper
SeqSVM: A Sequence-Based Support Vector Machine Method for Identifying Antioxidant Proteins
2018cites this paper
iLoc‐lncRNA: predict the subcellular location of lncRNAs by incorporating octamer composition into general PseKNC
2018cites this paper
An in-silico method for identifying aggregation rate enhancer and mitigator mutations in proteins.
2018cites this paper
iRSpot-Pse6NC: Identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC
2018cites this paper
Identification of Inhibitors of MMPS Enzymes via a Novel Computational Approach
2018cites this paper
HBPred: a tool to identify growth hormone-binding proteins
2018cites this paper
Research on folding diversity in statistical learning methods for RNA secondary structure prediction
2018cites this paper
RFAmyloid: A Web Server for Predicting Amyloid Proteins
2018cites this paper
Large-scale frequent stem pattern mining in RNA families.
2018cites this paper
PIP-EL: A New Ensemble Learning Method for Improved Proinflammatory Peptide Predictions
2018cites this paper
M6APred-EL: A Sequence-Based Predictor for Identifying N6-methyladenosine Sites Using Ensemble Learning
2018cites this paper
Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods
2018cites this paper
iRNA-2OM: A Sequence-Based Predictor for Identifying 2′-O-Methylation Sites in Homo sapiens
2018cites this paper
Characterize the difference between TMPRSS2-ERG and non-TMPRSS2-ERG fusion patients by clinical and biological characteristics in prostate cancer.
2018cites this paper
Identification of Antioxidant Proteins With Deep Learning From Sequence Information
2018cites this paper
CPPred-FL: a sequence-based predictor for large-scale identification of cell-penetrating peptides by feature representation learning
2018cites this paper
iTerm-PseKNC: a sequence-based tool for predicting bacterial transcriptional terminators
2018cites this paper
Piecewise linear solution path for pinball twin support vector machine
2018cites this paper
A computational method for prediction of xylanase enzymes activity in strains of Bacillus subtilis based on pseudo amino acid composition features
2018cites this paper
M6AMRFS: Robust Prediction of N6-Methyladenosine Sites With Sequence-Based Features in Multiple Species
2018cites this paper
iGHBP: Computational identification of growth hormone binding proteins from sequences using extremely randomised tree
2018cites this paper
Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique
2018cites this paper
Recent Advances on the Machine Learning Methods in Identifying DNA Replication Origins in Eukaryotic Genomics
2018cites this paper
An Efficient Classifier for Alzheimer’s Disease Genes Identification
2018cites this paper
mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation
2018cites this paper
AIPpred: Sequence-Based Prediction of Anti-inflammatory Peptides Using Random Forest
2018cites this paper
DBPPred-PDSD: Machine learning approach for prediction of DNA-binding proteins using Discrete Wavelet Transform and optimized integrated features space
2018cites this paper
Intelligent computational method for discrimination of anticancer peptides by incorporating sequential and evolutionary profiles information
2018cites this paper
A novel feature ranking method for prediction of cancer stages using proteomics data
2017cites this paper
ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network
2017cites this paper
IonchanPred 2.0: A Tool to Predict Ion Channels and Their Types
2017cites this paper
Recent Advances in Conotoxin Classification by Using Machine Learning Methods
2017cites this paper