ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches
Ashok K. Sharma, Gopal N. Srivastava, Ankita Roy, Vineet K. Sharma
Published 2017 in Frontiers in Pharmacology
ABSTRACT
Experimental methods for predicting molecular toxicity are tedious and time-consuming, so computational approaches offer an alternative. We have developed a tool that predicts the toxicity, aqueous solubility, and permeability of any molecule or metabolite. Using a comprehensive, curated set of toxin molecules as the training set, chemical and structural features such as descriptors and fingerprints were used for feature selection, optimization, and the development of machine learning based classification and regression models. Compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence molecular features were used for classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based, and hybrid-based classification models showed similar accuracy (93%) and Matthews correlation coefficient (MCC = 0.84). The three models also performed comparably on the blind dataset (MCC = 0.84–0.87). In addition, regression models using descriptors as input features were compared and evaluated on the blind dataset. For solubility, the random forest regression model performed better (R² = 0.84) than the multiple linear regression (MLR) and partial least squares regression (PLSR) models, whereas for Caco-2 permeability the PLSR model performed better (R² = 0.68) than the random forest and MLR models. The final classification and regression models were further evaluated on two validation datasets, comprising known toxins and commonly used constituents of health products, which attests to their accuracy. The ToxiM web server would be a highly useful and reliable tool for predicting the toxicity, solubility, and permeability of small molecules.
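The accuracy and Matthews correlation coefficient reported in the abstract are both derived from a binary confusion matrix, and the 10-fold scheme amounts to partitioning the sample indices into ten held-out folds. The following is a minimal sketch of those two ideas using only the Python standard library. The function names and the confusion-matrix counts are hypothetical illustrations, not the authors' code or data.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention
    when one row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one. No shuffling is done here;
    in practice the samples would usually be shuffled first.
    """
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical counts (toxin = positive class), chosen only for illustration.
tp, tn, fp, fn = 45, 48, 2, 5
print("accuracy:", accuracy(tp, tn, fp, fn))
print("MCC:", matthews_corrcoef(tp, tn, fp, fn))
print("folds:", sum(1 for _ in k_fold_indices(25, k=10)))
```

MCC uses all four cells of the confusion matrix, so it is generally considered more informative than accuracy alone when the toxin and non-toxin classes are unevenly sized.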
PUBLICATION RECORD
- Publication year
2017
- Venue
Frontiers in Pharmacology
- Publication date
2017-11-30
- Fields of study
Medicine, Chemistry, Computer Science
- Source metadata
Semantic Scholar, PubMed
CONCEPTS
- 10-fold cross-validation
A validation scheme that partitions the data into ten folds for repeated train-test evaluation.
Aliases: 10-fold CV
- blind dataset
An external evaluation dataset used to assess model performance after training.
- descriptor-based classification model
A toxicity classification model that uses molecular descriptors as its input features.
Aliases: descriptor-based model
- fingerprint-based classification model
A toxicity classification model that uses molecular fingerprints as its input features.
Aliases: fingerprint-based model
- hybrid-based classification model
A toxicity classification model that combines descriptor and fingerprint features.
Aliases: hybrid model
- partial least squares regression
A multivariate regression technique that projects predictors into latent components before fitting the outcome.
Aliases: PLSR
- random forest regression
A regression approach that predicts outcomes by averaging predictions from an ensemble of decision trees.
Aliases: RF regression
- ToxiM web server
A web-based tool for predicting toxicity, solubility, and permeability from small-molecule input.
Aliases: ToxiM
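The regression approaches listed above (random forest and partial least squares) are compared in the abstract by their R² on the blind dataset. As a rough sketch, R² can be computed as the coefficient of determination from observed and predicted values. This assumes the coefficient-of-determination definition, whereas the paper may instead report the squared Pearson correlation. The function names and numbers below are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


def best_model(predictions, y_true):
    """Return the name of the model whose predictions give the highest R^2."""
    return max(predictions, key=lambda name: r_squared(y_true, predictions[name]))


# Hypothetical observed values and predictions from two models.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],  # close to the observed values
    "model_b": [2.5, 2.5, 2.5, 2.5],  # always predicts the mean
}
for name, y_pred in predictions.items():
    print(name, r_squared(observed, y_pred))
print("best:", best_model(predictions, observed))
```

A model that always predicts the mean of the observed values scores exactly 0 under this definition, which gives a baseline for interpreting values such as 0.68 or 0.84.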