
Prediction of chemical carcinogenicity by machine learning approaches.

N X Tan, H B Rao, Z R Li, X Y Li.

Abstract

In this paper we report a successful application of machine learning approaches to the prediction of chemical carcinogenicity. Two approaches, a support vector machine (SVM) and an artificial neural network (ANN), were evaluated for predicting chemical carcinogenicity from molecular structure descriptors. A diverse set of 844 compounds, comprising 600 carcinogenic (CG+) and 244 noncarcinogenic (CG-) molecules, was used to estimate the accuracies of these approaches. The database was divided into two sets: the model construction set and the independent test set. Relevant molecular descriptors were selected by a hybrid feature selection method combining Fisher's score and Monte Carlo simulated annealing from a wide set of molecular descriptors, including physicochemical properties and constitutional, topological, and geometrical descriptors. The first model validation method was five-fold cross-validation, in which the model construction set is split into five subsets. The five-fold cross-validation was used to select descriptors and optimise the model parameters by maximising the averaged overall accuracy. The final SVM model gave an averaged prediction accuracy of 90.7% for CG+ compounds, 81.6% for CG- compounds and 88.1% overall, while the corresponding ANN model gave an averaged prediction accuracy of 86.1% for CG+ compounds, 79.3% for CG- compounds and 84.2% overall. These results indicate that the hybrid feature selection method is very efficient and the selected descriptors are truly relevant to the carcinogenicity of compounds. The second model validation method was a hold-out test: using the selected descriptors and the optimised model parameters, the whole model construction set was used to build the classification model, and the independent test set was used to test the predictive ability of the model. In this test, the SVM model gave a prediction accuracy of 87.6% for CG+ compounds, 79.1% for CG- compounds and 85.0% overall. The ANN model gave a prediction accuracy of 85.6% for CG+ compounds, 79.1% for CG- compounds and 83.6% overall. The results indicate that the built models are potentially useful for facilitating the prediction of chemical carcinogenicity of untested compounds.
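
The abstract does not give the formula used for Fisher's score or for the reported accuracies, so the following is only a minimal sketch under stated assumptions. It uses one common two-class form of the Fisher score, (mean of CG+ minus mean of CG-) squared divided by the sum of the two class variances, to rank descriptors, and it computes the three accuracy figures the abstract reports (CG+, CG-, overall) from a list of predictions. The descriptor names (`logP`, `mol_weight`), the toy values and the function names are hypothetical. The Monte Carlo simulated annealing step and the SVM and ANN models themselves are not shown.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score of one descriptor (assumed form):
    (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Both classes are constant: perfectly separating or uninformative.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_descriptors(columns, labels):
    """Return descriptor names sorted from highest to lowest Fisher score."""
    scores = {name: fisher_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def class_accuracies(y_true, y_pred):
    """Return (CG+ accuracy, CG- accuracy, overall accuracy) as fractions."""
    pairs = list(zip(y_true, y_pred))
    pos_hits = [t == p for t, p in pairs if t == 1]
    neg_hits = [t == p for t, p in pairs if t == 0]
    all_hits = [t == p for t, p in pairs]
    return (
        sum(pos_hits) / len(pos_hits),
        sum(neg_hits) / len(neg_hits),
        sum(all_hits) / len(all_hits),
    )


# Hypothetical toy data: 1 = carcinogenic (CG+), 0 = noncarcinogenic (CG-).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "logP": [3.1, 2.9, 3.4, 3.0, 1.0, 1.2, 0.8, 1.1],        # separates classes
    "mol_weight": [210, 305, 190, 280, 220, 300, 200, 270],  # does not
}
ranking = rank_descriptors(columns, labels)

# Hypothetical predictions from some classifier, with one CG- error.
y_pred = [1, 1, 1, 1, 0, 0, 1, 0]
acc_pos, acc_neg, acc_all = class_accuracies(labels, y_pred)
print(ranking, acc_pos, acc_neg, acc_all)
```

In a hybrid scheme of the kind the abstract describes, a ranking like this would plausibly serve as a filter that narrows the descriptor pool before the stochastic search, with the cross-validated overall accuracy as the quantity being maximised.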

Year: 2009; PMID: 19343583; DOI: 10.1080/10629360902724085

Source DB: PubMed; Journal: SAR QSAR Environ Res; ISSN: 1026-776X; Impact factor: 3.000


  3 in total

1.  Prediction of carcinogenicity for diverse chemicals based on substructure grouping and SVM modeling.

Authors:  Kazutoshi Tanabe; Bono Lučić; Dragan Amić; Takio Kurita; Mikio Kaihara; Natsuo Onodera; Takahiro Suzuki
Journal:  Mol Divers       Date:  2010-02-26       Impact factor: 2.943

2.  Machine learning-based cognitive impairment classification with optimal combination of neuropsychological tests.

Authors:  Abhay Gupta; Bratati Kahali
Journal:  Alzheimers Dement (N Y)       Date:  2020-07-19

3.  Quantitative structure-activity relationship study of P2X7 receptor inhibitors using combination of principal component analysis and artificial intelligence methods.

Authors:  Mehdi Ahmadi; Mohsen Shahlaei
Journal:  Res Pharm Sci       Date:  2015 Jul-Aug
