Random forest models to predict aqueous solubility.

David S Palmer, Noel M O'Boyle, Robert C Glen, John B O Mitchell.

Abstract

Random Forest (RF) regression, Partial Least Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 °C gave r² = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL Drug Data Report, on the basis of molecular descriptors selected by the regression analysis.
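The two test-set statistics quoted in the abstract can be illustrated with a minimal sketch. The abstract does not say how r² was defined, so the code assumes the squared Pearson correlation between observed and predicted values. The function names and the four log S values are hypothetical, invented only for illustration, and are not data from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of r^2)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length inputs with at least 2 points")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov ** 2 / (var_o * var_p)


# Hypothetical log S values, invented for illustration only.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

print("RMSE (log S units):", rmse(observed, predicted))
print("r^2 (squared Pearson):", pearson_r_squared(observed, predicted))
```

RMSE is expressed in the same units as the predicted quantity, which is why the abstract reports it in log S units, whereas r² is dimensionless.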

Year:  2007        PMID: 17238260     DOI: 10.1021/ci060164k

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956

