Ming Hao, Shuwei Zhang, Jieshan Qiu.
Abstract
Chemoinformatic methods are currently used to predict FBPase inhibitory activity. A coupled genetic algorithm-random forest method (GA-RF) was proposed to predict fructose 1,6-bisphosphatase (FBPase) inhibitors for the treatment of type 2 diabetes mellitus, using the Mold2 molecular descriptors. A data set of 126 oxazole and thiazole analogs was used to derive the GA-RF model, yielding a non-cross-validated correlation coefficient r²ncv of 0.96 and a cross-validated r²cv of 0.67 for the training set. The model was validated on a test set of 64 compounds, giving a predictive correlation coefficient r²pred of 0.90. The GA-RF model also satisfied the criteria suggested by Tropsha and Roy, with r²o and r²m values of 0.90 and 0.83, respectively. For comparison, a pure RF model based on the full descriptor set was also built for the same data set. The resulting GA-RF model, with significant internal and external predictive capacity, can help predict potential oxazole and thiazole FBPase inhibitors prior to chemical synthesis in drug discovery programs.
Keywords: FBPase inhibitor; chemoinformatics methods; genetic algorithm; random forest
Year: 2012 PMID: 22837677 PMCID: PMC3397509 DOI: 10.3390/ijms13067015
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. Self-organizing map (SOM) analysis for fructose 1,6-bisphosphatase (FBPase) inhibitors, where the black dots denote the training set and the red asterisks denote the test set.
Molecular descriptors selected by the coupled genetic algorithm-random forest method (GA-RF) for the FBPase inhibitors.
| Name | Definition | Name | Definition |
|---|---|---|---|
| D004 | Number of 5-membered rings | D543 | Lowest eigenvalue from Burden matrix weighted by van der Waals order-4 |
| D016 | Number of double bonds | D545 | Lowest eigenvalue from Burden matrix weighted by van der Waals order-6 |
| D152 | Mean atomic polarizability scaled on carbon-SP3 | D547 | Lowest eigenvalue from Burden matrix weighted by van der Waals order-8 |
| D164 | Index of terminal vertex matrix | D557 | Lowest eigenvalue from Burden matrix weighted by polarizabilities order-2 |
| D237 | Kier 3-path index | D561 | Lowest eigenvalue from Burden matrix weighted by polarizabilities order-6 |
| D279 | Total information content order-4 index | D562 | Lowest eigenvalue from Burden matrix weighted by polarizabilities order-7 |
| D309 | Sum eigenvalue weighted by mass distance matrix | D563 | Lowest eigenvalue from Burden matrix weighted by polarizabilities order-8 |
| D455 | Geary topological structure autocorrelation length-1 weighted by atomic van der Waals volumes | D571 | Highest eigenvalue from Burden matrix weighted by masses order-8 |
| D458 | Geary topological structure autocorrelation length-4 weighted by atomic van der Waals volumes | D582 | Highest eigenvalue from Burden matrix weighted by electronegativities Sanderson-scale order-3 |
| D462 | Geary topological structure autocorrelation length-8 weighted by atomic van der Waals volumes | D589 | Highest eigenvalue from Burden matrix weighted by polarizabilities order-2 |
| D465 | Geary topological structure autocorrelation length-3 weighted by atomic Sanderson electronegativities | D598 | Number of total tertiary carbon-SP3 |
| D470 | Geary topological structure autocorrelation length-8 weighted by atomic Sanderson electronegativities | D647 | Number of group primary amines (aliphatic) |
| D473 | Geary topological structure autocorrelation length-3 weighted by atomic polarizabilities | D715 | Number of group CH2R2 |
| D476 | Geary topological structure autocorrelation length-6 weighted by atomic polarizabilities | D719 | Number of group CH2RX |
| D491 | Moran topological structure autocorrelation length-5 weighted by atomic van der Waals volumes | D729 | Number of group =CHR |
| D492 | Moran topological structure autocorrelation length-6 weighted by atomic van der Waals volumes | D731 | Number of group =CHX |
| D499 | Moran topological structure autocorrelation length-5 weighted by atomic Sanderson electronegativities | D746 | Number of group H attached to C0(sp3) no X attached to next C |
| D506 | Moran topological structure autocorrelation length-4 weighted by atomic polarizabilities | D754 | Number of group O= |
| D523 | Mean molecular topological order-3 charge index | D756 | Number of group Al-O-Ar or Ar-O-Ar or R-O-C=X |
| D541 | Lowest eigenvalue from Burden matrix weighted by van der Waals order-2 | D775 | Hydrophilic factor index |
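The 40 descriptors above were selected by a genetic algorithm wrapped around a random forest. The following is only a minimal sketch of that kind of wrapper, not the authors' implementation: the fitness function is a stand-in (leave-one-out error of a 1-nearest-neighbour predictor instead of an RF out-of-bag error), the data are synthetic, and every name and parameter (`ga_select`, `pop_size`, `p_mut`, and so on) is hypothetical.

```python
import random

def loo_1nn_mse(X, y, mask):
    """Stand-in fitness: leave-one-out squared error of a 1-NN predictor
    restricted to the descriptors switched on in `mask` (lower is better)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty descriptor subset cannot predict anything
    total = 0.0
    for i in range(len(X)):
        # nearest other sample in the selected descriptor subspace
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in cols),
        )
        total += (y[i] - y[nearest]) ** 2
    return total / len(X)

def ga_select(X, y, pop_size=12, generations=10, p_mut=0.1, seed=0):
    """Binary-mask GA with tournament selection, one-point crossover,
    bit-flip mutation and elitism. All settings are illustrative assumptions."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = loo_1nn_mse(X, y, mask)
        return cache[key]

    def tournament(pop):
        return min(rng.sample(pop, 3), key=fitness)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = min(pop, key=fitness)
        history.append(fitness(elite))
        nxt = [elite[:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=fitness)
    return best, fitness(best), history

# Synthetic data: only descriptors 0 and 1 carry signal, the rest are noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(24)]
y = [3 * row[0] - 2 * row[1] for row in X]
best_mask, best_err, history = ga_select(X, y)
print(best_mask, round(best_err, 3))
```

Because the elite mask is carried over unchanged, the best error recorded in `history` can never increase from one generation to the next. In a GA-RF setting the stand-in fitness would be replaced by an RF-based score.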
Statistical performances of GA-RF and RF models a.
| Model | Training set r²ncv | Training set r²cv | Training set RMSE | Test set r² | Test set r²pred | Test set RMSE |
|---|---|---|---|---|---|---|
| GA-RF | 0.96 | 0.67 | 0.25 | 0.91 | 0.90 | 0.34 |
| RF | 0.96 | 0.59 | 0.28 | 0.87 | 0.85 | 0.42 |
a r²cv from OOB estimation; mtry is equal to 13 and 36 for GA-RF and RF, respectively.
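For readers who want to reproduce statistics of this kind, the sketch below computes RMSE and a squared Pearson correlation from observed and predicted pIC50 values. It assumes that r² here means the squared correlation coefficient, which this page does not state explicitly. The five value pairs are the observed and GA-RF entries for compounds 1 to 5 in the compound table, so the results describe only that small subset and not the whole training or test set.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of r^2)."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Observed pIC50 and GA-RF predictions for compounds 1-5 (from the compound table).
obs = [7.00, 6.40, 5.92, 6.66, 6.30]
ga_rf = [6.62, 6.20, 5.99, 6.63, 6.05]

subset_rmse = rmse(obs, ga_rf)
subset_r2 = r_squared(obs, ga_rf)
print(f"RMSE = {subset_rmse:.3f}, r2 = {subset_r2:.3f}")
```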
Figure 2. Scatter plots of actual versus predicted activity for the GA-RF and RF models.
Compounds with their chemical names, observed and predicted activities by GA-RF and RF for the FBPase inhibitors.
| No. | R2 | Obs. pIC50 | GA-RF | RF | Ref. | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Me | 7.00 | 6.62 | 6.70 | [ | |||||
| 2 | Et | 6.40 | 6.20 | 6.25 | [ | |||||
| 3 | vinyl | 5.92 | 5.99 | 6.05 | [ | |||||
| 4 | CH2OH | 6.66 | 6.63 | 6.61 | [ | |||||
| 5 | H | 6.30 | 6.05 | 6.27 | [ | |||||
| 6 | Cl | 6.74 | 6.61 | 6.64 | [ | |||||
| 7 | Br | 7.10 | 6.85 | 6.89 | [ | |||||
| 8 | SMe | 6.05 | 6.23 | 6.16 | [ | |||||
| 9 | CN | 5.70 | 5.65 | 5.73 | [ | |||||
| 10 | NH2 | 7.60 | 7.38 | 7.20 | [ | |||||
| 11 | NHMe | 6.00 | 5.95 | 6.08 | [ | |||||
| 12 | NHAc | 5.00 | 5.69 | 5.66 | [ | |||||
| 13 | CONH2 | 5.56 | 5.75 | 6.03 | [ | |||||
| 14 | CSNH2 | 6.30 | 6.38 | 6.39 | [ | |||||
| 15 | Ph | 4.87 | 5.33 | 5.40 | [ | |||||
| 16 | 2-thienyl | 5.10 | 5.93 | 5.78 | [ | |||||
| 17 | 3-pyridyl | 5.30 | 5.40 | 5.55 | [ | |||||
| ||||||||||
| 18 | H | 6.35 | 6.38 | 6.42 | [ | |||||
| 19 | Me | 6.92 | 6.84 | 6.80 | [ | |||||
| 20 | HOCH2 | 6.30 | 6.73 | 6.55 | [ | |||||
| 21 | | 7.52 | 7.29 | 7.13 | [ | |||||
| 22 | | 7.55 | 7.04 | 7.01 | [ | |||||
| 23 | CF3CH2 | 7.24 | 6.99 | 7.14 | [ | |||||
| 24 | neopentyl | 7.92 | 7.58 | 7.51 | [ | |||||
| 25 | cyclobutyl | 7.72 | 7.61 | 7.54 | [ | |||||
| 26 | cyclopentyl | 7.68 | 7.67 | 7.58 | [ | |||||
| 27 | cyclohexyl | 8.00 | 7.80 | 7.83 | [ | |||||
| 28 | cyclopropyl-CH2 | 7.70 | 7.62 | 7.53 | [ | |||||
| 29 | cyclopentyl-CH2 | 7.74 | 7.36 | 7.44 | [ | |||||
| 30 | cyclohexyl-CH2 | 7.23 | 7.18 | 7.08 | [ | |||||
| 31 | PhCH2 | 6.82 | 6.85 | 6.82 | [ | |||||
| 32 | morpholinyl-CH2 | 6.25 | 6.16 | 6.45 | [ | |||||
| ||||||||||
| 33 | Cl | 7.15 | 7.03 | 6.97 | [ | |||||
| 34 | Br | 7.30 | 6.99 | 6.88 | [ | |||||
| 35 | I | 7.00 | 6.87 | 6.36 | [ | |||||
| 36 | 1-morpholinyl | 7.80 | 7.09 | 7.29 | [ | |||||
| 37 | EtS | 7.48 | 7.32 | 7.24 | [ | |||||
| 38 | | 7.80 | 7.21 | 7.03 | [ | |||||
| 39 | | 7.62 | 7.50 | 7.46 | [ | |||||
| 40 | | 7.62 | 7.52 | 7.53 | [ | |||||
| 41 | PhS | 6.52 | 6.70 | 6.58 | [ | |||||
| 42 | CONMe2 | 5.77 | 5.94 | 6.22 | [ | |||||
| 43 | CO2Et | 7.85 | 7.55 | 7.48 | [ | |||||
| 44 | CO2Bn | 7.82 | 7.25 | 7.43 | [ | |||||
| 45 | | 6.07 | 6.56 | 6.45 | [ | |||||
| 46 | Ph | 7.85 | 7.68 | 7.64 | [ | |||||
| 47 | 2-MeO-Ph | 7.37 | 7.51 | 7.52 | [ | |||||
| 48 | 3-MeO-Ph | 7.68 | 7.60 | 7.62 | [ | |||||
| 49 | 4-MeO-Ph | 7.66 | 7.61 | 7.64 | [ | |||||
| 50 | 4-MeS-Ph | 7.68 | 7.41 | 7.40 | [ | |||||
| 51 | 4- | 7.06 | 7.21 | 7.10 | [ | |||||
| 52 | 4-MeO2C-Ph | 7.85 | 7.48 | 7.36 | [ | |||||
| 53 | 4-F-Ph | 7.80 | 7.71 | 7.68 | [ | |||||
| 54 | 4-Cl-Ph | 7.89 | 7.76 | 7.75 | [ | |||||
| 55 | 4-Ac-Ph | 7.49 | 7.45 | 7.48 | [ | |||||
| 56 | 4-MeSO2-Ph | 7.39 | 7.30 | 7.00 | [ | |||||
| 57 | 4-Ph-Ph | 7.47 | 7.31 | 7.23 | [ | |||||
| 58 | 2-naphthyl | 7.92 | 7.66 | 7.61 | [ | |||||
| 59 | 2-furanyl | 7.40 | 7.12 | 7.22 | [ | |||||
| 60 | 2-thienyl | 7.36 | 7.17 | 7.20 | [ | |||||
| ||||||||||
| 61 | 2,5-furanyl | H | 5.00 | 5.41 | 5.78 | [ | ||||
| 62 | -CH2OCO- | 7.30 | 6.90 | 6.92 | [ | |||||
| 63 | -CH2NHCO- | 2-thienyl | 6.02 | 6.42 | 6.69 | [ | ||||
| 64 | 2,6-pyridyl | H | 5.70 | 5.74 | 5.94 | [ | ||||
| 65 | 1,3-phenyl | H | 5.89 | 6.06 | 6.01 | [ | ||||
| 66 | 1,3-phenyl-(6-Me) | 6.87 | 6.71 | 6.39 | [ | |||||
| 67 | 1,3-phenyl-(6-OMe) | 6.68 | 7.05 | 6.89 | [ | |||||
| 68 | 1,3-phenyl-(6-F) | Ph | 7.10 | 7.42 | 7.27 | [ | ||||
| ||||||||||
| 69 | | 6.92 | 6.38 | 6.13 | [ | |||||
| 70 | H | 5.00 | 5.60 | 5.43 | [ | |||||
| 71 | Allyl | 6.85 | 6.70 | 6.51 | [ | |||||
| 72 | | 6.77 | 6.64 | 6.54 | [ | |||||
| 73 | | 6.68 | 6.54 | 6.35 | [ | |||||
| 74 | -CH2-cyclohexyl | 6.49 | 6.24 | 6.29 | [ | |||||
| 75 | Ph | 6.80 | 6.78 | 6.80 | [ | |||||
| 76 | Bn | 6.05 | 6.23 | 6.12 | [ | |||||
| 77 | -CH2-(2-thienyl) | 6.59 | 6.47 | 6.59 | [ | |||||
| 78 | | 7.15 | 6.97 | 6.91 | [ | |||||
| 79 | | 6.96 | 7.01 | 7.02 | [ | |||||
| 80 | | 6.92 | 6.64 | 7.05 | [ | |||||
| 81 | PhS | 5.40 | 5.79 | 6.08 | [ | |||||
| 82 | -CO2Me | 7.17 | 7.06 | 6.70 | [ | |||||
| 83 | -CO2Et | 7.42 | 7.10 | 6.85 | [ | |||||
| 84 | -CO2Pr- | 7.40 | 7.13 | 7.14 | [ | |||||
| 85 | -CO2Bn | 7.07 | 6.95 | 6.91 | [ | |||||
| 86 | -COSEt | 7.52 | 7.23 | 7.20 | [ | |||||
| 87 | -COBu- | 6.07 | 6.10 | 6.22 | [ | |||||
| ||||||||||
| 88 | Me | 6.22 | 6.24 | 6.14 | [ | |||||
| 89 | HO | 5.00 | 5.48 | 5.43 | [ | |||||
| 90 | H | 5.72 | 5.87 | 5.80 | [ | |||||
| 91 | Me2N- | 5.68 | 5.61 | 5.54 | [ | |||||
| 92 | | 5.66 | 5.79 | 5.78 | [ | |||||
| 93 | MeHN- | 5.37 | 5.55 | 5.62 | [ | |||||
| 94 | Et | 6.02 | 6.09 | 5.94 | [ | |||||
| 95 | EtHN- | 5.00 | 5.43 | 5.68 | [ | |||||
| 96 | vinyl | 5.17 | 5.49 | 5.54 | [ | |||||
| ||||||||||
| 97 | H2N- | H | 5.15 | 5.50 | 5.43 | [ | ||||
| 98 | H2N- | Me | 6.38 | 6.19 | 5.72 | [ | ||||
| 99 | H2N- | Et | 6.42 | 6.39 | 6.17 | [ | ||||
| 100 | H2N- | | 6.55 | 6.46 | 6.08 | [ | ||||
| 101 | H2N- | | 6.24 | 6.36 | 6.19 | [ | ||||
| 102 | H2N- | | 6.60 | 6.32 | 5.98 | [ | ||||
| 103 | H2N- | | 6.46 | 6.30 | 6.10 | [ | ||||
| 104 | Me | CF3 | 5.00 | 5.45 | 5.54 | [ | ||||
| 105 | H | Ph | 5.00 | 5.35 | 5.45 | [ | ||||
| ||||||||||
| 106 | NH | O | PO3H2 | NH2 | 5.30 | 5.55 | 5.47 | [ | ||
| 107 | S | O | PO3H2 | H | H | 5.26 | 5.58 | 5.56 | [ | |
| 108 | CH=CH | O | PO3H2 | NH2 | Ph | 7.38 | 6.95 | 6.87 | [ | |
| ||||||||||
| 109 | -NH(CH2)2PO3H2 | OH | 4.00 | 4.36 | 4.62 | [ | ||||
| 110 | -NH(CH2)2OPO3H2 | OH | 3.85 | 4.27 | 4.56 | [ | ||||
| 111 | -NH(CH2)2PO3H2 | H | 4.00 | 4.27 | 4.44 | [ | ||||
| ||||||||||
| 112 | -NH(CH2)2- | Bn | 4.04 | 4.20 | 4.26 | [ | ||||
| 113 | -NH(CH2)2- | Ph(CH2)2- | 4.00 | 4.14 | 4.20 | [ | ||||
| 114 | -NH(CH2)2- | 2-naphthyl-CH2- | 4.46 | 4.35 | 4.42 | [ | ||||
| 115 | -CONHCH2- | Ph(CH2)2- | 4.00 | 4.19 | 4.50 | [ | ||||
| 116 | -(CH2)3- | Ph(CH2)2- | 4.00 | 4.04 | 4.16 | [ | ||||
| 117 | -CH=CHCH2- | Ph(CH2)2- | 4.00 | 4.19 | 4.28 | [ | ||||
| 118 | -S(CH2)2- | Ph(CH2)2- | 3.84 | 4.03 | 4.29 | [ | ||||
| 119 | -CH2OCH2- | Ph(CH2)2- | 4.64 | 4.09 | 4.38 | [ | ||||
| 120 | -2,5-furanyl- | Ph(CH2)2- | 5.30 | 4.95 | 4.95 | [ | ||||
| 121 | -2,5-thienyl- | Ph(CH2)2- | 4.32 | 4.55 | 4.71 | [ | ||||
| ||||||||||
| 122 | -(CH2)2-OPO(OH)2 | 4.40 | 4.13 | 4.21 | [ | |||||
| 123 | -2,5-furanyl-SO3H | 3.82 | 4.43 | 4.50 | [ | |||||
| ||||||||||
| 124 | H | -N(Me)2 | -(CH2)2Ph | 3.60 | 4.11 | 4.17 | [ | |||
| 125 | H | -NHMe | -(CH2)2Ph | 4.30 | 4.46 | 4.41 | [ | |||
| 126 | H | Cl | -(CH2)2Ph | 4.30 | 4.61 | 4.59 | [ | |||
| 127 | H | -NH2 | -CH2CH(Ph)2 | 4.15 | 4.31 | 4.47 | [ | |||
| 128 | H | -NH2 | -(CH2)2(cyclohexyl) | 5.85 | 5.54 | 5.54 | [ | |||
| 129 | H | -NH2 | -(CH2)(2-naphthyl) | 5.48 | 5.22 | 5.19 | [ | |||
| 130 | H | -NH2 | cyclopropyl | 5.82 | 5.70 | 5.78 | [ | |||
| 131 | H | -NH2 | cyclopentyl | 5.70 | 5.69 | 5.76 | [ | |||
| 132 | H | -NH2 | Et | 5.74 | 5.65 | 5.76 | [ | |||
| 133 | H | -NH2 | isobutyl | 5.82 | 5.81 | 5.82 | [ | |||
| 134 | H | -NH2 | neopentyl | 6.10 | 5.87 | 5.87 | [ | |||
| 135 | -SMe | -NH2 | isobutyl | 6.15 | 5.52 | 5.42 | [ | |||
| 136 | -SO2Me | -NH2 | isobutyl | 4.55 | 4.95 | 4.99 | [ | |||
| ||||||||||
| 137 | H | -CH2C(Me)2CH2OH | 2,5-furanyl | 5.35 | 5.52 | 5.40 | [ | |||
| 138 | H | -CH2C(Me)2CH2Cl | 2,5-furanyl | 6.05 | 5.83 | 5.91 | [ | |||
| 139 | H | -CH2C(Me)2CMe3 | 2,5-furanyl | 5.80 | 5.66 | 5.67 | [ | |||
| 140 | H | -CH(Me)CMe3 | 2,5-furanyl | 5.30 | 5.72 | 5.74 | [ | |||
| 141 | -NH2 | -CH2CMe3 | 2,5-furanyl | 5.26 | 5.27 | 5.27 | [ | |||
| 142 | -SMe | -CH2CMe3 | 2,5-furanyl | 5.96 | 5.54 | 5.42 | [ | |||
| 143 | H | -CH2CMe3 | 2,5-(3,4-di-Cl)furanyl | 4.89 | 5.56 | 5.42 | [ | |||
| ||||||||||
| 144 | Me | 5.22 | 5.49 | 5.70 | [ | |||||
| 145 | Et | 5.65 | 5.80 | 5.88 | [ | |||||
| 146 | | 5.96 | 6.00 | 6.03 | [ | |||||
| 147 | | 5.82 | 5.91 | 6.00 | [ | |||||
| 148 | cyclopropyl-CH2- | 6.10 | 6.03 | 6.02 | [ | |||||
| 149 | cyclobutyl-CH2- | 6.10 | 6.04 | 6.01 | [ | |||||
| 150 | cyclopentyl-CH2- | 5.82 | 5.87 | 5.81 | [ | |||||
| 151 | cyclohexyl-CH2- | 5.60 | 5.64 | 5.63 | [ | |||||
| 152 | cycloheptyl-CH2- | 5.49 | 5.57 | 5.64 | [ | |||||
| 153 | norbornyl | 6.00 | 5.94 | 5.85 | [ | |||||
| 154 | benzyl | 5.30 | 5.72 | 5.70 | [ | |||||
| 155 | 4- | 5.02 | 5.26 | 5.34 | [ | |||||
| 156 | 4-CF3-benzyl | 5.15 | 5.50 | 5.51 | [ | |||||
| 157 | 4-Ph-benzyl | 5.60 | 5.63 | 5.59 | [ | |||||
| 158 | 3-furanyl-CH2- | 5.38 | 5.74 | 5.68 | [ | |||||
| 159 | 3-HO-benzyl | 5.73 | 5.87 | 5.75 | [ | |||||
| 160 | 3-thienyl-CH2- | 5.40 | 6.02 | 6.08 | [ | |||||
| ||||||||||
| 161 | Et | H | 5.60 | 5.86 | 5.87 | [ | ||||
| 162 | H | 5.52 | 5.69 | 5.66 | [ | |||||
| 163 | MeO | H | 6.15 | 6.41 | 6.25 | [ | ||||
| 164 | OH | H | 6.30 | 6.23 | 6.24 | [ | ||||
| 165 | Cl | H | 6.70 | 6.52 | 6.56 | [ | ||||
| 166 | H | Cl | 6.05 | 6.19 | 6.11 | [ | ||||
| 167 | Br | H | 6.40 | 6.32 | 6.29 | [ | ||||
| 168 | H | Br | 6.40 | 6.21 | 6.14 | [ | ||||
| 169 | F | H | 7.00 | 6.56 | 6.47 | [ | ||||
| 170 | (Et)2CHCH2- | F | H | 6.82 | 6.83 | 6.57 | [ | |||
| 171 | (Et)2CH- | F | H | 6.07 | 6.42 | 6.38 | [ | |||
| 172 | F | H | 7.26 | 6.54 | 6.53 | [ | ||||
| ||||||||||
| 173 | Br | H | Br | 6.00 | 6.09 | 6.03 | [ | |||
| 174 | Cl | H | Cl | 6.35 | 6.34 | 6.30 | [ | |||
| 175 | F | H | Cl | 7.00 | 6.77 | 6.67 | [ | |||
| 176 | F | H | Br | 6.89 | 6.75 | 6.67 | [ | |||
| 177 | F | Cl | H | 6.65 | 6.66 | 6.59 | [ | |||
| 178 | Br | Cl | Cl | 5.00 | 5.53 | 5.60 | [ | |||
| 179 | F | H | vinyl | 6.55 | 6.90 | 6.94 | [ | |||
| 180 | F | H | 7.22 | 7.08 | 7.10 | [ | ||||
| ||||||||||
| 181 | Ph | 7.05 | 6.79 | 6.83 | [ | |||||
| 182 | 4-F-Ph | 6.74 | 6.74 | 6.75 | [ | |||||
| 183 | 4-Cl-Ph | 7.05 | 6.94 | 6.81 | [ | |||||
| 184 | Et | 7.26 | 7.07 | 7.06 | [ | |||||
| 185 | | 7.00 | 6.99 | 7.02 | [ | |||||
| 186 | | 6.68 | 6.70 | 6.67 | [ | |||||
| 187 | (Me)2CH(CH2)3- | 7.00 | 6.92 | 6.99 | [ | |||||
| 188 | HO(CH2)3- | 7.10 | 6.97 | 7.05 | [ | |||||
| 189 | (Me)2N(CH2)3- | 7.26 | 6.72 | 6.73 | [ | |||||
| 190 | Cl(CH2)4- | 7.15 | 6.74 | 6.78 | [ | |||||
test set;
from the corresponding references.
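The activities in the table are given as pIC50. Assuming the usual definition, pIC50 = −log10(IC50 in mol/L), which this page does not spell out, the short sketch below converts between pIC50 and IC50 in nM. The function names are illustrative only.

```python
import math

def pic50_to_ic50_nm(pic50):
    """IC50 in nM, assuming pIC50 = -log10(IC50 in mol/L)."""
    return 10 ** (9 - pic50)

def ic50_nm_to_pic50(ic50_nm):
    """Inverse conversion, from IC50 in nM back to pIC50."""
    return 9 - math.log10(ic50_nm)

# Observed pIC50 values taken from the table: compounds 1, 27 and 124.
for number, pic50 in [(1, 7.00), (27, 8.00), (124, 3.60)]:
    print(number, pic50, f"{pic50_to_ic50_nm(pic50):.3g} nM")
```

Under this assumption a pIC50 of 7.00 corresponds to 100 nM, and each unit of pIC50 is a tenfold change in IC50, so a prediction error of about 0.3 log units corresponds to roughly a factor of two in IC50.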
External predictability of GA-RF model.
| Model | r² | r²pred | r²o | (r² − r²o)/r² | k | r²m |
|---|---|---|---|---|---|---|
| GA-RF | 0.91 | 0.90 | 0.90 | 0.01 | 1.01 | 0.83 |
| RF | 0.87 | 0.85 | 0.85 | 0.02 | 1.01 | 0.76 |
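The external-validation criteria can be checked directly from the values in the table above. The sketch below uses the commonly quoted Golbraikh-Tropsha thresholds (r² > 0.6, (r² − r²o)/r² < 0.1, 0.85 ≤ k ≤ 1.15) and Roy's r²m = r²(1 − √|r² − r²o|) with r²m > 0.5. Which of these thresholds the authors applied is an assumption here, and the assignment of table columns to r² and r²o follows the values quoted in the abstract.

```python
import math

def roy_r2m(r2, r2o):
    """Roy's modified r^2: r2 * (1 - sqrt(|r2 - r2o|))."""
    return r2 * (1.0 - math.sqrt(abs(r2 - r2o)))

def passes_external_checks(r2, r2o, k, r2m):
    """Commonly quoted Tropsha and Roy thresholds (assumed, see lead-in)."""
    return (
        r2 > 0.6
        and abs(r2 - r2o) / r2 < 0.1
        and 0.85 <= k <= 1.15
        and r2m > 0.5
    )

# Rounded values as reported in the table.
reported = {
    "GA-RF": {"r2": 0.91, "r2o": 0.90, "k": 1.01, "r2m": 0.83},
    "RF": {"r2": 0.87, "r2o": 0.85, "k": 1.01, "r2m": 0.76},
}

for name, v in reported.items():
    recomputed = roy_r2m(v["r2"], v["r2o"])
    ok = passes_external_checks(v["r2"], v["r2o"], v["k"], v["r2m"])
    print(name, f"recomputed r2m = {recomputed:.3f}", "passes" if ok else "fails")
```

Recomputing r²m from the two-decimal r² and r²o gives values slightly below the reported 0.83 and 0.76. That is expected, because the square root amplifies rounding in the small difference r² − r²o.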
Figure 3. Boxplot of 50 replications of the OOB estimate (r²oob) at various values of mtry. Horizontal lines inside the boxes are the median correlation.
Figure 4. Comparison of mean squared errors from the out-of-bag (OOB) set, test set and training set as the number of trees increases for FBPase inhibitors.
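The OOB estimates in Figures 3 and 4 rely on the fact that each tree is grown on a bootstrap sample, so some training compounds are left out of every tree and can serve as an internal test set. The sketch below illustrates this for 126 training compounds and also shows the regression default for mtry used by R's randomForest, floor(p/3). The reported mtry of 13 coincides with that default for the 40 selected descriptors, but whether the authors used the default or tuned it is not stated on this page. The function names are illustrative.

```python
import random

def oob_indices(n, rng):
    """Indices never drawn in one bootstrap sample of size n (the OOB set)."""
    in_bag = {rng.randrange(n) for _ in range(n)}
    return [i for i in range(n) if i not in in_bag]

def default_mtry_regression(p):
    """Regression default of R's randomForest: max(floor(p / 3), 1)."""
    return max(p // 3, 1)

rng = random.Random(0)
n_train = 126  # training-set size given in the abstract
fractions = [len(oob_indices(n_train, rng)) / n_train for _ in range(500)]
mean_oob = sum(fractions) / len(fractions)

print(f"mean OOB fraction over 500 bootstraps: {mean_oob:.3f}")
print("default mtry for 40 descriptors:", default_mtry_regression(40))
```

On average a little over a third of the compounds are out of bag for any given tree, which is why the OOB error behaves like a cross-validated estimate without a separate resampling loop.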
Comparison with and without Y-randomization check of the optimal GA-RF model.
| Model | Training set r²ncv | Training set r²cv | Training set RMSE | Test set r² | Test set r²pred | Test set r²m | Test set RMSE |
|---|---|---|---|---|---|---|---|
| GA-RF a | 0.96 | 0.67 | 0.25 | 0.91 | 0.90 | 0.83 | 0.34 |
| GA-RF b | 0.01 | −0.14 | 1.27 | 0.06 | −0.10 | 0.04 | 1.13 |
a Without Y-randomization check; b with Y-randomization check.
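A Y-randomization check refits the model after shuffling the activity values. If the original model captured a real structure-activity relationship, the scrambled models should perform far worse, as in the table above. The sketch below shows the procedure on synthetic data with a one-variable least-squares line as a stand-in for the GA-RF model. The data, the model and all names are illustrative assumptions.

```python
import random

def r2_of_line_fit(x, y):
    """Fit y = a*x + b by least squares and return 1 - SS_res / SS_tot."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

def y_randomization(x, y, n_rounds=50, seed=0):
    """Refit the model on shuffled responses and collect the resulting r^2."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = y[:]
        rng.shuffle(shuffled)  # breaks the link between descriptor and activity
        scores.append(r2_of_line_fit(x, shuffled))
    return scores

# Synthetic "descriptor" and "activity" with a genuine linear relationship.
noise = random.Random(42)
x = [i / 10 for i in range(30)]
y = [4.0 + 0.5 * xi + noise.gauss(0, 0.05) for xi in x]

real_r2 = r2_of_line_fit(x, y)
scrambled = y_randomization(x, y)
mean_scrambled = sum(scrambled) / len(scrambled)
print(f"real r2 = {real_r2:.2f}, mean scrambled r2 = {mean_scrambled:.2f}")
```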
Figure 5. Variable importance plot from GA-RF. The two most important descriptors are outlined in red.