Eslam Pourbasheer, Saadat Vahdani, Davood Malekzadeh, Reza Aalizadeh, Amin Ebadi.
Abstract
The 17β-HSD3 enzyme plays a key role in the treatment of prostate cancer, and small-molecule inhibitors can be used to target it efficiently. In the present study, multiple linear regression (MLR) and support vector machine (SVM) methods were used to relate the chemical structure of a set of 17β-HSD3 inhibitors to their inhibition activity. Structural information was described through various types of molecular descriptors, and a genetic algorithm (GA) was applied to reduce the description of the inhibition activity to a few relevant molecular descriptors. The non-linear method (GA-SVM) performed better than the linear method (GA-MLR) in terms of both internal and external prediction accuracy. The SVM model, with high statistical significance (R2train = 0.938; R2test = 0.870), was found to be useful for estimating the inhibition activity of 17β-HSD3 inhibitors. The models were validated rigorously through leave-one-out cross-validation and with several compounds as an external test set. Furthermore, the external predictive power of the proposed model was examined by considering the modified R2 and concordance correlation coefficient values, the Golbraikh and Tropsha acceptable model criteria, and an extra evaluation set from an external data set. The applicability domain of the linear model was defined using a Williams plot. Moreover, a Euclidean-based applicability domain was applied to assess the chemical structural diversity of the evaluation set and training set.
Keywords: 17β-HSD3; Genetic algorithms; Multiple linear regressions; QSAR; Support vector machine
Year: 2017 PMID: 29201087 PMCID: PMC5610752
Source DB: PubMed Journal: Iran J Pharm Res ISSN: 1726-6882 Impact factor: 1.696
Experimental and predicted pIC50 values for 17β-HSD3 inhibitors using GA-MLR and GA-SVM models
Test set.
Figure 1. A dendrogram illustrating the results of the hierarchical clustering of the training and test sets
Figure 2. Predicted versus experimental pIC50 by the GA-MLR model
The Q2LOO and R2training values after several Y-randomization tests
| No. | Q2LOO | R2training |
|---|---|---|
| 1 | 0.114 | 0.036 |
| 2 | 0.058 | 0.103 |
| 3 | 0.058 | 0.137 |
| 4 | 0.139 | 0.048 |
| 5 | 0.091 | 0.104 |
| 6 | 0.046 | 0.287 |
| 7 | 0.006 | 0.180 |
| 8 | 0.001 | 0.244 |
| 9 | 0.452 | 0.078 |
| 10 | 0.003 | 0.169 |
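The Y-randomization check behind this table can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an ordinary least-squares MLR fitted through the normal equations, Q2LOO computed as 1 - PRESS/SS_tot, and synthetic data with two made-up descriptors. All function names and data values here are hypothetical.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def r2_training(X, y):
    """Coefficient of determination of the model fitted on all compounds."""
    coef = fit_mlr(X, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((t - predict(coef, x)) ** 2 for x, t in zip(X, y))
    return 1 - ss_res / sum((t - mean_y) ** 2 for t in y)

def q2_loo(X, y):
    """Leave-one-out Q2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean_y = sum(y) / len(y)
    return 1 - press / sum((t - mean_y) ** 2 for t in y)

def y_randomization(X, y, n_iter=10, seed=0):
    """Shuffle the responses n_iter times; return (Q2LOO, R2training) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        shuffled = y[:]
        rng.shuffle(shuffled)
        results.append((q2_loo(X, shuffled), r2_training(X, shuffled)))
    return results

# Synthetic data for illustration only (not the paper's compounds).
rng = random.Random(1)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(25)]
y = [6.0 + 2.0 * a - 1.5 * b + rng.gauss(0, 0.1) for a, b in X]

r2_true, q2_true = r2_training(X, y), q2_loo(X, y)
rand_results = y_randomization(X, y)
```

If the original model captures a real structure-activity relationship, every pair in `rand_results` should fall well below `r2_true` and `q2_true`, which is the pattern the low values in the table are meant to show.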
Figure 3. The Williams plot of the training and test sets
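A Williams plot places each compound by its leverage (horizontal axis) and its standardized residual (vertical axis). The leverage part can be sketched as below, assuming the common convention h = x^T (X^T X)^-1 x with an intercept column in the design matrix and a warning leverage h* = 3(p + 1)/n. The page does not state which convention the authors used, and the descriptor values here are synthetic.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def leverages(X_train, X_query):
    """Leverage of each query compound relative to the training design matrix."""
    rows = [[1.0] + list(r) for r in X_train]  # intercept column: an assumption
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    out = []
    for q in X_query:
        x = [1.0] + list(q)
        z = solve(xtx, x)  # z = (X^T X)^-1 x
        out.append(sum(a * b for a, b in zip(x, z)))
    return out

def warning_leverage(n_train, n_descriptors):
    """Commonly used threshold h* = 3 (p + 1) / n."""
    return 3 * (n_descriptors + 1) / n_train

# Synthetic descriptor values for illustration only.
rng = random.Random(0)
X_train = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(20)]

h_train = leverages(X_train, X_train)
h_star = warning_leverage(len(X_train), 2)
outside_domain = [i for i, h in enumerate(h_train) if h > h_star]
```

Compounds whose leverage exceeds `h_star` would sit to the right of the vertical warning line in the plot and are usually treated as lying outside the model's structural domain.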
Correlation coefficients of the selected descriptors and the corresponding VIF values for the GA-MLR model.
| | GATS6m | GATS1e | P2e | R7u+ | C-026 | VIF |
|---|---|---|---|---|---|---|
| GATS6m | 1 | 0 | 0 | 0 | 0 | 1.047 |
| GATS1e | 0.095 | 1 | 0 | 0 | 0 | 1.172 |
| P2e | -0.080 | 0.297 | 1 | 0 | 0 | 1.495 |
| R7u+ | 0.078 | 0.255 | 0.503 | 1 | 0 | 1.441 |
| C-026 | 0.209 | -0.105 | -0.217 | -0.220 | 1 | 1.052 |
VIF: variance inflation factor.
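For standardized descriptors, the VIF of descriptor j equals the j-th diagonal element of the inverse correlation matrix, which is the same as 1/(1 - Rj^2). The sketch below applies that identity to the rounded correlations listed in the table. Because the inputs are rounded and the authors' exact procedure is not stated here, the results need not reproduce the VIF column exactly. The helper `invert` is a hypothetical name.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

names = ["GATS6m", "GATS1e", "P2e", "R7u+", "C-026"]
# Lower triangle of the correlation matrix as listed in the table.
lower = [
    [1.0],
    [0.095, 1.0],
    [-0.080, 0.297, 1.0],
    [0.078, 0.255, 0.503, 1.0],
    [0.209, -0.105, -0.217, -0.220, 1.0],
]
n = len(names)
# Mirror the lower triangle to obtain the full symmetric matrix.
corr = [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]

inv = invert(corr)
vif = {name: inv[i][i] for i, name in enumerate(names)}
```

Values close to 1 indicate that a descriptor is nearly uncorrelated with the others, which is the conclusion the table supports.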
Figure 4. Predicted versus experimental pIC50 by the GA-SVM model
The statistical parameters of different constructed QSAR models
Training set:

| Model | R2 | RMSE | F | CCC | R2adj |
|---|---|---|---|---|---|
| GA-MLR | 0.779 | 0.444 | 15.508 | 0.8758 | 0.729 |
| GA-SVM | 0.938 | 0.260 | 42.831 | 0.9563 | 0.924 |

Test set:

| Model | R2 | RMSE | F | CCC | rm2 |
|---|---|---|---|---|---|
| GA-MLR | 0.823 | 0.531 | 0.675 | 0.8554 | 0.775 |
| GA-SVM | 0.870 | 0.513 | 0.390 | 0.8257 | 0.667 |
CCC: concordance correlation coefficient.
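The concordance correlation coefficient measures how closely predicted values fall on the 1:1 line with the observed ones, penalizing both scatter and systematic offset. A minimal sketch, assuming Lin's definition with population (1/n) variances; the pIC50 values are hypothetical and are not taken from the paper:

```python
from statistics import fmean

def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient for two equal-length sequences."""
    n = len(observed)
    mean_o, mean_p = fmean(observed), fmean(predicted)
    # Population (1/n) variances and covariance.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

# Hypothetical experimental and predicted pIC50 values, for illustration only.
y_obs = [6.1, 7.3, 5.8, 8.0, 6.9]
y_pred = [6.3, 7.0, 6.0, 7.7, 7.1]
ccc = concordance_cc(y_obs, y_pred)
```

Unlike R2, the CCC drops when predictions are shifted by a constant even if they remain perfectly correlated with the observations.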
Experimental validation of models based on evaluation external set.
See reference (42).
See reference (3).
See reference (8).
Based on Euclidean applicability domain, the molecules are within applicability domain of models.
Figure 5. Euclidean-based applicability domain of the proposed models
Figure 6. The Williams plot of the training and evaluation sets
Golbraikh and Tropsha acceptable model criteria for GA-MLR.
| Condition | Value | Result |
|---|---|---|
| Condition I | 0.674 | Passed |
| Condition II | 0.823 | Passed |
| Condition III | K = 1.049 | Passed |
| Condition IV | | Passed |
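The table does not spell out the four conditions. The sketch below assumes the commonly cited Golbraikh and Tropsha formulation: (I) Q2 > 0.5; (II) R2 > 0.6 on the test set; (III) (R2 - R0^2)/R2 < 0.1 with a through-origin slope between 0.85 and 1.15; (IV) |R0^2 - R0'^2| < 0.3. Function names, the Q2 value, and the data are hypothetical.

```python
from statistics import fmean

def r_squared(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = fmean(y_obs), fmean(y_pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    sxx = sum((o - mo) ** 2 for o in y_obs)
    syy = sum((p - mp) ** 2 for p in y_pred)
    return sxy ** 2 / (sxx * syy)

def origin_fit(target, regressor):
    """Slope and R0^2 for the regression of target on regressor through the origin."""
    k = sum(t * r for t, r in zip(target, regressor)) / sum(r * r for r in regressor)
    ss_res = sum((t - k * r) ** 2 for t, r in zip(target, regressor))
    mean_t = fmean(target)
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return k, 1 - ss_res / ss_tot

def golbraikh_tropsha(y_obs, y_pred, q2_loo):
    """Return a pass/fail flag for each of the four assumed conditions."""
    r2 = r_squared(y_obs, y_pred)
    k, r0_sq = origin_fit(y_obs, y_pred)      # observed against predicted
    k_p, r0p_sq = origin_fit(y_pred, y_obs)   # predicted against observed
    cond3 = (((r2 - r0_sq) / r2 < 0.1 and 0.85 <= k <= 1.15)
             or ((r2 - r0p_sq) / r2 < 0.1 and 0.85 <= k_p <= 1.15))
    return {
        "I": q2_loo > 0.5,
        "II": r2 > 0.6,
        "III": cond3,
        "IV": abs(r0_sq - r0p_sq) < 0.3,
    }

# Hypothetical test-set values and cross-validated Q2, for illustration only.
y_obs = [6.1, 7.3, 5.8, 8.0, 6.9]
y_pred = [6.3, 7.0, 6.0, 7.7, 7.1]
checks = golbraikh_tropsha(y_obs, y_pred, q2_loo=0.65)
```

A model is usually accepted as externally predictive only when all four flags are true, which matches the "Passed" entries reported for GA-MLR.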
Statistical parameters comparison based on different selected descriptors by GA-MLR
| Model | Equation |
|---|---|
| Model 1 | pIC50 = 7.0086 (± 0.44447) - 0.56701 (± 0.28122) nN + 0.21651 (± 0.0772) RDF100u + 0.22206 (± 0.05459) RDF070e - 0.29424 (± 0.11456) RDF065p - 1.16837 (± 0.40968) Mor15m |
| Model 2 | pIC50 = 6.09379 (± 0.76707) - 0.4853 (± 0.28032) nN + 0.46193 (± 0.33757) GATS6e + 0.19134 (± 0.04665) RDF070e - 1.19599 (± 0.38172) Mor10m - 1.03101 (± 0.39102) Mor15m |
| Model 3 | pIC50 = 6.72661 (± 0.47912) + 0.23447 (± 0.34966) nBnz - 0.05179 (± 0.01078) Eig1p + 0.24754 (± 0.049) RDF070e - 0.43443 (± 0.14922) Mor03m + 1.79601 (± 0.46346) C-029 |
| Model 4 | pIC50 = 6.48189 (± 0.61284) + 0.20329 (± 0.3866) nBnz - 0.0438 (± 0.01101) Eig1p + 0.255 (± 0.05382) RDF070e + 0.39064 (± 0.2075) H0m + 1.71489 (± 0.50505) C-029 |
| Model 5 | pIC50 = 6.61855 (± 0.58588) + 0.1342 (± 0.38201) nBnz - 0.04662 (± 0.01173) Eig1p + 0.2437 (± 0.05394) RDF070e + 0.12274 (± 0.06922) RTm + 1.71805 (± 0.50909) C-029 |
| Model 6 | pIC50 = 6.9527 (± 0.42856) 0.69797 (± 0.28258) nN + 0.18259 (± 0.04581) RDF070e - 1.11912 (± 0.35743) Mor10m - 1.06653 (± 0.37879) Mor15m + 0.53212 (± 0.31188) C-029 |
| Model 7 | pIC50 = 7.4772 (± 0.43517) - 0.59149 (± 0.25509) nN + 0.21837 (± 0.06629) RDF100u - 0.1982 (± 0.05626) RDF065m + 0.20205 (± 0.04439) RDF070e - 1.67878 (± 0.36124) Mor15m |
| Model 8 | pIC50 = 6.8958 (± 0.43333) + 0.01576 (± 0.22858) nN - 0.04743 (± 0.00872) Eig1p + 0.23367 (± 0.04769) RDF070e - 0.40774 (± 0.15963) Mor03m + 1.60567 (± 0.37707) C-029 |
| Model 9 | pIC50 = 6.65879 (± 1.20624) + 0.08351 (± 0.4122) IDDE - 0.04773 (± 0.00881) Eig1p + 0.23025 (± 0.05042) RDF070e - 0.40408 (± 0.15195) Mor03m + 1.60321 (± 0.37471) C-029 |
| Model 10 | pIC50 = 6.77014 (± 0.44483) - 0.56233 (± 0.28016) nN + 0.11016 (± 0.04713) RDF070e - 1.2491 (± 0.38605) Mor15m - 0.95591 (± 0.35215) Mor10e + 0.60422 (± 0.33097) C-029 |

Statistical results:

| Model | | | | | |
|---|---|---|---|---|---|
| Model 1 | 0.608 | 6.820 | 0.333 | 0.736 | 0.702 |
| Model 2 | 0.618 | 7.081 | 0.389 | 0.714 | 0.498 |
| Model 3 | 0.692 | 9.856 | 0.452 | 0.820 | 0.771 |
| Model 4 | 0.632 | 7.549 | 0.420 | 0.878 | 0.704 |
| Model 5 | 0.626 | 7.362 | 0.409 | 0.836 | 0.718 |
| Model 6 | 0.633 | 7.581 | 0.360 | 0.766 | 0.616 |
| Model 7 | 0.674 | 9.101 | 0.399 | 0.790 | 0.613 |
| Model 8 | 0.685 | 9.574 | 0.446 | 0.811 | 0.722 |
| Model 9 | 0.686 | 9.597 | 0.464 | 0.803 | 0.720 |
| Model 10 | 0.602 | 6.663 | 0.370 | 0.805 | 0.584 |

| Descriptor | Definition |
|---|---|
| nN | Number of nitrogen atoms |
| RDF100u | Radial Distribution Function - 100 / unweighted |
| RDF070e | Radial Distribution Function - 070 / weighted by Sanderson electronegativity |
| RDF065p | Radial Distribution Function - 065 / weighted by polarizability |
| Mor15m | Signal 15 / weighted by mass |
| GATS6e | Geary autocorrelation of lag 6 / weighted by Sanderson electronegativity |
| Mor10m | Signal 10 / weighted by mass |
| nBnz | Number of benzene-like rings |
| Eig1p | Leading eigenvalue from polarizability weighted distance matrix |
| Mor03m | Signal 03 / weighted by mass |
| C-029 | R-CX-X |
| H0m | H autocorrelation of lag 0 / weighted by mass |
| RTm | R total index / weighted by mass |
| RDF065m | Radial Distribution Function - 065 / weighted by mass |
| IDDE | Mean information content on the distance degree equality |
| Mor10e | Signal 10 / weighted by Sanderson electronegativity |
Design of some novel inhibitors with the predicted inhibition activities.