Samina Kausar1,2, Andre O Falcao3,4.
Abstract
BACKGROUND: In silico tools based on quantitative structure-activity relationship (QSAR) models are widely used to screen large compound databases and to predict the biological properties of molecules from their chemical structure. As the volume of data on synthesized and known chemicals grows exponentially, there is a need for computationally efficient, automated QSAR modeling tools that are accessible to researchers who may lack extensive knowledge of machine learning. A fully automated, advanced modeling platform can therefore be an important addition to the QSAR community.
Keywords: Data set modelability; Feature selection; KNIME; Machine learning; Quantitative structure–activity relationship (QSAR); Random forests; Support vector machines; Variable importance
Year: 2018 PMID: 29340790 PMCID: PMC5770354 DOI: 10.1186/s13321-017-0256-5
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Overview of the automated QSAR modeling workflow
Fig. 2 Automated QSAR modeling methodology
Fig. 3 Input data set options. Overview of the possible ways to submit input data to the automated QSAR modeling workflow
Fig. 4 Input parameters. Input configurations required before running the workflow
Description of selected problems
| UniProt ID | Target protein name | Associated bioactivity (Y) | Observations retrieved (N-retrieved) | Observations after processing (N-processed) |
|---|---|---|---|---|
| Q05586 | Glutamate [NMDA] receptor | IC50 | 512 | 320 |
| Q99720 | Sigma non-opioid intracellular receptor 1 (Sigma1R) | IC50 | 1895 | 762 |
| Q99720 | Sigma non-opioid intracellular receptor 1 (Sigma1R) | Ki | 2584 | 1465 |
| CHEMBL613288 (UniProt ID not available) | Sigma non-opioid intracellular receptor 2 (Sigma2R) | Ki | 553 | 497 |
| P08588 | Beta-1 adrenergic receptor (ADRB1) | IC50 | 1471 | 599 |
| P07550 | Beta-2 adrenergic receptor (ADRB2) | IC50 | 1424 | 554 |
| P13945 | Beta-3 adrenergic receptor (ADRB3) | EC50 | 1478 | 1227 |
| P35348 | Alpha-1A adrenergic receptor | Ki | 1650 | 1260 |
| P35368 | Alpha-1B adrenergic receptor | Ki | 1567 | 1260 |
| P25100 | Alpha-1D adrenergic receptor | Ki | 2076 | 1060 |
| P35367 | Histamine H1 receptor | Ki | 2239 | 1222 |
| P25021 | Histamine H2 receptor | Ki | 1218 | 385 |
| Q9Y5N1 | Histamine H3 receptor | Ki | 3799 | 3101 |
| Q9H3N8 | Histamine H4 receptor | Ki | 1486 | 1095 |
| Q12809 | Potassium voltage-gated channel subfamily H member 2 (HERG) | Ki | 2539 | 1481 |
| P21728 | D(1A) dopamine receptor (DRD1) | Ki | 2244 | 1087 |
| P14416 | D(2) dopamine receptor (DRD2) | IC50 | 1667 | 725 |
| P35462 | D(3) dopamine receptor (DRD3) | IC50 | 1174 | 326 |
| P21917 | D(4) dopamine receptor (DRD4) | Ki | 3409 | 1900 |
| P21918 | D(1B) dopamine receptor (DRD5) | Ki | 529 | 341 |
| P47898 | 5-Hydroxytryptamine receptor 5A | Ki | 382 | 302 |
| P50406 | 5-Hydroxytryptamine receptor 6 | Ki | 4084 | 2632 |
| P46098 | 5-Hydroxytryptamine receptor 3A | Ki | 517 | 432 |
| P28222 | 5-Hydroxytryptamine receptor 1B | Ki | 1129 | 938 |
| P41595 | 5-Hydroxytryptamine receptor 2B | Ki | 2034 | 1149 |
| P28335 | 5-Hydroxytryptamine receptor 2C | Ki | 3433 | 2157 |
| P28221 | 5-Hydroxytryptamine receptor 1D | Ki | 1153 | 973 |
| P08908 | 5-Hydroxytryptamine receptor 1A | Ki | 4008 | 3244 |
| Q13639 | 5-Hydroxytryptamine receptor 4 | Ki | 540 | 422 |
| P34969 | 5-Hydroxytryptamine receptor 7 | Ki | 1753 | 1438 |
QSAR models based on the data sets with all descriptors (RDKit descriptors and Morgan fingerprints). (VI)1: feature selection by scaled variable importance; (VI)2: feature selection by unscaled variable importance
| Target protein name | N-processed: training set | N-processed: IVS | Total number of features (F) | (VI)1: selected features (SF) | (VI)1: SF-model PVE (test set) | (VI)1: SF-model RMSE (test set) | (VI)1: final model PVE (IVS) | (VI)1: final model RMSE (IVS) | (VI)2: selected features (SF) | (VI)2: SF-model PVE (test set) | (VI)2: SF-model RMSE (test set) | (VI)2: final model PVE (IVS) | (VI)2: final model RMSE (IVS) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Glutamate [NMDA] receptor | 240 | 80 | 949 | 120 | 0.78 | 0.12 | 0.69 | 0.17 | 120 | 0.79 | 0.12 | 0.73 | 0.16 |
| Sigma non-opioid intracellular receptor 1 (Sigma1R) | 572 | 190 | 1079 | 220 | 0.68 | 0.15 | 0.47 | 0.19 | 29 | 0.62 | 0.16 | 0.40 | 0.20 |
| Sigma non-opioid intracellular receptor 1 (Sigma1R) | 1099 | 366 | 1117 | 111 | 0.64 | 0.17 | 0.60 | 0.18 | 116 | 0.59 | 0.17 | 0.61 | 0.17 |
| Sigma non-opioid intracellular receptor 2 (Sigma2R) | 373 | 124 | 875 | 201 | 0.71 | 0.11 | 0.57 | 0.14 | 234 | 0.66 | 0.13 | 0.61 | 0.14 |
| Beta-1 adrenergic receptor (ADRB1) | 450 | 149 | 1040 | 150 | 0.70 | 0.14 | 0.72 | 0.13 | 180 | 0.80 | 0.12 | 0.71 | 0.13 |
| Beta-2 adrenergic receptor (ADRB2) | 416 | 138 | 1032 | 133 | 0.76 | 0.13 | 0.70 | 0.16 | 76 | 0.75 | 0.13 | 0.69 | 0.16 |
| Beta-3 adrenergic receptor (ADRB3) | 921 | 306 | 1093 | 310 | 0.64 | 0.15 | 0.56 | 0.17 | 170 | 0.57 | 0.19 | 0.55 | 0.18 |
| Alpha-1A adrenergic receptor | 945 | 315 | 1108 | 206 | 0.69 | 0.16 | 0.67 | 0.18 | 170 | 0.73 | 0.16 | 0.66 | 0.18 |
| Alpha-1B adrenergic receptor | 945 | 315 | 1106 | 275 | 0.71 | 0.15 | 0.65 | 0.15 | 115 | 0.69 | 0.15 | 0.62 | 0.16 |
| Alpha-1D adrenergic receptor | 795 | 265 | 1109 | 270 | 0.69 | 0.16 | 0.65 | 0.17 | 370 | 0.68 | 0.16 | 0.66 | 0.17 |
| Histamine H1 receptor | 917 | 305 | 1116 | 76 | 0.79 | 0.15 | 0.72 | 0.17 | 237 | 0.79 | 0.14 | 0.76 | 0.16 |
| Histamine H2 receptor | 289 | 96 | 1037 | 9 | 0.30 | 0.11 | 0.32 | 0.13 | 180 | 0.62 | 0.07 | 0.33 | 0.13 |
| Histamine H3 receptor | 2326 | 775 | 1134 | 397 | 0.62 | 0.16 | 0.63 | 0.16 | 282 | 0.66 | 0.16 | 0.63 | 0.16 |
| Histamine H4 receptor | 822 | 273 | 1075 | 123 | 0.63 | 0.18 | 0.56 | 0.18 | 330 | 0.63 | 0.17 | 0.55 | 0.18 |
| Potassium voltage-gated channel subfamily H member 2 (HERG) | 1111 | 370 | 1132 | 120 | 0.69 | 0.12 | 0.54 | 0.15 | 160 | 0.64 | 0.12 | 0.55 | 0.15 |
| D(1A) dopamine receptor (DRD1) | 816 | 271 | 1118 | 118 | 0.73 | 0.15 | 0.68 | 0.17 | 219 | 0.75 | 0.15 | 0.70 | 0.16 |
| D(2) dopamine receptor (DRD2) | 544 | 181 | 1092 | 91 | 0.66 | 0.16 | 0.63 | 0.18 | 150 | 0.71 | 0.16 | 0.62 | 0.19 |
| D(3) dopamine receptor (DRD3) | 245 | 81 | 1054 | 36 | 0.58 | 0.21 | 0.58 | 0.19 | 195 | 0.66 | 0.18 | 0.61 | 0.18 |
| D(4) dopamine receptor (DRD4) | 1425 | 475 | 1124 | 368 | 0.60 | 0.18 | 0.63 | 0.17 | 395 | 0.60 | 0.18 | 0.62 | 0.17 |
| D(1B) dopamine receptor (DRD5) | 256 | 85 | 957 | 135 | 0.68 | 0.18 | 0.76 | 0.15 | 142 | 0.75 | 0.17 | 0.77 | 0.15 |
| 5-Hydroxytryptamine receptor 5A | 227 | 75 | 980 | 140 | 0.83 | 0.13 | 0.87 | 0.12 | 38 | 0.81 | 0.14 | 0.84 | 0.13 |
| 5-Hydroxytryptamine receptor 6 | 1974 | 658 | 1132 | 320 | 0.72 | 0.15 | 0.68 | 0.16 | 432 | 0.69 | 0.17 | 0.67 | 0.16 |
| 5-Hydroxytryptamine receptor 3A | 324 | 108 | 1045 | 150 | 0.69 | 0.19 | 0.71 | 0.19 | 230 | 0.62 | 0.21 | 0.71 | 0.19 |
| 5-Hydroxytryptamine receptor 1B | 704 | 234 | 1103 | 255 | 0.79 | 0.15 | 0.75 | 0.16 | 145 | 0.79 | 0.15 | 0.76 | 0.15 |
| 5-Hydroxytryptamine receptor 2B | 862 | 287 | 1130 | 101 | 0.51 | 0.18 | 0.37 | 0.19 | 110 | 0.57 | 0.15 | 0.39 | 0.19 |
| 5-Hydroxytryptamine receptor 2C | 1618 | 539 | 1135 | 263 | 0.67 | 0.16 | 0.62 | 0.18 | 244 | 0.64 | 0.18 | 0.62 | 0.17 |
| 5-Hydroxytryptamine receptor 1D | 730 | 243 | 1112 | 120 | 0.82 | 0.15 | 0.76 | 0.18 | 250 | 0.76 | 0.19 | 0.77 | 0.18 |
| 5-Hydroxytryptamine receptor 1A | 2433 | 811 | 1134 | 470 | 0.61 | 0.19 | 0.65 | 0.17 | 360 | 0.59 | 0.19 | 0.66 | 0.17 |
| 5-Hydroxytryptamine receptor 4 | 317 | 105 | 948 | 203 | 0.80 | 0.16 | 0.66 | 0.22 | 280 | 0.83 | 0.15 | 0.71 | 0.20 |
| 5-Hydroxytryptamine receptor 7 | 1079 | 359 | 1122 | 210 | 0.65 | 0.16 | 0.59 | 0.18 | 290 | 0.66 | 0.16 | 0.61 | 0.17 |
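
The PVE and RMSE columns in the table above can be computed from observed and predicted activities. The following is a minimal sketch, assuming that PVE is the proportion of variance explained in its usual form, 1 - SSE/SST (this section does not spell out the formula). The function names and the sample values are illustrative and are not taken from the paper.

```python
import math


def pve(observed, predicted):
    """Proportion of variance explained, assumed here to be 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    # Sum of squared prediction errors.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares around the mean of the observed values.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


# Hypothetical activity values on a 0-1 scale (not data from the paper).
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]

print(f"PVE = {pve(y_true, y_pred):.2f}, RMSE = {rmse(y_true, y_pred):.2f}")
# PVE = 0.95, RMSE = 0.05
```

Under this definition a model that always predicts the mean of the observed values has a PVE of 0, so the PVE values in the table can be read as improvement over that baseline.
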
Comparison of the performance of QSAR models with and without feature selection. Full-model: full model without feature selection; SF-model (VI)1 and SF-model (VI)2: final models with feature selection
| Target protein name | N-processed: training set | N-processed: IVS | Total number of features (F) | PVE (IVS): Full-model | PVE (IVS): SF-model (VI)1 | PVE (IVS): SF-model (VI)2 | RMSE (IVS): Full-model | RMSE (IVS): SF-model (VI)1 | RMSE (IVS): SF-model (VI)2 |
|---|---|---|---|---|---|---|---|---|---|
| Glutamate [NMDA] receptor | 240 | 80 | 949 | 0.30 | 0.69 | 0.73 | 0.25 | 0.17 | 0.16 |
| Sigma non-opioid intracellular receptor 1 (Sigma1R) | 572 | 190 | 1079 | 0.31 | 0.47 | 0.40 | 0.21 | 0.19 | 0.20 |
| Sigma non-opioid intracellular receptor 1 (Sigma1R) | 1099 | 366 | 1117 | 0.45 | 0.60 | 0.61 | 0.21 | 0.18 | 0.17 |
| Sigma non-opioid intracellular receptor 2 (Sigma2R) | 373 | 124 | 875 | 0.46 | 0.57 | 0.61 | 0.16 | 0.14 | 0.14 |
| Beta-1 adrenergic receptor (ADRB1) | 450 | 149 | 1040 | 0.41 | 0.72 | 0.71 | 0.19 | 0.13 | 0.13 |
| Beta-2 adrenergic receptor (ADRB2) | 416 | 138 | 1032 | 0.46 | 0.70 | 0.69 | 0.21 | 0.16 | 0.16 |
| Beta-3 adrenergic receptor (ADRB3) | 921 | 306 | 1093 | 0.37 | 0.56 | 0.55 | 0.21 | 0.17 | 0.18 |
| Alpha-1A adrenergic receptor | 945 | 315 | 1108 | 0.53 | 0.67 | 0.66 | 0.21 | 0.18 | 0.18 |
| Alpha-1B adrenergic receptor | 945 | 315 | 1106 | 0.48 | 0.65 | 0.62 | 0.18 | 0.15 | 0.16 |
| Alpha-1D adrenergic receptor | 795 | 265 | 1109 | 0.47 | 0.65 | 0.66 | 0.21 | 0.17 | 0.17 |
| Histamine H1 receptor | 917 | 305 | 1116 | 0.59 | 0.72 | 0.76 | 0.21 | 0.17 | 0.16 |
| Histamine H2 receptor | 289 | 96 | 1037 | 0.13 | 0.32 | 0.33 | 0.14 | 0.13 | 0.13 |
| Histamine H3 receptor | 2326 | 775 | 1134 | 0.46 | 0.63 | 0.63 | 0.19 | 0.16 | 0.16 |
| Histamine H4 receptor | 822 | 273 | 1075 | 0.34 | 0.56 | 0.55 | 0.22 | 0.18 | 0.18 |
| Potassium voltage-gated channel subfamily H member 2 (HERG) | 1111 | 370 | 1132 | 0.42 | 0.54 | 0.55 | 0.17 | 0.15 | 0.15 |
| D(1A) dopamine receptor (DRD1) | 816 | 271 | 1118 | 0.50 | 0.68 | 0.70 | 0.21 | 0.17 | 0.16 |
| D(2) dopamine receptor (DRD2) | 544 | 181 | 1092 | 0.51 | 0.63 | 0.62 | 0.21 | 0.18 | 0.19 |
| D(3) dopamine receptor (DRD3) | 245 | 81 | 1054 | 0.32 | 0.58 | 0.61 | 0.24 | 0.19 | 0.18 |
| D(4) dopamine receptor (DRD4) | 1425 | 475 | 1124 | 0.47 | 0.63 | 0.62 | 0.20 | 0.17 | 0.17 |
| D(1B) dopamine receptor (DRD5) | 256 | 85 | 957 | 0.56 | 0.76 | 0.77 | 0.20 | 0.15 | 0.15 |
| 5-Hydroxytryptamine receptor 5A | 227 | 75 | 980 | 0.58 | 0.87 | 0.84 | 0.22 | 0.12 | 0.13 |
| 5-Hydroxytryptamine receptor 6 | 1974 | 658 | 1132 | 0.48 | 0.68 | 0.67 | 0.20 | 0.16 | 0.16 |
| 5-Hydroxytryptamine receptor 3A | 324 | 108 | 1045 | 0.41 | 0.71 | 0.71 | 0.27 | 0.19 | 0.19 |
| 5-Hydroxytryptamine receptor 1B | 704 | 234 | 1103 | 0.45 | 0.75 | 0.76 | 0.23 | 0.16 | 0.15 |
| 5-Hydroxytryptamine receptor 2B | 862 | 287 | 1130 | 0.31 | 0.37 | 0.39 | 0.20 | 0.19 | 0.19 |
| 5-Hydroxytryptamine receptor 2C | 1618 | 539 | 1135 | 0.48 | 0.62 | 0.62 | 0.21 | 0.18 | 0.17 |
| 5-Hydroxytryptamine receptor 1D | 730 | 243 | 1112 | 0.49 | 0.76 | 0.77 | 0.24 | 0.18 | 0.18 |
| 5-Hydroxytryptamine receptor 1A | 2433 | 811 | 1134 | 0.43 | 0.65 | 0.66 | 0.21 | 0.17 | 0.17 |
| 5-Hydroxytryptamine receptor 4 | 317 | 105 | 948 | 0.35 | 0.66 | 0.71 | 0.26 | 0.22 | 0.20 |
| 5-Hydroxytryptamine receptor 7 | 1079 | 359 | 1122 | 0.43 | 0.59 | 0.61 | 0.21 | 0.18 | 0.17 |
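
The effect of feature selection in the comparison table above can be read as the difference between each SF-model PVE and the Full-model PVE on the IVS. The sketch below works this out for three rows copied from the table; the helper name `pve_gain` and the choice of rows are illustrative only.

```python
# (target, Full-model PVE, SF-model (VI)1 PVE, SF-model (VI)2 PVE), all on
# the IVS, copied from three rows of the comparison table.
rows = [
    ("Glutamate [NMDA] receptor", 0.30, 0.69, 0.73),
    ("Beta-1 adrenergic receptor (ADRB1)", 0.41, 0.72, 0.71),
    ("5-Hydroxytryptamine receptor 2B", 0.31, 0.37, 0.39),
]


def pve_gain(full, selected):
    """PVE gained by feature selection, rounded to the table's precision."""
    return round(selected - full, 2)


# Map each target to its (gain with (VI)1, gain with (VI)2).
gains = {
    name: (pve_gain(full, vi1), pve_gain(full, vi2))
    for name, full, vi1, vi2 in rows
}

for name, (gain_vi1, gain_vi2) in gains.items():
    print(f"{name}: +{gain_vi1:.2f} with (VI)1, +{gain_vi2:.2f} with (VI)2")
# Glutamate [NMDA] receptor: +0.39 with (VI)1, +0.43 with (VI)2
# Beta-1 adrenergic receptor (ADRB1): +0.31 with (VI)1, +0.30 with (VI)2
# 5-Hydroxytryptamine receptor 2B: +0.06 with (VI)1, +0.08 with (VI)2
```

For these three rows the gain ranges from 0.06 to 0.43, which shows how much the benefit of feature selection varies between targets.
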
Fig. 5 Comparison of models with and without feature selection. Pink represents the full model without feature selection [with all variables (F)], green represents the SF-model ((VI)1), which contains the predefined set of features (SF) identified by scaled permutation importance, and blue represents the SF-model ((VI)2), whose features (SF) were selected by the unscaled variable importance measure
Fig. 6 Size of the problems and predictive power of the fitted models. Blue dots represent externally validated models with feature selection by scaled importance, and golden-yellow dots represent externally validated models with feature selection by the unscaled importance measure
Fig. 7 Model over-fitting analysis. Models with a predefined set of features identified by scaled variable importance (a) and unscaled variable importance (b)
Fig. 8 Modelability index versus QSAR_PVE for 30 datasets. K is the number of nearest neighbors. a K = 3 and b K = 5. QSAR_PVE(IVS) is the PVE score of externally validated models without feature selection (Full-model) and with selected features (SF-model). The high correlation with the SF-model QSAR_PVE suggests that the index is a good modelability criterion. The weaker correlation with the Full-model QSAR_PVE emphasizes the importance of feature selection for obtaining a realistic and reliable estimate of the predictive performance of a QSAR model