Jacob Spiegel, Hanoch Senderowitz.
Abstract
Virtual screening (VS) is a well-established method in the initial stages of many drug and material design projects. VS is typically performed using structure-based approaches such as molecular docking, or various ligand-based approaches. Most docking tools were designed to be as global as possible, and consequently require knowledge only of the 3D structure of the biotarget. In contrast, many ligand-based approaches (e.g., 3D-QSAR and pharmacophore) require prior development of project-specific predictive models. Depending on the type of model (e.g., classification or regression), predictive ability is typically evaluated using metrics of performance on either the training set (e.g., $Q^2_{CV}$) or the test set (e.g., specificity, selectivity or $Q^2_{F1}$/$Q^2_{F2}$/$Q^2_{F3}$). However, none of these metrics was developed with VS in mind, and consequently their ability to reliably assess the performance of a model in the context of VS is at best limited. With this in mind, we have recently reported the development of the enrichment optimization algorithm (EOA). EOA derives QSAR models in the form of multiple linear regression (MLR) equations for VS by optimizing an enrichment-based metric in the space of the descriptors. Here we present an improved version of the algorithm which better handles active compounds and which also takes into account information on inactive (either known inactive or decoy) compounds. We compared the improved EOA in small-scale VS experiments with three common docking tools, namely Glide-SP, GOLD and AutoDock Vina, employing five molecular targets (acetylcholinesterase, human immunodeficiency virus type 1 protease, MAP kinase p38 alpha, urokinase-type plasminogen activator, and trypsin I). We found that EOA consistently outperformed all docking tools in terms of the area under the ROC curve (AUC) and EF1% metrics, which measure the overall and initial success of the VS process, respectively.
This was the case both when the docking metrics were calculated based on a consensus approach and when they were calculated based on two different sets of single crystal structures. Finally, we propose that EOA could be combined with molecular docking to derive target-specific scoring functions.
Keywords: AutoDock Vina; GOLD; Glide; QSAR; docking; enrichment optimization algorithm; virtual screening
Year: 2021 PMID: 35008467 PMCID: PMC8744642 DOI: 10.3390/ijms23010043
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
EOA results obtained for all four subsets from all five datasets using 7-, 10- and 13-descriptor models. Results are provided in terms of the number and percentage (based on the total number of actives) of active compounds appearing within the first L places of the list ranked according to the EOA equation.
| Dataset | # Descriptors | # Actives, Train | # Actives, Validation | # Actives, Test | # Actives among first L, Train (%) | # Actives among first L, Validation (%) | # Actives among first L, Test (%) |
|---|---|---|---|---|---|---|---|
| ACES-1 | 7 | 430 | 106 | 107 | 324 (75%) | 75 (70%) | 15 (14%) |
| ACES-1 | 10 | | | | 315 (73%) | 72 (67%) | 28 (26%) |
| ACES-1 | 13 | | | | 343 (80%) | 84 (79%) | 49 (46%) |
| ACES-2 | 7 | | | | 312 (73%) | 68 (64%) | 42 (40%) |
| ACES-2 | 10 | | | | 326 (76%) | 72 (67%) | 38 (36%) |
| ACES-2 | 13 | | | | 357 (83%) | 79 (74%) | 0 (0%) |
| ACES-3 | 7 | | | | 322 (75%) | 78 (73%) | 38 (36%) |
| ACES-3 | 10 | | | | 319 (74%) | 74 (69%) | 13 (12%) |
| ACES-3 | 13 | | | | 323 (75%) | 69 (64%) | 20 (19%) |
| ACES-4 | 7 | | | | 331 (77%) | 73 (68%) | 47 (44%) |
| ACES-4 | 10 | | | | 326 (76%) | 73 (68%) | 56 (53%) |
| ACES-4 | 13 | | | | 329 (77%) | 75 (70%) | 5 (5%) |
| HIVPR-1 | 7 | 912 | 227 | 227 | 766 (84%) | 187 (82%) | 66 (29%) |
| HIVPR-1 | 10 | | | | 806 (88%) | 204 (90%) | 49 (22%) |
| HIVPR-1 | 13 | | | | 831 (91%) | 202 (89%) | 127 (56%) |
| HIVPR-2 | 7 | | | | 742 (81%) | 182 (80%) | 75 (33%) |
| HIVPR-2 | 10 | | | | 830 (91%) | 201 (89%) | 91 (40%) |
| HIVPR-2 | 13 | | | | 801 (88%) | 196 (86%) | 38 (17%) |
| HIVPR-3 | 7 | | | | 750 (82%) | 192 (85%) | 66 (29%) |
| HIVPR-3 | 10 | | | | 800 (88%) | 200 (88%) | 73 (32%) |
| HIVPR-3 | 13 | | | | 837 (92%) | 206 (91%) | 124 (55%) |
| HIVPR-4 | 7 | | | | 759 (83%) | 182 (80%) | 61 (27%) |
| HIVPR-4 | 10 | | | | 804 (88%) | 197 (87%) | 65 (29%) |
| HIVPR-4 | 13 | | | | 807 (88%) | 198 (87%) | 94 (41%) |
| MK14-1 | 7 | 608 | 151 | 152 | 455 (75%) | 111 (74%) | 41 (27%) |
| MK14-1 | 10 | | | | 477 (78%) | 116 (77%) | 40 (26%) |
| MK14-1 | 13 | | | | 464 (76%) | 110 (73%) | 37 (24%) |
| MK14-2 | 7 | | | | 469 (77%) | 106 (70%) | 29 (19%) |
| MK14-2 | 10 | | | | 471 (77%) | 111 (74%) | 42 (28%) |
| MK14-2 | 13 | | | | 491 (81%) | 111 (74%) | 49 (32%) |
| MK14-3 | 7 | | | | 421 (69%) | 114 (75%) | 43 (28%) |
| MK14-3 | 10 | | | | 472 (78%) | 125 (83%) | 41 (27%) |
| MK14-3 | 13 | | | | 482 (79%) | 122 (81%) | 48 (32%) |
| MK14-4 | 7 | | | | 440 (72%) | 103 (68%) | 27 (18%) |
| MK14-4 | 10 | | | | 466 (77%) | 107 (71%) | 34 (22%) |
| MK14-4 | 13 | | | | 470 (77%) | 109 (72%) | 39 (26%) |
| UROK-1 | 7 | 200 | 49 | 49 | 192 (96%) | 46 (94%) | 29 (59%) |
| UROK-1 | 10 | | | | 193 (97%) | 48 (98%) | 30 (61%) |
| UROK-1 | 13 | | | | 191 (96%) | 47 (96%) | 34 (69%) |
| UROK-2 | 7 | | | | 194 (97%) | 47 (96%) | 22 (45%) |
| UROK-2 | 10 | | | | 193 (97%) | 47 (96%) | 30 (61%) |
| UROK-2 | 13 | | | | 195 (98%) | 46 (94%) | 29 (59%) |
| UROK-3 | 7 | | | | 192 (96%) | 46 (94%) | 27 (55%) |
| UROK-3 | 10 | | | | 180 (90%) | 42 (86%) | 25 (51%) |
| UROK-3 | 13 | | | | 193 (97%) | 47 (96%) | 34 (69%) |
| UROK-4 | 7 | | | | 194 (97%) | 46 (94%) | 21 (43%) |
| UROK-4 | 10 | | | | 195 (98%) | 46 (94%) | 25 (51%) |
| UROK-4 | 13 | | | | 194 (97%) | 46 (94%) | 33 (67%) |
| TRY1-1 | 7 | 504 | 125 | 126 | 445 (88%) | 100 (80%) | 75 (60%) |
| TRY1-1 | 10 | | | | 460 (91%) | 109 (87%) | 81 (64%) |
| TRY1-1 | 13 | | | | 438 (87%) | 105 (84%) | 55 (44%) |
| TRY1-2 | 7 | | | | 449 (89%) | 116 (93%) | 74 (59%) |
| TRY1-2 | 10 | | | | 465 (92%) | 111 (89%) | 84 (67%) |
| TRY1-2 | 13 | | | | 456 (90%) | 114 (91%) | 71 (56%) |
| TRY1-3 | 7 | | | | 463 (92%) | 113 (90%) | 83 (66%) |
| TRY1-3 | 10 | | | | 465 (92%) | 113 (90%) | 86 (68%) |
| TRY1-3 | 13 | | | | 461 (91%) | 110 (88%) | 77 (61%) |
| TRY1-4 | 7 | | | | 455 (90%) | 111 (89%) | 87 (69%) |
| TRY1-4 | 10 | | | | 453 (90%) | 105 (84%) | 79 (63%) |
| TRY1-4 | 13 | | | | 464 (92%) | 110 (88%) | 79 (63%) |
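The counts in the table above can be reproduced, in principle, by scoring every compound with the MLR equation, sorting, and counting actives among the first L places. The following is a minimal sketch, not the authors' implementation. It assumes that L equals the number of actives in the set (consistent with, e.g., 324/430 ≈ 75%) and that higher scores rank first; the function name `actives_in_first_l` and its arguments are hypothetical.

```python
import numpy as np

def actives_in_first_l(X, coefs, intercept, labels):
    """Count actives among the first L places of an MLR-ranked list.

    X         : (n_compounds, n_descriptors) descriptor matrix
    coefs     : MLR coefficients, one per descriptor
    intercept : MLR intercept
    labels    : 1/True for actives, 0/False for decoys
    Assumption: L is the number of actives, and higher score = better rank.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    scores = intercept + X @ np.asarray(coefs, dtype=float)
    L = int(labels.sum())
    # Stable sort on negated scores gives a descending ranking
    first_l = np.argsort(-scores, kind="mergesort")[:L]
    hits = int(labels[first_l].sum())
    return hits, 100.0 * hits / L
```

With a 7-, 10- or 13-descriptor model, `coefs` would simply have 7, 10 or 13 entries; the ranking step is unchanged.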
EOA and docking results for all test sets expressed in terms of AUC and EF1% values. The docking results are presented as a consensus of two crystal structures per target. For each dataset, the best result is highlighted. We note that EF1% values are indifferent to the order of the active compounds within the first 1% of the library, so it is not unlikely for different methods to yield identical EF1% values.
| Set | Method | AUC (ACES) | AUC (HIVPR) | AUC (MK14) | AUC (UROK) | AUC (TRY1) | EF1% (ACES) | EF1% (HIVPR) | EF1% (MK14) | EF1% (UROK) | EF1% (TRY1) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EOA-7 | 0.862 | 0.775 | 0.905 | 0.997 | 0.979 | 36.449 | 3.965 | 40.132 | 77.551 | 73.810 |
| 1 | EOA-10 | 0.886 | 0.946 | 0.947 | 0.997 | 0.996 | 26.168 | 20.705 | 39.474 | 81.633 | 80.159 |
| 1 | EOA-13 | 0.899 | 0.977 | 0.927 | 0.996 | 0.986 | 58.879 | 59.471 | 25.658 | 81.633 | 58.730 |
| 1 | AD Vina | 0.764 | 0.747 | 0.737 | 0.749 | 0.806 | 10.280 | 7.048 | 10.526 | 4.082 | 4.762 |
| 1 | GOLD | 0.739 | 0.726 | 0.676 | 0.830 | 0.861 | 30.841 | 13.656 | 12.500 | 32.653 | 15.873 |
| 1 | Glide | 0.735 | 0.678 | 0.743 | 0.801 | 0.847 | 14.953 | 11.454 | 15.789 | 40.816 | 35.714 |
| 2 | EOA-7 | 0.808 | 0.813 | 0.902 | 0.986 | 0.958 | 8.411 | 14.097 | 20.395 | 69.388 | 75.397 |
| 2 | EOA-10 | 0.885 | 0.955 | 0.914 | 0.978 | 0.982 | 43.925 | 21.586 | 41.447 | 81.633 | 79.365 |
| 2 | EOA-13 | 0.921 | 0.927 | 0.925 | 0.987 | 0.966 | 9.346 | 19.383 | 37.500 | 73.469 | 71.429 |
| 2 | AD Vina | 0.760 | 0.754 | 0.753 | 0.766 | 0.795 | 12.150 | 3.965 | 8.553 | 2.041 | 4.762 |
| 2 | GOLD | 0.719 | 0.687 | 0.686 | 0.785 | 0.849 | 28.972 | 12.335 | 9.868 | 28.571 | 23.810 |
| 2 | Glide | 0.693 | 0.612 | 0.729 | 0.816 | 0.832 | 14.019 | 8.370 | 18.421 | 36.735 | 42.857 |
| 3 | EOA-7 | 0.896 | 0.918 | 0.894 | 0.955 | 0.982 | 42.991 | 9.692 | 25.000 | 79.592 | 76.984 |
| 3 | EOA-10 | 0.860 | 0.946 | 0.891 | 0.956 | 0.980 | 7.477 | 34.802 | 23.684 | 75.510 | 79.365 |
| 3 | EOA-13 | 0.895 | 0.882 | 0.918 | 0.959 | 0.981 | 21.495 | 31.718 | 27.632 | 89.796 | 73.016 |
| 3 | AD Vina | 0.762 | 0.769 | 0.768 | 0.777 | 0.786 | 10.280 | 7.048 | 9.868 | 6.122 | 3.968 |
| 3 | GOLD | 0.710 | 0.715 | 0.686 | 0.866 | 0.849 | 27.103 | 17.621 | 9.868 | 34.694 | 26.190 |
| 3 | Glide | 0.687 | 0.667 | 0.753 | 0.840 | 0.857 | 14.953 | 10.132 | 23.026 | 42.857 | 44.444 |
| 4 | EOA-7 | 0.859 | 0.915 | 0.892 | 0.980 | 0.983 | 14.019 | 16.300 | 21.711 | 61.224 | 80.159 |
| 4 | EOA-10 | 0.919 | 0.922 | 0.934 | 0.986 | 0.982 | 58.879 | 13.656 | 23.026 | 67.347 | 76.190 |
| 4 | EOA-13 | 0.945 | 0.978 | 0.934 | 0.983 | 0.983 | 13.084 | 51.542 | 36.184 | 79.592 | 79.365 |
| 4 | AD Vina | 0.796 | 0.740 | 0.766 | 0.765 | 0.809 | 8.411 | 5.727 | 5.921 | 6.122 | 7.143 |
| 4 | GOLD | 0.736 | 0.722 | 0.663 | 0.818 | 0.854 | 28.972 | 13.656 | 7.895 | 26.531 | 25.397 |
| 4 | Glide | 0.691 | 0.646 | 0.728 | 0.818 | 0.860 | 10.280 | 12.335 | 16.447 | 40.816 | 46.825 |
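For readers who want to compute comparable metrics on their own ranked lists, the sketch below shows one way to obtain AUC and EF1% from scores and active/decoy labels. It is an illustration under stated assumptions rather than the authors' code: AUC is computed from the Mann-Whitney rank statistic, and EF1% is taken as the percentage of all actives found in the top 1% of the library. The latter appears consistent with the tabulated values (for example, 39 of 107 ACES test actives gives 36.449), but that reading is an inference from the numbers. The function names are hypothetical.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Higher score is assumed to mean 'more likely active'.
    Tied scores receive their average rank.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    n_act = int(labels.sum())
    n_dec = labels.size - n_act
    # Average 1-based rank for each unique score value
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u = ranks[labels].sum() - n_act * (n_act + 1) / 2.0
    return u / (n_act * n_dec)

def ef_percent(scores, labels, fraction=0.01):
    """Percentage of all actives found in the top `fraction` of the ranked list.

    Assumption: this is the EF1% convention behind the table; the order of
    compounds inside the top slice does not affect the result.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    n_top = max(1, int(round(fraction * scores.size)))
    top = np.argsort(-scores, kind="mergesort")[:n_top]
    return 100.0 * labels[top].sum() / labels.sum()
```

Because `ef_percent` only counts actives inside the top slice, two methods that place the same number of actives there receive the same value regardless of ordering, which is the behaviour the caption notes.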
Figure 1. ROC curves for the best-performing docking method based on the consensus approach compared with EOA and the other docking methods (see text for more details). In all cases, the EOA performed better than all docking methods.
Figure 2. ROC curves obtained from the EOA models derived from the scrambled sets. In all cases AUC values are below 0.5, indicating performances lower than random.
Test set AUC and EF1% values obtained for 3-descriptor EOA models.
| Target | AUC | EF1% |
|---|---|---|
| ACES | 0.825 | 24.299 |
| HIVPR | 0.860 | 18.943 |
| MK14 | 0.807 | 13.158 |
| UROK | 0.897 | 20.408 |
| TRY1 | 0.932 | 24.603 |
Average ± SD Euclidean distances between active and decoy compounds for the subsets used to derive the best 10-descriptor models for each target. Distances are based on all principal components obtained from PCA.
| Target | Set | # Descriptors | Euclidean Distances (Average ± Standard Deviation) | AUC |
|---|---|---|---|---|
| ACES | 4 | 10 | 4.34 ± 1.41 | 0.885 |
| HIVPR | 2 | 10 | 4.10 ± 1.23 | 0.955 |
| MK14 | 1 | 10 | 4.90 ± 1.59 | 0.947 |
| UROK | 1 | 10 | 4.59 ± 1.30 | 0.997 |
| TRY1 | 1 | 10 | 4.72 ± 1.32 | 0.996 |
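A distance summary of this kind could be computed along the following lines. This is a hedged sketch: it assumes descriptors are standardized before PCA and that the distance is taken between every active and every decoy, neither of which is stated in the caption. The function name `active_decoy_distances` is hypothetical.

```python
import numpy as np

def active_decoy_distances(X_act, X_dec):
    """Mean and SD of Euclidean distances between actives and decoys in PCA space.

    Assumptions (not confirmed by the source): descriptors are z-scored over
    the pooled set, PCA is fit on the pooled set, and all active/decoy pairs
    are included.
    """
    X_act = np.asarray(X_act, dtype=float)
    X_dec = np.asarray(X_dec, dtype=float)
    X = np.vstack([X_act, X_dec])
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                     # guard against constant descriptors
    Z = (X - mu) / sd
    # PCA via SVD; U * S gives the scores on all principal components
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    T = U * S
    A, D = T[: len(X_act)], T[len(X_act):]
    # All active/decoy pairs; memory grows as n_act * n_dec * n_components
    diff = A[:, None, :] - D[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist.mean(), dist.std()
```

Since PCA is a rotation, keeping all principal components leaves these distances identical to those in the standardized descriptor space; the projection only matters once components are discarded.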
Descriptions of the five datasets used in this work, including the numbers of active and decoy compounds in training, validation and test sets. The UROK dataset had fewer active/decoy compounds listed in DUD-E in comparison with all other datasets.
| Dataset | PDB Codes | # Active (Total) | # Decoy (Total) | Training # Active | Training # Decoy | Validation # Active | Validation # Decoy | Test # Active | Test # Decoy |
|---|---|---|---|---|---|---|---|---|---|
| ACES | 1e66, 1acj | 643 | 24,161 | 430 | 3333 | 106 | 832 | 107 | 19,996 |
| HIVPR | 1xl2, 2pwc | 1366 | 35,071 | 912 | 3333 | 227 | 832 | 227 | 30,906 |
| MK14 | 2qd9, 3o8t | 911 | 34,896 | 608 | 3333 | 151 | 832 | 152 | 30,731 |
| UROK | 1sqt, 4fue | 298 | 9262 | 200 | 1666 | 49 | 416 | 49 | 7180 |
| TRY1 | 2ayw, 3rxl | 755 | 24,760 | 504 | 3333 | 125 | 832 | 126 | 20,595 |
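As a quick sanity check on the split sizes, the snippet below verifies that the training, validation and test counts add up to the listed totals and reports the fraction of actives in each test set. The counts are transcribed from the table above; the dataset names are assigned by matching the active counts against the EOA results table, and the helper names are hypothetical.

```python
# (total actives, total decoys, (train act, dec), (val act, dec), (test act, dec))
DATASETS = {
    "ACES":  (643, 24161, (430, 3333), (106, 832), (107, 19996)),
    "HIVPR": (1366, 35071, (912, 3333), (227, 832), (227, 30906)),
    "MK14":  (911, 34896, (608, 3333), (151, 832), (152, 30731)),
    "UROK":  (298, 9262, (200, 1666), (49, 416), (49, 7180)),
    "TRY1":  (755, 24760, (504, 3333), (125, 832), (126, 20595)),
}

def split_is_consistent(total_act, total_dec, train, val, test):
    """True if the three subsets sum to the listed totals."""
    acts = train[0] + val[0] + test[0]
    decs = train[1] + val[1] + test[1]
    return acts == total_act and decs == total_dec

def active_fraction(subset):
    """Fraction of actives in an (actives, decoys) subset."""
    act, dec = subset
    return act / (act + dec)

if __name__ == "__main__":
    for name, (t_act, t_dec, train, val, test) in DATASETS.items():
        ok = split_is_consistent(t_act, t_dec, train, val, test)
        print(f"{name}: consistent={ok}, test active fraction={active_fraction(test):.4f}")
```

In every test set actives make up roughly 0.5 to 0.7% of compounds, well below the fraction in the training and validation sets, which is the low-prevalence regime where early-enrichment metrics such as EF1% are most informative.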
Figure 3. A flowchart of the modified enrichment optimization algorithm (EOA). See the Supporting Information for more details.