Keisuke Yanagisawa1,2, Shunta Komine2,3, Shogo D Suzuki2,3, Masahito Ohue1,3,4, Takashi Ishida1,2,3,4, Yutaka Akiyama1,2,3,4.
Abstract
MOTIVATION: Recently, the number of available protein tertiary structures and compounds has increased. However, structure-based virtual screening is computationally expensive owing to docking simulations. Thus, methods that filter out obviously unnecessary compounds prior to computationally expensive docking simulations have been proposed. However, the calculation speed of these methods is not fast enough to evaluate ≥ 10 million compounds.
Year: 2017 PMID: 28369284 PMCID: PMC5860314 DOI: 10.1093/bioinformatics/btx178
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1 Spresso flowchart
Fig. 2 An example of compound decomposition. The carbon moiety in the structure on the right has four adjacent groups; therefore, it is not merged into any adjacent groups
Calculation times for docking all 28 629 602 ZINC compounds into three DUD-E protein targets
| Target | Spresso-SP [CPU hours] | Spresso-HTVS [CPU hours] | Glide HTVS [CPU hours] |
|---|---|---|---|
| ACES | 42.6 (× 76.8) | 22.8 (× 143.1) | 3268.8 |
| EGFR | 38.9 (× 126.4) | 21.5 (× 229.3) | 4925.1 |
| PGH1 | 41.8 (× 88.0) | 20.9 (× 175.4) | 3674.5 |
Values in parentheses indicate the fold increase in speed exhibited by Spresso relative to Glide HTVS.
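The fold values can be reproduced as the ratio of the Glide HTVS time to the Spresso time. The sketch below does this for the ACES row; the function name `fold_speedup` is illustrative and not from the paper. Recomputed ratios may differ slightly from the published ones, presumably because the table reports rounded CPU hours.

```python
def fold_speedup(baseline_hours: float, method_hours: float) -> float:
    """Return how many times faster a method is than the baseline."""
    if method_hours <= 0:
        raise ValueError("method_hours must be positive")
    return baseline_hours / method_hours


# CPU hours for the ACES target, copied from the table above.
glide_htvs_hours = 3268.8
spresso_hours = {"Spresso-SP": 42.6, "Spresso-HTVS": 22.8}

for name, hours in spresso_hours.items():
    print(f"{name}: x{fold_speedup(glide_htvs_hours, hours):.1f}")
```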
Average prediction accuracy (enrichment factors) over 102 DUD-E targets
| Method | Scoring | 2%–1% | 5%–1% | 10%–1% | 5%–2% | 10%–2% |
|---|---|---|---|---|---|---|
| Spresso-SP | SUM | 4.58 | 6.78 | 8.92 | 4.00 | 5.53 |
| Spresso-SP | MAX | 9.28 | 11.01 | 11.94 | 7.51 | 8.31 |
| Spresso-SP | GS3 | | | | | |
| Spresso-HTVS | SUM | 4.60 | 6.78 | 8.93 | 4.20 | 5.46 |
| Spresso-HTVS | MAX | 9.29 | 9.93 | 12.41 | 6.38 | 8.29 |
| Spresso-HTVS | GS3 | 9.00 | 12.18 | 14.49 | 7.39 | 9.24 |
| Glide HTVS | | 17.85 | 18.97 | 19.60 | 12.50 | 12.92 |
Note: Each enrichment factor (EF) is the average of the 102 EFs obtained for the DUD-E protein targets. A column labelled a%–b% gives the EF at b% when the compounds were first prescreened down to a% of all compounds. Best EF values among the Spresso variants are written in bold.
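To make the a%–b% notation concrete, here is a minimal sketch using the standard EF definition (hit rate in the selected subset divided by the hit rate in the whole library). The two-stage protocol, the function name `prescreened_ef`, and the convention that lower scores rank first are assumptions made for illustration; the record does not spell out the exact evaluation procedure.

```python
def prescreened_ef(pre_scores, final_scores, is_active, a, b):
    """EF at fraction b after prescreening down to fraction a of the library.

    Assumes lower scores are better in both stages (hypothetical convention).
    """
    n = len(pre_scores)
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    n_keep = max(1, round(n * a))  # compounds surviving the prescreen
    n_top = max(1, round(n * b))   # compounds in the final selection
    # Stage 1: keep the best-scoring fraction a by the prescreening score.
    kept = sorted(range(n), key=lambda i: pre_scores[i])[:n_keep]
    # Stage 2: re-rank the survivors by the final score and take the top b.
    top = sorted(kept, key=lambda i: final_scores[i])[:n_top]
    hits = sum(is_active[i] for i in top)
    return (hits / n_top) / (total_actives / n)


# Toy library: 100 compounds, the first 10 are active and also score best.
scores = list(range(100))
actives = [1] * 10 + [0] * 90
ef = prescreened_ef(scores, scores, actives, a=0.02, b=0.01)
```

In this toy case the single selected compound is active, so the selection hit rate is 1 against a library hit rate of 0.1.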
Fig. 3 A scatter plot of the Glide SP score and the Spresso-SP score for DUD-E CP3A4 target. Each dot represents a compound in DUD-E CP3A4 dataset. The correlation coefficient is R = 0.55
Fig. 4 A Venn diagram of selected compounds identified by pre-screening for CP3A4, a DUD-E target. The top 1000 compounds identified by Glide SP, Glide HTVS and Spresso-SP are shown. The number of compounds for each method is shown and the numbers of true positives are in parentheses
Fig. 5 Scatter plot of physicochemical features based on pre-screening for ACES, a DUD-E protein target. Each dot represents a compound: cyan dots represent 0.1% of the compounds from the ZINC database; orange dots represent the top 0.1% of Spresso-SP compounds calculated using the method (III) GS3 formula; and magenta dots represent active compounds for ACES from the DUD-E dataset
Fig. 6 Boxplot representation and average (square dots) of the maximum Tanimoto coefficient between active compounds of target ACES. The data indicate structural diversity. ZINC, SVM, Glide HTVS and Spresso represent 0.1% of randomly selected compounds from the ZINC database, the top 0.1% of compounds resulting from SVM prediction, the top 0.1% of compounds resulting from Glide HTVS scoring, and the top 0.1% of compounds returned from Spresso-SP results using method (III) GS3 scoring, respectively
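The quantity plotted in Fig. 6 can be illustrated with a short sketch. The Tanimoto coefficient of two fingerprints is the size of their intersection divided by the size of their union, and the maximum is taken over a reference set. The fingerprint type is not stated in this record, so the sets of integer bit indices below are hypothetical, as are the function names; reading the caption as "each selected compound compared against the known actives" is also an assumption.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def max_tanimoto(query: set, references: list) -> float:
    """Highest Tanimoto coefficient between a query and any reference."""
    return max(tanimoto(query, ref) for ref in references)


# Hypothetical fingerprints: one selected compound and two known actives.
selected = {1, 2, 3}
known_actives = [{2, 3, 4}, {1, 2, 3, 4}]
best = max_tanimoto(selected, known_actives)
```

A low maximum means the selected compound resembles none of the known actives, which is how such a plot can indicate structural diversity.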
Fig. 7 (A) Structure of ZINC12181222, the highest scoring compound for the protein target ACES. (B) Result of ZINC12181222 decomposition. (C) Results of fragment docking. The color of the structure mimics those of the structures shown in (A) and (B)