Shuntaro Chiba1,2, Takashi Ishida1,2,3, Kazuyoshi Ikeda4, Masahiro Mochizuki5, Reiji Teramoto6, Y-H Taguchi7, Mitsuo Iwadate8, Hideaki Umeyama8, Chandrasekaran Ramakrishnan9, A Mary Thangakani10, D Velmurugan10, M Michael Gromiha9, Tatsuya Okuno11, Koya Kato12, Shintaro Minami13, George Chikenji12, Shogo D Suzuki3, Keisuke Yanagisawa3, Woong-Hee Shin14, Daisuke Kihara14,15, Kazuki Z Yamamoto16, Yoshitaka Moriwaki17, Nobuaki Yasuo3, Ryunosuke Yoshino17,18, Sergey Zozulya19,20, Petro Borysko19,20, Roman Stavniichuk19, Teruki Honma1,3,21, Takatsugu Hirokawa22,23,24, Yutaka Akiyama1,2,3,22,24, Masakazu Sekijima25,26,27,28,29.
Abstract
We propose a new iterative screening contest method to identify target protein inhibitors. Following a compound screening contest held in 2014, this study reports the results of a second contest held in 2015. Our aims were to identify inhibitors of the target enzyme and to benchmark a variety of computer-aided drug discovery methods under identical experimental conditions. In both contests, we used the tyrosine-protein kinase Yes as the example target protein. Participating groups virtually screened a library of 2.4 million compounds for possible inhibitors. Each group ranked the compounds by the functional scores obtained with its own method, and the top 181 compounds from each group were selected. The 2015 contest showed an improved hit rate compared with the 2014 contest. In addition, we identified a method whose success in finding target inhibitors is statistically supported. Quantitative analysis of the most successful method provided further insight into its important characteristics.
Year: 2017 PMID: 28931921 PMCID: PMC5607274 DOI: 10.1038/s41598-017-10275-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1 (a) Flowchart of the contest. Each participating group (G1–G11) used its own method to propose 400 compounds from the compound library, in prioritized rank order. Proposed compounds that were in stock were selected in rank order until 181 compounds had been chosen for each group. When the same compound was proposed by different groups, the affected groups obtained additional compounds to be assayed, which is why the number of selected compounds differs among groups. Finally, the selected compounds were assayed. (b) Screening flow of the compounds in the experimental assay. The filtering criteria are shown in trapezoids.
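The per-group selection rule in panel (a) can be sketched as follows. This is a minimal illustration, not the organizers' code: the compound IDs, the stock list, and the function names are hypothetical, and the sketch does not model how duplicate proposals turned into extra compounds for the affected groups; it only deduplicates them in the final assay set.

```python
from typing import Dict, List, Set

QUOTA = 181  # compounds selected per group, per the Figure 1 caption


def select_compounds(
    proposals: Dict[str, List[str]], in_stock: Set[str], quota: int = QUOTA
) -> Dict[str, List[str]]:
    """Walk each group's ranked proposal list, skipping out-of-stock
    compounds, until `quota` compounds have been picked."""
    selected: Dict[str, List[str]] = {}
    for group, ranked_ids in proposals.items():
        picks: List[str] = []
        for cid in ranked_ids:
            if len(picks) == quota:
                break
            if cid in in_stock:
                picks.append(cid)
        selected[group] = picks
    return selected


def assay_set(selected: Dict[str, List[str]]) -> Set[str]:
    """Compounds to assay: a compound picked by several groups is assayed once."""
    return set().union(*selected.values())


# Usage with toy IDs (hypothetical):
picks = select_compounds(
    {"G1": ["a", "b", "c", "d"], "G2": ["c", "e", "f"]},
    in_stock={"a", "c", "d", "e", "f"},
    quota=2,
)
print(picks)                     # {'G1': ['a', 'c'], 'G2': ['c', 'e']}
print(sorted(assay_set(picks)))  # ['a', 'c', 'e']
```

Because each group walks its own ranked list independently, an out-of-stock compound simply lets the next-ranked one in, so every group ends with the same quota whenever enough of its proposals are available.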
Figure 2 (a) IC50 values of the compounds from each group; groups none of whose compounds proceeded to IC50 analysis are omitted. Compounds with log(IC50/1 M) < −5, that is, IC50 < 10 μM, are hit compounds. The error bars represent 68% confidence intervals estimated from the IC50 assay. (b) Number of hit compounds found up to each prioritized rank in the list of compounds proposed by each group.
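The hit criterion in panel (a) and the rank-wise count in panel (b) can be written down compactly. A minimal sketch, assuming IC50 values are given in mol/L and that None marks compounds that never reached IC50 determination; the function names and the toy list are hypothetical.

```python
import math
from itertools import accumulate
from typing import List, Optional

HIT_LOG10_THRESHOLD = -5.0  # log10(IC50 / 1 M) < -5, i.e. IC50 < 10 uM


def is_hit(ic50_molar: Optional[float]) -> bool:
    """True if an IC50 was determined and log10(IC50 / 1 M) is below -5."""
    return ic50_molar is not None and math.log10(ic50_molar) < HIT_LOG10_THRESHOLD


def cumulative_hits(ranked_ic50s: List[Optional[float]]) -> List[int]:
    """Number of hits among the top-k proposed compounds, for k = 1, 2, ...

    `ranked_ic50s` follows a group's prioritized rank; None marks compounds
    without a determined IC50."""
    return list(accumulate(int(is_hit(x)) for x in ranked_ic50s))


# Hypothetical ranked list for one group (IC50 in mol/L):
print(cumulative_hits([3.0e-6, None, 37.4e-6, 0.71e-6]))  # [1, 1, 1, 2]
```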
Figure 3 Similarity of each hit compound to known Src-family kinase inhibitors (see section "Preparation of compound library"), plotted against experimental inhibition activity. The error bars represent 68% confidence intervals estimated from IC50 assays. Similarity was calculated as the Tanimoto coefficient between MACCS fingerprints[41]. The chemical structure of the known inhibitor most similar to each hit is shown in Table S5 of the Supporting Information, together with its ChEMBL ID and literature source.
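The similarity measure in Figure 3 is the Tanimoto coefficient, the number of bits set in both fingerprints divided by the number set in either. The sketch below represents a fingerprint as the set of its on-bit indices and finds the most similar known inhibitor for a hit. It is illustrative only: real MACCS keys would come from a cheminformatics toolkit (for example RDKit's `rdkit.Chem.MACCSkeys.GenMACCSKeys`; the tool used in the study is not stated here), and the bit sets and IDs below are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # indices of the MACCS keys that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A or B| of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar_known(hit: Fingerprint, known: Dict[str, Fingerprint]) -> Tuple[str, float]:
    """ID and similarity of the known inhibitor most similar to `hit`."""
    best = max(known, key=lambda k: tanimoto(hit, known[k]))
    return best, tanimoto(hit, known[best])


# Toy fingerprints (hypothetical bit indices, not real MACCS output):
hit = frozenset({1, 2, 3})
known = {"known_A": frozenset({1, 2, 3, 4}), "known_B": frozenset({5, 6})}
print(most_similar_known(hit, known))  # ('known_A', 0.75)
```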
Figure 4 (a) Hit rate of compounds in each cluster with respect to the three kinase families. The hit rate was calculated by dividing the number of hit compounds by the number of compounds whose inhibition rates against that family were measured. (b) Hit rate of compounds in each cluster with respect to the three groups of Src-family kinases. Based on the kinome[42], the 11 kinases were classified into three groups: Group 1 (Src, Fyn, Yes, Fgr), Group 2 (Blk, Hck, Lck, Lyn), and Group 3 (Frk, Srm, Brk). Clustering was performed with Canvas[46, 47] using the k-means algorithm[48] on MACCS fingerprints[41]. The clusters in Fig. 4a do not correspond to those in Fig. 4b.
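The hit-rate definition in panel (a), applied per kinase group as in panel (b), is sketched below. The kinase-to-group mapping follows the caption; the record layout, the function name, and the toy measurements are hypothetical. It is not stated here how compounds measured against several kinases of one group were counted, so this sketch simply counts each measurement once.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Kinase -> group, as listed in the Figure 4 caption.
KINASE_GROUP: Dict[str, int] = {
    "Src": 1, "Fyn": 1, "Yes": 1, "Fgr": 1,
    "Blk": 2, "Hck": 2, "Lck": 2, "Lyn": 2,
    "Frk": 3, "Srm": 3, "Brk": 3,
}


def hit_rates(records: Iterable[Tuple[int, str, bool]]) -> Dict[Tuple[int, int], float]:
    """Hit rate per (cluster, kinase group).

    Each record is (cluster_id, kinase, is_hit) for one measured inhibition
    rate; the rate is hits divided by measurements in that (cluster, group)."""
    measured: Dict[Tuple[int, int], int] = defaultdict(int)
    hits: Dict[Tuple[int, int], int] = defaultdict(int)
    for cluster, kinase, hit in records:
        key = (cluster, KINASE_GROUP[kinase])
        measured[key] += 1
        hits[key] += int(hit)
    return {key: hits[key] / measured[key] for key in measured}


# Hypothetical measurements:
print(hit_rates([(0, "Src", True), (0, "Fyn", False), (0, "Lck", True), (1, "Brk", False)]))
# {(0, 1): 0.5, (0, 2): 1.0, (1, 3): 0.0}
```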
Figure 5 Diversified screening achieved by combining various computational methods. Principal component analysis (PCA) using MACCS fingerprints was applied to the library compounds in this study. The cumulative explained variances of principal components (PC) 1 and 2 are 26% and 49%, respectively. (a) Compounds proposed by the groups participating in the previous contest, projected onto PC1 and PC2, with the two hit compounds confirmed by IC50 determination plotted. To avoid cluttering the plot, only the top 60 compounds in each proposed list are shown; for the compound library, a randomly chosen 2.5% of all compounds are shown. (b) The same analysis as in (a) using data from this study, with the ten hit compounds plotted (magenta for G3, cyan for G10, green for G11). (c) Number density of all library compounds in the PC1/PC2 plane.
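The PCA step can be sketched with NumPy's SVD: fit the principal axes on the library fingerprints, then project each group's proposed compounds onto the same axes, as in panels (a) and (b). This is an illustration on random 0/1 data standing in for 166-bit MACCS keys, not the authors' pipeline; the function names are hypothetical.

```python
import numpy as np


def fit_pca(library: np.ndarray, n_components: int = 2):
    """Fit PCA to a (compounds x bits) 0/1 fingerprint matrix via SVD.

    Returns the column means, the leading principal axes, and the cumulative
    explained-variance ratio of those axes."""
    x = library.astype(float)
    mean = x.mean(axis=0)
    # Rows of vt are principal axes; s**2 is proportional to the variance along each.
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return mean, vt[:n_components], cumulative[:n_components]


def project(fps: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Coordinates of (possibly new) compounds on the fitted principal axes."""
    return (fps.astype(float) - mean) @ axes.T


# Random 0/1 matrix standing in for 166-bit MACCS keys (hypothetical data).
rng = np.random.default_rng(0)
library = rng.integers(0, 2, size=(1000, 166))
mean, axes, cum_var = fit_pca(library)
proposed = library[:60]  # e.g. a group's top-60 list
coords = project(proposed, mean, axes)
# Plot a random 2.5% of the library, as in the caption:
subset = library[rng.choice(len(library), size=len(library) // 40, replace=False)]
print(coords.shape, subset.shape)  # (60, 2) (25, 166)
```

Fitting the axes once on the library and only projecting the proposals keeps every panel in one coordinate system, so the group distributions can be compared directly with the library density in panel (c).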
Summary of methods used by the participating groups.
| Group | Modeling of Yes structure | Template (PDB ID) | Ligand preparation | Processing method of compound library | | |
|---|---|---|---|---|---|---|
| 1 | — | — | — | LB: 1D and 2D PaDEL descriptor[ | The top 7 compounds in PubChem (AID 686947)[ | The rest of compounds |
| 2 | — | — | — | LB: Morgan descriptor[ | 80% of PubChem (AID 686947)[ | |
| 3 | — | — | — | LB: Morgan2[ | Eliminated.sdf.zip, | |
| 4 | Homology modeling ( | 1Y57[ | | SB: | | |
| 5 | Homology modeling ( | 1OPK[ | | Hybrid (LB & SB): | IPAB2014 | IPAB2014 |
| 6 | Homology modeling ( | 2SRC[ | | Hybrid (LB & SB): | Eliminated.sdf.zip | DUD-E9 |
| 7 | — | — | — | LB: Physicochemical properties and topological descriptors compiled in | Eliminated.sdf.zip | IPAB2014 |
| 8 | Homology modeling ( | 2H8H[ | | SB: | — | — |
| 9 | Homologous protein structures themselves were used. | 1YI6[ | | Hybrid (LB → SB) LB: Drug-like filtering ( | | |
| 10 | Homology modeling ( | 2H8H[ | | Hybrid (LB → SB) LB: | List 1–3 for Ligand-based filtering (See Section | — |
| 11 | Homology modeling ( | 1Y57[ | | SB: | Actives (IC50 <1 μM) in ChEMBL, IPAB2014 | 300 compounds from IPAB2014 |
Software names are given in italics.
Known Src-kinase inhibitors distributed by IPAB (see section "Preparation of compound library").
Inhibitory assay results of the previous contest[10], in which experimental conditions were the same as this study.
PDB = Protein Data Bank; LB = ligand-based; SB = structure-based; IPAB = Initiative for Parallel Bioinformatics; MD = molecular dynamics.
IC50 values of compounds that passed the validation assay (the 2nd screening).
| Compound ID | IC50 (μM) | 95% CI lower (μM) | 95% CI upper (μM) | Group |
|---|---|---|---|---|
| Z64663950 | 0.26 | 0.22 | 0.31 | 3 |
| Z49895016 | 0.30 | 0.23 | 0.38 | 3 |
| Z64663944 | 0.35 | 0.13 | 0.99 | 3 |
| Z1229984790 | 0.71 | 0.24 | 2.10 | 10 |
| Z57745314 | 1.16 | 0.51 | 2.62 | 3 |
| Z57745304 | 1.9 | 1.5 | 2.4 | 3 |
| Z199512484 | 3.0 | 1.9 | 4.7 | 3 |
| Z410927360 | 3.4 | 3.2 | 3.6 | 10 |
| Z295464022 | 5.0 | 3.5 | 7.3 | 3 |
| Z449737600 | 7.0 | 5.2 | 9.3 | 11 |
| Z1252403274 | 20.0 | 15.6 | 25.6 | 11 |
| Z275023406 | 37.4 | 16.9 | 82.5 | 5 |
| Z57745307 | — | — | — | 3 |
| Z50080378 | — | — | — | 3 |
| Z1283491630 | — | — | — | 5 |
| Z50080181 | — | — | — | 3 |
Inhibition rates from the first and second screenings are shown, along with the canonical SMILES, in Tables S1 and S2 of the Supporting Information. The final reagent concentrations were 5.5 nmol L−1 Yes, 0.013 mmol L−1 ATP, and 0.2 mg mL−1 substrate (poly(Glu-Tyr) peptides, Glu:Tyr = 4:1).
(a) 95% confidence interval. Some compounds are not hits because of insufficient potency (b) or a poor dose-response relationship (c). (d) These compounds are hydrazones or potential Michael acceptors (see sections "Experimental procedure and screening of potential inhibitors" and "Comparison of ligand-based and structure-based methods").
IC50 = half-maximal inhibitory concentration; CI = confidence interval.
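As a cross-check on the IC50 table, the sketch below transcribes its rows, recounts hits with the IC50 < 10 μM criterion from Figure 2, and converts a reported 95% CI into an approximate 68% interval of the kind drawn as error bars in Figures 2 and 3. The conversion assumes the uncertainty is normal on a log10 scale (the tabulated bounds are roughly symmetric about the IC50 on that scale, e.g. √(0.22 × 0.31) ≈ 0.26); the authors' fitting procedure may differ. The tally uses only the potency cutoff and does not model the other exclusion reasons in the notes; it gives ten hits from G3, G10, and G11, consistent with the ten hits plotted in Figure 5b. Names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# (compound ID, IC50 uM, 95% CI lower uM, 95% CI upper uM, group), transcribed
# from the IC50 table; None marks entries without a reported value.
Row = Tuple[str, Optional[float], Optional[float], Optional[float], int]
IC50_TABLE: List[Row] = [
    ("Z64663950", 0.26, 0.22, 0.31, 3),
    ("Z49895016", 0.30, 0.23, 0.38, 3),
    ("Z64663944", 0.35, 0.13, 0.99, 3),
    ("Z1229984790", 0.71, 0.24, 2.10, 10),
    ("Z57745314", 1.16, 0.51, 2.62, 3),
    ("Z57745304", 1.9, 1.5, 2.4, 3),
    ("Z199512484", 3.0, 1.9, 4.7, 3),
    ("Z410927360", 3.4, 3.2, 3.6, 10),
    ("Z295464022", 5.0, 3.5, 7.3, 3),
    ("Z449737600", 7.0, 5.2, 9.3, 11),
    ("Z1252403274", 20.0, 15.6, 25.6, 11),
    ("Z275023406", 37.4, 16.9, 82.5, 5),
    ("Z57745307", None, None, None, 3),
    ("Z50080378", None, None, None, 3),
    ("Z1283491630", None, None, None, 5),
    ("Z50080181", None, None, None, 3),
]

HIT_THRESHOLD_UM = 10.0  # log10(IC50 / 1 M) < -5  <=>  IC50 < 10 uM
Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def hits_per_group(rows: List[Row]) -> Counter:
    """Count compounds with a determined IC50 below 10 uM, per group."""
    return Counter(g for _, ic50, _, _, g in rows
                   if ic50 is not None and ic50 < HIT_THRESHOLD_UM)


def ci95_to_ci68(ic50: float, lower95: float, upper95: float) -> Tuple[float, float]:
    """Approximate 68% (+/- 1 SE) interval, assuming the 95% CI is symmetric
    on a log10 scale, i.e. log IC50 is normally distributed."""
    se = (math.log10(upper95) - math.log10(lower95)) / (2 * Z_95)
    return ic50 * 10 ** (-se), ic50 * 10 ** se


print(hits_per_group(IC50_TABLE))       # Counter({3: 7, 10: 2, 11: 1})
print(ci95_to_ci68(0.26, 0.22, 0.31))   # roughly (0.238, 0.284)
```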
Range of experimental conditions used to train a machine-learning model.
| Feature (a) | Compound (μmol L−1) | ATP (μmol L−1) | Mg2+ (μmol L−1) (b) | pH |
|---|---|---|---|---|
| Average | 4.6 | 91 | 5500 | 7.3 |
| Minimum | 4.6 × 10⁻⁴ | 10 | 0 | 7.0 |
| Maximum | 670 | 200 | 2 × 10⁴ | 7.5 |
| Standard deviation | 20 | 27 | 3900 | 0.2 |
(a) In addition to these features, dummy variables identifying the source study of each experiment were included in the training set.
(b) This range was calculated from the training set actually used, which contained minor errors in the retrieved experimental parameters. Because Mg2+ is normally present in adequate amounts in assay samples, it is not expected to affect inhibition rates. After the contest, G3 confirmed that removing the Mg2+ concentration from the training set did not affect the result.
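The first note above describes adding dummy parameters for the source study, and the second describes a post-contest check that drops Mg2+. A minimal sketch of such a feature layout follows; the field names, the one-hot encoding, and the helper functions are hypothetical, and G3's actual model and preprocessing are not described here. The range check also shows that the contest's ATP concentration (0.013 mmol L−1, i.e. 13 μmol L−1) lies inside the 10 to 200 μmol L−1 ATP range of the training data.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical field names for the four numeric assay conditions.
NUMERIC_FEATURES = ("compound_umol", "atp_umol", "mg_umol", "ph")

# Training min/max from the range table (umol/L).
TRAINING_RANGE: Dict[str, Tuple[float, float]] = {
    "compound_umol": (4.6e-4, 670.0),
    "atp_umol": (10.0, 200.0),
    "mg_umol": (0.0, 2e4),
}


def build_features(records: Sequence[Dict], sources: Sequence[str],
                   drop_mg: bool = False) -> List[List[float]]:
    """Numeric assay conditions followed by one-hot dummy columns for the
    source study; drop_mg=True mimics the post-contest removal of Mg2+."""
    numeric = [f for f in NUMERIC_FEATURES if not (drop_mg and f == "mg_umol")]
    column = {s: i for i, s in enumerate(sources)}
    rows = []
    for rec in records:
        dummies = [0.0] * len(sources)
        dummies[column[rec["source"]]] = 1.0
        rows.append([float(rec[f]) for f in numeric] + dummies)
    return rows


def outside_training_range(conditions: Dict[str, float]) -> List[str]:
    """Features whose value lies outside the training min/max."""
    return [name for name, value in conditions.items()
            if name in TRAINING_RANGE
            and not TRAINING_RANGE[name][0] <= value <= TRAINING_RANGE[name][1]]


# Contest assay: 0.013 mmol/L ATP = 13 umol/L, inside the 10-200 umol/L range.
print(outside_training_range({"atp_umol": 0.013 * 1000}))  # []
```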