Carmen Cerchia1,2, Dilyana Dimova1, Antonio Lavecchia2, Jürgen Bajorath1.
Abstract
Analog series were systematically extracted from more than 650 000 bioactive compounds originating from medicinal chemistry and screening sources and more than 3.6 million commercial compounds that were not biologically annotated. Then, analog series-based (ASB) scaffolds were generated. For each scaffold from a bioactive series, a target profile was derived and ASB scaffolds shared by bioactive and commercial compounds were determined. On the basis of our analysis, large segments of commercial chemical space were not yet explored biologically. Shared ASB scaffolds established structural relationships between bioactive and commercial chemical space, and the target profiles of these scaffolds were transferred to commercially available analogs of active compounds. This made it possible to derive target hypotheses for more than 37 000 compounds without biological annotations covering more than 1000 different targets. For many molecules, alternative target assignments were available. Target hypotheses for these compounds should be of interest, for example, for hit expansion, acquisition of compounds to design or further extend focused libraries for drug discovery, or testing of expanded analog series on different targets. They can also be used to search for analogs and complement compound series during target-directed optimization. Therefore, all of the commercial molecules with new target hypotheses as well as key scaffolds identified in our analysis and their target profiles are made freely available.
Year: 2017 PMID: 30023563 PMCID: PMC6044811 DOI: 10.1021/acsomega.7b01338
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Compound, Analog Series, and ASB Scaffold Statistics
| | ZINC | ChEMBL | PubChem |
|---|---|---|---|
| # unique CPDs | 3 658 425 | 224 532 | 426 294 |
| # analog series | 264 496 | 22 015 | 42 513 |
| # CPDs in analog series | 2 147 784 (58.7%) | 133 441 (59.4%) | 259 019 (60.7%) |
| # ASB scaffolds | 208 004 (78.6%) | 15 625 (71%) | 31 602 (74.3%) |
| # CPDs in series with ASB scaffolds | 604 382 (28.1%) | 51 308 (38.4%) | 92 794 (35.8%) |
| # CPDs per series | 2–321 | 2–60 | 2–65 |
The distribution of compounds (CPDs) over analog series obtained from ZINC, ChEMBL, and PubChem subsets is reported. In addition, the ASB scaffold statistics are provided.
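The percentages in the table appear to be taken relative to the enclosing row: compounds in analog series relative to unique compounds, ASB scaffolds relative to analog series, and compounds in series with ASB scaffolds relative to compounds in analog series. This reading is inferred from the numbers and is not stated in the table note. A minimal sketch that recomputes the ZINC column under that assumption (the function and dictionary names are illustrative only):

```python
def subset_percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


# ZINC column of the statistics table (values copied from the table).
zinc = {
    "unique_cpds": 3_658_425,
    "analog_series": 264_496,
    "cpds_in_series": 2_147_784,
    "asb_scaffolds": 208_004,
    "cpds_in_asb_series": 604_382,
}

# Assumed denominators: each quantity is compared with the row it is a subset of.
zinc_percentages = {
    "cpds_in_series": subset_percentage(zinc["cpds_in_series"], zinc["unique_cpds"]),
    "asb_scaffolds": subset_percentage(zinc["asb_scaffolds"], zinc["analog_series"]),
    "cpds_in_asb_series": subset_percentage(
        zinc["cpds_in_asb_series"], zinc["cpds_in_series"]
    ),
}

print(zinc_percentages)
```

Under these assumed denominators, the recomputed ZINC values match the 58.7%, 78.6%, and 28.1% given in the table.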
Figure 1. Distribution of compounds over series. Shown is the distribution of compounds over analog series with ASB scaffolds. Percentages give the proportion of series with 2 (blue), 3–5 (purple), 6–10 (green), and 11 or more (orange) compounds. In each case, the largest proportion of series were compound pairs.
Figure 2. Overlap between ASB scaffolds from different sources. The Venn diagram reports the overlap of ASB scaffolds derived from ChEMBL 22, PubChem, and ZINC. The number of shared ASB scaffolds and corresponding compounds is reported. ZINC compounds were associated with the union of unique target annotations from bioactive compounds represented by the shared ASB scaffold. For ASB scaffolds shared by ChEMBL and PubChem, unique targets were determined.
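The target transfer described in this caption (each ZINC compound receives the union of target annotations of the bioactive compounds that share its ASB scaffold) can be sketched as follows. This is a toy illustration, not the authors' implementation: the scaffold, compound, and target identifiers are made up, and scaffolds are treated as plain string keys rather than chemical structures.

```python
from collections import defaultdict


def transfer_target_profiles(bioactive, commercial):
    """Assign each commercial compound the union of targets of its shared scaffold.

    bioactive:  iterable of (scaffold, compound_id, set_of_targets)
    commercial: iterable of (scaffold, compound_id)
    Returns {commercial_compound_id: set_of_targets} for shared scaffolds only.
    """
    # Target profile of a scaffold = union of targets over its bioactive analogs.
    profile = defaultdict(set)
    for scaffold, _compound, targets in bioactive:
        profile[scaffold].update(targets)

    hypotheses = {}
    for scaffold, compound in commercial:
        if scaffold in profile:  # scaffold shared with a bioactive series
            hypotheses[compound] = set(profile[scaffold])
    return hypotheses


# Hypothetical toy data (identifiers are placeholders, not from the study).
bioactive = [
    ("S1", "B1", {"T1", "T2"}),
    ("S1", "B2", {"T2", "T3"}),
    ("S2", "B3", {"T4"}),
]
commercial = [("S1", "Z1"), ("S3", "Z2"), ("S2", "Z3")]

hypotheses = transfer_target_profiles(bioactive, commercial)
print(hypotheses)
```

In this toy input, compound `Z2` receives no target hypothesis because its scaffold `S3` does not occur in any bioactive series.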
Figure 3. Distribution of targets over different families. The distribution over different families of the targets assigned to ZINC compounds on the basis of (a) ChEMBL and (b) PubChem annotations is shown. UniProt-based target family assignments were available for 549 of 610 PubChem targets.
Promiscuity of Shared ASB Scaffolds
| # targets | # ASB scaffolds (ChEMBL/ZINC) | # ChEMBL CPDs | # ZINC CPDs | # ASB scaffolds (PubChem/ZINC) | # PubChem CPDs | # ZINC CPDs |
|---|---|---|---|---|---|---|
| 1 | 863 | 1154 | 1851 | 3091 | 3200 | 7832 |
| 2 | 193 | 285 | 417 | 2065 | 2473 | 5450 |
| 3 | 66 | 90 | 138 | 1420 | 1903 | 4010 |
| 4 | 52 | 94 | 139 | 1009 | 1518 | 2891 |
| 5 | 14 | 31 | 18 | 753 | 1224 | 2144 |
| 6–10 | 26 | 49 | 43 | 1823 | 3563 | 6467 |
| 11–15 | 1 | 1 | 1 | 605 | 1440 | 2248 |
| 16–20 | − | − | − | 240 | 738 | 889 |
| 21–25 | 1 | 1 | 1 | 117 | 360 | 426 |
| >25 | − | − | − | 147 | 582 | 605 |
The table reports the number of shared ASB scaffolds associated with single- and increasing multitarget activities. For each ASB scaffold, the total number of bioactive compounds (from ChEMBL and PubChem) and ZINC compounds forming the analog series is reported.
Columns 2–4: ASB scaffolds shared by ChEMBL and ZINC compounds.
Columns 5–7: ASB scaffolds shared by PubChem and ZINC compounds.
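The row labels of the promiscuity table group shared ASB scaffolds by the number of targets in their profile (1, 2, 3, 4, 5, 6–10, 11–15, 16–20, 21–25, >25). A minimal sketch of that binning, assuming a hypothetical mapping from scaffold identifier to target count (the identifiers and counts below are placeholders, not data from the study):

```python
from collections import Counter


def promiscuity_bin(n_targets: int) -> str:
    """Map a target count to the row label used in the promiscuity table."""
    if n_targets < 1:
        raise ValueError("a shared scaffold needs at least one target")
    if n_targets <= 5:
        return str(n_targets)
    if n_targets > 25:
        return ">25"
    lower = 5 * ((n_targets - 1) // 5) + 1  # 6, 11, 16, or 21
    return f"{lower}–{lower + 4}"


# Hypothetical target counts per shared scaffold.
targets_per_scaffold = {"S1": 3, "S2": 1, "S3": 7, "S4": 1, "S5": 30}

scaffolds_per_bin = Counter(
    promiscuity_bin(n) for n in targets_per_scaffold.values()
)
print(scaffolds_per_bin)
```

Counting compounds per bin, as in the remaining table columns, would follow the same pattern with compound lists in place of scaffold identifiers.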
Figure 4. Analog series and shared ASB scaffolds. Shown are exemplary analog series with ASB scaffolds shared by ZINC compounds (orange) and (a) ChEMBL (blue box), (b) PubChem (green), or (c) both ChEMBL and PubChem compounds, in an R-group table format. Substituents (R1) in analogs are in red. For each series, the union of targets of the bioactive compounds (assigned targets) is provided; these targets can potentially be assigned to ZINC compounds containing the same scaffold. In (a), eight analogs from ChEMBL that share an ASB scaffold with six analogs from ZINC are shown; they are active against a total of seven targets: protein-tyrosine phosphatase LC-PTP (T1), induced myeloid leukemia cell differentiation protein Mcl-1 (T2), carboxy-terminal domain RNA polymerase II polypeptide A small phosphatase 1 (T3), nuclear receptor subfamily 4 group A member 1 (T4), protein tyrosine kinase 2 β (T5), estrogen receptor β (T6), and apoptotic protease-activating factor 1 (T7). In (b), six analogs from PubChem that share an ASB scaffold with five analogs from ZINC are shown; they are active against a total of three targets: TDP1 protein (T1), thioredoxin glutathione reductase (T2), and dopamine receptor D3 (T3). In (c), 5 PubChem, 4 ChEMBL, and 6 ZINC analogs containing one of the 581 conserved ASB scaffolds are shown. The PubChem and ChEMBL analogs were active against a total of five unique targets: dopamine D1 receptor (T1), TDP1 protein (T2), adenosine 5′-triphosphate-dependent Clp protease proteolytic subunit (T3), serotonin 6 (5-HT6) receptor (T4), and urea transporter 1 (T5).