Lin Tao, Feng Zhu, Chu Qin, Cheng Zhang, Shangying Chen, Peng Zhang, Cunlong Zhang, Chunyan Tan, Chunmei Gao, Zhe Chen, Yuyang Jiang, Yu Zong Chen.
Abstract
Some natural product leads of drugs (NPLDs) have been found to congregate in the chemical space. The extent, detailed patterns, and mechanisms of this congregation phenomenon have not been fully investigated, and their usefulness for NPLD discovery needs to be tested more extensively. In this work, we generated and evaluated the distribution patterns of 442 NPLDs of 749 pre-2013 approved and 263 clinical trial small molecule drugs in the chemical space represented by the molecular scaffold and fingerprint trees of 137,836 non-redundant natural products. In the molecular scaffold trees, 62.7% of approved and 37.4% of clinical trial NPLDs congregate in 62 drug-productive scaffolds/scaffold-branches. In the molecular fingerprint tree, 82.5% of approved and 63.0% of clinical trial NPLDs are clustered in 60 drug-productive clusters (DCs), partly due to their preferential binding to 45 privileged target-site classes. The distribution patterns of the NPLDs are distinct from those of the bioactive natural products. Of the NPLDs in these DCs, 11.7% have a remote-similarity relationship with the nearest NPLD in their own DC. The majority of the new NPLDs emerge from preexisting DCs. The usefulness of the derived knowledge for NPLD discovery was demonstrated by the recognition of the new NPLDs of 2013-2014 approved drugs.
Year: 2015 PMID: 25790752 PMCID: PMC5380136 DOI: 10.1038/srep09325
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Distribution of the natural product leads of approved and clinical trial drugs in branch 5 of the Scaffold-Hunter derived molecular scaffold trees of the 134,097 natural products and 411 natural product leads.
The drug-productive scaffolds or scaffold parent-child sub-branches (DSs) are indicated by red dots or by red dots connected by red lines, and are marked by their respective labels DS15-DS19. The green triangles indicate the natural product leads outside the DSs. Some of the representative scaffolds in these DSs are shown in the figure. More complete sets of the representative scaffolds are shown in Supplementary Figure S2.
Figure 2. Distribution of the natural product leads of approved and clinical trial drugs in branch 3 of the substructure-fingerprint clustering tree of the 137,836 natural products and 442 natural product leads.
The drug-lead productive clusters are colored red-orange and marked by their respective cluster labels DC4-DC10. The red, purple, and blue lines on top of the clustering tree indicate the locations of the approved, approved + clinical trial, and clinical trial drug leads, with the height correlating with the number of approved + clinical trial drugs.
Top-ranked drug-lead productive clusters, those with the highest numbers of approved leads, in the natural product chemical space represented by 137,836 natural products and 442 natural product leads. The corresponding target-site class(es) and superclass(es) are also listed.
| Drug Lead Cluster (Branch) | Drug Lead Molecular Scaffold Groups | No of NPs | No of Leads, Drugs (Approved/Trial) | Target Site Super Class | Target Site Class |
|---|---|---|---|---|---|
| DC19 (14) | Steroids & derivatives | 546 | 32/5, 85/8 | steroid sites | Nuclear receptor ligand sites |
| DC5 (3) | Pyrimidine nucleosides, Aminoglycosides | 225 | 19/0, 32/0 | nucleoside phosphate sites, aminoacyl-tRNA sites | DNA metab enzymes nucleoside phosphate, ribosome 30S aminoacyl-tRNA sites |
| DC13 (4) | Purine nucleosides, Imidazole analogs & their oligopeptide hybrids | 264 | 17/2, 39/22 | Nucleoside sites, nucleoside phosphate sites | nucleoside receptor ligand & metabolism enzyme substrate sites, Nucleoside phosphate receptor ligand, DNA metab enzymes nucleoside phosphate sites |
| DC7 (3) | Amino acids with acyclic hydroxyl side chain & derivatives | 217 | 15/0, 23/0 | amino acid sites | amino acid receptor ligand & metab enzyme substrate sites |
| DC17 (14) | Macrolides, Polyenes, Spinosyns, Acarviosins | 117 | 15/4, 29/14 | amino acid phosphate sites, oligopeptide sites, lipopolysaccharide sites | phosphatase substrate, ribosome 23S peptidyl transferase, outer membrane lipopolysaccharide sites |
| DC21 (14) | Fatty acids & derivatives, Prostanoids | 423 | 8/1, 26/1 | fatty acid, cannabinoid, eicosanoid, retinoid, coenzyme A sites | retinoid receptor ligand, CoA metab enzyme substrate sites |
| DC28 (17) | Cardiac glycosides | 176 | 8/1, 12/1 | nucleoside phosphate sites | nucleoside phosphate metab enzyme substrate sites |
| DC38 (21) | Intermediate-sized linear and cyclic peptides | 33 | 7/1, 32/2 | Oligopeptide sites, lipopolysaccharide sites | exopeptidase substrate, Neuropeptide receptor ligand, outer membrane lipopolysaccharide sites |
| DC8 (3) | Beta-lactams | 92 | 6/2, 90/6 | peptidoglycan sites | β-lactam binding protein peptidoglycan sites |
| DC29 (20) | Saponins, Triterpenoid glycosides, Macrocyclic lactones | 1696 | 6/1, 7/2 | nucleoside phosphate sites | steroid metab enzyme nucleoside phosphate, calcium channel DHP, chloride channel CBS sites |
| DC45 (30) | Tetracyclines, Capsaicinoids, Disulfide bromotyrosine derivatives | 447 | 6/3, 13/7 | aminoacyl-tRNA sites | ribosome 30S aminoacyl-tRNA sites |
| DC49 (32) | Opiate alkaloids, Phenanthrene alkaloids | 457 | 6/1, 18/1 | amine sites, opiate sites | amine receptor ligand, opiate receptor ligand sites |
| DC10 (3) | Glycosaminoglycans, glucosamines, Lincosamides & derivatives | 237 | 5/0, 17/0 | oligopeptide sites | serine endopeptidase substrate sites |
| DC12 (4) | Purine base analogs, modified purine base analogs | 180 | 5/0, 9/0 | nucleoside sites, nucleoside phosphate sites | nucleoside receptor ligand, DNA metab enzymes nucleoside phosphate sites |
| DC14 (7) | Larger indole alkaloids | 4104 | 5/1, 15/16 | amine sites, oligopeptide sites | amine receptor ligand & transporter substrate, exopeptidase substrate sites |
| DC53 (10) | Cannabinoids, Diarylheptanoids, Dihydrostilbenoids, Small phenolic molecules with a long tail | 2531 | 5/4, 9/8 | fatty acid, cannabinoid, eicosanoid, retinoid sites | fatty acid metab enzyme substrate, retinoid receptor ligand, cannabinoid receptor ligand sites |
| DC24 (16) | Oligo-, Poly-, Cyclic- saccharides | 259 | 5/0, 5/0 | cyclic oligosaccharide drug delivery systems | cyclodextrin drug delivery systems |
| DC36 (20) | Large cyclic peptides | 265 | 5/6, 6/7 | nucleoside phosphate sites, sites within peptidoglycans, saccharide sites, lipopolysaccharide sites | calcium channel DHP, cell wall peptidoglycan, polysaccharide metab enzyme substrates, outer membrane lipopolysaccharide sites |
| DC40 (24) | Porphyrins, Prodiginines, Ergoline-, Ellipticine-, Epibatidine- alkaloids | 519 | 5/3, 8/4 | amine sites, nucleobase sites | amine receptor ligand, DNA intercalation sites |
| DC42 (26) | Indole-containing amino acid tryptophan analogs, Monoterpenoid indole alkaloids, Yohimbine alkaloids | 512 | 5/2, 6/4 | amine sites | amine receptor ligand sites |
| DC43 (28) | Tropane alkaloids | 29 | 5/0, 10/0 | amine sites | amine receptor ligand & transporter substrate sites |
| DC44 (28) | Catecholamines, Small alkaloids with an amine group | 358 | 5/0, 21/0 | amine, opiate sites | amine receptor, opiate receptor ligand sites |
The statistical significance of the clustering of the NPLDs in every DC. MTD is the mean Tanimoto distance of the NPLDs in each DC, MTD.rnd is the mean Tanimoto distance in randomization, NRI is a standardized effect size measure of the community structure, and the P-value is the number of random selections of NPs that are more clustered than the NPLDs in each DC divided by the number of runs (60,000 in this study). P-values in bold are those that remain significant after Bonferroni correction with the conservative α′ = 0.05/60 = 0.000833.
| DC | Branch | No. of NPLDs | MTD | MTD.rnd | NRI | P-value |
|---|---|---|---|---|---|---|
| DC1 | 1 | 2 | 0.2642 | 1.8368 | 6.2835 | 0.00238 |
| DC2 | 1 | 2 | 0.1224 | 1.8373 | 6.9228 | 0.00117 |
| DC3 | 1 | 2 | 0.6296 | 1.8331 | 4.7303 | 0.00847 |
| DC4 | 3 | 4 | 1.0322 | 1.6705 | 6.1005 | |
| DC5 | 3 | 19 | 1.2111 | 1.6704 | 16.3408 | |
| DC6 | 3 | 5 | 1.2328 | 1.6704 | 5.2077 | 0.00142 |
| DC7 | 3 | 15 | 1.4909 | 1.6706 | 5.2466 | |
| DC8 | 3 | 6 | 1.0778 | 1.6706 | 8.2775 | |
| DC9 | 3 | 7 | 1.1073 | 1.6711 | 9.1502 | |
| DC10 | 3 | 5 | 0.8436 | 1.6704 | 9.8633 | |
| DC11 | 4 | 4 | 0.8826 | 1.5121 | 4.7221 | 0.00277 |
| DC12 | 4 | 5 | 0.6238 | 1.5122 | 8.4309 | |
| DC13 | 4 | 18 | 1.0751 | 1.5107 | 12.7659 | |
| DC14 | 7 | 5 | 0.8233 | 1.2803 | 4.4320 | 0.00177 |
| DC15 | 8 | 3 | 0.8090 | 1.5069 | 4.7523 | 0.00175 |
| DC16 | 9 | 3 | 0.4985 | 1.4882 | 5.2888 | 0.00145 |
| DC17 | 9 | 16 | 0.8576 | 1.4892 | 10.2529 | |
| DC18 | 9 | 2 | 1.0548 | 1.4913 | 1.5675 | 0.06920 |
| DC19 | 9 | 37 | 1.0487 | 1.4896 | 11.3267 | |
| DC20 | 10 | 2 | 0.1740 | 1.4029 | 5.8662 | 0.00135 |
| DC21 | 10 | 9 | 0.9145 | 1.4017 | 6.8067 | |
| DC22 | 10 | 3 | 0.7339 | 1.4007 | 4.5322 | |
| DC23 | 10 | 2 | 1.2542 | 1.3999 | 0.6864 | 0.15480 |
| DC24 | 11 | 5 | 0.8121 | 1.3910 | 5.5223 | 0.00100 |
| DC25 | 12 | 2 | 0.2934 | 1.2207 | 3.7910 | 0.00947 |
| DC26 | 12 | 2 | 0 | 1.2202 | 4.9798 | |
| DC27 | 12 | 8 | 0.9926 | 1.2208 | 2.9058 | 0.01428 |
| DC28 | 12 | 8 | 0.2007 | 1.2200 | 12.9182 | |
| DC29 | 12 | 7 | 0.7426 | 1.2203 | 5.5655 | |
| DC30 | 13 | 2 | 0.8474 | 1.3667 | 1.9145 | 0.06510 |
| DC31 | 14 | 5 | 1.0297 | 1.5449 | 4.2627 | 0.00100 |
| DC32 | 14 | 5 | 0.9406 | 1.5439 | 4.9565 | |
| DC33 | 14 | 4 | 1.1776 | 1.5454 | 2.5820 | 0.02413 |
| DC34 | 14 | 3 | 0.8209 | 1.5460 | 4.0057 | 0.00120 |
| DC35 | 14 | 6 | 0.6607 | 1.5452 | 8.3548 | |
| DC36 | 15 | 10 | 0.6860 | 1.3351 | 7.9798 | |
| DC37 | 16 | 3 | 0.6935 | 1.1435 | 3.4132 | 0.01103 |
| DC38 | 16 | 7 | 0.5400 | 1.1431 | 10.9642 | |
| DC39 | 17 | 3 | 0.4787 | 1.1665 | 5.0540 | 0.00093 |
| DC40 | 17 | 7 | 0.8464 | 1.1671 | 4.7313 | 0.00235 |
| DC41 | 17 | 2 | 0.6630 | 1.1649 | 2.3529 | 0.03623 |
| DC42 | 17 | 7 | 0.7352 | 1.1663 | 6.3612 | |
| DC43 | 20 | 5 | 0.4329 | 1.3241 | 17.2567 | |
| DC44 | 20 | 5 | 0.6912 | 1.3241 | 12.1159 | |
| DC45 | 20 | 7 | 0.9507 | 1.3238 | 10.0919 | |
| DC46 | 21 | 3 | 0.1372 | 0.7713 | 5.0125 | |
| DC47 | 21 | 4 | 0.5792 | 0.7708 | 1.8932 | 0.02985 |
| DC48 | 24 | 4 | 0.7416 | 1.0739 | 5.7931 | 0.00110 |
| DC49 | 24 | 6 | 0.5427 | 1.0740 | 14.5890 | |
| DC50 | 26 | 5 | 0.4218 | 0.7916 | 6.3884 | |
| DC51 | 26 | 2 | 0.6052 | 0.7909 | 1.1859 | 0.07355 |
| DC52 | 28 | 3 | 0.9049 | 1.1883 | 2.4123 | 0.03025 |
| DC53 | 28 | 9 | 0.9577 | 1.1880 | 5.9241 | |
| DC54 | 30 | 2 | 0.7262 | 0.8764 | 0.7311 | 0.18568 |
| DC55 | 32 | 3 | 0.3393 | 0.8538 | 4.4925 | 0.00152 |
| DC56 | 32 | 2 | 0.6884 | 0.8548 | 0.9532 | 0.12325 |
| DC57 | 32 | 2 | 0.2012 | 0.8550 | 3.7418 | 0.00427 |
| DC58 | 32 | 2 | 0.4916 | 0.8533 | 2.0560 | 0.05248 |
| DC59 | 32 | 3 | 0.2995 | 0.8535 | 4.8258 | |
| DC60 | 33 | 2 | 0 | 0.9059 | 5.0937 | |
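The quantities in this table can be illustrated with a minimal randomization sketch. It rests on several assumptions that are not stated in the source: the NRI is computed as (MTD.rnd − MTD) divided by the standard deviation of the randomized values, following the usual net relatedness index convention; the random sets are drawn without replacement from a caller-supplied pool; and the toy "distance" is just the absolute difference between integers rather than a real Tanimoto distance between fingerprints. The function names `mean_pairwise_distance` and `cluster_significance` are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def mean_pairwise_distance(items, dist):
    """Mean of dist(a, b) over all unordered pairs of items."""
    return mean(dist(a, b) for a, b in combinations(items, 2))


def cluster_significance(members, pool, dist, runs=1000, seed=0):
    """Return (MTD, MTD.rnd, NRI, P-value) for one cluster.

    Assumed definitions (not confirmed by the source):
      NRI     = (MTD.rnd - MTD) / sd(randomized MTDs)
      P-value = fraction of random sets that are more clustered,
                i.e. have a smaller mean distance, than the members.
    """
    rng = random.Random(seed)
    mtd = mean_pairwise_distance(members, dist)
    null = [
        mean_pairwise_distance(rng.sample(pool, len(members)), dist)
        for _ in range(runs)
    ]
    mtd_rnd = mean(null)
    nri = (mtd_rnd - mtd) / pstdev(null)
    p_value = sum(d < mtd for d in null) / runs
    return mtd, mtd_rnd, nri, p_value


# Toy data: 100 "compounds" on a line, with a tight cluster of three.
pool = list(range(100))
members = [10, 11, 12]
mtd, mtd_rnd, nri, p_value = cluster_significance(
    members, pool, lambda a, b: abs(a - b)
)

# Bonferroni-corrected threshold for 60 clusters, as in the caption.
alpha_corrected = 0.05 / 60
print(mtd, mtd_rnd, nri, p_value, p_value < alpha_corrected)
```

With real data, `dist` would be replaced by the fingerprint distance used for the clustering tree, and `runs` would be set to 60,000 to match the caption.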
Figure 3. Distribution of the approved NP-related drugs, grouped into 45 target-site classes (TCs) of 20 target-site super-classes (TSs), in the drug-productive clusters DC1 to DC60.
TSs are colored as: TC1, TC2 of TS1 amine sites (LightCoral), TC3, TC4 of TS2 nucleobase sites (OliveGreen), TC5, TC6 of TS3 nucleoside sites (PalePurple), TC7-TC16 of TS4 nucleoside phosphate sites (Red), TC17 of TS5 cyclic nucleotide sites (Cyan), TC18 of TS6 aminoacyl-tRNA sites (Chocolate), TC19 of TS7 amino acid phosphate sites (Magenta), TC20, TC21 of TS8 amino acid sites (Yellow), TC22-TC28 of TS9 oligopeptide sites (Green), TC29 of TS10 peptidoglycan sites (PaleYellow), TC30 of TS11 peptidoglycan sites (Blue), TC31-TC33 of TS12 saccharide sites (OrangeRed), TC34 of TS13 cyclic oligosaccharide drug delivery systems (PaleBrown), TC35 of TS14 lipopolysaccharide sites (DarkCyan), TC36-TC39 of TS15 fatty acid, cannabinoid, eicosanoid, retinoid sites (PaleBlue), TC40 of TS16 coenzyme A & analog sites (PaleGreen), TC41, TC42 of TS17 microtubule sites (DeepPink), TC43 of TS18 opiate sites (Purple), TC44 of TS19 steroid sites (Brown), and TC45 of TS20 naphthoquinone sites (Orange).
Chronological data of the natural product leads with the first approved drugs and the drug-lead productive clusters during every five-year period from 1963 to 2012. The six drug-lead clusters with only one approved drug plus one or more clinical trial drugs are not included here.
| Period | NPLDs with first approved drug in period: inside preexisting DCs | NPLDs with first approved drug in period: outside preexisting DCs | Number of preexisting DCs | Number of new DCs |
|---|---|---|---|---|
| Pre–1963 | 56 | NA | 8 | NA |
| 1963–1967 | 7 | 20 | 8 | 7 |
| 1968–1972 | 4 | 9 | 15 | 3 |
| 1973–1977 | 9 | 8 | 18 | 2 |
| 1978–1982 | 17 | 56 | 20 | 13 |
| 1983–1987 | 20 | 20 | 33 | 5 |
| 1988–1992 | 20 | 12 | 38 | 6 |
| 1993–1997 | 20 | 9 | 44 | 2 |
| 1998–2002 | 19 | 10 | 46 | 3 |
| 2003–2007 | 9 | 6 | 49 | 1 |
| 2008–2012 | 11 | 6 | 50 | 4 |
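The bookkeeping in this table can be checked with a short sketch: the number of preexisting DCs in each period should equal the preexisting plus new DCs of the period before it. The rows below are copied from the table (1963 onward); the variable and function names are hypothetical and only illustrate the check.

```python
# (period, leads inside preexisting DCs, leads outside, preexisting DCs, new DCs)
rows = [
    ("1963-1967", 7, 20, 8, 7),
    ("1968-1972", 4, 9, 15, 3),
    ("1973-1977", 9, 8, 18, 2),
    ("1978-1982", 17, 56, 20, 13),
    ("1983-1987", 20, 20, 33, 5),
    ("1988-1992", 20, 12, 38, 6),
    ("1993-1997", 20, 9, 44, 2),
    ("1998-2002", 19, 10, 46, 3),
    ("2003-2007", 9, 6, 49, 1),
    ("2008-2012", 11, 6, 50, 4),
]


def dc_counts_are_consistent(rows):
    """True if each period's preexisting DCs equal the previous period's
    preexisting DCs plus its new DCs."""
    return all(
        prev[3] + prev[4] == nxt[3] for prev, nxt in zip(rows, rows[1:])
    )


# Total number of DCs at the end of 2012 according to the table.
final_dc_count = rows[-1][3] + rows[-1][4]

# Share of each period's new leads that fall inside preexisting DCs.
inside_share = {
    period: inside / (inside + outside)
    for period, inside, outside, _, _ in rows
}

print(dc_counts_are_consistent(rows), final_dc_count)
for period, share in inside_share.items():
    print(f"{period}: {share:.2f}")
```

The running counts are internally consistent and end at 54 clusters, which agrees with the 60 DCs minus the six clusters excluded in the caption. In these rows, the share of new leads inside preexisting DCs stays above one half from 1988-1992 onward.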
Natural product-derived drugs approved by the FDA from 2013 to June 2014. The natural product leads, targets, and affiliations to drug-productive clusters and target-site classes are provided.
| Drug | Drug Type | NP Lead | NP Lead Type | Affiliation to drug lead productive cluster | Target | Affiliation to target-site class |
|---|---|---|---|---|---|---|
| Canagliflozin | Small Molecule | Phlorizin | N | Near DC57 (Tanimoto similarity coefficient 0.91 to the nearest NPLD) | SGLT2 | Monosaccharide transporter substrate sites as a new TC in TS12 (saccharide binding sites) |
| Luliconazole | Small Molecule | Imidazole-based NP such as mizoribine | N | DC13 | Lanosterol demethylase | Steroid metabolism enzyme substrate sites in TS19 (steroid binding sites) |
| Sofosbuvir | Small Molecule | Uridine monophosphate | N | DC5 | HCV NS5B polymerase | TC7 |
| Vorapaxar | Small Molecule | Himbacine | N | - | Protease-activated receptor-1 | TC22 |
| Simeprevir | Oligopeptide | HCV NS3/4A protease product oligopeptide | B (oligopeptide) | DC9 | HCV NS3/4A protease | TC28 |
| Mipomersen | Antisense | Section of mRNA of apolipoprotein B-100 | B (oligonucleotide) | - | Apolipoprotein B-100 | Lipid-binding sites in TS15 |
| Dalvance | Semisynthetic lipoglycopeptide | Lipoglycopeptide | B (lipoglycopeptide) | - | Cell wall | TC30 |
| Tanzeum | Peptide | Glucagon-like peptide-1 | B (peptide) | - | Glucagon-like peptide-1 receptor | TC22 |
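The "Near DC57" entry above relies on the Tanimoto similarity between a new lead and the nearest NPLD already in a cluster. A minimal sketch of that lookup follows, assuming fingerprints are represented as sets of "on" bit positions. The fingerprints, the lead names, and the helper `nearest_lead` are hypothetical toy values and do not correspond to any compound in the table.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_lead(query: frozenset, leads: dict) -> tuple:
    """Return (name, similarity) of the known lead most similar to query."""
    name = max(leads, key=lambda n: tanimoto(query, leads[n]))
    return name, tanimoto(query, leads[name])


# Toy fingerprints (hypothetical bit positions, not real compounds).
known_leads = {
    "lead_a": frozenset(range(1, 12)),        # bits 1..11
    "lead_b": frozenset({1, 2, 3, 20, 21}),
}
query = frozenset(range(1, 11))               # bits 1..10

name, similarity = nearest_lead(query, known_leads)
print(name, round(similarity, 2))
```

In practice the bit sets would come from a substructure fingerprint computed with a cheminformatics toolkit, and the similarity at which a new lead counts as "near" a cluster is a choice the source does not specify.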