Maria Sorokina, Christoph Steinbeck.
Abstract
Natural products (NPs) have been at the centre of the scientific community's attention in recent decades, and interest in them continues to grow. As a consequence, the last 20 years have seen a rapid multiplication of databases and collections serving as generalistic or thematic resources for NP information. In this review, we establish a complete overview of these resources, and the numbers are overwhelming: over 120 different NP databases and collections have been published and re-used since 2000. Of these, 98 are still accessible in some form, and only 50 are open access. The latter include not only databases but also large collections of NPs published as supplementary material in scientific publications, and collections that were backed up in the ZINC database of commercially available compounds. Some databases, even ones published relatively recently, are no longer accessible, which leads to a dramatic loss of data on NPs. The data sources are presented in this manuscript, together with a comparison of the content of the open ones. With this review, we also compiled the open-access natural compounds into a single dataset, the COlleCtion of Open NatUral producTs (COCONUT), which is available on Zenodo and contains structures and sparse annotations for over 400,000 non-redundant NPs, making it the biggest open collection of NPs available to date.
Keywords: Databases; Drug discovery; Natural products; Traditional medicines
Year: 2020 PMID: 33431011 PMCID: PMC7118820 DOI: 10.1186/s13321-020-00424-9
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
List of natural products databases cited in the scientific literature since 2000. The list is ordered alphabetically by database name and contains, when available, extended metadata
| Database name | NP type | Estimated size (number of NP molecules with correct structures) | Number of unique molecules in COCONUT | Percentage of molecules with stereochemistry | Is open (data can be freely browsed) | Requires registration | Is maintained (2019) | Is updated |
|---|---|---|---|---|---|---|---|---|
| 3DMET | Generalistic | 18248 | x | x | Yes | No | Yes | Yes |
| AfroCancer | tm, plants, africa | 390 | 365 | 69.76% | Yes | NA | NA | No |
| AfroDB | tm, plants, africa | 954 | 874 | 70.73% | Yes | No | No | No |
| AfroMalariaDB | tm, plants, africa | 265 | 252 | 70.93% | Yes | NA | NA | No |
| Afrotryp | tm, plants, drug-like, africa | 321 | x | x | Unknown | NA | NA | No |
| Alkamid database | plants, structure | 300 | x | x | Yes | No | Yes | No |
| Ambinter-Greenpharma natural compound library (GPNCL) | Generalistic, industrial | > 150,000 | x | x | No | Yes | Yes | Yes |
| AnalytiCon Discovery MEGx | bacteria, plants, industrial | 5147 | 4908 | 44.15% | Yes | Yes | Yes | Yes |
| AntiBase | drug-like | > 40,000 | x | x | No | No | Yes | No |
| AntiMarin | Marine, drug-like | > 60,000 | x | x | No | Unknown | No | No |
| ATBD (Animal Toxin Database) | toxins | 1000 | x | x | Unknown | Unknown | No | Unknown |
| Ayurveda | tm, asia | 950 | x | x | No | Yes | Yes | Unknown |
| Berdy’s Bioactive Natural Products Database | Generalistic | x | x | x | No | Unknown | No | No |
| BiGG | Metabolites | 7339 | x | x | Yes | No | Yes | Yes |
| Binding DB | Drug-like | x | x | x | Yes | No | Yes | No |
| BIOFAQUIM | Plant, fungi, america | 420 | 400 | 59.05% | Yes | No | Yes | Yes |
| BioPhytMol | Drug-like, plants, asia | 633 | x | x | Yes | No | Yes | Yes |
| BitterDB | Food | 654 | 631 | 14.17% | Yes | No | Yes | Yes |
| BRENDA | Metabolites | x | x | x | Yes | No | Yes | Yes |
| CamMedNP | tm, plants, africa | > 2500 | x | x | Yes, but proprietary format | No | NA | No |
| Carotenoids Database | Structure | 1174 | 991 | 57.63% | Yes | No | Yes | Yes |
| CAS registry/SciFinder | Chemicals | > 300,000 | x | x | No | Yes | Yes | Yes |
| CEMTDD - Chinese Ethnic Minority Traditional Drug Database | tm, plants, asia | 4060 | x | x | Yes | No | Yes | No |
| CHDD (Chinese Traditional Medicinal Herbs database) | tm, plants, asia | > 30,000 | x | x | Unknown | Unknown | No | No |
| ChEBI | Chemicals | 15,736 | 14,621 | 71.33% | Yes | No | Yes | Yes |
| Chem-TCM | Plants, tm, asia | > 12,000 | x | x | No | Yes | Yes | No |
| ChemBank | Chemicals | x | x | x | Yes | No | No | No |
| ChEMBL | Chemicals | 1899 | 1581 | 91.59% | Yes | No | Yes | Yes |
| ChemBridge diversity datasets | Generalistic, industrial | x | x | x | No | Yes | No | No |
| ChemDB | Plants, asia | > 1000 | x | x | Unknown | Unknown | No | No |
| ChemIDplus | Drug-like, toxins | 9042 | x | x | Yes | No | Yes | Yes |
| ChemSpider | Chemicals | 9732 | 9029 | 29.50% | Yes | No | Yes | Yes |
| CHMIS-C | Plants, tm, asia | > 8000 | x | x | Yes | No | No | No |
| CMAUP | Plants | 47,645 | 20,873 | 72.37% | Yes | No | Yes | No |
| CNPD (Chinese Natural Products Database) | Generalistic | > 57,000 | x | x | Unknown | Unknown | No | No |
| ConMedNP | Plants, tm, africa | 3118 | 2504 | 69.59% | Yes | NA | NA | NA |
| CSLS/NCI (Chemical Structure Lookup Service) | Metabolites | x | x | x | Yes | No | Yes | No |
| Database of Indonesian Medicinal Plants | Plants, tm, asia | 6776 | x | x | Yes | No | Yes | No |
| DESMSCI (Dragon Exploration System on Marine Sponge Compounds Interactions) | Marine | x | x | x | Yes | No | No | No |
| DFC (Dictionary of Food Compounds) | Food | > 41,000 | x | x | No | Yes | Yes | Yes |
| DMNP (Dictionary of Marine Natural Products) | Marine | > 30,000 | x | x | No | Yes | Yes | Yes |
| DNP (Dictionary of Natural Products) by Chapman and Hall (also known as CHEMnetBase) | Generalistic | > 230,000 | x | x | No | Yes | Yes | Yes |
| Drugbank NPs | Drug-like | 2617 | 2617 | 51.32% | Yes | No | Yes | Yes |
| eBasis | Food | x | x | x | No | Yes | Yes | Yes |
| ETCM (Encyclopedia of Traditional Chinese Medicine) | tm, asia | 7274 | x | x | Yes | No | Yes | Yes |
| ETM-DB | tm, plants, africa | 1795 | 1653 | 40.46% | Yes | No | Yes | Yes |
| FooDB | Food | 24,215 | 22,223 | 36.01% | Yes | No | Yes | Yes |
| GNPS | Dereplication | 7619 | 6708 | 31.08% | Yes | No | Yes | Yes |
| HIM (Herbal Ingredients in-vivo Metabolism database) | Drug-like, tm, plants | 1261 | 962 | 41.62% | Yes | No | No | No |
| HIT (Herbal Ingredients Targets) | Drug-like, tm, plants | 524 | 472 | 44.03% | Yes | No | No | No |
| HMDB | Dereplication | x | x | x | Yes | No | Yes | Yes |
| IMPPAT | tm, plants, asia | 9596 | x | x | Yes | No | Yes | Yes |
| InflamNat | Drug-like | 552 | 536 | 63.75% | Yes | NA | NA | NA |
| Indofine Chemical Company Inc. natural products | Generalistic, industrial | 56 | 46 | 51.06% | Yes | No | Yes | |
| InPACdb | Drug-like, plants, asia | 124 | 121 | 62.10% | Yes | No | No | No |
| InterBioScreen Ltd (IBS) | Generalistic, industrial | 68,350 | 67,292 | 42.17% | Yes | Yes | Yes | Yes |
| iSMART | tm, plants, asia | x | x | x | Yes | No | Yes | Yes |
| KEGG | Metabolites | x | x | x | Yes | No | Yes | Yes |
| KNApSaCK | Plants | 10,265 | 8887 | 74.76% | Yes | No | Yes | Yes |
| Lichen Database | Fungi | 249 | 156 | 26.67% | Yes | No | Yes | No |
| LOPAC1280 by Merck | Drug-like | 1280 | x | x | No | Yes | Yes | Yes |
| MAPS database | Plants, asia | x | x | x | Unknown | Unknown | No | No |
| Marine Compound Database (MCDB) | Marine | 182 | x | x | Yes | No | No | No |
| Marine Natural Product Database (MNPD) | Marine | 6000 | x | x | Yes | No | No | No |
| MarineLit | Marine | > 29,000 | x | x | No | Yes | Yes | Yes |
| Massbank | Dereplication | x | x | x | Yes | No | Yes | Yes |
| MedPServer | Plants, tm, asia, drug-like | 1124 | x | x | Yes | No | Yes | Yes |
| MetaCyc | Metabolites | x | x | x | Yes | No | Yes | Yes |
| METLIN | Dereplication | x | x | x | Yes | Yes | Yes | Yes |
| Mitishamba database | Plants, africa | 1102 | 1010 | 23.84% | Yes | No | Yes | No |
| NADI | tm, plants | 3000 | x | x | No | Yes | Yes | Unknown |
| NANPDB | Plants, africa | 6832 | 3913 | 75.02% | Yes | No | Yes | Yes |
| NaprAlert | Generalistic | > 155,000 | x | x | No | Yes | Yes | Yes |
| NAPROC-13 | Dereplication | > 18,000 | x | x | Yes | No | Yes | Yes |
| NCI DTP data | Drug-like | 418 | 404 | 36.76% | Yes | No | Yes | No |
| NeMedPlant | tm, plants, asia | 100 | x | x | Yes | No | Yes | No |
| NIST | Chemicals | x | x | x | No | No | Yes | Yes |
| NMRDATA | Dereplication | x | x | x | Unknown | Yes | Yes | Yes |
| NMRShiftDB | Dereplication | 1875 | x | x | Yes | No | Yes | Yes |
| Novel Antibiotics database | Drug-like | 5430 | x | x | Yes | No | Yes | No |
| NPACT | Plants, drug-like | 1573 | 1453 | 77.53% | Yes | No | Yes | Yes |
| NPASS | Plants, bacteria, metazoa, fungi | 30,858 | 27,479 | 71.58% | Yes | No | Yes | Yes |
| NPAtlas | Bacteria, fungi | 20,035 | 18,959 | 67.03% | Yes | No | Yes | Yes |
| NPCARE | Plants, marine, bacteria, drug-like | 1370 | 1364 | 0% | Yes | No | Yes (server fails sometimes) | Yes |
| NPEdia | Generalistic | 18,016 | 16,190 | 51.83% | Yes | No | Yes | No |
| NPL (library) | Plants, drug-like | 814 | x | x | No | NA | NA | NA |
| NuBBEDB | Plants, insects, america | 2215 | 2022 | 58.34% | Yes | No | Yes | |
| Open Source Malaria | Drug-like | 842 | x | x | Yes | No | Yes | Yes |
| p-ANAPL (Pan-African Natural Product Library) | Plants, africa | 538 | 467 | 0.86% | Yes | No | NA | |
| PAMDB | Metabolites, bacteria | x | x | x | Yes | No | Yes | Yes |
| Phenol-explorer | Food | 862 | 681 | 51.79% | Yes | No | Yes | NA |
| Phytochemica | Plants, tm, asia | 571 | x | x | Yes | No | No | No |
| PhytoHub | Food, plants | 1200 | x | x | Yes | No | Yes | Yes |
| Pi Chemicals System Natural Products | Generalistic, industrial | 405 | x | x | Yes | No | Yes | Yes |
| Prestwick | Plants, industrial | 320 | x | x | No | Yes | Yes | Yes |
| ProCarDB | Structure, bacteria | 304 | x | x | Yes | No | Yes | No |
| PubChem | Chemicals | 3529 | 2835 | 2.33% | Yes | No | Yes | Yes |
| REAXYS | Chemicals | > 220,000 | x | x | No | Yes | Yes | |
| ReSpect | Dereplication | 4767 | 711 | 0% | Yes | No | Yes | No |
| SANCDB | Plants, africa | 623 | 592 | 82.28% | Yes | No | Yes | Yes |
| Seaweed Metabolite Database (SWMD) | Marine | 1110 | 423 | 78.53% | Yes | No | Yes | No |
| Specs Natural Products | Generalistic, industrial | 745 | 745 | 53.02% | Yes | Yes | Yes | No |
| Spektraris NMR | Dereplication | 248 | 242 | 91.53% | Yes | No | Yes | No |
| StreptomeDB | Bacteria | 6415 | 3610 | 56.41% | Yes | No | Yes | No |
| Super Natural II | Generalistic | 320,670 | 235,436 | 83.55% | Yes | No | Yes | No |
| Super Scent | Other | 2100 | x | x | Yes | No | Yes | No |
| Super Sweet | Food, metabolites | 15,000 | x | x | Yes | No | Yes | No |
| TargetMol Natural Compound Library | Generalistic, industrial | 1680 | x | x | No | Yes | Yes | Yes |
| TC-MC | tm, asia, plants | > 20,000 | x | x | Yes | No | Yes | Yes |
| TCMDB@Taiwan | tm, asia, plants | 58,351 | 50,891 | 90.38% | Yes | No | Yes (server fails sometimes) | Yes |
| TCMID | tm, asia, plants | 12,549 | 10,572 | 0% | Yes | No | Yes | No |
| TCMSP | tm, asia, plants | 29,384 | x | x | Yes | No | No | No |
| TIM | tm, asia, plants | 1829 | x | x | No | Unknown | No | No |
| TIPdb | Asia, plants, drug-like | 8656 | 7752 | 77.10% | Yes | No | Yes | No |
| TMDB | Plants, metabolites | 1393 | x | x | Yes | No | No | No |
| TPPT | Plants, toxins, europe | 1583 | 1486 | 76.84% | Yes | No | Yes | No |
| TriForC | Plants | 266 | x | x | Yes | No | Yes | No |
| UEFS | Plants, america | 503 | 481 | 68.93% | Yes | No | No | No |
| UNPD (Universal Natural Products Database) | Generalistic | 213,100 | 156,984 | 12.62% | Yes | No | No | No |
| VIETHERB | Plants, asia | 10,887 | x | x | Yes | Unknown | No | No |
| Yeast Metabolome Database | Metabolites, dereplication | 16,042 | x | x | Yes | No | Yes | Yes |
| YaTCM | tm, asia, plants | 47,696 | x | x | Yes | No | Yes | Yes |
| ZINC natural products catalogue | Generalistic | 85,198 | 67,336 | 90.49% | Yes | No | Yes | Yes |
Fig. 1 Network of content similarity between the 50 open natural products databases. The network is directed, and there is an arrow from database A to database B if more than 50% of molecules in database A are also present in database B. The interactive version of this network is available at https://npreview.naturalproducts.net
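The edge rule in the Fig. 1 caption can be illustrated with a minimal sketch. It assumes each database is available as a set of canonical molecule identifiers (for example, one string per unique structure); the function name `overlap_edges`, the toy database names and the molecule labels are hypothetical and are not taken from the review.

```python
from itertools import permutations


def overlap_edges(databases, threshold=0.5):
    """Return directed edges (a, b, fraction) where the fraction of
    molecules of database a that also occur in database b exceeds threshold.

    `databases` maps a database name to a set of molecule identifiers
    (an assumption of this sketch, not the authors' data format).
    """
    edges = []
    for a, b in permutations(databases, 2):
        mols_a, mols_b = databases[a], databases[b]
        if not mols_a:
            continue  # an empty database cannot be contained in another
        fraction = len(mols_a & mols_b) / len(mols_a)
        if fraction > threshold:
            edges.append((a, b, fraction))
    return edges


# Toy data, invented for illustration only.
toy = {
    "SmallDB": {"mol1", "mol2", "mol3"},
    "BigDB": {"mol1", "mol2", "mol4", "mol5", "mol6"},
    "OtherDB": {"mol7"},
}
edges = overlap_edges(toy)
for source, target, fraction in edges:
    print(f"{source} -> {target} ({fraction:.0%} of {source} found in {target})")
```

Because the fraction is computed relative to the source database, the relation is not symmetric: in the toy data two thirds of SmallDB is contained in BigDB, but only two fifths of BigDB is contained in SmallDB, so only one arrow is drawn.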
Fig. 2 Most frequent molecules in open databases. a Common biggest substructure in the top 5 most frequent molecules, found in 34 out of 50 open databases. b Coumaric acid; c gallic acid; d scopoletin; e ellagic acid
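One way to obtain a ranking like the one in Fig. 2 is to count, for every molecule, the number of databases in which it appears. The sketch below assumes that each database is a collection of canonical molecule identifiers; the function name `database_frequency` and the toy database contents are hypothetical, and the compound names are used only as placeholder labels.

```python
from collections import Counter


def database_frequency(databases):
    """Count in how many databases each molecule occurs.

    `databases` maps a database name to an iterable of molecule
    identifiers; duplicates within one database are counted once.
    """
    counts = Counter()
    for molecules in databases.values():
        counts.update(set(molecules))
    return counts


# Toy data, invented for illustration only.
toy = {
    "DB_A": ["gallic acid", "scopoletin", "gallic acid"],
    "DB_B": ["gallic acid", "ellagic acid"],
    "DB_C": ["gallic acid", "scopoletin"],
}
freq = database_frequency(toy)
for molecule, n_databases in freq.most_common():
    print(f"{molecule}: present in {n_databases} of {len(toy)} databases")
```

Converting each database to a set before counting keeps a molecule that is listed twice in the same source from inflating its frequency.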