Satria A Kautsar, Kai Blin, Simon Shaw, Jorge C Navarro-Muñoz, Barbara R Terlouw, Justin J J van der Hooft, Jeffrey A van Santen, Vittorio Tracanna, Hernando G Suarez Duran, Victòria Pascal Andreu, Nelly Selem-Mojica, Mohammad Alanjary, Serina L Robinson, George Lund, Samuel C Epstein, Ashley C Sisto, Louise K Charkoudian, Jérôme Collemare, Roger G Linington, Tilmann Weber, Marnix H Medema.
Abstract
Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.
Year: 2020 PMID: 31612915 PMCID: PMC7145714 DOI: 10.1093/nar/gkz882
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Distribution of taxonomic kingdoms and biosynthetic classes for all BGCs present in and added to MIBiG 2.0. Statistics are taken after the restructuring effort, and include retired entries. New entries are depicted in light green. Only (hybrid) classes comprising more than one BGC entry are listed in the figure. The intersection diagram is generated using the UpSetR tool (14).
Annotation completeness of BGCs in MIBiG 2.0 before and after the restructuring effort
| | Before | After |
|---|---|---|
| • No reference publication | 148 | 11 |
| • Values unknown to the schema | 235 | 0 |
| • Others | 158 | 7 |
| • Duplicate BGC | 11 | |
| • Poor sequence quality | 70 | |
| • Poor annotation quality | 24 | |
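
The effect of the restructuring can be tallied directly from the rows that survive in the table above. The following is a minimal sketch, not part of the MIBiG tooling: the variable and function names are hypothetical, the counts are copied from the table, and the totals assume that the listed categories do not overlap, which the table itself does not state.

```python
# Counts copied from the table rows above as (before, after) pairs.
# Assumption: the categories are mutually exclusive, so they can be summed.
annotation_issues = {
    "No reference publication": (148, 11),
    "Values unknown to the schema": (235, 0),
    "Others": (158, 7),
}

# Rows for which the table lists a single count only.
single_count_rows = {
    "Duplicate BGC": 11,
    "Poor sequence quality": 70,
    "Poor annotation quality": 24,
}


def summarize(issues):
    """Return the summed (before, after) counts over all issue categories."""
    before = sum(b for b, _ in issues.values())
    after = sum(a for _, a in issues.values())
    return before, after


before_total, after_total = summarize(annotation_issues)
single_total = sum(single_count_rows.values())

for label, (b, a) in annotation_issues.items():
    print(f"{label}: {b} -> {a} ({b - a} fewer)")
print(f"All listed annotation issues: {before_total} -> {after_total}")
print(f"Sum of single-count rows: {single_total}")
```

Under the stated assumption, the listed annotation issues fall from 541 to 18 entries, and the three single-count rows sum to 105.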
Figure 2. The new per-BGC overview page. The locus overview (top-left) section allows panning, zooming, or highlighting specific genes, for which information is displayed in the gene details (top-right) section. In the lower section, the ‘Compounds’ tab is currently selected, showing all compound-related information of the BGC, such as chemical structure, molecular formula, or linked databases. Other data are available under the other tabs.