Literature DB >> 33978755

antiSMASH 6.0: improving cluster detection and comparison capabilities.

Kai Blin1, Simon Shaw1, Alexander M Kloosterman2, Zach Charlop-Powers3, Gilles P van Wezel2,4, Marnix H Medema2,5, Tilmann Weber1.   

Abstract

Many microorganisms produce natural products that form the basis of antimicrobials, antivirals, and other drugs. Genome mining is routinely used to complement screening-based workflows to discover novel natural products. Since 2011, the "antibiotics and secondary metabolite analysis shell-antiSMASH" (https://antismash.secondarymetabolites.org/) has supported researchers in their microbial genome mining tasks, both as a free-to-use web server and as a standalone tool under an OSI-approved open-source license. It is currently the most widely used tool for detecting and characterising biosynthetic gene clusters (BGCs) in bacteria and fungi. Here, we present the updated version 6 of antiSMASH. antiSMASH 6 increases the number of supported cluster types from 58 to 71, displays the modular structure of multi-modular BGCs, adds a new BGC comparison algorithm, allows for the integration of results from other prediction tools, and more effectively detects tailoring enzymes in RiPP clusters.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

Entities:  

Year:  2021        PMID: 33978755      PMCID: PMC8262755          DOI: 10.1093/nar/gkab335

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Natural compounds produced by microorganisms form the basis of many drugs (1). Traditionally, new compounds were discovered by extracting, chemically isolating, purifying, and then testing from natural sources. This approach can now be complemented by sequencing and subsequent mining of genome and metagenome data to identify natural product biosynthetic pathways (2). Software tools to assist researchers in their natural product genome mining tasks have existed for over a decade. Recently published or updated examples include BAGEL (3), PRISM (4), RiPPER (5) and TOUCAN (6). For more in-depth reviews on the topic, see (7–10). Since its initial release in 2011, antiSMASH (11–15) has become the most widely used tool for mining microbial genomes for secondary/specialised metabolite (SM) biosynthetic gene clusters (BGCs) and is regarded as the gold standard. An ecosystem of independent tools incorporating or utilising antiSMASH results has developed over the years, such as the antibiotics resistance target seeker ARTS (16), the mass-spectronomy-guided peptide mining tool Pep2Path (17), the sgRNA design tool CRISPY-web (18), the BGC classification and clustering platform BiG-SCAPE (19), or its big data BGC clustering cousin BiG-SLiCE (20). antiSMASH is also used to annotate BGCs in many genomic as well as BGC-oriented databases, such as the Joint Genome Institute's Integrated Microbial Genomes database with its Atlas of Biosynthetic gene Clusters IMG-ABC (21), the MicroScope platform for microbial genome annotation and analysis (22), the MIBiG database of manually curated BGCs (23), the BGC family database BiG-FAM (24), and, of course, the antiSMASH database (25). antiSMASH uses a rule-based approach to identify many different types of biosynthetic pathways involved in SM production. More in-depth analyses are performed for BGCs encoding non-ribosomal peptide synthetases (NRPSs), type I and type II polyketide synthases (PKSs), lanthipeptides, lasso peptides, sactipeptides, and thiopeptides, for which cluster-specific analyses can provide more information about the biosynthetic steps performed and thus also provide more detailed predictions on the compound(s) produced (Figure 1).
Figure 1.

Schematic workflow of the antiSMASH secondary/specialized metabolite genome mining platform.

Schematic workflow of the antiSMASH secondary/specialized metabolite genome mining platform. Here we present version 6 of antiSMASH. It extends and improves upon previous versions by adding and improving BGC detection rules, making the modular structure of multimodular enzymes more accessible, introducing an additional, more robust, cluster comparison tool, and improving the interoperability with other gene cluster annotation tools.

NEW FEATURES AND UPDATES

New cluster types

antiSMASH uses manually curated and validated “rules" that define which core biosynthetic functions need to exist in a genomic region in order to constitute a BGC. To identify these biosynthetic functions, antiSMASH uses profile hidden Markov models (pHMMs) from PFAM (26), TIGRFAMs (27), SMART (28), BAGEL (3), Yadav et al. (29), or custom models. antiSMASH 5 contained rules for 58 different BGC types (15), version 6 increases this number to 71. Especially in focus for this release were the rules for ribosomally synthesized and post-translationally modified peptides (RiPPs). The lanthipeptide rule was split up into individual rules for classes I through IV. Several RiPP families that were previously jointly designated as “bacteriocins" are now identified by specific rules. The term “bacteriocins" was therefore deprecated and replaced by “RiPP-like", which is defined by profiles that are frequently associated with RiPPs but are insufficient to detect RiPP clusters by themselves. The old “head_to_tail" rule was folded into the “sactipeptide" rule because they covered the same class of RiPPs. New rules were added for class V lanthipeptides, lanthidines, thioamitides, ranthipeptides, PQQ- and mycofactocin-like redox cofactors, epipeptides, cyclic lactone autoinducers, and spliceotides. Outside of RiPPs, antiSMASH added support for BGCs producing thioamide-containing non-ribosomal peptides, tropodithietic acid, Serratia-type prodigiosins, non-alpha poly-amino acids and pyrrolidines.

Module detection

Nonribosomal peptide synthetases, non-iterative type I polyketide synthases and trans-AT polyketide synthases are large, multimodular enzymes (for a review, see (30)). While antiSMASH has always detected the individual enzymatic domains in these megaenzymes, it now also detects and displays the modules explicitly (Figure 2A). Module detection allows for better prediction of modifications made to the respective monomers during biosynthesis. Hence, displaying the modules makes it easier for researchers to interpret the likely biosynthetic mechanisms of an enzymatic assembly line encoded by a BGC. In order to examine the protein domains making up each module, hovering the mouse cursor over a module will send the module lid to the background and will reveal the domains. For a more complete view of all protein domains, module monomer display can be turned off using the “show module domains" toggle button above the graph to hide all module lids (Figure 2B).
Figure 2.

The NRPS/PKS domain view of the kirromycin biosynthetic gene cluster (NCBI ID: AM746336.1), consisting of trans-AT PKS, modular type I PKS and NRPS modules. A View with module lids, displaying the monomer predicted to be integrated into the final product. The jagged module edges on KirAI/AII/AIII/AIV show modules that are split across different protein-coding sequences, with the small lettering next to the edges indicating how the modules link up. B View with the module lids hidden, revealing the underlying protein domains.

The NRPS/PKS domain view of the kirromycin biosynthetic gene cluster (NCBI ID: AM746336.1), consisting of trans-AT PKS, modular type I PKS and NRPS modules. A View with module lids, displaying the monomer predicted to be integrated into the final product. The jagged module edges on KirAI/AII/AIII/AIV show modules that are split across different protein-coding sequences, with the small lettering next to the edges indicating how the modules link up. B View with the module lids hidden, revealing the underlying protein domains.

ClusterCompare

Since the release of antiSMASH 1, antiSMASH has provided a comparison of the identified region to similar clusters in other genomes via ClusterBlast (see (11) for a description of the algorithm). As the ClusterBlast algorithm is based on protein sequence comparisons by local alignments (initially using BLAST (31), now using DIAMOND (32)), it does not perform optimally on multimodular enzymes like NRPSes and PKSes. Of particular note, BGCs with very different module counts can score similarly if a single module has a good match. To address this issue, we have added a new ClusterCompare algorithm (Shaw et al., manuscript in preparation) to antiSMASH 6. Like ClusterBlast, ClusterCompare builds on a local protein-alignment-based sequence comparison, but only uses that as one part of the comparison score. Additional parts of the score are the gene synteny and the presence/absence of biosynthetic components of the query and reference gene clusters, based on antiSMASH annotations of each. Biosynthetic components are the collection of gene products matching one of antiSMASH’s BGC detection profiles, gene products with a functional annotation due to either their presence in an antiSMASH detection rule or based on the classification of their secondary metabolite clusters of orthologous groups (smCOG) class, and, if applicable, NRPS and PKS domains. All score parts are scaled to a 0 to 1 range, and the final score for a single comparison is calculated as the geometric mean of the parts. The top hits in the reference database are displayed in a table, along with the scoring information. Selecting a table row will also display a pairwise comparison of the query and reference clusters (Figure 3).
Figure 3.

ClusterCompare output for the Streptomyces rochei large linear plasmid pSLA2-L (NCBI ID: NC_004808.2), which is densely packed with secondary metabolite biosynthetic genes (see (39)) with the MIBiG dataset in protocluster-to-region mode. Lines connect pairs of protein-coding genes with the highest similarity to make conserved functions more visible even at different scaling of query and reference. The comparisons show the similarity of (A) the left part of the region to the trans-AT PKS, NRPS hybrid cluster responsible for lankacidin C production (MIBiG ID: BGC0001100.1), as well as (B) the middle part of the region to the modular PKS type I cluster responsible for lankamycin biosynthesis (MIBiG ID: BGC0000085.1). The example illustrates how ClusterCompare can be used to distinguish between hybrid gene clusters and adjacent gene clusters that are part of the same region, based on comparison with individual reference BGCs.

ClusterCompare output for the Streptomyces rochei large linear plasmid pSLA2-L (NCBI ID: NC_004808.2), which is densely packed with secondary metabolite biosynthetic genes (see (39)) with the MIBiG dataset in protocluster-to-region mode. Lines connect pairs of protein-coding genes with the highest similarity to make conserved functions more visible even at different scaling of query and reference. The comparisons show the similarity of (A) the left part of the region to the trans-AT PKS, NRPS hybrid cluster responsible for lankacidin C production (MIBiG ID: BGC0001100.1), as well as (B) the middle part of the region to the modular PKS type I cluster responsible for lankamycin biosynthesis (MIBiG ID: BGC0000085.1). The example illustrates how ClusterCompare can be used to distinguish between hybrid gene clusters and adjacent gene clusters that are part of the same region, based on comparison with individual reference BGCs.

Sideloading

antiSMASH exists in an ecosystem of BGC prediction and analysis tools. While tools like ARTS (16) and BiG-SCAPE (19) sit downstream of antiSMASH and consume antiSMASH results, other tools like ClusterFinder (33) and DeepBGC (34) offer alternative, machine-learning based methods to predict gene clusters and thus function more in parallel to antiSMASH’s rule based cluster detection. Depending on the research question and the desired sensitivity vs. specificity trade-off, there can be a value in preferring one cluster detection over the other (see (8,35) for a discussion). It is not feasible from a development resource perspective to include and maintain all different cluster detection approaches within the antiSMASH pipeline. Hence, to use analysis modules such as ClusterBlast, ClusterCompare, their respective MIBiG-based variants, or PFAM/TIGRFAMs analysis on the identified BGCs regardless of their mode of detection, we have created a JSON-based file format that can be used by external tools to annotate additional regions and clusters in the input and allow antiSMASH to analyse those areas exactly as per the natively predicted regions and clusters. An explanation of the external annotation file format and the currently implemented sideloading functionality is available at https://docs.antismash.secondarymetabolites.org/sideloading/.

Improved annotation for RiPP clusters

For lanthipeptide, lasso peptide, sactipeptide, and thiopeptide BGCs, antiSMASH 5 already provided more detailed product predictions by detecting the cluster's prepeptide and commonly occurring tailoring enzymes. Because some of the tailoring enzymes can match relatively generic functional profiles, it was not always trivial to determine whether a given enzyme was indeed interacting with the RiPP precursor peptide as a tailoring enzyme or whether it was just an unrelated enzyme that happened to be encoded in the vicinity. Often, RiPP tailoring enzymes will harbour RiPP recognition element (RRE) domains that can identify and bind the RiPP precursor peptide. RRE-Finder (36), which will annotate these RRE domains, has been integrated into antiSMASH 6, thus helping to more confidently identify tailoring enzymes in antiSMASH-detected RiPP clusters. Additionally, an “RRE-containing" detection rule using the RRE-Finder pHMMs was added to the “relaxed" strictness ruleset, allowing antiSMASH to identify potentially novel RiPP clusters for which antiSMASH does not have a specific detection rule set up, as long as that gene cluster contains an RRE.

Other optimizations

The sequence data used in antiSMASH’s ClusterBlast are based on the records contained in the antiSMASH database. As the antiSMASH database was recently updated to version 3 (25), the ClusterBlast dataset was also refreshed to include 147 517 high quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes. The antiSMASH database version 3 is the first version to also contain both archaeal and fungal sequences along with bacterial sequences, so ClusterBlast will now also give more relevant hits for users running antiSMASH on inputs originating from those taxa. In addition to the region PFAM analysis added in version 5, antiSMASH 6 can now also scan regions using profiles from the TIGRFAMS database (27).

CONCLUSIONS AND FUTURE PERSPECTIVES

Genome mining with tools like antiSMASH has become an established part of many natural product discovery workflows. With the updates and additions to the feature set, antiSMASH is positioning itself to remain the go-to tool for microbial genome mining for natural products. By improving the interoperability with other tools, the open-source software antiSMASH integrates even better with the thriving ecosystem of computational tools in the natural products field. Future updates will further improve the predictions of the chemical structures of the compounds produced by the detected BGCs. This will help to connect gene clusters to molecules identified via metabolomics or other analytical chemistry approaches (37), and to link up with databases such as GNPS (38).

DATA AVAILABILITY

The bacteria and fungal versions of antiSMASH 6 can be freely accessed at https://antismash.secondarymetabolites.org and https://fungismash.secondarymetabolites.org, respectively. The antiSMASH documentation is available at https://docs.antismash.secondarymetabolites.org/. The antiSMASH source code, licensed under the GNU Affero General Public License (AGPL) v3.0, is available at https://github.com/antismash/antismash. antiSMASH is also available via Docker.
  39 in total

1.  The large linear plasmid pSLA2-L of Streptomyces rochei has an unusually condensed gene organization for secondary metabolism.

Authors:  Susumu Mochizuki; Keiichiro Hiratsu; Masanori Suwa; Tatsuya Ishii; Fuminori Sugino; Kohei Yamada; Haruyasu Kinashi
Journal:  Mol Microbiol       Date:  2003-06       Impact factor: 3.501

2.  Natural Products as Sources of New Drugs over the Nearly Four Decades from 01/1981 to 09/2019.

Authors:  David J Newman; Gordon M Cragg
Journal:  J Nat Prod       Date:  2020-03-12       Impact factor: 4.050

3.  Computational approaches to natural product discovery.

Authors:  Marnix H Medema; Michael A Fischbach
Journal:  Nat Chem Biol       Date:  2015-09       Impact factor: 15.040

4.  IMG-ABC v.5.0: an update to the IMG/Atlas of Biosynthetic Gene Clusters Knowledgebase.

Authors:  Krishnaveni Palaniappan; I-Min A Chen; Ken Chu; Anna Ratner; Rekha Seshadri; Nikos C Kyrpides; Natalia N Ivanova; Nigel J Mouncey
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971

5.  antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences.

Authors:  Marnix H Medema; Kai Blin; Peter Cimermancic; Victor de Jager; Piotr Zakrzewski; Michael A Fischbach; Tilmann Weber; Eriko Takano; Rainer Breitling
Journal:  Nucleic Acids Res       Date:  2011-06-14       Impact factor: 16.971

6.  MIBiG 2.0: a repository for biosynthetic gene clusters of known function.

Authors:  Satria A Kautsar; Kai Blin; Simon Shaw; Jorge C Navarro-Muñoz; Barbara R Terlouw; Justin J J van der Hooft; Jeffrey A van Santen; Vittorio Tracanna; Hernando G Suarez Duran; Victòria Pascal Andreu; Nelly Selem-Mojica; Mohammad Alanjary; Serina L Robinson; George Lund; Samuel C Epstein; Ashley C Sisto; Louise K Charkoudian; Jérôme Collemare; Roger G Linington; Tilmann Weber; Marnix H Medema
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971

7.  ARTS 2.0: feature updates and expansion of the Antibiotic Resistant Target Seeker for comparative genome mining.

Authors:  Mehmet Direnç Mungan; Mohammad Alanjary; Kai Blin; Tilmann Weber; Marnix H Medema; Nadine Ziemert
Journal:  Nucleic Acids Res       Date:  2020-07-02       Impact factor: 16.971

8.  A computational framework to explore large-scale biosynthetic diversity.

Authors:  Jorge C Navarro-Muñoz; Nelly Selem-Mojica; Michael W Mullowney; Satria A Kautsar; James H Tryon; Elizabeth I Parkinson; Emmanuel L C De Los Santos; Marley Yeong; Pablo Cruz-Morales; Sahar Abubucker; Arne Roeters; Wouter Lokhorst; Antonio Fernandez-Guerra; Luciana Teresa Dias Cappelini; Anthony W Goering; Regan J Thomson; William W Metcalf; Neil L Kelleher; Francisco Barona-Gomez; Marnix H Medema
Journal:  Nat Chem Biol       Date:  2019-11-25       Impact factor: 15.040

9.  Towards prediction of metabolic products of polyketide synthases: an in silico analysis.

Authors:  Gitanjali Yadav; Rajesh S Gokhale; Debasisa Mohanty
Journal:  PLoS Comput Biol       Date:  2009-04-10       Impact factor: 4.475

View more
  219 in total

1.  A genomic overview including polyphasic taxonomy of Thalassoroseus pseudoceratinae gen. nov., sp. nov. isolated from a marine sponge, Pseudoceratina sp.

Authors:  Gaurav Kumar; Uppada Jagadeeshwari; Pannikurungottu Sreya; Ahmed Shabbir; Chintalapati Sasikala; Chintalapati Venkata Ramana
Journal:  Antonie Van Leeuwenhoek       Date:  2022-05-19       Impact factor: 2.271

2.  Streptomyces spinosus sp. nov. and Streptomyces shenzhenensis subsp. oryzicola subsp. nov. endophytic actinobacteria isolated from Jasmine rice and their genome mining for potential as antibiotic producers and plant growth promoters.

Authors:  Onuma Kaewkla; Sudarat Sukpanoa; Chanwit Suriyachadkun; Nitcha Chamroensaksi; Theeraphan Chumroenphat; Christopher Milton Mathew Franco
Journal:  Antonie Van Leeuwenhoek       Date:  2022-05-21       Impact factor: 2.271

Review 3.  Bacillus spp. as Bio-factories for Antifungal Secondary Metabolites: Innovation Beyond Whole Organism Formulations.

Authors:  Bruno Salazar; Aurelio Ortiz; Chetan Keswani; Tatiana Minkina; Saglara Mandzhieva; Satyendra Pratap Singh; Bhagwan Rekadwad; Rainer Borriss; Akansha Jain; Harikesh B Singh; Estibaliz Sansinenea
Journal:  Microb Ecol       Date:  2022-05-23       Impact factor: 4.552

4.  Rhizobium setariae sp. nov., an Indole-3-Acetic Acid-Producing Bacterium Isolated from Green Foxtail, Setaria viridis.

Authors:  Minchung Kang; Taegun Seo
Journal:  Curr Microbiol       Date:  2022-04-18       Impact factor: 2.188

5.  Advenella mandrilli sp. nov., a bacterium isolated from the faeces of Mandrillus sphinx.

Authors:  Qiong Wang; Xiu-Lin Han; Zhi-Qin Fang; Chen-Lu Zhang; Chun Li; Tao Lu
Journal:  Antonie Van Leeuwenhoek       Date:  2022-01-15       Impact factor: 2.271

6.  Analysis of the complete genome sequence of a rhizosphere-derived Pseudomonas sp. HN3-2 leads to the characterization of a cyclic lipopeptide-type antibiotic bananamide C.

Authors:  Zongwang Ma
Journal:  3 Biotech       Date:  2022-01-04       Impact factor: 2.406

7.  Schumannella soli sp. nov., a novel actinomycete isolated from mangrove soil by in situ cultivation.

Authors:  Feina Li; Qinpei Lu; Shuilin Liao; Li Tuo; Shaowei Liu; Qin Yang; Adong Shen; Chenghang Sun
Journal:  Antonie Van Leeuwenhoek       Date:  2021-08-02       Impact factor: 2.271

8.  Metabacillus elymi sp. nov., isolated from the Rhizosphere of Elymus tsukushiensis, a plant native to the Dokdo Islands, Republic of Korea.

Authors:  Soo-Yeong Lee; Jin-Soo Son; Ye-Ji Hwang; Jae-Ho Shin; Sa-Youl Ghim
Journal:  Antonie Van Leeuwenhoek       Date:  2021-08-06       Impact factor: 2.271

9.  Untargeted Identification of Alkyne-Containing Natural Products Using Ruthenium-Catalyzed Azide Alkyne Cycloaddition Reactions Coupled to LC-MS/MS.

Authors:  Daniel Back; Brenda T Shaffer; Joyce E Loper; Benjamin Philmus
Journal:  J Nat Prod       Date:  2022-01-19       Impact factor: 4.050

10.  Pseudohalioglobus sediminis sp. nov., isolated from coastal sediment.

Authors:  Ya-Ning Zhang; Tian-Shu Zhang; Xu-Yang Chen; Ya Gong; Zong-Jun Du
Journal:  Arch Microbiol       Date:  2022-03-10       Impact factor: 2.552

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.