BAGEL4: a user-friendly web server to thoroughly mine RiPPs and bacteriocins.

Auke J van Heel, Anne de Jong, Chunxu Song, Jakob H Viel, Jan Kok, Oscar P Kuipers.

Abstract

Interest in secondary metabolites such as RiPPs (ribosomally synthesized and posttranslationally modified peptides) is increasing worldwide. To facilitate research in this field we have updated our mining web server. BAGEL4 is faster than its predecessor and is now fully independent of ORF calling. Gene clusters of interest are discovered using the core peptide database and/or through HMM motifs that are present in associated context genes. The databases used for mining have been updated and extended with literature references and links to UniProt and NCBI. Additionally, we have included automated promoter and terminator prediction and the option to upload RNA expression data, which can be displayed along with the identified clusters. Further improvements include the annotation of the context genes, which is now based on a fast BLAST search against the prokaryotic part of the UniRef90 database, and the improved web BLAST feature that dynamically loads structural data such as internal cross-linking from UniProt. Overall, BAGEL4 provides the user with more information through a user-friendly web interface that simplifies data evaluation. BAGEL4 is freely accessible at http://bagel4.molgenrug.nl.

Year:  2018        PMID: 29788290      PMCID: PMC6030817          DOI: 10.1093/nar/gky383

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

BAGEL4 is a web server that enables users to identify and visualize gene clusters in prokaryotic DNA involved in the biosynthesis of Ribosomally synthesized and Post-translationally modified Peptides (RiPPs) and (unmodified) bacteriocins. Interest in these classes of molecules is increasing due to the need for novel antibiotics and to their important role in food preservation, microbial ecology and plant biocontrol. The post-translational modifications in RiPPs expand the natural structural diversity beyond the 20 genetically encoded amino acids (1). These modifications often stabilize the peptides, making them more resistant to heat and proteases. Mining web servers help researchers to analyse the genetic potential of strains. They can aid in pinpointing the genetic origin of an observed antimicrobial activity and hence in identifying the associated chemical structure (2,3). Alternatively, the data can provide a starting point for (heterologous) production of novel ribosomally synthesized antimicrobial compounds (4). These servers also add value to annotation pipelines (5–7). Since the development of the first version of BAGEL (8), more web servers have been developed, such as antiSMASH (9), PRISM (10) and, recently, RiPPMiner (11). They all depend on information from literature and databases such as BACTIBASE (12), CAMPR3 (13) and the MIBiG data repository (14). Here, we present the latest version of the BAGEL suite, BAGEL4. New features such as the integration of RNA-Seq data, an improved web BLAST and integration of promoter and terminator predictions have been added. The databases have been thoroughly updated and we commit to supporting and maintaining the web server for years to come.

MATERIALS AND METHODS

Description of the software

BAGEL4 operates according to the flowchart depicted in Figure 1. The required input is FASTA-formatted DNA. Multiple files and multiple records per file are allowed; a maximum of 50 Mb can be uploaded. Alternatively, a genome can be selected from a list containing full and WGS genomes (using the name or the RefSeq accession number). A DNA record will only be analysed if its sequence length is above the set minimum (default 3000 bp). Optionally, RNA expression data can be uploaded in BedGraph track format (.BED and .BEDGRAPH extensions are allowed). The BedGraph file should contain both strands in one file. Larger datasets can be handled on our server in consultation with us or, alternatively, a stand-alone version is available.
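The input rule described above (multiple FASTA records, each analysed only when long enough) can be illustrated with a short Python sketch. This is not the BAGEL4 code; the function names `read_fasta` and `filter_records` are hypothetical, and treating a record of exactly the minimum length as acceptable is an assumption, since the text only says "above the set minimum".

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH_BP = 3000  # default threshold stated in the text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def filter_records(lines: Iterable[str],
                   min_length: int = MIN_LENGTH_BP) -> Dict[str, str]:
    """Keep only records long enough to be analysed (assumed: length >= minimum)."""
    return {name: seq for name, seq in read_fasta(lines) if len(seq) >= min_length}
```

For a file on disk, `filter_records(open("genome.fasta"))` would return the records that pass the length filter; the file name is only an example.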
Figure 1.

General flow of the BAGEL4 web server. The left part is executed for all input DNA larger than a set threshold (default 3000 bp), the right part is executed for every detected area of interest (AOI).

The DNA is translated into six large proteins (one for each reading frame). To limit the amount of data that has to be screened, translation is only started after a legal start codon (ATG, GTG or TTG). Subsequently, the proteins are screened for the (co-)occurrence of certain protein motifs (Supplementary Table S1) and blasted against the core peptide database. Based on these results, so-called Areas Of Interest (AOIs) are selected. Next, overlapping AOIs are combined. Once an AOI has been determined it is analyzed in detail. The ORFs in the AOI are first called using Glimmer3 (15). The pipeline is set up in such a way that Glimmer3 makes a model for every defined AOI. Subsequently, small ORFs (sORFs) are called in the intergenic regions. The default setting for these sORFs is a minimum length of 72 bp (24 amino acid residues); an overlap of 10 bp with Glimmer3 ORFs is allowed. All (small) ORFs are then blasted (16) against the annotation database and the core peptide database. If homology is found in the core peptide database, an alignment is produced. Then promoters and terminators are predicted (see NEW IN BAGEL4 for more detail). Finally, an overview of the results is generated with links to detailed reports per AOI. The detailed report consists of a graphical visualisation (Figure 2) of the (annotated) genes, promoters and terminators. Gene expression data will also be visualised if an optional RNA-Seq file has been uploaded (Figure 2). Additionally, an alignment is shown if homology with a record in the core peptide database was found (Figure 3). UniProt structural data (cross-linking, modified residues) of this database record will also be displayed, if available (Figure 3).
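The step in which overlapping AOIs are combined amounts to a standard interval merge. The Python sketch below shows one way to do this; the coordinate convention (start and end positions on the same DNA record, with touching intervals merged) and the function name `merge_overlapping` are assumptions made for illustration and are not taken from the BAGEL4 source.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one DNA record; convention assumed


def merge_overlapping(aois: List[Interval]) -> List[Interval]:
    """Combine overlapping (or touching) areas of interest into single regions."""
    merged: List[Interval] = []
    for start, end in sorted(aois):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, `merge_overlapping([(400, 900), (100, 500), (2000, 2500)])` gives two regions, because the first two intervals overlap and the third does not.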
Figure 2.

Example graphics produced by BAGEL4. At the top left, items can be turned on or off by clicking on the item. Genes are indicated as arrows, and additional information (including a link to BLAST the protein) is displayed on mouse-over. The additional information disappears when the gene is clicked. Bottom (in blue), RNA expression data displayed in RPKM.

Figure 3.

Example output of an alignment based on a BLAST hit with the core peptide database. The query is linked to a record (Nukacin A) in the core peptide database. Based on the UniProt identifier of the hit in the database, information available on modifications and bridging patterns is displayed. The leader peptide of the database record is highlighted in dark gray and modified residues are indicated with asterisks. Users should be aware that this information is only indicative for the query sequence.

BAGEL4 databases

Core peptide database

The bacteriocin database has been updated and now contains almost 500 RiPPs (class 1, see Supplementary Figure S1), 230 unmodified bacteriocins (class 2) and 90 large (>10 kDa) bacteriocins (class 3). Most records contain a link to NCBI or UniProt. In addition to the general literature, specific resources such as RiPPMiner (11) and the MIBiG data repository (14) have been used to update our records. The database is available at http://bagel4.molgenrug.nl.

Annotation database

The database includes the prokaryotic part of the UniRef90 database, extended with the context protein database used by BAGEL3 (19). It was scanned for protein domains that are common in the context of RiPPs, and this information was added to the database records.

Validation of BAGEL4

The software was validated as described previously (19). In short, the successful detection of known gene clusters was verified, while 50 recently published genomes were analysed to check for new compounds and for the appearance of false positives.

NEW IN BAGEL4

Improved core peptide (web) BLAST

The web BLAST feature executes a BLAST (16) search against the selected database and, whenever hits are found, visualizes an alignment that includes post-translational modifications. This information is imported weekly from the public UniProt database (17). In this way, basic structural information based on homology is provided, which gives insight into structural relatedness (cross-linking patterns). Background information (literature) is provided on the basis of the database hit. In addition to being incorporated in the full pipeline, this feature is also offered as a separate web BLAST option (http://bagel4.molgenrug.nl/blast.php).

Six frame translation of input DNA for improved speed

BAGEL4 translates all DNA into six large proteins, one for each reading frame and using only legal start codons. This converted input is used to look for motifs and core peptides. This strategy eliminates the need for a Glimmer run at this stage, saving computational time. Being efficient in selecting the AOIs is important when screening large datasets.
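A minimal Python sketch of such a six-frame translation is given below. It rests on several assumptions that the text does not specify: the standard genetic code is used, translation in a frame is switched on at ATG, GTG or TTG (written as M) and switched off again at the next stop codon (written as *), and codons with ambiguous bases become X. The function names are hypothetical and this is not the BAGEL4 implementation.

```python
from itertools import product

# Standard genetic code in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
START_CODONS = {"ATG", "GTG", "TTG"}  # legal start codons named in the text


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frame(seq: str) -> str:
    """Translate one frame, emitting residues only between a start and a stop codon."""
    protein = []
    translating = False
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if not translating:
            if codon in START_CODONS:
                translating = True
                protein.append("M")  # initiator codon written as methionine
            continue
        residue = CODON_TABLE.get(codon, "X")  # X for ambiguous codons (assumed)
        protein.append(residue)
        if residue == "*":
            translating = False  # wait for the next legal start codon
    return "".join(protein)


def six_frame_translation(dna: str) -> dict:
    """Return one translated string per reading frame, keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate_frame(seq[offset:])
    return frames
```

Because stretches before the first start codon and between a stop and the next start are skipped, the strings that must be screened for motifs are shorter than a naive translation of every codon.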

Integration of RNA-Seq data

The BAGEL4 input can optionally be extended with RNA sequencing data. This data must be supplied in the so-called BedGraph format with both strands in one file. The data is visualized in reads per kilobase per million mapped reads (RPKM) below the identified gene cluster (Figure 2). The RNA sequencing data is not used for gene cluster identification, but it offers two main advantages. First, it can help to identify the core peptide if that has not been predicted by the software. Second, it allows users to examine whether the gene cluster is expressed under the condition tested.
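To make the BedGraph input concrete, the Python sketch below reads the four standard BedGraph columns (chromosome, 0-based start, end, value) and computes the length-weighted mean signal over a region such as an identified gene cluster. How the two strands are distinguished within one file is not described in the text, so the sketch ignores strand; the function names are hypothetical and the calculation is an illustration, not the way BAGEL4 draws its expression track.

```python
from typing import Iterable, Iterator, Tuple


def parse_bedgraph(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, start, end, value) from BedGraph lines, skipping header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def mean_signal(lines: Iterable[str], chrom: str,
                region_start: int, region_end: int) -> float:
    """Length-weighted mean BedGraph value over [region_start, region_end)."""
    total = 0.0
    for c, start, end, value in parse_bedgraph(lines):
        if c != chrom:
            continue
        overlap = min(end, region_end) - max(start, region_start)
        if overlap > 0:
            total += value * overlap
    length = region_end - region_start
    return total / length if length > 0 else 0.0
```

Positions not covered by any BedGraph interval contribute zero to the mean, which matches the usual reading of a sparse BedGraph track.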

Integrated promoter and terminator prediction

BAGEL4 now predicts promoters and terminators within the AOI(s). Promoter prediction is based on a DNA binding motif recognized by RNA polymerase σA (TTGACA-N(16–18)-TATAAT, i.e. the −35 and −10 hexamers separated by a spacer of 16–18 nucleotides). The position frequency matrix (PFM) of this motif was built using known promoter binding sites of Escherichia coli and Bacillus subtilis. This tool is also available separately in the Genome2D web server (http://genome2d.molgenrug.nl/index.php/prokaryote-promoters). TransTermHP terminator prediction (18) was additionally implemented in the BAGEL4 pipeline. The predictions are visualised in the gene cluster (Figure 2) and can help in understanding gene regulation and in manually identifying core peptides.
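The structure of the σA motif can be illustrated with a deliberately simplified Python scan. Instead of the position frequency matrix used by BAGEL4, which is not reproduced here, the sketch counts matches to the two consensus hexamers for each allowed spacer length; the scoring scheme, the threshold `min_score` and the function names are assumptions made for illustration only, and only the given strand is scanned.

```python
from typing import List, Tuple

MINUS35 = "TTGACA"      # -35 consensus hexamer
MINUS10 = "TATAAT"      # -10 consensus hexamer
SPACERS = (16, 17, 18)  # allowed spacer lengths between the two hexamers


def count_matches(site: str, consensus: str) -> int:
    """Number of positions at which a site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def find_promoters(dna: str, min_score: int = 10) -> List[Tuple[int, int, int]]:
    """Return (position of -35 box, spacer length, score) for candidate promoters.

    The score is the number of consensus matches over both hexamers (maximum 12).
    This is a consensus-matching stand-in for a PFM-based scan.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 5):
        box35 = dna[i:i + 6]
        for spacer in SPACERS:
            j = i + 6 + spacer
            box10 = dna[j:j + 6]
            if len(box10) < 6:
                continue
            score = count_matches(box35, MINUS35) + count_matches(box10, MINUS10)
            if score >= min_score:
                hits.append((i, spacer, score))
    return hits
```

A PFM-based scan replaces the match count with position-specific weights, so that well-conserved positions contribute more to the score than variable ones.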

Improved and adaptable graphical report

For each AOI a gene cluster graphic is generated that can be modified by the user. Promoters, terminators, gene names and small ORFs can be turned on or off. Moving the mouse over a certain gene provides background information and a link to BLAST its encoded protein.

DISCUSSION

The goal of BAGEL4 is to provide its users with as much information as possible on identified AOIs and to improve the annotation of novel bacterial genome sequences. Finding associated core peptides can be especially challenging when there is no homology to described compounds. Coupling different information sources can be very helpful in this respect. The new web BLAST feature provides users with quick insight into the potential impact that sequence differences can have on internal bridging patterns. It also often gives insight into the position of the leader cleavage site. The coupling to RNA-Seq data allows users to go beyond defining the genetic potential of a strain. With the increasing availability of RNA-Seq data, this provides a useful additional feature. Discovery of AOIs by BAGEL4 is now fully independent of ORF calling, which has two main advantages. First, it improves the speed of the evaluation. Second, not depending on ORF calling lowers the risk of missing an AOI. Overall, BAGEL4 has been updated and extended with new features; it is more user-friendly and offers reliable, fast and convenient mining of bacteriocins and RiPPs.

DATA AVAILABILITY

BAGEL4 is freely available at http://bagel4.molgenrug.nl for files up to 50 Mb.
  19 in total

1.  Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors:  Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal:  Bioinformatics       Date:  2007-01-19       Impact factor: 6.937

2.  Discovery, Production and Modification of Five Novel Lantibiotics Using the Promiscuous Nisin Modification Machinery.

Authors:  Auke J van Heel; Tomas G Kloosterman; Manuel Montalban-Lopez; Jingjing Deng; Annechien Plat; Baptiste Baudu; Djoke Hendriks; Gert N Moll; Oscar P Kuipers
Journal:  ACS Synth Biol       Date:  2016-07-07       Impact factor: 5.110

3.  Biosynthesis of the Novel Macrolide Antibiotic Anthracimycin.

Authors:  Silke Alt; Barrie Wilkinson
Journal:  ACS Chem Biol       Date:  2015-09-08       Impact factor: 5.100

Review 4.  Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature.

Authors:  Paul G Arnison; Mervyn J Bibb; Gabriele Bierbaum; Albert A Bowers; Tim S Bugni; Grzegorz Bulaj; Julio A Camarero; Dominic J Campopiano; Gregory L Challis; Jon Clardy; Paul D Cotter; David J Craik; Michael Dawson; Elke Dittmann; Stefano Donadio; Pieter C Dorrestein; Karl-Dieter Entian; Michael A Fischbach; John S Garavelli; Ulf Göransson; Christian W Gruber; Daniel H Haft; Thomas K Hemscheidt; Christian Hertweck; Colin Hill; Alexander R Horswill; Marcel Jaspars; Wendy L Kelly; Judith P Klinman; Oscar P Kuipers; A James Link; Wen Liu; Mohamed A Marahiel; Douglas A Mitchell; Gert N Moll; Bradley S Moore; Rolf Müller; Satish K Nair; Ingolf F Nes; Gillian E Norris; Baldomero M Olivera; Hiroyasu Onaka; Mark L Patchett; Joern Piel; Martin J T Reaney; Sylvie Rebuffat; R Paul Ross; Hans-Georg Sahl; Eric W Schmidt; Michael E Selsted; Konstantin Severinov; Ben Shen; Kaarina Sivonen; Leif Smith; Torsten Stein; Roderich D Süssmuth; John R Tagg; Gong-Li Tang; Andrew W Truman; John C Vederas; Christopher T Walsh; Jonathan D Walton; Silke C Wenzel; Joanne M Willey; Wilfred A van der Donk
Journal:  Nat Prod Rep       Date:  2013-01       Impact factor: 13.423

5.  Draft Genome Sequence of Streptomyces sp. Strain JV178, a Producer of Clifednamide-Type Polycyclic Tetramate Macrolactams.

Authors:  Yunci Qi; John M D'Alessandro; Joshua A V Blodgett
Journal:  Genome Announc       Date:  2018-01-04

6.  PRISM 3: expanded prediction of natural product chemical structures from microbial genomes.

Authors:  Michael A Skinnider; Nishanth J Merwin; Chad W Johnston; Nathan A Magarvey
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

7.  Genome-guided identification of novel head-to-tail cyclized antimicrobial peptides, exemplified by the discovery of pumilarin.

Authors:  Auke J van Heel; Manuel Montalban-Lopez; Quentin Oliveau; Oscar P Kuipers
Journal:  Microb Genom       Date:  2017-09-25

8.  BACTIBASE second release: a database and tool platform for bacteriocin characterization.

Authors:  Riadh Hammami; Abdelmajid Zouhir; Christophe Le Lay; Jeannette Ben Hamida; Ismail Fliss
Journal:  BMC Microbiol       Date:  2010-01-27       Impact factor: 3.605

9.  BAGEL3: Automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides.

Authors:  Auke J van Heel; Anne de Jong; Manuel Montalbán-López; Jan Kok; Oscar P Kuipers
Journal:  Nucleic Acids Res       Date:  2013-05-15       Impact factor: 16.971

10.  Draft Genome Sequence of Bacillus velezensis GF610, a Producer of Potent Anti-Listeria Agents.

Authors:  Michelle M Gerst; Edward G Dudley; Lingzi Xiaoli; Ahmed E Yousef
Journal:  Genome Announc       Date:  2017-10-12