BAGEL3: Automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides.

Auke J van Heel, Anne de Jong, Manuel Montalbán-López, Jan Kok, Oscar P Kuipers.

Abstract

Identifying genes encoding bacteriocins and ribosomally synthesized and posttranslationally modified peptides (RiPPs) can be a challenging task. Especially those peptides that do not have strong homology to previously identified peptides can easily be overlooked. Extensive use of BAGEL2 and user feedback has led us to develop BAGEL3. BAGEL3 features genome mining of prokaryotes, which is largely independent of open reading frame (ORF) predictions and has been extended to cover more (novel) classes of posttranslationally modified peptides. BAGEL3 uses an identification approach that combines direct mining for the gene and indirect mining via context genes. Especially for heavily modified peptides like lanthipeptides, sactipeptides, glycocins and others, this genetic context harbors valuable information that is used for mining purposes. The bacteriocin and context protein databases have been updated and it is now easy for users to submit novel bacteriocins or RiPPs. The output has been simplified to allow user-friendly analysis of the results, in particular for large (meta-genomic) datasets. The genetic context of identified candidate genes is fully annotated. As input, BAGEL3 uses FASTA DNA sequences or folders containing multiple FASTA formatted files. BAGEL3 is freely accessible at http://bagel.molgenrug.nl.

Year:  2013        PMID: 23677608      PMCID: PMC3692055          DOI: 10.1093/nar/gkt391

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Scientific interest in bacterial antimicrobial peptides and other posttranslationally modified peptides is increasing (1,2). Finding new antibiotic compounds from novel sources to fight multi-drug resistant pathogens has become the focus of many researchers. Furthermore, knowledge about the diverse enzymes involved in posttranslational modifications is rapidly advancing (3–5) and can be used to make new-to-nature antimicrobial peptides (6,7) or to stabilize medically relevant peptides (8). The known world of ribosomally synthesized and posttranslationally modified peptides (RiPPs) is constantly expanding: more and more modifications and the enzymes involved are being described (1). The discovery of each new class triggers new genome mining efforts, which have yielded valuable information and several high-impact publications (4,9–12).

The main challenge in these genome mining efforts is the small size of the genes encoding the peptides of interest. Small open reading frames (ORFs) are often omitted during automated annotation, especially when their product sequences do not show strong homology with those of already described peptides, which hampers a direct mining approach. Therefore, the large modification enzymes have regularly been used in indirect genome mining efforts. With the design and development of the BActeriocin GEnome mining tooL (BAGEL) since 2005, we aim to facilitate these efforts (13,14). Other useful tools have also been developed, such as the data repository Bactibase (15) and the prediction tool antiSMASH (16), which also supports non-ribosomal peptides but lacks some of the classes supported by the faster BAGEL3. In the current version, BAGEL3, our main goals were to combine direct and indirect mining, to generate a simpler, clearer and better-quality output, to make the analysis more independent of ORF predictions and to facilitate the addition of novel classes of peptides that can be mined for.

IMPLEMENTATION

New in BAGEL3

The major improvement in BAGEL3 is the new dual process (Figure 1), i.e. combining two mining strategies in one procedure. Another major advantage of BAGEL3 is its use of DNA sequences as input instead of annotated genomes, making it less dependent on ORF predictions. Furthermore, novel classes of RiPPs have been implemented, extending the genome mining capabilities of BAGEL3 beyond bacteriocins. For this purpose, new hidden Markov models (HMMs) have been added that describe specific genes involved in the biosynthesis of cyanobactins (called CyaG after PatG) (17), sactipeptides (called SacCD after TrnCD) (18) and linaridins (called LinL after CypL) (10).
Figure 1.

Schematic overview of the BAGEL3 genome mining procedure. BAGEL3 uses two different approaches in parallel to find bacteriocins and modified peptides. Both approaches use nucleotide sequences in FASTA format as input. The first approach (left, red) describes how the context-based approach proceeds. The second approach (right, blue) describes the simpler precursor peptide-based mining. Finally, both methods generate a single summary table with links to detailed graphical reports.

BAGEL3 databases

BAGEL3 uses three different databases containing modified or unmodified bacteriocins and other (non-bactericidal) posttranslationally modified peptides. The databases have been thoroughly updated. Each database contains all the records belonging to one of the three classes of proteins internal to BAGEL3: Class I contains posttranslationally modified peptides <10 kDa, the modification enzymes of which are encoded in the genomic context of the modified peptide and have been described for more than one case; Class II contains posttranslationally modified peptides <10 kDa that do not fit the criteria of the first database; Class III contains antimicrobial proteins >10 kDa. This division is based on the procedure used by BAGEL3 to identify these proteins. The databases can be viewed online (http://bagel.molgenrug.nl/index.php/bacteriocin-database/) and have web links to literature, UniProtKB and NCBI. Users are actively encouraged to add new records to these databases via a web form (http://bagel.molgenrug.nl/index.php/submit-a-bacteriocin).

Description of the software

BAGEL3 uses DNA nucleotide sequences in FASTA format as input; multiple sequence entries per file are allowed. These DNA sequences are analyzed in parallel using two different approaches, one based on finding genes commonly found in the context of bacteriocin or RiPP genes, the other based on finding the gene itself.

The indirect approach (left red box in Figure 1) starts with a simple ORF call on the DNA. This call looks for ORFs of a certain minimal length that have a start and a stop codon, without taking into account the presence of a possible ribosome-binding site. The products of these ORFs are subsequently screened for the presence of protein domains. Simple and defined rules based on these protein domains are then used to decide which part of the nucleotide sequence should be analyzed in more detail. These DNA sequences are called areas of interest (AOIs). The size of an AOI is set to 20 kb, centered on the identified context gene. The ORFs in the AOI are then called using Glimmer (19). The next essential step is an additional specialized simple ORF call for every AOI to find the small ORFs that encode the targets of the identified modification enzymes. This ORF call takes into account the rule that was used to identify the AOI, so that when BAGEL3 is, for example, looking for a lanthipeptide, it will only call small ORFs encoding cysteine-containing peptides. Next, the context is annotated using the PFAM database (a BLAST search against the UniProt database is also possible in the stand-alone version). The last step is to identify the RiPP gene(s) present in the AOI. This is done using the results of both a BLAST search against the BAGEL3 Class I and Class II databases and a screening for known motifs. If no direct hit is obtained, BAGEL3 predicts a precursor peptide sequence based on sequence properties and genomic organization.

The direct approach (right blue box in Figure 1) starts with a Glimmer ORF call.
Next, the ORFs are searched with BLAST against all three databases. The context (20 kb) of the BLAST hits is annotated using the PFAM database (a BLAST search against the UniProt database is also possible in the stand-alone version). Because the same peptide could be identified with both approaches, the results of both are compared and filtered to exclude duplicates. Peptides identified via context genes are classified using this information (Table 1). Unmodified peptides identified via homology are classified according to their best BLAST hit. Finally, an HTML output with graphics is generated from the large basic results table (see Figure 2). The whole process is recorded in a log file. The nucleotide sequences of the identified AOIs can be downloaded.
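
The simple ORF call and the AOI window described above can be sketched in a few lines of Python. This is a minimal illustration, not the BAGEL3 implementation: the names `simple_orf_call`, `area_of_interest` and `Orf`, the set of start codons, the default minimal length of 20 codons and the clipping of the window at the sequence ends are all assumptions made for the example. Only the 20 kb window size and the "start plus stop codon, no ribosome-binding site" criterion come from the text.

```python
from typing import Iterator, NamedTuple, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    start: int   # 0-based, forward-strand coordinate
    end: int     # exclusive, includes the stop codon
    strand: str  # "+" or "-"


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def simple_orf_call(dna: str, min_codons: int = 20) -> Iterator[Orf]:
    """Yield start-to-stop ORFs in all six frames, ignoring ribosome-binding sites.

    min_codons counts codons from the start codon up to, but not including,
    the stop codon (hypothetical default).
    """
    dna = dna.upper()
    n = len(dna)
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first start codon after the previous stop
                elif codon in STOP_CODONS:
                    if start is not None and (i - start) // 3 >= min_codons:
                        end = i + 3
                        if strand == "+":
                            yield Orf(start, end, "+")
                        else:
                            # map reverse-strand positions back to the forward strand
                            yield Orf(n - end, n - start, "-")
                    start = None


def area_of_interest(gene_start: int, gene_end: int, seq_len: int,
                     size: int = 20_000) -> Tuple[int, int]:
    """Return a window of `size` bp centered on a context gene, clipped to the sequence."""
    center = (gene_start + gene_end) // 2
    return max(0, center - size // 2), min(seq_len, center + size // 2)
```

Taking the first in-frame start codon after each stop gives the longest ORF per stop codon; a small-ORF search that also reports nested starts would need a different loop.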
Table 1.

Currently supported classes of RiPPs and the rules used to identify potential clusters

Name                      Rule
Bottromycin               (PF04055) AND (PF02624)
Cyanobactin               (CyaG)
Glycocin                  (TIGR04195) AND (PF03412)
Lanthipeptide class II    (PF05147) AND (PF13575)
Lanthipeptide class I     (PF04737|PF04738|PF14028) AND (PF05147)
Lanthipeptide class III   (lanKC)
Lanthipeptide class IV    (PF05147) AND (LanL)
LAPs                      (PF02624) AND (PF00881)
Lasso peptide             (PF13471) AND (PF00733)
Linaridin                 (LinL)
Microcin                  (PF02794)
Sactipeptides             (SacCD) AND (PF04055)
Thiopeptide               (PF02624) AND (PF00881) AND (PF14028)

| = or; AND = additional requirement. The rules in this table describe the criteria that a stretch of DNA has to match to become an AOI. Some rules overlap, and therefore they are checked in an ordered fashion, so that the more stringent rule is checked after the less stringent one.
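
As an illustration of how such rules can be evaluated, the sketch below parses the rule strings of Table 1 and checks them, in table order, against the set of domains found in a stretch of DNA. It is not taken from the BAGEL3 code base. The function names are hypothetical, and two behaviours are assumptions: the last matching rule wins (one possible reading of "the more stringent rule is checked after the less stringent"), and Pfam version suffixes such as ".5" are stripped before comparison.

```python
from typing import Iterable, List, Optional, Set

# Rules copied from Table 1, kept in table order because the order matters.
RULES = [
    ("Bottromycin", "(PF04055) AND (PF02624)"),
    ("Cyanobactin", "(CyaG)"),
    ("Glycocin", "(TIGR04195) AND (PF03412)"),
    ("Lanthipeptide class II", "(PF05147) AND (PF13575)"),
    ("Lanthipeptide class I", "(PF04737|PF04738|PF14028) AND (PF05147)"),
    ("Lanthipeptide class III", "(lanKC)"),
    ("Lanthipeptide class IV", "(PF05147) AND (LanL)"),
    ("LAPs", "(PF02624) AND (PF00881)"),
    ("Lasso peptide", "(PF13471) AND (PF00733)"),
    ("Linaridin", "(LinL)"),
    ("Microcin", "(PF02794)"),
    ("Sactipeptides", "(SacCD) AND (PF04055)"),
    ("Thiopeptide", "(PF02624) AND (PF00881) AND (PF14028)"),
]


def parse_rule(rule: str) -> List[Set[str]]:
    """Turn '(A|B) AND (C)' into [{'A', 'B'}, {'C'}]."""
    return [set(group.strip("() ").split("|")) for group in rule.split(" AND ")]


def matching_classes(domains: Iterable[str]) -> List[str]:
    """Return every class whose rule is satisfied, in Table 1 order."""
    # Assumption: 'PF04738.5' and 'PF04738' denote the same domain.
    found = {d.split(".")[0] for d in domains}
    return [
        name
        for name, rule in RULES
        # every AND group needs at least one of its alternatives
        if all(found & alternatives for alternatives in parse_rule(rule))
    ]


def classify(domains: Iterable[str]) -> Optional[str]:
    """Assumption: when rules overlap, the rule checked last is reported."""
    hits = matching_classes(domains)
    return hits[-1] if hits else None
```

Under these assumptions, a region containing PF02624, PF00881 and PF14028 satisfies both the LAPs rule and the Thiopeptide rule and is reported as a thiopeptide candidate, because the Thiopeptide rule comes later in the table.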

Figure 2.

Example detailed report of a lantibiotic cluster encoding a nisin variant and its modification enzymes found in Streptococcus suis J14 (NC_017618.1) using BAGEL3. The target gene (smallORF_6) was in this case identified by the specialized small ORF calling procedure.

Availability

The BAGEL3 web server can be accessed and used freely for files up to 20 MB (http://bagel.molgenrug.nl). A stand-alone Linux version is available on request for local installation. The stand-alone version can be easily adapted to personal preferences using a comprehensive configuration file.

System requirements

BAGEL3 runs on an Ubuntu Linux platform (http://www.ubuntu.com) with Apache web server (version 2.2), MySQL server (version 14.14), PHP 5.4 (http://www.php.net/), Perl 5.10 (http://www.perl.org/), BioPerl 1.6.9 (http://www.bioperl.org/) and Joomla. Furthermore, the following software packages are used: BLAST 2.2.27 (20); HMMsearch 3.0 (http://hmmer.janelia.org/); Glimmer 3.02 (http://www.cbcb.umd.edu/software/glimmer/) (19); Pfamscan of the Sanger Institute (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/Tools/); and the UniRef50 database (http://www.ebi.ac.uk/uniref/).

Validation of BAGEL3

The BAGEL3 software was validated using a set of 50 genomes known to encode bacteriocins and other modified peptides. It was checked that no known compounds were missed. Next, to validate if novel clusters could be identified, 200 draft genome sequences from the NCBI server were screened. Both these sets of genomes were also used to check for false positives. A false positive was defined as a cluster that did not have at least a likely core peptide or a gene context that can be associated with RiPP biosynthesis.

RESULTS AND DISCUSSION

Analysis of example genomes

Based on the newly added HMM CyaG, which identifies the serine protease, generally termed G protein, in the cyanobactin biosynthesis pathway (21), we found a new cyanobactin encoded in the genome of the cyanobacterium Lyngbya sp. PCC 8106 (see Table 2). In Enterococcus faecalis Fly1, BAGEL3 identified an interesting novel lantibiotic gene cluster. The cluster could code for two lanthipeptides, which is common for two-component lantibiotics, but in this case they are not modified by a LanM-type enzyme but by a single set of LanBC enzymes. Another example of the added value of BAGEL3 is seen when querying the plasmid pTEF2 of E. faecalis V583. Based on the context, BAGEL3 identifies a glycocin-like peptide (Table 2). Glycocins are glycosylated antimicrobial peptides of which Glycocin S and Sublancin 168 are the only two characterized members that also contain disulfide bridges (1). The identified peptide has an exact match with the previously described bacteriocin Enterocin 96 (22). In the article describing Enterocin 96, the authors note that the measured mass is higher than the theoretical peptide mass. This mass difference is in line with the glycosylation predicted by BAGEL3, which has also been suggested by others (23). In the genome of the honey bee pathogen Paenibacillus larvae subsp. larvae BRL 230010, BAGEL3 identified the gene for a so-called sactipeptide that shows low homology to sporulation killing factor SkfA of Bacillus subtilis 168. In the genome of the Gram-negative pathogen Burkholderia pseudomallei 354a, a lasso peptide was identified with strong homology to capistruin (24).
Table 2.

A selection of novel RiPPs identified by BAGEL3

DNA screened                                 Homology (P-value)                          Identification                          Sequence
P. larvae subsp. larvae BRL 230010 Ctg01135  Sporulation-killingfactor_skfA [1e-10]      Context: SacCD                          Sactipeptide: MSNHNVRNEPAPAWESSAQNNLSKPAGIPLIKSVGCAACWGAKNISLTRACLPPTPINLAL
pTEF2 E. faecalis V583                       Enterocin_96 [2e-41] (exact match)          Context: TIGR04195 PF03412              Glycocin: MLNKKLLENGVVNAVTIDELDAQFGGMSKRDCNLMKACCAGQAVTYAIHSLLNRLGGDSSDPAGCNDIVRKYCK
E. faecalis Fly1 cont1.76                    leader_abc mature_ab PF02052.7 leaderLanBC  Context: PF04737.5 PF04738.5 PF05147.5  Lantibiotic: MPKYDDFDLNLKQTSASNQKDTRVTSVMACTPGTCNNKCPNTNWLCSNVCVTKTCWTCA
E. faecalis Fly1 cont1.76                    leader_abc PF02052.7 leaderLanBC            Context: PF04737.5 PF04738.5 PF05147.5  Lantibiotic: MPKYDDFDLNLKQNVSSSNKEPRITSIKWCTPGTCNNTCKGDSTLKSNCCGGSLMCSLGGC
Lyngbya sp. PCC 8106                         Trunkamide [1e-06]                          Context: CyaG                           Cyanobactin: MPCYPSYDGVDASVCMPCYPSYDGVDASVCMPCYPSYDDAE
B. pseudomallei 354a Contig0218              Capistruin [5e-23]                          Context: PF13471.1 PF00733.16           Lasso peptide: MVRFLAKLLRSTIHGSHGVSLDAVSSTHGTPGFQTPDARVISRFGFN

Simple small ORF calling procedure facilitates genome mining

A big problem when screening large amounts of genomic data using BAGEL2 was its dependence on the quality of the annotation. Multiple different ORF prediction procedures were therefore implemented. Consequently, several results had to be compared while still some of the small ORFs encoding antimicrobial peptides were lacking, creating the need for manual evaluation of some of the identified gene clusters. To remove this dependency, we now use a simple small ORF calling procedure for the AOIs identified by their context, obviating the need for reannotation and simplifying the procedure and the analyses.

BAGEL3 is extensible

The novel mining approach used in BAGEL3 has the big advantage that it can easily be extended to include new classes of modified peptides, the only requirement being that the gene of the small peptide of interest lies in a genomic context that can be recognized. The genomic context has to be translated into a simple rule that describes a certain AOI (for examples see Table 1). The described precursor peptides should then be added to the database. If requirements for the precursor peptides are known (for example, that they must contain a cysteine), these can be added. The context rule should be tested to check whether it is specific enough. Adding a new class of peptides to the system can be done within a few hours if an extensive literature review is available. Users are encouraged to submit novel rules/classes via the online form available on the BAGEL3 web site.
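
To make the extension mechanism concrete, the following sketch pairs a context rule with an optional precursor requirement. It is a hypothetical data model written for this illustration and does not reflect how BAGEL3 stores its classes. The names `PeptideClass`, `candidate_precursors`, `accept_any` and `contains_cysteine`, and the length cap of 100 residues, are assumptions. The rule string and the cysteine requirement for lanthipeptides are taken from the text and Table 1.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


def accept_any(peptide: str) -> bool:
    """Default requirement: no constraint on the precursor sequence."""
    return True


def contains_cysteine(peptide: str) -> bool:
    """Requirement mentioned in the text for lanthipeptide precursors."""
    return "C" in peptide.upper()


@dataclass(frozen=True)
class PeptideClass:
    name: str
    context_rule: str                               # same syntax as Table 1
    precursor_ok: Callable[[str], bool] = accept_any


# A new class is one record: a name, a context rule and, if known,
# a requirement on the precursor peptide.
LANTHIPEPTIDE_I = PeptideClass(
    name="Lanthipeptide class I",
    context_rule="(PF04737|PF04738|PF14028) AND (PF05147)",
    precursor_ok=contains_cysteine,
)


def candidate_precursors(peptides: Iterable[str], peptide_class: PeptideClass,
                         max_length: int = 100) -> List[str]:
    """Keep translated small ORFs that satisfy the class-specific requirement.

    max_length is a hypothetical cap on precursor size, in residues.
    """
    return [
        p for p in peptides
        if len(p) <= max_length and peptide_class.precursor_ok(p)
    ]
```

Keeping the requirement as a plain function means a class without known precursor constraints needs no extra code, while a stricter class can supply any predicate over the peptide sequence.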

CONCLUSIONS

BAGEL3 is a versatile, fast genome-mining tool suitable not only for modified and non-modified bacteriocins but also for non-bactericidal RiPPs. It can handle large data sets such as those from metagenome projects. This updated version looks for bacteriocins/RiPPs via two different approaches, which increases the success rate and lowers the need for manual evaluation of results. The new design also allows for easy inclusion of novel classes of peptides, and users are therefore encouraged to propose the addition of novel classes.
  23 in total

1.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

2.  Identifying bacterial genes and endosymbiont DNA with Glimmer.

Authors:  Arthur L Delcher; Kirsten A Bratke; Edwin C Powers; Steven L Salzberg
Journal:  Bioinformatics       Date:  2007-01-19       Impact factor: 6.937

3.  Enterocin 96, a novel class II bacteriocin produced by Enterococcus faecalis WHE 96, isolated from Munster cheese.

Authors:  Esther Izquierdo; Camille Wagner; Eric Marchioni; Dalal Aoude-Werner; Saïd Ennahar
Journal:  Appl Environ Microbiol       Date:  2009-05-01       Impact factor: 4.792

4.  Discovery of a widely distributed toxin biosynthetic gene cluster.

Authors:  Shaun W Lee; Douglas A Mitchell; Andrew L Markley; Mary E Hensler; David Gonzalez; Aaron Wohlrab; Pieter C Dorrestein; Victor Nizet; Jack E Dixon
Journal:  Proc Natl Acad Sci U S A       Date:  2008-03-28       Impact factor: 11.205

5.  Designing and producing modified, new-to-nature peptides with antimicrobial activity by use of a combination of various lantibiotic modification enzymes.

Authors:  Auke J van Heel; Dongdong Mu; Manuel Montalbán-López; Djoke Hendriks; Oscar P Kuipers
Journal:  ACS Synth Biol       Date:  2013-02-12       Impact factor: 5.110

6.  Thuricin CD, a posttranslationally modified bacteriocin with a narrow spectrum of activity against Clostridium difficile.

Authors:  Mary C Rea; Clarissa S Sit; Evelyn Clayton; Paula M O'Connor; Randy M Whittal; Jing Zheng; John C Vederas; R Paul Ross; Colin Hill
Journal:  Proc Natl Acad Sci U S A       Date:  2010-04-30       Impact factor: 11.205

7.  Isolation and structural characterization of capistruin, a lasso peptide predicted from the genome sequence of Burkholderia thailandensis E264.

Authors:  Thomas A Knappe; Uwe Linne; Séverine Zirah; Sylvie Rebuffat; Xiulan Xie; Mohamed A Marahiel
Journal:  J Am Chem Soc       Date:  2008-08-01       Impact factor: 15.419

8.  Angiotensin-(1-7) with thioether bridge: an angiotensin-converting enzyme-resistant, potent angiotensin-(1-7) analog.

Authors:  Leon D Kluskens; S Adriaan Nelemans; Rick Rink; Louwe de Vries; Anita Meter-Arkema; Yong Wang; Thomas Walther; Anneke Kuipers; Gert N Moll; Marijke Haas
Journal:  J Pharmacol Exp Ther       Date:  2008-11-26       Impact factor: 4.030

9.  BAGEL: a web-based bacteriocin genome mining tool.

Authors:  Anne de Jong; Sacha A F T van Hijum; Jetta J E Bijlsma; Jan Kok; Oscar P Kuipers
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

10.  BACTIBASE second release: a database and tool platform for bacteriocin characterization.

Authors:  Riadh Hammami; Abdelmajid Zouhir; Christophe Le Lay; Jeannette Ben Hamida; Ismail Fliss
Journal:  BMC Microbiol       Date:  2010-01-27       Impact factor: 3.605
