Anne de Jong, Sacha A F T van Hijum, Jetta J E Bijlsma, Jan Kok, Oscar P Kuipers.
Abstract
A common problem in the annotation of open reading frames (ORFs) is the identification of genes that are functionally similar but have limited or no sequence homology. This is particularly the case for bacteriocins, a very diverse group of antimicrobial peptides produced by bacteria and usually encoded by small, poorly conserved ORFs. ORFs surrounding bacteriocin genes are often biosynthetic genes. This information can be used to locate putative structural bacteriocin genes. Here, we describe BAGEL, a web server that identifies putative bacteriocin ORFs in a DNA sequence using novel, knowledge-based bacteriocin databases and motif databases. Many bacteriocins are encoded by small genes that are often omitted in the annotation process of bacterial genomes. Thus, we have implemented ORF detection using a number of published ORF prediction tools. In addition, BAGEL takes into account the genomic context, i.e. for each potential bacteriocin-encoding ORF, the sequence of the surrounding region on the genome is analyzed for genes that might encode proteins involved in biosynthesis, transport, regulation and/or immunity. These innovations make BAGEL unique in its ability to detect putative bacteriocin gene clusters in (new) bacterial genomes. BAGEL is freely accessible at: http://bioinformatics.biol.rug.nl/websoftware/bagel.
Year: 2006 PMID: 16845009 PMCID: PMC1538908 DOI: 10.1093/nar/gkl237
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Overview of the known classes of bacteriocins and the features that discriminate the different classes
| Class | Type | Special features | Mature size in kDa | pI | PFAM domain | Charge | Processing site | Motifs | Adjacent genes |
|---|---|---|---|---|---|---|---|---|---|
| IA | Lantibiotics | Posttranslationally modified; elongated molecules | <5 | >8 | PF05500 PF04369 | | G(SA) P(RQ) | F(ND)L(DEN)(LVI) SLCTPGC SXXXCPTTXCXXXC | ABC-transporter His-kinase C39-protease |
| IB | Lantibiotics | Posttranslationally modified; globular molecules | <5 | >8 | PF05500 PF04369 | | G(SA) P(RQ) | F(ND)L(DEN)(LVI) FTCCS GXXXTOBX-C | ABC-transporter His-kinase |
| IIa | Non-lantibiotics | Strongly cationic; heat stable; strong anti-listerial activity; pediocin-like; at least one disulfide bridge | <10 | >9 | PF04604 PF02052 PF01721 | Strongly cationic | GG | YGNGVXC LSXXELXXIXGG DoubleGG | ABC-transporter His-kinase C39-protease |
| IIb | Two-peptide non-lantibiotics | Possess at most one positive charge | 25–65 | >9 | PF02052 PF01721 | Weak | GG P(RQ) | GXXXTOBX-C LSXXELXXIXGG DoubleGG | ABC-transporter His-kinase C39-protease |
| IIc | No leader | 30–65 | ? | Cationic | ABC-transporter His-kinase | ||||
| II microcin | >7 | G(GSA) | MREOB | ABC-transporter His-kinase C39-protease | |||||
| III | Large heat labile | Large | >30 | ? | |||||
| IV | Complex bacteriocins carrying lipid or carbohydrate moieties | >10 | ? | ||||||
| V | Circular | Two transmembrane segments | 49–108 | 9.5–11 | CirC |
Figure 1. Schematic overview of BAGEL. (A) A web page where a genome sequence is provided in GenBank file format. Optionally, an annotated reference genome can be selected in case a de novo ORF search was performed on the input genome. (B) A web page with parameters concerning the search and scoring of putative bacteriocins. (C) The actual search for putative bacteriocins is run by a script that (i) calls the various search modules; (ii) further annotates the peptides with a hit to any of the databases (indicated by a plus sign); and (iii) exports the results for putative bacteriocins (indicated by a plus sign) to a results database. (D) This database is used to provide the BAGEL user with the search results in a web page. Annotation (optional): if a genome with new ORF annotations was provided, the (optional) annotated reference genome is used (if possible) to connect the putative bacteriocin ORF to an existing genome annotation.
The motifs described in the literature that were included in the BAGEL motif and leader-sequence databases
| Motif | Motif present in | Reference |
|---|---|---|
| YGNGXXCXXXXC | Class IIa | (25) |
| GXXXTXBEC | Class IIb | (26) |
| YYGNGVXC | Class IIa | (27) |
| GWAXGXXXG | Class II | (19) |
| FNDLV | Lantibiotics | (19) |
| MREOB | Class II microcins | (28) |
| GXOXTXBX-C | Lantibiotics | (26) |
| CPTTXCXXXC | Lantibiotics | (26) |
| AXXXAAXGA | Lantibiotics | This work |
| AX(FA)(LFV)AA(PL)GA | Lantibiotics | This work |
| Leader motifs | ||
| PR/PQ/GA/GS | Lantibiotics and Class IIb | (20) |
| GG-motif | Class IIa and IIb | (7) |
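The motif notation in the table can be turned into a pattern search. The following is a minimal sketch, not BAGEL's own code. It assumes that `X` stands for any residue and that a parenthesised group such as `(LFV)` means one residue from that set. Motifs that use `B`, `O` or a hyphen are left out because the table does not define those symbols. The function names and the test peptide are hypothetical.

```python
import re

# Motifs copied from the table above. Only motifs made of standard residue
# letters, X and parenthesised alternatives are included (assumption: the
# meaning of B, O and "-" is not defined here, so those motifs are skipped).
MOTIFS = {
    "YGNGXXCXXXXC": "Class IIa",
    "YYGNGVXC": "Class IIa",
    "FNDLV": "Lantibiotics",
    "CPTTXCXXXC": "Lantibiotics",
    "AX(FA)(LFV)AA(PL)GA": "Lantibiotics",
}


def motif_to_regex(motif):
    """Translate table notation into a compiled regular expression.

    Assumed reading: X = any residue, (ABC) = exactly one of A, B or C,
    any other letter = that literal residue.
    """
    parts = []
    for token in re.findall(r"\([A-Z]+\)|[A-Z]", motif):
        if token == "X":
            parts.append(".")
        elif token.startswith("("):
            parts.append("[" + token[1:-1] + "]")
        else:
            parts.append(token)
    return re.compile("".join(parts))


def find_motifs(peptide):
    """Return (motif, class, start position) for every motif found."""
    hits = []
    for motif, bacteriocin_class in MOTIFS.items():
        match = motif_to_regex(motif).search(peptide)
        if match:
            hits.append((motif, bacteriocin_class, match.start()))
    return hits


if __name__ == "__main__":
    # Hypothetical peptide fragment used only to exercise the patterns.
    for hit in find_motifs("KYYGNGVTCGKHSCSVDWGK"):
        print(hit)
```

Compiling each motif once and reusing the pattern would be preferable when scanning many ORFs; the sketch recompiles on every call for brevity.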
Figure 2. Distance profiling. Adjacent genes are annotated on the basis of motifs or a hit to one of the databases described in Figure 1C. The presence of ORFs whose products are homologous to proteins known to be involved in bacteriocin biosynthesis increases the probability that an ORF encodes a bacteriocin. In this example, the biosynthetic genes (red) are at position −2 (two ORFs upstream) and +4 (four ORFs downstream) of the putative bacteriocin ORF, as indicated by the ruler.
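The distance measure in this caption can be illustrated with a short sketch. It assumes the ORFs are available as a list of annotation labels in genome order, with `None` for ORFs that have no relevant annotation. The function name, the input layout and the search window of 8 ORFs are assumptions made for illustration; the source does not state them. Strand orientation is ignored, so "upstream" here simply means a lower list index.

```python
def distance_profile(orfs, candidate_index, window=8):
    """Collect signed ORF distances from a candidate to annotated neighbours.

    orfs            -- annotation labels (or None) in genome order
    candidate_index -- position of the putative bacteriocin ORF in `orfs`
    window          -- how many ORFs to inspect on each side (assumed value)

    Negative distances are ORFs before the candidate, positive ones after.
    """
    profile = {}
    first = max(0, candidate_index - window)
    last = min(len(orfs), candidate_index + window + 1)
    for i in range(first, last):
        label = orfs[i]
        if i == candidate_index or label is None:
            continue
        profile.setdefault(label, []).append(i - candidate_index)
    return profile


if __name__ == "__main__":
    # Mirrors the caption: annotated genes at -2 and +4 of the candidate.
    orfs = [None, "ABC-transporter", None, "candidate",
            None, None, None, "His-kinase", None]
    print(distance_profile(orfs, candidate_index=3))
```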
Properties used for the scoring of a putative bacteriocin gene
| Property | Weight factor |
|---|---|
| Motif present | 60 |
| FASTA hit on bacteriocin database | 100 |
| Adjacent genes | 100 |
| PFAM/HMM motif | 60 |
| Leader sequence present | 100 |
| Blast hit with colicin database | 120 |
| Charge, pI, cysteines | 5 |
The sum of the weight factors for the scored properties yields the final score. The peptide is considered as a significant hit when the total score is above 175.
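As a worked illustration of this scoring rule, the sketch below sums the weight factors from the table and applies the cut-off of 175. The dictionary keys and function names are hypothetical. The sketch also assumes each property is counted once; the table does not say whether the weight of 5 applies once or separately to charge, pI and cysteines.

```python
# Weight factors copied from the table above; the key names are illustrative.
WEIGHTS = {
    "motif": 60,
    "fasta_bacteriocin_db": 100,
    "adjacent_genes": 100,
    "pfam_hmm": 60,
    "leader_sequence": 100,
    "blast_colicin_db": 120,
    "charge_pi_cysteines": 5,  # assumption: counted once, not per criterion
}
THRESHOLD = 175  # a hit is significant when the total score is above this


def total_score(properties):
    """Sum the weight factors of the properties observed for one peptide."""
    observed = set(properties)
    unknown = observed - set(WEIGHTS)
    if unknown:
        raise KeyError("unknown properties: " + ", ".join(sorted(unknown)))
    return sum(WEIGHTS[name] for name in observed)


def is_significant(properties):
    """Apply the cut-off: strictly above 175."""
    return total_score(properties) > THRESHOLD


if __name__ == "__main__":
    weak = ["motif", "leader_sequence", "charge_pi_cysteines"]  # 60 + 100 + 5
    strong = ["motif", "leader_sequence", "adjacent_genes"]     # 60 + 100 + 100
    print(total_score(weak), is_significant(weak))
    print(total_score(strong), is_significant(strong))
```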
Summary of results obtained by applying BAGEL to various genomes
| Organism | gene/locus | Homology | ABC | HK | C39 | Score | Reference |
|---|---|---|---|---|---|---|---|
| Sublancin | −1 | −1 | 375 | (20) | |||
| Subtilosin A | 255 | (12) | |||||
| Microcin H47 | 255 | ||||||
| Microcin H47 | 255 | ||||||
| −5/+2 | 225 | ||||||
| −4 | −7 | 225 | |||||
| SP0041 | BlpU | +1 | +1 | 385 | (22) | ||
| SP0533 | BlpK | 360 | (22) | ||||
| SP0540 | BlpN | 250 | (22) | ||||
| SP0531 | BlpI | 255 | (22) | ||||
| SP0532 | BlpJ | 255 | (22) | ||||
| SP0539 | BlpM | 255 | (22) | ||||
| SP0541 | BlpO | 255 | (22) | ||||
| SP0109 | putative | +2 | 325 | ||||
| SP1949 | +8 | +4 | 175 | ||||
| SP0792 | −6 | +7 | 225 | ||||
| SP1832 | −7/+8 | 165 | |||||
| SP1948 | +5 | +5 | 165 | ||||
| BlpU | 3 | 3 | 490 | (23,24) | |||
| spr0141 | −4/8 | 175 | |||||
| spr1185 | −2/3 | 235 | |||||
| spr1199 | −8/4 | 235 | |||||
| spr1766 | 7 | 4 | 175 | ||||
| 3 | 4 | 325 | |||||
| spr0700 | −6 | +8 | 225 | ||||
| spr0098 | SP0109 | 8 | 325 | ||||
| spr1765 | +8 | +5 | 165 |
The columns ABC (ABC transporter), HK (histidine kinase) and C39 (C39 protease) give the distance, in number of ORFs, between these genes and the putative bacteriocin gene (Figure 2). All peptides derived from these putative bacteriocin genes contain a processing site.