Anne de Jong, Auke J van Heel, Jan Kok, Oscar P Kuipers.
Abstract
Mining bacterial genomes for bacteriocins is a challenging task due to the substantial structure and sequence diversity, and generally small sizes, of these antimicrobial peptides. Major progress in the research of antimicrobial peptides and the ever-increasing quantities of genomic data, varying from (un)finished genomes to meta-genomic data, led us to develop the significantly improved genome mining software BAGEL2, as a follow-up of our previous BAGEL software. BAGEL2 identifies putative bacteriocins on the basis of conserved domains, physical properties and the presence of biosynthesis, transport and immunity genes in their genomic context. The software supports parameter-free, class-specific mining and has high-throughput capabilities. Besides building an expert-validated bacteriocin database, we describe the development of novel Hidden Markov Models (HMMs) and the interpretation of combinations of HMMs via simple decision rules for prediction of bacteriocin (sub-)classes. Furthermore, the genetic context is automatically annotated based on (combinations of) PFAM domains and databases of known context genes. The scoring system was fine-tuned using expert knowledge on data derived from screening all bacterial genomes currently available at the NCBI. BAGEL2 is freely accessible at http://bagel2.molgenrug.nl.
Year: 2010 PMID: 20462861 PMCID: PMC2896169 DOI: 10.1093/nar/gkq365
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Classification scheme used by BAGEL2
| Class I lanthionine | Class II non-lanthionine | Class III |
|---|---|---|
| A LanBC modified | A Pediocin-like | Large proteins |
| B LanM modified | B Two-component | |
| C LanL modified | C Cyclic peptides | |
| D Miscellaneous |
Figure 1. Process overview. (A) The input genome data can be a single- or multi-entry GenBank file; for non-annotated data, or if re-annotation is desired, ORF calling can be performed via the BAGEL2 web site. (B) Annotation of putative bacteriocins and their genomic context genes. (C) Calculation of a score for each candidate and generation of detailed reports (including a graphical representation).
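
To make the ORF-calling step in (A) concrete, here is a deliberately naive six-frame ORF scan. It is not the method used by the BAGEL2 web site or by any dedicated gene caller; it only illustrates what "ORF calling" on non-annotated DNA means. The function names and the `min_codons` threshold are hypothetical, and only ATG start codons are considered.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=20):
    """Yield (strand, frame, sequence) for each ATG...stop ORF.

    Assumption: an ORF runs from the first ATG in a frame to the next
    in-frame stop codon; min_codons counts codons before the stop.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i
                elif codon in STOPS and start is not None:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, seq[start:i + 3]
                    start = None
```

Because bacteriocin precursors are small, a low `min_codons` value would be needed to keep them, at the cost of many spurious short ORFs; real gene callers use statistical models to handle that trade-off.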
Weight factors used by the BAGEL2 scoring system
| Property | Weight factor |
|---|---|
| Blast hit with bacteriocin Class I or II, stringent cut-off | 10 000 |
| Blast hit with bacteriocin Class I or II, non-stringent cut-off | 5000 |
| Blast hit with bacteriocin Class III | 2000 |
| [Cys]:[Thr,Ser] ratio 0.35:0.80 and leader present | 600 |
| Bacteriocin regular expression match | 500 |
| Cysteine count between 2 and 8 | 400 |
| PF05147 (1000 − 100 * distance) | 200–900 |
| PF03412 PF00005→LanT | 300 |
| PF04737 PF04738→LanB | 300 |
| PF00069 PF05147→LanL | 300 |
| Blast hit with context biosynthesis genes, Class I or II | 200 |
| Blast hit with context immunity genes, Class I or II | 200 |
| [Cys]:[Thr,Ser] ratio 0.25:0.55 and no leader | 200 |
| Presence of a leader processing site | 100 |
| HMM hit for context genes | 100 |
| PF00072 PF00486→Response | 100 |
| PF00512 PF02518→Sensor | 100 |
| Proper pI and charge | 50 |
| HMM hit for bacteriocin | see |
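
The Pfam domain combinations and weights in this table can be read as a small rule set. The sketch below is not BAGEL2's implementation. It assumes, for illustration only, that a score is the plain sum of the weights of matched properties, and that "distance" for PF05147 is an integer from 1 to 8 (which reproduces the 200–900 range); the table does not state the unit of distance. The names `CONTEXT_RULES`, `annotate_context`, `pf05147_weight` and `score` are hypothetical, and BLAST, HMM and physical-property terms are left out.

```python
# Domain combinations and weights copied from the weight-factor table.
# A context gene gets a label when it carries every listed Pfam domain.
CONTEXT_RULES = {
    "LanT": ({"PF03412", "PF00005"}, 300),
    "LanB": ({"PF04737", "PF04738"}, 300),
    "LanL": ({"PF00069", "PF05147"}, 300),
    "Response": ({"PF00072", "PF00486"}, 100),
    "Sensor": ({"PF00512", "PF02518"}, 100),
}


def annotate_context(domains):
    """Return the labels whose required Pfam domains are all present."""
    found = set(domains)
    return [label for label, (required, _) in CONTEXT_RULES.items()
            if required <= found]


def pf05147_weight(distance):
    """Weight 1000 - 100 * distance, as given in the table.

    Assumption: distance is an integer in 1..8; anything else scores 0.
    """
    if not 1 <= distance <= 8:
        return 0
    return 1000 - 100 * distance


def score(context_genes, pf05147_distance=None):
    """Assumed additive score over context-gene evidence only."""
    total = 0
    for domains in context_genes:
        for label in annotate_context(domains):
            total += CONTEXT_RULES[label][1]
    if pf05147_distance is not None:
        total += pf05147_weight(pf05147_distance)
    return total
```

Under these assumptions, `score([{"PF03412", "PF00005"}, {"PF04737", "PF04738"}], pf05147_distance=2)` gives 300 + 300 + 800 = 1400.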
Figure 2. BAGEL2 graphical output for the putative bacteriocin (light green) ClocelDRAFT_0418 from Clostridium cellulovorans 743B, which was identified through the new MA-2PEPA motif. Amino acids in the leader sequence of the putative bacteriocin are indicated in green. Amino acids potentially involved in lanthionine ring formation are marked in red (cysteine) and blue (serine and threonine).
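
The residues highlighted in this figure are the same ones behind the cysteine-count and [Cys]:[Thr,Ser] entries in the weight-factor table. As a rough sketch, they can be tallied from a peptide sequence as shown below. The function name `ring_residue_stats` is made up for illustration. The ratio is computed as the Cys count divided by the combined Ser and Thr count, which is only one possible reading of the table entry, and "between 2 and 8" is assumed to be inclusive.

```python
def ring_residue_stats(peptide):
    """Count residues that could take part in lanthionine ring formation.

    Returns the Cys count, the combined Ser+Thr count, their ratio
    (None when there is no Ser or Thr) and whether the Cys count lies
    in the assumed inclusive range 2..8.
    """
    seq = peptide.upper()
    cys = seq.count("C")
    ser_thr = seq.count("S") + seq.count("T")
    ratio = cys / ser_thr if ser_thr else None
    return {
        "cys": cys,
        "ser_thr": ser_thr,
        "ratio": ratio,
        "cys_count_ok": 2 <= cys <= 8,
    }
```

For the invented sequence `"MKSTCCSACT"`, this returns 3 cysteines, 4 Ser/Thr residues and a ratio of 0.75.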
Putative bacteriocins identified by BAGEL2 using new HMMs
| Motif | Putative bacteriocin identified by new motif | Organism |
|---|---|---|
| LE-MER1 | bpmyx0001_45460 | |
| MA-2PEPA | ClocelDRAFT_0418 | |
| LE-LAC481 | G11MC16DRAFT_3402 | |
| LE-LanBC | SnasDRAFT_14510 | |
| MA-2PEPb | CORMATOL_02550 | |
| MA-DUF | bcere0025_31210 | |
Putative Class I bacteriocins identified with BAGEL2
| Gene | Product | Protein_ID | Class | Score | Organism |
|---|---|---|---|---|---|
| SPN23F_12701 | Putative lantibiotic precursor | YP_002511205.1 | IB | 8025 | S. pneumoniae 23F |
| SPN23F_19710 | Hypothetical protein | YP_002511831.1 | IB | 2950 | S. pneumoniae 23F |
| SPN23F_19700 | Hypothetical protein | YP_002511830.1 | IB | 1950 | S. pneumoniae 23F |
| orf3711 | Not annotated | orf3711 | IA | 10 400 | B. clausii KSM-K16 |
| CORMATOL_2550 | Hypothetical protein | EEG25958.1 | IB | 4775 | C. matruchotii ATCC 33806 |
| CORMATOL_2549 | Hypothetical protein | EEG25957.1 | IB | 3575 | C. matruchotii ATCC 33806 |
| CORMATOL_2551 | Hypothetical protein | EEG25959.1 | IB | 3375 | C. matruchotii ATCC 33806 |
Displayed are the combined data from mining the genomes of S. pneumoniae 23F, B. clausii KSM-K16 (re-annotated using Prodigal) and C. matruchotii ATCC 33806.