Abstract
Phage_Finder, a heuristic computer program, was created to identify prophage regions in completed bacterial genomes. Using a test dataset of 42 bacterial genomes whose prophages had been manually identified, Phage_Finder found 91% of the regions, with 7% false-positive and 9% false-negative prophages. A search of 302 complete bacterial genomes predicted 403 putative prophage regions, accounting for 2.7% of the total bacterial DNA. Analysis of the 285 putative attachment sites revealed that tRNAs are targets for integration slightly more frequently (33%) than intergenic (31%) or intragenic (28%) regions, while tmRNAs were targeted in 8% of the regions. The most popular tRNA targets were Arg, Leu, Ser and Thr. Mapping the insertion points onto a consensus tRNA molecule revealed novel insertion points on the 5' side of the D loop, on the 3' side of the anticodon loop and in the anticodon. A novel method of constructing phylogenetic trees of phages and prophages was developed based on the mean BLAST score ratio (BSR) of the phage/prophage proteomes. This method verified many known bacteriophage groups, making it a useful tool for predicting the relationships of prophages from bacterial genomes.
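The BSR idea can be illustrated with a short sketch. It assumes that each protein's BSR is its best BLASTP score against the other proteome divided by its BLASTP score against itself, that proteins with no hit contribute a BSR of 0, and that the mean is turned into a distance as `1 - mean BSR`. The function names and the distance conversion are hypothetical, and the scores are made up; this is not Phage_Finder's own code.

```python
from statistics import mean


def blast_score_ratio(hit_score: float, self_score: float) -> float:
    """BSR of one protein: its BLASTP score against the other proteome,
    normalized by its score against itself (1.0 = identical)."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


def mean_bsr(hit_scores: dict, self_scores: dict) -> float:
    """Mean BSR over every protein in the query proteome.

    hit_scores: protein id -> best BLASTP score against the other proteome.
    self_scores: protein id -> BLASTP score of the protein against itself.
    Assumption: proteins without a hit count as BSR 0.
    """
    return mean(
        blast_score_ratio(hit_scores.get(pid, 0.0), s)
        for pid, s in self_scores.items()
    )


def bsr_distance(hit_scores: dict, self_scores: dict) -> float:
    """Hypothetical similarity-to-distance conversion: 1 - mean BSR."""
    return 1.0 - mean_bsr(hit_scores, self_scores)


# Toy scores, invented for illustration.
self_scores = {"p1": 200.0, "p2": 100.0, "p3": 50.0}
hit_scores = {"p1": 150.0, "p2": 50.0}  # p3 has no match
print(round(bsr_distance(hit_scores, self_scores), 3))  # 0.583
```

Averaging over the whole query proteome, rather than only over matched proteins, means that unshared genes push two genomes further apart. That is one plausible reading of "mean BSR of the proteomes".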
Year: 2006 PMID: 17062630 PMCID: PMC1635311 DOI: 10.1093/nar/gkl732
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flow chart of the Phage_Finder pipeline (A) and the logic of the Phage_Finder.pl script (B). Standard flow-chart symbols were used.
List of HMMs used to categorize putative prophage regions
| Name | Description |
|---|---|
| Large terminase | |
| PF03354 | Terminase_1: phage terminase, large subunit, putative |
| PF04466 | Terminase_3: Phage terminase large subunit |
| PF05876 | Terminase_GpA: Phage terminase large subunit (GpA) |
| PF06056 | Terminase_5: Putative ATPase subunit of terminase (gpP-like) |
| PF07570 | Protein of unknown function (DUF1545) |
| TIGR01547 | phage_term_2: phage terminase, large subunit, PBSX family |
| TIGR01630 | psiM2_ORF9: phage uncharacterized protein, C-terminal domain |
| Small terminase | |
| PF03592 | Terminase_2: Terminase small subunit |
| PF05119 | Terminase_4: Phage terminase, small subunit |
| PF05944 | Phage_term_smal: Phage small terminase subunit |
| PF07141 | Phage_term_sma: Putative bacteriophage terminase small subunit |
| PF07471 | Phage_Nu1: Phage DNA packaging protein Nu1 |
| TIGR01558 | sm_term_P27: phage terminase, small subunit, putative, P27 family |
| Portal | |
| PF04860 | Phage_portal: Phage portal protein |
| PF05133 | Phage_prot_Gp6: Phage portal protein, SPP1 Gp6-like |
| PF05136 | Phage_portal_2: Phage portal protein, lambda family |
| PF06074 | DUF935: Protein of unknown function (DUF935) |
| TIGR01537 | portal_HK97: phage portal protein, HK97 family |
| TIGR01538 | portal_SPP1: phage portal protein, SPP1 family |
| TIGR01539 | portal_lambda: phage portal protein, lambda family |
| TIGR01540 | portal_PBSX: phage portal protein, PBSX family |
| TIGR01542 | A118_put_portal: phage portal protein, putative, A118 family |
| Capsid/head/coat | |
| PF01819 | Levi_coat: Levivirus coat protein |
| PF02305 | Phage_F: Capsid protein (F protein) |
| PF03864 | Phage_cap_E: Phage major capsid protein E |
| PF05065 | Phage_capsid: Phage capsid family |
| PF05125 | Phage_cap_P2: Phage major capsid protein, P2 family |
| PF05126 | Phage_min_cap: Phage minor capsid protein |
| PF05356 | Phage_Coat_B: Phage Coat protein B |
| PF05357 | Phage_Coat_A: Phage Coat Protein A |
| PF05371 | Phage_Coat_Gp8: Phage major coat protein, Gp8 |
| PF06673 | Phage_min_cap2: Phage minor capsid protein 2 |
| PF07068 | L_lactis_ph-MCP: |
| TIGR01551 | major_capsid_P2: phage major capsid protein, P2 family |
| TIGR01554 | major_cap_HK97: phage major capsid protein, HK97 family |
| Capsid protease | |
| PF03420 | Peptidase_U9: Prohead core protein protease, T4 family |
| PF04586 | Caudo_protease: Caudovirus prohead protease |
| TIGR01543 | proheadase_HK97: phage prohead protease, HK97 family |
| Head-tail joining | |
| PF02831 | gpW: gpW [head-tail-joining] |
| PF05352 | Phage_connector: Phage Connector (GP10) |
| PF05354 | Phage_attach: Phage Head-Tail Attachment |
| PF05521 | Phage_H_T_join: Phage head-tail joining protein |
| PF06264 | DUF1026: Protein of unknown function (DUF1026) |
| TIGR01563 | gp16_SPP1: phage head-tail adaptor, putative |
| Tape measure | |
| PF06120 | Phage_HK97_TLTM: Tail length tape measure protein |
| PF06791 | TMP_2: Prophage tail length tape measure protein |
| TIGR01541 | tape_meas_lam_C: phage tail tape measure protein, lambda family |
| TIGR01760 | tape_meas_TP901: phage tail tape measure protein, TP901 family, core region |
| Virion morphogenesis | |
| PF02924 | HDPD: Bacteriophage lambda head decoration protein D |
| PF02925 | gpD: Bacteriophage scaffolding protein D |
| PF03863 | Phage_mat-A: Phage maturation protein |
| PF04233 | Phage_Mu_F: Phage Mu protein F like protein |
| PF05396 | Phage_T7_Capsid: Phage T7 capsid assembly protein |
| PF05926 | Phage_GPL: Phage head completion protein (GPL) |
| PF05929 | Phage_GPO: Phage capsid scaffolding protein (GPO) |
| PF07230 | Phage_T4_Gp20: Bacteriophage T4-like capsid assembly protein (Gp20) |
| TIGR01641 | phageSPP1_gp7: phage putative head morphogenesis protein, SPP1 gp7 family |
| Other functions | |
| PF02914 | Mu_transposase: Bacteriophage Mu transposase |
| PF03374 | ANT: Phage antirepressor protein |
| PF04687 | Microvir_H: Microvirus H protein (pilot protein) |
| PF05135 | Phage_QLRG: Phage QLRG family, putative DNA packaging |
| PF05435 | Phi-29_GP3: Phi-29 DNA terminal protein GP3 |
| PF05894 | Podovirus_Gp16: Podovirus DNA encapsidation protein (Gp16) [terminal protein] |
| PF07026 | DUF1317: phage conserved hypothetical protein |
| PF07030 | DUF1320: phage conserved hypothetical protein |
| PF07880 | T4_gp9_10: Bacteriophage T4 gp9/10-like protein [baseplate] |
| TIGR01560 | put_DNA_pack: uncharacterized phage protein (possible DNA packaging) |
| TIGR02215 | phage_chp_gp8: phage conserved hypothetical protein, phiE125 gp8 family |
| Lysis | |
| PF00959 | Phage_lysozyme: Phage lysozyme |
| PF01464 | SLT: Transglycosylase SLT domain |
| PF01473 | CW_binding_1: Putative cell wall binding repeat |
| PF03245 | Phage_lysis: Bacteriophage lysis protein |
| PF04517 | Microvir_lysis: Microvirus lysis protein (E), C terminus |
| PF04531 | Phage_holin_1: Bacteriophage holin |
| PF04550 | Phage_holin_2: Phage holin family 2 |
| PF04688 | Phage_holin: Phage lysis protein, holin |
| PF04936 | DUF658: Protein of unknown function (DUF658) |
| PF05102 | Holin_BlyA: holin, BlyA family |
| PF05105 | Phage_holin_4: Holin family |
| PF05106 | Phage_holin_3: Phage holin family (Lysis protein S) |
| PF05289 | BLYB: Borrelia hemolysin accessory protein [holin] |
| PF05382 | Amidase_5: Bacteriophage peptidoglycan hydrolase |
| PF05449 | DUF754: Protein of unknown function (DUF754) |
| PF06714 | Gp5_OB: Gp5 N-terminal OB domain |
| PF06715 | Gp5_C: Gp5 C-terminal repeat (3 copies) |
| PF06737 | Transglycosylas: Transglycosylase-like domain |
| PF06946 | Phage_holin_5: Phage holin |
| PF07066 | Phage_Lacto_M3: Lactococcus phage M3 protein |
| TIGR01592 | holin_SPP1: holin, SPP1 family |
| TIGR01593 | holin_tox_secr: toxin secretion/phage lysis holin |
| TIGR01594 | holin_lambda: phage holin, lambda family |
| TIGR01598 | holin_phiLC3: holin, phage phi LC3 family |
| TIGR01606 | holin_BlyA: holin, BlyA family |
| TIGR01673 | holin_LLH: phage holin, LL-H family |
| Tails/tail fibers | |
| PF02306 | Phage_G: Major spike protein (G protein) |
| PF02413 | Caudo_TAP: Domain of unknown function DUF144 |
| PF03335 | Phage_fiber: Phage tail fiber repeat |
| PF03406 | Phage_fiber_2: Phage tail fiber repeat |
| PF03903 | Phage_T4_gp36: Phage T4 tail fibre |
| PF03906 | Phage_T7_tail: Phage T7 tail fiber protein |
| PF04630 | Phage_tail: Phage major tail protein |
| PF04717 | Phage_base_V: Phage-related baseplate assembly protein |
| PF04865 | Baseplate_J: Baseplate J-like protein |
| PF04883 | DUF646: Bacteriophage protein of unknown function (DUF646) |
| PF04984 | Phage_sheath_1: Phage tail sheath protein |
| PF04985 | Phage_tube: Phage tail tube protein FII |
| PF05017 | TMP: TMP repeat |
| PF05069 | Phage_tail_S: Phage virion morphogenesis family |
| PF05100 | Phage_tail_L: Phage minor tail protein L |
| PF05268 | GP38: Phage tail fibre adhesin Gp38 |
| PF05489 | Phage_tail_X: Phage Tail Protein X |
| PF05939 | Phage_min_tail: Phage minor tail protein |
| PF06141 | Phage_tail_U: Phage minor tail protein U |
| PF06158 | Phage_E: Phage tail protein E |
| PF06199 | Phage_tail_2: Phage major tail protein 2 |
| PF06222 | Phage_TAC: Phage tail assembly chaperone |
| PF06223 | Phage_tail_T: Minor tail protein T |
| PF06274 | Mu-like_GpL: Bacteriophage Mu tail sheath protein (GpL) |
| PF06341 | DUF1056: Protein of unknown function (DUF1056) |
| PF06488 | L_lac_phage_MSP: |
| PF06528 | Phage_P2_GpE: phage tail protein, P2 GpE family |
| PF06763 | Minor_tail_Z: Prophage minor tail protein Z (GPZ) |
| PF06805 | Lambda_tail_I: Bacteriophage lambda tail assembly protein I |
| PF06810 | Phage_GP20: Phage minor structural protein GP20 |
| PF06820 | Phage_fiber_C: Putative prophage tail fibre C-terminus |
| PF06841 | Phage_T4_gp19: T4-like virus tail tube protein gp19 |
| PF06890 | Phage_Mu_Gp45: Bacteriophage Mu Gp45 protein |
| PF06891 | P2_Phage_GpR: P2 phage tail completion protein R (GpR) |
| PF06893 | Phage_Mu_P: Bacteriophage Mu P protein |
| PF06894 | Phage_lambd_GpG: Bacteriophage lambda minor tail protein (GpG) |
| PF06995 | Phage_P2_GpU: Phage P2 GpU |
| PF07409 | GP46: Phage protein GP46 |
| PF07484 | Collar: Phage Tail Collar Domain |
| TIGR01600 | phage_tail_L: phage minor tail protein L |
| TIGR01603 | maj_tail_phi13: phage major tail protein, phi13 family |
| TIGR01611 | tail_tube: phage major tail tube protein |
| TIGR01633 | phi3626_gp14_N: phage putative tail component, N-terminal domain |
| TIGR01634 | tail_P2_I: phage tail protein I |
| TIGR01635 | tail_comp_S: phage virion morphogenesis protein |
| TIGR01644 | phage_P2_V: phage baseplate assembly protein V |
| TIGR01665 | put_anti_recept: phage minor structural protein, N-terminal region |
| TIGR01674 | phage_lambda_G: phage minor tail protein G |
| TIGR01715 | phage_lam_T: phage tail assembly protein T |
| TIGR01725 | phge_HK97_gp10: phage protein, HK97 gp10 family |
| TIGR02126 | phgtail_TP901_1: phage major tail protein, TP901-1 family |
| TIGR02242 | tail_TIGR02242: phage tail protein domain |
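The table groups HMMs by phage function. Tallying a candidate region's HMM hits by those groups can be sketched as follows. The dictionary holds only an illustrative subset of the accessions listed above, and the function name is hypothetical. Any thresholds Phage_Finder applies to such counts are not stated here and are not modeled.

```python
from collections import Counter

# Illustrative subset of the HMM table: accession -> functional category.
HMM_CATEGORY = {
    "PF03354": "Large terminase",
    "TIGR01547": "Large terminase",
    "PF03592": "Small terminase",
    "PF04860": "Portal",
    "TIGR01537": "Portal",
    "PF05065": "Capsid/head/coat",
    "TIGR01554": "Capsid/head/coat",
    "PF06120": "Tape measure",
    "PF00959": "Lysis",
    "TIGR01594": "Lysis",
    "PF04630": "Tails/tail fibers",
}


def categorize_region(hmm_hits):
    """Tally a region's HMM hits by functional category.

    hmm_hits: iterable of Pfam/TIGRFAM accessions that matched proteins in
    the region. Accessions not in the table are ignored.
    """
    return Counter(HMM_CATEGORY[acc] for acc in hmm_hits if acc in HMM_CATEGORY)


# PF99999 is a placeholder accession that is not in the table.
hits = ["PF03354", "PF04860", "PF00959", "TIGR01594", "PF99999"]
print(categorize_region(hits))
```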
Testing Phage_Finder against a known dataset
| Organism | Known # prophages (a) | Known # ORFs | Predicted # prophages (a) | Predicted # ORFs | # False + | # False − |
|---|---|---|---|---|---|---|
| | 3 | 147 | 3 | 170 | 0 | 0 |
| | 1 | 44 | 1 | 44 | 0 | 0 |
| | 2 | 247 | 2 | 162 | 1 | 0 |
| | 1 | 19 | 1 | 19 | 0 | 0 |
| | 1 | 57 | 1 | 57 | 0 | 0 |
| | 1 | 44 | 1 | 42 | 0 | 0 |
| | 2 | 68 | 2 | 69 | 0 | 0 |
| | 4 | 201 | 4 | 207 | 0 | 0 |
| | 5 | 288 | 5 | 268 | 0 | 0 |
| | 5 | 298 | 3 | 200 | 0 | 2 |
| | 4 | 98 | 4 | 97 | 0 | 0 |
| | 10 | 429 | 8 | 472 | 0 | 2 |
| | 11 | 598 | 10 | 677 | 0 | 1 |
| | 6 | 254 | 4 | 240 | 0 | 2 |
| | 5 | 302 | 5 | 321 | 0 | 0 |
| | 1 | 62 | 1 | 64 | 0 | 0 |
| | 2 | 95 | 1 | 36 | 0 | 1 |
| | 1 | 58 | 1 | 55 | 0 | 0 |
| | 1 | 14 | 1 | 14 | 0 | 0 |
| | 2 | 29 | 2 | 36 | 0 | 0 |
| | 2 | 125 | 2 | 125 | 0 | 0 |
| | 1 | 45 | 1 | 45 | 0 | 0 |
| | 3 | 120 | 3 | 146 | 3 | 0 |
| | 3 | 122 | 2 | 146 | 0 | 1 |
| | 5 | 207 | 5 | 212 | 0 | 0 |
| | 1 | 75 | 1 | 75 | 0 | 0 |
| | 2 | 39 | 2 | 64 | 2 | 0 |
| | 1 | 72 | 1 | 72 | 1 | 0 |
| | 2 | 132 | 2 | 133 | 1 | 0 |
| | 2 | 123 | 2 | 151 | 0 | 0 |
| | 1 | 65 | 1 | 90 | 0 | 0 |
| | 1 | 154 | 1 | 154 | 0 | 0 |
| | 2 | 112 | 2 | 112 | 0 | 0 |
| | 4 | 171 | 3 | 157 | 0 | 1 |
| | 6 | 338 | 6 | 359 | 0 | 0 |
| | 5 | 294 | 5 | 332 | 0 | 0 |
| | 1 | 41 | 1 | 41 | 0 | 0 |
| | 1 | 13 | 0 | 0 | 0 | 1 |
| | 2 | 51 | 2 | 97 | 0 | 0 |
| | 1 | 51 | 1 | 51 | 0 | 0 |
| | 3 | 168 | 3 | 169 | 0 | 0 |
| | 1 | 22 | 1 | 22 | 3 | 0 |
| Total | 118 | 5892 | 107 | 6003 | 11 | 11 |

(a) Only regions that have predicted att sites and contain core phage genes are listed.
(b) SPβ was split into two regions.
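The headline test figures can be re-derived from the table's rows. The sketch below transcribes the 42 rows in table order, sums each column, and checks the sums against the Total row. It assumes that "found" means known prophages minus false negatives; under that reading, 107/118 ≈ 90.7% reproduces the 91% detection figure and 11/118 ≈ 9.3% reproduces the 9% false-negative figure in the abstract. The 7% false-positive rate is not recomputed, because its denominator is not stated.

```python
# (known prophages, known ORFs, predicted prophages, predicted ORFs, false +, false -)
ROWS = [
    (3, 147, 3, 170, 0, 0), (1, 44, 1, 44, 0, 0), (2, 247, 2, 162, 1, 0),
    (1, 19, 1, 19, 0, 0), (1, 57, 1, 57, 0, 0), (1, 44, 1, 42, 0, 0),
    (2, 68, 2, 69, 0, 0), (4, 201, 4, 207, 0, 0), (5, 288, 5, 268, 0, 0),
    (5, 298, 3, 200, 0, 2), (4, 98, 4, 97, 0, 0), (10, 429, 8, 472, 0, 2),
    (11, 598, 10, 677, 0, 1), (6, 254, 4, 240, 0, 2), (5, 302, 5, 321, 0, 0),
    (1, 62, 1, 64, 0, 0), (2, 95, 1, 36, 0, 1), (1, 58, 1, 55, 0, 0),
    (1, 14, 1, 14, 0, 0), (2, 29, 2, 36, 0, 0), (2, 125, 2, 125, 0, 0),
    (1, 45, 1, 45, 0, 0), (3, 120, 3, 146, 3, 0), (3, 122, 2, 146, 0, 1),
    (5, 207, 5, 212, 0, 0), (1, 75, 1, 75, 0, 0), (2, 39, 2, 64, 2, 0),
    (1, 72, 1, 72, 1, 0), (2, 132, 2, 133, 1, 0), (2, 123, 2, 151, 0, 0),
    (1, 65, 1, 90, 0, 0), (1, 154, 1, 154, 0, 0), (2, 112, 2, 112, 0, 0),
    (4, 171, 3, 157, 0, 1), (6, 338, 6, 359, 0, 0), (5, 294, 5, 332, 0, 0),
    (1, 41, 1, 41, 0, 0), (1, 13, 0, 0, 0, 1), (2, 51, 2, 97, 0, 0),
    (1, 51, 1, 51, 0, 0), (3, 168, 3, 169, 0, 0), (1, 22, 1, 22, 3, 0),
]

# Column-wise sums should reproduce the Total row.
totals = [sum(col) for col in zip(*ROWS)]
known, _, predicted, _, false_pos, false_neg = totals

# Assumption: a known prophage counts as "found" unless it is a false negative.
found = known - false_neg
print(f"genomes: {len(ROWS)}, totals: {totals}")
print(f"found {found}/{known} = {found / known:.1%}")
print(f"false negatives {false_neg}/{known} = {false_neg / known:.1%}")
```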
Figure 2. Predicted prophage target-site distributions. (A) Distribution of the targets at which Phage_Finder found putative attachment sites. (B) Genetic code table showing the distribution of tRNA targets. For each codon, the number of phages from Williams, 2002 and the number of predicted prophages from this study are given, separated by a colon. Gray-highlighted numbers mark codons that are targeted six or more times. (C) Insertion points mapped onto a consensus tRNA molecule for Phage_Finder-predicted prophages (upper), phages and prophages from the literature (27) (middle) and the two datasets combined (lower). Arrows point to the nucleotide insertion point, and numbers give the number of insertions at each point. In the combined dataset, red arrows and numbers mark locations unique to one of the two datasets, while gold arrows and numbers highlight insertion points common to both. (D) Frequency and position of insertion into a consensus tRNA gene. Red bars indicate Phage_Finder-predicted insertion events; green bars represent insertion events reported in the literature (27).
Figure 3. Test phylogenetic tree generated by converting the BSR of bidirectional BLASTP matches into distances (A). Whole-genome BLASTP data from Fouts et al., 2005 (37) were used to compute this tree. The previously published 16S rRNA tree (B) is shown for comparison.
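Turning bidirectional BSRs into a tree requires a pairwise distance matrix that a tree-building program can read. The sketch below is a minimal illustration under stated assumptions. The two directional mean BSRs of a pair (A versus B and B versus A) are averaged and converted to a distance as `1 - average`, which is a hypothetical symmetrization. The matrix is then written in a PHYLIP-style square format: a taxon count on the first line, then one row per taxon with the name padded to a 10-character field. The BSR values are invented, and the function names are illustrative.

```python
def symmetric_distance(mean_bsr_ab: float, mean_bsr_ba: float) -> float:
    """Hypothetical: average the two directional mean BSRs, then 1 - average."""
    return 1.0 - (mean_bsr_ab + mean_bsr_ba) / 2.0


def to_phylip(names, dist):
    """Render a square PHYLIP-style distance matrix.

    names: taxon names (truncated/padded to a 10-character name field).
    dist: frozenset({a, b}) -> distance between taxa a and b.
    """
    lines = [f"{len(names):>5}"]
    for a in names:
        row = ["0.0000" if a == b else f"{dist[frozenset((a, b))]:.4f}" for b in names]
        lines.append(f"{a[:10]:<10} " + " ".join(row))
    return "\n".join(lines)


names = ["lambda", "HK97", "P2"]
# Directional mean BSRs below are invented for illustration only.
dist = {
    frozenset(("lambda", "HK97")): symmetric_distance(0.40, 0.30),
    frozenset(("lambda", "P2")): symmetric_distance(0.05, 0.05),
    frozenset(("HK97", "P2")): symmetric_distance(0.04, 0.06),
}
print(to_phylip(names, dist))
```

Keying distances by `frozenset` keeps the matrix symmetric by construction, because the pair (A, B) and the pair (B, A) map to the same entry.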
Figure 4. Phylogenetic analysis of Phage_Finder-predicted prophages, known prophages and sequenced phage genomes. The radial tree was constructed with branch-length extensions. Branches are colored as follows: sequenced phage genomes (black), known prophages (blue) and Phage_Finder-predicted prophage regions (gold). For clarity, only key phages or prophages are labeled. Known phage groups are indicated in red.