James M Nolan, Vasiliy Petrov, Claire Bertrand, Henry M Krisch, Jim D Karam.
Abstract
BACKGROUND: Bacteriophages are an important repository of genetic diversity. As one of the major constituents of terrestrial biomass, they exert profound effects on the earth's ecology and microbial evolution by mediating horizontal gene transfer between bacteria and controlling their growth. Only limited genomic sequence data are currently available for phages, but even these reveal an overwhelming diversity in their gene sequences and genomes. The contribution of the T4-like phages to this overall phage diversity is difficult to assess, since only a few complete genome sequences exist for these phages. Our analysis of five T4-like genomes represents half of the known T4-like genomes in GenBank.
Year: 2006 PMID: 16716236 PMCID: PMC1524935 DOI: 10.1186/1743-422X-3-30
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
Summary of T4-like genome sequences determined in comparison with T4
| Phage | Genome size, bp (% G+C) | ORFs (% of genome length) | tRNAs | T4-like ORFs (% of ORFs) | Novel ORFs |
| T4 | 168,904 (35.0%) | 273 (95.9%) | 8 | 209 (76.6%) | 64 |
| RB69 | 167,560 (37.6%) | 273 (94.0%) | 2 | 208 (77.7%) | 65 |
| RB49 | 164,018 (40.5%) | 272 (94.5%) | 0 | 121 (44.5%) | 151 |
| Aeh1 | 233,234 (42.8%) | 332 (91.6%) | 24 | 104 (31.3%) | 228 |
| RB43 | 180,500 (43.2%) | 292 (94.2%) | 1 | 114 (39.0%) | 178 |
| 44RR 2.8t | 173,591 (44.0%) | 253 (92.8%) | 16 | 116 (45.8%) | 137 |
The number of ORFs for T4 is from the GenBank accession but does not include 7 alternative translation products included within some ORFs. The number of ORFs predicted for T4 by GeneMarkS was 266 (93.1% of the genome length). tRNAs were predicted by tRNAscan-SE. The number of T4-like ORFs is the number of ORFs conserved in T4 and at least one of the other genomes studied. The remainder of ORFs in each genome are novel ORFs.
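The relationship between the columns described in this footnote can be checked with a few lines of arithmetic. The sketch below hard-codes the ORF counts from the summary table and derives the novel ORF count as total ORFs minus T4-like ORFs; the variable and function names are illustrative and do not come from the paper.

```python
# ORF counts copied from the summary table: phage -> (total ORFs, T4-like ORFs)
ORF_COUNTS = {
    "T4": (273, 209),
    "RB69": (273, 208),
    "RB49": (272, 121),
    "Aeh1": (332, 104),
    "RB43": (292, 114),
    "44RR 2.8t": (253, 116),
}


def novel_orfs(total, t4_like):
    """Novel ORFs are the ORFs that remain after removing the T4-like ones."""
    return total - t4_like


novel = {phage: novel_orfs(*counts) for phage, counts in ORF_COUNTS.items()}

for phage, count in novel.items():
    print(f"{phage}: {count} novel ORFs")
```

The derived values can be compared directly against the last column of the table.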
Figure 1. BLAST alignment of T4-like genomes. Conserved T4-like genes are displayed as blue arrows, novel ORFs are shown as red arrows, tRNAs as black arrowheads. Pairwise tblastx similarities between genomes are indicated by green boxes. Similarities separated by less than 90 bp were combined for visual clarity. Yellow regions indicate similarities found in inverted orientation between genomes.
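The merging rule in this caption (similarities separated by less than 90 bp are combined) amounts to a standard interval merge with a gap tolerance. A minimal sketch, assuming each similarity is a (start, end) coordinate pair on one genome; the function name and the example coordinates are hypothetical, and the source does not say how the authors implemented this step.

```python
def merge_close_intervals(intervals, max_gap=90):
    """Merge (start, end) intervals whose separation is less than max_gap bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical coordinates, for illustration only.
hits = [(100, 400), (450, 900), (1000, 1500), (1200, 1300)]
print(merge_close_intervals(hits))
```

Sorting first means each interval only needs to be compared with the most recently merged block, so the merge runs in a single pass.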
Domain matches for T4 conserved ORFs
| Gene | Pfam domain name | E value range | genomes hit |
| vs.6 | Gly_radical formyl transferase | 1.40E-45 to 8.8E-15 | 6/6 |
| vs.1 | SLT Transglycosylase | 0.012 to 0.74 | 6/6 |
| nrdC.10 | AAA ATPase family | 0.082 to 0.16 | 3/3 |
| nrdC.10 | BSD domain | 0.076 | 1/3 |
| nrdC.2 | TFIIS_C | 0.021 | 1/6 |
| *nrdC.11 | COG3541: nucleotidyl transferase | 4.0E-07 to 0.013 | 2/6 full alignment; 4/6 partial alignment |
| *tk.4 | smart00506: A1pp phosphatase | 2.0E-20 to 0.04 | 4/6 full alignment; 1/6 partial alignment |
Matches are HMMer matches to the Pfam database. * indicates BLAST matches to the CDD database. Genomes hit shows (number of orthologs matching Pfam domain)/(total number of orthologs identified for the five genomes studied plus T4). For CDD matches, alignment to the full domain or partial length alignment is noted. Additional conserved ORFs for which no function was identified are: uvsW.1, pseT.2, pseT.3, a-gt.4, and 61.1.
Pfam hits for novel ORFs
| ORF | Pfam domain name | E value |
| 44RRORF008c | Serine hydroxymethyltransferase | 9.80E-180 |
| 44RRORF084c | TM2 domain | 3.80E-14 |
| 44RRORF093c | Glutathionylspermidine synthase | 8.30E-109 |
| 44RRORF097c | Prokaryotic N-terminal methylation motif | 3.70E-09 |
| 44RRORF098c | SPFH domain/Band 7 family | 1.10E-06 |
| 44RRORF109c | Glutaredoxin-like domain (DUF836) | 0.016 |
| 44RRORF111c | Ribonucleotide reductase, small chain | 4.00E-06 |
| 44RRORF130c | Prokaryotic dksA/traR C4-type zinc finger | 4.30E-05 |
| 44RRORF168c | HD domain | 0.34 |
| 44RRORF232c | Domain of unknown function (DUF1732) | 0.35 |
| 44RRORF234c | Sodium:solute symporter family | 2.60E-34 |
| 44RRORF238c | Putative metallopeptidase (SprT family) | 0.33 |
| Aeh1ORF004c | CYTH domain | 0.14 |
| Aeh1ORF010c | dUTPase | 5.10E-25 |
| Aeh1ORF025c | Carbohydrate binding domain | 0.4 |
| Aeh1ORF026c | Carbohydrate binding domain | 0.12 |
| Aeh1ORF040c | Prokaryotic N-terminal methylation motif | 6.60E-09 |
| Aeh1ORF062c | Putative metallopeptidase (SprT family) | 0.00035 |
| Aeh1ORF064c | SPFH domain/Band 7 family | 2.40E-05 |
| Aeh1ORF068c | Bacterial transferase hexapeptide (3 repeats) | 0.32 |
| Aeh1ORF110c | HD domain | 0.0078 |
| Aeh1ORF111c | UV-endonuclease UvdE | 3.60E-20 |
| Aeh1ORF131c | Poly(ADP-ribose) polymerase catalytic domain | 0.026 |
| Aeh1ORF132c | ADP-ribosylglycohydrolase | 1.10E-05 |
| Aeh1ORF154c | von Willebrand factor type A domain | 0.22 |
| Aeh1ORF157c | CreA protein | 4.40E-09 |
| Aeh1ORF227c | RyR domain | 0.0054 |
| Aeh1ORF230c | Bacterial regulatory proteins, lacI family | 0.14 |
| Aeh1ORF245c | GatB/Yqey domain | 0.17 |
| Aeh1ORF289c | Poly A polymerase family | 9.00E-31 |
| Aeh1ORF318w | Phage T4 tail fibre | 8.10E-06 |
| RB43ORF020c | LysM domain | 1.70E-07 |
| RB43ORF057w | DnaJ domain | 2.70E-05 |
| RB43ORF119c | von Willebrand factor type A domain | 0.02 |
| RB43ORF127c | C-5 cytosine-specific DNA methylase | 1.20E-117 |
| RB43ORF139c | SPFH domain/Band 7 family | 3.80E-05 |
| RB43ORF157c | PhoH-like protein; PIN domain | 4.20E-15; 0.0032 |
| RB43ORF179c | DnaJ central domain (4 repeats) | 0.28 |
| RB43ORF191c | DnaJ central domain (4 repeats) | 0.22 |
| RB43ORF205w | Protein of unknown function (DUF1054) | 0.43 |
| RB43ORF241c | Zeta toxin | 0.36 |
| RB43ORF282w | Phage tail fibre adhesin Gp38 | 0.0035 |
| RB49ORF044c | DEAD/DEAH box helicase | 0.069 |
| RB49ORF046c | Prokaryotic N-terminal methylation motif | 0.43 |
| RB49ORF102c | D-alanyl-D-alanine carboxypeptidase | 0.0014 |
| RB49ORF143w | Methyltransferase small domain; Ribosomal RNA adenine dimethylase | 0.0011; 0.33 |
| RB49ORF188c | TFIIB zinc-binding | 0.22 |
| RB49ORF239c | Protein of unknown function (DUF723) | 0.098 |
| RB49ORF244c | CYTH domain | 0.0026 |
| RB49ORF260c | Protein of unknown function (DUF1311) | 0.2 |
| RB69ORF048c | Thymidylate synthase | 0.022 |
| RB69ORF050c | Peptidase family U32 | 0.00055 |
| RB69ORF053c | Nucleotidyl transferase | 0.0022 |
| RB69ORF055c | SIS domain | 0.0043 |
| RB69ORF104c | Oleosin | 0.42 |
| RB43ORF027c | AP2 domain | 0.00071 |
| RB43ORF066w | LAGLIDADG endonuclease | 0.15 |
| RB49ORF040c | AP2 domain; HNH endonuclease | 2.20E-07; 0.0042 |
| RB49ORF212c | HNH endonuclease | 9.20E-07 |
| 44RRORF072c | Nicotinate phosphoribosyltransferase | 9.80E-63 |
| 44RRORF083c | NUDIX domain | 7.10E-15 |
| Aeh1ORF119c | Nicotinate phosphoribosyltransferase | 1.30E-46 |
| Aeh1ORF282c | NUDIX domain; Cytidylyltransferase | 8.30E-12; 5.80E-05 |
| Aeh1ORF330c | NUDIX domain | 3.00E-08 |
| RB43ORF138c | NUDIX domain; Cytidylyltransferase | 1.90E-13; 5.30E-05 |
| RB43ORF255w | Nicotinate phosphoribosyltransferase | 4.50E-44 |
Predicted ORF protein sequences were used to search Pfam using HMMer. Matches with E < 0.5 are shown. Multiple matches are shown for ORFs having non-overlapping matches to more than one domain.
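The E < 0.5 reporting cutoff is permissive, so a reader may want to re-filter the table at a stricter threshold. A minimal sketch, assuming hits are held as (ORF, domain, E value) tuples; the rows are copied from the table, while the stricter cutoff of 1e-5 is an arbitrary illustration and not a value used in the paper.

```python
# A few rows copied from the table: (ORF, Pfam domain, E value)
HITS = [
    ("44RRORF008c", "Serine hydroxymethyltransferase", 9.80e-180),
    ("44RRORF168c", "HD domain", 0.34),
    ("Aeh1ORF010c", "dUTPase", 5.10e-25),
    ("RB43ORF127c", "C-5 cytosine-specific DNA methylase", 1.20e-117),
    ("RB69ORF104c", "Oleosin", 0.42),
]


def filter_hits(hits, max_evalue):
    """Keep hits whose E value is strictly below max_evalue."""
    return [hit for hit in hits if hit[2] < max_evalue]


reported = filter_hits(HITS, 0.5)    # the cutoff stated in the table note
confident = filter_hits(HITS, 1e-5)  # hypothetical stricter cutoff

print(len(reported), "hits pass E < 0.5")
print([orf for orf, _, _ in confident])
```

Hits with E values near the reporting cutoff, such as the HD domain and Oleosin rows, drop out under the stricter threshold.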
Figure 2. Sequence logo representation of putative early promoter consensus for each genome. Sequences were identified using fuzznuc [24] and HMMer [53]. Consensus sequences were plotted with WebLogo [54]. Height of letter indicates degree of conservation. Nucleotide 0 is the putative transcription start site. Putative up elements and the -10 region are boxed.
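In a sequence logo, the degree of conservation at each position is conventionally measured as information content in bits. A minimal sketch of that calculation for DNA, assuming the simple form 2 - H (with H the Shannon entropy of the observed base frequencies) and omitting the small-sample correction that logo tools normally apply; the aligned sites below are hypothetical and are not promoter data from the paper.

```python
import math
from collections import Counter


def information_content(column):
    """Information content (bits) of one alignment column of DNA letters.

    Computed as 2 - H, where H is the Shannon entropy of the observed base
    frequencies. The small-sample correction is omitted for brevity.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


# Hypothetical aligned sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]

for position, column in enumerate(zip(*sites)):
    print(position, round(information_content(column), 3))
```

A fully conserved position scores 2 bits and a position with all four bases equally frequent scores 0, which is why invariant promoter positions appear as the tallest stacks.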
Figure 3. Location of early promoter sequences on the RB69 genome. The top panel shows an overview. Conserved genes are shown as yellow arrows, novel ORFs as red line arrows, predicted early promoters as large black arrows, and TransTerm [38] predicted terminators as red blocks. The bottom panel shows detail of one region. Predicted transcripts are shown at the bottom; blue arrows indicate transcripts expected from conserved gene promoters and red arrows designate those expected from novel ORF promoters. Orthologs of genes known to be expressed early in T4 infections are boxed. Red boxes indicate genes present only on predicted ORF promoter transcripts; blue-boxed genes are present on conserved and ORF promoter transcripts. Black boxes are early genes whose transcripts could not be predicted.
Figure 4. (A) Sequence logo representation of putative middle promoter consensus for RB69 and 44RR. Consensus was identified and plotted as in Figure 2. (B) Putative late promoter consensus for each genome. Consensus was identified as for early promoters, using fuzznuc and HMMer, except Aeh1, for which ELPH [37] and HMMer were used initially.
Predicted tRNAs
| Ala UGC | + | + | |||
| Arg UCU | + | + | |||
| Asn GUU | + | + | |||
| Asp GUC | + | + | |||
| Cys GCA | + | ||||
| Gln UUG | + | ||||
| Glu UUC | + | ||||
| Gly UCC | + | + | + | ||
| His GUG | + | + | |||
| Ile CAU* | + | + | + | + | + |
| Ile GAU | + | + | |||
| Leu CAA | + | ||||
| Leu UAA | + | + | + | ||
| Leu UAG | + | ||||
| Lys UUU | + | + | |||
| Met CAU | + | + | |||
| Met CAU | + | ||||
| Phe GAA | + | + | |||
| Pro UGG | + | + | + | ||
| Ser GCU | + | ||||
| Ser UGA | + | + | + | ||
| Thr UGU | + | + | + | ||
| Trp CCA | + | + | |||
| Tyr GUA | + | ||||
| Val CAC | + | ||||
| Pseudo | 3 | 1 |
The presence of a tRNAscan-SE predicted species is indicated for each genome. The number of predicted tRNA pseudogenes is also indicated. * indicates putative lysidine-modified tRNA-Ile [41-44].
Figure 5. tRNA alignment. Putative lysidine-modified phage tRNA-Ile sequences were aligned by secondary structure using clustalW. E. coli modified tRNA-Ile and phage Met-CAU and Ile-GAU sequences are shown for comparison.