Mike Dyall-Smith, Felicitas Pfeifer, Angela Witte, Dieter Oesterhelt, Friedhelm Pfeiffer.
Abstract
The halophilic myohalovirus Halobacterium virus phiH (ΦH) was first described in 1982 and was isolated from a spontaneously lysed culture of Halobacterium salinarum strain R1. Until 1994, it was used extensively as a model to study the molecular genetics of haloarchaea, but only parts of the viral genome were sequenced during this period. Using Sanger sequencing combined with high-coverage Illumina sequencing, the full genome sequence of the major variant (phiH1) of this halovirus has been determined. The dsDNA genome is 58,072 bp in length and carries 97 protein-coding genes. We have integrated this information with the previously described transcription mapping data. PhiH could be classified into Myoviridae Type 1, Cluster 4 based on capsid assembly and structural proteins (VIRFAM). The closest relative was Natrialba virus phiCh1 (φCh1), which shared 63% nucleotide identity and displayed a high level of gene synteny. This close relationship was supported by phylogenetic tree reconstructions. The complete sequence of this historically important virus will allow its inclusion in studies of comparative genomics and virus diversity.
Keywords: Archaea; Halobacterium salinarum; genome inversion; haloarchaea; halobacteria; halophage; halovirus; virus
Year: 2018 PMID: 30322017 PMCID: PMC6210493 DOI: 10.3390/genes9100493
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Diagram of the phiH1 genome with lines below showing regions previously sequenced (red) along with their database accessions. The blue lines (NEW) indicate regions sequenced in the present study by Sanger sequencing. Tick marks (dark green) below the blue lines show the positions of oligonucleotide primers used for PCR and primer-walking. Dots at the right and left contig ends indicate sequence continuity between them. Scale bar at top shows position in bp.
Annotated coding sequences (CDS) of halovirus phiH1.
| Start (nt) | Stop (nt) | Locus_tag | Length (bp) | Direction | Gene | Product | PhiCh1/pNMAG03 Homolog 1 |
|---|---|---|---|---|---|---|---|
| 115 | 717 | PhiH1_005 | 603 | + | - | uncharacterized protein | PhiCh1p02, ORF1 |
| 710 | 2371 | PhiH1_010 | 1662 | + | | terminase large subunit TerL | PhiCh1p03, ORF2 |
| 2377 | 2505 | PhiH1_015 | 129 | + | - | uncharacterized protein | Nmag_4253 |
| 2498 | 2689 | PhiH1_020 | 192 | + | - | uncharacterized protein | PhiCh1p05, ORF4 |
| 2686 | 4242 | PhiH1_025 | 1557 | + | | portal protein Por | PhiCh1p07, ORF6 |
| 4246 | 5187 | PhiH1_030 | 942 | + | - | head morphogenesis protein | PhiCh1p08, ORF7 |
| 5261 | 5587 | PhiH1_035 | 327 | + | | capsid protein HP20 | [AJF28118.1] |
| 5667 | 7466 | PhiH1_040 | 1800 | + | - | prohead protease | 4 PhiCh1p09, ORF8 |
| 7506 | 8468 | PhiH1_045 | 963 | + | | major capsid protein HP32 | PhiCh1p12, ORF11 |
| 8481 | 8933 | PhiH1_050 | 453 | + | - | uncharacterized protein | PhiCh1p13, ORF12 |
| 8940 | 9542 | PhiH1_055 | 603 | + | | head-tail adaptor protein Ada | PhiCh1p14, ORF13 |
| 9539 | 9919 | PhiH1_060 | 381 | + | | head closure protein type 1 Hco | PhiCh1p15, ORF14 |
| 9921 | 10,202 | PhiH1_065 | 282 | + | - | uncharacterized protein | PhiCh1p16, ORF15 |
| 10,202 | 10,636 | PhiH1_070 | 435 | + | | probable neck protein type 1 Nep | PhiCh1p17, ORF16 |
| 10,643 | 11,239 | PhiH1_075 | 597 | + | | tail completion protein type 1 Tco | PhiCh1p18, ORF17 |
| 11,259 | 12,557 | PhiH1_080 | 1299 | + | | tail sheath protein HP67 | PhiCh1p19, ORF18 |
| 12,607 | 13,002 | PhiH1_085 | 396 | + | - | probable structural protein | PhiCh1p20, ORF19 |
| 13,006 | 13,407 | PhiH1_090 | 402 | + | - | uncharacterized protein | PhiCh1p21, ORF20 |
| 13,572 | 13,745 | PhiH1_095 | 174 | − | - | DUF4177 domain protein | [SEH60446.1] |
| 13,792 | 16,581 | PhiH1_100 | 2790 | + | | tape-measure tail protein Tpm | 4 PhiCh1p23, ORF22 |
| 16,583 | 17,104 | PhiH1_105 | 522 | + | - | uncharacterized protein | PhiCh1p25, ORF24 |
| 17,108 | 17,446 | PhiH1_110 | 339 | + | - | uncharacterized protein | PhiCh1p26, ORF25 |
| 17,450 | 18,298 | PhiH1_115 | 849 | + | - | uncharacterized protein | PhiCh1p27, ORF26 |
| 18,306 | 18,446 | PhiH1_120 | 141 | + | - | CxxC motif protein | [SEH61109.1] |
| 18,443 | 18,988 | PhiH1_125 | 546 | + | - | uncharacterized protein | PhiCh1p29, ORF28 |
| 18,988 | 19,146 | PhiH1_130 | 159 | + | - | uncharacterized protein | - |
| 19,143 | 19,508 | PhiH1_135 | 366 | + | - | virus-related protein | [AGM10900.1] |
| 19,505 | 19,867 | PhiH1_140 | 363 | + | - | uncharacterized protein | PhiCh1p30, ORF29 |
| 19,874 | 21,148 | PhiH1_145 | 1275 | + | | baseplate J family protein Bpj | PhiCh1p31, ORF30 |
| 21,135 | 22,277 | PhiH1_150 | 1143 | + | - | uncharacterized protein | PhiCh1p32, ORF31 |
| 22,295 | 22,678 | PhiH1_155 | 384 | + | - | virus-related protein | [AFH21897.1] |
| 22,683 | 23,249 | PhiH1_160 | 567 | + | - | virus-related protein | [AFH21653.1] |
| 23,252 | 25,504 | PhiH1_165 | 2253 | + | - | repeat-containing tail fibre protein | PhiCh1p37, ORF36 |
| 25,506 | 25,787 | PhiH1_170 | 282 | + | - | uncharacterized protein | Nmag_4285 |
| 25,825 | 26,499 | PhiH1_175 | 675 | + | | tyrosine integrase/recombinase Int1 | PhiCh1p36, ORF35 |
| 26,490 | 26,792 | PhiH1_180 | 303 | − | - | uncharacterized protein | Nmag_4283 |
| 26,798 | 27,766 | PhiH1_185 | 969 | − | - | repeat-containing tail fibre protein 2 | PhiCh1p37, ORF36 |
| 27,803 | 28,150 | PhiH1_190 | 348 | + | - | YncB-like endonuclease | [AGM11801.1] |
| 28,153 | 28,386 | PhiH1_195 | 234 | + | - | virus-related protein | [AGC34510.1] |
| 28,379 | 28,675 | PhiH1_200 | 297 | + | - | uncharacterized protein | [EMA49173.1] |
| 28,682 | 28,783 | PhiH1_205 | 102 | + | - | uncharacterized protein | - |
| 28,788 | 29,357 | PhiH1_210 | 570 | + | - | transmembrane domain protein | - |
| 29,394 | 29,642 | PhiH1_215 | 249 | − | - | uncharacterized protein | - |
| 29,651 | 29,941 | PhiH1_220 | 291 | − | - | uncharacterized protein | PhiCh1p40, ORF39 |
| 30,104 | 30,244 | PhiH1_225 | 144 | + | - | uncharacterized protein | - |
| 30,250 | 30,414 | PhiH1_230 | 165 | + | - | uncharacterized protein | PhiCh1p44, ORF43 |
| 30,411 | 30,806 | PhiH1_235 | 396 | + | - | VapC family toxin | PhiCh1p45, ORF44 |
| 30,803 | 31,465 | PhiH1_240 | 663 | − | | tyrosine integrase/recombinase Int2 | PhiCh1p46, ORF45 |
| 31,680 | 31,934 | PhiH1_245 | 255 | + | - | uncharacterized protein | - |
| 31,939 | 32,271 | PhiH1_250 | 333 | + | - | uncharacterized protein | Nmag_4297 |
| 32,420 | 32,857 | PhiH1_255 | 438 | − | - | HNH-type endonuclease | PhiCh1p48, ORF47 |
| 32,854 | 33,255 | PhiH1_260 | 402 | − | - | uncharacterized protein | [ELY96531.1] |
| 33,248 | 34,024 | PhiH1_265 | 777 | − | - | parA domain protein | PhiCh1p47, ORF46 |
| 34,161 | 34,430 | PhiH1_270 | 270 | − | | repressor protein RepR | 5 PhiCh1p49, ORF48 |
| 34,730 | 35,071 | PhiH1_275 | 342 | + | - | uncharacterized protein | - |
| 35,068 | 35,424 | PhiH1_280 | 357 | + | - | uncharacterized protein | PhiCh1p50, ORF49 |
| 35,381 | 38,167 | PhiH1_285 | 2787 | + | | plasmid replication protein RepH | 4 PhiCh1p54, ORF53 |
| 38,262 | 38,489 | PhiH1_290 | 228 | − | | probable immunity protein Imm | PhiCh1p56, ORF55 |
| 38,733 | 39,263 | PhiH1_295 | 531 | + | - | transcriptional regulator, PadR-like family | PhiCh1p57, ORF56 |
| 39,260 | 39,385 | PhiH1_300 | 126 | + | - | CxxC motif protein | - |
| 39,382 | 39,978 | PhiH1_305 | 597 | + | - | uncharacterized protein | PhiCh1p59, ORF58 |
| 39,975 | 40,133 | PhiH1_310 | 159 | + | - | uncharacterized protein | - |
| 40,153 | 40,902 | PhiH1_315 | 750 | + | | DNA polymerase sliding clamp PcnA | PhiCh1p60, ORF59 |
| 40,908 | 41,339 | PhiH1_320 | 432 | + | - | uncharacterized protein | PhiCh1p61, ORF60 |
| 41,339 | 41,554 | PhiH1_325 | 216 | + | - | uncharacterized protein | PhiCh1p62, ORF61 |
| 41,547 | 42,041 | PhiH1_330 | 495 | + | - | uncharacterized protein | - |
| 42,098 | 42,490 | PhiH1_335 | 393 | + | | IS200-type transposase TnpA | [CAP12925.1] |
| 42,492 | 43,748 | PhiH1_340 | 1257 | + | | IS1341-type transposase TnpB | [CAP12926.1] |
| 43,808 | 44,014 | PhiH1_345 | 207 | + | - | uncharacterized protein | - |
| 44,007 | 44,234 | PhiH1_350 | 228 | + | - | uncharacterized protein | PhiCh1p66, ORF65 |
| 44,231 | 44,656 | PhiH1_355 | 426 | + | - | CxxC motif protein | PhiCh1p68, ORF67 |
| 44,646 | 45,026 | PhiH1_360 | 381 | + | - | uncharacterized protein | PhiCh1p69, ORF68 |
| 45,023 | 45,646 | PhiH1_365 | 624 | + | - | HNH-type endonuclease | [KYG11427.1] |
| 45,639 | 45,926 | PhiH1_370 | 288 | + | - | uncharacterized protein | PhiCh1p71, ORF70 |
| 45,919 | 46,350 | PhiH1_375 | 432 | + | - | DUF4326 domain protein | PhiCh1p72, ORF71 |
| 46,343 | 46,441 | PhiH1_380 | 99 | + | - | uncharacterized protein | - |
| 46,438 | 46,884 | PhiH1_385 | 447 | + | - | CxxC motif protein | PhiCh1p74, ORF73 |
| 46,865 | 47,038 | PhiH1_390 | 174 | + | - | uncharacterized protein | 5 PhiCh1p73, ORF72 |
| 47,031 | 47,447 | PhiH1_395 | 417 | + | - | uncharacterized protein | - |
| 47,440 | 47,739 | PhiH1_400 | 300 | + | - | NTPase protein | [PLX87675.1] |
| 47,732 | 49,618 | PhiH1_405 | 1887 | + | | C-5 cytosine-specific DNA methylase Dcm5 | 5 PhiCh1p81, ORF80 |
| 49,611 | 49,931 | PhiH1_410 | 321 | + | - | uncharacterized protein | PhiCh1p82, ORF81 |
| 49,918 | 50,037 | PhiH1_415 | 120 | + | - | CxxC motif protein | - |
| 50,091 | 51,452 | PhiH1_420 | 1362 | + | | DNA methylase N-4/N-6 domain protein YhdJ | PhiCh1p83, ORF82 |
| 51,449 | 52,024 | PhiH1_425 | 576 | + | - | uncharacterized protein | PhiCh1p84, ORF83 |
| 52,021 | 52,791 | PhiH1_430 | 771 | + | - | uncharacterized protein | PhiCh1p85, ORF84 |
| 52,784 | 53,152 | PhiH1_435 | 369 | + | - | uncharacterized protein | PhiCh1p88, ORF87 |
| 53,145 | 53,504 | PhiH1_440 | 360 | + | - | uncharacterized protein | PhiCh1p89, ORF88 |
| 53,788 | 54,369 | PhiH1_445 | 582 | + | - | CxxC motif protein | PhiCh1p90, ORF89 |
| 54,403 | 54,771 | PhiH1_450 | 369 | + | - | uncharacterized protein | PhiCh1p91, ORF90 |
| 54,794 | 55,147 | PhiH1_455 | 354 | + | - | uncharacterized protein | - |
| 55,144 | 55,401 | PhiH1_460 | 258 | + | - | transmembrane domain protein | PhiCh1p93, ORF92 |
| 55,394 | 55,729 | PhiH1_465 | 336 | + | - | transmembrane domain protein 3 | PhiCh1p94, ORF93 |
| 55,794 | 57,053 | PhiH1_470 | 1260 | + | | DNA methylase N-4/N-6 domain protein YcdA | PhiCh1p95, ORF94 |
| 57,046 | 57,564 | PhiH1_475 | 519 | + | - | uncharacterized protein | PhiCh1p96, ORF95 |
| 57,621 | 57,830 | PhiH1_480 | 210 | + | - | CxxC motif protein | PhiCh1p98, ORF97 |
| 57,827> | <63 | PhiH1_485 | 309 | + | | terminase small subunit TerS | PhiCh1p01, ORF98 |
1 PhiCh1/pNMAG03 homologs of phiH1 proteins show BLASTp E-values < 10⁻²⁰. For phiCh1 proteins, both the PhiCh1p and originally assigned ORF codes (ORF for open reading frame) are shown (e.g., PhiCh1p02, ORF1). Codes starting with ORF represent the original annotation of the phiCh1 genome [17] (GB accession AF440695.1), and codes starting with PhiCh1p represent the RefSeq version of the annotation of the same genome sequence (GB accession NC_004084). The number shift is due to the terS gene, whose N-terminal part is encoded at the end of the genome and whose C-terminal part is encoded at its beginning. This ORF is complete in the provirus state due to circularization and in the linear virus state due to terminal redundancy. This gene is ORF98 in the original annotation and PhiCh1p01 in the RefSeq annotation. Codes starting with Nmag_ represent the annotation of the Natrialba magadii plasmid pNMAG03 [20] (accession CP001935.1). The point of ring opening in pNMAG03 was set between Nmag_4303 and Nmag_4211. Codes in square brackets represent NCBI accessions of homologous proteins from other sources (BLASTp E-values ≤ 10⁻¹¹).
2 Gene PhiH1_185 is encoded on an invertible segment. In the current sequence version, it is inactivated because it is uncoupled from a start codon. By genome inversion, it becomes activated while its partner gene PhiH1_165 becomes inactivated. Overall, this results in tail fibre protein switching.
3 This protein (PhiH1_465) has three predicted transmembrane domains and has been suspected to function as a holin [18].
4 In these cases, the phiCh1 gene is split into two CDS but is continuous in phiH1.
5 These proteins are more distantly related (they show less than 39% sequence identity or have BLASTp E-values above 10⁻²⁰). In these cases, a similar genetic context supports their stated relationship.
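The coordinates in the table can be checked against the listed lengths, including the terS gene (PhiH1_485), which runs from the end of the linear genome map into its beginning. The following is a minimal sketch, assuming 1-based inclusive coordinates and the 58,072 bp genome length given in the abstract; the function name `cds_length` is illustrative and does not come from the study.

```python
GENOME_LENGTH = 58_072  # phiH1 genome length in bp, as stated in the abstract


def cds_length(start: int, stop: int, genome_length: int = GENOME_LENGTH) -> int:
    """Return the length in bp of a CDS with 1-based inclusive coordinates.

    If stop < start, the CDS is assumed to wrap around the end of the
    linear genome map (the case described for terS in footnote 1).
    """
    if stop >= start:
        return stop - start + 1
    # Part up to the genome end plus part from position 1 to stop.
    return (genome_length - start + 1) + stop


print(cds_length(115, 717))    # PhiH1_005 -> 603
print(cds_length(57_827, 63))  # PhiH1_485 (terS), wrapping -> 309
```

Both values agree with the Length (bp) column, which is consistent with terS being complete only when the two genome ends are joined.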
Figure 2. PhiH1 GC-profile, genetic map, and corresponding transcription program (adapted from [16,42]). (a) GC-profile of the phiH1 genome. (b) Genetic map of the phiH1 genome, showing coding sequences as red, blue or grey arrows. Dotted lines above indicate gene clusters involved in particular functions. Some CDS are labelled above the map, e.g., TerL, terminase large subunit; Portal, portal protein; Tape, tape-measure protein; RepH, replicase (label within CDS arrow); Mt, DNA methylases; TerS, terminase small subunit. Some genes are shown below the map, such as hp32, encoding the major capsid protein HP32. Panels c, d and e summarise transcription data from previously published studies, and above them is a colour key that indicates the time of appearance of early (0–1 h, blue), middle (1–2 h, green) and late (>2 h, pink) transcripts. (c) Precise mapping of viral transcripts, including start and termination sites [16,39]. (d) Summary transcription program of lytic infection based on hybridisation of labelled infected-cell transcripts to restriction fragments of virus DNA [42]. Thin coloured lines indicate whether transcription persists over time. (e) The transcription map data of [42] are shown, projected onto the in silico restriction map of phiH1, as determined from the complete genome sequence (this study). Enzymes are indicated at the left. Numbers on the restriction map refer to those of the original publication of [42] (see also Figure S1). Coloured shading follows that of panels c and d. Dotted pattern shown beyond the right-hand pac site indicates terminal redundancy of virus DNA. Scale bars (in kb) are shown below panels a and e.
CRISPR spacers matching phiH1.
| No. | CRISPR Spacer Matches to phiH1 1 | Translation 2 |
|---|---|---|
1 The matching spacer sequences were found in the following NCBI bioprojects using the crass program: PRJNA337743 (SRA SRR4030040; Alviso Ponds, San Francisco, CA, USA; metagenome); PRJNA245787 (Halostagnicola sp. A56 26 genome; Andaman Islands, India); PRJEB18068 (Lake Meyghan, Iran; metagenome). Aligned sequences show nt positions for phiH1, and asterisks indicate identical bases. DR: direct repeat (with the haloarchaea containing the most closely matching DR shown in brackets).
2 Symbols under the alignment (*:.) indicate identical, similar and weakly similar residues, respectively (based on the Gonnet PAM 250 matrix).
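The asterisk convention described in footnote 1 (an asterisk under each identical base) can be illustrated with a short sketch. The function name `match_line` and the two example sequences are hypothetical and are not spacer or phiH1 sequences from the study; the sketch assumes the two sequences are already aligned and of equal length.

```python
def match_line(seq_a: str, seq_b: str) -> str:
    """Return a marker line with '*' under identical bases, ' ' elsewhere.

    Both inputs must be pre-aligned strings of equal length; gap
    characters ('-') are never counted as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )


# Hypothetical aligned sequences, for illustration only.
top = "ACGTTGCA"
bottom = "ACGATGCA"
print(top)
print(bottom)
print(match_line(top, bottom))  # -> "*** ****"
```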
Figure 3. Circos plot of amino acid similarity (tBLASTx) between phiH1 and the haloviruses phiCh1, BJ1, CGphi46 and HSTV-1. The threshold for connecting lines was E-value ≤ 10⁻⁴⁰, with line colours reflecting the ratio of actual tBLASTx bitscore to the maximal score (using ‘score/max’ ratio colouring with blue ≤ 0.25, green ≤ 0.50, orange ≤ 0.75, red > 0.75). The outer histogram counts how many times each colour has hit the specific part of the sequence and uses an equivalent colouring scheme. The distance between successive tick marks shown along each virus genome represents 0.1 of the full genome length. Protein names shown along the phiH1 genome indicate the positions of the corresponding genes.
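The ‘score/max’ colouring rule in the Figure 3 caption amounts to binning a ratio into four classes. The sketch below uses the thresholds stated in the caption; the function name `ratio_colour` is an assumption, and how the plotting software derives the maximal score is not described here, so it is simply passed in as a parameter.

```python
def ratio_colour(bitscore: float, max_score: float) -> str:
    """Map a bitscore/max-score ratio to the colour classes of Figure 3.

    Thresholds follow the caption: blue <= 0.25, green <= 0.50,
    orange <= 0.75, red > 0.75.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    ratio = bitscore / max_score
    if ratio <= 0.25:
        return "blue"
    if ratio <= 0.50:
        return "green"
    if ratio <= 0.75:
        return "orange"
    return "red"


# Illustrative values only, not scores from the study.
for score in (20, 45, 70, 95):
    print(score, ratio_colour(score, 100))
```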
Figure 4. Phylogenetic tree reconstruction (NJ method) of major capsid proteins (MCP) of phiH1, other haloviruses and related proteins of haloarchaea. Species names of haloarchaeal species are shown, with accession numbers given at the right side. Bootstrap confidence values (100 repetitions) are shown at branch points. The pink shading highlights taxa belonging to the class Halobacteria. Scale bar (expected changes per site) is shown at top. The outgroup (not shown) consisted of distantly related MCP sequences of Bacillus spp. (WP_001060157.1, WP_098773561.1, WP_001064748.1 and WP_000178926.1).