Jozef I Nissimov, António Pagarete, Fangrui Ma, Sean Cody, David D Dunigan, Susan A Kimmance, Michael J Allen.
Abstract
Coccolithoviruses (Phycodnaviridae) infect and lyse Emiliania huxleyi, the most ubiquitous and successful coccolithophorid in modern oceans. So far, the genomes of 13 of these giant lytic viruses (Emiliania huxleyi viruses, EhVs) have been sequenced, assembled, and annotated. Here, we performed an in-depth comparison of their genomes to contextualize the ecological and evolutionary traits of these viruses. The EhV genomes contain from 444 to 548 coding sequences (CDSs). Presence/absence analysis of CDSs identified putative genes of particular ecological significance, namely those encoding a sialidase, a phosphate permease, and enzymes of sphingolipid biosynthesis. The viruses clustered into distinct clades based on both their DNA polymerase gene and full-genome comparisons. We discuss the use of such clustering and suggest that a gene-by-gene approach may be more useful when the goal is to reveal differences in functionally important genes. A multi-domain "Best BLAST hit" analysis revealed that 84% of the EhV genes are most similar to genes from the domain Eukarya. However, 16% of the EhV CDSs were very similar to bacterial genes, supporting the idea that a significant portion of the gene flow in the planktonic world crosses the boundaries between the domains of life.
Keywords: E. huxleyi; coccolithovirus; domains of life; genome comparison; horizontal gene transfer
Year: 2017 PMID: 28335474 PMCID: PMC5371807 DOI: 10.3390/v9030052
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
A brief description of the coccolithoviruses used in this study: their geographical origin, year of isolation and the seawater depth from which they were obtained.
| Strain # | Isolation location | Isolation year | Lat/Long | Depth (m) | Year sequenced | NCBI accession # | Reference * |
|---|---|---|---|---|---|---|---|
| EhV-84 | E.C. | 1999 | 50°15′ N, 04°13′ W | 15 | 2011 | JF974290 | Schroeder et al., 2002 |
| EhV-86 | E.C. | 1999 | 50°30′ N, 04°20′ W | surface | 2005 | AJ890364 | Schroeder et al., 2002 |
| EhV-88 | E.C. | 1999 | 50°15′ N, 04°13′ W | 5 | 2011 | JF974310 | Schroeder et al., 2002 |
| EhV-201 | E.C. | 2001 | 49°56′ N, 04°19′ W | 2 | 2011 | JF974311 | Schroeder et al., 2002 |
| EhV-202 | E.C. | 2001 | 50°00′ N, 04°18′ W | 15 | 2011 | HQ634145 | Schroeder et al., 2002 |
| EhV-203 | E.C. | 2001 | 50°00′ N, 04°18′ W | 15 | 2011 | JF974291 | Schroeder et al., 2002 |
| EhV-207 | E.C. | 2001 | 50°15′ N, 04°13′ W | 5 | 2011 | JF974317 | Schroeder et al., 2002 |
| EhV-208 | E.C. | 2001 | 50°15′ N, 04°13′ W | 5 | 2011 | JF974318 | Schroeder et al., 2002 |
| EhV-99B1 | N.F. | 1999 | 60°20′ N, 05°20′ E | surface | 2013 | FN429076 | Pagarete et al., 2013 |
| EhV-18 | E.C. | 2008 | 50°15′ N, 04°13′ W | surface | 2013 | KF481685 | Nissimov et al., 2014 |
| EhV-145 | Loss. | 2008 | 57.72° N, 3.29° W | surface | 2013 | KF481686 | Nissimov et al., 2014 |
| EhV-156 | E.C. | 2009 | 50°15′ N, 04°13′ W | surface | 2013 | KF481687 | Nissimov et al., 2014 |
| EhV-164 | SSF. | 2009 | 56.26° N, 2.63° W | surface | 2013 | KF481688 | Nissimov et al., 2014 |
* References cite the literature that first presented information on each isolate or its genome. E.C.: English Channel; N.F.: Norwegian Fjord; Loss.: Lossiemouth (off the UK coast); SSF.: Scottish Shore of Fife (UK).
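Most isolation coordinates above are given in degrees and minutes. The following is a minimal sketch for converting them to signed decimal degrees, with north and east treated as positive. It assumes the conventional rule that minutes lie between 0 and 59. The function name `dm_to_decimal` is hypothetical.

```python
def dm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert a degrees/minutes coordinate to signed decimal degrees.

    N and E are returned as positive values, S and W as negative values.
    """
    if not 0 <= minutes < 60:
        # A minutes value of 60 or more is not a valid degrees/minutes coordinate.
        raise ValueError(f"minutes must be in [0, 60), got {minutes}")
    hemisphere = hemisphere.upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere {hemisphere!r}")
    value = degrees + minutes / 60.0
    return -value if hemisphere in {"S", "W"} else value


# EhV-84 isolation site: 50°15′ N, 04°13′ W
lat = dm_to_decimal(50, 15, "N")
lon = dm_to_decimal(4, 13, "W")
print(f"{lat:.4f}, {lon:.4f}")  # 50.2500, -4.2167
```

The validation step is deliberate. A value such as "72′" cannot be read as minutes, so the function raises an error rather than returning a silently wrong position.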
Average Nucleotide Identity (ANI) analysis of EhV genomes against EhV-86. The analysis included 12 draft EhV genomes, where a higher ANI score indicates greater genome similarity.
| Reference Genome | Draft Genome | ANI Score | Total BBH * | Clade |
|---|---|---|---|---|
| EhV-86 | EhV-164 | 99.95 | 443 | A1 |
| EhV-86 | EhV-145 | 99.93 | 456 | A1 |
| EhV-86 | EhV-84 | 99.07 | 434 | A1 |
| EhV-86 | EhV-88 | 98.96 | 442 | A1 |
| EhV-86 | EhV-99B1 | 98.23 | 421 | A3 |
| EhV-86 | EhV-208 | 96.78 | 399 | A2 |
| EhV-86 | EhV-207 | 96.67 | 411 | A2 |
| EhV-86 | EhV-201 | 96.6 | 399 | A2 |
| EhV-86 | EhV-203 | 96.6 | 402 | A2 |
| EhV-86 | EhV-18 | 79.52 | 307 | B |
| EhV-86 | EhV-202 | 79.42 | 312 | B |
| EhV-86 | EhV-156 | 79.4 | 308 | B |
* BBH: bidirectional best hit.
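The table reports ANI together with the number of bidirectional best hits (BBH). The sketch below is a rough illustration only and does not reproduce the authors' pipeline. It treats a gene pair as a BBH when each gene is the other's top-scoring hit, and it takes an ANI-style score as the mean percent identity over aligned fragments. The hit dictionaries and fragment identities are hypothetical inputs that would normally come from BLAST searches. Fragment filtering (for example, by minimum identity and alignment coverage) is assumed to have happened upstream.

```python
from statistics import mean


def bidirectional_best_hits(best_a_to_b: dict[str, str],
                            best_b_to_a: dict[str, str]) -> set[tuple[str, str]]:
    """Return (gene_a, gene_b) pairs in which each gene is the other's best hit."""
    return {
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }


def average_identity(fragment_identities: list[float]) -> float:
    """Mean percent identity over fragments that already passed alignment filters."""
    if not fragment_identities:
        raise ValueError("no aligned fragments")
    return mean(fragment_identities)


# Hypothetical best hits between a reference genome and a draft genome.
ref_to_draft = {"ref_001": "drf_010", "ref_002": "drf_011", "ref_003": "drf_012"}
draft_to_ref = {"drf_010": "ref_001", "drf_011": "ref_005", "drf_012": "ref_003"}

pairs = bidirectional_best_hits(ref_to_draft, draft_to_ref)
print(len(pairs))  # 2
print(round(average_identity([99.1, 98.7, 99.4]), 2))  # 99.07
```

In this example, `ref_002` is not part of a BBH. Its best hit `drf_011` points back to a different reference gene.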
Figure 1. Phylogenetic analysis of coccolithoviruses based on their DNA polymerase and serine palmitoyltransferase (SPT) genes. The evolutionary history of 13 EhV strains was inferred from the 2604 bp SPT gene (I and II) and the 2921 bp DNA polymerase gene (III and IV), using the Neighbor-Joining (I and III) and Maximum Likelihood (II and IV) methods. EhV-18 and EhV-145 are absent from the SPT trees because the full-length SPT protein is split over two separate genes in their genomes. Based on the DNA polymerase phylogeny, the EhVs cluster into two main clades, A and B (green); clade A is further divided into sub-clusters A1 (red), A2 (yellow) and A3 (purple). The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. Evolutionary distances were computed using the Tamura-Nei method and are in units of base substitutions per site.
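The caption states that evolutionary distances were computed with the Tamura-Nei method. Below is a minimal sketch of the Tamura-Nei (1993) distance for two aligned, gap-free DNA sequences, using base frequencies pooled from both sequences. It omits the gamma correction and gap handling that phylogenetics packages typically offer. The function name is hypothetical.

```python
import math


def tamura_nei_distance(seq1: str, seq2: str) -> float:
    """Tamura-Nei (1993) distance between two aligned DNA sequences.

    Assumes equal-length, gap-free sequences over A/C/G/T, with all four bases
    present in the pooled sequences (otherwise a division by zero occurs).
    Very divergent sequences raise a math domain error from the logarithms.
    """
    seq1, seq2 = seq1.upper(), seq2.upper()
    if not seq1 or len(seq1) != len(seq2):
        raise ValueError("sequences must be non-empty and of equal length")
    pooled = seq1 + seq2
    if set(pooled) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported in this sketch")

    n = len(seq1)
    # Base frequencies pooled over both sequences.
    pi = {b: pooled.count(b) / len(pooled) for b in "ACGT"}
    pi_r = pi["A"] + pi["G"]  # purines
    pi_y = pi["C"] + pi["T"]  # pyrimidines

    # Proportions of A<->G transitions (p1), C<->T transitions (p2), transversions (q).
    p1 = p2 = q = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        pair = {x, y}
        if pair == {"A", "G"}:
            p1 += 1
        elif pair == {"C", "T"}:
            p2 += 1
        else:
            q += 1
    p1, p2, q = p1 / n, p2 / n, q / n

    ag = pi["A"] * pi["G"]
    ct = pi["C"] * pi["T"]
    term_purine = -(2 * ag / pi_r) * math.log(1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r))
    term_pyrimidine = -(2 * ct / pi_y) * math.log(1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y))
    term_transversion = -2 * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y) * math.log(
        1 - q / (2 * pi_r * pi_y)
    )
    return term_purine + term_pyrimidine + term_transversion


s1 = "AGCTAGCTAGCTAGCTAGCT"
s2 = "GACTAGCTAGCTAGCTAGCT"  # two A<->G transitions at the first two sites
print(round(tamura_nei_distance(s1, s2), 4))  # 0.1277
```

Unlike simpler models, Tamura-Nei weights the two transition types separately. This is why purine and pyrimidine transitions enter through different terms.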
Figure 2. Whole-genome alignment of sequenced coccolithovirus genomes. The genomes were aligned with MAUVE against the non-gapped backbone genome of EhV-86. Syntenic blocks are shown in the same colour, and the lines connecting them indicate the position of each block relative to the same block of genes in the EhV-86 genome. The small red lines on each genome mark the exact positions of the gaps separating the contigs within each draft genome. The genomes are ordered according to their DNA polymerase phylogeny (Figure 1), the ANI analysis of this study (Table 2), and previously published microarray data that place them into the aforementioned groups and sub-clades [20].
Predicted genomic characteristics of sequenced coccolithoviruses. The statistics for each genome were obtained from the annotated, ordered genomes uploaded to the Integrated Microbial Genomes Expert Review (IMG/ER) online genome analysis pipeline [66]. Note that, except for EhV-86, the numbers of genes, bases, coding sequences (CDSs), coding bases and transfer RNAs (tRNAs) are underestimates because the genome sequences are incomplete.
| Genome Name | Genes | Total Bases | CDS | Coding Bases | Genes with Function Prediction | tRNAs | GC (%) | Number of Gaps in Genome |
|---|---|---|---|---|---|---|---|---|
| | 486 | 396620 | 482 | 334463 | 85 | 4 | 40.17 | 8 |
| EhV-86 | 478 | 407339 | 472 | 369157 | 90 | 5 | 40.18 | 0 |
| | 480 | 397298 | 475 | 357803 | 90 | 5 | 40.18 | 7 |
| | 457 | 407301 | 451 | 363714 | 89 | 6 | 40.46 | 6 |
| | 488 | 407516 | 485 | 352215 | 93 | 3 | 40.3 | 11 |
| | 470 | 400520 | 464 | 364178 | 91 | 6 | 40.12 | 5 |
| | 479 | 421891 | 473 | 371313 | 93 | 6 | 40.49 | 15 |
| | 461 | 411003 | 455 | 348386 | 90 | 6 | 40.42 | 16 |
| | 451 | 376759 | 444 | 333400 | 90 | 6 | 40.04 | 16 |
| | 508 | 399651 | 503 | 346161 | 91 | 5 | 40.49 | 21 |
| | 552 | 397508 | 548 | 350414 | 103 | 4 | 39.94 | 41 |
| | 498 | 399344 | 493 | 351083 | 88 | 5 | 40.47 | 19 |
| | 514 | 400675 | 510 | 354290 | 95 | 4 | 40.11 | 17 |
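Two of the columns above can be checked quickly: coding density (coding bases divided by total bases) and GC content. Below is a minimal sketch with hypothetical helper names. The genome values (369,157 coding bases out of 407,339 total bases) come from the gap-free row of the table. The short sequence is a made-up example.

```python
def coding_density(coding_bases: int, total_bases: int) -> float:
    """Fraction of the assembled sequence annotated as coding."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return coding_bases / total_bases


def gc_percent(sequence: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases.

    Ambiguous bases such as N (e.g., gap filler in draft assemblies) are ignored.
    """
    seq = sequence.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Gap-free genome row from the table: 369,157 coding bases of 407,339.
print(f"{100 * coding_density(369157, 407339):.2f}%")  # 90.63%
print(f"{gc_percent('ATGCGCATTA'):.1f}")  # 40.0
```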
Genes predicted to encode tRNAs in the genomes of 13 coccolithoviruses. Presence (+) in each genome is indicated; the tRNAs common to all 13 genomes are those for Arg, Asn and Gln. The phylogenetic clade of each genome (based on its DNA polymerase gene; see Figure 1) is indicated with the column headers. Some tRNAs scored as absent may still be present in the unsequenced gaps between contigs of the draft genomes.
| tRNA | EhV-84 (A1) | EhV-86 (A1) | EhV-88 (A1) | EhV-164 (A1) | EhV-145 (A1) | EhV-201 (A2) | EhV-203 (A2) | EhV-207 (A2) | EhV-208 (A2) | EhV-99B1 (A3) | EhV-18 (B) | EhV-156 (B) | EhV-202 (B) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arg | + | + | + | + | + | + | + | + | + | + | + | + | + |
| Asn | + | + | + | + | + | + | + | + | + | + | + | + | + |
| Gln | + | + | + | + | + | + | + | + | + | + | + | + | + |
| Glu | + | + | + | + | ||||||||||
| Ile | + | + | + | + | + | + | ||||||||
| Leu | + | + | + | + | + | + | + | + | ||||||
| Lys | + | + | + | + | + | + | + | + | ||||||
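Finding the tRNAs shared by every genome (the core set) and those present in only some genomes is a set-intersection problem. Below is a minimal sketch. The genome labels and presence sets are hypothetical placeholders, not the values in the table.

```python
from functools import reduce


def core_and_accessory(presence: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split features into those found in every genome (core) and the rest.

    With draft genomes, a feature missing from one genome may sit in an
    unsequenced gap, so the core set computed here is a lower bound.
    """
    if not presence:
        return set(), set()
    all_features = set().union(*presence.values())
    core = set(reduce(set.intersection, presence.values()))
    return core, all_features - core


# Hypothetical tRNA presence calls for three genomes.
presence = {
    "genome_1": {"Arg", "Asn", "Gln", "Leu"},
    "genome_2": {"Arg", "Asn", "Gln", "Lys"},
    "genome_3": {"Arg", "Asn", "Gln", "Leu", "Ile"},
}
core, accessory = core_and_accessory(presence)
print(sorted(core))       # ['Arg', 'Asn', 'Gln']
print(sorted(accessory))  # ['Ile', 'Leu', 'Lys']
```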
Coccolithovirus CDSs that are strain-specific, i.e., not shared with any of the other viruses and therefore “unique” to each strain. The analysis used the built-in BLASTp tool in IMG/ER [61] with a maximum E-value of 1 × 10⁻⁵ and a minimum identity of 30%. The phylogenetic clade of each genome is indicated with the column headers (see Figure 1).
| Predicted CDS | EhV-84 (A1) | EhV-86 (A1) | EhV-88 (A1) | EhV-145 (A1) | EhV-164 (A1) | EhV-201 (A2) | EhV-203 (A2) | EhV-207 (A2) | EhV-208 (A2) | EhV-99B1 (A3) | EhV-202 (B) | EhV-18 (B) | EhV-156 (B) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hypothetical protein | 27 | 3 | 4 | 7 | 8 | 4 | 4 | 9 | 9 | 6 | 18 | 9 | 6 |
| putative endonuclease | 2 | ||||||||||||
| putative membrane protein | 2 | 1 | 1 | 4 | |||||||||
| putative transposase | 1 | ||||||||||||
| putative DUF814 domain containing protein | 1 | ||||||||||||
| zinc finger protein | 1 | ||||||||||||
| putative ribonuclease | 1 | 1 | |||||||||||
| glycosyltransferase family 29 (sialyltransferase) | 1 | 1 | |||||||||||
| Total | 27 | 5 | 4 | 8 | 9 | 4 | 5 | 9 | 9 | 15 | 18 | 11 | 8 |
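The strain-specific counts rest on a simple rule. A CDS is unique to its genome if it has no BLASTp hit in any other genome that passes the thresholds in the caption (E-value at most 1 × 10⁻⁵, identity at least 30%). Below is a minimal sketch of that filter over tabular hits. It treats both thresholds as inclusive, which is an assumption. The `Hit` record, genome labels and gene identifiers are hypothetical, and IMG/ER's own implementation may differ.

```python
from dataclasses import dataclass

MAX_EVALUE = 1e-5     # assumed inclusive upper bound
MIN_IDENTITY = 30.0   # assumed inclusive lower bound, in percent


@dataclass(frozen=True)
class Hit:
    query_genome: str
    query_gene: str
    subject_genome: str
    evalue: float
    pct_identity: float


def unique_genes(genes: dict[str, list[str]], hits: list[Hit]) -> dict[str, set[str]]:
    """Return, per genome, the genes with no qualifying hit in another genome."""
    shared = {
        (h.query_genome, h.query_gene)
        for h in hits
        if h.subject_genome != h.query_genome
        and h.evalue <= MAX_EVALUE
        and h.pct_identity >= MIN_IDENTITY
    }
    return {
        genome: {g for g in gene_ids if (genome, g) not in shared}
        for genome, gene_ids in genes.items()
    }


genes = {"genome_A": ["a1", "a2"], "genome_B": ["b1"]}
hits = [
    Hit("genome_A", "a1", "genome_B", 1e-40, 85.0),  # passes both thresholds
    Hit("genome_A", "a2", "genome_B", 1e-3, 45.0),   # E-value too high
    Hit("genome_B", "b1", "genome_A", 1e-40, 85.0),  # passes both thresholds
]
print(unique_genes(genes, hits))  # {'genome_A': {'a2'}, 'genome_B': set()}
```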
Figure 3. “Best BLAST hit” analysis of coccolithovirus CDSs in relation to the three domains of life: Eukarya, Bacteria and Archaea. Predicted genes within EhV genomes were analyzed with BLASTp against the three domains of life using BitScore thresholds of >50 (A) and >100 (B). Hits were further classified to the taxonomic level of “order” in Eukarya (C) and Bacteria (D) using a BitScore of >100.
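The domain assignment in Figure 3 can be summarized in three steps. First, keep each gene's single highest-scoring hit. Second, discard that hit if its BitScore does not exceed the threshold. Third, tally the domains of the remaining hits. Below is a minimal sketch under those assumptions. The hit tuples are hypothetical. Treating the threshold as strictly greater than follows the caption's ">50" and ">100" wording.

```python
from collections import Counter
from collections.abc import Iterable


def domain_fractions(hits: Iterable[tuple[str, str, float]],
                     min_bitscore: float) -> dict[str, float]:
    """hits: (gene_id, domain, bitscore) tuples.

    Keeps the top-scoring hit per gene, drops genes whose best hit does not
    exceed min_bitscore, and returns each domain's share of retained genes.
    """
    best: dict[str, tuple[str, float]] = {}
    for gene, domain, score in hits:
        if gene not in best or score > best[gene][1]:
            best[gene] = (domain, score)
    counts = Counter(domain for domain, score in best.values() if score > min_bitscore)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()} if total else {}


hits = [
    ("g1", "Eukarya", 320.0), ("g1", "Bacteria", 150.0),
    ("g2", "Bacteria", 210.0),
    ("g3", "Eukarya", 95.0),  # best hit does not pass the >100 threshold
    ("g4", "Eukarya", 130.0), ("g4", "Archaea", 60.0),
]
print(domain_fractions(hits, 100))  # Eukarya 2/3, Bacteria 1/3
```

Raising the threshold from 50 to 100 can change the percentages only by removing genes whose best hit is weak. It never reassigns a gene to a different domain.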
“Best BLAST hit” analysis of EhVs against the Bacterial and Eukaryotic domains of life, as well as against the known host Emiliania huxleyi. The top EhV genes with function predictions are shown based on the “Best BLAST hit” analysis using a BitScore of >100, ordered by decreasing BitScore. The COG (Clusters of Orthologous Groups: genes with similar functions across the domains of life) cluster of each gene was identified on IMG/ER; the letters denote COG functional categories: nucleotide transport and metabolism [F]; carbohydrate transport and metabolism [G]; general function prediction only [R]; replication, recombination and repair [L]; lipid transport and metabolism [I]; transcription [K]; inorganic ion transport and metabolism [P]; intracellular trafficking, secretion, and vesicular transport [U].
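| Gene product | COG | Identity (%) | E-value | BitScore | Phylum | Class | Order | Genus |
|---|---|---|---|---|---|---|---|---|
| DNA-directed RNA polymerase subunit B | K | 40.39 | 0 | 825 | NA | NA | | |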
| DNA ligase | L | 51.29 | 0 | 612 | | | | |
| DNA topoisomerase | L | 33 | 0 | 586 | NA | NA | | |
| DNA-dependent RNA polymerase II largest subunit | K | 34.09 | 7 × 10⁻¹⁷¹ | 555 | | | | |
| thymidylate synthase | F | 49.7 | 1 × 10⁻¹⁶⁸ | 497 | NA | NA | | |
| DNA polymerase delta catalytic subunit | L | 35.11 | 2 × 10⁻¹³² | 436 | | | | |
| DNA helicase | L | 49.3 | 3 × 10⁻¹³⁷ | 419 | | | | |
| deoxycytidylate deaminase | F | 62.96 | 8 × 10⁻⁶⁴ | 206 | | | | |
| Sialidase | G | 30.03 | 2 × 10⁻³⁷ | 148 | | | | |
| DNA-binding protein | R | 34.36 | 7 × 10⁻³² | 129 | NA | NA | NA | |
| endonuclease | L | 45.67 | 4 × 10⁻³² | 121 | | | | |
| fatty acid desaturase | I | 34.76 | 2 × 10⁻²⁸ | 121 | | | | |
NA: NCBI classification was not available for Phylum, Class, Order or Genus.
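The COG letters in the table map onto the functional categories listed in the caption. Below is a small lookup-and-count sketch. The dictionary uses the standard COG category names for the eight letters given in the caption. The list holds the COG letters of the 12 table rows, in order from top to bottom.

```python
from collections import Counter

# COG functional categories named in the table caption.
COG_CATEGORIES = {
    "F": "Nucleotide transport and metabolism",
    "G": "Carbohydrate transport and metabolism",
    "I": "Lipid transport and metabolism",
    "K": "Transcription",
    "L": "Replication, recombination and repair",
    "P": "Inorganic ion transport and metabolism",
    "R": "General function prediction only",
    "U": "Intracellular trafficking, secretion, and vesicular transport",
}

# COG letters of the table rows, top to bottom.
table_cogs = ["K", "L", "L", "K", "F", "L", "L", "F", "G", "R", "L", "I"]

counts = Counter(table_cogs)
for letter, n in counts.most_common():
    print(f"[{letter}] {COG_CATEGORIES[letter]}: {n}")
```

The tally makes the dominance of replication, recombination and repair [L] among the strongest hits easy to see.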