Philippe Colson, Anthony Levasseur, Bernard La Scola, Vikas Sharma, Arshan Nasir, Pierre Pontarotti, Gustavo Caetano-Anollés, Didier Raoult.
Abstract
Giant viruses of amoebae were discovered in 2003. Since then, their diversity has greatly expanded. They were suggested to form a fourth branch of life, collectively named 'TRUC' (for "Things Resisting Uncompleted Classifications"), alongside Bacteria, Archaea, and Eukarya. Their origin and ancestrality remain controversial. Here, we specify the evolution and definition of giant viruses. Phylogenetic and phenetic analyses of informational gene repertoires of giant viruses and selected bacteria, archaea and eukaryota were performed, including structural phylogenomics based on protein structural domains grouped into 289 universal fold superfamilies (FSFs). Hierarchical clustering analysis was performed based on a binary presence/absence matrix constructed using 727 informational COGs from cellular organisms. The presence/absence of 'universal' FSF domains was used to generate an unrooted maximum parsimony phylogenomic tree. The gene content of a giant virus was also compared with those of a bacterium, an archaeon, and a eukaryote with small genomes. Overall, the cladistic analyses, based on gene sequences of central, ancient proteins and on highly conserved protein fold structures, and the phenetic analyses were congruent regarding the delineation of a fourth branch of microbes composed of giant viruses. Giant viruses appeared as a basal group in the tree of all proteomes. A pangenome and core genome determined for Rickettsia bellii (bacterium), Methanomassiliicoccus luminyensis (archaeon), Encephalitozoon intestinalis (eukaryote), and Tupanvirus (giant virus) showed that a substantial proportion of Tupanvirus genes overlap with those of the cellular microbes. In addition, substantial genome mosaicism was observed, with 51, 11, 8, and 0.2% of Tupanvirus genes best matching viruses, eukaryota, bacteria, and archaea, respectively. Finally, we found that genes themselves may be subject to lateral sequence transfers.
In summary, our data highlight the quantum leap between classical and giant viruses. Phylogenetic and phyletic analyses and the study of protein fold superfamilies confirm previous evidence of the existence of a fourth TRUC of life that includes giant viruses, and highlight its ancestrality and mosaicism. They also indicate that the best evolutionary representations for giant viruses and cellular microorganisms are rhizomes, and that sequence transfers rather than gene transfers have to be considered.
Keywords: TRUC; giant virus; informational genes; megavirales; mimivirus; protein structural domains
Year: 2018 PMID: 30538677 PMCID: PMC6277510 DOI: 10.3389/fmicb.2018.02668
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Comparison of the major features used as criteria to define classical viruses with those of giant viruses and with the hallmark features of cellular microbes.
| Phenotypic and genotypic characteristics | Classical viruses: majority case | Classical viruses: exceptions/comments | Giant viruses: majority case | Giant viruses: exceptions/comments | Cellular microorganisms: majority case | Cellular microorganisms: exceptions/comments |
|---|---|---|---|---|---|---|
| Visible under a light microscope (>0.2 μm) | No | – | Yes | – | Yes | – |
| Genome size > 350 kbp | No | | Yes | – | Yes | – |
| Presence of a virally-encoded capsid | Yes | Some capsidless viruses: genus | Yes | Pandoraviruses ( | No | Icosahedral compartments that resemble viral capsids exist in bacteria and archaea: the encapsulin nanocompartments, structurally similar to and possibly derived from major capsid proteins of tailed bacterial and archaeal caudaviruses, and bacterial microcompartments present in bacteria (including cyanobacteria and many chemotrophic bacteria) that encapsulate enzymes involved in metabolic pathways ( |
| Presence of DNA and RNA inside the viral particle | No | Cytomegalovirus ( | Yes | – | Yes | – |
| Absolute parasitism | Yes | – | Yes | – | Several bacteria and archaea | Case of strictly intracellular microorganisms |
| Multiplication by binary fission | No | – | No | – | Yes | No |
| Eclipse period during the replicative cycle | Yes | – | Yes | – | No | – |
| Entry into host cells by phagocytosis | No | – | Yes | – | – | – |
| Presence of a virus factory | In several viruses (e.g., adenoviruses, polyadenoviruses) ( | – | Yes | Mollivirus ( | – | Morula similar to a viral factory for |
| Energy (ATP) generating machinery | No | – | No | – | Yes | Through glycolysis in |
| Presence of genes encoding ribosomal RNA and proteins | No | – | No | – | Yes | Incomplete sets of ribosomal proteins and aminoacyl-tRNA synthetase in |
| Presence of genes encoding translation-associated proteins | No | – | Yes | – | Yes | – |
| Presence of tRNA genes | No | – | Yes | Marseilleviruses, faustoviruses ( | Yes | – |
| Presence of viral proteins of transcription | Yes | – | Yes | Not detected by proteomics in a marseillevirus ( | Yes | – |
| Presence of host ribosomal proteins inside virions | No | Arenaviruses ( | In Mollivirus ( | – | Yes | – |
| Presence of group I, II or spliceosomal introns, inteins | No | – | Yes | – | Yes | – |
| Transposable elements | No | – | Yes | Introns, inteins, transpovirons, miniature inverted-repeat transposable elements (MITEs, in pandoraviruses) | Yes | – |
| Infection by other viruses | No | – | No | Mimiviruses with (pro)virophages ( | | |
| Mechanism of defense against viruses | No | – | Yes for mimiviruses | – | Yes | – |
| High level of genome mosaicism | No | – | Yes | – | Yes | – |
| Evidence of ancestrality based on conserved/ubiquitous genes and protein fold-superfamilies | Four monophyletic classes of viruses ( | – | Yes | – | Yes | – |
FIGURE 1. Venn diagram displaying FSF distribution and sharing patterns among Archaea, Bacteria, Eukarya, and Megavirales. A, Archaea; B, Bacteria; E, Eukarya; FSF, fold superfamilies; V, viruses.
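The sharing patterns summarized by a Venn diagram such as Figure 1 can be tallied from one set of FSF identifiers per group. The sketch below is only an illustration: the function name `venn_regions` and the toy identifiers are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def venn_regions(sets_by_group):
    """Count items per Venn region.

    sets_by_group maps a one-letter group label (e.g. "A", "B", "E", "V")
    to the set of FSF identifiers detected in that group. Each region is
    keyed by the sorted labels of the groups that share the item.
    """
    regions = Counter()
    all_items = set().union(*sets_by_group.values())
    for item in all_items:
        key = "".join(sorted(g for g, members in sets_by_group.items() if item in members))
        regions[key] += 1
    return dict(regions)


# Hypothetical toy data: four groups and four made-up FSF identifiers.
toy = {
    "A": {"fsf1", "fsf2"},
    "B": {"fsf1", "fsf3"},
    "E": {"fsf1", "fsf2", "fsf4"},
    "V": {"fsf1", "fsf4"},
}
regions = venn_regions(toy)
print(regions)
```

In this representation, the "ABEV" region holds the FSFs shared by all four groups, which corresponds to the kind of universal set the abstract refers to.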
FIGURE 2. Phylogeny of proteomes describing the evolution of 182 proteomes randomly sampled from cellular organisms and viruses. The universal Tree of Life is rooted using Weston’s generality criterion. The 102 cellular proteomes are from Nasir and Caetano-Anollés (2015).
FIGURE 3. Evolutionary principal coordinate (evoPCO) analysis plot portraying, in its first three axes, the evolutionary distances between cellular and viral proteomes. The percentage of variability explained by each coordinate is given in parentheses on each axis. Data points of the three-dimensional scatter plot describing temporal clouds are mapped onto projection planes and connected with vertical drop lines along the PCO3 axis. The full coordinate information used to build the PCoA plot of this figure is provided in Supplementary Table S3.
FIGURE 4. Plots of the indices of the phylogenetic tree of proteomes describing the evolution of 182 proteomes randomly sampled from cellular organisms and viruses (corresponding to Figure 2) against the age of the phylogenetic character [fold superfamily (FSF)]. Five measures of the levels of lateral sequence transfers for the maximum parsimony tree reconstruction performed in the present study, namely consistency index (A), retention index (B), rescaled consistency index (C), homoplasy index (D), and G-fit (E), are plotted against the age of the phylogenetic character FSF [measured as node distance (nd) values] for 289 characters (FSFs) shared by archaea, bacteria, eukaryota, and viruses. High retention indices, especially for lower nd values (corresponding to older domains), indicate an excellent fit of the characters to the phylogeny.
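As a worked illustration of four of the indices named in the Figure 4 legend, the sketch below applies the standard per-character definitions: with m the minimum possible number of steps, s the number of steps observed on the tree, and g the maximum possible number of steps, CI = m/s, RI = (g - s)/(g - m), RC = CI × RI, and HI = 1 - CI. The function name and the example values are hypothetical. G-fit is left out because it depends on a weighting constant that is not given here.

```python
def parsimony_indices(min_steps, observed_steps, max_steps):
    """Return CI, RI, RC and HI for a single character.

    min_steps      -- fewest changes the character could need on any tree (m)
    observed_steps -- changes the character needs on the evaluated tree (s)
    max_steps      -- most changes the character could need on any tree (g)
    """
    if not 0 < min_steps <= observed_steps <= max_steps:
        raise ValueError("expected 0 < min_steps <= observed_steps <= max_steps")
    ci = min_steps / observed_steps
    # RI is undefined when the character cannot vary in fit (g == m).
    if max_steps == min_steps:
        ri = None
    else:
        ri = (max_steps - observed_steps) / (max_steps - min_steps)
    rc = None if ri is None else ci * ri
    hi = 1.0 - ci
    return {"CI": ci, "RI": ri, "RC": rc, "HI": hi}


# Hypothetical binary character: one change at minimum, two on the tree,
# four at most.
example = parsimony_indices(min_steps=1, observed_steps=2, max_steps=4)
print(example)
```

A character with no homoplasy has s = m, giving CI = RI = 1 and HI = 0, which is the sense in which high retention indices indicate a good fit.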
FIGURE 5. RNAP1 phylogenetic tree. The RNAP1 tree was built by using aligned protein sequences from Megavirales (red), Bacteria (green), Archaea (pink), and Eukarya (blue). Confidence values were calculated by the Shimodaira-Hasegawa (SH) test using the FastTree program (Price et al., 2010). Average length of sequences was 1,336 amino acids. The scale bar represents the number of estimated changes per position.
FIGURE 6. RNAP2 phylogenetic tree. The RNAP2 tree was built by using aligned protein sequences from Megavirales (red), Bacteria (green), Archaea (pink), and Eukarya (blue). Confidence values were calculated by the SH test using the FastTree program (Price et al., 2010). Average length of sequences was 1,188 amino acids. The scale bar represents the number of estimated changes per position.
FIGURE 7. DNA polymerase phylogenetic tree. The DNA polymerase tree was built by using aligned protein sequences from Megavirales (red), Bacteria (green), Archaea (pink), and Eukarya (blue). Confidence values were calculated as SH support values using the FastTree program (Price et al., 2010). Average length of sequences was 1,134 amino acids. The scale bar represents the number of estimated changes per position.
FIGURE 8. Hierarchical clustering by phyletic pattern based on the presence/absence of informational Clusters of Orthologous Groups (COGs) of proteins. The Megavirales members are represented in red, Bacteria members in green, Archaea members in pink, and Eukarya members in blue.
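Clustering by phyletic pattern can be sketched on a toy binary presence/absence matrix. The distance measure and linkage rule used in the study are not stated here, so the Jaccard distance and average linkage below are assumptions made for illustration only, and the organism names and profiles are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage_tree(profiles):
    """Agglomerative clustering (average linkage, an assumed choice).

    profiles maps a name to a 0/1 list; the result is a nested tuple tree.
    """
    names = list(profiles)
    dist = {
        frozenset(pair): jaccard_distance(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }

    def linkage(m1, m2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((x, y))] for x in m1 for y in m2) / (len(m1) * len(m2))

    clusters = [(name, [name]) for name in names]  # (subtree, member names)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (t1, m1), (t2, m2) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((t1, t2), m1 + m2)]
    return clusters[0][0]


# Hypothetical profiles: columns stand for made-up COGs, 1 = present.
toy_profiles = {
    "virus_A": [1, 0, 0, 1, 0, 0],
    "virus_B": [1, 0, 0, 1, 1, 0],
    "bacterium_A": [1, 1, 1, 0, 0, 1],
    "bacterium_B": [1, 1, 1, 0, 0, 0],
}
tree = average_linkage_tree(toy_profiles)
print(tree)
```

With these toy profiles the two viruses and the two bacteria each merge first, and the two pairs join last, which is the kind of grouping by domain that the figure colors make visible.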
FIGURE 9. Rhizomes of genomes illustrating the mosaicism of the genomes of representatives of the four TRUCs of microbes: Tupanvirus soda lake (a mimivirus) (A); Encephalitozoon intestinalis (a microbial eukaryote) (B); Methanomassiliicoccus luminyensis (an archaeon) (C); and Rickettsia bellii (a bacterium) (D). The genes of these four microorganisms were linked to their most similar sequences in the NCBI GenBank protein sequence database according to the BLAST program (https://blast.ncbi.nlm.nih.gov/Blast.cgi), classified according to whether those sequences belong to viruses, eukaryotes, bacteria or archaea, and integrated in a circular gene data visualization. The figures were generated using the CIRCOS online tool (http://mkweb.bcgsc.ca/tableviewer/visualize/). The circular representations in A and C are the same as those produced for figures in Abrahao et al. (2018) and Levasseur et al. (2017), respectively, as they originate from the same data. These representations are licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/) and CC-BY-NC (https://creativecommons.org/licenses/by-nc/4.0/), respectively.
FIGURE 10. Rhizomes of methionyl-tRNA synthetase gene fragments illustrating the mosaicism of the genes of representatives of the four TRUCs of microbes: Tupanvirus soda lake (a mimivirus) (A); Encephalitozoon intestinalis (a microbial eukaryote) (B); Methanomassiliicoccus luminyensis (an archaeon) (C); and Rickettsia bellii (a bacterium) (D). Forty-amino-acid-long fragments of the methionyl-tRNA synthetase encoding genes of the four microorganisms were linked to their most similar sequences in the NCBI GenBank protein sequence database according to the BLAST program (https://blast.ncbi.nlm.nih.gov/Blast.cgi), classified according to whether those sequences belong to viruses, eukaryotes, bacteria or archaea, and integrated in a circular gene data visualization. The figures were generated using the CIRCOS online tool (http://mkweb.bcgsc.ca/tableviewer/visualize/).
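The fragment-based procedure in the Figure 10 legend can be sketched in two steps: cut a protein sequence into 40-residue pieces, then tally the taxonomic category of each piece's best hit. The sketch assumes consecutive, non-overlapping fragments, which the legend does not specify. The BLAST search itself is not shown, and the best-hit labels below are hypothetical placeholders rather than results.

```python
from collections import Counter


def split_into_fragments(sequence, size=40):
    """Cut a sequence into consecutive, non-overlapping pieces (assumed layout).

    The last fragment may be shorter than `size`.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def best_hit_percentages(labels):
    """Percentage of fragments (or genes) whose best hit falls in each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical 100-residue sequence and hypothetical best-hit categories.
toy_sequence = "M" * 100
fragments = split_into_fragments(toy_sequence)
toy_labels = ["viruses", "viruses", "bacteria", "eukaryota"]
print([len(f) for f in fragments])
print(best_hit_percentages(toy_labels))
```

The same tally applied to whole genes rather than fragments yields gene-level percentages of the kind reported in the abstract for Tupanvirus.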
FIGURE 11. Representation as a rhizome of the genetic evolution of four current intracellular parasites of the four TRUCs of microbes with a comparable genome size, including Rickettsia bellii (a bacterium), Methanomassiliicoccus luminyensis (an archaeon), Encephalitozoon intestinalis (a microbial eukaryote), and Tupanvirus soda lake (a mimivirus). Rhizomes are a representation of genome evolution and mosaicism that takes into account that genes and intragenic sequences do not have the same evolutionary history, and they have been proposed as a better paradigm of genetic evolution than phylogenetic trees. The genomes of each of the four represented current microorganisms harbor mixtures of sequences of different origins. Sequences corresponding to current bacteria, archaea, eukaryota, giant viruses, and to ORFans are colored in green, purple, blue, red, and orange, respectively. Rhizomes of the genomes of Tupanvirus and Methanomassiliicoccus luminyensis were adapted from the representations in Abrahao et al. (2018) and Levasseur et al. (2017), respectively, licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/) and CC-BY-NC (https://creativecommons.org/licenses/by-nc/4.0/), respectively (see legend to Figure 9).