Heli A M Mönttinen, Cedric Bicep, Tom A Williams, Robert P Hirt.
Abstract
The nucleocytoplasmic large DNA viruses (NCLDVs) are a diverse group that currently contains the viruses with the largest known virions and genomes, also called giant viruses. The first giant virus was isolated and described nearly 20 years ago. Its genome was larger than that of any other virus known at the time, and it contained a number of genes that had not previously been described in any virus. The origin and evolution of these unusually complex viruses have been puzzling, and various mechanisms have been put forward to explain how some NCLDVs could have reached genome sizes and coding capacities overlapping with those of cellular microbes. Here we critically discuss the evidence and arguments on this topic. We have also updated and systematically reanalysed protein families of the NCLDVs to further study their origin and evolution. Our analyses further highlight the small number of widely shared genes and the extreme genomic plasticity among NCLDVs, which are shaped by combinations of gene duplications, deletions, lateral gene transfers and de novo creation of protein-coding genes. The dramatic expansions of genome size and protein-coding gene capacity characteristic of some NCLDVs are now increasingly understood to be driven by environmental factors rather than reflecting relationships to an ancient common ancestor among a hypothetical cellular lineage. Thus, the evolution of NCLDVs is writ large viral, and their origin, like that of all other viral lineages, remains unknown.
Keywords: NCLDV; gene network; gene phylogenies; phylum Nucleocytoviricota; protein families
Year: 2021 PMID: 34542398 PMCID: PMC8715426 DOI: 10.1099/mgen.0.000649
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
The list of classified (according to ICTV classification) and unclassified NCLDVs considered in this review
| Virus class | Virus order | Virus family | Genome size (kb) | Number of predicted ORFs | Taxonomy of natural or experimental* or inferred** host |
|---|---|---|---|---|---|
| Classified NCLDVs | | | | | |
| Megaviricetes | | | 155–474 | 150–886 | Alveolata, Chlorophyta, Haptophyta, Stramenopiles |
| | | | 617–1259 | 544–1120 | Amoeba*, Stramenopiles |
| | | | 119–186 | 119–180 | Lepidoptera |
| | | | 106–220 | 99–468 | Arthropods, Fish, Amphibia |
| | | | 347–403 | 403–470 | Amoeba* |
| Pokkesviricetes | | | 170 | 152 | Swine |
| | | | 134–360 | 130–328 | Arthropods, Vertebrates |
| Unclassified NCLDVs | | | | | |
| | | | 460 | 451 | Amoeba* |
| | | | 351 | 465 | Amoeba* |
| | | | 1385–1570 | 1207–1545 | Protist** |
| | | | 381 | 461 | Amoeba* |
| | | | 652 | 523 | Amoeba* |
| | | | 610 | 467 | Amoeba* |
| | | | 1909–2474 | 1487–2541 | Amoeba* |

Fig. 1. The presence–absence tree of NCLDVs and the distribution of the 26 most shared protein clusters in NCLDVs. The tree is based on the presence–absence of the 3464 protein clusters. The protein clusters were made as follows. The tree was reconstructed from binary data with the GTR2 model [95], an ascertainment bias correction model [96] and 1000 ultrafast bootstraps [97] using IQ-TREE [98]. A branch is marked with a black dot if its support is >95 %. The presence and number of protein cluster members are shown in a heatmap for the 26 most shared protein clusters, which are present in more than six virus families or groups.
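
The binary input behind a presence–absence tree of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the genome names, cluster IDs, family labels and function names are hypothetical, and the tree inference itself is not reproduced.

```python
from collections import Counter

# Hypothetical toy input: genome -> protein-cluster ID of each clustered ORF.
genome_clusters = {
    "virus_A": ["c1", "c1", "c2"],
    "virus_B": ["c1", "c3"],
    "virus_C": ["c2", "c3", "c3"],
}

def copy_number_table(genome_clusters):
    """Count cluster members per genome (the values a heatmap would display)."""
    return {g: Counter(cs) for g, cs in genome_clusters.items()}

def presence_absence(genome_clusters):
    """Return (sorted cluster IDs, genome -> '0'/'1' string with one character per cluster)."""
    clusters = sorted({c for cs in genome_clusters.values() for c in cs})
    rows = {}
    for genome, cs in genome_clusters.items():
        present = set(cs)
        rows[genome] = "".join("1" if c in present else "0" for c in clusters)
    return clusters, rows

def widely_shared(genome_clusters, genome_to_group, min_groups):
    """Clusters found in more than `min_groups` distinct families or groups."""
    groups_per_cluster = {}
    for genome, cs in genome_clusters.items():
        for c in set(cs):
            groups_per_cluster.setdefault(c, set()).add(genome_to_group[genome])
    return sorted(c for c, gs in groups_per_cluster.items() if len(gs) > min_groups)

clusters, matrix = presence_absence(genome_clusters)
counts = copy_number_table(genome_clusters)
```

With the caption's threshold, `widely_shared(..., min_groups=6)` would keep only clusters present in more than six families or groups; the toy data above is too small for that threshold to select anything.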
Other virus families considered in the review and their links with NCLDVs
| Virus kingdom | Virus phylum | Virus class | Virus order | Virus family (or virus-related element) | Description | Link to NCLDVs |
|---|---|---|---|---|---|---|
| | | | | | A dsDNA virus family infecting insects. | Shares several similar genes, especially with insect-infecting NCLDVs. Its DNA polymerase is similar to that of NCLDVs. |
| | | | | | A dsDNA virus family infecting vertebrates. | Its DNA polymerase is similar to that of NCLDVs. |
| | | | | Polintons | Large and complex transposable elements found among eukaryotes. They have a conserved set of genes: protein-primed type B DNA polymerase (pPolB), retroviral family integrase, A32-like ATPase, adenovirus-type cysteine protease and two capsid proteins. | A suggested origin for NCLDVs and virophages. The packaging ATPase 32-like protein and the major capsid protein have a common origin with those of phycodnaviruses. The minor capsid protein is similar to |
| | | | | (currently classified virophages) | Small dsDNA viruses that require co-infection with a giant virus. Virophage replication depends on the giant virus replication system. Most virophages share a small subset of conserved genes: minor and major capsid proteins, A32-like ATPase, cysteine protease, primase-superfamily 3 helicase and zinc-ribbon domain protein. | A suggested origin from Polintons. Distant structural and sequence similarity to Polintons and the NCLDV jelly-roll capsid. The minor capsid protein is similar to those of NCLDVs, Polintons and tectiviruses. |
| | | | | | dsDNA bacteriophage | A suggested origin for Polintons. The DNA polymerase is similar to that of Polintons. Structural similarity between the major capsid protein and the NCLDV jelly-roll capsid. A protein shares distant sequence similarity with the minor capsid proteins of mimiviruses and phycodnaviruses. |
| | | | | Yaravirus | Recently described amoeba-infecting virus. Either a highly derived NCLDV or the first non-NCLDV | The yaravirus has a major capsid protein and an ATPase similar to those of NCLDVs. |

Fig. 2. The ORF content of NCLDVs and the percentage of homologues outside NCLDVs. The ORF content of each NCLDV is mapped on the heatmap, which shows the percentage of ORFs belonging to each category listed at the top. ORFs are classified into two groups: (a) members of a protein cluster or (b) single proteins that are not members of any protein cluster. Each group is divided into subcategories according to whether (1) the protein cluster (or single protein) is found only within NCLDVs or (2) homologues are found outside NCLDVs (based on the best BLAST hits). Every ORF belongs to exactly one category. The presence of a protein cluster in cells and other viruses is based on the best protein BLAST hits (BLAST version 2.2.31+) of each protein cluster member outside NCLDVs. The NCLDV protein cluster members were searched with BLAST against the NCBI non-redundant protein database, which was downloaded on 18 July 2016. The e-value cut-off used for protein BLAST hits was 10⁻⁵. Colours reflect the proportion of ORFs in the NCLDV genome falling into each category. On the right, the NCLDV genome size (kb) and number of ORFs are given as histograms. The numbers on the left of the heatmap refer to the corresponding NCLDV genome in Table S1.
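
The four-way ORF classification in this caption can be illustrated with a short sketch. It is a simplified reading of the caption, not the authors' code: the dictionary keys, category labels and function names are hypothetical. Two details are assumptions: a hit counts when its e-value is at or below the cut-off, and a cluster counts as having outside homologues if any of its members has such a hit.

```python
from collections import Counter, defaultdict

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the caption

def classify_orfs(orfs, cutoff=E_VALUE_CUTOFF):
    """Label each ORF with exactly one of four categories.

    Each ORF is a dict with hypothetical keys:
      'genome'          - genome identifier
      'cluster'         - protein-cluster ID, or None for a single protein
      'outside_evalues' - e-values of its BLAST hits outside NCLDVs
    """
    def has_hit(orf):
        # Assumption: the cut-off is inclusive.
        return any(e <= cutoff for e in orf["outside_evalues"])

    # Assumption: one qualifying member is enough to mark the whole cluster.
    clusters_outside = {o["cluster"] for o in orfs
                        if o["cluster"] is not None and has_hit(o)}
    labelled = []
    for o in orfs:
        if o["cluster"] is not None:
            kind, outside = "cluster", o["cluster"] in clusters_outside
        else:
            kind, outside = "single", has_hit(o)
        scope = "outside_homologues" if outside else "ncldv_only"
        labelled.append((o["genome"], f"{kind}/{scope}"))
    return labelled

def percentages_per_genome(labelled):
    """Percentage of each genome's ORFs falling into each category."""
    counts = defaultdict(Counter)
    for genome, category in labelled:
        counts[genome][category] += 1
    return {g: {cat: 100.0 * n / sum(c.values()) for cat, n in c.items()}
            for g, c in counts.items()}

# Hypothetical toy data.
orfs = [
    {"genome": "g1", "cluster": "c1", "outside_evalues": [1e-20]},
    {"genome": "g1", "cluster": "c2", "outside_evalues": []},
    {"genome": "g1", "cluster": None, "outside_evalues": [0.5]},
    {"genome": "g1", "cluster": None, "outside_evalues": [1e-8]},
    {"genome": "g2", "cluster": "c1", "outside_evalues": []},
]
pct = percentages_per_genome(classify_orfs(orfs))
```

Because the cluster-level flag is shared, the `c1` member in `g2` is counted as having outside homologues even though it has no qualifying hit of its own.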
Fig. 3. Bipartite network of shared protein clusters. The bipartite network is drawn for NCLDV protein clusters that are shared by at least two NCLDV genomes, or by an NCLDV and another virus family or a cell. Protein clusters are depicted as small black circles that are connected to NCLDV genomes, cell proteins or other viruses (large nodes). The ORFs included in a cell or other virus node are indicated in parentheses. An NCLDV genome is linked to a protein cluster if the genome contains at least one member of that protein cluster. The presence of a protein cluster in cells and other viruses is based on the best protein BLAST hits (version 2.2.31+) of each protein cluster member outside NCLDVs. The NCLDV protein cluster members were searched with BLAST against the NCBI non-redundant protein database excluding NCLDV members, which was downloaded on 18 July 2016. The e-value cut-off used for protein BLAST hits was 10⁻⁵. The bipartite network is visualized using an organic layout in Cytoscape v2.8.0 [99].
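
The linking rule of the bipartite network can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' workflow: the node names, cluster IDs and the `bipartite_edges` function are hypothetical, and the Cytoscape layout step is not reproduced.

```python
def bipartite_edges(memberships, min_degree=2):
    """Build (large node, protein cluster) edges for a bipartite network.

    memberships: iterable of (node, cluster) pairs, one per ORF or BLAST hit.
    A node is an NCLDV genome, or an outside category such as 'cell'.
    A node is linked to a cluster if it contributes at least one member,
    and a cluster is kept only if at least `min_degree` nodes share it.
    """
    nodes_per_cluster = {}
    for node, cluster in memberships:
        nodes_per_cluster.setdefault(cluster, set()).add(node)

    edges = set()
    for cluster, nodes in nodes_per_cluster.items():
        if len(nodes) >= min_degree:
            edges.update((node, cluster) for node in nodes)
    return sorted(edges)

# Hypothetical toy data: c1 is shared by two genomes, c2 is genome-specific,
# and c3 links one genome to cellular proteins.
memberships = [
    ("virus_A", "c1"), ("virus_A", "c1"), ("virus_B", "c1"),
    ("virus_A", "c2"),
    ("cell", "c3"), ("virus_C", "c3"),
]
edges = bipartite_edges(memberships)
```

Duplicate memberships collapse into a single edge, so copy number does not affect the topology, and the genome-specific cluster `c2` is dropped.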
Protein clusters that have the highest copy number in the widest range of NCLDV genomes
| Protein cluster | Pfam protein families (no. of cluster members) identified among members of a cluster and their description | No. of genomes in which protein clusters have >5 members/total no. of genomes in which protein cluster is present | Copy no. (>5) | NCBI assembly accession (genome size as kb) |
|---|---|---|---|---|
| Cluster_1 | Ankyrin repeat; Fbox domain; Multigene family 530 protein; PRANC domain; Ankyrin repeats; F-box-like; Ankyrin repeat; Ankyrin repeats; Ring finger domain; Ankyrin repeats | 28 (6–307 members)/59 | 9, 116, 98, 131, 16, 132, 217, 307, 8, 8, 10, 50, 26, 14, 30, 34, 6, 11, 15, 7, 9, 13, 11, 7, 7, 7, 7, 7 | GCF_000858485.1 (170), GCF_000888735.1 (1181), GCF_000904035.1 (1021), GCF_000893915.1 (1259), GCF_001292995.1 (651), GCF_000911655.1 (1908), GCF_000928575.1 (2243), GCF_000911955.1 (2473), GCF_000847045.1 (330), GCF_000871245.1 (344), GCF_000839765.1 (336), GCF_000841685.1 (359), GCF_000922075.1 (282), GCF_001431935.1 (189), GCF_000838605.1 (289), GCF_000923135.1 (307), GCF_000892975.1 (176), GCF_000839105.1 (206), GCF_000839185.1 (224), GCF_000841905.1 (210), GCF_000857045.1 (197), GCF_001029045.1 (215), GCF_000869985.1 (198), GCF_000860085.1 (195), GCF_000859885.1 (186), GCF_000844045.1 (134), GCF_000930695.1 (140), GCF_000886295.1 (145) |
| Cluster_2 | GIY-YIG catalytic domain; BRO family, N-terminal domain; Helix-turn-helix domain of resolvase; Poxvirus D5 protein-like; CENP-B N-terminal DNA-binding domain; KilA-N domain; Protein of unknown function; T5orf172 domain; MSV199 domain; Protein of unknown function; Ring finger domain | 15 (6–47 members)/49 | 7, 12, 14, 18, 15, 13, 18, 16, 16, 6, 28, 13, 16, 47, 27 | GCF_000871485.1 (186), GCF_000881595.1 (119), GCF_000838105.1 (212), GCF_000923155.1 (220), GCF_000916235.1 (196), GCF_000909775.1 (198), GCF_000915575.1 (199), GCF_000914535.1 (205), GCF_000891235.1 (206), GCF_000918955.1 (163), GCF_000916855.1 (246), GCF_000837185.1 (232), GCF_000427135.1 (229), GCF_000427115.1 (308), GCF_000427175.1 (283) |
| Cluster_7 | Large eukaryotic DNA virus major capsid protein; Major capsid protein N-terminus | 8 (7–8 members)/59 | 8, 7, 8, 8, 8, 8, 8, 8 | GCF_000872425.2 (187), GCF_000889515.1 (199), GCF_000890375.1 (184), GCF_000888835.1 (194), GCF_001399285.1 (196), GCF_001399225.1 (182), GCF_000885975.1 (192), GCF_000887855.1 (184) |
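
The summary column of this table ("28 (6–307 members)/59") can be derived from per-genome copy numbers as sketched below. This is an illustrative sketch only: the function name, the genome labels and the toy counts are hypothetical, and the threshold of more than five members is taken from the column header.

```python
def summarise_cluster(copies_per_genome, threshold=5):
    """Format 'N (min–max members)/total' for one protein cluster.

    copies_per_genome: genome -> number of cluster members in that genome.
    N counts genomes with more than `threshold` members; total counts
    genomes in which the cluster is present at all.
    """
    present = [n for n in copies_per_genome.values() if n > 0]
    high = [n for n in present if n > threshold]
    if not high:
        return f"0/{len(present)}"
    return f"{len(high)} ({min(high)}–{max(high)} members)/{len(present)}"

# Hypothetical toy data: two genomes exceed five copies, one has two copies,
# and one lacks the cluster entirely.
summary = summarise_cluster({"g1": 8, "g2": 7, "g3": 2, "g4": 0})
```

Genomes with zero copies are excluded from the denominator, which matches the header's wording "total no. of genomes in which protein cluster is present".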