Khalil Geballa-Koukoulas, Hadjer Boudjemaa, Julien Andreani, Bernard La Scola, Guillaume Blanc.
Abstract
Faustovirus is a recently discovered genus of large DNA viruses that infects the amoeba Vermamoeba vermiformis and is phylogenetically related to Asfarviridae. To better understand the diversity and evolution of this viral group, we sequenced six novel Faustovirus strains, mined published metagenomic datasets and performed a comparative genomic analysis. Genomic sequences revealed three consistent phylogenetic groups, within which genetic diversity was moderate. The comparison of the major capsid protein (MCP) genes unveiled between 13 and 18 type-I introns that likely evolved through a still-active birth and death process mediated by intron-encoded homing endonucleases that began before the Faustovirus radiation. Genome-wide alignments indicated that, although the genomes retain high levels of gene collinearity, the central region containing the MCP gene and the extremities of the chromosomes evolved at a faster rate due to increased indel accumulation and local rearrangements. The fluctuation of the nucleotide composition along the Faustovirus (FV) genomes is mostly imprinted by the consistent nucleotide bias of coding sequences and provided no evidence for a single DNA replication origin as in circular bacterial genomes.
Keywords: Asfarvirus; Faustovirus; genome evolution; nucleo-cytoplasmic large DNA virus
Year: 2020 PMID: 32456325 PMCID: PMC7290515 DOI: 10.3390/v12050577
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Genomic features of Faustoviruses (FVs).
| Strain | Clade | Sampling Site | GenBank ID | Contig Length (bp) | G + C (%) | No. of Genes | No. of Families | No. of MCP Introns | TIR (bp) |
|---|---|---|---|---|---|---|---|---|---|
| F-S17 | E9 | Oran, Algeria, sewage | MN830296 | 476,423 | 39.6 | 486 | 482 | 18 | 249 |
| F-M6 | E9 | Marseille, France, sewage | MN830295 | 472,803 | 39.8 | 492 | 485 | 17 | 372 |
| F-VV57 | E9 | Tlemcen, Algeria, reservoir lake | MN830297 | 478,172 | 39.7 | 497 | 491 | 15 | 61 |
| F-VV63 | E9 | Chlef Marsa, Algeria, sewage | MN830298 | 479,542 | 39.7 | 498 | 490 | 15 | 687 |
| F-LCD7 | E9 | La Ciotat, France, sewage | MN830294 | 477,407 | 39.9 | 502 | 495 | 14 | 247 |
| F-LC9 | E9 | La Ciotat, France, sewage | CZDJ02000001-5 | 470,873 | 39.8 | 500 | 492 | 14 | 0 |
| F-E9 | E9 | Marseille, France, sewage | MT335755 | 491,024 | 39.6 | 506 | 498 | 16 | 489 |
| F-VV10 | D | Mostaganem, Algeria, sewage | MN956669 | 456,728 | 37.7 | 471 | 456 | 17 | 0 |
| F-D3 | D | Dakar, Senegal, sewage | KU556803 | 455,803 | 37.8 | 481 | 476 | 16 | 380 |
| F-D5b | D | Dakar, Senegal, sewage | KU702949 | 464,523 | 37.7 | 488 | 481 | 14 | 324 |
| F-D6 | D | Dakar, Senegal, sewage | KU702951 | 462,011 | 37.7 | 485 | 479 | 14 | 309 |
| F-D5a | M/L | Dakar, Senegal, sewage | KU702950 | 466,051 | 36.2 | 474 | 472 | 13 | 528 |
| F-E12 | M/L | Marseille, France, sewage | KJ614390 | 466,265 | 36.2 | 474 | 472 | 13 | 498 |
| F-E23 | M/L | Marseille, France, sewage | KU702952 | 465,956 | 36.2 | 474 | 472 | 14 | 528 |
| F-E24 | M/L | Marseille, France, sewage | KU702948 | 466,012 | 36.2 | 474 | 472 | 13 | 556 |
| F-ST1 | M/L | St Pierre de Mezoargues, France, wastewater | LT839607 | 470,659 | 36.7 | 495 | 467 | 13 | 0 |
| F-Liban | M/L | Tripoli El Mina, Lebanon, sea water | MN534311 | 470,731 | 36.7 | 478 | 465 | 13 | 0 |
Figure 1. Venn diagram of the numbers of protein families shared between FV clades. For each area of the diagram, the number of protein families ubiquitously conserved in all of the FVs of the corresponding group is indicated. Numbers in brackets indicate additional protein families that were only conserved in a subset of FVs from the relevant group. The names of the FV strains contained in each clade are detailed in Figure 2.
Figure 2. Phylogenetic relationships between FVs and virus relatives. (A) Unrooted phylogenetic tree of FVs and virus relatives reconstructed using the viral DNA polymerase as a marker. FV clades have been collapsed for clarity. Details of the clade compositions are given in (B). SH-like local supports for branches are indicated beside nodes. The scale bar indicates the number of amino-acid substitutions per site. (B) Phylogenetic tree of FVs reconstructed from the concatenation of the alignments of 267 single-copy core FV proteins. All branches received maximal SH-like local support. Names of the newly sequenced FV strains are shown in blue.
Figure 3. Evolution of the major capsid protein (MCP) gene structure in FVs and Kaumoebavirus (KV). The figure shows a graphical representation of the FV and KV MCP gene alignment. Exons and introns of the MCP genes are shown by red rectangles and 2-colour segments, respectively. Introns represented by the same 2-colour code are orthologous (i.e., share sequence similarity and position relative to the MCP coding sequence). A circled “A” marks orthologous introns that were present in the last FV common ancestor (i.e., introns shared by any virus from clade M/L and any virus from clades D and/or E9). Conversely, a circled “N” marks introns for which there is no evidence of presence in the last FV common ancestor. Light and dark grey areas indicate significant nucleotide similarity between introns and between exons, respectively. Predicted ORFs with significant protein similarity to group I intron endonucleases are shown as blue arrows; the number inside each arrow is the ORF id in the respective genome annotation.
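The "A"/"N" labelling rule stated in the caption can be sketched as a small set test. This is only an illustration: the clade assignments are taken from the genomic features table, while the intron presence sets and the function name `classify_intron` are hypothetical and do not come from the paper.

```python
# Clade membership for three reference strains (from the genomic features table).
CLADE_OF = {"F-E12": "M/L", "F-D3": "D", "F-M6": "E9"}

def classify_intron(strains_with_intron, clade_of=CLADE_OF):
    """Return "A" if the intron occurs in clade M/L and in clade D and/or E9, else "N"."""
    clades = {clade_of[strain] for strain in strains_with_intron}
    in_ml = "M/L" in clades
    in_d_or_e9 = bool(clades & {"D", "E9"})
    return "A" if in_ml and in_d_or_e9 else "N"

# Hypothetical presence patterns, for illustration only.
shared_across_root = classify_intron({"F-E12", "F-D3"})
restricted_to_d_e9 = classify_intron({"F-D3", "F-M6"})
restricted_to_ml = classify_intron({"F-E12"})
```

Under this rule, an intron found only in clades D and E9 is labelled "N" even though it is shared by two clades, because it is absent from clade M/L.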
Figure 4. Sequence conservation along viral chromosomes. The graphs represent sequence conservation along reference genomes of FVs (i.e., F-M6 for clade E9, F-D3 for clade D, and F-E12 for clade M/L) and African swine fever viruses (ASFVs) (strain BA71G). Each reference FV genome was aligned against the other FV genomes of the same clade using BLASTN (E-value < 1E-15). The reference ASFV genome was aligned against 12 other sequenced ASFV genomes available in GenBank. The resulting alignments were parsed to compute various statistics within 10 kb windows slid along the genomes with a 1 kb step. The dark, medium and light blue areas represent maximal, average and minimal levels of within-clade sequence conservation (global identity) within windows. The mauve and black curves represent the frequencies of positions containing a nucleotide substitution or a gap, respectively, in any alignment within windows. Open red rectangles indicate the positions of the MCP genes, with individual exons shown with shaded boxes below the x-axis.
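The windowing scheme described in the caption (10 kb windows, 1 kb step) might be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the alignments have already been reduced to a per-position 0/1 track (1 where a substitution or gap occurs in any alignment), and the function name and toy data are hypothetical.

```python
from itertools import accumulate

def sliding_window_frequency(flags, window=10_000, step=1_000):
    """Return (window_start, frequency) pairs for a per-position 0/1 track."""
    # prefix[i] holds the number of flagged positions in flags[:i],
    # so each window count is a difference of two prefix sums.
    prefix = [0, *accumulate(flags)]
    profile = []
    for start in range(0, len(flags) - window + 1, step):
        end = start + window
        profile.append((start, (prefix[end] - prefix[start]) / window))
    return profile

# Toy 30 kb track: one flagged position every 100 bp in the central 10 kb.
track = [0] * 30_000
for pos in range(10_000, 20_000, 100):
    track[pos] = 1

profile = sliding_window_frequency(track)
```

Because consecutive windows overlap by 9 kb, the resulting curve is smooth; a local peak in the profile indicates a region that accumulates substitutions or gaps faster than its surroundings.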
Figure 5. Complementary nucleotide composition bias along viral genomes. Each graph represents the G-C walk (black), A-T walk (blue) and CDS walk (khaki; y-axis not shown). Open red rectangles indicate the position of the MCP genes, with individual exons shown with shaded boxes below the x-axis. The x-axis units are base pairs. PV: Pacmanvirus A19; ASFV: Asfarvirus BA71V; KV: Kaumoebavirus Sc.
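The caption does not define the walks, so the sketch below assumes the common DNA-walk convention: the G-C walk adds 1 for each G and subtracts 1 for each C along the sequence, and the A-T walk does the same for A and T. The function name and the toy sequence are hypothetical, and the CDS walk is not reproduced here.

```python
from itertools import accumulate

def nucleotide_walk(sequence, plus, minus):
    """Cumulative walk: +1 for `plus`, -1 for `minus`, 0 for any other base."""
    steps = (
        1 if base == plus else -1 if base == minus else 0
        for base in sequence.upper()
    )
    return list(accumulate(steps))

# Toy sequence, for illustration only.
seq = "GGGCATTA"
gc_walk = nucleotide_walk(seq, "G", "C")  # [1, 2, 3, 2, 2, 2, 2, 2]
at_walk = nucleotide_walk(seq, "A", "T")  # [0, 0, 0, 0, 1, 0, -1, 0]
```

In genomes with a single replication origin, such walks typically show a single global extremum near the origin or terminus, which is the kind of signal the abstract reports as absent in FV genomes.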