Matthieu Legendre1, Jean-Marie Alempic1, Nadège Philippe1, Audrey Lartigue1, Sandra Jeudy1, Olivier Poirot1, Ngan Thi Ta1, Sébastien Nin1, Yohann Couté2, Chantal Abergel1, Jean-Michel Claverie1.
Abstract
With genomes of up to 2.7 Mb propagated in μm-long oblong particles and initially predicted to encode more than 2000 proteins, members of the Pandoraviridae family display the most extreme features of the known viral world. The mere existence of such giant viruses raises fundamental questions about their origin and the processes governing their evolution. A previous analysis of six newly available isolates, independently confirmed by a study including three others, established that the Pandoraviridae pan-genome is open, meaning that each new strain exhibits protein-coding genes not previously identified in other family members. With an average increment of about 60 proteins, the gene repertoire shows no sign of reaching a limit and remains largely coding for proteins without recognizable homologs in other viruses or cells (ORFans). To explain these results, we proposed that most new protein-coding genes were created de novo, from pre-existing non-coding regions of the G+C-rich pandoravirus genomes. The comparison of the gene content of a new isolate, pandoravirus celtis, closely related (96% identical genome) to the previously described p. quercus is now used to test this hypothesis by studying genomic changes in a microevolution range. Our results confirm that the differences between these two similar gene contents mostly consist of protein-coding genes without known homologs, with statistical signatures close to that of intergenic regions. These newborn proteins are under slight negative selection, perhaps to maintain stable folds and prevent protein aggregation pending the eventual emergence of fitness-increasing functions. Our study also unraveled several insertion events mediated by a transposase of the hAT family, 3 copies of which are found in p. celtis and are presumably active. Members of the Pandoraviridae are presently the first viruses known to encode this type of transposase.
Keywords: Acanthamoeba; comparative genomics; de novo gene creation; giant viruses; hAT transposase; soil viruses
Year: 2019 PMID: 30906288 PMCID: PMC6418002 DOI: 10.3389/fmicb.2019.00430
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Phylogenetic tree of known, fully characterized pandoraviruses. The tree was computed from the pandoraviruses’ core genes (see Materials and Methods for details). The estimated bootstrap values were all equal to 1 and are thus not reported. Internal nodes are labeled according to their depth in the tree.
P. celtis and p. quercus unique protein-coding genes.
| Gene # | Size (aa) | Most ancestral detection | Homolog in p. quercus | Predicted DNA binding | Median RNAseq read coverage |
|---|---|---|---|---|---|
| pclt_cds_11 | 69 | Since node 5 | None | Yes | 704 |
| pclt_cds_308 | 181 | Since node 2 | pqer_ncRNA_47 | No | 986 |
| pclt_cds_350 | 121 | Since node 9 | Intergenic | Yes | 57 |
| pclt_cds_376 | 145 | Since node 5 | 3′UTR pqer_cds_371 | Yes | 454 |
| pclt_cds_725 | 104 | None | None | Yes | 46 |
| pclt_cds_870 | 205 | Since node 2 | None | Yes | 2,027 |
| pclt_cds_995 | 149 | Since node 5 | 5′UTR pqer_cds_981 | Yes | 13 |
| pclt_cds_1081 | 125 | Since node 8 | 5′UTR pqer_cds_1061 | Yes | 12,090 |
| pclt_cds_1084 | 114 | Since node 8 | Intergenic | Yes | 130 |
| pqer_cds_6 | 76 | Since node 8 | None | Yes | 311 |
| pqer_cds_13 | 117 | Since node 5 | Intergenic | Yes | 99 |
| pqer_cds_17 | 82 | Since node 2 | Intergenic | Yes | 482 |
| pqer_cds_53 | 85 | Since node 8 | Intergenic | Yes | 48 |
| pqer_cds_143 | 93 | Since node 8 | Anti 5′UTR pclt_cds_146 | Yes | 177 |
| pqer_cds_151 | 71 | Since node 9 | Intergenic | Yes | 21 |
| pqer_cds_203 | 101 | Since node 8 | Antisense pclt_cds_206 | Yes | 528 |
| pqer_cds_350 | 94 | None | None | No | 160 |
| pqer_cds_474 | 74 | Since node 9 | Intergenic | Yes | 114 |
| pqer_cds_486 | 146 | Since node 2 | 3′UTR pclt_cds_499 | Yes | 71 |
| pqer_cds_665 | 114 | None | None | Yes | 1,955 |
| pqer_cds_673 | 383 | None | None | Yes | 656 |
| pqer_cds_685 | 124 | Since node 8 | Alternative frame pclt_cds_685 | No | 117 |
| pqer_cds_736 | 121 | Since node 9 | Intergenic | Yes | 70 |
| pqer_cds_875 | 136 | None | None | No | 2,050 |
| pqer_cds_876 | 74 | Since node 2 | None | No | 1,695 |
| pqer_cds_877 | 78 | None | None | Yes | 320 |
| pqer_cds_878 | 224 | None | None | Yes | 231 |
| pqer_cds_1061 | 152 | Since node 8 | Alternative frame pclt_cds_1081 | Yes | 14,186 |
| pqer_cds_1178 | 84 | Since node 9 | Anti 5′UTR pclt_cds_1203 | Yes | 170 |
| pqer_cds_1183 | 158 | Since node 2 | None | Yes | 198 |
FIGURE 2. Dot plot (nucleotide) comparison of the p. celtis vs. p. quercus genomes. Computations and display were generated using Gepard (Krumsiek et al., 2007). Specific regions, labeled P, T, S0, S1, and S2, are described in the main text.
FIGURE 3. Pandoraviridae pan-genome and core-genome. The boxplot represents the number of protein clusters as a function of the number of sequenced genomes in all possible combinations. The whiskers correspond to the extreme data points.
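The counting behind a pan-genome/core-genome plot of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each genome has already been reduced to a set of protein-cluster identifiers, and the function name `pan_core_sizes` and the toy genomes `A`, `B`, `C` are hypothetical.

```python
from itertools import combinations


def pan_core_sizes(presence):
    """Return {k: [(pan_size, core_size), ...]} over all k-genome combinations.

    `presence` maps a genome name to the set of protein-cluster IDs it contains
    (hypothetical input format, assumed for this sketch).
    """
    genomes = sorted(presence)
    sizes_by_k = {}
    for k in range(1, len(genomes) + 1):
        sizes = []
        for combo in combinations(genomes, k):
            cluster_sets = [presence[g] for g in combo]
            pan = set().union(*cluster_sets)        # clusters seen in at least one genome
            core = set.intersection(*cluster_sets)  # clusters shared by every genome
            sizes.append((len(pan), len(core)))
        sizes_by_k[k] = sizes
    return sizes_by_k


# Toy data, invented for illustration only.
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 2, 6}}
result = pan_core_sizes(toy)
for k, sizes in result.items():
    print(k, sizes)
```

For each value of k, the list of pan sizes and the list of core sizes are the samples a boxplot would summarize. An open pan-genome shows up as pan sizes that keep growing with k while core sizes level off.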
FIGURE 4. Estimated selection pressure acting on pandoravirus genes as a function of their ancestry. Shown is the mean of dN/dS ratios (filled bars) for genes that are unique to the pandoravirus strains below a given node in their phylogenetic tree (see Figure 1). Error bars correspond to standard deviations. Bars are colored according to the depth of the node in the tree. As a control, we calculated the dN/dS of pandoraviruses’ core genes (empty bars). For a given node, we considered pairs of orthologous genes whose common ancestor corresponds to that node.
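The per-node summary plotted in such a figure (mean dN/dS with a standard deviation as error bar) can be sketched as below. This is a rough illustration under stated assumptions: the dN/dS values are taken as already computed for each orthologous pair, the sample standard deviation is used (the source does not say which estimator was applied), and the function name `summarize_by_node` and the input values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_node(records):
    """Group (node, dN/dS) pairs by node and return {node: (mean, std_dev)}.

    Uses the sample standard deviation; a node with a single value gets 0.0.
    """
    by_node = defaultdict(list)
    for node, dnds in records:
        by_node[node].append(dnds)
    return {
        node: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for node, values in sorted(by_node.items())
    }


# Invented values for illustration only; they are not results from the study.
records = [(8, 0.2), (8, 0.4), (9, 0.5)]
summary = summarize_by_node(records)
for node, (avg, sd) in summary.items():
    print(f"node {node}: mean dN/dS = {avg:.3f}, sd = {sd:.3f}")
```

A mean dN/dS below 1 for a group of genes is the usual signature of negative (purifying) selection, which is how the abstract's "slight negative selection" would be read off such a summary.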