Vikas Sharma, Pierre Lefeuvre, Philippe Roumagnac, Denis Filloux, Pierre-Yves Teycheney, Darren P Martin, Florian Maumus.
Abstract
The family Geminiviridae contains viruses with single-stranded DNA genomes that have been found infecting a wide variety of angiosperm species. The discovery within the last 25 years of endogenous geminivirus-like (EGV) elements within the nuclear genomes of several angiosperms has raised questions relating to the pervasiveness of EGVs and their impacts on host biology. Only a few EGVs have been characterized to date, and it remains unclear whether any of these have influenced, or are currently influencing, the evolutionary fitness of their hosts. We therefore undertook a large-scale search for evidence of EGVs within 134 genome and 797 transcriptome sequences of green plant species. We detected homologues of geminivirus replication-associated protein (Rep) genes in forty-two angiosperm species, including two monocots, thirty-nine dicots, and one ANITA-grade basal angiosperm species (Amborella trichopoda). While EGVs were present in members of many different plant orders, they were particularly common within the large and diverse order Ericales, with the highest copy numbers of EGVs being found in two varieties of tea plant (Camellia sinensis). Phylogenetic and clustering analyses revealed multiple highly divergent, previously unknown geminivirus Rep lineages, two of which occur in C. sinensis alone. We find that some of the Camellia EGVs are likely transcriptionally active, sometimes co-transcribed with the same host genes across several Camellia species. Overall, our analyses expand the known breadths of both geminivirus diversity and geminivirus host ranges, and strengthen support for the hypothesis that EGVs impact the biology of their hosts.
Keywords: Geminiviridae; Viridiplantae; endogenous virus; genomes; paleovirology; phylogeny; transcriptomes
Year: 2020 PMID: 33391820 PMCID: PMC7758297 DOI: 10.1093/ve/veaa071
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. Distribution of predicted core geminivirus Pfam domains within the 5 kb flanking regions of the identified EGVs.
Figure 2. Distribution of core geminivirus and non-geminivirus Pfam domains across the plant transcriptomes around the identified EGVs.
Figure 3. Phylogenetic placement of EGV-encoded Rep proteins. Maximum-likelihood phylogenetic tree constructed using EGV Rep protein sequences retrieved from plant genome sequences (Gen; thirty-six sequences) and transcriptome datasets (Tr; five sequences) and from representatives of the geminivirus genus (fifty-five sequences). Distant homologues (four sequences from genomoviruses and phytoplasma) were used as outgroups. Sequences are color-coded according to virus genera, EGV, genomovirus, or phytoplasma origin. Bootstrap values are shown on the nodes of the phylogenetic tree. A scale of substitution rates is provided at the bottom of the tree.
Figure 4. (A) Phylogenetic tree of the rep catalytic domain of the sequences discovered within the genomes and transcriptomes of plants in the Ericales order along with representative geminivirus sequences, with detailed views focusing on (B) integration event numbers 3, 4, and 5, (C) integration event number 6, and (D) integration event number 8. All trees are drawn at the same scale, according to the scale bar of panel A. Phylogenetic tree branches are coloured according to the plant within which the rep sequence was discovered (see colour key on bottom right). Branches from Camellia oleifera, Camellia reticulata, Camellia sasanqua, and Camellia taliensis have been pooled under ‘other Camellia species’. Geminivirus reference sequences are coloured in black. Integration numbers are indicated with circled numbers (see Supplementary Table S10). (E) The left panel shows a cladogram of the Ericales species constructed on the basis of previous work (Rose et al. 2018). Black numbers indicate divergence time estimates (collected from Li et al. 2013 and Rose et al. 2018) and red numbers indicate integration events. The right panel shows the number of identified viral-like Rep genomic loci and related protein information in Ericales.
Figure 5. Gene synteny comparison between transcripts from different Camellia species. Vertical grey lines and blocks show the similarity between the transcript sequences based on tBLASTx. Genes encoded by the Camellia transcripts are displayed using arrows (hypothetical protein; CP: coat protein; BC1: movement protein; C1: C2: replication-associated protein).
Figure 6. Violin plot representing the distributions of average cytosine methylation (CHH, CHG, and CG) levels of protein-coding genes, EGV Rep genomic loci, and repeated genes in the Camellia sinensis var. assamica genome.