Sunlu Chen1,2, Nozomi Saito1, Jaymee R Encabo1,3,4, Kanae Yamada1, Il-Ryong Choi3, Yuji Kishima1.
Abstract
Endogenous viral sequences in eukaryotic genomes, such as those derived from plant pararetroviruses (PRVs), can serve as genomic fossils to study viral macroevolution. Many aspects of viral evolutionary rates are heterogeneous, including substitution rate differences between genes. However, the evolutionary dynamics of this viral gene rate heterogeneity (GRH) have rarely been examined. Characterizing such GRH may help to elucidate viral adaptive evolution. In this study, based on robust phylogenetic analysis, we identified an ancient endogenous PRV group in Oryza genomes that is estimated to be 2.41-15.00 Myr old. We subsequently used this ancient endogenous PRV group and three younger groups to estimate the GRH of PRVs. Long-term substitution rates for the most conserved gene and a divergent gene were 2.69 × 10⁻⁸ to 8.07 × 10⁻⁸ and 4.72 × 10⁻⁸ to 1.42 × 10⁻⁷ substitutions/site/year, respectively. On the basis of a direct comparison, a long-term GRH of 1.83-fold was identified between these two genes, which is unexpectedly low and lower than the short-term GRH (>3.40-fold) of PRVs calculated using published data. The lower long-term GRH of PRVs was due to the slightly faster rate decay of divergent genes than of conserved genes during evolution. To the best of our knowledge, we quantified for the first time the long-term GRH of viral genes using paleovirological analyses, and proposed that the GRH of PRVs might vary with the time scale of measurement (time-dependent GRH). Our findings provide special insights into viral gene macroevolution and should encourage a more detailed examination of the viral GRH.
Year: 2018 PMID: 30239708 PMCID: PMC6179347 DOI: 10.1093/gbe/evy207
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Summary of eRTBVL-D Segments in the Oryza sativa Genome
| ID | Previous name | Chr (rice genome) | Start (rice genome) | End (rice genome) | Strand | Start (viral genome) | End (viral genome) | Length (bp) | Mutations | Distance to left neighbor gene | Distance to right neighbor gene |
|---|---|---|---|---|---|---|---|---|---|---|---|
| d1 | JaE1-4 | 1 | 5,193,920 | 5,194,407 | + | 6,880 | 7,353 | 488 | NA | 43,579 | 7,455 |
| d2 | JaE1-4 | 1 | 5,194,744 | 5,195,008 | + | 108 | 359 | 265 | 0 | 44,403 | 6,854 |
| d3 | JaE1-4 | 1 | 5,195,124 | 5,198,055 | + | 4,259 | 7,180 | 2,932 | 9 | 44,783 | 3,807 |
| d4 | NA | 2 | 12,120,385 | 12,120,601 | + | 6,540 | 6,758 | 217 | NA | 5,419 | 7,312 |
| d5 | NA | 2 | 12,120,644 | 12,121,245 | − | 6,539 | 7,184 | 602 | NA | 5,678 | 6,668 |
| d6 | NA | 4 | 18,488,307 | 18,488,450 | + | 6,873 | 7,021 | 144 | NA | 8,092 | 1,903 |
| d7 | NA | 4 | 18,488,544 | 18,488,674 | − | 6,852 | 6,984 | 131 | NA | 8,329 | 1,679 |
| d8 | NA | 4 | 21,097,520 | 21,097,778 | − | 6,952 | 7,196 | 259 | NA | 86,783 | 85,199 |
| d9 | NA | 5 | 6,506,720 | 6,507,050 | − | 6,953 | 7,278 | 331 | NA | 8,332 | 7,696 |
| d10 | JaE7-4 | 7 | 8,906,209 | 8,907,980 | + | 5,547 | 7,343 | 1,772 | 5 | 27,304 | 18,154 |
| d11 | JaE7-5/7-6 | 7 | 8,909,007 | 8,920,257 | − | 4,817 | 7,341 | 11,251 | 9 | 30,102 | 5,877 |
| d12 | NA | 9 | 16,726,920 | 16,727,025 | + | 32 | 137 | 106 | 0 | 5,997 | 1,153 |
| d13 | NA | 10 | 9,099,571 | 9,099,688 | − | 2,584 | 2,696 | 118 | 1 | 5,282 | 2,080 |
| d14 | JaE11-2 | 11 | 5,069,912 | 5,072,964 | + | 3,580 | 6,416 | 3,053 | 19 | 5,378 | 243 |
| d15 | JaE11-5 | 11 | 11,654,495 | 11,657,114 | − | 3,806 | 6,413 | 2,620 | 15 | 2,608 | 4,216 |
Previous names of six segments (JaE1-4, 7-4, 7-5, 7-6, 11-2, and 11-5) are from Liu et al. (2012).
Segments were mapped to the reconstructed viral genomes.
Single nucleotide polymorphisms and insertion/deletion-induced nonsense mutations and frameshift mutations.
NA, not available for intergenic regions; Chr, chromosome.
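The Length (bp) values in the table appear to be inclusive interval lengths over the rice-genome Start and End coordinates (End − Start + 1). A minimal sketch that checks this for a few rows copied from the table; the tuple layout and the function name `inclusive_length` are illustrative and not taken from the paper.

```python
# Rows copied from the table above:
# (ID, rice-genome start, rice-genome end, reported length in bp).
ROWS = [
    ("d1", 5_193_920, 5_194_407, 488),
    ("d3", 5_195_124, 5_198_055, 2_932),
    ("d11", 8_909_007, 8_920_257, 11_251),
    ("d15", 11_654_495, 11_657_114, 2_620),
]


def inclusive_length(start: int, end: int) -> int:
    """Length of an inclusive coordinate interval [start, end]."""
    return end - start + 1


# Collect IDs whose reported length disagrees with End - Start + 1.
mismatches = [
    seg_id
    for seg_id, start, end, length in ROWS
    if inclusive_length(start, end) != length
]
print(mismatches)  # prints [] for the rows listed here
```

The same check can be extended to every row when the table is loaded from a file.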
Fig. 1.—Viral genomic structure of eRTBVL-D segments in the Oryza sativa genome (A) and ortholog presence/absence in the genus Oryza (B). In panel (A), the circular viral genome is displayed at the top with open reading frames (ORFs) represented with arrows and functional domains (genes) outlined in different colors. Intergenic regions (IGRs) are represented as black curved lines. Black dots and diamonds indicate primer-binding sites and polypurine tracts, respectively. The eRTBVL-D segments were mapped to the linear viral genome, where ORFs are indicated by rectangles with arrows, and IGRs are represented by thick black lines. Domains/ORFs (genes) examined in detail in this study are highlighted, and segments examined in detail are indicated by red IDs. Two large insertions in the d11 segment are indicated by inverted triangles. In panel (B), the known phylogeny of the genus Oryza (Zheng and Ge 2010; Huang et al. 2012; Stein et al. 2018) is presented at the top. Branches corresponding to Oryza AA-, BB-, and FF-genome groups are depicted in different colors, and their corresponding divergence times (millions of years) are labeled. The table summarizes the pattern of ortholog presence/absence of each eRTBVL-D segment. Green, white, and gray indicate presence, absence, and unclear results, respectively, with yellow symbolizing loss due to deletion. Detailed results are provided in supplementary table S2, Supplementary Material online.
Fig. 2.—Long-term substitution rates of plant pararetroviruses (PRVs) estimated with eRTBVL-D. (A) Strategy used to estimate genetic distances for substitution rate calculations. The distance between an eRTBVL-D sequence in the rice genome and viral sequences of eRTBVL-A/-B/-C (uncorrected distance) minus the distance accumulated in the rice genome from 2.41 to 6.76 Myr for eRTBVL-D was considered to approximate the distance between the viruses of eRTBVL-D and eRTBVL-A/-B/-C from 2.25 to 6.75 Myr (corrected distance). Each element is represented by a different color. The (unknown) amount of time required for a viral sequence to be endogenized in a host population was ignored because it was ∼0 relative to a million years of macroevolution. Therefore, the divergence time between the viral sequences of the studied eRTBVL-D and eRTBVL-A/-B/-C segments was approximated as the difference between the ages of eRTBVL-D and eRTBVL-A/-B/-C sequences. (B) Long-term substitution rates of PRVs calculated using corrected distances. (C) Time-dependent rate phenomenon of PRVs. The plot presents the relationship between substitution rates (substitutions/site/year) and the corresponding measurement time scales (years). The log10-transformed values underwent a linear regression analysis (red line), and the resulting equation is displayed. The data are from previous studies (short-term; green dots) (Yasaka et al. 2014; Guimarães et al. 2015) and from this study (long-term; blue dots). One value from Yasaka et al. (2014) that was calculated only from divergent gene regions was not included.
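The strategy in panel (A) and the regression in panel (C) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the caption does not give the exact rate formula, so the sketch assumes rate = corrected distance / divergence time; the function names are hypothetical; and the distance, host-genome rate, and regression points are placeholders, not values from the study.

```python
import math


def corrected_distance(uncorrected, host_rate, host_time_years):
    """Subtract the distance accumulated in the host genome after endogenization."""
    return uncorrected - host_rate * host_time_years


def substitution_rate(distance, divergence_time_years):
    """Substitutions/site/year, ASSUMING rate = distance / divergence time."""
    return distance / divergence_time_years


def loglog_fit(times, rates):
    """Least-squares line through (log10 time, log10 rate); returns (slope, intercept)."""
    xs = [math.log10(t) for t in times]
    ys = [math.log10(r) for r in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder inputs (NOT from the study): 0.25 uncorrected distance,
# host rate of 1.0e-8 substitutions/site/year over 5 Myr in the host genome.
d = corrected_distance(uncorrected=0.25, host_rate=1.0e-8, host_time_years=5.0e6)

# Divergence-time bounds taken from the caption: 2.25 to 6.75 Myr.
rate_fast = substitution_rate(d, 2.25e6)
rate_slow = substitution_rate(d, 6.75e6)

# Synthetic points lying exactly on a line in log-log space (slope -0.5).
slope, intercept = loglog_fit([1e1, 1e3, 1e5], [1e-3, 1e-4, 1e-5])
```

Under this assumed formula, a fixed distance combined with the 2.25 to 6.75 Myr bounds yields rates that differ by a factor of 3, which agrees with the ratio between the upper and lower bounds of each rate range reported in the abstract (8.07/2.69 ≈ 3.00 and 14.2/4.72 ≈ 3.01).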
Fig. 3.—Long-term GRH between the conserved RT/RH and divergent ORFz genes of PRVs. (A) Long-term substitution rates of the RT/RH and ORFz genes of PRVs. Substitution rates were calculated using corrected distances. NA, not available. (B) Quantification of the long-term GRH between these two genes. GRH values (fold difference) are displayed on the plots, with green dotted lines indicating the averages. (C) Comparison between the rate decay speed of the RT/RH and ORFz genes of PRVs. The plot presents the relationship between gene substitution rates (substitutions/site/year) and the corresponding measurement time scales (years). The log10-transformed values underwent a linear regression analysis (orange dots and line for the RT/RH gene, and blue dots and line for the ORFz gene), and the resulting equations are displayed. The short-term data are from a previous study (Yasaka et al. 2014). Note that the short-term substitution rate for the RT/RH gene in the analysis is actually an average value for ORFs I–V of CaMV (the RT/RH gene is located in ORF V), thus the slope for the RT/RH gene is >−0.73.
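To make the link between panels (B) and (C) concrete, the sketch below treats GRH as the fold difference between two gene rates and shows how two log-log regression lines imply a GRH that changes with the time scale. It assumes that the "most conserved" and "divergent" genes in the abstract correspond to RT/RH and ORFz; the function names are hypothetical, and the regression parameters are placeholders rather than the fitted values from the figure.

```python
import math


def grh_fold(rate_divergent, rate_conserved):
    """GRH expressed as a fold difference between two gene substitution rates."""
    return rate_divergent / rate_conserved


# Rate bounds reported in the abstract (substitutions/site/year).
RT_RH_BOUNDS = (2.69e-8, 8.07e-8)  # conserved gene (assumed to be RT/RH)
ORFZ_BOUNDS = (4.72e-8, 1.42e-7)   # divergent gene (assumed to be ORFz)
bound_folds = [grh_fold(z, r) for z, r in zip(ORFZ_BOUNDS, RT_RH_BOUNDS)]


def grh_at(time_years, fit_divergent, fit_conserved):
    """GRH implied by two log-log fits, each given as (slope, intercept).

    If log10(rate) = intercept + slope * log10(t) for each gene, then
    log10(GRH) = (i_div - i_con) + (s_div - s_con) * log10(t).
    """
    s_div, i_div = fit_divergent
    s_con, i_con = fit_conserved
    return 10 ** ((i_div - i_con) + (s_div - s_con) * math.log10(time_years))


# Placeholder fits (NOT from the study): the divergent gene decays slightly faster.
FIT_DIVERGENT = (-0.75, -2.0)
FIT_CONSERVED = (-0.70, -2.5)
short_term = grh_at(1e1, FIT_DIVERGENT, FIT_CONSERVED)
long_term = grh_at(1e6, FIT_DIVERGENT, FIT_CONSERVED)
```

The ratios of the summary bounds come out at roughly 1.75 to 1.76, close to but not identical to the 1.83-fold value from the direct comparison, presumably because panel (B) averages individual comparisons rather than the range bounds. With the placeholder fits, the steeper slope of the divergent gene makes the implied GRH shrink as the time scale grows, which is the qualitative pattern the caption describes.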