Julie Dazenière, Alexandros Bousios, Adam Eyre-Walker.
Abstract
Transposable elements are a major component of most eukaryotic genomes. Here, we present a new approach that allows us to study patterns of natural selection in the evolution of transposable elements over short time scales. The method uses the alignment of all elements with intact gag/pol genes of a transposable element family from a single genome. We predict that the ratio of nonsynonymous to synonymous variants in the alignment should decrease as a function of the frequency of the variants, because elements with nonsynonymous variants that reduce transposition will have fewer progeny. We apply our method to Sirevirus long-terminal repeat retrotransposons, which are abundant in maize and other plant species, and show that the ratio of nonsynonymous to synonymous variants declines as variant frequency increases, indicating that negative selection is acting strongly on the Sirevirus genome. The asymptotic value of this ratio suggests that at least 85% of all nonsynonymous mutations in the transposable element reduce transposition. Crucially, these patterns in the ratio are only predicted to occur if the gene products from a particular transposable element insertion preferentially promote the transposition of the same insertion. Overall, by using large numbers of intact elements, this study sheds new light on the selective processes that act on transposable elements.
Keywords: adaptive evolution; maize; plants; purifying selection; transposable elements
Year: 2022 PMID: 35262706 PMCID: PMC9073684 DOI: 10.1093/g3journal/jkac056
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Plant species and Sirevirus families were included in this study.
| Species | Family | Full-length elements | Intact Sireviruses |
|---|---|---|---|
| | | 10,788 | 2,445 |
| | | 10,563 | 2,345 |
| | | 374 | 140 |
| | | 504 | 139 |
| | | 279 | 40 |
| | Family 1 | 92 | 22 |
| | Family 2 | 457 | 58 |
| | Family 1 | 403 | 64 |
| | Family 2 | 842 | 404 |
| | Family 1 | 1,360 | 60 |
| | Family 1 | 263 | 134 |
| | Family 1 | 31 | 13 |
| | Family 2 | 71 | 7 |
| | Family 3 | 56 | 12 |
| | Family 4 | 243 | 44 |
| | Family 1 | 70 | 11 |
| | Family 2 | 62 | 5 |
| | Family 3 | 200 | 33 |
| | Family 4 | 213 | 32 |
In maize, known TE exemplars were used to assign each element to a known family (see Materials and Methods). Simple names (e.g. Family 1, Family 2) were used for species with no exemplars.
Examples of how synonymous and nonsynonymous variants are counted.
| Sequence | Codon 1 | Codon 2 | Codon 3 | Codon 4 | Codon 5 |
|---|---|---|---|---|---|
| 1 | TTT | TTT | TTT | TTT | TTT |
| 2 | TTT | TTT | TTC | TTC | TTC |
| 3 | TTC | CTT | CTT | AGA | CTT |
| … | … | … | … | … | … |
| 10 | TTC | CTT | CTT | AGG | CTT |
| 11 | TTC | CTT | CTT | GCT | AAA |
| 12 | TTC | CTT | CTT | ACT | AAA |
| … | … | … | … | … | … |
| 20 | TTC | CTT | CTT | TTT | AAG |
| Synonymous variant count | 1 | 0 | 1 | 0 | 2 |
| Nonsynonymous variant count | 0 | 1 | 1 | 0 | 1 |
| Notes | | | | No codon set has 10 instances | Multiple codon sets included |
Fig. 1. The value of vN/vS as a function of the frequency of the variants in the alignment for the 5 families of Sireviruses in maize. Ji and Opie are sampled to 130 sequences each, while Giepum, Hopie, and Jienv are the full datasets. P0.015625 refers to the frequency category 0 < x ≤ 2^−6, P0.03125 to 2^−6 < x ≤ 2^−5, and so on.
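The frequency categories named in this caption double in width at each step, so a variant frequency can be mapped to its category label with a short loop. The following is a minimal sketch, not the authors' code: the function name `frequency_label` is hypothetical, and the upper limit of 1/2 is an assumption taken from the highest category listed in the chi-square table.

```python
def frequency_label(freq, min_exponent=6):
    """Return the label of the power-of-two frequency category containing freq.

    Categories are (0, 2^-6], (2^-6, 2^-5], ..., (2^-2, 2^-1], each named
    after its upper bound (e.g. "P0.015625"). The 0.5 ceiling is an
    assumption based on the categories listed in the tables.
    """
    if not 0 < freq <= 0.5:
        raise ValueError("frequency must satisfy 0 < freq <= 0.5")
    upper = 0.5
    lowest = 2.0 ** -min_exponent
    # Halve the upper bound while freq still fits in the next category down.
    while upper > lowest and freq <= upper / 2:
        upper /= 2
    return f"P{upper}"


if __name__ == "__main__":
    for f in (0.01, 0.02, 0.25, 0.3):
        print(f, frequency_label(f))
```

A frequency exactly on a boundary (for example 0.25) falls in the category whose upper bound it equals, matching the "< x ≤" convention of the caption.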
Spearman correlation between vN/vS and variant frequency for each family and each gene.
| Group | | Spearman's correlation coefficient | P-value |
|---|---|---|---|
| Families | | −1 | 0.003 |
| | | −0.829 | 0.058 |
| | | −0.6 | 0.208 |
| | | −0.771 | 0.072 |
| | | −0.7 | 0.188 |
| | Combined | | <0.001 |
| Genes | | −0.829 | 0.058 |
| | | −0.771 | 0.103 |
| | | −0.771 | 0.072 |
| | | −0.714 | 0.111 |
| | | −0.943 | 0.005 |
| | Combined | | <0.001 |
Note the correlation is calculated across frequency categories.
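Because the correlation is taken across only a handful of frequency categories, it can be sketched in a few lines of plain Python. The example below is illustrative only: the vN/vS values are made up, ties are assumed absent, and the exact permutation P-value is an assumption, since the source does not say which significance test was used.

```python
from itertools import permutations


def ranks(values):
    """Rank values from 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rho_from_ranks(rx, ry):
    """Spearman's rho from two rank vectors without ties."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def spearman_rho(x, y):
    return rho_from_ranks(ranks(x), ranks(y))


def exact_two_sided_p(x, y):
    """Share of all rank permutations with |rho| at least as large as observed."""
    rx = ranks(x)
    observed = abs(spearman_rho(x, y))
    hits = total = 0
    for perm in permutations(range(1, len(x) + 1)):
        total += 1
        if abs(rho_from_ranks(rx, perm)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Upper bounds of the six frequency categories, and hypothetical vN/vS values.
category_upper = [1 / 64, 1 / 32, 1 / 16, 1 / 8, 1 / 4, 1 / 2]
vn_vs = [0.60, 0.45, 0.35, 0.28, 0.22, 0.18]  # illustrative, not from the paper

if __name__ == "__main__":
    print(spearman_rho(category_upper, vn_vs))
    print(exact_two_sided_p(category_upper, vn_vs))
```

With six categories there are 720 possible rank orderings, so a strictly decreasing series gives a coefficient of −1 and an exact two-sided P-value of 2/720.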
Testing whether vN/vS differs between families and genes for each frequency category.
| Analysis | Frequency category | Chi-square | df | P-value |
|---|---|---|---|---|
| Between genes | 0 < x ≤ 1/64 | 12.09 | 4 | 0.02 |
| | 1/64 < x ≤ 1/32 | 8.69 | 4 | 0.07 |
| | 1/32 < x ≤ 1/16 | 10.65 | 4 | 0.03 |
| | 1/16 < x ≤ 1/8 | 15.2 | 4 | <0.01 |
| | 1/8 < x ≤ 1/4 | 9.89 | 4 | 0.04 |
| | 1/4 < x ≤ 1/2 | 11.09 | 4 | 0.03 |
| | Total | 67.61 | 24 | <0.00001 |
| Between families | 0 < x ≤ 1/64 | 7.99 | 3 | 0.05 |
| | 1/64 < x ≤ 1/32 | 7.06 | 4 | 0.13 |
| | 1/32 < x ≤ 1/16 | 9.52 | 4 | 0.05 |
| | 1/16 < x ≤ 1/8 | 5.9 | 4 | 0.21 |
| | 1/8 < x ≤ 1/4 | 3.29 | 4 | 0.51 |
| | 1/4 < x ≤ 1/2 | 2.91 | 4 | 0.57 |
| | Total | 36.67 | 23 | 0.035 |
The chi-square value is given along with the degrees of freedom (df) and the P-value. Note that in the family analysis there are only 3 df in the lowest frequency class, because Jienv has only 40 intact elements and hence no variants in that class.
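The Total rows are consistent with adding the chi-square values and the degrees of freedom across frequency categories. The sketch below checks this for the between-genes analysis, assuming the P-values come from the standard chi-square distribution. It uses a closed-form survival function that is valid only for even degrees of freedom, so the between-families rows (3 and 23 df) are not covered. The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Between-genes chi-square values from the table, one per frequency category.
between_genes = [12.09, 8.69, 10.65, 15.2, 9.89, 11.09]
df_per_category = 4

per_category_p = [chi2_sf_even_df(x, df_per_category) for x in between_genes]
total_chi2 = sum(between_genes)
total_df = df_per_category * len(between_genes)
total_p = chi2_sf_even_df(total_chi2, total_df)

if __name__ == "__main__":
    print([round(p, 2) for p in per_category_p])
    print(round(total_chi2, 2), total_df, total_p)
```

Under these assumptions the category with a chi-square of 15.2 has a P-value of about 0.004, which rounds to zero at two decimal places.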
Fig. 2. The distribution of relative ages across TE copies from each family. Histogram plots show the distribution of divergence (numbers of point mutations and indels per site) between the LTRs of each element. The green line represents the median within each family. Note that the x-axis is on a log10 scale.
Fig. 3. a) The value of vN/vS as a function of the frequency of the variants in the alignment for the 5 genes in the Sirevirus element, with the families combined. b) The length variation of the 5 maize families for each gene.
Fig. 4. The value of vN/vS as a function of the frequency of the variants in the alignment for Sirevirus families in various plant species.