Gabriel E Rech1, María Bogaerts-Márquez1, Maite G Barrón1, Miriam Merenciano1, José Luis Villanueva-Cañas1, Vivien Horváth1, Anna-Sophie Fiston-Lavier2, Isabelle Luyten3, Sandeep Venkataram4, Hadi Quesneville3, Dmitri A Petrov4, Josefa González1.
Abstract
Most of the current knowledge on the genetic basis of adaptive evolution is based on the analysis of single nucleotide polymorphisms (SNPs). Despite increasing evidence for their causal role, the contribution of structural variants to adaptive evolution remains largely unexplored. In this work, we analyzed the population frequencies of 1,615 transposable element (TE) insertions annotated in the reference genome of Drosophila melanogaster in 91 samples from 60 worldwide natural populations. We identified a set of 300 polymorphic TEs that are present at high population frequencies and located in genomic regions with high recombination rates, where the efficiency of natural selection is high. The age and the length of these 300 TEs are consistent with relatively young and long insertions reaching high frequencies due to the action of positive selection. In addition, we identified a set of 21 fixed TEs that are also likely to be adaptive. Indeed, we and others found evidence of selection for 84 of these reference TE insertions. The analysis of the genes located near these 84 candidate adaptive insertions suggested that the functional response to selection is related to the GO categories of response to stimulus, behavior, and development. We further showed that a subset of the candidate adaptive TEs affects the expression of nearby genes, and five of them have already been linked to an ecologically relevant phenotypic effect. Our results provide a more complete understanding of the genetic variation and the fitness-related traits relevant for adaptive evolution. Similar studies should help uncover the importance of TE-induced adaptive mutations in other species as well.
Year: 2019 PMID: 30753202 PMCID: PMC6372155 DOI: 10.1371/journal.pgen.1007900
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Worldwide distribution of D. melanogaster populations used in this study.
Location of the 39 European, 14 North American, five Australian, one Asian, and one African population analyzed in this work. Note that the locations of some populations overlap on the map. For more details, see S1 Table. Colors indicate the five major Köppen climate zones [114].
Fig 2. Workflow showing the main steps applied for identifying TEs present at high frequencies in high recombination regions in the D. melanogaster genome.
LRR: TEs located in low recombination rate regions. HRR: TEs located in high recombination rate regions. Fixed: HRR TEs at frequencies > 95% in all populations. LowFreq: low frequency HRR TEs (frequencies < 10% in all samples). HighFreq: high frequency HRR TEs (frequencies < 95% in all samples and > 10% in at least three samples). HighFreq TEs were further classified according to their frequency in African (AF) and/or out-of-Africa (OOA) populations. AF: TEs at high frequency only in the African population. AF-OOA: TEs at high frequency in both African and out-of-Africa populations. OOA: TEs at high frequency in out-of-Africa populations and at low frequency in the African population. NA-AF: TEs present at high frequency in out-of-Africa populations but for which we have no data for the African population.
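The frequency thresholds in this legend can be read as a small decision rule. The following is a minimal sketch of that rule only, not the pipeline used in the study: the function name `classify_hrr_te`, the representation of frequencies as fractions between 0 and 1, and the choice to return `None` for TEs that match no category are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def classify_hrr_te(freqs: Sequence[float]) -> Optional[str]:
    """Assign an HRR TE to a frequency group from its per-sample frequencies.

    `freqs` holds one frequency per sample, expressed as a fraction (0 to 1).
    Thresholds follow the Fig 2 legend; everything else is hypothetical.
    """
    if not freqs:
        return None  # no data: leave unclassified (assumption)
    # Fixed: frequency > 95% in every sample
    if all(f > 0.95 for f in freqs):
        return "Fixed"
    # LowFreq: frequency < 10% in every sample
    if all(f < 0.10 for f in freqs):
        return "LowFreq"
    # HighFreq: < 95% in every sample and > 10% in at least three samples
    if all(f < 0.95 for f in freqs) and sum(f > 0.10 for f in freqs) >= 3:
        return "HighFreq"
    return None  # matches none of the three definitions
```

For example, a TE observed at frequencies 0.30, 0.50, 0.20 and 0.01 would fall in the HighFreq group under this reading, whereas one above 10% in a single sample only would remain unclassified.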
Fig 3. TE age of the different frequency groups.
A) Top: Boxplots showing the distribution of TE age (terminal branch length) values for each of the categories. Bottom: Zoomed-in version of the boxed area showing the lowest values of the TE age distribution. B) Proportion of young (age < 0.01) and old (age ≥ 0.01) TEs in each category. * p-value < 0.05, *** p-value < 0.001 from Chi-square test.
Age distribution of TEs belonging to the different population frequency categories.
| TE category | | | Old | Young | P-value | Enrichment |
|---|---|---|---|---|---|---|
| All | | | 608 (48%) | 661 (52%) | - | - |
| LRR | | | 480 (71%) | 198 (29%) | 2.20e-16 | Old TEs |
| HRR | All | | 128 (22%) | 463 (78%) | - | - |
| | Fixed | | 100 (76%) | 31 (24%) | 2.33e-14 | Old TEs |
| | LowFreq | | 0 (0%) | 177 (100%) | 2.20e-16 | Young TEs |
| | HighFreq | All | 28 (10%) | 255 (90%) | 2.20e-16 | Young TEs |
| | | NA-AF | 0 (0%) | 8 (100%) | 6.22e-03 | Young TEs |
| | | AF | 0 (0%) | 7 (100%) | 6.22e-03 | Young TEs |
| | | AF-OOA | 19 (19%) | 82 (81%) | 4.63e-12 | Young TEs |
| | | OOA | 9 (5%) | 158 (95%) | 2.20e-16 | Young TEs |
*P-values are from Chi-square tests comparing TEs at each category with the expectations based on “All TEs”. Note that TEs without age or category classification were excluded from this analysis.
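To make the comparison against "All TEs" concrete, here is a minimal sketch of a one-degree-of-freedom chi-square goodness-of-fit test on old/young counts. It is an illustration under stated assumptions, not the authors' exact procedure: the function name `chi_square_vs_all` is hypothetical, the goodness-of-fit form is one plausible reading of the footnote, and no continuity or multiple-testing correction is applied, so the p-values it returns need not match the table.

```python
import math
from typing import Tuple


def chi_square_vs_all(old: int, young: int,
                      all_old: int, all_young: int) -> Tuple[float, float]:
    """Goodness-of-fit chi-square of one category against the 'All TEs' ratio.

    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    n = old + young
    total = all_old + all_young
    # Expected counts if the category had the same old/young ratio as all TEs
    expected = (n * all_old / total, n * all_young / total)
    observed = (old, young)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# LRR row of the table: 480 old and 198 young, against 608 old and 661 young overall
chi2_lrr, p_lrr = chi_square_vs_all(480, 198, 608, 661)
```

With these counts the statistic is large and the p-value falls far below 2.2e-16, the smallest value reported in the table, which is in line with the enrichment of old TEs in low recombination regions.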
Fig 4. Number of TEs at different TE length ratios (%).
Bars indicate number of TEs (vertical axis) per bin of TE Length Ratio (%) (horizontal axis) and color shade indicates the proportion of young and old TEs in each bin.
Fig 5. HighFreq TEs with signals of selection.
41 HighFreq TEs showing at least one signal of selection in the selective sweep tests (iHS, H12 or nSL; 36 TEs), in the population differentiation test (FST; 9 TEs), or in both. Red and grey circles indicate statistical significance for each TE in each test and population (significant and not significant, respectively). Empty circles (ND) indicate that the test could not be calculated.
84 reference TE insertions showed evidence of selection.
The 62 TEs identified in this work are listed at the top of each frequency category, followed by TEs identified in other studies. Note that for 11 of the 62 TEs there was previous evidence suggesting that they were evolving under positive selection.
| TE category | Flybase ID | Evidence of selection | Reference | GO enrichment/ Gene association |
|---|---|---|---|---|
| OOA | FBti0018916 | iHS | This work | - |
| | FBti0018937 | iHS | This work | RtS/ olfactory |
| | FBti0019056 | FST / CSTV | This work/ [ | RtS |
| | FBti0019065 | FST, nSL / fTE / CSTV | This work / [ | RtS/ xenobiotic |
| | FBti0019079 | H12 | This work | RtS |
| | FBti0019081 | nSL | This work | RtS |
| | FBti0019279 | H12 | This work | RtS/ alcohol, olfactory |
| | FBti0019354 | iHS / Allele age | This work/ [ | - /alcohol |
| | FBti0019453 | H12, nSL | This work | RtS/circadian |
| | FBti0019457 | FST/nSL | This work | - |
| | FBti0019601 | H12 | This work | -/ xenobiotic |
| | FBti0019604 | H12 | This work | RtS/ alcohol, heavy metal, olfactory |
| | FBti0019627 | FST, iHS, H12/ iHS / Phenotypic | This work/ [ | RtS/ xenobiotic, diapause |
| | FBti0019632 | H12 | This work | RtS |
| | FBti0019657 | iHS | This work | RtS |
| | FBti0020036 | iHS | This work | RtS/ aggressiveness, hypoxia, olfactory |
| | FBti0020057 | H12 / nSL | This work | - / immunity, xenobiotic, diapause |
| | FBti0020091 | iHS | This work | - |
| | FBti0020096 | iHS/ nSL | This work | - |
| | FBti0020116 | H12 | This work | RtS/ olfactory |
| | FBti0020149 | H12, nSL / Allele age | This work/ [ | - / olfactory |
| | FBti0020393 | iHS | This work | RtS/heavy metal |
| | FBti0019360 | FST | [ | - |
| | FBti0020125 | Allele age | [ | RtS/olfactory |
| | FBti0019386 | CL test, TajimaD, Phenotypic | [ | RtS |
| | FBti0019985 | TajimaD, iHS, H12, Phenotypic | [ | RtS/ diapause |
| | FBti0020155 | Phenotypic | [ | RtS/ immunity, starvation, alcohol |
| | FBti0020046 | Allele age | [ | -/ immunity |
| AF-OOA | FBti0018880 | H12, nSL / iHS / Phenotypic | This work / [ | - /immunity, xenobiotics, alcohol, circadian, starvation, heat-shock |
| | FBti0019010 | iHS / FST | This work/ [ | RtS |
| | FBti0019071 | FST | This work | - |
| | FBti0019112 | iHS, H12, nSL | This work | RtS/ alcohol, olfactory, starvation |
| | FBti0019133 | H12 | This work | RtS/ aggressiveness |
| | FBti0019372 | H12 | This work | RtS/ olfactory, pigmentation |
| | FBti0019378 | FST | This work | RtS |
| | FBti0019613 | H12 | This work | RtS |
| | FBti0019617 | iHS | This work | RtS/ alcohol, diapause |
| | FBti0019677 | H12 | This work | -/starvation, aggressiveness |
| | FBti0019771 | FST | This work | - |
| | FBti0019975 | iHS | This work | - |
| | FBti0020086 | FST, iHS / Allele age | This work/ [ | RtS/ circadian, xenobiotic |
| | FBti0020114 | iHS, nSL | This work | - |
| | FBti0020146 | FST | This work | RtS |
| | FBti0060715 | iHS | This work | RtS |
| | FBti0061417 | H12 | This work | RtS/ heavy metal |
| | FBti0061506 | iHS | This work | RtS/ hypoxia, immunity, olfactory, xenobiotics |
| | FBti0019276 | CSTV | [ | RtS |
| | FBti0019344 | FST | [ | RtS |
| | FBti0019564 | TajimaD | [ | RtS |
| | FBti0019611 | CSTV | [ | Nsd, locomotion, chemotaxis / olfactory, pigmentation, alcohol, diapause |
| | FBti0019082 | TajimaD | [ | RtS/ starvation |
| | FBti0060443 | CSTV | [ | RtS/ alcohol |
| | FBti0019170 | fTE / Phenotypic | [ | RtS/ olfactory |
| NA-AF | FBti0019430 | H12 / TajimaD / iHS, fTE / Allele age / Phenotypic | This work/ [ | -/ immunity, hypoxia |
| | FBti0019200 | Allele age | [ | RtS/ starvation |
| LowFreq | FBti0020082 | Allele age | [ | RtS |
| | FBti0061742 | TajimaD | [ | - |
| Fixed | FBti0059674 | Young&Long | This work | -/ alcohol, cold, heavy-metal, olfactory, pigmentation, xenobiotics |
| | FBti0019153 | Young&Long | This work | - |
| | FBti0019149 | Young&Long | This work | - |
| | FBti0059794 | Young&Long | This work | -/ heavy-metal, olfactory |
| | FBti0019355 | Young&Long | This work | -/ xenobiotic |
| | FBti0019590 | Young&Long | This work | RtS, Development / pigmentation |
| | FBti0019191 | Young&Long | This work | -/ alcohol, olfactory |
| | FBti0020098 | Young&Long | This work | -/ alcohol |
| | FBti0020101 | Young&Long | This work | -/ alcohol |
| | FBti0020015 | Young&Long | This work | -/ pigmentation, diapause, hypoxia, oxidative, starvation, xenobiotic, alcohol, oxidative, xenobiotics |
| | FBti0020013 | Young&Long | This work | -/ alcohol, olfactory, heavy-metal, pigmentation |
| | FBti0019199 | Young&Long / Allele age | This work/ [ | RtS/ alcohol, pigmentation |
| | FBti0018940 | TajimaD | This work | - |
| | FBti0020147 | TajimaD | This work | - |
| | FBti0060295 | TajimaD | This work | - |
| | FBti0061024 | TajimaD | This work | - |
| | FBti0062854 | TajimaD | This work | - |
| | FBti0062980 | TajimaD | This work | - |
| | FBti0063022 | TajimaD | This work | - |
| | FBti0063801 | TajimaD | This work | -/ alcohol, desiccation, pigmentation |
| | FBti0060388 | TajimaD | This work/ [ | RtS |
| | FBti0060479 | TajimaD | [ | RtS |
| | FBti0062283 | TajimaD | [ | RtS/ immunity, alcohol |
| | FBti0063191 | TajimaD | [ | RtS/ alcohol, diapause, immunity, oxidative, starvation, xenobiotic |
| | FBti0019655 | TajimaD | [ | - |
| | FBti0020329 | TajimaD | [ | RtS/ hypoxia |
| | FBti0059793 | TajimaD | [ | - /immunity, oxidative, starvation, alcohol, hypoxia |
CSTV: Correlation with spatio-temporal variables. RtS: response to stimulus, Nsd: Nervous system development.
Fig 6. Functional enrichment analysis of genes nearby TEs showing evidence of selection (in this or previous works) and HighFreq TEs.
Bar colors indicate similar biological functions of the DAVID clusters (A) and the fitness-related traits (B): Green: stress response, Red: behavior, Blue: development, Yellow: pigmentation. A) Significant gene ontology clusters according to the DAVID functional annotation tool (enrichment score > 1.3). For genes nearby HighFreq TEs, only the top five clusters are shown. The horizontal axis represents the DAVID enrichment score (see S8A and S8B Table for details). B) Significantly overrepresented fitness-related genes according to previous genome association studies. All FDR-corrected p-values < 0.05, Chi-square (χ2) test (see S10A and S10B Table for details). The horizontal axis represents the log10(χ2). In both A) and B), numbers next to each bar indicate the total number of genes in that cluster/category.
Correlation between TE presence and expression level of nearby genes.
| TEs | | HighFreq (n) | HighFreq (%) | Fixed (n) | Fixed (%) | LowFreq (n) | LowFreq (%) | Private (n) | Private (%) |
|---|---|---|---|---|---|---|---|---|---|
| All TEs analyzed | | 70 | 11% | 192 | 30% | 376 | 59% | 25 | 4% |
| Significant TEs | | 19 | 38% (***) | 12 | 24% | 19 | 38% | 4 | 8% |
| Significant TEs | | 15 | 37% (***) | 11 | 27% | 15 | 37% | 4 | 10% (*) |
| Significant TEs | | 11 | 32% (***) | 8 | 24% | 15 | 44% | 3 | 9% (*) |
| Significant TEs | | 5 | 50% (***) | 0 | 0% | 5 | 50% | 0 | 0% |
Number of TEs located in high recombination regions for which correlations were calculated (All TEs analyzed), and number of TEs with significant correlations for each frequency group are given (Significant TEs). Frequency groups were determined based on their frequency in the DGRP population. LowFreq TEs were further classified as Private if only one strain contained the TE. Note that TEs are classified as fixed if they are present in > 95% of the strains analyzed, thus for some of these TEs there could be strains that do not contain the insertion. Percentages regarding the total number of TEs in that frequency category are also given. Chi-square test * p-value < 0.05 and *** p-value < 0.0001.
Fig 7. Characteristics of the HighFreq TEs.
A) TE location relative to the nearest gene. B) Location of intragenic TEs. C) TE order. *: p-value < 0.05. ***: p-value < 0.001 (Chi-square test).