Daniel K Fabian1,2, Handan Melike Dönertaş1, Matías Fuentealba1,2, Linda Partridge2,3, Janet M Thornton1. 1. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom. 2. Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London, United Kingdom. 3. Max Planck Institute for Biology of Ageing, Cologne, Germany.
Abstract
Transposable elements (TEs) inflict numerous negative effects on health and fitness as they replicate by integrating into new regions of the host genome. Even though organisms employ powerful mechanisms to demobilize TEs, transposons gradually lose repression during aging. The rising TE activity causes genomic instability and was implicated in age-dependent neurodegenerative diseases, inflammation, and the determination of lifespan. It is therefore conceivable that long-lived individuals have improved TE silencing mechanisms resulting in reduced TE expression relative to their shorter-lived counterparts and fewer genomic insertions. Here, we test this hypothesis by performing the first genome-wide analysis of TE insertions and expression in populations of Drosophila melanogaster selected for longevity through late-life reproduction for 50-170 generations from four independent studies. Contrary to our expectation, TE families were generally more abundant in long-lived populations compared with nonselected controls. Although simulations showed that this was not expected under neutrality, we found little evidence for selection driving TE abundance differences. Additional RNA-seq analysis revealed a tendency for reducing TE expression in selected populations, which might be more important for lifespan than regulating genomic insertions. We further find limited evidence of parallel selection on genes related to TE regulation and transposition. However, telomeric TEs were genomically and transcriptionally more abundant in long-lived flies, suggesting improved telomere maintenance as a promising TE-mediated mechanism for prolonging lifespan. Our results provide a novel viewpoint indicating that reproduction at old age increases the opportunity of TEs to be passed on to the next generation with little impact on longevity.
Transposable elements (TEs) inflict numerous negative effects on health and fitness as they replicate by integrating into new regions of the host genome. Even though organisms employ powerful mechanisms to demobilize TEs, transposons gradually lose repression during aging. The rising TE activity causes genomic instability and was implicated in age-dependent neurodegenerative diseases, inflammation, and the determination of lifespan. It is therefore conceivable that long-lived individuals have improved TE silencing mechanisms resulting in reduced TE expression relative to their shorter-lived counterparts and fewer genomic insertions. Here, we test this hypothesis by performing the first genome-wide analysis of TE insertions and expression in populations of Drosophila melanogaster selected for longevity through late-life reproduction for 50-170 generations from four independent studies. Contrary to our expectation, TE families were generally more abundant in long-lived populations compared with nonselected controls. Although simulations showed that this was not expected under neutrality, we found little evidence for selection driving TE abundance differences. Additional RNA-seq analysis revealed a tendency for reducing TE expression in selected populations, which might be more important for lifespan than regulating genomic insertions. We further find limited evidence of parallel selection on genes related to TE regulation and transposition. However, telomeric TEs were genomically and transcriptionally more abundant in long-lived flies, suggesting improved telomere maintenance as a promising TE-mediated mechanism for prolonging lifespan. Our results provide a novel viewpoint indicating that reproduction at old age increases the opportunity of TEs to be passed on to the next generation with little impact on longevity.
Despite of medical relevance, the molecular mechanisms of aging are still poorly understood due to its complex genetic basis. Transposable elements (TEs), which are prevalent in eukaryotic genomes, have recently emerged as important factors in aging as they contribute to age-associated disorders and become more mobile during aging. By analyzing genomes and transcriptomes of long-lived fruit fly populations that underwent reproduction at old age for 58–170 generations, we find that late-life breeding increases the opportunity of many TEs to be passed on to the next generation, but causes an overall reduced TE transcription. Our results suggest that reducing TE expression might be more important for longevity than purging genomic insertions, further strengthening the role of TEs in aging.
Introduction
Aging, also known as senescence, is an evolutionary conserved process described as the progressive loss of physiological homeostasis starting from maturity with disease promotion, decline in phenotypic function, and increased chance of mortality over time as a consequence (Fabian and Flatt 2011; Flatt and Heyland 2011; López-Otín et al. 2013). At the molecular level, studies of loss-of-function mutations in model organisms such as yeast, Caenorhabditis elegans, Drosophila melanogaster, and mice have successfully identified key pathways underlying aging and longevity including the conserved insulin/insulin-like growth factor signaling (IIS) and target of rapamycin (TOR) nutrient-sensing network (Piper et al. 2008; Fontana et al. 2010; Gems and Partridge 2013; Pan and Finkel 2017). More recently, sequencing of whole genomes, transcriptomes, and epigenomes corroborated that aging has a complex genetic basis involving many genes and is accompanied by changes across a broad range of interconnected molecular functions (López-Otín et al. 2013).Although there has been a predominant focus on understanding the links between genes and phenotypes correlated with aging, the role of transposable elements (TEs) in senescence and longevity has received less attention even though their discovery by Barbara McClintock goes back more than half a century ago (McClintock 1950). TEs, or transposons, are selfish genetic elements that replicate and move within and between genomes of their hosts (Gilbert and Feschotte 2018). In eukaryotes, TEs typically constitute a considerable portion of the genome, with estimates around ∼3% in yeast, ∼20% in D. melanogaster, ∼45% in humans, and ∼85% in maize (Quesneville et al. 2005; Schnable et al. 2009; de Koning et al. 2011; Carr et al. 2012). To date, several thousand TE families broadly classified into DNA-transposons and retrotransposons, which multiply via DNA or RNA intermediates, respectively, have been identified and are known to vary hugely in their transpositional mobility (Jurka et al. 2011; Deniz et al. 2019). For example, only a small fraction of L1 retrotransposons are responsible for most of the transposition events in the human genome, whereas the vast majority of L1s and other TE families have been inactivated by the accumulation of structural and point mutations over evolutionary time scales (Brouha et al. 2003).In spite of the substantial evidence implicating TEs in adaptive evolution and diseases, the majority of transposons residing in the genome are likely to be neutral or deleterious for host fitness (Barrón et al. 2014; Arkhipova 2018). Yet, their exact physiological functions and the extent to which particular TE insertions or whole TE classes contribute to host fitness is still poorly understood (Brunet and Doolittle 2015). In general, TE mobility causes genomic instability through insertional mutagenesis, which can directly affect coding sequences of genes or modify their transcription. Typically, TE insertions into or close to genes impose negative consequences on health and have been associated with ∼100 diseases in humans, including cystic fibrosis, hemophilia, and cancer (Hancks and Kazazian 2012; Payer and Burns 2019). It is not just through the insertion of TEs that their presence may be deleterious, but also by causing detrimental chromosomal rearrangements resulting from ectopic recombination between TE families with similar sequences in different genomic locations (Montgomery et al. 1987; Charlesworth et al. 1992; Petrov et al. 2011). Additionally, TE expression and translation also allow the formation of toxic TE products that, for example, contribute to autoimmune diseases, whereas TE activity and replication of an increased genomic TE content might indirectly impose metabolic costs to the host (Kaneko et al. 2011; Barrón et al. 2014; Volkman and Stetson 2014; Bogu et al. 2019). On the other hand, there is mounting experimental evidence for positive selection on segregating TE insertions conferring beneficial phenotypic properties in multiple taxa including insecticide and virus resistance in Drosophila; but overall, changes in the genomic TE landscape in response to selection are still poorly understood (Daborn et al. 2002; Magwire et al. 2011; Kuhn et al. 2014; Li et al. 2018; Rech et al. 2019; Salces-Ortiz et al. 2020).A common feature of TEs observed in various organisms including yeast, D. melanogaster, C. elegans, mice, and humans is the age-associated increase in transposition and expression, which usually coincides with weakening of the host TE silencing machinery and loss of genomic stability (Maxwell et al. 2011; Dennis et al. 2012; Solyom et al. 2012; De Cecco et al. 2013; Li et al. 2013; Gorbunova et al. 2014; Chen et al. 2016; Bogu et al. 2019; De Cecco et al. 2019). TEs have further been implicated in age-related neurodegenerative diseases (Krug et al. 2017; Prudencio et al. 2017; Guo et al. 2018) and might promote chronic inflammation observed during aging (Chen et al. 2014; De Cecco et al. 2019) further supporting the involvement of TEs in senescence and longevity as proposed by the emerging “transposable element theory of aging” (Kirkwood 1989; Sedivy et al. 2013). The age-related change in TE activity detected in many tissues has mainly been attributed to chromatin remodeling and the decline in repressive heterochromatin structure, which is commonly rich in TEs (Dimitri and Junakovic 1999; Wood and Helfand 2013; Chen et al. 2016; Wood et al. 2016). TEs that are not suppressed by chromatin structure are the target of post-transcriptional silencing by the host RNA-interference (RNAi) machinery, mostly the piwi-interacting RNA (piRNA) pathway, which is in turn also necessary for heterochromatin formation and stability (Lippman and Martienssen 2004; Martienssen and Moazed 2015). Indeed, research has identified longevity-promoting effects of several genes involved in the RNAi machinery and heterochromatin formation (Mori et al. 2012; Wood and Helfand 2013; Wood et al. 2016). Interestingly, it is possible that age-related misexpression of TEs is exclusive to the soma due to efficient post-transcriptional TE silencing mediated by the piRNA machinery in the germline (Sturm et al. 2015; Elsner et al. 2018; Erwin and Blumenstiel 2019). Considering current evidence, it seems natural that longevity can be achieved through impeding TE activity and controlling the genomic content of TEs. However, whether variation in aging and lifespan within species is also mediated by transposons and their role in the evolution of senescence is largely unknown.Here, we analyze published genomes of D. melanogaster populations experimentally selected for increased lifespan through postponed reproduction from four independent studies to understand the role of TEs in the evolution and genomic basis of late-life performance and aging. The invertebrate D. melanogaster is an excellent model in this respect as it exhibits abundant genetic and phenotypic variation in fecundity and traits related to aging that can be selected for. In the present experiments, replicate populations derived from nature were subjected to a late-life breeding scheme in which only flies surviving and fertile at old age contributed to the subsequent generations, whereas control individuals reproduced earlier in life. When the genomes of early- and late-breeding populations were sequenced, the selection process had continued for ∼170 and ∼150 generations for Carnes et al. 2015 (Carnes2015) and Fabian et al. 2018 (Fabian2018), and for 58 and 50 generations for Hoedjes et al. 2019 (Hoedjes2019) and Remolina et al. 2012 (Remolina2012) enabling us to quantify differences in TE content of long- and short-term evolutionary responses. Selection for postponed senescence has resulted in phenotypic divergence of multiple fitness traits, most notably an ∼8% to ∼74% increase in lifespan and improved old age fecundity at the cost of reduced early reproduction (Luckinbill et al. 1984; Rose 1984; Remolina et al. 2012; Carnes et al. 2015; Fabian et al. 2018; Hoedjes et al. 2019; May et al. 2019). At the genome level, previous analysis of genetic differentiation has revealed a significant sharing in candidate genes across the four studies indicating parallel evolution (Hoedjes et al. 2019), but, at the same time, exposed multiple novel targets of selection. For instance, three of the studies report genetic and/or transcriptomic divergence in immunity genes, and it has recently been confirmed that these molecular changes reflect differences in traits related to pathogen resistance (Fabian et al. 2018). Thus, despite variations in the experimental designs, numerous evolutionary repeatable phenotypic and genetic adaptations have been observed, but the importance of TEs in these studies has remained unexplored. Therefore, our main objective was to investigate for the first time whether TE abundance in the genome, and host genes related to TE regulation, had undergone similar parallel changes. Using RNA-seq data from Carnes et al. (2015), we further test if males and females of selected populations evolved to suppress TE transcription to mitigate potentially negative effects on longevity.
Results
Selection for Postponed Reproduction Affects Genomic Abundance of TE Families
To analyze if selection for longevity affected TE copy number, we used DeviaTE (Weilguny and Kofler 2019) on whole-genome pool-sequences of a total of 24 late-breeding, long-lived selection (S) and 22 early-breeding control (C) populations from four studies (see supplementary table S1, Supplementary Material online, for details on experimental designs) (Remolina et al. 2012; Carnes et al. 2015; Fabian et al. 2018; Hoedjes et al. 2019). DeviaTE is an assembly-free tool that estimates genomic abundance of 179 TE families by contrasting the sequencing depth of TEs and five single-copy genes taking internal deletions within TEs into account (supplementary fig. S1, Supplementary Material online).After removing TE families with poor mapping quality and insufficient coverage most of which had very low or zero copy numbers, we screened for differences in abundance between control and selection breeding regimes of 110 to 115 TE families dependent on the study, using three complementary approaches that vary in stringency (see overview in supplementary fig. S1, Supplementary Material online, and Materials and Methods, summary statistics in supplementary table S2, Supplementary Material online). In brief, we (1) analyzed studies independently, (2) fit models combining all studies using proportions of TE family abundance relative to the total genomic TE content, and (3) tested if copy number differences are driven by TE expansions specific to particular populations by investigating if changes in TE abundance are consistent across all replicates within regime and study. For all methods, we found more TE families with higher copy numbers in selected populations relative to controls than vice versa, with the exception of the high protein/sugar larval diet regime in Hoedjes2019 (table 1, see Supplementary Results, for breeding regime differences within each diet also see supplementary table S3 and fig. S2, Supplementary Material online).
Table 1
Number of Detected TE Families (N) and Percentage of Families More Abundant in Selected (S > C) or Control Regimes (C > S) or Not Different (NS) Using Three Different Approaches (also see supplementary fig. S1 and table S2, Supplementary Material online).
Approach
Study
N
N (sign.)a
S > C
C > S
NS
#1
For Hoedjes2019:
∼Regime+Diet+Regime × Diet
For other studies: ∼Regime+Pop[Regime]
Carnes2015
112
103
73%
19%
8%
Fabian2018
110
68
34%
28%
38%
Hoedjes2019
115
52
36%
10%
54%
Remolina2012
110
51
37%
9%
54%
#2
∼Study+Regime+Study × Regime
Studies Combined
103
Regime: 41
Study: 101
Study × Regime: 65
33%
7%
60%
#3
Consistent differences between all S and C populations
Carnes2015
112
43%
2%
55%
Fabian2018
110
14%
7%
79%
Hoedjes2019: Lowb
115
37%
2%
61%
Hoedjes2019: Mediumb
115
3%
0%
97%
Hoedjes2019: Highb
115
3%
29%
69%
Remolina2012
110
3%
0%
97%
Significant after Bonferroni correction at α = 0.01 and |δinsertion| > 0.3 in approach #1, and FDR < 0.05 in approach #2.
Three larval diet conditions; low had 0.25× less and high had 2.5× more sugar and protein compared with medium diet.
Number of Detected TE Families (N) and Percentage of Families More Abundant in Selected (S > C) or Control Regimes (C > S) or Not Different (NS) Using Three Different Approaches (also see supplementary fig. S1 and table S2, Supplementary Material online).#1For Hoedjes2019:∼Regime+Diet+Regime × DietFor other studies: ∼Regime+Pop[Regime]#2∼Study+Regime+Study × RegimeRegime: 41Study: 101Study × Regime: 65#3Consistent differences between all S and C populationsSignificant after Bonferroni correction at α = 0.01 and |δinsertion| > 0.3 in approach #1, and FDR < 0.05 in approach #2.Three larval diet conditions; low had 0.25× less and high had 2.5× more sugar and protein compared with medium diet.For the downstream analysis, we describe TE families varying between regimes as defined by approach #1 (fig. 1, table 1). In this approach, between 34% and 73% of all TE families had a significantly larger number of genomic insertions in the selected populations relative to controls (from here on, referred to as S > C TEs) after Bonferroni correction for multiple testing and filtering out TE families with small differences between regimes (supplementary fig. S3 and Supplementary Results, Supplementary Material online). In contrast, only 9–28% of TE families showed the opposite pattern and had more insertions in the controls (from here on, referred to as C > S TEs).
Fig. 1.
Dynamics of TE copy number change between breeding regimes. (A) Log2 fold change in average genomic insertions of the late-breeding selected populations (“S”) relative to early-breeding controls (“C”). The dashed line indicates no difference between regimes. >0 and <0 denote TE families with a larger abundance in selected populations (“S > C”) or with more insertions in controls (“C > S”), respectively. Number of TE families in these two categories are given in the center at the top and bottom of each plot. TE subclasses are given in different colors. Selected flies had more genomic insertions than controls for most TE families (also see table 1). (B) Difference in the magnitude of absolute log2 fold change between C > S and S > C TE groups. Significant difference between TE groups was determined using t-tests for each study. (C) Magnitude of absolute log2 fold change between studies, analyzed using ANOVA with Study as single term (F3,358 = 106.5, P < 2e-16) and pairwise Tukey post-hoc tests. * P < 0.05; ** P < 0.01; *** P < 0.001; ns, not significant. (D) Total number of genomic TE insertions. We used ANOVA to test the effects of Study, Regime and the Study × Regime interaction (see supplementary table S4, Supplementary Material online, for a summary of the statistical analysis).
Dynamics of TE copy number change between breeding regimes. (A) Log2 fold change in average genomic insertions of the late-breeding selected populations (“S”) relative to early-breeding controls (“C”). The dashed line indicates no difference between regimes. >0 and <0 denote TE families with a larger abundance in selected populations (“S > C”) or with more insertions in controls (“C > S”), respectively. Number of TE families in these two categories are given in the center at the top and bottom of each plot. TE subclasses are given in different colors. Selected flies had more genomic insertions than controls for most TE families (also see table 1). (B) Difference in the magnitude of absolute log2 fold change between C > S and S > C TE groups. Significant difference between TE groups was determined using t-tests for each study. (C) Magnitude of absolute log2 fold change between studies, analyzed using ANOVA with Study as single term (F3,358 = 106.5, P < 2e-16) and pairwise Tukey post-hoc tests. * P < 0.05; ** P < 0.01; *** P < 0.001; ns, not significant. (D) Total number of genomic TE insertions. We used ANOVA to test the effects of Study, Regime and the Study × Regime interaction (see supplementary table S4, Supplementary Material online, for a summary of the statistical analysis).To explore whether the dynamics of TE copy number change are similar among studies, we first contrasted log2 fold changes in abundance between S > C and C > S TEs. S > C TEs had a significantly larger magnitude of change than C > S TEs in the short-term evolution study of Remolina2012, whereas the opposite pattern was observed for Fabian2018, but no differences were found for Carnes2015 and Hoedjes2019 (fig. 1t-tests, Fabian2018 and Remolina2012: P < 0.05; Carnes2015: P = 0.466; Hoedjes2019: P = 0.123). Moreover, studies differed significantly in the size of log2 FC values in the order of Carnes2015 > Fabian2018 > Hoedjes2019 = Remolina2012 (fig. 1 ANOVA with Study term, Tukey HSD, P < 0.01 for all pairwise comparisons except Hoedjes2019-Remolina2012, P = 0.995), seemingly scaling with the length of selection (Carnes2015: 170; Fabian2018: ∼146; Hoedjes2019: 58; Remolina2012: 50 generations).We next asked if changes in TE abundance are driven by certain TE subclasses (long terminal repeat, LTR; nonlong terminal repeat, non-LTR; terminal inverted repeat, TIR) or class (RNA, DNA) and tested S > C and C > S TEs for enrichment of these types using two-sided Fisher’s exact tests. We only detected that TIRs and DNA-class TE families were underrepresented (i.e., overrepresentation of RNA-class) in the C > S group of Carnes2015, and overrepresented in S > C TEs of Hoedjes2019 (Carnes2015, TIRs: P = 0.044; DNA/RNA class: P = 0.024; and Hoedjes2019, TIRs: P = 0.03; DNA/RNA class: P = 0.032), whereas there was no enrichment in Fabian2018 and Remolina2012.Despite many individual TE families having a higher genomic abundance in the selected populations, the whole genomic TE content was not significantly different between the regimes, but varied among studies (fig. 1, and supplementary table S4, Supplementary Material online). This was perhaps partly driven by the fact that although C > S TEs were fewer in number than S > C TEs, they showed a significantly higher difference in insertion counts in the two long-term evolution studies, whereas S > C TEs only had a higher change in Remolina2012 (supplementary fig. S4, Supplementary Material online, t-test using δInsertion values; Carnes2015: P = 0.046; Fabian2018: P = 0.02; Hoedjes2019: P = 0.727; Remolina2012: P = 0.014). The nonsignificant difference in overall genomic TE load could therefore be a result of a large number of S > C TEs with small differences that are balanced by fewer C > S TEs with large differences. We further analyzed the whole genomic abundance of individual TE subclasses and identified a significantly higher TIR content in selected populations compared with controls (supplementary fig. S5, Supplementary Material online, ANOVA, both Regime and Regime × Study factors, P < 0.001), but this effect was strongly influenced by Carnes2015 (Tukey HSD, Regime × Study factor testing for C vs S within studies, Carnes2015: P < 0.0001; other studies: P > 0.85). We also detected that selected populations had a larger LTR retrotransposon load than controls (ANOVA, Regime factor, P = 0.026), whereas non-LTR content did not differ significantly. Finally, we note that studies in general varied significantly in total TE content and subclass-specific loads (ANOVA, Study factor, P < 0.0001 in all models).In summary, our results demonstrate that selection for postponed reproduction leads to evolutionary repeatable increases in copy number of many TE families relative to early bred controls, but without affecting the overall genomic TE load.
TE Families Varying in Genomic Abundance Differ in Evolutionary Age and Activity
We next tested if differences in TE activity explain the changes in abundance of TE families between control and selected populations. In Drosophila, most TE families are considered to be active (Guio and González 2019), and it has been shown that the average population frequency of TE insertions within a family serves as a good proxy for recent activity and age of TE invasion (Kofler et al. 2012; Kofler et al. 2015).We first determined the exact genomic location and frequency of TE insertions using PoPoolationTE2 (Kofler et al. 2016) and calculated average population frequency across all insertion sites for each TE family. As expected, the number of detected TE insertions which could be mapped to genomic locations partially scaled with coverage (see Materials and Methods): across all populations within a study, we found 13,018 TE insertions in Hoedjes2019, 8,402 in Fabian2018, and 4,502 in Remolina2012, which is in the range recently identified in natural populations (i.e., 4,277–11,649 TE insertions in Lerat et al. 2019). The least number of TE insertion locations was found for Carnes2015 for which we detected an unusually small number of 567 TE insertions, likely reflecting a large number of false negatives due to low sequencing depth. For each TE family, we then averaged frequencies across all of its detected genomic positions to estimate the mean frequency at which a TE is segregating in a population (Kofler et al. 2015). Studies varied in the minimum average TE family frequency in the order of Carnes2015 > Remolina2012 > Fabian2018 > Hoedjes2019, which is likely a further effect of dissimilar sequencing depths and other experimental factors (average frequency ranges of Hoedjes2019: 0.01–0.9; Fabian2018: 0.02–1; Remolina2012: 0.04–0.84; Carnes2015: 0.19–0.9). Therefore, the TE frequencies of Carnes2015 need to be interpreted with care, considering the likely insufficient amount of data.To get unbiased average TE frequency estimates independent of coverage fluctuations across studies, we also obtained previously published average frequencies from a single natural South African (SA) population (Kofler et al. 2015; Kofler 2019). The SA population had a higher sequencing depth than all studies here (i.e., 381×) and thus presumably a more accurate estimate of TE frequencies. Notably, this population was not subjected to any selection or control treatment and was only maintained eight generations in the lab before sequencing. Average genome-wide TE frequencies of control and selected populations of Fabian2018, Hoedjes2019, and Remolina2012, but not Carnes2015, were significantly correlated with the SA TE frequencies (supplementary fig. S6A, Supplementary Material online; Spearman’s p, Fabian2018: 0.65; Hoedjes2019: 0.61; Remolina2012: 0.58, all three P < 0.0001; Carnes2015: 0.1, P = 0.403), demonstrating that the SA population can function as an appropriate reference here.In accordance with previous reports, we observed a negative correlation between genomic abundance and average frequency of TE families (supplementary fig. S6B, Supplementary Material online, Spearman’s p between TE abundance and average frequency of SA population: p = –0.43 to –0.53, all P < 0.0001; similar when frequencies of experimental evolution studies were used: p = –0.07 to –0.41, all P < 0.001 except Carnes2015, P = 0.541) (Petrov et al. 2011; Kofler et al. 2015).We then compared TE family frequencies between the C > S and S > C groups in the SA population and found that C > S TEs generally had a significantly lower frequency than S > C TEs (fig. 2, t-tests between C > S and S > C TE family frequencies, P < 0.05 for all four studies). Notably, whereas C > S TEs included almost exclusively TE families of low frequency, the S > C group comprised TE families ranging from low to high frequency. As there were more S > C than C > S TEs, we also contrasted the average frequencies of the top 10 C > S and S > C TEs with the biggest changes in genomic abundance defined by log2 FC values (fig. 1). We only detected a significantly higher frequency in top 10 S > C relative to C > S TEs for Carnes2015 (t-test, P = 0.03), but not in the other three studies. Considering the relationship between insertion age, frequency and activity of TE families (Kofler et al. 2015), the lower frequency of C > S TEs suggests that they are, on average, evolutionary younger and potentially more active than S > C TEs.
Fig. 2.
Differences in average TE frequency. Average TE frequency from the South African population separated into C > S (blue) and S > C TE families (red) are shown on the Y-axis. We investigated differences considering all C > S and S > C TEs (“All”) or only the top 10 TE families with the biggest differences in log2 FC of insertions (“Top 10”). t-tests were used to assess statistical significance. ns, not significant; * P < 0.05; ** P < 0.01; *** P < 0.001.
Differences in average TE frequency. Average TE frequency from the South African population separated into C > S (blue) and S > C TE families (red) are shown on the Y-axis. We investigated differences considering all C > S and S > C TEs (“All”) or only the top 10 TE families with the biggest differences in log2 FC of insertions (“Top 10”). t-tests were used to assess statistical significance. ns, not significant; * P < 0.05; ** P < 0.01; *** P < 0.001.
Genetic Drift Is Not Driving Differences in TE Abundance
A major challenge in experimental evolution studies is to differentiate selection from the confounding genomic signals of genetic drift, which might be amplified by small effective population sizes (N) or varying generations spent in the lab between control and selected populations. We therefore calculated genome-wide nucleotide diversity across 100 kb windows using the two estimators π and Watterson’s θ as a proxy for N. With the exception of Fabian2018, where π was equal between regimes (ANOVA, Regime factor, P = 0.179), we found that both estimators were significantly higher in selected relative to control populations (supplementary table S5, Supplementary Material online; ANOVA, Regime factor, all P < 0.0001). A generally reduced N in controls should lead to the loss of low and fixation of high-frequency TEs so that C > S TEs would have higher frequencies on average under neutrality. However, we observed the opposite pattern in our analysis above, suggesting deviations from neutral expectations (fig. 2).To further formally test if the increased abundance of many TE families is driven by selection on preexisting TE insertions or genetic drift alone, we performed population-genetic simulations using the correlated average TE frequencies from the natural SA population (Kofler et al. 2015) as a starting point (see supplementary fig. S6A, Supplementary Material online, and results above). We simulated TE frequency change in selected and control populations 5,000 times given the reported consensus population sizes as N, generations and number of replicates. We then asked how often the same or a higher relative proportion of S > C to C > S TEs as in our observations is obtained (table 1). Although the results from Carnes2015, Hoedjes2019, and Remolina2012 were significantly different from the expected proportions, the TE abundance differences of Fabian2018 could be caused by genetic drift alone (supplementary fig. S7, Supplementary Material online). The difference in Fabian2018 is a result of the relatively small population size combined with the large number of generations the control populations spent in the laboratory, which increases the number of S > C TEs expected by chance. Testing different ranges of the reported population sizes and assuming that only 50% and 25% of flies in the selected populations were able to breed at old age resulted in qualitatively similar results (not shown). We also quantified expected proportions of TE families consistently varying in frequency across simulated replicates: although there were generally more TE families consistently higher in abundance in selected populations (table 1, approach #3), all our simulations resulted in more TE families with a consistently higher frequency in controls. The increased genomic abundance of many TE families in selected populations is therefore unlikely to be solely caused by genetic drift.
Limited Evidence for Selection on TE Abundance and Insertion Frequencies
Considering the deviation from neutrality, we next asked if the parallel patterns in TE abundance are caused by the same or different TE families, which could indicate that genomic copy number of certain TE families affects lifespan and has been consistently shaped by selection in all four studies. Among the 103 common TE families, we identified 6 S > C and 2 C > S TEs showing the same direction of copy number change in all four studies (fig. 3, supplementary table S6, Supplementary Material online). Despite the larger overlap in S > C TEs, only the Remolina2012/Hoedjes2019 and Carnes2015/Remolina2012/Hoedjes2019 overlaps were not expected by chance (SuperExactTest, P < 0.05). Yet, we found that the most common telomeric TE HeT-A (Casacuberta 2017) was on average more abundant in selected populations in all four studies (supplementary fig. S1, Supplementary Material online, also identified by approach #2, see supplementary table S2, Supplementary Material online), suggesting that long-lived populations might have evolved longer telomeres to avoid attrition, which is considered to be a key conserved mechanism of aging (López-Otín et al. 2013). In contrast to S > C TEs, the overlap in C > S TEs across all four studies, two triple set and two double set comparisons, was significant (fig. 3, supplementary table S2, Supplementary Material online). Potentially, a higher genomic abundance of G-element and G2 found in the control populations of all studies is detrimental for longevity and late-reproduction (fig. 3). However, we did not observe any significant Spearman’s correlation coefficients in pairwise comparisons of log2 FC values between studies except for Hoedjes2019-Remolina2012 (p = 0.28, P = 0.004), showing that TE families generally lack parallel changes in abundance.
Fig. 3.
Selection on TE abundance and insertions. (A) Overlap of TE families with significant abundance differences among studies. S > C and C > S denote TE families with a higher abundance in selected or control populations, respectively. Red bars indicate a significant overlap at P < 0.05. The combination of studies within all possible overlaps is indicated by dots below the plot (also see supplementary table S6, Supplementary Material online). (B) Boxplots of the number of genomic insertions relative to the total genomic content of the two significantly shared C > S TEs. (C) Genome-wide differentiation in TE insertion frequency between selected and control populations in Fabian2018 and (D) Hoedjes2019. Every point indicates the -log10 P-value of a TE insertion across chromosomal arms (alternating black and grey color). The solid orange line corresponds to the Bonferroni cut-off at α = 0.05 (Fabian2018: P < 5.9 × 10−6; Hoedjes2019: P < 3.8 × 10−6). Red and blue points denote TE insertions with a significantly higher frequency in selected or control populations, respectively. More details including exact positions, frequency and annotation of candidate TE insertions can be found in supplementary table S7, Supplementary Material online.
Selection on TE abundance and insertions. (A) Overlap of TE families with significant abundance differences among studies. S > C and C > S denote TE families with a higher abundance in selected or control populations, respectively. Red bars indicate a significant overlap at P < 0.05. The combination of studies within all possible overlaps is indicated by dots below the plot (also see supplementary table S6, Supplementary Material online). (B) Boxplots of the number of genomic insertions relative to the total genomic content of the two significantly shared C > S TEs. (C) Genome-wide differentiation in TE insertion frequency between selected and control populations in Fabian2018 and (D) Hoedjes2019. Every point indicates the -log10 P-value of a TE insertion across chromosomal arms (alternating black and grey color). The solid orange line corresponds to the Bonferroni cut-off at α = 0.05 (Fabian2018: P < 5.9 × 10−6; Hoedjes2019: P < 3.8 × 10−6). Red and blue points denote TE insertions with a significantly higher frequency in selected or control populations, respectively. More details including exact positions, frequency and annotation of candidate TE insertions can be found in supplementary table S7, Supplementary Material online.Abundance of TE families in selected populations might also be increased because selection acted on a large number of segregating TE insertions resulting in frequency divergence between control and selected populations. We therefore screened all identified TE insertion sites for significant frequency differences between regimes in each study by performing ANOVAs on arcsine square root transformed frequencies (supplementary table S7, Supplementary Material online). After correcting for multiple testing, we detected significant frequency differences for 38 TE insertions in Fabian2018 and 100 in Hoedjes2019 (fig. 3). At the gene level, the significant TEs were inside or within 1 kb of 29 and 98 genes in Fabian2018 and Hoedjes2019, respectively, and none were shared between the two studies. However, in Carnes2015 and Remolina2012 insertions did not show significant frequency differentiation even at a less stringent cut-off (FDR < 0.05).We further tested if the differences in abundance of TE families are driven by selection acting on segregating TE insertions at the family level, thereby changing average TE family frequency. In line with the previous analysis, we found little evidence for parallel changes in genomic abundance and TE family frequency between regimes, except for Carnes2015 (supplementary table S8, Supplementary Material online; Carnes2015: 26 TE families significant for abundance and frequency; other studies: 0 to 2).Thus, although differences in abundance of TE families are unlikely to be driven by neutral evolution alone, we only found limited evidence for parallel evolution of TE copy numbers and sparse TE frequency differentiation, suggesting that TE abundance differences are a result of increased transposition events in selected populations.
Sex, Age, and Selection Regime Affect TE Expression
To test whether the increased genomic abundance of TE families in selected flies is explained by a higher transcriptional activity, we analyzed available RNA-seq data from whole flies of Carnes2015 (fig. 4 see supplementary table S9, Supplementary Material online, for the complete statistical analysis). We first fit a model with Sex, Age, and Regime to every TE family and each gene on the major chromosomal arms (supplementary fig. S8, Supplementary Material online). In line with sex differences in gene expression observed by Carnes et al. (2015), ∼92% of TE families had a significant sex term of which most had a higher expression in males than females.
Fig. 4.
Multiple factors influence TE expression. (A) Proportions of differentially expressed TE families at adjusted P < 0.05 and directionality relative to 123 TE families with detectable expression for factors from statistical models on pre-filtered read counts in DESeq2 (also see supplementary table S9, Supplementary Material online). “Sex” refers to the results of the model including Sex (M, males; F, females), Age (young; old), and Regime (C, control; S, selected). “Regime,” “Age,” and “R × A” (i.e., Regime × Age interaction) refer to results from model fits with males and females separately analyzed. The absolute number of TE families for factor levels are given above or below bars. (B) Log2 fold change of regime (selected vs control) and (C) age (young vs old) for males and females. Colors designate TE families significant only in males (violet), or females (green), or shared between both sexes (orange). Not significant TE families are in grey. (D and E) Log2 fold changes across regime against age differences in males and females. Colors designate TE families significant only for regime (violet), or age (green), or for both factors (orange). Not significant TE families are in grey. (F) Relationship of log2 fold changes in TE expression and genomic abundance between regimes in females. (B to E): r, Pearson’s correlation coefficient; (F): p, Spearman’s correlation coefficient; * P < 0.05; *** P < 0.0001; ns, not significant.
Multiple factors influence TE expression. (A) Proportions of differentially expressed TE families at adjusted P < 0.05 and directionality relative to 123 TE families with detectable expression for factors from statistical models on pre-filtered read counts in DESeq2 (also see supplementary table S9, Supplementary Material online). “Sex” refers to the results of the model including Sex (M, males; F, females), Age (young; old), and Regime (C, control; S, selected). “Regime,” “Age,” and “R × A” (i.e., Regime × Age interaction) refer to results from model fits with males and females separately analyzed. The absolute number of TE families for factor levels are given above or below bars. (B) Log2 fold change of regime (selected vs control) and (C) age (young vs old) for males and females. Colors designate TE families significant only in males (violet), or females (green), or shared between both sexes (orange). Not significant TE families are in grey. (D and E) Log2 fold changes across regime against age differences in males and females. Colors designate TE families significant only for regime (violet), or age (green), or for both factors (orange). Not significant TE families are in grey. (F) Relationship of log2 fold changes in TE expression and genomic abundance between regimes in females. (B to E): r, Pearson’s correlation coefficient; (F): p, Spearman’s correlation coefficient; * P < 0.05; *** P < 0.0001; ns, not significant.We therefore decided to test the effects of Regime, Age, and the Regime × Age interaction in the sexes separately (fig. 4, supplementary table S9, Supplementary Material online). We detected 41 (∼34% of total) and 27 TE families (∼22%) significantly different between regimes in males and females, respectively, with the majority being upregulated in controls (fig. 4). Among these, 19 TE families significant in both sexes also had the same directionality of expression change: 10 LTR-class TE families and 6 non-LTRs were higher expressed in controls, whereas 3 non-LTR TE families (TART-A, TART-B, and TAHRE) were upregulated in selected populations (supplementary table S10, Supplementary Material online). Interestingly, TART-A, TART-B, and TAHRE provide the enzymatic machinery for telomeric maintenance (Casacuberta 2017), again suggesting that reduced telomere attrition evolved in response to selection, paralleling the genome-based analysis. In general, regime affected TE expression in males and females similarly, as indicated by a significant correlation of log2 fold change values between sexes (fig. 4, Pearson’s r = 0.73, P < 0.0001). We further asked if the magnitude of log2 fold change varies between TE families more expressed in controls or selected populations, and did not find any significant difference (supplementary fig. S9, Supplementary Material online, t-test, females: P = 0.86; males: P = 0.95).Supporting the notion that TEs become derepressed during aging, the effect of age on TE expression in males was general as 107 of the 108 significant TE families (i.e., ∼88% of total) had a higher expression in older flies. Less pronounced differences were found in females where 8% of all TE families—all of which were retrotransposons—increased and 4% decreased expression with age (fig. 4). Moreover, consistent with a recent study (Chen et al. 2016), TE families upregulated in older females had on average a significantly higher log2 fold change relative to those that were downregulated (supplementary fig. S9, Supplementary Material online, t-test, P = 0.018). We further found 13 TE families with a significant age factor in both sexes (fig. 4, supplementary table S11, Supplementary Material online), of which copia, Burdock, R1, and R2 are already known to increase expression with age (Li et al. 2013; Chen et al. 2016).No TE families showed a significant Regime × Age term in males, but the interaction was significant for 28 TE families (∼23% of total) in females (fig. 4). Interestingly, most of these TE families were defined by a higher expression in young controls compared with selected flies of the same age (see supplementary fig. S10, Supplementary Material online, for example). Selected populations subsequently increased whereas controls decreased expression, meeting at a similar expression level at old age. This is comparable with recent studies which suggested that age-dependent changes in TE expression differ between genotypes (Erwin and Blumenstiel 2019; Everett et al. 2020).We next investigated if differential expression of TE families is specific or similar to the overall transcriptomic changes by comparing proportions of TE families and genes up- or downregulated or unchanged within levels of sex, regime, and age (supplementary fig. S11, Supplementary Material online). Distributions generally varied significantly (χ2 tests, P < 0.001 for all, except age factor in females: P = 0.129), demonstrating that these factors have different effects on TE and gene expression.To further examine whether the selected populations might have evolved to maintain a young TE expression profile, we compared differences between regimes with those that occurred with age (fig. 4). The correlation of log2 FC values between regime and age was weakly positive for TE families in females (Pearson’s correlation, females: r = 0.21, P = 0.021; males: r = –0.01, P = 0.875), and varied from the one for genes (1000 bootstrap replicates resampling 100 genes: mean Pearson’s correlation, females: r = –0.12, 95% CI: –0.13 to –0.11; males: r = 0.09, 95% CI: 0.08 to 0.1). Thus, expression of TE families between selected and control populations only mirrors the changes between young and old flies in females.In summary, our results suggest that selected populations of Carnes2015 evolved to reduce TE expression, but differences across sex and age were overall more dominant than variation between regimes.
Differences in TE Abundance Do Not Match TE Expression Patterns
We also asked if the change in abundance of TE families parallels the expression differences between selected and control populations. Notably, as the genomic TE abundance measures came from DNA pools of female flies, we did not do this comparison in males. We first confirmed that TE expression scaled robustly with the number of genomic insertions in each age-regime combination (Spearman’s p = 0.72; P < 0.0001; supplementary fig. S12, Supplementary Material online). Next, we investigated if there were parallel changes in 23 TE families significantly varying between regimes in expression and genomic abundance. We found that a majority of 13 TE families had nonparallel changes (supplementary table S12, Supplementary Material online). Indeed, log2 FC expression and log2 FC insertions between regimes were not significantly correlated (fig. 4, Spearman’s p = 0.14, P = 0.149), indicating that differences in TE abundance poorly predict differential expression between control and selected populations. As expected, correcting RNA-seq read counts for TE copy number to examine if average expression per TE insertion varies between regimes yielded qualitatively similar results compared with analyzing overall TE expression (supplementary table S13, Supplementary Material online). However, the tendency of TE families to be more highly expressed in controls was substantially larger (63 TE families more, 3 less expressed in controls), further emphasizing that selection for late-reproduction leads to a reduction in TE expression.
Little Study-Wide Sharing in Candidate Genes Involved in Regulation of TE Activity
We next hypothesized that if TE expression and transposition are predominantly detrimental for lifespan and aging, as proposed by many studies, experimental evolution for longevity would have likely resulted in selection on host alleles that influence TE activity. To test this, we screened 96 chromatin structure, piRNA, and transposition-associated genes known to be involved in TE regulation and silencing for clear-cut genetic and expression differentiation possibly driven by selection (supplementary table S14, Supplementary Material online). Of these, 3 to 10 genes were previously identified to be under selection across the four studies (fig. 5). We found that only two genes, E2f1 (FBgn0011766, Carnes2015/Fabian2018) and Hsp83 (FBgn0001233, Carnes2015/Remolina2012), were shared between two data sets, but both overlaps were expected by chance (SuperExactTest, P > 0.28). Although significant sharing among these studies has been reported when all candidate genes were considered (see Fabian et al. 2018 and Hoedjes et al. 2019), our results suggest little parallel evolution in TE regulation genes. Moreover, the four studies did not report any significant enrichment of GO terms related to transposon silencing and chromatin structure.
Fig. 5.
Number of genetically and transcriptionally differentiated genes involved in regulation of TE activity. (A) Counts of genetically differentiated (G.D.) TE regulation genes reported in the four experimental evolution studies. (B) Number of TE regulation genes differentially expressed (D.E.) between regimes (C, control; S, selected) and ages (young; old) in the RNA-seq data of Carnes2015 (whole female flies) and microarray data of Remolina2012 (female heads and abdomens) (also see supplementary table S14, Supplementary Material online, for information on all 96 TE regulation genes).
Number of genetically and transcriptionally differentiated genes involved in regulation of TE activity. (A) Counts of genetically differentiated (G.D.) TE regulation genes reported in the four experimental evolution studies. (B) Number of TE regulation genes differentially expressed (D.E.) between regimes (C, control; S, selected) and ages (young; old) in the RNA-seq data of Carnes2015 (whole female flies) and microarray data of Remolina2012 (female heads and abdomens) (also see supplementary table S14, Supplementary Material online, for information on all 96 TE regulation genes).Using the available RNA-seq data from whole flies of both sexes in Carnes2015 and microarray data from female heads and abdomens in Remolina2012, we then asked if TE regulation genes are differentially expressed (fig. 5). In Carnes2015, we found 19 and 29 TE regulation genes significant for regime in males and females, respectively, most of which tended to be upregulated in controls, whereas no genes differed in Remolina2012. Interestingly, similar to TE expression patterns in Carnes2015 (fig. 4), TE regulation genes showed a clear tendency for upregulation with age in males but to a lesser degree in females (fig. 5). Comparable patterns were detected in Remolina2012, where the age effect was stronger in abdominal compared with head tissue, suggesting that boosting the expression of TE regulation genes is common during aging.Taken together, the small number of genetically differentiated TE regulation genes, lack of TE-associated GO enrichment, and overall missing parallel patterns suggest that improving TE repression was either specific to studies and/or not a prime target of selection.
Discussion
Are TEs conferring an adaptive advantage as shown for many traits (Daborn et al. 2002; Magwire et al. 2011; Kuhn et al. 2014; Li et al. 2018; Rech et al. 2019) or should they be purged and repressed during the evolution of longevity due to their widespread negative effects on fitness (Chen et al. 2014; Krug et al. 2017; Prudencio et al. 2017; Guo et al. 2018; De Cecco et al. 2019)? In this report, we attempt to answer this controversial question by employing four independent data sets to present the first characterization of the genome-wide TE content and expression in D. melanogaster populations that were experimentally selected for late-life reproduction and longevity.
Does Longevity-Selection Lead to Changes in TE Abundance?
Variation in TE copy number has been associated with some geographic and climatic factors (Kalendar et al. 2000; Kreiner and Wright 2018; Lerat et al. 2019) in natural populations of plants and Drosophila and was shown to change during experimental evolution in different temperatures (Kofler et al. 2018). Our analysis revealed a repeatable trend showing that many, but not all, TE families have an increased number of genomic insertions in late-breeding, long-lived populations (fig. 1 and table 1). However, changes in TE abundance might depend on developmental diet as we observed the opposite pattern under high sugar/protein conditions compared with low and medium diets in Hoedjes2019 (supplementary table S3, Supplementary Material online). Overall, these findings indicate that reproductive age, with some dependency on developmental diet, is another factor influencing divergence in genomic TE abundance. Interestingly, we found a significant difference in the magnitude of TE abundance change between studies that roughly scaled with the number of generations under selection (fig. 1). Although parallel changes in TE characteristics within populations of the same selection regime have been reported by similar experiments (Graves et al. 2017; Kofler et al. 2018), it is striking that we observed this pattern in data created by four independent studies, although we note that the proportions of TE families that increased or decreased in abundance varied between studies. Despite many TE families being more abundant in long-lived populations, our analysis shows no significant difference in the total genomic TE content between control and selected populations (fig. 1), possibly partially driven by the fact that there were a few TE families with large increases in copy number in controls in contrast to many with small increases in abundance in selected populations in two of the studies (supplementary fig. S4, Supplementary Material online). Against our expectation that reducing the overall slightly deleterious TE content is beneficial for fitness, as demonstrated for some traits (Pasyukova et al. 2004), our results suggest that changes in the overall genomic TE load are not essential to evolve longevity or fecundity at old age in Drosophila. These findings are in contrast to recent work in several killifish species, which reported that TE expansion can cause an increased genome size with possible negative effects on lifespan (Cui et al. 2019). However, our analyses focused exclusively on the genomic TE load and as such we cannot exclude a difference in genome size between control and selected populations, which may be caused by other factors such as nonrepetitive InDels or repetitive DNA unrelated to TEs.
Are TEs Adaptive during the Evolution of Aging?
The evolution of the genomic TE content is driven by various factors, including replicative transposition, selection, genetic drift, and the TE defense machineries of the host (Charlesworth and Charlesworth 1983; Kofler 2019). By performing population-genetic simulations that consider only genetic drift, we were able to exclude that population size and generations spent in the lab per se cause an increased abundance of TE families in selected populations in three of the four studies (supplementary fig. S7, Supplementary Material online). Even though it is known that the majority of TE insertions are neutral or deleterious to fitness (Barrón et al. 2014; Arkhipova 2018), our findings suggest that factors other than genetic drift influenced TEs.From a selective point of view, increasing the copy number of many TE families might be beneficial for longevity, whereas only a small number of families may affect lifespan negatively. Although speculative, a higher TE abundance could result in an enhanced piRNA-production and a better protection from TEs, thereby improving survival (Jones et al. 2016; Luo et al. 2020). Under this scenario, selection would likely lead to parallel increases or decreases of the same TE families across studies. However, when we screened for parallel patterns in abundance change, we found only two TE families (G-element and G2) that had decreased copy numbers in selected flies and were significantly shared across all studies (fig. 3). Both elements are jockey-like non-LTR TEs, of which G2 is highly enriched in centromeric regions of the genome (Chang et al. 2019). Thus, changing centromeric structure by altering its TE content could be one mechanism modulating aging, but experimental evidence for this is still missing. In contrast to this, we did not find any significant overlap between all four studies among TE families with an increased abundance in the late-breeding populations. Unless many TE families had nonrepeatable effects on longevity, the small amount of significant sharing suggests that abundance of most TE families is neutral, but we note that the variable numbers of generations the four experiments have selected for may have also played a role.Another possibility is that abundance of TE families is altered through selection affecting TE insertions at a genome-wide scale, resulting in a large number of insertions significantly varying in frequency between control and selected populations. We found only a minor fraction of TE insertions in Fabian2018 and Hoedjes2019, but not in the other two studies, with significantly different frequencies between the regimes that are inside or within 1 kb of <100 genes (fig. 3 and supplementary table S7, Supplementary Material online). A small fraction of TE insertions with a higher frequency in selected populations were found in two of the studies. Taken together with the fact that there were very few differences in frequency of TE families, we propose that standing genetic variation presented by TEs plays a role in the evolution of aging, but it is unlikely to be a major driver of TE abundance differentiation. However, as we identified genomic locations of TEs only using PoPoolationTE2, which has been shown to have a low rate of false positives and performs better the higher the sequencing depth is, we might miss insertions that would otherwise have been found by comparable software (Kofler et al. 2016; Nelson et al. 2017; Lerat et al. 2019).Yet, we found that telomere maintenance, a key hallmark of aging known to be associated with mortality, diseases and the rate of senescence in several organisms might be improved in the late-breeding populations (Canela et al. 2007; López-Otín et al. 2013; Dantzer and Fletcher 2015; Foley et al. 2018; Whittemore et al. 2019). Among the three TE families constituting and maintaining D. melanogaster telomeres (Casacuberta 2017), HeT-A showed parallel increases in copy number in long-lived flies although the difference was less clear in two studies (fig. 3 and supplementary fig. S1, Supplementary Material online). Simultaneously, the few TE families transcriptionally upregulated in long-lived populations of Carnes2015 were almost exclusively telomeric elements (fig. 4). Despite similarities, the fundamental differences in telomeres between species make generalizations difficult (Mason et al. 2008). Moreover, previous studies in D. melanogaster and C. elegans failed to establish a connection between telomeres and lifespan, but telomere length might affect other traits such as fecundity (Raices et al. 2005; Walter et al. 2007). Also, in several species the rate of telomere shortening rather than the initial length itself was a better predictor for lifespan (Whittemore et al. 2019). Another complication yet to be addressed is if these patterns are caused by “intergenerational plasticity” of telomere length, determined by paternal age at reproduction as observed in several mammals including humans (Eisenberg et al. 2012; Eisenberg and Kuzawa 2018). Thus, the exact impact of telomere length on evolutionary fitness and aging remains poorly understood.
Is TE Expression Detrimental for Longevity?
At the transcriptional level, age-dependent misregulation of TEs, thought to be resulting from a gradual decline in heterochromatin maintenance, has been proposed to be harmful for lifespan in Drosophila (Li et al. 2013; Chen et al. 2016; Wood et al. 2016; Brown et al. 2020; Guo et al. 2018), mice (De Cecco et al. 2019), and humans (Bogu et al. 2019). Further supporting the notion that expression of many TEs is detrimental, our RNA-seq analysis indicates that long-lived populations evolved to downregulate TE families, and this effect was even more apparent after we corrected for genomic copy numbers (fig. 4 and supplementary table S13, Supplementary Material online). Considering the missing association between genomic abundance and TE transcription (fig. 4), this further suggests that lowering expression of TEs might be more important than purging them from the genome during the evolution of longevity.Overall, however, TE expression appeared to be more strongly influenced by sex and age compared with selection regime. Interestingly, the trend of TE families being less expressed in late-breeding populations and upregulated with age was more pronounced in male flies, which further had generally higher levels of TE expression relative to females (fig. 4 and supplementary fig. S8A, Supplementary Material online). These findings are consistent with recent work showing that males suffer more from TE derepression during aging due to their entirely repetitive, heterochromatin-rich Y chromosome (Brown et al. 2020). However, if the divergent TE expression patterns between sexes are caused by differences in tissue compositions and whether this disparity explains sexual dimorphism in lifespan is yet to be confirmed. DNA sequencing of male flies in the four experimental evolution studies would be necessary to determine if selection for postponed senescence had similarly strong effects on TE copy number of the Y chromosome.
Did Selection Lead to Differentiation in Genes Related to Regulation of TE Activity?
We also hypothesized that potential detrimental effects of TEs on longevity should be reflected by selection on genes related to TE regulation and transposition (fig. 5). Although parallel genetic changes have been reported among the four studies (Fabian et al. 2018; Hoedjes et al. 2019), we found that genetically and transcriptionally differentiated TE regulation genes were generally not shared between studies. Together with the missing functional enrichment associated with TE regulation, we hypothesize that improvement of chromatin structure/heterochromatin maintenance, piRNA-mediated silencing and modulators of transposition are not prime targets of selection during the evolution of longevity. This, however, does not preclude that other means of TE protection have evolved. It is becoming increasingly evident that TE expression acts as a causative agent of inflammation and immune activation in mammals (Kassiotis and Stoye 2016; De Cecco et al. 2019). Interestingly, Carnes2015, Fabian2018, and Remolina2012 all found significant divergence in innate immunity genes, whereas Fabian et al. (2018) demonstrated an improved survival upon infection and alleviated immunosenescence in the long-lived populations. Rather than reducing TE copy number and expression, selection might preferentially act on immunity genes to reduce TE-mediated inflammation and increase tolerance to TEs with extended lifespan as a consequence. It remains to be explored to what degree innate immune pathways other than the RNAi machinery contribute to TE regulation in D. melanogaster.
Is Reproduction at Old Age Associated with an Increased TE Content?
Our findings suggest that neither genetic drift nor pervasive selection on TEs or genes related to TE regulation is a predominant driver of the differences in TE family abundance. Perhaps, the most parsimonious explanation, therefore, is that postponed reproduction increases the chance for many TE families to transpose and generate new germline insertions which are then passed on to the next generation. Transposition events could particularly be increased in old males, where the high TE expression suggests derepression, but germline-specific analyses will be required to further understand these patterns. In particular, TE families of high frequency, which are putatively low in transpositional activity, might need the prolonged chronological time offered by late-life reproduction to achieve a successful genomic insertion (fig. 2). Over many generations, flies breeding at old age would have accumulated more TEs in the genome than populations reproducing early in life. Supporting this hypothesis, it has been demonstrated that most TE families had a higher rate of insertions in the ovaries of older relative to young P-element-induced dysgenic hybrids, even though at the same time fertility was restored and improved with age (Khurana et al. 2011). However, if this applies to nondysgenic fruit flies and whether it can result in a larger number of TEs over multiple generations has to our knowledge not yet been observed. Thus, TE accumulation in late-breeding populations is comparable to the regularly observed positive correlation between parental age and number of de novo mutations in offspring (Goldmann et al. 2019; Sasani et al. 2019). In line with this, genome-wide measures of nucleotide diversity were also repeatably larger in late-breeding populations across four experiments (supplementary table S5, Supplementary Material online). Although, we have not ruled out that greater nucleotide diversity was driven by other factors, such as reduced genetic drift and adaptation to the laboratory environment in the selected populations which spent fewer generations under the laboratory setting than controls, or balancing selection as proposed by one study (Michalak et al. 2017).Opposing our hypothesis, two recent studies in termites (Elsner et al. 2018) and D. melanogaster (Erwin and Blumenstiel 2019) suggest that the germline is protected from TE invasions through increased transcription of the piRNA machinery. Indeed, our expression analysis confirms that many genes associated with transcriptional and post-transcriptional TE silencing tend to be upregulated with age. Despite this, many TE families had a higher copy number in populations reproducing late in life. It therefore remains to be determined whether this age-dependent upregulation of TE regulation genes really equates to reduced insertional activity, because potential and realized TE repression might not necessarily match. The observation that these genes also tended to be more expressed in controls relative to selected flies in Carnes2015 further poses the question whether there is a trade-off between TE silencing in the germline and lifespan, which could be another mechanism explaining the rising abundance of TE families in the genomes of long-lived flies.Altogether, our work presents a novel viewpoint on the poorly understood role of TEs in aging and longevity that is largely, but not exclusively, neutral. However, the caveat remains that we are unable to rule out that survival of selected populations would be further extended if they had a reduced TE content and expression. In-depth studies tracking piRNA production in the germline together with direct measures of TE transposition rates throughout life or measuring longevity upon knockdown and overexpression of TEs would be crucial experiments to obtain a more complete picture.
Materials and Methods
Data Sets
We utilized genomic data from four independent studies performing laboratory selection for postponed reproduction on wild-derived replicate populations by only allowing flies of relatively old age to contribute to subsequent generations, whereas controls reproduced early in life (Remolina et al. 2012; Carnes et al. 2015; Fabian et al. 2018; Hoedjes et al. 2019) (supplementary table S1, Supplementary Material online). The experimental designs of the studies were overall comparable, but notable differences include the mode of selection, maintenance of controls, variable source populations, number of replicate populations, and generations at the time of sequencing. Moreover, Hoedjes2019 performed the selection for postponed senescence on three varying larval diets ranging from low to high sugar/protein content. The genomic analysis was based on available raw fastq files from whole-genome pool-sequencing of 100 to 250 females. RNA-seq data from Carnes et al. (2015) consisted of raw fastq files from pools of 50 flies. The study included transcriptomes of all selected and control populations, for which both sexes at two ages 3–5 days (young) and 26–35 days of age (old) have been sequenced in replicates. Microarray expression data from Remolina et al. (2012) are derived from heads and abdomens from females at the age of 1, 5, 15, 30, and 50 days of age from the three control and selected populations. See methods in the publications of each study for details on experimental design and supplementary table S1, Supplementary Material online, for a summary. For simplicity, we refer to Carnes et al. (2015) as Carnes2015, Fabian et al. (2018) as Fabian2018, Hoedjes et al. (2019) as Hoedjes2019, and Remolina et al. (2012) as Remolina2012 throughout this report. All statistics were done in R using in-built functions unless otherwise stated. More details on the bioinformatic pipeline are available in supplementary table S15, Supplementary Material online.
Genome-Wide TE Abundance
To quantify the number of genomic insertions for each TE family in selected and control populations we used DeviaTE (Weilguny and Kofler 2019) (supplementary table S15, Supplementary Material online). In brief, DeviaTE maps raw reads to an incorporated library of 179 TE family consensus sequences (Sackton et al. 2009; Bergman et al. 2018) and normalizes the obtained coverage values by the average depth of the same five single-copy genes (Act5C, p53, piwi, RpII140, RpL32). The distribution of normalized values reflects fluctuations in insertion number estimates within a TE family, where averaging overall consensus positions of a TE family gives the mean abundance per haploid genome (see Weilguny and Kofler 2019 for details). We restricted our downstream analysis to TE families that had a study-average of >=0.5 insertion estimates for at least 80% of the consensus positions within a TE family sequence. Thus, TE families without any mapped reads (i.e., 24–29 across studies), and with very low and/or very few insertion estimates were excluded from the downstream analysis. Dependent on the study, this resulted in 110–115 retained TE families.We then investigated if TE families vary in genomic abundance between control and selected populations using three different approaches (see supplementary fig. S1, Supplementary Material online, for a comprehensive description). In our least conservative approach #1, we analyzed studies by fitting Regime (control, selected) and Population[Regime] (replicate populations nested within regime) to normalized coverage values of consensus sequence positions within a TE family. For Hoedjes2019, we used a different model and included Regime, Diet (low, medium, high protein/sugar larval diet regime), and the Regime × Diet interaction. The normalized coverage values were averaged to obtain a single insertion estimate per TE family and population, and these values used for all the remaining analyses. We considered TE families as different between regimes if they were (a) significant after Bonferroni correction for multiple testing at α = 0.01, and (b) if regime averages varied by more than 0.3 insertions (i.e., on average, at least 30% of the chromosomes in the compared control/selection pools of flies have 1 more insertion), thereby filtering out TE families with small differences (supplementary fig. S3, Supplementary Material online). We further used SuperExactTest (Wang et al. 2015) to analyze if the overlap of TEs with a significantly higher genomic abundance in selected (“S > C”) or control populations (“C > S”) between postponed senescence studies is expected by chance.For approach #2, we arcsine square root transformed proportions of TE family copy number relative to the total genomic TE content within a population and analyzed all studies together rather than independently by fitting Study (four levels: Carnes2015, Fabian2018, Hoedjes2019, Remolina2012), Regime and the Study × Regime interaction as factors. TE families with an FDR < 0.05 were considered significant.Finally, our approach #3 is the most conservative as we only considered TE families that showed a consistent increase or decrease in copy number (i.e., average of insertion estimates across all consensus positions) within all selected relative to all control populations in each study and within diet regimes of Hoedjes2019.To analyze differences in the total genomic and subclass-specific (LTR, non-LTR, TIR) TE content, we summed up all TE insertion estimates within a population and fit models with Study, Regime and the Study × Regime interaction.
Genomic TE Locations and Activity/Age of TE Families
We first masked the D. melanogaster reference (v.6.27) for TE families present in the DeviaTE library using RepeatMasker (Smit et al. 1996) (supplementary table S15, Supplementary Material online). We then trimmed reads with cutadapt (Martin 2011) and mapped them using bwa bwasw (Li and Durbin 2009). PoPoolationTE2 was then employed to obtain the exact genomic positions and population frequency of TE insertions on chromosomes X, 2, 3, and 4 of each study using the joint analysis mode, which finds insertions by combining all samples rather than considering them separately (Kofler et al. 2016). Importantly, whereas abundance of TE families is quantified by the total number of reads mapping to a TE relative to single-copy genes (Weilguny and Kofler 2019), identifying the exact genomic location of insertions requires mates of a read-pair to map discordantly to the reference genome and TE sequence, and strongly depends on the sequencing depth and number of populations (Cridland et al. 2013; Kofler et al. 2016; Lerat et al. 2019). For each TE family, we calculated the average population frequency across all of its detected genomic locations within a population as a proxy for active or recent transposition events and evolutionary age (Kofler et al. 2015). We used Spearman’s correlation analysis to compare average frequency values of each study with average frequencies from a natural SA population sequenced to a high genomic coverage (Kofler et al. 2015), and to correlate abundance of TE families with average frequency. We employed t-tests to analyze if average population frequency from the SA population varies between TE families more abundant in selected or control populations, and also performed this analysis using only the top 10 TE families with the largest log2 FC values of abundance change.
Genome-Wide Nucleotide Diversity and Genetic Drift Simulations
We mapped trimmed paired-end reads against the repeat-masked reference genome, the TE library from DeviaTE (Weilguny and Kofler 2019), Wolbachia pipientis (NC_002978.6), and two common gut bacteria Acetobacter pasteurianus (AP011121.1), and Lactobacillus brevis (CP000416.1) using bwa mem (Li and Durbin 2009), and removed duplicates using PicardTools (supplementary table S15, Supplementary Material online). We then filtered and created pileup files using samtools mpileup (Li et al. 2009). To calculate nucleotide diversity π and Watterson’s θ across nonoverlapping 100-kb windows, we used Popoolation (Kofler et al. 2011) and then fitted ANOVA models including the factors Chromosome (X, 2 L, 2 R, 3 L, 3 R, 4), Diet, Regime, and the Diet × Regime interaction for Hoedjes2019, and Population[Regime], Chromosome, and Regime for all other studies. Average coverage across major chromosomal arms was 162×, 101×, 41×, and 23× for Fabian2018, Hoedjes2019, Remolina2012, and Carnes2015, respectively. We detected reads mapping to the genome of the intracellular bacterium Wolbachia in all populations.To test if TE family abundance differences can be caused by genetic drift alone, we compared proportions of S > C and C > S TEs from 5,000 simulations of TE frequency change to observed proportions from approach #1 and #3 (see supplementary methods, Supplementary Material online, for more details).
TE Frequency Differences
To identify genomic TE insertion sites putatively involved in lifespan and aging, we analyzed differences in arcsine square root transformed insertion frequencies between selected and control populations fitting models with Regime for Carnes2015, Fabian2018, and Remolina2012, and with factors Diet, Regime, and Diet × Regime for Hoedjes2019. Bonferroni correction at α = 0.05 was used to correct for multiple testing. Functional annotations were supplemented using SnpEff (v.4.0e, Cingolani et al. 2012) considering TE insertions within 1000 bp of the 5’ and 3’ UTR as upstream or downstream of a gene.We further analyzed whether each TE family varies in frequency between regimes by fitting the factors of Diet, Regime, and Diet × Regime for Hoedjes2019, or Regime and Population[Regime] for all other studies on arcsine square root transformed insertion site frequencies. FDR values were obtained by using “p.adjust” in R and TE families considered significant at FDR < 0.05.
RNA-seq Analysis
RNA-seq data from Carnes et al. (2015) consisted of two replicates of young and old males and females from all control and selected populations (supplementary table S1, Supplementary Material online). Raw reads were filtered using cutadapt (Martin 2011) and mapped to the repeat-masked reference genome, the TE library from DeviaTE, Wolbachia pipientis, Acetobacter pasteurianus, and Lactobacillus brevis (see above) using STAR (Dobin et al. 2013) (supplementary table S15, Supplementary Material online). Read counts were obtained using featureCounts (Liao et al. 2014). We next pre-filtered read count data by excluding all genes and TE families that did not have a sum of 400 counts across all 80 samples (i.e., on average five counts per sample). Five TE families that are not known to occur in D. melanogaster passed this filter and were excluded. For simplicity, the analysis was performed on average read counts from two replicates, as all replicates were highly significantly correlated (Pearson’s r ranging from 0.95 to 1, significant after Bonferroni correction). To analyze differential expression, we fit models using read counts of genes and TE families with DESeq2 in R (Love et al. 2014). First, a model testing the main effects of Regime (selected vs control), Sex (male vs female), and Age (young vs old) was fit. As the sex term was significant for most TE families, we decided to analyze males and females separately and fitted models with Regime and Age to analyze the main effects. To examine the interaction, we also fitted models including Regime × Age. We obtained log2 fold change values for each factor and the library-size normalized read counts from DESeq2 for further analysis. To investigate average expression per TE insertion, we divided read counts of TE families from females by the number of genomic insertions observed in each population, assuming that genes and 13 TE families that did not pass our filters in the genomic analysis have a single copy in the genome.
Evolution of TE Regulation Genes
The list of genes involved in TE regulation consisted of piRNA pathway genes also analyzed in Erwin and Blumenstiel 2019 and Elsner et al. 2018, and genes involved in heterochromatic and chromatin structure from Lee and Karpen 2017. We further added seven genes involved in these functions, and genes annotated to “regulation of transposition” (GO:0010528) and “transposition” (GO:0032196) according to FlyBase so that we ended up with a total of 96 genes (supplementary table S14, Supplementary Material online). We then screened the published genomic candidate gene lists from Carnes2015, Fabian2018, Hoedjes2019, and Remolina2012 for these genes. We also compared TE regulation genes with differentially expressed genes from the RNA-seq analysis of Carnes2015 (see above). We further obtained normalized microarray expression data from Remolina2012 of female flies at 1, 5, 15, 30, and 50 days of age (supplementary table S1, Supplementary Material online). Notably, the expression data were created from flies at 40 generations of selection compared with 50 generations in the genomic analysis. We fit a mixed effects model similar to the one used in their original publication with Age, Regime, and Age × Regime as fixed and replication within population-age combination as random effect. The two available tissues (heads and abdomens) were analyzed separately. A gene was considered to be differentially expressed if it had an FDR < 0.05.
Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.Click here for additional data file.
Authors: Marcelo A Mori; Prashant Raghavan; Thomas Thomou; Jeremie Boucher; Stacey Robida-Stubbs; Yazmin Macotela; Steven J Russell; James L Kirkland; T Keith Blackwell; C Ronald Kahn Journal: Cell Metab Date: 2012-09-05 Impact factor: 27.287
Authors: Kurt Whittemore; Elsa Vera; Eva Martínez-Nevado; Carola Sanpera; Maria A Blasco Journal: Proc Natl Acad Sci U S A Date: 2019-07-08 Impact factor: 11.205
Authors: Szilvia Solyom; Adam D Ewing; Eric P Rahrmann; Tara Doucet; Heather H Nelson; Michael B Burns; Reuben S Harris; David F Sigmon; Alex Casella; Bracha Erlanger; Sarah Wheelan; Kyle R Upton; Ruchi Shukla; Geoffrey J Faulkner; David A Largaespada; Haig H Kazazian Journal: Genome Res Date: 2012-09-11 Impact factor: 9.043