
Genome-Wide Estimates of Transposable Element Insertion and Deletion Rates in Drosophila Melanogaster.

Jeffrey R Adrion1, Michael J Song2, Daniel R Schrider3, Matthew W Hahn1,4, Sarah Schaack5.   

Abstract

Knowing the rate at which transposable elements (TEs) insert and delete is critical for understanding their role in genome evolution. We estimated spontaneous rates of insertion and deletion for all known, active TE superfamilies present in a set of Drosophila melanogaster mutation-accumulation (MA) lines using whole genome sequence data. Our results demonstrate that TE insertions far outpace TE deletions in D. melanogaster. We found a significant effect of background genotype on TE activity, with higher rates of insertions in one MA line. We also found significant rate heterogeneity between the chromosomes, with both insertion and deletion rates elevated on the X relative to the autosomes. Further, we identified significant associations between TE activity and chromatin state, and tested for associations between TE activity and other features of the local genomic environment such as TE content, exon content, GC content, and recombination rate. Our results provide the most detailed assessment of TE mobility in any organism to date, and provide a useful benchmark for both addressing theoretical predictions of TE dynamics and for exploring large-scale patterns of TE movement in D. melanogaster and other species.
© The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  Drosophila melanogaster; deletion rate; insertion rate; transposable elements; transposition

Year:  2017        PMID: 28338986      PMCID: PMC5447328          DOI: 10.1093/gbe/evx050

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Transposable elements (TEs) make up a significant portion of most multicellular eukaryotic genomes and can profoundly influence their evolution (Burns and Boeke 2012). Often considered genomic parasites, these discrete DNA sequences are capable of moving and replicating throughout the genome and have been found to comprise ∼20% of the Drosophila melanogaster genome, and ∼65 and ∼85% of the human and maize genomes, respectively (Quesneville et al. 2005; Schnable et al. 2009; de Koning et al. 2011). TE abundance is highly variable among taxa, as are the spatial distribution and differential proliferation of TE types within species (Kidwell 2002; Feschotte and Pritham 2007). While there are numerous examples of beneficial TE insertions (reviewed in Casacuberta and González 2013), transposition events are expected to be deleterious on average (Pasyukova et al. 2004; Casacuberta and González 2013). Moreover, selection against the deleterious effects of TEs is expected to shape both the rates of TE activity (Charlesworth and Charlesworth 1983; Charlesworth and Langley 1989) and the spatial distribution of TEs along and among chromosomes (Duret et al. 2000; Bartolomé et al. 2002; Rizzon et al. 2002). However, features of the host genome and of the transposition process itself may contribute to the observed variation in TE abundance, diversity, and distributions. Natural selection has the potential to obscure these patterns, which may be harder to detect in natural populations. Thus, knowing the rates and distribution of TEs in the absence of selection is a critical component for understanding their role in genome evolution. The earliest and most numerous studies on TE movement in metazoans have been performed in D. melanogaster (e.g. Engels 1983; Lewis and Brookfield 1987).
The results of these kinds of landmark studies provided the data and insights to form a theoretical framework within which many subsequent studies investigating TE dynamics in other systems have been interpreted (e.g. Charlesworth and Langley 1989; Lee and Langley 2010). Rates of TE movement have been estimated empirically in both natural populations and in laboratory experiments, in many cases taking advantage of polytene chromosomes to perform in situ hybridization (see supplementary table S1, Supplementary Material online). However, none of these prior studies were able to examine the movement of all TEs in the genome simultaneously and instead relied on data from one or a few families to generalize patterns across TE families, despite major differences in transposition mechanisms (Rebollo et al. 2012). While useful, simply estimating the absolute rates of insertion or deletion for individual TE families is only a first step towards investigating the long-term dynamics of TEs in the genome. Instead, considering the relative rates of gains and losses genome-wide, as well as the spatial distribution of these events along the chromosomes, allows one to understand the global dynamics of TE movement. Although selection against the deleterious effects of TEs undoubtedly contributes to the rate variation and spatial distribution of TEs along and among the chromosomes, non-uniform mutation is often overlooked as an explanation for much of this variation in nature. Indeed, TE insertion-site preference could in part shape this distribution, and such preferences are quite common across eukaryotes. For instance, Vázquez et al. (2007) found that roo elements preferentially integrate into proximal and distal regions of autosomal arms and the X chromosome in D. melanogaster. P elements in D. melanogaster have also been shown to preferentially insert into specific sequences acting as origins of replication (Spradling et al. 2011).
Most new Ty5 insertions (∼95%) in Saccharomyces cerevisiae occur either in heterochromatin at the telomeres or in the silent mating cassettes (Bushman 2003). TE insertion-site preference has also been described in many other organisms, including D. willistoni (Gonçalves et al. 2014), Daphnia pulex (Elliott et al. 2013), Schizosaccharomyces pombe (Singleton and Levin 2002), Oryza (Miyao et al. 2003), and mouse and human cell cultures (Yant et al. 2005). Transposition bias may contribute to the non-random spatial distribution of TEs in D. melanogaster, but TE insertion and deletion rates have yet to be investigated on a genome-wide scale in the absence of selection. In order to estimate genome-wide rates and patterns of TE movement in the absence of natural selection, we took a whole-genome sequencing mutation-accumulation (MA) approach. We estimate the insertion and deletion rate for all known active TE superfamilies based on whole-genome sequence data from a set of eight D. melanogaster MA lines derived from two different inbred founder genotypes. We use the term “line” when referring to either of the two founder genotypes, and “subline” when referring to the unit of replication within each founder line (i.e. there are four sublines within each line). We present both per-site and per-copy rate estimates for all superfamilies where either insertion or deletion events were detected. We examine the spatial distribution of new insertions and deletions, and test for associations between transposition activity and characteristics of individual TE superfamilies and of the local genomic environment. To our knowledge, this genome-wide analysis of TE mobility provides the most detailed assessment in any organism to date, and provides a useful benchmark both for addressing theoretical predictions of TE dynamics and for exploring large-scale patterns of TE movement in D. melanogaster and other species.

Materials and Methods

Mutation-Accumulation Lines

Two inbred lines (Line 33 and Line 39) originating from the IV laboratory population of flies captured in Massachusetts in 1975 (described in Houle and Rowe 2003) were used to establish eight sublines. Once founded, each subline was subjected to 145–149 generations of mutation accumulation (sublines are referred to by number: 33–45, 33–27, 33–55, 33–5 and 39–58, 39–67, 39–51, and 39–18, respectively). During mutation accumulation, a single pair of flies is used to found each new generation. This reduces the efficacy of natural selection relative to the strength of genetic drift, and allows deleterious mutations that may otherwise have been purged by selection to accumulate in each line. DNA was extracted from whole flies collected from each of these sublines after mutation accumulation, and was multiplexed and sequenced using an Illumina Genome Analyzer II at the Indiana University Center for Genomics and Bioinformatics (see Schrider et al. 2013 for additional details). We obtained paired-end Illumina reads with 74 bp ends and an average insert size of 182 bp (see supplementary table S2, Supplementary Material online). We used cutadapt (Martin 2011) to trim adapters and low quality bases from both 5’ and 3’ ends until the minimum aggregate quality score was ≥20. Finally, we randomly subsampled reads to ensure that all eight sublines started with an equal number of paired-end reads prior to mapping (see supplementary table S2, Supplementary Material online).

TE Discovery

We used our custom TE identification program, TEFLoN (https://github.com/jradrion/TEFLoN), to discover the position and superfamily identity of all TEs present in the eight sublines. Briefly, TEFLoN creates a pseudo-reference genome with all known (i.e. reference annotated) TE sequences removed. It uses BWA-mem v.0.7.10 (Li and Durbin 2009) to align paired-end reads to a database of full- and partial-length TEs annotated in the reference in addition to aligning reads to unique genomic locations in the pseudo-reference. TEFLoN characterizes the breakpoints and superfamily identity of both new and known elements by identifying paired-end reads where one end maps to a TE and the other end maps uniquely to the pseudo-reference (map quality ≥30). Next, TEFLoN catalogs all reads at the putative breakpoints as “presence” reads (where the read is either soft-clipped at a breakpoint or has a mate aligning to a TE), “absence” reads (where the alignment spans the breakpoints), or uninformative reads (where the aligned read satisfies neither of the previous conditions), and tallies these categories. We excluded the family ine-1 from our analysis, as evidence suggests this family has been inactive for millions of years (Kapitonov and Jurka 2003). Much like other programs for identifying TEs using short-read data, TEFLoN is unable to discover or quantify nested TEs (those TEs located entirely within other TE sequence), making our estimates of starting copy-number, along with counts of insertions and deletions, a lower bound.

Estimating Rates of TE Activity

New insertions in a focal subline were scored if they satisfied three criteria: 1) ≥3 presence reads in the focal subline, 2) ≥3 absence reads and ≤1 presence read in the three non-focal sublines, and 3) a ratio of presence reads to total reads in the focal subline of ≥70%. Likewise, three criteria were used to score deletions thought to have occurred during the experiment: 1) ≥5 absence reads and ≤1 presence read in the focal subline, 2) ≥5 presence reads in the three non-focal sublines, and 3) a ratio of presence reads to total reads in the three non-focal sublines of ≥70%. The asymmetry in read thresholds between insertions and deletions was determined via simulation (described below). Allowing a single presence read when classifying an element as being absent corrected for small errors in the TE breakpoint estimation, especially in the case of a partial target site duplication (TSD) sharing sequence identity with the 5’ or 3’ end of a sequence in the TE database. We classified elements with ≥3 presence reads and a ratio of presence reads to total reads that was ≥70% in all four sublines as being present in the ancestor of the four sublines (i.e. starting copies). Finally, elements detected as being present in two sublines and absent in the other two sublines were excluded from our analyses. Our filtering methods, which require a ratio of presence reads to total reads of ≥70%, were used to filter any somatic TE mutations that may have occurred during the final generation of mutation accumulation, as somatic TE mutations should be present on fewer chromosome copies than germline TE mutations. Insertions and deletions were visually inspected and validated using the Integrative Genomics Viewer (Robinson et al. 2011). Genome-wide rates of insertion and deletion were calculated as $N/(S\gamma)$, where N is the total number of insertions or deletions genome wide, S is the number of observable sites in all eight sublines, and γ is the number of generations of MA.
We defined an observable site as any site in the genome for which the minimum number of reads required to identify an event (≥3 for insertions, ≥5 for deletions) were successfully mapped (mapping quality ≥30). Superfamily-specific insertion and deletion rates were calculated as $N_e/(N_c\gamma)$, where $N_e$ is the number of new insertions or deletions for that superfamily, $N_c$ is the starting copy-number of that superfamily, and γ is the number of generations of MA. Superfamily-specific insertion and deletion rates were estimated for all active TE superfamilies, and a superfamily was considered active if an insertion or deletion event was observed in either line. All 95% confidence intervals were calculated by a genome-wide bootstrap of 100 kb windows, calculating rates of activity 1,000 times.
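The scoring criteria and the two rate definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not TEFLoN's implementation: the names `ReadCounts`, `is_new_insertion`, `genome_wide_rate`, and `per_copy_rate` are hypothetical, "total reads" is assumed to mean presence plus absence reads, and the non-focal thresholds are assumed to apply to each non-focal subline separately (the text does not say whether they are pooled).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReadCounts:
    """Presence/absence read tallies at one putative TE site in one subline."""
    presence: int
    absence: int


def presence_ratio(counts: ReadCounts) -> float:
    # Assumption: "total reads" = presence + absence (uninformative reads ignored).
    total = counts.presence + counts.absence
    return counts.presence / total if total else 0.0


def is_new_insertion(focal: ReadCounts, non_focal: Sequence[ReadCounts]) -> bool:
    """Apply the three insertion criteria stated in the text.

    Assumption: each non-focal subline must individually show >=3 absence
    reads and <=1 presence read.
    """
    return (
        focal.presence >= 3
        and presence_ratio(focal) >= 0.70
        and all(c.absence >= 3 and c.presence <= 1 for c in non_focal)
    )


def genome_wide_rate(n_events: int, observable_sites: int, generations: float) -> float:
    """Per-site per-generation rate: N / (S * gamma)."""
    return n_events / (observable_sites * generations)


def per_copy_rate(n_events: int, starting_copies: int, generations: float) -> float:
    """Per-copy per-generation rate for one superfamily."""
    return n_events / (starting_copies * generations)
```

For example, a site with 5 presence reads and 1 absence read in the focal subline, and 0 presence and 4 absence reads in each of the other three sublines, passes all three insertion criteria under these assumptions.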

Estimating False Positive and False Negative Rates

To estimate false positive and negative rates (see supplementary table S3, Supplementary Material online), we simulated an MA experiment and analyzed these simulated data using our TEFLoN pipeline. We generated four unique chromosomes, representing four independently evolving sublines, by simulating single nucleotide polymorphisms (SNPs) in D. melanogaster chromosome 2R (r.5.57) using pIRS v1.1.0 (options: -d 0.0 -v 0.0; Hu et al. 2012). Next, we randomly inserted a set of 100 new TEs and removed a set of 100 reference TEs from all four sublines. Finally, we inserted 100 new TEs and removed 100 reference annotated TEs from one of the four sublines (the focal subline). This technique both mimics the differences between our starting lines and the D. melanogaster reference genome, and simulates insertions and deletions of both new (relative to the reference) and known (reference annotated) TEs. Simulated insertions and deletions were restricted to lengths ≥500 bp, but were not restricted to full-length elements. We also simulated a TSD flanking each insertion, where the TSD length was randomly drawn from a Poisson distribution (λ = 5). The physical position and family identity of all simulated insertions and deletions were chosen randomly, with the caveat that we did not allow nested events. Finally, we independently simulated Illumina PE sequencing of the four chromosomes using pIRS (options: -l 74 -x 17 -m 182) and used quality control and alignment methods identical to those described above. False positive rates (FPR) were estimated independently for insertions and deletions as $\mathrm{FPR} = FP/(FP + TN)$, where FP is the number of discovered TEs falsely inferred to be either insertions or deletions and TN is the number of pre-existing TE copies (i.e. discovered TEs not classified as either insertions or deletions).
False negative rates (FNR) were estimated independently for insertions and deletions as $\mathrm{FNR} = FN/(FN + TP)$, where FN is the number of simulated insertions or deletions that were not identified and TP is the number of simulated insertions or deletions that were correctly classified as either insertions or deletions. We also estimated these rates for euchromatic and non-euchromatic regions of the genome separately, as we expect reduced power to detect events in non-euchromatic regions given biases in sequencing and aligning to these regions. The discovery of these simulated insertions and deletions provided the basis for the read-count threshold parameters used in our study.
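The two error rates follow the standard confusion-matrix definitions, and a minimal sketch makes the bookkeeping explicit. The function names are hypothetical, and `correct_for_fnr` shows only one simple way an observed count could be scaled for missed events; it is an assumption for illustration, not necessarily the correction used in this study.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): falsely called events among pre-existing copies."""
    return fp / (fp + tn)


def false_negative_rate(fn: int, tp: int) -> float:
    """FNR = FN / (FN + TP): simulated events that were missed."""
    return fn / (fn + tp)


def correct_for_fnr(observed_events: float, fnr: float) -> float:
    """Scale an observed count up by the fraction of events expected to be detected.

    Assumption: misses are independent of everything else, so the expected
    true count is observed / (1 - FNR).
    """
    if not 0.0 <= fnr < 1.0:
        raise ValueError("fnr must be in [0, 1)")
    return observed_events / (1.0 - fnr)


# Of 100 simulated events, 30 missed and 70 recovered gives FNR = 0.3.
print(false_negative_rate(30, 70))
```

Under this simple scaling, an FNR of 0.3 would raise an observed count of 18 events to roughly 26 expected events.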

Statistical Analysis

We used a negative binomial generalized linear model (nbGLM) to test for significant linear relationships between TE activity and features of the local genomic environment such as TE content, exon content, GC content, and recombination rate [formula: TE events per window ∼ TE content + exon content + GC content + recombination rate]. The nbGLM used only genomic windows with >70% observable sites. TE insertion and deletion counts, TE content, exon content, and GC content were calculated for non-overlapping 10 kb windows using the D. melanogaster reference genome (FlyBase v.5.57). TE and exon contents were calculated as the fraction of bases in each window within annotated TEs or exons, respectively. Recombination rate data were acquired from Comeron et al. (2012). We tested for non-random patterns in the spatial distribution of insertions and deletions among chromosomes, between lines, and among chromatin states, using Fisher’s exact tests. To control for unequal power to detect events across genomic regions (due to generally higher coverage in euchromatin), we standardized each region or chromosome by the number of observable sites. One column of the contingency table consisted of the counts of observable sites, whereas the other column consisted of insertion or deletion counts. We tested for a proximity effect by randomly permuting our observed insertions 1,000 times to obtain a distribution of distances to the nearest element of the same superfamily (calculated separately for DNA and RNA elements) and a distribution of counts for which we observe a new insertion and a pre-existing copy from the same superfamily within a specified genomic window [1 kb, 10 kb, 100 kb, and 1,000 kb windows tested]. We obtained canonical TE lengths from the FlyBase set of full-length TEs (dos Santos et al. 2015) and chromatin states from Kharchenko et al. (2011; http://modencode.org).
We used Bonferroni corrections when assessing the statistical significance of multiple tests; all statistical analyses were performed in R (R Development Core Team 2011).
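The contingency-table layout described above (event counts in one column, observable sites in the other, one row per region) can be illustrated with a small self-contained two-sided Fisher's exact test. The analyses in this study were run in R; the Python below is only a sketch of the same kind of test, the function name is hypothetical, and the counts in the usage lines are made up for illustration rather than taken from the data.

```python
import math


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are regions; the first column holds event counts and the second
    holds observable sites. Sums the hypergeometric probability of every
    table that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)
    log_denom = _log_comb(n, col1)

    def log_p(x: int) -> float:
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    log_obs = log_p(a)
    total = sum(
        math.exp(log_p(x))
        for x in range(lo, hi + 1)
        if log_p(x) <= log_obs + 1e-7  # small tolerance for floating-point ties
    )
    return min(1.0, total)


# Hypothetical counts: events and observable sites on the X versus the autosomes.
p_value = fisher_exact_two_sided(60, 20_000_000, 220, 95_000_000)
print(p_value)
```

Working in log space with `math.lgamma` keeps the binomial coefficients finite even when one column holds tens of millions of sites, and the loop only runs over the possible event counts in the first cell.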

Results

In total, we observed 280 insertion and 18 deletion events across all eight sublines of the MA experiment after 145–149 generations of mutation accumulation (fig. 1, table 1, see supplementary tables S6 and S7, Supplementary Material online). These observations were based on paired-end sequence data providing, on average, 17x coverage of the genome (see supplementary table S2, Supplementary Material online), which allowed us to obtain support for each event from multiple reads. We tested the performance of our TEFLoN pipeline and estimated false positive and false negative rates (FNR) by simulating an MA line under conditions representative of the real MA experiment. To do this, we simulated four starting sublines, each derived from D. melanogaster (r.5.57) chromosome 2R and unique in their TE composition relative to the reference. We then inserted and removed TE sequence from one of those sublines and independently simulated Illumina sequencing on all four sublines (see Methods). Our estimates of FNR suggest that we have less power to detect TE deletion events (FNR = 0.3) than insertion events (FNR = 0.16), likely because many TE deletions occur in heterochromatic regions of the genome, regions that are generally more repetitive and more difficult to sequence and map. False positive rates (FPR) were similar between insertions (FPR = 0.02) and deletions (FPR = 0.01) and were not dramatically different between euchromatic and non-euchromatic regions of the genome (see supplementary table S3, Supplementary Material online).
Fig. 1. Genome-wide plot of transposable element insertion and deletion events discovered along chromosomes X, 2L, 2R, 3L, and 3R in D. melanogaster (r5.57). Counts represent events discovered in both Line 33 (light green) and Line 39 (blue). The fraction of observable sites in non-overlapping 10 kb windows is plotted in gray. Centromeres are shown with black semicircles.

Table 1

Observed Insertion and Deletion Events for TEs in Eight Sublines of Two Drosophila melanogaster MA Lines

Line (starting copy-number)    Subline    Insertions    Deletions
Line 33 (2311)                 33–45      19            0
                               33–27      19            0
                               33–55      33            1
                               33–5       33            1
                               Total      104           2
Line 39 (2231)                 39–58      46            2
                               39–67      63            13
                               39–51      45            1
                               39–18      22            0
                               Total      176           16
We found that 24 known TE superfamilies in D. melanogaster are active in these lines. We note that our methods are unable to distinguish between TE excisions (transposition events mediated by TE machinery) and TE deletions arising by other mechanisms; both are simply classified as deletions in this report. Further, because our method does not detect nested TEs and because estimated FNR are roughly an order of magnitude greater than false positive rates (see supplementary table S3, Supplementary Material online), our estimates provide a lower bound for both the rates of transposition and the starting copy-number of TEs in each line. Despite this limitation, our counts of the starting copy-number of all TEs (2311 and 2231 in Lines 33 and 39, respectively; table 1) are roughly consistent with the number of annotated TEs in the D. melanogaster reference genome (3170 after the exclusion of ine-1 elements; FlyBase v5.57; dos Santos et al. 2015), although they are considerably lower than some recent reports of copy-number in natural populations of D. melanogaster (e.g. >23,000 copies [Cridland et al. 2013] and ∼10,000 copies [Kofler et al. 2012]); these differences are likely due to the fact that many rare TEs are discovered in population studies, but may also reflect differences in the annotation methods used.

Genome-Wide Rates of Insertion and Deletion

We characterized TE activity by first estimating the genome-wide rate of insertion and deletion across all TE superfamilies per-site per-generation. Sites were calculated as the total number of positions in the genome that met the thresholds of base quality, map quality, and read depth necessary to detect insertions or deletions (see Methods, fig. 1). The genome-wide rate of insertion (2.11 × 10^−9 [95% CI = 1.87 × 10^−9–2.38 × 10^−9] per-site per-generation) was significantly elevated relative to the rate of deletion (1.37 × 10^−10 [95% CI = 8.36 × 10^−11–2.06 × 10^−10] per-site per-generation) (P = 2.2 × 10^−16; Fisher’s exact test [FET]). Moreover, this difference persisted after correcting our TE counts for a higher FNR for deletions relative to insertions (P_FET < 2.2 × 10^−16). The eight sublines were derived from two unrelated founder lines, allowing us to compare rates of TE activity between genotypes. We found that both insertion and deletion rates were significantly elevated in Line 39 relative to Line 33 (P_FET < 7.23 × 10^−4 for both types of events). The estimated rates of insertion in Lines 33 and 39 were 1.57 × 10^−9 [95% CI = 1.30 × 10^−9–1.88 × 10^−9] per-site per-generation and 2.66 × 10^−9 [95% CI = 2.26 × 10^−9–3.07 × 10^−9] per-site per-generation, respectively, while the estimated rates of deletion were 3.04 × 10^−11 [95% CI = 0.0–7.61 × 10^−11] and 2.44 × 10^−10 [95% CI = 1.37 × 10^−10–3.66 × 10^−10] per-site per-generation, respectively. The genome-wide deletion rate in Line 39 was strongly driven by deletions that occurred in a single subline (39–67), which accounted for 72% of all deletions observed in the experiment. Deletion rates were not significantly different between the lines after excluding deletions in subline 39–67.
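The headline numbers in this section can be checked directly against the subline counts in table 1. The short sketch below only redoes that arithmetic; the dictionary layout and variable names are illustrative, and the two rates are the point estimates quoted in the text.

```python
# Event counts per subline, copied from table 1.
insertions = {"33-45": 19, "33-27": 19, "33-55": 33, "33-5": 33,
              "39-58": 46, "39-67": 63, "39-51": 45, "39-18": 22}
deletions = {"33-45": 0, "33-27": 0, "33-55": 1, "33-5": 1,
             "39-58": 2, "39-67": 13, "39-51": 1, "39-18": 0}

total_insertions = sum(insertions.values())
total_deletions = sum(deletions.values())

# Share of all deletions contributed by the single subline 39-67.
share_39_67 = deletions["39-67"] / total_deletions

# Ratio of the genome-wide point estimates (per-site per-generation).
insertion_rate = 2.11e-9
deletion_rate = 1.37e-10
rate_ratio = insertion_rate / deletion_rate

print(total_insertions, total_deletions)
print(f"{share_39_67:.0%} of deletions in subline 39-67")
print(f"insertion rate is about {rate_ratio:.1f}x the deletion rate")
```

The totals reproduce the 280 insertions and 18 deletions reported above, subline 39–67 accounts for 13 of the 18 deletions, and the insertion rate estimate is roughly fifteen times the deletion rate estimate.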

Superfamily-Specific Rates of Insertion and Deletion

We also calculated superfamily-specific rates of insertion and deletion per-copy per-generation (with starting copy-number counted separately for each superfamily). Rates of superfamily-specific insertion and deletion were highly variable, and ranged from 0 to 5.13 × 10^−3 per-copy per-generation for insertions and from 0 to 1.29 × 10^−4 per-copy per-generation for deletions (fig. 2; see supplementary tables S4 and S5, Supplementary Material online). Copia insertions comprised 61% (107 out of 176) of the total insertion events in Line 39, while not a single copia insertion was detected in Line 33. This observation agrees with results reported in Houle and Nuzhdin (2004) based on in situ experiments using the same MA lines.
Fig. 2. Superfamily-specific insertion and deletion rates for all active superfamilies in Line 33 (light green) and Line 39 (blue). Each dot represents the per-copy per-generation rate for an individual superfamily. Copia insertion rate in Line 39 is shown using an axis break.

Superfamily-specific insertion and deletion rates were not significantly different between lines (P > 0.11 for both comparisons; Mann–Whitney U tests; fig. 2). Because of the exceptionally high rate of copia insertions in Line 39, we tested for a difference between rates after excluding copia elements and found superfamily-specific insertion rates in Line 33 were marginally elevated relative to Line 39 (P_MWU = 0.053). Similar superfamily-specific rates between the lines (measured per-copy) suggest that the higher genome-wide rate of insertions (measured per-site) in Line 39 might be driven by copia. Indeed, the exclusion of copia elements reversed the pattern of higher insertion in Line 39 for genome-wide per-site per-generation rate estimates, causing Line 33 to have a higher rate of insertions when measured per-site per-generation (P_FET = 0.010). Notably, the elevated rate of genome-wide deletions (measured per-site per-generation) in Line 39 was not affected by the exclusion of copia (P_FET = 0.002). Most of the families for which estimates are available from earlier studies were found to be active in this experiment, and the rates we estimated are generally within the range of those previously reported (see supplementary table S1, Supplementary Material online). We also tested for an effect of TE order (LTR, non-LTR, TIR), TE class (DNA, RNA), canonical sequence length, and starting copy-number on superfamily-specific rates of insertion and deletion (fig. 3). Rates of activity were not significantly different among orders for either insertions (P_ANOVA = 0.32) or deletions (P_ANOVA = 0.46).
A similar pattern was seen for differences between DNA and RNA elements (class) for both insertions (P_MWU = 0.46) and deletions (P_MWU = 0.75). Superfamily-specific rates of both insertion and deletion were positively correlated with the canonical length of the superfamily and negatively correlated with starting copy-number, but these correlations were not statistically significant (Spearman’s rho, P > 0.5 for all comparisons, fig. 3). Importantly, TEs in heterochromatin may be contributing to new insertions even though these donor copies would go undetected by TEFLoN, potentially influencing the association between insertion rates and copy-number. To estimate the extent of undetected TE donors relative to discovered copy-number, we associated the copy-number of each superfamily with its respective read coverage. The significant positive correlation between copy-number and coverage (ρ = 0.7, P < 10^−16; see supplementary fig. S3, Supplementary Material online) suggests that there is a positive correlation between the number of true TE donors in these lines and the superfamily copy-numbers discovered by TEFLoN.
Fig. 3. Comparison of superfamily-specific rates of insertion (A, B, and C) and deletion (E, F, and G) among TE orders (LTR, non-LTR, TIR), based on TE length, and relative to starting copy-number. Spearman’s ρ and P values obtained by testing for a correlation between activity rate and either length or copy-number for all active superfamilies.

It should be noted that a negative relationship between activity rate and copy-number is expected in the absence of any causative relationship between copy-number and counts of insertions or deletions, as our measure of superfamily-specific rate is not independent of copy-number (i.e. copy-number appears in the rate term). However, we assume that TE superfamilies with higher copy-numbers do have more opportunities to transpose relative to superfamilies with low copy-numbers. Therefore, a negative correlation between insertion rate and copy-number is consistent with theory that predicts the evolution of TE self-regulation or the evolution of host suppression (Charlesworth and Charlesworth 1983; Charlesworth and Langley 1989). These results run contrary to previous experiments linking increases in transposition rates with higher copy-number (Nuzhdin et al. 1996; Pasyukova et al. 1998).
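The caveat that copy-number appears in the denominator of the per-copy rate can be made concrete with a small simulation. In the sketch below, event counts are drawn independently of copy-number, yet the per-copy rate still correlates negatively with copy-number. Everything here is hypothetical: the number of simulated superfamilies, the ranges of copies and events, and the use of 147 generations as a round value within the reported 145–149; none of it is data from this study.

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


rng = random.Random(1)
generations = 147  # illustrative value within the reported 145-149
copies = [rng.randint(5, 200) for _ in range(500)]
events = [rng.randint(0, 10) for _ in range(500)]  # independent of copies
rates = [e / (c * generations) for e, c in zip(events, copies)]

rho = spearman_rho(copies, rates)
print(f"Spearman's rho between copy-number and per-copy rate: {rho:.2f}")
```

Because the simulated events carry no information about copy-number, any negative correlation in the output arises purely from dividing by copy-number, which is why a negative relationship alone cannot distinguish regulation from this built-in dependence.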

The Local Genomic Environment Influences Patterns of TE Insertions and Deletions

We tested for rate heterogeneity between chromosome types by comparing counts of insertions and deletions on each chromosome arm (relative to the number of sites observable). We found significantly elevated rates of both insertions (0.86-fold increase) and deletions (5.97-fold increase) on the X chromosome relative to the autosomes (P_FET < 7.15 × 10^−5 for both comparisons; see supplementary fig. S1, Supplementary Material online) and a significant reduction of the insertion rate (60% reduction) on chromosome 2L (P_FET = 7.58 × 10^−3; see supplementary fig. S1, Supplementary Material online). Moreover, the exclusion of copia elements from these analyses strengthened the statistical significance and magnitude of chromosome-specific biases. We tested for non-independence between TE activity and chromatin state based on data from two D. melanogaster cell lines, BG3 and S2 (Kharchenko et al. 2011), by comparing counts of insertions and deletions in each of nine chromatin states relative to observable sites. We subdivided our data to individually test for an effect of chromatin state on the insertion rate of 1) all active TE superfamilies, 2) all superfamilies excluding copia, and 3) copia alone. Insertions of all TE superfamilies were biased to occur in regulatory chromatin (enhancers) (P < 1.44 × 10^−4 for both cell lines; FET); however, this pattern is strongly driven by copia insertions and is not statistically significant after excluding copia (see supplementary fig. S2, Supplementary Material online). There was no significant relationship between any chromatin state and patterns of deletion activity (P_FET > 0.06 for all deletion tests). These results suggest that chromatin state may play an important role in shaping the spatial distributions of some TE families along the chromosomes, but that this role may be idiosyncratic to individual TE families. It should also be noted that the landscape of chromatin states identified in D. melanogaster cell lines may not be representative of the landscape in our experimental lines, although there are general consistencies found between the cell lines (Kharchenko et al. 2011). We also used a generalized linear model to test for associations between TE activity and additional features of the local genomic environment (i.e. TE content, GC content, exon content, and recombination rate). We found a weakly significant negative correlation between insertion activity and GC content and a suggestive negative correlation between deletion activity and exon content (table 2). The latter result is expected, as selection likely shaped the spatial distribution of TEs along the chromosomes in the founding population prior to the start of mutation accumulation. We did not find a significant correlation between TE activity and recombination rate. Importantly, we also did not find a significant correlation between insertions and exon content, consistent with little to no selection acting in our MA experiment. These results suggest little direct effect of recombination rate on the distribution of TE copies across the genome, but suggest that TE activity may be influenced by other factors of the local genomic environment, such as GC content.
Table 2

Results from Negative Binomial Generalized Linear Models Characterizing the Effect of Local Genomic Features on TE Activity

                            Coefficient [StdErr]    Test Statistic    P Value
Insertions
  TE content (a)            −3.44 [2.10]            −1.63             0.10
  Exon content (b)          0.10 [0.26]             0.40              0.69
  GC content (c)            −4.66 [2.23]            −2.09             0.04
  Recombination rate (d)    −0.005 [0.03]           −0.17             0.86
Deletions
  TE content (a)            −6.93 [11.68]           −0.60             0.55
  Exon content (b)          −2.08 [1.13]            −1.85             0.06
  GC content (c)            12.31 [9.10]            1.35              0.18
  Recombination rate (d)    0.06 [0.10]             0.64              0.52

Note: Recombination rate estimates were acquired from Comeron et al. (2012). All other genomic features were estimated using non-overlapping 10 kb windows in the D. melanogaster reference genome (FlyBase v.5.57).

(a) % of window in annotated TE sequence.

(b) % of window in exons.

(c) % GC.

(d) cM/Mb.
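
As a reading aid for Table 2, the sketch below recomputes two-sided P values from the reported test statistics. It assumes the test statistic is a Wald z-score referred to a standard normal distribution; the source does not state this, so treat it as an assumption. The row labels are shortened for illustration.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided P value for a z-score under a standard normal (assumed Wald test)."""
    return erfc(abs(z) / sqrt(2.0))


# (test statistic, reported P value) copied from Table 2.
table2 = {
    "insertions/TE content": (-1.63, 0.10),
    "insertions/exon content": (0.40, 0.69),
    "insertions/GC content": (-2.09, 0.04),
    "insertions/recombination rate": (-0.17, 0.86),
    "deletions/TE content": (-0.60, 0.55),
    "deletions/exon content": (-1.85, 0.06),
    "deletions/GC content": (1.35, 0.18),
    "deletions/recombination rate": (0.64, 0.52),
}

for label, (z, reported) in table2.items():
    print(f"{label}: z = {z:+.2f}, recomputed P = {two_sided_p(z):.3f}, reported P = {reported:.2f}")
```

Under this assumption the recomputed values agree with the reported ones to within rounding, which is consistent with the test statistics being z-scores.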

Finally, we used permutation tests to test two proximity-effect hypotheses. First, we tested whether new insertions were more likely than expected to occur near pre-existing copies from the same superfamily. Second, we tested if DNA elements insert closer to pre-existing copies of the same superfamily than do RNA elements, which have to be reverse transcribed in the cytosol. We did not find a significant effect of proximity to pre-existing copies for either hypothesis (see supplementary table S8, Supplementary Material online).
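
The first proximity hypothesis can be illustrated with a simple permutation scheme. This is a sketch under stated assumptions rather than the procedure used in the study: it takes the mean distance from each new insertion to the nearest pre-existing copy as the statistic and builds the null by placing the same number of insertions uniformly at random on one chromosome. The function names, coordinates, and chromosome length are hypothetical.

```python
import random
from bisect import bisect_left


def mean_nearest_distance(sites, sorted_refs):
    """Mean distance from each site to its nearest reference position."""
    total = 0
    for s in sites:
        i = bisect_left(sorted_refs, s)
        candidates = []
        if i < len(sorted_refs):
            candidates.append(sorted_refs[i] - s)   # nearest copy at or to the right
        if i > 0:
            candidates.append(s - sorted_refs[i - 1])  # nearest copy to the left
        total += min(candidates)
    return total / len(sites)


def proximity_permutation_test(new_sites, existing_sites, chrom_length,
                               n_perm=999, seed=0):
    """One-sided test: are new insertions closer to existing copies than chance?"""
    rng = random.Random(seed)
    refs = sorted(existing_sites)
    observed = mean_nearest_distance(new_sites, refs)
    hits = 0
    for _ in range(n_perm):
        # Null model (assumption): insertions land uniformly along the chromosome.
        shuffled = [rng.randrange(chrom_length) for _ in new_sites]
        if mean_nearest_distance(shuffled, refs) <= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical coordinates, for illustration only.
observed, p_value = proximity_permutation_test(
    new_sites=[100_010, 500_020, 899_990, 100_005],
    existing_sites=[100_000, 500_000, 900_000],
    chrom_length=1_000_000,
)
print(observed, p_value)
```

A more realistic null would restrict random placements to observable sites and condition on superfamily, which this sketch omits.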

Discussion

Despite the abundance of data characterizing transposable element dynamics in natural populations, there have been a limited number of experiments characterizing their mutation rates and mutational properties when selection is minimized. In D. melanogaster, previous experiments quantifying TE insertion and deletion rates using molecular techniques were indirect (e.g. de Boer et al. 2007; Petrov et al. 2011) or limited to one or a few TE families (e.g. Maside et al. 2000; Nuzhdin and Mackay 1994; Vázquez et al. 2007; see supplementary table S1, Supplementary Material online). Our MA survey provides direct estimates of the genome-wide rates and patterns of movement for all known TE superfamilies in D. melanogaster. Further, we were able to look at patterns of insertion and deletion with respect to features of TE superfamilies and features of the host genomic environment in order to determine what, if any, non-selective factors determine the accumulation of TEs in certain regions of the genome.

We found that TE insertions were vastly more common than TE deletions, and also identified a strong interaction between TE activity and host genotype, as per-site per-generation insertion and deletion rates were significantly elevated in Line 39 relative to Line 33. These results hold even after taking into account the higher false negative rate (FNR) for deletions relative to insertions. The elevated insertion rate in Line 39 was entirely driven by a burst of activity in a single family, copia, which had previously been shown to be highly active in this line using in situ methods (Houle and Nuzhdin 2004). However, the elevated rate of deletions in Line 39 was strongly driven by deletions that occurred in a single subline (39–67), which accounted for 72% of all deletions observed in the experiment, and therefore cannot be ascribed to the genetic background of Line 39.
Comparative and population genetic data from Drosophila generally find a deletion bias among small indels (Petrov 2002), suggesting that the genome would be shrinking, all other things being equal. Although our TEFLoN pipeline cannot distinguish between true TE excisions and spontaneous large deletions, visualizing the data using IGV suggests that many of the deletions we report are the products of complete excisions of the TE sequence that was present prior to MA (whether full or partial). The elevated rate of insertion compared with deletion of TEs reported here, coupled with the larger size of non-TE-associated duplications found previously (Schrider et al. 2013), may therefore help explain the relative stability of TE numbers and genome size (Drosophila 12 Genomes Consortium 2007) in the face of deletion bias.

We identified a weakly significant negative relationship between GC content and overall rates of TE insertion (table 2). Similar correlations have been identified between LINE elements and GC content in humans (Jin et al. 2012; Ovchinnikov et al. 2001), though this pattern is notably different for some other TE families (Jin et al. 2012; Hellen and Brookfield 2013).

Based on their method of transposition, it might be expected that we should identify fewer deletions of RNA elements relative to DNA transposons, as the latter encode a transposase gene used in TE excision. Surprisingly, 13 out of the 18 TE deletions (72%) identified were deletions of RNA elements, suggesting that mechanisms other than excision, such as deletion through non-allelic homologous recombination, may be a more common way of TE removal in these genomes. Moreover, we did not find significant differences in superfamily-specific rates of insertion or deletion between TE orders (LTR, non-LTR, TIR), though we did find many fewer active non-LTR superfamilies than LTRs or TIRs (fig. 3).
Other studies in Drosophila found that non-LTRs tend to be older than LTRs and thus are expected to exhibit less recent activity than LTRs (Bergman and Bensasson 2007). We also did not find any evidence that new insertions occur closer to members of the same superfamily, as was recently shown for IS elements in Escherichia coli (Lee et al. 2016). Nor did we find that new DNA-element insertions were closer to TEs of the same superfamily than are new RNA-element insertions, the latter requiring reverse transcription in the cytosol and thus, perhaps, more likely to insert farther from initial donor sites.

We identified a significantly elevated rate of TE deletion on the X chromosome relative to the autosomes (fig. 1, see supplementary fig. S1, Supplementary Material online). An elevated deletion rate on the X is consistent with the absence of a homologue-dependent DNA repair mechanism: excised or deleted TEs might not be restored in hemizygous males. However, very little is known about the precise mechanism of TE excision repair, and it is generally thought that excised elements are repaired from the sister chromatid during the replication cycle (reviewed in Burt and Trivers 2006; Hickman and Dyda 2015); thus, rates of deletion on hemizygous chromosomes may not be expected to increase.

We also found a significantly higher rate of new TE insertions on the X chromosome (fig. 1, see supplementary fig. S1, Supplementary Material online). Recent studies using natural populations of D. melanogaster have also described conflicting patterns of accumulation on the X chromosome: higher densities of TEs on the X relative to the autosomes (Cridland et al. 2013), lower TE densities on the X (before controlling for recombination rate), or no effect (after controlling for recombination rate; Kofler et al. 2012).
Male hemizygosity for the X chromosome, in concert with the lack of male recombination in Drosophila, means that recombination rates are, on average, higher on the X than on the autosomes (Comeron et al. 2012). Our results suggest that selection against the deleterious effects of TEs might be stronger on the X chromosome than on the autosomes, consistent with data suggesting more effective selection on the X chromosome overall (Charlesworth et al. 1987; Langley et al. 2012; Charlesworth and Campos 2014). Indeed, there are significantly fewer reference-annotated TEs on the X chromosome than on the autosomes in the D. melanogaster reference genome (P_FET = 3.46 × 10^−5), though for some TE families population frequency does not appear to be different between the X chromosome and autosomes (Petrov et al. 2011).

Numerous studies have focused on the strong association between TEs and other hemizygous sex chromosomes: TEs are greatly overrepresented on the Y and W chromosomes in many animal lineages (Clinton and Haines 1999; Charlesworth and Charlesworth 2000; Graves 2006; Steinemann and Steinemann 2005; Bachtrog 2013; Chalopin et al. 2015). These patterns have typically been attributed to the lack of recombination on the Y and W chromosomes (Charlesworth and Charlesworth 2000; Steinemann and Steinemann 2005), particularly in Drosophila where males do not recombine. However, elevated insertion rates on the X chromosome could also be driven by an effect of heterochromatin, as a larger fraction of the X chromosome is heterochromatic relative to the autosomes (Hilliker et al. 1980). A recent study in teleost fishes (Chalopin et al. 2015) demonstrates that TEs accumulate not only on the Y and W chromosomes, but also in young sex-determining regions of the X and Z chromosomes, which are expected to freely recombine during female meiosis.
Moreover, specific classes of TEs have recently proliferated in these regions, suggesting that the biased recruitment of certain TE types may be playing an active role in sex chromosome differentiation (Chalopin et al. 2015) and that associations between TEs and sex chromosomes in the early stages of differentiation may be independent of recombination. We also found a significant association between TE insertions and regulatory chromatin, based on experimentally determined heterochromatic marks (Kharchenko et al. 2011), although this pattern was limited to copia elements. Our results therefore provide some support for the hypothesis that heterochromatin may play a bigger role than recombination in shaping TE accumulation.

Going forward, it will be essential to characterize the various routes by which insertion and deletion are facilitated or impeded at the molecular level. In particular, the special role of small RNAs is only beginning to be investigated (reviewed in Lee and Langley 2010). Recent evidence for the suppression of transposition, especially through piRNA-mediated epigenetic silencing (Lee 2015), suggests the potential for biased TE recruitment into piRNA clusters, discrete genomic loci composed of nested TE fragments that generate piRNA primary transcripts (Brennecke et al. 2007). The recruitment of TEs into piRNA clusters (many of which lie in heterochromatic regions) could be facilitated through heterochromatin binding proteins, such as Drosophila HP1 (reviewed in Vermaak and Malik 2009) or its homolog, Rhino, that specifically binds piRNA clusters in D. melanogaster (Zhang et al. 2014). A similar integration preference has been observed in S. cerevisiae, where nearly all new Ty5 insertions occur in heterochromatin at the telomeres (Bushman 2003), and this integration preference is driven by an interaction between the Ty5 integrase and a yeast heterochromatin binding protein (Xie et al. 2001; Zhu et al. 2003).
In addition to the mutational biases reported here, selection against the deleterious effects of TEs is likely to be a substantial contributor to the patterns of distribution across the genome. Previous studies using natural populations of D. melanogaster have identified a negative correlation between insertion-site frequency and recombination rate (Petrov et al. 2011; Kofler et al. 2012), consistent with more efficient purifying selection in areas of higher recombination. However, this association disappears after excluding pericentromeric regions of the genome (Kofler et al. 2012). Selection could also shape the spatial landscape of TEs by favoring the recruitment of TEs into piRNA clusters. Consistent with this hypothesis, simulations have shown that piRNA-generating TEs should be selectively advantageous, as their integration represses the transposition of other elements (Lu and Clark 2010). Together, these results suggest that insertion and deletion biases, in addition to the effects of selection, are likely contributing to the non-random spatial distribution of TEs.

Comparative analyses of TE insertion and deletion rates between the germline and soma, between the sexes, and among sister taxa are also needed to fully understand TE dynamics (e.g. Keightley et al. 2009; Diaz-Gonzalez et al. 2011). Evidence for differences in male versus female germline transposition rates exists (for example, R2 rates are higher in females [Zhang et al. 2008] and roo rates are higher in males [Vázquez et al. 2007]), but the extent and consequences of heterogeneous rates over long time-scales are unknown. Our genome-wide estimates of the rates and patterns of TE movement provide an opportunity to test key assumptions about the behaviors of TEs in a well-studied model system.
Additional in-depth analyses of transposable element mobility in an experimental framework with and without selection will help explain the impact of this dynamic component of the genome over longer time-scales.
  67 in total

1.  pIRS: Profile-based Illumina pair-end reads simulator.

Authors:  Xuesong Hu; Jianying Yuan; Yujian Shi; Jianliang Lu; Binghang Liu; Zhenyu Li; Yanxiang Chen; Desheng Mu; Hao Zhang; Nan Li; Zhen Yue; Fan Bai; Heng Li; Wei Fan
Journal:  Bioinformatics       Date:  2012-04-15       Impact factor: 6.937

2.  Transposable elements and early evolution of sex chromosomes in fish.

Authors:  Domitille Chalopin; Jean-Nicolas Volff; Delphine Galiana; Jennifer L Anderson; Manfred Schartl
Journal:  Chromosome Res       Date:  2015-09       Impact factor: 5.239

3.  Y chromosomes: born to be destroyed.

Authors:  Sigrid Steinemann; Manfred Steinemann
Journal:  Bioessays       Date:  2005-10       Impact factor: 4.345

4.  A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion.

Authors:  Teresa L Singleton; Henry L Levin
Journal:  Eukaryot Cell       Date:  2002-02

5.  Targeting of the yeast Ty5 retrotransposon to silent chromatin is mediated by interactions between integrase and Sir4p.

Authors:  W Xie; X Gai; Y Zhu; D C Zappulla; R Sternglanz; D F Voytas
Journal:  Mol Cell Biol       Date:  2001-10       Impact factor: 4.272

6.  DNA loss and evolution of genome size in Drosophila.

Authors:  Dmitri A Petrov
Journal:  Genetica       Date:  2002-05       Impact factor: 1.082

Review 7.  Human transposon tectonics.

Authors:  Kathleen H Burns; Jef D Boeke
Journal:  Cell       Date:  2012-05-11       Impact factor: 41.582

8.  Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila.

Authors:  Julius Brennecke; Alexei A Aravin; Alexander Stark; Monica Dus; Manolis Kellis; Ravi Sachidanandam; Gregory J Hannon
Journal:  Cell       Date:  2007-03-08       Impact factor: 41.582

9.  The Role of piRNA-Mediated Epigenetic Silencing in the Population Dynamics of Transposable Elements in Drosophila melanogaster.

Authors:  Yuh Chwen G Lee
Journal:  PLoS Genet       Date:  2015-06-04       Impact factor: 5.917

10.  Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.

Authors:  Juliana W Gonçalves; Victor Hugo Valiati; Alejandra Delprat; Vera L S Valente; Alfredo Ruiz
Journal:  BMC Genomics       Date:  2014-09-13       Impact factor: 3.969
