Robert Kofler, Andrea J Betancourt, Christian Schlötterer.
Abstract
Transposable elements (TEs) are mobile genetic elements that parasitize genomes by semi-autonomously increasing their own copy number within the host genome. While TEs are important for genome evolution, appropriate methods for performing unbiased genome-wide surveys of TE variation in natural populations have been lacking. Here, we describe a novel and cost-effective approach for estimating population frequencies of TE insertions using paired-end Illumina reads from a pooled population sample. Importantly, the method treats insertions present in and absent from the reference genome identically, allowing unbiased TE population frequency estimates. We apply this method to data from a natural Drosophila melanogaster population from Portugal. Consistent with previous reports, we show that low recombining genomic regions harbor more TE insertions and maintain insertions at higher frequencies than do high recombining regions. We conservatively estimate that there are almost twice as many "novel" TE insertion sites as sites known from the reference sequence in our population sample (6,824 novel versus 3,639 reference sites, with on average a 31-fold coverage per insertion site). Different families of transposable elements show large differences in their insertion densities and population frequencies. Our analyses suggest that the history of TE activity significantly contributes to this pattern, with recently active families segregating at lower frequencies than those active in the more distant past. Finally, using our high-resolution TE abundance measurements, we identified 13 candidate positively selected TE insertions based on their high population frequencies and on low Tajima's D values in their neighborhoods.
Year: 2012 PMID: 22291611 PMCID: PMC3266889 DOI: 10.1371/journal.pgen.1002487
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Outline of the method used to identify TE insertion polymorphism.
(A) Top: Examples of a “known” insertion in the repeat masked reference genome, and of a “novel” insertion, not in the reference. Bottom: Paired ends mapped to known and novel insertions. (B) Three TE insertions identified by i) reads both 5′ and 3′ of the insertion site (forward and reverse insertions), ii) reads 3′ of the insertion site (reverse insertion), iii) reads 5′ of the insertion site (forward insertion). (C) Estimating the population frequency for a reverse insertion. First, the end positions of the reads confirming the presence of an insertion are recorded, and based on this information a range is defined. Subsequently, all PE fragments within this range that either confirm the absence or the presence of a TE insertion are tallied (see text). The reference genome and reads mapping to the reference genome are shown in blue. TE's and reads mapping to TE's are shown in red. Sequences not aligned by the Smith-Waterman algorithm are shown in gray.
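The tallying step described for panel (C) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the `(position, supports_insertion)` representation of a paired-end fragment, the ratio used as the frequency estimate, and the reading of the coverage cutoff as "more than ten fragments in total" are all choices made for this example.

```python
from typing import Iterable, Optional, Tuple


def estimate_insertion_frequency(
    fragments: Iterable[Tuple[int, bool]],
    range_start: int,
    range_end: int,
    min_fragments: int = 10,  # assumed reading of "more than ten fragments"
) -> Optional[float]:
    """Tally presence/absence fragments inside a range and return their ratio.

    Each fragment is a hypothetical (position, supports_insertion) pair:
    True means the fragment confirms the TE insertion, False means it
    confirms the absence of the insertion.
    """
    presence = 0
    absence = 0
    for position, supports_insertion in fragments:
        if range_start <= position <= range_end:
            if supports_insertion:
                presence += 1
            else:
                absence += 1
    total = presence + absence
    if total <= min_fragments:
        return None  # too few fragments for a frequency estimate
    return presence / total


# Toy data: 9 presence and 3 absence fragments fall inside the range,
# 5 further fragments lie outside it and are ignored.
toy_fragments = [(100, True)] * 9 + [(120, False)] * 3 + [(500, True)] * 5
toy_frequency = estimate_insertion_frequency(toy_fragments, 90, 130)
```

Returning `None` below the cutoff keeps poorly covered sites out of downstream frequency summaries instead of reporting an unreliable value.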
Abundance of TE insertions in the chromosomes of D. melanogaster.
| chr. | length (Mb) | set | n | density (#/Mb) | nfe | nfixed | fixed (%) |
| genome | 120.4 | all | 10,208 | 84.8 | 7,843 | 2,702 | 34.5 |
| | | known | 3,384 (5,222) | 28.1 | 2,959 | 2,459 | 83.1 |
| X | 22.4 | all | 1,638 | 73.1 | 1,315 | 402 | 30.6 |
| | | known | 532 (856) | 23.7 | 475 | 388 | 81.7 |
| 2L | 23.0 | all | 1,942 | 84.4 | 1,443 | 498 | 34.5 |
| | | known | 587 (879) | 25.5 | 521 | 424 | 81.4 |
| 2R | 21.1 | all | 2,099 | 99.3 | 1,596 | 693 | 43.4 |
| | | known | 852 (1,323) | 40.2 | 741 | 630 | 85.0 |
| 3L | 24.5 | all | 2,105 | 85.8 | 1,496 | 517 | 34.5 |
| | | known | 628 (1,029) | 25.6 | 528 | 444 | 84.1 |
| 3R | 27.9 | all | 1,938 | 69.4 | 1,604 | 255 | 15.9 |
| | | known | 369 (590) | 13.2 | 324 | 241 | 74.4 |
| 4 | 1.4 | all | 486 | 359.5 | 389 | 337 | 86.6 |
| | | known | 416 (545) | 307.7 | 370 | 332 | 89.7 |
n: number of identified TE insertions; all known TE insertions are in parentheses.
nfe: frequency estimates were obtained for non-overlapping insertions having more than ten absence or presence fragments.
nfixed: number of fixed TE insertions (>0.95 frequency).
fixed: fraction of fixed TE insertions, nfixed/nfe, given as a percentage.
The total number of identified TE insertions (all) and the number of TE insertions present in the reference genome that have been identified by our approach (known) are shown. The numbers in brackets indicate the total number of TE insertions present in the reference genome. chr.: chromosome arm.
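The derived columns of this table follow from simple ratios, which the short Python sketch below reproduces for the genome-wide "all" row. The function names are hypothetical. Note that recomputing densities from the rounded lengths printed in the table will not always match the printed density exactly (chromosome 4 is one such case), presumably because the original calculation used unrounded lengths.

```python
def density_per_mb(n_insertions: int, length_mb: float) -> float:
    """Insertion density as the number of insertions per megabase."""
    return n_insertions / length_mb


def percent_fixed(n_fixed: int, n_with_estimate: int) -> float:
    """Share of insertions with a frequency estimate that are fixed, in percent."""
    return 100.0 * n_fixed / n_with_estimate


# Genome-wide "all" row: n = 10,208, length = 120.4 Mb, nfe = 7,843, nfixed = 2,702.
genome_density = round(density_per_mb(10_208, 120.4), 1)  # 84.8
genome_fixed = round(percent_fixed(2_702, 7_843), 1)      # 34.5
```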
Abundance of TE insertions in different features of the D. melanogaster genome.
| order | feature | length (Mb) | n | density (#/Mb) | nfe | nfixed | fixed (%) | median frequency |
| TIR | genome | 120.4 | 3,479 | 28.9 | 2,765 | 1,893 | 68.5 | 1.00 |
| | intergenic | 43.1 | 1,750 | 40.6 | 1,364 | 984 | 72.1 | 1.00 |
| | intron | 47.0 | 1,617 | 37.6 | 1,301 | 867 | 66.6 | 1.00 |
| | exon | 30.2 | 107 | 3.5 | 96 | 38 | 39.6 | 0.848 |
| | CDS | 22.5 | 25 | 1.1 | 22 | 4 | 18.2 | 0.207 |
| | 5′-UTR | 3.5 | 14 | 3.9 | 14 | 6 | 42.9 | 0.531 |
| | 3′-UTR | 4.8 | 66 | 13.7 | 58 | 28 | 48.3 | 0.948 |
| LTR | genome | 120.4 | 3,487 | 29.0 | 2,569 | 388 | 15.1 | 0.125 |
| | intergenic | 43.1 | 1,726 | 40.1 | 1,242 | 256 | 20.6 | 0.143 |
| | intron | 47.0 | 1,474 | 34.2 | 1,132 | 115 | 10.2 | 0.111 |
| | exon | 30.2 | 286 | 9.5 | 194 | 17 | 8.8 | 0.100 |
| | CDS | 22.5 | 169 | 7.5 | 104 | 10 | 9.6 | 0.088 |
| | 5′-UTR | 3.5 | 33 | 9.3 | 28 | 4 | 14.3 | 0.106 |
| | 3′-UTR | 4.8 | 89 | 18.5 | 64 | 3 | 4.7 | 0.120 |
| non-LTR | genome | 120.4 | 2,975 | 24.7 | 2,293 | 373 | 16.3 | 0.122 |
| | intergenic | 43.1 | 1,482 | 34.4 | 1,119 | 223 | 19.9 | 0.140 |
| | intron | 47.0 | 1,372 | 31.9 | 1,073 | 144 | 13.4 | 0.111 |
| | exon | 30.2 | 120 | 4.0 | 100 | 6 | 6.0 | 0.107 |
| | CDS | 22.5 | 55 | 2.4 | 42 | 2 | 4.8 | 0.094 |
| | 5′-UTR | 3.5 | 21 | 5.9 | 17 | 1 | 5.9 | 0.097 |
| | 3′-UTR | 4.8 | 43 | 9.0 | 40 | 3 | 7.5 | 0.150 |
n: number of TE insertions (including overlapping ones).
nfe: frequency estimates were obtained for non-overlapping insertions having more than ten absence or presence fragments.
nfixed: number of fixed TE insertions (>0.95 frequency).
fixed: fraction of fixed TE insertions, nfixed/nfe, given as a percentage.
not tested.
***p<0.001.
**p<0.01.
*p<0.05.
The associated p-values indicate whether there is a significant difference from the intergenic regions, assessed by chi-square (for density), Fisher's exact (for number of fixed insertions) or Mann-Whitney U (for median frequency) tests.
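As an illustration of the Fisher's exact comparison named above, the following Python sketch compares fixed versus non-fixed insertion counts between two feature classes. It is a plain standard-library implementation of the textbook two-sided test. The source does not state the sidedness or the software used, so both are assumptions here, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(p for p in probs if p <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# TIR insertions from the table: intergenic 984 fixed of 1,364 with estimates,
# exon 38 fixed of 96 with estimates (non-fixed = nfe - nfixed).
p_tir_exon = fisher_exact_two_sided(984, 1_364 - 984, 38, 96 - 38)
```

Working with exact integer binomial coefficients avoids floating-point overflow for counts in the thousands, at the cost of some speed.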
Abundance of TE insertions in telomere proximal, centromere proximal, and normal recombining regions of D. melanogaster.
| region | order | n | density (#/Mb) | nfe | nfixed | fixed (%) |
| normal recombination | all | 4,790 | 54.1 | 3,985 | 399 | 10.0 |
| | TIR | 526 | 5.9 | 430 | 46 | 10.7 |
| | INE-1 | 430 | 4.9 | 366 | 267 | 73.0 |
| | LTR | 1,968 | 22.2 | 1,599 | 36 | 2.3 |
| | non-LTR | 1,709 | 19.3 | 1,465 | 36 | 2.5 |
| centromere proximal | all | 4,547 | 178.8 | 3,145 | 1,847 | 58.7 |
| | TIR | 596 | 23.4 | 342 | 206 | 60.2 |
| | INE-1 | 1,371 | 53.9 | 1,143 | 966 | 84.5 |
| | LTR | 1,374 | 54.0 | 867 | 341 | 39.3 |
| | non-LTR | 1,112 | 43.7 | 718 | 305 | 42.5 |
| telomere proximal | all | 385 | 75.5 | 324 | 119 | 36.7 |
| | TIR | 31 | 6.1 | 22 | 8 | 36.4 |
| | INE-1 | 141 | 27.6 | 123 | 101 | 82.1 |
| | LTR | 110 | 21.6 | 91 | 2 | 2.2 |
| | non-LTR | 91 | 17.8 | 76 | 7 | 9.2 |
n: number of TE insertions (including overlapping ones).
nfe: frequency estimates were obtained for non-overlapping insertions having more than ten absence or presence fragments.
nfixed: number of fixed TE insertions (>0.95 frequency).
fixed: fraction of fixed TE insertions, nfixed/nfe, given as a percentage.
not tested.
***p<0.001.
**p<0.01.
The recombination rate (<1 cM/Mbp) was used to delimit centromere proximal, normal recombining and telomere proximal regions. Note that order totals do not sum to overall totals since some TEs are not classified. The associated p-values indicate whether there is a significant difference from normal recombining regions.
Figure 2. Distribution of all (black), fixed (red) and percentage of fixed (dashed blue) TE insertions in our natural population of D. melanogaster.
The total number of TE insertions in a sliding window of 500 kb is plotted against the position in the five major chromosome arms of D. melanogaster. Shaded grey areas represent regions with a low recombination rate (<1 cM/Mbp).
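A sliding-window count of this kind can be sketched as follows. The 500 kb window size comes from the caption. The step size is not given there, so the 100 kb default below is an assumption, as are the function name and the half-open `[start, end)` window convention.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def sliding_window_counts(
    positions: Sequence[int],
    chrom_length: int,
    window: int = 500_000,  # window size from the figure caption
    step: int = 100_000,    # assumed; the caption does not state a step size
) -> List[Tuple[int, int, int]]:
    """Count insertions in half-open windows [start, end) along one chromosome arm."""
    ordered = sorted(positions)
    counts = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        # Binary search gives the number of positions with start <= pos < end.
        n_in_window = bisect_left(ordered, end) - bisect_left(ordered, start)
        counts.append((start, end, n_in_window))
        start += step
    return counts


# Toy example with non-overlapping windows (step equal to the window size).
toy_positions = [10, 250_000, 499_999, 500_000, 900_000]
toy_counts = sliding_window_counts(toy_positions, 1_000_000, step=500_000)
```

Running the same function on the positions of fixed insertions only, and dividing the two counts per window, would give a percentage-fixed track like the dashed line in the figure.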
Figure 3. Number of novel identified TE insertions compared to known TE insertions for every TE family.
Dashed lines mark the regions of five-fold difference between the number of novel and known TE insertions.
Figure 4. Boxplots of population frequencies for all major TE families found in the Portuguese D. melanogaster population.
The number of TE insertions whose frequencies are represented is indicated below the boxplot; only non-overlapping insertions were used to calculate population frequencies.
Models fit to TE polymorphism data.
| | Polymorphic TE insertions | | | | Polymorphic insertions (large families only) | |
| Term | Full model, using family | Full model, using order | Reduced model (using rec rate) | Reduced model (using TEC) | Reduced model (using rec rate) | Reduced model (using TEC) |
| Canonical length | + | + | + | + | + | + |
| Distance to nearest gene | + | + | + | + | + | + |
| Global family density (polymorphic insertions in family) | + | + | + | + | + | + |
| Local family density (polymorphic insertions in family within 1 Mb) | + | + | + | + | + | + |
| Recombination | + | + | + | + | + | + |
| Taxonomy (family or order) | + | + | + | + | ||
| Chromosome arm | + | + | + | + | ||
| Canonical length*Distance to nearest gene | + | + | + | + | ||
| Distance to nearest gene* global family density | + | + | + | + | ||
| Canonical length * local family density | + | + | ||||
| Canonical length * global family density | − | − | − | − | + | |
| Distance to nearest gene* local family density | + | + | ||||
| Global family density * local family density | + | + | + | + | + | |
| Canonical length * recombination | + | + | + | + | ||
| Distance to nearest gene * recombination | + | + | + | + | ||
| Global family density * recombination | + | + | + | + | + | |
| Local family density * recombination | + | + | ||||
| Rank (age) | − | − | − | − | + | + |
| Rank (age) * local family density | − | − | − | − | + | |
| Rank (age) * canonical length | − | − | − | − | + | + |
| n | 2110 | 2110 | 2110 | 2110 | 671 | 671 |
| Model d.f. | 104 | 22 | 99 | 102 | 13 | 13 |
| Model R-squared | 0.209 | 0.108 | 0.208 | 0.215 | 0.136 | 0.1311 |
| AIC | −1102.49 | −1023.5 | −1109.23 | −1123.68 | −389.8 | −385.7 |
Models containing the full set of independent variables and their second order interactions were fit to log-transformed population frequencies of polymorphic TE insertions (full models). For the reduced models, we started with the full model containing family (rather than order) as a factor, and dropped or retained independent variables using AIC as the criterion, with either the recombination rate or the centromere proximal, normal recombining or telomere proximal regions (TEC) used to indicate recombination environment. In a separate analysis, we fit models to the subset of data from the 11 families with age estimates and more than 30 insertions. We started with all the terms in the full model (with order as the taxonomic level), as well as the rank age estimates and second order interactions with age. The terms in the models are indicated by ‘+’ in the table; terms not tested are indicated by ‘−’.
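The AIC-based dropping of terms described above can be illustrated with a generic backward-elimination loop. This is only a sketch of the general technique. The source does not say which software or exact stepwise procedure was used, and the function names, the toy term names and the toy AIC function below are hypothetical stand-ins for refitting a real regression model.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_eliminate(
    terms: Sequence[str],
    aic_of: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Repeatedly drop the term whose removal lowers AIC the most.

    `aic_of` is assumed to refit the model on the given terms and return
    its AIC; elimination stops when no single removal improves the AIC.
    """
    current = list(terms)
    best_aic = aic_of(current)
    while len(current) > 1:
        # Score every model that has exactly one term fewer.
        scored = [(aic_of([t for t in current if t != term]), term) for term in current]
        candidate_aic, term_to_drop = min(scored)
        if candidate_aic >= best_aic:
            break  # no removal improves the fit-versus-complexity trade-off
        current.remove(term_to_drop)
        best_aic = candidate_aic
    return current, best_aic


# Toy stand-in for model fitting: each term costs 2 AIC units (one parameter)
# and "earns back" a made-up improvement in fit.
toy_benefit: Dict[str, float] = {
    "canonical_length": 6.0,
    "recombination": 10.0,
    "noise_term": 0.5,
}


def toy_aic(model_terms: List[str]) -> float:
    return 2.0 * len(model_terms) - sum(toy_benefit[t] for t in model_terms)


kept_terms, final_aic = backward_eliminate(list(toy_benefit), toy_aic)
```

AIC values are only comparable between models fit to the same observations, which is why the models with n = 2110 and those with n = 671 form separate comparisons in the table.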
Candidate positively selected TE insertions.
| nr. | chr. | pos. | family | order | sup. | freq. | TE ID | −2 | −1 | 0 | +1 | +2 | closest gene | location | putative function |
| 1 | X | 3,680,043 | mdg1 | LTR | F | 1.00 | FBti0019564 | −1.5282 | | na | −2.2869 | −2.2165 | FBgn0086899 | intron | regulation of cell shape |
| 2 | X | 4,582,532 | HMS-Beagle | LTR | FR | 1.00 | FBti0060479 | −2.4301 | −1.9550 | −2.2732 | | −2.0763 | FBgn0011760 | intron | actin filament bundle assembly |
| 3 | X | 17,000,405 | Ninja-Dsim. | ninja | FR | 1.00 | FBti0062283 | −1.1942 | | na | −1.5800 | −2.3464 | FBgn0065032 | 520 bp us | actin filament organization |
| 4 | X | 18,678,871 | Rt1b | non-LTR | FR | 0.98 | FBti0019082 | −2.0867 | | na | −1.7790 | −1.9961 | FBgn0030958 | 987 bp us | actin binding |
| 5 | X | 20,254,231 | 3S18 | LTR | R | 1.00 | FBti0019655 | −2.2056 | | −2.2527 | | −2.0739 | FBgn0085340 | 380 bp ds | unknown |
| 6 | 2L | 13,783,837 | S-element | TIR | R | 1.00 | FBti0060388 | −2.2020 | | na | −2.0966 | −2.3344 | FBgn0028539 | 252 bp us | transporter activity |
| 7 | 2R | 5,758,108 | rooA | LTR | R | 1.00 | FBti0061742 | −0.4752 | | −0.8749 | | −1.771 | FBgn0011241 | intron | spermatoid development |
| 8 | 2R | 8,072,887 | Accord | LTR | F | 1.00 | - | −2.3306 | | −1.7256 | −2.2527 | −2.0249 | FBgn0033693 | 3′-UTR | unknown |
| 9 | 2R | 11,540,143 | Roo | LTR | R | 1.00 | - | −1.8110 | −2.0345 | | −1.6543 | −0.6354 | FBgn0260429 | 2328 bp ds | unknown |
| 10 | 2R | 13,919,899 | Hobo | TIR | F | 1.00 | FBti0059793 | −1.7469 | | −0.4077 | −1.6845 | −1.9715 | FBgn0034289 | 9988 bp ds | unknown |
| 11 | 3L | 12,181,292 | gypsy12 | LTR | R | 1.00 | FBti0063191 | −0.8079 | | −0.5754 | −1.8479 | −1.9908 | FBgn0036262 | 332 bp ds | oxidation-reduction process |
| 12 | 3R | 7,394,212 | G5 | non-LTR | F | 1.00 | FBti0020329 | −1.5106 | −1.7627 | −1.9876 | | −1.7754 | FBgn0025701 | 760 bp us | wing disc dorsal/ventral pattern formation |
| 13 | 3R | 21,152,377 | Doc | non-LTR | F | 1.00 | FBti0019430 | −2.0306 | −1.6583 | na | | −1.9980 | FBgn0045761 | CDS | RNA-dependent DNA replication |
Tajima's D values below a threshold value (see Material and Methods) are indicated in bold. Sup. indicates whether support for the TE insertion comes from forward reads (F), from reverse reads (R) or from forward and reverse reads (FR); freq.: frequency of the TE at the insertion site; −2, −1, 0, +1, +2: Tajima's D values for nonoverlapping windows of 500 bp surrounding the TE insertion. The window containing the TE insertion has an offset of 0, windows 5′ of the TE insertion have a negative offset (−1, −2) and windows 3′ of the TE insertion have a positive offset (+1, +2). us: upstream; ds: downstream.
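The window offsets used in the table can be illustrated with a small Python helper. It assumes that the 500 bp windows lie on a fixed grid starting at coordinate 0 and that 5′ corresponds to lower reference coordinates. Neither detail is stated in the legend, and the function name is hypothetical.

```python
def window_offset(site_position: int, insertion_position: int, window_size: int = 500) -> int:
    """Offset of the window holding a site, relative to the window holding the insertion.

    Windows are assumed to be non-overlapping and aligned to a fixed grid;
    0 is the insertion's own window, negative values lie 5' of it and
    positive values lie 3' of it.
    """
    return site_position // window_size - insertion_position // window_size


# Candidate nr. 1 from the table is located at X:3,680,043.
insertion = 3_680_043
offsets = [window_offset(site, insertion) for site in (3_679_000, 3_679_999, 3_680_400, 3_680_500)]
```

Under these assumptions, a Tajima's D value computed for a window would be reported in the column matching that window's offset, for offsets between −2 and +2.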