Julie M Cridland, Stuart J Macdonald, Anthony D Long, Kevin R Thornton.
Abstract
Here we present computational machinery to efficiently and accurately identify transposable element (TE) insertions in 146 next-generation sequenced inbred strains of Drosophila melanogaster. The panel of lines we use in our study is composed of strains from a pair of genetic mapping resources: the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Synthetic Population Resource (DSPR). We identified 23,087 TE insertions in these lines, of which 83.3% are found in only one line. There are marked differences in the distribution of elements over the genome, with TEs found at higher densities on the X chromosome, and in regions of low recombination. We also identified many more TEs per base pair of intronic sequence and fewer TEs per base pair of exonic sequence than expected if TEs are located at random locations in the euchromatic genome. There was substantial variation in TE load across genes. For example, the paralogs derailed and derailed-2 show a significant difference in the number of TE insertions, potentially reflecting differences in the selection acting on these loci. When considering TE families, we find a very weak effect of gene family size on TE insertions per gene, indicating that as gene family size increases the number of TE insertions in a given gene within that family also increases. TEs are known to be associated with certain phenotypes, and our data will allow investigators using the DGRP and DSPR to assess the functional role of TE insertions in complex trait variation more generally. Notably, because most TEs are very rare and often private to a single line, causative TEs resulting in phenotypic differences among individuals may typically fail to replicate across mapping panels since individual elements are unlikely to segregate in both panels. Our data suggest that "burden tests" that test for the effect of TEs as a class may be more fruitful.
Keywords: DGRP; DSPR; genomics; population genetics; transposable element
Year: 2013 PMID: 23883524 PMCID: PMC3773372 DOI: 10.1093/molbev/mst129
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: (A) Identity by descent (IBD) in 148 DGRP lines. (B) IBD in the top 15 DGRP lines by average sequence coverage. (C) IBD in the 15 DSPR lines. Masked regions indicate regions of IBD ≥95%. When two lines were considered IBD in a region, the line with lower mean coverage was masked.
Summary of TE Insertions.
| | DGRP | DGRP25 | DSPR |
|---|---|---|---|
| Total | 17,639 | 5,855 | 7,104 |
| Not present in reference | 17,346 | 5,615 | 6,812 |
| Present in reference | 293 | 240 | 292 |
| X, recombination ≤ 2 cM/Mb | 418 | 149 | 203 |
| X, recombination > 2 cM/Mb | 2,741 | 948 | 1,107 |
| Autosomes, recombination ≤ 2 cM/Mb | 6,205 | 2,131 | 2,694 |
| Autosomes, recombination > 2 cM/Mb | 8,177 | 2,544 | 3,019 |
| 4, all | 98 | 83 | 81 |
| Exon | 1,158 | 378 | 633 |
| Intron | 8,595 | 2,870 | 3,310 |
| 3′ UTR | 477 | 159 | 234 |
| 5′ UTR | 190 | 63 | 81 |
| Intergenic | 7,219 | 2,680 | 3,269 |
| DNA elements | 1,388 | 514 | 748 |
| RNA elements | 10,133 | 3,004 | 4,085 |
| Indeterminate | 6,118 | 2,336 | 2,271 |
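
The counts in the table above can be turned into shares of each panel's total. The sketch below is only illustrative arithmetic on the tabulated values: it computes the share of insertions absent from the reference genome and the share falling in each chromosome and recombination class. The function and variable names are hypothetical and are not taken from the authors' pipeline.

```python
# Counts copied from the "Summary of TE Insertions" table.
# Each tuple is ordered (DGRP, DGRP25, DSPR).
PANELS = ("DGRP", "DGRP25", "DSPR")
TOTAL = (17639, 5855, 7104)
NOT_IN_REFERENCE = (17346, 5615, 6812)
REGIONS = {
    "X, recombination <= 2 cM/Mb": (418, 149, 203),
    "X, recombination > 2 cM/Mb": (2741, 948, 1107),
    "Autosomes, recombination <= 2 cM/Mb": (6205, 2131, 2694),
    "Autosomes, recombination > 2 cM/Mb": (8177, 2544, 3019),
    "4, all": (98, 83, 81),
}


def region_shares(panel_index):
    """Share of a panel's insertions that falls in each region class."""
    total = TOTAL[panel_index]
    # In the table, the five region classes add up to each panel's total.
    assert sum(c[panel_index] for c in REGIONS.values()) == total
    return {name: c[panel_index] / total for name, c in REGIONS.items()}


def novel_share(panel_index):
    """Share of a panel's insertions that is absent from the reference."""
    return NOT_IN_REFERENCE[panel_index] / TOTAL[panel_index]


for i, panel in enumerate(PANELS):
    print(f"{panel}: non-reference share {novel_share(i):.1%}")
    for name, share in region_shares(i).items():
        print(f"  {name}: {share:.1%}")
```

The region classes are used here because they partition every panel's total exactly; the annotation rows (exon, intron, UTR, intergenic) do not sum to the total for DGRP25 and DSPR, so they are not treated as a partition in this sketch.
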
Figure: Total number of TEs identified versus coverage for the DGRP, DGRP25, and DSPR lines.
Differences between Observed and Expected TE Counts.
| | DGRP | | DGRP25 | | DSPR | |
|---|---|---|---|---|---|---|
| | Observed vs. Expected | P value | Observed vs. Expected | P value | Observed vs. Expected | P value |
| X, recombination ≤ 2 cM/Mb | Decrease | 3.29E−11 | Decrease | 1.87E−03 | Decrease | 5.22E−02 |
| X, recombination > 2 cM/Mb | Decrease | 1.74E−01 | Increase | 7.01E−02 | Increase | 4.16E−01 |
| Autosomes, recombination ≤ 2 cM/Mb | Decrease | 3.70E−01 | Increase | 1.00E+00 | Increase | 6.87E−04 |
| Autosomes, recombination > 2 cM/Mb | Increase | 3.92E−02 | Decrease | 1.00E+00 | Decrease | 1.00E+00 |
| Exon | Decrease | 0.00E+00 | Decrease | 2.47E−323 | Decrease | 1.03E−283 |
| Intron | Increase | 3.54E−103 | Increase | 0.00E+00 | Increase | 0.00E+00 |
| 3′ UTR | Decrease | 6.99E−16 | Decrease | 3.13E−06 | Decrease | 1.26E−02 |
| 5′ UTR | Decrease | 1.43E−51 | Decrease | 2.82E−18 | Decrease | 4.98E−20 |
| Intergenic | Increase | 1.12E−32 | Increase | 0.00E+00 | Increase | 0.00E+00 |
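
Several of the numeric entries in this table, which appear to be P values, are reported as zero, and one is 2.47E−323. Assuming the values were computed in IEEE 754 double precision (an assumption, since the source does not name the software), these sit at the lower limit of what a double can represent, so a reported zero most plausibly means "smaller than the smallest representable number" rather than exactly zero. The short sketch below illustrates that limit; the helper name is hypothetical.

```python
import math

# Smallest positive (subnormal) IEEE 754 double, about 4.94e-324.
# math.ulp requires Python 3.9 or newer.
SMALLEST_DOUBLE = math.ulp(0.0)


def subnormal_steps(p_value):
    """Number of smallest-double increments between zero and p_value."""
    return p_value / SMALLEST_DOUBLE


# DGRP25 exon entry from the table above.
reported = float("2.47e-323")

print(SMALLEST_DOUBLE)
print(subnormal_steps(reported))
# A value well below the smallest double cannot be stored and becomes zero.
print(float("1e-324") == 0.0)
```

Under that assumption, 2.47E−323 is only five representable steps above zero, so the entries at zero and at 2.47E−323 are best read as "vanishingly small" rather than compared with one another.
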
ANOVAs for DSPR and DGRP25 Coverage and Comparison between the Two Data Sets.
| | df | Sum Sq. | Mean Sq. | F value | Pr(>F) |
|---|---|---|---|---|---|
| DSPR vs. DGRP25 | |||||
| Set | 1 | 541.25 | 541.25 | 648.6975 | <2e−16 |
| Line | 36 | 284.8 | 7.91 | 9.4817 | <2e−16 |
| Chromosome | 4 | 118.58 | 29.65 | 35.5313 | <2e−16 |
| Recombination rate (high vs. low) | 1 | 255.75 | 255.75 | 306.5207 | <2e−16 |
| Chromosome*recombination rate | 4 | 103.35 | 25.84 | 30.9663 | <2e−16 |
| Residuals | 333 | 277.84 | 0.83 | ||
| DSPR | |||||
| Line | 14 | 26.599 | 1.9 | 1.7894 | 0.04699 |
| Chromosome | 4 | 39.632 | 9.908 | 9.3316 | 1.23e−06 |
| Recombination rate (high vs. low) | 1 | 141.337 | 141.337 | 133.1143 | <2e−16 |
| Chromosome*recombination rate | 4 | 39.712 | 9.928 | 9.3504 | 1.19e−06 |
| Residuals | 126 | 133.783 | 1.062 | ||
| DGRP25 | |||||
| Line | 22 | 258.202 | 11.736 | 17.23 | <2e−16 |
| Chromosome | 4 | 80.126 | 20.031 | 29.409 | <2e−16 |
| Recombination rate (high vs. low) | 1 | 120.011 | 120.011 | 176.191 | <2e−16 |
| Chromosome*recombination rate | 4 | 66.056 | 16.514 | 24.244 | 2.43e−16 |
| Residuals | 198 | 134.866 | 0.681 | ||
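
The columns of the ANOVA table are related by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and the column just before the P values, which appears to hold the F statistic, is the term's mean square divided by the residual mean square. The sketch below checks this on the DSPR block using the tabulated values. It is a consistency check under that reading of the columns, not a reconstruction of the authors' analysis, and the names are hypothetical.

```python
# Rows copied from the DSPR block of the ANOVA table:
# (term, degrees of freedom, sum of squares, reported F statistic)
DSPR_TERMS = [
    ("Line", 14, 26.599, 1.7894),
    ("Chromosome", 4, 39.632, 9.3316),
    ("Recombination rate (high vs. low)", 1, 141.337, 133.1143),
    ("Chromosome*recombination rate", 4, 39.712, 9.3504),
]
RESIDUAL_DF = 126
RESIDUAL_SS = 133.783


def f_statistics(terms, residual_df, residual_ss):
    """Recompute F = (SS_term / df_term) / (SS_residual / df_residual)."""
    residual_ms = residual_ss / residual_df
    return {name: (ss / df) / residual_ms for name, df, ss, _ in terms}


recomputed = f_statistics(DSPR_TERMS, RESIDUAL_DF, RESIDUAL_SS)
for name, _, _, reported_f in DSPR_TERMS:
    print(f"{name}: recomputed {recomputed[name]:.4f}, reported {reported_f}")
```

The recomputed values agree with the reported ones to within the rounding of the tabulated sums of squares, which supports reading that column as the F statistic.
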
TE Density in the X and Autosomes.
| | TE/Mb | | |
|---|---|---|---|
| | DGRP25 | DSPR | Both |
| X, all | 3.82 | 6.38 | 4.83 |
| Autosomes, all | 3.51 | 5.93 | 4.47 |
| X, high recombination | 3.71 | 6.00 | 4.61 |
| X, low recombination | 3.93 | 6.76 | 5.05 |
| 2L, high recombination | 2.27 | 4.73 | 3.24 |
| 2L, low recombination | 3.23 | 5.89 | 4.28 |
| 2R, high recombination | 2.82 | 4.89 | 3.64 |
| 2R, low recombination | 6.17 | 8.63 | 7.14 |
| 3L, high recombination | 2.96 | 5.23 | 3.86 |
| 3L, low recombination | 4.74 | 7.26 | 5.73 |
| 3R, high recombination | 2.81 | 4.68 | 3.55 |
| 3R, low recombination | 3.72 | 6.70 | 4.90 |
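
One way to read the density table is as a ratio of low-recombination to high-recombination density for each chromosome arm. The sketch below computes that ratio from the tabulated TE/Mb values only; the ratio is an illustrative summary chosen here, not a statistic reported in the source, and the names are hypothetical.

```python
# TE/Mb values copied from the "TE Density in the X and Autosomes" table.
# Each tuple is ordered (DGRP25, DSPR, Both).
PANELS = ("DGRP25", "DSPR", "Both")
HIGH = {
    "X": (3.71, 6.00, 4.61),
    "2L": (2.27, 4.73, 3.24),
    "2R": (2.82, 4.89, 3.64),
    "3L": (2.96, 5.23, 3.86),
    "3R": (2.81, 4.68, 3.55),
}
LOW = {
    "X": (3.93, 6.76, 5.05),
    "2L": (3.23, 5.89, 4.28),
    "2R": (6.17, 8.63, 7.14),
    "3L": (4.74, 7.26, 5.73),
    "3R": (3.72, 6.70, 4.90),
}


def low_to_high_ratio(arm, panel):
    """Density in low-recombination regions relative to high-recombination ones."""
    idx = PANELS.index(panel)
    return LOW[arm][idx] / HIGH[arm][idx]


for arm in HIGH:
    ratios = ", ".join(f"{p}: {low_to_high_ratio(arm, p):.2f}" for p in PANELS)
    print(f"{arm}: {ratios}")
```

Every arm has a ratio above 1 in both resources, consistent with the statement in the abstract that TEs occur at higher density in regions of low recombination.
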
TE Density for 15 Individual Families of Elements.
| | | Mean Density (TE/Mb) | | | |
|---|---|---|---|---|---|
| Element | Resource | X High | X Low | Auto High | Auto Low |
| | DGRP25 | 0.39 | 0.37 | 0.32 | 0.33 |
| | DSPR | 0.80 | 0.88 | 0.65 | 0.68 |
| | DGRP25 | 0.05 | 0.00 | 0.02 | 0.01 |
| | DSPR | 0.05 | 0.00 | 0.04 | 0.00 |
| | DGRP25 | 0.03 | 0.01 | 0.03 | 0.07 |
| | DSPR | 0.06 | 0.08 | 0.09 | 0.12 |
| | DGRP25 | 0.02 | 0.06 | 0.04 | 0.06 |
| | DSPR | 0.07 | 0.08 | 0.12 | 0.13 |
| | DGRP25 | 0.01 | 0.00 | 0.01 | 0.02 |
| | DSPR | 0.02 | 0.00 | 0.02 | 0.06 |
| | DGRP25 | 0.01 | 0.00 | 0.04 | 0.01 |
| | DSPR | 0.02 | 0.02 | 0.06 | 0.02 |
| | DGRP25 | 0.01 | 0.00 | 0.02 | 0.03 |
| | DSPR | 0.09 | 0.02 | 0.14 | 0.19 |
| | DGRP25 | 0.01 | 0.00 | 0.01 | 0.00 |
| | DSPR | 0.00 | 0.00 | 0.01 | 0.00 |
| | DGRP25 | 0.10 | 0.01 | 0.00 | 0.00 |
| | DSPR | 0.10 | 0.04 | 0.02 | 0.02 |
| | DGRP25 | 0.42 | 1.88 | 0.03 | 0.48 |
| | DSPR | 0.45 | 2.03 | 0.04 | 0.55 |
| | DGRP25 | 0.15 | 0.12 | 0.19 | 0.13 |
| | DSPR | 0.35 | 0.34 | 0.40 | 0.33 |
| | DGRP25 | 0.03 | 0.02 | 0.03 | 0.10 |
| | DSPR | 0.08 | 0.04 | 0.12 | 0.18 |
| | DGRP25 | 0.06 | 0.04 | 0.07 | 0.05 |
| | DSPR | 0.13 | 0.21 | 0.15 | 0.13 |
| | DGRP25 | 0.00 | 0.01 | 0.01 | 0.03 |
| | DSPR | 0.02 | 0.00 | 0.01 | 0.05 |
Figure: Derived allele count spectra for the DSPR lines and the DGRP25 lines where a positive presence or absence call was made for each insertion in each line, 6,613 insertions in the DSPR and 3,274 in the DGRP25. Count spectra for SNPs are from SNPs in introns ≤86 bp. χ² tests between observed and expected distributions result in P ≈ 0 for comparisons between TEs and the neutral model as well as between TEs and SNPs for both data sets.
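
For context on the neutral comparison in this caption: under the standard neutral model, the expected number of segregating sites at which the derived allele appears in i of n sampled chromosomes is proportional to 1/i. The sketch below computes that expectation for the DSPR figures given in this record (15 lines, 6,613 insertions) and defines a Pearson χ² statistic that could be applied to an observed spectrum. Treating each inbred line as one sampled chromosome and using counts 1 to n−1 are assumptions made here; the source does not spell out the exact neutral model used, and no observed counts are supplied because the record does not list them.

```python
from fractions import Fraction


def neutral_spectrum(n_lines, n_sites):
    """Expected number of sites at derived allele counts 1..n_lines-1.

    Uses the standard neutral expectation, proportional to 1/i, rescaled
    so that the classes sum to n_sites.
    """
    weights = [Fraction(1, i) for i in range(1, n_lines)]
    total = sum(weights)
    return [float(n_sites * w / total) for w in weights]


def chi_square(observed, expected):
    """Pearson chi-square statistic between two equal-length count lists."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# DSPR values from this record: 15 lines, 6,613 fully called insertions.
expected_dspr = neutral_spectrum(15, 6613)
singleton_share = expected_dspr[0] / sum(expected_dspr)
print(f"expected singleton share under neutrality: {singleton_share:.3f}")
```

With 15 lines, the neutral expectation places roughly 31% of insertions in the singleton class, which gives a baseline against which an observed excess of rare insertions can be judged.
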
Figure: Transposable element insertions in genes. The frequency above each insertion is the number of lines in which the element is present over the number of lines in which the element is validated as either present or absent. Gene images are from the UCSC genome browser (http://genome.ucsc.edu/, last accessed June 31, 2012).
Figure: log10(TE density) versus log10(Gene family size).