Anna-Sophie Fiston-Lavier, Matthew Carrigan, Dmitri A Petrov, Josefa González.
Abstract
Transposable elements (TEs) are repetitive DNA sequences that are ubiquitous, extremely abundant and dynamic components of practically all genomes. Much effort has gone into annotation of TE copies in reference genomes. Falling sequencing costs and newly available next-generation sequencing (NGS) data from multiple strains within a species offer an unprecedented opportunity to study the population genomics of TEs in a range of organisms. Here, we present a computational pipeline (T-lex) that uses NGS data to detect the presence/absence of annotated TE copies. T-lex can use data from a large number of strains and returns estimates of population frequencies of individual TE insertions in a reasonable time. We experimentally validated the accuracy of T-lex in detecting the presence or absence of 768 previously identified TE copies in two resequenced Drosophila melanogaster strains. Approximately 95% of the TE insertions were detected with 100% sensitivity and 97% specificity. We show that even at low levels of coverage T-lex produces accurate results for TE copies that it can identify reliably, but that the rate of 'no data' calls increases as the coverage falls below 15×. T-lex is a broadly applicable and flexible tool that can be used in any genome, provided that a reference genome, individual TE copy annotation and NGS data are available.
Year: 2010 PMID: 21177644 PMCID: PMC3064797 DOI: 10.1093/nar/gkq1291
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. T-lex presence and absence detection modules. (a) The ‘presence’ detection module is based on the mapping of the NGS reads on the TE insertion junctions. The TE insertion junctions encompass the flanking region of the TE insertion and the terminal TE insertion sequence. After extracting the two TE insertion junctions, the input data are reformatted. T-lex launches Maq to map the reads on these sequences. The reads are then assembled to obtain two contigs (one for each side). Gaps are represented by ‘Ns’. The alignments of the contigs are used to define the presence and/or absence of the TE insertion (see ‘Materials and Methods’ section). (b) The ‘absence’ detection module is based on the mapping of the reads on the putative ancestral genomic sequence before the TE insertion. The flanking sequences of the TE insertion are extracted and concatenated. The NGS data are reformatted. T-lex maps the NGS reads on the putative ancestral sequence using SHRiMP. T-lex masks the simple sequence repeats and low-complexity sequences of the selected NGS reads. Only the non-fully repetitive reads are used to define the absence of the TE insertion (see ‘Materials and Methods’ section). The repetitive regions are represented here by ‘Ns’.
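The two kinds of target sequence described in this caption can be illustrated with a minimal Python sketch. It is not T-lex code: the function names, the 0-based half-open coordinates, and the `flank` and `te_terminal` lengths are all assumptions chosen for illustration, and the read mapping steps (Maq, SHRiMP) are left out entirely.

```python
def junction_sequences(ref, te_start, te_end, flank=100, te_terminal=60):
    """Return the (left, right) TE insertion junctions.

    The TE is assumed to occupy ref[te_start:te_end] (0-based, half-open).
    Each junction joins `flank` bases of flanking sequence with
    `te_terminal` bases of the terminal TE sequence. Both lengths are
    hypothetical parameters, not values taken from the paper.
    """
    left = ref[max(0, te_start - flank):min(te_end, te_start + te_terminal)]
    right = ref[max(te_start, te_end - te_terminal):te_end + flank]
    return left, right


def ancestral_sequence(ref, te_start, te_end, flank=100):
    """Return the putative pre-insertion sequence: the two flanks concatenated."""
    left_flank = ref[max(0, te_start - flank):te_start]
    right_flank = ref[te_end:te_end + flank]
    return left_flank + right_flank


# Toy reference: 10 flanking bases, a 20-base "TE", 10 flanking bases.
toy_ref = "A" * 10 + "T" * 20 + "C" * 10
left_junction, right_junction = junction_sequences(toy_ref, 10, 30, flank=5, te_terminal=4)
print(left_junction)                                 # AAAAATTTT
print(right_junction)                                # TTTTCCCCC
print(ancestral_sequence(toy_ref, 10, 30, flank=5))  # AAAAACCCCC
```

Reads spanning a junction support presence of the insertion, while reads spanning the concatenated flanks support its absence.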
Combination of the detection results from the two T-lex detection modules
| Absence detection result | Presence detection result | Combination |
|---|---|---|
| Absent | Absent | Absent |
| Absent | No data | Absent/polymorphic |
| Absent | Present | Polymorphic |
| Present | Absent | No data |
| Present | No data | No data |
| Present | Present | Present |
| No data | Absent | No data |
| No data | No data | No data |
| No data | Present | Present/polymorphic |
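The combination rules in the table above amount to a lookup keyed on the two module results. The sketch below transcribes the nine rows into a Python dictionary; the names `COMBINATION` and `combine_calls` and the lowercase normalisation are illustrative choices, not part of T-lex.

```python
# Keys are (absence module result, presence module result), as in the table.
COMBINATION = {
    ("absent", "absent"): "absent",
    ("absent", "no data"): "absent/polymorphic",
    ("absent", "present"): "polymorphic",
    ("present", "absent"): "no data",
    ("present", "no data"): "no data",
    ("present", "present"): "present",
    ("no data", "absent"): "no data",
    ("no data", "no data"): "no data",
    ("no data", "present"): "present/polymorphic",
}


def combine_calls(absence_result, presence_result):
    """Combine the two module results into a single call for one TE in one strain."""
    key = (absence_result.strip().lower(), presence_result.strip().lower())
    return COMBINATION[key]  # raises KeyError for an unknown result label


print(combine_calls("Absent", "Present"))   # polymorphic
print(combine_calls("No data", "Present"))  # present/polymorphic
```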
Figure 2. Comparison of T-lex results with previous TE frequency estimates. The 661 TE insertions for which T-lex returns results in both strains are classified as ‘present’, ‘absent’ and ‘polymorphic’. These results are compared with previous TE insertion frequency estimates that classified the TE insertions as ‘fixed’, ‘common’, ‘rare’ and ‘very rare’ (Supplementary Table S1; 5,26,27).
Comparison of T-lex and PCR results for 56 instances for which the T-lex and PCR approaches returned a result in at least one strain
| TE identifier | Strain | T-lex result | PCR result | T-lex result after manual curation |
|---|---|---|---|---|
| FBti0018879 | Absent | Absent | Absent | |
| FBti0018879 | Present | Present | Present | |
| FBti0018880 | Present | Present | Present | |
| FBti0018884 | Polymorphic | Polymorphic | Polymorphic | |
| FBti0018884 | Absent/polymorphic | Absent | Absent/polymorphic | |
| FBti0018889 | Present | Present | Present | |
| FBti0018889 | Present | Present | Present | |
| FBti0018892 | Polymorphic | Polymorphic | Polymorphic | |
| FBti0018955 | Present | Present | Present | |
| FBti0018955 | Present | Present | Present | |
| FBti0018978 | Absent/polymorphic | Absent | Absent/polymorphic | |
| FBti0018980 | Polymorphic | Polymorphic | Polymorphic | |
| FBti0018980 | Present | Present | Present | |
| FBti0018999 | Present | Present | Present | |
| FBti0018999 | Present | Present | Present | |
| FBti0019056 | Present | Present | Present | |
| FBti0019056 | Present | Present | Present | |
| FBti0019065 | Present | Present | Present | |
| FBti0019081 | Present | Present | Present | |
| FBti0019081 | Present | Present | Present | |
| FBti0019164 | Present | Present | Present | |
| FBti0019164 | Absent | No data | Absent | |
| FBti0019223 | Absent | Absent | Absent | |
| FBti0019294 | Polymorphic | Polymorphic | Polymorphic | |
| FBti0019294 | Present | No data | Present | |
| FBti0019296 | Present | Present | Present | |
| FBti0019344 | Absent/polymorphic | Absent | Absent/polymorphic | |
| FBti0019344 | Present | Present | Present | |
| FBti0019360 | Absent | Absent | Absent | |
| FBti0019372 | Polymorphic | Polymorphic | Polymorphic | |
| FBti0019372 | Absent | Absent | Absent | |
| FBti0019386 | Polymorphic | Polymorphic | Polymorphic | |
| FBti0019386 | Present | Present | Present | |
| FBti0019415 | Absent | Absent | Absent | |
| FBti0019613 | Polymorphic | Polymorphic | Polymorphic | |
| FBti0019624 | Present | Present | Present | |
| FBti0019624 | Absent | Absent | Absent | |
| FBti0019985 | Absent | Absent | Absent | |
| FBti0020042 | Absent | Absent | Absent | |
| FBti0020042 | Absent | Absent | Absent | |
| FBti0020089 | Present | Present | Present | |
| FBti0020091 | Present | Present | Present | |
| FBti0020125 | Absent/polymorphic | Polymorphic | Absent/polymorphic | |
| FBti0020125 | Absent/polymorphic | Absent | Absent/polymorphic | |
| FBti0020190 | Absent | Absent | Absent | |
| FBti0020190 | Absent | Absent | Absent | |
Cases for which T-lex and PCR results differed are highlighted in bold.
a Results do not match after manual curation.
b Results match after manual curation.
c Cases of misinference by T-lex (FBti0019223, FBti0019360).
Figure 3. Impact of the read coverage on T-lex results. T-lex results for 661 TE insertions were obtained using NGS data subsampled to different coverages for (a) Canton-S and (b) W1 strains. The number of ‘no data’, ‘wrong calls’ (i.e. non-compatible results compared to the 50× coverage NGS data), ‘non-definitive calls’ (i.e. ‘absent/polymorphic’ and ‘present/polymorphic’) and ‘definitive calls’ (i.e. ‘absent’, ‘present’ and ‘polymorphic’) results for each coverage are plotted.
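The four categories plotted in Figure 3 can be tallied with a short Python sketch. The caption does not define 'non-compatible' precisely, so the rule used here is an assumption: a call is compatible with the 50× call when the two share at least one state after splitting on '/'. The function names are hypothetical, and every 50× reference call is assumed to be something other than 'no data'.

```python
from collections import Counter

DEFINITIVE = {"absent", "present", "polymorphic"}
NON_DEFINITIVE = {"absent/polymorphic", "present/polymorphic"}


def categorize(call, reference_call):
    """Assign one subsampled call to a Figure 3 category.

    ASSUMPTION: 'compatible' means the call and the 50x reference call
    share at least one state after splitting on '/'.
    """
    if call == "no data":
        return "no data"
    if call not in DEFINITIVE | NON_DEFINITIVE:
        raise ValueError(f"unknown call: {call!r}")
    if not set(call.split("/")) & set(reference_call.split("/")):
        return "wrong call"
    if call in NON_DEFINITIVE:
        return "non-definitive call"
    return "definitive call"


def tally(calls, reference_calls):
    """Count categories over paired low-coverage and 50x calls."""
    return Counter(categorize(c, r) for c, r in zip(calls, reference_calls))


low_coverage = ["present", "no data", "absent/polymorphic", "absent"]
full_coverage = ["present", "present", "absent", "present"]
print(tally(low_coverage, full_coverage))
```

Under this rule, an 'absent/polymorphic' call at low coverage counts as non-definitive rather than wrong when the 50× call is 'absent'.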