Mourdas Mohamed1, Nguyet Thi-Minh Dang2, Yuki Ogyama1, Nelly Burlet3, Bruno Mugat1, Matthieu Boulesteix3, Vincent Mérel3, Philippe Veber3, Judit Salces-Ortiz3,4, Dany Severac5, Alain Pélisson1, Cristina Vieira3, François Sabot2, Marie Fablet3, Séverine Chambeyron1.
Abstract
Transposable elements (TEs) are the main components of genomes. However, due to their repetitive nature, they are very difficult to study using data obtained with short-read sequencing technologies. Here, we describe an efficient pipeline to accurately recover TE insertion (TEI) sites and sequences from long reads obtained by Oxford Nanopore Technology (ONT) sequencing. With this pipeline, we could precisely describe the landscapes of the most recent TEIs in wild-type strains of Drosophila melanogaster and Drosophila simulans. Their comparison suggests that this subset of TE sequences is more similar than previously thought in these two species. The chromosome assemblies obtained using this pipeline also allowed recovering piRNA cluster sequences, which was impossible using short-read sequencing. Finally, we used our pipeline to analyze ONT sequencing data from a D. melanogaster unstable line in which LTR transposition was derepressed for 73 successive generations. We could rely on single reads to identify new insertions with intact target site duplications. Moreover, the detailed analysis of TEIs in the wild-type strains and the unstable line did not support the trap model claiming that piRNA clusters are hotspots of TE insertions.
Keywords: Drosophila melanogaster; Drosophila simulans; ONT; piRNA; transposable elements
Year: 2020 PMID: 32722451 PMCID: PMC7465170 DOI: 10.3390/cells9081776
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Statistics for the de novo assemblies before scaffolding. All lengths are expressed in bases. The Benchmarking Universal Single-Copy Orthologs (BUSCO) score indicates the “complete hit” level.
| Name | Size | No. of contigs | Mean length | Longest | N50 | L50 | BUSCO score, % |
|---|---|---|---|---|---|---|---|
| dmgoth101 | 130,483,042 | 1213 | 107,571 | 20,963,225 | 14,899,963 | 4 | c:98.6 |
| dmgoth63 | 134,481,426 | 1005 | 133,812 | 22,615,553 | 16,996,519 | 4 | c:98.03 |
| dmsj23 | 131,331,777 | 1094 | 120,047 | 22,945,221 | 10,553,205 | 5 | c:98.5 |
| dmsj7 | 131,360,683 | 1197 | 109,742 | 18,094,419 | 6,212,683 | 7 | c:98.7 |
| dsgoth31 | 135,039,133 | 822 | 164,281 | 27,577,085 | 17,530,992 | 4 | c:98.3 |
| dsgoth613 | 132,908,190 | 918 | 144,780 | 22,559,698 | 16,120,890 | 4 | c:98.6 |
| dssj27 | 134,309,820 | 866 | 155,092 | 27,370,717 | 20,976,825 | 3 | c:98.6 |
| dssj9 | 142,009,588 | 508 | 279,546 | 27,589,620 | 19,611,840 | 4 | c:99 |
| G0 | 127,415,251 | 642 | 198,466 | 5,037,957 | 1,208,862 | 33 | c:93.7 |
| G0-F100 | 139,374,117 | 836 | 166,715 | 17,781,420 | 9,085,947 | 6 | c:98.97 |
| G73 | 144,335,962 | 584 | 247,150 | 24,539,270 | 12,530,957 | 4 | c:98.7 |
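
The N50, L50, and mean-length columns above can be recomputed from a list of contig lengths. The following is a minimal sketch, assuming the common definitions (N50 is the length of the shortest contig in the smallest set of longest contigs that covers at least half of the assembly; L50 is the number of contigs in that set). The function name `n50_l50` and the toy lengths are hypothetical and are not taken from the assemblies in the table.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk through contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that brings the running total to half of the
        # assembly size defines N50; its rank is L50.
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


# Hypothetical toy assembly (lengths in bases), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
mean_length = sum(toy_contigs) / len(toy_contigs)
print(f"N50={n50} L50={l50} mean={mean_length:.0f}")
```

A low L50 combined with a high N50, as for most strains in the table, indicates that a handful of long contigs carry half of the assembly.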
Figure 1. Schematic of the method used for genome assembly and for transposable element insertion (TEI) detection. Global variants (black) were detected from genome assemblies, and minor variants (gray) by remapping reads in these assemblies. The reference genomes used for RaGOO scaffolding were Dmel_R6.23 for G0 and for wild-type D. melanogaster strains, Dsim_R2.02 for wild-type D. simulans strains, and the G0 assembly for G73 and G0-F100.
Figure 2. Estimation of the TE percentage in the D. melanogaster and D. simulans genomes (isogenic wild-type strains). (a) Estimation of the TE percentage using RepeatMasker (ONT chromosome assemblies), and dnaPipeTE or TEcount (Illumina reads). (b) Correlations between the estimates obtained with the indicated methods.
Figure 3. Insertion site numbers for each TE group and per chromosome, determined using Illumina data (upper panels) or Oxford Nanopore Technology (ONT) chromosome assemblies (lower panels).
Number of transposable element insertions (TEIs) identified as global variants in the Oxford Nanopore Technology (ONT) chromosome assemblies.
| | dmgoth63 | dmgoth101 | dmsj23 | dmsj7 | dsgoth613 | dsgoth31 | dssj27 | dssj9 |
|---|---|---|---|---|---|---|---|---|
| Total Insertion Number | 515 | 448 | 550 | 456 | 434 | 496 | 420 | 474 |
Figure 4. Global variant copy numbers in wild-type D. melanogaster and D. simulans strains. (a) Number of shared global variants among strains. The color scale (on the right of each panel) shows the distance based on the number of pairwise shared insertions (indicated in black in the figure). Values in white correspond to the total numbers of the identified insertions for the considered strains. (b) Mean TEI numbers for the indicated TE groups computed in the wild-type D. melanogaster and D. simulans strains based on the ONT chromosome assemblies.
Figure 5. Global variant sequence analysis in wild-type D. melanogaster and D. simulans strains. (a) Distributions of TE copy lengths (i.e., fragment size) in bp for all global variants across strains and TE groups. (b) Intra-family sequence divergence (average Kimura distance) computed per strain and per TE family.
Figure 6. piRNA analyses in wild-type D. melanogaster and D. simulans strains. (a) Normalized piRNA counts (log10) relative to genome occupancy for all strains and the two species and linear regression curve. Each dot is a TE family. (b) Results for the dmgoth63 and dsgoth31 strains are shown as examples. Uniquely mapping piRNAs along ONT chromosome assemblies (black, normalized piRNA counts). Global variants identified along ONT chromosome assemblies (gray). Red arrows indicate flamenco (X chromosome) and 42AB (2R chromosome). Data for the other strains are provided in Figure S2. The off-scale peaks might correspond to microRNAs that are absent from miRBase.
Figure 7. Characterization of the Long-Terminal Repeat minor insertion variants (LTR MIVs) in the stable (G0) and unstable (G73) lines. (a) ZAM copies visualized by fluorescent in situ hybridization in G0 (left) and G73 (right) polytene chromosomes. The two global variants correspond to non-reference ZAM copies present in G0 and G73 (asterisks in the zoomed images). Arrowheads show the new ZAM insertions in G73. More examples are presented in Figure S3. (b) Dot plot of the sequence comparison between the ZAM sequences accessed from the de novo assembled G0 genome and the ZAM consensus sequence. (c) Heat map of the LTR MIVs detected in the G0-F100 (stable) and G73 (unstable) libraries. (d) Histograms showing the number of reads supporting each LTR MIV. (e) Sequence logo of the TSD defined using the LTR MIV automatic detection procedure. (f) The ZAM TSD motif defined using the automatic and manual LTR MIV detection procedures.
Target site duplication (TSD) flanking Long-Terminal Repeat minor insertion variants (LTR MIVs) in the G73 line.
| LTR family | gtwin | roo | ZAM | copia | Blood | mdg3 |
|---|---|---|---|---|---|---|
| Total LTR MIV detected (n) | 93 | 48 | 51 | 35 | 10 | 10 |
| TSD automatic detection (n) | 66 | 15 | 25 | 11 | 8 | 5 |
| TSD automatic detection (%) | 71 | 31 | 49 | 31 | 80 | 50 |
| Additional TSD manually detected (n) | NA | NA | 23 | NA | NA | NA |
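
The percentage row of this table follows directly from the two count rows. As a minimal sketch of that arithmetic, the snippet below recomputes the automatic TSD detection rate per LTR family from the counts given above. The helper name `percent` is hypothetical, and the combined ZAM figure is a derived value that assumes the manually detected TSDs do not overlap with the automatically detected ones; it is not reported in the table.

```python
# Counts copied from the table above (G73 line).
total_miv = {"gtwin": 93, "roo": 48, "ZAM": 51, "copia": 35, "Blood": 10, "mdg3": 10}
auto_tsd = {"gtwin": 66, "roo": 15, "ZAM": 25, "copia": 11, "Blood": 8, "mdg3": 5}
manual_tsd_zam = 23  # additional ZAM TSDs found by manual inspection


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Automatic detection rate per family; matches the "(%)" row of the table.
auto_rate = {family: percent(auto_tsd[family], total_miv[family]) for family in total_miv}
for family, rate in auto_rate.items():
    print(f"{family}: {rate}%")

# Derived here, not reported in the table: ZAM rate when the manually
# detected TSDs are added to the automatic ones (assumed non-overlapping).
zam_combined = percent(auto_tsd["ZAM"] + manual_tsd_zam, total_miv["ZAM"])
print(f"ZAM automatic + manual: {zam_combined}%")
```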
Figure 8. Heat map of the LTR MIVs inserted in piRNA clusters and detected in the G0-F100 and G73 lines.