Aurélie Teissandier, Nicolas Servant, Emmanuel Barillot, Deborah Bourc'his.
Abstract
BACKGROUND: Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is to map millions of reads to a reference genome. This problem is exacerbated when dealing with repetitive sequences such as transposable elements, which occupy half of the mammalian genome mass. Sequenced reads coming from these regions introduce ambiguities in the mapping step. Therefore, dedicated parameters and algorithms have to be considered when transposable element regulation is investigated with sequencing datasets.
Keywords: Data analysis; High-throughput sequencing; Mapping; Quantification; Retrotransposon
Year: 2019 PMID: 31890048 PMCID: PMC6935493 DOI: 10.1186/s13100-019-0192-1
Source DB: PubMed Journal: Mob DNA
Fig. 1 Comparison of mapper efficiency with mouse simulated data. a A diagram showing the method for the data simulation. The circles represent the tools used and the rectangles correspond to files. b True Positive (TP) rate versus mapping percentage with chromosome 1 of the mouse genome. The dots are the average values of three independent simulated libraries. SE and PE refer to single end and paired end, respectively. c Memory usage, run time and size of the BAM file with chromosome 1 of the mouse genome. The error bars correspond to the standard deviation from three independent simulated libraries.
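The two quantities plotted in panel b can be illustrated with a minimal sketch. It assumes that the mapping percentage is the share of simulated reads that received an alignment, and that the TP rate is the share of mapped reads placed at their simulated origin; this reading is consistent with the tables (high TP rates alongside lower mapping percentages in unique mode), but it is not spelled out in this record. The function name, the dictionary layout and the `tolerance` parameter are hypothetical.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, 1-based start), hypothetical layout


def mapping_stats(truth: Dict[str, Position],
                  aligned: Dict[str, Optional[Position]],
                  tolerance: int = 0) -> Tuple[float, float]:
    """Return (mapping percentage, TP rate), both in percent.

    Assumption: the TP rate is computed over mapped reads only.
    """
    total = len(truth)
    mapped = 0
    correct = 0
    for read_id, origin in truth.items():
        hit = aligned.get(read_id)
        if hit is None:  # read absent from the output or reported unmapped
            continue
        mapped += 1
        same_chrom = hit[0] == origin[0]
        if same_chrom and abs(hit[1] - origin[1]) <= tolerance:
            correct += 1
    mapping_pct = 100.0 * mapped / total if total else 0.0
    tp_rate = 100.0 * correct / mapped if mapped else 0.0
    return mapping_pct, tp_rate


# Toy example: four simulated reads, two mapped, one at the right place.
truth = {"r1": ("chr1", 100), "r2": ("chr1", 200),
         "r3": ("chr1", 300), "r4": ("chr1", 400)}
aligned = {"r1": ("chr1", 100), "r2": ("chr1", 250), "r3": None}
print(mapping_stats(truth, aligned))
```

In a real evaluation the true origins would come from the read simulator and the alignments from the BAM file; both are replaced here by plain dictionaries.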
Statistics for the different mappers with mouse chromosome 1 simulation data
| Algorithm | Library | Mode | Mapping percentage (%) | True Positive rate (%) | Memory (GB) | Running time (min) | Output size (MB) |
|---|---|---|---|---|---|---|---|
| bowtie | PE | unique | 91.87823 | 99.97913 | 0.92 | 3.00 | 583.36 |
| bowtie | SE | unique | 92.05224 | 99.92287 | 0.69 | 1.33 | 311.38 |
| bowtie2 | PE | unique | 94.57886 | 99.93802 | 1.28 | 38.00 | 572.58 |
| bowtie2 | SE | unique | 92.08282 | 99.84845 | 1.18 | 32.67 | 294.64 |
| Bwa aln | PE | unique | 94.62602 | 99.88782 | 2.66 | 15.67 | 553.86 |
| Bwa aln | SE | unique | 96.60879 | 95.82612 | 1.85 | 3.00 | 310.30 |
| Bwa mem | PE | unique | 94.54763 | 99.95728 | 8.77 | 19.33 | 563.50 |
| Bwa mem | SE | unique | 92.08548 | 99.89624 | 8.40 | 4.67 | 299.76 |
| novoalign | PE | unique | 95.55760 | 99.61473 | 7.62 | 226.33 | 609.08 |
| novoalign | SE | unique | 92.08982 | 99.92307 | 7.61 | 31.67 | 315.96 |
| STAR | PE | unique | 95.37882 | 99.80753 | 16.67 | 2.00 | 553.24 |
| STAR | SE | unique | 92.23340 | 99.73004 | 16.18 | 2.33 | 285.06 |
| bowtie | PE | random | 99.95300 | 93.67212 | 0.93 | 3.00 | 596.75 |
| bowtie | SE | random | 99.99001 | 93.04126 | 0.69 | 2.33 | 317.67 |
| bowtie2 | PE | random | 99.99991 | 95.89737 | 1.28 | 35.67 | 607.86 |
| bowtie2 | SE | random | 99.98093 | 92.97406 | 1.18 | 25.67 | 324.26 |
| Bwa aln | PE | random | 99.99998 | 95.94218 | 2.66 | 17.67 | 604.39 |
| Bwa aln | SE | random | 99.99801 | 93.01531 | 1.85 | 4.00 | 322.33 |
| Bwa mem | PE | random | 99.99998 | 95.94068 | 9.42 | 18.33 | 612.39 |
| Bwa mem | SE | random | 99.99998 | 93.01096 | 7.96 | 6.33 | 329.82 |
| novoalign | PE | random | 99.99998 | 95.84899 | 7.62 | 272.00 | 616.78 |
| novoalign | SE | random | 99.99989 | 93.03697 | 7.61 | 30.67 | 322.72 |
| STAR | PE | random | 99.94380 | 95.93094 | 16.67 | 5.00 | 583.02 |
| STAR | SE | random | 99.99024 | 93.01921 | 16.26 | 2.00 | 314.19 |
| bowtie | PE | multi | 99.95300 | 92.89719 | 0.98 | 18.33 | 7289.52 |
| bowtie | SE | multi | 99.99001 | 93.01711 | 0.71 | 9.67 | 2747.64 |
| bowtie2 | PE | multi | 99.99998 | 76.80653 | 11.53 | 28658.67 | 228148.51 |
| bowtie2 | SE | multi | 99.99998 | 70.81391 | 8.74 | 8205.33 | 161697.48 |
| novoalign | PE | multi | 99.99998 | 95.85903 | 7.62 | 307.67 | 2627.41 |
| novoalign | SE | multi | 99.99989 | 93.03718 | 7.61 | 99.00 | 3176.37 |
| STAR | PE | multi | 99.94380 | 95.93265 | 23.95 | 7.00 | 2575.59 |
| STAR | SE | multi | 99.99024 | 93.02143 | 26.64 | 4.00 | 2831.57 |
Values are the averages of three independent simulated libraries with 10X coverage. SE and PE refer to single end and paired end, respectively. Post-mapping filtering was applied for the Bowtie2, BWA mem and BWA aln algorithms in order to extract uniquely mapped reads.
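The post-mapping filtering mentioned in the table note is not detailed in this record. The sketch below shows one common convention, given as an assumption rather than as the authors' procedure: an alignment is kept as unique when it is a mapped primary record and either carries no `XS:i` tag (score of the second-best hit) or has an `AS:i` score strictly greater than `XS:i`. BWA aln output is often filtered on other tags instead (for example `X0:i`), which this sketch does not cover. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def parse_tags(fields: List[str]) -> Dict[str, str]:
    """Turn optional SAM fields such as 'AS:i:-3' into {'AS': '-3'}."""
    tags = {}
    for field in fields:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return tags


def is_uniquely_mapped(sam_line: str) -> bool:
    """Assumed criterion: mapped, primary, and AS > XS (or no XS tag)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:               # read unmapped
        return False
    if flag & 0x100 or flag & 0x800:  # secondary or supplementary record
        return False
    tags = parse_tags(fields[11:])
    xs = tags.get("XS")
    if xs is None:               # no second-best hit reported
        return True
    best = tags.get("AS")
    return best is not None and int(best) > int(xs)


def filter_unique(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only uniquely mapped alignments."""
    for line in lines:
        if line.startswith("@") or is_uniquely_mapped(line):
            yield line
```

For paired-end data both mates would normally be checked together; this sketch treats each record independently to stay short.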
Statistics for the different mappers with human chromosome 1 simulation data
| Algorithm | Library | Mode | Mapping percentage (%) | True Positive rate (%) | Memory (GB) | Running time (min) | Output size (MB) |
|---|---|---|---|---|---|---|---|
| bowtie | PE | unique | 96.12725 | 99.99703 | 1.07 | 4.00 | 717.33 |
| bowtie | SE | unique | 96.26772 | 99.98760 | 0.80 | 1.67 | 381.52 |
| bowtie2 | PE | unique | 97.58530 | 99.99163 | 1.42 | 36.00 | 720.57 |
| bowtie2 | SE | unique | 96.25897 | 99.93671 | 1.33 | 25.33 | 375.46 |
| Bwa aln | PE | unique | 97.58600 | 99.99135 | 3.01 | 13.67 | 703.84 |
| Bwa aln | SE | unique | 98.40958 | 98.52603 | 2.18 | 6.33 | 381.22 |
| Bwa mem | PE | unique | 97.57669 | 99.99745 | 5.65 | 8.33 | 715.38 |
| Bwa mem | SE | unique | 96.28285 | 99.98096 | 5.45 | 4.67 | 379.88 |
| novoalign | PE | unique | 97.83211 | 99.99187 | 8.31 | 99.67 | 745.17 |
| novoalign | SE | unique | 96.28793 | 99.98755 | 8.31 | 21.00 | 385.94 |
| STAR | PE | unique | 97.79129 | 99.99166 | 18.12 | 2.33 | 693.70 |
| STAR | SE | unique | 96.29801 | 99.96226 | 17.71 | 1.00 | 363.12 |
| bowtie | PE | random | 99.95306 | 97.78786 | 1.07 | 4.00 | 722.46 |
| bowtie | SE | random | 99.98993 | 97.48616 | 0.80 | 2.33 | 383.45 |
| bowtie2 | PE | random | 99.99967 | 98.68378 | 1.42 | 47.00 | 738.73 |
| bowtie2 | SE | random | 99.97064 | 97.42861 | 1.33 | 35.67 | 391.06 |
| Bwa aln | PE | random | 99.99998 | 98.68727 | 3.01 | 13.67 | 733.20 |
| Bwa aln | SE | random | 99.99814 | 97.47704 | 2.18 | 7.33 | 387.77 |
| Bwa mem | PE | random | 99.99998 | 98.69222 | 6.05 | 9.33 | 744.88 |
| Bwa mem | SE | random | 99.99998 | 97.47710 | 5.26 | 3.00 | 397.18 |
| novoalign | PE | random | 99.99998 | 98.68797 | 8.31 | 100.67 | 748.47 |
| novoalign | SE | random | 99.99998 | 97.48725 | 8.31 | 27.67 | 388.19 |
| STAR | PE | random | 99.94355 | 98.68767 | 18.12 | 3.33 | 709.61 |
| STAR | SE | random | 99.99103 | 97.47578 | 17.70 | 2.00 | 378.46 |
| bowtie | PE | multi | 99.95306 | 97.41469 | 1.09 | 4.33 | 1032.87 |
| bowtie | SE | multi | 99.98993 | 97.47888 | 0.82 | 2.00 | 540.64 |
| bowtie2 | PE | multi | 99.99998 | 85.55682 | 11.92 | 71150.67 | 81772.06 |
| bowtie2 | SE | multi | 99.99998 | 77.59895 | 6.34 | 62006.33 | 123387.84 |
| novoalign | PE | multi | 99.99998 | 98.68698 | 8.31 | 83.67 | 800.39 |
| novoalign | SE | multi | 99.99998 | 97.48601 | 8.31 | 24.00 | 572.07 |
| STAR | PE | multi | 99.94355 | 98.69066 | 18.12 | 4.00 | 754.66 |
| STAR | SE | multi | 99.99103 | 97.47921 | 17.64 | 2.00 | 541.40 |
Values are the averages of three independent simulated libraries with 10X coverage. SE and PE refer to single end and paired end, respectively. Post-mapping filtering was applied for the Bowtie2, BWA mem and BWA aln algorithms in order to extract uniquely mapped reads.
Fig. 2 Comparison of the methods for the quantification of mouse retrotransposon families. a Comparison of the estimated abundance versus the true abundance for different quantification methods using a mouse simulated TE-derived library. An R-squared value (R2) was calculated to evaluate the correlation between estimated and simulated values. b Comparison of the estimated abundance versus the true abundance for TEtools and when randomly reported reads are used for the TE quantification with FeatureCounts (FeatureCounts Random alignments). A PE genome-wide library (10X coverage) was simulated using the mouse genome, with STAR used for the mapping.
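As an illustration of the R2 value reported in panel a, the sketch below computes the squared Pearson correlation between true and estimated per-family counts. This is an assumption: the record does not say whether R2 came from a linear fit, whether counts were log-transformed, or which software was used. The function name and the toy counts are hypothetical.

```python
from typing import Sequence


def r_squared(true_counts: Sequence[float],
              estimated_counts: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long vectors."""
    n = len(true_counts)
    if n != len(estimated_counts) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_t = sum(true_counts) / n
    mean_e = sum(estimated_counts) / n
    cov = sum((t - mean_t) * (e - mean_e)
              for t, e in zip(true_counts, estimated_counts))
    var_t = sum((t - mean_t) ** 2 for t in true_counts)
    var_e = sum((e - mean_e) ** 2 for e in estimated_counts)
    if var_t == 0 or var_e == 0:
        raise ValueError("R2 is undefined for a constant vector")
    return (cov * cov) / (var_t * var_e)


# Toy counts for three hypothetical TE families.
print(r_squared([1, 2, 3], [2, 4, 6]))  # estimates proportional to the truth
print(r_squared([1, 2, 3], [1, 3, 2]))  # two families swapped
```

A value close to 1 means the estimated abundances follow the simulated ones; note that a high R2 alone does not rule out a constant scaling bias, as the first toy call shows.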
Fig. 3 Mappability of the different mouse retrotransposon families. a True Positive (TP) rate versus mapping percentage per TE family using STAR and a paired-end library with mouse simulated TE-derived reads. The black triangle represents the True Positive rate and mapping percentage for the entire simulated library. b Mapping percentage versus age of L1Md families. Dot colors represent the True Positive (TP) rate. Ages are obtained from a previously published divergence analysis study [24]. c Gain of True Positive rate versus gain of mapping, both in percentage, when PE libraries are used in comparison to SE libraries.
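The gain plotted in panel c can be sketched as a simple difference between paired-end and single-end values. Treating the gain as a difference in percentage points is an assumption; the caption does not say whether it is absolute or relative. The function name is hypothetical, and the example values are the STAR unique-mode averages for mouse chromosome 1 from the table above, not per-family values.

```python
from typing import Tuple

Stats = Tuple[float, float]  # (mapping percentage, TP rate), both in percent


def paired_end_gain(se: Stats, pe: Stats) -> Stats:
    """Return (gain of mapping, gain of TP rate) in percentage points."""
    return pe[0] - se[0], pe[1] - se[1]


# STAR, unique mode, mouse chromosome 1 (values taken from the table).
star_se = (92.23340, 99.73004)
star_pe = (95.37882, 99.80753)
mapping_gain, tp_gain = paired_end_gain(star_se, star_pe)
print(f"mapping gain: {mapping_gain:.5f}, TP gain: {tp_gain:.5f}")
```

Applied per TE family, the same subtraction would give one point per family in a plot like panel c.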