Jiajie Zhang, Kassian Kobert, Tomáš Flouri, Alexandros Stamatakis.
Abstract
MOTIVATION: The Illumina paired-end sequencing technology can generate reads from both ends of target DNA fragments, which can subsequently be merged to increase the overall read length. There already exist tools for merging these paired-end reads when the target fragments are equally long. However, when fragment lengths vary and, in particular, when either the fragment size is shorter than a single-end read, or longer than twice the size of a single-end read, most state-of-the-art mergers fail to generate reliable results. Therefore, a robust tool is needed to merge paired-end reads that exhibit varying overlap lengths because of varying target fragment lengths.
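To make the merging problem concrete, here is a deliberately naive, quality-blind merger for a single pair. It is not PEAR's algorithm: the tables below refer to a statistical test (the "test disabled" and "MAP = 0.01" settings), whereas this sketch only applies a fixed mismatch-fraction cutoff. The names `merge_pair` and `reverse_complement` and the defaults `min_overlap=10` and `max_mismatch_frac=0.1` are hypothetical choices for illustration. Negative offsets cover the case where the fragment is shorter than a single-end read.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string over the alphabet A/C/G/T/N."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1):
    """Naive overlap merge of one read pair; returns the merged sequence or None.

    Every relative offset of reverse_complement(read2) against read1 is scored by
    its mismatch fraction; the best acceptable offset wins (ties favour longer
    overlaps). Base quality scores are ignored entirely.
    """
    r2 = reverse_complement(read2)  # now on the same strand as read1
    n1, n2 = len(read1), len(r2)
    best = None  # (mismatch fraction, -overlap length, offset)
    # offset = start of r2 in read1 coordinates; a negative offset means the
    # fragment is shorter than a read (both reads run past the fragment end)
    for offset in range(-(n2 - min_overlap), n1 - min_overlap + 1):
        lo, hi = max(0, offset), min(n1, offset + n2)
        olen = hi - lo
        if olen < min_overlap:
            continue
        mismatches = sum(read1[i] != r2[i - offset] for i in range(lo, hi))
        frac = mismatches / olen
        if frac <= max_mismatch_frac:
            key = (frac, -olen, offset)
            if best is None or key < best:
                best = key
    if best is None:
        return None  # no acceptable overlap, e.g. fragment longer than 2x read
    offset = best[2]
    # read1 starts at the fragment's 5' end and r2 ends at its 3' end;
    # inside the overlap, read1's base is kept
    return read1[:min(n1, offset + n2)] + r2[n1 - offset:]
```

With this formulation, a pair drawn from a fragment shorter than the read length comes back trimmed to the fragment, and a pair with no acceptable offset yields `None`. Scanning every offset costs roughly O(L²) per pair, which is fine for illustration but not a design for millions of reads.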
Year: 2013; PMID: 24142950; PMCID: PMC3933873; DOI: 10.1093/bioinformatics/btt593
Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937
Fig. 1. Three possible scenarios for paired-end read lengths and target DNA fragment lengths. (A) Short overlap between the paired-end reads; (B) no overlap between the paired-end reads; (C) single-end read length is longer than the target DNA fragment length.
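The three panels of Fig. 1 follow from comparing the read length L with the fragment length F. Assuming both reads have the same length, they share 2L - F bases when L <= F < 2L, share nothing when F >= 2L, and each runs past the fragment end when F < L. The sketch below encodes that case split; the function names are hypothetical, and it treats any positive overlap as panel A even though the figure shows a short one.

```python
def overlap_length(read_len: int, fragment_len: int) -> int:
    """Bases shared by the two reads; meaningful when fragment_len >= read_len.

    A value <= 0 means the reads do not meet.
    """
    return 2 * read_len - fragment_len


def fig1_scenario(read_len: int, fragment_len: int) -> str:
    """Map a read/fragment length combination onto the panels of Fig. 1."""
    if fragment_len < read_len:
        return "C"  # each read runs past the end of the fragment
    if overlap_length(read_len, fragment_len) <= 0:
        return "B"  # fragment is at least twice the read length: no overlap
    return "A"      # reads overlap by 2 * read_len - fragment_len bases


print(fig1_scenario(100, 180), overlap_length(100, 180))  # A 20
print(fig1_scenario(100, 250))                            # B
print(fig1_scenario(150, 120))                            # C
```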
Simulated dataset of paired-end reads with different overlap sizes
| Software | Merged | Correct (n) | Correct (%) | FPR (%) |
|---|---|---|---|---|
| 100-bp paired-end reads with no overlaps (23 096 pairs) | ||||
| COPE (mode0) | 31 | 23 065 | 99.87 | 0.13 |
| FLASH | 0 | 23 096 | 100 | 0 |
| PANDAseq (default) | 12 796 | 10 300 | 44.59 | 55.4 |
| PANDAseq (-o = 10) | 10 562 | 12 534 | 54.27 | 45.7 |
| PEAR (test disabled) | 8 184 | 14 912 | 64.57 | 35.4 |
| PEAR ( | 8 | 23 088 | 99.96 | 0.03 |
| PEAR (MAP = 0.01) | 33 | 23 063 | 99.86 | 0.14 |
| 100-bp paired-end reads with 10-bp mean overlaps (24 969 pairs) | ||||
| COPE (mode0) | 5 755 | 5 709 | 22.86 | 0.80 |
| FLASH | 8 968 | 8 309 | 33.27 | 7.34 |
| PANDAseq (default) | 19 616 | 14 690 | 58.83 | 25.11 |
| PANDAseq (-o = 10) | 17 783 | 12 053 | 48.27 | 32.22 |
| PEAR (test disabled) | 19 691 | 17 112 | 68.53 | 13.10 |
| PEAR ( | 9 365 | 9 315 | 37.31 | 0.53 |
| PEAR (MAP = 0.01) | 10 080 | 10 015 | 40.11 | 0.64 |
| 100-bp paired-end reads with 20-bp mean overlaps (25 858 pairs) | ||||
| COPE (mode0) | 9 819 | 9 750 | 37.71 | 0.70 |
| FLASH | 10 917 | 10 843 | 41.93 | 0.67 |
| PANDAseq (default) | 23 136 | 21 596 | 83.51 | 6.65 |
| PANDAseq (-o = 10) | 22 736 | 20 722 | 80.14 | 8.85 |
| PEAR (test disabled) | 24 153 | 23 386 | 90.44 | 3.16 |
| PEAR ( | 18 202 | 18 115 | 70.06 | 0.48 |
| PEAR (MAP = 0.01) | 19 265 | 19 165 | 74.12 | 0.52 |
| 100-bp paired-end reads with 35-bp mean overlaps (27 026 pairs) | ||||
| COPE (mode0) | 11 771 | 11 693 | 43.27 | 0.66 |
| FLASH | 15 603 | 15 507 | 57.37 | 0.61 |
| PANDAseq (default) | 26 068 | 25 849 | 95.64 | 0.84 |
| PANDAseq (-o = 10) | 26 267 | 26 026 | 96.29 | 0.92 |
| PEAR (test disabled) | 26 866 | 26 712 | 98.84 | 0.57 |
| PEAR ( | 25 939 | 25 833 | 95.59 | 0.41 |
| PEAR (MAP = 0.01) | 26 380 | 26 273 | 97.21 | 0.41 |
| 100-bp paired-end reads with 50-bp mean overlaps (28 339 pairs) | ||||
| COPE (mode0) | 7 915 | 7 858 | 27.73 | 0.72 |
| FLASH | 20 025 | 19 940 | 70.36 | 0.42 |
| PANDAseq (default) | 27 939 | 27 834 | 98.21 | 0.37 |
| PANDAseq (-o = 10) | 28 049 | 27 944 | 98.61 | 0.37 |
| PEAR (test disabled) | 28 335 | 28 234 | 99.63 | 0.36 |
| PEAR ( | 28 288 | 28 190 | 99.47 | 0.35 |
| PEAR (MAP = 0.01) | 28 329 | 28 229 | 99.61 | 0.35 |
| 150-bp paired-end reads with 100-bp mean overlaps (33 217 pairs) | ||||
| COPE (mode0) | 43 | 0 | 0 | 100 |
| FLASH | 44 | 0 | 0 | 100 |
| PANDAseq (default) | 11 417 | 0 | 0 | 100 |
| PANDAseq (-o = 10) | 14 146 | 0 | 0 | 100 |
| PEAR (test disabled) | 33 187 | 33 071 | 99.56 | 0.35 |
| PEAR ( | 33 136 | 33 022 | 99.41 | 0.34 |
| PEAR (MAP = 0.01) | 33 185 | 33 071 | 99.56 | 0.34 |
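The Correct (%) and FPR (%) columns are not defined in this excerpt. Back-calculating from the counts, Correct (%) appears to be the correct count divided by the number of input pairs, and FPR (%) the share of merged pairs that were merged wrongly. For the no-overlap set, where every merge is wrong and "Correct" appears to count pairs left unmerged, FPR instead appears to be merged pairs over all input pairs. The helper below (hypothetical name `table_metrics`) encodes that inference, which is read off the numbers rather than taken from the authors; some rows differ from it in the last digit, presumably through truncation instead of rounding.

```python
def table_metrics(total_pairs: int, merged: int, correct: int,
                  reads_overlap: bool = True) -> tuple[float, float]:
    """Return (Correct %, FPR %) for one table row, as inferred from the counts.

    reads_overlap=True : 'correct' counts correctly merged pairs, and FPR is the
                         share of merged pairs that were merged wrongly.
    reads_overlap=False: no pair should merge, so 'correct' counts pairs left
                         unmerged and FPR is merged pairs over all input pairs.
    """
    correct_pct = 100.0 * correct / total_pairs
    if reads_overlap:
        fpr_pct = 100.0 * (merged - correct) / merged
    else:
        fpr_pct = 100.0 * merged / total_pairs
    return round(correct_pct, 2), round(fpr_pct, 2)


print(table_metrics(24_969, 5_755, 5_709))                     # (22.86, 0.8)
print(table_metrics(23_096, 31, 23_065, reads_overlap=False))  # (99.87, 0.13)
```

The two calls reproduce the COPE (mode0) rows for the 10-bp overlap set and the no-overlap set.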
A total of 647 052 paired-end reads with a mean fragment size of 180 bp and a read length of 101 bp (*S. aureus* genome)
| Software | Merged | Correct (n) | Correct (%) | FPR (%) |
|---|---|---|---|---|
| COPE (full mode) | 373 543 | 369 683 | 57.13 | 1.03 |
| FLASH | 369 276 | 361 663 | 55.89 | 2.06 |
| PANDAseq (default) | 534 839 | 418 747 | 64.72 | 21.71 |
| PANDAseq (-o = 10) | 533 618 | 407 477 | 62.97 | 23.64 |
| PEAR (test disabled) | 411 321 | 391 157 | 60.45 | 4.90 |
| PEAR ( | 202 221 | 199 764 | 30.87 | 1.22 |
| PEAR (MAP = 0.01) | 257 409 | 251 714 | 38.90 | 2.21 |
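As a rough orientation, assuming both reads are full length (L = 101 bp) and taking the mean overlap as 2L minus the mean fragment size, this dataset has a mean overlap of about 22 bp, roughly comparable to the 20-bp simulated set above. Because fragment sizes vary around the mean, any pair whose fragment exceeds 2L = 202 bp has no overlap at all.

```latex
\bar{o} \approx 2L - \bar{F} = 2 \times 101 - 180 = 22\ \text{bp}
```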
Single-template dataset: 673 845 pairs of 108-bp reads sequenced from one 198-bp template
| Software | Merged | Correct (n) | FPR (%) | ER |
|---|---|---|---|---|
| COPE (full mode) | 0 | 0 | ||
| FLASH | 660 984 | 660 030 | 0.14 | 0.4594 |
| PANDAseq (default) | 660 593 | 657 602 | 0.45 | 0.4333 |
| PANDAseq (-o = 10) | 660 522 | 657 609 | 0.44 | 0.4304 |
| PEAR (test disabled) | 663 025 | 661 717 | 0.20 | 0.4753 |
| PEAR ( | 576 225 | 576 035 | 0.03 | 0.1470 |
| PEAR (MAP = 0.01) | 578 887 | 578 679 | 0.04 | 0.1486 |
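Because every pair in this set comes from the same 198-bp template, the overlap is fixed rather than distributed. Assuming both reads are full length (L = 108 bp), every pair overlaps by 18 bp, which places the whole dataset in panel A of Fig. 1.

```latex
o = 2L - F = 2 \times 108 - 198 = 18\ \text{bp}
```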
Fig. 2. Parallel speedups of PEAR, FLASH and PANDAseq on the single-template dataset. The sequential runtimes of the three mergers are 98, 58 and 39 s, respectively.
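Fig. 2 reports speedups relative to each tool's own sequential runtime, so equal speedups do not imply equal wall-clock times. The sketch below applies the standard definitions, speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p, to the sequential runtimes given in the caption. The 4-thread runtime of 28 s in the example is hypothetical and not read from the figure.

```python
# Sequential runtimes (seconds) reported in the Fig. 2 caption.
SEQUENTIAL_RUNTIME_S = {"PEAR": 98.0, "FLASH": 58.0, "PANDAseq": 39.0}


def speedup(t_sequential: float, t_parallel: float) -> float:
    """Parallel speedup S(p) = T(1) / T(p)."""
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, threads: int) -> float:
    """Parallel efficiency E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    return speedup(t_sequential, t_parallel) / threads


def runtime_from_speedup(t_sequential: float, s: float) -> float:
    """Wall-clock time implied by a speedup s over the sequential run."""
    return t_sequential / s


# Hypothetical example: the 28-s, 4-thread PEAR runtime is made up for illustration.
print(speedup(SEQUENTIAL_RUNTIME_S["PEAR"], 28.0))        # 3.5
print(efficiency(SEQUENTIAL_RUNTIME_S["PEAR"], 28.0, 4))  # 0.875
```

To compare tools in absolute terms, convert each speedup back to a runtime with `runtime_from_speedup`; a tool with a larger sequential runtime needs a proportionally larger speedup to finish at the same time.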