Anthony M Bolger, Marc Lohse, Bjoern Usadel.
Abstract
MOTIVATION: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data.
Year: 2014 PMID: 24695404 PMCID: PMC4103590 DOI: 10.1093/bioinformatics/btu170
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Putative sequence alignments as tested in simple mode. The alignment process begins with a partial overlap at the 5′ end of the read (A), increasing to a full-length 5′ overlap (B), followed by full overlaps at all positions (C) and finishes with a partial overlap at the 3′ end of the read (D). Note that the upstream ‘adapter’ sequence is for illustration only and is not part of the read or the aligned region.
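The sequence of alignments in Fig. 1 can be sketched in Python as a loop over adapter offsets, from a partial 5′ overlap to a partial 3′ overlap. This is a minimal illustration, not Trimmomatic's implementation: the function names, the `min_overlap` and `max_mismatches` parameters, and the plain mismatch count used in place of a real alignment score are all assumptions.

```python
def simple_mode_offsets(read, adapter, min_overlap=3):
    """Yield (offset, read_start, read_end, adapter_start) for each tested alignment.

    offset is the position of the adapter's first base relative to the read's
    first base. Negative offsets are partial overlaps at the 5' end (A, B in
    Fig. 1); large positive offsets are partial overlaps at the 3' end (D).
    """
    first = min_overlap - len(adapter)
    last = len(read) - min_overlap
    for offset in range(first, last + 1):
        read_start = max(offset, 0)
        read_end = min(offset + len(adapter), len(read))
        adapter_start = read_start - offset
        yield offset, read_start, read_end, adapter_start


def find_adapter(read, adapter, min_overlap=3, max_mismatches=0):
    """Return the read position where adapter sequence begins, or None.

    Assumption: an alignment is accepted when the overlapping bases differ in
    at most max_mismatches positions. The first accepted alignment wins.
    """
    for _, r_start, r_end, a_start in simple_mode_offsets(read, adapter, min_overlap):
        overlap = r_end - r_start
        mismatches = sum(
            read[r_start + i] != adapter[a_start + i] for i in range(overlap)
        )
        if mismatches <= max_mismatches:
            return r_start
    return None
```

A caller would keep `read[:pos]` when `find_adapter` returns a position `pos`, and keep the whole read when it returns `None`. With a very small `min_overlap`, short chance matches become likely, which is one reason a real tool needs a proper scoring threshold.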
Fig. 2. Putative sequence alignments as tested in palindrome mode. The alignment process begins with the adapters completely overlapping the reads (A) testing for immediate ‘read-through’, then proceeds by checking for later overlap (B), including partial adapter read-through (C), finishing when the overlap indicates no read-through into the adapters (D).
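The idea behind Fig. 2 can be sketched in Python by testing each candidate insert length in turn: if the insert is shorter than the reads, the start of the forward read must equal the reverse complement of the start of the reverse read, and the remaining bases of each read must match the adapter of the opposite strand. This is a simplified sketch under stated assumptions, not Trimmomatic's algorithm: the function names, the `prefix_fwd` and `prefix_rev` adapter arguments and the mismatch-count threshold are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Count differing positions over the shared length of a and b."""
    return sum(x != y for x, y in zip(a, b))


def detect_read_through(fwd, rev, prefix_fwd, prefix_rev, max_mismatches=0):
    """Return the insert length if the read pair reads through into adapters.

    Candidate insert lengths are tested from 0 (immediate read-through, stage A
    in Fig. 2) upwards. None means no read-through was found (stage D).
    """
    adapter_after_fwd = reverse_complement(prefix_rev)
    adapter_after_rev = reverse_complement(prefix_fwd)
    for k in range(min(len(fwd), len(rev))):
        errors = (
            count_mismatches(fwd[:k], reverse_complement(rev[:k]))
            + count_mismatches(fwd[k:], adapter_after_fwd)
            + count_mismatches(rev[k:], adapter_after_rev)
        )
        if errors <= max_mismatches:
            return k
    return None
```

When an insert length `k` is returned, both reads can be cut to their first `k` bases. Because the insert itself contributes to the match, even a short stretch of adapter at the end of the reads can be detected.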
Fig. 3. How Maximum Information mode combines uniqueness, coverage and error rate to determine the optimal trimming point.
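The combination described in Fig. 3 can be illustrated with a short Python sketch that scores every possible retained length and keeps the best one. The functional forms used here are assumptions made for illustration: a logistic term for uniqueness, a power of the length for coverage, and the probability that all retained bases are correct (from Phred scores) for error rate. The parameter names `target_length` and `strictness` are likewise hypothetical.

```python
import math


def max_info_trim_length(qualities, target_length=40, strictness=0.5):
    """Return the number of leading bases to keep for a list of Phred scores.

    Assumed score for keeping the first n bases:
        uniqueness(n) * n ** (1 - strictness) * p_correct(n) ** strictness
    where uniqueness is a logistic curve centred on target_length and
    p_correct is the product of (1 - 10 ** (-Q / 10)) over the kept bases.
    """
    best_length, best_score = 0, 0.0
    p_correct = 1.0
    for n, q in enumerate(qualities, start=1):
        p_correct *= 1.0 - 10.0 ** (-q / 10.0)
        uniqueness = 1.0 / (1.0 + math.exp(target_length - n))
        coverage = n ** (1.0 - strictness)
        score = uniqueness * coverage * p_correct ** strictness
        if score > best_score:
            best_length, best_score = n, score
    return best_length
```

In this sketch a higher `strictness` weights the error term more heavily and so favours shorter, cleaner reads, while a lower value favours retaining length.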
Results of alignment of raw data and data trimmed by Trimmomatic from both datasets
| Dataset/aligner | Reads | Tolerant | Strict |
|---|---|---|---|
| Dataset 1 with Bowtie2 aligner | | | |
| Unfiltered | 11 008 190 | 9 018 810 | 6 401 927 |
| Trimmomatic—adapters only | 11 008 150 | 9 117 952 | 6 510 253 |
| Trimmomatic—SW | 9 456 826 | 9 079 434 | 8 086 905 |
| Trimmomatic—MI | 9 456 826 | 9 116 627 | 8 748 376 |
| Trimmomatic—adapters and SW | 9 456 819 | 9 150 361 | 8 111 470 |
| Trimmomatic—adapters and MI | 9 456 126 | | |
| Dataset 1 with BWA aligner | | | |
| Unfiltered | 11 008 190 | 8 750 851 | 7 834 544 |
| Trimmomatic—adapters only | 11 008 150 | 8 864 884 | 7 942 198 |
| Trimmomatic—adapters and SW | 9 456 819 | 9 110 831 | 8 810 063 |
| Trimmomatic—adapters and MI | 9 456 126 | | |
| Dataset 2 with BWA aligner | | | |
| Unfiltered | 801 192 | 60 010 | 11 592 |
| Trimmomatic—adapters only | 801 164 | 121 926 | 68 177 |
| Trimmomatic—adapters and SW | 655 075 | 628 867 | 590 729 |
| Trimmomatic—adapters and MI | 658 796 | | |
Note: Adapter trimming, where done, used palindrome mode. Best values per dataset and aligner are indicated in bold. MI indicates Maximum Information mode, and SW indicates Sliding Window mode.
a Tolerant: alignment allowing some mismatches and/or INDELs. See Supplementary Methods for more details.
b Strict: aligned when no mismatches or INDELs were allowed.
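Because trimming changes the number of surviving reads, the aligned counts are easier to compare as a fraction of the reads that were passed to the aligner. The following Python sketch does this arithmetic for three rows of the Dataset 1 / Bowtie2 block. The dictionary labels and the helper name `aligned_fraction` are illustrative, and only rows with complete values are included.

```python
# Rows copied from the Dataset 1 / Bowtie2 block of the table above:
# label -> (reads, tolerant alignments, strict alignments)
ROWS = {
    "Unfiltered": (11_008_190, 9_018_810, 6_401_927),
    "Trimmomatic SW": (9_456_826, 9_079_434, 8_086_905),
    "Trimmomatic MI": (9_456_826, 9_116_627, 8_748_376),
}


def aligned_fraction(reads, aligned):
    """Fraction of the reads given to the aligner that were aligned."""
    return aligned / reads


for label, (reads, tolerant, strict) in ROWS.items():
    print(
        f"{label}: tolerant {aligned_fraction(reads, tolerant):.1%}, "
        f"strict {aligned_fraction(reads, strict):.1%}"
    )
```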
Results of Bowtie2 alignment of dataset 1 showing raw data and the trimmed data by each tool
| Dataset/alignment | Reads | Alignment (paired) | Run time |
|---|---|---|---|
| Tolerant alignment | | | |
| Unfiltered | 11 008 190 | 9 018 810 (8 323 786) | N/A |
| Fastx-Toolkit | 9 631 977 | 8 073 757 (N/A) | 670.1/356.3 |
| Reaper | 9 428 331 | 9 057 448 (N/A) | 324.8/166.8 |
| Cutadapt | 9 456 172 | 9 127 667 (N/A) | 342.5/176.7 |
| EA-Utils | 8 995 134 | 8 662 596 (8 578 790) | 9.3/ |
| Scythe/Sickle | 9 453 459 | 9 133 464 (8 636 984) | 529.3/279.7 |
| AdapterRemoval | 9 456 350 | 9 147 915 (8 689 668) | 960.2 |
| Trimmomatic SW | 9 456 819 | 9 150 361 (8 693 000) | 33.7/9.6 |
| Trimmomatic MI | 9 456 819 | | 34.3/9.7 |
| Strict alignment | | | |
| Unfiltered | 11 008 190 | 6 401 927 (4 857 606) | N/A |
| Fastx-Toolkit | 8 263 345 | 7 187 257 (N/A) | — |
| Reaper | 9 355 765 | 8 010 326 (N/A) | — |
| Cutadapt | 9 390 371 | 8 086 428 (N/A) | — |
| EA-Utils | 8 910 356 | 7 757 108 (7 056 242) | — |
| Scythe/Sickle | 9 339 668 | 8 060 612 (6 993 076) | — |
| AdapterRemoval | 9 454 189 | 8 103 596 (7 050 788) | — |
| Trimmomatic SW | 9 355 985 | 8 111 470 (7 068 406) | — |
| Trimmomatic MI | 9 456 124 | | — |
Note: Both quality modes are shown for Trimmomatic. Best values are indicated in bold. MI indicates Maximum Information mode, and SW indicates Sliding Window mode.
a Alignment (paired): total reads aligned, and the subset that are aligned as pairs.
b Run time: wall time, for both serial and parallel execution (serial/parallel). See Supplementary Methods for more details.
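The serial/parallel run times can be turned into a parallel speedup factor with a few lines of Python. This is a small sketch: the helper name `parallel_speedup` is hypothetical, and cells that do not hold two numeric values (such as `N/A`, `960.2` or `9.3/`) are deliberately reported as `None` rather than guessed.

```python
def parallel_speedup(run_time):
    """Return serial / parallel wall time for a 'serial/parallel' cell, or None."""
    parts = run_time.split("/")
    if len(parts) != 2:
        return None
    try:
        serial, parallel = (float(p) for p in parts)
    except ValueError:
        # Missing or non-numeric value, e.g. "N/A" or a truncated "9.3/"
        return None
    if parallel == 0:
        return None
    return serial / parallel


# Run time cells copied from the tolerant-alignment block of the table above
for tool, cell in [
    ("Fastx-Toolkit", "670.1/356.3"),
    ("Trimmomatic SW", "33.7/9.6"),
    ("AdapterRemoval", "960.2"),
]:
    print(tool, parallel_speedup(cell))
```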
Results of strict and tolerant BWA alignments of the raw data and trimmed data from each tool (using both quality modes for Trimmomatic) from both datasets
| Dataset | Strict alignments | Tolerant alignments |
|---|---|---|
| Dataset 1 | | |
| Unfiltered | 7 834 544 | 8 750 851 |
| Fastx-Toolkit | 7 187 257 | 7 894 580 |
| Reaper | 8 010 326 | 8 894 757 |
| Cutadapt | 8 086 428 | 8 968 519 |
| EA-Utils | 8 059 850 | 8 896 724 |
| Scythe/Sickle | 8 755 676 | 9 076 936 |
| AdapterRemoval | 8 810 051 | 9 108 691 |
| Trimmomatic SW | 8 810 063 | 9 110 831 |
| Trimmomatic MI | | |
| Dataset 2 | | |
| Unfiltered | 11 592 | 60 010 |
| AdapterRemoval | 513 133 | 574 973 |
| Fastx-Toolkit | 525 519 | 550 695 |
| EA-Utils | 538 472 | 588 046 |
| Scythe/Sickle | 567 976 | 588 135 |
| Cutadapt | 568 044 | 613 089 |
| Trimmomatic SW | 590 729 | 628 867 |
| Trimmomatic MI | | |
Note: Best values are indicated in bold. MI indicates Maximum Information mode, and SW indicates Sliding Window mode.
a Strict alignments: reads aligned, zero mismatches permitted.
b Tolerant alignments: reads aligned, one mismatch allowed.