Hongshan Jiang, Rong Lei, Shou-Wei Ding, Shuifang Zhu.
Abstract
BACKGROUND: Adapter trimming is a prerequisite step for analyzing next-generation sequencing (NGS) data when the reads are longer than the target DNA/RNA fragments. Although typically used in small RNA sequencing, adapter trimming is also widely used in other applications, such as genomic DNA sequencing and transcriptome RNA/cDNA sequencing, where fragments shorter than a read are sometimes obtained because of the limitations of NGS protocols. For the newly emerged Nextera long mate-pair (LMP) protocol, junction adapters are located in the middle of all properly constructed fragments; hence, adapter trimming is essential to obtain the correct paired reads. However, our investigations have shown that few adapter trimming tools meet both efficiency and accuracy requirements simultaneously. The performance of these tools can be even worse for paired-end and/or mate-pair sequencing.
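To make the trimming step concrete, the following is a minimal sketch of 3′ adapter removal for a single read. It is a naive scan that tolerates a fixed mismatch rate, not the algorithm of Skewer or of any other tool listed here; the function name, the `max_mismatch_rate` and `min_overlap` parameters, and the example adapter string are assumptions chosen for illustration.

```python
def trim_3prime_adapter(read: str, adapter: str,
                        max_mismatch_rate: float = 0.1,
                        min_overlap: int = 3) -> str:
    """Return the read with a full or partial 3' adapter removed.

    Hypothetical helper: scans left to right for the first position where
    the rest of the read matches a prefix of the adapter within the
    allowed mismatch rate. No indels are considered.
    """
    for start in range(len(read) - min_overlap + 1):
        # The adapter may run off the end of the read (partial adapter).
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"              # example adapter prefix (assumption)
    read = "TTGCACCGTA" + adapter[:8]      # 10 nt fragment followed by partial adapter
    print(trim_3prime_adapter(read, adapter))
```

Because short overlaps match by chance, a scan like this trades sensitivity against false trimming through `min_overlap` and the mismatch rate.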
Year: 2014 PMID: 24925680 PMCID: PMC4074385 DOI: 10.1186/1471-2105-15-182
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
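Main features of various adapter trimmers
| Tool | 5′ | 3′ | SE | PE | LMP | Multi | Ns | Q | Barcode | Merge | Gzip | MT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|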
| FastX | × | ○ | ○ | × | × | × | ○ | × | ○ | × | × | × |
| SeqTrim | × | ○ | ○ | × | × | ○ | ○ | ○ | × | × | ○ | ○ |
| TagCleaner | ○ | ○ | ○ | × | × | × | × | × | × | × | × | × |
| EA-Tools | × | ○ | ○ | ○ | × | × | ○ | ○ | ○ | × | ○ | × |
| Cutadapt | ○ | ○ | ○ | ○ | × | ○ | × | ○ | × | × | ○ | × |
| TrimGalore | × | ○ | ○ | ○ | × | × | × | ○ | × | × | ○ | × |
| SeqPrep | × | ○ | × | ○ | × | × | × | × | × | ○ | × | × |
| Btrim | ○ | ○ | ○ | ○ | × | × | × | ○ | ○ | × | × | × |
| Scythe | × | ○ | ○ | × | × | × | × | × | × | × | ○ | × |
| Flexbar | ○ | ○ | ○ | ○ | × | ○ | ○ | ○ | ○ | × | ○ | ○ |
| Trimmomatic | × | ○ | ○ | ○ | × | ○ | × | ○ | × | × | ○ | ○ |
| AdapterRemoval | ○ | ○ | ○ | ○ | × | × | ○ | ○ | × | ○ | × | × |
| AlienTrimmer | ○ | ○ | ○ | ○ | × | ○ | × | ○ | × | × | × | × |
| NextClip | × | × | × | × | ○ | × | × | × | × | × | × | × |
| Skewer | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | ○ | × | ○ | ○ |
For each method, the table shows whether it can: i) identify adapters at the 5’ end of reads, ii) identify adapters at the 3’ end of reads, iii) process single-end (SE) reads, iv) process paired-end (PE) reads, v) process Nextera long mate-pair (LMP) reads, vi) search for multiple different adapters (Multi), vii) trim runs of degenerate characters (Ns), viii) trim low-quality nucleotides (Q), ix) separate multiplexed reads based on barcodes, x) merge overlapping pairs into longer single-end reads, xi) process gzip files directly, and xii) run with multiple threads simultaneously (MT). (○: Yes; ×: No).
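Features vii) and viii) can be illustrated with a short sketch that strips trailing `N` characters and low-quality bases from the 3′ end of a read. This is a simple tail-trimming rule written for illustration, not the method of any tool in the table; the function name, the quality threshold, and the Phred+33 encoding are assumptions.

```python
from typing import Tuple


def trim_3prime_tail(seq: str, qual: str,
                     min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Remove trailing bases that are N or have Phred quality below min_q.

    Hypothetical helper: `qual` is assumed to be a Phred+33 quality string
    of the same length as `seq`.
    """
    end = len(seq)
    while end > 0:
        is_n = seq[end - 1] in "Nn"
        is_low = ord(qual[end - 1]) - offset < min_q
        if not (is_n or is_low):
            break
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(trim_3prime_tail("ACGTNN", "IIII##"))
```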
Performance of adapter trimmers on 2Gbp simulated data
| FastX | SE | 0.92 | 13.8 | 68.90 | 90.84 | 77.97 | 0.6683 |
| SeqTrim | SE | 0.03 | 115.7 | 67.07 | 85.27 | 81.24 | 0.6618 |
| TagCleaner | SE | 0.54 | 37.6 | 100.0 | 45.50 | 100.0 | 0.5898 |
| EA-Tools | SE | 12.04 | 17.7 | 59.24 | 99.72 | 61.32 | 0.6010 |
| | PE | 11.54 | 30.0 | 59.16 | 99.43 | 61.36 | 0.5983 |
| Cutadapt | SE | 4.36 | 34.5 | 94.55 | 96.27 | 96.93 | 0.9286 |
| | PE | 3.44 | 42.8 | 94.55 | 96.00 | 96.93 | 0.9266 |
| TrimGalore | SE | 3.81 | 19.4 | 59.24 | 99.72 | 61.32 | 0.6010 |
| | PE | 3.26 | 19.6 | 59.16 | 99.44 | 61.36 | 0.5984 |
| SeqPrep | PE | 0.64 | 22.0 | 99.84 | 99.82 | 99.92 | 0.9975 |
| Btrim | SE | 23.63 | 11.2 | 99.96 | 53.44 | 100.0 | 0.6503 |
| | PE | 5.79 | 15.3 | 99.89 | 53.30 | 100.0 | 0.6490 |
| Scythe | SE | 3.15 | 11.2 | 99.56 | 90.86 | 99.92 | 0.9283 |
| Flexbar | SE | 2.82 | 9.5 | 57.90 | 99.12 | 59.48 | 0.5814 |
| | PE | 2.70 | 9.7 | 57.77 | 99.09 | 59.29 | 0.5795 |
| Trimmomatic | SE | 16.73 | 2593.0 | 99.99 | 72.31 | 100.0 | 0.7907 |
| | PE | 16.40 | 2292.0 | 100.0 | 71.54 | 100.0 | 0.7850 |
| AdapterRemoval | SE | 1.67 | 6.3 | 75.09 | 97.74 | 81.89 | 0.7675 |
| | PE | 0.73 | 8.3 | 99.93 | 94.47 | 99.97 | 0.9566 |
| AlienTrimmer | SE | 1.64 | 2319.9 | 85.62 | 57.11 | 99.96 | 0.6769 |
| | PE | 1.61 | 2248.9 | 83.71 | 55.67 | 99.95 | 0.6659 |
| Skewer | SE | 8.79 | 13.6 | 94.56 | 96.32 | 96.93 | 0.9291 |
| | PE | 8.88 | 22.2 | 100.0 | 99.86 | 100.0 | 0.9989 |
Methods that process only single-end (SE) or paired-end (PE) reads are indicated.
Figure 1. ROC curves of various adapter trimmers for processing single-end reads of simulated data. ROC: receiver operating characteristic.
Figure 2. ROC curves of various adapter trimmers for processing paired-end reads of simulated data. ROC: receiver operating characteristic.
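Each point on a ROC curve pairs a false positive rate with a sensitivity. The sketch below computes one such point from per-read labels. It assumes a per-read definition in which the truth label says whether a simulated read contains an adapter and the prediction says whether the tool trimmed it; the record does not state how the authors scored reads, so this definition and the function name are illustrative only.

```python
from typing import Sequence, Tuple


def roc_point(truth: Sequence[bool],
              predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (false positive rate, sensitivity) for one parameter setting.

    Hypothetical helper: truth[i] is True if read i contains an adapter,
    predicted[i] is True if the tool trimmed read i.
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, sensitivity


if __name__ == "__main__":
    truth = [True, True, True, False, False]
    predicted = [True, True, False, True, False]
    print(roc_point(truth, predicted))
```

Repeating the calculation over a range of a tool's stringency settings would give the series of points that make up a curve.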
Figure 3. Performance of various adapter trimmers on real small RNA data [SRA:SRR014966].
Figure 4. Performance of various adapter trimmers on real paired-end data [SRA:SRR330569].
Comparison of NextClip and Skewer in processing Nextera long mate-pair (LMP) reads (ERA264981)
| None | N/A | 14097 | 19150 | 105.80 |
| Paired end (PE) only | N/A | 11781 | 23496 | 104.84 |
| PE and NextClip processed | 1480.39 | 6080 | 309342 | 111.55 |
| PE and Skewer processed | 993.49 (single thread); 155.96 (8 threads) | 5806 | 312317 | 112.52 |
Figure 5. Layout of paired-end reads that have adapter contaminants.
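When a fragment is shorter than the read length, both mates read through the fragment into adapter sequence, so the first L bases of read 1 are the reverse complement of the first L bases of read 2, where L is the fragment length. The sketch below uses that property to locate and remove the adapter tails. It requires an exact match and is only an illustration of the layout, not Skewer's implementation; the function names and the `min_len` parameter are assumptions.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_fragment_length(read1: str, read2: str,
                         min_len: int = 10) -> Optional[int]:
    """Return the fragment length implied by the mate overlap, or None.

    Hypothetical helper: tries the longest candidate first and demands an
    exact match, so sequencing errors are not tolerated.
    """
    max_len = min(len(read1), len(read2))
    for length in range(max_len, min_len - 1, -1):
        if read1[:length] == revcomp(read2[:length]):
            return length
    return None


def trim_pair(read1: str, read2: str, min_len: int = 10) -> Tuple[str, str]:
    """Cut both mates back to the fragment when read-through is detected."""
    length = find_fragment_length(read1, read2, min_len)
    if length is None:
        return read1, read2
    return read1[:length], read2[:length]


if __name__ == "__main__":
    fragment = "ACGTTGCATGCCAATG"
    r1 = fragment + "AGATCGGA"             # example adapter tails (assumption)
    r2 = revcomp(fragment) + "CTGTCTCT"
    print(trim_pair(r1, r2))
```

Using both mates in this way locates the adapter boundary without knowing the adapter sequence, at the cost of needing a minimum overlap to avoid chance matches.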