Xiaoqing Yu, Kishore Guda, Joseph Willis, Martina Veigl, Zhenghe Wang, Sanford Markowitz, Mark D Adams, Shuying Sun.
Abstract
BACKGROUND: Next-generation sequencing technologies generate large numbers of short reads that are used to address a variety of biological questions. However, sequencing reads often have low quality at the 3' end, and many are generated from repetitive regions of a genome. It is unclear how different alignment programs perform in these cases. To investigate this question, we use both real and simulated data with the above issues to evaluate the performance of four commonly used algorithms: SOAP2, Bowtie, BWA, and Novoalign.
Year: 2012 PMID: 22709551 PMCID: PMC3414812 DOI: 10.1186/1756-0381-5-6
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Algorithm of four aligners: SOAP2, Bowtie, BWA, and Novoalign
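| Aligner (version*) | SOAP2 (2.20) | Bowtie (0.12.3) | BWA (0.5.8 C) | Novoalign (2.07.00) |
|---|---|---|---|---|
| Indexing | FM-index | FM-index | FM-index | Hash table |
| Inexact match | Split read | Quality-aware backtracking | Backtracking | Alignment scoring |

*Version of the program.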
Available options in SOAP2, Bowtie, BWA, and Novoalign
| Option | SOAP2 | Bowtie | BWA | Novoalign |
|---|---|---|---|---|
| Mismatch allowed | exactly 0,1,2 | max in seed, 0-3 max in read | up to | up to 8 or more in single end; |
| Alignments reported per read | random/all/none | up to any | up to any | random/all/none/ |
| Gap alignment | 1-3 bp gap | unavailable | available | up to 7 bp |
| Pair-end reads | available | available | available | available |
| Best alignment | minimal number of mismatch | minimal number of mismatch | minimal number of mismatch | highest alignment score |
| Trim bases | 3’ end | 3’ and 5’ end | available | 3’ end** |
*Given a read of length m, BWA chooses k so that less than 4% of m-long reads with a 2% uniform base error rate have more than k mismatches. For m = 15-37 bp, k = 2; for m = 38-63 bp, k = 3; for m = 64-92 bp, k = 4; for m = 93-123 bp, k = 5; for m = 124-156 bp, k = 6.
**only available for single-end reads.
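The footnote on the mismatch limit k can be illustrated with a short calculation. The sketch below looks for the smallest k such that fewer than 4% of reads of a given length are expected to carry more than k errors at a 2% uniform base error rate. The Poisson approximation to the number of errors, the function name `max_mismatches`, and its parameters are assumptions made for illustration; this is not code taken from any of the aligners.

```python
import math


def max_mismatches(read_len, err_rate=0.02, missing_frac=0.04):
    """Smallest k with P(errors > k) < missing_frac.

    Assumption: the number of base errors in a read is modeled as
    Poisson with mean read_len * err_rate.
    """
    lam = read_len * err_rate
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    k = 0
    while 1.0 - cdf >= missing_frac:
        k += 1
        term *= lam / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return k


if __name__ == "__main__":
    for m in (37, 38, 68, 93, 124):
        print(m, max_mismatches(m))
```

Under these assumptions, a 68 bp read such as those in the real data sets gets k = 4.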
Figure 1. Mean quality score and standard deviation for each base in S1 and S2 data sets. Quality score is assessed in Illumina FASTQ format.
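A per-base quality profile of this kind can be computed directly from the quality strings of a FASTQ file. The following is a minimal sketch, not the authors' procedure. It assumes a Phred offset of 64 (exposed as a parameter, since other Illumina pipelines use 33) and uses the population standard deviation. The function name `per_base_quality` is hypothetical.

```python
from statistics import mean, pstdev


def per_base_quality(quality_strings, offset=64):
    """Return a list of (mean, sd) tuples, one per read position.

    quality_strings: iterable of FASTQ quality lines (one per read).
    offset: ASCII offset of the quality encoding (assumed 64 here).
    """
    quals = list(quality_strings)
    max_len = max(len(q) for q in quals)
    stats = []
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute.
        scores = [ord(q[pos]) - offset for q in quals if pos < len(q)]
        stats.append((mean(scores), pstdev(scores)))
    return stats


if __name__ == "__main__":
    for i, (m, sd) in enumerate(per_base_quality(["hhhB", "hhBB"]), start=1):
        print(f"base {i}: mean={m:.1f} sd={sd:.1f}")
```

A falling mean and a growing standard deviation toward the last positions would indicate the low 3' end quality discussed in the abstract.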
Figure 2. The four classes to which all reads are assigned during a pair-wise comparison. Class 1 is a group of reads each of which is assigned to the same location by aligners 1 and 2; Class 2 is a group of reads each of which is assigned to a different location by aligners 1 and 2; Class 3 is a group of reads each of which is aligned only by aligner 1; Class 4 is a group of reads each of which is aligned only by aligner 2.
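The four-class assignment can be sketched as follows. The input format (one dictionary per aligner mapping a read ID to a `(chromosome, position, strand)` tuple, containing aligned reads only) and the function name `classify_reads` are assumptions for illustration. Percentages are taken over the reads aligned by at least one of the two aligners, as in the agreement tables.

```python
from collections import Counter


def classify_reads(aln1, aln2):
    """Percentage of reads in Classes 1-4 for one pair of aligners.

    aln1, aln2: dict read_id -> (chrom, pos, strand), aligned reads only.
    """
    counts = Counter()
    for read_id in aln1.keys() | aln2.keys():
        loc1, loc2 = aln1.get(read_id), aln2.get(read_id)
        if loc1 is not None and loc2 is not None:
            # Both aligned: same location (Class 1) or different (Class 2).
            counts["class1" if loc1 == loc2 else "class2"] += 1
        elif loc1 is not None:
            counts["class3"] += 1  # aligned only by aligner 1
        else:
            counts["class4"] += 1  # aligned only by aligner 2
    total = sum(counts.values())
    classes = ("class1", "class2", "class3", "class4")
    if total == 0:
        return {c: 0.0 for c in classes}
    return {c: 100.0 * counts[c] / total for c in classes}


if __name__ == "__main__":
    a1 = {"r1": ("chr1", 100, "+"), "r2": ("chr1", 200, "+"), "r3": ("chr2", 50, "-")}
    a2 = {"r1": ("chr1", 100, "+"), "r2": ("chr5", 10, "+"), "r4": ("chrX", 7, "+")}
    print(classify_reads(a1, a2))
```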
Indexing and alignment time of four alignment programs
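| Aligner (version) | Indexing time (min) | Alignment time (min) | Mapped reads (%) |
|---|---|---|---|
| SOAP2 (2.20) | 89.50 | 15.4 | 75.96 |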
| Bowtie (0.12.3) | 192.00 | 21.2 | 75.71 |
| BWA (0.5.8 C) | 101.50 | 26.4 | 76.12 |
| Novoalign (2.07.00) | 4.02 | 62.9 | 74.61 |
The index for each aligner is built on the human genome 18 (hg18), and 7.4 million single-end reads of length 68 bp are then mapped onto it. At most two mismatches are allowed in all programs, and one alignment is randomly reported for each read. The table shows the CPU time in minutes for index building and for alignment on a dual quad-core 2.66 GHz Xeon E5430 processor, together with the percentage of mapped reads.
Percentage of reads aligned in S1 and S2 data sets by four aligners under different settings
| Setting | Aligner | S1 non-trimmed | S1 trimmed | S2 non-trimmed | S2 trimmed |
|---|---|---|---|---|---|
| Randomly report one alignment per read | SOAP2 | 75.96% | 91.45% | 42.12% | 76.81% |
| | Bowtie | 75.71% | 91.36% | 41.83% | 76.67% |
| | BWA | 76.12% | 91.80% | 41.94% | 76.88% |
| | Novoalign | 73.64% | 91.60% | 34.50% | 76.94% |
| Suppress reads with multiple alignments | SOAP2 | 71.85% | 85.90% | 39.75% | 71.31% |
| | Bowtie | 68.82% | 81.90% | 38.89% | 68.63% |
| | BWA | 74.40% | 84.07% | 39.12% | 69.75% |
| | Novoalign | 69.67% | 86.09% | 32.63% | 71.63% |
Agreement among aligners in S1 non-trimmed data
| Setting | Comparison pair¹ (total reads²) | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|
| Randomly report one alignment per read | SOAP2 vs. Bowtie (5,626,038) | 96.25% | 3.41% | 0.34% | 0.002% |
| | SOAP2 vs. BWA (5,656,559) | 95.72% | 3.40% | 0.34% | 0.54% |
| | Bowtie vs. BWA (5,637,504) | 95.80% | 3.66% | 0.00002% | 0.54% |
| | SOAP2 vs. Novoalign (5,757,260) | 85.13% | 7.32% | 5.27% | 2.28% |
| | Bowtie vs. Novoalign (5,748,724) | 85.18% | 7.26% | 5.13% | 2.47% |
| | BWA vs. Novoalign (5,835,451) | 85.20% | 7.24% | 5.37% | 2.19% |
| Suppress reads with multiple alignments | SOAP2 vs. Bowtie (5,321,512) | 95.78% | 0.00002% | 4.22% | 0.003% |
| | SOAP2 vs. BWA (5,361,466) | 96.50% | 0.0005% | 2.75% | 0.75% |
| | Bowtie vs. BWA (5,213,871) | 97.76% | 0.00% | 0.0004% | 2.24% |
| | SOAP2 vs. Novoalign (5,447,206) | 88.14% | 4.27% | 5.28% | 2.31% |
| | Bowtie vs. Novoalign (5,432,410) | 84.72% | 4.08% | 5.02% | 6.18% |
| | BWA vs. Novoalign (5,458,788) | 85.92% | 4.11% | 5.48% | 4.49% |
1. Comparison pair in the format of aligner 1 vs. aligner 2.
2. Total number of reads aligned by either of these two aligners in a comparison pair.
Agreement among aligners in S2 non-trimmed data
| Setting | Comparison pair¹ (total reads²) | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|
| Randomly report one alignment per read | SOAP2 vs. Bowtie (2,209,957) | 95.69% | 3.62% | 0.69% | 0.003% |
| | SOAP2 vs. BWA (2,215,397) | 95.45% | 3.61% | 0.69% | 0.25% |
| | Bowtie vs. BWA (2,200,129) | 95.37% | 3.70% | 0.00% | 0.25% |
| | SOAP2 vs. Novoalign (2,436,379) | 49.58% | 15.40% | 25.72% | 9.26% |
| | Bowtie vs. Novoalign (2,424,001) | 49.81% | 15.38% | 25.53% | 9.46% |
| | BWA vs. Novoalign (2,428,458) | 49.68% | 15.44% | 25.48% | 9.40% |
| Suppress reads with multiple alignments | SOAP2 vs. Bowtie (2,085,316) | 97.84% | 0.00% | 2.15% | 0.007% |
| | SOAP2 vs. BWA (2,094,218) | 97.57% | 0.0008% | 1.99% | 0.43% |
| | Bowtie vs. BWA (2,052,464) | 99.94% | 0.00% | 0.0003% | 0.59% |
| | SOAP2 vs. Novoalign (2,303,060) | 51.37% | 13.36% | 25.71% | 9.46% |
| | Bowtie vs. Novoalign (2,283,644) | 50.93% | 13.35% | 25.07% | 10.65% |
| | BWA vs. Novoalign (2,292,171) | 50.86% | 13.33% | 25.35% | 10.46% |
1. Comparison pair in the format of aligner 1 vs. aligner 2.
2. Total number of reads aligned by either of these two aligners in a comparison pair.
Agreement among aligners in S1 trimmed data
| Setting | Comparison pair¹ (total reads²) | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|
| Randomly report one alignment per read | SOAP2 vs. Bowtie (6,409,534) | 95.89% | 3.95% | 0.13% | 0.03% |
| | SOAP2 vs. BWA (6,440,873) | 95.42% | 3.92% | 0.13% | 0.52% |
| | Bowtie vs. BWA (6,432,433) | 95.30% | 4.21% | 0.00002% | 0.49% |
| | SOAP2 vs. Novoalign (6,430,033) | 94.62% | 4.84% | 0.13% | 0.35% |
| | Bowtie vs. Novoalign (6,422,084) | 94.77% | 4.84% | 0.07% | 0.33% |
| | BWA vs. Novoalign (6,435,917) | 94.83% | 4.84% | 0.30% | 0.05% |
| Suppress reads with multiple alignments | SOAP2 vs. Bowtie (6,020,802) | 95.29% | 0.0002% | 4.68% | 0.003% |
| | SOAP2 vs. BWA (6,068,512) | 96.26% | 0.0005% | 2.93% | 0.81% |
| | Bowtie vs. BWA (5,890,868) | 97.42% | 0.00% | 0.0004% | 2.58% |
| | SOAP2 vs. Novoalign (6,043,150) | 98.47% | 0.95% | 0.18% | 0.40% |
| | Bowtie vs. Novoalign (6,035,510) | 94.11% | 0.92% | 0.06% | 4.92% |
| | BWA vs. Novoalign (6,066,586) | 95.62% | 0.92% | 0.57% | 2.90% |
1. Comparison pair in the format of aligner 1 vs. aligner 2.
2. Total number of reads aligned by either of these two aligners in a comparison pair.
Agreement among aligners in S2 trimmed data
| Setting | Comparison pair¹ (total reads²) | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|
| Randomly report one alignment per read | SOAP2 vs. Bowtie (3,890,070) | 94.94% | 4.84% | 0.20% | 0.02% |
| | SOAP2 vs. BWA (3,900,529) | 94.69% | 4.82% | 0.20% | 0.29% |
| | Bowtie vs. BWA (3,892,602) | 94.77% | 4.96% | 0.0002% | 0.27% |
| | SOAP2 vs. Novoalign (3,909,055) | 93.84% | 5.32% | 0.34% | 0.50% |
| | Bowtie vs. Novoalign (3,901,709) | 94.50% | 5.30% | 0.15% | 0.50% |
| | BWA vs. Novoalign (3,908,656) | 93.96% | 5.30% | 0.33% | 0.41% |
| Suppress reads with multiple alignments | SOAP2 vs. Bowtie (3,611,489) | 96.20% | 0.0002% | 3.79% | 0.02% |
| | SOAP2 vs. BWA (3,636,423) | 96.42% | 0.0007% | 2.87% | 0.70% |
| | Bowtie vs. BWA (3,531,986) | 98.38% | 0.00% | 0.0007% | 1.62% |
| | SOAP2 vs. Novoalign (3,638,616) | 98.07% | 0.54% | 0.32% | 0.76% |
| | Bowtie vs. Novoalign (3,631,179) | 95.06% | 0.52% | 0.11% | 4.31% |
| | BWA vs. Novoalign (3,652,782) | 97.99% | 0.54% | 0.70% | 3.31% |
1. Comparison pair in the format of aligner 1 vs. aligner 2.
2. Total number of reads aligned by either of these two aligners in a comparison pair.
Percentage of aligned reads and the false alignment rate for 3000 exon simulation data
| Mismatch | Settings | Measure | SOAP2 | Bowtie | BWA | Novoalign |
|---|---|---|---|---|---|---|
| 1 | Randomly report one alignment | Aligned (%) | 100 | 100 | 100 | 100 |
| | | False alignments (%) | 0.76 | 0.77 | 0.76 | 4.83 |
| | Suppress reads with multiple alignments | Aligned (%) | 98.69 | 98.65 | 98.68 | 98.69 |
| | | False alignments (%) | 0 | 0 | 0 | 4.13 |
| 2 | Randomly report one alignment | Aligned (%) | 100 | 100 | 100 | 100 |
| | | False alignments (%) | 0.78 | 0.78 | 0.76 | 8.95 |
| | Suppress reads with multiple alignments | Aligned (%) | 98.69 | 98.68 | 98.68 | 98.67 |
| | | False alignments (%) | 0 | 0 | 0 | 8.26 |

| Mismatch | Settings | Measure | SOAP2 | Bowtie | BWA | Novoalign |
|---|---|---|---|---|---|---|
| 1 | Randomly report one alignment | Aligned (%) | 100 | 100 | 100 | 100 |
| | | False alignments (%) | 0.77 | 0.75 | 0.76 | 3.10 |
| | Suppress reads with multiple alignments | Aligned (%) | 98.69 | 98.65 | 98.68 | 98.69 |
| | | False alignments (%) | 0 | 0 | 0 | 4.13 |
| 2 | Randomly report one alignment | Aligned (%) | 100 | 100 | 100 | 100 |
| | | False alignments (%) | 0.77 | 0.81 | 0.76 | 5.49 |
| | Suppress reads with multiple alignments | Aligned (%) | 98.69 | 98.68 | 98.68 | 98.67 |
| | | False alignments (%) | 0.02 | 0 | 0 | 4.78 |
Percentage of aligned reads and the false alignment rate for 218 CpG island simulation data
| Mismatch | Settings | Measure | SOAP2 | Bowtie | BWA | Novoalign |
|---|---|---|---|---|---|---|
| 1 | Randomly report one alignment | Aligned (%) | 100 | 100 | 100 | 100 |
| | | False alignments (%) | 13.80 | 13.84 | 13.80 | 17.25 |
| | Suppress reads with multiple alignments | Aligned (%) | 84.26 | 84.26 | 84.26 | 84.34 |
| | | False alignments (%) | 0 | 0 | 0.01 | 4.09 |
| 2 | Randomly report one alignment | Aligned (%) | 100 | 100 | 100 | 100 |
| | | False alignments (%) | 13.90 | 13.98 | 13.91 | 20.77 |
| | Suppress reads with multiple alignments | Aligned (%) | 84.39 | 84.22 | 84.39 | 84.23 |
| | | False alignments (%) | 0.21 | 0 | 0.02 | 8.20 |

| Mismatch | Settings | Measure | SOAP2 | Bowtie | BWA | Novoalign |
|---|---|---|---|---|---|---|
| 1 | Randomly report one alignment | Aligned (%) | 100 | 100 | 100 | 100 |
| | | False alignments (%) | 13.79 | 13.83 | 13.80 | 15.93 |
| | Suppress reads with multiple alignments | Aligned (%) | 84.26 | 84.26 | 84.26 | 84.34 |
| | | False alignments (%) | 0 | 0 | 0.001 | 2.42 |
| 2 | Randomly report one alignment | Aligned (%) | 100 | 100 | 100 | 100 |
| | | False alignments (%) | 13.82 | 13.86 | 13.91 | 17.79 |
| | Suppress reads with multiple alignments | Aligned (%) | 84.39 | 84.22 | 84.39 | 84.23 |
| | | False alignments (%) | 0.21 | 0 | 0.02 | 4.86 |
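For simulated reads, the two measures in the simulation tables can be computed once each read's true origin is known. The sketch below is one possible formulation, not the authors' exact definition. It assumes a read is falsely aligned when its reported location differs from its origin, and it takes the false alignment rate over aligned reads only. The function name `simulation_metrics` and the input format are hypothetical.

```python
def simulation_metrics(truth, reported):
    """Return (percent aligned, percent falsely aligned).

    truth:    dict read_id -> (chrom, pos), true origin of each simulated read.
    reported: dict read_id -> (chrom, pos), or None for unaligned reads.
    Assumption: any location mismatch counts as a false alignment, and
    the false alignment rate is a fraction of aligned reads.
    """
    aligned = {rid: loc for rid, loc in reported.items() if loc is not None}
    if not truth or not aligned:
        return 0.0, 0.0
    pct_aligned = 100.0 * len(aligned) / len(truth)
    n_false = sum(1 for rid, loc in aligned.items() if truth.get(rid) != loc)
    pct_false = 100.0 * n_false / len(aligned)
    return pct_aligned, pct_false


if __name__ == "__main__":
    truth = {"r1": ("chr1", 10), "r2": ("chr1", 20), "r3": ("chr2", 5), "r4": ("chr3", 7)}
    reported = {"r1": ("chr1", 10), "r2": ("chr9", 99), "r3": None, "r4": ("chr3", 7)}
    print(simulation_metrics(truth, reported))
```

A stricter or looser criterion (for example, tolerating a few bases of offset) would change the reported rates, so the matching rule should be stated alongside any such numbers.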
Figure 3. Mapping quality scores reported in Novoalign and BWA. Alignment is performed on both the untrimmed S1 and S2 data sets, with one alignment randomly reported for each read.