Ruibang Luo, Jeanno Cheung, Edward Wu, Heng Wang, Sze-Hang Chan, Wai-Chun Law, Guangzhu He, Chang Yu, Chi-Man Liu, Dazong Zhou, Yingrui Li, Ruiqiang Li, Jun Wang, Xiaoqian Zhu, Shaoliang Peng, Tak-Wah Lam.
Abstract
BACKGROUND: Short-read aligners have recently gained a lot of speed by exploiting the massive parallelism of GPUs. An emerging alternative to the GPU is the Intel MIC; Tianhe-2, the supercomputer currently at the top of the TOP500, is built with 48,000 MIC boards to offer ~55 PFLOPS. The CPU-like architecture of MIC allows CPU-based software to be parallelized easily; however, the performance is often inferior to that of GPU counterparts, as an MIC card contains only ~60 cores (while a GPU card typically has over a thousand cores).
Year: 2015 PMID: 25952019 PMCID: PMC4423751 DOI: 10.1186/1471-2105-16-S7-S10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. MICA's architecture: interaction between the host and MIC cards.
Figure 2. Traditional approach to Dynamic Programming.
Figure 3. MICA's approach to Dynamic Programming.
Figure 4. Illustration of branching mismatches (BM) and non-branching mismatches (NBM).
MICA and SOAP3-dp seeding details.
| MICA: | |||
|---|---|---|---|
| Round 0 | 140 | 0 | 1,000 |
| Round 1 | 80 | 0 | 1,000 |
| Round 2 | 46 | 0 | 100 |
| SOAP3-dp DP module: | |||
| Round 1 | 28 | 20 | 100 |
| Round 2 | 32 | 28 | 1,000 |
"Seed Overlap" means the maximum overlap between two seeds when extracting seeds from a read.
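To make the "Seed Overlap" parameter concrete, the following is a minimal sketch of one way seeds could be extracted from a read under a maximum-overlap constraint. It assumes seeds are taken left to right with a fixed stride of seed length minus overlap; this layout, the function name `extract_seeds`, and its parameters are illustrative assumptions and not the extraction scheme of MICA or SOAP3-dp.

```python
def extract_seeds(read, seed_length, max_overlap):
    """Return (start, seed) tuples taken left to right from `read`.

    Assumption: consecutive seeds start seed_length - max_overlap bases apart,
    so neighbouring seeds share at most max_overlap bases.
    """
    if not 0 <= max_overlap < seed_length:
        raise ValueError("max_overlap must be in [0, seed_length)")
    stride = seed_length - max_overlap
    last_start = len(read) - seed_length  # negative if the read is too short
    return [(start, read[start:start + seed_length])
            for start in range(0, last_start + 1, stride)]


if __name__ == "__main__":
    read = "ACGT" * 25  # a hypothetical 100 bp read
    seeds = extract_seeds(read, seed_length=28, max_overlap=20)
    print(len(seeds), [start for start, _ in seeds])
```

Under this assumed layout, a seed length of 28 with an overlap of 20 yields a seed every 8 bases, whereas an overlap of 0 yields non-overlapping seeds.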
Performance of MICA, SOAP3-dp and BWA-MEM on experimental data.
| Dataset | Volume (Gbp) | # of Read Pairs (M) | Fold | MICA: 1 card | MICA: 2 cards | MICA: 2 cards scale-up | MICA: 3 cards | MICA: 3 cards scale-up | MICA: properly paired | SOAP3-dp: 1 card | SOAP3-dp: properly paired | BWA-MEM: 6 cores | BWA-MEM: properly paired |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PE150 | 232.15 | 773.83 | 77.38 | 20,919 s | 10,618 s | 1.97x | 7,183 s | 2.91x | 95.48% | 25,878 s | 95.01% | 101,466 s | 92.32% |
| PE100 | 148.43 | 742.16 | 49.48 | 15,879 s | 8,093 s | 1.96x | 5,453 s | 2.91x | 97.23% | 17,982 s | 97.08% | 53,832 s | 95.74% |
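The scale-up figures in this table follow from the elapsed times as the ratio of the 1-card time to the n-card time. The sketch below recomputes them from the MICA timings and also derives a parallel efficiency (scale-up divided by card count); the efficiency is an added illustrative quantity that the table does not report, and the function and variable names are assumptions.

```python
# MICA elapsed times in seconds, copied from the table, keyed by MIC card count.
MICA_SECONDS = {
    "PE150": {1: 20919, 2: 10618, 3: 7183},
    "PE100": {1: 15879, 2: 8093, 3: 5453},
}


def scale_up(times, cards):
    """Return the speedup of `cards` cards relative to a single card."""
    return times[1] / times[cards]


def efficiency(times, cards):
    """Return scale-up divided by card count (1.0 would be perfectly linear)."""
    return scale_up(times, cards) / cards


if __name__ == "__main__":
    for dataset, times in MICA_SECONDS.items():
        for cards in (2, 3):
            print(f"{dataset}, {cards} cards: "
                  f"{scale_up(times, cards):.2f}x scale-up, "
                  f"{efficiency(times, cards):.1%} efficiency")
```

For both datasets the recomputed scale-up rounds to the reported values (1.97x and 2.91x for PE150, 1.96x and 2.91x for PE100).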
Comparison on 13 sets of programs and parameters using 100 bp paired-end simulated reads.
| 6M 100 bp Paired-end reads, 1.2 Gbp bases. 500 bp insert size, 25 bp standard deviation. | MIC | GPU | CPU | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MICA (1 MIC card, 240 threads) | SOAP3-dp | SOAP3 | Bowtie2 | Bowtie2 | Bowtie2 | BWA¹ | SeqAlto | SeqAlto | CUSHAW2 | GEM² | GEM² | GEM² | |||
| Configuration | CPU (thread: core i7-3930k) | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | ||
| GPU (device: GTX680) | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||
| Computational Resources | Total Elapsed | sec. | 162 | 132 | 966 | 1974 | 672 | 1154 | 495 | 379 | 1303 | 416 | 446 | 298 | |
| Fold | 1.23 | 1.00 | 7.32 | 14.95 | 5.09 | 8.74 | 3.75 | 2.87 | 9.87 | 3.15 | 3.38 | 2.26 | |||
| Loading Index³ | sec. | 74 | 74 | 38 | 38 | 38 | 53+1+1 | 96 | 96 | 40 | 40+1 | 40+1 | 40+1 | ||
| Alignment⁴ | sec. | 88 | 58 | 928 | 1936 | 634 | 370+369+360 | 399 | 283 | 1263 | 199+176 | 238+167 | 90+167 | ||
| Fold | 0.88 | 0.58 | 9.28 | 19.36 | 6.34 | 10.99 | 3.99 | 2.83 | 12.63 | 3.75 | 4.05 | 2.57 | |||
| Avg. Memory | GB | 17.2 | 17.3 | 3.3 | 3.3 | 3.3 | 3.5 | 7 | 6.9 | 3.6 | 4.3 | 4.3 | 4.3 | ||
| Peak Memory | GB | 18.1 | 19.2 | 3.5 | 3.5 | 3.5 | 4.8 | 7.2 | 7.2 | 3.6 | 4.3 | 4.3 | 4.3 | ||
| Alignment Metrics | Aligned | # | 11,999,827 | 11,870,740 | 11,999,763 | 11,999,936 | 11,998,226 | 11,998,804 | 12,000,000 | 11,995,872 | 11,999,975 | 11,999,763 | 11,999,484 | 11,995,422 | |
| Diff. | 17 | -129,070 | -47 | 126 | -1,584 | -1,006 | 190 | -3,938 | 165 | -47 | -326 | -4,388 | |||
| Properly Paired | # | 11,999,460 | 11,742,902 | 11,998,912 | 11,999,344 | 11,996,528 | 11,997,254 | 11,999,976 | 11,995,410 | 11,977,218 | 11,998,994 | 11,997,702 | 11,991,992 | ||
| Diff. | -208 | -256,766 | -756 | -324 | -3,140 | -2,414 | 308 | -4,258 | -22,450 | -674 | -1,966 | -7,676 | |||
| Incorrectly Aligned | # | 40,561 | 138,655 | 143,012 | 141,373 | 147,764 | 85,297 | 95,672 | 99,243 | 99,243 | 56,514 | 61,642 | 61,887 | ||
| Diff. | -7,623 | 90,471 | 94,828 | 93,189 | 99,580 | 37,113 | 47,488 | 51,059 | 51,059 | 8,330 | 13,458 | 13,703 | |||
| Sensitivity⁵ | % | 99.66% | 97.77% | 98.81% | 98.82% | 98.75% | 99.28% | 99.20% | 99.14% | 99.17% | 99.53% | 99.48% | 99.45% | ||
| Diff. | 0.06% | -1.83% | -0.79% | -0.78% | -0.85% | -0.32% | -0.40% | -0.46% | -0.43% | -0.07% | -0.12% | -0.15% | |||
| FDR⁶ | % | 0.34% | 1.17% | 1.19% | 1.18% | 1.23% | 0.71% | 0.80% | 0.83% | 0.83% | 0.47% | 0.51% | 0.52% | ||
| Diff. | -0.06% | 0.77% | 0.79% | 0.78% | 0.83% | 0.31% | 0.40% | 0.43% | 0.43% | 0.07% | 0.11% | 0.12% | |||
1 The time consumption of BWA is calculated as "align left reads" + "align right reads" + "sampe". The index loading times of the "align right reads" and "sampe" modules are 1 second each because the index files were cached during "align left reads". However, datasets larger than the host memory will flush the cache during alignment.
2 The alignment time consumption of GEM is calculated as "alignment" + "convert to SAM format". The conversion module was run with 4 threads, consistent with the alignment module.
3 The SOAP3-dp, SOAP3, SeqAlto and GEM aligners explicitly report their index loading time. The index loading times for Bowtie2, CUSHAW2 and BWA are calculated as the total index size divided by 100 MB/s, which is the average network file system speed of the testing environment. These index loading times may be underestimated because the time spent processing the index was not counted.
4 The alignment times were either explicitly provided by the aligners (including result processing and input/output time) or calculated as the total elapsed time minus the estimated index loading time.
5 Sensitivity is calculated as "Correctly aligned reads"/"All simulated reads". The higher the better.
6 FDR is calculated as "Incorrectly aligned reads"/"All aligned reads". The lower the better.
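The definitions in notes 3, 5 and 6 can be written out as a short sketch. It assumes that "correctly aligned reads" equals aligned reads minus incorrectly aligned reads, and that the 6M simulated read pairs correspond to 12,000,000 reads; the function names and the 3,800 MB index size in the last line are hypothetical and chosen only for illustration.

```python
def sensitivity(aligned, incorrectly_aligned, simulated):
    """Correctly aligned reads / all simulated reads (note 5)."""
    # Assumption: correctly aligned = aligned - incorrectly aligned.
    return (aligned - incorrectly_aligned) / simulated


def fdr(aligned, incorrectly_aligned):
    """Incorrectly aligned reads / all aligned reads (note 6)."""
    return incorrectly_aligned / aligned


def estimated_index_load_seconds(index_size_mb, throughput_mb_per_s=100.0):
    """Total index size divided by file system throughput (note 3)."""
    return index_size_mb / throughput_mb_per_s


if __name__ == "__main__":
    simulated_reads = 12_000_000  # 6M pairs, two reads per pair
    # Counts from the first data column of the table.
    aligned, incorrect = 11_999_827, 40_561
    print(f"sensitivity = {sensitivity(aligned, incorrect, simulated_reads):.2%}")  # 99.66%
    print(f"FDR = {fdr(aligned, incorrect):.2%}")  # 0.34%
    # Hypothetical 3,800 MB index read at the stated 100 MB/s.
    print(f"index load = {estimated_index_load_seconds(3800):.0f} s")  # 38 s
```

With the counts from the first data column, the sketch reproduces the 99.66% sensitivity and 0.34% FDR listed in the table, which suggests the assumed definition of correctly aligned reads matches the one used there.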
Results of experiments on Tianhe-2 using 5 different settings.
| Setting | MIC cards used | Output format | Finished lanes | Longest (sec) | Mean (sec) | Median (sec) |
|---|---|---|---|---|---|---|
| 1 | 3 | SAM | 923 | 3,425 | 1,303 | 1,220 |
| 2 | 3 | BAM | 929 | 6,012 | 2,777 | 2,666 |
| 3 | 3 | 6-thread BAM | 928 | 4,484 | 1,371 | 1,321 |
| 4 | 2 | 6-thread BAM | 931 | 4,269 | 1,508 | 1,475 |
| 5 | 1 | 6-thread BAM | 930 | 6,915 | 2,943 | 2,879 |
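One way to read this table is to express each setting's mean time relative to the 3-card SAM run (setting 1). The sketch below does that arithmetic; the names `MEAN_SECONDS` and `relative_time`, and the interpretation of the mean as a per-lane time, are assumptions made for illustration.

```python
# Mean times in seconds from the table, keyed by setting number.
MEAN_SECONDS = {
    1: 1303,  # 3 cards, SAM
    2: 2777,  # 3 cards, BAM
    3: 1371,  # 3 cards, 6-thread BAM
    4: 1508,  # 2 cards, 6-thread BAM
    5: 2943,  # 1 card, 6-thread BAM
}


def relative_time(setting, baseline=1):
    """Return the mean time of `setting` as a multiple of the baseline setting."""
    return MEAN_SECONDS[setting] / MEAN_SECONDS[baseline]


if __name__ == "__main__":
    for setting in sorted(MEAN_SECONDS):
        print(f"setting {setting}: {relative_time(setting):.2f}x of setting 1")
```

With three cards, BAM output takes roughly 2.13 times the mean time of SAM output, and 6-thread BAM output brings that down to roughly 1.05 times.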