Zeyu Xia1, Yingbo Cui2, Ang Zhang1, Tao Tang1, Lin Peng1, Chun Huang1, Canqun Yang1, Xiangke Liao1.
Abstract
Rapid advances in sequencing technology have led to an explosion of sequence data. Sequence alignment is the central and fundamental problem in many sequence analysis procedures, and local alignment is often the kernel of these algorithms. The Smith-Waterman algorithm is usually used to find the best subsequence match between given sequences. However, its high time complexity makes it time-consuming. Many approaches have been developed to accelerate and parallelize it, such as vector-level parallelization, thread-level parallelization, process-level parallelization, and heterogeneous acceleration, but current research appears unsystematic, which hinders further work on parallelizing the algorithm. In this paper, we summarize the current research status of parallel local alignment and describe the data layouts used in these works. Based on this research status, we emphasize large-scale genomic comparisons. By surveying the performance of some typical alignment tools, we discuss possible future directions. We hope our work will provide developers of alignment tools with technical principle support and help researchers choose proper alignment tools.
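As a point of reference for the Smith-Waterman algorithm named in the abstract, the following is a minimal scalar sketch of local alignment scoring with affine gaps. The scoring parameters (`match`, `mismatch`, `gap_open`, `gap_extend`) and the function name are assumptions chosen for illustration, not values from the paper. The nested loops visit every cell of the alignment matrix, which is the quadratic cost that the parallel approaches aim to reduce.

```python
NEG_INF = float("-inf")


def smith_waterman_score(query, target, match=2, mismatch=-1,
                         gap_open=3, gap_extend=1):
    """Best local alignment score with affine gaps (illustrative parameters).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    m = len(query)
    h_prev = [0] * (m + 1)        # H values of the previous column
    e = [NEG_INF] * (m + 1)       # E: gaps that advance along the target
    best = 0
    for j in range(1, len(target) + 1):
        h_cur = [0] * (m + 1)
        f = NEG_INF               # F: gaps that advance along the query
        for i in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            e[i] = max(e[i] - gap_extend, h_prev[i] - gap_open)
            f = max(f - gap_extend, h_cur[i - 1] - gap_open)
            # Local alignment: scores are clamped at zero.
            h_cur[i] = max(0, h_prev[i - 1] + s, e[i], f)
            best = max(best, h_cur[i])
        h_prev = h_cur
    return best
```

Each cell depends on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another.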
Keywords: Inter-sequence alignment; Intra-sequence alignment; Smith–Waterman algorithm; Vector-level parallelization
Year: 2021 PMID: 34487327 PMCID: PMC8419822 DOI: 10.1007/s12539-021-00473-0
Source DB: PubMed Journal: Interdiscip Sci ISSN: 1867-1462 Impact factor: 3.492
Fig. 1 Comparison between scalar operation and vector operation
Fig. 2 Distributed and shared memory system model
Fig. 3 Data dependencies in the alignment matrix
Fig. 4 Three intra-sequence alignment approaches
Fig. 5 Sequential layout and striped layout
Fig. 6 Data dependencies of matrix H and E in striped layout
Fig. 7 Data dependencies of the first and last H vectors between the adjacent columns
Fig. 8 Data dependencies of the F vectors on each column
Fig. 9 Inter-sequence alignment
Fig. 10 Blocks of target sequence computed simultaneously
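The difference between the sequential and striped layouts named in the Fig. 5 caption can be sketched as an index mapping. This is an illustration under stated assumptions, not the paper's code: it follows the striped arrangement as commonly described for Farrar's striped Smith-Waterman, and the function names and the use of `None` for padding are hypothetical.

```python
def sequential_layout(query_len, lanes):
    """Vector k holds consecutive query positions k*lanes .. k*lanes+lanes-1."""
    n_vectors = -(-query_len // lanes)  # ceiling division
    return [
        [p if p < query_len else None
         for p in range(k * lanes, (k + 1) * lanes)]
        for k in range(n_vectors)
    ]


def striped_layout(query_len, lanes):
    """Vector i holds positions i, i + t, i + 2t, ... with t = segment length.

    Lane j of vector i maps to query position j * t + i, so neighbouring
    query positions fall into different vectors rather than adjacent lanes.
    """
    seg_len = -(-query_len // lanes)  # ceiling division
    return [
        [(lane * seg_len + i) if (lane * seg_len + i) < query_len else None
         for lane in range(lanes)]
        for i in range(seg_len)
    ]
```

For a query of length 8 and 4 lanes, the sequential layout gives `[[0, 1, 2, 3], [4, 5, 6, 7]]`, while the striped layout gives `[[0, 2, 4, 6], [1, 3, 5, 7]]`.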
Some typical alignment tools
| Tool name | Year | Architecture | Methods | Hardware | Speed (GCUPS) |
|---|---|---|---|---|---|
| STRIPED | 2006 | CPU | Striped | Dual Intel Xeon X5650 CPU @ 2.67 GHz | 14.7 |
| SWIPE | 2011 | CPU | Many-to-one | Dual Intel Xeon X5650 CPU @ 2.67 GHz | 106.2 |
| | | | | Intel Xeon E5-2695 v3 @ 2.3 GHz | 220.0 |
| SeqAn | 2018 | CPU | Many-to-many | Dual Intel Xeon Gold 6148 CPU @ 2.4 GHz | 194.1 |
| SWAPHI-LS | 2014 | Xeon Phi | Anti-diagonal | Xeon Phi 5110P @ 1.05 GHz | 29.2 |
| XSW | 2014 | Xeon Phi | Many-to-one | Xeon Phi 3120P @ 1.1 GHz | 50.0 |
| CUDASW++ 3.0 | 2013 | CPU + GPU | Many-to-one | Xeon E5-2670 @ 2.6 GHz + Tesla K20c | 298.8 |
| | | | | Xeon E5-2695 v3 @ 2.3 GHz + Tesla K20c | 206.2 |
| OSWALD | 2015 | CPU + FPGA | Many-to-one | Xeon E5-2670 @ 2.6 GHz + Altera Stratix V | 178.9 |
| | | | | Xeon E5-2695 v3 @ 2.3 GHz + Altera Stratix V | 401.1 |
| SWIMM | 2015 | CPU + Xeon Phi | Many-to-one | Xeon E5-2670 @ 2.6 GHz | 127.5 |
| | | | | Xeon E5-2695 v3 @ 2.3 GHz | 354.8 |
| | | | | Xeon E5-2670 @ 2.6 GHz + Xeon Phi 3120P @ 1.1 GHz | 165.5 |
| | | | | Xeon E5-2695 v3 @ 2.3 GHz + Xeon Phi 3120P @ 1.1 GHz | 450.5 |
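The speed column is given in GCUPS (giga cell updates per second), which is conventionally the number of alignment-matrix cells filled per second, divided by 10^9. A minimal sketch of that calculation follows; the function name and the example numbers are hypothetical and are not measurements from the table.

```python
def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second for one query against a database.

    Cells updated = query length * total residues in the database.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return query_len * db_residues / seconds / 1e9
```

For instance, a 1,000-residue query searched against a database of 10^9 residues in 10 seconds corresponds to `gcups(1000, 10**9, 10.0)`, that is, 100 GCUPS.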