Yongchao Liu, Bertil Schmidt, Douglas L Maskell.
Abstract
BACKGROUND: Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, its quadratic time complexity makes it highly time-consuming, and the exponential growth of biological databases further worsens the situation. To accelerate this algorithm, many efforts have been made to develop techniques for high-performance architectures, especially the recently emerging many-core architectures and their associated programming models.
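The quadratic cost comes from filling an m × n dynamic-programming matrix, one cell per pair of query and database residues. The minimal Python sketch below computes the Smith-Waterman local alignment score with affine gap penalties using the standard Gotoh recurrences (H, E, F). It is an illustration, not the CUDASW++ code. The `score` callback, the toy match/mismatch scores, and the convention that a gap of length k costs `gap_open + k * gap_extend` are all assumptions.

```python
NEG_INF = float("-inf")


def smith_waterman(a, b, score, gap_open=10, gap_extend=2):
    """Best local alignment score of sequences a and b with affine gap penalties.

    H[i][j]: best score of a local alignment ending at (i, j).
    E: best score ending with a gap in `a` (horizontal move along row i).
    F[j]: best score ending with a gap in `b` (vertical move down column j).
    A gap of length k is assumed to cost gap_open + k * gap_extend.
    Only two rows of H are kept: O(len(b)) memory, O(len(a) * len(b)) time.
    """
    n = len(b)
    h_prev = [0] * (n + 1)      # row i-1 of H (row 0 is all zeros)
    f = [NEG_INF] * (n + 1)     # F values carried down from row i-1
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (n + 1)   # column 0 of H is all zeros
        e = NEG_INF
        for j in range(1, n + 1):
            # Extend an existing horizontal gap, or open a new one from H[i][j-1].
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            # Extend an existing vertical gap, or open a new one from H[i-1][j].
            f[j] = max(f[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            # Local alignment: scores are floored at 0.
            h_cur[j] = max(0, h_prev[j - 1] + score(a[i - 1], b[j - 1]), e, f[j])
            best = max(best, h_cur[j])
        h_prev = h_cur
    return best


def match_mismatch(x, y):
    """Toy substitution score standing in for a matrix such as BLOSUM62."""
    return 5 if x == y else -10


if __name__ == "__main__":
    print(smith_waterman("AAAAATTTTT", "AAAAACTTTTT", match_mismatch))
```

With the toy scores, aligning AAAAATTTTT against AAAAACTTTTT scores 38: ten matches (50) minus one length-1 gap (10 + 2). That beats the best ungapped alignment, which has to absorb a -10 mismatch.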
Year: 2010 PMID: 20370891 PMCID: PMC2907862 DOI: 10.1186/1756-0500-3-93
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Pseudocode of the CUDA kernel for the optimized SIMT algorithm.
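Figure 1 itself is not reproduced here. The Python cost model below is a hypothetical illustration of the inter-task mapping usually meant by a SIMT database-search kernel, in which each thread aligns the query against a different database sequence. It estimates how much work a warp wastes when its threads receive sequences of different lengths. The one-sequence-per-thread assignment, the lockstep-warp cost model and the function names are assumptions, not details taken from the paper.

```python
import random

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def occupied_cells(db_lengths, query_len, warp_size=WARP_SIZE):
    """Cost model for an inter-task (one alignment per thread) kernel.

    Consecutive database sequences are assumed to go to consecutive threads.
    Because a warp executes in lockstep, every lane stays occupied until the
    longest sequence in its warp has been aligned against the query.
    """
    total = 0
    for start in range(0, len(db_lengths), warp_size):
        warp = db_lengths[start:start + warp_size]
        total += query_len * max(warp) * len(warp)
    return total


def efficiency(db_lengths, query_len, warp_size=WARP_SIZE):
    """Fraction of occupied lane-cells that are real Smith-Waterman cell updates."""
    useful = query_len * sum(db_lengths)
    return useful / occupied_cells(db_lengths, query_len, warp_size)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical database length distribution, for illustration only.
    db = [rng.randint(50, 3000) for _ in range(10_000)]
    print(f"unsorted: {efficiency(db, 500):.2f}")
    print(f"sorted:   {efficiency(sorted(db), 500):.2f}")
```

Under this model, sorting the database by length before handing sequences to threads never increases the occupied cells, because each warp then holds sequences of similar length. This makes presorting a natural optimization for this kind of kernel.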
Figure 2. Alignment matrix of the partitioned vectorized algorithm and data dependencies for the H and F vectors.
Figure 3. Pseudocode of the CUDA kernel for the partitioned vectorized Smith-Waterman algorithm.
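Figures 2 and 3 are not reproduced here either. Farrar's striped layout is one well-known way to vectorize Smith-Waterman along the query. The sketch below assumes the partitioned vectorized kernel stripes query residues across the lanes of a (virtual) SIMD vector in a similar way. It shows which query position each lane holds and builds the matching per-residue score profile. The `lanes` parameter, the `pad` placeholder and the helper names are hypothetical.

```python
import math


def striped_layout(query_len, lanes):
    """Map query positions to (vector index, lane) in Farrar-style striped order.

    Position i goes to vector i % seg_len, lane i // seg_len, so each lane holds
    a contiguous segment of seg_len = ceil(query_len / lanes) query residues.
    Slots past the end of the query are padding (None).
    """
    seg_len = math.ceil(query_len / lanes)
    vectors = [[None] * lanes for _ in range(seg_len)]
    for i in range(query_len):
        vectors[i % seg_len][i // seg_len] = i
    return vectors


def striped_profile(query, subject_residue, score, lanes, pad=0):
    """Scores of one subject residue against every query residue, in striped order.

    Padding slots get `pad`. They only affect cells past the end of the query,
    since dependencies along the query flow toward larger positions.
    """
    return [
        [pad if pos is None else score(query[pos], subject_residue) for pos in vec]
        for vec in striped_layout(len(query), lanes)
    ]


if __name__ == "__main__":
    for vec in striped_layout(10, 4):
        print(vec)
```

In this layout lane k holds the query segment starting at k × seg_len. The H and F values that a vector needs from the previous query position therefore sit in the previous vector of the same lane. Only at vector 0 must a value move from lane k - 1 to lane k. This cross-lane dependency is what a striped kernel has to correct for (Farrar's "lazy F" loop).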
Performance evaluation of the optimized SIMT and partitioned vectorized algorithms on GTX 280
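| Query | Length | Partitioned 10-2 k: Time (s) | Partitioned 10-2 k: GCUPS | Partitioned 20-2 k: Time (s) | Partitioned 20-2 k: GCUPS | Partitioned 40-3 k: Time (s) | Partitioned 40-3 k: GCUPS | SIMT 10-2 k: Time (s) | SIMT 10-2 k: GCUPS |
|---|---|---|---|---|---|---|---|---|---|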
| P02232 | 144 | 1.58 | 13.3 | 1.41 | 14.9 | 1.40 | 15.0 | 1.38 | 15.2 |
| P05013 | 189 | 1.80 | 15.4 | 1.66 | 16.7 | 1.65 | 16.8 | 1.75 | 15.8 |
| P14942 | 222 | 2.01 | 16.1 | 1.84 | 17.6 | 1.82 | 17.8 | 2.00 | 16.2 |
| P07327 | 375 | 3.97 | 13.8 | 3.64 | 15.1 | 3.51 | 15.6 | 3.35 | 16.4 |
| P01008 | 464 | 4.57 | 14.8 | 4.20 | 16.1 | 4.03 | 16.8 | 4.05 | 16.7 |
| P03435 | 567 | 5.87 | 14.1 | 5.38 | 15.4 | 5.28 | 15.7 | 4.94 | 16.4 |
| P42357 | 657 | 6.64 | 14.5 | 6.16 | 15.6 | 5.97 | 16.1 | 5.00 | 16.6 |
| P21177 | 729 | 6.92 | 15.4 | 6.40 | 16.6 | 6.24 | 17.1 | 5.77 | 16.6 |
| Q38941 | 850 | 7.98 | 15.6 | 7.37 | 16.9 | 7.35 | 16.9 | 6.35 | 16.8 |
| P27895 | 1000 | 10.27 | 14.2 | 9.29 | 15.7 | 8.74 | 16.7 | 7.44 | 16.7 |
| P07756 | 1500 | 15.07 | 14.5 | 14.08 | 15.6 | 13.43 | 16.3 | 8.64 | 16.9 |
| P04775 | 2005 | 19.30 | 15.2 | 18.05 | 16.2 | 17.36 | 16.9 | 13.04 | 16.8 |
| P19096 | 2504 | 22.89 | 16.0 | 21.49 | 17.0 | 21.19 | 17.3 | 17.50 | 16.7 |
| P28167 | 3005 | 28.54 | 15.4 | 26.08 | 16.8 | 25.53 | 17.2 | 21.89 | 16.7 |
| P0C6B8 | 3564 | 32.44 | 16.1 | 30.56 | 17.0 | 29.60 | 17.6 | 26.41 | 16.6 |
| P20930 | 4061 | 40.47 | 14.7 | 36.07 | 16.5 | 34.31 | 17.3 | 31.35 | 16.6 |
| P08519 | 4548 | 42.41 | 15.7 | 39.89 | 16.7 | 38.86 | 17.1 | 35.84 | 16.6 |
| Q7TMA5 | 4743 | 42.44 | 16.3 | 39.36 | 17.6 | 39.30 | 17.6 | 40.18 | 16.5 |
| P33450 | 5147 | 50.91 | 14.8 | 47.74 | 15.8 | 44.20 | 17.0 | 41.92 | 16.5 |
| Q9UKN1 | 5478 | 55.46 | 14.4 | 49.49 | 16.2 | 46.66 | 17.2 | 45.62 | 16.5 |
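GCUPS (giga cell updates per second) counts one cell update per pair of query and database residues. A search therefore costs query length × total database residues cell updates. The sketch below encodes that standard definition and recovers the database size implied by one table row. That the tables were computed exactly this way is an assumption, although the partitioned-algorithm rows are consistent with it. The helper names are hypothetical.

```python
def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second for one query searched against a database."""
    return query_len * db_residues / (seconds * 1e9)


def implied_db_residues(query_len, seconds, gcups_value):
    """Invert gcups() to recover the database size implied by one table row."""
    return gcups_value * 1e9 * seconds / query_len


if __name__ == "__main__":
    # First GTX 280 row: P02232 (144 residues), Partitioned 10-2 k, 1.58 s at 13.3 GCUPS.
    db = implied_db_residues(144, 1.58, 13.3)
    print(f"implied database size: {db:.3e} residues")
    # Predict another row of the same column: P27895 (1000 residues) took 10.27 s.
    print(f"P27895 predicted GCUPS: {gcups(1000, db, 10.27):.1f}")
```

The first GTX 280 row implies a database of about 1.46 × 10^8 residues. Plugging that size back in reproduces the reported 14.2 GCUPS for P27895 (1000 residues, 10.27 s).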
Performance evaluation of the optimized SIMT and partitioned vectorized algorithms on GTX 295
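| Query | Length | Partitioned 10-2 k: Time (s) | Partitioned 10-2 k: GCUPS | Partitioned 20-2 k: Time (s) | Partitioned 20-2 k: GCUPS | Partitioned 40-3 k: Time (s) | Partitioned 40-3 k: GCUPS | SIMT 10-2 k: Time (s) | SIMT 10-2 k: GCUPS |
|---|---|---|---|---|---|---|---|---|---|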
| P02232 | 144 | 1.19 | 17.7 | 1.13 | 18.7 | 1.09 | 19.4 | 1.02 | 20.7 |
| P05013 | 189 | 1.34 | 20.7 | 1.30 | 21.4 | 1.26 | 22.1 | 1.25 | 22.3 |
| P14942 | 222 | 1.49 | 22.0 | 1.41 | 23.1 | 1.38 | 23.7 | 1.37 | 23.8 |
| P07327 | 375 | 2.77 | 19.9 | 2.58 | 21.4 | 2.42 | 22.8 | 2.15 | 25.7 |
| P01008 | 464 | 3.04 | 22.4 | 2.82 | 24.2 | 2.66 | 25.6 | 2.54 | 26.8 |
| P03435 | 567 | 3.93 | 21.2 | 3.61 | 23.1 | 3.49 | 23.9 | 3.11 | 26.8 |
| P42357 | 657 | 4.29 | 22.5 | 4.02 | 24.0 | 3.87 | 25.0 | 3.56 | 27.1 |
| P21177 | 729 | 4.53 | 23.7 | 4.22 | 25.4 | 4.04 | 26.5 | 3.90 | 27.5 |
| Q38941 | 850 | 5.03 | 24.9 | 4.66 | 26.8 | 4.63 | 27.0 | 4.53 | 27.6 |
| P27895 | 1000 | 6.58 | 22.3 | 5.87 | 25.1 | 5.38 | 27.3 | 5.21 | 28.2 |
| P07756 | 1500 | 9.86 | 22.4 | 9.19 | 24.0 | 8.58 | 25.7 | 7.72 | 28.6 |
| P04775 | 2005 | 12.26 | 24.1 | 11.32 | 26.0 | 10.79 | 27.3 | 10.26 | 28.7 |
| P19096 | 2504 | 14.32 | 25.7 | 13.34 | 27.6 | 12.99 | 28.4 | 12.79 | 28.8 |
| P28167 | 3005 | 18.31 | 24.1 | 16.46 | 26.9 | 15.56 | 28.4 | 15.33 | 28.8 |
| P0C6B8 | 3564 | 21.09 | 24.9 | 19.34 | 27.1 | 17.99 | 29.1 | 18.20 | 28.8 |
| P20930 | 4061 | 26.75 | 22.3 | 23.35 | 25.6 | 20.76 | 28.8 | 20.77 | 28.8 |
| P08519 | 4548 | 27.36 | 24.4 | 25.11 | 26.6 | 23.92 | 28.0 | 23.24 | 28.8 |
| Q7TMA5 | 4743 | 25.86 | 27.0 | 23.57 | 29.6 | 23.51 | 29.7 | 24.24 | 28.8 |
| P33450 | 5147 | 32.69 | 23.2 | 30.57 | 24.8 | 27.37 | 27.7 | 26.33 | 28.7 |
| Q9UKN1 | 5478 | 36.61 | 22.0 | 32.40 | 24.9 | 28.88 | 27.9 | 28.05 | 28.7 |
Figure 4. Performance comparison between CUDASW++ 1.0 and CUDASW++ 2.0 on GTX 280.
Figure 5. Performance comparison between CUDASW++ 1.0 and CUDASW++ 2.0 on GTX 295.
Performance comparison between CUDASW++ 1.0, CUDASW++ 2.0 and NCBI-BLAST
| Software | Time (h) | GCUPS |
|---|---|---|
| Optimized SIMT (BL62, 10-2 k) | 8.00 | 28.8 |
| Partitioned (BL62, 10-2 k) | 11.15 | 20.7 |
| Partitioned (BL50, 10-3 k) | 11.71 | 19.7 |
| NCBI-BLAST (BL62, 10-2 k) | 9.56 | 24.1 |
| NCBI-BLAST (BL50, 10-3 k) | 51.45 | 4.5 |
| CUDASW++ 1.0 (BL62, 10-2 k) | 14.12 | 16.3 |
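The rows of this table appear to cover the same workload, since time × GCUPS is nearly identical across them (about 230 GCUPS·h). The sketch below checks that product and converts the runtimes into speedups. The row data are copied from the table. The helper names, and the assumption that every row processed the same number of cells, are illustrative.

```python
# (label, runtime in hours, GCUPS), copied from the comparison table above.
ROWS = [
    ("Optimized SIMT (BL62, 10-2 k)", 8.00, 28.8),
    ("Partitioned (BL62, 10-2 k)", 11.15, 20.7),
    ("Partitioned (BL50, 10-3 k)", 11.71, 19.7),
    ("NCBI-BLAST (BL62, 10-2 k)", 9.56, 24.1),
    ("NCBI-BLAST (BL50, 10-3 k)", 51.45, 4.5),
    ("CUDASW++ 1.0 (BL62, 10-2 k)", 14.12, 16.3),
]


def cell_updates(hours, gcups):
    """Total cell updates implied by a run: GCUPS x 1e9 x runtime in seconds."""
    return gcups * 1e9 * hours * 3600


def speedup(fast_hours, slow_hours):
    """How many times faster the first run is than the second."""
    return slow_hours / fast_hours


if __name__ == "__main__":
    for label, hours, gcups in ROWS:
        print(f"{label:32s} {cell_updates(hours, gcups):.3e} cells")
    simt_hours = ROWS[0][1]
    for label, hours, _ in ROWS[1:]:
        print(f"SIMT vs {label}: {speedup(simt_hours, hours):.2f}x")
```

For example, the optimized SIMT kernel takes 8.00 h. CUDASW++ 1.0 takes 14.12 h, making SIMT about 1.8× faster, and NCBI-BLAST with BLOSUM62 takes 9.56 h, making SIMT about 1.2× faster.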