Sheng-Ta Lee, Chun-Yuan Lin, Che Lun Hung.
Abstract
As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is well suited to database searches owing to its high sensitivity. However, it remains quite time-consuming. CUDA programming can accelerate such computations by exploiting the computational power of massively parallel hardware such as graphics processing units (GPUs). Rather than merely accelerating every comparison, and thus still expending computational resources on unnecessary ones, this work presents a novel Smith-Waterman algorithm on GPUs combined with a frequency-based filtration method that avoids such comparisons. A user-friendly interface is also designed for potential cloud server applications with GPUs. Two data sets are selected, H1N1 protein sequences (query sequence set) and a human protein database (database set), and CUDA-SW is compared with CUDA-SW combined with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that eliminating unnecessary sequence alignments can reduce the computation time by up to 41%. Importantly, offered as a cloud service, CUDA-SWf can be accessed at any time from any device with an Internet connection.
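The abstract does not say how the frequency-based filter is computed. As one possible reading, the sketch below assumes a residue-composition filter: each sequence is reduced to a count vector over its letters, and the frequency distance between two vectors (the larger of the total surplus and the total deficit) is used to skip database sequences that are too dissimilar to the query. The names `frequency_distance`, `filter_database`, and the threshold `max_fd` are hypothetical and not taken from the paper.

```python
from collections import Counter


def frequency_distance(a: str, b: str) -> int:
    """Frequency distance between the residue-count vectors of a and b.

    Hypothetical filter metric (not confirmed as the paper's): the larger of
    the total count surplus and the total count deficit of a relative to b.
    """
    ca, cb = Counter(a), Counter(b)
    surplus = deficit = 0
    for residue in set(ca) | set(cb):
        d = ca[residue] - cb[residue]  # Counter returns 0 for missing keys
        if d > 0:
            surplus += d
        else:
            deficit -= d
    return max(surplus, deficit)


def filter_database(query: str, database: list[str], max_fd: int) -> list[str]:
    """Keep only the database sequences whose frequency distance to the query
    is at most max_fd; only these would go on to full Smith-Waterman alignment."""
    return [s for s in database if frequency_distance(query, s) <= max_fd]


if __name__ == "__main__":
    print(filter_database("ACDE", ["ACDE", "ACDF", "WWWW"], max_fd=1))
```

Frequency distance is a lower bound on unit-cost edit distance, because one substitution, insertion, or deletion changes the surplus and the deficit by at most one each. Smith-Waterman scores computed with a substitution matrix are a different measure, so under this assumption the filter acts as a heuristic that may also discard some relevant hits.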
Year: 2013 PMID: 23653898 PMCID: PMC3638642 DOI: 10.1155/2013/721738
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Smith-Waterman method.
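For reference, a minimal Python sketch of the Smith-Waterman recurrence that computes only the best local alignment score. The scoring scheme (match +2, mismatch -1, linear gap -2) is an illustrative assumption; protein database searches usually use a substitution matrix such as BLOSUM62 with affine gap penalties, and the paper's parameters are not given here.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j), H[i-1][j] + gap, H[i][j-1] + gap);
    only the previous row is kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap  # a_i aligned to a gap
            left = curr[j - 1] + gap  # b_j aligned to a gap
            curr[j] = max(0, diag, up, left)  # 0 lets a new local alignment start
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman("XACGTX", "ACGT"))
```

Each cell depends only on its upper, left, and upper-left neighbours; this dependency pattern is what GPU implementations parallelize, either across many database sequences or along anti-diagonals within one alignment.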
Figure 2. Sorting of the selected sequences to group sequences of similar length for improved load balance.
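A sketch of the length-sorting idea in the caption, assuming one thread per database sequence and threads grouped into fixed-size batches that run in lockstep, so each batch costs roughly as much as its longest sequence. `length_sorted_batches`, `padded_work`, and `batch_size` are hypothetical names, not the paper's.

```python
def length_sorted_batches(sequences: list[str], batch_size: int) -> list[list[int]]:
    """Sort sequence indices by length and cut them into consecutive batches.

    Assumption: each batch is processed by one group of threads, one thread
    per sequence, so similar lengths within a batch mean less idle time.
    """
    order = sorted(range(len(sequences)), key=lambda k: len(sequences[k]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padded_work(sequences: list[str], batches: list[list[int]]) -> int:
    """Work units if every thread in a batch waits for the batch's longest sequence."""
    return sum(max(len(sequences[k]) for k in batch) * len(batch) for batch in batches)


if __name__ == "__main__":
    seqs = ["A" * 5, "A" * 100, "A" * 6, "A" * 98]
    print(padded_work(seqs, [[0, 1], [2, 3]]))  # input order
    print(padded_work(seqs, length_sorted_batches(seqs, 2)))  # length-sorted
```

With these four sequences, batching in input order pairs short and long sequences and costs 396 units, whereas length-sorted batches cost 212, because short sequences no longer wait for long ones.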
Figure 3. Memory patterns of sequences in the global memory.
Figure 4. Flowchart of CUDA-SWf.
Overall computation time of the CPU version of the SW algorithm, CUDA-SW, and CUDA-SWf with MP (10%).
| H1N1 virus query sequence length (bp) | CPU version of SW | CUDA-SW | CUDA-SWf | Improved ratio |
|---|---|---|---|---|
| 100 | 49.91 | 6.79 | 6.68 | 1.62% |
| 200 | 97.84 | 7.04 | 6.25 | 11.22% |
| 300 | 145.6 | 7.27 | 5.71 | 21.46% |
| 400 | 193.62 | 7.52 | 5.02 | 33.24% |
| 500 | 243.56 | 7.77 | 4.71 | 39.38% |
| 600 | 293.43 | 8.02 | 4.52 | 43.64% |
| 700 | 343.31 | 8.29 | 4.48 | 45.96% |
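The Improved ratio column is consistent with the relative time saved by CUDA-SWf over CUDA-SW, that is (CUDA-SW time - CUDA-SWf time) / CUDA-SW time; this definition is inferred from the numbers rather than stated in this excerpt. The snippet recomputes the column from the table's times.

```python
# Times copied from the table above, ordered by query length 100..700.
cuda_sw = [6.79, 7.04, 7.27, 7.52, 7.77, 8.02, 8.29]
cuda_swf = [6.68, 6.25, 5.71, 5.02, 4.71, 4.52, 4.48]


def improved_ratio(t_base: float, t_new: float) -> float:
    """Percentage of the baseline time saved by the new method."""
    return (t_base - t_new) / t_base * 100


ratios = [round(improved_ratio(a, b), 2) for a, b in zip(cuda_sw, cuda_swf)]
print(ratios)  # -> [1.62, 11.22, 21.46, 33.24, 39.38, 43.64, 45.96]
```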
Overall computation time of CUDA-SWf with a query sequence length of 700.
| MP | Number of selected database sequences | Differences | Differences | CUDA-SWf |
|---|---|---|---|---|
| 100% | 32,133 | 3,542 | 1,169 | 8.27 |
| 50% | 17,913 | 3,542 | 1,169 | 5.77 |
| 30% | 8,578 | 3,536 | 1,169 | 4.63 |
| 10% | 7,007 | 3,525 | 1,169 | 4.4 |
Figure 5. Speedup ratio of CUDA-SW and CUDA-SWf with MP (10%).
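The figure itself is not reproduced here. Assuming speedup is defined as CPU time divided by GPU time, the values in the first table (MP 10%) give the ratios computed below; the function name `speedup` is illustrative.

```python
# Times from the first table, ordered by query length 100..700.
cpu = [49.91, 97.84, 145.6, 193.62, 243.56, 293.43, 343.31]
cuda_sw = [6.79, 7.04, 7.27, 7.52, 7.77, 8.02, 8.29]
cuda_swf = [6.68, 6.25, 5.71, 5.02, 4.71, 4.52, 4.48]


def speedup(t_cpu: float, t_gpu: float) -> float:
    """Assumed definition: how many times faster the GPU run is than the CPU run."""
    return t_cpu / t_gpu


sw_speedups = [round(speedup(c, g), 2) for c, g in zip(cpu, cuda_sw)]
swf_speedups = [round(speedup(c, g), 2) for c, g in zip(cpu, cuda_swf)]
print(sw_speedups[-1], swf_speedups[-1])  # -> 41.41 76.63
```

Under this definition, the filtration step leaves the CUDA-SW speedup largely unchanged at short query lengths (7.35 vs 7.47 at length 100) but nearly doubles it at length 700.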
Figure 6. Workbench of CUDA-SWf.
Figure 7. Result window of CUDA-SWf.