Germán Retamosa, Luis de Pedro, Ivan González, Javier Tamames.
Abstract
Homology detection has evolved over time from heavy algorithms based on dynamic programming to lightweight alternatives based on different heuristic models. However, the main problem with these algorithms is that they use complex statistical models, which makes it difficult to achieve a relevant speedup while still reproducing the original results exactly. Thus, their acceleration is essential. The aim of this article was to prefilter a sequence database. To make this work, we have implemented a groundbreaking heuristic model based on NVIDIA's graphics processing units (GPUs) and multicore processors. Depending on the sensitivity settings, this makes it possible to quickly reduce the sequence database by between 50% and 95%, while rejecting no significant sequences. Furthermore, this prefiltering application can be used together with multiple homology detection algorithms as part of a next-generation sequencing system. Extensive performance and accuracy tests have been carried out at the Spanish National Centre for Biotechnology (NCB). The results show that GPU hardware can accelerate the execution times of existing homology detection applications, such as the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool for Proteins (BLASTP), by up to a factor of 4.
Keywords: NCBI BLAST; NVIDIA CUDA; computational biology; next-generation sequencing; parallel programming; performance analysis
Year: 2016 PMID: 28008220 PMCID: PMC5170890 DOI: 10.4137/EBO.S40877
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. NVIDIA GPU general architecture.
Figure 2. CUDA programming model.
Figure 3. Hashing database sequence architecture.
Figure 4. GPU filtering model.
Figure 5. CUDA reduction model.
Figure 6. Processing policy comparison.
Performance comparison.
| SUBJECT DATABASE | QUERY SEQUENCE | QUERY SEQUENCE LENGTH | QUERY FAMILY | LIKELIHOOD FILTER THRESHOLD | BLAST TIME | BLAST TIME + PREFILTER TIME | SPEEDUP |
|---|---|---|---|---|---|---|---|
| nr | BAK61626.1 | 3161 | PolyProtein Virus | 95% | 779 seconds | 243 seconds | 3.21 |
| nr | BAK61626.1 | 3161 | PolyProtein Virus | 60% | 779 seconds | 325 seconds | 2.40 |
| nr | ABD34305.1 | 743 | PolyProtein Virus | 95% | 548 seconds | 156 seconds | 3.51 |
| nr | ABD34305.1 | 743 | PolyProtein Virus | 60% | 548 seconds | 256 seconds | 2.14 |
| nr | AAA45466.1 | 2225 | PolyProtein Virus | 95% | 700 seconds | 199 seconds | 3.52 |
| nr | AAA45466.1 | 2225 | PolyProtein Virus | 60% | 700 seconds | 311 seconds | 2.25 |
| nr | AHW02111.1 | 2435 | Proteobacteria | 95% | 718 seconds | 175 seconds | 4.10 |
| nr | AHW02111.1 | 2435 | Proteobacteria | 60% | 718 seconds | 311 seconds | 2.31 |
| nr | AAD11553.1 | 542 | Proteobacteria | 95% | 522 seconds | 150 seconds | 3.48 |
| nr | AAD11553.1 | 542 | Proteobacteria | 60% | 522 seconds | 188 seconds | 2.78 |
| nr | AAO08121.1 | 1976 | Proteobacteria | 95% | 696 seconds | 173 seconds | 4.02 |
| nr | AAO08121.1 | 1976 | Proteobacteria | 60% | 696 seconds | 307 seconds | 2.27 |
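The SPEEDUP column appears to be the plain BLAST time divided by the combined BLAST plus prefilter time. The short Python sketch below recomputes it for the 95% threshold rows as a consistency check. The timings are copied from the table above; the names `ROWS_95` and `speedup` are illustrative and do not come from the article.

```python
# Timings in seconds, copied from the 95% threshold rows of the table above:
# (query accession, BLAST time, BLAST time + prefilter time).
ROWS_95 = [
    ("BAK61626.1", 779, 243),
    ("ABD34305.1", 548, 156),
    ("AAA45466.1", 700, 199),
    ("AHW02111.1", 718, 175),
    ("AAD11553.1", 522, 150),
    ("AAO08121.1", 696, 173),
]


def speedup(blast_seconds, combined_seconds):
    """Ratio of the unfiltered BLAST time to the BLAST + prefilter time.

    Assumption: this is how the SPEEDUP column is defined; the table
    does not state the formula explicitly.
    """
    return blast_seconds / combined_seconds


if __name__ == "__main__":
    for accession, blast_s, combined_s in ROWS_95:
        print(f"{accession}: {speedup(blast_s, combined_s):.2f}")
```

Under that assumption, the recomputed ratios agree with the tabulated values to two decimal places, including the largest speedup of 4.10 for AHW02111.1.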
Filtering comparison.
| SUBJECT DATABASE | QUERY SEQUENCE | QUERY FAMILY | LIKELIHOOD FILTER THRESHOLD | FILTERING PERCENTAGE | SPEEDUP |
|---|---|---|---|---|---|
| nr | BAK61626.1 | PolyProtein Virus | 95% | 91.30% | 3.21 |
| nr | BAK61626.1 | PolyProtein Virus | 60% | 83.79% | 2.40 |
| nr | ABD34305.1 | PolyProtein Virus | 95% | 98.20% | 3.51 |
| nr | ABD34305.1 | PolyProtein Virus | 60% | 87.17% | 2.14 |
| nr | AAA45466.1 | PolyProtein Virus | 95% | 95.10% | 3.52 |
| nr | AAA45466.1 | PolyProtein Virus | 60% | 83.83% | 2.25 |
| nr | AHW02111.1 | Proteobacteria | 95% | 97.20% | 4.10 |
| nr | AHW02111.1 | Proteobacteria | 60% | 83.98% | 2.31 |
| nr | AAD11553.1 | Proteobacteria | 95% | 98.64% | 3.48 |
| nr | AAD11553.1 | Proteobacteria | 60% | 51.39% | 2.78 |
| nr | AAO08121.1 | Proteobacteria | 95% | 97.35% | 4.02 |
| nr | AAO08121.1 | Proteobacteria | 60% | 84.04% | 2.27 |