Pooya Zandevakili, Ming Hu, Zhaohui Qin.
Abstract
Computational detection of transcription factor (TF) binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing these data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming because it must calculate matching probabilities position by position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
Year: 2012 PMID: 22662128 PMCID: PMC3360745 DOI: 10.1371/journal.pone.0036865
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. High-level architecture of the NVIDIA GTX 480 graphics card (simplified and excluding the graphics-specific details).
The GPU contains several multiprocessors (MPs) for executing groups of threads, called “thread blocks”, which are assigned to MPs by the global control logic (scheduler). Each MP contains several (2×16 for GTX 480) CUDA cores capable of performing floating-point multiply-and-add as well as integer, logical and bitwise operations. MPs also contain a fast programmable shared memory, hardware-managed data caches and “special function” units that perform double-precision or more complex floating-point operations (such as reciprocal, square root, sine and cosine). A local scheduler assigns resources (including CUDA cores) to the thread blocks assigned to the MP.
Figure 2. Illustration of the “fragmentation” technique we proposed for improving GPU performance.
Figure 3. The speedup of the GPU-accelerated motif search over the original version.
“w” stands for motif width.
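To make the role of the motif width w concrete, here is a minimal sketch of a position-by-position motif scan. It is not the HMS or GPUmotif implementation; it assumes a generic position weight matrix (PWM) scored as a log-likelihood ratio against a uniform background. The function name `scan_sequence`, the toy matrix and the example sequence are all hypothetical.

```python
import math

BASES = "ACGT"


def scan_sequence(seq, pwm, background=None):
    """Score every width-w window of seq against a PWM.

    pwm is a list of w dicts mapping each base to its probability at that
    motif position. The score is the log-likelihood ratio of the window
    under the motif model versus the background model (an assumed scoring
    scheme, chosen only for illustration).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    w = len(pwm)
    scores = []
    # One score per start position: the work grows with len(seq) * w.
    for start in range(len(seq) - w + 1):
        score = 0.0
        for j in range(w):
            base = seq[start + j]
            score += math.log(pwm[j][base] / background[base])
        scores.append(score)
    return scores


# Hypothetical width-3 motif that favours "TGA".
pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
scores = scan_sequence("ACTGAC", pwm)
best = max(range(len(scores)), key=scores.__getitem__)  # index of best window
```

Each window is scored independently of the others, which is the property that makes this kind of scan a natural candidate for assigning one window (or a group of windows) to each GPU thread.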
De novo search speedup across various datasets.
| Dataset | Dataset Size (MB) | Original Time (s) | GPU-Aided Time (s) | Overall Speedup | Time in Motif Scan Core (%) | Theoretical Speedup Limit |
| --- | --- | --- | --- | --- | --- | --- |
| CTCF | 7.36 | 527.5 | 55.42 | 9.52 | 90.62 | 10.66 |
| ER | 2.48 | 179.36 | 17.05 | 10.52 | 91.80 | 12.20 |
| NRSF | 1.21 | 84.54 | 6.82 | 12.40 | 93.46 | 15.29 |
| STAT1 | 6.77 | 519.26 | 72.82 | 7.13 | 87.13 | 7.77 |
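The source does not say how the theoretical speedup limit is computed, but the tabulated values are consistent with Amdahl's law in the limiting case where the motif scan core takes no time at all: limit = 1 / (1 - p), where p is the fraction of run time spent in the scan core. The sketch below recomputes both speedup columns from the table under that assumption; the function name `amdahl_limit` and the `rows` structure are illustrative only.

```python
def amdahl_limit(fraction_accelerated):
    """Upper bound on overall speedup if the accelerated part became free."""
    return 1.0 / (1.0 - fraction_accelerated)


# (original time in s, GPU-aided time in s, % time in motif scan core),
# copied from the table above.
rows = {
    "CTCF": (527.5, 55.42, 90.62),
    "ER": (179.36, 17.05, 91.80),
    "NRSF": (84.54, 6.82, 93.46),
    "STAT1": (519.26, 72.82, 87.13),
}

results = {}
for name, (original_s, gpu_s, pct_core) in rows.items():
    observed = original_s / gpu_s            # overall speedup
    limit = amdahl_limit(pct_core / 100.0)   # assumed Amdahl bound
    results[name] = (round(observed, 2), round(limit, 2))

for name, (observed, limit) in results.items():
    print(f"{name}: observed {observed:.2f}x, limit {limit:.2f}x")
```

Under this reading, the gap between the observed speedup and the limit for each dataset reflects the time the accelerated scan core and its data transfers still take, while the limit itself is set by the portion of the program that remains on the CPU.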