Xi Chen1, Chen Wang1, Shanjiang Tang1, Ce Yu2, Quan Zou1.
Abstract
BACKGROUND: Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, parallelizing MSA has become necessary to keep its running time at an acceptable level. Although there is a large body of work on MSA, existing approaches are either insufficient or rest on implicit assumptions that limit their generality. First, the properties of users' input, including the sizes of the datasets and the lengths of the sequences, can take arbitrary values and are generally unknown before submission; previous work has ignored this. Second, the center star strategy is well suited to aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, on heterogeneous CPU/GPU platforms, prior studies parallelize MSA on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, by contrast, can maximize the utilization of the computing resources by executing the workload on both CPU and GPU simultaneously.
Keywords: Center star alignment; GPU; Heterogeneous; Multiple sequence alignment (MSA)
Year: 2017 PMID: 28646874 PMCID: PMC5483318 DOI: 10.1186/s12859-017-1725-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The time complexity of the center star strategy
| Step | Naive center star | HAlign | CMSA |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
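For orientation, the first stage of the naive center star strategy can be sketched as follows: the center is the sequence whose summed pairwise distance to all other sequences is smallest. This is a textbook version that uses Levenshtein edit distance as the pairwise measure; it is not the optimized selection used by HAlign or CMSA, and the names `edit_distance` and `select_center` are illustrative only.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def select_center(seqs):
    """Return the index of the sequence with the smallest total distance.

    Assumption: edit distance stands in for the pairwise alignment score.
    """
    n = len(seqs)
    totals = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(seqs[i], seqs[j])
            totals[i] += d
            totals[j] += d
    return min(range(n), key=totals.__getitem__)
```

For n sequences this loop performs n(n-1)/2 pairwise comparisons, which is why center selection dominates the running time of the naive strategy on large datasets.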
Fig. 1 The heterogeneous CPU/GPU architecture. To achieve the best performance, the co-run model of CPU and GPU is adopted
Fig. 2 The overall flow of CMSA. Multiple sequence alignment is handled on the heterogeneous CPU/GPU platform
Example of a segment: converting an RNA/DNA segment to a decimal number
| Segment | ATCGCGAT |
|---|---|
| Binary number | 0001111011100001 |
| Decimal number | 7905 |
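The example row is consistent with a 2-bit code per base (A=00, T=01, G=10, C=11), read left to right as one binary number. That mapping is inferred from the single example in the table rather than stated in this excerpt, and treating U like T for RNA input is an additional assumption. A minimal sketch, with illustrative function names:

```python
# 2-bit codes inferred from the example row (ATCGCGAT -> 0001111011100001).
# Mapping U to the same code as T is an assumption for RNA input.
BASE_CODE = {"A": 0b00, "T": 0b01, "U": 0b01, "G": 0b10, "C": 0b11}


def segment_to_int(segment):
    """Pack a segment into an integer, two bits per base, first base highest."""
    value = 0
    for base in segment.upper():
        value = (value << 2) | BASE_CODE[base]
    return value


def segment_to_bits(segment):
    """Return the zero-padded binary string for a segment."""
    return format(segment_to_int(segment), "0{}b".format(2 * len(segment)))


print(segment_to_bits("ATCGCGAT"))  # 0001111011100001
print(segment_to_int("ATCGCGAT"))   # 7905
```

An 8-base segment fits in 16 bits, so segments encoded this way can be compared or used as table indices as plain integers.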
GPU hardware specifications
| Specification | Tesla K40 |
|---|---|
| CUDA Driver Version / Runtime Version | 8.0 / 8.0 |
| CUDA compute capability | 3.5 |
| CUDA cores | 2880 |
| GPU clock rate (MHz) | 745 |
| Total amount of global memory (GB) | 12 |
| Memory bandwidth (GB/s) | 288 |
| Shared memory size per block (bytes) | 49152 |
| Registers available per block | 65536 |
Experimental datasets
| Dataset | Average length | Num | File size |
|---|---|---|---|
| MT | 16569 | 672 | 11 MB |
| D1 | 252 | 500000 | 183.8 MB |
| D2 | 490 | 500000 | 290.6 MB |
| D3 | 748 | 500000 | 414.3 MB |
| 16s rRNA | 1388 | 1011621 | 1.4 GB |
The running time and SP score of single-core HAlign and CMSA(CPU) based on different center sequence selection algorithms
| Dataset | Num | Center sequence (HAlign) | Center sequence (CMSA) | HAlign Step1 (s) | HAlign Step2 (s) | HAlign Step3 (s) | HAlign Overall (s) | CMSA Step1 (s) | CMSA Step2 (s) | CMSA Step3 (s) | CMSA Overall (s) | Average SP score (HAlign) | Average SP score (CMSA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MT | 672 | 16 | 479 | 88.19 | 33.40 | 21.11 | 142.70 | 0.80 | 43.40 | 0.50 | 44.70 | 0.977 | 0.987 |
| D1 | 1000 | 912 | 575 | 2.22 | 0.42 | 0.24 | 2.88 | 0.05 | 0.40 | 0.03 | 0.48 | 0.549 | 0.570 |
| D1 | 3000 | 912 | 575 | 23.17 | 2.75 | 0.77 | 26.69 | 0.13 | 1.20 | 0.10 | 1.43 | 0.550 | 0.588 |
| D1 | 5000 | 3477 | 2266 | 67.95 | 2.10 | 1.28 | 71.33 | 0.16 | 2.15 | 0.23 | 2.54 | 0.492 | 0.523 |
| D2 | 1000 | 158 | 158 | 6.93 | 0.52 | 0.99 | 8.44 | 0.07 | 1.40 | 0.10 | 1.57 | 0.508 | 0.548 |
| D2 | 3000 | 181 | 1447 | 70.64 | 5.07 | 1.20 | 76.91 | 0.18 | 4.90 | 0.25 | 5.33 | 0.484 | 0.500 |
| D2 | 5000 | 3533 | 4677 | 200.38 | 10.31 | 7.60 | 218.29 | 0.25 | 0.96 | 0.42 | 1.63 | 0.455 | 0.510 |
| D3 | 1000 | 697 | 697 | 13.50 | 2.19 | 1.85 | 17.54 | 0.12 | 4.22 | 0.17 | 4.15 | 0.513 | 0.540 |
| D3 | 3000 | 2170 | 3217 | 125.06 | 2.19 | 1.85 | 129.10 | 0.24 | 13.44 | 0.34 | 14.02 | 0.527 | 0.528 |
| D3 | 5000 | 2420 | 2992 | 351.49 | 8.40 | 9.75 | 369.64 | 0.37 | 22.13 | 0.60 | 23.10 | 0.518 | 0.523 |
Fig. 3 Experiments on datasets with different numbers of sequences. D1, D2, D3 represent the three kinds of datasets described in Table 4. a Running time and b Speedup
Workload ratio for GPU and CPU
| Dataset | Number | Workload ratio |
|---|---|---|
| D1 | 100000 | 1.382 |
| D1 | 200000 | 1.432 |
| D1 | 300000 | 1.435 |
| D1 | 400000 | 1.426 |
| D1 | 500000 | 1.423 |
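To illustrate how such a value could drive a co-run split, the sketch below divides a batch of sequences between the two devices. It assumes the tabulated value is the GPU share divided by the CPU share; this excerpt does not state the direction, and `split_workload` is a hypothetical helper, not part of CMSA.

```python
def split_workload(num_sequences, gpu_to_cpu_ratio):
    """Split a batch so that gpu_count / cpu_count approximates the ratio.

    Assumption: the ratio is GPU workload divided by CPU workload.
    """
    gpu_share = gpu_to_cpu_ratio / (1.0 + gpu_to_cpu_ratio)
    gpu_count = round(num_sequences * gpu_share)
    return gpu_count, num_sequences - gpu_count


gpu_count, cpu_count = split_workload(100000, 1.382)
print(gpu_count, cpu_count)
```

Under this reading, a value near 1.4 sends somewhat under three fifths of the sequences to the GPU and the remainder to the CPU.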
Running time of different MSA tools with different numbers of sequences and average lengths
| Dataset | Number | Kalign | MAFFT | HAlign(one node) | CMSA(CPU) | CMSA(CPU/GPU) |
|---|---|---|---|---|---|---|
| D1 | 10000 | 20m39s | 2m26s | 39.99s | 6.66s | 7.71s |
| D1 | 100000 | - | 18h30m | 5m59s | 41.61s | 44.32s |
| D1 | 500000 | - | - | 33m17s | 3m15s | 3m20s |
| D2 | 10000 | 52m28s | 4m40s | 1m36s | 16.21s | 17.19s |
| D2 | 100000 | - | - | 22m11s | 2m14s | 2m36s |
| D2 | 500000 | - | - | 2h15m | 11m6s | 12m1s |
| D3 | 10000 | 79m23s | 8m59s | 10m9s | 45.01s | 44.11s |
| D3 | 100000 | - | - | 15m27s | 6m16s | 6m21s |
| D3 | 500000 | - | - | 11h2m | 30m28s | 27m58s |
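The running times above mix hours, minutes and seconds, so comparing tools is easier after converting them to seconds. The following is a small sketch for deriving speedups from the table entries; `parse_duration` and `speedup` are illustrative helpers, not part of any of the tools listed.

```python
import re

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def parse_duration(text):
    """Convert strings such as '2h15m', '33m17s' or '39.99s' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", text)
    # Reject anything that is not fully covered by number+unit pairs.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError("unrecognized duration: {!r}".format(text))
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def speedup(baseline, improved):
    """Ratio of baseline running time to improved running time."""
    return parse_duration(baseline) / parse_duration(improved)


# D1 with 500000 sequences: HAlign(one node) versus CMSA(CPU).
print(round(speedup("33m17s", "3m15s"), 1))
```

For that row the ratio is 1997 s / 195 s, or roughly 10.2. Entries marked `-` have no reported time and are rejected by `parse_duration`.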
Average SP scores of different MSA tools with different numbers of sequences and average lengths
| Dataset | Number | Kalign | MAFFT | HAlign(one node) | CMSA(CPU) | CMSA(CPU/GPU) |
|---|---|---|---|---|---|---|
| D1 | 10000 | 0.570 | 0.560 | 0.340 | 0.467 | 0.428 |
| D1 | 100000 | - | 0.561 | 0.340 | 0.478 | 0.431 |
| D1 | 500000 | - | - | 0.372 | 0.473 | 0.423 |
| D2 | 10000 | 0.458 | 0.472 | 0.329 | 0.467 | 0.447 |
| D2 | 100000 | - | - | 0.380 | 0.474 | 0.454 |
| D2 | 500000 | - | - | 0.327 | 0.480 | 0.449 |
| D3 | 10000 | 0.480 | 0.479 | 0.401 | 0.474 | 0.414 |
| D3 | 100000 | - | - | 0.376 | 0.477 | 0.437 |
| D3 | 500000 | - | - | - | 0.472 | 0.441 |