Marek Palkowski, Wlodzimierz Bielecki.
Abstract
BACKGROUND: RNA folding remains a compute-intensive task in bioinformatics. Parallelizing this kind of algorithm and improving its code locality is one of the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply powerful polyhedral compilation techniques based on the transitive closure of dependence graphs to generate parallel tiled code implementing Nussinov's RNA folding. Such techniques fall within the iteration space slicing (ISS) framework: the transitive dependences are applied to the statement instances of interest to produce valid tiles. The main problem in generating parallel tiled code is choosing a proper tile size and tile dimension, which affect the degree of parallelism and code locality.
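To make the loop structure concrete, here is a minimal sketch of the textbook form of Nussinov's recurrence in Python. It is not the authors' kernel: the function names, the pairing rule (Watson-Crick pairs plus the G-U wobble pair) and the absence of a minimum hairpin-loop length are assumptions made for illustration. The three nested loops over `i`, `j` and `k` are the kind of affine loop nest that tile sizes such as b1, b2 and b3 would apply to.

```python
def can_pair(x, y):
    # Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
    return {x, y} in ({"A", "U"}, {"C", "G"}, {"G", "U"})


def nussinov(seq):
    """Return the maximum number of non-crossing base pairs in seq."""
    n = len(seq)
    # S[i][j] holds the best score for the subsequence seq[i..j];
    # cells below the diagonal are never written and stay 0.
    S = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):          # outermost loop
        for j in range(i + 1, n):           # middle loop
            best = max(S[i + 1][j], S[i][j - 1])      # leave i or j unpaired
            bonus = 1 if can_pair(seq[i], seq[j]) else 0
            best = max(best, S[i + 1][j - 1] + bonus)  # pair i with j
            for k in range(i + 1, j):       # innermost loop: bifurcation
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1] if n else 0


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

The loop bounds depend only on affine expressions of the enclosing indices and `n`, which is what makes the nest representable in the polyhedral model.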
Keywords: Computational biology; Nussinov’s algorithm; Parallel computing; Parametric loop tiling; RNA folding; Tile size selection
Year: 2018 PMID: 29334891 PMCID: PMC5769393 DOI: 10.1186/s12859-018-2008-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Finding integer coefficients a_i, i = 0…4, of the model y = a0*b0 + a1*b1 + a2*b2 + a3*b3 + a4, where b0 = b1*b2
| y1 | y2 | a0 | a1 | a2 | a3 | a4 | Formula | Lines [see Additional file] |
|---|---|---|---|---|---|---|---|---|
| 70 | 116 | 0 | 1 | 1 | 0 | 0 | b1+b2 | 6, 10, 56 |
| 93 | 153 | 0 | 2 | 1 | 0 | 0 | 2*b1+b2 | 6 |
| 1081 | 2923 | 1 | 0 | 0 | 0 | 0 | b1*b2 | 6 |
| 22 | 36 | 0 | 1 | 0 | 0 | -1 | b1-1 | 10 |
| 23 | 37 | 0 | 1 | 0 | 0 | 0 | b1 | 10, 15, 43, 46, 56, 62 |
| 21 | 35 | 0 | 1 | 0 | 0 | -2 | b1-2 | 10 |
| 113 | 167 | 0 | 0 | 0 | 1 | 0 | b3 | 40, 49, 59 |
| 112 | 166 | 0 | 0 | 0 | 1 | -1 | b3-1 | 49, 59 |
| 46 | 78 | 0 | 0 | 1 | 0 | -1 | b2-1 | 56 |
| 47 | 79 | 0 | 0 | 1 | 0 | 0 | b2 | 10, 15, 21, 30, 34, 40, 43, 49, 52, 59 |
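The rows of this table can be checked mechanically. The sketch below assumes that y1 and y2 come from two runs with tile sizes (b1, b2, b3) = (23, 47, 113) and (37, 79, 167), which is what the rows for b1, b2 and b3 imply. The brute-force search and its names (`model`, `find_coefficients`, `bound`) are hypothetical and are not claimed to be the authors' procedure. The row with formula b1-2 is encoded as a1 = 1, a4 = -2, which is what the formula and both observed values require.

```python
from itertools import product

# Tile sizes (b1, b2, b3) of the two runs, read off the rows b1, b2 and b3.
EXPERIMENTS = ((23, 47, 113), (37, 79, 167))


def model(a, b):
    """Evaluate y = a0*b0 + a1*b1 + a2*b2 + a3*b3 + a4 with b0 = b1*b2."""
    a0, a1, a2, a3, a4 = a
    b1, b2, b3 = b
    return a0 * (b1 * b2) + a1 * b1 + a2 * b2 + a3 * b3 + a4


def find_coefficients(y1, y2, bound=2):
    """Return every coefficient tuple in [-bound, bound]^5 that
    reproduces both observed values (hypothetical brute-force search)."""
    hits = []
    for a in product(range(-bound, bound + 1), repeat=5):
        if model(a, EXPERIMENTS[0]) == y1 and model(a, EXPERIMENTS[1]) == y2:
            hits.append(a)
    return hits


if __name__ == "__main__":
    for y1, y2 in [(70, 116), (1081, 2923), (21, 35)]:
        print((y1, y2), find_coefficients(y1, y2))
```

Using distinct, fairly large values for b1, b2 and b3 in the two runs makes accidental matches between different formulas unlikely, so each observed pair points to few candidate coefficient tuples.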
Execution time (in seconds) of serial ISS-based tiled code for some tile sizes, N = 2200
| No. | b1 | b2 | b3 | Time |
|---|---|---|---|---|
| 1 | 1 | 128 | 16 | 3.2760 |
| 2 | 1 | 128 | 8 | 3.2910 |
| 3 | 1 | 150 | 8 | 3.3264 |
| 4 | 1 | 128 | 12 | 3.3506 |
| 5 | 1 | 96 | 16 | 3.3602 |
| 6 | 1 | 128 | 24 | 3.3913 |
| 7 | 1 | 150 | 12 | 3.4042 |
| 8 | 1 | 128 | 6 | 3.4247 |
| 9 | 1 | 96 | 8 | 3.4357 |
| 10 | 1 | 200 | 16 | 3.4645 |
| ... | ... | ... | ... | ... |
| 467 | 2 | 150 | 16 | 6.6576 |
| ... | ... | ... | ... | ... |
|  |  |  |  | 12.2802 |
| 7985 | 400 | 1 | 1 | 12.2872 |
| 7986 | 400 | 48 | 1 | 12.3162 |
| ... | ... | ... | ... | ... |
| 8000 | 1 | 1 | 1 | 24.8607 |
Fig. 1 Execution time (in seconds) of serial ISS-based tiled code, N = 2200, run on Intel Xeon E5-2699 v3. The results show that serial ISS-based tiled code performs best when the outermost loop remains untiled (b1 = 1)
Execution time (in seconds) of parallel ISS-based tiled code for some tile sizes, N = 5000, 32 threads used
| No. | b1 | b2 | b3 | Time |
|---|---|---|---|---|
| 1 | 1 | 96 | 8 | 7.8751 |
| 2 | 1 | 150 | 12 | 8.0246 |
| 3 | 1 | 96 | 12 | 8.1903 |
| 4 | 1 | 128 | 12 | 8.1952 |
| 5 | 1 | 128 | 16 | 8.2199 |
| 6 | 1 | 128 | 6 | 8.2816 |
| 7 | 1 | 150 | 16 | 8.2831 |
| 8 | 1 | 50 | 8 | 8.3449 |
| 9 | 1 | 128 | 8 | 8.3841 |
| 10 | 1 | 96 | 6 | 8.4597 |
| ... | ... | ... | ... | ... |
| 143 | 2 | 6 | 300 | 10.4351 |
| ... | ... | ... | ... | ... |
|  |  |  |  | 334.3200 |
| 7993 | 2 | 1 | 2 | 335.3765 |
| 7994 | 1 | 2 | 2 | 411.5945 |
| ... | ... | ... | ... | ... |
| 8000 | 1 | 2 | 1 | 889.6510 |
Execution time (in seconds) of the Nussinov RNA folding codes for N = 5000 and different numbers of threads used
| Threads | Original | Chang | Li | PluTo [8×8×1] | ISS [2×6×300] | ISS [1×96×8] |
|---|---|---|---|---|---|---|
| 1 | 334.32 | 382.46 | 81.54 | 238.66 | 221.11 | 54.66 |
| 2 |  | 198.12 | 37.96 | 164.66 | 120.90 | 30.83 |
| 4 |  | 100.17 | 20.19 | 90.16 | 67.74 | 18.29 |
| 8 |  | 53.07 | 13.62 | 46.01 | 35.29 | 10.67 |
| 16 |  | 28.74 | 10.50 | 25.84 | 19.06 | 8.22 |
| 32 |  | 16.55 | 9.75 | 13.94 | 10.65 | 7.82 |
Fig. 2 Speed-up of parallel codes for Nussinov's matrix size of 5000, run on Intel Xeon E5-2699 v3. The horizontal axis shows the number of threads; the vertical axis shows the speed-up of the examined codes
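The speed-up values can be recomputed from the execution times for N = 5000. The sketch below assumes that speed-up is defined as the time of the original serial code divided by the time of the examined code, a definition this record does not state explicitly. It reads the rows for 2 to 32 threads as Chang, Li, PluTo, ISS, ISS, since the original code is serial and has a single-thread time only. The dictionary keys and function names are illustrative.

```python
# Original serial code, N = 5000 (seconds); it has a single-thread time only.
ORIGINAL_SERIAL = 334.32

# Execution times in seconds, keyed by thread count, for N = 5000.
TIMES = {
    "Chang":       {1: 382.46, 2: 198.12, 4: 100.17, 8: 53.07, 16: 28.74, 32: 16.55},
    "Li":          {1: 81.54, 2: 37.96, 4: 20.19, 8: 13.62, 16: 10.50, 32: 9.75},
    "PluTo 8x8x1": {1: 238.66, 2: 164.66, 4: 90.16, 8: 46.01, 16: 25.84, 32: 13.94},
    "ISS 2x6x300": {1: 221.11, 2: 120.90, 4: 67.74, 8: 35.29, 16: 19.06, 32: 10.65},
    "ISS 1x96x8":  {1: 54.66, 2: 30.83, 4: 18.29, 8: 10.67, 16: 8.22, 32: 7.82},
}


def speedup(code, threads, baseline=ORIGINAL_SERIAL):
    """Assumed definition: baseline serial time / time of the examined code."""
    return baseline / TIMES[code][threads]


def fastest(threads):
    """Name of the code with the smallest time for a given thread count."""
    return min(TIMES, key=lambda code: TIMES[code][threads])


if __name__ == "__main__":
    for code in TIMES:
        print(code, [round(speedup(code, t), 2) for t in sorted(TIMES[code])])
```

Under this definition a speed-up can exceed the number of threads, because the tiled codes also gain from better cache locality and not only from parallel execution.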
Execution time (in seconds) of the Nussinov RNA folding codes for 32 threads and different lengths of RNA strands. The mRNA sequences were obtained from the NCBI database
| mRNA definition | Length | Serial time | Chang | Li | PluTo [8×8×1] | ISS [2×6×300] | ISS [1×96×8] |
|---|---|---|---|---|---|---|---|
| MAPK1, trans. var. 1 | 5916 | 606.32 | 31.05 | 14.95 | 25.94 | 19.52 | 12.92 |
| MAPK1, trans. var. 2 | 1514 | 2.30 | 0.25 | 0.21 | 0.16 | 0.14 | 0.32 |
| MAP2K5, trans. var. 2 | 2355 | 15.57 | 0.92 | 0.48 | 0.82 | 0.51 | 0.72 |
| MAP2K7, trans. var. 1 | 3525 | 102.32 | 4.46 | 2.74 | 3.51 | 2.63 | 2.53 |
| MAP2K6, trans. var. 2 | 13577 | 6127.09 | 437.60 | 168.25 | 347.78 | 257.00 | 152.54 |
| MAP3K2 | 10870 | 3033.73 | 229.53 | 86.30 | 171.54 | 121.44 | 70.68 |
| MAP4K3, trans. var. 1 | 4362 | 221.82 | 10.07 | 6.06 | 8.19 | 6.46 | 5.21 |
| MAP4K4, trans. var. 4 | 7183 | 926.43 | 54.43 | 25.65 | 47.40 | 32.67 | 21.45 |
Fig. 3 Speed-up of parallel codes run on Intel Xeon E5-2699 v3, 32 threads used. The horizontal axis shows Nussinov's matrix size; the vertical axis shows the speed-up of the studied codes. The mRNA sequences were obtained from the NCBI database