Mahmood A Rashid¹, Swakkhar Shatabda¹, M A Hakim Newton², Md Tamjidul Hoque³, Abdul Sattar¹.
Abstract
Protein structure prediction is a computationally very challenging problem. Many existing search algorithms attempt to solve it by exploring possible structures and finding the one with the minimum free energy. However, these algorithms perform poorly on large proteins because of an astronomically wide search space. In this paper, we present a multipoint spiral search framework that uses parallel processing to expedite exploration by starting from different points. In our approach, a set of random initial solutions is generated and distributed among threads. Each thread runs for a predefined period of time, and the improved solutions are stored per thread. When the threads finish, their solutions are merged and duplicates are removed. A selected set of distinct solutions is then redistributed among the threads. In our ab initio protein structure prediction method, we use the three-dimensional face-centred-cubic (FCC) lattice for structure-backbone mapping. We use both the low-resolution hydrophobic-polar (HP) energy model and the high-resolution 20 × 20 energy model to guide the search. The experimental results show that our new parallel framework significantly improves on the results of the state-of-the-art single-point search approaches for both energy models on the three-dimensional FCC lattice. We also show experimentally that mixing energy models within parallel threads is effective.
Year: 2014 PMID: 24744779 PMCID: PMC3976798 DOI: 10.1155/2014/985968
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
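The multipoint loop described in the abstract (distribute starts, search per thread for a fixed time slice, merge, deduplicate, select, redistribute) can be sketched as follows. This is a minimal illustration, not the authors' SSParallel implementation: `improve` and `energy` are hypothetical callbacks standing in for a spiral-search step and an energy model, and the round count, pool size, and "keep the lowest-energy distinct solutions" selection rule are assumptions.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def run_slice(starts, improve, energy, seconds):
    """One thread's work: keep improving its start solutions until the time slice ends."""
    deadline = time.monotonic() + seconds
    current = list(starts)
    while current and time.monotonic() < deadline:
        for k, sol in enumerate(current):
            candidate = improve(sol)
            if energy(candidate) <= energy(sol):  # accept improving or sideways moves
                current[k] = candidate
    return current  # the improved solutions of this thread


def multipoint_search(initial, improve, energy, n_threads=4, seconds=0.05,
                      rounds=3, keep=8):
    """Distribute, search in parallel, merge, deduplicate, select, and repeat."""
    pool = [tuple(s) for s in initial]
    for _ in range(rounds):
        # Split the pool round-robin over the threads.
        chunks = [pool[t::n_threads] for t in range(n_threads)]
        with ThreadPoolExecutor(max_workers=n_threads) as executor:
            results = list(executor.map(
                lambda chunk: run_slice(chunk, improve, energy, seconds), chunks))
        # Merge the per-thread solutions; the set removes duplicates.
        merged = {tuple(s) for chunk in results for s in chunk}
        # Assumed selection rule: keep the `keep` lowest-energy distinct solutions.
        pool = sorted(merged, key=energy)[:keep]
    return pool[0]


# Toy problem standing in for a lattice conformation and an energy model.
def toy_energy(sol):
    return sum(x * x for x in sol)


def toy_improve(sol):
    i = random.randrange(len(sol))
    return sol[:i] + (sol[i] + random.choice((-1, 1)),) + sol[i + 1:]


random.seed(0)
starts = [tuple(random.randint(-5, 5) for _ in range(3)) for _ in range(8)]
best = multipoint_search(starts, toy_improve, toy_energy)
```

Threads are used only to mirror the description. In CPython the global interpreter lock serialises CPU-bound Python code, so a real speedup would need a `ProcessPoolExecutor` (with module-level, picklable callbacks instead of the lambda) or a compiled language.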
Figure 1: A unit 3D FCC lattice with 12 basis vectors in Cartesian coordinates.
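The 12 basis vectors can be enumerated directly. This sketch assumes the common integer embedding of the FCC lattice, in which lattice points have integer coordinates with an even sum and each basis vector is a permutation of (±1, ±1, 0), of length √2. The helper names are hypothetical.

```python
from itertools import product


def fcc_basis_vectors():
    """The 12 FCC neighbour offsets: exactly two coordinates are +-1, one is 0."""
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in v) == 2]


BASIS = set(fcc_basis_vectors())


def is_fcc_point(p):
    """In this integer embedding, FCC points have an even coordinate sum."""
    return sum(p) % 2 == 0


def are_neighbours(p, q):
    """Two lattice points are adjacent iff their difference is a basis vector."""
    return tuple(b - a for a, b in zip(p, q)) in BASIS


vectors = fcc_basis_vectors()
print(len(vectors))  # 12
```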
HP energy model [23].
| | H | P |
|---|---|---|
| H | −1 | 0 |
| P | 0 | 0 |
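Under the HP table above, the energy of a conformation is the sum of −1 over every H–H pair of residues that are adjacent on the lattice but not consecutive in the chain. A minimal sketch, assuming the integer FCC embedding (neighbours differ by a permutation of (±1, ±1, 0)) and a conformation given as a list of lattice points; `hp_energy` and the example walk are hypothetical.

```python
from itertools import product

# FCC neighbour offsets in the integer embedding (assumed representation).
OFFSETS = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]

HP_ENERGY = {("H", "H"): -1}  # every other pair (H-P, P-H, P-P) scores 0


def hp_energy(sequence, coords):
    """Sum contact energies over non-consecutive residue pairs on adjacent points."""
    index_at = {p: i for i, p in enumerate(coords)}
    total = 0
    for i, (x, y, z) in enumerate(coords):
        for dx, dy, dz in OFFSETS:
            j = index_at.get((x + dx, y + dy, z + dz))
            # j > i + 1 counts each contact once and skips chain bonds.
            if j is not None and j > i + 1:
                total += HP_ENERGY.get((sequence[i], sequence[j]), 0)
    return total


# A hypothetical 4-residue self-avoiding walk on the FCC lattice.
walk = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
print(hp_energy("HPPH", walk))  # -1
```

In this walk the non-consecutive contacts are (0, 2), (0, 3), and (1, 3); only (0, 3) is an H–H pair for "HPPH".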
The 20 × 20 BM energy model by Berrera et al. [25] (lower triangle shown; the matrix is symmetric).
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −3.477 |
| 2 | −2.24 | −1.901 |
| 3 | −2.424 | −2.304 | −2.467 |
| 4 | −2.41 | −2.286 | −2.53 | −2.691 |
| 5 | −2.343 | −2.208 | −2.491 | −2.647 | −2.501 |
| 6 | −2.258 | −2.079 | −2.391 | −2.568 | −2.447 | −2.385 |
| 7 | −2.08 | −2.09 | −2.286 | −2.303 | −2.222 | −2.097 | −1.867 |
| 8 | −1.892 | −1.834 | −1.963 | −1.998 | −1.919 | −1.79 | −1.834 | −1.335 |
| 9 | −1.7 | −1.517 | −1.75 | −1.872 | −1.728 | −1.731 | −1.565 | −1.318 | −1.119 |
| 10 | −1.101 | −0.897 | −1.034 | −0.885 | −0.767 | −0.756 | −1.142 | −0.818 | −0.29 | 0.219 |
| 11 | −1.243 | −0.999 | −1.237 | −1.36 | −1.202 | −1.24 | −1.077 | −0.892 | −0.717 | −0.311 | −0.617 |
| 12 | −1.306 | −0.893 | −1.178 | −1.037 | −0.959 | −0.933 | −1.145 | −0.859 | −0.607 | −0.261 | −0.548 | −0.519 |
| 13 | −0.835 | −0.72 | −0.807 | −0.778 | −0.729 | −0.642 | −0.997 | −0.687 | −0.323 | 0.033 | −0.342 | −0.26 | 0.054 |
| 14 | −0.788 | −0.658 | −0.79 | −0.669 | −0.524 | −0.673 | −0.884 | −0.67 | −0.371 | −0.23 | −0.463 | −0.423 | −0.253 | −0.367 |
| 15 | −0.179 | −0.209 | −0.419 | −0.439 | −0.366 | −0.335 | −0.624 | −0.453 | −0.039 | 0.443 | −0.192 | −0.161 | 0.179 | 0.16 | 0.933 |
| 16 | −0.616 | −0.409 | −0.482 | −0.402 | −0.291 | −0.298 | −0.613 | −0.631 | −0.235 | −0.097 | −0.382 | −0.521 | 0.022 | −0.344 | 0.634 | 0.179 |
| 17 | −1.499 | −1.252 | −1.33 | −1.234 | −1.176 | −1.118 | −1.383 | −1.222 | −0.646 | −0.325 | −0.72 | −0.639 | −0.29 | −0.455 | −0.324 | −0.664 | −1.078 |
| 18 | −0.771 | −0.611 | −0.805 | −0.854 | −0.758 | −0.664 | −0.912 | −0.745 | −0.327 | −0.05 | −0.247 | −0.264 | −0.042 | −0.114 | −0.374 | −0.584 | −0.307 | 0.2 |
| 19 | −0.112 | −0.146 | −0.27 | −0.253 | −0.222 | −0.2 | −0.391 | −0.349 | 0.196 | 0.589 | 0.155 | 0.223 | 0.334 | 0.271 | −0.057 | −0.176 | 0.388 | 0.815 | 1.339 |
| 20 | −1.196 | −0.788 | −1.076 | −0.991 | −0.771 | −0.886 | −1.278 | −1.067 | −0.374 | −0.042 | −0.222 | −0.199 | −0.035 | −0.018 | 0.257 | 0.189 | −0.346 | −0.023 | 0.661 | 0.129 |
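Because a contact-energy matrix is symmetric, only its lower triangle needs to be stored. The sketch below expands lower-triangular rows into a full matrix and sums pair energies over a conformation. It assumes the same contact definition as the HP model (non-consecutive residues on adjacent FCC points) and represents residue types by their row indices in the matrix; the mapping from indices to amino acids follows Berrera et al. [25] and is not reproduced here. Function names are hypothetical.

```python
from itertools import product

OFFSETS = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]


def symmetric_from_lower(rows):
    """Expand lower-triangular rows (row i has i + 1 entries) into a full matrix."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i} should have {i + 1} entries")
        for j, value in enumerate(row):
            matrix[i][j] = matrix[j][i] = value
    return matrix


def contact_energy(residue_types, coords, matrix):
    """Sum matrix[type_i][type_j] over non-consecutive pairs on adjacent FCC points."""
    index_at = {p: i for i, p in enumerate(coords)}
    total = 0.0
    for i, (x, y, z) in enumerate(coords):
        for dx, dy, dz in OFFSETS:
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is not None and j > i + 1:
                total += matrix[residue_types[i]][residue_types[j]]
    return total


# First three rows of the matrix above; residue types are row indices 0..2 here.
rows = [[-3.477], [-2.24, -1.901], [-2.424, -2.304, -2.467]]
m = symmetric_from_lower(rows)
walk = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
print(round(contact_energy([0, 1, 2, 0], walk, m), 3))  # -8.141
```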
Algorithm 1: SpiralSearchHP(C).
Algorithm 5: SpiralSearchBM(C).
Figure 2: Diagonal move operator. For ease of understanding, the figures are drawn in 2D.
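One way to implement a diagonal move on the FCC lattice is sketched below. It assumes the operator relocates an interior residue i to a free lattice point adjacent to both residues i − 1 and i + 1, the 3D analogue of a 2D corner flip. That reading of the operator is an assumption, and `diagonal_targets` and `apply_move` are hypothetical names.

```python
from itertools import product

OFFSETS = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]
OFFSET_SET = set(OFFSETS)


def adjacent(p, q):
    return tuple(b - a for a, b in zip(p, q)) in OFFSET_SET


def diagonal_targets(coords, i):
    """Free points adjacent to both chain neighbours of interior residue i."""
    prev, nxt = coords[i - 1], coords[i + 1]
    occupied = set(coords)
    targets = []
    for d in OFFSETS:
        p = tuple(a + b for a, b in zip(prev, d))
        if p not in occupied and adjacent(p, nxt):
            targets.append(p)
    return targets


def apply_move(coords, i, target):
    """Return a new conformation with residue i relocated to target."""
    moved = list(coords)
    moved[i] = target
    return moved


walk = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
print(sorted(diagonal_targets(walk, 1)))  # [(0, -1, 1), (1, -1, 0)]
```

Because every target is adjacent to both chain neighbours and unoccupied, the moved conformation stays a connected self-avoiding walk.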
Figure 3: Spiral search, comprising a series of diagonal moves guided by a tabu metaheuristic. For simplicity, the figures are drawn in 2D.
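The tabu mechanism behind a series of diagonal moves can be illustrated with a generic tabu search on the HP model. This is a sketch under assumptions, not the authors' spiral search: it does not reproduce their H-move selection (`selectMoveForH()`), it assumes the diagonal move relocates residue i to a free point adjacent to both chain neighbours, and the step count, tabu tenure, and best-move rule are illustrative choices.

```python
from itertools import product

OFFSETS = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]
OFFSET_SET = set(OFFSETS)


def shift(p, d):
    return tuple(a + b for a, b in zip(p, d))


def hp_energy(seq, coords):
    """-1 for each H-H pair of non-consecutive residues on adjacent lattice points."""
    index_at = {p: i for i, p in enumerate(coords)}
    total = 0
    for i, p in enumerate(coords):
        for d in OFFSETS:
            j = index_at.get(shift(p, d))
            if j is not None and j > i + 1 and seq[i] == seq[j] == "H":
                total -= 1
    return total


def diagonal_targets(coords, i):
    """Free points adjacent to both chain neighbours of interior residue i (assumed move)."""
    prev, nxt = coords[i - 1], coords[i + 1]
    occupied = set(coords)
    out = []
    for d in OFFSETS:
        p = shift(prev, d)
        if p not in occupied and tuple(b - a for a, b in zip(p, nxt)) in OFFSET_SET:
            out.append(p)
    return out


def tabu_diagonal_search(seq, coords, steps=50, tenure=3):
    """Generic tabu search: apply the best non-tabu diagonal move each step, even if
    it raises the energy, then forbid moving that residue for `tenure` steps."""
    current = list(coords)
    best, best_e = list(current), hp_energy(seq, current)
    tabu_until = {}  # residue index -> last step at which it is still tabu
    for step in range(steps):
        options = []
        for i in range(1, len(current) - 1):  # end residues are not moved here
            if tabu_until.get(i, -1) >= step:
                continue
            for target in diagonal_targets(current, i):
                trial = current[:i] + [target] + current[i + 1:]
                options.append((hp_energy(seq, trial), i, trial))
        if not options:
            break
        energy, i, current = min(options, key=lambda o: o[0])
        tabu_until[i] = step + tenure
        if energy < best_e:  # remember the best conformation seen so far
            best, best_e = list(current), energy
    return best, best_e


# A hypothetical zig-zag start in the xy-plane.
start = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0), (5, 1, 0)]
best, best_e = tabu_diagonal_search("HPHPHH", start)
```

Accepting the best non-tabu move even when it worsens the energy is what lets the walk leave local minima; the tabu list stops it from immediately undoing that move.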
Algorithm 2: Pseudocode of H-move selection, selectMoveForH().
Algorithm 3: evaluate(AA).
Algorithm 4: initialise().
Combinations of SS-Tabu variants across the four threads.
| Combination | HP-guided SS-Tabu | BM-guided SS-Tabu |
|---|---|---|
| 1 (PSSB4H0) | 0 threads | 4 threads |
| 2 (PSSB3H1) | 1 thread | 3 threads |
| 3 (PSSB2H2) | 2 threads | 2 threads |
| 4 (PSSB1H3) | 3 threads | 1 thread |
| 5 (PSSB0H4) | 4 threads | 0 threads |
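Reading each variant name as PSSB{number of BM-guided threads}H{number of HP-guided threads}, as the table indicates, the five configurations can be generated programmatically. This is a small illustrative sketch; `thread_plans` is a hypothetical helper, not part of the original framework.

```python
def thread_plans(n_threads=4):
    """Map each PSSB{b}H{h} label to the energy model guiding each thread."""
    plans = {}
    for hp_threads in range(n_threads + 1):
        bm_threads = n_threads - hp_threads
        label = f"PSSB{bm_threads}H{hp_threads}"
        plans[label] = ["BM"] * bm_threads + ["HP"] * hp_threads
    return plans


for label, guides in thread_plans().items():
    print(label, guides)
```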
Figure 4: Parallel spiral search framework.
Algorithm 6: SSParallel(time, repeat).
Results for 9 medium-sized proteins from three approaches: (i) our parallel local search framework (PSS), (ii) the tabu-guided spiral search (SS-Tabu), and (iii) the genetic algorithm (GA+). The RI columns give the relative improvement of the parallel local search over the single-threaded local search and over the genetic algorithm, calculated on the average energy values.
Time budget: PSS (our approach) uses 4 threads × 0.5 h = 2 h; SS-Tabu and GA+ (the current state-of-the-art approaches) each use 1 thread × 2 h = 2 h.
| Seq | Size | LBFE | PSS Best | PSS Avg | SS-Tabu [9] Best | SS-Tabu [9] Avg | RI vs SS-Tabu | GA+ [8] Best | GA+ [8] Avg | RI vs GA+ |
|---|---|---|---|---|---|---|---|---|---|---|
| F90_1 | 90 | −168 | | −166 | −168 | | | −168 | −166 | |
| F90_2 | 90 | −168 | | | −167 | −164 | | −168 | −165 | |
| F90_3 | 90 | −167 | | | −167 | −165 | 0% | −167 | −164 | |
| F90_4 | 90 | −168 | | | −168 | −165 | 33% | −168 | −165 | |
| F90_5 | 90 | −167 | | −165 | −167 | −165 | 0% | −167 | | 0% |
| S1 | 135 | −357 | | | −355 | −347 | 30% | −355 | −348 | 22% |
| S2 | 151 | −360 | | | −354 | −347 | 31% | −356 | −349 | 18% |
| S3 | 162 | −367 | | | −359 | −350 | 26% | −361 | −349 | 28% |
| S4 | 164 | −370 | | | −358 | −350 | 40% | −364 | −352 | |
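The captions state that RI is calculated on average energies but do not give a formula. One common way to express improvement for a minimisation problem with a known lower bound is the percentage of the remaining gap to the lower bound (LBFE) that the new method closes. The sketch below assumes that definition; `relative_improvement` is a hypothetical helper and the example numbers are made up.

```python
def relative_improvement(ours_avg, other_avg, lbfe):
    """Percentage of the gap between other_avg and the lower bound closed by ours_avg.

    Energies are negative, so a more negative average is better. This formula is an
    assumed reading of "relative improvement"; the caption does not state it.
    """
    gap = lbfe - other_avg
    if gap == 0:
        raise ValueError("the other approach already reaches the lower bound")
    return 100.0 * (ours_avg - other_avg) / gap


# Hypothetical averages: ours -90, other -80, lower bound -100.
print(relative_improvement(-90, -80, -100))  # 50.0
```

Normalising by the gap to the lower bound, rather than by the raw energy, keeps the percentage comparable across proteins whose energies differ in scale.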
Results for 12 large proteins from three approaches: (i) our parallel local search framework (PSS), (ii) the tabu-guided spiral search (SS-Tabu), and (iii) the genetic algorithm (GA+). The RI columns give the relative improvement of the parallel local search over the single-threaded local search and over the genetic algorithm, calculated on the average energy values.
Time budget: PSS (our approach) uses 4 threads × 1.25 h = 5 h; SS-Tabu and GA+ (the current state-of-the-art approaches) each use 1 thread × 5 h = 5 h.
| Seq | Size | LBFE | PSS Best | PSS Avg | SS-Tabu [9] Best | SS-Tabu [9] Avg | RI vs SS-Tabu | GA+ [8] Best | GA+ [8] Avg | RI vs GA+ |
|---|---|---|---|---|---|---|---|---|---|---|
| F180_1 | 180 | −378 | | | −357 | −340 | 11% | −351 | −341 | 8% |
| F180_2 | 180 | −381 | | | −359 | −345 | 19% | −362 | −346 | 17% |
| F180_3 | 180 | −378 | | | −362 | −353 | 12% | −361 | −350 | 21% |
| R1 | 200 | −384 | | | −359 | −345 | 21% | −355 | −346 | 18% |
| R2 | 200 | −383 | | | −358 | −346 | 24% | −360 | −346 | |
| R3 | 200 | −385 | | | −365 | −345 | 20% | −363 | −344 | 22% |
| 3mse | 179 | −323 | | | −289 | −280 | 12% | −290 | −279 | 14% |
| 3mr7 | 189 | −355 | | | −328 | −313 | 14% | −328 | −316 | 8% |
| 3mqz | 215 | −474 | | | −420 | −402 | 17% | −427 | −410 | 6% |
| 3no6 | 229 | −455 | | | −411 | −391 | | −420 | −400 | 13% |
| 3no3 | 258 | −494 | | | −412 | −393 | | −421 | −402 | |
| 3on7 | 279 | n/a | | | −512 | −485 | n/a | −515 | −485 | n/a |
Figure 5: Search progress for protein R1 plotted against (a) real time and (b) the CPU time of 4 threads (4× real time). SST, GA+, and SSP denote the tabu-based spiral search [9], the genetic algorithm [8], and the multipoint parallel spiral search, respectively.
The benchmark proteins used in our experiments.
| ID | Length | Sequence |
|---|---|---|
| | 54 | |
| | 54 | |
| | 58 | |
| | 61 | |
| | 64 | |
| | 69 | |
| | 74 | |
| | 90 | |
| | 108 | |
| | 120 | |
| | 142 | |
| | 160 | |
Best and average contact energies obtained by 8 different approaches using the 20 × 20 energy matrix of Berrera et al. [25]. Row-wise bold values mark the winners for each protein among the spiral search variants (both single-threaded and parallel frameworks), and bold-italic values mark the winners among all 8 approaches. For both energy and RMSD values, lower is better.
| Comparing all-atomic interaction energy and RMSD values | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| State-of-the-art | Spiral search | Parallel spiral search (PSS) variants on energy model mixing | State-of-the-art | |||||||||||||||
| CPU time 1 hr | CPU time 1 hr | CPU time 1 hr (4 threads × 15 minutes) | CPU time 1 hr | |||||||||||||||
| Protein details | 1 × br-thread | 1 × br-thread | 4 × br-threads | 3 × br-threads | 2 × br-threads | 1 × br-thread | 0 × br-thread | br and hp based | ||||||||||
| 0 × hp-thread | 0 × hp-thread | 0 × hp-thread | 1 × hp-thread | 2 × hp-threads | 3 × hp-threads | 4 × hp-threads | single threaded | |||||||||||
| LS-Tabu [ | SS-Tabu | PSSB4H0 | PSSB3H1 | PSSB2H2 | PSSB1H3 | PSSB0H4 | GA+ [ | |||||||||||
| Seq. | Size | H | Energy | RMSD | Energy | RMSD | Energy | RMSD | Energy | RMSD | Energy | RMSD | Energy | RMSD | Energy | RMSD | Energy | RMSD |
|
| 54 | 27 |
|
| −150.11 | 6.00 | −142.22 | 5.23 | −154.47 | 5.21 | −156.94 |
| −157 | 5.19 |
| 5.17 |
|
|
|
| 54 | 19 |
|
| −143.01 | 5.88 | −129.23 | 5.11 | −146.88 | 5.09 | −147.76 | 5.02 |
| 4.93 | −148.39 |
|
|
|
|
| 58 | 32 |
|
| −190.77 | 6.99 | −175.52 | 6.41 | −196.05 | 6.27 | −197.33 | 6.38 |
| 6.38 | −198.3 |
|
|
|
|
| 61 | 25 |
|
| −163.87 | 8.50 | −151.09 |
| −171.74 |
| −172.83 | 7.28 | −173.89 | 7.33 |
| 7.43 |
|
|
|
| 64 | 38 |
|
| −236.10 | 6.86 | −214.60 | 6.00 | −245.19 | 6.05 | −248.43 | 5.93 | −247.35 |
|
| 5.86 |
|
|
|
| 69 | 30 |
|
| −191.14 | 5.65 | −175.61 | 5.14 | −203.22 | 4.92 | −204.81 | 4.85 | −205.88 | 4.88 |
|
|
|
|
|
| 74 | 42 |
|
| −197.85 | 5.63 | −179.18 | 5.23 | −218.38 | 5.21 | −220.1 | 5.06 |
|
| −221.67 | 5.06 |
|
|
|
| ||||||||||||||||||
|
| 90 | 44 |
|
| −300.89 | 8.62 | −257.49 | 7.87 | −321.94 | 8.00 | −324.09 | 7.8 |
| 7.70 | −325.55 |
|
|
|
|
| 108 | 56 |
|
| −380.12 | 6.95 | −329.7 | 6.88 | −409.5 | 6.12 | −406.74 | 6.06 |
| 6.06 | −411.18 |
|
|
|
|
| 120 | 68 |
|
| −422.4 | 7.52 | −336.74 | 7.39 | −461.38 | 6.98 | −465.02 | 6.83 |
| 6.77 | −467.38 |
|
|
|
|
| 142 | 63 |
|
| −397.14 | 9.61 | −313.85 | 8.74 | −445.23 | 8.11 | −450.68 | 7.93 | −448.59 | 7.85 |
|
|
|
|
|
| 160 | 84 |
|
| −502.29 | 9.55 | −383.49 | 9.05 | −586.68 | 8.73 | −593.85 |
| −595.99 | 8.45 | − | 8.39 |
|
|
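The results report RMSD alongside energy. The text here does not specify how RMSD is computed (which atoms are compared or how lattice units are scaled to ångströms), so the following is a sketch under the common assumption of coordinate RMSD over matched positions after optimal rigid superposition with the Kabsch algorithm. `kabsch_rmsd` is a hypothetical name, and any lattice-to-ångström scaling is left to the caller.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q  # rotate each row of P by R, then compare
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# A random "model" and a rotated, translated copy standing in for a native structure.
rng = np.random.default_rng(0)
model = rng.normal(size=(10, 3))
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
native = model @ Rz.T + np.array([1.0, -2.0, 0.5])
print(kabsch_rmsd(model, native) < 1e-9)  # True
```

Superposing before measuring means RMSD reflects only shape differences, not the arbitrary position and orientation of a predicted lattice model.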