Swakkhar Shatabda1, M A Hakim Newton2, Mahmood A Rashid1, Duc Nghia Pham2, Abdul Sattar1.
Abstract
Protein structure prediction (PSP) has been one of the most challenging problems in computational biology for several decades. The challenge is largely due to the complexity of the all-atom details and the unknown nature of the energy function. Researchers have therefore used simplified energy models that consider interaction potentials only between amino acid monomers in contact on discrete lattices. The restricted nature of the lattices and the energy models raises two concerns about the assessment of the models. Can a native or a very close structure be obtained when structures are mapped to lattices? Can contact-based energy models on discrete lattices guide the search towards the native structures? In this paper, we use the protein chain lattice fitting (PCLF) problem to address the first concern: we developed a constraint-based local search algorithm for the PCLF problem on cubic and face-centered cubic (FCC) lattices and found very close lattice fits for the native structures. For the second concern, we use a number of techniques to sample the conformation space and find correlations between the energy functions and the root mean square deviation (RMSD) of the lattice-based structures from the native structures. Our analysis reveals weaknesses in several contact-based energy models that are popular in PSP.
Year: 2014 PMID: 24876837 PMCID: PMC4022063 DOI: 10.1155/2014/867179
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1: Different 3D lattices: (a) cubic and (b) FCC lattice.
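To make the two lattices concrete, the following is a minimal sketch of their neighbour (basis) vectors. It assumes the common integer embedding in which cubic neighbours differ in exactly one coordinate and FCC neighbours differ in exactly two coordinates by ±1; the names `CUBIC_BASIS`, `FCC_BASIS`, and `is_self_avoiding_walk` are illustrative and do not come from the paper.

```python
from itertools import product

# Simple cubic lattice: 6 neighbours, exactly one nonzero coordinate (+/-1).
CUBIC_BASIS = [v for v in product((-1, 0, 1), repeat=3)
               if sum(abs(c) for c in v) == 1]

# FCC lattice (assumed integer embedding): 12 neighbours,
# exactly two nonzero coordinates, each +/-1.
FCC_BASIS = [v for v in product((-1, 0, 1), repeat=3)
             if sum(abs(c) for c in v) == 2]


def is_self_avoiding_walk(points, basis):
    """Return True if consecutive points differ by a basis vector
    and no lattice point is visited twice."""
    basis_set = set(basis)
    steps_ok = all(
        tuple(b - a for a, b in zip(p, q)) in basis_set
        for p, q in zip(points, points[1:])
    )
    return steps_ok and len(set(points)) == len(points)
```

A chain such as `[(0, 0, 0), (1, 1, 0)]` is a valid single step on the FCC lattice under this embedding but not on the cubic lattice, which is one way to see why the FCC lattice offers more freedom for fitting a chain.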
Algorithm 2: selectDirection(position i).
Algorithm 1: chainGrowthInitialize().
Average distance root mean square deviation (dRMSD) values achieved for the benchmark protein sequences in five runs by different algorithms in the literature, and the percentage improvements produced by our approach over the other approaches.
| Lattice type | Park and Levitt: Avg. | Park and Levitt: Imp. | Mann et al.: Avg. | Mann et al.: Imp. | Our approach: Initial | Our approach: Final |
|---|---|---|---|---|---|---|
| Cubic | 2.34 | 20.08% | 2.08 | 10.09% | 2.86 | |
| FCC | 1.46 | 15.75% | 1.34 | 8.21% | 2.03 | |
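The quantities in this table can be illustrated with a short sketch. It assumes the standard definition of dRMSD (the root mean square difference between corresponding pairwise distances of two structures) and assumes that the improvement is the relative reduction with respect to the other approach's average; the function names are hypothetical, and any scaling of lattice coordinates to real bond lengths is left out.

```python
import math
from itertools import combinations


def drmsd(a, b):
    """Distance RMSD between two equal-length lists of 3D points.

    Compares all pairwise intra-structure distances, so no superposition
    of the two structures is needed.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("structures must have equal length >= 2")
    pairs = list(combinations(range(len(a)), 2))
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(total / len(pairs))


def percent_improvement(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent
    (assumed reading of the 'Imp.' columns)."""
    return 100.0 * (baseline - ours) / baseline
```

Because dRMSD depends only on internal distances, translating or rotating one of the structures leaves the value unchanged.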
Algorithm 3: PCLFSearch().
Proportion of protein sequences (%) that fall in different ranges of the correlation coefficient between the energy function value and the distance root mean square deviation (dRMSD), for three sampling techniques: PCLF search, energy-function-guided search, and random walk.
| Energy function | Cubic: >0 (%) | Cubic: >0.5 (%) | Cubic: ≤0 (%) | Cubic: <−0.5 (%) | FCC: >0 (%) | FCC: >0.5 (%) | FCC: ≤0 (%) | FCC: <−0.5 (%) |
|---|---|---|---|---|---|---|---|---|
| Sampling by using PCLF search | | | | | | | | |
| | 11.56 | 2.02 | 88.43 | 58.67 | 5.73 | 0.28 | 94.26 | 74.21 |
| | 70.20 | 24.35 | 29.79 | 6.59 | 73.72 | 28.85 | 26.28 | 6 |
| | 9.16 | 0.57 | 90.83 | 66.47 | 3.71 | 0 | 96.28 | 80 |
| Sampling by using guided search | | | | | | | | |
| | 9.88 | 5.14 | | | 3.69 | 0.86 | | |
| | 31.85 | 16.38 | | | 19.31 | 5.42 | | |
| | 2.09 | 1.43 | | | 0.86 | 0.22 | | |
| Sampling by using random walk | | | | | | | | |
| | 86.22 | 1.04 | 13.78 | 0.13 | 87.54 | 1.23 | 2.36 | 0.11 |
| | 31.76 | 0.13 | 68.24 | 0 | 54.25 | 1.18 | 46.75 | 0.24 |
| | 85.30 | 2.62 | 14.70 | 0.26 | 81.28 | 1.28 | 18.72 | 1.54 |
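The kind of summary reported in this table can be sketched as follows, assuming a Pearson correlation coefficient is computed per protein between the energy values and the dRMSD values of its sampled structures, and that the coefficients are then counted in the four (overlapping) ranges. The paper's exact correlation measure is not stated in this record, and the function names below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def range_percentages(coeffs):
    """Percentage of coefficients in each range used by the table.

    The ranges overlap: '>0.5' is a subset of '>0' and
    '<-0.5' is a subset of '<=0'.
    """
    n = len(coeffs)
    return {
        ">0": 100.0 * sum(r > 0 for r in coeffs) / n,
        ">0.5": 100.0 * sum(r > 0.5 for r in coeffs) / n,
        "<=0": 100.0 * sum(r <= 0 for r in coeffs) / n,
        "<-0.5": 100.0 * sum(r < -0.5 for r in coeffs) / n,
    }
```

Under this reading, the `>0` and `<=0` entries of one row are complementary and should add up to 100% for each lattice.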
Figure 2: Plot of the values of three different energy functions for the structures generated for 1A6M against their distance root mean square deviation (dRMSD) values found by the guided sampling algorithms.
Figure 3: Scatter plot of the minimum dRMSD values found by each of the sampling algorithms against the minimum dRMSD value found by random walk.