Jyh-Jong Tsay, Shih-Chieh Su.
Abstract
BACKGROUND: Proteins are essential biological molecules that play vital roles in nearly all biological processes. A protein's tertiary structure determines its function, so predicting the tertiary structure from the primary amino acid sequence has long been among the most important and challenging problems in biochemistry, molecular biology and biophysics. The HP lattice model is one of the ab initio approaches that many researchers have used to predict protein structure. Although such simplified models cannot achieve high resolution, they yield a coarse-grained, globally optimized structure. The model has been employed to investigate general principles of protein folding and plays an important role in protein structure prediction.
Year: 2013 PMID: 24565217 PMCID: PMC3908773 DOI: 10.1186/1477-5956-11-S1-S19
Source DB: PubMed Journal: Proteome Sci ISSN: 1477-5956 Impact factor: 2.480
Figure 1. A ground-state conformation in the 3D FCC HP model. An example of the HP lattice model: (a) the native state of the protein with PDB ID 1CNL; (b) an optimal HP conformation for 1CNL on the 3D FCC lattice with 7 HH contacts, denoted by dashed blue lines.
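The HH contacts mentioned in this caption can be counted mechanically once a conformation is given as lattice coordinates. The following is a minimal sketch, assuming the common integer embedding of the FCC lattice in which two points are neighbours when their absolute coordinate differences are a permutation of (0, 1, 1). The names `is_fcc_neighbour` and `hh_contacts` are illustrative and do not come from the paper; in the usual HP convention the energy is the negative of the contact count.

```python
from itertools import combinations


def is_fcc_neighbour(a, b):
    """True if lattice points a and b are nearest neighbours on the FCC lattice.

    Assumes the integer embedding where neighbour offsets are the
    permutations of (0, +-1, +-1).
    """
    return sorted(abs(x - y) for x, y in zip(a, b)) == [0, 1, 1]


def hh_contacts(sequence, coords):
    """Count H-H pairs that are lattice neighbours but not adjacent in the chain."""
    coords = [tuple(c) for c in coords]
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if not is_fcc_neighbour(a, b):
            raise ValueError("consecutive residues must be lattice neighbours")

    contacts = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i > 1 and sequence[i] == "H" and sequence[j] == "H":
            if is_fcc_neighbour(coords[i], coords[j]):
                contacts += 1
    return contacts


# Toy example (not from the paper): three residues folded into a triangle.
toy_coords = [(0, 0, 0), (1, 1, 0), (1, 0, 1)]
print(hh_contacts("HPH", toy_coords))  # prints 1
```

Under this convention the toy conformation has energy -1; the conformation in panel (b) would have energy -7.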
Figure 2. The FCC lattice model: each lattice point has 12 neighbours.
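The 12 neighbours can be enumerated directly. This is a minimal sketch, assuming the standard integer coordinates for the FCC lattice (points whose coordinates sum to an even number), where the neighbour offsets are the vectors with exactly two entries equal to ±1 and one entry equal to 0. The function name `fcc_neighbours` is illustrative.

```python
from itertools import product


def fcc_neighbours():
    """Return the 12 nearest-neighbour offsets of an FCC lattice point.

    Assumed embedding: offsets with two coordinates equal to +-1 and one
    coordinate equal to 0.
    """
    return [v for v in product((-1, 0, 1), repeat=3)
            if sum(abs(c) for c in v) == 2]


NEIGHBOURS = fcc_neighbours()
print(len(NEIGHBOURS))  # prints 12
```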
Figure 3. Main steps of the proposed EA-based approach.
Figure 4. Square-Based Rotation: 3 ways to partition the 12 neighbours of the central point into 3 squares.
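One way to obtain three such partitions is to group the neighbours by their coordinate along each of the three axes. The sketch below assumes the integer embedding with offsets of the form (0, ±1, ±1) and its permutations; whether this matches the exact labelling in the figure is an assumption. The names `square_partition` and `rotate90` are illustrative.

```python
from itertools import product

# Assumed integer embedding of the 12 FCC neighbour offsets.
NEIGHBOURS = [v for v in product((-1, 0, 1), repeat=3)
              if sum(abs(c) for c in v) == 2]


def square_partition(axis):
    """Group the 12 neighbours by their coordinate along `axis` (0, 1 or 2).

    Each of the three groups (coordinate -1, 0, +1) holds 4 vectors that
    form a square perpendicular to the axis.
    """
    groups = {-1: [], 0: [], 1: []}
    for v in NEIGHBOURS:
        groups[v[axis]].append(v)
    return groups


def rotate90(v, axis):
    """Rotate v by 90 degrees about the given coordinate axis."""
    x, y, z = v
    if axis == 0:
        return (x, -z, y)
    if axis == 1:
        return (z, y, -x)
    return (-y, x, z)


for axis in range(3):
    sizes = [len(g) for g in square_partition(axis).values()]
    print(axis, sizes)  # each axis gives [4, 4, 4]
```

A 90° rotation about the chosen axis maps every square onto itself, which is what allows rotations by 90°, 180° and 270° to be expressed as permutations of neighbour labels.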
Figure 5. Triangle-Hexagon-Based Rotations: 4 ways to partition the 12 neighbours of the central point into two triangles and one hexagon.
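Four partitions of this shape arise from the four body diagonals of the cube. The sketch below assumes the integer embedding with offsets (0, ±1, ±1) and its permutations, and groups neighbours by their dot product with a diagonal; the correspondence to the figure's labelling is an assumption, and the names `DIAGONALS`, `triangle_hexagon_partition` and `rotate120` are illustrative.

```python
from itertools import product

# Assumed integer embedding of the 12 FCC neighbour offsets.
NEIGHBOURS = [v for v in product((-1, 0, 1), repeat=3)
              if sum(abs(c) for c in v) == 2]

# The four body diagonals (up to sign) of the unit cube.
DIAGONALS = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]


def triangle_hexagon_partition(diagonal):
    """Group neighbours by dot product with a body diagonal.

    Dot product +2 and -2 each select 3 vectors (the two triangles);
    dot product 0 selects 6 vectors (the hexagon).
    """
    groups = {-2: [], 0: [], 2: []}
    for v in NEIGHBOURS:
        dot = sum(a * b for a, b in zip(v, diagonal))
        groups[dot].append(v)
    return groups


def rotate120(v):
    """Rotate v by 120 degrees about the (1, 1, 1) diagonal (cyclic shift)."""
    x, y, z = v
    return (z, x, y)


for d in DIAGONALS:
    g = triangle_hexagon_partition(d)
    print(d, len(g[2]), len(g[0]), len(g[-2]))  # 3 6 3 for every diagonal
```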
Figure 6. Rotation-based crossover operator. Illustration of rotation-based crossover: (a) and (b) are the parent conformations; (c) is the offspring without rotation; (d) shows the 3 squares in one partitioning of the neighbours; (e) shows the corresponding label permutations for rotation angles of 90°, 180° and 270°; (f) shows the 4 offspring with the part in red rotated by 0°, 90°, 180° and 270°; (g), (i) and (j) illustrate triangle-hexagon-based rotation.
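The caption suggests that the offspring combine a prefix of one parent with a rotated suffix of the other. The following is a rough sketch of that idea for square-based rotations only, under several assumptions that are not stated in the source: conformations are encoded as lists of absolute FCC direction vectors, the rotation is about the z-axis, and offspring that collide with themselves are discarded. All function names are hypothetical.

```python
def rotate_z(v, quarter_turns):
    """Rotate direction v about the z-axis by quarter_turns * 90 degrees."""
    x, y, z = v
    for _ in range(quarter_turns % 4):
        x, y = -y, x
    return (x, y, z)


def walk(directions):
    """Convert a list of direction vectors into lattice coordinates."""
    pos = (0, 0, 0)
    coords = [pos]
    for d in directions:
        pos = tuple(p + q for p, q in zip(pos, d))
        coords.append(pos)
    return coords


def is_self_avoiding(directions):
    """True if the chain never visits the same lattice point twice."""
    coords = walk(directions)
    return len(set(coords)) == len(coords)


def rotation_crossover(parent_a, parent_b, cut):
    """Return the self-avoiding offspring for rotations of 0, 90, 180, 270 degrees.

    The offspring keeps parent_a's directions before `cut` and takes
    parent_b's directions from `cut` onward, rotated about the z-axis.
    """
    offspring = []
    for k in range(4):
        child = list(parent_a[:cut]) + [rotate_z(d, k) for d in parent_b[cut:]]
        if is_self_avoiding(child):
            offspring.append(child)
    return offspring


# Toy parents (not from the paper): two straight chains of 4 residues.
a = [(1, 1, 0)] * 3
b = [(1, 0, 1)] * 3
print(len(rotation_crossover(a, b, cut=1)))  # prints 4
```

Trying several rotations gives the operator more chances of producing a valid, self-avoiding offspring than a plain one-point crossover, which corresponds only to the 0° case.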
Figure 7. Generalized Pull Move on the 3D FCC lattice: (a) shows the result obtained by the traditional Pull Move; (b) to (e) show the 4 possible results obtained by the Generalized Pull Move.
Data Set I: a group of eight HP sequences with 20-64 amino acids.
| Seq. | Length | HP sequence |
|---|---|---|
| S1 | 20 | HPHPPHHPHPPHPHHPPHPH |
| S2 | 24 | HHPPHPPHPPHPPHPPHPPHPPHH |
| S3 | 25 | PPHPPHHPPPPHHPPPPHHPPPPHH |
| S4 | 36 | PPPHHPPHHPPPPPHHHHHHHPPHHPPPPHHPPHPP |
| S5 | 48 | PPHPHHHPHHHPPPPPHHHHHHHHHHPPPPPPHHPPHHPPHPPHHHHH |
| S6 | 50 | HHPHPHPHPHHHHPHPPPHPPPHPPPPHPPPHPPPHPHHHHPHPHPHPHH |
| S7 | 60 | PPHHHPHHHHHHHHPPPHHHHHHHHHHPHPPPHHHHHHHHHHHHPPPPHHHHHHPHHPHH |
| S8 | 64 | HHHHHHHHHHHHPHPHPPHHPPHHPPHPPHHPPHHPPHPPHHPPHHPPHPHPHHHHHHHHHHHH |
Data Set II
| Seq. | Length | HP sequence |
|---|---|---|
| H1 | 48 | HPHHPPHHHHPHHHPPHHPPHPHHHPHPHHPPHHPPPHPPPPPPPPHH |
| H2 | 48 | HHHHPHHPHHHHHPPHPPHHPPHPPPPPPHPPHPPPHPPHHPPHHHPH |
| H3 | 48 | PHPHHPHHHHHHPPHPHPPHPHHPHPHPPPHPPHHPPHHPPHPHPPHP |
| H4 | 48 | PHPHHPPHPHHHPPHHPHHPPPHHHHHPPHPHHPHPHPPPPHPPHPHP |
| H5 | 48 | PPHPPPHPHHHHPPHHHHPHHPHHHPPHPHPHPPHPPPPPPHHPHHPH |
| H6 | 48 | HHHPPPHHPHPHHPHHPHHPHPPPPPPPHPHPPHPPPHPPHHHHHHPH |
| H7 | 48 | PHPPPPHPHHHPHPHHHHPHHPHHPPPHPHPPPHHHPPHHPPHHPPPH |
| H8 | 48 | PHHPHHHPHHHHPPHHHPPPPPPHPHHPPHHPHPPPHHPHPHPHHPPP |
| H9 | 48 | PHPHPPPPHPHPHPPHPHHHHHHPPHHHPHPPHPHHPPHPHHHPPPPH |
| H10 | 48 | PHHPPPPPPHHPPPHHHPHPPHPHHPPHPPHPPHHPPHHHHHHHPPHH |
Data Set III
| Seq. | Length | HP sequence |
|---|---|---|
| F90_1 | 90 | PPHHHPPPHHPPPPHHPHHHHHHPHPHPHHPHHHHHPHHHPHPHHHHP |
| F90_2 | 90 | PHHPPHPHHPHHHPHHHPPHHHHHHPPHPHPPPPHHHPHPPHHHHPHH |
| F90_3 | 90 | HPHPHHHPHHHHPHHHPPPHPPPHPPPPHHHPPHPPPPHHHPPPPPPPPHP |
| F90_4 | 90 | PHHHPPHPPHPHPPPPHPPPHPHPPHPHHPHPPPHHHPHHHPPHHHPPHPP |
| F90_5 | 90 | PPPHPHHHHHHHPPPHPPHHHHHPHHPPHHPPHHHHPHPHPHHPPHHPPP |
| S1 | 135 | HHHHPHHHHHHPPHHPHHHHHHHHPHHPHHHHHHHHHHPPHHPPPPPH |
| S2 | 151 | HHPPHPHHHHHHHHHHPHPPPPHHHPPPHHHHHPPHHHHHPPHHHHPPH |
| S3 | 162 | HHHPPPHHPHHPPPPPHHHHHHHHPHPPHHPHHPHHHHHPPPHHHHHHH |
| S4 | 164 | HHPPHPHHHHHHHPPHPHPPHPHPPPPHHHPPPHHPHPHHPPHHHHHPPH |
| R1 | 200 | PPPHPHHPHHPPPHPHPPPPHPHHPPHPHHHHHPPHHPPHHHHHHPPHPPH |
| R2 | 200 | HPHHPPHPPPPPHHPHPHPHHPPHPPPPHHHHHHPPPHPPHHHPPHPPPPHH |
| R3 | 200 | HPHHHPHHPHPHPPPHHHHHPHPHPHHHHPPPHHPPPPPPHHPPPPHPHHH |
| F180_1 | 180 | HHPPHHHHHPHHHPPPHHHPPHHHPHPPHHHHHPPPHHHPPPHPHHPPPP |
| F180_2 | 180 | PHHPHPPPHPPHHPHHHPHPHHPHHHPHHHPPPHHPPHPHPHHPHHHHP |
| F180_3 | 180 | HHHPHPPHHPPPHPPPHPHPHPPHHHHPPHHHHHHPHPHHPPPPPHPPHH |
Data Set IV: the amino acid sequences and the corresponding HP sequences.
| Seq. | Length | Sequence (original, then HP transform) |
|---|---|---|
| 4BP2 | 123 | ALWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPVDDLDRCCQTH |
| | | PHHPHPPHHPHPHPPPPPHHPHPPHPHHHPHPPPPPPHPPHPPHHPPPPPH |
| 2AAS | 124 | KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVH |
| | | PPPPPPPHPPPPHPPPPPPPPPPPHHPPHHPPPPHPPPPHPPHPPHHPPPHPP |
| 5LYZ | 129 | KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNT |
| | | PHHPPHPHPPPHPPPPHPPHPPHPHPPHHHPPPHPPPHPPPPPPPPPPPPPPH |
| 9WGA | 170 | RCGEQGSNMECPNNLCCSQYGYCGMGGDYCGKGCQNGACWTSKRCGS |
| | | PHPPPPPPHPHPPPHHHPPHPHHPHPPPHHPPPHPPPPHHPPPPHPPPPPPPP |
| 1RBP | 174 | ERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDE |
| | | PPPHPHPPHPHPPPHPPPPHPPPHHPHPPPPPPPHHHPPPHHPPHPHPPPPPHP |
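The HP transform in this table can be reproduced with a simple residue lookup. The sketch below is inferred only from the sequence prefixes visible in the rows above, where C, F, I, L, M, V, W and Y map to H and every other residue maps to P; the source does not state the classification explicitly, so the set `HYDROPHOBIC` and the function name `to_hp` are assumptions.

```python
# Residues that appear as H in the rows above (inferred, not stated in the source).
HYDROPHOBIC = set("CFILMVWY")


def to_hp(amino_acids):
    """Map a one-letter amino acid sequence to its HP string."""
    return "".join("H" if aa in HYDROPHOBIC else "P"
                   for aa in amino_acids.upper())


# First 20 residues of 4BP2 as listed in the table.
print(to_hp("ALWQFNGMIKCKIPSSEPLL"))  # prints PHHPHPPHHPHPHPPPPPHH
```

Note that alanine maps to P under this inferred set, as the first residue of 4BP2 shows.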
Results for Data Set I and comparison with ETS, HGA, and MA.
| Native E (HPstruct) | Len | Our Method | ||||
|---|---|---|---|---|---|---|
| S1 | 20 | 29 | ||||
| S2 | 24 | 28 | ||||
| S3 | 25 | 25 | ||||
| S4 | 36 | 50 | ||||
| S5 | 48 | 65 | 72 (68.50) | |||
| S6 | 50 | - | 59 | 69 (62.73) | ||
| S7 | 60 | 114 | 122 (115.87) | |||
| S8 | 64 | 98 | 115 (107.00) |
Results for Data Set II and comparison with LNS-based approaches [28].
| Native E (HPstruct) | Len | LS | LS-G | LS-2N | LS-2N-G | LNS-MULT | LNS-3D | Our Method | |
|---|---|---|---|---|---|---|---|---|---|
| H1 | 48 | 65 (57.50) | 51 (47.17) | 68 (64.70) | 68 (64.61) | ||||
| H2 | 48 | 64 (56.59) | 55 (46.79) | 68 (62.51) | |||||
| H3 | 48 | 66 (56.69) | 58 (54.38) | 68 (62.08) | 67 (62.51) | 71 (68.06) | |||
| H4 | 48 | 65 (58.08) | 56 (49.26) | 67 (63.15) | 68 (63.10) | ||||
| H5 | 48 | 64 (57.01) | 57 (42.95) | 67 (63.38) | 68 (63.79) | ||||
| H6 | 48 | 63 (56.52) | 40 (34.35) | 69 (63.38) | 68 (64.91) | ||||
| H7 | 48 | 63 (58.15) | 49 (41.10) | 68 (63.36) | 67 (63.75) | 69 (66.68) | |||
| H8 | 48 | 63 (55.31) | 54 (50.27) | 67 (62.20) | 66 (62.56) | ||||
| H9 | 48 | 67 (58.91) | 54 (46.77) | 69 (64.90) | 69 (64.40) | ||||
| H10 | 48 | 64 (57.47) | 45 (30.03) | 67 (63.96) | 67 (63.61) | 68 (65.67) |
Results for Data Set III and comparison with LNS-based approaches [28].
| Native E (HPstruct) | Len | LS | LS-G | LS-2N | LS-2N-G | LNS-MULT | LNS-3D | Our Method | |
|---|---|---|---|---|---|---|---|---|---|
| F90_1 | 90 | 143 (125.75) | 104 (102.97) | 154 (142.25) | 153 (142.77) | 164 (156.83) | 161 (151.77) | ||
| F90_2 | 90 | 142 (123.68) | 117 (112.05) | 156 (141.45) | 157 (141.89) | 163 (155.81) | 161 (153.77) | ||
| F90_3 | 90 | 138 (121.80) | 110 (101.70) | 157 (143.79) | 159 (145.24) | 163 (156.23) | 163 (157.20) | ||
| F90_4 | 90 | 144 (124.35) | 94 (92.74) | 162 (144.17) | 158 (139.26) | 163 (156.54) | 159 (152.67) | ||
| F90_5 | 90 | 138 (121.59) | 110 (107.65) | 157 (143.32) | 154 (145.00) | 163 (155.77) | 160 (152.60) | ||
| S1 | 135 | 296 (271.03) | 276 (270.99) | 343 (320.55) | 345 (323.81) | 349 (332.37) | 330 (311.53) | ||
| S2 | 151 | 304 (268.43) | 250 (244.23) | 339 (318.30) | 339 (316.60) | 349 (328.98) | 325 (303.80) | ||
| S3 | 162 | 293 (259.55) | 234 (228.71) | 332 (310.02) | 337 (306.03) | 351 (323.77) | 324 (299.33) | ||
| S4 | 164 | 294 (263.73) | 226 (222.99) | 337 (307.77) | 329 (300.92) | 346 (323.98) | 325 (300.50) | ||
| R1 | 200 | 287 (240.85) | 212 (205.58) | 292 (254.69) | 291 (264.53) | 313 (287.98) | 302 (283.90) | ||
| R2 | 200 | 290 (239.12) | 209 (205.60) | 294 (262.74) | 296 (267.75) | 331 (289.83) | 299 (284.30) | ||
| R3 | 200 | 260 (230.57) | 228 (212.12) | 305 (260.70) | 299 (267.05) | 325 (288.49) | 302 (284.60) | ||
| F180_1 | 180 | 244 (204.28) | 201 (188.06) | 261 (232.30) | 265 (240.88) | 289 (264.06) | 293 (269.07) | ||
| F180_2 | 180 | 240 (222.40) | 228 (211.07) | 279 (255.24) | 278 (254.11) | 302 (280.84) | 312 (287.21) | ||
| F180_3 | 180 | 256 (227.69) | 195 (191.91) | 292 (262.86) | 287 (261.55) | 306 (286.78) | 313 (295.31) | ||
Figure 8. Configurations for F180_1, F180_2 and F180_3 obtained by our approach.
Figure 9. Configurations for sequences in Data Set IV obtained by our approach.