Marziyeh Movahedi, Fatemeh Zare-Mirakabad, Seyed Shahriar Arab.
Abstract
BACKGROUND: Because protein function depends on structure, two challenging problems are studied: Protein Structure Prediction (PSP) and Inverse Protein Folding (IPF). Despite its essential applications, IPF has not been investigated as much as PSP. The ultimate goal of IPF, or protein design, is to create proteins with enhanced properties or even novel functions. One of the major computational challenges in protein design is the size of the sequence space: searching through all plausible sequences is impossible. Because protein secondary structure provides an appropriate primary scaffold for the protein conformation, studying the Protein Secondary Structure Inverse Folding (PSSIF) problem is an important step forward in protein design, as it can reduce the search space. In this paper, a novel genetic algorithm that uses native secondary sub-structures is proposed to solve the PSSIF problem. In essence, evolutionary information guides the algorithm to design amino acid sequences that match the target secondary structures. Furthermore, the designed sequences can be folded into tertiary structures similar to their reference 3D structures.
Keywords: Evolutionary information; Protein design; Protein structure prediction
Year: 2016 PMID: 27597167 PMCID: PMC5011913 DOI: 10.1186/s12859-016-1199-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
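To make the size of the sequence space mentioned in the abstract concrete, the following minimal Python sketch counts the sequences of a given length over the 20 standard amino acids. The helper names `sequence_space_size` and `log10_space_size` and the example lengths are illustrative assumptions, not taken from the paper.

```python
import math

def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of `length` residues (hypothetical helper)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length

def log10_space_size(length: int, alphabet_size: int = 20) -> float:
    """Base-10 logarithm of the sequence-space size, easier to read for long chains."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return length * math.log10(alphabet_size)

# Example lengths are arbitrary and chosen only for illustration.
for n in (10, 50, 100):
    print(n, f"about 10^{log10_space_size(n):.0f} sequences")
```

Even for a 100-residue protein the count exceeds 10^130, which is why exhaustive enumeration is not an option.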
Secondary structure assessment of designed sequences. The secondary structure prediction accuracies of sequences designed by GAPSSIF, EvoDesign and Evolver are estimated on five proteins. PSS-Pred, PSI-Pred and Reprof are used as secondary structure prediction algorithms.
| PDB ID_Chain | GAPSSIF Reprof% B | GAPSSIF Reprof% Ave | GAPSSIF PSS% B | GAPSSIF PSS% Ave | GAPSSIF PSI% B | GAPSSIF PSI% Ave | EvoDesign Reprof% B | EvoDesign Reprof% Ave | EvoDesign PSS% B | EvoDesign PSS% Ave | EvoDesign PSI% B | EvoDesign PSI% Ave | Evolver Reprof% B | Evolver Reprof% Ave | Evolver PSS% B | Evolver PSS% Ave |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1ZZK_A | 100 | 100 | 87.5 | 80.75 | 87.5 | 81.37 | 82 | 66.5 | 83 | 70.55 | 83 | 66.3 | 88.75 | 88.33 | 91.25 | 87.91 |
| 1XTE_A | 100 | 99.74 | 92.24 | 86.81 | 93.96 | 85.94 | 69 | 61.9 | 85 | 74.7 | 81 | 69.6 | 88.79 | 89.22 | 90.51 | 89.65 |
| 2VOU_A | 100 | 99.93 | 92.46 | 89.45 | 92.46 | 89.17 | 78 | 58.8 | 84 | 70.1 | 84 | 70.1 | 90.41 | 55.58 | 86.30 | 83.78 |
| 3I4O_A | 100 | 100 | 91.17 | 87.20 | 97.05 | 89.26 | 73 | 59.1 | 82 | 60.6 | 75 | 64 | 75 | 70.58 | 77.94 | 68.62 |
| 1R26_A | 100 | 100 | 96.15 | 91.24 | 92.30 | 87.49 | 84 | 68.2 | 95 | 77.6 | 95 | 76.5 | 91.34 | 91.02 | 93.26 | 92.30 |
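The accuracies in this table compare a predicted secondary structure against the target one. A minimal sketch of such a per-residue comparison is shown below, assuming three-state strings (H, E, C) of equal length. The function name `q3_accuracy` and the toy strings are hypothetical, and the paper's exact scoring procedure may differ.

```python
def q3_accuracy(predicted: str, target: str) -> float:
    """Percentage of positions whose secondary-structure label matches the target."""
    if len(predicted) != len(target):
        raise ValueError("structure strings must have equal length")
    if not target:
        raise ValueError("structure strings must not be empty")
    matches = sum(p == t for p, t in zip(predicted, target))
    return 100.0 * matches / len(target)

# Toy example (not data from the paper): one mismatch in ten residues.
target = "CHHHHCEEEC"
predicted = "CHHHCCEEEC"
print(q3_accuracy(predicted, target))  # 90.0
```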
Tertiary structure assessment of designed sequences. TM-Score, RMSD and Assigned SS measure the accuracy of the tertiary structures predicted by I-TASSER for the designed sequences.
| PDB ID_Chain | TM-Score B | TM-Score Ave | RMSD B | RMSD Ave | Assigned SS Q3 (%) B | Assigned SS Q3 (%) Ave |
|---|---|---|---|---|---|---|
| 1ZZK_A | 0.81 | 0.58 | 1.71 | 2.89 | 86 | 78 |
| 1XTE_A | 0.79 | 0.54 | 2.26 | 3.29 | 88 | 71 |
| 2VOU_A | 0.78 | 0.40 | 2.98 | 3.90 | 71 | 66 |
| 3I4O_A | 0.54 | 0.43 | 2.95 | 3.63 | 73 | 66 |
| 1R26_A | 0.95 | 0.82 | 0.88 | 1.71 | 81 | 75 |
| Mean | 0.77 | | 2.15 | | 79 | |
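For readers unfamiliar with the two structural measures in this table, the sketch below computes RMSD and a TM-score-style sum for two coordinate lists that are assumed to be already superimposed and paired residue by residue. A real TM-score additionally searches for the superposition that maximizes the sum, which is omitted here. The function names, the 0.5 floor on `d0`, and the toy coordinates are assumptions of this sketch, not part of the paper.

```python
import math

def rmsd(model, reference):
    """Root-mean-square deviation over paired, already superimposed coordinates."""
    if len(model) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(total / len(reference))

def tm_score_fixed(model, reference):
    """TM-score sum for one fixed superposition, normalized by the reference length."""
    if len(model) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    length = len(reference)
    if length > 15:
        # Standard length-dependent distance scale d0.
        d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Assumption of this sketch: keep d0 from becoming tiny or negative for short chains.
    d0 = max(d0, 0.5)
    terms = (1.0 / (1.0 + (math.dist(a, b) / d0) ** 2) for a, b in zip(model, reference))
    return sum(terms) / length

# Toy data (not from the paper): a straight 100-residue trace and a copy shifted by 1 unit.
reference = [(3.8 * i, 0.0, 0.0) for i in range(100)]
model = [(x + 1.0, y, z) for x, y, z in reference]
print(round(rmsd(model, reference), 2), round(tm_score_fixed(model, reference), 2))
```

Identical structures give an RMSD of 0 and a score of 1, and the score falls toward 0 as the paired distances grow.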
Statistical assessment of designed sequences.
| PDB ID_Chain | (a) E: Designed | (a) E: Reference | (a) E: Protein-like | (a) E: Bunched | (b) χ²: Designed | (b) χ²: Reference | (b) χ²: Uniprot-distributed |
|---|---|---|---|---|---|---|---|
| 1ZZK_A | 21.90 | 33.03 | 24.24 | 295.21 | 11.06 | 37.00 | 27.08 |
| 1XTE_A | 41.57 | 42.78 | 32.98 | 429.21 | 23.28 | 26.14 | 15.31 |
| 1T3Y_A | 46.15 | 40.46 | 43.67 | 652.03 | 22.07 | 21.70 | 8.88 |
| 1VQS_A | 40.54 | 44.20 | 29.42 | 356.21 | 39.51 | 29.21 | 18.77 |
| 1OH0_A | 45.88 | 29.85 | 34.79 | 438.79 | 37.73 | 25.73 | 12.06 |
| 1A2P_A | 30.72 | 27.89 | 29.88 | 309.62 | 29.50 | 20.66 | 9.93 |
| 1EW4_A | 28.98 | 35.20 | 37.23 | 347.37 | 17.05 | 36.39 | 10.05 |
| 1HZT_A | 38.60 | 41.60 | 37.11 | 604.35 | 22.24 | 19.61 | 12.79 |
| 1IDP_A | 46.38 | 37.74 | 43.54 | 634.46 | 50.10 | 36.94 | 17.01 |
| 1IUJ_A | 38.96 | 31.30 | 34.49 | 402.85 | 32.30 | 30.23 | 21.15 |
| 1MG4_A | 28.03 | 38.06 | 36.36 | 300.93 | 38.60 | 13.89 | 9.59 |
| 1NZ0_A | 35.95 | 51.58 | 48.39 | 850.27 | 30.91 | 55.66 | 8.16 |
| 1URR_A | 30.18 | 28.02 | 23.55 | 192.76 | 21.58 | 18.81 | 18.08 |
| 1VH5_A | 43.58 | 33.43 | 37.03 | 595.33 | 27.32 | 15.73 | 30.65 |
| 1VKK_A | 40.07 | 42.42 | 38.21 | 612.99 | 24.51 | 22.78 | 13.62 |
| 1WLU_A | 38.57 | 40.23 | 36.69 | 613.45 | 40.15 | 25.25 | 13.47 |
| 1X6Z_A | 32.58 | 44.50 | 35.98 | 642.98 | 27.93 | 28.63 | 29.13 |
| 1ZHV_A | 38.18 | 67.38 | 47.61 | 722.31 | 22.84 | 23.84 | 17.15 |
| 2BWF_A | 23.42 | 16.12 | 21.48 | 155.17 | 27.59 | 18.33 | 11.07 |
| 2FTR_A | 49.27 | 21.01 | 36.12 | 290.87 | 77.06 | 34.49 | 15.43 |
| 2GPI_A | 21.97 | 26.50 | 30.76 | 327.01 | 17.54 | 39.97 | 16.05 |
| 2PV2_A | 30.48 | 32.50 | 32.56 | 318.45 | 24.16 | 13.83 | 13.01 |
| 3EBT_A | 63.24 | 41.44 | 52.13 | 643.93 | 31.17 | 29.77 | 22.43 |
| 3EF8_A | 56.73 | 46.87 | 43.06 | 704.12 | 36.44 | 23.44 | 19.55 |
| 3FEA_A | 18.96 | 26.99 | 25.63 | 221.59 | 21.73 | 28.18 | 15.01 |
| 1GBS_A | 72.44 | 56.16 | 46.31 | 1135.2 | 44.03 | 31.58 | 10.53 |
| 1R26_A | 34.94 | 23.04 | 29.40 | 248.86 | 17.64 | 13.40 | 16.07 |
| 1Y25_A | 48.29 | 53.31 | 52.55 | 1134.5 | 26.18 | 26.00 | 19.10 |
| 2PTH_A | 66.33 | 77.51 | 58.86 | 1671.6 | 33.55 | 23.49 | 20.73 |
| 1ABA_A | 22.67 | 18.05 | 19.76 | 166.69 | 15.93 | 16.75 | 11.94 |
| 1DBW_A | 35.58 | 41.21 | 36.24 | 526.48 | 16.34 | 22.71 | 16.45 |
| 1I2T_A | 34.03 | 34.00 | 23.55 | 182.00 | 20.22 | 20.07 | 10.84 |
| 1JF8_A | 50.42 | 41.48 | 31.59 | 538.69 | 28.52 | 22.40 | 19.64 |
| 1KNG_A | 47.69 | 47.33 | 39.62 | 777.74 | 20.34 | 24.70 | 13.16 |
| 2CAR_A | 78.57 | 71.70 | 70.30 | 1603.3 | 31.39 | 26.90 | 13.87 |
| 1MF7_A | 64.18 | 63.88 | 71.41 | 1550.0 | 29.86 | 20.98 | 16.92 |
| 1SHU_X | 66.39 | 52.18 | 55.33 | 1426.0 | 16.93 | 19.34 | 15.08 |
| 1BKR_A | 25.63 | 30.22 | 25.95 | 308.33 | 14.10 | 29.46 | 19.09 |
| 2GMY_A | 37.04 | 34.64 | 35.70 | 604.25 | 29.84 | 16.53 | 8.61 |
| 1OAI_A | 21.36 | 14.58 | 14.37 | 76.785 | 15.64 | 23.19 | 12.30 |
| 1UTG_A | 23.59 | 22.73 | 20.83 | 142.38 | 33.41 | 18.75 | 16.26 |
| 1TQG_A | 39.34 | 35.46 | 44.32 | 372.97 | 25.16 | 22.52 | 29.04 |
| 1TUK_A | 15.96 | 21.39 | 25.78 | 190.00 | 8.20 | 73.94 | 26.47 |
| 1ZKE_A | 52.57 | 26.50 | 39.81 | 282.41 | 56.29 | 24.02 | 16.89 |
| 2J5Y_A | 19.41 | 18.76 | 23.69 | 156.59 | 23.08 | 19.26 | 22.90 |
| 2P5K_A | 15.82 | 20.15 | 15.64 | 87.718 | 12.72 | 13.78 | 31.03 |
| 1GUT_A | 20.80 | 34.68 | 33.24 | 276.11 | 24.02 | 18.78 | 11.73 |
| 2O1Q_A | 42.20 | 35.61 | 42.15 | 634.86 | 33.39 | 35.21 | 14.52 |
| 3I4O_A | 16.56 | 17.85 | 21.53 | 151.22 | 20.91 | 18.08 | 22.34 |
| 1EAQ_A | 35.55 | 36.74 | 38.15 | 525.38 | 53.59 | 18.46 | 17.31 |
| 1JB3_A | 35.90 | 38.39 | 33.32 | 444.65 | 25.70 | 21.64 | 14.12 |
| 1KMT_A | 40.62 | 50.03 | 36.15 | 598.09 | 33.67 | 15.22 | 24.35 |
| 1KQ1_A | 14.84 | 13.58 | 18.51 | 71.494 | 21.90 | 15.49 | 22.93 |
| 1NXM_A | 61.59 | 60.02 | 60.89 | 1582.6 | 42.49 | 29.20 | 22.84 |
| 1O7I_A | 35.83 | 38.47 | 43.29 | 571.87 | 20.76 | 21.11 | 19.77 |
| 1OK0_A | 21.76 | 20.51 | 23.92 | 151.96 | 36.62 | 29.67 | 22.49 |
| 1QHQ_A | 41.56 | 66.12 | 48.61 | 1052.2 | 19.43 | 59.80 | 15.67 |
| 1R6J_A | 13.76 | 22.35 | 21.93 | 234.14 | 7.06 | 23.50 | 27.16 |
| 1UCS_A | 12.26 | 20.15 | 19.60 | 150.73 | 21.99 | 29.24 | 16.83 |
| 2C9Q_A | 34.82 | 31.49 | 33.78 | 416.12 | 22.62 | 33.69 | 16.02 |
| 2F01_A | 33.91 | 32.49 | 41.94 | 632.93 | 44.30 | 69.50 | 17.83 |
| 2J2J_A | 56.97 | 53.80 | 54.79 | 1232.2 | 29.27 | 44.44 | 14.14 |
| 2VMH_A | 52.37 | 48.25 | 65.53 | 1022.4 | 48.85 | 31.09 | 23.27 |
| 3VUB_A | 44.39 | 27.49 | 24.28 | 283.47 | 31.45 | 19.60 | 18.99 |
| 1M9Z_A | 24.13 | 32.38 | 39.90 | 431.24 | 34.22 | 100.74 | 19.58 |
| 2J8B_A | 22.71 | 21.00 | 24.56 | 264.71 | 28.96 | 105.67 | 14.13 |
| 2VOU_A | 36.64 | 45.43 | 54.36 | 937.67 | 35.35 | 28.20 | 22.98 |
| 1V5I_B | 28.93 | 20.44 | 28.45 | 145.74 | 31.51 | 11.70 | 15.60 |
| 2WLV_A | 40.35 | 36.52 | 41.68 | 586.89 | 52.72 | 31.57 | 10.40 |
| 1F46_A | 58.98 | 43.65 | 38.99 | 541.85 | 36.50 | 14.21 | 17.10 |
| 1VZI_A | 33.33 | 37.45 | 42.36 | 578.66 | 63.84 | 39.32 | 18.86 |
| 2ANX_A | 49.41 | 53.04 | 52.31 | 744.32 | 20.30 | 10.30 | 14.17 |
| 2CMP_A | 25.96 | 25.58 | 31.02 | 154.14 | 19.60 | 18.92 | 10.19 |
| 2CVI_A | 27.21 | 47.82 | 37.24 | 298.96 | 39.62 | 39.01 | 23.82 |
| 2D3D_A | 27.45 | 28.78 | 35.66 | 306.37 | 18.22 | 22.52 | 15.46 |
| 2ERB_A | 36.41 | 36.63 | 36.23 | 504.87 | 17.94 | 52.38 | 16.72 |
| 2O9S_A | 16.01 | 13.43 | 15.44 | 100.61 | 59.57 | 14.03 | 9.57 |
| 2PR7_A | 40.92 | 48.59 | 39.60 | 857.73 | 28.04 | 30.25 | 24.34 |
| 2QCP_X | 19.48 | 19.43 | 22.94 | 150.64 | 25.94 | 18.24 | 15.44 |
| 2V1Q_A | 14.77 | 19.23 | 19.72 | 98.525 | 20.10 | 18.11 | 16.45 |
| 2VPB_A | 23.28 | 17.56 | 22.49 | 121.38 | 16.62 | 65.12 | 9.10 |
| 2VZC_A | 40.47 | 40.82 | 51.30 | 660.83 | 16.20 | 33.51 | 20.36 |
| 2ZXY_A | 31.88 | 25.72 | 31.14 | 312.36 | 24.89 | 22.00 | 21.61 |
| 3CTG_A | 32.67 | 31.39 | 40.23 | 407.36 | 27.96 | 11.53 | 16.02 |
| 3E9T_A | 29.23 | 38.88 | 39.61 | 488.97 | 21.08 | 28.89 | 21.48 |
| 3FIL_A | 18.22 | 24.39 | 24.34 | 126.78 | 21.55 | 14.81 | 17.17 |
| 3G21_A | 25.86 | 19.68 | 18.65 | 156.69 | 30.02 | 17.41 | 17.60 |
| 3G36_A | 13.81 | 14.90 | 11.31 | 108.60 | 21.10 | 11.57 | 7.88 |
| 3IV4_A | 35.41 | 25.83 | 31.88 | 388.37 | 31.85 | 26.41 | 9.82 |
| Mean | 35.64 | 35.30 | 35.58 | 498.34 | 28.71 | 28.16 | 17.08 |
| Standard deviation | 14.56 | 14.00 | 12.58 | 374.65 | 12.36 | 16.88 | 5.43 |
| Quartile 1 | 23.59 | 24.39 | 24.56 | 221.59 | 20.76 | 18.75 | 13.47 |
| Median | 35.41 | 34.64 | 35.98 | 407.36 | 26.18 | 23.49 | 16.45 |
| Quartile 3 | 42.20 | 42.78 | 42.15 | 634.46 | 33.55 | 30.25 | 20.36 |
(a) The Pot statistic test penalizes short-range bunching of amino acids. The E values of the reference and protein-like sequences give the minimal bunching, whereas the maximal bunching is obtained from the bunched sequences. The E values of the designed sequences confirm that their bunching is typical of native sequences. (b) The chi-square test is applied to determine whether there is any significant difference between two sets of categorical data. The χ² values indicate that the distribution of the designed sequences versus the Uniprot database is as significant as that of the reference sequences.
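The chi-square comparison in (b) can be illustrated with a minimal sketch that scores the amino-acid composition of one sequence against a background distribution. The function name `chi_square_composition` is hypothetical, and the uniform background is only a placeholder assumption standing in for database-derived frequencies. How the paper normalizes or aggregates its χ² values is not stated here, so this shows only the generic Pearson statistic.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def chi_square_composition(sequence: str, background: dict) -> float:
    """Pearson chi-square of observed residue counts versus expected counts."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(sequence)
    n = len(sequence)
    chi2 = 0.0
    for aa in AMINO_ACIDS:
        expected = n * background[aa]  # expected count under the background
        observed = counts.get(aa, 0)
        chi2 += (observed - expected) ** 2 / expected
    return chi2

# Placeholder background (assumption): every amino acid equally likely.
uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

print(chi_square_composition("ACDEFGHIKLMNPQRSTVWY", uniform))  # matches the background exactly
print(chi_square_composition("A" * 20, uniform))                # strongly skewed composition
```

A sequence whose composition matches the background scores close to 0, while a heavily skewed composition scores high.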