Ahammed Ullah, Nasif Ahmed, Subrata Dey Pappu, Swakkhar Shatabda, A Z M Dayem Ullah, M Sohel Rahman.
Abstract
Ab initio protein folding simulation largely depends on knowledge-based energy functions that are derived from known protein structures using statistical methods. These knowledge-based energy functions provide a good approximation of real protein energetics. However, they are not very informative for search algorithms: they fail to distinguish the types of amino acid interactions that contribute most to the energy from those that do not. As a result, search algorithms frequently get trapped in local minima. The hydrophobic-polar (HP) model, on the other hand, considers hydrophobic interactions only, and the simplified nature of the HP energy function limits it to a low-resolution model. In this paper, we present a strategy to derive a non-uniformly scaled version of the real 20×20 pairwise energy function. The non-uniform scaling helps tackle the difficulty faced by a real energy function, whereas the integration of 20×20 pairwise information overcomes the limitations of the HP energy function. We apply the derived energy function with a genetic algorithm on discrete lattices. On a standard set of benchmark protein sequences, our approach significantly outperforms the state-of-the-art methods for similar models and explores regions of the conformational space that all previous methods have failed to explore. The effectiveness of the derived energy function is demonstrated by showing qualitative differences and similarities between the sampled structures and the native structures. The number of objective function evaluations in a single run of the algorithm is used as a comparison metric to demonstrate efficiency.
Keywords: discrete lattices; energy function; genetic algorithms; optimization; protein folding simulation; protein structure prediction
Year: 2015 PMID: 26361554 PMCID: PMC4555859 DOI: 10.1098/rsos.150238
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
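The abstract contrasts the HP energy function with a 20×20 pairwise energy function evaluated on a discrete lattice. The following is a minimal sketch of how such a contact energy could be computed, not the authors' implementation. It assumes a 2D square lattice for brevity (the lattice type is not specified here), and the names `contact_energy` and `hp_pair` are hypothetical.

```python
from typing import Callable, List, Tuple

Coord = Tuple[int, int]


def hp_pair(a: str, b: str) -> float:
    """HP model: only an H-H contact contributes, with energy -1."""
    return -1.0 if a == "H" and b == "H" else 0.0


def contact_energy(
    sequence: str,
    coords: List[Coord],
    pair_energy: Callable[[str, str], float],
) -> float:
    """Sum pair_energy over residues that are lattice neighbours but not chain neighbours.

    Assumption: a 2D square lattice, where a contact means Manhattan distance 1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip i+1: bonded neighbours do not count
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            if dx + dy == 1:  # unit distance on the square lattice
                total += pair_energy(sequence[i], sequence[j])
    return total
```

For example, `contact_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)], hp_pair)` folds four residues around a unit square so that the two H residues touch. Passing a different `pair_energy` callable, such as a lookup into a 20×20 table, reuses the same loop with residue-specific energies.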
Figure 1. Comparison of the partial energy distributions assigned by the HP, GW and BM energy functions for protein 1CTF. Both the GW and BM energy distributions are normalized.
Real benchmark sequences selected for the experiments from the PDB and CASP9.
| PDB ID | length | sequence |
|---|---|---|
| 4RXN | 54 | MKKYTCTVCGYIYNPEDGDPDNGVNPGTDFKDIPDDWVCPLCGVGKDQFEEVEE |
| 1ENH | 54 | RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI |
| 4PTI | 58 | RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA |
| 2IGD | 61 | MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTE |
| 1YPA | 64 | MKTEWPELVGKAVAAAKKVILQDKPEAQIIVLPVGTIVTMEYRIDRVRLFVDKLDNIAQVPRVG |
| 1R69 | 69 | SISSRVKSKRIQLGLNQAELAQKVGTTQQSIEQLENGKTKRPRFLPELASALGVSVDWLLNGTSDSNVR |
| 1CTF | 74 | AAEEKTEFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK |
| 3MX7 | 90 | MTDLVAVWDVALSDGVHKIEFEHGTTSGKRVVYVDGKEEIRKEWMFKLVGKETFYVGAAKTKATINIDAISGFAYEYTLEINGKSLKKYM |
| 3NBM | 108 | SNASKELKVLVLCAGSGTSAQLANAINEGANLTEVRVIANSGAYGAHYDIMGVYDLIILAPQVRSYYREMKVDAERLGIQIVATRGMEYIHLTKSPSKALQFVLEHYQ |
| 3MQO | 120 | PAIDYKTAFHLAPIGLVLSRDRVIEDCNDELAAIFRCARADLIGRSFEVLYPSSDEFERIGERISPVMIAHGSYADDRIMKRAGGELFWCHVTGRALDRTAPLAAGVWTFEDLSATRRVA |
| 3MR0 | 142 | SNALSASEERFQLAVSGASAGLWDWNPKTGAMYLSPHFKKIMGYEDHELPDEITGHRESIHPDDRARVLAALKAHLEHRDTYDVEYRVRTRSGDFRWIQSRGQALWNSAGEPYRMVGWIMDVTDRKRDEDALRVSREELRRL |
| 3PNX | 160 | GMENKKMNLLLFSGDYDKALASLIIANAAREMEIEVTIFCAFWGLLLLRDPEKASQEDKSLYEQAFSSLTPREAEELPLSKMNLGGIGKKMLLEMMKEEKAPKLSDLLSGARKKEVKFYACQLSVEIMGFKKEELFPEVQIMDVKEYLKNALESDLQLFI |
| 3MSE | 179 | ISPNVLNNMKSYMKHSNIRNIIINIMAHELSVINNHIKYINELFYKLDTNHNGSLSHREIYTVLASVGIKKWDINRILQALDINDRGNITYTEFMAGCYRWKNIESTFLKAAFNKIDKDEDGYISKSDIVSLVHDKVLDNNDIDNFFLSVHSIKKGIPREHIINKISFQEFKDYMLSTF |
| 3MR7 | 189 | SNAERRLCAILAADMAGYSRLMERNETDVLNRQKLYRRELIDPAIAQAGGQIVKTTGDGMLARFDTAQAALRCALEIQQAMQQREEDTPRKERIQYRIGINIGDIVLEDGDIFGDAVNVAARLEAISEPGAICVSDIVHQITQDRVSEPFTDLGLQKVKNITRPIRVWQWVPDADRDQSHDPQPSHVQH |
| 3NO6 | 229 | MTFSKELREASRPIIDDIYNDGFIQDLLAGKLSNQAVRQYLRADASYLKEFTNIYAMLIPKMSSMEDVKFLVEQIEFMLEGEVEAHEVLADFINEPYEEIVKEKVWPPSGDHYIKHMYFNAFARENAAFTIAAMAPCPYVYAVIGKRAMEDPKLNKESVTSKWFQFYSTEMDELVDVFDQLMDRLTKHCSETEKKEIKENFLQSTIHERHFFNMAYINEKWEYGGNNNE |
| 3NO3 | 258 | MNLKSTLLLLLCLMMAGMVAAKDNTKVIAHRGYWKTEGSAQNSIRSLERASEIGAYGSEFDVHLTADNVLVVYHDNDIQGKHIQSCTYDELKDLQLSNGEKLPTLEQYLKRAKKLKNIRLIFELKSHDTPERNRDAARLSVQMVKRMKLAKRTDYISFNMDACKEFIRLCPKSEVSYLNGELSPMELKELGFTGLDYHYKVLQSHPDWVKDCKVLGMTSNVWTVDDPKLMEEMIDMGVDFITTDLPEETQKILHSRAQ |
| 3ON7 | 279 | MKLETIDYRAADSAKRFVESLRETGFGVLSNHPIDKELVERIYTEWQAFFNSEAKNEFMFNRETHDGFFPASISETAKGHTVKDIKEYYHVYPWGRIPDSLRANILAYYEKANTLASELLEWIETYSPDEIKAKFSIPLPEMIANSHKTLLRILHYPPMTGDEEMGAIRAAAHEDINLITVLPTANEPGLQVKAKDGSWLDVPSDFGNIIINIGDMLQEASDGYFPSTSHRVINPEGTDKTKSRISLPLFLHPHPSVVLSERYTADSYLMERLRELGVL |
Threshold values for invoking the random walk in the global algorithm.
| PDB ID | length | threshold value |
|---|---|---|
| 4RXN | 54 | 50 |
| 1ENH | 54 | 50 |
| 4PTI | 58 | 50 |
| 2IGD | 61 | 50 |
| 1YPA | 64 | 45 |
| 1R69 | 69 | 45 |
| 1CTF | 74 | 40 |
| 3MX7 | 90 | 35 |
| 3NBM | 108 | 30 |
| 3MQO | 120 | 25 |
| 3MR0 | 142 | 20 |
| 3PNX | 160 | 15 |
| 3MSE | 179 | 15 |
| 3MR7 | 189 | 15 |
| 3NO6 | 229 | 15 |
| 3NO3 | 258 | 15 |
| 3ON7 | 279 | 15 |
Settings for the GW energy function. |energyBM| denotes the magnitude of the energy under the BM energy function.
| group | GW function value | \|energyBM\| |
|---|---|---|
| 5 | 16 | 2000–3477 |
| 4 | 11 | 1501–2000 |
| 3 | 7 | 1001–1500 |
| 2 | 4 | 501–1000 |
| 1 | 2 | 101–500 |
| 0 | 1 | 0–100 |
Energy values achieved by our algorithm using the derived energy function (GW) and by the genetic algorithm of Rashid et al. [12] using the mixed energy model. Bold indicates the better values.
| PDB ID | length | mixed model [12] best | mixed model [12] avg | GW function best | GW function avg | R.I. (%) best | R.I. (%) avg |
|---|---|---|---|---|---|---|---|
| 4RXN | 54 | −166.88 | −162.72 | − | − | 2.03 | 3.06 |
| 1ENH | 54 | −153.79 | −151.65 | − | − | 3.06 | 2.29 |
| 4PTI | 58 | −210.29 | −204.56 | − | − | 2.62 | 3.11 |
| 2IGD | 61 | −183.18 | −176.83 | − | − | 2.96 | 4.64 |
| 1YPA | 64 | −256.95 | −253.09 | − | − | 1.78 | 1.71 |
| 1R69 | 69 | −216.37 | −208.79 | − | − | 3.48 | 4.34 |
| 1CTF | 74 | −233.51 | −225.43 | − | − | 2.44 | 3.61 |
| 3MX7 | 90 | −340.05 | −325.45 | − | − | 2.54 | 4.82 |
| 3NBM | 108 | −436.76 | −419.25 | − | − | 3.98 | 5.19 |
| 3MQO | 120 | −486.05 | −472.78 | − | − | 4.34 | 2.91 |
| 3MR0 | 142 | −479.36 | −447.77 | − | − | 2.44 | 4.73 |
| 3PNX | 160 | −615.82 | −592.25 | − | − | 3.86 | 3.06 |
| 3MSE | 179 | — | — | − | − | — | — |
| 3MR7 | 189 | — | — | − | − | — | — |
| 3NO6 | 229 | — | — | − | − | — | — |
| 3NO3 | 258 | — | — | − | − | — | — |
| 3ON7 | 279 | — | — | − | − | — | — |
Figure 2. Differences in average energy obtained using the derived energy function and the mixed model [12].
Figure 3. Differences in best energy obtained using the derived energy function and the mixed model [12].
Figure 7. Differences in average energy obtained using the derived energy function exhaustively (GW) and non-exhaustively (GWN).
The best energy values achieved by our approach using the GW function and by the hybrid approach [11]. Bold indicates the better values.
| PDB ID | length | hybrid approach [11] energy | hybrid approach [11] time | GW function energy | GW function time |
|---|---|---|---|---|---|
| 4RXN | 54 | −168.076 | 1 h 5 min | − | 1 h 56 min |
| 1ENH | 54 | −157.062 | 1 h 2 min | − | 21 min |
| 4PTI | 58 | −213.778 | 1 h 20 min | − | 1 h 54 min |
| 2IGD | 61 | −186.696 | 55 min | − | 1 h 28 min |
| 1YPA | 64 | −258.709 | 42 min | − | 18 min |
| 1R69 | 69 | −222.317 | 35 min | − | 58 min |
| 1CTF | 74 | −233.764 | 1 h 36 min | − | 13 min |
Figure 4. Differences in average energy obtained using the derived energy function (GW) and the real energy function (BM).
Figure 5. Differences in best energy obtained using the derived energy function (GW) and the real energy function (BM).
Best and average RMSD values achieved by different ensembles of our algorithm and by the mixed energy model [12]. Bold indicates the better values.
| PDB ID | length | initial RMSD avg | initial RMSD best | best conformation avg | best conformation best | best 2000 conformations avg | best 2000 conformations best | all conformations avg | all conformations best | mixed model [12] avg | mixed model [12] best |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4RXN | 54 | 6.2 | 5.07 | 6.76 | 6.24 | 6.15 | 5.9 | 5.41 | 4.7 | ||
| 1ENH | 54 | 6.29 | 4.57 | 6.91 | 6.51 | 6.33 | 6.07 | 5.22 | 4.57 | ||
| 4PTI | 58 | 7.51 | 6.32 | 7.99 | 7.67 | 7.44 | 6.98 | 6.46 | 5.97 | ||
| 2IGD | 61 | 8.77 | 7.45 | 9.32 | 8.98 | 8.86 | 8.41 | 7.81 | 6.85 | ||
| 1YPA | 64 | 7.57 | 6.29 | 7.67 | 7.25 | 7.19 | 6.85 | 6.29 | 5.42 | ||
| 1R69 | 69 | 6.66 | 4.9 | 6.55 | 6.15 | 6.01 | 5.66 | 5.17 | 4.68 | ||
| 1CTF | 74 | 7.91 | 6.17 | 7.13 | 6.57 | 6.68 | 6.18 | 5.28 | 4.69 | ||
| 3MX7 | 90 | 9.32 | 7.92 | 9.5 | 9.08 | 9.12 | 8.74 | 7.94 | 7.31 | ||
| 3NBM | 108 | 9.32 | 7.41 | 8.56 | 7.98 | 8.2 | 7.57 | 6.66 | 5.91 | ||
| 3MQO | 120 | 9.63 | 7.75 | 9.45 | 8.73 | 9.09 | 8.51 | 7.06 | 6.62 | ||
| 3MR0 | 142 | 11.86 | 9.28 | 13.06 | 11.88 | 12.75 | 11.52 | 8.81 | 8.01 | ||
| 3PNX | 160 | 12.06 | 9.2 | 11.59 | 10.5 | 11.34 | 10.22 | 8.51 | 7.5 | ||
| 3MSE | 179 | 16.85 | 11.21 | 19.7 | 17.89 | 18.83 | 15.26 | 9.29 | 7.7 | — | — |
| 3MR7 | 189 | 12.67 | 9.48 | 10.88 | 9.98 | 10.58 | 9.74 | 8.83 | 7.95 | — | — |
| 3NO6 | 229 | 14.34 | 10.78 | 12.76 | 11.84 | 12.32 | 11.38 | 10.29 | 9.37 | — | — |
| 3NO3 | 258 | 14.53 | 9.84 | 11.07 | 9.68 | 9.82 | 8.78 | 8.56 | 7.86 | — | — |
| 3ON7 | 279 | 15.4 | 11.76 | 13.27 | 12.26 | 11.99 | 10.92 | 11.02 | 10.41 | — | — |
Comparison of the number of objective function evaluations required by our algorithm using the derived energy function (GW) with the number required by the firefly-inspired algorithm of Maher et al. [26].
| PDB ID | length | firefly algorithm [26] | | | | GW function | | | | speed-up |
|---|---|---|---|---|---|---|---|---|---|---|
| 4RXN | 54 | −158.9 | 20.05 | 39.82 | 52.78 | −159.32 | 0.44 | 1.86 | 6.22 | 21.41 |
| 1ENH | 54 | −144.12 | 16.86 | 35.09 | 47.52 | −144.48 | 0.18 | 0.81 | 1.88 | 43.32 |
| 4PTI | 58 | −200.86 | 23.98 | 31.88 | 58.56 | −201.32 | 0.48 | 2.24 | 8.27 | 14.23 |
| 2IGD | 61 | −179.88 | 34.71 | 64.87 | 111.22 | −180.28 | 1.01 | 7.03 | 24.13 | 9.23 |
| average speed-up | | | | | | | | | | 22.05 |
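The sub-column labels of the table above did not survive, but the speed-up column can be checked arithmetically. Dividing the third of the four firefly values in each row by the third of the four GW values reproduces the listed speed-up to two decimals. The sketch below performs that check on values copied from the table. Reading that column as an average evaluation count is an inference, and the name `speed_up` is hypothetical.

```python
from statistics import mean

# (third firefly value, third GW value, listed speed-up), copied from the table rows.
ROWS = {
    "4RXN": (39.82, 1.86, 21.41),
    "1ENH": (35.09, 0.81, 43.32),
    "4PTI": (31.88, 2.24, 14.23),
    "2IGD": (64.87, 7.03, 9.23),
}


def speed_up(baseline_evaluations: float, gw_evaluations: float) -> float:
    """Ratio of the baseline evaluation count to the GW evaluation count."""
    return baseline_evaluations / gw_evaluations


computed = {pdb_id: speed_up(base, gw) for pdb_id, (base, gw, _) in ROWS.items()}
average_speed_up = mean(computed.values())

for pdb_id, (_, _, listed) in ROWS.items():
    print(f"{pdb_id}: computed {computed[pdb_id]:.2f}, listed {listed:.2f}")
print(f"average speed-up: {average_speed_up:.2f}")
```

The same ratio, applied to the corresponding columns, also matches the speed-up values in the BM versus GW comparison.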
Comparison of the number of objective function evaluations required using the derived energy function (GW) with the number required using the real energy function (BM).
| PDB ID | length | BM function | | | | GW function | | | | speed-up |
|---|---|---|---|---|---|---|---|---|---|---|
| 4RXN | 54 | −161.35 | 1 | 71.61 | 179.69 | −161.78 | 0.48 | 3.37 | 12.4 | 21.25 |
| 1ENH | 54 | −150.46 | 2.51 | 64.23 | 181.46 | −150.78 | 0.68 | 4.42 | 15.28 | 14.53 |
| 4PTI | 58 | −202.58 | 1.57 | 56.07 | 160.15 | −203.03 | 0.6 | 3.48 | 12.87 | 16.11 |
| 2IGD | 61 | −177.73 | 4.2 | 52.13 | 153.69 | −178.17 | 0.54 | 4.41 | 19.71 | 11.82 |
| 1YPA | 64 | −250.06 | 4.11 | 67.99 | 143.5 | −250.5 | 0.65 | 5.19 | 12.95 | 13.10 |
| 1R69 | 69 | −208.37 | 2.12 | 53.51 | 125.68 | −208.83 | 0.97 | 5.77 | 18.2 | 9.27 |
| 1CTF | 74 | −223.41 | 1.01 | 53.37 | 118.47 | −223.92 | 0.67 | 6.68 | 31.96 | 7.99 |
| 3MX7 | 90 | −326.57 | 2.26 | 39.68 | 93.62 | −327.14 | 1.11 | 6.99 | 59.23 | 5.68 |
| 3NBM | 108 | −425.3 | 2.1 | 34.41 | 69.77 | −425.78 | 1.3 | 11.11 | 79.4 | 3.10 |
| 3MQO | 120 | −469.43 | 2.11 | 25.48 | 61.73 | −469.92 | 1.12 | 10.23 | 61.01 | 2.49 |
| 3MR0 | 142 | −454.42 | 3.58 | 27.02 | 49.41 | −454.59 | 1.78 | 10.34 | 57.98 | 2.61 |
| 3PNX | 160 | −592.57 | 3.41 | 23.47 | 40.1 | −593.22 | 1.72 | 8.19 | 27.2 | 2.87 |
| 3MSE | 179 | −692.76 | 3.58 | 22 | 34.86 | −693.34 | 2.03 | 12.55 | 50.5 | 1.75 |
| 3MR7 | 189 | −636.64 | 3.83 | 20.17 | 32.61 | −635.79 | 1.44 | 11.82 | 57.59 | 1.71 |
| 3NO6 | 229 | −817.31 | 5.63 | 19.39 | 24.58 | −816.23 | 2.88 | 12.76 | 32.18 | 1.52 |
| 3NO3 | 258 | −880.46 | 0.84 | 17.37 | 21.2 | −881.13 | 1.89 | 7.98 | 24.98 | 2.18 |
| 3ON7 | 279 | −953.87 | 5.38 | 16.14 | 18.63 | −954.63 | 2.43 | 8.76 | 21.48 | 1.84 |
| average speed-up | | | | | | | | | | 7.05 |
Figure 6. Ratio of the average number of objective function evaluations using the derived energy function (GW) to that using the real energy function (BM).
Figure 8. Ratio of the average number of iterations using the derived energy function non-exhaustively (GWN) to that using it exhaustively (GW).