
In silico Evaluation of Crosslinking Effects on Denaturant m(eq) values and ΔCp upon Protein Unfolding.

Maryam Hamzeh-Mivehroud, Ali Akbar Alizade, Monire Ahmadifar, Siavoush Dastmalchi.

Abstract

Important thermodynamic parameters, including denaturant equilibrium m values (m(eq)) and heat capacity changes (ΔCp), can be predicted from changes in Solvent Accessible Surface Area (SASA) upon unfolding. Crosslinks such as disulfide bonds influence the stability of proteins by decreasing the entropy gain upon unfolding and by reducing the SASA of the unfolded state. The aim of the study was to develop mathematical models that predict the effect of crosslinks on ΔSASA, and ultimately on m(eq) and ΔCp, using in silico methods. Changes of SASA upon computationally simulated unfolding were calculated for a set of 45 proteins with known m(eq) and ΔCp values, and the effect of crosslinks on the ΔSASA of unfolding was investigated. The results were used to predict the m(eq) of denaturation for guanidine hydrochloride and urea, as well as ΔCp, for the studied proteins with overall errors of 20%, 31% and 17%, respectively. The results of the current study were in close agreement with those obtained in previous studies.

Keywords:  Crosslinks; Disulfides; Protein stability; Thermodynamics

Year:  2012        PMID: 23408172      PMCID: PMC3558204     

Source DB:  PubMed          Journal:  Avicenna J Med Biotechnol        ISSN: 2008-2835


Introduction

Through the human genome project we now know that a human cell can synthesize about 20,000 to 25,000 different proteins (1). Proteins are an important class of biological macromolecules present in all organisms, and constitute a high proportion of the dry mass of all cells (2). Most biological processes in cells are executed by proteins. The amino acid sequence of a protein contains all the information needed for adopting its three-dimensional structure. However, misfolding does occur, even though help from other molecules, such as chaperones, is in place for correct and fast in vivo folding (3–5). Denaturation studies are very useful for investigating the thermodynamic properties of proteins. The transition from the native to the denatured state can be brought about by changing the properties of the protein's environment. In general, this can be done by increasing the temperature, adding chemical denaturants or changing the pH. Urea and guanidinium ion (used in the form of guanidinium chloride, GdnHCl) favor the denatured state by increasing the solubility of the unfolded chain in aqueous solution. In comparison to temperature denaturation, chemical denaturation is often a reversible process. This is possible because the hydrophobic groups of the unfolded chain are shielded by the denaturant, which prevents aggregation. The unfolding free energy (ΔG) depends linearly on the denaturant concentration as:
$$\Delta G = \Delta G^{H_2O} - m\,[\mathrm{denaturant}]$$
where ΔG^{H2O} is the free energy of unfolding in the absence of denaturant and m denotes the dependence of the free energy on denaturant concentration (i.e. the equilibrium m value, m(eq)) (6). Good linearity is observed at high denaturant concentrations, and ΔG^{H2O} is obtained by extrapolation to zero denaturant concentration. ΔG^{H2O} values calculated from guanidinium chloride and urea denaturation are in very good agreement (7), which gives this relation further credibility.
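The linear relation between unfolding free energy and denaturant concentration can be illustrated with a small least-squares fit. The following is a minimal sketch and not the procedure used in this study: the function name `fit_linear_extrapolation` and the data points are hypothetical, chosen only to show how m and the free energy at zero denaturant follow from the slope and intercept of the fitted line.

```python
def fit_linear_extrapolation(conc, dg):
    """Ordinary least-squares fit of dG = dG_water - m * [denaturant].

    conc: denaturant concentrations (M); dg: unfolding free energies (cal/mol).
    Returns (dG_water, m); m is positive when dG decreases as the
    denaturant concentration increases.
    """
    n = len(conc)
    if n < 2 or n != len(dg):
        raise ValueError("need at least two matching (conc, dg) points")
    mean_x = sum(conc) / n
    mean_y = sum(dg) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, dg))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # extrapolation to zero denaturant
    return intercept, -slope


# Hypothetical transition-region data generated from dG = 6000 - 2000 * [D]
conc = [2.0, 2.5, 3.0, 3.5, 4.0]
dg = [2000.0, 1000.0, 0.0, -1000.0, -2000.0]
dg_water, m_value = fit_linear_extrapolation(conc, dg)
print(f"dG(H2O) = {dg_water:.0f} cal/mol, m = {m_value:.0f} cal/(mol.M)")
```

Because the points are only available in the transition region, the intercept is an extrapolated quantity, which is why the agreement between urea and GdnHCl extrapolations matters.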
One of the major challenges in the field of protein science is to predict the stability and function of proteins from their primary structures. To accomplish this task, efficient algorithms are needed to relate structure to stability. The availability of about 77,000 protein structures in the Protein Data Bank (PDB) (8) and a great deal of experimental work on the thermodynamic stability of proteins have provided a wealth of information that can be used to develop empirical functions relating thermodynamic and structural parameters. The success of this approach in developing structure-based methods to predict the thermodynamic parameters that define the Gibbs energy, i.e., the enthalpy, entropy and heat capacity changes, has been shown previously (9–13). In the process of unfolding, the major contribution to the enthalpy change arises from the disruption of intramolecular interactions, such as van der Waals interactions and hydrogen bonds, and from solvation of the interacting groups. Therefore, the change in solvent accessible surface area (ΔSASA) upon unfolding has been used as a means of predicting ΔH, as presented below:
$$\Delta H = \sum_i \alpha_i \, \Delta SASA_i$$
where ΔSASA_i is the change in SASA of atom i upon unfolding, and α_i is a coefficient that depends on the atom type and the average packing density of that atom within the protein (14). The heat capacity change (ΔCp) in protein unfolding largely arises from changes in the hydration of groups that are buried in the native form, away from the surrounding aqueous environment. ΔCp is correlated with the changes in SASA upon unfolding, as shown in the following equation:
$$\Delta C_p = \sum_i a_i \, \Delta SASA_i$$
where a_i is the contribution of atom i per unit area and ΔSASA_i is as defined above. Using both equations, good correlations were obtained between experimental and calculated ΔH and ΔCp values (14).
The aim of the current study is to develop empirical models that account for the effect of crosslinks on ΔSASA, and hence on the thermodynamic parameters of protein unfolding (i.e., m and ΔCp), using a computational approach.

Materials and Methods

Databases and programs

The experimental m values for urea and GdnHCl denaturation, as well as the ΔCp values of denaturation, for the set of 45 proteins used in this study were taken from Myers et al (10). The three-dimensional (3D) structures of the studied proteins were obtained from the Protein Data Bank (http://www.rcsb.org/) at RCSB (8). The SASA of the proteins in folded and unfolded forms was calculated using the DSSP program as implemented in the GROMACS package. The DSSP program was designed by Wolfgang Kabsch and Chris Sander to standardize secondary structure assignment based on a database of secondary structure for protein entries in the PDB (15). Swiss-Pdb Viewer (SPDBV, version 3.7, Swiss Institute of Bioinformatics), an interactive molecular graphics program, was used for viewing and analyzing protein structures (16). HyperChem (version 7.1; 2002; Hypercube Inc.) was the other molecular modeling software used in this study. GROMACS (version 3.3, University of Groningen, The Netherlands, currently maintained by ScalaLife), an engine for performing molecular dynamics simulations and energy minimization (17), was used under the Linux operating system (Fedora Core 5) on a cluster consisting of 8 nodes, each with two dual-core Opteron 2212 CPUs and 2 GB RAM.

Unfolding the proteins

The unfolded states of the proteins were generated by three different approaches: (i) building the fully extended conformation of the protein, (ii) instantaneously assigning standard bond lengths, bond angles, torsion angles, and stereochemistry properties to the model structure using a given force field method, or (iii) molecular dynamics simulation.

Fully extended conformation

The protein sequence, saved in FASTA format, was loaded into SPDBV. The sequence was then built into an extended conformation by setting the phi (ϕ) and psi (ψ) angles to those of a β-pleated strand. In such a conformation there is no crosslink in the generated model, even if the native form of the protein contains such constraints.

Instantaneous unfolding using standard bond and angle assignment

The HyperChem program was used to open the crystal structure (native form) of the protein. In this way, the disulfide bonds are lost. If re-establishment of the crosslinks was desired, the residues involved in each crosslink were first selected and the necessary bonds were then created between the sulfur atoms involved in the disulfide bonds. Subsequently, the structure was forced to unfold into a random coil, losing its regular structures while preserving the crosslinks. The unfolded structural model was energy minimized using a molecular mechanics force field. The minimization protocol employed the steepest descent method with BIO+, the HyperChem implementation of the CHARMM (Chemistry at HARvard using Molecular Mechanics) force field (18), until the difference in energy between two consecutive iterations was less than 0.1 kcal/mol. The model structures were stored as unfolded states and their SASA values were calculated as described above. In the case of heme-containing proteins, two bonds were built linking the chelating atoms to the central iron atom. This effectively constrains the spatial distance between the two residues to which the iron atom of the heme group is linked through coordination of the unpaired electrons of nitrogen or sulfur atoms.

Unfolding using molecular dynamics simulation

To unfold proteins using the Molecular Dynamics (MD) simulation technique, the following steps were performed. First, the native structure was downloaded from the PDB at RCSB and converted into the standard GROMACS file format. The positions of all hydrogen atoms were reconstructed. Subsequently, the protein structure was energy minimized in vacuum using the steepest descent algorithm until the maximum force was smaller than 1.0 kJ mol⁻¹ nm⁻¹. GROMOS-96, the officially distributed force field for GROMACS, was used for the molecular mechanics simulations as implemented in the software package (19). Then a simulation box was created and the protein was centred in it. The simulation box was filled with Simple Point Charge (spc216) water and urea molecules. The final concentration of urea in the box was about 4.4 M. Before running the MD simulation, the system was neutralized by adding the appropriate number of either Na+ or Cl− counter ions to give zero net charge. Finally, the solvated protein was subjected to MD simulation for 10 ns at 500 K and the trajectories were saved every 0.02 ns.

Crosslinking factor (CLF)

In order to investigate the effect of crosslinking on the unfolding behaviour of a protein, an index named Crosslinking Factor (CLF) was defined as follows:
$$CLF = \frac{1}{N}\sum_{i=1}^{N}\frac{SASA_{nc,i} - SASA_{c,i}}{n_i}$$
where SASA refers to the solvent accessible surface area of the unfolded conformation, and the subscripts c and nc denote whether the crosslinks are preserved or not in the unfolded conformation, respectively. n_i is the number of crosslinks present in the native form of protein i, N is the number of proteins with crosslinks, and i indexes the proteins used to derive the CLF value.

Statistical treatment

Statistical analyses were performed with SPSS (SPSS for Windows version 11.5, IBM) and Excel (Microsoft Office 2007). The predictive power of the mathematical models was evaluated by excluding one data point, i.e. one of the proteins from the data set of 45 proteins listed in Table 1, training the model on the remaining proteins, and then predicting the value of the thermodynamic parameter for the excluded protein. This was repeated until every protein had been predicted once.
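The exclusion procedure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the models of this study: a single descriptor (number of residues) and a simple straight-line fit are assumed, the function names `fit_line` and `leave_one_out` are hypothetical, and only six proteins from Table 1 are used.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out(xs, ys):
    """Predict each y from a model trained on all the other data points."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


# Number of residues and experimental m (GdnHCl), cal/(mol.M), for
# 1PGB, 1BTA, 9RNT, 1AKI, 1L63 and 3ADK as listed in Table 1
residues = [56, 89, 104, 129, 162, 194]
m_gdnhcl = [1800.0, 2400.0, 2560.0, 2330.0, 5500.0, 4800.0]

loo_predictions = leave_one_out(residues, m_gdnhcl)
for k, observed, predicted in zip(residues, m_gdnhcl, loo_predictions):
    print(f"{k:4d} residues: observed {observed:7.1f}, predicted {predicted:7.1f}")
```

Each prediction comes from a model that never saw the protein being predicted, so the resulting errors estimate predictive rather than fitting accuracy.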
Table 1

Characteristics of 45 proteins that have m values and crystal structures available (a)

Protein name | PDB | Number of residues | Number of crosslinks | m eq (GdnHCl), cal/(mol.M) | m eq (urea), cal/(mol.M) | ΔCp, cal/(mol.K) | SASA folded, Å² | SASA unfolded (b), Å² | ΔSASA, Å²
Ovomucoid third domain (turkey) | 1CHO | 53 (d) | 3 | 580 | 250 | 590 | 3735 | 7157 | 3422
IgG binding domain of protein G | 1PGB | 56 | 0 | 1800 | NA | 620 | 3752 | 7705 | 3953
BPTI (A30, A51) | 7PTI | 58 | 2 | 1200 | NA | NA | 3969 | 8254 | 4285
BPTI (V30, A51) | 1AAL | 58 | 2 | 1500 | NA | NA | 3993 | 8276 | 4283
SH3 domain of α-spectrin | 1SHG | 57 (d) | 0 | 1880 | 766 | 813 | 3925 | 8220 | 4295
Chymotrypsin inhibitor 2 | 2CI2 | 65 (d) | 0 | 1890 | NA | 720 | 4564 | 9246 | 4682
Calbindin D9K | 1IG5 | 75 | 0 | NA | 1140 | NA | 4774 | 10373 | 5599
Ubiquitin | 1UBI | 76 | 0 | NA | 1140 | NA | 4911 | 10758 | 5847
HPr (B. subtilis) | 2HPR | 87 (d) | 0 | NA | 1050 | 1160 | 4751 | 11688 | 6937
Barstar | 1BTA | 89 | 0 | 2400 | 1250 | 1460 | 5653 | 12596 | 6943
Lambda repressor (N-terminal) | 1LMB | 102 | 0 | 2400 | 1090 | NA | 6270 | 13013 | 6743
Cytochrome c (tuna) | 5CYT | 103 | 1 | 2800 | NA | NA | 6087 | 14382 | 8295
Cytochrome c (horse heart) | 2PCB | 104 | 1 | 3010 | 1200 | 1730 | 6363 | 14812 | 8449
Ribonuclease T1 | 9RNT | 104 | 2 | 2560 | 1210 | 1270 | 5467 | 13651 | 8184
Arc repressor (c) | 1PAR | 106 | 0 | 3270 | 1910 | 1600 | 6566 | 15471 | 8906
FK binding protein (human) | 1FKD | 107 | 0 | NA | 1460 | NA | 6144 | 14798 | 8654
Iso-1-cytochrome c (yeast) | 1YCC | 108 | 1 | 3400 | 1430 | 1370 | 6575 | 15169 | 8594
Thioredoxin (E. coli) | 2TRX | 108 | 1 | 3310 | 1300 | 1660 | 5847 | 14776 | 8929
Barnase | 1RNB | 109 (d) | 0 | 4400 | 1940 | 1650 | 6050 | 15093 | 9043
Ribonuclease A | 9RSA | 124 | 4 | 3100 | 1100 | 1230 | 6965 | 16983 | 10018
ROP | 1ROP | 126 | 0 | 2400 | NA | 1890 | 6445 | 16191 | 9746
Che Y (E. coli) | 3CHY | 128 (d) | 0 | 2260 | 1600 | NA | 6673 | 17646 | 10973
Lysozyme (hen egg white) | 1AKI | 129 | 4 | 2330 | 1290 | 1540 | 6755 | 17886 | 11131
Lysozyme (human) | 1LZ1 | 130 | 4 | 3460 | NA | 1580 | 6777 | 18305 | 11528
Fatty acid binding protein (rat) | 1IFC | 131 | 0 | 4470 | 1770 | NA | 7145 | 18564 | 11419
Staphylococcal nuclease | 2SNS | 141 (d) | 0 | 6830 | 2380 | 2320 | 8052 | 20083 | 12031
Interleukin 1-β | 5I1B | 151 | 0 | 5580 | NA | 1890 | 8209 | 21188 | 12979
Apomyoglobin (horse) | 1YMB | 153 | 1 | 3710 | 2140 | 1870 | 8296 | 21895 | 13599
Apomyoglobin (sperm whale) | 5MBN | 153 | 1 | 2600 | 1460 | 2770 | 8320 | 22180 | 13860
Metmyoglobin (horse) | 1YMB | 153 | 0 | NA | 1800 | NA | 8296 | 21042 | 12746
Metmyoglobin (sperm whale) | 5MBN | 153 | 0 | NA | 2040 | NA | 8320 | 21327 | 13007
Ribonuclease H | 2RN2 | 155 | 0 | 4500 | 1930 | NA | 8785 | 21635 | 12850
Dihydrofolate reductase (E. coli) | 4DFR | 159 | 0 | NA | 1900 | NA | 8717 | 21945 | 13228
T4 lysozyme (T54, A97) | 1L63 | 162 (d) | 0 | 5500 | 2000 | 2570 | 8553 | 22913 | 14360
Gene V protein (c) | 1VQB | 172 (d) | 0 | 3600 | NA | NA | 13216 | 26728 | 13512
Adenylate kinase (porcine) | 3ADK | 194 | 0 | 4800 | NA | NA | 11051 | 26978 | 15927
HIV-1 protease (c) | 1HVR | 198 | 0 | NA | 2050 | NA | 9865 | 26784 | 16919
SIV protease (c) | 1SIV | 198 | 0 | NA | 1880 | NA | 9962 | 26576 | 16614
Trp aporepressor (c) | 3WRP | 202 (d) | 0 | NA | 2900 | NA | 11388 | 28583 | 17195
α-Chymotrypsin | 4CHA | 239 (d) | 5 | 4100 | 2070 | 3020 | 10742 | 31985 | 20498
Chymotrypsinogen A | 2CGA | 245 | 5 | 4440 | 2030 | NA | 10742 | 31985 | 21243
Tryptophan synthase, α-subunit | 1BKS | 255 (d) | 0 | NA | 3750 | 4600 | 11585 | 34271 | 22686
β-Lactamase | 3BLM | 257 (d) | 0 | 7200 | 3210 | NA | 11561 | 36444 | 24883
Pepsinogen | 2PSG | 370 | 3 | NA | 7800 | 6090 | 14748 | 48478 | 33730
Phosphoglycerate kinase (yeast) | 3PGK | 415 | 0 | 9700 | NA | 7500 | 18988 | 53051 | 34063

NA: Not Available

(a) For each protein, the PDB file code, number of residues, and number of disulfides or covalent heme-protein crosslinks are shown. SASA values were calculated by the DSSP program as described in the text. The 5th, 6th and 7th columns give the experimental m values for GdnHCl or urea denaturation and the observed ΔCp for each protein, taken from reference (10). ΔSASA values are in Å², m values in cal/(mol.M), and ΔCp in cal/(mol.K).

(b) SASA unfolded values in this table were calculated using the extended β-strand conformation of all proteins.

(c) Dimer.

(d) These values were checked and corrected based on the number of residues in the corresponding PDB files and hence differ from those reported in Myers et al. (10).

The Standard Deviation of Error of Prediction (SDEP) was calculated to give a measure of the distribution of the errors involved in the predictions, using the following equation:
$$SDEP = \sqrt{\frac{\sum_{i=1}^{N}\left(A_{exp,i} - A_{calc,i}\right)^2}{N}}$$
Here A_exp and A_calc are the experimental and predicted values, respectively, and N denotes the number of data points.

Mean absolute percentage error (MAPE)

To evaluate the accuracy of predictions, absolute percentage errors (APE) were calculated based on the following equation:
$$APE = \frac{\left|A_{calc} - A_{exp}\right|}{A_{exp}} \times 100$$
where A_calc and A_exp are the calculated and experimental values of a given parameter of interest, such as ΔCp or m for GdnHCl or urea. The average of APE over all data points for each of the above-mentioned parameters was calculated and called MAPE:
$$MAPE = \frac{1}{N}\sum_{i=1}^{N} APE_i$$
where N is the number of data points.
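The two error measures described above, SDEP and MAPE, can be written as short functions. This is a minimal sketch: the function names `sdep` and `mape` and the three data points are hypothetical, and the division by N (rather than N−1) in SDEP follows the definition in terms of the number of data points.

```python
import math


def sdep(experimental, calculated):
    """Standard deviation of error of prediction: sqrt(sum((exp - calc)^2) / N)."""
    n = len(experimental)
    squared_errors = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    return math.sqrt(squared_errors / n)


def mape(experimental, calculated):
    """Mean absolute percentage error: mean of |calc - exp| / exp * 100."""
    n = len(experimental)
    return sum(abs(c - e) / e * 100.0 for e, c in zip(experimental, calculated)) / n


# Hypothetical experimental and calculated values, for illustration only
experimental = [1000.0, 2000.0, 4000.0]
calculated = [1100.0, 1800.0, 4400.0]
print(f"SDEP = {sdep(experimental, calculated):.1f}")
print(f"MAPE = {mape(experimental, calculated):.1f}%")
```

SDEP keeps the units of the predicted quantity, whereas MAPE is dimensionless, which makes it the more convenient measure for comparing m and ΔCp predictions side by side.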

Results and Discussion

The changes in solvent accessible surface area (ΔSASA) upon unfolding, determined as the difference between the solvent accessibilities of the native form (calculated from the crystal structure) and the denatured form (modeled by an extended polypeptide chain), are given for a set of 45 proteins in Table 1. The table also shows the m values from denaturation experiments, the ΔCp of unfolding, and the number of residues and crosslinks present in each of these proteins, taken from the compilation made by Myers et al (10). Figures 1A and 1B show the dependence of the denaturant m values on the change in solvent accessible surface area upon unfolding. There are significant linear correlations in both cases, with correlation coefficient (R) values of 0.85 and 0.87 for GdnHCl and urea, respectively. The slopes of the linear regression lines are 0.25 and 0.17 cal/(mol.M.Å²) for GdnHCl and urea, respectively, indicating the stronger denaturing effect of GdnHCl.
Figure 1

Dependence of A) m value for GdnHCl denaturation, B) m value for urea denaturation, and C) heat capacity changes upon unfolding on ΔSASA for the 45 proteins shown in Table 1

Denaturation heat capacity changes (ΔCp) were also strongly correlated with ΔSASA, with a correlation coefficient of 0.97, as shown in Figure 1C. The same linear correlations between m values and ΔSASA have been shown previously by Myers et al (10). ΔSASA has also been related linearly to ΔCp by others (20, 21). The main purpose of this study is to re-evaluate the effect of crosslinks on ΔSASA and to predict the m and ΔCp of unfolding based on protein sequence information. These latter two parameters are among the important criteria indicative of the stability of proteins. Therefore, prediction of these values, or any improvement in their prediction, has significant theoretical and practical applications. The presence of crosslinks such as disulfide bonds and heme groups in a protein (as shown in Table 2) results in a more compact unfolded state, thus reducing the solvent accessibility of the unfolded polypeptide chain. To compensate for the effects of crosslinks, Myers et al (10) employed the results of different empirical methods (22) to estimate the magnitude of the reduction of ΔSASA per disulfide bond. The reduction of ΔSASA per crosslink was estimated to be about 900 Å².
Table 2

List of crosslink-containing proteins used in this study. Differences of SASA values for the unfolded states in two different forms, i.e., with and without conserving the crosslinks, are shown along with the number of crosslinks and the crosslinking number for each protein

PDB code | SASA unfolded without crosslinks (a), Å² | SASA unfolded with crosslinks (a), Å² | ΔSASA, Å² | Number of crosslinks (n) | Crosslinking number, Å²
1CHO | 7039 | 5730 | 1309 | 3 | 436.33
7PTI | 8278 | 6616 | 1662 | 2 | 831.00
1AAL | 8315 | 6539 | 1776 | 2 | 888.00
5CYT (b) | 13929 | 13392 | 537 | 1 | 537.00
2PCB (b) | 14410 | 14118 | 292 | 1 | 292.00
9RNT | 13348 | 11227 | 2121 | 2 | 1060.50
1YCC (b) | 15145 | 14620 | 525 | 1 | 525.00
2TRX | 13568 | 13254 | 314 | 1 | 314.00
9RSA | 17125 | 13015 | 4110 | 4 | 1027.50
1AKI | 17597 | 12575 | 5022 | 4 | 1255.50
1LZ1 | 18698 | 12967 | 5731 | 4 | 1430.25
1YMB (b) | 22110 | 20864 | 1246 | 1 | 1246.00
5MBN (b) | 22202 | 21007 | 1195 | 1 | 1195.00
4CHA | 26990 | 25585 | 1405 | 5 | 281.00
2CGA | 27707 | 23693 | 4014 | 5 | 802.80
2PSG | 45170 | 37447 | 7723 | 3 | 2574.33

CLF (equal to the average of the crosslinking numbers) ± Standard Error: 918.5 ± 145.1

(a) In order to be consistent, the results presented in this table were derived from the instantaneous unfolding method using standard bond length and angle values for both sets of data labeled “without crosslinks” and “with crosslinks”, and the SASA values were then calculated using DSSP.

(b) The heme-containing proteins

In the current study, to find out more about the effect of crosslinking through theoretical and computational methods, unfolded models were generated for the crosslink-containing proteins with the crosslinks either preserved or removed, using the instantaneous unfolding method based on assigning standard bond length and angle values. The SASA values were then calculated for the generated unfolded structural models (Table 2). To quantify the effect of crosslinks on ΔSASA upon unfolding, a new term called Crosslinking Factor (CLF) was introduced (described in the Materials and Methods section). Effectively, CLF is a measure of the reduction in the SASA of the unfolded protein caused by the presence of a single crosslink, such as a disulfide bond, and was calculated to be 918.5 Å². This value is the average of the crosslinking numbers calculated for the 16 crosslink-containing proteins listed in Table 2 for which m and ΔCp values were available. In five proteins listed in the table, crosslinks are formed via ligation of the central iron atom of the heme group by sulfur or nitrogen atoms of the side chains of the interacting residues. The average of the crosslinking numbers for these proteins (759.0 Å²) is smaller than the average (991.0 Å²) for the remaining proteins, in which the crosslinks are formed by disulfide bonds. However, the difference is not statistically significant (p-value>0.05). Neither of these values is statistically different from the calculated CLF value of 918.5 Å². Based on the above findings, the ΔSASA values were corrected for the effect of crosslinks on the solvent accessibility of the unfolded state by subtracting 918.5 Å² per crosslink from ΔSASA (giving ΔSASA corrected), and the corrected values were then re-correlated with the m and ΔCp values. The linear correlation coefficients improved to 0.90, 0.88 and 0.99 for the GdnHCl m, urea m and ΔCp values, respectively, as shown in Figure 2.
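The CLF calculation and the crosslink correction of ΔSASA can be checked directly from the tabulated crosslinking numbers of Table 2. The sketch below is illustrative only: the names `crosslinking_number` and `corrected_delta_sasa` are hypothetical, the standard error is assumed to be the sample standard deviation divided by the square root of N, and the tabulated (rounded) crosslinking numbers are used as input.

```python
import math
from statistics import mean, stdev


def crosslinking_number(sasa_nc, sasa_c, n_crosslinks):
    """Reduction in unfolded-state SASA per crosslink (Å^2)."""
    return (sasa_nc - sasa_c) / n_crosslinks


def corrected_delta_sasa(delta_sasa, n_crosslinks, clf_value):
    """Subtract the per-crosslink reduction from the uncorrected ΔSASA."""
    return delta_sasa - n_crosslinks * clf_value


# Crosslinking numbers as tabulated in Table 2 (Å^2 per crosslink);
# the flag marks the heme-containing proteins.
CROSSLINKING_NUMBERS = {
    "1CHO": (436.33, False), "7PTI": (831.00, False), "1AAL": (888.00, False),
    "5CYT": (537.00, True), "2PCB": (292.00, True), "9RNT": (1060.50, False),
    "1YCC": (525.00, True), "2TRX": (314.00, False), "9RSA": (1027.50, False),
    "1AKI": (1255.50, False), "1LZ1": (1430.25, False), "1YMB": (1246.00, True),
    "5MBN": (1195.00, True), "4CHA": (281.00, False), "2CGA": (802.80, False),
    "2PSG": (2574.33, False),
}

values = [v for v, _ in CROSSLINKING_NUMBERS.values()]
clf = mean(values)
std_error = stdev(values) / math.sqrt(len(values))
heme_mean = mean([v for v, is_heme in CROSSLINKING_NUMBERS.values() if is_heme])
disulfide_mean = mean([v for v, is_heme in CROSSLINKING_NUMBERS.values() if not is_heme])

print(f"1CHO crosslinking number: {crosslinking_number(7039, 5730, 3):.2f}")
print(f"CLF = {clf:.1f} ± {std_error:.1f} Å^2")
print(f"heme mean = {heme_mean:.1f}, disulfide mean = {disulfide_mean:.1f}")
# Hen egg white lysozyme (1AKI): ΔSASA = 11131 Å^2 and four disulfides (Table 1)
print(f"1AKI corrected ΔSASA: {corrected_delta_sasa(11131, 4, 918.5):.1f}")
```

Under these assumptions the script reproduces the reported CLF of 918.5 ± 145.1 Å² and the heme and disulfide averages of 759.0 and 991.0 Å².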
Figure 2

Dependence of A) m value for GdnHCl denaturation, B) m value for urea denaturation, and C) heat capacity changes upon unfolding on ΔSASA after correction for the effect of crosslinks by subtracting 918.5 Å² per crosslink for the 45 proteins in our data set (see text for further explanation)

The extent of the increase in SASA upon unfolding of a protein depends strongly on the number of residues (i.e. protein size) and on the constraints present in the unfolded state. The unfolded state of a protein is populated by an ensemble consisting of a huge number of conformationally distinct species. The presence of structural constraints limits the conformational space available to the polypeptide chain. Our analyses, in agreement with the results of others (10), show that the amount of area buried in each protein correlates very strongly (R=0.99) with the number of residues in the protein (Eq. 9). This strong correlation makes it possible to estimate the thermodynamic parameters using equations 10 to 12, in which k denotes the number of residues of a given protein. These equations provide a means to predict m and ΔCp directly from primary structure information. The results of experimental studies are in close agreement with the results of our theoretical calculations, which indicates that these important thermodynamic parameters can be predicted from ΔSASA upon unfolding when the presence of crosslinks in the protein is taken into account. In a different approach to estimating the SASA of the unfolded state, we used MD to simulate the unfolding behavior of proteins under denaturing conditions, as stated in the Materials and Methods section. Four of the proteins in our dataset (listed in Table 3) were subjected to MD simulations for 10 ns at 500 K in a solvation box filled with a mixture of water and urea molecules. As can be seen from the table, the maximum SASA values of the unfolded conformations obtained by MD are smaller than those obtained by the non-simulation method. Consequently, the ΔSASA values are also smaller.
Table 3

Comparison of SASA and ΔSASA values obtained by different methods used to unfold the proteins

PDB code | SASA of native structure | SASA of the unfolded model (MD simulation) | SASA of the unfolded model (instantaneous) | ΔSASA of the unfolding (MD simulation) | ΔSASA of the unfolding (instantaneous)
1AKI (a) | 6755 | 10412 | 12575 | 3657 | 5820
1AAL (b) | 3993 | 5579 | 6539 | 1586 | 2546
2TRX (c) | 5847 | 9135 | 13254 | 3288 | 7407
1PGB (d) | 3752 | 5772 | 8143 | 2020 | 4391

(a), (b), (c) and (d) denote the number of crosslinks, which is 4, 2, 1 and zero, respectively

Figure 3 shows snapshots of the conformational changes during the unfolding simulation of the IgG binding domain of protein G (IBPG), which has 56 residues and no crosslink. As time evolves, both the tertiary and secondary structures of IBPG are lost, and at the same time its SASA increases. The maximum SASA reached during the 10 ns is 5772 Å², which is less than that estimated for the fully extended conformation (8143 Å²).
Figure 3

Molecular dynamics simulation of IgG binding domain of protein G (PDB code 1PGB) solvated in 4.4 M urea in water at 500 K for 10 ns using GROMOS-96 force field parameters. The non-protein molecules (i.e. water and urea) are not shown for the sake of clarity

The presence of crosslinks in the unfolded state results in a more compact unfolded form, and the higher the number of crosslinks, the more pronounced this effect is. For example, as shown in Figure 4, the unfolded conformation of lysozyme (hen egg white), a 129-residue protein with four disulfide bonds, retained a more globular shape at the end of the MD simulation, although it lost its secondary structure elements.
Figure 4

Molecular dynamics simulation of lysozyme (hen egg white) (PDB code 1AKI) solvated in 4.4 M urea in water at 500 K for 10 ns using GROMOS-96 force field parameters

As shown in Table 3, the SASAs of the investigated proteins increased by the end of the MD simulation. However, the extent of this increase is larger for the protein with no crosslink. For example, 1PGB, a 56-residue protein without any crosslink, showed a 54% increase in SASA upon unfolding by the MD method, whereas applying the same unfolding conditions to 1AAL, a protein of almost equal size (58 residues) with two disulfide bonds, led to only a 40% increase in SASA. In all studied cases, the maximum SASA of the unfolded conformations obtained by MD is smaller than that obtained by the instantaneous method. Analyses of the MD trajectories showed that the RMSD of the Cα atoms increases as time evolves, approaching high values in the range of ∼14–19 Å for the studied proteins during the simulation. The RMSD increased very rapidly for 1PGB and 2TRX, with no and one crosslink, respectively, but only gradually for 1AAL and 1AKI, with two and four crosslinks, respectively. Although MD simulation under the conditions used in this study can unfold the proteins and also demonstrates the effect of crosslinks, the SASA of the unfolded conformations obtained in this way never reached the SASA values of the unfolded conformations obtained by instantaneously decomposing the native structure while preserving only the standard bond lengths, bond angles and other standard chemical structure geometries. This could be due to insufficient simulation time or to entrapment of the protein in an ensemble of conformations in a local minimum of the energy landscape. To find out more about these issues and draw firmer conclusions, further computational experiments such as hydrodynamic simulation are required.
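The percentage increases in SASA quoted above follow directly from the values in Table 3. The short sketch below recomputes them; the dictionary layout and the function name `percent_increase` are hypothetical choices made for illustration.

```python
# PDB code: (SASA native, maximum SASA after MD unfolding,
#            SASA after instantaneous unfolding, number of crosslinks), from Table 3
TABLE3 = {
    "1AKI": (6755, 10412, 12575, 4),
    "1AAL": (3993, 5579, 6539, 2),
    "2TRX": (5847, 9135, 13254, 1),
    "1PGB": (3752, 5772, 8143, 0),
}


def percent_increase(native, unfolded):
    """Relative SASA increase upon unfolding, in percent of the native SASA."""
    return (unfolded - native) / native * 100.0


for pdb, (native, md, instantaneous, n_crosslinks) in TABLE3.items():
    print(
        f"{pdb} ({n_crosslinks} crosslinks): "
        f"MD {percent_increase(native, md):.0f}%, "
        f"instantaneous {percent_increase(native, instantaneous):.0f}%"
    )
```

Expressing the increase relative to the native SASA makes proteins of different size comparable, which is the basis for the 1PGB versus 1AAL comparison in the text.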
Crosslinks such as disulfide bonds and heme groups have a profound effect on the conformational flexibility and SASA of the unfolded state, and hence influence the stability of proteins by decreasing the entropy gain as well as reducing ΔSASA upon unfolding. Studies of proteins with chemical crosslinks have shown clearly that the major effect of a crosslink on stability results from a decrease in the conformational entropy of the unfolded molecule (23, 24). On the other hand, attempts to increase the stability of proteins by introducing disulfide bonds suggest that the structural restraints imposed on the native state by the crosslinks may also make an important contribution to their net effect on stability (25). Furthermore, inspection of the model structures of Micro-myoglobin (Mb) revealed a role for heme in stabilizing the folded state (26). Doig and Williams (27) investigated the effect of disulfide crosslinks on the hydrophobicity-derived stability of proteins. Based on data obtained from solvent transfer experiments, they calculated the non-polar ΔSASA to be 590 and 690 Å² per disulfide bond according to free energy of hydration and ΔCp measurements, respectively. Taking into account that the non-polar fraction of the total buried area is about 0.70, these values correspond to a reduction in the total area change of about 850 Å² and 990 Å² per disulfide (590/0.70 ≈ 843 and 690/0.70 ≈ 986). Using solvent perturbation difference spectroscopy, Pace et al demonstrated that the solvent accessibility of the aromatic residues (Tyr and Trp) in three studied proteins (lysozyme, RNase A and RNase T1) changed upon unfolding (22). Myers et al used these experimental data to estimate an approximate average reduction of 900 Å² in the ΔSASA of unfolding per disulfide, assuming a universal change in accessibility across all residue types (10). The results of these experimental methods were averaged and used by Myers et al to compensate for the effects of crosslinks on ΔSASA. However, this approach may suffer from oversimplification, since it uses the changes in accessibility of just two amino acids and extrapolates them to the total area. It should also be mentioned that these results were derived from a very limited number of experiments performed on only three globular proteins (22). One of the shortcomings of using either the CLF introduced in this work or the experimentally derived value of 900 Å² introduced by Myers et al to compensate for the effects of crosslinks on ΔSASA, and hence to estimate m and ΔCp values, is the scarcity of the data used. The correction value close to 900 Å² (proposed by Myers et al, and here as CLF) can be justified by fitting equations 12 to 14 presented in Myers et al, where the disulfide bond corrections that maximize the fits are all close to 900 Å². However, there is no need to use a correction factor such as the 900 Å² proposed by Myers et al or the CLF to account for the effects of crosslinks on m or ΔCp. Although we believe the correction factor is most likely close to 900 Å², it is not a magic number: any other value close to it can be used for the correction and then for deriving empirical equations that relate m or ΔCp to the corrected ΔSASA (or to the combination of the number of amino acids and CLF, as used here). The coefficients in the final mathematical equations will adjust to balance out any change in the value of the correction factor. When the ultimate aim is to predict the thermodynamic parameters as precisely as possible, one may decide to use different structural descriptors to derive empirical equations for prediction. To this end, we extended our investigation by developing different empirical equations to predict m and ΔCp values.
We examined the effects of different structural properties on the prediction of the thermodynamic properties: the number of amino acids, the number of crosslinks, the size of loops (the total number of amino acids involved in the loops formed by crosslinks), and the central position of the regions in the loop area. The best statistical Multiple Linear Regression (MLR) models were achieved using variables representing the total number of amino acids and the number of crosslinks. To test the predictive power of these models, the Leave One Out (LOO) cross-validation method was used. The mean absolute percentage errors (MAPE) of the predicted m (GdnHCl), m (Urea) and ΔCp values for all proteins listed in Table 1 (or, in brackets, for the proteins with crosslinks listed in Table 4) based on Myers’ models are 19.7% (21.5%), 22.9% (31.8%) and 13.8% (17.9%), respectively. The corresponding MAPEs using the models presented in equations 13, 14 and 15 are 22.3% (20.5%), 23.6% (31.1%) and 13.5% (16.8%), respectively.
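The validation procedure described above can be illustrated with a minimal sketch. It assumes a model of the form y = c0 + c1·k + c2·n (k: number of amino acids, n: number of crosslinks) fitted by ordinary least squares. The data are synthetic and the function names (`fit_mlr`, `loo_predictions`, `mape`) are hypothetical; none of this reproduces the authors' actual equations 13 to 15 or their data set.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    size = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, size):
            factor = M[r][c] / M[c][c]
            for j in range(c, size + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, size))
        x[r] = (M[r][size] - s) / M[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares fit of y = c0 + c1*k + c2*n via the normal equations."""
    X = [[1.0, k, n] for k, n in rows]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return solve(XtX, Xty)


def predict(coef, k, n):
    return coef[0] + coef[1] * k + coef[2] * n


def loo_predictions(rows, y):
    """Predict each protein from a model fitted to all the other proteins."""
    preds = []
    for i in range(len(y)):
        coef = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, *rows[i]))
    return preds


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic illustration only: 30 made-up "proteins" with arbitrary coefficients.
random.seed(0)
rows = [(float(random.randint(50, 400)), float(random.randint(0, 4))) for _ in range(30)]
y = [900.0 + 10.0 * k - 150.0 * n + random.gauss(0.0, 50.0) for k, n in rows]
loo_mape = mape(y, loo_predictions(rows, y))
print(f"LOO MAPE on synthetic data: {loo_mape:.1f}%")
```

Because each prediction comes from a model that never saw the protein being predicted, the LOO MAPE is a more conservative estimate of predictive error than the error of a single fit to all data.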
Table 4

Thermodynamic parameters of proteins predicted based on different methods

| PDB code | Myers (a): m GdnHCl | Myers (a): m Urea | Myers (a): ΔCp | Eqs. 13 to 15 (b): m GdnHCl | Eqs. 13 to 15 (b): m Urea | Eqs. 13 to 15 (b): ΔCp | Experimental (c): m GdnHCl | Experimental (c): m Urea | Experimental (c): ΔCp |
|---|---|---|---|---|---|---|---|---|---|
| 1CHO | 1069.3 | −94.1 | −71.9 | 1117.5 | −29.4 | 134.2 | 580.0 | 250.0 | 590.0 |
| 7PTI | 1502.6 | 261.3 | 366.8 | 1516.4 | 245.5 | 444.7 | 1200.0 | NA | NA |
| 1AAL | 1478.4 | 261.0 | 366.4 | 1491.8 | 245.5 | 444.7 | 1500.0 | NA | NA |
| 5CYT | 2982.8 | 1283.7 | 1559.3 | 2834.1 | 1142.9 | 1426.2 | 2800.0 | NA | NA |
| 2PCB | 3017.1 | 1315.3 | 1584.6 | 2849.9 | 1158.0 | 1430.2 | 3010.0 | 1200.0 | 1730.0 |
| 9RNT | 2490.1 | 939.1 | 1168.1 | 2528.5 | 1007.8 | 1296.3 | 2560.0 | 1210.0 | 1270.0 |
| 1YCC | 2813.1 | 1174.8 | 1448.4 | 2930.0 | 1218.6 | 1525.8 | 3400.0 | 1430.0 | 1370.0 |
| 2TRX | 2906.8 | 1242.6 | 1507.0 | 2933.1 | 1224.2 | 1511.7 | 3310.0 | 1300.0 | 1660.0 |
| 9RSA | 2476.8 | 951.5 | 1177.8 | 2212.8 | 1077.7 | 1392.4 | 3100.0 | 1100.0 | 1230.0 |
| 1AKI | 2790.8 | 1156.5 | 1411.0 | 2442.8 | 1165.9 | 1456.8 | 2330.0 | 1290.0 | 1540.0 |
| 1LZ1 | 2874.4 | 1227.0 | 1488.9 | 2315.6 | 1182.8 | 1453.9 | 3460.0 | NA | 1580.0 |
| 1YMB | 4147.5 | 2071.2 | 2506.9 | 3992.2 | 1998.9 | 2390.2 | 3710.0 | 2140.0 | 1870.0 |
| 5MBN | 4258.2 | 2140.5 | 2523.8 | 4028.9 | 2019.7 | 2351.2 | 2600.0 | 1460.0 | 2770.0 |
| 4CHA | 5039.8 | 2685.9 | 3171.5 | 4770.5 | 3152.4 | 3468.4 | 4100.0 | 2070.0 | 3020.0 |
| 2CGA | 5230.5 | 2830.7 | 3317.1 | 4838.6 | 3307.1 | 3450.4 | 4440.0 | 2030.0 | NA |
| 2PSG | 8892.8 | 4129.0 | 6354.9 | 8203.9 | 3964.0 | 6045.6 | NA | 7800.0 | 6090.0 |
| Correlation coefficient (d) | 0.7293 | 0.7082 | 0.9751 | 0.6996 | 0.7519 | 0.9596 | | | |
| MAPE | 21.5 | 31.8 | 17.9 | 20.5 | 31.1 | 16.8 | | | |
| SDEP | 646.5 | 1128.3 | 300.9 | 612.1 | 1226.0 | 294.8 | | | |

(a) Prediction of heat capacity changes and m values for GdnHCl and urea upon unfolding based on Myers’ equations (10).

(b) Same predictions using equations 13 to 15.

(c) Experimental data compiled from the literature and taken from reference (10).

(d) Correlation coefficient between predicted and experimental values.
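The summary rows of Table 4 can be checked against the tabulated values. The sketch below does this for the Myers ΔCp column only, using the twelve proteins that have an experimental ΔCp. It assumes that SDEP is the root-mean-square deviation between predicted and experimental values; this section does not define SDEP, so that reading is an assumption, as are the helper names `mape` and `sdep`.

```python
import math

# (PDB code, Myers-predicted dCp, experimental dCp), copied from Table 4.
# Proteins whose experimental dCp is NA are left out.
DCP_MYERS = [
    ("1CHO", -71.9, 590.0),
    ("2PCB", 1584.6, 1730.0),
    ("9RNT", 1168.1, 1270.0),
    ("1YCC", 1448.4, 1370.0),
    ("2TRX", 1507.0, 1660.0),
    ("9RSA", 1177.8, 1230.0),
    ("1AKI", 1411.0, 1540.0),
    ("1LZ1", 1488.9, 1580.0),
    ("1YMB", 2506.9, 1870.0),
    ("5MBN", 2523.8, 2770.0),
    ("4CHA", 3171.5, 3020.0),
    ("2PSG", 6354.9, 6090.0),
]


def mape(rows):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((pred - exp) / exp) for _, pred, exp in rows) / len(rows)


def sdep(rows):
    """Root-mean-square deviation (assumed definition of SDEP)."""
    return math.sqrt(sum((pred - exp) ** 2 for _, pred, exp in rows) / len(rows))


mape_dcp = mape(DCP_MYERS)
sdep_dcp = sdep(DCP_MYERS)
print(f"MAPE = {mape_dcp:.1f}, SDEP = {sdep_dcp:.1f}")  # Table 4 lists 17.9 and 300.9
```

Under these assumptions the computed values agree with the tabulated 17.9 and 300.9 to the reported precision. The calculation also shows that a single small protein (1CHO, where the predicted ΔCp is negative) contributes more than half of the summed percentage error.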

The results show that the two methods are not statistically different in predicting the evaluated thermodynamic parameters, either for all data points (proteins in Table 1) or for the proteins with crosslinks (i.e. the values given in brackets), and that simple MLR equations based on a limited number of structural descriptors, i.e. the number of residues and the number of crosslinks, perform equally well. In fact, equations 13 to 15 are identical to equations 10 to 12; the only difference is the way the effect of the crosslinks on the parameter of interest is represented. For example, in equation 13 the coefficient of the variable n (i.e. the number of crosslinks) equals the coefficient of the second term on the right-hand side of equation 10 multiplied by the value of CLF. These equations are also very close to the equations proposed by Myers et al (eqs. 12 to 14 in reference 10). For instance, the coefficient of the variable n in equation 14 above (i.e. 155.58) is very close to the 139.3 obtained by multiplying 0.14 and 995 in equation 13 of Myers’ study (10).
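The equivalence between the two representations can be written out schematically. The symbols below are illustrative and are not the authors' notation: y stands for m or ΔCp, A_0 for the area change computed without any crosslink correction, and a and b for fitted coefficients. The exact forms of equations 10 to 15 are not reproduced here.

```latex
\begin{align}
  y &= a + b\,\bigl(A_0 - \mathrm{CLF}\cdot n\bigr) \\
    &= a + b\,A_0 - \bigl(b\cdot\mathrm{CLF}\bigr)\,n
\end{align}
% The magnitude of the coefficient of n is the slope b times the per-crosslink
% correction. With the values quoted from Myers et al:
%   0.14 \times 995 = 139.3
```

Written this way, the crosslink term is an ordinary linear term in n, which is why a regression on the number of residues and the number of crosslinks can absorb the correction into its fitted coefficients.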

Conclusion

In summary, the proposed relationships represent valuable tools for predicting thermodynamic parameters of protein folding using primary sequence information. The proposed crosslinking factor (CLF, which expresses the effect of a single crosslink on ΔSASA upon unfolding) of 918.5 Å², obtained from computational simulation, is very close to the previously published, experimentally derived value of 900 Å². Such a correction factor can be used to estimate the ΔSASA upon unfolding, which in turn can be used to predict thermodynamic parameters such as m and ΔCp. To predict these parameters, one may also use the number of amino acids (k) and the number of crosslinks (n) without any correction factor. Although the correction factor for the effect of a crosslink on ΔSASA is a quantitative value describing a fundamental property of protein folding, for prediction purposes the use of simpler properties taken from the primary structure of proteins gives equally accurate results. In addition, the current work provides an example in which theory is capable of reproducing the results obtained from experimental work.
  21 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Contribution to the thermodynamics of protein folding from the reduction in water-accessible nonpolar surface area.

Authors:  J R Livingstone; R S Spolar; M T Record
Journal:  Biochemistry       Date:  1991-04-30       Impact factor: 3.162

3.  Sequence-specific solvent accessibilities of protein residues in unfolded protein ensembles.

Authors:  Pau Bernadó; Martin Blackledge; Javier Sancho
Journal:  Biophys J       Date:  2006-09-29       Impact factor: 4.033

Review 4.  The two faces of protein misfolding: gain- and loss-of-function in neurodegenerative diseases.

Authors:  Konstanze F Winklhofer; Jörg Tatzelt; Christian Haass
Journal:  EMBO J       Date:  2008-01-23       Impact factor: 11.598

Review 5.  Urea and guanidine hydrochloride denaturation curves.

Authors:  B A Shirley
Journal:  Methods Mol Biol       Date:  1995

6.  Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding.

Authors:  J K Myers; C N Pace; J M Scholtz
Journal:  Protein Sci       Date:  1995-10       Impact factor: 6.725

7.  Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.

Authors:  W Kabsch; C Sander
Journal:  Biopolymers       Date:  1983-12       Impact factor: 2.505

8.  Urea denaturation of barnase: pH dependence and characterization of the unfolded state.

Authors:  C N Pace; D V Laurents; R E Erickson
Journal:  Biochemistry       Date:  1992-03-17       Impact factor: 3.162

Review 9.  Rhodopsin: structure, signal transduction and oligomerisation.

Authors:  Michael B Morris; Siavoush Dastmalchi; W Bret Church
Journal:  Int J Biochem Cell Biol       Date:  2008-08-03       Impact factor: 5.085

10.  Conformational stability and activity of ribonuclease T1 with zero, one, and two intact disulfide bonds.

Authors:  C N Pace; G R Grimsley; J A Thomson; B J Barnett
Journal:  J Biol Chem       Date:  1988-08-25       Impact factor: 5.157
