Literature DB >> 24348206

High-Resolution Free-Energy Landscape Analysis of α-Helical Protein Folding: HP35 and Its Double Mutant.

Abstract

The free-energy landscape can provide a quantitative description of folding dynamics, if determined as a function of an optimally chosen reaction coordinate. Here, we construct the optimal coordinate and the associated free-energy profile for all-helical proteins HP35 and its norleucine (Nle/Nle) double mutant, based on realistic equilibrium folding simulations [Piana et al. Proc. Natl. Acad. Sci. U.S.A.2012, 109, 17845]. From the obtained profiles, we directly determine such basic properties of folding dynamics as the configurations of the minima and transition states (TS), the formation of secondary structure and hydrophobic core during the folding process, the value of the pre-exponential factor and its relation to the transition path times, the relation between the autocorrelation times in TS and minima. We also present an investigation of the accuracy of the pre-exponential factor estimation based on the transition-path times. Four different estimations of the pre-exponential factor for both proteins give k0-1 values of approximately a few tens of nanoseconds. Our analysis gives detailed information about folding of the proteins and can serve as a rigorous common language for extensive comparison between experiment and simulation.

Entities: Chemical Disease Species

Year: 2013 PMID： 24348206 PMCID： PMC3860327 DOI： 10.1021/ct400651z

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

One of the reasons that the protein folding problem still interests researchers is that it is difficult to get direct and unambiguous answers about the basic questions of how proteins fold: What are the residual structure in denatured state, the nature of the folding steps, the free-energy landscape and kinetic barriers, transition path time, and pre-exponential factor? Widely differing opinions exist even for the fundamental issues and interpretation of many folding experiments.[1] Because of the limited spatial and temporal resolution of state-of-the-art experimental techniques, it is hard to obtain a direct detailed experimental characterization of the folding process. Many ingenious experimental approaches have been developed to overcome the shortcomings. Consider, for example, the problem of determining the folding free-energy barrier and the pre-exponential factor. While the folding rate can be measured directly, these quantities cannot. One approach considers “barrier-less” proteins, where the folding barrier is absent, and the pre-exponential factor closely approximates the folding time.[2,3] Another approach uses the relationship between the transition path times ⟨τTP⟩ and the pre-exponential factor derived for proteins with a simple landscape described by a single parabolic barrier between the native and denatured states.[4] The highly nontrivial task of measuring directly the transition path times has been solved recently by direct counting of photons in single-molecule experiment.[4] In another approach, a related quantity “molecular time” has been measured as a deviation from single exponential relaxation dynamics in a bulk temperature jump experiment.[5] However, it is not clear how to directly verify experimentally that the protein landscape agrees with the assumed model form. In another attempt, the free-energy landscape of the PrP protein was reconstructed using force spectroscopy.[6] However, direct interpretation of the results is complicated, because of the smoothing effect of the DNA handles and beads on dynamics, the perturbation of the landscape by the applied force, and the fact that the experimentally accessible reaction coordinate is not necessarily a “good” reaction coordinate. As a result, the obtained estimates for the pre-exponential factor and the transition path times have very large error margins. By measuring the contact formation times in unfolded polypeptides, a lower bound of k0–1 ≈ 10 ns has been suggested.[7] In principle, the detailed picture of how proteins fold and, in particular, the estimation of the pre-exponential factor and transition-path times or the shape of the free-energy landscape, can be obtained by simulation. An additional advantage of such an approach is that it becomes possible to test the assumptions underlying the models used for the analysis of experimental data. However, until recently, such simulations faced three main challenges: the simulation time gap, accuracy of force fields and rigorous quantitative analysis of the obtained data.[8−10] Recent advances in the computer hardware, simulation methodology, and force-field accuracy have made realistic simulation of the folding of small fast-folding proteins computationally affordable.[11−13] In particular, Lindorff-Larsen et al. reported the results of “brute-force” atomic-level MD simulations, of 12 fast-folding proteins.[14] With the steady progress in the simulation of protein folding, the rigorous quantitative analysis of the obtained data becomes all the more important.[9] The popular approaches are the Markov state models (MSMs),[15−19] conformation network analysis,[20−22] and the free-energy landscape framework.[23−28] The latter allows one to directly determine the major properties of the folding dynamics, namely, the folding free-energy barriers and the pre-exponential factor, the structure of the transition states (TS) and intermediates, the diffusion coefficient, and the transition path times. Determination of many of these properties with the alternative techniques is not straightforward. The most challenging part in the approach is the construction of the optimal reaction coordinate. A poorly chosen coordinate can hide the complexity of the dynamics,[8,25] decrease the height of the folding barrier, and make the dynamics subdiffusive.[27,29] In result, it may happen that analyses of the same trajectories, using the same framework, but with different methods produce different results.[13,27] The latter illustrates the importance of extensive, repetitive analysis of simulations. Here, we apply a recently developed approach for the construction of the optimal reaction coordinate[27,30,31] to analyze the folding dynamics of the all-helical C-terminal fragment of villin headpiece (HP35) and its double norleucine mutant (Nle/Nle).[32] The two proteins have been extensively studied by experiments[33−38] and theory/simulations.[8,14,32,39−46] In particular, an intermediate in wild-type HP35 was detected experimentally by solid-state NMR,[37] a triplet–triplet energy transfer (TTET) experiment,[36] and a simulation of an Ising type model of the protein.[35] Most of the experiments and simulations conclude that secondary structure and topology develop earlier than the full set of native contacts.[37,45] There is some disagreement on secondary structure formation; in particular, some results indicate that helices 1 and 2 are folded and helix 3 is unfolded in the intermediate state,[35,36] while MD simulations suggest an intermediate with helices 2 and 3 forming native interactions and helix 1 undocked.[40,41] Introduction of the stabilizing Nle-residues in helix 3 (the Nle/Nle mutant) increases the stability[33,34] and the folding rate, compared to those of the wild-type protein.[32,34] A rough estimate of k0–1 for HP35 double mutant of 420 ns was reported by Kubelka et al.[34] In this paper, we determine the optimal reaction coordinate and the associated free-energy profiles (FEP) for both proteins, which give a rigorous quantitative description of their folding dynamics. In particular, the secondary structure and hydrophobic core formations during the folding process are investigated and compared with experiment. The (folding) pre-exponential factor k0 is estimated in four different ways. The estimates have the same order of magnitude (k0–1 ≈ 20 ns). In addition, we check the assumption used in the experimental estimate of k0–1, namely, that the correlation times at the TS and in the native state are the same.[34] The Appendix investigates the accuracy of the estimation of the pre-exponential factor from the transition-path times[4,32] for model systems. We believe that the detailed, rigorous analysis that has been presented allowed us to clarify the matter with the pre-exponential factor estimation.

Methods

The determination of the optimal reaction coordinate, which accurately represents the multidimensional folding dynamics, is the most challenging part of the approach. Once the coordinate has been determined, all the properties, such as the free-energy profile, the diffusion coefficient, and the structures are computed in straightforward manner with no further assumptions. The putative reaction coordinate is taken as the (smoothed) number of contacts,[27]where r is the distance between atoms i and j, defined ash(x) is the (smoothed) Heaviside function (h(x) = min(1, max(x,0)), Δ is the threshold for the contact considered to be formed, and α is either +1 or −1. Two thousand (2000) contacts between H and O atoms are considered. Note that multiple contacts with the same atoms i and j but different Δ and α are possible, which makes the putative coordinate more flexible. Given the multidimensional simulation trajectory X(t) and putative reaction coordinate y (y = R(X)), one can compute the putative coordinate time series y(t) = R(X(t)), partition functions of conventional free-energy profile ZH(y) and cut free-energy profiles ZC(y) and ZC,1(y).[30,31] The coordinate is optimized by numerically optimizing the parameters Δ, α, and (i, j), so that the functional ∫ABdy/ZC,1(y) computed for the trajectory is minimal, where A and B are positions of the minima in the native and denatured states. The optimization is performed with a penalty term to avoid overfitting. The detailed description of the stochastic optimization procedure, the penalty term, and the analysis of the robustness of the approach are reported in refs (27 and 30). Here, the robustness of the results was tested by repeating the optimization procedure, starting from different reaction coordinates (e.g., the root-mean-square deviation (rmsd) from the native state, the first principal component), using different seed numbers, all of which leading to the same results. For an infinitely flexible reaction coordinate, the functional attains a minimum when the coordinate is equal to the (possibly rescaled) pfold coordinate R(X) = pfold(X).[30] The dynamics projected on such a coordinate is diffusive and, together with the corresponding FEP and the diffusion coefficient, provides a complete and accurate description of the folding process.[27,31] In particular, the equilibrium folding flux can be computed exactly as diffusion on the free energy landscape.[31] For a coordinate with finite approximation power, as considered here, the putative optimal coordinate approximates pfold well only near the TS regions. However, these are the most important regions for folding kinetics, and such a coordinate is sufficient, for example, to directly determine the folding barrier and the pre-exponential factor. Two thousand parameters were chosen, based on previous experience with analysis of a similar size protein.[27,30] While this number may seem very large, the following consideration shows that it is, in fact, quite modest. The reaction coordinate projects a trajectory with the length of a few million frames from a configuration space with dimension 3N – 6 (here 1725) onto a single coordinate. The optimal coordinate means that every configuration from the trajectory obtains the correct position after projection. Moreover, instead of using a specifically designed functional form with many parameters to better approximate the reaction coordinate, a linear combination of simple basis functions is used. Usage of the cut-profiles, which are invariant with respect to arbitrary rescaling of reaction coordinate, simplifies the problem since now every configuration should be in a correct order with respect to other configurations, rather than to have the correct absolute position. That probably explains why such approach constructs the coordinates that are optimal only around the TS. In order to do this, one needs to solve two much simpler problems. First, remove all the points that belong to the minima from the TS region toward the corresponding minima; their precise position in the minima is not important. Second, correctly project the points that do belong to the TS region. Their number is usually orders of magnitude smaller than the trajectory length. The position-dependent diffusion coefficient (D(y)) is directly determined as[30,31]After optimization, the putative optimal coordinate is rescaled by numerically integrating the expressionso that the diffusion coefficient is constant and equal to unity (D(z) = 1). In this case, the conventional profile and the cut profile differ by an unimportant constant. However, the cut profiles are less prone to statistical noise and are shown in the figures presented later in this work.

Results and Discussion

Simulation Detailes

Two simulation trajectories are analyzed: that of 398 μs for wild-type HP35 at T = 345 K and that of 301 μs for the (Nle/Nle) mutant at T = 380 K reported by Piana et al.[32] The analysis is performed with a time resolution of Δt = 0.2 ns.

HP35 Wild-Type Villin: The Free-Energy Landscape

Figure 1 shows the FEP of wild-type villin HP35 as a function of the determined reaction coordinate. The landscape consists of five states: denatured basin (D), first transition state (TS1), intermediate state (I), second transition state (TS2), and native basin (N). The main folding barrier is the one between the denatured and intermediate states, with the height of ΔF/kBT ≈ 5.5.

Figure 1

Free-energy profile for wild-type villin (HP35) along the putative optimal reaction coordinate. [Legend: D, the denatured basin; I, the intermediate basin; N, the native basin; TS1, the first transition state; and TS2 the second transition state.] The main folding barrier between D and I states is ΔF/kBT ≈ 5.5. The representative structures for the regions of the landscape show a trajectory snapshot closest to the average structure of the region. Colors code the root-mean-square (rms) fluctuations of atomic positions around the average structure from 1.5 Å (blue) to 7 Å (red). At the denatured state, the protein is unstructured and generally lacks a helical secondary structure. The yellow color in the beginning of the third helix suggests that this part is more stable than the rest of the protein, while the red color shows large fluctuations of other parts of the protein. At TS1, helices 1 and 2 start to form (the green color indicates that fluctuations in these regions are decreasing). Full formation and stabilization of the second helix occurs at the intermediate state (green changes to blue). At TS2, the end of the C-terminal helix still fluctuates strongly (red color); however, all three helices are predominantly formed, showing the native-like structure of the protein, which is fully stabilized in the native state (deep blue color). Figure 2 shows representative conformers for each transition state.

Figure 2

Stereo view of representative conformers for (A) TS1 and (B) TS2 transition states. Six conformers (for visual clarity) were randomly selected from each ensemble.

Secondary Structure Formation

Figure 3 shows the helical propensity (the fraction of time a residue is in a helical state) for different regions on the FEP and gives a detailed view on formation of helices during the folding process. In the denatured state (red line), the first and second helices are mainly unstructured (helical propensity of 20%–40%), while the beginning of the third helix (residues 63–66) is predominantly formed (60%). The red line shows that conformations with the joint first and second helices, as well as with the joint second and third helices, are possible.

Figure 3

Helical propensity for different regions on the FEP. [Legend: the D state is shown by the red line, TS1 by the green dotted line, I state by the black line, TS2 by the magenta dotted line, and N state by the blue line.] The three helices in the native state are formed by residues 43–52, 54–59, and 62–73. At TS1 (green dotted line) two separated helices form: part of helix 1 (residues 45–51) shows a helical propensity of 50%–80% and helix 2 shows 60%. Changes in the third helix are insignificant. The intermediate state (denoted by the black line) is characterized by stabilization of helix 2 with the helical structure observed in more than 90% of the snapshots. Surprisingly, the helical propensity of the end of helix 3 (residues 67–73) is lower, compared to that of the D state, and TS1 and shows nonmonotonic behavior. A similar analysis of the order of helix formation in wild-type villin shows that helix 2 forms first in 80% of the folding events.[32] A triplet–triplet energy transfer (TTET) experiment[36] and a simulation of an Ising-type model of the protein[35] also indicate the presence of the intermediate state with helices 1 and 2 folded and helix 3 unfolded. These findings are in agreement with the presented results but in contrast to results from MD simulations that suggested an intermediate with helices 2 and 3 forming native interactions and helix 1 undocked.[40,41] At TS2 (denoted by the magenta dotted line), the turn between helix 2 and helix 3 forms and the third helix shows increased helical propensity at the end (residues 67–73). The latter is fully formed in the native state (blue line).

Hydrophobic Core Formation

A solid-state NMR experiment detected an intermediate state during HP35 folding with nearly native secondary structure but disordered tertiary structure.[37] In particular, starting with a thermally unfolded ensemble, a hydrophobic core formation of the HP35 folding process was investigated in unfolded, intermediate, and folded states. This experiment was carried out in a glycerol/water solution (the simulations were done in explicit water). Figure 4A explores the formation of the hydrophobic core (residues Phe47, Val50, Phe51, Phe58, and Leu69) during the folding process. The snapshots show that the formation of native topology and secondary structure begins early during the folding process, while the stabilization of the hydrophobic core residues happens later. At the denatured state, unfolded protein has some helical content and a fully disordered tertiary structure. The intermediate state is characterized by the first and second helices formed but an incomplete hydrophobic core. The red and yellow colors of side-chains Val50 and Leu69 indicate large fluctuations of these residues. In the native state, the tightly packed hydrophobic core is fully formed. This finding reproduces the experimental results[37] and is in agreement with MD simulations, concluding that secondary structure and topology develop earlier than the full set of native contacts.[32] Interestingly, the intermediate state contains conformations with a nearly native secondary structure and native-like topology (Figure 4B) but with an incompletely folded hydrophobic core.

Figure 4

(A) Hydrophobic core formation during HP35 folding (the D state has a fully disordered tertiary structure; in the I state, the first and second helices formed but it still has an incomplete hydrophobic core; the N state has a tightly packed hydrophobic core). (B) A native-like structure with an incompletely folded hydrophobic core from the intermediate state (single snapshot). (C) Contact formation between side-chains Trp64 and Phe76 (the I state has contact between Trp64 and Phe76; the TS2 state shows the absence of Trp-Phe contact). The average configurations are taken from Figure 1.

Trp64 and Phe76 Contact Formation

The presence of an intermediate at the native side of the major folding/unfolding barrier in HP35 was suggested by an experiment using TTET to monitor conformational fluctuations.[36] In the intermediate state, the partially unfolded third helix is flexible enough to allow contact between side-chains Trp64 and Phe76, which is very unlikely in the native state. The experiment detected the presence of conformations without contact (I) and with contact (I*) in the intermediate state. Our analysis confirms that the intermediate state contains both types of conformations. The distance between the residues fluctuates between 3.5 Å and 30 Å. The population of the conformations where Trp64 and Phe76 are in contact (Figure 4C) is approximately the same as the population of the conformations where these side-chains are apart (ratio 1:1.5). In contrast, TS2 is characterized by the absence of contact between Trp64 and Phe76 in most of the structures, with an average distance of 17–20 Å between these residues (Figure 4C). Structures with the interacting Trp and Phe are also present at TS2 but in a much smaller proportion (ratio is ∼1:15). The schematic free-energy profile for the folding of wild-type HP35 at T = 300 K, with identified native (N) and near-native (N′) states, suggested in ref (36), differs from ours (Figure 1). One can attempt to extrapolate the profile to higher temperature. At the melting temperature, one would expect that the denatured and N′ states are equally populated while the N and I states are much less stable. It is also likely that the barrier between the denatured and intermediate states can become the rate limiting one. In this case, the profile will be similar to the one shown on Figure 1 if one assumes that N′ corresponds to N.

Nle/Nle Double Mutant

The HP35 protein contains two buried lysine residues at positions 65 and 70. The high-resolution X-ray structure previously showed that removing the charge of Lys65 by substituting norleucine increases burial of the aliphatic side-chain of residue 65.[33] The stability of the mutant increases by 0.5 kcal/mol and additional mutation in residue 70 stabilizes the protein by another 0.5 kcal/mol.[34] Introduction of the stabilizing Nle-residues in helix 3 shifts the folding pathway relative to that in the wild-type protein.[43] In particular, it was found that helix 3 generally forms early during the folding path and helix 1 forms last.[45]

The Free-Energy Landscape

Figure 5 shows the FEP of the Nle/Nle mutant as a function of the determined reaction coordinate. The FEP has one transition state (TS) between the denatured (D) and native (N) basins. The folding barrier of ΔF/kBT ≈ 4.6 is lower than that of the wild type, reflecting the fact that the double mutant folds faster.[32,34]

Figure 5

(A) Free-energy profile for the Nle/Nle mutant along the optimal reaction coordinate. D and N denote the denatured and native basins, respectively, while TS denotes the transition state.

(A) Free-energy profile for the Nle/Nle mutant along the optimal reaction coordinate. D and N denote the denatured and native basins, respectively, while TS denotes the transition state. Helix 3 is almost completely formed in the denatured state. The green color in the middle part of the helix indicates that this part is quite stable. The rest of the protein is unstructured (red and yellow colors suggest large fluctuations in the first and second helices). In the transition state, helix 2 forms (green color) and the protein takes near-native conformation (predominantly green and blue colors). In the native state, the protein is fully folded and stabilized (deep blue color). Figure 6 shows representative conformers of the TS ensemble.

Figure 6

Stereo view of randomly chosen conformers from the TS ensemble of the Nle/Nle mutant.

Stereo view of randomly chosen conformers from the TS ensemble of the Nle/Nle mutant. An MSM analysis of a simulation of the Nle/Nle mutant with a ff9sb-ildn force field (the currently analyzed simulation used ff99sb*-ildn) and lower temperature (T = 360 K)[46] identified native N and near-native N′ states separated by a barrier. The latter, while being native-like, is characterized by partial unraveling of helix 3. The structural interpretation of both states was suggested to be close to those found in the TTET experiment of the wild-type HP35 at temperatures lower than 300 K.[36] However, our analysis suggests that the mutant has a simple landscape with just native and denatured basins and one transition state (Figure 5). Another coordinate optimized for the analysis of just the native basin found a small barrier of ΔF/kBT ≈ 1.5. The conformations in two sub-basins differ mainly only by the orientation of side-chain Leu42 in the helix 1. The difference between our results and that of the MSM analysis can be due to different temperatures and force fields or due to the fact that while the major folding barrier can be easily identified, assignment and comparison of multiple small barriers is not so straightforward. Figure 7 shows how the helical propensity changes during folding. In the absence of an intermediate state, the plot shows monotonic behavior. In the denatured state (denoted by the red line), helix 3 is predominantly formed (helical propensity of 80%) with residues 67–73 having higher helical propensity (40%–60%), compared to the wild type. However, the structures with the joint second and third helices are still present in this state (helical propensity 10% in residues 61–62). Helices 1 and 2 are mainly unstructured. In the transition state (denoted by the green dotted line), the turn between the first two helices is well-defined and the propensity of both formed helices increases to 70%. The helical structure of helix 3 is present in 90% of the snapshots. Finally, all three helices are fully formed in the native state (denoted by the blue line).

Figure 7

Helical propensity of the residues for different regions on the FEP. The D state is shown by the red line, the TS by the green dotted line, and the N state by the blue line.

Hydrophobic Core Formation

Figure 8 explores the hydrophobic core formation. The denatured state has helix 3 formed while the hydrophobic core residues are disordered. The red color shows strong fluctuations of the residues. In the transition state, the mutant exhibits a structure with native-like topology, the second and third helices are formed, and the hydrophobic core has almost packed. However, the red and yellow colors still indicate the presence of large fluctuations in Phe47 and Val50 side-chains. In the native state, tight packing of the phenylalanine residues completes the formation of the hydrophobic core.

Figure 8

Hydrophobic core formation during the Nle/Nle double mutant folding.

Estimation of the Pre-exponential Factor k0

We first report estimates of the pre-exponential factor (k0) for the Nle/Nle mutant. It has a single TS, and the analysis is straightforward. The pre-exponential factor is estimated using four different approaches, with all results being in good agreement.

Estimate 1

The mean folding time or the mean first passage time (mfpt) from the denatured to the native state of the mutant estimated from the FEP by using Kramer’s equation is 1.6 μs. This value is lower than the folding time of τf = 3.0 μs, estimated directly from the trajectory. Such reasonable, although not ideal, agreement indicates that the FEP describes the kinetics reasonably well up to a factor of 2. Using the height of the free-energy barrier between the D and N states (ΔF/kBT = 4.6; recall Figure 5), the pre-exponential factor can be estimated fromas k0–1 ≈ 30 ns.

Estimate 2

Kramer’s equation for mfpt with harmonic approximation is[2,27]where ωTS and ωD are the curvatures of the TS and the D state, respectively, DTS is the diffusion coefficient in the transition state, and β = 1/kBT. The equation can be rearranged as[2] Assuming ωTS = ωD, one obtains k0–1 = 2πτcorr,TS, where τcorr,TS = kBT/(DTSωTS2) is the autocorrelation decay time at the transition state.[2,47,48] Note that the reaction coordinate is rescaled so that the diffusion coefficient D(x) = 1 (see Methods). The top of the transition state (Figure 5) is approximated by (ωTS2/2)/kBT ≈ 0.034, which leads to k0–1 ≈ 18 ns.

Estimate 3

Equation 3 can be rearranged in another way:[27]where ZD is the total partition function of the denatured state, ZC,TS is the cut profile at the top of the transition state (ZC,TS = exp(FC,TS/kBT)), and Δt is the sampling interval. While this estimate also assumes that the profile at the TS is parabolic, it does not need the value of the curvature at the TS (as does the previous estimate): only the value of the cut profile ZC,TS is needed. For the transition state (ZC,TS = exp(6.6), ZD = 1.1 × 106) with Δt = 0.2 ns, one obtains τcorr,TS ≈ 10.0 ns and k0–1 ≈ 63 ns. In estimates 1–3, we used τf = 3.0 μs, obtained directly from the trajectory. Note, however, that if one uses the value estimated from the profile for the mfpt (i.e., 1.6 μs), then estimates 1, 2, and 3 give values of 16, 18, and 18 ns, respectively. Such a (superficially) good agreement is not surprising, since it is for a diffusive dynamics on the obtained FES, i.e., it just shows that the equations that have been derived are correct.

Estimate 4

A transition path is the part of the trajectory that crosses the reaction coordinate x at x1 and reaches x2 on the other side of the barrier without recrossing x1.[49] The duration of this part is the transition-path time. The mean transition path times ⟨tTP⟩, computed directly from the trajectory, can be used to estimate the pre-exponential factor using the relation[50]where kf = 1/τf is the folding rate and γ ≈ 0.577 is Euler’s constant. The relation was derived assuming diffusive dynamics over a parabolic transition state with the height of the barrier being ΔF > 2kBT. We consider two cases: first, where boundaries x1 and x2 are placed on the FEP around the TS barrier, such that ΔF (ΔF = FTS – F) is 3kBT and second, where x1 and x2 are taken at the minima of the denatured and native basins, correspondingly. The measured ⟨tTP⟩ values are 14.6 and 62.4 ns, respectively, with the corresponding k0–1 values (eq 6) being 33.3 and 169.5 ns. The first number agrees with the other estimates, while the second is much larger. An analysis of the model systems (see the Appendix) shows that the estimate of k0 with eq 6 is the most accurate when transition path times are calculated between x1 and x2 with ΔF ≈ 3kBT; that, in our case, corresponds to k0–1 ≈ 33.3 ns. Experimental estimates of k0 for the Nle/Nle double mutant were obtained by Kubelka et al.[34] A “very rough estimate” was made by assuming that the empirical protein folding “speed limit” tf = N/100 μs, where N is the number of residues in the polypeptide chain,[2] corresponds to k0–1; for N = 35, one obtains k0–1 ≈ 350 ns. The second estimate is based on the decay time of the autocorrelation function in the folded state. A value of τcorr = 70 ns was obtained from a biexponential fit of the relaxation after a temperature jump.[34] Assuming that the decay times in the native and transition states are the same (i.e., that these states have similar curvature and diffusion coefficients), one finds k0–1 = 2πτcorr ≈ 420 ns. Having the folding trajectory, we can test the assumptions: in particular, how similar are the autocorrelation decay times at different regions on the FEP? Figure 9 shows the logarithm of the position autocorrelation function ln C(τ) in the N, D, and TS states, whereandAs one can see, the autocorrelation function does not have a simple single-exponential decay C(τ) = exp(−τ/τcorr), and, thus, τcorr cannot be unambiguously determined. However, it is clear that the “effective” decay time at the transition state, which actually determines the pre-exponential factor, is significantly smaller than that in the basins, indicating that the assumption above is likely to be poor. Note that the long-time slope of lnC(τ) in the D and N states is close to the experimentally measured τcorr value (τcorr = 70 ns).

Figure 9

Plot of the position autocorrelation function in the native (N) state (dashed line), the denatured (D) state (dash-dotted line), and the transition state (TS) (solid line). Dotted line shows ln C(τ) = −1 – τ/70, which is an autocorrelation function with a time decay τcorr = 70 ns to mimic the experiment.[34]

Estimation of k0 for HP35

The wild-type protein has two free-energy barriers, where one is lower than the other. The pre-exponential factor of only the main folding barrier between D and I states is considered. The mfpt value needed to overcome the first folding barrier, computed directly from the trajectory, is τ = 9.2 μs. The mfpt value estimated from the FEP is τ = 3.75 μs, which indicates that the FEP describes the dynamics relatively well. Using eq 2 with ΔF/kBT = 5.5, one finds k0–1 ≈ 35 ns. The curvature of the first transition state (ωTS2/2)/kBT ≈ 0.056 gives k0–1 ≈ 11 ns. Using eq 5 with ZD = 1.41 × 106 and ZC,TS = exp(5.6), one obtains τcorr,TS ≈ 7.8 ns and k0–1 ≈ 49 ns. For estimate 4, we, analogously, consider several cases where x1 and x2 are placed at 3kBT from the top of the first transition barrier, at the minima of the D and I states and at the minima of the D and N states, correspondingly. The ⟨tTP⟩ values are 12.8, 200, and 753 ns with corresponding k0–1 = 26.3, 555, and 2380 ns. Piana et al. determined the mean transition path times between the folded and unfolded states to be 120 ns < ⟨tTP⟩ < 460 ns, which gives pre-exponential factor estimates of 500 ns < k0–1 < 1500 ns.[32] These upper and lower boundaries are close to our estimates of k0,DI–1 ≈ 555 ns and k0,DN–1 ≈ 2380 ns. However, the analysis of such estimation for model systems presented in the Appendix shows that the transition path times are very sensitive to the exact nature of the landscape and the positions of the boundaries. In particular, it is shown that the most accurate estimate of the pre-exponential factor is obtained when x1 and x2 are taken around the main transition state at positions with energy of ∼3kBT, which is less than that of the barrier. In our case, it corresponds to k0–1 ≈ 26 ns. If the region between the boundaries contains multiple, even relatively small barriers, the transition path times are dominated by the waiting time in the “intermediate” states. They no longer measure just the time required to cross the barrier and, hence, are no longer directly related to k0. Reiner et al.[36] estimated the rates of loop formation between all three helices in the denatured state for wild-type HP35 at T = 25 °C to be ∼107 s–1. Assuming that the rate of loop formation is higher at higher T = 380 K and that the pre-exponential factor is somewhat faster than the rate of the complete loop formation, one can expect an estimate close to the one obtained here. A related estimate for the pre-exponential factor k0–1 ≈ 10 ns obtained from the rates of contact formation in short polypeptides[7] is in agreement with ours.

Conclusion

The free-energy profiles for all-helical proteins, HP35 and its Nle/Nle double mutant, along the putative optimal reaction coordinate have been determined. The coordinates, together with the associated FEPs and diffusion coefficient, provide an accurate description of the folding dynamics of these proteins and allow direct estimation of the transition path times and the pre-exponential factor. The analysis shows that HP35 folds through an intermediate. In particular, the intermediate is characterized by the second and first helices formed but with an incomplete hydrophobic core that quantitatively reproduces the NMR experiment.[37] The second transition state describes the folding of helix 3. It has been also observed that, because of the fluctuations of helix 3, the intermediate state contains structures with and without contact between side-chains Trp64 and Phe76, in agreement with the TTET experiment.[36] The Nle/Nle mutant with two stabilizing residues in helix 3 has a simple landscape with only one transition state. Helix 3 is mostly folded in the denatured state and it appears that single TS (where helices 1 and 2 cooperatively fold) is sufficient to complete the folding process. A lower free-energy barrier also leads to faster folding dynamics in agreement with previous studies. The pre-exponential factor k0 was estimated in four different ways all giving the same order of k0–1 ≈ 20–50 ns. In summary, significant recent advances in computational power, accuracy of the force field and simulation methodology have made possible the realistic simulation of relatively fast-folding proteins. Rigorous free-energy landscape analysis of such simulations gives a detailed quantitative picture of how proteins fold and allows direct determination of many of the basic properties of protein folding dynamics, some of which can be estimated experimentally, only in an indirect way.

Table A1

Dependence of Transition Path Times ⟨tTP⟩ and Corresponding k0–1 on the Positions of the Boundaries x1 and x2 on the FEP

x₁	x₂	⟨t_TP⟩ (ns)	k₀^–1a (ns)	ΔF/k_BTb
36	19.8	26.2	57.8	3
37.8	17.8	32.2	72	4
41.6	14.2	53.6	125.8	5

k0–1 is the pre-exponential factor estimated from eq 6.

ΔF/kBT is the height of the barrier from the position x1 or x2 to the transition state.

Table A2

Dependence of Transition Path Times ⟨tTP⟩ and Corresponding k0–1 on the Position of Boundaries x1 and x2 on the FEP

x₁	x₂	⟨t_TP⟩ (ns)	k₀^–1a (ns)	ΔF/k_BTb
37.6	21.8	24.5	53.5	3
39.7	20.3	31.4	70	4
42.8	18.5	43.4	100	5
37.6	14.2	125.7	322.5	3
39.7	13.0	147.8	394	4
42.8	11.6	160	435	5

k0–1 is the pre-exponential factor estimated from eq 6.

ΔF/kBT is the height of the barrier from the position x1 or x2 to the transition state.

50 in total

1. Folding at the speed limit.

Authors: Wei Yuan Yang; Martin Gruebele
Journal: Nature Date: 2003-05-08 Impact factor: 49.962

2. From transition paths to transition states and rate coefficients.

Authors: Gerhard Hummer
Journal: J Chem Phys Date: 2004-01-08 Impact factor: 3.488

3. The protein folding network.

Authors: Francesco Rao; Amedeo Caflisch
Journal: J Mol Biol Date: 2004-09-03 Impact factor: 5.469

Review 4. The protein folding 'speed limit'.

Authors: Jan Kubelka; James Hofrichter; William A Eaton
Journal: Curr Opin Struct Biol Date: 2004-02 Impact factor: 6.809

High-Resolution Free-Energy Landscape Analysis of α-Helical Protein Folding: HP35 and Its Double Mutant.

Introduction

Methods

Results and Discussion

Simulation Detailes

HP35 Wild-Type Villin: The Free-Energy Landscape

Secondary Structure Formation

Hydrophobic Core Formation

Trp64 and Phe76 Contact Formation

Nle/Nle Double Mutant

The Free-Energy Landscape

Hydrophobic Core Formation

Estimation of the Pre-exponential Factor k0

Estimate 1

Estimate 2

Estimate 3

Estimate 4

Estimation of k0 for HP35

Conclusion

1. Folding at the speed limit.

2. From transition paths to transition states and rate coefficients.

3. The protein folding network.

Review 4. The protein folding 'speed limit'.

5. Hidden complexity of free energy surfaces for peptide (protein) folding.

6. Reaction coordinates and rates from transition paths.

7. High-resolution x-ray crystal structures of the villin headpiece subdomain, an ultrafast folding protein.

8. Sub-microsecond protein folding.

9. Folding free-energy landscape of villin headpiece subdomain from molecular dynamics simulations.

10. Dynamics of unfolded polypeptide chains as model for the earliest steps in protein folding.

1. Size-and-Shape Space Gaussian Mixture Models for Structural Clustering of Molecular Dynamics Trajectories.

2. Accelerated molecular dynamics simulations of protein folding.