Literature DB >> 25136267

Quantifying the Sources of Kinetic Frustration in Folding Simulations of Small Proteins.

Andrej J Savol¹, Chakra S Chennubhotla².

Abstract

Experiments and atomistic simulations of polypeptides have revealed structural intermediates that promote or inhibit conformational transitions to the native state during folding. We invoke a concept of "kinetic frustration" to quantify the prevalence and impact of these behaviors on folding rates within a large set of atomistic simulation data for 10 fast-folding proteins, where each protein's conformational space is represented as a Markov state model of conformational transitions. Our graph theoretic approach addresses what conformational features correlate with folding inhibition and therefore permits comparison among features within a single protein network and also more generally between proteins. Nonnative contacts and nonnative secondary structure formation can thus be quantitatively implicated in inhibiting folding for several of the tested peptides.

Entities: Chemical Disease Species

Year: 2014 PMID： 25136267 PMCID： PMC4132847 DOI： 10.1021/ct500361w

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Theoretical and computational modeling has provided many insights into the remarkable ability of proteins to rapidly fold from unstructured coils into their native, functional conformations.[1,2] Especially for small structured proteins, entire folding processes can be investigated via atomistic, equilibrium molecular dynamics (MD) simulations, where the ability to sample multiple folding events (with μs simulations) with a transferable force field is an important milestone in algorithm development and hardware parallelization.[3−7] When multiple folding events are observed, the underlying kinetics and conformational features that promote structural transitions and the eventual attainment of the native state can be statistically compared. Such studies reveal important characteristics of the underlying free energy landscape (FEL), the high-dimensional surface of hills and valleys that govern the likelihood of structural transitions and the occupancy probabilities of energetically coherent states, called conformational substates.[8,9] For structured proteins, the FEL has been conceptualized as a funnel with a low-energy native ensemble at its global minimum, where near-native intermediates are kinetic neighbors, and a nonnative ensemble composed of freely interconverting conformers at some further reaction distance.[10,11] While the majority of protein functions are accomplished via the native ensemble, quantifying the structural and kinetic characteristics of the nonnative ensemble can aid calibration of coarse-grained polypeptide models[12−14] and improve our understanding of folding initiation pathways,[15,16] protein misfolding,[17] protein aggregation,[18,19] and synergistic folding (i.e., folding in tandem with a binding partner).[20] Although nonnative ensembles recapitulate several properties of idealized random-coil models,[21,22] they have also been shown to deviate from polymeric predictions in important ways. Substantial secondary structure can accrue in the nonnative ensemble,[6,23,24] and these nucleation locations have been implicated as consistent waypoints in folding pathways.[25,26] Lindorff–Larsen et al.[5] likewise showed that for transition pathways specifically, secondary structure accumulates before native contacts are formed, a temporal bias that is inconsistent with an idealized nonnative ensemble. From the kinetic perspective, another surprise is that the nonnative ensemble can be modeled as a hub-like transition map,[27] where interchange between unfolded peptide geometries is mediated preferentially via the native (hub) ensemble instead of by direct routes[28] (see ref (29) for a contrasting interpretation). The minimally frustrated model of protein folding harmonizes some of these observations by recognizing that folding is energetically downhill and will thus avoid the enthalpic frustration of nonnative structure formation.[30−33] Analogously, we can ask whether the nonnative ensemble is minimally frustrated in a kinetic sense. Does folding proceed sequentially[34] from unfolded to folded substates or are there off-pathway kinetic inhibitors populating the FEL? Computational studies have invoked Cartesian, angular, topological, subspace-projection, or other structural descriptors to identify nonnative conformational states,[25,35−37] but assessing their impact on folding rates requires additional analysis. To quantify the kinetic contribution to folding of specific conformational substates, we present here a methodology that (1) permits comparisons of kinetic inhibition across multiple folding events and between multiple proteins and (2) can query any proposed structural parameters that may impact folding kinetics. Instead of only specifying the existence of kinetic traps, hubs, or preferential pathways in MD trajectories, we quantify the overall kinetic burden, or kinetic frustration, that structural deformations (secondary structure, tertiary structure, standard RMSD-to-native, or others) effect in protein simulations. In the rest of this introduction we give an overview of our model’s assumptions, justification for a topological definition of kinetic frustration, and primary results. We invoke a kinetic modeling of MD simulation data where a simulation trajectory and its conformers are represented as (1) sets of clusters, or conformational substates, whose kinetically indistinguishable members share conformational features, and (2) transitions, which capture the observed jumps between substates. Such a network of substates (nodes) and edges, when constructed with an appropriate lag-time between sampled trajectory snapshots and clustering criteria, satisfies the properties of a Markov State Model (MSM).[38,39] These models are guided by the motivation to equate conformational transitions with probability flow, enabling multistep transition pathways to be associated with a probability and expected duration even if the path itself was never observed within contiguous trajectory frames. By representing a protein’s FEL as an evolving finite markov chain, MSMs permit computation of the stationary distribution, the unique set of substate probabilities that is stable over time. It is then possible to calculate the expected time for any substate to transition to the assigned native ensemble, that is, the mean first passage time (MFPT) or transit time.[40] MFPT values express temporal expectations for random walks along the weighted edges of the conformational network.[41,42] They are robust[43,44] and can be compared to diffusional models of folding[45,46] and nanosecond laser T-jump experiments.[47] Whereas MFPTs necessarily are a function of two specified end points, our concern is only with those transition paths that terminate at the native ensemble, a convention implicit throughout this study and indicated by the subscript of MFPT values, τnat. Can these values tell us which substates are responsible for accelerating or hindering folding? Not directly, but that exact information is revealed when substates are theoretically removed from the transition network and the change in τnat values among the rest of the nonnative ensemble compared. Kinetic frustration, quantified in frustration scores, f̅nat, captures these changes and quantifies the degree to which a particular conformational substate state inhibits or facilitates transitions to the native state. The terms inhibit and facilitate summarize a substate’s topological neighborhood with respect to the native ensemble: a substate that facilitates folding is highly connected to native or native-like substates, whereas a folding inhibitor promotes transitions to non-native regions of the FEL. MFPTs and frustration scores, f̅nat, are therefore related but, importantly, distill different information. τnat values reflect expected transit times given a network structure and designated native ensemble, whereas frustration scores quantify the impact on the network given the substate of interest. Nodes that have equal transit times need not share frustration scores, for example (see Figures 1 and 2B). Moreover, because a demarcated native state is inherent to their definition, frustration scores go beyond a quantification of local topology, such as average neighbor connectivity,[27] which treats equally links leading toward or away from the native state (for a given nonnative conformational substate). That is, frustration scores are sensitive to the native state location, whereas local connectivities are not (cf. Supporting Information (SI) Figure S6), an advantage that warrants the more computationally demanding node by node perturbation approach presented here. Frustration scores are thus individually computed for all nodes in the nonnative ensemble by observing changes in τnat values when each is removed from the network model (Figure 1),[48] a process akin to the eigenvalue estimation problem in matrix perturbation theory.[49] Observations obtained by altering the transition network in this way provide a quantitative framework for understanding its unperturbed behavior. Specifically, each frustration score f̅nat can be understood as the mean percentage change in all transit times from all possible paths as a result of node i, a kinetic interpretation lacking from many local topology metrics. Substates with f̅nat > 0 are labeled facilitators since folding rates would decrease (i.e., τnat increase) in their absence; states with f̅nat < 0 are inhibitors, or kinetic traps, in that folding rates would increase (τnat decrease) if they were to be removed from the conformational landscape. We thus invoke the concept of kinetic frustration because MFPT values alone cannot elucidate these causal relationships.

Figure 1

Figure 2

Network representations of substates and transitions. (A) Nodes indicate conformational substates determined with RMSD clustering;[64] node diameters are proportional to substate probability. The native ensemble, green, was determined by modularity optimization.[65] White diamonds indicate the substate containing the conformation closest to the experimental structure. Black diamonds indicate the substate containing the native conformation (see Methods). Frustration scores, f̅nat, are denoted by the color spectrum, centered at f̅nat = 0. Positive scores, blue, indicate substates that facilitate transition to the native ensemble; negative scores, red, indicate kinetic traps. (B) Comparison of frustration scores, f̅nat, and transit times, τnat (μs), for nonnative substates in part A. Color values correspond to frustration scores as in part A.

Computing frustration scores, f̅nat, for a model transition network. (A) Each conformational substate in the nonnative ensemble (1...knn) is colored according to frustration scores, f̅nat; substate diameters indicate the stationary probability. The native ensemble is represented as a single green substate. Transition probabilities are shown along observed transitions where values above the transition path always denote left → right transitions and values beneath the arrow refer to moving right → left. (B) Computed MFPTs, τnat values, for each nonnative substate to reach the native ensemble. (C) Procedure for computing f̅nat. Each nonnative substate is removed from the transition matrix (states i = 1...4, top to bottom) and transit times for all remaining knn – 1 substates are compared with unperturbed values (left panels: black bars, unperturbed; red, gray, or blue bars, perturbed). Relative changes in transit times (wedges, right panels) are averaged over remaining substates to yield f̅nat (dashed lines). These frustration scores are then depicted by the color scale on the original intact network (A). Substates 1 and 2 have identical MFPTs, whereas f̅nat values indicate substate 2 is a facilitator and increases folding rates from all other substates by an average of 25%, while substate 1 is kinetically neutral. Substate 4 is a kinetic trap, slowing all transit times by 30% on average in the unperturbed network. Network representations of substates and transitions. (A) Nodes indicate conformational substates determined with RMSD clustering;[64] node diameters are proportional to substate probability. The native ensemble, green, was determined by modularity optimization.[65] White diamonds indicate the substate containing the conformation closest to the experimental structure. Black diamonds indicate the substate containing the native conformation (see Methods). Frustration scores, f̅nat, are denoted by the color spectrum, centered at f̅nat = 0. Positive scores, blue, indicate substates that facilitate transition to the native ensemble; negative scores, red, indicate kinetic traps. (B) Comparison of frustration scores, f̅nat, and transit times, τnat (μs), for nonnative substates in part A. Color values correspond to frustration scores as in part A. Within our simulation data set of 10 fast-folding proteins, substantial kinetic traps were observed for four proteins (TRP, BBL, PB, and HMDM) whereas kinetic inhibition was chiefly absent in the nonnative ensembles of WW, PG, and A3D. The largest frustration scores (i.e., most extreme facilitators) were observed in the WW and PG simulations, but none of the 10 globular proteins examined (Table 1) presented a single facilitator, an ultimate gatekeeper substate, to the native ensemble. As shown in Figure 2, kinetic traps were unequally distributed throughout the nonnative ensembles. The transition networks, or transition maps, did display unique topological features, and we were able to ask to what relative degree secondary structure, tertiary structure, and nonnativeness (standard RMSD-to-native) are associated with positive or negative kinetic frustration. We chose these structural parameters because of their broad interpretability and popularity for monitoring folding progress[50,51] but emphasize that the approach is compatible with any geometric or energetic feature that can be computed for all trajectory frames.

Table 1

Simulation Dataa

protein	N_res	t_total (μs)	k	PDB	Temp (K)	N_f	N_u	r_nc (Å)
trp-cage (TRP)	20	208	417	2JOF	290	12	12	1.5
BBA	28	325	999–1	1FME	325	14	14	2.6
villin headpiece (VHP)	35	125	251	2F4K	360	34	34	1.3
WW-domain (WW)	35	1137	2274	2F21 (4–39)	360	12	11	1.4
BBL	47	429	860	2WXC	298	12	11	4.8
protein B (PB)	47	104	2310	1PRB (7–53)	340	19	19	3.4
homeodomain (HMDM)	52	327	654	2P6J	360	27	28	3.7
protein G (PG)	56	1155	2310–2	1MI0 (10–65)	350	12	13	1.2
alpha 3D (A3D)	73	707	1414	2A3D	370	12	12	2.9
lambda repressor (LAMDA)	80	643	1293–1	1LMB (6–85)	350	10	12	1.9

A summary of the proteins and simulations studied, adapted from Lindorff-Larsen et al.[5] Data columns indicate sequence length (Nres), total aggregated simulation duration (ttotal), number of conformational substates (and any substates excised during transition matrix MLE) (k), Protein Data Bank accession code (residue indices), simulation temperature, number of folding events (Nf), number of unfolding events (Nf), and the native-ensemble RMSD cutoff (rnc). All figures and tables order proteins according to increasing sequence length.

Folding is a conformationally heterogeneous process,[52] but the recognized prevalence of preferred folding routes[53] and transition pathways[54] highlights the need for tools linking specific nonnative substates to folding kinetics. Quantifying these relationships is a legitimate aim in its own right, but our findings relate to the wider problem of predicting the kinetic impact of direct perturbations to protein systems. Mutations, small molecule ligands, or solvent conditions that modulate the populations of conformational substates can influence folding rates or folding routes,[55−57] and quantifying any such changes therefore has applications to pathway inhibition, aggregation-based diseases, and protein engineering.[58−61] A summary of the proteins and simulations studied, adapted from Lindorff-Larsen et al.[5] Data columns indicate sequence length (Nres), total aggregated simulation duration (ttotal), number of conformational substates (and any substates excised during transition matrix MLE) (k), Protein Data Bank accession code (residue indices), simulation temperature, number of folding events (Nf), number of unfolding events (Nf), and the native-ensemble RMSD cutoff (rnc). All figures and tables order proteins according to increasing sequence length.

Methods

MD Simulations

We applied our analysis to 10 proteins within a large simulation data set generated by D. E. Shaw Research as reported in Lindorff-Larsen et al.[5] and analyzed further elsewhere.[29,33,62,63] Aggregate simulations of the 10 proteins comprise 5.1 ms of total sampling where each protein undergoes at least 10 folding and unfolding events (Table 1); details for the simulations and folding event classification are contained in the original reference. Clustering and all subsequent analysis was performed on the Cα coordinates. Snapshots were recorded every 200 ps. In this study, multiple simulations for the same protein, if present, were concatenated.

Determining Conformational Substates

To identify conformational substates for each protein, we performed hierarchical clustering with MSMBuilder2.[64] Trajectories were first subsampled to obtain snapshots every 50 ns based on implied time scale plots (SI Figure S1), then clustered into substates using root mean squared distance (RMSD) and Ward’s algorithm.[66] The number of substates, k (see Table 1), is a heuristic user parameter that was selected to be approximately equal to (simulation frames)/10.[67] This parameter has been shown to have little dependence on peptide length, Nres.[68] The k values chosen here correspond closely to those in ref (29). The transition probability matrix P was then approximated using the MSMBuilder2 maximum likelihood estimation (MLE) routine, and substates not included in the estimated matrix (i.e., those separate from the primary connected component) were excised from subsequent analysis. Connected singletons (substates with a single member) were retained, however, and constituted 0% of total conformers for BBA, BBL, PB, and TRP and 0.8–10% for A3D, HMDM, LAMDA, PG, VHP, and WW. Distributions of cluster sizes (number of member conformers) and widths (defined as mean pairwise RMSD of any two substate members) are given in the Supporting Information (Figures S4 and S5).

Defining the Native Ensemble

Our network folding model requires a demarcated native state to function as a kinetic end point, that is, a theoretical absorbing state[40] where folding is defined as complete. Selecting the largest conformational substate,[5] the substate closest to the PDB-deposited coordinates, or a hard RMSD threshold is too restrictive, excluding many substates with ‘native-like’ properties and artificially increasing theoretical τnat values.[69] Instead, we chose to designate a native ensemble, or a set of conformational substates that interconvert more frequently with each other than with outside substates. Such a graph property is captured by an algorithm called modularity optimization,[65] and is particularly suited for this classification task in that it reflects and adapts to the actual network topology, unlike an RMSD threshold. Modularity optimization proceeds by initially designating each substate as its own ensemble and then iteratively combining them until only highly intraconnected ensembles remain, at which point modularity is maximized. For a transition network, modularity is defined aswhere c is the number of transitions between substates i and j, k is the total number of transitions to substate i, k is the number of transitions to substate j, 2m is the total transition count in the network, and δ(s,s) = 1 when substates i and j reside in the same ensemble s and 0 otherwise (elsewhere lnn or ln denote total edges among nodes in the nonnative or native ensembles, respectively). Other formulations for optimizing modularity are possible, including normalized cut and conductance criteria.[70] Maximizing W yields multiple ensembles for each of 10 analyzed transition networks, but only one per network will be defined as the native ensemble. Within each candidate ensemble, five random conformers were sampled from every constituent substate, and the aggregate number of conformers within a cutoff r to the PDB-deposited native structure were counted and compared with similar counts from the remaining candidate ensembles. The ensemble with the most conformers under the cutoff was designated to be the native ensemble, and all its substates, not only those under the cutoff, were included (all substates not in the native ensemble were defined to be in the nonnative ensemble). The cutoff itself, rnc, was determined by identifying the RMSD value such that 5% of total trajectory frames were within rnc Å from the PDB-deposited crystal structure (Table 1). For 8 of 10 proteins, 100% of the frames with RMSD < rnc were found in the native ensemble; the same values were 82% for BBA and 93% for BBL. Substates containing the snapshot nearest the experimental native structure were always contained in the assigned native ensemble (Figure 2, white diamonds). The number of substates assigned to the native and nonnative ensembles was kn and knn, respectively (Table 1). Although the nonnative ensembles were partitioned variously during iterations of modularity optimization, the constituent substates of the (eventually defined) native ensemble were in fact identically preserved through all iterations for all proteins. The algorithm converged in seconds for all networks. Modularity optimization operates exclusively on transitions counts; however, we observed that secondary structure also separated cleanly between the native and nonnative ensembles as a result of this classification (SI Figure S2). The identified native ensembles are shown in green in the network representations (Figure 2). For purposes of computing RMSD, a native conformation was defined to be the trajectory snapshot nearest the theoretical mean structure of the entire native ensemble (Figure 2, black diamonds, and Figure 6, tube representations).

Figure 6

Ensemble representation of kinetic traps compared with native structure. (A) Representative structures of kinetic traps, shown in topological context in SI Figure S8. (B) Comparison of substate widths (intrasubstate pairwise RMSD). Red, substates classified as kinetic traps; gray, all nonnative substates; green, all native substates.

Defining Q and Secondary Structure

The proportion of native contacts present in any trajectory conformer, Q, is a useful reaction coordinate for monitoring folding progress[71,72] or modeling energy barriers.[73] It is a degenerate quantity in that many distinct conformers could map to an identical Q value.[74,75] We defined native contacts as those residue pairs whose separation (Cα–Cα) was less than 10 Å for at least 65% of the conformers within the native ensemble (SI Figure S3). Native contacts separated by fewer than 7 amino acids in the primary structure were excluded. We denote the percentage of native contacts as Qn(t) and the percentage of nonnative contacts as Qnn(t) for some time t. We quantified the presence of secondary structure in each trajectory frame using Protein Secondary Element Assignment (P-SEA), which labels every residue as in either an unstructured coil, α helix, β sheet, or ‘other’ configuration[76] (SI Figure S2). An ‘ideal’ sequence of native secondary structure assignment was defined as the residue-wise assignment most common within the native ensemble and termed the structure sequence. The presence of native secondary structure throughout the simulation was quantified by dividing the number of native-like P-SEA assignments by the total number of β sheet and α helix assignments within the structure sequence. This value is denoted Hn(t). Nonnative secondary structure, which captures the percentage of α and β secondary structure assignment that is unlike that found in the structure sequence, is denoted Hnn(t).

Mean First Passage Times

Having determined the set of substates defining the native state, we next derived the expected mean first passage time of each nonnative substate to the native ensemble as put forward in ref (48) (alternative algorithms for computing transit times are given in Torchala et al.[77]). First, we estimated the symmetric transition probability matrix P from the clustering results using the MSMBuilder2 MLE method to guarantee detailed balance.[78] This matrix carries jump probabilities for the embedded discrete Markov chain,[40] but can also be expressed as a rate matrix that approximates the continuous time transition rates.[39] For each nonnative substate i, we modify to have zero transition rates to all substates previously connected to i. We then compute the formal matrix exponentiation e( for geometrically spaced t values (t = 50(1.2), r ∈ [0,1,...40]. That is, t ∼ 50 ns...74 μs). The fraction of trajectories, starting at (nonnative) state i, that will arrive at the native ensemble N before time t is then given bywhere j indexes substates in the native ensemble. This fraction consistently converged for all substates (i.e., minP = 0.9964 at tmax out of all proteins). The mean first passage time (see SI Figure S7) is then given by

Frustration Scores

We then ask how these mean first passage times to the native ensemble, or transit times, change in response to network perturbation, that is, the removal of a substate in the nonnative ensemble. To that end, we remove a substate i in the nonnative ensemble from the network and then observe the percentage change between unperturbed (τ) and perturbed (τ*) transit times, in both cases for all nonnative substates m ∈ [1...knn ≠ i] (see Figure 1). That is,where the bar thus indicates the average percentage change in transit time over all knn–1 substates in the nonnative ensemble, and the multiplicative factor allows frustration scores to be interpreted as percentages. Substates in the native ensemble are never removed throughout the procedure. Any isolates resulting from removing node i were discarded when computing f̅nat, but this was rare ((isolates)/(knn) < 0.01 for all proteins except LAMDA, (isolates)/(knn) < 0.021). Frustration scores f̅nat quantify the kinetic impact, positive or negative, for each nonnative substate i, expressed as percentages in Figure 3 and Table 2. States with positive frustration scores are termed facilitators, those with negative frustration scores are termed inhibitors or kinetic traps. All analysis subsequent to clustering was performed in Matlab.[79] Due to the matrix exponentiation, complexity of f̅nat computation is ∼O(N3), and runtimes were between 4 min (PB, knn = 167) and 20h (PG, knn = 2248) on a 12-core cluster.

Figure 3

Frustration scores. Nonnative substate f̅nat values are shown for all 10 proteins. Because the substates of the native ensemble cannot be associated with f̅nat values, the number of data points corresponds to the number of substates in the nonnative ensemble, knn (Table 1). Values less than zero indicate a kinetic trap, those above zero indicate a substate that facilitates transition to the native ensemble. Central red marks indicate the median; box edges are the 25th and 75th percentiles. See SI Figure S7 for a comparison with frustration scores of phantom networks.

Table 2

Transition Network Summary Statisticsa

	min f̅_nat	max f̅_nat	median substrate size	median substrate width (Å)	median neighbors per substate	k_nn(k_n)	transition matrix density (%)	W	(l_nn→n)/l_nn (%)
TRP	–1.8	0.1	7 (11.5)	3.4 (2.6)	14 (12)	387 (30)	3.6 (22.3)	0.30	5.74
BBA	–0.7	2.2	5 (12)	4.3 (2.9)	8 (16)	905 (93)	1.0 (16.2)	0.42	3.89
VHP	–1.6	0.6	7 (9.5)	5.3 (4.6)	13 (16)	207(44)	6.1 (19.3)	0.23	10.23
WW	–0.6	3.5	2 (68)	4.7 (1.7)	4 (86)	2067 (207)	0.2 (39.2)	0.46	0.68
BBL	–1.6	1.1	7 (15)	6.2 (4.8)	11 (18)	758 (102)	1.5 (12.7)	0.42	8.36
PB	–4.2	1.4	5 (21)	7.0 (4.7)	10 (20)	167 (41)	5.8 (39.7)	0.34	6.61
HMDM	–1.7	1.1	5 (18)	5.6 (4.0)	8 (29)	517 (137)	1.8 (17.8)	0.44	20.86
PG	–0.7	3.3	4 (78)	5.5 (2.5)	6 (27)	2248 (62)	0.4 (37.5)	0.66	0.50
A3D	–0.6	2.2	4 (88.5)	8.2 (3.6)	8 (40)	1346 (68)	0.6 (48.5)	0.45	0.51
LAMDA	–1.2	2.0	6 (18.5)	6.0 (4.6)	5 (18)	1181 (112)	0.6 (14.7)	0.68	2.25

Details of transition networks in Figure 2. Columns 1–7: parenthetical values denote properties of the native ensemble, all others to the nonnative ensemble. The range of frustration scores is given in the first two columns. Median substate size refers to the number of trajectory snapshots clustered into each conformational substate. Substate width refers to average intra-substate pairwise RMSD. The number of substates classified by modularity optimization as being in the nonnative (native) ensemble is denoted by knn(kn). Maximum modularity value, W, for the modularity optimization algorithm utilized (see Methods) is also given. The last column shows the ratio between (1) total transition edges connecting nonnative and native ensembles (lnn→n) and (2) the total number of edges in the nonnative ensemble (lnn), expressed as a percentage.

Network Representations

Substate transition matrices are usually very sparse, especially in the nonnative ensemble (Table 2). Most transitions are forbidden due to the involved steric clashes, backbone geometry restrictions, and repulsive electrostatics. Graph-based visualizations of conformation space thus have interpretive value in conveying only the transitions that do take place as well as the relative sizes of conformational states.[80,81] We used Gephi[82] to represent each protein’s transition network (Figure 2). Network layouts were optimized using the Force Atlas algorithm, first allowing and then penalizing node overlap, in both cases with an internode repulsion strength of 200. Edge weights were scaled according to the transition matrix, specifically 1000, but are not differentiated visually in the figure. The repulsion force acts between all nodes, whereas node attraction is relative to connecting edge weight, so unconnected nodes feel zero direct attraction. Node diameters reflect their relative populations, but the smallest node is shown no smaller than 1/30 the size of the largest node for clarity. Network radial orientation was rooted with the native ensemble facing east.

Results and Discussion

Conformational landscapes of the 10 proteins can be conveniently depicted as networks of nodes and edges that illustrate folding properties of each peptide. These representations are shown in Figure 2A. The native ensembles as defined in Methods are colored green, though the maps’ layouts themselves were created without preknowledge of native or nonnative substate assignments. All nonnative substates are colored according to their computed f̅nat values. Details of transition networks in Figure 2. Columns 1–7: parenthetical values denote properties of the native ensemble, all others to the nonnative ensemble. The range of frustration scores is given in the first two columns. Median substate size refers to the number of trajectory snapshots clustered into each conformational substate. Substate width refers to average intra-substate pairwise RMSD. The number of substates classified by modularity optimization as being in the nonnative (native) ensemble is denoted by knn(kn). Maximum modularity value, W, for the modularity optimization algorithm utilized (see Methods) is also given. The last column shows the ratio between (1) total transition edges connecting nonnative and native ensembles (lnn→n) and (2) the total number of edges in the nonnative ensemble (lnn), expressed as a percentage.

Properties of Transition Networks

Many general phenomenological aspects of protein folding are visible in these abstractions in addition to features that distinguish the folding behavior of specific polypeptides. The prevalence of large substates within the highly interconnected native ensemble (see Table 2), for example, reflects the loss of entropy a folding peptide experiences upon attaining the energetically favorable folded conformation (cluster size and widths given in SI Figures S4 and S5). Second, folding facilitators (blue substates) are as expected mostly proximal to the native ensemble due to being conformationally very native-like.[83] The topological isolation of the native ensemble, especially for TRP, WW, PG, A3D, and LAMDA, suggests modularity optimization effectively classifies the folded and unfolded ensembles without invoking any protein-specific parameters. A particularly evident separation between the two ensembles characterized the transition maps of WW, PG, and A3D, all of which had less than one percent of nonnative edges connecting the nonnative and native ensembles (range for all proteins: 0.5–20.9%, see lnn→n/lnn in Table 2). Higher values of this measure indicate less homogeneous folding pathways,[84] most evident in HMDM and PB transition maps. Large kinetic traps, transition bottlenecks, and facilitators, among other topological motifs, are unequally prominent among the networks. Several of the maps depict a nonnative ensemble of freely interconverting structures that form no apparent energetically coherent substates (min f̅nat values for WW, PG, and A3D are −0.6,–0.7, and −0.6, respectively), whereas substantial kinetic traps characterize the nonnative ensembles of TRP, BBL, PB, and HMDM (min f̅nat = −1.8,–1.6,–4.2, and −1.7, Table 2). The distribution of frustration scores for each peptide is shown in Figure 3. We also compared τnat and f̅nat values to corresponding quantities computed for phantom (i.e., synthesized) networks in SI Figure S7. The comparison reveals that the degree distribution inherent to the transition network of each peptide is sufficient input for approximating τnat and f̅nat in generated networks. However, a more quantitative analysis is necessary to reveal the specific conformational features, or structural parameters, that are responsible for the frustration scores unique to each protein’s unfolded ensemble. We focus primarily on properties of kinetic traps because facilitators inherently border a native/nonnative delineation that is convenient but imposed; any conformational differences between facilitators and native substates are likely to be subtle with regard to the structural parameters used here.[83] We first address clustering properties that could be thought to cause negative outlier f̅nat values and then discuss the structural features that indeed correlate with kinetic frustration.

Kinetic Frustration Is Not a Clustering Artifact

As shown in Figure 3, most substates within the nonnative ensemble are kinetically neutral; their individual presence in the FEL has little impact on the expected transit times of other substates. Importantly, these substates need not be small, or have few constituent conformers. While substates with substantial positive or negative frustration scores tend to be above average in size, the converse is not true (SI Figure S4). That kinetic traps, as identified through f̅nat, must contain a substantial number of conformers reflects our intuition that kinetic traps represent local minima with stabilizing intramolecular interactions in the nonnative FEL.[85] Substate size (i.e., number of members) as a descriptive trait can be contrasted with substate width (i.e., the mean pairwise RMSD of any two of its members), which provides an approximation of local entropy. If kinetic traps presented increasing substate width as f̅nat values became more extreme, we could conclude that f̅nat values are actually artifacts of the clustering step. In this scenario larger and larger peripheral regions of the configurational state space are unfairly grouped together during clustering, resulting in artificially exaggerated f̅nat values. We observed, in contrast, that kinetic traps display decreasing width values, suggesting they represent genuine local energy minima (SI Figure S5).

Properties of Kinetic Traps

Frustration scores directly reflect the transition topology. Having discussed that clusters in our networks are well-formed, we next investigate conformational causes of this observed topology. Specifically, are there general structural features that cause kinetic traps?[86] We selected five structural parameters that share the desirable properties of normalizability and interpretability, and we computed them for all substates (all nonnative and native trajectory frames) in the transition networks. Definitions for native contacts, Qn, nonnative contacts, Qnn, native secondary structure Hn, and nonnative secondary structure Hnn are given in Methods, and our fifth structural parameter was standard RMSD (against the native conformation). Figure 4 illustrates the relationship between f̅nat and fractional contacts (Qnn or Qn) for all nonnative substates within the conformational landscape of LAMDA. The relationship observed confirms our expectation that kinetic traps display more nonnative contacts than the nonnative ensemble in general. The presence of interresidue contacts, both native and nonnative, is normalized against the corresponding quantity observed in the native ensemble. Mean values for these features are shown with dashed horizontal lines, gray for the entire nonnative ensemble, green for the entire native ensemble. For LAMDA, we conclude that the enrichment of nonnative contacts among kinetically frustrated substates is one hypothesis for the appearance of outlying red substates in LAMDA’s transition network (Figure 2, white arrow).

Figure 4

Structural features in the nonnative ensemble are related to kinetic frustration. Average intrasubstate Qnn (C) and Qn (D) values for LAMDA are plotted against frustration scores, f̅nat, showing that nonnative contacts are associated with kinetic frustration. Structural parameter values, Qn and Qnn included, are normalized against the average corresponding values within the native ensemble. Marker widths are scaled according to substate populations, and error bars indicate one standard deviation. Dashed lines show normalized average values for the nonnative (gray) and native (green) ensembles. Cumulative sums, (A) sQnn = (Qnn – Q̅nn) and (B) sQn = (Qn – Q̅n), (see main text) show the propensity of structural features to be more associated with negative or positive f̅nat values. When integrated these curves yield the bias values, β, that allow quantitative comparison between proteins (Figure 5 and SI Table S1). Color values along the curve correspond to substate color values in Figure 2.

Figure 5

Comparison of β values for five structural parameters. Frustration bias values relate structural features to kinetic frustration. Negative values indicate the structural feature is strongly associated with kinetic frustration, that is, slowing transition to the native state for that protein. Positive values indicate the feature is associated with states that facilitate attainment of the native state. The RMSD distance from the native conformation has the largest negative bias value for BBA, WW, BBL, HMDM, and A3D. Nonnative contacts have the largest negative biases for TRP, PB, and LAMDA. Nonnative secondary structure, Hnn, is the most biased structural parameter only for PG. Some bias values close to zero are not statistically significant (SI Table S1), indicating the structural parameter has little kinetic impact on folding for that protein.

In normalizing Qn and Qnn against their respective values in the native ensemble, we can evaluate their correlative relationship to frustration scores and then compare among protein systems. We thus quantify whether a feature is more enriched among kinetic traps or facilitators with a bias valuefor any feature F with mean value F̅ (within the nonnative ensemble), where f indexes the ascendingly sorted f̅nat values. Bias values convey whether feature F is more enriched for negative or positive f̅nat values (Figure 4A and B), where the negative sign allows us to compare with standard linear correlation, which we performed with the addition of weighted substate size (SI Table S1). Bias values for all five structural parameters are presented in Figure 5. Values near zero indicate the structural parameter is not strongly associated with kinetic frustration. Large positive or negative values indicate a strong relationship. We performed permutation tests to check the statistical significance of these bias values (SI Table S1). As we would expect, native secondary structure and native contacts frequently have positive bias values (TRP β is the only statistically significant exception), indicating that facilitator substates contain many native-like structural features, whereas kinetic traps do not. Significant nonnative secondary structure bias values were observed for BBA, PG, A3D, and LAMDA (β = −0.13,–0.19,–0.12, and −0.10 respectively), and significant nonnative contacts bias values were observed for TRP, BBA, BBL, PG, A3D, LAMDA (β = −0.33,–0.13,–0.18,–0.15,–0.13, and −0.28, respectively). Because RMSD-to-native is so commonly invoked as a distance metric for how far a simulation has progressed, we also computed bias values for RMSD, which were negative and statistically significant for all proteins except TRP, VHP, and PB. Especially for BBA, WW, BBL, HMDM, and A3D, nonspecific structural deformity, the characteristic summarized by RMSD, appears more associated with kinetic frustration than the specific structural features tested. VHP and PB simulations did not present statistically significant bias values for any structural parameters, perhaps due to modularity optimization defining the native ensemble too inclusively for these networks (see SI Figure S2). Comparison of β values for five structural parameters. Frustration bias values relate structural features to kinetic frustration. Negative values indicate the structural feature is strongly associated with kinetic frustration, that is, slowing transition to the native state for that protein. Positive values indicate the feature is associated with states that facilitate attainment of the native state. The RMSD distance from the native conformation has the largest negative bias value for BBA, WW, BBL, HMDM, and A3D. Nonnative contacts have the largest negative biases for TRP, PB, and LAMDA. Nonnative secondary structure, Hnn, is the most biased structural parameter only for PG. Some bias values close to zero are not statistically significant (SI Table S1), indicating the structural parameter has little kinetic impact on folding for that protein.

Visualizing Kinetic Traps

Conformational ensembles consisting of snapshots from the most kinetically frustrated substates are shown in Figure 6, rendered with PyMOL.[87] The 5% of frames that were members of substates with the most negative frustration scores were aligned to their collective mean structure and represented as ensembles (topological context shown in SI Figure S8). Then the native conformation (see Methods) was added, aligned, and shown in a thicker tube representation. These ensembles illustrate properties of the nonnative ensemble that were suggested by the transition networks in Figure 2. The diffuse nonnative ensembles of WW, PG, and A3D, for example, have few stabilizing nonnative interactions, so even their most kinetically frustrated states appear almost completely unstructured (Figure 6A). In contrast, the nonnative transition maps with more topological isolation among inhibitor substates, especially BBL and PG, show much more homogeneity in the respective structural ensembles. BBL’s bias for Qnn was −0.18, suggesting that the nonnative tertiary structure apparent in the ensemble is responsible for the cluster of kinetic traps evident in the network (Figure 2 black arrow). The kinetic trap ensemble for TRP shows that the peptide can get conformationally ‘stuck’ in a nonnative but stabilized geometry. Although the nonnative configuration in the ensemble and the superimposed native state have very different backbone geometries, the relative compactness of the former may explain why TRP presented a statistically significant negative β, a property not observed for any other peptide. Peptides with simple contact topologies have been shown in lattice models to allow more interplay between native and nonnative contacts,[88] consistent with our findings on TRP. That stabilizing interactions generally may be responsible for kinetic traps is suggested by Figure 6B, which shows that kinetic traps commonly have smaller widths (lower average pairwise RMSD of constituent members) than the nonnative ensemble. Ensemble representation of kinetic traps compared with native structure. (A) Representative structures of kinetic traps, shown in topological context in SI Figure S8. (B) Comparison of substate widths (intrasubstate pairwise RMSD). Red, substates classified as kinetic traps; gray, all nonnative substates; green, all native substates.

Summary

To compare the folding properties of 10 protein sequences, we have exploited both quantitative and interpretive aspects of network models of protein folding. Our definition of kinetic frustration is grounded in graph theoretic principles while being consistent with qualitative definitions of kinetic features, such as kinetic traps. The method thus allows direct comparison between temporal folding behaviors and conformational features, the latter summarized by five standard structural metrics that were normalized against their prevalence in the native ensemble. While nonnative intramolecular interactions or nonnative secondary structure formation have been recognized as contributing factors to misfolding[89,90] or folding rate reductions,[91,92] we quantified the influence of these structural malformations through a normative process that requires no prior domain knowledge of the protein of interest. Specifically, folding for TRP, VHP, PB, PG, and LAMDA was most kinetically frustrated by deformations other than that characterized by RMSD, suggestive of stabilizing forces that trap a folding protein in a semistructured but nonnative conformation. These details were resolvable because we chose to perturb individual substates rather than substate collections within the transition networks.[29] We additionally observed that phantom networks constructed by mimicking gross topological attributes of the observed networks mostly reproduced emergent kinetic properties (τnat and f̅nat) (SI Figure S7). If transition networks directly reflect the kinetic barriers, traps, and pathways caused by conformational fluctuations, as argued, then further topological properties can hopefully be linked to more nuanced categories of structural deformation. Certainly, subjective concepts such as misfolded intermediates, unstructured intermediates, and kinetic traps, often invoked in the literature of misfolding pathologies,[19,93−95] can especially benefit from this type of quantification since simulations are increasingly sampling the distant or rare FEL regions where these events occur.

78 in total

1. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures.

Authors: O V Galzitskaya; A V Finkelstein
Journal: Proc Natl Acad Sci U S A Date: 1999-09-28 Impact factor: 11.205

2. The origin of nonmonotonic complex behavior and the effects of nonnative interactions on the diffusive properties of protein folding.

Authors: Ronaldo J Oliveira; Paul C Whitford; Jorge Chahine; Jin Wang; José N Onuchic; Vitor B P Leite
Journal: Biophys J Date: 2010-07-21 Impact factor: 4.033

Quantifying the Sources of Kinetic Frustration in Folding Simulations of Small Proteins.

Introduction

Methods

MD Simulations

Determining Conformational Substates

Defining the Native Ensemble

Defining Q and Secondary Structure

Mean First Passage Times

Frustration Scores

Network Representations

Results and Discussion

Properties of Transition Networks

Kinetic Frustration Is Not a Clustering Artifact

Properties of Kinetic Traps

Visualizing Kinetic Traps

Summary

1. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures.

2. The origin of nonmonotonic complex behavior and the effects of nonnative interactions on the diffusive properties of protein folding.

3. Designed TPR modules as novel anticancer agents.

4. The energy landscapes and motions of proteins.

5. Energy landscape of knotted protein folding.

Review 6. Intermediates in the folding reactions of small proteins.

7. Nucleation-based prediction of the protein folding rate and its correlation with the folding nucleus size.

Review 8. Everything you wanted to know about Markov State Models but were afraid to ask.

9. RaTrav: a tool for calculating mean first-passage times on biochemical networks.

10. Energetic frustrations in protein folding at residue resolution: a homologous simulation study of Im9 proteins.

1. Approximating frustration scores in complex networks via perturbed Laplacian spectra.