Literature DB >> 35756371

Cooperativity and Folding Kinetics in a Multidomain Protein with Interwoven Chain Topology.

Abstract

Although a large percentage of eukaryotic proteomes consist of proteins with multiple domains, not much is known about their assembly mechanism, especially those with intricate native state architectures. Some have a complex topology in which the structural elements along the sequence are interwoven in such a manner that the domains cannot be separated by cutting at any location along the sequence. Such proteins are multiply connected multidomain proteins (MMPs) with the three-domain (NMP, LID, and CORE) phosphotransferase enzyme adenylate kinase (ADK) being an example. We devised a coarse-grained model to simulate ADK folding initiated by changing either the temperature or guanidinium chloride (GdmCl) concentration. The simulations reproduce the experimentally measured melting temperatures (associated with two equilibrium transitions), FRET efficiency as a function of GdmCl concentration, and the folding times quantitatively. Although the NMP domain orders independently, cooperative interactions between the LID and the CORE domains are required for complete assembly of the enzyme. Kinetic simulations show that, on the collapse time scale, multiple interconnected metastable states are populated, attesting to the folding heterogeneity. The network of kinetically connected states reveals that the CORE domain folds only after the NMP and LID domains, reflecting the interwoven nature of the chain topology.

Entities: Chemical

Year: 2022 PMID： 35756371 PMCID： PMC9228575 DOI： 10.1021/acscentsci.2c00140

Source DB: PubMed Journal: ACS Cent Sci ISSN： 2374-7943 Impact factor: 18.728

Introduction

It is estimated that nearly 70% of eukaryotic proteins consist of multiple domains.[1] They are involved in a wide array of functions, such as allosteric signaling (for example, hemoglobin and the bacterial chaperonin GroEL), passive elasticity of muscle (titin), and cargo transport by motors (dynein). Despite the inherent difficulties in identifying domains in proteins, a perusal of their structures shows that there is a great deal of diversity in their architectures.[1] For example, the giant titin protein is a heteropolymer made of thousands of β-sheet single-domain immunoglobulin (Ig) proteins that are connected by linkers. Pioneering single-molecule pulling experiments[2] on the polyprotein Ig established that it unfolds one domain at a time. It is likely that refolding, upon force quench, also proceeds by the formation of the native state, one domain at a time. Therefore, titin might assemble by preformed monomer units. We refer to polyproteins, such as Ig, as simply connected multidomain proteins (SMPs) because they can be partitioned into individual subunits by merely excising the linkers. Another example of SMP, the ankyrin repeat, is shown in Figure A (Protein Data Bank ID: 3TWT). The amino acid residues, and the associated secondary structural elements (SSEs), in the SMPs are “one-dimensionally contiguous”.[3] In contrast, in multiply connected multidomain proteins (MMPs), the sequences are intertwined in such a manner that their structures cannot be dissected into independently folding subunits. Thus, topologically, the domains cannot be cut in such a manner that they follow the sequence in a continuous linear manner, as is the case in SMPs. An example of an MMP is the T4 lysozyme whose folding cooperativity was shown, using pulling experiments,[4] to reflect the discontinuity in the connectivity of the SSEs in which a portion of the N-terminal sequence is part of the C-terminal domain. The connectivity of domains in terms of sequence is even more complicated in adenylate kinase (ADK), the protein of interest in this study, shown in Figure B (Protein Data Bank ID: 4AKE). According to the Wetlaufer[3] classification, the domains in ADK are discontinuous with respect to the sequence and the connectivity of the SSEs. The rules linking the topology of the folded state of the MMPs are hard to anticipate based solely on the connectivity of sequence and SSEs. The problem is exacerbated because, with the exception of very few studies,[4−6] there is a paucity of detailed experimental studies that have dissected the folding pathways of MMPs.

Figure 1

Structure, sequence, and folding thermodynamics. (A) Cartoon representation of the ankyrin repeat (PDB ID: 3TWT), an example for SMP. (B) Cartoon representation of ADK (PDB ID: 4AKE), an example for MMP. (C) Ribbon diagram representation of ADK. The N and C termini are indicated. The simulated sequence is shown below. (D) Temperature dependence of the CD signal (black line) extracted from experiments[18] and the simulated total energy (red line) as a function of temperature. The inset shows the calculated heat capacity as a function of temperature. (E) Free energy profiles at (black) and (red) as a function of the structural overlap function, χ (see eq ). The values and are used to separate the global equilibrium states, which are the native basin of attraction (NBA), intermediate basin of attraction (IBA), and unfolded basin of attraction (UBA). (F) Temperature dependence of the fraction of molecules in the NBA (red), UBA (green), and IBA (blue). In a series of most insightful experiments, Haas and co-workers reported the steps involved in the folding of Escherichia coli ADK, triggered by varying the guanidinium chloride (GdmCl) concentration.[6−8] One of their key findings is that the collapse of ADK is fast, but the formation of secondary structural elements is slow.[7] In a subsequent double kinetics experiments,[6] they further established that structure formation in the CORE domain (Figure B) is slow upon denaturant quenching. More recently, Haran and co-workers used single-molecule fluorescence resonance energy transfer (smFRET) experiments[5] to generate equilibrium trajectories at a fixed [GdmCl] concentration. Their results, which were analyzed by a hidden Markov model (HMM) analysis at different [GdmCl] values, suggest that folding occurs by the kinetic partitioning mechanism in which there are multiple metastable states in the folding landscape of ADK. The direct flux to the folded state from the unfolded ensemble (referred to as the partition factor elsewhere[9]) is only Φ ≈ 0.02,[10] which implies that the majority of the molecules fold by first populating one of the (roughly six obtained from HMM analysis at 0.65 M [GdmCl]) metastable states. Surprisingly, they found that connectivity between the states and the associated fluxes between them could be tuned by altering the denaturant concentration. Their experiments showed that the folding landscape of ADK is not only rugged but is also malleable to changes in the external conditions. Although qualitatively similar results have been found in the folding of PDZ3,[11] a 110-residue protein with a topology much simpler than that of ADK, the intricate topology of the MMP renders the folding of the latter more complicated. The structure of the 214-residue ADK consists of three domains: the NMP domain, the LID domain, and the CORE domain (Figure C).[12] The NMP domain spans residues 30–59 (indicated by the blue square in Figure C) and the LID domain consists of residues 122–159 (indicated by the yellow square), and the rest of the residues (1–29, 60–121, and 160–214) belong to the CORE domain. Sequence penetration across the native structure is vividly illustrated in ADK by noting that the N-terminal residues, 1–29, are part of the CORE domain comprising C-terminal residues. Contacts formed between N-terminal and C-terminal residues are labeled in the contact map of ADK in Figure S1. Here, we investigate thermal and denaturant-dependent folding of ADK using simulations based on the self-organized polymer model with side chains (SOP-SC) and the molecular transfer model (MTM),[13−15] under conditions that closely mimic those used in the experiments.[5,6,10] Coarse-grained model simulations, without side chains, were used to investigate folding cooperativity and multiple routes to the native state in ADK by thermal folding and unfolding.[16,17] After demonstrating that our simulations quantitatively reproduce many experimental measurements, we show that cooperative interactions between the LID and CORE domains, with folding of the former being slave to partial ordering of the latter, are required for ADK self-assembly. In contrast, the NMP domain folds independently at a higher temperature (or denaturant concentration) than the other two domains. The enhanced cooperative interactions between the LID and the CORE domains arise due to the discontinuous nature of the latter. The network of states linking the unfolded to the folded state, both at equilibrium and during refolding upon temperature quench, is multiply connected and shows that folding must occur by parallel pathways. The late stage of folding involves interaction between a reentrant helix in the CORE domain that forms contact with elements in the N-terminal CORE domain. The methods used here are transferable for investigating the folding of other MMPs.

Results

Simulations Predict Thermal Denaturation Accurately

The circular dichroism (CD) spectrum shows that ADK undergoes two cooperative transitions, one at and the other at (18) (Figure D, black line). In order to assess if the simulations reproduce the observed thermal melting profile, we used replica-exchange molecular dynamics (REMD)[19−21] and low-friction Langevin dynamics[22] in order to calculate the melting profiles. The temperature-dependent total energy E, which mirrors the CD signal, also shows two cooperative transitions (Figure D, red line). The corresponding melting temperatures, identified by the peaks in the heat capacity, C, are at , (see the inset of Figure D). The values of Tm’s obtained from our simulations and experiments are in excellent agreement with each other. This is remarkable given that no parameter in the SOP-SC model was adjusted to obtain agreement with experiments.

Three State Folding

The temperature-dependent profiles of E and C demonstrate that ADK folds globally in a three-state manner. The free energy profile, G(χ), as a function of the overlap function, χ given in eq , at and (Figure E), shows three states, which represent the NBA (native basin of attraction), UBA (unfolded basin of attraction), and the IBA (intermediate basin of attraction, i.e., IEQ). The conformations are grouped into three basins based on the χ values, shown by the black vertical dashed lines in Figure E. If , the conformations are classified as belonging to the NBA. Conformations with belong to the UBA, and the rest of the conformations represent IBA (IEQ). At , the NBA is unstable while IEQ and the unfolded state have similar stabilities (Figure E, black line). At , ADK transitions between the NBA and IBA while the UBA is unstable (Figure E, red line). Additional structural details of the IEQ state are shown in Figure S2. In Figure F, we plot the fraction of molecules in the NBA, fNBA([0], T) (the first argument indicates the value of the denaturant concentration); in the UBA, fUBA([0], T); and in the IBA, fIBA([0], T). The temperature dependencies of fUBA([0], T) (green curve in Figure F) and fNBA([0], T) (shown in red in Figure F) show that ADK unfolds and folds cooperatively at the two melting temperatures. At , the value of fNBA([0], T) is negligible, reflecting the cooperative transition between the IBA and UBA. Using fIBA([0], T) = fUBA([0], T) = 0.5, we obtained , which coincides with the peak in the heat capacity (inset in Figure D) . At low temperatures, the value of fUBA([0], T) is negligible, suggesting that ADK undergoes a cooperative transition between the NBA and the IBA. Using fIBA([0], T) = fNBA([0], T) = 0.5, we obtained , which also agrees with the peak in C.

Equilibrium Folding of the Domains

The average fraction of native contacts in each domain, QNMP, QLID, and QCORE, as a function of temperature (Figure A), shows that the NMP and the LID domains fold in a two-state manner with the melting temperature, and . In contrast, the CORE domain folds in a three-state manner. The two melting temperatures, extracted from the temperature dependence of , show that the ordering of this domain reflects the two transition temperatures in the heat capacity and the total energy (Figure D). It follows that the incremental assembly of the CORE domain, across both the melting temperatures, is the reason that ADK globally folds in three stages.

Figure 2

Temperature-dependent connectivity of metastable states. (A) Fraction of native contacts in each domain, QNMP, QLID, and QCORE, as a function of temperature. The inset shows the temperature dependence of . (B) Distributions of the fraction of native contacts within the three domains P(QNMP), P(QLID), and P(QCORE) at (left) and (right). (C) Network of thermodynamically connected substates at . The numbers on the arrows are the transition times from one substate to another substate. (D) Same as part C except it is calculated at . A more nuanced picture of the folding thermodynamics emerges from the distributions of QNMP, QLID, and QCORE at and shown in Figure B. If , the NMP domain is predominantly folded otherwise it is unfolded (compare the left and right panels in the upper panels in Figure B). The data in Figure A,B show that the NMP domain forms before the LID and CORE domains become structured, as the temperature is decreased. Similarly, if , the LID substructure adopts nativelike conformations (see the middle panels in Figure B). P(QNMP) [P(QLID)] is bimodal at (Figure B), which is consistent with the interpretation that both of the domains fold in an almost all-or-none manner, albeit at different melting temperatures. The lower melting temperature of the LID domain shows that it is thermodynamically less stable than the NMP domain, which accords well with single-molecule pulling experiments.[23] If , the CORE domain is in the native state. If the inequality is satisfied, the CORE domain is in the unfolded state. In the intermediate state, we find that 0.21 < QCORE < 0.45 (see the bottom panels in Figure B). Two-dimensional free energy profiles G(Qα, Qβ) (α and β are the appropriate domain labels) in Figure S3 illustrate the cooperativity between the LID and CORE domains at the two melting temperatures.

Network of Connected Substates

The NMP and LID domains exhibit two statelike transitions as T is varied whereas the CORE domain ordering is best described using three states labeled as U, I, and N (Figure A). Thus, from a thermodynamic perspective, we could describe the formation of ADK using 2 × 2 × 3 = 12 substates. They are S1(UUU), S2(UUI), S3(UUN), S4(UNU), S5(UNI), S6(UNN), S7(NUU), S8(NUI), S9(NUN), S10(NNU), S11(NNI), and S12(NNN). The first letter in the parentheses represents the state of the NMP domain, the second letter the state of the LID domain, and the third letter the state of the CORE domain. We first determined the percentages of the substates in each global state obtained in the simulations by generating 28 folding trajectories. The conformations that are sampled were grouped into the 12 substates (S1–S12) and the 3 global states (UG, IG, and NG). The percentages are determined from the number of each substate in each global state. Out of the total 12 substates, only 7 substates are significantly populated. We find that the global native state is a superposition of the substate S9 (7.3%) and S12 (92.7%), which we write as NG = 7.3%·S9 + 92.7%·S12. Similarly, the globally unfolded state is decomposed as UG = 92.8%·S1 + 7.2%·S7. For the global intermediate state, we find IG = 1.1%·S2 + 97.8%·S8 + 1.1%·S11. We then performed a flux analysis among these substates at the two melting temperatures ( and ) to assess the complexity of the network connectivity in the thermodynamic folding landscape. At the lower melting temperature, , the equilibrium flux predominantly flows through the substates S9, S12 and S2, S8, S11 (Figure C). Considering their global structural features, the network shows that ADK transitions primarily between NG and IG. The numbers on the arrows are the transition times from one substate to another substate. At the higher melting temperature , the network connectivity involves predominantly the substates S2, S8 and S1, S7 (Figure D). By mapping to the global structural features, we find that ADK transitions back and forth between IG and UG at the higher melting temperature.

Chemical Denaturation

In order to compare with experiments directly, we first used the molecular transfer model (MTM)[13] to simulate the effects of GdmCl on the equilibrium properties. Following our previous studies,[13−15] we chose a simulation temperature, Ts, at which the calculated free energy difference between the native state (NG) and the unfolded state (UG), ΔGNU(Ts) (= ) and the measured free energy ΔGNU(TE) at TE (= 293 K) coincide. The use of ΔGNU(Ts) = ΔGNU(TE) (in water) to fix Ts is equivalent to choosing the overall reference free energy scale in the simulations. For ADK, ΔGNU(TE = 293 K) = −9.8 kcal/mol at [C] = 0,[18] which results in Ts = 322 K. Except for the choice of Ts, no other parameter is adjusted to obtain agreement with experiments for any property. With Ts = 322 K fixed, we first computed the FRET efficiency as a function of [GdmCl] for ADK (Figure A). The FRET efficiency of a protein conformation was calculated using , where Ree is the end-to-end distance, and R0 = 49 Å. In the experiments, residues 73, 203 were labeled.[24] The agreement between the computed (thick black line) and the measured (black dots)[5] FRET efficiencies is excellent. The derivative of the computed FRET efficiency with respect to [GdmCl] (inset of Figure A) also shows signs of the two thermodynamic transitions as the denaturant concentration is increased, which accords well with our thermal unfolding calculations. The midpoint concentration for the major transition is , which agrees well with the measured result.[24] The predicted midpoint concentration for the second transition is at , which has not been observed in experiments. The values for the FRET efficiency for the structures in the UBA are roughly constant as [GdmCl] changes (green line in Figure A) whereas the values for the FRET efficiency for the structures in the NBA decrease substantially as [GdmCl] increases (red line in Figure A).

Figure 3

Effect of GdmCl on ADK folding. (A) Comparison of the calculated (thick black line) and experimental measurements[5] (black dots) of the FRET efficiencies as a function of [GdmCl], the GdmCl concentration. The inset shows the derivative of the calculated FRET efficiency, which clearly indicates that there are two distinct transitions. Decomposition of the FRET efficiency for the structures in the NBA (red), UBA (green), and IBA (blue). (B) Comparison of experimental measurements of the FRET efficiencies (black dots)[5] and the calculated fraction of ADK molecules in the NBA (red), UBA (green), and IBA (blue) as a function of [GdmCl]. The comparison shows that below ∼0.8 M the experimental FRET efficiency coincides with the calculated values for ADK molecules that are predominantly in the NBA, which is consistent with the plot in part A. In the range 0.8 2 M molecules, both the NBA and IBA contribute to the FRET efficiency. (C) Fraction of native contacts in each domain, QNMP, QLID, and QCORE, as a function of [GdmCl]. The inset shows the denaturant dependence of . (D) Distributions of the fraction of native contacts within the three domains P(QNMP), P(QLID), and P(QCORE) at (left) and (right). (E) Heat capacity versus temperature for different values of [GdmCl]. (F) [GdmCl] dependence of the melting temperatures. The fits to the lines are explicitly displayed. The units of B1 and B2 are K M–1. The black (red) line is for (). The dependence of the simulated fNBA([GdmCl], Ts) on [GdmCl] is also in excellent agreement with the measured FRET efficiency[5] (Figure B). As in the case of thermal denaturation, the transition at low [GdmCl] takes place between the NBA and the IBA. The corresponding midpoint concentration , determined using fNBA([GdmCl], Ts) = fIBA([GdmCl], Ts) = 0.5 (red and blue lines in Figure B), is close to the experimental value.[24] The transition at high [GdmCl] occurs as the intermediate state is destabilized, thus populating the unfolded state. Using fUBA([GdmCl], Ts) = fIBA([GdmCl], Ts) = 0.5 (green and blue lines in Figure B), the associated midpoint concentration is . The variations in the average values of the fraction of native contacts in the various domains (QNMP, QLID, and QCORE), shown in Figure C, as a function of [GdmCl], are very similar to the results in Figure A. The distributions of QNMP, QLID, and QCORE at the two midpoint concentrations (Figure D) are also qualitatively similar to the ones calculated at the two melting temperatures (Figure B). However, there is a subtle difference. In the presence of the denaturant, the range of conformations that are accessed is broader. For example, the probability of sampling the ordered state of LID domain (⟨QLID⟩ > 0.6) is non-negligible at whereas it is much smaller at (compare the middle panels in Figures B and 3D). This subtle difference could result in the differences in the stability of the folded ADK in the presence of GdmCl and folding induced by lowering the temperature. The heat capacity curves at various values of [GdmCl] show that the peaks corresponding to and decrease as [GdmCl] increases (Figure E). The decreases in and are both linear (Figure F). The variation in is well fitted using , where B1 ≈ −4.7 K/M. The variation in can be fitted similarly using , where B2 ≈ −4.5 K/M.

Collapse Kinetics and Folding Kinetics

To analyze the collapse and folding kinetics, we generated 100 folding trajectories (see the SI for details) using Brownian dynamics simulations at T = 293 K at [GdmCl] = 0 M.[25] We calculated the time-dependent changes in the radius of gyration (⟨Rg(t)⟩ by averaging over the ensemble of trajectories). The decay of ⟨Rg(t)⟩, which is a measure of the extent of collapse, is fitted using a single exponential function (Figure B), yielding collapse rate kc = 391 s–1, which as we discuss below is larger than the folding kf. Thus, global compaction occurs before folding, as observed in the experiment.[26] In particular, the distances d(28–71)(t) and d(122–159)(t) approach the native values extremely rapidly (Figure C), which likely corresponds to the dead time of the experiments.[26]Figure S5A,C in the SI shows that the time for the probabilities of these two distances, P(d(28–71))(t) and P(d(122–159))(t), to exceed about 0.5 is ∼2 ms, which is on the order of the collapse time . Thus, global compaction occurs rapidly upon making the conditions favorable for folding, as observed in the experiment.[26]

Figure 4

Folding and collapse kinetics. (A) Fraction of unfolded ADK molecules as a function of time (black) calculated from the distribution of first passage times. The red line is an exponential fit ( with τF = 5.5 ms) to the data. (B) Kinetics of collapse monitored by the average as a function of t (black). The fit to the data, given by a single exponential function (red line) yields an average collapse time τc = 2.56 ms. (C) Time-dependent changes in the distances between residues 28 and 71 (black line), 18 and 203 (red line), 122 and 159 (green line), and 36 and 129 (blue line). In the folded state, the distances between these four pairs of residues are 11.3, 13.1, 7.6, and 26.5 Å, respectively. Before estimating the folding time from simulations, we first calculated the folding rate theoretically using ,[27,28] which gives fairly accurate estimates for the folding times, spanning nearly 10 orders of magnitude, for proteins of varying length.[29] For ADK, N = 214, we find that , which is in good agreement with experiments.[7] From the distribution of first passage times, Pfp(s), the fraction of unfolded molecules at time t is calculated using . An exponential fit, , yielded the folding rate at [GdmCl] = 0 M, kf = 182 s–1(Figure A). The calculated value is larger than the experimental value of ∼ obtained by quenching the denaturant concentration from a high value to [GdmCl] = 0.3 M. The experimental [GdmCl] is fairly close to = 0.55 M. Therefore, one has to account for the stability change so that the simulation results could be compared to experiments directly. If the corrections due to the stability change at 0.3 M are made (Figure S4), using the data in Figure B, we predict that kf at [GdmCl] = 0.3 M is ∼ (details in the SI), which agrees well with measurements.[6,7]

Heterogeneity in the Self-Assembly of ADK

The generated folding trajectories could be used to quantitatively extract the extent of folding heterogeneity. In particular, ensemble FRET experiments provide data on the time-dependent changes in FRET efficiencies by varying the positions of the FRET probes. We calculated the time dependence of the distances between four pairs of residues: (i) the distance between 28 and 71, which are at the ends of a 44-residue segment including the NMP domain; (ii) the distance between residues 18 and 203, which could be a reporter of the global folding; (iii) the distance between residues 122 and 159, which are the ends of LID domain; and (iv) the distance between residues 36 and 129, which reflects the closeness of the NMP and LID domains (Figure C). The time-dependent changes in these distances are well-fitted using single exponential functions, from which we obtained the time scales for d(28–71), d(18–203), d(122–159), and d(36–129). The values are ∼, ∼, ∼, and ∼. The corresponding rates of formation for the NMP domain, global molecule, LID, and interface between NMP and LID domains are 3, 472 s–1, 255 s–1, 2, 342 s–1, and 584 s–1. These calculations show that the NMP and LID domains form early in the folding process, and their interface forms before collapse. The distributions of these four distances at different times, shown in Figure S5, provide a more detailed picture of the assembly dynamics of different regions of ADK. The results in Figure S5 show that there is a great deal of dispersion in the ordering of various parts of the ADK structure.

Parallel Pathways and Kinetic Intermediates

We calculated the fraction of native contacts of each domain, QNMP, QLID, and QCORE, from the 100 folding trajectories. Using these as progress variables for the folding reaction, we find that ADK folds by multiple parallel pathways. The NMP and the LID domains fold cooperatively in a two-state manner, albeit at different rates, while the CORE domain folds through 5 successive stages, which is illustrated using a sample folding trajectory at the bottom right of Figure . In each of these stages, the CORE domain becomes increasingly ordered with acquisition of the nativelike structure occurring in the final stage. Therefore, ADK could fold through 2 × 2 × 5 = 20 states. However, in the 100 folding trajectories, only 13 states are kinetically populated. We classify these as LL1, HL1, LH1, HH1; LL2, HL2, LH2, HH2; HL3, LH3, HH3; HH4; and HH5. The first letter in the 3-letter notation represents the state of NMP domain: “L” means that the value of QNMP is low, and the NMP domain is unfolded. “H” means that the value of QNMP is high, and the NMP domain is folded. The second letter represents the state of LID domain, and the third letter stands for the state of the CORE domain. Labels “1–5” denote different levels for the values of QCORE. LL1 is the starting unfolded state, and HH5 is the final folded state. There are 15 distinct folding pathways found in the generated folding trajectories (see Table S1 in the SI).

Figure 5

Network of connected states accessed during folding kinetics and parallel pathways. Each state is colored according to the average fraction of the native contacts formed at each residue. Color code: blue, structured; red, unstructured. The 12 most probable folding pathways are represented by the colored arrows with the line widths representing the probability of each folding pathway. The panel on the bottom right shows one representative folding trajectory. The hierarchy of assembly of the domains is clear. The NMP and LID domains form prior to the formation of the CORE domain. A sequence of transitions (1 to 5) drive consolidation of the folded ADK. The fluxes through the 13 states follow a complex pattern, as illustrated in Figure , where each state is colored according to the average fraction of native contacts formed at each residue. As observed in the single-molecule experiment,[5] and in a previous coarse-grained thermally triggered folding simulation,[16] the folding trajectories might involve transitions between the distinct states, thus introducing loops in the folding pathways. For simplicity, we removed these loops from the figure, and only the 12 most probable pathways are shown in Figure . From this folding flux diagram, we find that the early stage in the folding reaction for ADK is very plastic while the late stage is more restricted, which reflects the narrowing of the folding free energy landscape to the native state. In addition, there is a pathway that directly connects the globally unfolded state (LL1) to HH2 from which folding to the native state (HH5) occurs sequentially (Figure ). It is likely that the fluxes through the metastable states could be altered by changing the external conditions.[5]

Thermal and Kinetic Networks Are Similar

To illustrate the structural similarities between the 7 thermal substates with significant populations and the 13 substates identified from the kinetic folding trajectories, we computed the average fraction of native contacts formed by every residue, fQ, for the 20 states. For each thermal substate, we searched for the kinetic state that has a high degree of correlation (exceeding 0.9) between thermal and kinetic fQ. For the thermal substates S2 and S9, we could not find suitable matching kinetic states, which means that these two substates are not sampled in the kinetic folding trajectories. The other 5 thermal states correlate with the kinetic states (Figure ). The correlation between S8 and HL3 is very high (R = 0.99, Figure D). We surmise that S8 and HL3 are structurally similar (in short, S8 ∼ HL3). Likewise, we find S11 ∼ HH3 (R = 0.93, Figure E), S1 ∼ LL1 (R = 0.96, Figure A), S7 ∼ HL1 (R = 0.98, Figure B), and S12 ∼ HH5 (R = 0.94, Figure C). The high degree of correlation for the major states during folding shows that in both thermal and kinetic folding, very similar states are sampled. The connectivity between these states, which would define the folding pathways, could vary and may readily be altered by changing the [GdmCl].[5]

Figure 6

Comparison between thermal and kinetic states. (A) Correlation between fQ’s for S1 and LL1. The correlation coefficient R = 0.96. (B) Correlation between fQ’s for S7 and HL1 with R = 0.98. (C) Correlation between fQ’s for S12 and HH5 with R = 0.94. (D) Correlation between fQ’s for S8 and HL3 with R = 0.99. (E) Correlation between fQ’s for S11 and HH3 with R = 0.93.

Discussion

We have shown that the strategy developed here that combines a coarse-grained SOP-SC model for the multidomain protein and a phenomenological theory that takes the effects of denaturants into account (GdmCl in the present work) accounts quantitatively for many aspects of folding of ADK, which is an example of MMP. The key results are as follows: (i) The calculated folding times, corrected for stability at the small value of [GdmCl] (= 0.3 M), are in excellent agreement with experiment. (ii) The order of events, as assessed by the time-dependent changes in the distances between specific residues (Figure C), reproduces the ensemble averaged FRET experiments.[6] In particular, the finding that the intramolecular distance between residues 28 and 71 reaches the value in the native state extremely rapidly before any global structure acquisition accords well with experiments.[6] In addition, we find that the slow steps in the consolidation of the native fold involve reduction in the distances between 36 and 129 (residues in the NMP and LID domains) and between 18 and 203 reaching the values in the folded state. Although both Q18 and A203 are in the core domain, the rate of approaching the Q18–A203 distance in the folded state is slow and occurs only upon global folding, a conclusion that is also in accord with ensemble FRET experiments.

Equilibrium Collapse under the Folding Condition

There is considerable interest in determining the size, , of the structures in the UBA under folding conditions, determined by low denaturant concentration. The current consensus is that the contracts continuously as the [GdmCl] is decreased from a high to low value (see, for example, ref (30)). We calculated the dependence of as a function of [GdmCl] using the ensemble of conformations that belong to only the UBA. The results in the inset of Figure B show that, as expected on general grounds, decreases continuously from ∼57.8 to ∼49.8 Å as the denaturant concentration decreases. This corresponds to a contraction of about 14%, relative to the size at high [GdmCl]. It is worth noting that the percent decrease is in line with that found in other proteins.[30]

Cooperativity

Communication between domains during the assembly of ADK, as the temperature is decreased, is dramatically different from the all-or-none behavior normally observed in ensemble experiments in single domain proteins[31−36] or in the folding of SMPs. We expect, based on the structure of ADK (Figure C), that the NMP and (possibly) the LID domains could order nearly independently in a two-state manner because their sequences are contiguous in sequence and in the arrangement of the secondary structural elements. This expectation is borne out in the plots in Figure A, which shows that the more stable NMP domain melts at the higher temperature . Rao and Gosavi[17] also came to similar conclusions. Similarly, a two-state like, albeit with less cooperativity, is observed in the folding of the less stable LID domain whose melting temperature is . The near independence of their folding is also reflected in the free energy profiles shown in Figure S3 in the SI. Comparison of Figure S3A,D shows that the NMP domain forms independently of the LID domain. At both and , the average value of QNMP exceeds 0.5 while the value of QLID remains small until the temperature is reduced below (Figure A). In other words, the formation of the NMP domain does not induce order in the LID domain. Similar conclusions may be drawn by comparing the temperature-dependent profiles for QNMP and QCORE shown in Figure A and the free energy profiles shown in Figure S3B,E. The NMP and CORE domains assemble independently, which indicates that there is little communication between the two domains. In contrast, from the perspective of both thermodynamics and kinetics, the folding of the LID and CORE domains are intertwined. As the temperature decreases below , the ⟨Q⟩ for the CORE increases, and only when it reaches ∼0.5 is there a sharp increase in ⟨Q⟩ for the LID (see the green curve in Figure A). The results in Figure S3C,F also suggest that at orders in the CORE and the LID domain are coordinated in the sense that a decrease in the free energy associated with the CORE domain also results in an increase in the stability of the LID domain. In other words, folding of the LID domain is slave to acquisition of certain order in the CORE domain. The importance of cooperative interactions between the LID and CORE domain, which is a consequence of contacts between the reentrant secondary structural elements in the folded state (Figure S1), was previously established[17] by comparing the equilibrium energy profiles of ADK and circular permutants. Finally, we note that, even after folding is complete, the LID domain is less stable compared to the other two domains.[23] The decreased stability of the LID domain might be a conduit to facilitate allosteric transitions.[37,38] Based on optical tweezer experiments on T4 lysozyme,[4] which also harbors a reentrant helix like ADK, it has been argued that cooperativity between the distant parts of the chain is needed for stability. It is tempting to speculate that interwoven discontinuous chain topology in MMPs might be an evolutionary consequence not only for stability but also for functional purposes.

Pathways

Both equilibrium, thermal melting and denaturant-induced unfolding show that multiple states are sampled as the transition from the folded state to the unfolded state occurs in ADK. In addition, the refolding kinetics reveals that an intricately connected network of metastable states is involved in the route to the folded state. The fluxes through these states are dramatically different, which suggests that refolding is heterogeneous. The heterogeneity in the folding pathway has been shown in smFRET experiments,[5,10] which established that the complexity of the pathways increases as GdmCl concentration increases. The kinetic simulations further support the conclusion reached in experiments. Because smFRET uses only a one-dimensional coordinate for the structures, the metastable structures could not be determined. Our simulations (Figure ) reveal that the folding pathways are highly complex and heterogeneous and give possibly an even more nuanced picture than that suggested based on analyses of the experimental data.[5,10] We find that, at the late stages of folding (HH3 → HH4 → HH5), the contacts in the NMP and the LID domain are fully formed, but various secondary structural elements in the CORE domain have not fully folded (see the bottom right panel in Figure ). In the HH3 → HH4 transition, the helix given in orange in Figure C becomes ordered, and in the HH4 → HH5 transition, the helix displayed in red in Figure C is acquired, resulting in an increase in QCORE (see the blue line in the sample folding trajectory in Figure ). Although the late stages of folding occur sequentially (HH3 → HH4 → HH5), folding pathways are heterogeneous before HH3 forms. Figure vividly illustrates that there are multiple routes to the formation of HH3. In some of the pathways the LID domain forms first, but in others, the NMP domain forms before the LID domain. The network is multiply connected in the sense HH2 can be accessed through LL1 → HH2 or by the pathway LL1 → LH2 → HH2. Such a complex network of pathways through which the fluxes could dramatically change does not exist in single domain proteins or possibly in SMPs.

Rules for the Folding of Multidomain Proteins

Some general lessons about multidomain proteins emerge from the current work when integrated with previous studies. (1) The domains in the SMPs assemble almost independently with the stability being determined by interface between neighboring constructs. For instance, in the ankyrin repeat proteins (Figure A), folding is triggered by interaction between domains i and i + 1, which propagates until assembly is complete.[39] (2) In other homo-oligomeric SMP complexes, the interactions at the interfaces contribute most to the stability, which implies that the nature of residues at the junction of domains must play a key role. From a kinetic perspective, the orientations of the domains are significant, as shown in the assembly of the allosteric tetrameric protein, l-lactate dehydrogenase.[40] In these cases, the free energy of stability is approximately the sum of the individual domains and an additional gain in the interface formation with correct relative orientation. (3) In contrast, in the MMPs in which the domains are discontinuous from the sequence and SSE perspective, communication between domains is most relevant. That this is the case has been shown in pulling experiments on T4 lysozyme[4] in which a reentrant helix stabilizes interaction between the two domains. Similarly, the C-terminal helix α9, the N-terminal helix α1, and the strand β3 play the analogous role in ADK (Figure S1 in the SI). We speculate that the enhanced stability due to the interwoven contacts might be needed to minimize fluctuations in the apo folded state (see Figure S1 in the pulling experiment[23]) in order to facilitate the closed to open transition in the LID domain, which is required for function. (4) During the refolding kinetics, the continuous domains fold before the discontinuous domains. We do not find that the CORE domain is formed before the other two. The same conclusion was reached in the refolding of the two dihydrofolate reductases[41] in which the continuous adenosine loop domain always folds before the discontinuous loop domain. Despite such a stringent requirement in the order of folding of the domains, there are many metastable states that are visited during the early stages of folding (Figure ), which attests to the plasticity of the folding landscape.[5]

Concluding Remarks

The general conclusion that emerges from this study is that cooperative interactions in multidomain proteins, with discontinuity in sequence and interactions between reentrant secondary structural elements that stabilize the native fold, arise late in the folding process. The early stages of folding and assembly are highly dynamic. A number of predictions, such as the dependence of the melting temperatures on [GdmCl], order of formation of the domains, and interplay between formation of the three domains, could be experimentally tested. The more subtle structural changes might require additional single-molecule experiments along the lines initiated recently.[5] It is likely that similar rules also hold in the self-assembly of ion-driven folding of ribozymes (RNA molecules that function as enzymes), which are composed of many domains.[42,43] In Azoarcus ribozyme, several of the domains could fold independently. However, some parts of the sequences are interwoven, and cooperative interactions between domains that harbor these domains typically occur only above the midpoint of the ion concentration (usually Mg2+). It would be interesting to use single-molecule pulling experiments[4] to dissect interdomain interactions in RNA.

Methods

SOP-SC Model

We carried out simulations using the SOP-SC (self-organized polymer-side chain) model for the protein.[14,44] Each residue is represented by two interaction beads with one located at the Cα position and the other at the center of mass of the side chain. The SOP-SC energy function is The detailed functional forms for VFENE, , VNEI, and and the values of the parameters are described elsewhere.[14]

Molecular Transfer Model (MTM)

Currently the MTM is the only available computational method that accurately predicts the outcomes of experiments and provides the structural basis for folding as a function of denaturants[15,45,46] and pH.[47] In the MTM, whose theoretical basis is provided in a previous study,[44] the effective free energy function for a protein in aqueous denaturant solution is given by In eq , ΔG({r}, [C]) is the free energy of transferring a given protein conformation from water to an aqueous denaturant solution with [C] being the concentration. The sum in the above equation is over all of the beads, and δg(i, [C]) is the transfer free energy of the interaction center i; α is the solvent accessible surface area (SASA) of the interaction center i, and α is the SASA of the interaction center i in the tripeptide Gly–i–Gly. We used the procedure described previously[14,44] to calculate the thermodynamic properties of proteins in the presence of denaturants.

Langevin and Brownian Dynamics Simulations

We assume that the dynamics of the protein is governed by the Langevin equationwhere m is the mass of a bead, ζ is the friction coefficient, Fc = –∂EP({r})/∂r is the conformational force calculated using eq , and Γ is the random force with a white noise spectrum. We performed Langevin simulations using a low friction coefficient ζ = 0.05m/τL.[22] The equations of motions were integrated using the Verlet leapfrog algorithm. To enhance conformational sampling, we used replica-exchange molecular dynamics (REMD).[19−21] In order to simulate the folding kinetics, we set ζ = 50m/τL, which approximately corresponds to the value in water.[25] At the high ζ value, we use the Brownian dynamics algorithm[48] to integrate equations of motion using

Data Analysis

We identify the melting temperature as the peak position in the specific heat as a function of temperature, . The structural overlap function, ,[9] is employed to monitor the folding/unfolding reaction, where In eq , Θ(x) is the Heavyside function. If = 2 Å, there is a contact. N is the number of contacts in the kth conformation, and NT is the total number of contacts in the folded state. The microscopic order parameter of the protein, χ, is used to distinguish between the native, unfolded, and intermediate states.

43 in total

1. Determination of intramolecular distance distribution during protein folding on the millisecond timescale.

Authors: V Ratner; M Sinev; E Haas
Journal: J Mol Biol Date: 2000-06-23 Impact factor: 5.469

2. Watching proteins fold one molecule at a time.

Authors: Elizabeth Rhoades; Eugene Gussakovsky; Gilad Haran
Journal: Proc Natl Acad Sci U S A Date: 2003-02-28 Impact factor: 11.205

3. The nature of folded states of globular proteins.

Authors: J D Honeycutt; D Thirumalai
Journal: Biopolymers Date: 1992-06 Impact factor: 2.505

Review 4. Protein folding studied by single-molecule FRET.

Authors: Benjamin Schuler; William A Eaton
Journal: Curr Opin Struct Biol Date: 2008-01-24 Impact factor: 6.809

5. Early closure of a long loop in the refolding of adenylate kinase: a possible key role of non-local interactions in the initial folding steps.

Authors: Tomer Orevi; Eldad Ben Ishay; Menachem Pirchi; Maik H Jacob; Dan Amir; Elisha Haas
Journal: J Mol Biol Date: 2008-11-05 Impact factor: 5.469