The N-terminal domain of RfaH plays an active role in protein fold-switching.

Pablo Galaz-Davison^1,2, Ernesto A Román^3,4, César A Ramírez-Sarmiento^1,2.

Abstract

The bacterial elongation factor RfaH promotes the expression of virulence factors by specifically binding to RNA polymerases (RNAP) paused at a DNA signal. This behavior is unlike that of its paralog NusG, the major representative of the protein family to which RfaH belongs. Both proteins have an N-terminal domain (NTD) bearing an RNAP binding site, yet NusG C-terminal domain (CTD) is folded as a β-barrel while RfaH CTD is forming an α-hairpin blocking such site. Upon recognition of the specific DNA exposed by RNAP, RfaH is activated via interdomain dissociation and complete CTD structural rearrangement into a β-barrel structurally identical to NusG CTD. Although RfaH transformation has been extensively characterized computationally, little attention has been given to the role of the NTD in the fold-switching process, as its structure remains unchanged. Here, we used Associative Water-mediated Structure and Energy Model (AWSEM) molecular dynamics to characterize the transformation of RfaH, spotlighting the sequence-dependent effects of NTD on CTD fold stabilization. Umbrella sampling simulations guided by native contacts recapitulate the thermodynamic equilibrium experimentally observed for RfaH and its isolated CTD. Temperature refolding simulations of full-length RfaH show a high success towards α-folded CTD, whereas the NTD interferes with βCTD folding, becoming trapped in a β-barrel intermediate. Meanwhile, NusG CTD refolding is unaffected by the presence of RfaH NTD, showing that these NTD-CTD interactions are encoded in RfaH sequence. Altogether, these results suggest that the NTD of RfaH favors the α-folded RfaH by specifically orienting the αCTD upon interdomain binding and by favoring β-barrel rupture into an intermediate from which fold-switching proceeds.

Year: 2021 PMID: 34478435 PMCID: PMC8454952 DOI: 10.1371/journal.pcbi.1008882

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

Introduction

The NusG/Spt5 family of transcription regulators is universally conserved in all three domains of life. E. coli NusG displays two domains in its structure, named N-terminal (NTD) and C-terminal domains (CTD) due to their location in the sequence [1]. The NTD is structurally conserved, folding as an α/β sandwich containing a hydrophobic depression that serves as the binding site for the RNA polymerase (RNAP) [2], whereas the CTD folds as a small β-barrel that recruits the ribosome for coupled transcription-translation as well as other partners that regulate transcription (Fig 1) [3-5].

Fig 1

Schematic representation of the folding states of NusG (top) and RfaH (bottom) upon binding to and release from the transcription elongation complex (TEC). For RfaH a fold-switch is involved in this process; the steps after release from the TEC, corresponding to partial unfolding into a β-intermediate and passage through the unfolded state before refolding into the autoinhibited state, are based on the results presented in this article.

The elongation factor RfaH of E. coli is a clear outlier of the NusG family of transcription factors, having an NTD with the canonical protein family structure but a CTD that is folded as an α-helical hairpin rather than the classical β-barrel [6]. This conformation makes up the autoinhibited state of RfaH, as the α-folded CTD blocks the RNAP binding site located at the NTD and impedes spontaneous binding to the transcription elongation complex (TEC), i.e. the RNA polymerase in complex with DNA and RNA [6]. This autoinhibition is relieved when the transcribing polymerase pauses at a DNA sequence named operon polarity suppressor (ops) [7], whose exposed non-template strand forms a DNA hairpin acting as a recruiting partner for RfaH to the RNA polymerase [8-10], promoting interdomain dissociation and NTD binding to the β and β’ subunits of RNAP [11,12]. Strikingly, the dissociated CTD refolds from the initial α-hairpin to a canonical β-barrel which serves as recruiting partner to the ribosomal protein S10, coupling transcription with translation (Fig 1) [3,11,13].

Numerous studies have addressed the metamorphosis of RfaH through a computational approach, in part due to the difficulties of observing the process in solution, since the trigger for RfaH interdomain dissociation is the entire TEC. There have been reports indicating the possible pathways through which the isolated CTD may refold from the α- to the β-fold [14-17], which differ from the ones proposed when the CTD is accompanied by the NTD [18-20].
These results suggest that interactions formed between both domains strongly aid in stabilizing the α-fold as well as in forming intermediate states that enable the transition between folds [18]. Nevertheless, these studies have focused mostly on the CTD transformation, leaving aside the details of how the NTD stabilizes the α-fold or its effects on the β-folded CTD after release from the TEC. The specifics of NTD-induced energetics on RfaH are not trivial, since the structure of RfaH-NTD [10] displays a more hydrophobic patch than that of NusG [11,21], which has been associated with tighter binding to RNAP; moreover, the RfaH NTD is the only trigger required for fold-switching back from the active into the autoinhibited state [22].

In this work, we relied on the Associative Water-Mediated Structure and Energy Model (AWSEM) to determine the effect that the NTD of RfaH has on the overall transformation energetics and the configurational space of both folds. AWSEM is a transferable force field, coarse-grained to three beads per residue (Cα, Cβ and O), initially used to predict protein structure [23]. As a force field, it has been successfully used to study the NF-κB/IκB/DNA regulatory system [24], nucleosome dynamics and energetics [25] and the energy landscape of aggregation of the amyloid-β protein [26], among others. Unlike common atomistic force fields, its energy potentials and granularity have been developed to efficiently explore protein folding while robustly carrying enough information to represent up to the dihedral behavior of the main chain. This is a significant step up from our previous works on RfaH using a structure-based Cα model [18], as not only are we now reducing the granularity but we are also increasing the roughness of the energy surface by including potentials for hydrogen bonding and solvent exposure propensity of each residue, as well as residue-residue pairwise potentials that consider residue identity [23].
Using umbrella sampling, we determined the change in stability associated to interdomain separation and subsequent fold-switching, recapitulating the experimentally determined equilibrium of the system. That is, RfaH is much more stable in the α-configuration, but the β-folded CTD becomes much more stable in the absence of the NTD. Further temperature refolding simulations in the absence of information of known interdomain contacts showed that the highly hydrophobic side of the α-folded CTD consistently looks for an interaction partner and the NTD provides a suitable surface for its stabilization, recapitulating the binding orientation experimentally observed in solved structures of the autoinhibited state of RfaH. At the same time, the NTD interferes with βCTD refolding by mostly trapping it into a β-barrel intermediate, which is also observed in its metamorphic pathway. Altogether, these results suggest that the NTD favors the CTD transformation towards the α-folded CTD by simultaneously stabilizing the α-hairpin and switching the equilibrium to favor β-barrel rupture into a β-intermediate state that is part of the refolding pathway towards its autoinhibited state.

Methods

Initial structures for molecular dynamics

The structure of the full-length RfaH protein in its α-state (αRfaH hereafter) was extracted from the crystal structure deposited in the Protein Data Bank (PDB) with accession ID 5ond, and so was the α-folded CTD (αCTD hereafter). The isolated β-folded CTD (βCTD hereafter) was extracted from the first NMR solution model of the PDB accession ID 2lcl, whereas the full-length version of the active β-folded RfaH (βRfaH hereafter) was extracted from the cryoEM RfaH:TEC structure with PDB accession ID 6c6s. On the other hand, the isolated CTD of NusG was extracted from the first model of the NMR-determined structure with PDB accession ID 2jvv.

The AWSEM force field

The Associative Water-mediated Structure and Energy Model (AWSEM) [23] is a coarse-grained molecular dynamics (MD) protein folding model implemented in LAMMPS [27]. The granularity and efficiency of this model are achieved by reducing the number of atoms per residue to three beads, the Cα, Cβ and O atoms, with the positions of the remaining backbone atoms being calculated from ideal backbone geometry. This model contains five energy terms, which are extensively described in the work by Davtyan et al. [23] and are briefly summarized below.

The backbone energy term guides the atoms to a protein-like geometry, which is achieved using potentials that ensure atom connectivity, chirality, Ramachandran distribution, and excluded volume interaction. The contact term defines Cβ-Cβ distances and is responsible for the formation of residue-residue interactions in an amino acid-dependent manner. This potential includes pairwise direct contact potentials and many-body water-mediated contact potentials. The burial energy term is a many-body interaction potential that regulates solvent exposure of the protein core, depending on whether a residue has a propensity to be in a low, medium or high-density environment. The hydrogen bonding term replicates the contacts of carbonyl oxygen to amide nitrogen formed in α-helices, parallel β-sheets and anti-parallel β-sheets. This potential includes additive terms for hydrogen bonding and cooperative stabilization terms for β-sheets, which we modified such that sheets of a minimum length of 3 residues can form, as the shortest strands observed in the β-barrels of NusG and RfaH are of this length. Finally, the memory term is a local bias applied to overlapping fragments of 3 to 9 residues that guides Cα and Cβ distances to those of a reference structure, being the only native bias that is used in these simulations. This potential has the form:

$$ V_{FM} = -\lambda \sum_{m} \omega_{m} \sum_{ij} \exp\left[-\frac{\left(r_{ij} - r_{ij}^{m}\right)^{2}}{2\sigma_{ij}^{2}}\right] $$

In this equation, the outer sum is carried out over all the aligned memory fragments, i.e., all short overlapping segments that share high sequence identity to a library of known protein structures, with ωm corresponding to the memory weight. The inner sum is carried out over the Cα and Cβ i,j pairs that are separated by at least 2 residues, with rij being the distance between the atoms, rijm the corresponding distance in the reference fragment, and σij the width of each Gaussian well [23]. Finally, λ corresponds to a scaling factor of the strength of this potential relative to the other terms. This potential can be guided to multiple structures in a simulation or, as used in this work, limited to one or two reference structures [23]. Also, the λ used in this work is 0.3, compared to the default value of 0.2, resulting in a higher cooperativity due to a decrease in the roughness of the final potential.
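As a rough illustration of how a Gaussian-well memory bias of this kind rewards distances close to a reference fragment, the following minimal sketch evaluates such a term for a handful of atom pairs. It is not the AWSEM/LAMMPS implementation: the function name, the input layout, and the choice of well width σij = |i − j|^0.15 are assumptions made only for this example.

```python
import math

def fragment_memory_energy(pairs, memories, weights, lam=0.3, sigma_exponent=0.15):
    """Toy Gaussian-well memory energy (illustrative, not the AWSEM code).

    pairs    : list of (i, j, r_ij) with the current distance for each atom pair
    memories : list of dicts mapping (i, j) -> reference distance in that memory
    weights  : one weight per memory fragment
    lam      : scaling factor; 0.3 is the value quoted in the text
    """
    total = 0.0
    for memory, weight in zip(memories, weights):
        for i, j, r in pairs:
            # Only pairs separated by at least 2 residues contribute.
            if abs(i - j) < 2 or (i, j) not in memory:
                continue
            # ASSUMPTION: well width grows slowly with sequence separation.
            sigma = abs(i - j) ** sigma_exponent
            total += weight * math.exp(-((r - memory[(i, j)]) ** 2) / (2.0 * sigma ** 2))
    return -lam * total

# Two pairs sitting exactly at their reference distances: each well contributes -lam.
reference = {(1, 4): 5.0, (2, 6): 7.0}
native_like = [(1, 4, 5.0), (2, 6, 7.0)]
distorted = [(1, 4, 8.0), (2, 6, 11.0)]
e_native = fragment_memory_energy(native_like, [reference], [1.0])
e_distorted = fragment_memory_energy(distorted, [reference], [1.0])
```

With a dual-memory setup such as the one described for the umbrella sampling runs, `memories` would simply hold two reference dictionaries with equal weights.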

Calculation of Q and umbrella sampling

Normally, MD simulations sample configurations that are very close to the initial structure, hence observing structural transitions such as RfaH fold-switching would be a rare event that would require a very long simulation time. A way to overcome this is by using enhanced sampling strategies, such as umbrella sampling. This technique enables exploring poorly sampled regions of the configurational space by applying an external bias along a reaction coordinate that describes the transition between both RfaH folds. Generally, this external bias corresponds to a harmonic potential that is applied at multiple different reaction coordinate values, such that different simulations thoroughly sample a narrow region of phase space while ensuring potential energy overlap between simulations at adjacent values along the reaction coordinate. The potential energy and reaction coordinate values from multiple independent simulations are then used as input for the Weighted Histogram Analysis Method (WHAM) [28], which returns the unbiased free energy landscape of RfaH fold-switching.

For the umbrella sampling method, 51 simulations of 2.4·10⁷ timesteps or 120 ns each were run, and energy and frames were collected every 1,000 timesteps. The initial configuration was that of the unfolded isolated CTD and a dual memory approach was used, i.e., the fragments were driven to the memory of αCTD and βCTD with equal strength. Similarly, for the full-length protein the initial state was that of the folded NTD plus unfolded CTD. The simulations sampled fractions of an order parameter called Q, defined as in refs. [26,29]. In that definition, N is the sequence length, qA and qB are constants obtained by evaluating the q function in the two structures between which the transition is to be interpolated, and rij measures the Cα distance between residues i and j in the simulation, where superscripts A and B refer to such distance in each reference structure. This is evaluated for all contacts between j>i+2 residues whose Cα atoms are at 9.5 Å or below in at least one of the reference structures. This distance is calculated from a Cα-Cα distance matrix for RfaH (S1 Fig) or the isolated CTD. In the case of the full-length protein, the NTD was excluded from the calculations for the autoinhibited and active RfaH configurations, as it does not experience a conformational change during RfaH activation. Interdomain contacts in the starting structure for βRfaH were also excluded. These exclusions were achieved by increasing the residue-residue distances within the NTD and between the NTD and CTD of βRfaH in the distance matrices to 99 Å. Using the Q value, a bias is applied by adding a new potential to the system with the form:

$$ V_{bias} = \frac{1}{2} k \left(Q - Q_{0}\right)^{2} $$

where k is the harmonic potential constant, here 1,500 kcal·mol⁻¹, and Q0 is the center of the bias, taking values from 0.00 to 1.00 in increments of 0.02. From these simulations the potential energy and Q values were obtained for each frame, as well as the best-fit Cα RMSD against both reference CTD folds, which was calculated using VMD [30]. The simulations exploring the same Q range were run at two temperatures, 650 K and 750 K, and AWSEM temperatures are reported in units of the folding temperature (Tf) of full-length αRfaH (~650 K). Histograms of these quantities show overlap between simulations at adjacent Q values (S2 Fig). The RMSD values against αCTD and βCTD were then used as reaction coordinates for thermodynamic analysis using the WHAM algorithm [31] implemented in Java [32]. For this analysis, the first 4,000 frames or 20 ns were excluded, as this was the equilibration time from the unfolded state to the desired biased configuration.
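The bookkeeping behind this protocol can be checked with a few lines of arithmetic. The sketch below lists the window centers, evaluates a harmonic bias of the kind described, and converts timesteps and frames to simulated time. The function names are illustrative, the 5 fs timestep is the value quoted for the refolding simulations and is assumed here to apply to the umbrella runs as well, and nothing in the sketch reproduces the actual LAMMPS input.

```python
def umbrella_centers(start=0.0, stop=1.0, step=0.02):
    """Centers of the harmonic bias along Q, one per umbrella window."""
    n_windows = int(round((stop - start) / step)) + 1
    return [round(start + k * step, 2) for k in range(n_windows)]

def bias_energy(q, q0, k=1500.0):
    """Harmonic bias 0.5 * k * (Q - Q0)^2, with k in kcal/mol."""
    return 0.5 * k * (q - q0) ** 2

def simulated_time_ns(n_steps, dt_fs=5.0):
    """Convert a number of timesteps to nanoseconds (ASSUMES a 5 fs timestep)."""
    return n_steps * dt_fs * 1e-6

centers = umbrella_centers()
frames_per_window = 24_000_000 // 1000              # one frame every 1,000 timesteps
discarded_ns = simulated_time_ns(4000 * 1000)       # first 4,000 frames dropped
penalty = bias_energy(0.52, 0.50)                   # cost of drifting 0.02 from a center
```

A displacement of one window spacing (0.02 in Q) costs about 0.3 kcal/mol under this bias, which helps explain why neighboring windows overlap in the sampled Q range.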

Refolding simulations

For these simulations, random initial unfolded configurations for each system were generated by running 100,000 timesteps of 5 fs of a simulation without any potential but the backbone energy term, saving a simulation restart configuration every 10,000 timesteps. The restart configuration with the lowest Q value, which in all cases was below 0.1, was used as a starting configuration for the refolding simulations. All 100 simulations were randomly assigned initial velocities and run for 3·10⁷ timesteps of 5 fs, totaling 150 ns each, during which the temperature linearly decreased from 1.5Tf to 0.6Tf, where the temperature is expressed relative to the folding temperature (Tf) of αRfaH (S3 Fig), the predominant state in solution for full-length RfaH. All constructs were completely unfolded at the initial temperature and either completely refolded, trapped in an intermediate state or misfolded at the final temperature. The final structures of these simulations were clustered by calculating pairwise best-fit RMSD [33] using Chimera [34]. For the representative member of each cluster, as well as for non-clustered models, the secondary structure assignment was calculated using STRIDE [35]. These secondary structure assignments are summarized in S1 Table alongside the corresponding Q, a measure of structural similarity to a given reference structure obtained using the formula of ref. [36]. In that formula, as in the order parameter used for umbrella sampling, rij measures the Cα distance between residues i and j in the current and reference (superscript N) structures, provided that the distance in the latter is lower than 9.5 Å, and N stands for the number of residues in the protein.
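To make the idea of a contact-based similarity measure concrete, the following sketch computes a Q-like score from two sets of Cα coordinates. It is only an approximation of the measure described above: the Gaussian form, the well width σij = |i − j|^0.15, and the normalization by the number of reference contacts are assumptions for this example and may differ from the formula in ref. [36].

```python
import math

def q_value(coords, ref_coords, cutoff=9.5, min_sep=3, sigma_exponent=0.15):
    """Fraction-of-native-contacts style score in [0, 1] (illustrative only).

    coords, ref_coords : lists of (x, y, z) Calpha positions in Angstrom
    cutoff             : a pair counts only if its reference distance is below this
    min_sep            : minimum sequence separation (j > i + 2 means min_sep = 3)
    """
    n = len(ref_coords)
    total, n_contacts = 0.0, 0
    for i in range(n):
        for j in range(i + min_sep, n):
            r_ref = math.dist(ref_coords[i], ref_coords[j])
            if r_ref >= cutoff:
                continue
            r = math.dist(coords[i], coords[j])
            # ASSUMPTION: tolerance widens slowly with sequence separation.
            sigma = (j - i) ** sigma_exponent
            total += math.exp(-((r - r_ref) ** 2) / (2.0 * sigma ** 2))
            n_contacts += 1
    # ASSUMPTION: normalize by the number of reference contacts.
    return total / n_contacts if n_contacts else 0.0

# Toy example: a compact six-bead reference versus a fully extended chain.
compact = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (0, 3.8, 3.8), (3.8, 3.8, 3.8)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]
q_native = q_value(compact, compact)
q_unfolded = q_value(extended, compact)
```

In this toy case the compact structure scores 1 against itself, while the extended chain scores close to 0, mirroring the Q < 0.1 criterion used to select unfolded starting configurations.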

Results

MD simulations of RfaH and its isolated CTD recapitulate their experimental states

The simplest question that can be asked to an energy model about RfaH is whether it can replicate the experimentally observed CTD populations of α and β folds. More precisely, the strong predominance of αRfaH for the full-length protein in solution [6,10], and of βCTD when this domain is isolated as the result of the NTD-CTD linker being cleaved or by purifying only this domain in solution [3]. To explore this scenario, we set up umbrella sampling simulations that guide the transformation of RfaH for two systems: one in which we modeled the transition in the context of the full-length protein, that is αRfaH and βRfaH, and another in which only its CTD is modeled transitioning between αCTD and βCTD. Specifically, 51 umbrella simulations were generated for each system at two temperatures, 1.0 and 1.15Tf, where each simulation is energetically biased to explore a fraction of the configurations determined by a reaction coordinate named Q, resulting in a gradual exploration of the configurational space between the α- and β-states of either full-length RfaH or its isolated CTD. This exploration of the transformation was then analyzed using WHAM [31], and the heat capacity was visually inspected (S3 Fig). To evaluate the change in stability between RfaH folds, free energy surfaces were calculated at a temperature just below the first peak in heat capacity for each system to ascertain the preferred folded state (Fig 2A and 2B).
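For readers unfamiliar with how a free energy profile relates to sampled populations, the sketch below builds a one-dimensional profile F(x) = −kT ln P(x) from a list of reaction coordinate values. It is a simplified illustration, not the WHAM procedure used in this work: it assumes the samples are already unbiased or come with per-frame reweighting factors (for example, as produced by WHAM), and it treats the temperature as a value in kelvin with the Boltzmann constant in kcal·mol⁻¹·K⁻¹.

```python
import math
from collections import defaultdict

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def free_energy_profile(values, temperature, bin_width=0.05, weights=None):
    """Return {bin_center: free energy in kcal/mol}, shifted so the minimum is 0.

    values      : reaction coordinate samples (e.g. Q or RMSD per frame)
    weights     : optional per-frame weights; ASSUMED to undo any sampling bias
    temperature : in kelvin (ASSUMPTION: force-field temperature read as K)
    """
    if weights is None:
        weights = [1.0] * len(values)
    histogram = defaultdict(float)
    for value, weight in zip(values, weights):
        histogram[math.floor(value / bin_width)] += weight
    total = sum(histogram.values())
    kt = K_B * temperature
    energies = {b: -kt * math.log(h / total) for b, h in histogram.items() if h > 0}
    lowest = min(energies.values())
    return {round((b + 0.5) * bin_width, 10): e - lowest for b, e in sorted(energies.items())}

# Toy two-state population: 90% of frames in one basin, 10% in the other.
samples = [0.12] * 90 + [0.88] * 10
profile = free_energy_profile(samples, temperature=300.0)
gap = max(profile.values())
```

A 9:1 population ratio corresponds to a gap of kT ln 9 between the two basins, which is the sense in which a dominant free energy minimum indicates the preferred folded state.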

Fig 2

Energetics of RfaH transformation.


(A, B) Free energy surface for the transformation of RfaH CTD in the full-length protein (A) or the isolated domain (B). The RMSD against the experimental αCTD and βCTD were used as reaction coordinates. (C) Free energy surface of the transitions of RfaH CTD in the context of the full-length protein with folded NTD or the isolated CTD, projected onto the transformation reaction coordinates Q and RMSD β-α. Here, βI corresponds to a folding intermediate, and U corresponds to the unfolded state.

The isolated CTD free energy surface displays two minima of similar free energy at low RMSD of βCTD and a higher free energy minimum at low RMSD of αCTD (Fig 2A). This suggests that the isolated CTD (residues 100 to 162) exists predominantly as a β-barrel, and that it needs to cross an energy barrier of over 50 kcal·mol⁻¹ to reach the α-folded state. On the other hand, the energy landscape of the CTD in the context of the full-length protein displays a major free energy minimum that spans between 1 and 4 Å in RMSD to αCTD (Fig 2B), indicating that RfaH exists predominantly in the autoinhibited state. These results are consistent with the experimental evidence for full-length RfaH and the isolated CTD in solution [3].

The fold-switching path explored in our simulations is best observed when projecting the free energy surface onto coordinates that directly measure the structural transition of RfaH CTD, such as Q and the difference in RMSD between βCTD and αCTD. These transitions, shown in Fig 2C, were obtained at temperatures where the peak in heat capacity is observed for each system (S3 Fig). In the case of the isolated CTD, the first peak in heat capacity is observed at 0.95Tf and corresponds to the transition between the folded βCTD and a folding intermediate. Meanwhile, a second peak in heat capacity is observed at 1.15Tf and corresponds to the transition between the β-intermediate and the unfolded state.
In the first of these landscapes, the αCTD minimum is shown as a high and broad free energy minimum, similar to its basin observed in Fig 2B, a characteristic that likely arises from the structuredness of the helices, which has been ascertained in both simulations [18,37] and experiments [38]. By projecting the free energy onto a single coordinate, namely Q, the energy barriers involved in the fold-switching process can be observed more clearly (S4 Fig). The transition between the β-barrel and β-intermediate has an estimated free energy barrier of 6.4 kcal/mol, whereas the transition between the β-intermediate and the unfolded minimum has a free energy barrier of a mere 1.5 kcal/mol. At 1.15Tf only the β-intermediate and unfolded states are observed, while at 0.95Tf the transition to the αCTD is better observed, separated by a free energy barrier of 30 kcal/mol with a transition state sitting at unfolded configurations. In the free energy surfaces for the full-length RfaH protein only one transition is observed at its Tf. By analyzing its free energy barriers, it can be noted that a transition occurs between Q 0.7 and 0.9, with a barrier of 4.4 kcal/mol. Closer inspection of the structural characteristics of this second minimum shows that it has an RMSD of around 2.5 Å to αCTD, indicating that the cooperative decrease in Q is explained by the dissociation and partial rupture of the αCTD. The second energy barrier observed separates the folded state from the unfolded configurations and has a similar energy of 4.6 kcal/mol. The free energy basin for βRfaH is not observed at this temperature. Altogether, these results recapitulate the experimentally predominant folded state for each simulation system in solution, which is separated by a significant energy gap from their alternative native states.
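Our results also show that both folded states of RfaH are connected by the unfolded state as well as by a hypothetical three-strand intermediate observed in the simulations for the isolated CTD, thus suggesting a fold-switching mechanism in which the interconversion between the α-fold and the β-barrel proceeds through the unfolded state and the three-strand β-intermediate.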

The NTD of RfaH strongly stabilizes the α-fold and hinders proper βRfaH refolding

One disadvantage of the umbrella sampling simulations is that they directly employ the number of native contacts of the system in αRfaH and βRfaH as collective variables to drive the structural interconversion of RfaH. It then becomes difficult to calculate the likelihood of other configurations that, albeit having a significant number of native contacts, may also display an important number of non-native contacts that could be relevant for their stabilization. Consequently, one is unable to directly evaluate, for example, how the appropriate binding configuration between the NTD and CTD is guided by sequence features in RfaH. AWSEM allows restricting the use of structural biases to local-in-sequence interactions by using the fragment memory potential, which limits the configurational exploration of short segments of the protein to those of a reference structure [23]. By not providing information about contacts between the NTD and CTD, these simulations freely explore the interdomain interaction landscape. A similar simulation strategy has been previously employed to correctly predict binding interfaces of both homodimers and heterodimers [39].

Using a temperature gradient through long MD simulations (3·10⁷ timesteps of 5 fs, compared to previously reported folding annealing simulations of 4·10⁶ timesteps [23] and 6·10⁶ timesteps [40]), 100 models with fragment memory to a single reference structure were allowed to refold starting from random unfolded conformations (Q < 0.1). In these single-memory models, only the NTD and CTD of RfaH, but not the linker connecting both domains, were given memory, and these memories are drawn from a single reference structure, either αRfaH or βRfaH. This approach leaves the linker that connects both domains with major conformational freedom and results in the C- and N-terminal domains being structurally uncoupled, as the 10-residue-long connector that exists between them is not part of the structural bias and therefore disrupts memory continuity. Therefore, any interdomain interaction formed in these simulations is the result of stabilizing residue-residue contacts encoded by the transferable part of the AWSEM force field, and not due to fragment memory or any other external potentials favoring its exploration. Using this approach, we simulated the refolding of αRfaH and calculated the amount of native tertiary contacts reached at the end of the simulation (Fig 3 and S1 Table).

Fig 3

Refolding efficiency of αRfaH.


(A) Distribution of tertiary contacts (Q) in the final structure of the 100 refolding simulations generated for αRfaH using a single memory. (B-C) Representative final structures after αRfaH refolding with high (B) and low (C) Q respectively. The images are colored in gradient from red (N-terminus) to blue (C-terminus).

Refolding simulations employing αRfaH as the single memory reference structure show that 81% of the trajectories reach the native state (Q = 0.75, Fig 3A). These predicted structures are characterized by the proper orientation and binding of the αCTD against the NTD (Fig 3B), recapitulating the experimentally solved structure of RfaH in its autoinhibited state [6], and are compatible with the observation that the full-length protein successfully refolds to this state on its own [22]. This specificity is achieved despite the lack of structural biases on the interdomain interface and linker regions, and is thus a result of sequence determinants in both the NTD and CTD of RfaH encoding this behavior. In fact, the linker is not stabilized in a particular conformation (Fig 3B) and does not form stable contacts with any domain. In all other trajectories the interdomain interface is formed incorrectly, although both the NTD and αCTD reach their native conformations, mostly due to the fragment memory bias. Observation of the refolding traces (S5 Fig) shows that the αCTD is only stabilized upon or after NTD folding, suggesting that the NTD is responsible for the stabilization and orientation of the αCTD.

To further assess the effect of the NTD hydrophobic patch on CTD folding, the same refolding experiment was performed for βRfaH extracted from the cryo-EM structure. To shed light on the effect that the NTD could have on βCTD refolding, the resulting structures are compared with the equivalent refolding of the solved structure of the isolated βCTD. The results of the βRfaH and βCTD refolding experiments are summarized in Fig 4 and S1 Table.

Fig 4

Refolding of βCTD in the context of the full-length protein and in isolation.


Representative final structures after βCTD refolding in the context of the full-length protein (A) and in isolation (B). The histograms represent the Q distribution of the final structures. The intermediate state is formed by the three largest β-strands that form the CTD barrel, namely strands β2, β3 and β4.

For the isolated domain, the βCTD refolds with an efficiency similar to that of αRfaH (75%), with the remainder of the simulations reaching an intermediate state characterized by a lower Q, in which only the three larger β-strands of the barrel are folded (Fig 4B). In stark contrast, the presence of RfaH NTD reduces the CTD refolding efficiency to only 29%, whereas all other refolding trajectories become trapped in the same β-intermediate observed for the isolated βCTD. These results suggest that the stabilization of this intermediate is a result of specific NTD-CTD interactions established during the folding process of βRfaH.

To determine whether the βCTD intermediate is stabilized by specific interactions between both RfaH domains, a harmonic potential was used to keep the NTD and CTD away from each other during refolding simulations of βRfaH. Upon keeping both domains apart throughout the simulation, the βCTD mostly refolds as if it were isolated, with 66% of cases achieving complete refolding (S1 Table). Also, two additional systems were used for refolding simulations: i) the isolated CTD of NusG, a protein that shares almost identical secondary and tertiary structure but lacks any observable metamorphic feature (S6 Fig and S1 Table), and ii) a chimeric protein connecting the NTD of RfaH with the CTD of NusG (S5 Fig and S1 Table), in which it is expected that no specific interdomain interactions are formed given the divergent evolution of RfaH and NusG [41].
Remarkably, when the isolated CTD of NusG and its fusion to RfaH NTD were used as input for refolding simulations, all of the simulations reached the β-folded state of NusG CTD, regardless of the presence of the NTD (S7 Fig). Although NusG CTD also traverses a three-strand intermediate state during refolding, it does not become trapped in this configuration as the βCTD of RfaH does (S6 Fig). Altogether, these data strongly suggest that an interruption in the β-barrel folding process is caused by specific interactions established between RfaH domains.

To gain insights into which interactions arise between the βCTD of RfaH and its NTD, the majority cluster of the intermediate-folded βRfaH was analyzed in more detail (Fig 5A). A Cα contact map with a threshold of 9.5 Å was calculated for the interaction between the β-intermediate and NTD, as well as between the αCTD and NTD. In this map, three distinct interaction regions between the βCTD intermediate and the NTD were identified (Fig 5B). Among these, one set comprises native contacts found in the α-fold, corresponding to residues that form the helix α2 of αCTD, or the loop between strands β3-β4 in the βCTD. Apart from this, the region comprising strand β1 (residues 114–123) contains most contacts with the NTD, all of which are absent in the autoinhibited state of RfaH.
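An interdomain contact comparison of this kind can be sketched in a few lines. The example below lists Cα pairs between two residue sets that fall within a 9.5 Å threshold and then separates contacts shared by two states from those unique to each. The function names, the input layout, and the toy coordinates are hypothetical; residue numbering and domain boundaries would have to come from the actual structures.

```python
import math

def interdomain_contacts(ca_coords, domain_a, domain_b, cutoff=9.5):
    """Set of (i, j) residue pairs, i in domain_a and j in domain_b, within cutoff.

    ca_coords : dict mapping residue number -> (x, y, z) of its Calpha atom
    """
    contacts = set()
    for i in domain_a:
        for j in domain_b:
            if i in ca_coords and j in ca_coords:
                if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                    contacts.add((i, j))
    return contacts

def compare_interfaces(contacts_first, contacts_second):
    """Split two contact sets into shared and state-specific contacts."""
    return {
        "shared": contacts_first & contacts_second,
        "only_first": contacts_first - contacts_second,
        "only_second": contacts_second - contacts_first,
    }

# Toy coordinates (hypothetical): residues 1-2 stand in for one domain, 10-11 for the other.
state_one = {1: (0, 0, 0), 2: (20, 0, 0), 10: (5, 0, 0), 11: (40, 0, 0)}
state_two = {1: (0, 0, 0), 2: (20, 0, 0), 10: (22, 0, 0), 11: (40, 0, 0)}
first = interdomain_contacts(state_one, [1, 2], [10, 11])
second = interdomain_contacts(state_two, [1, 2], [10, 11])
summary = compare_interfaces(first, second)
```

Contacts landing in the "shared" set play the role of the native-like interface contacts discussed above, while state-specific sets correspond to contacts present in only one of the two interfaces.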

Fig 5

Contact and frustration analysis of the RfaH β-intermediate and its interaction with the NTD.


(A) Superimposed structures of the αCTD (diffuse, red) and β-intermediate (yellow) on the aligned NTD (gray). The three major points of contacts are circled in different colors. (B) Contact map of the interdomain interface observed in αRfaH (blue) and in the β-intermediate (red). The number of highly (red) and minimally frustrated (green) contacts is shown for the CTD in isolation (dashed line) and in the context of full-length RfaH (solid line) for the completely folded CTD (C) or the β-intermediate (D).

To gain further insight into the nature of the interactions established between the CTD and NTD, we calculated the per-residue tertiary contacts that are minimally or highly frustrated using the protein frustratometer [42]. To this end, the representative structure of the most populated cluster of the intermediate-trapped or completely refolded βCTD, both in isolation and in the context of full-length RfaH, was analyzed using the web version of the protein frustratometer (http://frustratometer.qb.fcen.uba.ar) (Fig 5C and 5D). When the CTD is successfully refolded, most of the minimally and highly frustrated contacts in the CTD residues are the same throughout this domain, except for residues 123, 145 and 130. Residues 123 and 145 show an increase of more than 10 minimally frustrated contacts when refolded in the full-length RfaH, whereas residue 130 has more minimally frustrated contacts in the isolated CTD. These sets of residues have been identified as relevant for the stability of the βCTD in previous simulations using dual-basin structure-based models [18] and also for the stability of the autoinhibited state of RfaH in recent NMR experiments on the transformation of RfaH [43].
In contrast, the β-barrel intermediate of the CTD forms more minimally frustrated contacts in the presence of the NTD than in isolation (Fig 5), in particular doubling the number of this type of contact in the region corresponding to strand β1 and the loop preceding strand β2 (residues 114–123). Despite not forming strand β1, this region becomes highly stabilized by bridging interactions between the NTD and the β-barrel intermediate and serves as the interface between the two domains. The non-native, minimally frustrated interactions that stabilize the β-intermediate in the full-length protein are formed against a hydrophobic patch in the NTD, comprising residues 78–82 and 91–93. It is worth noting that these NTD residues are solvent-protected when RfaH is bound to the TEC [11]. This patch is flanked by a charged and a polar residue, namely H77 and Q95, that are in close proximity to two acidic residues of the CTD, E120 and D114. Most of the other CTD residues between these positions are non-polar and form interactions either with the incipient hydrophobic core of the three-strand intermediate or with the hydrophobic patch of the NTD. Of these residues, the only non-polar residue that does not form part of the hydrophobic core in the folded βCTD is I117. We also observe a decrease in minimally frustrated contacts in strands β3 and β4 and the C-terminus of the β-intermediate upon binding to the NTD. Upon careful inspection of the contacts taking place in these regions, we noted that the region corresponding to strand β1 forms a core of contacts with the C-terminus and the three β-strands in the isolated β-intermediate. This core loses intradomain contacts when strand β1 encounters the NTD hydrophobic patch, which is rich in minimally frustrated contacts.

Discussion

E. coli RfaH is known as one of the most dramatic examples of protein fold-switching. In solution, RfaH folds into an autoinhibited state in which the αCTD tightly binds to the NTD. This contrasts with the dynamics of its active state, which is only feasible for the full-length protein in the presence of ops-paused TEC [43], in which case both domains dissociate and fluctuate independently. In contrast, the non-metamorphic E. coli NusG only transiently forms interdomain interactions, always existing in solution as a protein with two independently moving domains [3,5]. Our simulations using the AWSEM MD and force field package correctly model RfaH in all its conformations and recapitulate its thermodynamic behavior in solution, evidenced by the switching of the energetic minimum between αCTD and βCTD when breaking interdomain interactions. This switch has also been observed in previous computational studies of full-length RfaH using various simulation strategies [18,37]. More importantly, our refolding simulations show that the trajectories that successfully reach the β-folded CTD in the context of full-length RfaH are a minority compared to those in which the CTD becomes trapped in a three-strand β-barrel intermediate, and that this refolding is almost three times less successful than refolding of αRfaH. We also demonstrate that a significant number of minimally frustrated NTD-CTD interactions, some of which are also observed in the autoinhibited state of RfaH, interfere with proper β-fold formation by stabilizing its intermediate state. These results suggest that the thermodynamic stability of the autoinhibited state of RfaH is not only due to the compatibility between the αCTD and NTD but also due to a selective stabilization of the β-intermediate by the NTD, which increases the probability of the β-barrel being trapped in a three-strand intermediate.
Moreover, while refolding of the CTD of the non-metamorphic RfaH paralog NusG successfully reaches the β-folded state, the transient observation of a structurally similar intermediate state also suggests that it is the nature of the NTD and CTD sequence of RfaH that drives the interdomain interaction and ultimate trapping into this state. Of importance in the refolding process is the configuration that the interdomain linker may take. As previously reported [44], including in our own research [38], the linker does play a role in interdomain stability by favoring and stabilizing the αCTD in the hairpin conformation. In our simulations the linker was not given a memory potential, so it was not stabilized in any particular conformation other than that which arises from the force field for its sequence. We observed the linker to be flexible, not acquiring any degree of secondary structure during our refolding or umbrella sampling simulations. Based on our results and the literature, we hypothesize that αRfaH stabilization by the linker is due to it acting as an entropic spring, i.e., when both domains are close together the linker can access a larger number of configurations than when the domains are separated. A similar process may be responsible for allowing the interactions between the β-barrel intermediate and the NTD. Multiple reports have studied the metamorphic process of RfaH CTD in the context of the isolated domain [14-17] and the full-length protein [18-20], but only a few have described the β-intermediate observed here during βCTD refolding. One such work is the computational study of the α-to-β transition of the isolated CTD of RfaH through targeted MD and Markov state models using an adaptive seeding method, in which several en-route ensembles collectively suggest that strands β2, β3 and β4 are relatively stable and form earlier during refolding towards the β-state [14].
Additionally, our previous work with full-length RfaH using dual-basin structure-based models also identified a βCTD-like intermediate that is either free or interacting with the NTD, but with a different topology [18]. Lastly, recent unbiased explicit solvent simulations of the spontaneous α-to-β fold-switch of RfaH CTD using a replica exchange with hybrid tempering method exhibit three-stranded and four-stranded intermediates before reaching the β-folded CTD [45]. Nevertheless, none of these works described the active role of the NTD in stabilizing such an intermediate state nor characterized its role as part of the β-barrel folding process. We believe that this three-strand intermediate and its NTD-dependent stabilization have been overlooked due to either the granularity of the model used, the absence of sequence-dependent potentials or the velocity with which the system is driven out of equilibrium. In fact, the sequence-dependent potential embedded in AWSEM shows its capabilities when simulating the correct refolding of αRfaH to a high fraction of native contacts Q even in the absence of knowledge-based contact information on the interdomain interface and the linker connecting both domains, meaning that these simulations are robust enough to discriminate the interactions arising from RfaH sequence in terms of NTD-CTD association. The observation that NusG CTD, unlike RfaH βCTD, is not affected by RfaH NTD in these simulations is confirmation of the latter. These arguments, alongside the observation of this intermediate in both NusG and RfaH βCTD folding pathways, also suggest that this intermediate is likely a topological solution to the small β-barrel folding process, which could also be necessary for the transition between the α- and β-folds of RfaH.
While our previous work using hydrogen-deuterium exchange mass spectrometry shows no apparent differences between NusG CTD and RfaH CTD and no indications of intermediate states under native conditions [38], it is possible that the intermediate state observed here requires the addition of chaotropic agents to favor its abundance. It can be presumed that destabilizing the native state using such approaches would favor not only the intermediate population but also the unfolded state. All in all, our simulations indicate that the NTD actively participates in thermodynamically favoring the autoinhibited α-state by properly orienting the αCTD and correctly specifying the interactions occurring upon interdomain interface formation and by switching the equilibrium from the β-folded CTD into a folding intermediate. Such an intermediate could potentially be observed by studying the equilibrium unfolding of the isolated CTD, as it was observed here during the refolding process of the isolated CTD of RfaH and NusG as well as part of the metamorphic pathway in full-length RfaH. We also hypothesize that stabilization into the β-intermediate by the NTD is the initial step for RfaH to fold-switch back into the autoinhibited state, as the intermediate states observed through umbrella sampling and temperature annealing are structurally the same, i.e., both have three β-strands and share an RMSD value of 2.5 Å (S5 Fig). This idea is compatible with the observation of RfaH stably binding the ribosomal protein S10 through its βCTD when bound to the TEC [43], as in such a state the NTD hydrophobic patch is blocked by RNAP. Therefore, the effect of the NTD on the βCTD can only be observed when the active state of RfaH is released from the TEC, hence the role of the NTD in fold-switching back into the autoinhibited state.

Summary of the refolding experiments and features of final refolded states for all systems in this work.

(XLSX) Click here for additional data file.

Interaction matrices for umbrella sampling using Q.

Cα residue-residue distance matrices for full-length RfaH and its isolated CTD. The matrices grow along the diagonal, which represents the same residue distance, in this case set to 0. Along this diagonal, contacts are formed in a 1–4 residue pattern for α-helices, antiparallel and parallel lines indicating β-strands. The blue blocks indicate regions of high distance (99 Å), which were manually set in order to exclude them from the Q calculation. (TIF) Click here for additional data file.
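How a sentinel distance can exclude residue pairs from a contact-based order parameter is sketched below. This is an illustration under stated assumptions, not the AWSEM implementation: the Gaussian form, the sequence-separation limit (j > i+2) and the width |i-j|^0.15 follow the quantities mentioned in the reviewers' comments on this manuscript, while normalizing by the number of retained pairs and the name `q_value` are assumptions of this sketch. The toy matrices are hypothetical.

```python
import math

EXCLUDED = 99.0  # Å, sentinel distance used to mask pairs, as in the caption


def q_value(dist, ref, min_sep=3, sigma_exp=0.15):
    """Gaussian-weighted similarity of a structure to one reference structure.

    dist and ref are square lists of lists of Cα-Cα distances in Å.
    Pairs whose reference distance is >= EXCLUDED are skipped.
    Normalizing by the number of retained pairs is an assumption of this sketch.
    """
    n = len(ref)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + min_sep, n):
            if ref[i][j] >= EXCLUDED:
                continue  # masked pair, ignored in the sum
            sigma = (j - i) ** sigma_exp
            total += math.exp(-((dist[i][j] - ref[i][j]) ** 2) / (2.0 * sigma ** 2))
            count += 1
    return total / count if count else 0.0


# Toy 5-residue reference (hypothetical): one pair is masked with the sentinel.
n = 5
ref = [[3.8 * abs(i - j) for j in range(n)] for i in range(n)]
ref[0][4] = ref[4][0] = EXCLUDED

# A distorted copy in which one retained pair is pulled far apart.
moved = [row[:] for row in ref]
moved[0][3] = moved[3][0] = ref[0][3] + 100.0

q_native = q_value(ref, ref)    # every retained pair matches the reference
q_moved = q_value(moved, ref)   # one of the two retained pairs is lost
```

With this masking, contacts set to 99 Å never contribute to the sum, so they neither reward nor penalize the sampled structure.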

Histograms of the energy and Q reaction coordinates in umbrella sampling.

In these umbrella sampling simulations, 51 windows spaced in Q steps of 0.02 were run per system per temperature. The histograms marked in red were not used for the WHAM analysis, as the simulation got trapped in a misfolded configuration. RfaH reaches the α-folded autoinhibited state when Q = 1 and the isolated CTD reaches the β-folded state when Q = 1. Although sufficient sampling was not achieved at Q ~ 0.00 for the full-length protein, the β configuration was successfully sampled, as observed in Fig 1B. (TIF) Click here for additional data file.
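The window layout described in this caption can be sketched as follows. The 51 windows and the 0.02 spacing come from the caption; the harmonic form of the restraint follows the description in the reviewers' comments, and the spring constant passed to `harmonic_bias` is a hypothetical placeholder, since its value is not given in this section.

```python
N_WINDOWS = 51   # number of umbrella windows per system, from the caption
Q_STEP = 0.02    # spacing between window centers, from the caption


def window_centers(n=N_WINDOWS, step=Q_STEP):
    """Return the restraint centers Q0 for each window, from 0.00 to 1.00."""
    return [round(k * step, 2) for k in range(n)]


def harmonic_bias(q, q0, k_bias):
    """Harmonic umbrella restraint 0.5 * k * (Q - Q0)^2.

    k_bias is a hypothetical spring constant; the value used by the
    authors is not stated in this section.
    """
    return 0.5 * k_bias * (q - q0) ** 2


centers = window_centers()
```

Adjacent windows only overlap in their Q histograms if the restraint is soft enough relative to the 0.02 spacing, which is why the histograms are inspected before the WHAM analysis.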

Heat capacity of RfaH.

Heat capacity calculated from umbrella simulations on the full-length RfaH and the isolated CTD. The blue arrow indicates the temperature selected for presenting the free energy landscape of the isolated CTD in Fig 2A, and the blue arrow indicates the temperature selected for presenting the free energy landscape of the full-length RfaH in Fig 2B. The values on the left y-axis correspond to RfaH, whereas the values on the right y-axis correspond to the isolated CTD. (TIF) Click here for additional data file.

Free-energy landscapes of RfaH over Q.

The free energy landscapes of isolated CTD (left) or full-length protein (right) were projected onto the Q reaction coordinate alone, which describes the transition between α-folded and β-folded CTD. (TIF) Click here for additional data file.

Representative refolding traces for the two-domain constructs used in the work.

The N-C distance shown in green is a measure of how close or separated the proteins are. At low temperatures they tend to agglutinate as a way to minimize the energy, particularly of the exposed NTD hydrophobic patch, which has many residues whose burial energy remains unsatisfied otherwise. (TIF) Click here for additional data file.

The intermediate of the RfaH CTD is the same for NusG CTD.

(A) Annealing plots of RfaH CTD and NusG CTD. Each point was taken every 2,000 steps of 3·10^7-step trajectories that ramped down from 1.6 Tf to 0.6 Tf. For both RfaH and NusG, an intermediate is observed at 0.4 ≤ QW ≤ 0.6. (B) Comparison of refolding traces and intermediate structures of RfaH CTD and NusG CTD. The folding states of both traces were visually inspected. For each trace, the unfolded state is denoted as U, while the intermediate state is denoted as I and the folded state is denoted as F. (C) Structural alignment via STAMP of the intermediate states observed for RfaH and NusG and the RMSD to the folded state for RfaH and NusG. (D) Structural alignment via STAMP of the β-intermediate state observed for RfaH CTD in umbrella sampling and refolding simulations. (TIF) Click here for additional data file.
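The annealing protocol in panel A can be summarized with a short sketch. It reads the trajectory length as 3·10^7 steps and assumes a linear temperature ramp, which the caption does not state explicitly. The QW window used by `is_intermediate` is the range quoted in the caption; the authors assigned folding states by visual inspection, so this threshold test is only an illustrative stand-in.

```python
TOTAL_STEPS = 30_000_000    # trajectory length, read as 3·10^7 steps
SAVE_EVERY = 2_000          # sampling interval, from the caption
T_START, T_END = 1.6, 0.6   # temperatures in units of Tf, from the caption


def temperature_at(step, total=TOTAL_STEPS, t_start=T_START, t_end=T_END):
    """Temperature (in units of Tf) at a given step, assuming a linear ramp."""
    return t_start + (t_end - t_start) * (step / total)


def is_intermediate(qw, low=0.4, high=0.6):
    """True if QW falls in the range where the caption reports the intermediate."""
    return low <= qw <= high


# Number of points plotted per trajectory under these settings.
n_frames = TOTAL_STEPS // SAVE_EVERY
```

Under these settings each trajectory contributes 15,000 points to the annealing plot, and the halfway point of the ramp sits at 1.1 Tf.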

Refolding of NusG βCTD alone and its fusion to RfaH NTD.

Representative final structures after NusG βCTD refolding in a RfaH NTD–NusG CTD chimera and in isolation. The histograms represent the RMSD distribution of the final structures. All simulations reached the β-folded state of NusG CTD. (TIF) Click here for additional data file.

12 Apr 2021

Dear Dr. Ramirez-Sarmiento,

Thank you very much for submitting your manuscript "The N-terminal domain of RfaH plays an active role in protein fold-switching" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.
Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Anders Wallqvist
Associate Editor
PLOS Computational Biology

Arne Elofsson
Deputy Editor
PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: RfaH is one of the most extreme examples of a fold-switching protein. The two-domain protein reversibly cycles between two states: in the closed state the C-terminal domain (CTD) folds as an α-helical hairpin and binds to the N-terminal domain, rendering RfaH autoinhibited. Upon recruitment to an ops-paused TEC the domains dissociate and the CTD refolds to a NusG-like β-barrel. Although many details of its functional cycle have been deciphered and despite many bioinformatical studies, the molecular basis of CTD refolding is still largely unknown. In this manuscript the authors use Associative Water-mediated Structure and Energy Model (AWSEM) molecular dynamics applying umbrella sampling and temperature refolding simulations in order to study the transformation of the CTD with a particular focus on how the NTD affects CTD folding. First they confirm the experimentally determined folding states of full-length RfaH and the isolated CTD exploring their conformation space, but, interestingly, their analysis reveals a β-intermediate, which consists of three β-strands and which may be a general folding intermediate in the folding pathway of the β-barrel.
The authors show that this intermediate is stabilized by contacts to the NTD and they hypothesize that a destabilization of the β-barrel by the NTD is the first step when RfaH transforms from its activated to the autoinhibited state. Overall the manuscript is sound and reveals new aspects of the transformation of RfaH.

Comments

1. p. 5, l 9: A general explanation of the “umbrella sampling” should be added so that also non-bioinformaticians can read and understand this section.

2. Fig. 1B: labeling of the points at which the energy landscapes are calculated should be colored according to the curves to facilitate understanding; it is also not clear why the letters “C” and “R” are chosen, so that arrows colored according to the curves should be sufficient. Fig. 1C: The energy barrier between αRfaH and βRfaH should be marked.

3. p. 7, l 1-3: Contacts of the flexible linker have been ignored as no defined conformation has been reported, which is reasonable. Nevertheless, the linker might play a role during refolding (either to α or β or both states). This possibility should be included in the discussion.

4. p. 7, l 19-20: The authors should describe the graphs of Fig. S2 and how exactly they relate to the energy landscapes in Fig. 1 (not only technically, i.e. that it’s the 1D projection, but in terms of minima etc). They should also describe how the β-intermediate has been identified/observed (minimum). As this is the first time that the β-intermediate has been identified, the Fig. S2 should be moved to the main manuscript.

5. p. 8, l 24-25: This kind of behavior is expected for a two-domain protein the domains of which behave independently, not for a two-domain protein in general -> rephrase

6. Fig. 2B and C: to clarify the states they should be labeled “correctly folded domains, correct interface” (B) and “correctly folded domains, incorrect interface” (C); the color code should be explained.

7. p. 9, l 10: The refolding simulations of full length RfaH not only recapitulate the structure of RfaH (Ref 8), but also the experimental unfolding/refolding experiments (ref 24).

8. p. 9, l 16-17: It is not clear why solely the NTD-side of the domain interface is responsible for the correct orientation/binding of the CTD; such a conclusion requires a detailed analysis of the contacts and is only valid if specific sidechains of the NTD contact only backbone atoms from the CTD; if also CTD sidechains are involved the specificity relies on both domains.

9. Fig. 3 and S3: The labeling “+NTD” and “-NTD” is misleading; it should be “full length RfaH” and “isolated CTD” or “NTD-CTD” and “CTD”. For better comparability the orientation of the CTD should be the same in A and B. Termini should be labeled. The NTD should have a weaker color (e.g. just grey) in order to emphasize what happens with the CTD.

10. Fig. 4: labeling: it should be “RfaH – minimally frustrated”, “RfaH – highly frustrated” etc. The authors state that the β-intermediate forms more minimally frustrated contacts in the presence of NTD than in the absence. This seems to be true for the overall number of minimally frustrated contacts and for the region preceding strand β2. However, the authors should discuss the fact that there are more minimally frustrated contacts for the isolated CTD than for RfaH especially in strands β3, β4 and the C-terminus.

11. p. 11: Is the three-strand intermediate observed for NusG-CTD the same as for RfaH-βCTD?

12. p. 11: The authors state that NusG CTD also traverses through a three-strand intermediate during refolding. Is this the same intermediate as for RfaH-CTD? Why is this intermediate not shown in Fig. S3?

13. p. 13, l 1: the term “exists in solution as two-domain protein” is strange as it was also a two-domain protein if the domains interacted (tightly) -> needs rephrasing (see #5)

14. p. 13, l 9-10: is the intermediate observed in the analysis of the energy landscapes (Fig. S2) the same as that observed during refolding simulations?

15. p. 13, l 10-14: more details about the involved NTD residues should be provided. It should be discussed if these residues are available when RfaH is bound to the TEC. If yes, these contacts would stabilize the intermediate, preventing the CTD from proceeding to the full beta-state, and the authors should discuss this.

16. p. 13, l 15-16: the authors conclude that the NTD actively destabilizes the β-barrel; this has however not been directly demonstrated. It has only been shown in the refolding simulations that the intermediate is stabilized when switching from random coil to beta. Moreover, if the β-state is trapped in the folding intermediate due to stabilizing contacts, what is the driving force to leave the intermediate towards the beta or the α-state (especially to beta as the energy of the intermediate and the beta state seem similar according to Fig. S2)?

17. p. 15, l 7-8: The authors hypothesize that the βCTD destabilization by the NTD is the initial step for RfaH refolding into the autoinhibited state. It is not obvious why the NTD actively destabilizes the βCTD. Although there seem to be more minimally frustrated contacts for the intermediate than the βCTD, the β-intermediate seems to be comparable to the β-state energetically (Fig. S2).

18. Discussion: the discussion about the importance of the intermediate may benefit from a scheme that illustrates the refolding steps suggested by the authors (including the most important structures, i.e. autoinhibited, unfolded, intermediate, activated).

19. In general all physical and mathematical quantities/variables/constants (such as “T”, “Qdiff”, “j”, “Q0”) in the text and in graphs (e.g. Fig. 1B) must be in italics!

Minor comments:

p. 3, l 4: a reference should be chosen that reviews the modular structure of NusG proteins, e.g. Werner JMB 2012 or Artsimovitch mbio 2019

p. 5, l 5 and 7: only genes are expressed, not proteins

p. 8, l 21: the chronological order is not correct as Fig. 2B is mentioned before 2A.

Table S1: nomenclature of RfaH is not the same as in the text (αRfaH vs. RfaH-α)

p. 11, l 22-26: the sentence is very long and hard to understand and should be rephrased (divided in two or more sentences)

p. 12, l 17: the reference should be #34, not #13

p. 12, l 15-17: The sentence seems not correct

p. 13, l 1: ref 5 is not correct and should be replaced by Burmann et al BiochemJ 2011

p. 16, l 22: “diff” must be subscript

Reviewer #2: This work presents the results of folding simulations of the C-terminal domain of the RfaH transcription factor, either isolated or as part of the full-length protein. This domain is known to undergo a transition from an alpha-helical fold to a beta-barrel fold at different stages of its regulation activity. The work appears to highlight the role of interdomain interactions in RfaH in regulating the transformation from one fold to another. However it seems to me that the delivery of the main message of the work, being buried in several layers of other information, could be improved substantially. I am also concerned with several technical aspects of the work as described below. In my opinion, the work in its present form cannot be recommended for publication.

The objectives of the work are very unclear at a structural level. The key description of the system is given in the second paragraph of the Introduction I think. However, personally, I had a very hard time visualizing the main features of the system. A picture/cartoon/diagram could be very useful here to summarize the system and the process. Perhaps the diagram could also pinpoint the interdomain interactions referenced in the following paragraphs that appear to be the main subject of study.

I found that, generally, the primary objectives and results of the work are hard to grasp when buried in long discussions interspersed with acronyms and jargon.
I recommend that these are condensed in a brief sentence. What does this work add to the extensive previous modeling studies of this system?

The presentation of the Methods after the Results does not help the presentation. The Results refer to quantities (Qdiff, Qw, etc.) defined in the Methods whose significance is unclear to the reader. Sentences such as "RfaH interdomain contacts were kept as bias for simulating the αRfaH state in the full-length system, while all contacts between the βCTD and NTD in the cryo-EM structure were ignored by increasing the distance in the residue-residue distance matrix beyond the threshold from which a contact is considered to take place (9.5 Å)." are very confusing to the reader if terms such as "bias", "distance matrix", "threshold", are not described, at least qualitatively.

The AWSEM model comes a bit out of the blue. It would be useful to discuss early on why it was adopted and why it is believed to be a suitable model for this system.

I could not understand what the "memory" potential is, and what is the significance of "single" or "dual" memories.

Qdiff page 16, should probably be (q-qB)/(qA-qB) so that Qdiff goes from 0 to 1 as the structure goes from B to A as suggested by the language of the paragraph following the third equation on this page. As defined, Qdiff's range is not [0,1].

In the following equation, what is the meaning of rij raised to NA or NB? NA and NB are not defined but the notation suggests that they are the number of residues of the two reference structures? Or are NA and NB simply labeling the two reference structures? If the latter, why not use simply A and B? This equation is also dimensionally ill-defined. The exponents should be dimensionless but the units of distance squared in the numerator do not match the units of sigmaij, which is dimensionless.

Page 17. Here and elsewhere, supporting data in the supplementary information are referenced as results. For example "secondary structure assignments are summarized in Table S1 ..." In my opinion, if they are so important to be referenced in the main text, supporting data should be shown in the main text. Otherwise, references should be omitted and the content of the supplementary information should be summarized elsewhere so as not to confuse the reader of which results are essential to the work and which ones are available for confirmation, if needed.

Reviewer #3: Galaz-Davison et al. describe computer simulations of RfaH, a protein that undergoes so-called fold switching, i.e., a reversible structural transformation from one fold to another. The focus in this work is on the role of the N-terminal domain (NTD) of RfaH for this transition. Fold switching as a phenomenon is becoming increasingly recognized as an important mechanism in both protein function and evolution and RfaH is one of the most well studied fold switching proteins, both experimentally and theoretically. Despite this, there has been very little attention given to the role of the NTD (the part of RfaH left unchanged during fold switching) in previous studies, making the present study timely.

A major strength of this work is the folding “annealing” simulations carried out on RfaH and the related protein NusG. NusG is of interest because it is a homolog of RfaH but does not exhibit fold switching. It is shown that for RfaH the folding of the C terminal domain (CTD) into its beta fold is hampered by energetically favorable interactions with the NTD. However, these interactions are not as strong for NusG. For NusG, folding of the CTD proceeds undisturbed even in the presence of the NTD. This is an important result because it hints at the mechanism of the “reverse” fold switch (i.e. how the betaCTD fold reverts back to its alpha fold, which is necessary to reset RfaH to its autoinhibited state).
Overall, the present study is of significant interest for experimental and theoretical researchers in the field of protein fold switching. The manuscript is generally clear and well written (although see points below), and includes a very nice Discussion section. With the above in mind, I have a few questions and concerns regarding some of the methodology and analysis. These questions should be addressed before the manuscript can be accepted for publication. One of my main concerns is with the umbrella sampling strategy that are used to obtain the equilibrium behavior both the complete RfaH protein and the C terminal domain (CTD) in isolation. The AWSEM model (used for all simualtions) includes a term V_memory, which is based on information from experimental structures of RfaH. However, it is not clear in the manuscript how V_memory is applied. Page 16 states generally that V_memory “can be guided to multiple structures, or as used in this work, limited to a single reference structure” but further down on the same page “dual memories” are mentioned in relation to the choice of initial structures. How V_memory is incorporated into to the umbrella sampling simulations should be clarified. Since V_memory is central to the present study, I suggest also including sore more general information about this term, e.g., functional form, free parameters, to inform the reader. In the umbrella sampling simulations, the AWSEM model is combined with a harmonic potential in the parameter Q_diff. Does this mean that there are two different types of biases in these simulations (V_memory and the umbrella bias)? Are both types of biases needed to capture fold switching? It would be interesting to see some simulation results of AWSEM without the umbrella sampling, as a point of comparison. Typically, umbrella sampling is seen merely a trick to enhance sampling of conformational space and should in principle not influence the resulting (equilibrium) behavior of the underlying model. 
Or should the umbrella weight function be seen here as an additional “effective” bias towards the two folds? It would be useful if this issue could be clarified. I also have some concerns with the analysis of the umbrella sampling simulations. It seems somewhat optimistic that the results from a single temperature (please specify which T in the text) can be accurately reweighted to span the temperature range ~500-800 Kelvin using WHAM, especially if the simululations were carried out below the folding temperature. It is possible that the simulations sufficiently well samples the unfolded state, since they span states in between the alpha and beta folds. However, this can not be assumed a priori. Could a few separate simulations at higher temperatures be carried out to confirm the results from the WHAM analysis? Some clarifications are needed regarding the definition of order parameters Qw, Qdiff and q. i) According to page 17, Qw is calculated by summing over all (N-2)(N-3)/2 residue-residue pairs j>i+2. But on page 6, it is mentioned that beta CTD-NTD contacts are ignored by increasing the distance in the residue-residue distance matrix beyond 9.5 Å. ii) There appears to be a sign error in way Qdiff is defined. Inserting q_A for q leads to Qdiff = 0 and inserting q_B for q gives Qdiff=-1, but the range should presumably be 0 to 1. iii) Is it possible to define q in terms of Qw, thereby simplifying expression for q? iv) The width of the gaussian functions is not constant but taken to be sigma = |i-j|^0.15? Please comment on this choice. Minor points: 1) How was the value of lambda_FM=0.30 determined? Presumably, this parameter controls the strength of the fragment memory term (please clarity). Why does a higher lambda_FM lead to a stronger cooperativity? These points should be clarified. 2) Page 5 line 21. “Just before” should presumably be “just below”? 3) Page 7, line 7. The free energy surface in Fig. 1C is described as having a “single, deep energy minimum”. 
I suggest rephrasing this sentence, as any protein energy landscape will have a multitude of minor minima throughout.
4) Page 16. It would be helpful if the basic idea of umbrella sampling were explained in a few sentences at the beginning of the section “Qdiff and umbrella sampling”.
5) The histograms in Fig. S1 indicate appropriate overlaps between consecutive Qdiff and energy distributions; however, they are hard to read. Perhaps it would be worth trying lines rather than box histograms.
7) Fig. S4. Refolding “curves” is probably not a good description of these scatter plots.
8) It would be nice to see some actual (re)folding trajectories, showing, e.g., Qw or RMSD values as a function of simulation time.
9) Please include figures showing the full residue-residue distance matrix referenced on page 6. Alternatively, if Qw, q, etc., are determined using sets of native contacts, please report all contact maps, the number of contacts in each set, and the criterion for a contact.

Reviewer #4: One of the domains of the protein RfaH folds to a β-rich fold in isolation but switches to an α-rich fold in the presence of its other domain. RfaH has been used as a model protein to understand the molecular basis of fold switching. In this manuscript, the authors simulate RfaH in order to understand the mechanism of fold switching and the effect of the non-switching domain on this mechanism.

Comments:
(1) Since RfaH has been simulated with many models (including the authors’ previous work), it would be useful to have a detailed discussion up front of what each of the models includes or excludes in terms of force field and what is gained from using a specific model. This should help to focus on the benefits of using AWSEM over other models.
(2) The AWSEM force field has several terms and additionally can be used with or without native structure bias.
For readers who have not seen AWSEM before, it would be useful to discuss each of the terms, clearly specifying which of the terms has a native bias in each of the simulations. This is important for fully grasping the results.
(3) It would be useful to have the same ticks and tick labels on Figs. 1C and 1D for ease of visual comparison. Is the third minimum (12,16) in Fig. 1D the unfolded ensemble? What is at (5,5) in 1C and why is it absent in 1D? And where would the beta-barrel intermediate lie on these landscapes? The different basins on this figure should be labelled. The landscapes should be re-labelled free-energy landscapes, and free energy and energy should not be used interchangeably. The temperature units should be explained or reduced units should be used, since the units are far from real temperatures.
(4) The authors state on page 8: “One disadvantage of the umbrella sampling simulations is that, by directly employing the number of native contacts of the system in αRfaH and βRfaH as collective variables to drive the structural interconversion of RfaH, the formation or disruption of interdomain contacts between specific residue pairs is also biased.” Why does unbiasing not work for this? What else is affected by the umbrella sampling which cannot be unbiased?
(5) The authors perform a frustratometer analysis of the intermediate with full-length RfaH. However, they do not indicate whether the extra minimally frustrated contacts that they see are native contacts seen at the αCTD-NTD interface or non-native contacts. Also, if these contacts are native-like, do they involve the hydrophobic patch which is present in the NTD of RfaH and absent in NusG? There should be sequence signatures present in the simulations and the analysis performed here which should allow the authors to predict one or a few mutations which destabilize the fold switching or change its kinetics. The authors should make such predictions.
They should also be able to redo the frustratometer analysis using the intermediate structures but with mutated residues, as a first test for their mutations. Such mutations and tests will add to the strength of this manuscript.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: No: I am not sure. But the authors appear to state that they will make the data available on their website after publication.
Reviewer #3: Yes
Reviewer #4: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: Yes: Stefan Wallin
Reviewer #4: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements.
To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

14 Jun 2021
Submitted filename: Response_Reviewers_rev1.docx

10 Jul 2021

Dear Dr. Ramirez-Sarmiento,

Thank you very much for submitting your manuscript "The N-terminal domain of RfaH plays an active role in protein fold-switching" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, provided that you modify the manuscript according to the review recommendations.
Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Anders Wallqvist
Associate Editor
PLOS Computational Biology

Arne Elofsson
Deputy Editor
PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #2: The revised methods section is substantially improved relative to the original manuscript. However, at least two of the equations (the equations are not numbered in the manuscript) are still problematic. Terms with dimension of distance squared cannot be present as exponents. In my opinion, the manuscript cannot be published unless these are corrected.
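To illustrate the dimensional point raised by Reviewer #2 (the manuscript's own equations are not reproduced in this record), the sketch below evaluates a Gaussian-weighted structural similarity of the Qw type in which the squared distance deviation is divided by a squared width, so that the exponent is dimensionless. The pair count (N-2)(N-3)/2 over pairs j>i+2 and the width sigma = |i-j|^0.15 follow the first-round description by Reviewer #3. The function names, the use of a single coordinate per residue, and the reading of sigma in ångströms are assumptions made for this example and are not taken from the AWSEM code.

```python
import math


def pair_sigma(i, j, exponent=0.15):
    """Gaussian width for residues i and j (assumed to be in angstroms).

    Uses the sequence-separation form quoted in the review,
    sigma = |i - j| ** 0.15. Only meaningful for i != j.
    """
    return abs(i - j) ** exponent


def q_w(coords, native_coords):
    """Similarity to a reference structure, averaged over pairs j > i + 2.

    coords, native_coords: lists of (x, y, z) tuples in angstroms,
    one point per residue (hypothetical input format for this sketch).
    Returns a value in (0, 1]; 1 means all pair distances match.
    """
    n = len(coords)
    if n != len(native_coords):
        raise ValueError("structures must have the same number of residues")
    if n < 4:
        raise ValueError("need at least 4 residues to have a pair with j > i + 2")

    total = 0.0
    for i in range(n):
        for j in range(i + 3, n):  # j > i + 2
            r = math.dist(coords[i], coords[j])
            r_native = math.dist(native_coords[i], native_coords[j])
            sigma = pair_sigma(i, j)
            # (distance^2) / (distance^2): the exponent carries no units.
            total += math.exp(-((r - r_native) ** 2) / (2.0 * sigma ** 2))

    n_pairs = (n - 2) * (n - 3) / 2
    return total / n_pairs
```

Calling `q_w(structure, structure)` returns 1.0 by construction, and any deviation in pair distances lowers the value. Writing the exponent without the division by sigma squared would leave a quantity with units of distance squared inside `exp`, which is the inconsistency the reviewer describes.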
Reviewer #3: The authors have clarified basically all the issues that I raised in my comments, including the functional form of V_memory and the structural parameters (q, Qdiff, etc.). Regarding the WHAM analysis, it would have been nice to see a comparison of separate analyses of the two sets of simulations carried out (the results should in principle agree). However, it appears that the results of the WHAM analysis of the combined simulations as presented in the revised manuscript (Fig. 2) are in qualitative agreement with those of the original submission, indicating agreement. From my perspective, the manuscript can be accepted for publication if the following (minor) points are addressed:
1) The description of the umbrella sampling simulations in Methods should be updated to include information about the two different temperatures used.
2) It would help the reader if Tf is explained in Results upon first usage in this section.
3) Expressing temperatures in units of Tf is probably a good idea. However, I suggest also reporting the nominal value of Tf in AWSEM units, such that the two scales can be linked. This will be helpful in case someone would like to perform similar types of simulations in the future.

Reviewer #4: The manuscript is mostly okay in its current form. I have three minor comments:
1) Is the memory of the entire RfaH-NTD-alpha-CTD structure encoded in any of the simulations (in particular the refolding simulations)? Or are there only memories of the NTD and the two conformations of the CTD separately? Specifically, are the interdomain interactions between the RfaH NTD and RfaH CTD purely due to the transferable parts of the AWSEM potential? This should be clarified in the manuscript. A similar clarification should be given for the hybrid NusG NTD-RfaH-CTD model.
2) To me, this fold switch looks similar to most ligand-induced conformational transitions: the beta-RfaH-CTD is the “open” structure because it has a larger number of stabilizing contacts than alpha-RfaH-CTD. Ligand binding, here binding to the NTD, increases the stabilization of the alpha-RfaH-CTD and this allows the conformational change. If the authors agree, they should put this fold switch in the context of ligand-induced conformational transitions in general.
3) The manuscript needs to be copyedited to remove some non-standard phrasing and sentence construction.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: None

**********

Do you want your identity to be public for this peer review?
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No

References: Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

4 Aug 2021
Submitted filename: Reviewer_Responses_rev2.docx

7 Aug 2021

Dear Dr. Ramirez-Sarmiento,

We are pleased to inform you that your manuscript 'The N-terminal domain of RfaH plays an active role in protein fold-switching' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,

Anders Wallqvist
Associate Editor
PLOS Computational Biology

Arne Elofsson
Deputy Editor
PLOS Computational Biology

***********************************************************

27 Aug 2021

PCOMPBIOL-D-21-00427R2
The N-terminal domain of RfaH plays an active role in protein fold-switching

Dear Dr Ramirez-Sarmiento,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Andrea Szabo
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
