Literature DB >> 24564338

Extensive conformational heterogeneity within protein cores.

Abstract

Basic principles of statistical mechanics require that proteins sample an ensemble of conformations at any nonzero temperature. However, it is still common to treat the crystallographic structure of a protein as the structure of its native state, largely because high-resolution structural characterization of protein flexibility remains a profound challenge. To assess the typical degree of conformational heterogeneity within folded proteins, we construct Markov state models describing the thermodynamics and kinetics of proteins ranging from 72 to 263 residues in length. Each of these models is built from hundreds of microseconds of atomically detailed molecular dynamics simulations. Examination of the side-chain degrees of freedom reveals that almost every residue visits at least two rotameric states over this time frame, with rotamer transition rates spanning a wide range of time scales (from nanoseconds to tens of microseconds). We also report substantial backbone dynamics on time scales longer than are typically addressed by experimental measures of protein flexibility, such as NMR order parameters. Finally, we demonstrate that these extensive rearrangements are consistent with NMR and crystallographic data, which supports the validity of our models. Altogether, these results depict the interior of proteins not as well-ordered solids, as is often imagined, but instead as dense fluids, which undergo substantial structural fluctuations despite their high packing fraction.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2014 PMID： 24564338 PMCID： PMC4065209 DOI： 10.1021/jp4105823

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

It is well established that proteins sample a variety of unfolded and partially folded structures, but there is still substantial uncertainty about the exact nature of the structural fluctuations within folded proteins under native conditions. Attempts to probe this conformational heterogeneity have led to a range of conclusions, including (i) proteins are surface-molten solids,[1] (ii) protein side chains undergo substantial fluctuations in the context of a more rigid backbone,[2−9] (iii) protein backbone motions reorient structural elements and create pockets,[10−13] and (iv) proteins undergo local folding/unfolding transitions.[14−16] Rather than being mutually exclusive, it is likely that all of these forms of heterogeneity are present, albeit on different time scales and to different extents. Quantitatively establishing the magnitude and dynamics of these fluctuations is an important goal for advancing our understanding of protein stability and function. Here, we combine extensive sampling of detailed molecular models with the theory of Markov processes to quantitatively describe proteins’ conformational heterogeneity on time scales across 8 orders of magnitude. Specifically, we use molecular dynamics simulations to generate hundreds of microseconds of time evolution for each of a series of proteins: ubiquitin (72 residues), RNase H (155 residues), and β-lactamase (263 residues, Figure 1). Results of these simulations serve as input for building Markov models, indicating the boundaries between adjacent free energy basins and the rates of hopping between them (see the Methods section for details).[13,17,18] These Markov models provide access to dynamics on tens of microsecond time scales for the largest systems examined and millisecond time scales for the smallest. These time scales are orders of magnitude longer than those accessible by any individual simulation. In addition, our models provide kinetic information that is inaccessible to other enhanced sampling schemes, which typically sacrifice kinetic accuracy by altering simulation parameters—such as the potential energy surface or temperature—to achieve broader conformational sampling. Therefore, we can capture processes beyond the reach of previous works[1,3,8,19,20] and make comparisons with experiments that appropriately account for the time scales a given experimental technique can probe. Our atomically detailed simulations also provide information that is not accessible experimentally, such as structural information that is not easily extracted from NMR order parameters and low population states that are beyond the detection limits of crystallography.

Figure 1

Structures of ubiquitin, RNase H, and β-lactamase in ribbon (top) and surface (bottom) representations. Surface residues are blue, and core residues are yellow.

Structures of ubiquitin, RNase H, and β-lactamase in ribbon (top) and surface (bottom) representations. Surface residues are blue, and core residues are yellow. This unprecedented access to long-time relaxation within folded proteins allows us to carefully address questions about the extent and facility of structural rearrangements: How variable are proteins’ side-chain degrees of freedom? On what time scales do transitions between alternative conformations occur? Are simulated long-time dynamics consistent with existing experimental data? And, how do fluctuations within the backbone and side chains compare?

Results and Discussion

Liquid-Like Behavior of Side Chains within Protein Cores

If protein cores were essentially crystalline, we would not expect them to exhibit significant rearrangements on the sub-millisecond time scales that can be accessed with our models and methods. However, an analysis of the distribution of side-chain structures readily reveals substantial variability throughout the proteins examined in this work. For example, the first dihedral angle of every side chain (called the χ1 angle) in ubiquitin visits at least one alternative rotameric state besides the dominant state seen in the crystallographic structure. We see similar behavior in RNase H and β-lactamase (where 100 and 98% of residues visit alternative χ1 rotameric states, respectively). We find even more variability in the rotameric states of other side-chain dihedral angles, consistent with chemical intuition that the χ1 rotamer should be more difficult to sample due to steric constraints from the backbone and the need to displace a larger proportion of the side chain. Furthermore, we do not observe unfolding within the time scales captured by our models, so these transitions are occurring in the context of a compact structure. To quantify the degree of structural heterogeneity within each protein, we estimate the interbasin entropy of each side chain. Specifically, we assign each dihedral angle to the gauche (+), gauche (−), or trans rotameric states. We then calculate the Shannon entropy (S) of a residue aswhere N is the total number of possible conformations (i.e., the product of the number of alternative rotameric states for each dihedral angle) and P is the probability of conformation i. This entropy will range from zero for side chains that do not visit any alternative rotameric states to 4.4 for a long side chain that spends equal time in every possible combination of rotameric states. Calculating entropies for each residue reveals extensive variability throughout each of the proteins examined, including substantial heterogeneity within their cores (Figure 2). We classified residues as part of the core if their solvent accessible surface area is less than 0.1 nm2 and as part of the surface otherwise (Figure 1). As expected, surface residues have a broad distribution of entropies. The substantial heterogeneity of core residues is more surprising and leads us to conclude that proteins are more liquid-like than crystalline, even within their cores. As in liquids, this variability is made possible in a dense environment by correlated motions spanning large distances.[13] Similar trends have been observed in previous computations and experiments but only for small proteins with low stabilities, such as ubiquitin.[6] The instability of these proteins raises the question of how general such observations are, since more stable proteins could easily have substantially less dynamics. Therefore, our ability to examine larger, more stable proteins and observe similar levels of heterogeneity is an important contribution that complements recent work on this subject from a crystallographic perspective.[9]

Figure 2

Histograms of the entropies of side-chain rotameric states (left) and the time scales for transitioning between rotamers (right) for surface residues (top, blue) and core residues (bottom, yellow). β-lact is β-lactamase. In addition to revealing alternative conformations, our computational models complement experiments by providing insight into the time scales for transitioning between different rotameric states. Such kinetic information is currently inaccessible crystallographically and may be challenging to obtain from NMR, since different techniques are often required to capture different time scales, as discussed later. Our computational analysis shows that transitions within the proteins’ cores are often slower than those on the surface (Figure 2). Furthermore, some dihedral angles exhibit quite slow transitions (on the 10 μs time scale) even on the proteins’ surfaces. By contrast, it is common to assume side-chain dynamics typically occur on ns time scales or not at all. The fact that some of these transition times are approaching the total simulation time also suggests that yet more variability may be present on longer time scales.

Consistency between Simulation and Experiment

An important question at this point is whether the apparent disorder suggested by our computer simulations is in fact real. Or is the structural heterogeneity in our models the result of errors in the force field used to parametrize the interatomic interactions? To address this question, we make a quantitative comparison with one of the most direct measures of protein flexibility, namely, NMR order parameters. NMR order parameters are derived from measurements of the autocorrelation function of a unit vector along a particular bond (relative to the molecule’s reference frame). Specifically, the order parameter () iswhere P2 is the second order Legendre polynomial, μ[t] is a unit vector along a particular bond at time t, and ⟨...⟩ denotes an equilibrium ensemble average.[21] Therefore, one will obtain an NMR order parameter of 1 for a perfectly rigid bond and an order parameter of 0 for a freely rotating bond that loses memory of its past orientation. Backbone order parameters (along the N–H bond) have been measured for many proteins, and order parameters for side-chain methyl groups have been measured for a few proteins. One important consideration in comparing to experiment is the different time scales that different NMR protocols address. For example, the molecular tumbling time sets an upper limit on the time scales accessible to standard relaxation methods. For the systems studied here, the tumbling time is on the order of 4–12 ns.[22−24] Therefore, we take care to apply the same time scale limitations when calculating relaxation-based order parameters from our computational models, as described in the Methods section. More recent residual dipolar coupling (RDC) experiments capture nanosecond to microsecond time scale events, so we use the entirety of our data for calculating RDC-based order parameters from our models. RDC data, however, is currently much less abundant and subject to greater statistical uncertainty (Figure 3). Given these limitations, we focus on capturing general trends revealed by RDC measurements, in particular the observation that RDC-based order parameters are generally lower than those determined from relaxation experiments.

Figure 3

(top) NMR order parameters for ubiquitin’s side-chain methyl groups measured from experiment (filled circles)[40,41] and calculated from our models (open diamonds). This perspective highlights that the calculated values are within the range of the experimental values and that the RDC-based measurements are generally lower than the relaxation-based measurements. (bottom) Calculated versus experimentally measured order parameters. Filled circles are for core residues, and open circles are for surface residues. This perspective better captures the degree of agreement between the two sets of values. We obtain reasonable agreement between calculated and experimental order parameters addressing both of these time scale regimes (Figure 3). For example, the root-mean-square deviation (RMSD) between calculated and measured relaxation-based order parameters for ubiquitin is 0.18 and the RMSD between calculated and measured RDC-based order parameters is 0.29 (Pearson’s R-values of 0.68 and 0.33, respectively). We judge this level of consistency as reasonable considering (i) the simplifying assumptions made in deriving order parameters make exact agreement unlikely, (ii) our errors are random rather than being systematically in one direction, (iii) the large error in experimental RDC-based measurements (as described above), and (iv) we capture the qualitative trend that examining longer time scales reveals more structural heterogeneity. For example, our calculated RDC-based order parameters for ubiquitin are 0.29 less than the relaxation-based order parameters, on average (compared to a difference of 0.30 in experiment). The agreement between computation and experiment is generally greater for core residues, which are also typically less mobile than surface residues. We also predict similar differences between relaxation-based and RDC-based order parameters for the side chains of RNase H and β-lactamase (Figure 4), where the average RDC-based order parameter drops by at least 0.10 and 0.06, respectively, compared to relaxation-based order parameters.

Figure 4

NMR order parameters for RNase H and β-lactamase’s side-chain methyl groups calculated from our models (open diamonds), again showing that the RDC-based measurements are generally lower than the relaxation-based measurements. No experimental data was available for comparison. Our results are also consistent with recent crystallographic studies. For example, Fraser et al. examined 30 proteins with room-temperature crystallography[9] and found that 37.7% of residues populate an alternative χ1 rotamer (i.e., not seen in structures solved at cryogenic temperatures) with a population greater than 20%. We find that 40.1% of residues satisfy this criterion in our sample of three proteins, in reasonable agreement with their findings. Importantly, we also capture alternative rotameric states that are less populated and, therefore, are invisible to existing crystallographic methods.

Prediction of More Extensive Backbone Dynamics on Longer Time Scales

Just as for side chains, examining longer time scales reveals significantly more structural heterogeneity in the backbone than is detectable on shorter time scales (Figures 5 and 6). For example, the average order parameter for ubiquitin drops from 0.90 for calculations mimicking relaxation-based experiments to 0.70 for calculations mimicking RDC-based experiments. We observe similar drops from 0.89 to at most 0.77 for RNase H and from 0.93 to at most 0.86 for β-lactamase. As was observed for side chains, the agreement between computation and experiment for the backbone is generally greater for core residues, which are also typically less mobile than surface residues.

Figure 5

Figure 6

NMR order parameters for ubiquitin, RNase H, and β-lactamase’s backbone N–H bonds from computation and experiment. This is the same data as in Figure 5 but better highlights the degree of agreement between the two sets of parameters. Filled circles are for core residues, and open circles are for surface residues.

NMR order parameters for ubiquitin, RNase H, and β-lactamase’s backbone N–H bonds measured from experiment (filled circles) and calculated from our models (open diamonds). Relaxation-based order parameters for ubiquitin are from Tjandra et al.,[22] and RDC-based parameters are from Lakomek et al.[42] Relaxation-based order parameters for RNase H and β-lactamase are from Kroenke et al.[25] and Savard et al.,[24] respectively. NMR order parameters for ubiquitin, RNase H, and β-lactamase’s backbone N–H bonds from computation and experiment. This is the same data as in Figure 5 but better highlights the degree of agreement between the two sets of parameters. Filled circles are for core residues, and open circles are for surface residues. The validity of our assertions is supported by reasonable agreement between calculated and experimental order parameters for each system (Figures 5 and 6). For example, the RMSD between calculated and experimental backbone order parameters for ubiquitin is 0.08 for relaxation-based order parameters and 0.20 for RDC-based order parameters (Pearson’s R-values of 0.38 and 0.17, respectively). In addition to the factors that limit the agreement between computation and experiment that we described previously, the correlation coefficients for the relaxation-based order parameters are low in part because of the small spread in their values—making the low RMSD a more informative measure of our performance. The experimental order parameters also drop from 0.89 to 0.69, compared to the computational drop from 0.90 to 0.70, showing that we again capture general trends in the data. We also find reasonable agreement between calculated and experimental relaxation-based order parameters for the backbones of RNase H and β-lactamase. The RMSD between calculated and experimental values is 0.08 for RNase H and 0.10 for β-lactamase (Pearson’s R-values of 0.80 and 0.28, respectively). Interestingly, there are two published sets of relaxation-based order parameters for RNase H’s backbone from 1995[23] and 1999[25] and the agreement between computation and experiment is noticeably better for the newer data set (RMSD decreases by 0.02 and Pearson’s R-value increases by 0.08). Therefore, it is possible that the agreement between computation and experiment would increase for other proteins if the experimental measurements were repeated with newer, more powerful NMR machines. To our knowledge, RDC-based order parameters have not been measured for RNase H or β-lactamase, but we predict that they would be lower than the relaxation-based order parameters by at least 0.1, as discussed above.

Conclusions

We have demonstrated that atomically detailed computational models of proteins employing physically realistic force fields generate dynamics whose scope is more consistent with liquid-like behavior than the crystalline character often attributed to folded proteins. Specifically, there is tremendous variability in side-chain conformations, even within protein cores. NMR order parameters calculated from these models are consistent with experimental measurements, supporting the validity of our models. Applying these same principles to investigate backbone dynamics leads to the prediction that experiments addressing longer time scales will reveal more extensive heterogeneity in these degrees of freedom as well. We expect methodological advances that further extend the range of accessible time scales to reveal even larger fluctuations, and to enable quantification of their thermodynamic and kinetic consequences. Accounting for heterogeneity across these scales will be crucial for deepening our understanding of how proteins function and how we can manipulate protein activity.

Methods

Software

Molecular dynamics simulations were run with GROMACS 4.5[26,27] on the Folding@home distributed computing platform.[28] Markov state models (MSMs) were built with MSMBuilder 2.0.[29,30] Structures were drawn with PyMOL 0.99.[31]

Molecular Dynamics Simulations

Simulations of ubiquitin were performed as described previously.[32] Simulations and Markov models of RNase H and β-lactamase were also taken from previous work.[13] A brief review is given below.

Ubiquitin

A total of 1000 simulations were started from PDB 1UBQ,[33] for an aggregate of 2.3 ms of dynamics. Each simulation was run at 300 K using the AMBER ff96 force field[34] with the GBSA solvation model.[35]

RNase H

A total of 1000 simulations were started from 1F21,[36] for an aggregate of 100 μs of dynamics. Each simulation was run at 300 K using the Amber03 force field[37] with explicit TIP3P water and 9 chlorine ions to neutralize the charge. V-sites were used to allow for a 5 fs time step.

β-Lactamase

A total of 1000 simulations were started from PDB 1JWP,[38] for an aggregate of 81 μs of dynamics. Each simulation was run at 300 K using the Amber03 force field[37] with explicit TIP3P water and 7 sodium ions to neutralize the charge. V-sites were used to allow for a 5 fs time step.

Markov State Model Construction

Markov models for RNase H and β-lactamase were taken from previous work.[13] All Markov models were constructed with MSMBuilder.[29,30] Following a standard protocol,[39] every 10th conformation from the simulations for each protein were clustered with a k-centers algorithm based on the RMSD between Cα and Cβ atoms until every cluster had a radius, i.e., maximum distance between any data point in the cluster and the cluster center—less than 1.2 Å. Then, 10 sweeps of a k-medoids update step were used to center the clusters on the densest regions of conformational space. The remaining 90% of the data was then assigned to these clusters and states with only inbound or outbound transitions discarded. On the basis of their implied time scales, a lag time of 10 ns was used for ubiquitin, 20 ns for RNase H, and 2 ns for β-lactamase.

NMR Order Parameters

NMR order parameters () are the plateau value ofwhere P2 is the second order Legendre polynomial (P2(x) = (3/2)x2 – 1/2), μ[t] is a unit vector along a particular bond vector at time t, and ⟨...⟩ denotes an ensemble average. Relaxation-based (or short time scale) order parameters were calculated by directly evaluating eq 1 at the molecular tumbling time. RDC-based (or long time scale) order parameters were calculated as the long-time limit of eq 1, as has been done previously.[20] Specifically,where μ is the x component of the unit vector along the bond of interest and the average is taken across all the conformations sampled (by using one representative structure from each state in the Markov model and weighting its contribution to the ensemble average by its equilibrium population). Backbone order parameters were measured from a unit vector pointing along the N–H bond, and side-chain methyl order parameters were measured from a unit vector pointing from the carbon atom to the center of mass of the three hydrogens. All measurements were taken relative to a common molecular reference frame by aligning each conformation to the protein’s crystallographic structure. Error bars from bootstrapping of our computational models were typically on the order of 0.001 and, therefore, are not shown for visual clarity.

35 in total

1. A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations.

Authors: Yong Duan; Chun Wu; Shibasish Chowdhury; Mathew C Lee; Guoming Xiong; Wei Zhang; Rong Yang; Piotr Cieplak; Ray Luo; Taisung Lee; James Caldwell; Junmei Wang; Peter Kollman
Journal: J Comput Chem Date: 2003-12 Impact factor: 3.376

2. Equilibrium fluctuations of a single folded protein reveal a multitude of potential cryptic allosteric sites.

Authors: Gregory R Bowman; Phillip L Geissler
Journal: Proc Natl Acad Sci U S A Date: 2012-07-02 Impact factor: 11.205

3. Using generalized ensemble simulations and Markov state models to identify conformational states.

Authors: Gregory R Bowman; Xuhui Huang; Vijay S Pande
Journal: Methods Date: 2009-05-04 Impact factor: 3.608

4. Self-consistent residual dipolar coupling based model-free analysis for the robust determination of nanosecond to microsecond protein dynamics.

Authors: Nils-Alexander Lakomek; Korvin F A Walter; Christophe Farès; Oliver F Lange; Bert L de Groot; Helmut Grubmüller; Rafael Brüschweiler; Axel Munk; Stefan Becker; Jens Meiler; Christian Griesinger
Journal: J Biomol NMR Date: 2008-06-04 Impact factor: 2.835

5. Progress and challenges in the automated construction of Markov state models for full protein systems.

Authors: Gregory R Bowman; Kyle A Beauchamp; George Boxer; Vijay S Pande
Journal: J Chem Phys Date: 2009-09-28 Impact factor: 3.488

6. Native proteins are surface-molten solids: application of the Lindemann criterion for the solid versus liquid state.

Authors: Y Zhou; D Vitkup; M Karplus
Journal: J Mol Biol Date: 1999-01-29 Impact factor: 5.469

7. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit.

Authors: Sander Pronk; Szilárd Páll; Roland Schulz; Per Larsson; Pär Bjelkmar; Rossen Apostolov; Michael R Shirts; Jeremy C Smith; Peter M Kasson; David van der Spoel; Berk Hess; Erik Lindahl
Journal: Bioinformatics Date: 2013-02-13 Impact factor: 6.937

8. Divalent metal cofactor binding in the kinetic folding trajectory of Escherichia coli ribonuclease HI.

Authors: E R Goedken; J L Keck; J M Berger; S Marqusee
Journal: Protein Sci Date: 2000-10 Impact factor: 6.725

9. Backbone dynamics of TEM-1 determined by NMR: evidence for a highly ordered protein.

Authors: Pierre-Yves Savard; Stéphane M Gagné
Journal: Biochemistry Date: 2006-09-26 Impact factor: 3.162

Review 10. Hydrogen exchange studies of protein structure.

Authors: T M Raschke; S Marqusee
Journal: Curr Opin Biotechnol Date: 1998-02 Impact factor: 9.740

15 in total

1. Enspara: Modeling molecular ensembles with scalable data structures and parallel computing.

Authors: J R Porter; M I Zimmerman; G R Bowman
Journal: J Chem Phys Date: 2019-01-28 Impact factor: 3.488

Review 2. Advanced Methods for Accessing Protein Shape-Shifting Present New Therapeutic Opportunities.

Authors: Catherine R Knoverek; Gaya K Amarasinghe; Gregory R Bowman
Journal: Trends Biochem Sci Date: 2018-12-14 Impact factor: 13.807

3. Quantifying Allosteric Communication via Both Concerted Structural Changes and Conformational Disorder with CARDS.

Authors: Sukrit Singh; Gregory R Bowman
Journal: J Chem Theory Comput Date: 2017-03-22 Impact factor: 6.006

4. Flexible CDOCKER: Development and application of a pseudo-explicit structure-based docking method within CHARMM.

Authors: Jessica K Gagnon; Sean M Law; Charles L Brooks
Journal: J Comput Chem Date: 2015-12-21 Impact factor: 3.376

5. Capturing Invisible Motions in the Transition from Ground to Rare Excited States of T4 Lysozyme L99A.

Authors: Jamie M Schiffer; Victoria A Feher; Robert D Malmstrom; Roxana Sida; Rommie E Amaro
Journal: Biophys J Date: 2016-10-18 Impact factor: 4.033

6. Flexible CDOCKER: Hybrid Searching Algorithm and Scoring Function with Side Chain Conformational Entropy.

Authors: Yujin Wu; Charles L Brooks
Journal: J Chem Inf Model Date: 2021-10-27 Impact factor: 6.162

Review 7. Markov state models of biomolecular conformational dynamics.

Authors: John D Chodera; Frank Noé
Journal: Curr Opin Struct Biol Date: 2014-05-16 Impact factor: 6.809

Review 8. Novel insights in linking solvent relaxation dynamics and protein conformations utilizing red edge excitation shift approach.

Authors: Rupasree Brahma; H Raghuraman
Journal: Emerg Top Life Sci Date: 2021-05-14

9. Conformational heterogeneity of the calmodulin binding interface.

Authors: Diwakar Shukla; Ariana Peck; Vijay S Pande
Journal: Nat Commun Date: 2016-04-04 Impact factor: 14.919

10. Modeling molecular kinetics with tICA and the kernel trick.

Authors: Christian R Schwantes; Vijay S Pande
Journal: J Chem Theory Comput Date: 2015-02-10 Impact factor: 6.006