Literature DB >> 35245056

Molecular Dynamics-Assisted Optimization of Protein NMR Relaxation Analysis.

Janet S Anderson¹, Griselda Hernández², David M LeMaster².

Abstract

NMR relaxation analysis of the mobile residues in globular proteins is sensitive to the form of the experimentally fitted internal autocorrelation function, which is used to represent that motion. Different order parameter representations can precisely fit the same set of 15N R1, R2, and heteronuclear NOE measurements while yielding significantly divergent predictions of the underlying autocorrelation functions, indicating the insufficiency of these experimental relaxation data for assessing which order parameter representation provides the most physically realistic predictions. Molecular dynamics simulations offer an unparalleled capability for discriminating among different order parameter representations to assess which representation can most accurately model a wide range of physically realistic autocorrelation functions. Six currently utilized AMBER and CHARMM force fields were applied to calculate autocorrelation functions for the backbone H-N bond vectors of ubiquitin as an operational test set. An optimized time constant-constrained triexponential (TCCT) representation was shown to markedly outperform the widely used (Sf2,τs,S2) extended Lipari-Szabo representation and the more closely related (Sf2,SH2, SN2) Larmor frequency-selective representation. Optimization of the TCCT representation at both 600 and 900 MHz 1H converged to the same parameterization. The higher magnetic field yielded systematically larger deviations in the back-prediction of the autocorrelation functions for the mobile amides, indicating little added benefit from multiple field measurements in analyzing amides that lack slower (∼ms) exchange line-broadening effects. Experimental 15N relaxation data efficiently distinguished among the different force fields with regard to their prediction of ubiquitin backbone conformational dynamics in the ps-ns time frame. While the earlier AMBER 99SB and CHARMM27 force fields underestimate the scale of backbone dynamics, which occur in this time frame, AMBER 14SB provided the most consistent predictions for the well-averaged highly mobile C-terminal residues of ubiquitin.

Entities: Chemical

Mesh：

Substances：
Amides
Ubiquitin

Year: 2022 PMID： 35245056 PMCID： PMC9009080 DOI： 10.1021/acs.jctc.1c01165

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Protein backbone dynamics in the ps–ns time frame, as characterized by NMR relaxation measurements, are generally analyzed in terms of the orientational autocorrelation functions for each amide 1H–15N bond vector and the associated approximately colinear 15N chemical shift anisotropy tensor. In the absence of ∼ms dynamics that give rise to resonance line-broadening effects, the R1, R2, and heteronuclear NOE values are specified by the spectral density values at J(ω1H + ω15N), J(ω1H), J(ω1H – ω15N), J(ω15N), and J(0), as determined by the Fourier transform of the corresponding bond vector autocorrelation functions. Back-prediction of the underlying physical autocorrelation functions from the experimental data involves the use of simplified mathematical representations of those autocorrelation functions that can be parameterized in terms of the measured NMR relaxation values. With only three experimental relaxation measurements being typically utilized for each amide 15N site in the standard data set, at most, a three-parameter representation can be meaningfully fitted. In the NMR relaxation analysis of many globular proteins, the assumption of statistically separable internal and global dynamics appears to be robust.[1,2] When combined with either isotropic or modestly anisotropic global tumbling, the complete autocorrelation functions can be represented as the product of the individual internal autocorrelation functions and either a single global exponential or a set of bond-specific exponentials to model the global tumbling.[3,4] The order parameter-based analysis of such NMR data, originally introduced by Lipari and Szabo,[5] utilizes a single exponential representation of the internal autocorrelation function with the long time limit of the residual bond vector orientational order being represented as S2, while the time constant τe is adjusted so as to match the integral under the modeled autocorrelation function with that of the physical curve. The value of that integral corresponds to the conjugate spectral density value J(0), which is predominantly determined by the R2 measurement. To gain additional insight into more mobile segments of the backbone by more realistically modeling of the physical autocorrelation function, the widely used (Sf2,τs,S2) representation introduced a fast limit order parameter S, which reflects the degree of bond vector orientational disorder that occurs much more rapidly than the inverse of the highest Larmor frequency monitored.[6] By construction, the (Sf2,τs,S2) representation cannot accurately model any autocorrelation function that has at least two significant distinguishable exponential components that occur in the time frame of the Larmor frequencies being monitored by the NMR experiment. On the other hand, given a reasonable modeling of the global tumbling behavior, physically realistic values for the R1, R2, and NOE measurements can generally be precisely fitted by adjusting the three parameters of the (Sf2,τs,S2) representation. In the effort to address the challenge posed by potentially generating a precise fit for the experimental relaxation data that corresponds to an autocorrelation function prediction that significantly deviates from the physical curve, NMR spectroscopists have commonly utilized the analysis of relaxation values measured at different magnetic field strengths in hopes of reducing the resulting ambiguities. Unfortunately, as is illustrated by the example of Arg 74 in the mobile C-terminal tail of ubiquitin, this is not a robust approach. When an internal bond vector autocorrelation function derived from CHARMM36 simulations is used to calculate 15N relaxation values for a physically realistic global tumbling time, application of the (Sf2,τs,S2) representation at either 600 or 900 MHz 1H field strengths cannot accurately back-predict the original function (Figure ). Furthermore, not only does the higher field back-prediction deviate more severely from the original MD-derived autocorrelation function but the averaging of the 600 and 900 MHz back-predicted curves also deviates more strongly than does the 600 MHz calculation alone. Furthermore, the two field-dependent back-predictions from the (Sf2,τs,S2) representation agree considerably more closely with each other than either curve agrees with the original autocorrelation function, thus indicating that the level of internal discrepancy between the two magnetic field-dependent estimates is only a weak indicator of their robustness. Similar observations apply for each of the nine residues of ubiquitin, which exhibit enhanced conformational dynamics on the ps–ns time frame.

Figure 1

Internal autocorrelation function for the amide H–N bond vector of Arg 74 in the C-terminal tail of ubiquitin. 15N R1, R2, and NOE values were calculated, utilizing the internal autocorrelation function derived from 2 μs CHARMM36 simulations (green) and an isotropic global tumbling time of 4.10 ns. Those derived relaxation values are precisely fitted at 600 MHz (blue) and 900 MHz (red), utilizing the (Sf2,τs,S2) representation[6] with parameter sets (0.767, 1.252 ns, 0.398) and (0.790, 0.813 ns, 0.423), respectively. In principle, the acquisition of relaxation data sets at multiple magnetic fields can provide the statistical opportunity to utilize order parameter representations with more than three parameters so as to enable a more accurate modeling of the underlying physical autocorrelation functions. However, in this regard, it may be noted that even in the context of a single field relaxation analysis, the most commonly invoked four-parameter unconstrained biexponential (Sf2,τf,Ss2,τs) representation can underperform an alternative three-parameter expression that is based upon a simplified version of the time constant-constrained triexponential (TCCT) representation.[7] These results indicate the potential for further optimization of the form of the model autocorrelation function without expanding the number of adjustable parameters. Molecular dynamics simulations provide the most realistic detailed modeling of such conformational dynamics. This implies that the degree to which a particular order parameter representation can be optimized to accurately fit a wide range of MD-derived internal autocorrelation functions provides the best available measure of its adequacy in representing the physical autocorrelation function. In turn, the degree to which such representations are unable to directly represent the physical internal bond vector autocorrelation function fundamentally limits their ability to indirectly back-predict those internal dynamics from experimental NMR relaxation data.[7,8] In the present study, molecular dynamics simulations applying six currently utilized CHARMM and AMBER-based force fields were carried out on ubiquitin, and the derived internal bond vector autocorrelation functions for nine mobile residues of the protein were used to generate a diverse set of such functional curves. This testbed was then be used to establish an optimized parameterization of the time constant-constrained triexponential representation by the back-calculation of NMR relaxation values that are directly determined from those molecular simulation trajectories and modeled global tumbling dynamics. When this optimized parameterization was then applied to experimental 15N R1, R2, and heteronuclear NOE data from ubiquitin, it was apparent that these various AMBER and CHARMM force fields yielded NMR relaxation predictions of markedly differing quality, thus indicating a considerable potential utility for the use of NMR relaxation data in optimizing the conformational kinetics predicted from molecular dynamics force fields. The order parameter representations considered herein are by no means the only mathematical formulations that have been proposed for representing the internal autocorrelation function for protein dynamics analysis.[9−12] Characterizing the quality of the predictive performance provided by an optimized time constant-constrained triexponential (TCCT) representation should provide a valuable benchmark for analyzing the potential benefits that might arise from alternate model representations of the internal autocorrelation function used in NMR dynamics analyses.

Methods

Molecular Dynamic Simulations

Coordinates from the 1.8 Å resolution X-ray structure (PDB code 1UBQ(13)) were modified by the addition of hydrogen atoms to the protein heavy atoms and crystallographically defined water molecules, and all carboxyl and aliphatic amine groups were modeled in the charged state. The protein coordinates were placed in a rectangular water box with no protein atoms within 10 Å of the boundary. The starting configuration for each trajectory was then generated by an independent placement of 13 sodium ions and 13 chloride ions added to establish electroneutrality and an ionic strength near 150 mM. Particle mesh Ewald summation was applied for long-range electrostatics using a 1 Å grid spacing. Trajectory frames were stored at 5 ps intervals, which was previously reported to yield a satisfactory sampling density for the NMR relaxation calculations.[2] Four 500 ns NVE trajectories were calculated at 298 K using CHARMM22/CMAP (denoted here as CHARMM27)[14,15] and CHARMM36[16,17] force fields, with CHARMM-modified TIP3P water utilizing the NAMD2 program.[18] Analogous simulation conditions were applied to sets of four 500 ns trajectories for the AMBER force fields ff99SB,[19] ff14SB,[20] ff15FB,[21] and ff15ipq,[22] utilizing the AMBER16 program with the exception that the TIP3P-FB water model,[23] rather than the standard TIP3P model, was used for the ff15FB and ff15ipq calculations.

Bond Vector Autocorrelation and Spectral Density Function Analysis

Following superposition of the trajectory frames upon the initial frame of the production run, the H–N bond vector internal autocorrelation functions and spectral density functions were calculated as previously described.[8] The internal autocorrelation functions were calculated out to 50 ns to help ensure adequate decay of the complete autocorrelation functions for calculations utilizing global rotational correlation times of up to 10 ns. In analyzing the effect of anisotropic global tumbling, the trajectory frames were superimposed upon the diffusion tensor that was derived from the experimental NMR relaxation data so as to determine the relative orientation of each H–N bond vector with respect to the principal axes of diffusion. While in all cases, the calculation of NMR relaxation values utilizes the full range of the complete autocorrelation function, in statistical analysis of the quality of fit between the order parameter-derived autocorrelation functions and those obtained from the MD trajectories, the standard deviation was calculated starting from 30 ps in recognition of the fact that conformational dynamics that are substantially faster than the 1H Larmor frequency cannot be adequately characterized by this method.[7] To assess the influence of experimental error, 500 sets of model relaxation data were generated by adding a Gaussian sampling of experimental uncertainties to the observed R1, R2, and NOE values as well as to the bond-specific global correlation times. To define a 95% confidence limit for each residue, the 25 Monte Carlo samplings exhibiting the largest variance in their model R1, R2, NOE, and τM values were removed, and best-fit order parameters were calculated for the other 475 samplings. The corresponding internal autocorrelation functions were then calculated for these 475 samplings, and the extremum values for each time point of the autocorrelation functions were determined.[7]

Expression and Purification of [U–2H,15N] Ubiquitin

An E. coli-optimized codon sequence for ubiquitin (Genscript) was cloned into the expression vector pET11a and transformed into BL21-STAR (DE3) (Invitrogen). Growth was carried out in deuterated minimal media containing 0.25% U–2H glycerol (Cambridge Isotopes, Inc.) at 37 °C until the cells reached an OD600 of 0.6 at which time the flasks were quickly chilled to 25 °C and induced overnight with 0.5 mM IPTG (Sigma Chemicals, Inc.). Following a freeze–thaw and lysozyme treatment in 50 mM Tris–acetate pH 8.0, the lysis supernatant was fractionated with 40 and 95% saturated ammonium sulfate precipitations. The final pellet was resuspended into the Tris–acetate buffer and passed through a Sephadex G-50 column equilibrated with the Tris–acetate buffer. After titrating to pH 5.0, the fractions were loaded onto an SP-Sepharose FF column equilibrated with 30 mM Tris–acetate pH 5.0. A salt gradient was run up to 0.4 M NaCl. The pooled fractions were back-exchanged after adding 6% D2O and sufficient solid sodium carbonate to yield a pH of 10. After incubation for 3 h at 25 °C, the ubiquitin solution was adjusted to pH 5.0 and concentrated by centrifugal ultrafiltration. The sample was then exchanged into 25 mM sodium acetate-d3 buffer (pH 5.00) containing 6% 2H2O and then concentrated to 0.5 mM protein.

NMR Spectroscopy and Analysis

HSQC-based 15N R1, R1ρ, and heteronuclear NOE measurements were carried out on ubiquitin at 25 °C, as described,[24,25] utilizing a 600 MHz Bruker Avance III spectrometer and a 900 MHz Bruker Avance spectrometer, both equipped with TXI gradient cryoprobes. The 15N spin lock field was calibrated for the R1ρ experiments by offset-dependent scalar coupling measurements.[26] The R1ρ experiments were repeated at multiple 15N offset frequencies (four at 600 MHz 1H and five at 900 MHz 1H) to minimize the offset correction required for the corresponding R2 determination, as previously described.[27] A recycle time of 3 s was used for the R1 and R1ρ experiments, and a 12 s presaturation pulse was used for the heteronuclear NOE experiments. The complete set of relaxation experiments was repeated for statistical analysis purposes. The median difference between the repeated experiments was used to define the minimal experimental uncertainty, while the resonances exhibiting larger differentials were assigned the correspondingly larger uncertainty. FELIX software (Felix NMR) was used for NMR data processing. A prolate ellipsoidal diffusion tensor approximation has been found to be appropriate for ubiquitin.[28] Fifty comparatively rigid backbone amides were identified based upon applying an iterative 2.5 σ filter to the individual R1, R2, and heteronuclear NOE values. The program Modelfree (version 4.20) was used to derive the corresponding axially symmetric diffusion tensor, having an average global time of 4.10 ns and an axial ratio of 1.22.[29] While the 900 MHz data indicated close consistency in the diffusion tensor predictions when based upon either the three experimental measurements for each of these 50 amides or utilizing only their R1 and R2 data, the corresponding analysis of the 600 MHz data was further improved by utilizing a systematic increase in the heteronuclear NOE values of 0.01. The positions of the nitrogen atoms for these 50 residues were then used for the superpositioning of the individual trajectory frames. The effective global tumbling time for each N–H vector was then derived by averaging the instantaneous rotational diffusion rates for the angle of the bond vector with respect to the principal axis.[30]

Results and Discussion

Mathematical Characteristics of the (Sf2,τs,S2) Representation

In their original approach to model-free analysis of conformational dynamics using NMR relaxation data, Lipari and Szabo demonstrated that their (S2,τe) representation could provide an exact estimate of the limit order parameter S2 by adjusting the time constant for decay of the internal orientational order so as to match the integrals under the experimental and modeled autocorrelation functions as long as that internal motion occurred at rates substantially faster than the highest Larmor frequency sampled by the NMR experiment.[5] Under such circumstances, the S2 order parameter quantifies the proportion of orientational order that is quenched by the global tumbling process. As noted above, the extended parameterization of the widely used (Sf2,τs,S2) representation can generally serve to exactly fit all three of the standardly measured relaxation values. Operationally, how is that fitting accomplished? As with the original Lipari–Szabo formulation, parameter optimization serves to match the integrals under the experimental and modeled autocorrelation functions, thus principally tying the prediction of J(0) and S2 to the R2 value. In turn, the corresponding matching of the R1 and NOE values must be predominantly accomplished by a trade-off between the selected values for τs and Sf2. Any physical or MD-simulated bond vector autocorrelation function can be expected to have some exponential components that are faster than and others that are slower than the single τs value that is optimized to fit the curve using the (Sf2,τs,S2) representation. Thus, as illustrated in Figure , the CHARMM36-derived curve initially decreases more rapidly than the order parameter-derived curves. It then crosses each order parameter curve and continues on with a less steep slope. Specifically, the autocorrelation function for Arg 74 crosses the 900 MHz order parameter curve at 0.620 ns and crosses the 600 MHz curve at 1.075 ns. The corresponding analysis of the other eight residues that lie within the mobile loop segment of Leu 8 to Thr 12 and the C-terminal tail segment of Leu 73 to Gly 76 demonstrated similar crossover times for both magnetic fields (Table ), despite there being a considerable diversity in the dynamics of their bond vector autocorrelation functions. This pattern of similar magnetic field-dependent crossover times for the MD-derived and the (Sf2,τs,S2) back-predicted autocorrelation functions was found to also hold for the trajectories derived from the other five force fields examined.

Table 1

Crossover in CHARMM36-Derived Internal Autocorrelation Functions

	600 MHz (S²,τ_s,S_f²)			900 MHz (S²,τ_s,S_f²)
residue	crossover (ns)	τ_s	S²	crossover (ns)	τ_s	S²
Leu 8	1.150	1.275	0.743	0.640	0.905	0.753
Thr 9	1.100	1.241	0.598	0.640	0.907	0.611
Gly 10	1.060	1.070	0.562	0.645	0.764	0.570
Lys 11	1.085	1.223	0.611	0.665	0.858	0.620
Thr 12	1.090	1.261	0.774	0.670	0.939	0.779
Leu 73	1.050	1.353	0.520	0.655	0.971	0.537
Arg 74	1.075	1.254	0.398	0.620	0.814	0.423
Gly 75	1.075	1.088	0.268	0.625	0.704	0.294
Gly 76	1.030	0.817	0.151	0.660	0.599	0.163

The positioning of these crossover times can be understood by consideration of the discrete Fourier transform that relates the autocorrelation functions to the corresponding spectral density componentsFor the highest sampled frequencies, the incremental time-dependent contributions will be positive when ωt is less than π/2 and negative when ωt is between π/2 and 3π/2. Drawing upon the results of the reduced spectral density analysis formalism,[31−33] the optimal effective 1H frequency ω1H* is approximately 0.87 × ω1H, which at a field strength of 600 MHz for 1H yields values for π/2 and 3π/2 of 0.48 and 1.44 ns, respectively. At the crossover points between the MD-derived and the (Sf2,τs,S2) back-predicted autocorrelation functions around 1.07 ns, the incremental time-dependent contributions to J(ω1H*) will be near their maximally negative values. In sharp contrast, with a Larmor frequency that is approximately 10-fold lower, the corresponding incremental time-dependent contributions to J(ω15N) will still have a cos(ωt) value near 1. Since for the time points immediately preceding the crossover, C(t) is systematically larger for the (Sf2,τs,S2) back-predicted autocorrelation function than for the MD-derived curve, its incremental time-dependent contribution to J(ω1H*) will be more negative. For time points above the crossover point, those relative contributions are reversed. Due to the positive value of cos(ωt) at the 15N frequency, that pattern of relative contributions is reversed for incremental time-dependent contributions to J(ω15N). As a result, the relative intensities of the J(ω1H*) and J(ω15N) values between the MD-derived analysis and the (Sf2,τs,S2) back-predicted autocorrelation functions can be systematically altered by comparatively modest changes in the crossover point generated by shifts in the Sf2 and τs values of the order parameter analysis. As the J(ω1H*) and J(ω15N) values provide the dominant contributions to the heteronuclear NOE and R1 values, respectively, the MD-generated relaxation values can be accurately fitted by shifts in the Sf2 and τs values without accurately matching the MD-derived autocorrelation function. Indeed, in terms of the conformational dynamics that are represented by the instantaneous slope of the autocorrelation function, the sensitivity of this crossover adjustment process is maximized by likewise maximizing the difference in the slope between the two curves at the crossover point (Figure ). As a result, at the crossover time point for the MD and (Sf2,τs,S2) back-predicted autocorrelation function, the discrepancy in the representation of the internal dynamics by these two curves reaches a maximum. As indicated in Table , the large systematic magnetic field-dependent difference in the crossover point between the original and back-predicted autocorrelation functions is accompanied by a comparably large shift in the predicted time constant for internal motion τs. While the back-predictions of the aggregate order parameters S2 are reasonably consistent for the calculations carried out at 600 and 900 MHz, the τs values that were derived at 900 MHz were approximately 30% smaller than those deduced at 600 MHz for each of the nine mobile residues.

MD Trajectory-Dependent Optimization of the Time Constant-Constrained Triexponential Order Parameter Representation

Another proposed approach to order parameter analysis was based upon the qualitative anticipation that since the NMR relaxation signals are most acutely sensitive to motion occurring at the nuclear Larmor frequencies, setting the time constants for the two exponential terms of the model autocorrelation function to the inverse of those frequencies should enhance the parameter responsiveness to the dynamics being modeled.[34] When this Larmor frequency-selective order parameter approach is applied to both 13C and 15N relaxation analysis of molecular dynamics-derived autocorrelation functions of protein G, for the great majority of the H–N and H–C bond vectors of the protein, this (Sf2,SH2,SX2) representation back-predicted the initial MD-derived autocorrelation functions more accurately than did the widely used (Sf2,τs,S2) representation.[8] The only notable exception to this superior performance was for mobile Cβ positions that were directly bonded to a comparatively immobile Cα so that the effectively single exponential dihedral angle rotational transition dynamics around that Cα–Cβ bond dominated the predicted relaxation behavior. From the perspective of analytical utility, the (Sf2,SH2,SX2) representation offers the benefit that, to within the accuracy of the reduced spectral density approximation,[31−33] there exists an analytical relationship between the order parameters of the (Sf2,SH2,SX2) representation and the corresponding NMR relaxation values for a given global correlation time.[8] The linear functional relationship between the order parameter values, the corresponding spectral density values, and the derived NMR relaxation values ensures a highly regularized one-to-one mapping among these values.[8] Furthermore, for positions exhibiting order parameter values below the rigid limit (Si2 = 1), the root-mean-square deviation of the experimental uncertainties in the NMR relaxation measurements was found to closely match the resultant rmsd on the (Sf2,SH2,SX2) order parameters that were derived from the Monte Carlo analysis of those measurements.[7] Three basic considerations have hindered the broader utilization of this Larmor frequency-selective order parameter approach as originally formulated. First is the question as to whether optimal sensitivity to dynamics is actually achieved by specifying the time constants of the representation in terms of the experimental Larmor frequencies. Some support for that assumption has been obtained by making use of the wider range of ps–ns dynamics that are exhibited by protein side-chain mobility. The 13C relaxation analysis of the GB3 protein indicated that a systematic alteration of either of the two Larmor frequency-based time constants led to less robust modeling of the corresponding MD-derived internal autocorrelation functions.[8] Recently, a more formal approach has been proposed for defining an optimized set of “detectors” for the analysis of NMR relaxation in which the authors concluded that the previously described (Sf2,SH2,SN2) formalism corresponds to a special case version of their analysis approach.[35,36] The second practical impediment was the absence of published analysis of how the Larmor frequency-selective order parameter approach could be effectively applied to multifield NMR relaxation data. As illustrated below, the third limitation arises from the recognition that constraining the lower-frequency time constant to the 13C or 15N Larmor frequency leads to degraded robustness when applied to more slowly tumbling proteins. To address each of these considerations, the present study carried out sets of four 500 ns NVE simulations of ubiquitin using each of six force fields. The CHARMM27 (CHARMM22/CMAP)[14,15] and CHARMM36[16,17] calculations utilized the NAMD2 program.[18] The AMBER16 program[37] was applied to the ff99SB,[19] ff14SB,[20] ff15FB,[21] and ff15ipq[22] force fields. The initial set of studies examined how well the MD-derived internal autocorrelation functions could be back-calculated from NMR relaxation values that were obtained from these internal autocorrelation functions being multiplied by an exponential decay as would be expected for an isotropically tumbling protein. Following superposition on the initial frame of the production run, internal autocorrelation functions were calculated for the backbone H–N bond vectors, as illustrated for the CHARMM36 and AMBER 14SB simulations of C-terminal residues Leu 73 to Gly 76 (Figure A,B) and the other four force fields (Figure S1). For each of the four terminal residues, the AMBER 14SB force field predicted enhanced conformational dynamics, both in amplitude and in apparent rate, relative to CHARMM36. The predictions from the two other recent AMBER force fields more closely resemble those of AMBER 14SB, while in contrast, the earlier CHARMM27 and AMBER 99SB force fields calculated a lower degree of orientational disorder than seen for CHARMM36. A more strikingly disparate pattern of conformational dynamics was calculated for the flexible loop residues Leu 8 to Thr 12 in the CHARMM36 and AMBER 14SB simulations (Figure C,D) and the other four force fields examined (Figure S2). While the issue of completeness of conformational sampling will be considered below, these two sets of residues modeled across the six force fields exhibit a substantial range of calculated internal autocorrelation functions. As such, they should provide a useful test of robustness for the order parameter representations being examined.

Figure 2

Internal bond vector autocorrelation functions for the C-terminal tail (Leu 73 to Gly 76) and mobile internal loop (Leu 8 to Thr 12) of ubiquitin as calculated from 2 μs of simulation with CHARMM36 (panels A and C) and AMBER 14SB (panels B and D). As originally proposed, the Larmor frequency-selective representation of the internal autocorrelation function iswhere τh was set to the inverse of 0.87 × ω1H and τx was set to the inverse of ω15N. To illustrate the robustness of this representation as a function of the assumed global tumbling time τM, the internal autocorrelation functions derived from the CHARMM36 simulations for ubiquitin were used to calculate the corresponding R1, R2, and heteronuclear NOE values at 600 MHz 1H for τM values from 3 to 10 ns. These calculated NMR relaxation values were then used to back-predict the original internal autocorrelation functions. For tumbling times near the experimental value of 4.10 ns at 25 °C, the rmsd fits to the MD-derived internal autocorrelation functions exhibited rmsd fits near 0.01 or better, well within the level of experimental accuracy typically expected for protein NMR relaxation measurements (Figure A). On the other hand, as the global tumbling time increased, the quality of these predictions deteriorated significantly, although they still remained superior to the corresponding predictions obtained from the (Sf2,τs,S2) representation (Figure B). These results clearly demonstrate that if the original Larmor frequency-selective representation is to be usefully generalized, the τx parameter of eq must become a function of the experimentally accessible global tumbling time. Encouragingly, the increasingly large discrepancies for the Larmor frequency-based representation at longer tumbling times progress in a near-linear fashion, thus raising the prospect that a similar linear adjustment to the τx parameter might prove useful. In the subsequent analysis, the τx parameter was optimized in the form of a + b × τM.

Figure 3

Standard deviations for the back-prediction of the six force field-derived internal autocorrelation functions at 600 MHz 1H for the nine mobile residues of ubiquitin utilizing the (Sf2,SH2,SN2) and (Sf2,S2,τs) representations (panels A and B, respectively), as averaged over the time interval between 30 ps and the assumed global tumbling time τM. Given the considerable emphasis that has been placed upon the use of magnetic field dependence measurements in such protein NMR relaxation studies, it is warranted to clarify what practical benefits are to be anticipated from such measurements. Field-dependent measurements are clearly central to the characterization of NMR exchange dynamics that occur within the ∼ms line-broadening regime. On the other hand, for residues of a protein that are known not to exhibit significant exchange line-broadening, if the relaxation analysis at one field can accurately back-predict the underlying internal autocorrelation function, the corresponding relaxation values at any other magnetic field strength are directly predictable. While close agreement in predictions from different field strengths is clearly necessary for an accurate representation of the underlying internal autocorrelation function, as the (Sf2,τs,S2) representation analysis of Arg 74 demonstrates (Figure ), it is not a sufficient condition. This discrepancy regarding the lack of correlation between the magnitude of difference between field-dependent back-predictions and the magnitude of their differences from the underlying autocorrelation function demonstrates the importance of establishing an independent criterion for assessing the robustness with which a given model representation can accurately fit the underlying autocorrelation function. While the original Larmor frequency-selective order parameter representation would imply the use of magnetic field-dependent time constants when analyzing NMR relaxation data collected at multiple field strengths, the underlying physical internal autocorrelation function is independent of the magnetic field being used to study it. Furthermore, the more precisely a given order parameter representation can provide a direct fit to a given autocorrelation function, the better that same representation can be expected to provide a precise back-prediction of that internal autocorrelation function from the relaxation values, which have been derived from that autocorrelation function.[8] However, as illustrated in the discussion of magnetic field-dependent analyses with the (Sf2,τs,S2) representation given above, the passage of spectral density values through eq in carrying out the back-prediction of the autocorrelation function introduces a magnetic field-dependent weighting in the sampling of the internal autocorrelation function. To assess the potential benefits of utilizing magnetic field-dependent time constants in an optimized time constant-constrained triexponential representation, initial independent grid searches were carried out at magnetic field strengths corresponding to 600 and 900 MHz 1H. An additional consideration is whether the fast limit order parameter Sf2 should be assigned a nonzero time constant. Our previous studies have indicated that, with the use of a fast limit order parameter, robust fits to MD-derived internal autocorrelation functions are generally only obtained for times of 30 ps and longer.[7,8] This earliest time point corresponds to an approximately 10-fold higher frequency than what is directly sampled in the experimental NMR experiment. As preliminary calculations indicated that comparatively little benefit is to be gained from the use of a third nonzero time constant, our analysis turned to the optimization of the τh and τx (= a + b × τM) factors in eq . The MD-derived internal autocorrelation functions for the nine mobile residues using each of the six force fields were used for back-prediction from the NMR relaxation values obtained from assuming global tumbling times spanning from 3 to 10 ns, assuming magnetic fields of 600 and 900 MHz 1H. For the resultant 864 back-predictions of the MD-derived internal autocorrelation functions, the rmsd values between the back-predicted and original MD-derived curves were determined over the interval between 30 ps and the assumed global correlation time. The practical utility of that choice for the upper time limit is addressed in the later discussion of experimental uncertainty for bond vector-selective global tumbling times in anisotropic proteins. Following initially coarser grid searches over the three parameters used to specify τh and τx, the optimization searches for calculations at 600 and 900 MHz were carried out on 3 × 3 cubes in which the values for τh incremented in intervals of 0.01 ns, while the a and b components of τx were incremented by 0.1 ns and 0.01, respectively. Strikingly, at both magnetic fields, the aggregate rmsd value for the 432 fits to each of the MD-derived internal autocorrelation functions for the nine individual mobile residues at each τM value reached a minimum at τh = 0.34 ns and τx = 2.00 + 0.43 × τM (Figure ). As reflected in the expanded ordinate scale, the back-predictions for the nine mobile residues at 600 MHz with the optimized TCCT representation (Figure A) are 3- to 4-fold more accurate than those provided by the (Sf2,τs,S2) representation (Figure B). The maximal deviation for the back-predictions among all nine mobile residues for all force fields also reached a minimum near these same time constant values, with that value being consistently near 40% above the largest aggregate rmsd value observed at each τM value. To further test the insensitivity of these calculations to the magnetic field strength, analogous calculations that were carried out at 750 MHz for the local neighborhood of these optimized values of τh and τx yielded a similar quality minimum with aggregate rmsd values and maximal deviations for the 432 fits to each of the six MD-derived internal autocorrelation functions that were intermediate between those obtained at 600 and 900 MHz.

Figure 4

Standard deviations for the back-prediction of the six force field-derived internal autocorrelation functions at 600 MHz (panel A) and 900 MHz (panel B) for the nine mobile ubiquitin residues, utilizing the optimized time constant-constrained triexponential (TCCT) representation (Sf2,Sh2, Sx2) as averaged over the time interval between 30 ps and the assumed global tumbling time τM. It may be noted that the optimum value of 0.34 ns obtained for τh is approximately 10% above the τH* value obtained from the reduced spectral density analysis at 600 MHz 1H. In addition, when τM is set to 3.4 ns, the optimized value for τx is equal to the τ15N value of 2.62 ns at 600 MHz 1H. As that τM value equals the global tumbling time of the previously cited GB3 protein, the Larmor frequency-selective relaxation analysis earlier applied to that protein closely matches the optimal parameterization deduced from the present study.[8] In the present study, the lower-case subscript τh has been applied to indicate its approximate correspondence to the inverse 1H Larmor frequency. This particular set of time constant parameters has been determined with respect to nine mobile residues of one protein as assessed by six different force fields. The expectation of a broader utility for this parameterization is indicated by the fact that the analogous calculations over the 3 × 3 grid in which each of the three time parameters was varied by ±10% did not yield any aggregate rmsd values that exceeded the corresponding values of Figure by more than 0.001 for both 600 and 900 MHz calculations. The comparatively shallow minimum suggests this parameterization to be near optimal for a wider range of conformational dynamics. In considering the different sampling patterns generated by the two magnetic field strengths, particularly over the early time points of the internal autocorrelation functions, the derived Sf2 values were compared for the optimized TCCT representation at 600 and 900 MHz. For nearly 80% of the total 3456 optimizations, the Sf2 values obtained at the two field strengths agreed with each other to within ±0.003. Regarding the four more recent force fields considered, for the remainder of the differential Sf2 values, the 900 MHz-derived values for Sf2 were systematically larger. In particular, among the C-terminal four residues, the 900 MHz-derived Sf2 values exceeded those for 600 MHz analysis by an average of nearly 0.03. For these more mobile residues, the back-predicted autocorrelation functions from the 900 MHz analysis more strongly diverged from the original MD-derived curves, particularly at slower tumbling times (Figure ). Particularly for these more mobile residues, a simple averaging of the order parameters obtained from the two magnetic fields was found to systematically degrade the quality of the predictions relative to the values obtained from the 600 MHz analysis alone. In applying these optimized τh and τx time constant values to the full set of 72 backbone amide bond vectors, in nearly all instances, the rmsd fits to the individual sets of three relaxation values during the back-prediction of the internal autocorrelation functions for all six force fields were within 0.0008. Given the cubic spacing of 0.001 for the Sf2, Sh2, and Sx2 parameter grid searches, √3/2 times that value corresponds to the maximum anticipated statistical error. As noted above, this close correspondence in the magnitude of the “noise” levels for both the order parameters and relaxation rates reflects the direct linear relationship that exists between the spectral density values, the relaxation rates, and the Sf2, Sh2, and Sx2 order parameters. The only notable exception to these effectively “perfect” fits was observed for the highly mobile Gly 76, particularly for longer τM values, where for a subset of the force fields, the Sf2, Sh2, and Sx2 parameterization were unable to fully match the derived relaxation values. A comparison was made between the accuracy of the back-predicted internal autocorrelation functions obtained with the (Sf2,τs,S2) and optimized TCCT representations applied to the 63 less mobile backbone amide bond vectors in ubiquitin, as assessed at 600 MHz 1H. While the magnitude of those deviations was reduced, relative to the nine highly mobile residues, the median ratio of the rmsd values for the (Sf2,τs,S2) representation relative to those obtained for the optimized TCCT representation was 3.8 for the six force fields and eight τM values, quite similar to the results obtained for the more mobile residues (Figures B and 4A). To assess the practical significance of these discrepancies in the back-predictions, the fraction of discrepancies in excess of 0.002 and 0.005 was analyzed for each of the six force fields where 0.002 provides an approximate upper bound for the effective statistical noise level and 0.005 serves as a lower-bound estimate for when such a level of discrepancy might begin to perturb the analysis of experimental data. In all cases, the TCCT representation markedly outperforms the (Sf2,τs,S2) representation (Table ). The AMBER 14SB force field exhibits substantially more accurate back-predictions for both representations than does the 99SB force field from which it was derived. In the other instances, the change in performance between earlier and later versions of the two families of force fields is considerably more modest. These results further reinforce the previous conclusion drawn from the 13C relaxation dynamics analysis of the GB3 protein that the comparative benefit of the (Sf2,τs,S2) representation is largely limited to nearly pure single mode dynamical processes such as those exhibited by a rotatable Cβ side chain that is attached to a comparatively rigid Cα position in the protein backbone.[7] For the more complex dynamical behavior exhibited by the protein backbone, the use of the more robust TCCT representation is clearly justified.

Table 2

Autocorrelation Function Back-Calculation for Less Mobile Residues

	TCCT		StS
	% > 0.002a	% > 0.005	% > 0.002	% > 0.005
CHARMM27	3.8	1.4	30.0	4.4
CHARMM36	4.4	0.6	32.7	8.3
AMBER 99SB	11.7	2.8	74.4	30.2
AMBER 14SB	1.4	0.8	41.7	5.2
AMBER 15FB	13.1	0.6	76.6	31.7
AMBER 15ipq	8.5	1.6	67.3	21.2

rmsd between the original and back-predicted MD-derived curves.

rmsd between the original and back-predicted MD-derived curves. Having optimized the values of τh and τx under the assumption that τf is set to zero, the impact of setting τf to a nonzero value was examined. From a functional perspective, this implies that the internal autocorrelation function is set to 1.0 at t = 0 and then decreases to Sf2 with an exponential time constant of τf. In the simpler analysis considered up to this point, that decrease of the internal autocorrelation function to the value of Sf2 is assumed to be arbitrarily fast, relative to the NMR Larmor frequencies sampled. In carrying out a (τf,τh) grid search analysis for τh values in the neighborhood of 0.34 ns, the τf parameter was sampled at increasing values. When evaluated against the MD-derived autocorrelation functions as described above, the quality of the back-predictions was appreciably degraded even for comparatively small values of τf. This behavior reflected the fact that nonzero values of τf give rise to systematically lower optimized values for Sf2 and systematically higher values for Sh2. As a substantial fraction of the relatively immobile amides yield Sh2 predictions near 1.0 when τf is set to zero, for many of these positions, the Sh2 prediction became pegged at 1.0 as the value of τf was increased. Indeed, nearly a third of all amides yielded Sh2 predictions of 1.0 when τf was increased to 10 ps. As the value of Sh2 was not allowed to further increase for these residues, the optimizations for these residues were no longer able to precisely reproduce the input NMR relaxation parameters. Based upon these observations, the technically simpler assumption of two nonzero time constants appears to provide the most robust representation.

Errors in Back-Prediction of Internal Autocorrelation Functions Arising from Uncertainties in Global Tumbling Dynamics

NMR relaxation analysis based upon the use of superimposed MD trajectory frames to deduce internal bond vector correlation functions that are then combined with experimentally determined global tumbling dynamics to predict relaxation rates is conceptually redundant. Unfortunately, such a circuitous path of analysis is warranted by the acute sensitivity that the NMR relaxation behavior of globular proteins exhibits toward the global tumbling dynamics, which severely restricts the utility of the comparatively inaccurate molecular dynamics predictions of the global rotational correlation functions. For global tumbling times up to at least 10 ns, the precision of the back-predictions for the MD-derived internal autocorrelation functions described above compared favorably with the noise level that can be expected from good quality experimental NMR relaxation data. Crucial to this quality of back-prediction is the assumption of a precisely known global tumbling effect. Even in the most favorable case of a protein that exhibits essentially isotropic tumbling behavior, there still remains a degree of experimental uncertainty in determining the correct tumbling rate. Considerably more challenging is the common circumstance of anisotropic tumbling in which each H–N bond vector of the protein will be characterized by bond-specific rotational dynamics. For the highly rigid peptide units of a protein that tumbles as a prolate or oblate ellipsoid, projection of the principal axis of global rotation onto the crystal structure of the protein, combined with the anisotropy factor characterizing the relative rotation rates around the major axis and the two minor axes, is sufficient to determine the global tumbling dynamics for each H–N bond vector. On the other hand, for peptide positions that exhibit appreciable internal conformational dynamics, the distribution of the orientation of those bond vectors with respect to the principal axis of the diffusion tensor must be known, and MD simulations provide the most robust approach to assessing that distribution. For each such bond vector in each trajectory frame, the relative orientation with respect to the principal axis predicts the instantaneous rotational diffusion rates, which are then averaged to yield an effective global tumbling time.[30] While the accurate determination of the effective internal orientation of a N–H bond vector is more challenging for mobile residues, the resultant uncertainty in the predicted relaxation behavior is typically decreased by the orientational averaging of the bond vector, which tends to reduce the magnitude of the anisotropy effect. Indeed, molecular dynamics studies have indicated that the magnitude of deviation from the isotropic relaxation behavior roughly scales with the aggregate order parameter S2 for that bond.[8] The errors in back-predicting the internal autocorrelation function that arise from errors in the estimated global correlation time typically become more severe for the longer time points of the autocorrelation function curve. This consideration has guided the choice of assessing the back-predicted internal autocorrelation functions based upon rmsd values calculated up to the τM value being analyzed. While the global correlation function for an anisotropically tumbling protein formally requires a multiexponential representation,[30] the approximation of a bond-specific single exponential decay is quite robust.[3,4] Specifically, in the context of ellipsoidal rotational diffusion, for ordinate values of the autocorrelation function between 1.0 and 0.01, the single exponential approximation conforms to the exact expression to within an rmsd of 0.01 for all angles when the axial ratio Dz/Dx lies between 0.4 and 2.1.[7] If the assumed global tumbling time constant for a given bond vector is in error, it can be readily anticipated that the back-prediction of the internal autocorrelation function will give rise to a compensating error in the order parameter corresponding to the long time constant τx. To assess how large of an error is introduced into the back-predicted internal autocorrelation function as a result of an inaccurate bond-specific τM value, calculations were made under conditions that roughly approximate the experimental τM value for this protein. The NMR relaxation values predicted from the MD-derived internal autocorrelation functions and an isotropic 4.00 ns global correlation time as discussed above were used to back-predict those internal autocorrelation functions, assuming the global tumbling times of 4.04 and 3.96 ns. For a τM of 4.00 ns, the optimized τx value (2.00 + 0.43 × τM) is 3.72 ns. The distortion in the back-predicted internal autocorrelation function when a 1% error in the time constant for global tumbling is introduced is most easily analyzed for the comparatively rigid amide positions for which the internal motion has largely decayed to a limit S2 value before the significant onset of decay effects from global tumbling. As discussed below, from the experimental measurements on ubiquitin, a set of 50 backbone amide positions were identified as being suitably ordered for use in determining the global diffusion tensor. When an isotropic global correlation time of 4.04 ns was used to back-predict the internal correlation functions for this set of residues with NMR relaxation values that were derived assuming a τM value of 4.00 ns, the resultant average rmsd error in fitting the original MD-derived internal autocorrelation functions was 0.007, consistent with the original 1% error in the assumed tumbling time being scaled down by approximately the aggregate S2 values for those residues. On the other hand, when an isotropic global correlation time of 3.96 ns was used to back-predict from these same NMR relaxation values, the resultant average rmsd error in fitting the original MD-derived internal autocorrelation functions was reduced to only 0.004. Similar to the analysis of nonzero τf values discussed above, for many residues, the increase in the Sx2 parameter needed to compensate for the artificially shortened τM value had risen to a maximum of 1.00 so that further compensation for the foreshortened τM value was blocked. In this circumstance, the inability to fully respond to an erroneous τM value serves to limit how far the optimization procedure can deviate from the correct order parameter values. The residues that exhibited reduced errors in the back-predicted internal autocorrelation functions for a τM of 3.96 ns also exhibited appreciable degradation in the quality of their optimal fits to the relaxation values used for the back-predictions. For the rest of the N–H bond vectors, which exhibited sufficient mobility so as not to “peg” the order parameter values in the 3.96 ns analysis, the scaling of the errors in the back-predicted internal autocorrelation functions relative to the error in the global tumbling time varied roughly proportionate to the aggregate S2 values, as seen under the assumption of τM equal to 4.04 ns. Among the residues with limited internal mobility, the inability of the optimized (Sf2,Sh2,Sx2) order parameters to precisely back-predict the NMR relaxation parameters provides a strong indication of a significant error in either those relaxation parameters or in the global tumbling parameters used to analyze them.

Assessing the Conformational Sampling of the Ubiquitin C-Terminal Tail

The MD-derived internal autocorrelation functions obtained in this study generally exhibit the smooth monotonically decreasing slopes to be expected of a satisfactory statistical sampling of the relevant conformational space. To provide a more rigorous test for the adequate sampling needed to justify comparing these calculations against experimental NMR data, similar autocorrelation functions were derived from four 500 ns AMBER 14SB simulations initiated from the ubiquitin crystal structures 1TBE[38] and 2G45.[39] In each of these crystal structures, the C-terminal tail interacts in a larger protein complex that results in a conformation that differs markedly both from each other and from the 1UBQ crystal structure that was used as the initial model in this study. The resultant internal bond vector autocorrelation functions from the four terminal residues were compared against those predicted using the 1UBQ model (Figure S3). The first half of each production run (250 ns) was excluded from the bond vector correlation function calculations to allow for the initial residual conformational memory of the intermolecular interactions within the crystal form to dissipate. The corresponding exclusion of the first half of the 1UBQ production runs had minimal effect upon the resulting internal autocorrelation functions. The reasonably similar internal autocorrelation functions for each of the four C-terminal residues that were obtained from the three starting protein conformations support the assumption that the simulations of this study provide a usefully extensive sampling of the energetically accessible conformational space for these residues, thus providing justification for their application to the direct prediction of experimentally determined NMR relaxation parameters.

Analysis of Experimentally Determined Internal Bond Vector Autocorrelation Functions

To test the practical utility of this approach in deriving an experimentally based estimate of the internal bond vector autocorrelation functions for the backbone amides of ubiquitin, R1, R1ρ, and heteronuclear relaxation measurements were carried out on a 0.5 mM 2H,15N labeled sample using optimized pulse sequences.[24,25] The pH of the sample was adjusted to 5.00 to minimize the self-aggregation effects that have been observed for ubiquitin at neutral pH.[40,41] An iterative 2.5 σ filter was applied to each of the 15N R1, R2, and NOE sets of measurements on ubiquitin that identified 50 residues exhibiting limited internal dynamics. The relaxation data for these residues were fitted to a prolate global diffusion tensor having an isotropic τM of 4.10 ns and an axial asymmetry of 1.22 at 25 °C.[29] After aligning the principal axis of experimentally determined global diffusion tensor along the Z-axis, the trajectory frames for each of the molecular dynamics simulations were superimposed, utilizing the nitrogen atoms of the 50 relatively immobile residues. The bond vector angles with respect to the Z-axis were then used to predict effective global tumbling times for each bond vector. Initial estimates of the Sf2, Sh2, and Sx2 values that are consistent with the experimental NMR relaxation values were directly calculated by applying the reduced spectral density approximation.[8] Localized grid search optimizations were then carried out utilizing the complete relaxation equations. As anticipated, these experimental data identified the residue segments 8–12 and the C-terminal tail as the principal regions of enhanced backbone conformational dynamics here indicated by a color coding of the autocorrelation function as assessed at the τM value of 4.10 ns (Figure ).

Figure 5

Experimentally determined values of the internal bond vector autocorrelation functions of the ubiquitin backbone at 4.10 ns are displayed as rainbow coloring, with dark blue set to 1.0, green set to 0.69, and red set to 0.0. Residues lacking experimental measurements are set to dark blue. Consistent with the results obtained from the MD-derived autocorrelation functions, for each of the last four residues of the protein, the fast limit order parameter Sf2 was found to be higher for the experimental NMR relaxation data at 900 MHz than for the experimental data at 600 MHz. Over the interval between 30 ps and 4.10 ns, the rmsd between the bond vector autocorrelation functions that were back-predicted from the data at the two different field strengths were 0.011, 0.014, 0.021, and 0.021 for Leu 73 to Gly 76 (Figure ). To further assess the consistency between the internal autocorrelation functions that were back-predicted from experimental data collected at the two magnetic fields as well as for the corresponding results predicted by the various force field trajectories, Monte Carlo simulations were carried out for each amide N–H bond vector in which not only the experimental 15N R1, R2, and NOE values but the predicted bond-specific global correlation times were varied. To facilitate qualitative comparisons between residues exhibiting significantly different dynamical behavior, the Gaussian-distributed perturbations on the 15N R1, R2, NOE, and τM values were scaled by the physically realistic levels of 1%, 1%, 0.01, and 1%, respectively. The 95% confidence limits for each autocorrelation function were obtained from a set of 475 Monte Carlo simulations drawn from 500 random samplings of the four variable parameters from which the 25 samplings with the largest rmsd values for those parameters were removed. The extrema of those 475 curves at each time point of the autocorrelation function were then selected. Given the direct linear relationship between the spectral density functions and the order parameters in the TCCT representation, the bounds of these confidence limits will change proportionate to a change in the experimental uncertainties.[7,8]

Figure 6

Experimentally determined internal bond vector autocorrelation functions for the C-terminal tail of a ubiquitin sample having an average τM of 4.10 ns and an ellipsoidal asymmetry of 1.22 utilizing the optimized TCCT representation, with data from 600 MHz indicated in blue and the 900 MHz data indicated in red. The Monte Carlo-derived 95% confidence limits (dashes) were determined, assuming uncertainties of 1%, 1%, 0.01, and 1% for the R1, R2, NOE, and bond-specific τM values, respectively. For Leu 73 and Arg 74, the 95% confidence limits for the experimental data collected at the two magnetic field strengths are overlapping across essentially the entire portion of the autocorrelation function being analyzed (Figure ). This indicates that the effects of the actual experimental errors and the systematic differences arising from back-prediction at the two different magnetic field strengths lie within the assumed experimental uncertainties. The 95% confidence limits for Gly 75 and Gly 76 at the two magnetic field strengths modestly diverge from each other, in part, due to the larger field strength-dependent divergence of the back-predictions for these two more mobile residues, as discussed above. As the residual orientational order nears zero, the relative bandwidth of the percentage-based 95% confidence limits decreases as well. Returning to the initial illustration of the magnetic field-dependent back-prediction of the CHARMM36-derived internal autocorrelation function for Arg 74 utilizing the (Sf2,τs,S2) representation (Figure ), the analogous back-predictions utilizing the experimental 600 and 900 MHz NMR relaxation data yielded internal autocorrelation functions, which exhibited an rmsd of 0.017 over the interval 30 ps to 4.10 ns (Figure ). This field-dependent difference is only modestly larger than the rmsd of 0.014 obtained between the two corresponding back-predictions utilizing the optimized TCCT representation. On the other hand, back-prediction from the experimental 600 MHz data utilizing these two different order parameter representations yields an rmsd of 0.039, while the analogous comparison for the 900 MHz data yields an rmsd of 0.042. As the differences in the predictions generated by the two different order parameter presentations are nearly three times as large as the field strength-dependent differences obtained for each representation, these results affirm the conclusion drawn from the MD-derived autocorrelation analysis that the degree of similarity between the back-predictions derived from magnetic field-dependent measurements does not necessarily provide a useful measure of their accuracy.

Figure 7

Experimentally determined internal bond vector autocorrelation function for the amide H–N bond vector of Arg 74 utilizing the optimized TCCT representation, with 600 MHz data indicated in blue and the 900 MHz data indicated in red as well as the (Sf2,τs,S2) representation with 600 MHz data indicated in cyan and the 900 MHz data indicated in orange. The internal consistency in the analysis of the MD-derived internal bond vector autocorrelation functions with the optimized TCCT representation provides a firm basis for confidence in the capability of that representation to extract usefully reliable estimations of the physical autocorrelation functions underlying experimental protein NMR relaxation data. Given the markedly varying predictions of these autocorrelation functions provided by the six force fields examined here (Figures , S1, and S2), it is appropriate to consider how useful this analysis of experimental data may prove to be in characterizing the quality of the dynamics predictions provided by such force fields. To further pursue such comparisons, it is necessary to examine in more detail how the bond vector orientation information represented in the molecular dynamics simulation is quantitatively correlated with the observed NMR relaxation data. The two crucial parameters to be specified for such backbone amide bond vectors are the 15N chemical shift anisotropy (CSA) and the effective N–H bond length. The smaller of the two principal contributions to 15N NMR relaxation arises from the asymmetrical electron distribution surrounding the 15N nucleus. Although evidence for residue-specific 15N CSA values has been reported,[42−44] the approximation of a uniform value appears to be reasonably robust in conventional protein NMR relaxation studies. Given the 1/r6 dependence of the dipole–dipole interaction between the 1H and 15N nuclei, a highly precise estimate of the bond length is required. This estimate is complicated by the fact that radial fluctuations of the 1H and 15N nuclei serve to shorten that effective distance, while angular fluctuations cause the effective bond length to increase. As the covalent bond preferentially suppresses the magnitude of the radial fluctuations, the effective bond length for dipolar relaxation effects is lengthened relative to crystallographically determined values.[45] Further complicating the quantitative analysis of the relaxation interactions is the fact that classical molecular dynamics simulations disregard quantum mechanical dynamics and suppress the high-frequency classical vibrational/librational motions by freezing the H–X bond lengths during the calculations.[46] Each of the order parameter representations analyzed in this study makes use of a fast limit order parameter Sf2, which directly scales the corresponding spectral density values. Since a change in the assumed effective bond length can be precisely compensated for by a corresponding shift in the Sf2 value, in practical terms, the bond length parameter is optimized against the order parameter values that are deduced from the analysis of molecular dynamics trajectories of the comparatively immobile amides of the protein structure. The conceptual utility of the effective bond length parameter is somewhat compromised by the fact that different force fields predict different average values for the degree of orientational disorder of the amide bond vectors over the dynamical range that is used to deduce the order parameters for the comparatively immobilized residues. As shown in Table , the average value of the internal bond vector autocorrelation functions at t = 30 ps for the 50 motionally restricted residues used to analyze the global tumbling dynamics varies by a factor of 4.0%, while at t = 4.10 ns, that variation among these six force fields increases to 6.0%. As this range of dynamics is far slower than the simulated motions that might be perturbed by the freezing of H–X bond lengths during the molecular dynamics calculations, scaling of the predicted NMR relaxation values based upon a force field-specific effective bond length can potentially be misleading. In the present analysis, the value of 1.030 Å for the effective bond length has been selected as it yielded similar average predictions for these motionally restricted amides, as those obtained with the two CHARMM force fields and the widely used AMBER 14SB force field.

Table 3

Average Internal Autocorrelation Function Valuesa

	AC_i (30 ps)	AC_i (4.10 ns)
600 MHz exp (1.030 Å)	0.892	0.880
CHARMM27	0.895	0.881
CHARMM36	0.891	0.873
AMBER 99SB	0.874	0.844
AMBER 14SB	0.886	0.871
AMBER 15FB	0.860	0.828
AMBER 15ipq	0.874	0.849

50 motionally restricted residues.

50 motionally restricted residues. Assuming this value of 1.030 Å for the effective bond length, the internal autocorrelation functions derived from the six force fields were compared to the corresponding autocorrelation functions back-predicted from the 15N relaxation data obtained at 600 MHz (Table ), reflecting the more robust performance that is obtained from lower magnetic field strength data for these mobile amide positions. The AMBER 14SB force field yielded the closest predictions for both Leu 73 and Arg 74, while its reasonably precise predictions for Gly 75 and Gly 76 were modestly bested by the AMBER 15ipq force field. In stark contrast, both CHARMM force fields and the earlier AMBER 99SB force field predict a considerably lower level of conformational dynamics for the C-terminal tail. For all four residues, the AMBER 15FB force field yielded autocorrelation function predictions that were of a quality intermediate between the comparatively accurate predictions from the AMBER 14SB and 15ipq force fields and the overly rigid predictions from the three other force fields. Quantitative estimation of the statistical significance of the rmsd values of Table would require a precise characterization of the statistical robustness of the various force field simulations. That said, the marked variations among the autocorrelation functions derived from these force fields and the apparent ability to back-predict the physical autocorrelation functions to a precision comparable to the statistical uncertainty of the experimental NMR relaxation measurements indicate considerable potential for utilizing such measurements as a means of quantitatively assessing the accuracy with which specific force fields can model the dynamics of protein backbone motion.

Table 4

Fit to Back-Predicted Internal Autocorrelation Functionsa

	Leu 73	Arg 74	Gly 75	Gly 76
CHARMM27	0.076	0.066	0.140	0.172
CHARMM36	0.087	0.124	0.148	0.103
AMBER 99SB	0.178	0.280	0.277	0.228
AMBER 14SB	0.045	0.062	0.015	0.028
AMBER 15FB	0.106	0.083	0.021	0.069
AMBER 15ipq	0.076	0.074	0.010	0.015

rmsd between MD-derived and back-prediction from 600 MHz NMR data.

Conclusions

Modeling analysis based upon protein molecular dynamics simulations provides a powerful means for optimizing the mathematical representation of the internal bond vector autocorrelation functions used to interpret 15N NMR relaxation data for backbone amide positions. Optimization of the time constant-constrained triexponential (TCCT) representation was obtained at both 600 and 900 MHz 1H when the shortest time constant τf is set to zero, the τh time constant is set to 0.34 ns, and the global tumbling time-dependent τx time constant is set to 2.00 ns + 0.43 × τM. As averaged across six CHARMM and AMBER molecular force fields at 600 MHz 1H, the MD-derived autocorrelation functions can be back-predicted from the corresponding R1, R2, and heteronuclear NOE NMR relaxation values to within 1.0% over the global tumbling times of 3–7 ns for each of the nine mobile residues of ubiquitin. The modest deterioration of those back-predictions at slower tumbling rates suggests the potential for useful applicability for this order parameter representation at τM values even beyond 10 ns. The optimized TCCT representation is somewhat less robust for the NMR analysis of relaxation data collected at 900 MHz 1H. With an optimal performance for τM values near 5–6 ns, the increasingly severe systematic deviations in the back-predicted autocorrelation functions that occur at slower tumbling times utilizing these higher field data will likely limit the practical value of such data. Any fractional averaging of the order parameters obtained at 600 and 900 MHz 1H resulted in a reduced accuracy as compared to the predictions obtained at the lower field. That observation illustrates the fact that the practical benefit derived from NMR relaxation measurements obtained at multiple magnetic fields for analyzing protein motion in the ps–ns time frame is dependent upon the robustness with which the mathematical representation of the autocorrelation function is capable of modeling the underlying dynamics at each of those magnetic field strengths. As the differences among the internal autocorrelation functions derived from the various force field calculations far exceed the estimated uncertainties associated with the experimentally derived autocorrelation functions, the described approach to NMR relaxation analyses appears to offer considerable promise for helping to guide the further optimization of dynamical modeling in the time frame surrounding the 1H and 15N Larmor frequencies.

39 in total

Review 1. Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules.

Authors: A G Palmer; C D Kroenke; J P Loria
Journal: Methods Enzymol Date: 2001 Impact factor: 1.600

2. Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations.

Authors: Alexander D Mackerell; Michael Feig; Charles L Brooks
Journal: J Comput Chem Date: 2004-08 Impact factor: 3.376

Molecular Dynamics-Assisted Optimization of Protein NMR Relaxation Analysis.

Introduction

Methods

Molecular Dynamic Simulations

Bond Vector Autocorrelation and Spectral Density Function Analysis

Expression and Purification of [U–2H,15N] Ubiquitin

NMR Spectroscopy and Analysis

Results and Discussion

Mathematical Characteristics of the (Sf2,τs,S2) Representation

MD Trajectory-Dependent Optimization of the Time Constant-Constrained Triexponential Order Parameter Representation

Errors in Back-Prediction of Internal Autocorrelation Functions Arising from Uncertainties in Global Tumbling Dynamics

Assessing the Conformational Sampling of the Ubiquitin C-Terminal Tail

Analysis of Experimentally Determined Internal Bond Vector Autocorrelation Functions

Conclusions

Review 1. Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules.

2. Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations.

3. Rotational velocity rescaling of molecular dynamics trajectories for direct prediction of protein NMR relaxation.

4. Reducing bias in the analysis of solution-state NMR data with dynamics detectors.

5. Analysis of NMR Spin-Relaxation Data Using an Inverse Gaussian Distribution Function.

6. The ubiquitin binding domain ZnF UBP recognizes the C-terminal diglycine motif of unanchored ubiquitin.

7. Backbone dynamics of the natively unfolded pro-peptide of subtilisin by heteronuclear NMR relaxation studies.

8. Long-range motional restrictions in a multidomain zinc-finger protein from anisotropic tumbling.

9. Structure of tetraubiquitin shows how multiubiquitin chains can be formed.

10. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB.