
Predicting protein backbone chemical shifts from Cα coordinates: extracting high resolution experimental observables from low resolution models.

Aaron T Frank, Sean M Law, Logan S Ahlstrom, Charles L Brooks.

Abstract

Given the demonstrated utility of coarse-grained modeling and simulation approaches in studying protein structure and dynamics, developing methods that allow experimental observables to be directly recovered from coarse-grained models is of great importance. In this work, we develop one such method that enables protein backbone chemical shifts (1HN, 1Hα, 13Cα, 13C, 13Cβ, and 15N) to be predicted from Cα coordinates. We show that our Cα-based method, LARMORCα, predicts backbone chemical shifts with accuracy comparable to some all-atom approaches. More importantly, we demonstrate that LARMORCα-predicted chemical shifts are able to resolve native structure from decoy pools that contain both native and non-native models, indicating that they are sensitive to protein structure. As an application, we use LARMORCα to characterize the transient state of the fast-folding protein gpW using recently published backbone chemical shifts derived from NMR relaxation dispersion. The model we obtain is consistent with the previously proposed model based on independent analysis of the chemical shift dispersion pattern of the transient state. We anticipate that LARMORCα will find utility as a tool that enables important protein conformational substates to be identified by “parsing” trajectories and ensembles generated using coarse-grained modeling and simulation.

Year:  2015        PMID: 25620895      PMCID: PMC4295808          DOI: 10.1021/ct5009125

Source DB:  PubMed          Journal:  J Chem Theory Comput        ISSN: 1549-9618            Impact factor:   6.006


Introduction

Characterizing the folding/unfolding pathway of proteins remains an outstanding and significant challenge in structural biology. Though the emphasis has been on characterizing the native state structure of proteins, new experimental techniques are now being developed that enable transiently populated intermediates along the protein folding pathway to be characterized at the atomic scale.[1−6] Identifying such folding intermediates has always been viewed as an important task, but now that these and other non-native states have been implicated in several diseases,[7] developing approaches that enable the “complete” folding pathway to be characterized is of even greater importance. In principle, classical molecular dynamics (MD) simulations, which generate full atomic trajectories of a protein by propagating Newton’s equations of motion, can be used to characterize its folding pathway(s). However, rigorous MD simulations are computationally expensive, making it difficult to simulate protein folding: typical simulations span nanoseconds to microseconds, whereas proteins (with the exception of fast-folders) fold on time scales of milliseconds and beyond. Though recent advances in computer hardware, software, and methodology[8−11] now allow the long time scale dynamics of some proteins to be studied,[12] these approaches still require significant computational resources. Thus, there remains a keen need for approaches that allow the folding pathway(s) of proteins to be characterized using readily available computational resources. Coarse-grained molecular simulations, in which the full atomic system is reduced to a smaller, less complex system of interacting “coarse-grained” particles, have been used to overcome the “mismatch” between simulation and biological time scales by sacrificing resolution for enhanced sampling efficiency.
Remarkably, despite their simplicity, coarse-grained modeling and simulation approaches have provided significant insights into protein function.[13−16] Of considerable interest is the use of coarse-graining within a “multiscale” approach, in which coarse-grained simulations are used to rapidly and exhaustively sample the conformational space of a target protein, and “selected” conformers from the coarse-grained simulations are then used to “seed” more rigorous all-atom simulations. One approach to identifying relevant “seed” conformations is to use advanced clustering[17−19] and other data-reduction techniques.[20] Alternatively, experimental data can be used to select relevant “seed” structures by constructing ensembles that are consistent with the ensemble-averaged data.[21−25] A prerequisite for such an “experimentally augmented” identification of relevant conformational substates is the ability to calculate experimental observables from structural models, and, in the context of coarse-grained modeling, this typically requires mapping reduced models back to their all-atom representations.[26−31] Unfortunately, in addition to suffering from nonuniqueness, this mapping incurs an additional computational cost; because coarse-grained approaches typically generate on the order of 10⁶ conformers, this additional cost can be significant. Techniques are therefore needed that maximize the structural information that can be directly extracted from coarse-grained models and thus obviate the need for all-atom reconstruction of an entire trajectory or ensemble generated using coarse-grained simulations.
NMR relaxation dispersion (NMR-RD) experiments have recently garnered significant attention because they allow NMR observables of low-populated intermediates to be detected. These observables can then be used to “unveil” the structure of these previously “invisible” states.
Using such an approach, NMR-RD-derived chemical shifts, which are exquisitely sensitive to protein structure, have now been used to structurally characterize the folding intermediates of several proteins.[32−34] Incorporating NMR-RD-derived chemical shifts into the analysis of coarse-grained simulations would allow relevant intermediate states that are sampled along the folding/unfolding pathways to be identified. As a first step toward being able to use NMR chemical shifts to “parse” trajectories or ensembles generated using coarse-grained modeling and simulation, we introduce LARMORCα, a method that predicts protein backbone (1HN, 1Hα, 13Cα, 13C, 13Cβ, and 15N) chemical shifts based only on Cα coordinates. In what follows we (i) describe the model used to generate LARMORCα; (ii) assess the accuracy of LARMORCα; (iii) assess the sensitivity of LARMORCα-predicted chemical shifts to protein structure; and (iv) use coarse-grained simulations and LARMORCα to characterize the transient state of gpW, a small fast-folding protein.

Methods

Training and Testing Set

LARMORCα predictors were trained and tested using the RefDB database.[35] Briefly, the data set contains proteins for which both high-resolution X-ray structures and NMR chemical shifts are available in the Protein Data Bank (PDB: http://www.pdb.org) and the Biological Magnetic Resonance Bank (BMRB: http://www.bmrb.wisc.edu/), respectively. The training and testing sets used here consisted of 196 and 61 proteins, respectively (Table S1).

Cα-Cα Distance-Based Structure Features

The distance-based structural features used to predict backbone chemical shifts from Cα coordinates were identical to those recently used by the program PCASSO to assign protein secondary structure from Cα coordinates.[36] Briefly, for a given residue, i, a set of 43 features is calculated from the Cα coordinates and another 43 from the pseudocenter coordinates (see Table S2 and ref (36) for a list of all features). The pseudocenter for the ith residue is defined as the center of geometry of Cα(i) and Cα(i+1). The feature vector, V(i), for the ith residue is formed from the features of residues i−1, i, and i+1, which results in a total of 3 × 86 = 258 feature elements (Table S2).
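The pseudocenter definition and the three-residue window can be sketched in NumPy. The per-point features here (distances to neighbors at sequence offsets 1 to 4) are hypothetical placeholders; PCASSO's actual 43 features per coordinate set (Table S2) are not reproduced. With 8 placeholder features per coordinate set, the window has 3 × 2 × 8 = 48 elements, mirroring the 3 × 2 × 43 = 258 elements of the real V(i). Padding missing neighbors at the chain termini with NaN is also an assumption.

```python
import numpy as np

def pseudocenters(ca):
    """Pseudocenter of residue i: center of geometry of Ca(i) and Ca(i+1).

    Returns an (n-1, 3) array; the last residue has no pseudocenter.
    """
    return 0.5 * (ca[:-1] + ca[1:])

def distance_features(points, offsets=(1, 2, 3, 4)):
    """Hypothetical per-point features: distances to points i+k and i-k.

    Placeholders only (not PCASSO's feature set). Missing neighbors are NaN.
    """
    n = len(points)
    feats = np.full((n, 2 * len(offsets)), np.nan)
    for col, k in enumerate(offsets):
        d = np.linalg.norm(points[k:] - points[:-k], axis=1)  # d[j] = |p(j+k) - p(j)|
        feats[:-k, 2 * col] = d      # forward neighbor i+k
        feats[k:, 2 * col + 1] = d   # backward neighbor i-k
    return feats

def residue_features(ca):
    """Per-residue features from the Ca trace and from the pseudocenters."""
    ca_feats = distance_features(ca)
    pc_feats = np.full_like(ca_feats, np.nan)
    pc_feats[:-1] = distance_features(pseudocenters(ca))  # last residue: no pseudocenter
    return np.hstack([ca_feats, pc_feats])

def window_features(per_residue):
    """V(i) = [f(i-1), f(i), f(i+1)]; neighbors beyond the termini are NaN."""
    n, m = per_residue.shape
    pad = np.full((1, m), np.nan)
    padded = np.vstack([pad, per_residue, pad])
    return np.hstack([padded[:-2], padded[1:-1], padded[2:]])

rng = np.random.default_rng(0)
ca = np.cumsum(rng.normal(scale=2.2, size=(20, 3)), axis=0)  # toy Ca trace, not a real protein
V = window_features(residue_features(ca))
print(V.shape)  # (20, 48)
```

The NaN padding only marks where a neighbor is missing; an estimator that rejects missing values would need a different convention for the termini.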

Generating LARMORCα Using Randomized Decision Trees

For all proteins in the training set, the Cα-Cα distance-based structural features described above were extracted from the X-ray structures and combined with their corresponding chemical shift data. The resulting data set was used to build a separate model for each of the 1HN, 1Hα, 13Cα, 13C, 13Cβ, and 15N backbone nuclei. Specifically, for each backbone nucleus type, a predictive model was built using the random forest machine learning technique as implemented in the Scikit-learn Python package.[37] Each random forest predictor consisted of 50 randomized decision trees with a maximum depth of 25. Each node in a given tree was split using the best splitting variable from a subset of 16 randomly chosen feature variables. The minimum number of samples required to split an internal node and the minimum number of samples required in a leaf node were both set to 5. Default values were used for all other parameters.
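The stated hyperparameters map naturally onto scikit-learn's `RandomForestRegressor`, as in the sketch below: one independent regressor per nucleus. The choice of that class, the nucleus keys, the synthetic training data, and the `random_state` are assumptions for illustration; default values for the remaining parameters differ between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

NUCLEI = ["HN", "HA", "CA", "C", "CB", "N"]  # hypothetical keys for the six nuclei

def make_predictor(seed=0):
    """Random forest configured with the hyperparameters given in the text."""
    return RandomForestRegressor(
        n_estimators=50,       # 50 randomized decision trees
        max_depth=25,          # maximum tree depth
        max_features=16,       # best split chosen among 16 random features per node
        min_samples_split=5,   # min samples needed to split an internal node
        min_samples_leaf=5,    # min samples in a leaf node
        random_state=seed,     # added for reproducibility (not stated in the text)
    )

def train_predictors(X, shifts):
    """Fit one regressor per nucleus type.

    X: (n_residues, n_features) feature matrix without missing values.
    shifts: dict nucleus -> (n_residues,) array, NaN where no shift was measured.
    """
    models = {}
    for nucleus, y in shifts.items():
        mask = ~np.isnan(y)  # train only on residues with a measured shift
        models[nucleus] = make_predictor().fit(X[mask], y[mask])
    return models

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 258))  # synthetic stand-in for 258-element feature vectors
shifts = {n: X[:, 0] + rng.normal(scale=0.1, size=300) for n in NUCLEI}
models = train_predictors(X, shifts)
pred = {n: m.predict(X[:5]) for n, m in models.items()}
```

Training each nucleus separately keeps the six predictors independent, matching the per-nucleus models described above.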

Molecular Dynamics (MD) Simulations

Coarse-grained decoy pools were generated for four arbitrarily chosen proteins in the testing set (PDB ID 1LM4: chain B,[38] 2C5L: chain C,[39] 1DYT: chain A,[40] and 1H4A: chain X[41]) using Go̅ model MD simulations. The native contacts used to define the Go̅ models were derived from the coordinates in the corresponding PDB entries using the MMTSB Go̅ Model Server (https://mmtsb.org/webservices/gomodel.html).[42,43] All simulations were carried out using the CHARMM MD simulation package.[44] Trajectories of 7.5 μs each were propagated using Langevin dynamics at 300 K with a friction coefficient of 0.10 ps⁻¹. All bonds were constrained using SHAKE,[45] and nonbonded interactions were truncated at 25 Å with a smooth switching function between 21 and 23 Å. Go̅ model MD simulations of the gpW protein were carried out using the same procedure. In this case, the native contacts used to define the Go̅ model were derived from the gpW NMR structure (PDB ID: 2L6Q; model 1).[46,47] For all proteins, simulations at 300 K sampled both native and non-native conformations.
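To illustrate what a smooth switch between 21 and 23 Å does, the sketch below implements the classic CHARMM energy-switching polynomial, which scales pair energies from full strength at r_on to zero at r_off. That this particular form (rather than, for example, a force switch) was used is an assumption; the text states only that a smooth switching function was applied between 21 and 23 Å.

```python
import numpy as np

def charmm_switch(r, r_on=21.0, r_off=23.0):
    """Energy switching factor S(r) (assumed CHARMM-style energy switch).

    S = 1 for r <= r_on, S = 0 for r >= r_off, and in between
    S = (r_off^2 - r^2)^2 (r_off^2 + 2 r^2 - 3 r_on^2) / (r_off^2 - r_on^2)^3,
    which is continuous with zero slope at both ends.
    """
    r = np.asarray(r, dtype=float)
    r2, on2, off2 = r * r, r_on * r_on, r_off * r_off
    s = (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3
    return np.where(r <= r_on, 1.0, np.where(r >= r_off, 0.0, s))

# A pair energy e(r) would be used as e(r) * S(r) inside the switching region.
factors = charmm_switch([20.0, 21.0, 22.0, 23.0, 24.0])
```

Because S and its derivative go smoothly to zero at 23 Å, the truncation does not introduce the force discontinuities a hard cutoff would.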

Chemical Shift Analysis

For each of the four test systems, backbone chemical shifts were predicted from the resulting trajectory using LARMORCα, and the weighted root-mean-square error (wRMSE) between predicted and measured chemical shifts was then calculated for each conformer along the trajectory. The wRMSE is given as
$$\mathrm{wRMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} w_i\left(\delta_i^{\mathrm{pred}} - \delta_i^{\mathrm{meas}}\right)^2}$$
where δpred, δmeas, and w are the predicted chemical shift, the measured chemical shift, and the weighting factor, respectively, for a given nucleus, i. The summation runs over the set of N chemical shifts. The weighting factors (w) were used to account for the differential accuracy of the predictors. Specifically, w was computed from R and MAE, the estimated Pearson correlation coefficient and estimated mean absolute error, respectively, between the measured and LARMORCα-predicted chemical shifts for the nucleus type associated with i. The weighting factor also scales the contribution to the overall error such that nuclei with different dispersion ranges contribute equally to the wRMSE.
In addition to extracting the model with the lowest wRMSE and comparing it to the native structure, receiver operating characteristic (ROC) analysis was carried out for each of the four systems. First, the fraction of native contacts, Q, was calculated for each conformer along each trajectory. Conformers were then designated as native if Q > 0.90 and non-native otherwise. ROC curves were plotted for each test case, and the area under the curve (AUC) was determined. Here the AUC, which ranges between 0 and 1, was used as a measure of the resolving power of the LARMORCα-predicted chemical shifts. An AUC approaching 1 indicates that the models with the lowest wRMSE correspond to native conformers, so that the wRMSE effectively resolves native from non-native conformers, whereas an AUC of 0.5 indicates that using the wRMSE to distinguish native from non-native conformers is no better than random assignment.
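The wRMSE score and the ROC analysis can be sketched as follows, assuming the standard weighted-RMSE form sqrt(Σ w_i (δ_i^pred − δ_i^meas)² / N). The per-shift weights are passed in by the caller, because the exact function of R and MAE used to compute them is not reproduced here. Since a low wRMSE should indicate a native conformer, −wRMSE serves as the classifier score for scikit-learn's `roc_auc_score`; the function names and toy values are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def wrmse(pred, meas, w):
    """Weighted RMSE over N shifts: sqrt(sum_i w_i (pred_i - meas_i)^2 / N)."""
    pred, meas, w = (np.asarray(a, dtype=float) for a in (pred, meas, w))
    return float(np.sqrt(np.mean(w * (pred - meas) ** 2)))

def native_resolving_auc(wrmse_values, q_values, q_cut=0.90):
    """AUC for separating native (Q > q_cut) from non-native conformers.

    Lower wRMSE should mean 'more native', so -wRMSE is used as the score.
    """
    labels = np.asarray(q_values) > q_cut
    return roc_auc_score(labels, -np.asarray(wrmse_values, dtype=float))

# Toy decoy pool: three native-like conformers with the lowest errors.
q = np.array([0.95, 0.97, 0.50, 0.40, 0.92, 0.30])
err = np.array([0.8, 0.7, 1.9, 2.2, 1.0, 2.5])
print(native_resolving_auc(err, q))  # 1.0: every native conformer ranks ahead
```

`roc_auc_score` ranks the scores rather than comparing every native/non-native pair explicitly, so it stays practical for pools of ∼10⁵ conformers.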
In addition to carrying out ROC analysis using the total wRMSE, parallel analyses were carried out using only 1HN, 1Hα, 13Cα, 13C, 13Cβ, or 15N chemical shifts. For the gpW protein, the wRMSE was determined between LARMORCα-predicted chemical shifts and the measured chemical shifts of (i) the native state and (ii) the transient state.[46] For each state, the conformation closest to the average structure of the 10 models exhibiting the lowest wRMSE was then extracted and considered representative of that state.
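Selecting a representative conformer for a state can be sketched as below: take the 10 lowest-wRMSE conformers, superimpose them onto the lowest-error one with the Kabsch algorithm, average the coordinates, and return the conformer with the smallest Cα RMSD to that average. The superposition step and the use of RMSD as the closeness measure are assumptions; the text does not say how the average structure was formed. Function names are hypothetical.

```python
import numpy as np

def kabsch_align(mobile, ref):
    """Optimally superimpose mobile (n, 3) onto ref (n, 3) (Kabsch algorithm)."""
    mc, rc = mobile.mean(axis=0), ref.mean(axis=0)
    a, b = mobile - mc, ref - rc
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(u @ vt))       # guard against improper rotations
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return a @ rot + rc

def rmsd(a, b):
    """Coordinate RMSD between two (n, 3) arrays (no fitting)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def representative_model(frames, errors, k=10):
    """Index of the conformer closest to the average of the k lowest-wRMSE ones.

    frames: (n_frames, n_atoms, 3) Ca coordinates; errors: (n_frames,) wRMSE values.
    """
    best = np.argsort(errors)[:k]                        # k lowest-error conformers
    ref = frames[best[0]]
    aligned = np.array([kabsch_align(frames[j], ref) for j in best])
    avg = aligned.mean(axis=0)                           # average structure
    dists = [rmsd(kabsch_align(x, avg), avg) for x in aligned]
    return int(best[int(np.argmin(dists))])
```

Returning an actual conformer rather than the average itself avoids the unphysical geometries that coordinate averaging can produce.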

Results and Discussion

The prospect of predicting backbone chemical shifts directly from Cα atomic coordinates opens up the possibility of utilizing chemical shifts to parse trajectories of Cα-based coarse-grained simulations and so identify intermediate states along the folding pathway of proteins. However, relying only on Cα atomic coordinates reduces the information content in the models and thus places an inherent limit on how accurately chemical shifts can be predicted. In what follows, we first examine the accuracy with which LARMORCα predicts backbone chemical shifts and then compare it to all-atom prediction methods. We then assess whether, given its current accuracy, LARMORCα predicted backbone chemical shifts are likely to be of utility in resolving protein structure. The latter is essential because it is sensitivity to protein structure rather than absolute prediction accuracy that will be most important when utilizing chemical shifts to study the folding pathway of proteins.

LARMORCα Backbone Chemical Shift Prediction Accuracy Is Comparable to Some All-Atom-Based Approaches

We began our analysis by determining the accuracy with which LARMORCα predicts protein backbone chemical shifts for the proteins in the testing set (Table S1). The accuracy of the predictions for 1HN, 1Hα, 13Cα, 13C, 13Cβ, and 15N nuclei was quantified by computing the root-mean-square error (RMSE), mean absolute error (MAE), and the Pearson correlation coefficient (R) between LARMORCα-predicted and measured chemical shifts. The results are summarized in Table 1. Calculated over all corresponding chemical shifts in the testing set, the RMSEs for 1HN, 1Hα, 13Cα, 13C, 13Cβ, and 15N were 0.54, 0.35, 1.21, 1.79, 4.18, and 3.32 ppm; the MAEs were 0.39, 0.25, 0.90, 1.03, 1.55, and 2.45 ppm; and the R values were 0.67, 0.77, 0.82, 0.93, 0.94, and 0.81, respectively.
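The three accuracy statistics are simple to compute. The toy values below (hypothetical 15N shifts, not data from the study) include one 8 ppm misprediction, showing how a single large error inflates the RMSE far more than the MAE, the pattern visible for several nuclei in Table 1.

```python
import numpy as np

def accuracy_stats(pred, meas):
    """RMSE and MAE (ppm) and Pearson R between predicted and measured shifts."""
    pred, meas = np.asarray(pred, dtype=float), np.asarray(meas, dtype=float)
    err = pred - meas
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r = float(np.corrcoef(pred, meas)[0, 1])
    return rmse, mae, r

meas = np.array([120.0, 118.5, 121.2, 119.0, 122.4])  # hypothetical 15N shifts (ppm)
pred = np.array([120.5, 118.0, 121.0, 119.4, 130.4])  # last residue: 8 ppm error
rmse, mae, r = accuracy_stats(pred, meas)  # RMSE ~3.6 ppm vs MAE 1.92 ppm
```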
Table 1

Backbone Chemical Shift Prediction Accuracy

nucleus  RMSE (ppm)   MAE (ppm)    R            no. of shifts  % prediction outliers
1HN      0.54 (0.44)  0.39 (0.35)  0.67 (0.75)  5776           2.98
1Hα      0.35 (0.28)  0.25 (0.22)  0.77 (0.83)  8346           3.34
13Cα     1.21 (1.06)  0.90 (0.83)  0.82 (0.86)  8856           2.31
13C      1.79 (0.99)  1.03 (0.76)  0.93 (0.98)  7218           5.96
13Cβ     4.18 (1.06)  1.55 (0.81)  0.94 (1.00)  7322           7.39
15N      3.32 (2.88)  2.45 (2.25)  0.81 (0.85)  8125           2.43

The root-mean-square-error (RMSE), mean-absolute-error (MAE), and Pearson correlation coefficient (R) between LARMORCα-predicted and experimental chemical shifts. For each nucleus, the RMSE, MAE, and R statistics were calculated using all chemical shifts in the testing set. In parentheses are the statistics obtained when prediction outliers are excluded. Also listed for each nucleus is the number of chemical shifts over which the statistics were computed and the percentage of prediction outliers identified. A prediction outlier is identified as having an error that exceeds the median error by more than three standard deviations (i.e., the 3-sigma rule).

The large discrepancy between the RMSE and MAE indicates the presence of a small set of large prediction outliers. To confirm this, outlier analysis was carried out for each backbone nucleus. Specifically, we identified possible prediction outliers using the 3-sigma rule, i.e., a prediction outlier was one whose error exceeded the median error by more than three standard deviations. When the prediction outliers (on average ∼4.0% of the total testing set for each nucleus) were excluded, the RMSE decreased to 0.44, 0.28, 1.06, 0.99, 1.06, and 2.88 ppm, the MAE decreased to 0.35, 0.22, 0.83, 0.76, 0.81, and 2.25 ppm, and R increased to 0.75, 0.83, 0.86, 0.98, 0.99, and 0.85, for the 1HN, 1Hα, 13Cα, 13C, 13Cβ, and 15N nuclei, respectively. Consistent with our expectation, backbone chemical shifts predicted using LARMORCα were generally less accurate than those calculated using all-atom methods. For example, SHIFTX2[48] and SPARTA+,[49] which are currently the “gold standard” for empirical structure-based protein chemical shift prediction, exhibited significantly lower RMSE over the testing set (Table S3) and had mean R values of 0.98 and 0.92, respectively, compared with 0.82 for LARMORCα (Table S4).
A similar picture emerges when comparing LARMORCα to CamShift;[50] the mean R for CamShift was 0.89 (Tables S3 and S4). However, LARMORCα predicts backbone chemical shifts with an accuracy comparable to PROSHIFT[51] and SHIFTS;[52] the mean R values for PROSHIFT and SHIFTS were 0.86 and 0.81, respectively, compared with 0.82 for LARMORCα. When prediction outliers were excluded, the overall accuracy of LARMORCα was on par with that of CamShift (Tables S3 and S4). Together, these results show that although LARMORCα generally predicts backbone chemical shifts less accurately than all-atom methods, the drop-off in accuracy is modest relative to all of them except SHIFTX2 and SPARTA+, despite LARMORCα using only Cα coordinates.
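One reading of the 3-sigma rule used above is sketched below: an absolute error is flagged when it exceeds the median absolute error by more than three standard deviations of the absolute errors. Whether the published analysis used signed or absolute errors, and which standard deviation, is not specified, so this interpretation is an assumption. The synthetic data contain three gross mispredictions.

```python
import numpy as np

def outlier_mask(pred, meas, n_sigma=3.0):
    """Flag predictions whose absolute error exceeds the median absolute
    error by more than n_sigma standard deviations (assumed reading of
    the 3-sigma rule; statistics taken over the absolute errors)."""
    abs_err = np.abs(np.asarray(pred, dtype=float) - np.asarray(meas, dtype=float))
    return abs_err > np.median(abs_err) + n_sigma * np.std(abs_err)

rng = np.random.default_rng(1)
meas = rng.normal(120.0, 4.0, size=200)        # synthetic measured shifts (ppm)
pred = meas + rng.normal(0.0, 1.0, size=200)   # typical errors around 1 ppm
pred[:3] += 15.0                               # three gross mispredictions
mask = outlier_mask(pred, meas)
rmse_all = np.sqrt(np.mean((pred - meas) ** 2))
rmse_inliers = np.sqrt(np.mean((pred - meas)[~mask] ** 2))
```

Because the squared errors of a few outliers dominate the RMSE, removing them lowers the RMSE much more than the MAE, as in Table 1.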

Sensitivity to Structure Allows LARMORCα To Distinguish Native and Non-Native States

Next, we examined whether chemical shifts predicted by LARMORCα were sensitive to protein structure by assessing their ability to resolve native structure from decoy conformational pools that contained both native and non-native conformers. If the predicted shifts are sensitive to protein structure, the native-like models in a decoy pool should exhibit the lowest error between LARMORCα-predicted and measured chemical shifts, and the non-native models higher errors. To test this, decoy pools for four arbitrarily chosen proteins in the testing set were generated using Go̅ model MD simulations. The final pools contained a total of 100,000 conformers. As shown in Figure 1, the decoy pools generally contained a mixture of native and non-native conformers. For each protein, LARMORCα was used to predict backbone chemical shifts for every conformer in the decoy pool, and the corresponding wRMSE was then computed. The fraction of native contacts (Q) was also determined for every conformer in the decoy pool. Receiver operating characteristic (ROC) analysis was then carried out to assess the extent to which native-like conformers (Q > 0.90) could be resolved from non-native conformers (Q ≤ 0.90).
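The fraction of native contacts can be computed directly from Cα coordinates, as sketched below. The contact definition (Cα pairs more than three residues apart lying within 8 Å in the native structure, counted as formed when within 1.2 times their native distance) is a common convention assumed for illustration; the native contacts in this work came from the MMTSB Go̅ Model Server and may be defined differently. Conformers are then labeled native when Q > 0.90, as in the text.

```python
import numpy as np

def native_contacts(ca_native, cutoff=8.0, min_sep=3):
    """Hypothetical contact set: Ca pairs with |i - j| > min_sep that lie
    within cutoff (Angstrom) in the native structure."""
    d = np.linalg.norm(ca_native[:, None, :] - ca_native[None, :, :], axis=-1)
    i, j = np.triu_indices(len(ca_native), k=min_sep + 1)  # pairs with j - i > min_sep
    keep = d[i, j] < cutoff
    return i[keep], j[keep], d[i, j][keep]

def fraction_native(ca, contacts, tol=1.2):
    """Q: fraction of native contacts within tol times their native distance."""
    i, j, d0 = contacts
    d = np.linalg.norm(ca[i] - ca[j], axis=-1)
    return float(np.mean(d <= tol * d0))

t = np.arange(12)
native = np.column_stack([2.3 * np.cos(np.radians(100 * t)),
                          2.3 * np.sin(np.radians(100 * t)),
                          1.5 * t])                   # toy helix-like Ca trace
contacts = native_contacts(native)
frames = [native, 1.1 * native, 2.0 * native]         # hypothetical conformers
q = np.array([fraction_native(x, contacts) for x in frames])
is_native = q > 0.90                                   # native-like designation
```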
Figure 1

Sensitivity of LARMORCα chemical shifts to protein structure (I). Shown are the results of using LARMORCα predicted chemical shifts to resolve native conformers from decoy pools generated using Cα-Go̅ model MD simulations. Results are shown for four arbitrarily chosen proteins in the testing set: PDB IDs (A) 1LM4, (B) 2C5L, (C) 1DYT, and (D) 1H4A, respectively. Shown for each protein are plots of the distribution of the fraction of native contacts (Q) in the decoy pool and the ROC curves (right). The plots characterize the degree to which the wRMSE between measured and LARMORCα predicted chemical shifts can distinguish native from non-native conformers in the decoy pools. In addition to ROC curves obtained using the total chemical shift error (black), separate ROC curves are shown when using only 1HN (red), 1Hα (orange), 13Cα (green), 13C (purple), 13Cβ (cyan), or 15N (blue) chemical shifts. The AUC values associated with each ROC curve are shown in boxes.

With the exception of 2C5L, the AUC values determined from the ROC curves (when using all available backbone chemical shift data) were all ≥0.95; the AUC for 2C5L was ∼0.70 (Figure 1). Similar results were obtained when 1HN, 1Hα, 13Cα, 13C, 13Cβ, or 15N chemical shifts were used separately; with the exception of the 13Cα and 15N nuclei for 1LM4 and the 1HN, 1Hα, 13Cα, 13C, and 15N nuclei for 2C5L, the AUC values were all ≥0.88 (Figure 1). Encouragingly, for all four proteins, when all available backbone chemical shifts were used, the models with the lowest wRMSE had Q ≥ 0.97 (Figure 2).
Figure 2

Comparison between X-ray structures and models with the lowest chemical shift error. Side-by-side comparison of the X-ray structure (left) and the models with the lowest total error (wRMSE) between experimental and LARMORCα-predicted chemical shifts (right) for proteins corresponding to PDB IDs (A) 1LM4, (B) 2C5L, (C) 1DYT, and (D) 1H4A. In each panel, the fraction of native contacts (Q) is indicated below the model with the lowest chemical shift error.

As a further test of its sensitivity to structure, we examined whether LARMORCα-predicted chemical shifts could be used to resolve the differences between conformational substates of phage T4 lysozyme (T4L). The free-energy landscape of the L99A mutant of T4L has recently been studied using NMR-RD experiments, allowing the chemical shifts of a transient, low-populated (∼3%) conformational substate to be obtained.[33] Using a mutate-to-trap approach, chemical shifts were also obtained for a triple mutant (L99A-G113A-R119P T4L) that was purported to “resemble” the transient state. The structures of the transient state of L99A T4L and of the triple mutant were determined using CS-Rosetta, confirming that the structure of the transient state closely resembles that of the triple mutant. The RMSD between the transient state of the L99A mutant and the triple mutant was ∼0.8 Å, whereas the RMSDs of the L99A transient state and the triple mutant relative to the highly populated state of L99A T4L were ∼2.5 and 2.3 Å, respectively (Figure 3).
Figure 3

Sensitivity of LARMORCα chemical shifts to protein structure (II): Structures of three conformational substates of T4L: (A) native L99A T4L, (B) the transiently populated intermediate of L99A T4L, and (C) the L99A-G113A-R119P T4L triple mutant, respectively. The region in the transient intermediate of L99A and the triple mutant that differs significantly from native L99A T4L is circled (yellow dotted). LARMORCα backbone chemical shifts were predicted from the Cα coordinates taken from the solved structure of each of these three substates and then compared to NMR-RD-derived chemical shifts of the native L99A T4L, the transient intermediate state of L99A T4L, and the triple T4L mutant, respectively. For each structure, the wRMSE relative to the native L99A T4L, the transient intermediate state of L99A T4L, and the triple T4L mutant are shown in black, red, and blue, respectively (boxes) and the lowest is highlighted (bold and underlined). Also, for each structure, the structural RMSD relative to the native L99A T4L, the transient intermediate state of L99A T4L, and the L99A-G113A-R119P T4L mutant structure are shown in black, red, and blue, respectively (boxes). Here, the wRMSE and RMSDs were calculated for residues 100–120 and 132–146.

To test whether LARMORCα could resolve the small structural differences between these three states (namely, the highly and transiently populated states of L99A T4L and the conformation of the triple mutant), LARMORCα was used to predict backbone chemical shifts from the solved structure of each species. For each structure, we computed the wRMSE between the predicted and experimental chemical shifts; the wRMSEs were computed using data for residues 100–120 and 132–146, as these were the only residues that exhibited significant changes in chemical shifts between the different states of T4L. We expected the structure with the lowest wRMSE to match the state associated with the reference (experimental) chemical shifts.
As shown in Figure 3, this was indeed the case. The L99A T4L structure exhibited the lowest wRMSE relative to the chemical shifts measured for the highly populated state, the L99A T4L transient-state structure showed the lowest wRMSE relative to the transient-state chemical shifts, and the triple mutant structure displayed the lowest wRMSE relative to the triple mutant chemical shifts. These results indicate that LARMORCα was able to resolve the small structural differences between conformational substates of T4L. Although LARMORCα identified the “correct” structure from the chemical shifts, the errors for the L99A transient state and the triple mutant were higher than the error for native L99A T4L. These higher errors suggest that the models of these states could be refined further. Indeed, the CS-Rosetta protocol used to generate these models assumed, based on chemical shift dispersion patterns, that only residues 100–120 and 132–146 differed significantly from native L99A T4L in the transient state and the triple mutant; during refinement, only atoms in these residues were allowed to deviate from the native L99A T4L structure. Together, these results indicate that backbone chemical shifts predicted by LARMORCα are sufficiently sensitive to protein structure to resolve native from non-native structures, and that even small structural differences between similar conformational substates can be detected. As such, NMR chemical shifts should be useful in “parsing” trajectories and ensembles generated using coarse-grained simulations to identify physically relevant conformational substates along the folding pathway of proteins.

Analysis Using LARMORCα Indicates That the Transient State of gpW Is Locally Unfolded

Recently, NMR relaxation dispersion (RD) experiments were used to study the free-energy landscape of gpW, a 62-residue α+β fast-folding protein (see Figure 4). The NMR-RD experiments allowed chemical shifts to be obtained for both the native state and a low-populated transient state.[46] Analysis of the chemical shift dispersion pattern of the transient state revealed that the helices remained intact, whereas the β-strand region was unfolded.[46] In principle, combining LARMORCα with coarse-grained simulations should allow structures consistent with the chemical shifts of the transient state to be identified. We therefore used LARMORCα to probe the folding pathway of gpW during Go̅ model MD simulations.
Figure 4

Resolving native and transient states along the folding pathway of the fast-folding protein gpW using LARMORCα. The folding pathway of gpW was studied using Cα-based Go̅-model MD simulations. Shown are cartoon representations comparing (A) the solved native-state structure of gpW and the representative models of (B) the native and (C) the transient states selected from the Cα-trajectory using LARMORCα. Representative models were selected by comparing LARMORCα-predicted chemical shifts to recently reported NMR-RD-derived backbone chemical shifts for the native and the transient intermediate states.[46] The models in (B) and (C) correspond to the two models that were closest to the average structure of the 10 models that exhibited the lowest error (wRMSE) between LARMORCα-predicted and the measured chemical shifts of the native state and the transient state, respectively.

The representative model based on the native-state chemical shifts had an α+β topology (Figure 4B), indicating that LARMORCα was able to resolve the native structure from the ensemble of structures generated during the Go̅ model simulations. In contrast, the representative model of the transient state exhibited an unfolded β-strand region (Figure 4C). These results agree well with the analysis of Kay and co-workers[46] and further confirm that LARMORCα can be used to efficiently parse coarse-grained trajectories and ensembles to identify important conformational substates (i.e., both native and intermediate states). Although the current study focused on using LARMORCα to incorporate backbone chemical shifts into the analysis of coarse-grained MD simulations, LARMORCα is also well suited for incorporation into protein structure prediction methods, where it could enable backbone chemical shifts to actively guide conformational sampling.
LARMORCα can also be used to parse large all-atom trajectories and ensembles to identify a smaller subset of relevant conformational states. In the spirit of “multi-scale analysis”,[36] more accurate and complete chemical shift prediction (i.e., prediction of both backbone and side-chain chemical shifts) can then be carried out on this subset using all-atom prediction approaches.
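The two-stage "multi-scale" screening described above can be sketched as a generic filter. This is an illustrative sketch only. `multiscale_filter`, `coarse_score`, and `fine_score` are hypothetical names. In practice, the cheap score might be the error between LARMORCα-predicted and measured backbone shifts, and the expensive score would come from an all-atom predictor. Neither predictor is called here, and lower scores are assumed to be better.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")  # any frame representation (coordinates, file path, ...)


def multiscale_filter(frames: Sequence[Frame],
                      coarse_score: Callable[[Frame], float],
                      fine_score: Callable[[Frame], float],
                      keep: int) -> List[int]:
    """Rank all frames with a cheap score, then re-rank only the best `keep` with an expensive one.

    Lower scores are treated as better (e.g., an error against measured shifts).
    Returns the shortlisted frame indices ordered by the expensive score.
    """
    # Stage 1: the cheap (e.g., Cα-based) score is evaluated for every frame.
    shortlist = sorted(range(len(frames)), key=lambda i: coarse_score(frames[i]))[:keep]
    # Stage 2: the expensive (e.g., all-atom) score is evaluated only on the shortlist;
    # sorted() calls the key function once per element.
    return sorted(shortlist, key=lambda i: fine_score(frames[i]))
```

The expensive scorer is evaluated only `keep` times rather than once per frame. As a result, the cost of the all-atom step scales with the size of the shortlist, not with the length of the trajectory.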

Conclusion

In summary, we have developed LARMORCα, a Cα-based approach that predicts backbone chemical shifts from coarse-grained models of proteins. We show that, in addition to predicting chemical shifts with accuracy comparable to some all-atom approaches, LARMORCα is sensitive to protein structure: its predicted shifts can resolve native structures from decoy pools. This sensitivity enables LARMORCα to identify conformational substates in coarse-grained simulations that are consistent with available NMR chemical shifts. An exciting application of the method is identifying “invisible” intermediate substates using chemical shifts obtained from NMR relaxation dispersion experiments, as demonstrated here for the fast-folding protein gpW. The structural information on transiently populated intermediates afforded by combining coarse-grained simulation with LARMORCα has the potential to offer functional insights into the mechanisms of protein folding, misfolding, and aggregation, and into their roles in folding-related diseases.[7] Beyond coarse-grained simulations, LARMORCα could be used to quickly parse all-atom MD trajectories and could also be incorporated into existing structure prediction methods. To facilitate its use, the source code for LARMORCα is freely available at https://github.com/atfrank/LARMORCA.
References (46 in total; 10 listed here)

1.  RefDB: a database of uniformly referenced protein chemical shifts.

Authors:  Haiyan Zhang; Stephen Neal; David S Wishart
Journal:  J Biomol NMR       Date:  2003-03       Impact factor: 2.835

Review 2.  Protein folding and misfolding.

Authors:  Christopher M Dobson
Journal:  Nature       Date:  2003-12-18       Impact factor: 49.962

3.  Structure analysis of peptide deformylases from Streptococcus pneumoniae, Staphylococcus aureus, Thermotoga maritima and Pseudomonas aeruginosa: snapshots of the oxygen sensitivity of peptide deformylase.

Authors:  Andreas Kreusch; Glen Spraggon; Chris C Lee; Heath Klock; Daniel McMullan; Ken Ng; Tanya Shin; Juli Vincent; Ian Warner; Christer Ericson; Scott A Lesley
Journal:  J Mol Biol       Date:  2003-07-04       Impact factor: 5.469

4.  Database algorithm for generating protein backbone and side-chain co-ordinates from a C alpha trace: application to model building and detection of co-ordinate errors.

Authors:  L Holm; C Sander
Journal:  J Mol Biol       Date:  1991-03-05       Impact factor: 5.469

Review 5.  Cooperativity, local-nonlocal coupling, and nonnative interactions: principles of protein folding from coarse-grained models.

Authors:  Hue Sun Chan; Zhuqing Zhang; Stefan Wallin; Zhirong Liu
Journal:  Annu Rev Phys Chem       Date:  2011       Impact factor: 12.703

6.  SHIFTX2: significantly improved protein chemical shift prediction.

Authors:  Beomsoo Han; Yifeng Liu; Simon W Ginzinger; David S Wishart
Journal:  J Biomol NMR       Date:  2011-03-30       Impact factor: 2.835

7.  Protein folding intermediates: native-state hydrogen exchange.

Authors:  Y Bai; T R Sosnick; L Mayne; S W Englander
Journal:  Science       Date:  1995-07-14       Impact factor: 47.728

8.  Recovering a representative conformational ensemble from underdetermined macromolecular structural data.

Authors:  Konstantin Berlin; Carlos A Castañeda; Dina Schneidman-Duhovny; Andrej Sali; Alfredo Nava-Tudela; David Fushman
Journal:  J Am Chem Soc       Date:  2013-11-06       Impact factor: 15.419

9.  OpenMM 4: A Reusable, Extensible, Hardware Independent Library for High Performance Molecular Simulation.

Authors:  Peter Eastman; Mark S Friedrichs; John D Chodera; Randall J Radmer; Christopher M Bruns; Joy P Ku; Kyle A Beauchamp; Thomas J Lane; Lee-Ping Wang; Diwakar Shukla; Tony Tye; Mike Houston; Timo Stich; Christoph Klein; Michael R Shirts; Vijay S Pande
Journal:  J Chem Theory Comput       Date:  2012-10-18       Impact factor: 6.006

10.  Solution structure of a minor and transiently formed state of a T4 lysozyme mutant.

Authors:  Guillaume Bouvignies; Pramodh Vallurupalli; D Flemming Hansen; Bruno E Correia; Oliver Lange; Alaji Bah; Robert M Vernon; Frederick W Dahlquist; David Baker; Lewis E Kay
Journal:  Nature       Date:  2011-08-21       Impact factor: 49.962
