Iterative Molecular Dynamics-Rosetta Protein Structure Refinement Protocol to Improve Model Quality.

Steffen Lindert, Jens Meiler, J Andrew McCammon.

Abstract

Rosetta is one of the prime tools for high resolution protein structure refinement. While its scoring function can distinguish native-like from non-native-like conformations in many cases, the method is limited by conformational sampling for larger proteins, that is, leaving a local energy minimum in which the search algorithm may get stuck. Here, we test the hypothesis that iteration of Rosetta with an orthogonal sampling and scoring strategy might facilitate exploration of conformational space. Specifically, we run short molecular dynamics (MD) simulations on models created by de novo folding of large proteins into cryoEM density maps to enable sampling of conformational space not directly accessible to Rosetta and thus provide an escape route from the conformational traps. We present a combined MD-Rosetta protein structure refinement protocol that can overcome some of these sampling limitations. Two of four benchmark proteins showed incremental improvement through all three rounds of the iterative refinement protocol. Molecular dynamics is most efficient in applying subtle but important rearrangements within secondary structure elements and is thus highly complementary to the Rosetta refinement, which focuses on side chains and loop regions.

Year: 2013; PMID: 23956701; PMCID: PMC3744128; DOI: 10.1021/ct400260c

Source DB: PubMed; Journal: J Chem Theory Comput; ISSN: 1549-9618; Impact factor: 6.006

Introduction

The last 20 years have seen an unprecedented increase in computational power available to biomedical research. One area that these vast resources have been focused on is the computational prediction and refinement of protein tertiary structures. To completely understand a protein’s function and mode of action, as well as to design small molecule binders to it, knowing the protein’s structure is of huge value. Also within the past few decades, experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR), or electron microscopy (EM) have helped to elucidate a large number of protein structures.[1] Despite the breathtaking achievements of experimental protein structure elucidation, there remain a considerable number of proteins for which no structure is known and also no template for comparative modeling is available. De novo protein structure prediction aims to elucidate the structure of such proteins. Predicting the structure of a protein from only its primary amino acid sequence is a formidable challenge, but lately, progress toward this goal has been undeniable: Protein energy landscape theory has been applied to obtain optimal energy functions for protein structure prediction.[2] Massively distributed computing has been used to simulate protein folding on a scale not accessible before.[3] Molecular dynamics simulations were able to probe folding pathways of very small proteins within simulation times upward of 100 μs.[4] The fragment replacement-based method Rosetta has become one of the prime tools to predict and refine the structures of proteins.[5] Particular success has been demonstrated whenever the folding is aided by sparse experimental restraints. Restraints from NMR,[6] electron paramagnetic resonance (EPR) spectroscopy[7] as well as EM[8] have been shown to considerably improve model quality or extend the scope of the method to much larger systems. 
The most recent installments of CASP (Critical Assessment of protein Structure Prediction), a biennial community-wide blind protein structure prediction experiment, have seen Rosetta rank among the best performing programs for de novo protein structure prediction (free modeling category).[9] However, despite all recent improvements, only recently have algorithms become available that can sample the topology space of proteins with more than 150 residues[10] without a template structure or experimental restraints.[11] In addition to de novo protein folding, Rosetta has also been shown to be able to refine small proteins to near atomic resolution in favorable cases.[12] Such refinement requires the starting structures to already have low root-mean-square distances (RMSDs) to the native conformation. The Rosetta scoring function is generally able to distinguish native-like from non-native-like conformations. The biggest obstacle to successful refinement is the vast conformational space that has to be sampled to find the near-native energy minimum. If the starting structure differs substantially from the native structure, it is possible that the fragment replacement search algorithm within Rosetta will not be able to sample conformations closer to the native structure. What matters is not only the magnitude of the difference, as measured by RMSD, but also the type of conformational change required to bridge it and whether such a change can be achieved with fragment replacement and side chain repacking. We observed this scenario in one of our recent works, where we used Rosetta to refine models built into medium resolution cryoEM density maps.[13] Despite reaching low RMSD values for many of the benchmark proteins, improvement stalled after two rounds of iterative loop rebuilding and refinement in Rosetta. The search algorithm became stuck in “conformational traps” from which no escape was possible using the sampling provided by Rosetta.
Here, we investigate the possibility of using molecular dynamics (MD) to overcome some of these conformational traps and sample conformations from which Rosetta can more easily refine the structure. A protocol that goes through multiple iterative rounds of MD and Rosetta refinement is presented.

Material and Methods

System Preparation

Four different proteins were chosen for which low-RMSD models had been built de novo with EM-Fold and Rosetta:[13] 1X91, 2A6B, 1ICX, and 1OZ9. Two of these are α-helical proteins, while the other two are α-β-proteins. The proteins have between 150 and 234 residues. The best scoring model after the third round of Rosetta refinement in ref (13) for each protein was taken as input model for the iterative MD–Rosetta protocol. These systems were then prepared for molecular dynamics simulations. Tleap[14] was used to neutralize the systems by adding Na+ or Cl– counterions (1 Na+, 15 Na+, 7 Na+, and 4 Cl– for the 1X91, 2A6B, 1ICX, and 1OZ9 systems, respectively) and to solvate them in a TIP3P water box. The fully solvated systems contained between 25 000 and 32 000 atoms. Minimization using SANDER[14] was carried out in two stages: 1000 steps of minimization of solvent and ions with the protein restrained using a force constant of 500 kcal/(mol·Å²), followed by a 2500 step minimization of the entire system. A short initial 20 ps MD simulation with weak restraints (10 kcal/(mol·Å²)) on the protein residues was used to heat the system to a temperature of 300 K.
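
The two-stage minimization described above could be written as sander input files along the following lines. This is a minimal sketch, not the authors' actual input. The namelist keywords (imin, maxcyc, ntb, cut, ntr, restraint_wt, restraintmask) are standard sander options, but the residue mask ":1-150", the 10 Å cutoff during minimization, and the helper name minimization_input are assumptions made purely for illustration.

```python
def minimization_input(title, max_cycles, restraint_wt=None,
                       restraint_mask=None, cutoff=10.0):
    """Build the text of a sander minimization input file (a &cntrl namelist)."""
    settings = [
        ("imin", 1),             # minimize instead of running dynamics
        ("maxcyc", max_cycles),  # total number of minimization steps
        ("ntb", 1),              # periodic boundaries at constant volume
        ("cut", cutoff),         # nonbonded cutoff in Angstrom (assumed value)
    ]
    if restraint_wt is not None:
        settings += [
            ("ntr", 1),                                # positional restraints on
            ("restraint_wt", restraint_wt),            # kcal/(mol*A^2)
            ("restraintmask", f"'{restraint_mask}'"),  # atoms to restrain
        ]
    body = "\n".join(f"  {key} = {value}," for key, value in settings)
    return f"{title}\n &cntrl\n{body}\n /\n"


# Stage 1: relax solvent and ions while the protein is held in place.
# ":1-150" is a hypothetical residue range standing in for the protein.
stage1 = minimization_input("Stage 1: solvent and ions", 1000,
                            restraint_wt=500.0, restraint_mask=":1-150")

# Stage 2: minimize the entire system without restraints.
stage2 = minimization_input("Stage 2: entire system", 2500)

print(stage1)
print(stage2)
```

Keeping both stages in one helper makes the only differences between them (step count and restraints) explicit.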

Molecular Dynamics Simulations

All MD simulations were performed in the NPT ensemble at 300 K using AMBER[14] and the ff99SB-ILDN force field.[15] Periodic boundary conditions were used, along with a nonbonded interaction cutoff of 10 Å. Bonds involving hydrogen atoms were constrained using the SHAKE algorithm,[16] allowing for a time step of 2 fs. Each MD simulation during the iterative MD–Rosetta protocol was run for 2 ns.
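
The stated run length and time step fix the number of integration steps: 2 ns / 2 fs = 1 000 000 steps per MD round. The sketch below performs that arithmetic and assembles an illustrative AMBER production input. The keywords are standard AMBER options, but the thermostat and barostat settings (ntt = 3, gamma_ln = 2.0, ntp = 1) are assumptions, since the text specifies only the ensemble, temperature, cutoff, SHAKE, and time step. The function names are hypothetical.

```python
def number_of_steps(length_ns, time_step_fs):
    """Number of MD integration steps for a run of length_ns at time_step_fs."""
    return round(length_ns * 1_000_000 / time_step_fs)  # 1 ns = 1 000 000 fs


def production_input(length_ns=2.0, time_step_fs=2.0,
                     temperature_k=300.0, cutoff=10.0):
    """Build an illustrative NPT production input (a &cntrl namelist)."""
    settings = {
        "imin": 0,                        # run dynamics, not minimization
        "nstlim": number_of_steps(length_ns, time_step_fs),
        "dt": time_step_fs / 1000.0,      # AMBER expects picoseconds
        "ntc": 2,                         # SHAKE on bonds involving hydrogen
        "ntf": 2,                         # skip forces for constrained bonds
        "cut": cutoff,                    # nonbonded cutoff in Angstrom
        "ntb": 2,                         # periodic boundaries, constant pressure
        "ntp": 1,                         # isotropic pressure scaling (assumed)
        "temp0": temperature_k,           # target temperature in K
        "ntt": 3,                         # Langevin thermostat (assumed)
        "gamma_ln": 2.0,                  # collision frequency in 1/ps (assumed)
    }
    body = "\n".join(f"  {key} = {value}," for key, value in settings.items())
    return f"2 ns NPT production\n &cntrl\n{body}\n /\n"


print(number_of_steps(2.0, 2.0))
print(production_input())
```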

Rosetta All Atom Refinement in Density Map

The models identified from the MD simulations were subjected to loop rebuilding and refinement within Rosetta[5,8a] guided by the cryoEM density map. The Rosetta refinement protocol is identical to the protocol described in ref (13). In summary, regions of the models that agree least with the density map of the protein are identified (loops_from_density.linuxgccrelease) and rebuilt (loopmodel.linuxgccrelease). Each round performs a full atom relaxation of the entire structure.

Iterative MD–Rosetta Protocol

Three rounds of iterative MD and Rosetta were run for all four benchmark proteins. The starting model for the protocol is the best scoring model after the third round of Rosetta refinement from ref (13). The protocol starts with an MD run, followed by a Rosetta run: MD1–Rosetta1–MD2–Rosetta2–MD3–Rosetta3. The MD runs are short (2 ns). After each MD run two models are picked to transition into the following Rosetta round. For this work we picked the two best models observed during the MD simulation: (a) the model with the overall lowest RMSD with respect to the native protein structure and (b) the model with the lowest RMSD over secondary structure elements (SSEs). For both these models the regions that agree least with the density map are identified and rebuilt using Rosetta, followed by an all atom refinement of the models. The Rosetta models are then sorted by score. The best scoring model is picked as input into the next MD round. Before each MD round counterions and TIP3P waters are added to the model and a short minimization and equilibration is run. All RMSDs reported are over the backbone atoms N, Cα, C, and O. All RMSD calculations were done using the BCL::Quality application.[11b]
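
The selection of the two MD models per round can be sketched as follows. This is an illustration under stated assumptions, not the BCL::Quality implementation: the coordinates are assumed to be backbone atoms that have already been superimposed onto the native structure, and the names rmsd, pick_md_models, and sse_indices are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the two coordinate sets are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))


def pick_md_models(frames, native, sse_indices):
    """Return the indices of the two frames passed on to Rosetta.

    (a) the frame with the lowest RMSD over all atoms and
    (b) the frame with the lowest RMSD over atoms in secondary
        structure elements (positions listed in sse_indices).
    """
    def sse_only(coords):
        return [coords[i] for i in sse_indices]

    overall = [rmsd(frame, native) for frame in frames]
    over_sse = [rmsd(sse_only(frame), sse_only(native)) for frame in frames]
    best_overall = min(range(len(frames)), key=overall.__getitem__)
    best_sse = min(range(len(frames)), key=over_sse.__getitem__)
    return best_overall, best_sse


# Toy example: three atoms, the first two belonging to an SSE.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]  # loop atom off
frame1 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]  # SSE atoms off
print(pick_md_models([frame0, frame1], native, sse_indices=[0, 1]))
```

Both picks can point at the same frame. As in the protocol described above, this selection needs the native structure and is therefore only possible in a benchmark setting.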

Results and Discussion

Two of the Benchmark Proteins Show Improvement through All Three Iterative MD-Rosetta Rounds

The ability of an iterative MD–Rosetta refinement protocol to minimize the models’ deviation from the native protein structure was tested on four benchmark proteins. The best scoring Rosetta structure from a previous Rosetta-only density-map-guided refinement[13] was chosen as input model for all four proteins. Three rounds of iterative molecular dynamics followed by Rosetta refinement were performed. After each MD simulation, the two lowest RMSD models (with respect to RMSD over the entire sequence and RMSD over residues in secondary structure elements) were picked as input into a Rosetta loop rebuilding and refinement run guided by the cryoEM density map. Information from the density maps was not used during the MD simulations. Finally, the best scoring model after Rosetta refinement was used as starting model in the subsequent MD simulation. As an example, Figure 1 shows the evolution of the model quality of 1X91 during all three rounds of MD. Frequently, the MD starting structures have slightly higher RMSDs than the previous-round Rosetta model. For example, the first round MD starting structure has an RMSD of 2.00 Å, compared to 1.82 Å for the previous-round Rosetta model. These discrepancies are due to the minimization/equilibration before the start of the MD, which consistently increases the RMSD with respect to the native structure. In all three rounds of MD, models were built that surpass the previous-round best-scoring Rosetta model with respect to RMSD over secondary structure elements (the green line breaking through the dashed red line). Only during the first round of MD were models built that also have lower full length RMSDs than the previous-round best-scoring Rosetta model. RMSDs as low as 1.34 Å over the entire protein and 0.84 Å over secondary structure elements are sampled during the third round of MD simulations.

Figure 1

Model quality evolution of 1X91 during the three rounds of MD. The RMSD of the MD structure with respect to the native model is shown for all protein residues (blue) and for residues in secondary structure elements (green). RMSDs of reference models are displayed by vertical lines: the full length RMSD of the starting model (black line), the RMSD over SSEs of the starting model (dashed black line), the full length RMSD of the best scoring model from the previous Rosetta round (red line) and the RMSD over SSEs of the best scoring model from the previous Rosetta round (dashed red line). For the first round of MD, the red and black lines coincide. A successful protocol would be characterized by the blue line breaking through the red line (corresponding to MD sampling lower RMSD models than the best scoring model seen in the last Rosetta round) and the green line breaking through the dashed red line.

Table 1 summarizes the quality of the generated models for all four proteins throughout all three rounds of the iterative protocol. For two of the proteins (1X91, 2A6B), there was improvement to the very end. For example, the RMSD for 2A6B improved from 3.17 Å to 2.87 Å over all protein residues and from 2.56 Å to 2.12 Å measured over residues in secondary structure elements. The improvement for 1X91 is even more considerable. Over the three rounds the RMSD over all residues improved from 1.82 Å to 1.33 Å and from 1.19 Å to 1.00 Å measured over residues in secondary structure elements. The potential for model quality improvement is even more apparent when looking at the best model in the Rosetta runs of the third round of the protocol. The best 2A6B model built by Rosetta in the third round has an RMSD of 2.56 Å (with 1.90 Å RMSD over SSEs), while the best 1X91 model has an RMSD of 0.88 Å (with 0.60 Å RMSD over SSEs). While these models scored worse than the models reported, this still demonstrates the combined power of the MD–Rosetta protocol to build models of excellent quality. Figure 2 shows the best 1X91 and 2A6B models overlaid with their native structure. The prediction within the protein core is virtually perfect. For the other two proteins improvement stopped either in the first or second round. Interestingly, those proteins represented the α-β-proteins in the benchmark, while the two successful proteins were α-helical proteins. However, even for these two α-β-proteins some initial model improvement was observed. These results demonstrate that an iterative MD–Rosetta refinement protocol can improve model quality in some cases, with considerable improvement seen in selected cases.

Table 1

Quality of the Generated Models for All Four Proteins Throughout Three Rounds of the Iterative Protocol (a)

protein	start (b)	MD1 (c)	Rosetta1 (d)	MD2 (e)	Rosetta2 (f)	MD3 (g)	Rosetta3 (h)	best (i)
1X91	1.82(1.19)	1.76(0.93)	1.58(0.94)	1.58(0.89)	1.33(0.97)	1.34(0.84)	1.33(1.00)	0.88(0.60)
2A6B	3.17(2.56)	3.12(2.46)	3.08(2.38)	2.92(2.31)	2.86(2.22)	2.86(2.13)	2.87(2.12)	2.56(1.90)
1ICX	2.65(2.14)	2.55(1.93)	2.35(1.92)	2.35(1.67)	2.80(2.67)	-	-	2.35(1.67)
1OZ9	2.63(2.23)	2.51(2.18)	4.54(3.38)	-	-	-	-	2.51(2.18)

(a) RMSDs of the models built with respect to the native structure over all residues and, in parentheses, over all residues in secondary structure elements. All RMSDs shown are in Ångstrom. A dash marks rounds for which no value is reported.
(b) RMSDs of the starting models.
(c) RMSDs of the best models seen in 2 ns of the first round of MD.
(d) RMSDs of the top scoring model after the first round of Rosetta refinement.
(e) RMSDs of the best models seen in 2 ns of the second round of MD.
(f) RMSDs of the top scoring model after the second round of Rosetta refinement.
(g) RMSDs of the best models seen in 2 ns of the third round of MD.
(h) RMSDs of the top scoring model after the third round of Rosetta refinement.
(i) RMSDs of the best models ever built during the iterative MD–Rosetta refinement protocol.
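
The net change between the starting model and the top scoring third-round Rosetta model can be computed directly from the Table 1 entries for the two proteins that completed all three rounds. The short sketch below is only a worked arithmetic example on values copied from the table. The names TABLE_1 and improvement are hypothetical.

```python
# (RMSD over all residues, RMSD over SSE residues) in Angstrom,
# copied from the "start" and "Rosetta3" columns of Table 1.
TABLE_1 = {
    "1X91": {"start": (1.82, 1.19), "Rosetta3": (1.33, 1.00)},
    "2A6B": {"start": (3.17, 2.56), "Rosetta3": (2.87, 2.12)},
}


def improvement(start, final):
    """Return the absolute (Angstrom) and relative (%) RMSD reduction."""
    delta = start - final
    return delta, 100.0 * delta / start


for protein, rows in TABLE_1.items():
    labels = ("all residues", "SSE residues")
    for label, start, final in zip(labels, rows["start"], rows["Rosetta3"]):
        delta, percent = improvement(start, final)
        print(f"{protein} {label}: {start:.2f} -> {final:.2f} A "
              f"(reduced by {delta:.2f} A, {percent:.0f}%)")
```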

Figure 2

Lowest RMSD models after three rounds of iterative MD/Rosetta refinement for (A) 1X91 and (B) 2A6B. The native structure is shown in turquoise, while the model is shown in gold. The overall structure within secondary structure elements has been recovered in the models. Most side chain conformations within the interface of secondary structure elements have been built correctly.

Figure 3 shows the RMSD vs score plots of the two most successful proteins, 1X91 and 2A6B. Plots for the last round of Rosetta refinement from ref (13), alongside the three Rosetta rounds of the iterative MD–Rosetta protocol, are shown. The native structure is shown for reference in all plots. The native structure has nonzero RMSD values since it was relaxed in the Rosetta force field before being scored. For both 1X91 and 2A6B the general distribution of RMSDs shifts to lower values with each round of the iterative MD–Rosetta protocol, suggesting incremental model quality improvement. Not surprisingly, this effect is most pronounced with 1X91 due to the overall better model quality.

Figure 3

RMSD vs score plots for 1X91 and 2A6B. The first panel (blue, labeled round 3) shows the results of the last round of Rosetta refinement in ref (13). The other three panels show the results for the first (green), second (green), and third (red) Rosetta round of the iterative MD–Rosetta. The native structure, relaxed in the Rosetta force field, is shown in all panels (black).

Molecular Dynamics is Most Efficient in Improving RMSDs over Residues in Secondary Structure Elements

Molecular dynamics and Rosetta prove to be most efficient in improving the model quality in different parts of the proteins. From the data presented in Figure 1 and Table 1, it can be seen that the Rosetta stages of the protocol are most efficient in improving the RMSD over residues in loop regions. This is not unexpected since the Rosetta protocol includes targeted rebuilding of the loop regions guided by the cryoEM density map. However, encouragingly, MD was best at improving the RMSDs over residues in secondary structure elements. This means that MD has its strength in an area where improvement with Rosetta is more challenging. MD can thus contribute constructively to overcome some of the sampling limitations of Rosetta.

Conclusions

Here, we presented the results of a novel iterative MD–Rosetta protocol to computationally refine protein structures guided by medium resolution cryoEM density maps. It was shown that a combination of MD and Rosetta can indeed help to overcome some of the “conformational traps” in which Rosetta refinement frequently becomes stuck. Molecular dynamics seems particularly helpful for improvement of model quality within secondary structure elements, thus complementing Rosetta, whose strength is rebuilding and refining loop regions. The benchmark proteins all had between 150 and 234 residues. Observing models with sub-Ångstrom RMSDs in a computational structure refinement protocol for proteins of that size is novel and reason for optimism. While these results are promising and demonstrate that a combined MD–Rosetta protocol can be more powerful than Rosetta alone, much potential for improvement remains. Full success for two out of four test cases is encouraging, but future work will focus on ways of improving the success rate of the protocol. Furthermore, in the current implementation, the MD simulations were not guided by the cryoEM density map. Restraining the molecular dynamics runs with the density map may improve model quality during the MD sections of the protocol. Finally, the biggest shortcoming of the current implementation of the protocol is that, while the Rosetta models are picked based solely on score, the models from the MD rounds were picked based on low RMSD to the native structure. Clearly, outside of a benchmark scenario, this is not possible, and the protocol thus currently serves more as a proof of principle. At this point, these calculations rely on the assumption that the most native-like conformation in an ensemble can be selected. Follow-up studies will focus on identifying these low RMSD models by some other metric, such as the Rosetta score. Addressing these issues is the focus of ongoing research and will be discussed in subsequent publications.

27 in total

1. De novo prediction of three-dimensional structures for major protein families.

Authors: Richard Bonneau; Charlie E M Strauss; Carol A Rohl; Dylan Chivian; Phillip Bradley; Lars Malmström; Tim Robertson; David Baker
Journal: J Mol Biol Date: 2002-09-06 Impact factor: 5.469

2. How fast-folding proteins fold.

Authors: Kresten Lindorff-Larsen; Stefano Piana; Ron O Dror; David E Shaw
Journal: Science Date: 2011-10-28 Impact factor: 47.728

3. Toward high-resolution de novo structure prediction for small proteins.

Authors: Philip Bradley; Kira M S Misura; David Baker
Journal: Science Date: 2005-09-16 Impact factor: 47.728

4. Comparison of multiple Amber force fields and development of improved protein backbone parameters.

Authors: Viktor Hornak; Robert Abel; Asim Okur; Bentley Strockbine; Adrian Roitberg; Carlos Simmerling
Journal: Proteins Date: 2006-11-15

5. De novo high-resolution protein structure determination from sparse spin-labeling EPR data.

Authors: Nathan Alexander; Marco Bortolus; Ahmad Al-Mestarihi; Hassane Mchaourab; Jens Meiler
Journal: Structure Date: 2008-02 Impact factor: 5.006

6. A model for the solution structure of the rod arrestin tetramer.

Authors: Susan M Hanson; Eric S Dawson; Derek J Francis; Ned Van Eps; Candice S Klug; Wayne L Hubbell; Jens Meiler; Vsevolod V Gurevich
Journal: Structure Date: 2008-06 Impact factor: 5.006

7. Self-consistently optimized energy functions for protein structure prediction by molecular dynamics.

Authors: K K Koretke; Z Luthey-Schulten; P G Wolynes
Journal: Proc Natl Acad Sci U S A Date: 1998-03-17 Impact factor: 11.205

8. Refinement of protein structures into low-resolution density maps using rosetta.

Authors: Frank DiMaio; Michael D Tyka; Matthew L Baker; Wah Chiu; David Baker
Journal: J Mol Biol Date: 2009-07-08 Impact factor: 5.469

9. EM-fold: De novo folding of alpha-helical proteins guided by intermediate-resolution electron microscopy density maps.

Authors: Steffen Lindert; René Staritzbichler; Nils Wötzel; Mert Karakaş; Phoebe L Stewart; Jens Meiler
Journal: Structure Date: 2009-07-15 Impact factor: 5.006
