Lars A Bratholm, Anders S Christensen, Thomas Hamelryck, Jan H Jensen.
Abstract
Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and a bias is introduced which might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution where the error in chemical shift prediction is described by either a Gaussian or Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion for identifying the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures compared to conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of protein structure prediction.
Keywords: Chemical shifts; Markov chain Monte Carlo; NMR; Probabilistic models; Protein structure
Year: 2015 PMID: 25825683 PMCID: PMC4375973 DOI: 10.7717/peerj.861
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Uncertainty sampling with Gaussian and Cauchy distributions.
Sampling of σ and γ, using Jeffreys priors, for C-chemical shifts of Protein G. n = 54 and . (A) Gaussian distribution, (B) Cauchy distribution.
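As a minimal sketch of the kind of sampling shown in Figure 1, the Python code below draws a scale parameter (σ for the Gaussian model, γ for the Cauchy model) from its posterior given a fixed set of prediction errors. It is not the authors' implementation: the function names, step size, number of steps and the synthetic errors are assumptions made for illustration, the errors are assumed to be centred on zero, and the structure is held fixed. Because a Jeffreys prior p(s) ∝ 1/s on a scale parameter is uniform in log s, a symmetric random walk in log s that accepts on the likelihood ratio alone targets the posterior.

```python
import math
import random


def gaussian_loglik(errors, sigma):
    """Log-likelihood of zero-mean Gaussian errors (constants dropped)."""
    n = len(errors)
    sum_sq = sum(e * e for e in errors)
    return -n * math.log(sigma) - sum_sq / (2.0 * sigma ** 2)


def cauchy_loglik(errors, gamma):
    """Log-likelihood of zero-centred Cauchy errors (constants dropped)."""
    return sum(math.log(gamma) - math.log(gamma ** 2 + e * e) for e in errors)


def sample_scale(errors, loglik, n_steps=20000, step=0.1, start=1.0, seed=0):
    """Metropolis random walk in u = log(scale).

    The Jeffreys prior 1/scale is flat in u, so the acceptance
    probability reduces to the likelihood ratio.
    """
    rng = random.Random(seed)
    u = math.log(start)
    ll = loglik(errors, math.exp(u))
    samples = []
    for _ in range(n_steps):
        u_new = u + rng.gauss(0.0, step)  # symmetric proposal in log space
        ll_new = loglik(errors, math.exp(u_new))
        if rng.random() < math.exp(min(0.0, ll_new - ll)):
            u, ll = u_new, ll_new
        samples.append(math.exp(u))
    return samples


# Synthetic prediction errors (hypothetical data, not taken from the paper).
data_rng = random.Random(1)
synthetic_errors = [data_rng.gauss(0.0, 1.2) for _ in range(54)]

sigma_samples = sample_scale(synthetic_errors, gaussian_loglik)[5000:]
gamma_samples = sample_scale(synthetic_errors, cauchy_loglik)[5000:]
print("posterior mean sigma:", sum(sigma_samples) / len(sigma_samples))
print("posterior mean gamma:", sum(gamma_samples) / len(gamma_samples))
```

In a full simulation the errors would change whenever the structure changes, so moves in the scale parameter would be interleaved with structural moves rather than run against a fixed error vector as here.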
Maximum likelihood estimates of σ (or root-mean-square deviation (RMSD)) obtained from the CamShift training set, compared to means extracted from a 10⁷ MC step simulation using the Gaussian model (see text).
Shown values are in units of ppm.
| | C | H | N | H | C | C |
|---|---|---|---|---|---|---|
| CamShift training set | 1.22 | 0.26 | 2.78 | 0.56 | 1.12 | 1.19 |
| Frozen simulation | 1.13 | 0.26 | 3.53 | 0.52 | 1.06 | 1.21 |
| Free simulation | 1.03 | 0.20 | 2.92 | 0.46 | 1.16 | 1.23 |
Notes.
Estimated over the last 10⁶ MC steps.
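For Gaussian errors with zero mean, the maximum likelihood estimate of σ is the root-mean-square of the errors, which is why the caption above treats σ and RMSD as interchangeable. The short Python sketch below illustrates this with made-up error values; the function names are hypothetical and the zero-mean assumption is ours.

```python
import math


def gaussian_loglik(errors, sigma):
    """Log-likelihood of zero-mean Gaussian errors, including constants."""
    n = len(errors)
    sum_sq = sum(e * e for e in errors)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2.0 * math.pi)
            - sum_sq / (2.0 * sigma ** 2))


def gaussian_sigma_mle(errors):
    """Setting d(loglik)/d(sigma) = 0 gives sigma^2 = mean(e^2), i.e. the RMSD."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Illustrative prediction errors in ppm (invented values).
example_errors = [0.5, -1.0, 1.5, -0.2]
sigma_hat = gaussian_sigma_mle(example_errors)

# The log-likelihood is lower on either side of the estimate.
print(sigma_hat)
print(gaussian_loglik(example_errors, 0.99 * sigma_hat)
      < gaussian_loglik(example_errors, sigma_hat)
      > gaussian_loglik(example_errors, 1.01 * sigma_hat))
```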
Maximum likelihood estimates of γ obtained from the CamShift training set, compared to means extracted from a 10⁷ MC step simulation using the Cauchy model (see text).
Shown values are in units of ppm.
| | C | H | N | H | C | C |
|---|---|---|---|---|---|---|
| CamShift training set | 0.70 | 0.19 | 1.87 | 0.31 | 0.74 | 0.77 |
| Frozen simulation | 0.62 | 0.17 | 1.90 | 0.32 | 0.64 | 0.69 |
| Free simulation | 0.43 | 0.05 | 1.57 | 0.25 | 0.67 | 0.55 |
Notes.
Estimated over the last 10⁶ MC steps.
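Unlike σ, the Cauchy scale γ has no closed-form maximum likelihood estimate. For zero-centred errors e_i, setting the derivative of the log-likelihood to zero gives the condition Σ γ²/(γ² + e_i²) = n/2, whose left-hand side increases monotonically with γ, so it can be solved by bisection. The Python sketch below does this under those assumptions; the function name and the example values are hypothetical. Note that γ is the half-width at half-maximum of the distribution, so it is not directly comparable to a Gaussian σ.

```python
def cauchy_gamma_mle(errors, iters=200):
    """Bisection solve of sum(g^2 / (g^2 + e^2)) = n / 2 for g > 0."""
    n = len(errors)

    def score(gamma):
        # Increasing in gamma; zero at the maximum likelihood estimate.
        g2 = gamma * gamma
        return sum(g2 / (g2 + e * e) for e in errors) - n / 2.0

    hi = max(abs(e) for e in errors)  # every term is >= 1/2 here, so score >= 0
    if hi == 0.0:
        raise ValueError("all errors are zero; gamma is not identifiable")
    lo = hi * 1e-12
    if score(lo) >= 0.0:
        raise ValueError("at least half of the errors are (near) zero")

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative prediction errors in ppm (invented values).
print(cauchy_gamma_mle([0.5, -1.0, 1.5, -0.2, 4.0]))
```

Because the heavy tails of the Cauchy distribution down-weight large errors, a single outlier such as the 4.0 above shifts this estimate far less than it would shift an RMSD.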
Different weighting schemes used in the protein folding simulations.
The left-hand columns show the number of threads, out of a total of 32, that sampled structures below 2 Å and 4 Å Cα-RMSD, respectively, from the reference structure. The sampled structures from each thread were divided into clusters, and the representative of each cluster was selected as the structure with the median PROFASI+CamShift energy among the 100 structures closest to the cluster centroid. The right-hand columns show the Cα-RMSD (in Å) of the lowest-energy cluster representative.
| | ENHD threads <2 Å | ENHD threads <4 Å | Protein G threads <2 Å | Protein G threads <4 Å | SMN threads <2 Å | SMN threads <4 Å | ENHD lowest-energy RMSD (Å) | Protein G lowest-energy RMSD (Å) | SMN lowest-energy RMSD (Å) |
|---|---|---|---|---|---|---|---|---|---|
| Gaussian/fixed | 32 | 32 | 0 | 7 | 29 | 30 | 3.67 | 3.11 | 3.11 |
| Gaussian/sampled | 32 | 32 | 4 | 15 | 13 | 20 | 2.15 | 3.03 | 5.88 |
| Gaussian/marginalized | 32 | 32 | 1 | 16 | 7 | 14 | 4.24 | 2.72 | 6.06 |
| Cauchy/fixed | 32 | 32 | 9 | 25 | 15 | 21 | 1.94 | 1.15 | 2.58 |
| Cauchy/sampled | 32 | 32 | 13 | 24 | 11 | 16 | 1.87 | 2.82 | 5.51 |
| Square well/ | 19 | 22 | 2 | 12 | 14 | 18 | 2.29 | 3.14 | 3.71 |
| Square well/ | 32 | 32 | 0 | 1 | 1 | 5 | 3.82 | 5.83 | 1.91 |
| CS-Torus | 4 | 27 | 8 | 25 | 0 | 0 | 19.2 | 3.01 | 8.33 |
Notes.
Weights, α, of 1 and 5 were used by Robustelli et al.
Lowest-energy cluster representatives for the CS-Torus simulations were selected from PROFASI energy alone.
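The representative-selection rule described in the table caption can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and input layout are hypothetical, the clustering itself is assumed to have been done already, and for an even number of candidates the lower of the two middle structures is taken as the median.

```python
def cluster_representative(dist_to_centroid, energies, k=100):
    """Index of the median-energy structure among the k closest to the centroid."""
    if len(dist_to_centroid) != len(energies):
        raise ValueError("distance and energy lists must have equal length")
    if not energies:
        raise ValueError("cluster is empty")
    # Keep the k structures nearest to the cluster centroid.
    order = sorted(range(len(energies)), key=lambda i: dist_to_centroid[i])
    nearest = order[:k]
    # Rank those by energy and take the (lower) median.
    by_energy = sorted(nearest, key=lambda i: energies[i])
    return by_energy[(len(by_energy) - 1) // 2]


def lowest_energy_representative(clusters, k=100):
    """Return (cluster name, structure index, energy) of the best representative.

    `clusters` maps a cluster name to a (distances, energies) pair.
    """
    best = None
    for name, (dists, energies) in clusters.items():
        rep = cluster_representative(dists, energies, k)
        if best is None or energies[rep] < best[2]:
            best = (name, rep, energies[rep])
    return best


# Toy example with invented numbers and k = 3 instead of 100.
toy_clusters = {
    "a": ([0.1, 0.2, 0.3, 5.0, 6.0], [10.0, 30.0, 20.0, -100.0, -200.0]),
    "b": ([0.0, 1.0, 2.0], [5.0, 1.0, 9.0]),
}
print(lowest_energy_representative(toy_clusters, k=3))
```

Taking the median energy near the centroid, rather than the minimum over the whole cluster, keeps a single low-energy outlier far from the centroid (such as the last two structures of cluster "a") from defining the cluster.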
Figure 2. Local energy minimum of Protein G.
Crystal structure (grey) and local energy-minimum conformation (red) of Protein G. Figure made with PyMOL (Schrödinger LLC, 2010).
Cα-RMSDs (in Å) of the lowest-energy cluster representatives when a solvent exposure energy term (HSEMM) is added to re-score the structures.
| Lowest-re-scored-energy RMSD (Å) | ENHD | Protein G | SMN |
|---|---|---|---|
| Gaussian/fixed | 1.40 | 2.45 | 2.23 |
| Gaussian/sampled | 1.03 | 1.29 | 1.24 |
| Gaussian/marginalized | 1.11 | 1.00 | 3.81 |
| Cauchy/fixed | 1.40 | 1.16 | 1.55 |
| Cauchy/sampled | 1.86 | 0.86 | 2.50 |
| Square well potential/ | 1.15 | 1.37 | 3.05 |
| Square well potential/ | 0.96 | 4.35 | 1.91 |
| CS-Torus | 3.88 | 1.57 | 9.18 |
Notes.
Weights, α, of 1 and 5 were used by Robustelli et al.
Lowest-energy cluster representatives for the CS-Torus simulations were selected from PROFASI+HSEMM energy alone.