Christoph Hartlmüller, Christoph Göbl, Tobias Madl.
Abstract
An approach to the de novo structure prediction of proteins is described that relies on surface accessibility data from NMR paramagnetic relaxation enhancements by a soluble paramagnetic compound (sPRE). This method exploits the distance-to-surface information encoded in the sPRE data in the chemical shift-based CS-Rosetta de novo structure prediction framework to generate reliable structural models. For several proteins, it is demonstrated that surface accessibility data is an excellent measure of the correct protein fold in the early stages of the computational folding algorithm and significantly improves accuracy and convergence of the standard Rosetta structure prediction approach.
Keywords: CS-Rosetta; NMR spectroscopy; paramagnetic relaxation; protein structure prediction; structural biology
Year: 2016 PMID: 27560616 PMCID: PMC5026166 DOI: 10.1002/anie.201604788
Source DB: PubMed Journal: Angew Chem Int Ed Engl ISSN: 1433-7851 Impact factor: 15.336
Figure 1. Principle of sPRE‐CS‐Rosetta. a) NMR sPRE data provide quantitative, residue‐specific information on solvent accessibility, because the effect of paramagnetic probes such as Gd(DTPA‐BMA) is distance dependent. b) Back‐calculation of sPRE data relies on placing the protein in a grid of equidistantly spaced points and removing the grid points that overlap with the protein. The sPRE is approximated by the sum of all contributions of the surrounding grid points. c) The sPRE module is implemented as a scoring function capable of scoring centroid as well as full‐atom models. At its core, the experimental sPRE data (sPRE_exp) are compared to the predicted sPRE data of the current Rosetta model (sPRE_calc), and a score based on the Spearman correlation coefficient (colored numbers) is computed. In this scheme, the sPRE score is used during the folding of the protein backbone using the simplified centroid model as well as for rescoring the final full‐atom models.
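The back‐calculation and scoring steps described in panels b) and c) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' Rosetta module: the r^-6 distance dependence, the grid spacing, the exclusion radius, the distance cutoff, and the function names `back_calculate_spre` and `spearman` are all assumptions made for this example.

```python
import itertools
import math


def back_calculate_spre(protons, atoms, spacing=1.0, exclusion=3.0, cutoff=10.0):
    """Approximate the sPRE of each proton as a sum over free grid points.

    protons, atoms: lists of (x, y, z) coordinates in Angstrom.
    Assumed model: each grid point not overlapping the protein contributes
    r**-6, where r is its distance to the proton (hypothetical parameters).
    """
    def axis(values):
        # Equidistant grid positions covering the protein plus a margin.
        lo, hi = min(values), max(values)
        n = int(math.ceil((hi - lo + 2 * cutoff) / spacing)) + 1
        return [lo - cutoff + i * spacing for i in range(n)]

    xs, ys, zs = zip(*atoms)
    # Remove grid points that overlap with the protein (too close to any atom).
    grid = [
        p for p in itertools.product(axis(xs), axis(ys), axis(zs))
        if all(math.dist(p, a) >= exclusion for a in atoms)
    ]

    spre = []
    for h in protons:
        total = 0.0
        for g in grid:
            r = math.dist(h, g)
            if 0.0 < r <= cutoff:
                total += r ** -6
        spre.append(total)
    return spre


def _ranks(values):
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = _ranks(a), _ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / math.sqrt(var_a * var_b)
```

In such a scheme, `spearman(spre_exp, spre_calc)` would be evaluated for each candidate model. A rank correlation is insensitive to the absolute scale of the back‐calculated values, so the unknown proportionality constant between the grid sum and the measured relaxation enhancement does not need to be fitted.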
Figure 2. sPRE data is an excellent measure of the correct protein fold and improves protein structure prediction. a) Structural ensembles of ubiquitin representing different stages of the AbinitioRelax protocol were rescored using Rosetta centroid and full‐atom scores (orange axis), the sPRE score (blue axis), and the chemical shift score (black axis). Experimental sPRE data for amide (HN) and aliphatic (Haliphatic) protons were used as input for the sPRE score. b), c) Box plots showing the average Cα‐RMSD to the native structure for models obtained from CS‐Rosetta (orange) and sPRE‐CS‐Rosetta (blue). sPRE data were determined by NMR experiments (b) or back‐calculated (c). All obtained structural models were scored according to the sum of the Rosetta, chemical shift, and sPRE scores (b) or according to the sum of the Rosetta and chemical shift scores (c). For every protein, the best‐scored 0.2 % of all models were selected and used to generate the box plots. Proteins for which the sampling was improved by the sPRE module (reduced mean RMSD to the native structure compared to CS‐Rosetta) are marked with a gray background, and proteins for which both CS‐Rosetta and sPRE‐CS‐Rosetta failed are not shown (average Cα‐RMSD >10 Å in the case of p16, 1CX1, 1F2H, 1GXE, 1IX5, 1ON4, 1RFL, 1XWE, 2KNR, 2LFC, 2LFP, 2LLL, 2PQE, 2RRF, 3ZQD, and 4A5V). All scores are shown in arbitrary units.
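The selection step behind the box plots (rank all models by their summed score, keep the best 0.2 %, and summarize their Cα‐RMSD values) might look like the sketch below. The function name `top_fraction_rmsd` and the `(total_score, ca_rmsd)` tuple layout are hypothetical, and the code assumes the usual Rosetta convention that a lower score is better.

```python
import statistics


def top_fraction_rmsd(models, fraction=0.002):
    """Select the best-scored fraction of models and summarize their RMSD.

    models: list of (total_score, ca_rmsd) tuples (hypothetical layout);
    a lower total_score is assumed to be better.
    Returns (mean RMSD of the selection, list of selected RMSD values).
    """
    # Keep at least one model, even for very small ensembles.
    n_keep = max(1, round(len(models) * fraction))
    best = sorted(models, key=lambda m: m[0])[:n_keep]
    rmsds = [rmsd for _, rmsd in best]
    return statistics.mean(rmsds), rmsds
```

For an ensemble of 1000 models, this keeps the two best‐scored structures. The returned list of RMSD values is what a box plot for that protein would be drawn from.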
Figure 3. sPRE data enhances accuracy and convergence of CS‐Rosetta structure prediction. The lowest‐energy models of CS‐Rosetta (orange) and sPRE‐CS‐Rosetta (blue) are compared to the NMR solution structures (gray, PDB code). For both methods, the corresponding Rosetta score (score13_env_hb) is plotted on the left, and the distribution of the Cα‐RMSD of the sampled structures is shown below in a logarithmic histogram. For ubiquitin (a) and the C‐terminal domain of Phl p 5a (b), experimental sPRE data for amide and aliphatic protons were used; for human prion protein (c) and the P‐type ATPase CopA (d), the input sPRE data were back‐calculated using the lowest‐energy model. In (a) and (c), the best‐scored model according to the Rosetta score is shown (see arrow in score plots), and in (b) and (d) the 10 lowest‐energy models are shown. For ubiquitin (a), a red sphere represents the position of the Cβ atom of His 68, indicating the wrong positioning of the β‐strand in the CS‐Rosetta run. A more detailed picture of the scores is shown in the Supporting Information, Figure S3. All scores are shown in arbitrary units.