Kaushik S Hatti, Airlie J McCoy, Robert D Oeffner, Massimo D Sammito, Randy J Read.
Abstract
Good prior estimates of the effective root-mean-square deviation (r.m.s.d.) between the atomic coordinates of the model and the target optimize the signal in molecular replacement, thereby increasing the success rate in difficult cases. Previous studies using protein structures solved by X-ray crystallography as models showed that optimal error estimates (refined after structure solution) were correlated with the sequence identity between the model and target, and with the number of residues in the model. Here, this work has been extended to find additional correlations between parameters of the model and the target and hence improved prior estimates of the coordinate error. Using a graph database, a curated set of 6030 molecular-replacement calculations using models that had been solved by X-ray crystallography was analysed to consider about 120 model and target parameters. Improved estimates were achieved by replacing the sequence identity with the Gonnet score for sequence similarity, as well as by considering the resolution of the target structure and the MolProbity score of the model. This approach was extended by analysing 12 610 additional molecular-replacement calculations where the model was determined by NMR. The median r.m.s.d. between pairs of models in an ensemble was found to be correlated with the estimated r.m.s.d. to the target. For models solved by NMR, the overall coordinate error estimates were larger than for structures determined by X-ray crystallography, and were more highly correlated with the number of residues.
Keywords: LLG; NMR; coordinate error; log-likelihood gain; molecular replacement; r.m.s.d.; root-mean-square deviation
Year: 2020 PMID: 31909740 PMCID: PMC6939440 DOI: 10.1107/S2059798319015730
Source DB: PubMed Journal: Acta Crystallogr D Struct Biol ISSN: 2059-7983 Impact factor: 7.652
List of properties considered in the study
The sequence-similarity measures have been discussed in a previous review (Vogt et al., 1995) and the references therein. Ensemble consistency is measured as the median r.m.s.d. between pairs of models in an NMR ensemble.
| Target properties | Model properties | Sequence-similarity measures |
|---|---|---|
| | | Sequence identity, PAM250, PAM300, BLOSUM30, BLOSUM35, BLOSUM40, BLOSUM45, BLOSUM65, Benner6, Benner22, Benner74, Feng, Genetic, Gonnet, Johnson, Levin, McLach, Miyata, Rao, Risler, structure-based |
Properties specific to X-ray models.
Properties specific to NMR models.
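The ensemble-consistency measure described above (median r.m.s.d. between pairs of models in an NMR ensemble) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: it assumes the models have matched atoms and are already superposed (no fitting is done), and the function names `rmsd` and `ensemble_consistency` are hypothetical.

```python
import itertools
import math
import statistics

Coord = tuple[float, float, float]


def rmsd(model_a: list[Coord], model_b: list[Coord]) -> float:
    """Coordinate r.m.s.d. between two models with matched atoms.

    Assumes the models are already superposed; no fitting is done here.
    """
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must have the same, non-zero number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model_a, model_b))
    return math.sqrt(squared / len(model_a))


def ensemble_consistency(models: list[list[Coord]]) -> float:
    """Median r.m.s.d. over all unordered pairs of models in an ensemble."""
    if len(models) < 2:
        raise ValueError("an ensemble needs at least two models")
    pairwise = [rmsd(a, b) for a, b in itertools.combinations(models, 2)]
    return statistics.median(pairwise)


# Three one-atom "models": the pairwise r.m.s.d. values are 1, 3 and 2.
example = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
print(ensemble_consistency(example))  # 2.0
```

The median, rather than the mean, keeps a single divergent model from dominating the consistency value.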
Figure 1. Schematic representation of the graph database. Targets and models are represented as square and circular nodes, respectively, and an edge connecting two nodes represents a relationship between a target and a model. (a) Two types of edge can connect a target–model pair. (i) A unidirectional edge defines a single molecular-replacement trial in which the model was used to determine the target structure. More than one unidirectional edge exists between a target–model pair if more than one trial was carried out; the four unidirectional edges shown represent four different trials, for instance using data to different resolution limits. (ii) A bidirectional edge holds the properties associated with sequence-similarity measures. (b) Overview of a small graph database, showing the interconnections between nodes. A single PDB entry could be used to determine two different targets, in which case the properties associated with processing the model, such as the MolProbity score of the processed model, are stored as edge properties. There are also examples where a single target could be determined using multiple independent models.
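The edge semantics of Figure 1 can be mimicked with a small in-memory structure. This is a hypothetical sketch, not the graph database used in the study: the class and method names (`MRGraph`, `add_trial`, `set_similarity`), the node IDs `M1`, `M2`, `T1`, and the property keys are all made up. Trial edges are kept in a list, so one target–model pair can carry several trials, while similarity edges are keyed by an unordered pair, so each pair has at most one.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MRTrial:
    """Directed edge: one molecular-replacement trial using a model on a target."""
    model_id: str
    target_id: str
    # Per-trial properties, e.g. resolution limit or MolProbity score of the processed model.
    properties: dict = field(default_factory=dict)


class MRGraph:
    """Hypothetical in-memory stand-in for the target/model graph."""

    def __init__(self) -> None:
        self.targets: set[str] = set()
        self.models: set[str] = set()
        # A list, so one target-model pair may carry several trial edges.
        self.trials: list[MRTrial] = []
        # Keyed by an unordered pair, so each pair has at most one similarity edge.
        self.similarity: dict[frozenset, dict] = {}

    def add_trial(self, model_id: str, target_id: str, **properties) -> None:
        self.models.add(model_id)
        self.targets.add(target_id)
        self.trials.append(MRTrial(model_id, target_id, properties))

    def set_similarity(self, model_id: str, target_id: str, **scores) -> None:
        self.similarity[frozenset((model_id, target_id))] = scores

    def trials_per_pair(self) -> dict:
        return dict(Counter((t.model_id, t.target_id) for t in self.trials))


g = MRGraph()
g.add_trial("M1", "T1", resolution_limit=2.0)
g.add_trial("M1", "T1", resolution_limit=3.0)  # same pair, second trial
g.add_trial("M2", "T1", resolution_limit=2.0)  # second model for the same target
g.set_similarity("M1", "T1", sequence_identity=0.35)
print(g.trials_per_pair())  # {('M1', 'T1'): 2, ('M2', 'T1'): 1}
```

Storing processing-dependent values such as the MolProbity score on the trial edge, rather than on the model node, matches the caption's point that one PDB entry may be processed differently for different targets.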
Correlation of properties with the X-ray VRMS term
Residual correlation is the correlation between the property and the difference between the estimated VRMS, computed with either the Oeffner equation (2) or the new equation (3), and the refined VRMS.
| Property | Correlation to VRMS | Residual correlation to VRMS (Oeffner estimate) | Residual correlation to VRMS (New estimate) |
|---|---|---|---|
| No. of residues of model | 0.43 | 0.10 | 0.00 |
| Sequence identity | −0.67 (−0.33) | 0.00 | 0.00 |
| Gonnet score | −0.71 (−0.41) | −0.16 | −0.03 |
| Target resolution | 0.26 | 0.24 | 0.00 |
| | 0.16 | 0.18 | −0.02 |
| Percent α-helix | 0.20 | 0.19 | 0.10 |
| Percent β-sheet | −0.14 | −0.16 | −0.13 |
Values in parentheses: correlation for the subset of cases with <30% sequence identity.
Figure 2. R.m.s. error in the estimated VRMS as new properties are added to the prediction. Before any properties have been included ('None'), the r.m.s. error is the r.m.s. deviation of the refined VRMS values from their mean over all calculations.
Figure 3. Frequency distribution of the ratio of refined to estimated VRMS in the curated data set, as a function of SCOP class. The red line represents all cases. An ideal distribution would be Gaussian, with the lowest possible variance, and centred on 1 (black dashed line). X-ray case: the distribution for the Oeffner estimate has a shoulder, which is absent for the new X-ray estimate. NMR case: the distribution for the Oeffner estimate, which is based on X-ray data, is shifted to the right, indicating that errors are systematically underestimated when it is applied to models derived by NMR. The new estimate based on NMR data gives a symmetrical distribution centred on 1.
Correlation of properties with VRMS for the case of NMR models
Residual correlation is the correlation between the property and the difference between the estimated and refined VRMS terms.
| Property | Correlation to VRMS | Residual correlation to VRMS (Oeffner X-ray estimate) | Residual correlation to VRMS (New estimate) |
|---|---|---|---|
| No. of residues of model | 0.56 | 0.28 | 0.06 |
| Gonnet score | −0.38 | 0.40 | 0.00 |
| Target resolution | 0.28 | −0.05 | −0.01 |
| Median r.m.s.d. | 0.22 | 0.14 | 0.02 |
| | 0.11 | 0.05 | 0.00 |
| Percent α-helix | 0.23 | 0.22 | 0.00 |
| Percent β-sheet | 0.07 | 0.24 | −0.01 |
Figure 4. LLGI values calculated from the Oeffner and new estimates of VRMS, without VRMS refinement. (a) Values for X-ray models. (b) Values for NMR models. For clarity, only a limited range of LLGI values (together with the most extreme outliers) is displayed.
Figure 5. Comparison of errors for X-ray and NMR models of 150 ± 25 residues. Although the Gonnet score was used to estimate VRMS, sequence identity is plotted on the x axis for ease of comparison.