Abstract
In macromolecular X-ray crystallography, typical data sets have substantial multiplicity. This can be used to calculate the consistency of repeated measurements and thereby assess data quality. Recently, the properties of a correlation coefficient, CC1/2, that can be used for this purpose were characterized and it was shown that CC1/2 has superior properties compared with 'merging' R values. A derived quantity, CC*, links data and model quality. Using experimental data sets, the behaviour of CC1/2 and the more conventional indicators were compared in two situations of practical importance: merging data sets from different crystals and selectively rejecting weak observations or (merged) unique reflections from a data set. In these situations controlled 'paired-refinement' tests show that even though discarding the weaker data leads to improvements in the merging R values, the refined models based on these data are of lower quality. These results show the folly of such data-filtering practices aimed at improving the merging R values. Interestingly, in all of these tests CC1/2 is the one data-quality indicator for which the behaviour accurately reflects which of the alternative data-handling strategies results in the best-quality refined model. Its properties in the presence of systematic error are documented and discussed.
Keywords: R value; correlation coefficient; data quality; model quality; outlier rejection
Year: 2013 PMID: 23793147 PMCID: PMC3689524 DOI: 10.1107/S0907444913001121
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
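The abstract describes CC1/2 as a measure of the consistency of repeated measurements. A minimal sketch of how such a statistic is commonly computed follows: the observations of each unique reflection are split at random into two halves, each half is averaged, and the Pearson correlation between the two sets of half-data-set means is taken. The simulated data (exponentially distributed true intensities, Gaussian noise) and the names `pearson`, `cc_half` and `simulate_cc_half` are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cc_half(observations, rng):
    """CC1/2 from a list of per-reflection observation lists.

    Each inner list holds the repeated measurements of one unique
    reflection. Reflections with fewer than two observations are skipped.
    """
    half_1, half_2 = [], []
    for obs in observations:
        if len(obs) < 2:
            continue
        shuffled = list(obs)
        rng.shuffle(shuffled)  # random assignment to the two halves
        mid = len(shuffled) // 2
        half_1.append(sum(shuffled[:mid]) / mid)
        half_2.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    return pearson(half_1, half_2)


def simulate_cc_half(n_unique=20000, multiplicity=4, sigma=2.0, seed=0):
    """Simulated data set (illustrative assumption, not the CDO data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_unique):
        true_intensity = rng.expovariate(1.0)  # signal variance = 1
        data.append([true_intensity + rng.gauss(0.0, sigma)
                     for _ in range(multiplicity)])
    return cc_half(data, rng)


if __name__ == "__main__":
    print(f"simulated CC1/2 = {simulate_cc_half():.3f}")
```

With these settings each half-data-set mean carries a noise variance of sigma²/2 = 2 on top of a signal variance of 1, so the simulated CC1/2 should scatter around 1/(1 + 2) = 1/3.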
Figure 1. Scheme documenting the relationships of correlation coefficients calculated between squared observed and calculated amplitudes. This figure was adapted from Diederichs & Karplus (2013).
Table 1. Statistics of single and merged CDO data sets
The resolution range is 50–1.57 Å; values in parentheses are for the highest shell (1.61–1.57 Å). CC statistics are only given for the highest resolution shell because the overall CC values are always close to 1 and thus are uninformative. For CC1/2, CC* and CCwork (based upon ∼2000 reflection pairs per shell) the values in the lower resolution shells are always higher than those in the highest resolution shell. This is not always true for CCfree, which owing to the smaller set of reflections (∼100 reflection pairs in each shell) has a standard error (∼0.1) that is much larger than that of CCwork (∼0.02).
| Data-set name | CDO3 | CDO4 | CDO5 | CDO3+4 | CDO3+5 | CDO4+5 | CDO3+4+5 |
|---|---|---|---|---|---|---|---|
| Data processing | | | | | | | |
| No. of observations | 201160 (10837) | 155771 (2389) | 200117 (10838) | 358117 (13657) | 401270 (21717) | 357787 (13655) | 558273 (24518) |
| No. of unique reflections | 29424 (2008) | 26807 (1316) | 27939 (1982) | 29431 (2013) | 29433 (2013) | 28195 (1995) | 29433 (2013) |
| Completeness (%) | 99.9 (98.7) | 93.8 (79.3) | 95.7 (98.0) | 99.9 (98.8) | 99.9 (98.8) | 95.7 (98.0) | 99.9 (98.8) |
| | 10.2 (294.0) | 23.1 (431.1) | 26.0 (395.6) | 15.0 (314.7) | 15.9 (332.6) | 26.0 (401.4) | 15.9 (339.8) |
| 〈I/σ(I)〉 | 16.24 (0.64) | 10.68 (0.21) | 9.88 (0.19) | 13.63 (0.69) | 14.12 (0.87) | 13.76 (0.49) | 14.60 (0.91) |
| CC1/2 in highest shell; No. of pairs | 0.208; 1986 | 0.058; 842 | 0.127; 1961 | 0.175; 2006 | 0.223; 2008 | 0.154; 1992 | 0.222; 2008 |
| CC* in highest shell | 0.587 | 0.331 | 0.475 | 0.546 | 0.603 | 0.517 | 0.602 |
| Isotropic refinement | | | | | | | |
| Highest shell CCwork, CCfree | 0.541, 0.581 | 0.256, 0.131 | 0.383, 0.425 | 0.522, 0.487 | 0.529, 0.596 | 0.432, 0.385 | 0.536, 0.526 |
| Overall Rwork, Rfree | 0.186, 0.219 | 0.211, 0.252 | 0.198, 0.236 | 0.185, 0.221 | 0.185, 0.216 | 0.199, 0.237 | 0.186, 0.221 |
| R.m.s.d. from ideality: bonds (Å)/angles (°) | 0.015/1.57 | 0.016/1.53 | 0.016/1.51 | 0.015/1.55 | 0.015/1.53 | 0.015/1.51 | 0.015/1.54 |
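The CC1/2 and CC* rows of the table can be cross-checked against each other. The sketch below assumes the commonly quoted relation CC* = sqrt(2·CC1/2 / (1 + CC1/2)), which is not stated explicitly in the text above, and applies it to the highest-shell CC1/2 values as tabulated. The function name `cc_star` and the dictionary layout are illustrative only.

```python
import math


def cc_star(cc_half):
    """CC* from CC1/2, assuming CC* = sqrt(2 * CC1/2 / (1 + CC1/2))."""
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


# Highest-shell values copied from the table: (CC1/2, CC* as tabulated).
table_values = {
    "CDO3": (0.208, 0.587),
    "CDO4": (0.058, 0.331),
    "CDO5": (0.127, 0.475),
    "CDO3+4": (0.175, 0.546),
    "CDO3+5": (0.223, 0.603),
    "CDO4+5": (0.154, 0.517),
    "CDO3+4+5": (0.222, 0.602),
}

if __name__ == "__main__":
    for name, (cc_half, tabulated) in table_values.items():
        computed = cc_star(cc_half)
        print(f"{name:9s} CC1/2 = {cc_half:.3f}  "
              f"CC* computed = {computed:.3f}  tabulated = {tabulated:.3f}")
```

Under this assumption the computed and tabulated CC* values agree to within about 0.001, which is the level expected from the three-decimal rounding of CC1/2.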
Results (Rwork, Rfree) of pairwise refinements
Within each row of the table, the same sets of reflections are used. Values in parentheses are copied from Table 1; values in bold denote improvements in Rfree of models refined against a merged data set, compared with models refined against the single data set.
| | Model refined against | | | |
|---|---|---|---|---|
| Data set | CDO3+4 | CDO3+5 | CDO4+5 | CDO3+4+5 |
| CDO3 | 0.188, 0.220 | | Not determined | 0.192, 0.221 |
| CDO4 | 0.227, 0.262 | Not determined | 0.215, 0.253 | 0.224, 0.257 |
| CDO5 | Not determined | 0.215, 0.243 | | 0.211, 0.240 |
| CDO3+4 | (0.185, 0.221) | Not determined | Not determined | 0.186, |
| CDO3+5 | Not determined | (0.185, 0.216) | Not determined | 0.186, 0.219 |
| CDO4+5 | Not determined | Not determined | (0.199, 0.237) | 0.211, 0.244 |
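The comparison that the caption describes can be sketched as a simple difference in Rfree. The example below takes the single-data-set rows of the table above and subtracts the Rfree of the model refined against that single data set. It assumes that the appropriate baseline is the overall Rfree listed for CDO3, CDO4 and CDO5 in Table 1, and it uses only the entries that are legible in the table. The names `single_r_free`, `paired_r_free` and `r_free_change` are illustrative.

```python
# Rfree of the models refined against each single data set
# (assumed baseline: the overall Rfree values listed in Table 1).
single_r_free = {"CDO3": 0.219, "CDO4": 0.252, "CDO5": 0.236}

# Rfree of models refined against merged data, evaluated against the
# reflections of a single data set: (data set, model) -> Rfree.
paired_r_free = {
    ("CDO3", "CDO3+4"): 0.220,
    ("CDO3", "CDO3+4+5"): 0.221,
    ("CDO4", "CDO3+4"): 0.262,
    ("CDO4", "CDO4+5"): 0.253,
    ("CDO4", "CDO3+4+5"): 0.257,
    ("CDO5", "CDO3+5"): 0.243,
    ("CDO5", "CDO3+4+5"): 0.240,
}


def r_free_change(data_set, model):
    """Rfree(merged-data model) minus Rfree(single-data model).

    A negative value would indicate that refining against the merged
    data improved the model as judged against this data set.
    """
    return round(paired_r_free[(data_set, model)] - single_r_free[data_set], 3)


if __name__ == "__main__":
    for data_set, model in paired_r_free:
        change = r_free_change(data_set, model)
        verdict = "improved" if change < 0 else "not improved"
        print(f"{data_set} data, {model} model: "
              f"change in Rfree = {change:+.3f} ({verdict})")
```

Only pairs with both values available are included, so the sketch says nothing about the cells that are missing from the table.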
Figure 2. Data statistics for CDO3 (blue), CDO3b (green) and CDO3c (red).
Application of the pairwise refinement technique to the data sets specified
Within each row of the table, the same sets of reflections are used to calculate Rwork and Rfree. Each model (top row) was obtained by refinement against one data set. The R values (Rwork, Rfree) of each model against the other data sets are also given. For each data set, the model that gives the best Rfree is marked in bold.
| | Model refined against | | |
|---|---|---|---|
| | CDO3 | CDO3b | CDO3c |
| CDO3 (all) | | 0.187, 0.223 | 0.187, 0.227 |
| CDO3b (positive unique) | | 0.178, 0.216 | 0.180, 0.220 |
| CDO3c (positive observations) | | 0.204, 0.235 | 0.199, 0.239 |
Figure 3. Example demonstrating the possibility of negative CC1/2 when rejecting reflections with negative intensities from a data set. The plots show ∊1 versus ∊2 for simulated data having Gaussian noise and no signal (τ = 0). (a) 1000 unique reflections, each represented by two observations; no rejections. The correlation of ∊1 and ∊2 is near zero. (b) From the 1000 unique reflections, those with negative intensity (∊1 + ∊2 < 0) were rejected. The resulting correlation between ∊1 and ∊2 is about −0.47. (c) From the 1000 unique reflections, those with negative ∊1 or negative ∊2 were rejected, also resulting in positive (merged) intensity. The resulting correlation between ∊1 and ∊2 is near zero.
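The three cases in the caption can be reproduced with a few lines of simulation. The sketch below draws pairs of independent, unit-variance Gaussian noise values (no signal) and computes the correlation after each rejection rule. The unit variance, the seed and the function names are assumptions made for illustration. For this noise model the correlation in case (b) works out analytically to −1/(π − 1) ≈ −0.467, in line with the value of about −0.47 quoted in the caption.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rejection_correlations(n_reflections=1000, seed=0):
    """Correlation of two noise-only observations under three rules.

    Returns (no rejection, merged intensity positive, both positive).
    """
    rng = random.Random(seed)
    pairs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
             for _ in range(n_reflections)]

    def correlation(selected):
        e1, e2 = zip(*selected)
        return pearson(e1, e2)

    # (a) keep everything
    cc_all = correlation(pairs)
    # (b) reject unique reflections with negative merged intensity
    cc_merged_positive = correlation(
        [(e1, e2) for e1, e2 in pairs if e1 + e2 >= 0])
    # (c) reject reflections where either observation is negative
    cc_both_positive = correlation(
        [(e1, e2) for e1, e2 in pairs if e1 > 0 and e2 > 0])
    return cc_all, cc_merged_positive, cc_both_positive


if __name__ == "__main__":
    for label, value in zip("abc", rejection_correlations()):
        print(f"({label}) correlation = {value:+.3f}")
```

With only 1000 reflections, as in the figure, the individual values scatter by a few hundredths from run to run; a larger `n_reflections` brings them closer to 0, −0.467 and 0.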
Comparison of CC* with CCwork, CCfree in the highest resolution shell (1.61–1.57 Å)
Models were refined against all of the data from CDO3, against only the positive unique reflections, or against only the positive observations. The number of unique reflections is given in parentheses.
| | Model refined against | | |
|---|---|---|---|
| | All reflections (CDO3) | Positive unique reflections only (CDO3b) | Positive observations only (CDO3c) |
| CC* | 0.587 | 0.174 | 0.477 |
| CCwork, CCfree | 0.540 (1912), 0.581 (99) | 0.580 (1308), 0.612 (73) | 0.385 (1872), 0.403 (94) |
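One way to read this table is to ask, for each data-handling strategy, whether the model agrees with the data better than CC* would suggest. The sketch below flags the columns in which CCfree exceeds CC* by more than a chosen multiple of the standard error of CCfree. It assumes the standard error of about 0.1 quoted in the Table 1 caption for roughly 100 reflection pairs; the threshold of two standard errors and all names are arbitrary choices made for illustration.

```python
# Highest-shell values copied from the table above.
highest_shell = {
    "CDO3": {"cc_star": 0.587, "cc_work": 0.540, "cc_free": 0.581},
    "CDO3b": {"cc_star": 0.174, "cc_work": 0.580, "cc_free": 0.612},
    "CDO3c": {"cc_star": 0.477, "cc_work": 0.385, "cc_free": 0.403},
}

# Approximate standard error of CCfree for ~100 reflection pairs
# (value quoted in the Table 1 caption; treated as an assumption here).
CC_FREE_STANDARD_ERROR = 0.1


def cc_free_exceeds_cc_star(stats, n_sigma=2.0):
    """True if CCfree is above CC* by more than n_sigma standard errors."""
    excess = stats["cc_free"] - stats["cc_star"]
    return excess > n_sigma * CC_FREE_STANDARD_ERROR


def flagged_data_sets(table, n_sigma=2.0):
    """Names of the data sets whose CCfree clearly exceeds CC*."""
    return [name for name, stats in table.items()
            if cc_free_exceeds_cc_star(stats, n_sigma)]


if __name__ == "__main__":
    for name, stats in highest_shell.items():
        excess = stats["cc_free"] - stats["cc_star"]
        print(f"{name:6s} CCfree - CC* = {excess:+.3f}")
    print("flagged:", flagged_data_sets(highest_shell))
```

With the tabulated values only CDO3b is flagged. Its CC* of 0.174 lies far below both CCwork and CCfree, which suggests that the CC* obtained after rejecting negative unique reflections underestimates the information in the data, in line with the bias illustrated in Figure 3.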