Greta Assmann, Wolfgang Brehm, Kay Diederichs.
Abstract
Advances in beamline optics, detectors and X-ray sources allow new techniques of crystallographic data collection. In serial crystallography, a large number of partial datasets from crystals of small volume are measured. Merging of datasets from different crystals in order to enhance data completeness and accuracy is only valid if the crystals are isomorphous, i.e. sufficiently similar in cell parameters, unit-cell contents and molecular structure. Identification and exclusion of non-isomorphous datasets is therefore indispensable and must be done by means of suitable indicators. To identify rogue datasets, the influence of each dataset on CC1/2 [Karplus & Diederichs (2012). Science, 336, 1030-1033], the correlation coefficient between pairs of intensities averaged in two randomly assigned subsets of observations, is evaluated. The presented method employs a precise calculation of CC1/2 that avoids the random assignment, and instead of using an overall CC1/2, an average over resolution shells is employed to obtain sensible results. The selection procedure was verified by measuring the correlation of observed (merged) intensities and intensities calculated from a model. It is found that inclusion and merging of non-isomorphous datasets may bias the refined model towards those datasets, and measures to reduce this effect are suggested.
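The abstract describes two computational steps: a CC1/2 calculation that avoids randomly splitting the observations, averaged over resolution shells, and the evaluation of each dataset's influence on that value. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation, and all function and parameter names are hypothetical. It assumes the errors of the observations are independent and uncorrelated with the true intensities. For a unique reflection h with n_h observations, merged mean ȳ_h and sample variance s_h², let σy² be the variance of ȳ_h over reflections and σε² the mean of s_h²/n_h (the error variance of a merged intensity). Each half-set mean carries twice that error variance, so the expected half-set correlation is CC1/2 ≈ (σy² − σε²)/(σy² + σε²). This closed form is our own derivation, not necessarily the exact formula used in the paper. The unweighted mean over shells and the sign convention ΔCC1/2 = CC1/2(all) − CC1/2(all but dataset i), under which a harmful dataset comes out negative as in the synthetic-data table, are also assumptions.

```python
import numpy as np


def cc_half_sigma_tau(refl, intensity, min_obs=2):
    """Closed-form CC1/2 estimate that avoids random half-set assignment.

    refl: int array, unique-reflection index of each observation.
    intensity: float array, intensity of each observation.
    Assumes independent errors, uncorrelated with the true intensities.
    """
    if refl.size == 0:
        return float("nan")
    order = np.argsort(refl, kind="stable")
    r, y = refl[order], intensity[order]
    _, start, counts = np.unique(r, return_index=True, return_counts=True)
    n = counts.astype(float)
    mean = np.add.reduceat(y, start) / n
    sumsq = np.add.reduceat(y * y, start)
    # Unbiased sample variance of the observations of each reflection.
    var_obs = (sumsq - n * mean**2) / np.maximum(n - 1.0, 1.0)
    keep = counts >= min_obs  # a sample variance needs >= 2 observations
    if keep.sum() < 2:
        return float("nan")
    mean, var_obs, n = mean[keep], var_obs[keep], n[keep]
    sigma2_y = np.var(mean, ddof=1)    # spread of the merged intensities
    sigma2_eps = np.mean(var_obs / n)  # mean error variance of a merged intensity
    return float((sigma2_y - sigma2_eps) / (sigma2_y + sigma2_eps))


def cc_half_shell_average(refl, intensity, shell_of_refl, n_shells):
    """Unweighted mean of per-shell CC1/2; shells without a valid value are skipped."""
    shell = shell_of_refl[refl]  # resolution shell of each observation
    ccs = []
    for s in range(n_shells):
        m = shell == s
        cc = cc_half_sigma_tau(refl[m], intensity[m])
        if not np.isnan(cc):
            ccs.append(cc)
    return float(np.mean(ccs)) if ccs else float("nan")


def delta_cc_half(refl, dataset, intensity, shell_of_refl, n_shells):
    """ΔCC1/2 per dataset: CC1/2(all) minus CC1/2(all except that dataset)."""
    cc_all = cc_half_shell_average(refl, intensity, shell_of_refl, n_shells)
    delta = {}
    for d in np.unique(dataset):
        m = dataset != d  # leave dataset d out
        delta[int(d)] = cc_all - cc_half_shell_average(
            refl[m], intensity[m], shell_of_refl, n_shells)
    return delta
```

On simulated data in which one dataset's intensities are replaced by unrelated values, `delta_cc_half` should give that dataset the most negative ΔCC1/2, while isomorphous datasets come out slightly positive, because leaving them out only discards good data.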
Keywords: CC1/2; isomorphism; model bias; non-isomorphism; outlier identification; precision; serial crystallography
Year: 2016 PMID: 27275144 PMCID: PMC4886987 DOI: 10.1107/S1600576716005471
Source DB: PubMed Journal: J Appl Crystallogr ISSN: 0021-8898 Impact factor: 3.304
Crystallographic statistics of experimental datasets
| | PepT | AlgE |
|---|---|---|
| PDB code | 4xni | 4xnk |
| Space group (No.) | C222₁ (20) | P2₁2₁2₁ (19) |
| Unit-cell parameters (Å) | | |
| Wavelength (Å) | 0.979180 | 1.033000 |
| No. of crystals | 159 | 266 |
| Resolution (Å) | 50–2.78 | 50–2.54 |
| Completeness (%) | 97.6 | 84.7 |
| Completeness, highest-resolution shell (%) | 67.6 (2.85–2.78 Å) | 7.6 (2.61–2.54 Å) |
| Total No. of observations | 905 207 | 151 228 |
| No. of observations per crystal (min–max, mean) | 2592–6040, 5693 | 59–507, 564 |
| No. of unique reflections | 16 485 | 18 684 |
| | 0.973 | 0.565 |
| CC1/2 | 0.992 | 0.926 |
| 〈 | 4.25 | 2.74 |
ΔCC1/2 of synthetic datasets with elongated unit-cell parameters
| Change of unit-cell parameters (Å) | ΔCC1/2 |
|---|---|
| +1.0 | −0.518 |
| +0.8 | −0.313 |
| +0.6 | −0.271 |
| +0.4 | −0.262 |
| +0.2 | +0.873 |
| +0.1 (2 datasets) | +0.785, +0.785 |
| 0.0 (4 datasets) | +0.710, +0.706, +0.698, +0.684 |
Figure 1. Histogram of ΔCC1/2 values for PepT. The outlier at −28.8σ is indicated with an arrow.
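The captions quote outliers in σ units. A single value can never lie more than (n − 1)/√n sample standard deviations from the mean of n values (Samuelson's inequality); for the 159 PepT datasets that ceiling is about 12.5, so a −28.8σ outlier implies that σ was estimated from the bulk of the distribution rather than from all values. The Python sketch below swaps in one common robust choice, the median and 1.4826 × MAD; the estimator actually used is not stated here, and `sigma_units`, `low_outliers` and the −4σ default threshold are hypothetical.

```python
import numpy as np


def sigma_units(values):
    """Express each value as (value - median) / sigma, with sigma from the MAD.

    1.4826 * MAD estimates the standard deviation of a Gaussian and is barely
    affected by a few extreme values (a swapped-in robust choice).
    """
    v = np.asarray(values, dtype=float)
    centre = np.median(v)
    sigma = 1.4826 * np.median(np.abs(v - centre))
    if sigma == 0:
        raise ValueError("MAD is zero; sigma units are undefined")
    return (v - centre) / sigma


def low_outliers(values, threshold=-4.0):
    """Indices of values below `threshold` sigma units (threshold is hypothetical)."""
    z = sigma_units(values)
    return np.flatnonzero(z < threshold), z
```

For the values 1, 2, 3, 4, 5 and −100, `low_outliers` flags only the last one, at about −46σ. Measured against the plain sample standard deviation of all six values, the same point sits at only about −2.0σ, just under the ceiling of 5/√6 ≈ 2.04.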
Figure 2. Plot of ΔCCFOC against ΔCC1/2 for PepT. The −28.8σ outlier (ΔCC1/2 ≃ −4.8 × 10⁻⁴) is boxed.
Figure 3. Histogram of ΔCC1/2 values for AlgE. The outlier at −14.8σ is indicated with an arrow.
Figure 4. Plot of ΔCCFOC against ΔCC1/2 for AlgE. Different colours and marker symbols refer to different random shifts of the atom coordinates. Arrows indicate how ΔCCFOC changes as the magnitude of the random shifts increases, for the three most significant outliers from the Gaussian distribution of Fig. 3.
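The abstract verifies the selection by correlating observed (merged) intensities with intensities calculated from a model, and the figures plot a per-dataset ΔCCFOC. The Python sketch below takes CCFOC to be the Pearson correlation between merged observed intensities and precomputed model intensities `i_calc`. Unweighted merging, an overall (not shell-averaged) correlation, the leave-one-out sign convention mirroring ΔCC1/2, and all function names are assumptions; computing `i_calc` from a model, and the random coordinate shifts of Fig. 4, are outside the scope of this sketch.

```python
import numpy as np


def merge(refl, intensity, n_refl):
    """Unweighted mean intensity of each unique reflection (NaN if unobserved)."""
    sums = np.bincount(refl, weights=intensity, minlength=n_refl)
    counts = np.bincount(refl, minlength=n_refl)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


def cc_foc(refl, intensity, i_calc):
    """Pearson correlation of merged observed vs model-calculated intensities."""
    merged = merge(refl, intensity, i_calc.size)
    ok = ~np.isnan(merged)  # skip reflections with no observations
    return float(np.corrcoef(merged[ok], i_calc[ok])[0, 1])


def delta_cc_foc(refl, dataset, intensity, i_calc):
    """ΔCCFOC per dataset: CCFOC(all) minus CCFOC(all except that dataset)."""
    cc_all = cc_foc(refl, intensity, i_calc)
    delta = {}
    for d in np.unique(dataset):
        m = dataset != d  # leave dataset d out
        delta[int(d)] = cc_all - cc_foc(refl[m], intensity[m], i_calc)
    return delta
```

Plotting the output of `delta_cc_foc` against per-dataset ΔCC1/2 values gives the kind of comparison shown in Figs. 2 and 4; under this convention, a dataset that degrades agreement with the model comes out negative on both axes.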