
An introduction to data reduction: space-group determination, scaling and intensity statistics.

Philip R Evans

Abstract

This paper presents an overview of how to run the CCP4 programs for data reduction (SCALA, POINTLESS and CTRUNCATE) through the CCP4 graphical interface ccp4i and points out some issues that need to be considered, together with a few examples. It covers determination of the point-group symmetry of the diffraction data (the Laue group), which is required for the subsequent scaling step, examination of systematic absences, which in many cases will allow inference of the space group, putting multiple data sets on a common indexing system when there are alternatives, the scaling step itself, which produces a large set of data-quality indicators, estimation of |F| from intensity and finally examination of intensity statistics to detect crystal pathologies such as twinning. An appendix outlines the scoring schemes used by the program POINTLESS to assign probabilities to possible Laue and space groups.

Year:  2011        PMID: 21460446      PMCID: PMC3069743          DOI: 10.1107/S090744491003982X

Source DB:  PubMed          Journal:  Acta Crystallogr D Biol Crystallogr        ISSN: 0907-4449


Introduction

Estimates of integrated intensities from X-ray diffraction images are not generally suitable for immediate use in structure determination. Theoretically, the measured intensity I_h of a reflection h is proportional to the square of the underlying structure factor |F_h|², which is the quantity that we want, with an associated measurement error, but systematic effects of the diffraction experiment break this proportionality. Such systematic effects include changes in the beam intensity, changes in the exposed volume of the crystal, radiation damage, bad areas of the detector and physical obstruction of the detector (e.g. by the backstop or cryostream). If data from different crystals (or different sweeps of the same crystal) are being merged, corrections must also be applied for changes in exposure time and rotation rate. In order to infer |F_h|² from I_h, we need to put the measured intensities on the same scale by modelling the experiment and inverting its effects. This is generally performed in a scaling process that makes the data internally consistent by adjusting the scaling model to minimize the difference between symmetry-related observations. This process requires us to know the point-group symmetry of the diffraction pattern, so we need to determine this symmetry prior to scaling. The scaling process produces an estimate of the intensity of each unique reflection by averaging over all of the corrected intensities, together with an estimate of its error σ(I_h). The final stage in data reduction is estimation of the structure amplitude |F_h| from the intensity, which is approximately I_h^(1/2) (but with a skewing factor for intensities that are below or close to background noise, e.g. ‘negative’ intensities); at the same time, the intensity statistics can be examined to detect pathologies such as twinning.
This paper presents a brief overview of how to run CCP4 programs for data reduction through the CCP4 graphical interface ccp4i and points out some issues that need to be considered. No attempt is made to be comprehensive or to provide full references for everything. Automated pipelines such as xia2 (Winter, 2010) are often useful and generally work well, but in difficult cases finer control is sometimes needed. In the current version of ccp4i (CCP4 release 6.1.3) the ‘Data Reduction’ module contains two major relevant tasks: ‘Find or Match Laue Group’, which determines the crystal symmetry, and ‘Scale and Merge Intensities’, which outputs a file containing averaged structure amplitudes. Future GUI versions may combine these steps into a simplified interface. Much of the advice given here is also present in the CCP4 wiki (http://www.ccp4wiki.org/).

Space-group determination

The true space group is only a hypothesis until the structure has been solved, since it can be hard to distinguish between exact crystallographic symmetry and approximate noncrystallographic symmetry. However, it is useful to find the likely symmetry early on in the structure-determination pipeline, since it is required for scaling and indeed may affect the data-collection strategy. The program POINTLESS (Evans, 2006) examines the symmetry of the diffraction pattern and scores the possible crystallographic symmetry. Indexing in the integration program (e.g. MOSFLM) only indicates the lattice symmetry, i.e. the geometry of the lattice giving constraints on the cell dimensions (e.g. α = β = γ = 90° for an orthorhombic lattice), but such relationships can arise accidentally and may not reflect the true symmetry. For example, a primitive hexagonal lattice may belong to point groups 3, 321, 312, 6, 622 or indeed lower symmetry (C222, 2 or 1). A rotational axis of symmetry produces identical true intensities for reflections related by that axis, so examination of the observed symmetry in the diffraction pattern allows us to determine the likely point group and hence the Laue group (a point group with added Friedel symmetry) and the Patterson group (with any lattice centring). Note that the Patterson group is labelled ‘Laue group’ in the output from POINTLESS. Translational symmetry operators that define the space group (e.g. the distinction between a pure dyad and a screw dyad) are only visible in the observed diffraction pattern as systematic absences, along the principal axes for screws. These are less reliable indicators, since there are relatively few axial reflections in a full three-dimensional data set and some of these may be unrecorded. The protocol for determination of space group in POINTLESS is as follows.
(i) From the unit-cell dimensions and lattice centring, find the highest compatible lattice symmetry within some tolerance, ignoring any input symmetry information.
(ii) Score each potential rotational symmetry element belonging to the lattice symmetry using all pairs of observations related by that element.
(iii) Score combinations of symmetry elements for all possible subgroups of the lattice-symmetry group (Laue or Patterson groups).
(iv) Score possible space groups from axial systematic absences (the space group is not needed for scaling but is required later for structure solution).
(v) Scores for rotational symmetry operations are based on correlation coefficients rather than R factors, since they are less dependent on the unknown scales. A probability is estimated from the correlation coefficient, using equivalent-size samples of unrelated observations to estimate the width of the probability distribution (see Appendix A).
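Step (ii) of the protocol above can be illustrated with a minimal sketch. It is not the POINTLESS implementation: the function names, the dictionary layout and the toy intensities are assumptions made for illustration, Friedel symmetry and the probability estimate of Appendix A are ignored, and the intensities are assumed to be already normalized.

```python
from math import sqrt

def pearson_cc(xs, ys):
    """Linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def score_symmetry_element(intensities, operator):
    """Correlate all pairs of reflections related by a candidate operator.

    intensities: dict {(h, k, l): normalized intensity} (hypothetical layout)
    operator:    function mapping an (h, k, l) tuple to its symmetry mate
    Returns (CC, number of pairs); CC is None with fewer than two pairs.
    """
    first, second, seen = [], [], set()
    for hkl, value in intensities.items():
        mate = operator(hkl)
        if mate == hkl or mate not in intensities:
            continue
        pair = frozenset((hkl, mate))
        if pair in seen:          # count each related pair only once
            continue
        seen.add(pair)
        first.append(value)
        second.append(intensities[mate])
    if len(first) < 2:
        return None, len(first)
    return pearson_cc(first, second), len(first)

# Toy data obeying a twofold along l: I(h, k, l) ~ I(-h, -k, l)
toy = {(1, 2, 3): 10.0, (-1, -2, 3): 11.0,
       (2, 1, 4): 50.0, (-2, -1, 4): 48.0,
       (3, 1, 1): 5.0, (-3, -1, 1): 6.0}
twofold_l = lambda hkl: (-hkl[0], -hkl[1], hkl[2])
cc, n_pairs = score_symmetry_element(toy, twofold_l)
```

A correlation coefficient close to 1 over many pairs supports the presence of the element, whereas a value near 0 argues against it. The sketch does not guard against zero variance in either list.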

A simple example

POINTLESS may be run from the ‘Data Reduction’ module of ccp4i with the task ‘Find or Match Laue Group’ or from the ‘QuickSymm’ option of the iMOSFLM interface (Battye et al., 2011). Unless the space group is known from previous crystals, the appropriate major option is ‘Determine Laue group’. To use this, fill in the boxes for the title, the input and output file names and the project, crystal and data-set names (if not already set in MOSFLM). Table 1 shows the results for a straightforward example in space group P2₁2₁2₁. Table 1(a) shows the scores for the three possible dyad axes in the orthorhombic lattice, all of which are clearly present. Combining these (Table 1b) shows that the Laue group is mmm with a primitive lattice, Patterson group Pmmm. Fourier analysis of systematic absences along the three principal axes shows that all three have alternating strong (even) and weak (odd) intensities (Fig. 1 and Table 1c), so are likely to be screw axes, implying that the space group is P2₁2₁2₁. However, there are only three h00 reflections recorded along the a* axis, so confidence in the space-group assignment is not as high as the confidence in the Laue-group assignment (Table 1d). With so few observations along this axis, it is impossible to be confident that P2₁2₁2₁ is the true space group rather than P22₁2₁.
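The Fourier analysis of axial reflections can be sketched as a one-dimensional cosine sum evaluated at half the cell and divided by its value at the origin. This is an illustrative simplification rather than the POINTLESS code: the function name is hypothetical, and whether the input values should be I, I/σ(I) or the neighbour-corrected I′ is left to the caller.

```python
from math import cos, pi

def axial_fourier_peak(indices, values, fraction=0.5):
    """Relative height of the 1D Fourier sum at x = fraction of the cell.

    indices: axial indices (e.g. k for 0k0 reflections)
    values:  corresponding intensities (or I/sigma), assumed to sum to > 0
    A 2(1) screw (odd reflections absent) gives a value close to 1 at
    fraction = 0.5; no screw gives a value close to 0.
    """
    origin = sum(values)
    peak = sum(v * cos(2 * pi * idx * fraction)
               for idx, v in zip(indices, values))
    return peak / origin

# Hypothetical 0k0 data: odd k absent, even k present
screw_like = axial_fourier_peak([1, 2, 3, 4], [0.0, 10.0, 0.0, 12.0])
# Hypothetical data with no alternation
no_screw = axial_fourier_peak([1, 2, 3, 4], [5.0, 5.0, 5.0, 5.0])
```

With only a handful of axial reflections, as for h00 in the example above, the ratio can approach 1 by chance, which is why the associated probability is lower.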
Figure 1

Plots from POINTLESS of axial reflections for the P2₁2₁2₁ example shown in Table 1: (a) h00, (b) 0k0, (c) 00l. In each case I/σ(I) alternates between weak and strong for odd and even indices, respectively, indicating a 2₁ screw axis in each direction. With only three observations along the h00 axis, assignment of a screw along a is far less certain than along b and c (see Table 1c). The plot of I′/σ(I) (almost the same in this case) uses a modified value of I, subtracting 2% of the neighbouring axial reflection to allow for possible contamination of weak reflections by a strong neighbour. All panels in Figs. 1–5 are monochrome versions of plots from LOGGRAPH essentially as they appear from ccp4i.

A pseudo-cubic example

Table 2 shows the scores for individual symmetry elements for a pseudo-cubic case with a ≃ b ≃ c. It is clear that only the orthorhombic symmetry elements are present: these are the high-scoring elements marked ‘***’. Neither the fourfolds characteristic of tetragonal groups nor the body-diagonal threefolds (along 111 etc.) characteristic of cubic groups are present. The joint probability score for the Laue group Pmmm is 0.989. The suggested solution (not shown) interchanges k and l to make a < b < c, which is the IUCr standard convention for a primitive orthorhombic cell (Mighell, 2002). Scoring the possible symmetry elements separately may allow the program and the user to distinguish between true crystallographic symmetry and pseudo-symmetry (i.e. a noncrystallographic rotation close to a potential crystallographic rotation), although either the program or the user may be fooled by twinning or by pseudo-symmetry that is very close to crystallographic. If the data were integrated with cell constraints from a higher symmetry than is present, integration should be repeated with the looser cell constraints for the correct symmetry class.
Table 2

Scores for potential individual symmetry operators for a pseudo-cubic example

Items are as in Table 1. The unit-cell parameters are a = 79.15, b = 81.33, c = 81.15 Å, α = β = γ = 90°, i.e. a ≃ b ≃ c. Only the orthorhombic symmetry operators are present (marked ***) and the true space group is P2₁2₁2₁.

Likelihood  Z-CC    CC      No.    Rmeas       Symmetry               Operator
0.952        9.68    0.97   14733  0.074       Identity
0.943        9.50    0.95   12928  0.163  ***  Twofold l (0 0 1)      {−h, −k, l}
0.948        9.59    0.96   12542  0.098  ***  Twofold k (0 1 0)      {−h, k, −l}
0.944        9.52    0.95   17039  0.140  ***  Twofold h (1 0 0)      {h, −k, −l}
0.051        0.55    0.05   13921  0.689       Twofold (1 −1 0)       {−k, −h, −l}
0.057        0.12    0.01   16647  0.734       Twofold (0 1 −1)       {−h, −l, −k}
0.069        2.87    0.29   10540  0.470       Twofold (1 0 −1)       {−l, −k, −h}
0.051        0.62    0.06   12229  0.690       Twofold (1 1 0)        {k, h, −l}
0.065        2.68    0.27   12829  0.484       Twofold (1 0 1)        {l, −k, h}
0.058        0.10    0.01   17477  0.736       Twofold (0 1 1)        {−h, l, k}
0.059        0.06    0.01   24869  0.824       Threefold (1 −1 −1)    {−k, l, −h} {−l, −h, k}
0.059        0.04    0.00   27024  0.814       Threefold (1 1 −1)     {−l, h, −k} {k, −l, −h}
0.058        0.08    0.01   22508  0.782       Threefold (1 −1 1)     {l, −h, −k} {−k, −l, h}
0.060        0.02    0.00   23818  0.824       Threefold (1 1 1)      {k, l, h} {l, h, k}
0.051        0.58    0.06   25338  0.635       Fourfold l (0 0 1)     {−k, h, l} {k, −h, l}
0.062        2.49    0.25   23516  0.476       Fourfold k (0 1 0)     {l, k, −h} {−l, k, h}
0.065       −0.15   −0.02   26383  0.739       Fourfold h (1 0 0)     {h, l, −k} {h, −l, k}

Alternative indexing

If the true point group has lower symmetry than the lattice group, alternative valid but non-equivalent indexing schemes are possible, related by symmetry operators that are present in the lattice group but not in the point group (note that these are also the cases in which merohedral twinning is possible). For example, in space group P3 (or P3₁) there are four different schemes: (h, k, l), (−h, −k, l), (k, h, −l) or (−k, −h, −l). Alternative indexing ambiguities may also arise from special relationships between unit-cell parameters (e.g. a = b in an orthorhombic system). For the first crystal (or part data set) any indexing scheme may be chosen, but for subsequent ones autoindexing will randomly pick one setting, which may be inconsistent with the original choice. POINTLESS can compare a new test data set with a previously processed reference data set (from a merged or unmerged file) and choose the most consistent option (option ‘Match index to reference’ in ccp4i). In this option, the space group in the reference file is assumed to be correct.
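The idea of matching a test data set to a reference can be sketched as follows for the four P3 schemes listed above. This is only an illustration under stated assumptions: the names and data layout are hypothetical, both data sets are assumed to be on comparable scales and to hold one value per index, and the real program handles symmetry equivalents and scoring in more detail.

```python
from math import sqrt

# The four alternative indexing schemes quoted in the text for P3
REINDEX_OPS = {
    "h,k,l": lambda h, k, l: (h, k, l),
    "-h,-k,l": lambda h, k, l: (-h, -k, l),
    "k,h,-l": lambda h, k, l: (k, h, -l),
    "-k,-h,-l": lambda h, k, l: (-k, -h, -l),
}

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def best_reindexing(reference, test, min_pairs=3):
    """Return (best operator name, {name: CC}) for test against reference.

    reference, test: dict {(h, k, l): intensity} (hypothetical layout).
    Operators with fewer than min_pairs common reflections are skipped.
    """
    scores = {}
    for name, op in REINDEX_OPS.items():
        ref_vals, test_vals = [], []
        for hkl, value in test.items():
            new_hkl = op(*hkl)
            if new_hkl in reference:
                ref_vals.append(reference[new_hkl])
                test_vals.append(value)
        if len(ref_vals) >= min_pairs:
            scores[name] = correlation(ref_vals, test_vals)
    return max(scores, key=scores.get), scores

# Toy reference, and a test set indexed in the (k, h, -l) setting
reference = {(1, 2, 3): 100.0, (2, 5, 1): 20.0, (3, 1, 4): 60.0,
             (4, 2, 2): 5.0, (1, 3, 5): 80.0}
test_set = {(k, h, -l): v for (h, k, l), v in reference.items()}
best, scores = best_reindexing(reference, test_set)
```

The operator with the highest correlation is then applied to the test data set before merging it with the reference.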

Combining multiple files and multiple wavelengths

Multiple files, e.g. from multiple runs of MOSFLM, can be combined in POINTLESS using the ‘Add file’ button in ccp4i. They may be combined into a single data set with the same Project, Crystal and Dataset names (button ‘Assign to the same data set as the previous file’) or assigned to different data sets in the case of multiple-wavelength data. Note that the data-set name is used in downstream programs to label columns in the MTZ file, so should be short. Batch numbers are automatically incremented by a multiple of 1000 if necessary to make them unique across all files. If alternative indexing schemes are possible in the lattice group determined from the cell dimensions, then second and subsequent files are compared with the previous ones in the same way as if a reference file were given. Note that if the Laue group symmetry of the first file is wrong this may lead to wrong answers in some cases, so there is an option to determine the Laue symmetry of the first file before reading the rest.

Scaling

Scaling tries to make symmetry-related and duplicate measurements of a reflection equal by modelling the diffraction experiment, principally as a function of the incident and diffracted beam directions in the crystal (Hamilton et al., 1965; Fox & Holmes, 1966; Kabsch, 1988, 2010; Otwinowski et al., 2003; Evans, 2006). This makes the data internally consistent, assuming that the correct Laue group has been determined. After scaling, the remaining differences between observations can be analysed to give an indication of data quality, though not necessarily of its absolute correctness. In the ccp4i interface, the task ‘Scale and Merge Intensities’ runs SCALA to scale and merge the multiple observations of the same unique reflection, followed by CTRUNCATE to infer |F| from the intensity I and optionally generate or copy a test set of reflections for Rfree. The input file may be the output of POINTLESS. The ccp4i task presents a large number of options, but in most cases the defaults are suitable. If you know that you have a significant anomalous scatterer in the crystal, then the option to ‘Separate anomalous pairs for merging statistics’ should be selected, since this allows for real differences between Bijvoet-related reflections hkl and −h −k −l (very small anomalous differences are probably treated better without this option). Other useful options, after the first run, include setting the high-resolution limit (after deciding on the ‘true’ resolution, see below) and excluding some batches or batch ranges (in the ‘Excluded Data’ tab).

Measures of internal consistency

The traditional measure of internal consistency is Rmerge (also known as Rsym), which is defined as

$$R_{\mathrm{merge}} = \frac{\sum_h \sum_l |I_{hl} - \langle I_h \rangle|}{\sum_h \sum_l \langle I_h \rangle}$$

(i.e. summed over all observations l of reflection h), but this has the disadvantage that it increases with the data multiplicity, even though the merged data are improved by averaging more observations. An improvement is the multiplicity-weighted Rmeas or Rr.i.m. (Diederichs & Karplus, 1997; Weiss & Hilgenfeld, 1997; Weiss, 2001), which is defined as

$$R_{\mathrm{meas}} = \frac{\sum_h \left[ n_h / (n_h - 1) \right]^{1/2} \sum_l |I_{hl} - \langle I_h \rangle|}{\sum_h \sum_l \langle I_h \rangle}$$

where n_h is the number of observations of reflection h [note that in Evans (2006) the square root was incorrectly omitted]. A related measure is the precision-indicating R factor, which estimates the data quality after merging,

$$R_{\mathrm{p.i.m.}} = \frac{\sum_h \left[ 1 / (n_h - 1) \right]^{1/2} \sum_l |I_{hl} - \langle I_h \rangle|}{\sum_h \sum_l \langle I_h \rangle}$$

After scaling, SCALA outputs a large number of statistics, mostly presented as graphs, and a final summary table which contains most of the data required for the traditional ‘Table 1’ (or perhaps Table S1) in a structural paper. Analyses against ‘batch number’, i.e. image number or time, are useful to check for the effects of radiation damage and for bad batches (e.g. blank images) or bad regions (Fig. 2). Individual blank or bad images can be rejected in SCALA (see Figs. 2g and 2h), but if there are bad regions it may be best to check the integration process carefully. Decisions on where to cut back data to a point where radiation damage is tolerable, or how best to combine data from different crystals or sweeps, are more complicated. Tools to explore the best compromise between damage and completeness are not yet well developed, although the program CHEF (Winter, 2009) used in xia2 provides a guide.
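The standard definitions of Rmerge, Rmeas and the precision-indicating R factor can be made concrete with a short sketch. The function name and input layout are hypothetical, an unweighted mean is used for 〈I_h〉 (a scaling program may use a weighted mean), and reflections measured only once are skipped because they carry no consistency information.

```python
from math import sqrt

def merging_r_factors(observations):
    """Return (Rmerge, Rmeas, Rpim) from scaled, unmerged intensities.

    observations: dict {unique hkl: [I_1, I_2, ...]} (hypothetical layout)
    """
    num_merge = num_meas = num_pim = denominator = 0.0
    for values in observations.values():
        n = len(values)
        if n < 2:
            continue                      # a single observation has no spread
        mean = sum(values) / n            # unweighted <I_h> (assumption)
        deviation = sum(abs(v - mean) for v in values)
        num_merge += deviation
        num_meas += sqrt(n / (n - 1)) * deviation
        num_pim += sqrt(1.0 / (n - 1)) * deviation
        denominator += n * mean           # sum over l of <I_h>
    return (num_merge / denominator,
            num_meas / denominator,
            num_pim / denominator)

# Two hypothetical reflections, each measured twice
r_merge, r_meas, r_pim = merging_r_factors(
    {(1, 0, 0): [90.0, 110.0], (0, 1, 0): [50.0, 50.0]})
```

For a multiplicity of two, Rmeas is larger than Rmerge by a factor of √2 and the precision-indicating R factor equals Rmerge; at higher multiplicity the precision-indicating value falls below the other two, reflecting the gain from averaging.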
Figure 2

Plots from SCALA against ‘batch’ (image) number (a–c) for a good case with little radiation damage (see text) and (d–f) for a case with two crystals both suffering radiation damage. (a, d) Mean scale [Mn(k)] and scale at θ = 0° (0k); these diverge if the relative B factor is large. (b, e) Relative B factor in the scaling; a large and declining negative value (e) indicates progressive radiation damage. (c, f) Rmerge is roughly constant in the good case (c) but increases with radiation damage (f). (g) A plot of Rmerge against batch shows a single outlier arising from a weak or blank image: omitting this batch (h) removes this problem.

Analyses against resolution suggest whether a resolution cutoff should be applied. The decision on the ‘real’ resolution is not easy: ideally, we would determine the point at which adding the next shell of data is not adding any statistically significant information. The best cutoff point may depend on what the data are to be used for: experimental phasing techniques work on amplitude differences, which are less accurate than the amplitudes themselves. Useful guidelines are the point at which 〈〈I〉/σ(〈I〉)〉 [after merging and adjusting the σ(I) estimates] falls below about 2, where 〈I/σ(I)〉 (before merging) falls below about 1, where the correlation coefficient between random half-data-set estimates of 〈I〉 falls below about 0.5 or where 〈I〉 flattens out with respect to resolution; Rmerge is not a very useful criterion. Fig. 3 shows an example in which the cutoff was set to 3.2 Å using a combination of these criteria. If the data are severely anisotropic then these limits may be relaxed to keep useful data in the best direction.
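Two of the guidelines above can be combined in a small helper that walks through resolution shells from low to high resolution. The function, its thresholds as defaults and the shell values are all illustrative assumptions; requiring both criteria to hold is one possible choice, not a rule from the text.

```python
def suggest_cutoff(shells, min_i_over_sigma=2.0, min_half_cc=0.5):
    """Return the d_min of the last shell that satisfies both guidelines.

    shells: list of (d_min in Angstrom, <<I>/sigma(<I>)> after merging,
            half-data-set correlation coefficient), ordered from low to
            high resolution (hypothetical layout).
    Returns None if even the first shell fails.
    """
    cutoff = None
    for d_min, i_over_sigma, half_cc in shells:
        if i_over_sigma < min_i_over_sigma or half_cc < min_half_cc:
            break                         # stop at the first failing shell
        cutoff = d_min
    return cutoff

# Hypothetical shell statistics
shells = [(6.0, 25.0, 0.99), (4.0, 10.0, 0.95),
          (3.2, 2.4, 0.62), (3.0, 1.3, 0.40)]
limit = suggest_cutoff(shells)
```

Such a number is only a starting point: the plots themselves, the intended use of the data and any anisotropy should still be inspected.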
Figure 3

Plots from SCALA against resolution. A suitable resolution cutoff may be estimated from a plot of 〈〈I〉/σ(I)〉, i.e. after averaging, where it falls below ∼2 or flattens out [top line in (a)] or from the correlation coefficient between 〈I〉 for random halves of the observations.

Analyses of consistency against intensity are not generally useful, since the statistics will always be worse for weak data; however, Rmerge in the top intensity bin should be small. Analysis against intensity is useful in improving estimates of σ(I); see Appendix B.

Completeness

Data completeness is important, preferably in all resolution shells, although it may be less important at the outer edge. James Holton (Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, California, USA) has produced a series of instructive movies (http://ucxray.berkeley.edu/~jamesh/movies/) showing the degradation of map quality with systematic incompleteness, such as missing a wedge of data from an incomplete rotation range or losing the strongest reflections as detector overloads. Random incompleteness (e.g. from omitting an Rfree test set), on the other hand, has little effect on maps. The data-collection strategy should always aim to collect a complete set of data. Plots against resolution from SCALA may show incompleteness at low resolution owing to detector overloads (Fig. 4a), at high resolution owing to integrating into the corners of a square detector (Fig. 4b) or incompleteness of the anomalous data (Fig. 4c), which will limit the quality of experimental phasing. Fig. 4(d) shows a plot of cumulative completeness against batch number in an 84° sweep: note that 100% completeness is not reached until the end and that the anomalous completeness lags behind the total completeness by an amount that depends on the symmetry. This plot is not yet implemented in SCALA, but when it is it may help in judging the trade-off between completeness and radiation damage.
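A cumulative-completeness curve of the kind described for Fig. 4(d) can be sketched as below. The function name and input layout are hypothetical, and the indices are assumed to have been reduced to the asymmetric unit beforehand, with the expected number of unique reflections supplied by the caller.

```python
def cumulative_completeness(observations, n_unique_expected):
    """Fraction of unique reflections measured up to and including each batch.

    observations: iterable of (batch number, unique hkl tuple)
    n_unique_expected: number of unique reflections possible in the
                       resolution range (supplied by the caller)
    Returns a dict {batch: completeness so far}.
    """
    seen = set()
    curve = {}
    for batch, hkl in sorted(observations):
        seen.add(hkl)
        curve[batch] = len(seen) / n_unique_expected
    return curve

# Hypothetical observations over three batches, four unique reflections
curve = cumulative_completeness(
    [(1, (1, 0, 0)), (1, (0, 1, 0)), (2, (1, 0, 0)),
     (2, (0, 0, 1)), (3, (1, 1, 0))],
    n_unique_expected=4)
```

Plotting such a curve alongside a radiation-damage indicator shows how much completeness would be lost by discarding the later batches.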
Figure 4

Plots of data completeness against resolution and batch. (a) Incompleteness at low resolution owing to detector overloads. (b) Incompleteness at high resolution owing to integrating into the corners of a square detector. (c) Incompleteness of anomalous data. (d) Cumulative completeness against batch (plot not yet available in SCALA).

Outliers

Most data sets contain a small proportion of measurements that are just ‘wrong’ (from which no useful information about the true intensity can be extracted). These arise from various causes, notably diffraction from ice crystals or superfluous protein crystal lattices (crystal clusters) that superimposes on a few (or, in bad cases, many) of the reflections from the crystal of interest. Detection of these intensity outliers is reasonably reliable if the multiplicity is high, but is not possible if there are only one or two observations (if two disagree, which one is correct?). This is a good reason for collecting high-multiplicity data. If SCALA is told that there are anomalous differences then the outlier check for discrepancies between Bijvoet-related reflections I⁺ and I⁻ uses a larger tolerance than that used within the I⁺ or I⁻ sets, depending (rather crudely) on the average size of the anomalous differences. The outlier-rejection algorithm assumes that the majority of symmetry-related observations of a reflection are correct: this may fail for reflections behind the backstop, so it is important that the backstop shadow should be identified properly in MOSFLM. SCALA produces a plot of outliers in their position on the detector (ROGUEPLOT file), which may show outliers clustered around the ice rings or around the backstop, in which case these regions of the detector should be masked out in MOSFLM. There is also a list of outliers in the ROGUES file, which may be useful for understanding the rejects. The rejection limits are set as multiples of the standard deviations and can be altered by the user. When trying to use a weak anomalous signal it may be useful to reduce the limits and eliminate more outliers.
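The principle of rejecting an observation that disagrees with the majority, using a limit expressed in standard deviations, can be sketched as follows. This is not the SCALA algorithm: the function name, the default limit and the leave-one-out weighted mean are assumptions chosen to keep the illustration short.

```python
from math import sqrt

def find_outliers(values, sigmas, limit=6.0):
    """Return indices of observations rejected as outliers.

    Each observation is compared with the weighted mean of the others;
    the worst one is removed if it deviates by more than `limit` combined
    standard deviations, and the test is repeated. With two or fewer
    observations nothing is rejected, since the correct one is unknowable.
    """
    kept = list(range(len(values)))
    rejected = []
    while len(kept) > 2:
        worst, worst_dev = None, 0.0
        for i in kept:
            others = [j for j in kept if j != i]
            weights = [1.0 / sigmas[j] ** 2 for j in others]
            mean = sum(w * values[j] for w, j in zip(weights, others)) / sum(weights)
            variance = sigmas[i] ** 2 + 1.0 / sum(weights)
            dev = abs(values[i] - mean) / sqrt(variance)
            if dev > worst_dev:
                worst, worst_dev = i, dev
        if worst_dev <= limit:
            break
        kept.remove(worst)
        rejected.append(worst)
    return rejected

# Five hypothetical observations of one reflection, the last one corrupted
bad = find_outliers([100.0, 102.0, 98.0, 101.0, 300.0], [5.0] * 5)
```

Lowering `limit` rejects more observations, which is the trade-off mentioned above for weak anomalous signals.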

Detecting anomalous signals

A data set contains measurements of reflections from both Bijvoet pairs I⁺(h k l) and I⁻(−h −k −l), which will be systematically different if there is anomalous scattering. Fig. 5 shows some statistics from SCALA for a case with a very strong anomalous signal and for one with a weak but still useful signal. Figs. 5(a) and 5(e) show normal probability plots (Howell & Smith, 1992) of ΔI_anom/σ(ΔI_anom), where ΔI_anom = I⁺ − I⁻ is the Bijvoet difference: the central slope of this plot will be >1 if the anomalous differences are on average greater than their error. Another way of detecting a significant anomalous signal is to compare the two estimates of ΔI_anom from random half data sets, ΔI₁ and ΔI₂ (provided there are at least two measurements of each, i.e. a multiplicity of roughly 4). Figs. 5(b) and 5(f) show the correlation coefficient between ΔI₁ and ΔI₂ as a function of resolution: Fig. 5(f) shows little statistical significance beyond about 4.5 Å resolution. Figs. 5(c) and 5(g) show scatter plots of ΔI₁ against ΔI₂: this plot is elongated along the diagonal if there is a large anomalous signal and this can be quantified as the ‘r.m.s. correlation ratio’, which is defined as (root-mean-square deviation along the diagonal)/(root-mean-square deviation perpendicular to the diagonal) and is shown as a function of resolution in Figs. 5(d) and 5(h). The plots against resolution give a suggestion of where the data might be cut for substructure determination, but it is important to note that useful albeit weak phase information extends well beyond the point at which these statistics show a significant signal.
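The half-data-set correlation and the r.m.s. correlation ratio can be computed from paired Bijvoet differences as sketched below. The function name is hypothetical, and the deviations along and across the diagonal are taken about the origin, which assumes that the differences are centred on zero.

```python
from math import sqrt

def anomalous_half_set_stats(delta1, delta2):
    """Return (correlation coefficient, r.m.s. correlation ratio).

    delta1, delta2: Bijvoet differences for the same reflections
                    estimated from two random half data sets
    """
    n = len(delta1)
    m1, m2 = sum(delta1) / n, sum(delta2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(delta1, delta2))
    var1 = sum((a - m1) ** 2 for a in delta1)
    var2 = sum((b - m2) ** 2 for b in delta2)
    cc = cov / sqrt(var1 * var2)

    # Rotate the scatter plot by 45 degrees: coordinates along and
    # perpendicular to the diagonal
    along = [(a + b) / sqrt(2) for a, b in zip(delta1, delta2)]
    across = [(a - b) / sqrt(2) for a, b in zip(delta1, delta2)]
    rms = lambda xs: sqrt(sum(x * x for x in xs) / len(xs))
    return cc, rms(along) / rms(across)

# Hypothetical well-correlated half-set differences
cc_anom, rms_ratio = anomalous_half_set_stats(
    [10.0, -8.0, 6.0, -12.0, 4.0], [9.0, -7.0, 7.0, -10.0, 5.0])
```

A ratio close to 1 corresponds to a circular scatter plot (no detectable signal), whereas values well above 1 indicate an elongated distribution and hence a significant anomalous signal.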
Figure 5

Detection of anomalous signal. (a–d) An example with a very strong anomalous signal, shown by (a) a large slope of the normal probability plot of ΔI/σ(ΔI) values, (b) a large correlation coefficient between two ΔI estimates from random half-data sets, (c) a scatter plot relating two half-data-set values of ΔI/σ(ΔI) and (d) the r.m.s. correlation ratio derived from the scatter plot. (e–h) The same plots for an example with a weak but still useful anomalous signal.

Estimation of amplitude |F| from intensity I

If we knew the true intensity J we could just take the square root, |F| = J^(1/2). However, measured intensities have an error, so a weak intensity may well be measured as negative (i.e. below background); indeed, multiple measurements of a true intensity of zero should be equally positive and negative. This is one reason why when possible it is better to use I rather than |F| in structure determination and refinement. The ‘best’ (most likely) estimate of |F| is larger than I^(1/2) for weak intensities, since we know |F| > 0, but |F| = I^(1/2) is a good estimate for stronger intensities, roughly those with I > 3σ(I). The programs TRUNCATE and its newer version CTRUNCATE estimate |F| from I and σ(I) as the expectation value over the possible true intensities,

$$E(|F|; I, \sigma(I)) = \frac{\int_0^\infty J^{1/2}\, p(I; J, \sigma(I))\, p(J)\, dJ}{\int_0^\infty p(I; J, \sigma(I))\, p(J)\, dJ}$$

where p(I; J, σ(I)) is the probability of measuring I given a true intensity J and the prior probability of the true intensity p(J) is estimated from the average intensity in the same resolution range (French & Wilson, 1978).
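The behaviour of such an estimate can be explored numerically. The sketch below assumes a Gaussian measurement error and an acentric Wilson prior p(J) ∝ exp(−J/〈I〉); the function name, the midpoint integration and the integration limit are illustrative choices, not those of TRUNCATE or CTRUNCATE.

```python
from math import exp, sqrt

def posterior_amplitude(i_obs, sigma, mean_intensity, steps=20000):
    """Posterior mean of |F| = J**0.5 given a measured intensity.

    i_obs, sigma:    measured intensity and its standard deviation
    mean_intensity:  average intensity in the resolution shell (sets the
                     acentric Wilson prior; an assumption of this sketch)
    """
    upper = max(i_obs, 0.0) + 10.0 * sigma   # likelihood is negligible beyond
    dj = upper / steps
    numerator = denominator = 0.0
    for n in range(steps):
        j = (n + 0.5) * dj                   # midpoint rule on J >= 0
        weight = (exp(-j / mean_intensity)
                  * exp(-0.5 * ((i_obs - j) / sigma) ** 2))
        numerator += sqrt(j) * weight
        denominator += weight
    return numerator / denominator

strong = posterior_amplitude(10000.0, 100.0, 5000.0)   # close to sqrt(I)
weak = posterior_amplitude(10.0, 100.0, 5000.0)        # well above sqrt(I)
negative = posterior_amplitude(-50.0, 100.0, 5000.0)   # still positive
```

For a strong reflection the result is essentially I^(1/2), whereas a weak or negative measurement still yields a positive amplitude, which is the skewing described above. Extremely negative inputs (many σ below zero) would need a more careful numerical treatment than this sketch provides.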

Intensity statistics and crystal pathologies

At the end stage of data reduction, after scaling and merging, the distribution of intensities and its variation with resolution can indicate problems with the data, notably twinning (see, for example, Lebedev et al., 2006; Zwart et al., 2008). The simplest expected intensity statistics as a function of resolution s = sinθ/λ arise from assuming that atoms are randomly placed in the unit cell, in which case 〈I〉(s) = 〈FF*〉(s) = Σ_j g(j, s)², where g(j, s) is the scattering from the jth atom at resolution s. This average intensity falls off with resolution mainly because of atomic motions (B factors). If all atoms were equal and had equal B factors, then 〈I〉(s) = C exp(−2Bs²) and the ‘Wilson plot’ of log[〈I〉(s)] against s² would be a straight line of slope −2B. The Wilson plot for proteins shows peaks at ∼10 and 4 Å and a dip at ∼6 Å arising from the distribution of interatomic spacings in polypeptides (fewer atoms 6 Å apart than 4 Å apart), but the slope at higher resolution does give an indication of the average B factor and an unusual shape can indicate a problem (e.g. 〈I〉 increasing at the outer limit, spuriously large 〈I〉 owing to ice rings etc.).

For detection of crystal pathologies we are not so interested in resolution dependence, so we can use normalized intensities Z = I/〈I〉(s) ≃ |E|², which are independent of resolution and should ideally be corrected for anisotropy (as is performed in CTRUNCATE). Two useful statistics on Z are plotted by CTRUNCATE: the moments of Z as a function of resolution and its cumulative distribution. While 〈Z〉(s) = 1.0 by definition, its second moment 〈Z²〉(s) (equivalent to the fourth moment of E) is >1.0 and is larger if the distribution of Z is wider. The ideal value of 〈E⁴〉 is 2.0, but it will be smaller for the narrower intensity distribution from a merohedral twin (too few weak reflections), equal to 1.5 for a perfect twin, and larger if there are too many weak reflections, e.g. from a noncrystallographic translation which leads to a whole class of reflections being weak. The cumulative distribution plot of N(z), the fraction of reflections with Z < z, against z will show a characteristic sigmoidal shape if there are too few weak reflections in the case of twinning.

The most reliable test for twinning seems to be the L test (Padilla & Yeates, 2003), examining N(|L|), the cumulative distribution of |L|, where L = [I(h₁) − I(h₂)]/[I(h₁) + I(h₂)] for pairs of reflections h₁ and h₂ close in reciprocal space and unrelated by crystal symmetry. For untwinned data N(|L|) = |L|, giving a diagonal plot, while for twinned data N(|L|) > |L| and N(|L|) = |L|(3 − L²)/2 for a perfect twin. This test seems to be largely unaffected by anisotropy or translational noncrystallographic symmetry, which may affect tests on Z. The calculation of Z = I/〈I〉(s) depends on using a suitable value for 〈I〉(s), and noncrystallographic translations or uncorrected anisotropy lead to the use of an inappropriate value for 〈I〉(s). These statistical tests are all unweighted, so it may be better to exclude weak high-resolution data or to examine the resolution dependence of, for example, the moments of Z (or possibly L). It is also worth noting that fewer weak reflections than expected may arise from unresolved closely spaced spots along a long real-space axis, so that weak reflections are contaminated by neighbouring strong reflections, thus mimicking the effect of twinning.
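The second moment of Z and the L-test curves quoted above are simple to evaluate. In the sketch below the function names are hypothetical, the normalization uses a single overall mean (i.e. one thin resolution shell is assumed, with no anisotropy correction) and the selection of neighbouring reflection pairs is left to the caller.

```python
def second_moment_z(intensities):
    """<Z^2> for intensities from one resolution shell (Z = I / <I>)."""
    mean = sum(intensities) / len(intensities)
    z = [i / mean for i in intensities]
    return sum(v * v for v in z) / len(z)

def expected_cumulative_l(abs_l, perfect_twin=False):
    """Theoretical N(|L|): |L| if untwinned, |L|(3 - L^2)/2 for a perfect twin."""
    if perfect_twin:
        return abs_l * (3.0 - abs_l ** 2) / 2.0
    return abs_l

def observed_cumulative_l(pairs, abs_l):
    """Fraction of reflection pairs with |L| <= abs_l.

    pairs: list of (I(h1), I(h2)) for neighbouring, symmetry-unrelated
           reflections; pairs with a non-positive sum are ignored.
    """
    ls = [abs((a - b) / (a + b)) for a, b in pairs if a + b > 0]
    return sum(1 for v in ls if v <= abs_l) / len(ls)

untwinned_half = expected_cumulative_l(0.5)
twinned_half = expected_cumulative_l(0.5, perfect_twin=True)
fraction = observed_cumulative_l([(3.0, 1.0), (5.0, 5.0), (9.0, 1.0)], 0.5)
```

Comparing the observed curve with the two theoretical curves over the range 0 to 1 shows where a data set lies between the untwinned and perfectly twinned limits.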

Summary: questions and decisions

In the process of data reduction, a number of decisions need to be taken either by the programs or by the user. The main questions and considerations are as follows.

(i) What is the point group or Laue group? This is usually unambiguous, but pseudosymmetry may confuse the programs and the user. Close examination of the scores for individual symmetry elements from POINTLESS may suggest lower symmetry groups to try.
(ii) What is the space group? Distinction between screw axes and pure rotations from axial systematic absences is often unreliable and it is generally a good idea to try all the likely space groups (consistent with the Laue group) in the key structure-solution step: either molecular-replacement searches or substructure searches in experimental phasing. For example, in a primitive orthorhombic system the eight possible groups P2ₓ2ₓ2ₓ (each axis either a pure twofold or a 2₁ screw, i.e. P222 to P2₁2₁2₁) should be tried. This has the added advantage of providing some negative controls on the success of the structure solution.
(iii) Is there radiation damage: should data collected after the crystal has had a high dose of radiation be ignored (possibly at the expense of resolution)? Cutting back data from the end may reduce completeness and the optimum trade-off is hard to choose.
(iv) What is the best resolution cutoff? An appropriate choice of resolution cutoff is difficult and sometimes seems to be performed mainly to satisfy referees. On the one hand, cutting back too far risks excluding data that do contain some useful information. On the other hand, extending the resolution further makes all statistics look worse and may in the end degrade maps. The choice is perhaps not as important as is sometimes thought: maps calculated with slightly different resolution cutoffs are almost indistinguishable.
(v) Is there an anomalous signal detectable in the intensity statistics? Note that a weak anomalous signal may still be useful even if it is not detectable in the statistics. The statistics do give a good guide to a suitable resolution limit for location of the substructure, but the whole resolution range should be used in phasing.
(vi) Are the data twinned? Highly twinned data sets can be solved by molecular replacement and refined, but probably cannot be solved by experimental phasing methods. Partially twinned data sets can often be solved by ignoring the twinning and then refined as a twin.
(vii) Is this data set better or worse than those previously collected? One of the best things to do with a bad data set is to throw it away in favour of a better one. With modern synchrotrons, data collection is so fast that we usually have the freedom to collect data from several equivalent crystals and choose the best.

In most cases the data-reduction process is straightforward, but in difficult cases critical examination of the results may make the difference between solving and not solving the structure.
(a)

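Scores for each symmetry element. $R_{\mathrm{meas}} = \sum_h [n_h/(n_h - 1)]^{1/2} \sum_i |I_{hi} - \langle I_h \rangle| \, / \, \sum_h \sum_i I_{hi}$, where $n_h$ is the number of observations of reflection $h$; CC is the linear correlation coefficient between normalized intensities $E^2$; Z-CC = CC/σ(CC), where σ(CC) is estimated from random uncorrelated observations.

| Likelihood | Z-CC | CC | No. | Rmeas | | Symmetry | Operator |
|---|---|---|---|---|---|---|---|
| 0.948 | 9.54 | 0.95 | 12122 | 0.097 | | Identity | |
| 0.942 | 9.44 | 0.94 | 18346 | 0.121 | *** | Twofold l (001) | {−h, −k, +l} |
| 0.949 | 9.58 | 0.96 | 30259 | 0.097 | *** | Twofold h (100) | {+h, −k, −l} |
| 0.912 | 9.15 | 0.92 | 17427 | 0.120 | *** | Twofold k (010) | {−h, +k, −l} |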
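The Rmeas score in this panel is the multiplicity-weighted R factor. The sketch below computes it for groups of symmetry-related observations; the function name, the input layout and the choice to skip reflections observed only once are assumptions made for illustration, not a description of how POINTLESS or SCALA implement it.

```python
from math import sqrt


def r_meas(groups):
    """Multiplicity-weighted R factor (hypothetical helper).

    groups: list of lists, each holding the intensities of
    symmetry-related observations of one unique reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for observations in groups:
        n = len(observations)
        if n < 2:
            continue  # assumption: singly measured reflections are skipped
        mean = sum(observations) / n
        weight = sqrt(n / (n - 1))  # the [n/(n-1)]^(1/2) multiplicity factor
        numerator += weight * sum(abs(i - mean) for i in observations)
        denominator += sum(observations)
    if denominator == 0.0:
        raise ValueError("no reflections with two or more observations")
    return numerator / denominator


# Two unique reflections, measured twice and three times (made-up values).
example = [[100.0, 110.0], [50.0, 45.0, 55.0]]
print(round(r_meas(example), 4))
```

Perfectly consistent observations give a value of zero; larger values indicate poorer agreement between observations related by the symmetry element under test.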
(b)

Scores for possible subgroups of the lattice group Pmmm, giving a clear indication that Pmmm is the correct Laue symmetry. CC− is the correlation coefficient for all lattice symmetry elements not present in the Laue group; Zcc− = CC−/σ(CC−); NetZcc = Zcc+ − Zcc−; Likelihood is a probability estimate based on CC and CC− (see Appendix A); Delta is the angular deviation between the test lattice symmetry and the lattice symmetry implied by the Laue group.

| Laue group | Likelihood | | NetZcc | Zcc+ | Zcc− | CC | CC− | Rmeas | R− | Delta | Reindex |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pmmm | 0.985 | *** | 9.35 | 9.35 | 0.00 | 0.94 | 0.00 | 0.11 | 0.00 | 0.0 | [h, k, l] |
| P12/m1 | 0.006 | | 0.38 | 9.56 | 9.18 | 0.96 | 0.92 | 0.10 | 0.12 | 0.0 | [−k, −h, −l] |
| P12/m1 | 0.005 | | −0.01 | 9.38 | 9.39 | 0.94 | 0.94 | 0.11 | 0.11 | 0.0 | [−h, −l, −k] |
| P12/m1 | 0.003 | | −0.13 | 9.31 | 9.44 | 0.93 | 0.94 | 0.11 | 0.11 | 0.0 | [h, k, l] |
| P−1 | 0.000 | | 0.22 | 9.54 | 9.32 | 0.95 | 0.93 | 0.10 | 0.11 | 0.0 | [h, k, l] |
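The relation NetZcc = Zcc+ − Zcc− can be checked against the tabulated rows. The sketch below simply redoes that subtraction on the values printed in this panel; the variable names and tuple layout are illustrative assumptions.

```python
# (Laue group, Zcc+, Zcc-, tabulated NetZcc), copied from panel (b).
rows = [
    ("Pmmm", 9.35, 0.00, 9.35),
    ("P12/m1", 9.56, 9.18, 0.38),
    ("P12/m1", 9.38, 9.39, -0.01),
    ("P12/m1", 9.31, 9.44, -0.13),
    ("P-1", 9.54, 9.32, 0.22),
]


def net_zcc(zcc_plus, zcc_minus):
    """Z-score for elements in the group minus that for elements outside it."""
    return zcc_plus - zcc_minus


for group, zcc_plus, zcc_minus, tabulated in rows:
    print(f"{group:8s} computed {net_zcc(zcc_plus, zcc_minus):6.2f}  tabulated {tabulated:6.2f}")
```

Only Pmmm has a large net score: in the lower-symmetry subgroups the symmetry elements left out correlate just as well as those included, so Zcc− nearly cancels Zcc+.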
(c)

Fourier analysis of axial reflections for systematic absences, indicating the presence of 2₁ screws along each principal axis. Peak height is the value at 1/2 the cell in Fourier space relative to the origin.

| Axis | No. | Peak height | SD | Probability | | Condition |
|---|---|---|---|---|---|---|
| Screw axis 2₁ [a] | 3 | 1.000 | 0.296 | 0.889 | ** | h00: h = 2n |
| Screw axis 2₁ [b] | 26 | 1.000 | 0.142 | 0.971 | *** | 0k0: k = 2n |
| Screw axis 2₁ [c] | 46 | 0.997 | 0.097 | 0.986 | *** | 00l: l = 2n |
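The peak height described in this caption can be sketched as a one-dimensional Fourier sum over the axial reflections, evaluated at half the cell and divided by its value at the origin. This is a simplified illustration only: the function name and the use of raw intensities are assumptions, and the way POINTLESS normalizes the data and derives the SD and probability is not reproduced here.

```python
from math import cos, pi


def axial_peak_height(intensities, fraction=0.5):
    """Fourier value at `fraction` of the cell relative to the origin.

    intensities: dict mapping the axial index (e.g. h for h00)
    to an intensity. Hypothetical helper for illustration.
    """
    origin = sum(intensities.values())
    peak = sum(i * cos(2.0 * pi * h * fraction) for h, i in intensities.items())
    return peak / origin


# Made-up axial data: odd orders absent, as expected for a 2(1) screw.
screw_like = {1: 0.0, 2: 50.0, 3: 0.0, 4: 80.0, 5: 0.0, 6: 30.0}
# Made-up axial data with no systematic absences.
rotation_like = {1: 40.0, 2: 40.0, 3: 40.0, 4: 40.0}

print(axial_peak_height(screw_like))
print(axial_peak_height(rotation_like))
```

At half the cell each term is weighted by +1 for even orders and −1 for odd orders, so a ratio close to 1 means that essentially all of the intensity lies in the even orders, which is the condition h = 2n listed in the table.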
(d)

Summary of the best solution. The ‘confidence’ scores are derived from the total probability of the best solution $p_{\mathrm{best}}$ and that for the next best solution $p_{\mathrm{next}}$: confidence = $[p_{\mathrm{best}}(p_{\mathrm{best}} - p_{\mathrm{next}})]^{1/2}$.

| Best solution | Space group P2₁2₁2₁ |
|---|---|
| Reindex operator | [h, k, l] |
| Laue-group probability | 0.985 |
| Systematic absence probability | 0.851 |
| Total probability | 0.838 |
| Space-group confidence | 0.784 |
| Laue-group confidence | 0.982 |
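The confidence formula can be checked against the numbers in the panels above. In this sketch the Laue-group confidence uses the two highest Laue-group likelihoods from panel (b). Treating the systematic-absence probability as the product of the three axial probabilities from panel (c), and the total probability as its product with the Laue-group probability, are assumptions that happen to reproduce the tabulated values; they are not taken from the program's documentation.

```python
from math import sqrt


def confidence(p_best, p_next):
    """confidence = [p_best * (p_best - p_next)]^(1/2)"""
    return sqrt(p_best * (p_best - p_next))


# Laue-group likelihoods of the best and next-best groups, panel (b).
laue_confidence = confidence(0.985, 0.006)

# Assumption: the axial screw probabilities from panel (c) combine as a product.
absence_probability = 0.889 * 0.971 * 0.986

# Assumption: total probability = Laue-group probability x absence probability.
total_probability = 0.985 * absence_probability

print(f"Laue-group confidence   {laue_confidence:.3f}")
print(f"Absence probability     {absence_probability:.3f}")
print(f"Total probability       {total_probability:.3f}")
```

A confidence close to the probability itself, as for the Laue group here, indicates that no competing solution comes near the best one.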