
Enabling X-ray free electron laser crystallography for challenging biological systems from a limited number of crystals.

Monarin Uervirojnangkoorn¹, Oliver B Zeldin¹, Artem Y Lyubimov¹, Johan Hattne², Aaron S Brewster³, Nicholas K Sauter³, Axel T Brunger¹, William I Weis¹.

Abstract

There is considerable potential for X-ray free electron lasers (XFELs) to enable determination of macromolecular crystal structures that are difficult to solve using current synchrotron sources. Prior XFEL studies often involved the collection of thousands to millions of diffraction images, in part due to limitations of data processing methods. We implemented a data processing system based on classical post-refinement techniques, adapted to specific properties of XFEL diffraction data. When applied to XFEL data from three different proteins collected using various sample delivery systems and XFEL beam parameters, our method improved the quality of the diffraction data as well as the resulting refined atomic models and electron density maps. Moreover, the number of observations for a reflection necessary to assemble an accurate data set could be reduced to a few observations. These developments will help expand the applicability of XFEL crystallography to challenging biological systems, including cases where sample is limited.


Keywords: X-ray crystallography; biophysics; data processing; free electron laser; none; structural biology


Year: 2015 PMID: 25781634 PMCID: PMC4397907 DOI: 10.7554/eLife.05421

Source DB: PubMed Journal: eLife ISSN: 2050-084X Impact factor: 8.140

Introduction

Radiation damage often limits the resolution and accuracy of macromolecular crystal structures (Garman, 2010; Zeldin et al., 2013). Femtosecond X-ray free electron laser (XFEL) pulses enable the possibility of visualizing molecular structures before the onset of radiation damage, and allow the dynamics of chemical processes to be captured (Solem, 1986; Neutze et al., 2000). Thus, from the first XFEL operation at the Linac Coherent Light Source (LCLS) in 2009, there has been considerable effort dedicated to the development of methods to utilize this rapid succession of bright pulses for macromolecular crystallography, with the aim of obtaining damage-free, chemically accurate structures. Most of the structures reported from XFELs to date use a liquid jet to inject small crystals into the beam (DePonte et al., 2008; Sierra et al., 2012; Weierstall et al., 2014), but diffraction data have also been measured from crystals placed in the beam with a standard goniometer setup (Cohen et al., 2014; Hirata et al., 2014). In both cases, the illuminated volume diffracts before suffering damage by a single XFEL pulse. Because the crystal is effectively stationary during the 10–50 fs exposure, ‘still’ diffraction patterns are obtained, in contrast to standard diffraction data collection where the sample is rotated through a small angle during the exposure. Extracting accurate Bragg peak intensities from XFEL diffraction data is a substantial challenge. An XFEL data set comprises ‘still’ diffraction patterns generally containing only partially recorded reflections, typically from randomly oriented crystals. The full intensity then has to be estimated from the observed partial intensity observations. 
Most XFEL diffraction data processing approaches reported to date have approximated the full intensity by the so-called “Monte Carlo” method, in which thousands of partial intensity observations of a given reflection are summed and normalized by the number of observations, which assumes that these observations sample the full 3D Bragg volume. Because a single diffraction image (in which each observation samples only part of the corresponding reflection intensity) contains much less information than a small continuous wedge of diffraction data (as used in conventional crystallography), this method requires a very large number of crystals to ensure convergence of the averaged partial reflection intensities to the full intensity value (Kirian et al., 2010). Moreover, shot-to-shot differences in pulse intensity and energy spectrum that arise from the self-amplified spontaneous emission (SASE) process (Kondratenko and Saldin, 1979; Bonifacio et al., 1984), along with differences in illuminated crystal volume, mosaicity, and unit-cell dimensions, contribute to intensity variation of the equivalent reflections observed on different images. These differences are assumed to be averaged out by the Monte Carlo method (Hattne et al., 2014). Accurate determination of these parameters for each diffraction image should therefore, in principle, provide more accurate integrated intensities and converge with fewer measurements. Furthermore, it is desirable to assemble a data set from as few diffraction images as possible, since the potential of XFELs has been limited by the very large amounts of sample required for the Monte Carlo method, compounded by severe limitations in the availability of beamtime.
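The convergence behaviour of Monte Carlo merging can be illustrated with a toy simulation. The sketch below is not the procedure of any of the cited programs: it assumes, purely for illustration, that each still records a uniformly random fraction of the full intensity, and all function names are hypothetical. It shows that the plain average settles only as the number of observations grows, and that it converges to a value proportional to the full intensity (here, half of it) rather than to the full intensity itself.

```python
import random
import statistics


def monte_carlo_merge(partials):
    """Average all partial observations of one reflection (no scaling, no partiality model)."""
    return sum(partials) / len(partials)


def simulate_partials(full_intensity, n_obs, rng):
    """Toy model (assumption): each still records a uniformly random fraction of the full intensity."""
    return [full_intensity * rng.random() for _ in range(n_obs)]


def spread_of_merged_estimate(full_intensity, n_obs, n_trials=200, seed=0):
    """Repeat the merge n_trials times; return the mean and standard deviation of the merged values."""
    rng = random.Random(seed)
    estimates = [
        monte_carlo_merge(simulate_partials(full_intensity, n_obs, rng))
        for _ in range(n_trials)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n_obs in (4, 40, 400):
        mean, sd = spread_of_merged_estimate(1000.0, n_obs)
        print(f"{n_obs:4d} observations: merged mean {mean:7.1f}, spread {sd:6.1f}")
```

In this toy model the spread of the merged value falls roughly as the inverse square root of the number of observations, which is why a method that models partiality explicitly can be expected to need fewer images.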
In the 1970's, the Harrison and Rossmann groups developed ‘post-refinement’ methods (Rossmann et al., 1979; Winkler et al., 1979), in which the parameters that determine the location and volume of the Bragg peaks are ‘post’-refined against a reference set of fully recorded reflections following initial indexing and integration of rotation data. Accurate estimation of these parameters, including the unit-cell lengths and angles, crystal orientation, mosaic spread, and beam divergence enables accurate calculation of what fraction of the reflection intensity was recorded on the image, i.e., its ‘partiality’, which is then used to correct the measurement to its fully recorded equivalent. Applied to virus crystals, for which only a few images can typically be collected before radiation damage becomes significant, post-refinement made it possible to obtain high-quality diffraction data sets collected from many crystals (Rossmann et al., 1979; Winkler et al., 1979). The implementation of post-refinement for XFEL diffraction data poses unique challenges. Firstly, since XFEL diffraction data generally do not contain fully recorded reflections, the initial scaling and merging of images is difficult. Secondly, since the XFEL diffraction images are stills rather than rotation data, different approaches are required for the correction of measurements to determine the full spot equivalent. Other schemes for implementing post-refinement of XFEL diffraction data have been described previously, but thus far they have been only applied to simulated XFEL data (White, 2014), and to pseudo-still images collected using monochromatic synchrotron radiation (Kabsch, 2014). We have developed a new post-refinement procedure specifically designed for diffraction data from still images collected from crystals in random orientations. 
We implemented our method in a new computer program, prime (post-refinement and merging), that post-refines the parameters needed for calculating the partiality of reflections recorded on each still image. We describe here our method and demonstrate that post-refinement greatly improves the quality of the diffraction data from XFEL diffraction experiments with crystals of three different proteins. We show that our post-refinement procedure allows complete data sets to be extracted from a much smaller number of diffraction images than that necessary when using the Monte Carlo method. Thus, this development will help make XFEL crystallography accessible to many challenging problems in biology, including those for which sample quantity is a major limiting factor.

Results

Notation

Units are arbitrary unless specified in parentheses.

I_obs, observed intensity. I_ref, reference intensity. w, weighting term (inverse variance of the observed intensity). G, function of linear scale (G₀) and resolution-dependent (B) factors that scales the different diffraction images to the reference set. Eoc, Ewald-offset correction function. r_h, offset reciprocal-space distance from the center of the reflection to the Ewald sphere (Å⁻¹). r_p, radius of the disc of intersection between the reciprocal lattice point and the Ewald sphere (Å⁻¹). r_s, radius of the reciprocal lattice point (Å⁻¹). θ_x, θ_y, θ_z, crystal rotation angles (see Figure 1A; °). γ₀, parameter for Equation 3 (Å⁻¹). γ_e, energy spread and unit-cell variation (see Equation 3; Å⁻¹). γ_x and γ_y, beam divergence (see Equation 4; Å⁻¹). {uc}, unit-cell dimensions (a, b, c (Å), α, β, and γ (°)). V, reciprocal-lattice volume correction function (Å⁻³). X_obs and X_calc, observed and predicted spot positions on the detector (mm). x, position of the reciprocal lattice point (Å⁻¹). S, displacement vector from the center of the Ewald sphere to x (Å⁻¹). S_0, incident beam vector with length 1/wavelength (Å⁻¹). O, orthogonalization matrix. R, rotation matrix. f_L and f_LN, Lorentzian function and its normalized counterpart. Γ, full width at half maximum (FWHM) of the Lorentzian function.

Figure 1.

Geometry of the diffraction experiment and calculation of the Ewald-offset distance, r_h.

DOI: http://dx.doi.org/10.7554/eLife.05421.003

(A) A reciprocal lattice point intersects the Ewald sphere. The inset shows the coordinate system used in cctbx.xfel and prime. The vector S_0 represents the direction of the incident beam (–z-axis) and forms the radius of the Ewald sphere of length 1/λ. The reciprocal lattice point i is expressed in reciprocal lab coordinates using Equation 5, as represented by the vector x_i. The Ewald-offset distance, r_h, is the difference between the distance from the Ewald-sphere center to the reciprocal lattice point (length of S_i) and 1/λ. The inset also shows the definition of the crystal rotation axes; they are applied in the following order: θ_z, θ_y, θ_x. (B) Shown is the volume of a reciprocal lattice point with radius r_s. The offset r_h defines the Ewald-offset correction Eoc, which is the ratio between the area intersecting the Ewald sphere, A_p, and the area at the center of the volume, A_s.
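The geometry of Figure 1 can be sketched in a few lines. This is a minimal illustration, not the prime implementation: it assumes the reciprocal lattice point is already expressed in lab coordinates, treats the Ewald sphere as locally flat across the reflection, and uses the plain geometric area ratio rather than the smoothed correction function described in ‘Materials and methods’. The function names `ewald_offset` and `eoc_area` are hypothetical.

```python
import math


def ewald_offset(x, wavelength):
    """Ewald-offset distance: |centre-to-point vector| minus the sphere radius 1/wavelength.

    x is the reciprocal lattice point in lab coordinates (1/Angstrom);
    the incident beam runs along -z, so the beam vector is (0, 0, -1/wavelength).
    """
    s0 = (0.0, 0.0, -1.0 / wavelength)
    # The sphere centre sits at minus the beam vector, so centre-to-point is x + s0.
    s = tuple(xi + s0i for xi, s0i in zip(x, s0))
    return math.sqrt(sum(c * c for c in s)) - 1.0 / wavelength


def eoc_area(r_h, r_s):
    """Area of the disc cut at offset r_h divided by the area of the central disc.

    For a sphere of radius r_s the disc at offset r_h has radius
    sqrt(r_s**2 - r_h**2); the ratio of areas is therefore 1 - (r_h / r_s)**2.
    """
    if abs(r_h) >= r_s:
        return 0.0  # the reflection does not touch the Ewald sphere
    return (r_s ** 2 - r_h ** 2) / r_s ** 2


if __name__ == "__main__":
    offset = ewald_offset((1.001, 0.0, 1.0), wavelength=1.0)
    print(f"offset = {offset:.4f} 1/A, area ratio = {eoc_area(offset, 0.002):.3f}")
```

A reflection whose center lies exactly on the sphere has zero offset and an area ratio of 1, which corresponds to the maximum recordable partial intensity.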

Post-refinement overview

Partiality can be modeled by describing the full reflection as a sphere (Figure 1A). In a still diffraction pattern, assuming a monochromatic photon source, the observed intensity I_obs for Miller index h is a thin slice through a three-dimensional reflection. To calculate partiality, we assume that the measurement is an areal (i.e., infinitely thin) sample of the volume (Figure 1B). The maximum partial intensity that can be recorded for a given reflection will occur when its center lies exactly on the Ewald sphere. By definition, the center of the reflection will be offset from the Ewald sphere by r_h, and the corresponding disc will have a radius r_p. The offset r_h is determined by various experimental parameters, including the crystal orientation, unit-cell dimensions, and X-ray photon energy. The offset distance is used to calculate the Ewald-offset correction, Eoc, defined as the ratio between the areas defined by r_p and r_s (implemented as a smoothed correction function Eoc as defined in ‘Materials and methods’). The Ewald-offset corrected intensity is then converted to the full intensity in 3D by applying a volume correction factor, V.

We define the target T_pr for the post-refinement of a partiality and scaling model by:

$$T_{pr} = \sum_{h} w_h \left[ I_{obs}(h) - G(G_0, B)\, Eoc(\theta_x, \theta_y, \gamma_0, \gamma_y, \gamma_x, \gamma_e, \{uc\})\, I_{ref}(h) \right]^2 \quad (1)$$

which minimizes the difference between the observed reflections I_obs and a scaled and Ewald-offset corrected full intensity ‘reference set’ I_ref using a least-squares method. The sum is over all observed reflections with Miller indices h. In alternate refinement cycles, we also minimize the deviations between predicted (X_calc) and observed (X_obs) spot positions on the detector using a subset of strong spots, as has been suggested previously (Hattne et al., 2014; Kabsch, 2014):

$$T_{xy} = \sum_{h} \left[ X_{obs}(h) - X_{calc}(h) \right]^2 \quad (2)$$

Sets of parameters associated with each diffraction image, i.e., G₀, B, θ_x, θ_y, γ₀, γ_y, γ_x, γ_e and the unit-cell constants, are iteratively refined in a series of ‘microcycles’ against the current reference set (Figure 2).
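The least-squares target for the partiality and scaling model can be sketched as follows. This is an illustration under stated assumptions, not the prime code: the Ewald-offset correction for each observation is taken as already computed, the scale G is reduced to a single number per image, and the weight is the inverse variance of the observed intensity, as in the Notation section. The helper `best_linear_scale` is a hypothetical addition that gives the closed-form scale minimizing the target when everything else is held fixed.

```python
def partiality_target(observations, g, eoc_values, i_ref):
    """Weighted squared residual between observed partials and the scaled, corrected reference.

    observations: list of (hkl, i_obs, sigma_obs) for one image
    g:            scale factor for this image (a single number in this sketch)
    eoc_values:   Ewald-offset correction for each observation, same order
    i_ref:        dict mapping hkl to the reference full intensity
    """
    total = 0.0
    for (hkl, i_obs, sigma), eoc in zip(observations, eoc_values):
        weight = 1.0 / sigma ** 2  # inverse variance of the observed intensity
        total += weight * (i_obs - g * eoc * i_ref[hkl]) ** 2
    return total


def best_linear_scale(observations, eoc_values, i_ref):
    """Scale g that minimizes the target above with all other parameters fixed."""
    num = den = 0.0
    for (hkl, i_obs, sigma), eoc in zip(observations, eoc_values):
        weight = 1.0 / sigma ** 2
        model = eoc * i_ref[hkl]
        num += weight * i_obs * model
        den += weight * model ** 2
    return num / den


if __name__ == "__main__":
    obs = [((1, 0, 0), 50.0, 5.0), ((0, 1, 0), 30.0, 2.0)]
    eoc = [0.5, 0.25]
    ref = {(1, 0, 0): 50.0, (0, 1, 0): 40.0}
    g_best = best_linear_scale(obs, eoc, ref)
    print(partiality_target(obs, 2.0, eoc, ref), g_best, partiality_target(obs, g_best, eoc, ref))
```

In the actual procedure the remaining parameters (orientation, reflection-width terms, unit cell) enter through the Ewald-offset correction and are refined iteratively; only the linear scale has a closed-form solution of this kind.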

Figure 2.

Post-refinement protocol.

DOI: http://dx.doi.org/10.7554/eLife.05421.004

The flowchart illustrates the iterative post-refinement protocol, broken up into ‘microcycles’ that refine groups of parameters iteratively (blue boxes), and ‘macrocycles’. At the beginning of the first macrocycle, a reference diffraction data set is generated. At the end of each macrocycle, the reference diffraction data set is updated. Both the micro- and macrocycles terminate either when the refinement converges or when a user-specified maximum number of cycles is reached.

Procedures for generating the initial reference set I_ref(initial) are described below. After convergence of the microcycles, scaled full intensities are calculated from the observed partial intensities I_obs by multiplication of the inverse of the Ewald-offset correction and the scale factor G, along with the volume correction factor V. These scaled full reflections are then merged for each unique Miller index, taking into account estimated errors of the observed intensities, σ(I), and propagation of error estimates for the refined parameters. This merged and scaled set of full reflections is then used as the new reference set in the next round of post-refinement using the target functions (Equations 1 and 2, for details see ‘Materials and methods’). These ‘macrocycles’ are repeated until convergence is achieved, after which the merged and scaled set of full intensities is provided to the user. The prime program controls post-refinement of specified parameters in a particular microcycle (Figure 2). One can refine all parameters together, or selectively refine groups of parameters iteratively, starting from (1) a linear scale factor and a B-factor, (2) crystal orientations, (3) crystal mosaicity, beam divergence, and spectral dispersion, and (4) unit-cell dimensions. Space-group-specific constraints are used to limit the number of free parameters for the unit-cell refinement.
A particular microcycle is completed when the target functions converge or when a specified number of iterations is reached; the program then generates the new reference intensity set to replace the current reference set for the next macrocycle. Finally, the program exits and outputs the latest merged reflection set either when the macrocycles converge or when a user-specified maximum number of cycles has been reached.
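The merging step at the end of a macrocycle can be sketched as an inverse-variance weighted mean. This is a simplified illustration, not the prime implementation: it assumes the full intensity is obtained as the observed intensity times V divided by the product of G and the Ewald-offset correction, it propagates only the experimental error on the observed intensity, and it omits the propagation of errors from the refined parameters. The function name and the tuple layout are hypothetical.

```python
import math


def merge_full_intensities(observations):
    """Merge partial observations into full intensities per unique Miller index.

    observations: iterable of (hkl, i_obs, sigma_obs, g, eoc, v), where g is the
    image scale, eoc the Ewald-offset correction and v the volume correction.
    Returns a dict mapping hkl to (merged full intensity, sigma of the merged value).
    """
    sum_w = {}
    sum_wi = {}
    for hkl, i_obs, sigma_obs, g, eoc, v in observations:
        factor = v / (g * eoc)          # converts a scaled partial to a full intensity
        i_full = i_obs * factor
        sigma_full = sigma_obs * factor  # experimental error only (assumption)
        w = 1.0 / sigma_full ** 2
        sum_w[hkl] = sum_w.get(hkl, 0.0) + w
        sum_wi[hkl] = sum_wi.get(hkl, 0.0) + w * i_full
    return {hkl: (sum_wi[hkl] / sum_w[hkl], 1.0 / math.sqrt(sum_w[hkl])) for hkl in sum_w}


if __name__ == "__main__":
    data = [
        ((1, 0, 0), 100.0, 10.0, 2.0, 0.5, 1.0),
        ((1, 0, 0), 60.0, 10.0, 1.0, 0.5, 1.0),
    ]
    print(merge_full_intensities(data))
```

The merged set produced this way would serve as the reference set for the next macrocycle.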

Preparation of the observed intensities

The starting point for our post-refinement method is a set of indexed and integrated partial intensities, along with their estimated errors, obtained from still images. For this study, diffraction data and their estimated errors were obtained from the cctbx.xfel package (Sauter et al., 2013; Hattne et al., 2014), although in principle integrated diffraction data from any other program can be used. Observed intensities on the diffraction image were classified as ‘spots’ by the program Spotfinder (Zhang et al., 2006), which identifies Bragg spots by considering connected pixels with area and signal height greater than user-defined thresholds. By trial and error, we accepted reflections larger than 25 pixels with individual-pixel intensity more than 5 σ over background for myoglobin and hydrogenase (collected on a Rayonix MX325HE detector with pixel size of 0.08 mm and beam diameter [FWHM] of 50 μm). For thermolysin (collected on a Cornell-SLAC pixel array detector with pixel size of 0.1 mm and beam size of 2.25 μm²), where reflections are generally smaller, these values were 1 pixel and 5 σ. A full list of parameters is available on the cctbx.xfel wiki (http://cci.lbl.gov/xfel). Separate resolution cutoffs for each image were applied by cctbx.xfel, at resolutions where the average I/σ(I) fell below 0.5 (Hattne et al., 2014).

Prior to post-refinement, the experimentally observed partial intensities need to be corrected by a polarization factor. The primary XFEL beam at LCLS is strongly polarized in the horizontal plane, and we calculate the correction factor as a function of the Bragg angle (θ) and the angle between the sample reflection and the laboratory horizontal planes (Kahn et al., 1982; see ‘Materials and methods’). For a stationary crystal and a monochromatic beam, a Lorentz factor correction is not applicable; the spectral dispersion of the SASE beam (δE/E ∼ 3 × 10⁻³ for the data sets studied here) is accounted for by the γ_e term (see ‘Materials and methods’).

Generating the initial reference set and initial parameters

An essential step to initiate post-refinement is the generation of the initial reference set I_ref(initial). This reference set has to be estimated from the available unmerged and unscaled partial reflection intensities after application of the polarization correction. For the results presented here, linear scale factors for each diffraction image were chosen to make the mean intensities of each diffraction image equal. Since this procedure can be affected by outliers in the observed intensities, we select a subset of reflections with user-specified resolution range and signal-to-noise ratio (I/σ(I)) cutoffs. From this selection, we calculate the mean intensity on each diffraction image and then scale each image to make the mean intensity of all images equal. We correct the scaled observed reflections to their Ewald-offset corrected equivalents using the starting parameters, and then merge the observations, taking into account the experimental σ(I), to generate the initial reference set. The initial values for crystal orientation, unit-cell dimensions, crystal-to-detector distance, and spot position on the detector were obtained from the refinement of these parameters by cctbx.xfel. The photon energy was that provided by the LCLS endstation system and is not refined. Initial values for the parameters of the reflection width model are described in the ‘Materials and methods’ section.
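The mean-intensity scaling used to seed the reference set can be sketched as below. This is a minimal illustration rather than the prime code: the choice of the common target (the average of the per-image means) and the function and parameter names are assumptions, and each reflection is represented by a hypothetical (intensity, sigma, d-spacing) tuple.

```python
def mean_intensity_scale_factors(images, d_min, d_max, min_i_over_sigma):
    """Return one multiplicative scale factor per image so that all selected means agree.

    images: list of images, each a list of (intensity, sigma, d_spacing) tuples
    d_min, d_max: resolution range (Angstrom) of the reflections used for scaling
    min_i_over_sigma: signal-to-noise cutoff used to exclude weak or outlying spots
    """
    means = []
    for reflections in images:
        selected = [
            intensity
            for intensity, sigma, d in reflections
            if d_min <= d <= d_max and intensity / sigma >= min_i_over_sigma
        ]
        if not selected:
            raise ValueError("no reflections pass the selection on this image")
        means.append(sum(selected) / len(selected))
    target = sum(means) / len(means)  # common mean (assumed choice of target)
    return [target / m for m in means]


if __name__ == "__main__":
    image_a = [(90.0, 5.0, 3.0), (110.0, 5.0, 2.5), (1.0, 5.0, 2.0)]
    image_b = [(180.0, 8.0, 3.2), (220.0, 8.0, 2.2)]
    print(mean_intensity_scale_factors([image_a, image_b], 1.5, 20.0, 2.0))
```

Multiplying every intensity on an image by its factor makes the selected mean intensities equal across images; the weak third spot on the first image is excluded from the mean by the signal-to-noise cutoff.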

Definition and comparison of data processing schemes

In order to separately assess the effects of scaling, the Ewald offset correction (Equation 1), and post-refinement, we refer to three alternative schemes for processing the diffraction data sets: (1) ‘Averaged merged’, in which intensities were generated by averaging all observed partial intensities from equivalent reflections without Ewald-offset correction and scaling; (2) ‘Mean-intensity partiality corrected’, in which intensities were generated by scaling the reflections to the mean intensity and also applying the Ewald-offset correction determined from the initial parameters obtained from the indexing and integration program, followed by merging; and (3) ‘Post-refined’, in which intensities were from the final set of scaled and merged full reflections after the convergence of post-refinement. We note that although the ‘averaged merged’ process is similar to the original Monte Carlo method (Kirian et al., 2010), the integrated, unmerged partial intensities used in our tests were obtained from the program cctbx.xfel (Hattne et al., 2014), which also refines various parameters on an image-by-image basis (Sauter et al., 2014).

Quality assessment of post-refined data

We tested our post-refinement method on experimental XFEL diffraction data sets from three different crystallized proteins of known structure: myoglobin, hydrogenase, and thermolysin (Table 1). For quality assessment, we performed molecular replacement (MR) with Phaser (McCoy et al., 2007) using models with selected parts of the known structures omitted, followed by atomic model refinement with phenix.refine (Afonine et al., 2012), and inspection of (mFo-DFc) omit maps. We further used three different metrics: CC_1/2, and the crystallographic R and R_free of the fully refined atomic model. We then compared changes in the three quality metrics between merged XFEL diffraction data sets after scaling, partiality correction, and post-refinement. We also investigated the effect of reducing the number of images used by randomly selecting a subset from the full set of diffraction images and repeating the entire post-refinement, merging, MR and refinement processes using this subset.
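For readers unfamiliar with the half-data-set correlation used as a quality metric here, the sketch below shows one common way to compute it: the observations of each reflection are split at random into two halves, each half is averaged, and the Pearson correlation between the two sets of averages is reported. This is an unweighted illustration with hypothetical function names; the programs used in this study may split, weight, and bin by resolution differently.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cc_half(observations_by_hkl, seed=0):
    """Correlation between two random half-data-set averages.

    observations_by_hkl: dict mapping hkl to a list of scaled full intensities.
    Reflections with fewer than two observations cannot be split and are skipped.
    """
    rng = random.Random(seed)
    half_1, half_2 = [], []
    for intensities in observations_by_hkl.values():
        if len(intensities) < 2:
            continue
        shuffled = list(intensities)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_1.append(sum(shuffled[:mid]) / mid)
        half_2.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    return pearson(half_1, half_2)


if __name__ == "__main__":
    data = {(1, 0, 0): [10.0, 14.0], (0, 1, 0): [20.0, 18.0],
            (0, 0, 1): [35.0, 30.0], (1, 1, 0): [5.0, 9.0]}
    print(f"CC_1/2 = {cc_half(data):.3f}")
```

Because reflections observed only once are skipped, data sets with low multiplicity contribute few reflections to this statistic, which is worth keeping in mind when comparing subsets of different size.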

Table 1.

XFEL diffraction data sets used in this study

DOI: http://dx.doi.org/10.7554/eLife.05421.005

	Myoglobin	Clostridium pasteurianum hydrogenase	Thermolysin
Space group	P6	P4₂2₁2	P6₁22
Resolution used (Å)	20.0–1.35	45.0–1.60	50.0–2.10
Unit cell dimensions (Å)	a = b = 90.8, c = 45.6	a = b = 111.2, c = 103.8	a = b = 92.7, c = 130.5
No. of unique reflections	46,555	85,273	19,995
No. of images* indexed	757	177	12,692
No. of images with spots to resolution used	307	75	1957
Average no. of spots on an image (to resolution used)	1628	3640	352
Energy spectrum	SASE†	SASE†	SASE†
Detector	Rayonix MX325HE	Rayonix MX325HE	CSPAD‡
Sample delivery method	fixed target	fixed target	Electrospun jet

*This is the number of images indexed using the cctbx.xfel program; in the case of thermolysin, it is the number of images indexed for one of the two wavelengths.

†SASE: self-amplified spontaneous emission.

‡CSPAD: Cornell-SLAC pixel array detector.

Diffraction data for both myoglobin and hydrogenase were collected from frozen crystals mounted on a standard goniometer setup (Cohen et al., 2014), whereas the thermolysin data were collected using an electrospun liquid jet to inject nanocrystals into a vacuum chamber (Sierra et al., 2012; Bogan, 2013). The completeness of each data set was better than 90% at the limiting resolution used in our tests (Tables 2, 3, 4). Each diffraction data set involved a different number of images due to the differing diffraction quality of the crystals.

Myoglobin

For myoglobin, we used both an XFEL diffraction data set consisting of 757 diffraction images (Table 1) collected by the SSRL-SMB group using a goniometer-mounted fixed-target grid (Cohen et al., 2014), and a randomly selected subset of 100 diffraction images. The diffraction images were from crystals in random orientations, with a single still image collected from each crystal.

Convergence of post-refinement

Convergence properties of our post-refinement method for myoglobin are shown in Figures 3 and 4; Figure 3 gives a representative example of the first macrocycle for a selected diffraction image. In each of the three microcycles, the parameter groups were refined in the following order: scale factors (SF, Equation 17), crystal orientation (CO, Equation 5), reciprocal spot size (RR, Equations 3 and 4), and unit-cell dimensions (UC, Equation 5). The partiality model target function T_pr (Equation 1) markedly decreased in the first microcycle and fully converged in the last microcycle. The spot position residual T_xy (Equation 2) also decreased during post-refinement of both the crystal orientation and the unit-cell parameters.

Figure 3.

Post-refinement during the first macrocycle of post-refinement for myoglobin.

Shown are the values of the refined parameters and target functions during the first macrocycle of post-refinement for a representative diffraction image of the myoglobin XFEL diffraction data set. The iterative post-refinement included SF (scale factors), CO (crystal orientation), RR (reflection radius parameters), and UC (unit-cell dimensions) for three microcycles.

DOI: http://dx.doi.org/10.7554/eLife.05421.009


Figure 4.

Convergence of post-refinement after five macrocycles for myoglobin.

The plots illustrate the convergence of post-refined parameters, target functions, and quality indicators during post-refinement over five macrocycles. A subset of 100 randomly selected diffraction images from the myoglobin XFEL diffraction data was used. For each specified target function and refined parameter, changes are plotted relative to the previous macrocycle, whereas the quality metric CC_1/2 is shown as absolute numbers. The changes in post-refined parameters and target functions are shown as ‘box plots’. The bottom and top of the blue box are the first (Q1) and third (Q3) quartiles. The red line inside the box is the second quartile (Q2; median). The whiskers extending vertically from the box indicate the range of the particular quantity within 1.5 times the interquartile range (Q3–Q1). The plus signs indicate any items beyond this range.

DOI: http://dx.doi.org/10.7554/eLife.05421.010

Figure 4 shows the results for five macrocycles of post-refinement using the subset of 100 randomly selected still images of the myoglobin XFEL diffraction data set. The partiality model target function T_pr (Equation 1) continually decreased in the first three macrocycles. The average spot position residual T_xy (Equation 2) decreased in the first cycle and converged in the next cycle. The quality metric CC_1/2 also converged within the first three macrocycles.

Figure 5.

Merging statistics for myoglobin.

DOI: http://dx.doi.org/10.7554/eLife.05421.011

(A) Percent completeness and (B) average number of observations plotted as a function of resolution for the myoglobin XFEL diffraction data set consisting of all 757 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. (C) CC_1/2 for the averaged merged, mean-intensity scaled with partiality correction, and post-refined myoglobin diffraction data sets consisting of 100 and 757 diffraction images.

Inaccuracies in the starting parameters obtained from indexing and integration of still images may limit the radius of convergence and the accuracy of the post-refined parameters. The sources of such errors will be the subject of future improvement in indexing and integration in cctbx.xfel. Nonetheless, for the systems studied here the post-refinements converged within 3–5 cycles.

Improvements due to post-refinement

For the myoglobin diffraction data set using all 757 images (Table 2, Figure 6A,B), the CC_1/2 value improved after post-refinement, especially for those reflections in the low-resolution shells (Figure 5C; Table 2).

Table 2.

Statistics of post-refinement and atomic model refinement for myoglobin

DOI: http://dx.doi.org/10.7554/eLife.05421.006

No. images	100			757		
Resolutionᵃ (Å)	20.0–1.35 (1.40–1.35)			20.0–1.35 (1.40–1.35)		
Completenessᵃ (%)	80.0 (22.2)			97.7 (79.8)		
Average no. observations per unique hklᵃ	4.0 (1.2)			25.7 (2.0)		
	Averaged merged	Mean-scaled partiality corrected	Post-refined	Averaged merged	Mean-scaled partiality corrected	Post-refined
Post-refinement parametersᵇ
Linear scale factor G₀	1.00 (0.00)	2.79 (5.02)	1.00 (1.04)	1.00 (0.00)	2.19 (3.83)	0.89 (1.07)
B	0.0 (0.0)	0.0 (0.0)	3.2 (7.8)	0.0 (0.0)	0.0 (0.0)	6.2 (8.3)
γ₀ (Å⁻¹)	NA	0.00135 (0.00028)	0.00128 (0.00022)	NA	0.00147 (0.00042)	0.00132 (0.00034)
γ_y (Å⁻¹)	NA	0.00 (0.00)	0.00007 (0.00080)	NA	0.00 (0.00)	0.00007 (0.00009)
γ_x (Å⁻¹)	NA	0.00 (0.00)	0.00010 (0.00011)	NA	0.00 (0.00)	0.00008 (0.00010)
γ_e (Å⁻¹)	NA	0.00200 (0.00)	0.00344 (0.00266)	NA	0.00200 (0.00)	0.00423 (0.00323)
Unit cell
a (Å)	90.4 (0.4)	90.4 (0.4)	90.5 (0.4)	90.4 (0.4)	90.4 (0.4)	90.5 (0.3)
c (Å)	45.3 (0.4)	45.3 (0.4)	45.3 (0.3)	45.3 (0.3)	45.3 (0.3)	45.3 (0.3)
Average T_pr Start/End	NA	NA	19.39 (7.68)/7.17 (3.38)	NA	NA	19.83 (7.54)/6.02 (2.59)
Average T_xy (mm²) Start/End	NA	NA	169.74 (132.56)/132.02 (104.08)	NA	NA	170.66 (144.52)/133.42 (109.58)
CC_1/2 (%)	81.3	79.6	86.5	91.8	95.7	98.2
Molecular replacement scoresᶜ
LLG	2837	5043	5291	8264	8364	9320
TFZ	10.5	13.0	13.4	13.7	13.8	14.0
Structure-refinement parameters
R (%)	39.4	28.0	23.5	21.1	20.3	17.8
R_free (%)	42.1	29.4	24.8	23.1	22.5	19.7
Bond r.m.s.d.	0.006	0.006	0.004	0.006	0.006	0.006
Angle r.m.s.d.	1.14	0.98	0.79	1.03	1.35	0.86
Ramachandran statistics
Favored (%)	98.0	98.0	98.0	98.0	98.0	98.0
Outliers (%)	0.0	0.0	0.0	0.0	0.0	0.0

ᵃ Values in parentheses correspond to highest resolution shell.

ᵇ Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.

ᶜ Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function (TFZ).

Figure 6.

Impact of post-refinement and number of images on electron density and model quality for myoglobin.

DOI: http://dx.doi.org/10.7554/eLife.05421.012

(A) Difference Fourier (mFo-DFc) omit maps around the heme group (which was omitted from molecular replacement and atomic model refinement) for the averaged merged, the mean-scaled partiality-corrected merged, and the post-refined myoglobin XFEL diffraction data sets consisting of all 757 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. The maps are contoured at 2.5 σ. (B) A plot of crystallographic R and R_free values vs resolution after atomic model refinement using the specified myoglobin diffraction data sets with inclusion of the heme group, SO4, and water molecules.

Omit maps were used to compare the quality of the diffraction data processed with the different methods. Specifically, we omitted the heme group from the molecular replacement search model (PDB ID: 3U3E) and in subsequent atomic model refinement, and calculated mFo-DFc difference maps (Figure 6). The real-space correlation coefficient of the heme group to the difference maps calculated from the post-refined diffraction data sets is higher than that calculated from the corresponding averaged merged diffraction data sets using the same set of diffraction images (Figure 6A). After initial model refinement with the heme group omitted, we included the heme group and well-defined water molecules and completed the atomic model refinement. The post-refined diffraction data set produced the best R and R_free values, followed by the mean-scaled partiality corrected, with the averaged merged diffraction data sets yielding the poorest refinement statistics. Overall, comparison of the CC_1/2 (Figure 5), omit map quality, and R values (Figure 6B) shows that post-refinement substantially improves scaling and correction of the diffraction data with respect to the mean-scaled partiality-corrected diffraction data set.
Thus, post-refinement against the iteratively improved reference set is superior to methods that only consider each diffraction image individually, even when the reflections are scaled and corrected for partiality.

100 diffraction images are sufficient for myoglobin structure refinement

Given the significant improvements obtained by post-refining all available images, we tested whether accurate diffraction data and refined atomic models could be obtained using fewer diffraction images by post-refining the randomly selected subset of 100 myoglobin diffraction images. Since this subset is only 80% complete, its CC_1/2 is poorer than that of the full diffraction data set consisting of 757 images, but it is nonetheless greatly improved relative to the corresponding non-post-refined diffraction data set (Figure 5). Moreover, the real-space correlation coefficient of the heme group with the difference map obtained with the post-refined 100 diffraction images is better than that calculated from the averaged merged diffraction data set using all 757 diffraction images (Figure 6A), despite the higher completeness and CC_1/2 value of the latter data set (Figure 5C). Thus, post-refinement both improves diffraction data quality for a given set of images and reduces the number of diffraction images required for structure determination and refinement from serial diffraction data.

Comparison with a synchrotron data set

We also compared the post-refined XFEL difference map (using all 757 diffraction images) with that calculated from an isomorphous synchrotron data set and model (PDB ID: 1JW8, excluding reflections past 1.35 Å resolution to make the resolution of the diffraction data sets equivalent). The omit maps and real-space correlation coefficients for the heme group were of comparable quality (Figure 7).

Figure 7.

Quality of synchrotron vs. post-refined XFEL diffraction data sets for myoglobin.

Difference Fourier (mFo-DFc) omit maps at 1.35 Å around the heme group (which was omitted from molecular replacement and model refinement), generated from (A) the synchrotron diffraction data and corresponding model with PDB ID 1JW8 (for comparison, all reflections past 1.35 Å resolution were excluded) and (B) the post-refined myoglobin XFEL diffraction data set using all 757 diffraction images (Table 1). The maps are contoured at 2.5 σ.

DOI: http://dx.doi.org/10.7554/eLife.05421.013

Hydrogenase

XFEL diffraction data for Clostridium pasteurianum hydrogenase were measured from eight crystals by the Peters (University of Montana) and SSRL-SMB groups using a goniometer-mounted fixed-target grid (Cohen et al., 2014). This experiment generated 177 diffraction images that could be merged to a completeness of 91%, with more than half of the diffraction images containing reflections to 1.6 Å (each diffraction image typically has approximately 3000 spots). We also used a randomly selected subset of 100 diffraction images to assess the effect of post-refinement on a smaller number of images. The CC_1/2 value improved significantly with post-refinement (Table 3). For quality assessment, the Fe-S cluster was omitted from both the molecular replacement search model (PDB ID 3C8Y) and subsequent atomic model refinement. The omit map densities for the post-refined diffraction data sets using the complete set of 177 diffraction images and the randomly selected subset of 100 diffraction images (83% complete) clearly show the entire Fe-S cluster, whereas the densities using the averaged merged data sets are much poorer (Figure 8A). Upon atomic model refinement with the Fe-S clusters and water molecules included, the R_work and R_free values for both post-refined data sets were significantly better than those for the averaged merged case (Figure 8B).

Table 3.

Statistics of post-refinement and atomic model refinement for hydrogenase

DOI: http://dx.doi.org/10.7554/eLife.05421.007

No. images	100		177
Resolution^a (Å)	45.0–1.60 (1.66–1.60)		45.0–1.60 (1.66–1.60)
Completeness^a (%)	83.0 (47.7)		91.2 (63.5)
Average no. observations per unique hkl^a	4.4 (1.7)		7.13 (2.3)
	Averaged-merged	Post-refined	Averaged-merged	Post-refined
Post-refinement parameters^b
Linear scale factor G₀	1.00 (0.00)	0.56 (1.27)	1.00 (0.00)	0.53 (1.22)
B (Å²)	0.0 (0.0)	10.0 (7.0)	0.0 (0.0)	10.5 (6.9)
γ₀ (Å⁻¹)	NA	0.00132 (0.00042)	NA	0.00126 (0.00041)
γ_y (Å⁻¹)	NA	0.00002 (0.00004)	NA	0.00002 (0.00004)
γ_x (Å⁻¹)	NA	0.00008 (0.00009)	NA	0.00008 (0.00011)
γ_e (Å⁻¹)	NA	0.00269 (0.00138)	NA	0.00288 (0.00160)
Unit cell
a (Å)	110.1 (0.4)	110.4 (0.3)	110.1 (0.4)	110.3 (0.4)
c (Å)	103.1 (0.4)	103.1 (0.2)	103.0 (0.4)	103.0 (0.2)
Average T_pr Start/End	NA	28.20 (10.86)/5.92 (2.35)	NA	26.47 (12.70)/5.22 (2.72)
Average T_xy (mm²) Start/End	NA	623.36 (314.57)/381.23 (198.44)	NA	564.30 (267.45)/372.28 (202.28)
CC_1/2 (%)	62.0	77.3	71.7	84.8
Molecular replacement scores^c
LLG	53,352.	9612.	7229.	11774.
TFZ	69.2	75.9	75.0	79.0
Structure-refinement parameters
R (%)	33.4	25.3	29.1	22.0
R_free (%)	36.7	28.9	31.3	25.0
Bond r.m.s.d. (Å)	0.006	0.007	0.007	0.007
Angle r.m.s.d. (°)	1.43	1.50	1.68	1.97
Ramachandran statistics
Favored (%)	96.3	97.0	97.0	96.7
Outliers (%)	0.0	0.0	0.0	0.0

^a Values in parentheses correspond to highest resolution shell.

^b Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.

^c Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function Z-score (TFZ).

Figure 8.

Impact of post-refinement on the hydrogenase diffraction data set.

(A) Difference Fourier (mFo-DFc) omit maps of one of the four Fe-S clusters (which were omitted in molecular replacement and atomic model refinement) for the averaged merged and the post-refined hydrogenase XFEL diffraction data sets consisting of all 177 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. The maps are contoured at 3 σ. (B) Crystallographic R_work and R_free values vs resolution after atomic model refinement using the specified diffraction data sets with inclusion of the three Fe-S clusters and water molecules.

DOI: http://dx.doi.org/10.7554/eLife.05421.014

Thermolysin

For thermolysin, we tested the entire deposited XFEL diffraction data set consisting of 12,692 diffraction images (Table 1) (Hattne et al., 2014; the diffraction data are publicly archived in the Coherent X-ray Imaging Data Bank, accession ID 23, http://cxidb.org), as well as a randomly selected subset of 2000 diffraction images. In this experiment, the crystal-to-detector distance gave a maximum resolution of 2.6 Å at the edge and 2.1 Å at the corners of the detector. Thus, a large number of diffraction images were required to achieve reasonable completeness of the merged data set for reflections in the 2.1–2.6 Å resolution range. As in the other two cases, post-refinement significantly improved the CC_1/2 value (Table 4). For quality assessment, zinc and calcium ions were omitted from the thermolysin molecular replacement search model (PDB ID: 2TLI) and subsequent atomic model refinement. Post-refinement improved the peak heights of both the zinc and calcium ions (Table 4).

Table 4.

Statistics of post-refinement and atomic model refinement for thermolysin

DOI: http://dx.doi.org/10.7554/eLife.05421.008

No. images	2000		12,692
Resolution^a (Å)	50.0–2.10 (2.18–2.10)		50.0–2.10 (2.18–2.10)
Completeness^a (%)	81.3 (24.3)		96.5 (74.8)
Average no. observations per unique hkl^a	32.8 (1.2)		176.6 (2.4)
	Averaged-merged	Post-refined	Averaged-merged	Post-refined
Post-refinement parameters^b
Linear scale factor G₀	1.00 (0.00)	1.65 (1.66)	1.00 (0.00)	2.26 (75.12)
B (Å²)	0.0 (0.0)	23.0 (33.8)	0.0 (0.0)	30.1 (59.8)
γ₀ (Å⁻¹)	NA	0.00052 (0.00040)	NA	0.00051 (0.00039)
γ_y (Å⁻¹)	NA	0.00001 (0.00003)	NA	0.00001 (0.00003)
γ_x (Å⁻¹)	NA	0.00002 (0.00004)	NA	0.00002 (0.00004)
γ_e (Å⁻¹)	NA	0.00110 (0.00129)	NA	0.00103 (0.00128)
Unit cell
a (Å)	92.9 (0.3)	92.9 (0.2)	92.9 (0.3)	92.9 (0.3)
c (Å)	130.5 (0.5)	130.4 (0.4)	130.5 (0.5)	130.4 (0.4)
Average T_pr Start/End	NA	1.15 (0.49)/0.55 (0.23)	NA	1.15 (0.52)/0.28 (0.13)
Average T_xy (mm²) Start/End	NA	168.13 (117.29)/167.72 (106.14)	NA	169.01 (122.20)/170.00 (122.57)
CC_1/2 (%)	77.7	93.5	94.3	98.8
Molecular replacement scores^c
LLG	3590.	4491.	5477.	6022.
TFZ	8.9	9.7	24.1	24.6
Structure-refinement parameters
R (%)	25.2	19.5	20.7	18.4
R_free (%)	29.1	24.0	23.9	21.1
Bond r.m.s.d. (Å)	0.004	0.002	0.002	0.002
Angle r.m.s.d. (°)	0.75	0.58	0.59	0.62
Ramachandran statistics
Favored (%)	95.9	94.6	95.2	94.9
Outliers (%)	0.0	0.0	0.0	0.0
Zinc peak height
Zn(1) (σ)	14.0	16.0	14.3	20.9
Zn(2) (σ)	3.6	5.1	7.7	7.1
Average peak height for calcium ions (σ)	9.7	11.3	14.2	16.1

^a Values in parentheses correspond to highest resolution shell.

^b Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.

^c Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function Z-score (TFZ).

Anomalous difference Fourier peak heights

The thermolysin diffraction data were collected at a photon energy just above the absorption edge of zinc, so we compared the anomalous signals with and without post-refinement. We used the same four diffraction data sets (i.e., averaged-merged, post-refined, with 2000 and 12,692 diffraction images, respectively), but processed them keeping Friedel mates separate. We refined the atomic model of thermolysin lacking zinc and calcium ions, and calculated anomalous difference Fourier maps (Figure 9). We observed two anomalous difference peaks near the active site above 3 σ using the post-refined data sets. In contrast, the second, smaller peak is not visible in the anomalous difference map using the ‘averaged-merged’ data set with 2000 images, and it had not been clearly visible in the previous data analysis of the thermolysin XFEL data set (PDB ID: 4OW3; Hattne et al., 2014). A previous thermolysin structure (PDB ID: 1LND; Holland et al., 1995) reported two zinc sites in the active site that correspond to the two anomalous-difference peaks observed with our post-refined data set. Although the crystallization condition used in our case did not have the high concentration (10 mM) of zinc used in the Holland et al. study, the second anomalous difference peak suggests the presence of this second zinc site.

Figure 9.

Impact of post-refinement on the anomalous signal in the thermolysin diffraction dataset.

Anomalous difference Fourier maps for the averaged merged (A, C) or the post-refined (B, D) thermolysin XFEL diffraction data sets consisting of all 12,692 diffraction images (A, B; Table 1) and a randomly selected subset of 2000 diffraction images (C, D). The anomalous difference Fourier maps were computed using phases from the thermolysin atomic model (but excluding zinc and calcium ions), refined separately against each diffraction data set. All maps are contoured at 3 σ; the peak heights for the two zinc ions are indicated.

DOI: http://dx.doi.org/10.7554/eLife.05421.015

Difference map reveals a bound dipeptide

When the molecular replacement model of thermolysin was refined against the post-refined data, we observed a well-connected electron density feature in the mFo-DFc map near the active site. In contrast, in the deposited model refined against the original XFEL data (Hattne et al., 2014; PDB ID: 4OW3), weak density features in this region were interpreted as water molecules. We found several examples of deposited thermolysin structures that have a dipeptide in this region (e.g., PDB entry 2WHZ with Tyr–Ile, PDB entry 2WI0 with Leu–Trp, and PDB entry 8TLN with Val–Lys). We interpreted the shape of the difference density as a Leu–Lys dipeptide, superimposed its structure, and calculated real-space correlation coefficients. The dipeptide had a higher real-space correlation coefficient (CC) with the maps calculated from the post-refined diffraction data than with those calculated from the averaged merged diffraction data. The electron density for both post-refined diffraction data sets is also better connected than that of the averaged merged diffraction data set (Figure 10A). The R_work and R_free values of the refined complete model using the post-refined diffraction data are lower than those using the averaged merged data throughout the entire resolution range (Figure 10B).

Figure 10.

Impact of post-refinement on the quality of electron density maps and models of thermolysin.

(A) Difference Fourier (mFo-DFc) maps revealing a Leu–Lys dipeptide near the zinc site for the averaged merged and the post-refined thermolysin XFEL diffraction data sets consisting of all 12,692 diffraction images (Table 1) and a randomly selected subset of 2000 diffraction images, respectively. The maps are contoured at 3 σ. (B) Crystallographic R_work and R_free values vs resolution after atomic model refinement using the specified diffraction data sets and with inclusion of two zincs, calcium ions, and the Leu–Lys dipeptide.

DOI: http://dx.doi.org/10.7554/eLife.05421.016

Effect of completeness

The completeness of the merged data sets has a direct impact on the overall quality of the diffraction data set (CC_1/2) and on the quality of the electron density maps and the refined structures (Tables 2–4, and Figure 6). When completeness is high, adding more images to increase the multiplicity of observations has only a modest impact on the quality of the final refined structures using the post-refined diffraction data. For example, when subsets ranging from 2000 to 12,000 thermolysin diffraction images (all subsets 100% complete at 2.6 Å) were post-refined, the peak height in the omit map for the larger of the two anomalous sites (Figure 11C), the CC_1/2 values, and the R values of the refined structures did not improve significantly when more than 8000 images were used.

Figure 11.

Convergence of structure refinements for the post-refined thermolysin XFEL data set at 2.6 Å resolution, using increasing numbers of diffraction images.

(A) Average number of observations per unique hkl. (B) CC_1/2 for merged subsets using 2000–12,000 images (100% completeness for all subsets). (C) Peak height (σ) in the omit map for the largest peak. (D) R_work and R_free after refining the thermolysin model without zinc and calcium ions against the corresponding post-refined diffraction data sets.

DOI: http://dx.doi.org/10.7554/eLife.05421.017

Discussion

Diffraction data collection using conventional X-ray sources typically employs the rotation method, in which a single crystal is rotated through a contiguous set of angles, and the diffraction patterns are recorded on a 2-D detector. If a full data set can be collected from a single crystal without a prohibitive level of radiation damage, diffraction data processing is a well-established and reliable process. In contrast, processing of XFEL diffraction data, which are collected from crystals in random orientations as ‘still’ diffraction images, requires new methods and implementations such as those described here. Improved data collection and processing methods, particularly those that can significantly reduce the amount of sample needed to assemble a complete and accurate diffraction data set, are important for making XFELs useful for certain challenging investigations in structural biology.

We developed a post-refinement method for still diffraction images, such as those obtained at XFELs, and implemented it in a new computer program, prime, that applies a least-squares minimization method to refine parameters as defined in our partiality model. Other post-refinement methods for XFEL diffraction data have been described recently (Kabsch, 2014; White, 2014), but our implementation differs from these reports. Kabsch uses a partiality model in which an Ewald offset correction is defined as a Gaussian function of angular distance from the Ewald sphere. White uses the intersecting volume between the reflection and the limiting-energy Ewald spheres defined by the energy spectrum for the partiality calculation, and calculates the initial reference data set by averaging all observations without scaling. Neither report describes an application to experimental XFEL diffraction data, so we cannot compare these methods to the results presented here.

We have demonstrated here that our implementation of post-refinement substantially improves the quality of the diffraction data from three different XFEL experiments. Moreover, the resulting structures can be refined to significantly lower R_work and R_free values, with electron density maps that reveal novel features more clearly, than those using non-post-refined XFEL data sets. A key feature of our method is that the parameters that define the diffracted spot are iteratively refined against the reference set. This approach is superior to methods that only consider each diffraction image individually. Moreover, our post-refinement procedure allows accurate diffraction data sets to be extracted from a much smaller number of images (that is, a lower average number of observations per reflection) than is necessary without post-refinement. Thus, this development will make XFEL crystallography accessible to many challenging problems in biology for which sample quantity is a major limiting factor.

At present, it is difficult to assess the quality of the post-refined XFEL data studied here relative to conventional rotation data measured at a synchrotron. The comparison of myoglobin omit maps (Figure 7) suggests that the synchrotron data are perhaps somewhat better, but more systematic studies will be needed to understand the relative merits of the different data sets. We suspect that rotation data would be better due to the ability to directly measure full reflections (at least by summation of partials) without modeling partiality, which is still a relatively crude process (see below). However, a comparison between still data sets measured at a synchrotron and an XFEL is needed to deconvolute the effect of rotation vs other differences between these sources.

Our formulation of post-refinement employs the simplifying assumption that reflections are spherical volumes. More sophisticated models consider crystal mosaicity to have three components, each with a distinct effect on the reciprocal lattice point (Juers et al., 2007; Nave, 1998, 2014). First, the domain size (the average size of the coherently scattering mosaic blocks) produces reciprocal lattice points of constant, finite size: small domains produce large spots, while large domains produce small spots, as there is an inverse (Fourier) relation between spot size and domain size. Second, unit-cell variation among domains produces reflections that are spheres whose radii increase with distance from the origin. In cctbx.xfel, mosaicity (modeled as an isotropic parameter) and effective domain size are taken into account when predicting which reflections are in diffracting position prior to integration (Sauter et al., 2014; Sauter, 2015). Third, orientational spread among mosaic domains produces spots shaped like spherical caps. Each cap subtends a solid angle that depends on the magnitude of the spread. In addition, anisotropy in crystal mosaicity is not considered; this would require refining separate parameters along each lattice direction. Finally, the rugged energy spectrum that results from the SASE process of the XFEL is not yet considered in our current model. These issues will require future investigation.

Materials and methods

Partiality model

The observed intensity I_h(i) for observation i of Miller index h is a thin slice through a three-dimensional reflection. To calculate partiality, we assume that the measurement is an infinitely thin, circular sample of a spherical volume (Figure 1B). We assume a monochromatic beam as the starting point to define the Ewald offset correction Eoc. The Eoc of any reflection centered on the Ewald sphere is defined as 1; this position corresponds to the maximum partial intensity that could be measured for the reflection. The Eoc for any other position is defined as a function of the normal distance from the Ewald sphere to the center of the reciprocal lattice point (the offset distance, r_h), and of the reciprocal-lattice radius of the spot, r_s, which is a function of the crystal mosaicity and spectral dispersion (Figure 1B). The Eoc can be described by the ratio of the observed area (A_p) with a radius r_p to the Ewald-offset corrected area (A_0) with a radius r_s (Figure 1B).

The SASE spectrum emitted by the XFEL is broad and varies from shot to shot (Zhu et al., 2012). To calculate the Ewald sphere, we set the wavelength to be the centroid of the SASE spectrum recorded with each shot. For XFEL data measured with a seeded beam (Amann et al., 2012), the spectrum is narrow and constant from shot to shot, and this single value can be used in this case.

In order to model spectral dispersion and the possible effects of asymmetric beam divergence, we adapt the rocking curve model described in Winkler et al. (1979). The four-parameter function used for the rocking curve is

$$ r_s(\theta, \alpha) = r_s(\theta) + r_s(\alpha), $$

where the first term includes the contribution by spectral dispersion and the second term models beam anisotropy. Specifically,

$$ r_s(\theta) = \gamma_0 + \gamma_e \tan\theta, $$

where γ_0 is a parameter that is initially set to the r.m.s.d. of the Ewald offset calculated for all the reflections on a given image, γ_e represents the width of the energy spread and the unit-cell variation (the initial value of γ_e is calculated from the average energy spread), and θ is the Bragg angle. The second term is provided by:

$$ r_s(\alpha) = \left[ (\gamma_y \cos\alpha)^2 + (\gamma_x \sin\alpha)^2 \right]^{1/2}, $$

where α is the azimuthal angle going from meridional (α = 0) to equatorial (α = π/2). The values of γ_x and γ_y are initially set to 0. The distribution of r_h values for the myoglobin case with 757 images after post-refinement is shown in Figure 12. The parameters γ_0, γ_e, γ_x, and γ_y are refined within a microcycle (Figure 2).
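The four-parameter reflection radius can be illustrated with a short function. This is a minimal sketch, not the prime implementation: it assumes the radius takes the Winkler-type form r_s = γ_0 + γ_e tan θ plus an anisotropy term [(γ_y cos α)² + (γ_x sin α)²]^(1/2), and the function and argument names are hypothetical.

```python
import math


def reflection_radius(theta, alpha, gamma_0, gamma_e, gamma_x=0.0, gamma_y=0.0):
    """Reciprocal-lattice spot radius r_s (1/Angstrom) for one reflection.

    Assumed functional form (an illustration, not taken from the prime source):
      r_s = gamma_0 + gamma_e * tan(theta)
            + sqrt((gamma_y * cos(alpha))**2 + (gamma_x * sin(alpha))**2)
    theta is the Bragg angle and alpha the azimuthal angle, both in radians.
    """
    # Spectral-dispersion term: constant width plus a tan(theta)-dependent part.
    dispersion = gamma_0 + gamma_e * math.tan(theta)
    # Beam-anisotropy term: gamma_y dominates at alpha = 0, gamma_x at alpha = pi/2.
    anisotropy = math.hypot(gamma_y * math.cos(alpha), gamma_x * math.sin(alpha))
    return dispersion + anisotropy


# With gamma_x = gamma_y = 0 (their initial values) only the first term contributes.
r_start = reflection_radius(theta=0.1, alpha=0.0, gamma_0=0.0013, gamma_e=0.0027)
```

Because γ_x and γ_y start at 0, the radius at the start of refinement depends only on the Bragg angle; the azimuthal dependence appears once these two parameters move away from zero.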

Figure 12.

Distribution of the Ewald sphere offset r_h.

The histogram shows the distribution of r_h calculated after post-refinement for myoglobin using 757 diffraction images. The number of observations after applying the reflection selection criteria for merging and outlier rejections for this 1.35 Å data set is 1,136,447 (∼96% of the total observed reflections). The standard deviation is 0.0016 Å⁻¹ or approximately 0.12° (when calculated with the mean of the energy distribution).

DOI: http://dx.doi.org/10.7554/eLife.05421.018

Calculating the reciprocal lattice point offset

The crystal orientation is described in a right-handed coordinate system with the z-axis pointing to the source of the incident beam and the y-axis vertical (Figure 1A). We define the crystal orientation by rotations in the order θ_z, θ_y, θ_x about these axes. For each Miller index h(i), the reciprocal lattice point vector x(i) is obtained by applying orthogonalization and rotation matrices O and R:

$$ \mathbf{x}(i) = R \, O \, \mathbf{h}(i), $$

where R is the product of the rotation matrices R_i(θ_i), R_i being the rotation matrix for a rotation around the i-th axis, and O is the orthogonalization matrix constructed from the reciprocal unit-cell parameters (a*, b*, c*, α*, β*, γ*). As shown in Figure 1A, the displacement S(i) to x(i) from the center of the Ewald sphere is given by:

$$ \mathbf{S}(i) = \mathbf{x}(i) + \mathbf{S}_0, $$

where S_0 = (0, 0, −1/λ). The offset distance is thus the difference between the length of S(i) and the Ewald-sphere radius,

$$ r_h = |\mathbf{S}(i)| - \frac{1}{\lambda}. $$
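The offset calculation described in this section can be sketched numerically as follows. This is an illustration under stated assumptions rather than the prime code: the orthogonalization matrix is built with a common convention (a along x, b in the xy-plane), the rotation matrix is taken as R = R_z R_y R_x so that a rotation about the beam axis leaves the offset unchanged, and all function names are hypothetical.

```python
import numpy as np


def rotation(axis, angle):
    """Right-handed rotation matrix about the x, y, or z axis (angle in radians)."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


def reciprocal_orthogonalization(a, b, c, alpha, beta, gamma):
    """Matrix O mapping (h, k, l) to a Cartesian reciprocal vector (angles in degrees).

    Assumed convention: direct-space a along x and b in the xy-plane.
    """
    al, be, ga = np.radians([alpha, beta, gamma])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(c**2 - cx**2 - cy**2)
    direct = np.array([[a, b * np.cos(ga), cx],
                       [0.0, b * np.sin(ga), cy],
                       [0.0, 0.0, cz]])
    # Columns of inv(direct).T are the reciprocal basis vectors a*, b*, c*.
    return np.linalg.inv(direct).T


def ewald_offset(hkl, cell, angles, wavelength):
    """Offset r_h = |x + S_0| - 1/lambda for one reflection (1/Angstrom)."""
    theta_x, theta_y, theta_z = angles
    # Assumed composition order; the rotation about the beam (z) is applied last.
    R = rotation("z", theta_z) @ rotation("y", theta_y) @ rotation("x", theta_x)
    x = R @ reciprocal_orthogonalization(*cell) @ np.asarray(hkl, dtype=float)
    s0 = np.array([0.0, 0.0, -1.0 / wavelength])
    return np.linalg.norm(x + s0) - 1.0 / wavelength


# Hypothetical cubic cell, used only to exercise the function.
cell = (10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
r_h = ewald_offset((1, 0, 0), cell, angles=(0.0, 0.0, 0.0), wavelength=1.0)
```

A reflection lying exactly on the Ewald sphere gives r_h = 0; positive values lie outside the sphere and negative values inside it.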

The Ewald-offset correction function Eoc

We introduce a smooth approximation of the area ratio Eoc_sphere (see ‘Results’) in order to circumvent the undefined first derivative when the ratio is zero. We use a Lorentzian function (f_L) to model the radius as a function of distance from the Ewald sphere:

$$ f_L(r_h) = \frac{1}{\pi} \, \frac{\Gamma/2}{r_h^2 + (\Gamma/2)^2}. $$

The function is normalized so that f_L(r_h = 0) = 1.0 when the reciprocal-lattice point is centered on the Ewald sphere, so that

$$ f_{Ln}(r_h) = \frac{(\Gamma/2)^2}{r_h^2 + (\Gamma/2)^2}. $$

We then use the ratio of the observed area (A_p) with a radius r_p to the Ewald-offset corrected area (A_0) with a radius r_s (Figure 1B) that corresponds to the full width at half maximum (FWHM), Γ, in the Lorentzian function. Using the Lorentzian function to describe the falloff in radius as we move away from the Ewald sphere makes the Eoc function differentiable at r_h = r_s. For the reciprocal lattice volume being bound by a sphere of radius r_s centered on the reciprocal lattice point, the intersecting area of the volume is given by:

$$ A_p = \pi r_p^2, \qquad \text{where} \qquad r_p = \sqrt{r_s^2 - r_h^2}. $$

The Eoc_sphere is then given by the ratio of this intersecting area to the area when this reflection is centered on the Ewald sphere (A_0 = π r_s²),

$$ Eoc_{sphere} = \frac{A_p}{A_0} = \frac{r_s^2 - r_h^2}{r_s^2}. \qquad \text{(Equation 11)} $$

By setting the FWHM, Γ, proportional to the radius, r_s, at half Eoc_sphere,

$$ \Gamma = \sqrt{2} \, r_s, $$

we arrive at the Ewald-offset correction function (Figure 13A)

$$ Eoc = \frac{r_s^2}{2 r_h^2 + r_s^2}. \qquad \text{(Equation 14)} $$
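The relation between the sphere ratio and its Lorentzian approximation can be checked numerically. The sketch below assumes the closed forms that follow from the description above: the area ratio of a sphere cut at offset r_h is 1 − r_h²/r_s², which reaches one half at r_h = r_s/√2, so a Lorentzian of unit height with Γ = √2 r_s becomes r_s²/(2 r_h² + r_s²). The function names are hypothetical.

```python
import numpy as np


def eoc_sphere(r_h, r_s):
    """Area ratio A_p / A_0 for a sphere of radius r_s cut at offset r_h.

    The ratio is zero once the Ewald sphere no longer intersects the spot,
    which is where its first derivative is undefined.
    """
    ratio = 1.0 - (np.asarray(r_h, dtype=float) / r_s) ** 2
    return np.clip(ratio, 0.0, None)


def eoc_lorentzian(r_h, r_s):
    """Smooth approximation: unit-height Lorentzian with FWHM sqrt(2) * r_s."""
    r_h = np.asarray(r_h, dtype=float)
    return r_s**2 / (2.0 * r_h**2 + r_s**2)


# Slice at a fixed radius, scanning the offset across and beyond the spot.
r_s = 0.003
offsets = np.linspace(-0.006, 0.006, 13)
sphere_values = eoc_sphere(offsets, r_s)
lorentz_values = eoc_lorentzian(offsets, r_s)
```

Both functions equal 1 at zero offset and 0.5 at r_h = r_s/√2; they differ in the tails, where the Lorentzian stays positive and differentiable while the sphere ratio is clipped to zero.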

Figure 13.

The Ewald-offset correction function.

DOI: http://dx.doi.org/10.7554/eLife.05421.019

(A) Ewald-offset correction Eoc (Equation 14) viewed as a function of the reciprocal-lattice radius (r_s) and the offset distance (r_h). (B) A slice through Eoc at r_s = 0.003, comparing Eoc (Equation 14) and Eoc_sphere (Equation 11).

The use of this Lorentzian approximation to derive the Eoc function vs an actual sphere function, Eoc_sphere, is illustrated in Figure 13B.

Correction to full intensity

To adjust the observed still intensity to its equivalent at zero offset, we apply the Ewald-offset correction to the observed intensity,where I(i) is the observed partial intensity i of Miller index on image m, (i) is the Ewald-offset correction, and G is a scale function for image m. We then convert this maximum partial intensity to a full intensity estimate by correcting for the volume of the spot, a factor of :where Note that I(i) will be on an arbitrary scale, and appropriate scaling methods may be applied to place the data on a quasi-absolute scale prior to structure determination and refinement, as is done for conventional rotation data.

Refinement of crystal orientation, reflection width, and unit-cell parameters

We refine image m by first minimizing the target function:whereand the scale function G comprises a linear scale factor G0 and a B-factor: We apply a spot position restraint as a second target function in subsequent steps during a microcycle using the x, y positions determined by the spot-finding step of data processing (Hattne et al., 2014; Kabsch, 2014).where and are the observed and calculated spot centroids, respectively. The Levenberg–Marquardt (LM) algorithm from the scipy python library (Oliphant, 2007), which is a combination of the gradient descent and the Gauss–Newton iteration, is used to minimize the target function residuals. The refinement of the unit-cell parameters (a, b, c, α, β, γ) takes crystal symmetry constraints into account to make the procedure more robust. After these iterative refinement cycles are complete, we apply the refined parameters to the reflection intensities of each still, and then merge the same reduced Miller indices (from all stills) to obtain the zero-offset still intensities, which are used for the new reference intensity set (see next section).
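The least-squares step for the scale parameters can be illustrated with SciPy's Levenberg–Marquardt driver. This is a simplified sketch on synthetic data, not the prime target function: it assumes a scale function of the form G = G0 exp(−2B sin²θ/λ²), assumes that observed intensities are modeled as G times the reference intensities, and ignores the Ewald-offset correction and spot-position restraint. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def scale(params, s_sq):
    """Assumed scale function G = G0 * exp(-2 * B * (sin(theta)/lambda)**2)."""
    g0, b = params
    return g0 * np.exp(-2.0 * b * s_sq)


def residuals(params, i_obs, i_ref, s_sq):
    """Assumed residual: observed minus scaled reference intensity."""
    return i_obs - scale(params, s_sq) * i_ref


# Synthetic, noise-free "image": 500 reflections between 20 and 1.6 Angstrom.
rng = np.random.default_rng(0)
d_spacing = rng.uniform(1.6, 20.0, size=500)
s_sq = 1.0 / (4.0 * d_spacing**2)          # (sin(theta)/lambda)^2 = 1 / (4 d^2)
i_ref = rng.uniform(100.0, 1000.0, size=500)
true_params = (0.55, 10.0)                  # arbitrary G0 and B for the demo
i_obs = scale(true_params, s_sq) * i_ref

# method="lm" selects the Levenberg-Marquardt algorithm (MINPACK).
fit = least_squares(residuals, x0=[1.0, 0.0], args=(i_obs, i_ref, s_sq), method="lm")
g0_fit, b_fit = fit.x
```

Starting from G0 = 1 and B = 0, the values listed for the averaged-merged columns of the tables, the fit recovers the parameters used to generate the synthetic intensities. In practice each parameter group would be refined in turn within a microcycle.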

Reflection selection criteria

At each step in a microcycle, the user can select reflections that are used for post-refinement of a parameter group using the following criteria: resolution range, signal strength (I/σ(I)), and the Ewald offset correction value. In addition to these selection criteria, deviations from the target unit-cell dimensions (specified as a fraction of each dimension) can also be used in the merging step so that only diffraction patterns with acceptable unit-cell dimension values are included in the merged reflection set. Each post-refinement parameter group can have its own separate set of reflection selection criteria.
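The three selection criteria can be expressed as a boolean mask over the reflections of one image. The sketch below is only an illustration; the function name, argument names, and default thresholds are hypothetical and are not taken from prime.

```python
import numpy as np


def select_reflections(d_spacing, intensity, sigma, eoc,
                       d_min=1.5, d_max=50.0, min_i_over_sigma=1.0, min_eoc=0.1):
    """Return a boolean mask of reflections passing all three criteria.

    d_spacing : resolution of each reflection (Angstrom)
    intensity, sigma : observed intensity and its estimated error
    eoc : Ewald-offset correction value of each reflection (0 to 1)
    The default thresholds are placeholders chosen for this example.
    """
    d_spacing = np.asarray(d_spacing, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    eoc = np.asarray(eoc, dtype=float)
    in_range = (d_spacing >= d_min) & (d_spacing <= d_max)   # resolution range
    strong = intensity / sigma >= min_i_over_sigma           # signal strength
    near_sphere = eoc >= min_eoc                             # Ewald offset correction
    return in_range & strong & near_sphere


mask = select_reflections(d_spacing=[2.0, 1.0, 3.0, 4.0],
                          intensity=[100.0, 100.0, 1.0, 100.0],
                          sigma=[10.0, 10.0, 10.0, 10.0],
                          eoc=[0.9, 0.9, 0.9, 0.01])
```

Since each parameter group can have its own criteria, a caller would evaluate this mask once per group with different thresholds.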

Merging procedure

Starting from the observed intensities, we obtain the full-volume intensity, I(i), from I(i) by first applying the Ewald offset correction (Equation 15) and then the full-intensity correction (Equation 16). Prior to merging equivalent observations, we detect outliers using an iterative rejection scheme, discarding reflections with intensity more or less than a user-specified cutoff (3 σ default, where σ is defined as the standard deviation of the distribution of the full reflections I). Finally, in order to obtain the merged reflection set, we calculate 〈I〉 from the intensity of reflections with the same reduced Miller indices using the sigma-weighted average:whereand is derived from the calculation of error: Since G is a function of G and B, and Eoc is a function of crystal orientation, mosaicity, and unit-cell parameters, the error estimates for G can be further calculated as:and ΔEoc2 can be calculated similarly by summing all over products of partial derivatives and errors estimated for each parameter in the Eoc function (square root of the diagonal elements of the covariance matrix). We use CC as a quality indicator for the diffraction data sets (Diederichs and Karplus, 2013). We calculate CC by randomly partitioning all (partial) intensity observations of a given reflection into two groups. We reject any reflections with fewer than four observations; for all other reflections, we merge the observations in each group using Equation 20. CC is then calculated as the correlation between these two independently merged diffraction data sets.
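The merging and CC_1/2 steps can be sketched as follows. This is an illustration on synthetic data, not the prime implementation: it assumes inverse-variance (1/σ²) weights for the sigma-weighted average, uses a simple iterative rejection about the mean, and omits error propagation from the refined parameters. The function names and the dictionary layout are hypothetical.

```python
import numpy as np


def weighted_mean(intensities, sigmas):
    """Sigma-weighted average, assuming weights of 1 / sigma**2."""
    w = 1.0 / np.square(np.asarray(sigmas, dtype=float))
    return np.sum(w * np.asarray(intensities, dtype=float)) / np.sum(w)


def reject_outliers(intensities, sigmas, cutoff=3.0):
    """Iteratively drop observations further than cutoff std. dev. from the mean."""
    i = np.asarray(intensities, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    while i.size > 2:
        keep = np.abs(i - i.mean()) <= cutoff * i.std()
        if keep.all():
            break
        i, s = i[keep], s[keep]
    return i, s


def cc_half(observations, rng, min_obs=4):
    """CC_1/2 from a random split of the observations of each reflection.

    observations maps a reduced Miller index to (intensities, sigmas).
    Reflections with fewer than min_obs observations are skipped.
    """
    half_a, half_b = [], []
    for intensities, sigmas in observations.values():
        i, s = reject_outliers(intensities, sigmas)
        if i.size < min_obs:
            continue
        order = rng.permutation(i.size)
        a, b = order[: i.size // 2], order[i.size // 2:]
        half_a.append(weighted_mean(i[a], s[a]))
        half_b.append(weighted_mean(i[b], s[b]))
    return np.corrcoef(half_a, half_b)[0, 1]


# Synthetic data: 200 reflections, 8 full-intensity estimates each, sigma = 20.
rng = np.random.default_rng(1)
obs = {}
for n, true_i in enumerate(rng.uniform(10.0, 1000.0, size=200)):
    obs[(n, 0, 0)] = (true_i + rng.normal(0.0, 20.0, size=8), np.full(8, 20.0))
cc = cc_half(obs, rng)
```

With eight observations per reflection, each half contains four, which is why reflections with fewer than four observations in total are excluded before the split.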

Partial derivatives of the diffraction parameters

Letfor observed partial intensity i of miller index .

Scale factor, G and B.

The derivatives of function g with respect to G: The derivatives of function g with respect to B:

Crystal rotation angles (θ, θ, θ).

Although three rotation angles θ, θ, θ can be refined, a rotation around the beam direction (z-axis) has no component on the reciprocal-lattice offset (rh) from the Ewald sphere—therefore, the derivative with respect to θ is 0. The partial derivatives with respect to the remaining parameters can be derived in a similar way—here, only the derivatives with respect to are θ given.whereand R is the rotation matrix of the still image. The derivatives of the g function (Equation 24) with respect to θ and the unit-cell parameters can be calculated by substituting the last partial derivatives of R with the appropriate ones.

Unit-cell parameters

For unit-cell parameters, constraints imposed by crystallographic space groups are applied during the refinement—e.g., tetragonal systems only have two free parameters (a and c) since a = b and α = β = γ = 90. Other restraint conditions, such as allowable refinement limits of the unit-cell dimensions, can also be applied as a ‘penalty terms’ in the least-squares refinement. The partial derivatives with respect to each unit-cell parameter in reciprocal units (here, is given and ):where , , and are as derived in (2) and

Reflection radius, r

The reflection radius that accommodates effects of crystal mosaicity and spectral dispersion, described by the four parameters, γ0, γ, γ, and γ, has following derivatives: For ,where is derived in (Equation 27) and For γ and γ, the and are the same as derived for γ and

Polarization correction

The XFEL beam is nearly 100% polarized in the horizontal direction. The optics at both the LCLS XPP and CXI stations do not introduce additional polarization. To account for the polarization of the primary beam, for a given reflection, we consider the angle between the sample reflection plane formed by the vector and the -z-axis, and the laboratory horizontal (Figure 14).

Figure 14.

Geometry of the incident and diffracted beam for polarization correction.

DOI: http://dx.doi.org/10.7554/eLife.05421.020

The diagram shows a reflection on a plane formed by its reciprocal-space vector and the -z-axis at angle . This reflection is affected by the polarization of the incoming primary beam in both the horizontal (x) and vertical (y) directions. DOI: http://dx.doi.org/10.7554/eLife.05421.020 As described in Kahn et al. (1982), the beam I0 incident on the sample crystal can be described in terms of two components, one parallel (σ) and the other perpendicular (π) to the plane of reflection: Each of these components is affected by the polarization of the primary beam in both the horizontal (x) and vertical (y) directions. Using fx and fy as the fractions horizontal and vertical in the laboratory frame (fx + fy = 1),andwhere f and f are the polarization fractions in the x and y directions. After reflection, only I is attenuated: By substituting I and I from Equations 30 and 31 in Equation 32, we arrive atwhere the bracketed expression is P (Kahn et al., 1982).

Molecular replacement and atomic model refinement protocol

To ensure that atomic model refinements against the various diffraction data sets were as comparable as possible, we used a standard semi-automated solution and refinement protocol. First, we performed molecular replacement phasing with known structures as search models (PDB ID 3U3E for myoglobin, 3C8Y for hydrogenase, and 2TLI for thermolysin) with all heteroatoms, water molecules, and ligands removed. Molecular replacement was carried out with Phaser (McCoy et al., 2007) using default settings, with r.m.s.d. set to 0.8. The resulting solutions were then refined using phenix.refine (Afonine et al., 2012) in two cycles. In the first cycle, we carried out rigid-body refinement, positional (xyz) refinement with automatic correction of Asn, Gln and His sidechain orientations, and atomic displacement parameter (ADP) refinement. We then used the difference density maps for missing ligands and heteroatoms obtained from this cycle to calculate real-space correlation coefficients, using phenix.get_cc_mtz_pdb from the PHENIX software suite (Adams et al., 2010) for myoglobin and thermolysin and the program ‘Map Correlation’ from the CCP4 software (Winn et al., 2011) for hydrogenase. These omit difference density maps are shown in Figures 6, 7, 8, 10. In the second cycle, all ligands and heteroatoms were placed in the difference density maps and combined with the refined structure from the first cycle using Coot (Emsley et al., 2010). Positional and ADP refinement with target-weight optimization and water update was then carried out with these complete models. The structures were validated with MolProbity (Chen et al., 2010). Final refinement statistics (Tables 2, 3, 4) were analyzed with phenix.polygon (Urzhumtseva et al., 2009) and found to be within the acceptable range for other structures at similar resolutions.
For the thermolysin structure obtained from anomalous diffraction data (processed keeping Friedel pairs separate), only one cycle of atomic model refinement was carried out. All figures were made in PyMOL (The PyMOL Molecular Graphics System, Version 1.5.0.4 Schrödinger, LLC.).
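For readers unfamiliar with the real-space correlation coefficient used in the protocol above, the quantity can be sketched as a Pearson correlation between two sets of density values sampled at the same grid points. This is a generic illustration and makes no claim about how phenix.get_cc_mtz_pdb or the CCP4 ‘Map Correlation’ program are implemented; the function name and the flat-list representation of the density values are assumptions made for the example.

```python
import math


def real_space_cc(map_a, map_b):
    """Pearson correlation between two equally sampled density maps.

    map_a, map_b -- sequences of density values at the same grid points,
                    e.g. restricted to the region around a ligand.
    """
    n = len(map_a)
    if n == 0 or n != len(map_b):
        raise ValueError("maps must be non-empty and of equal length")
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(map_a, map_b))
    var_a = sum((x - mean_a) ** 2 for x in map_a)
    var_b = sum((y - mean_b) ** 2 for y in map_b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a flat map")
    return cov / math.sqrt(var_a * var_b)
```

The value is insensitive to the overall scale and offset of either map, which is why it is a convenient way to compare omit density against density calculated from a model.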

Computer program

The computer program, prime, is implemented as a part of the cctbx computational crystallography toolbox (Grosse-Kunstleve et al., 2002). Download and installation instructions are available on the cctbx website (http://cctbx.sourceforge.net).

Note added at proof

Subsequent to acceptance of this article, a paper was published by Ginn et al. (2015) describing an alternative to the orientation refinement method of Sauter et al. (2014), along with partiality estimation for each individual image, but without post-refinement.

Decision letter

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for sending your work entitled “Enabling X-ray Free Electron Laser Crystallography for Challenging Biological Systems from a Limited Number of Crystals” for consideration at eLife. Your article has been favorably evaluated by John Kuriyan (Senior editor) and three reviewers, one of whom, Stephen Harrison, is a member of our Board of Reviewing Editors. The Reviewing editor and the other reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.

The manuscript represents a substantial advance in the processing of XFEL diffraction data, through the introduction of postrefinement methods. Although it provides no direct comparison with other procedures (from Kabsch or White), data for the three test cases considered show considerably improved statistics. Moreover, the total number of frames required to compile a data set is very much smaller than with the so-called “Monte Carlo” method. The manuscript is well written and thorough (especially in the provision of equations in the Methods section).
A possible, but forgivable, omission was a more challenging test case that could demonstrate a genuine increase in the effective resolution of a dataset from the new procedures. The reviewers had the following modest concerns, questions and requests:

A. Theory:

1) Sphere model for Eoc. The basic idea of postrefinement is that the ratio of the intensity of a partially recorded reflection to its value in a properly corrected and scaled reference data set is a very sensitive measure of the orientation and unit-cell parameters of the crystal and of diffracting-range parameters such as mosaic spread, energy dispersion, and range of unit-cell parameters within the diffracting volume. Equations (1) and (2) summarize the application to XFEL stills, with Eoc as the critical function requiring definition and evaluation. The authors choose a sphere model for Eoc, which is reasonable for cases in which the diffracting range is dominated by energy dispersion and unit-cell variation. For a mosaic crystal with no variation in unit cell across the diffracting volume and mosaicity higher than the energy dispersion, the reciprocal-space shape of the diffraction spot will not be spherical; it will intersect the Ewald sphere as an arc, since the Bragg angle and hence the distance of any component of the spot from the origin of reciprocal space will be (ex hypothesi) invariant, while the range of azimuthal angles of the spot on the detector will depend on the mosaic spread (assumed to be non-zero). With cryopreserved crystals, the assumption that a combination of unit-cell variation and energy dispersion dominates is almost certainly a good one, but it may not hold for tiny crystals at ambient temperature in an injected beam. Anisotropy of some of the parameters may also make other shapes a better fit. The approach in the paper is, of course, generalizable to other shapes (with much “hairier” expressions for Eoc and its derivatives).
In any case, the authors should discuss the assumptions that go into the sphere approximation.

2) Lorentz factor. A clear discussion of Lorentz factor is important, to give the paper full archival value as a complete treatment of the intensity correction problem. Formally, there is no Lorentz factor for a still. This statement is easy to prove using the “sinc” formula given in Equation 1 of the cited article by Kirian et al. (2010). If two different relps lie precisely on the Ewald sphere, then the value of the sinc function is simply equal to the square of the number of unit cells, regardless of resolution or any other geometric factor. All that remains is the polarization (which is not a Lorentz factor) and the incident intensity, which is the same for every spot. The only terms that remain hkl-dependent are the structure factor F, and the solid angle subtended by a pixel. The latter has some semblance to a Lorentz factor, but disappears upon pixel integration if the detector is corrected to be spherical. The spreading out of the spot due to mosaic spread and spectral dispersion in reciprocal space could be considered a Lorentz factor, but in the context of the present work, this should be part of the “partiality”.

B. Questions:

1) In the test cases, the data quality for the subset of images (e.g. 2,000 for thermolysin) is clearly lower than using the entire dataset. Is there any indication of convergence when considering data quality metrics vs the number of images included, or does inclusion of all images always give the best data?

2) The Discussion section is relatively brief. Even with the improved processing, the data quality falls significantly short of what would be expected for conventional SR rotation data collection. Does the analysis provide any pointers to the remaining major sources of error?

3) Figure 6 (myoglobin data): For the high resolution terms, post-refinement appears to make the data worse as judged by the R and Rfree metrics. Why?
4) There are many fewer spots per image for thermolysin than for the other two datasets. What is the definition of a “spot” in this context?

5) It is not clear if a separate resolution limit is applied to each image during the final merging step. Can this be clarified?

6) Figure 9: What is the second peak that is clearly visible when all images are used? Perhaps it would be useful to quote the largest “noise” peak as well as that for the Zn.

7) Table 3: The hydrogenase data were collected with a seeded beam, and yet the term representing the energy dispersion γe is larger than that for thermolysin and almost as large as for the myoglobin data. Why?

C. Request: The paper should have a complete list of all the parameters and symbols in the equations and their definitions (as Acta Cryst may still do and certainly used to do). Many of the parameters (such as theta(x) and theta(y)) were defined only in the figures, and it might indeed clutter the text to define each of them immediately after their first appearance in equation (1).

Author response

A. Theory:

1) Sphere model for Eoch. The basic idea of postrefinement is that the ratio of the intensity of a partially recorded reflection to its value in a properly corrected and scaled reference data set is a very sensitive measure of the orientation and unit-cell parameters of the crystal and of diffracting-range parameters such as mosaic spread, energy dispersion, and range of unit-cell parameters within the diffracting volume. Equations (1) and (2) summarize the application to XFEL stills, with Eoch as the critical function requiring definition and evaluation. The authors choose a sphere model for Eoc, which is reasonable for cases in which the diffracting range is dominated by energy dispersion and unit-cell variation.
For a mosaic crystal with no variation in unit cell across the diffracting volume and mosaicity higher than the energy dispersion, the reciprocal-space shape of the diffraction spot will not be spherical; it will intersect the Ewald sphere as an arc, since the Bragg angle and hence the distance of any component of the spot from the origin of reciprocal space will be (ex hypothesi) invariant, while the range of azimuthal angles of the spot on the detector will depend on the mosaic spread (assumed to be non-zero). With cryopreserved crystals, the assumption that a combination of unit-cell variation and energy dispersion dominates is almost certainly a good one, but it may not hold for tiny crystals at ambient temperature in an injected beam. Anisotropy of some of the parameters may also make other shapes a better fit. The approach in the paper is, of course, generalizable to other shapes (with much “hairier” expressions for Eoch and its derivatives). In any case, the authors should discuss the assumptions that go into the sphere approximation.

The spherical model used in this work is indeed a crude approximation of the diffraction spots. In the Discussion, we now describe the factors that contribute to spot shape and the consequent limitations of our model that will need to be addressed in the future.

2) Lorentz factor. A clear discussion of Lorentz factor is important, to give the paper full archival value as a complete treatment of the intensity correction problem. Formally, there is no Lorentz factor for a still. This statement is easy to prove using the “sinc” formula given in Equation 1 of the cited article by Kirian et al. (2010). If two different relps lie precisely on the Ewald sphere, then the value of the sinc function is simply equal to the square of the number of unit cells, regardless of resolution or any other geometric factor. All that remains is the polarization (which is not a Lorentz factor) and the incident intensity, which is the same for every spot.
The only terms that remain hkl-dependent are the structure factor F, and the solid angle subtended by a pixel. The latter has some semblance to a Lorentz factor, but disappears upon pixel integration if the detector is corrected to be spherical. The spreading out of the spot due to mosaic spread and spectral dispersion in reciprocal space could be considered a Lorentz factor, but in the context of the present work, this should be part of the “partiality”.

We agree that there is no Lorentz correction for a stationary crystal and monochromatic beam. We had tried to follow the discussion of Kabsch (2014) on this point, but now we explicitly note that there is no Lorentz correction for a still, and have removed the discussion of the Kabsch paper on this topic.

B. Questions:

1) In the test cases, the data quality for the subset of images (e.g. 2,000 for thermolysin) is clearly lower than using the entire dataset. Is there any indication of convergence when considering data quality metrics vs the number of images included, or does inclusion of all images always give the best data?

In the last section of Results we now describe a comparison of thermolysin diffraction data sets merged using 2,000–12,000 images. To avoid potential differences arising from different levels of completeness, which would confound this analysis, we truncated the diffraction data at 2.6 Å to ensure that each of these sets was 100% complete. Comparison of CC values, electron density maps and model R values shows that there is little improvement beyond 8,000 images.

2) The Discussion section is relatively brief. Even with the improved processing, the data quality falls significantly short of what would be expected for conventional SR rotation data collection. Does the analysis provide any pointers to the remaining major sources of error?

This is an excellent question, but we do not feel that we can directly compare SR rotation and XFEL data at this point.
The one comparison that we present (Figure 7) suggests that the SR data are at least somewhat better, but it is difficult to quantify. It is likely that rotation data would be better due to the ability to directly measure full reflections (at least by summation of partials) without modeling partiality, which is still a relatively crude process. However, we believe that a comparison between still diffraction data sets collected at SR and XFELs is needed to deconvolute the effect of rotation vs. other differences between SR and XFEL sources. This will be a subject of future investigation. We have added a brief discussion of these issues to the new Discussion.

3) Figure 6 (myoglobin data): For the high resolution terms, post-refinement appears to make the data worse as judged by the R and Rfree metrics. Why?

We assume that the question refers to the 100-image subset; the full post-refined 757-image set has lower or comparable R values in all bins. The 100-image set is the minimum number of images required for successful molecular replacement and an interpretable omit map (the heme group). While the R and Rfree values of the 757-image set (97.7% completeness) improved in all resolution shells, those of the 100-image set improved only to approximately 1.7 Å, where the completeness drops below 90%. We observed that completeness has an impact on the post-refinement procedure and the post-refined data sets. We now note this effect in the last section of Results, “Effect of completeness”.

4) There are many fewer spots per image for thermolysin than for the other two datasets. What is the definition of a “spot” in this context?

We now describe the criteria for “spot” definition in the subsection headed “Preparation of the observed intensities”.

5) It is not clear if a separate resolution limit is applied to each image during the final merging step. Can this be clarified?

The cctbx.xfel program applies separate resolution cutoffs on each image. This is now noted in the Results section.
6) Figure 9: What is the second peak that is clearly visible when all images are used? Perhaps it would be useful to quote the largest “noise” peak as well as that for the Zn.

We thank the reviewer for pointing out this feature, which turns out not to be noise. We suggest that the second anomalous peak may indicate a second zinc ion: indeed, a previous thermolysin structure (PDB ID: 1LND; Holland et al., 1995) has two zinc ions bound to the same active site, and their locations match the anomalous peaks observed in the post-refined maps of Figure 9. We re-refined the thermolysin structure with two zinc ions (the refined B-factors for the first zinc ion with occupancy 1.0 and the second zinc ion with occupancy 0.5 are 24.4 and 30.9, respectively). Interestingly, adding the second zinc ion resulted in an improvement of the difference density of the dipeptide near the zinc sites (see Table 4 for updated refinement statistics). Thus, in addition to the missing dipeptide in the original structure (PDB ID: 4OW3; Hattne et al., 2014), this adds another feature that was not clearly visible before the post-refinement procedures.

7) Table 3: The hydrogenase data were collected with a seeded beam, and yet the term representing the energy dispersion γe is larger than that for thermolysin and almost as large as for the myoglobin data. Why?

We mistakenly thought that these data had been measured with a seeded beam, as a single energy value was present in the header records of each frame. However, after conferring with the experimental team that collected the data (Cohen et al., 2014), we discovered that in fact the data were collected with the usual SASE spectrum; there was a hardware problem that prevented recording the energy spectrum per frame. We have revised our manuscript accordingly.

C. Request: The paper should have a complete list of all the parameters and symbols in the equations and their definitions (as Acta Cryst may still do and certainly used to do).
Many of the parameters (such as theta(x) and theta(y)) were defined only in the figures, and it might indeed clutter the text to define each of them immediately after their first appearance in equation (1).

We have added a full list of parameters and symbols with their definitions in the Notation section.

28 in total

1. Potential for biomolecular imaging with femtosecond X-ray pulses.

Authors: R Neutze; R Wouts; D van der Spoel; E Weckert; J Hajdu
Journal: Nature Date: 2000-08-17 Impact factor: 49.962

2. Changes to crystals of Escherichia coli beta-galactosidase during room-temperature/low-temperature cycling and their relation to cryo-annealing.

Authors: Douglas H Juers; Jeffrey Lovelace; Henry D Bellamy; Edward H Snell; Brian W Matthews; Gloria E O Borgstahl
Journal: Acta Crystallogr D Biol Crystallogr Date: 2007-10-17

3. Nanoflow electrospinning serial femtosecond crystallography.

Authors: Raymond G Sierra; Hartawan Laksmono; Jan Kern; Rosalie Tran; Johan Hattne; Roberto Alonso-Mori; Benedikt Lassalle-Kaiser; Carina Glöckner; Julia Hellmich; Donald W Schafer; Nathaniel Echols; Richard J Gildea; Ralf W Grosse-Kunstleve; Jonas Sellberg; Trevor A McQueen; Alan R Fry; Marc M Messerschmidt; Alan Miahnahri; M Marvin Seibert; Christina Y Hampton; Dmitri Starodub; N Duane Loh; Dimosthenis Sokaras; Tsu-Chien Weng; Petrus H Zwart; Pieter Glatzel; Despina Milathianaki; William E White; Paul D Adams; Garth J Williams; Sébastien Boutet; Athina Zouni; Johannes Messinger; Nicholas K Sauter; Uwe Bergmann; Junko Yano; Vittal K Yachandra; Michael J Bogan
Journal: Acta Crystallogr D Biol Crystallogr Date: 2012-10-18

Review 4. X-ray free electron lasers motivate bioanalytical characterization of protein nanocrystals: serial femtosecond crystallography.

Authors: Michael J Bogan
Journal: Anal Chem Date: 2013-03-21 Impact factor: 6.986

5. Femtosecond protein nanocrystallography-data analysis methods.

Authors: Richard A Kirian; Xiaoyu Wang; Uwe Weierstall; Kevin E Schmidt; John C H Spence; Mark Hunter; Petra Fromme; Thomas White; Henry N Chapman; James Holton
Journal: Opt Express Date: 2010-03-15 Impact factor: 3.894

6. Features and development of Coot.

Authors: P Emsley; B Lohkamp; W G Scott; K Cowtan
Journal: Acta Crystallogr D Biol Crystallogr Date: 2010-03-24

7. New Python-based methods for data processing.

Authors: Nicholas K Sauter; Johan Hattne; Ralf W Grosse-Kunstleve; Nathaniel Echols
Journal: Acta Crystallogr D Biol Crystallogr Date: 2013-06-18

8. Better models by discarding data?

Authors: K Diederichs; P A Karplus
Journal: Acta Crystallogr D Biol Crystallogr Date: 2013-06-15

9. MolProbity: all-atom structure validation for macromolecular crystallography.

Authors: Vincent B Chen; W Bryan Arendall; Jeffrey J Headd; Daniel A Keedy; Robert M Immormino; Gary J Kapral; Laura W Murray; Jane S Richardson; David C Richardson
Journal: Acta Crystallogr D Biol Crystallogr Date: 2009-12-21

10. Phaser crystallographic software.

Authors: Airlie J McCoy; Ralf W Grosse-Kunstleve; Paul D Adams; Martyn D Winn; Laurent C Storoni; Randy J Read
Journal: J Appl Crystallogr Date: 2007-07-13 Impact factor: 3.304
