X-ray and UV radiation-damage-induced phasing using synchrotron serial crystallography.

Nicolas Foos, Carolin Seuring, Robin Schubert, Anja Burkhardt, Olof Svensson, Alke Meents, Henry N Chapman, Max H Nanao.

Abstract

Specific radiation damage can be used to determine phases de novo from macromolecular crystals. This method is known as radiation-damage-induced phasing (RIP). One limitation of the method is that the dose of individual data sets must be minimized, which in turn leads to data sets with low multiplicity. A solution to this problem is to use data from multiple crystals. However, the resulting signal can be degraded by a lack of isomorphism between crystals. Here, it is shown that serial synchrotron crystallography in combination with selective merging of data sets can be used to determine high-quality phases for insulin and thaumatin, and that the increased multiplicity can greatly enhance the success rate of the experiment.

Keywords:  experimental phasing; genetic algorithms; radiation damage; radiation-damage-induced phasing; synchrotron serial crystallography

Year:  2018        PMID: 29652263      PMCID: PMC5892880          DOI: 10.1107/S2059798318001535

Source DB:  PubMed          Journal:  Acta Crystallogr D Struct Biol        ISSN: 2059-7983            Impact factor:   7.652


Introduction

Radiation induces many changes in macromolecular crystals. Amongst these is a reduction in occupancy or the movement of atoms, which is referred to as specific radiation damage. Specific radiation damage can be induced by X-rays or UV light and affects metals, Sγ atoms in disulfides, thiol linkages and terminal O atoms in carboxylates (with the latter only being induced by X-rays; Ravelli & McSweeney, 2000; Burmeister, 2000; Weik et al., 2000; Pattison & Davies, 2006). Specific radiation damage can be of major concern to practitioners of macromolecular crystallography (MX), but in some cases such damage can be used to determine phases experimentally (Ravelli et al., 2003, 2005; Zwart et al., 2004; Banumathi et al., 2004; Weiss et al., 2004; Schiltz et al., 2004; Ramagopal et al., 2005; de Sanctis & Nanao, 2012; de Sanctis et al., 2016). This technique is called radiation-damage-induced phasing (RIP) and, by analogy to single isomorphous replacement (SIR), two data sets are used to calculate differences in structure factors (between damaged and less damaged states). However, unlike in SIR, no soaking of heavy atoms is required. If the decrease in occupancy at specific sites is large enough and global radiation damage has been minimized, the positions of radiation damage can be determined. UV RIP generally has the advantage of inducing less global radiation damage than X-ray RIP (Nanao & Ravelli, 2006; de Sanctis et al., 2016). When performed on a single crystal, or indeed at the same position on a single crystal, RIP has the advantage of relatively high isomorphism between the damaged and undamaged data sets. This is a key difference between RIP and traditional isomorphous methods, in which the experiment is performed on different crystals and the introduction of a heavy atom frequently introduces non-isomorphism.
Depending on the ratio of specific to global damage, the number of sites and their susceptibility, a wide range of relative changes to intensities can be expected. Initial estimates of the maximal signal based on Crick & Magdoff (1956) suggested that even modest reductions to occupancies of 26% for six disulfide S atoms could lead to changes in intensity of 10% at 2θ = 0 (Crick & Magdoff, 1956; Ravelli et al., 2003). In practice, a wide range of R values between damaged and undamaged data sets have been observed: up to 14% overall for trypsin despite low (∼4%) internal R values (Nanao et al., 2005). This large signal differentiates RIP from the other dominant phasing method based on endogenous chemical groups: long-wavelength sulfur SAD. Thus, the potentially high signal and the lack of a requirement for chemical modification of crystals make RIP a potentially useful alternative to traditional isomorphous and anomalous methods. However, one key limitation of X-ray and UV RIP approaches is that a minimum of two complete data sets must normally be collected. Two solutions to this limitation are to collect one large data set and subdivide it into two sub-data sets in a ‘segmented RIP’ analysis (de Sanctis & Nanao, 2012) or to model specific damage as a function of dose, as in SHARP (Schiltz et al., 2004; Schiltz & Bricogne, 2008, 2010). In segmented RIP, one collects a large, high-total-dose data set; the first images collected are treated as a low-damage data set and the last images are treated as a damaged data set. Finally, in cases of large crystals, multiple positions can be collected from a single crystal, allowing the measurement of one complete low-damage data set prior to UV/X-ray exposure. However, the utility of this approach is limited by the trend towards smaller crystals, as well as by intra-crystal non-isomorphism.
In UV RIP experiments, the amount of damage depends on the UV source, on the composition of the unit cell and on the crystal volume. In particular, the limited light-penetration depth in macromolecular crystals is a significant challenge to the homogeneous illumination of larger crystals. Thus, using small crystals has significant advantages if complete data sets can be collected. While penetration depth is not an issue for X-ray damage, improvements to phasing can be expected if high-multiplicity data sets can be collected. To this end, we have employed recent developments in synchrotron serial crystallography (SSX) to greatly increase the recorded signal at a given dose by combining data from multiple crystals (Diederichs & Wang, 2017). A major challenge in implementing SSX-RIP is to deal efficiently with non-isomorphism between crystals. Simulated diffraction patterns for free-electron laser serial femtosecond crystallography (SFX), where there is no rotation during exposure, have indicated that such an approach is possible, but it has not yet been demonstrated experimentally (Galli, Son, White et al., 2015; Galli, Son, Barends et al., 2015). Here, we show for the first time that SSX can be used to successfully phase macromolecular crystals of thaumatin and insulin de novo by X-ray RIP and UV RIP, and explore the relationship between dose, multiplicity and RIP signal.

Methods

Crystallization

The thaumatin crystals used for the X-ray RIP experiment were prepared as described in Nanao et al. (2005). The cubic insulin crystals used for the UV RIP experiment were obtained from porcine insulin purchased from Sigma–Aldrich (catalogue No. I-5523). Crystals of cubic zinc-free insulin were grown via hanging-drop vapour diffusion by mixing 4.5 µl protein solution at a concentration of 1.5 mg ml⁻¹ in 0.05 M sodium phosphate, 0.01 M ethylenediaminetetraacetate trisodium salt (Na3EDTA), pH 10.4–10.8 with 1.5 µl reservoir solution [0.05 M sodium phosphate buffer, 0.01 M Na3EDTA, 20%(v/v) ethylene glycol, pH 10.4]. The mixture was equilibrated against 500 µl reservoir solution. Single crystals of ∼6 × 6 × 6 µm in size were obtained after 1–2 days at 298 K.

Crystal harvesting

Thaumatin samples were prepared using a buffer with glycerol as a cryoprotectant at a final concentration of 20%, and the crystal slurry of ∼20 × 20 × 20 µm crystals was then harvested with micro-meshes (MicroMeshes with 10 µm holes; MiTeGen catalogue No. M3-L18SP-10). Cubic insulin was directly harvested on silicon chips (Supplementary Fig. S1; Roedig et al., 2016 ▸, 2017 ▸).

Data collection

Data were collected at 100 K using a Dectris PILATUS3 2M detector on the ID23-2 microfocus beamline (fixed energy 14.2 keV) at the European Synchrotron Radiation Facility (Flot et al., 2010). UV illumination of insulin crystals was performed using high-power UV-LEDs as described by de Sanctis et al. (2016). Data collection was performed using the MeshAndCollect workflow (Zander et al., 2015). The RIP workflow uses this approach, but performs multiple collections at each position identified from the diffractive map. No explicit X-ray burn was implemented in this workflow, and the data-collection parameters were 100 frames of 0.1° oscillation with 30 ms exposure time at 8.74 × 10¹⁰ photons s⁻¹ and 0.8728 Å wavelength with a beam size of 10 × 8 µm, chosen such that the approximate dose regime would reach 1–4 MGy over the course of six data collections. The dose regime was estimated with RADDOSE3D based on the crystal dimensions and photon flux (Zeldin et al., 2013). This particular range was chosen based on previous work, which showed that the RIP signal is optimal at ∼2 MGy (Bourenkov & Popov, 2010; de Sanctis & Nanao, 2012). In each successive exposure 100 frames of 0.1° oscillation were collected, resulting in 10° sub-data sets. The same oscillation range was used for each sub-data set. The first exposure was then used as the ‘before’ data set and each subsequent exposure as an ‘after’ data set. The ‘before’ data set, while not damage-free, has the lowest dose. Within each pair, the ‘after’ data set is the higher-dose, more damaged data set.

Data processing

Data reduction was performed using XDS (Kabsch, 2010 ▸) through the GreNAdeS automated pipeline at the ESRF (Monaco et al., 2013 ▸) and data were reprocessed using the REFERENCE_DATASET keyword. All diffraction images have been deposited with Zenodo (https://doi.org/10.5281/zenodo.1035765). Because even in these model systems there can be some variation in data quality and isomorphism between the sub-data sets, selection of only some of the sub-data sets for merging was performed. This was performed using the CODGAS genetic algorithm (GA; Zander et al., 2016 ▸). CODGAS applies principles of biological natural selection in order to select which sub-data sets to merge, based on a target function which is composed of merging statistics [for example 〈I/σ(I)〉, R meas, CC1/2 and completeness]. Different potential merging solutions are randomly generated using default target-function weights, followed by rounds of optimization by maximizing the target function.
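The idea of selecting sub-data sets with a genetic algorithm can be illustrated with a toy example. The sketch below is not CODGAS and does not reproduce its target function: the names `fitness` and `select_subsets`, the single `quality` number standing in for merging statistics such as 〈I/σ(I)〉, R meas and CC1/2, the equal weights, and the truncation-selection scheme are all assumptions made for illustration.

```python
import random

def fitness(mask, datasets, n_unique, w_comp=1.0, w_qual=1.0):
    """Toy target: weighted completeness plus mean quality of the kept sub-data sets."""
    chosen = [d for keep, d in zip(mask, datasets) if keep]
    if not chosen:
        return 0.0
    covered = set().union(*(d["hkl"] for d in chosen))
    completeness = len(covered) / n_unique
    quality = sum(d["quality"] for d in chosen) / len(chosen)
    return w_comp * completeness + w_qual * quality

def select_subsets(datasets, n_unique, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Evolve boolean masks (True = merge this sub-data set) towards a higher toy target."""
    rng = random.Random(seed)
    n = len(datasets)
    # Random initial population of candidate merging solutions.
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    score = lambda m: fitness(m, datasets, n_unique)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection (keeps the best)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)             # single-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=score)

# Hypothetical input: 12 sub-data sets, each covering 60 of 200 unique reflections.
data_rng = random.Random(1)
toy = [{"hkl": set(data_rng.sample(range(200), 60)), "quality": data_rng.random()}
       for _ in range(12)]
best = select_subsets(toy, n_unique=200)
```

Because the initial population is random, different seeds can select different numbers of sub-data sets for a similar target value, which is the kind of stochastic variation in selection that a genetic algorithm introduces.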

Substructure determination

Each pair of data sets (‘before’ and ‘after’) was then treated as a standard RIP experiment, varying the scaling (K) of the before and after data sets in SHELXC, which offers a native implementation of the RIP phasing strategy, as described in Nanao et al. (2005 ▸), Ravelli et al. (2005 ▸) and Sheldrick (2010 ▸). Varying the scaling (K) and running SHELXC/D/E was performed using a Perl script. The sampling of K was from 0.97 to 1.01 in increments of 0.00211. Substructure determination was performed in SHELXD using NTRY 5000, SHEL 500 2.2 and FIND 9 for thaumatin, and NTRY 5000, SHEL 500 2.0 and FIND 6 for insulin. The high-resolution limits were chosen based on the resolution at which 〈d′/σ(d′)〉 drops below 1.5.
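The sampling of K can be sketched in a few lines. This is a minimal Python illustration, not the Perl script used in this work; the function name `k_values` and the choice of 20 evenly spaced samples are assumptions. With 20 samples between 0.97 and 1.01 the step is 0.04/19 ≈ 0.002105, which agrees with the stated increment of 0.00211 after rounding and reproduces the K values 0.97421, 0.98474 and 0.99737 quoted in the Results.

```python
def k_values(start=0.97, stop=1.01, n=20):
    """Return n evenly spaced scale factors K from start to stop inclusive."""
    step = (stop - start) / (n - 1)
    return [round(start + i * step, 5) for i in range(n)]

ks = k_values()
for k in ks:
    # In a real pipeline each K would be passed to a SHELXC/D/E run here;
    # that invocation is deliberately omitted from this sketch.
    print(f"K = {k:.5f}")
```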

Phasing and phase improvement

Phasing and phase improvement was performed in SHELXE using solvent flattening and five cycles of autobuilding (Sheldrick, 2010 ▸; Thorn & Sheldrick, 2013 ▸).

Refinement and a posteriori analysis

ANODE (Thorn & Sheldrick, 2011) was used for the determination of Fo − Fo model-phased RIP difference electron-density map peak heights. For both this calculation and the evaluation of phase errors, a refined atomic model was used. The refinement procedure was as follows. Molecular replacement was performed using MOLREP (Vagin & Teplyakov, 2010) with PDB entry 5fgt for thaumatin and PDB entry 9ins for insulin. The models were rebuilt manually in Coot and then refined using BUSTER (Emsley et al., 2010; Bricogne et al., 2011). The final refinement step was performed with the PDB_REDO webserver (Joosten et al., 2014) in both cases. The weighted mean phase errors (wMPE) were calculated using SHELXE with the -x option and the same refined model as was used in ANODE (Sheldrick, 2010). The substructure correctness was calculated with phenix.emma (with default parameters, except for ‘tolerance’, which was set to 1.5 Å), using a reference pseudo-atom substructure which was generated by ANODE with the FA data from SHELXC in RIP mode (Adams et al., 2010; Thorn & Sheldrick, 2011).

Results

Data quality

Each data set acquired from both thaumatin and insulin microcrystals in the MeshAndCollect workflow was merged using CODGAS to obtain complete data sets. The high-resolution limit was chosen as the highest-resolution bin with a CC1/2 above 25% (Karplus & Diederichs, 2012). The merging statistics indicated that all ‘before’ and ‘after’ data sets are of high quality, with high completeness, high CC1/2, high 〈I/σ(I)〉 and low R meas values (Tables 1 and 2). The variation in the number of sub-data sets selected for each case (Expo. X or Before_X, After_X) results from the stochastic nature of the GA initialization. In the thaumatin cases, the increasing number of sub-data sets used to obtain a full data set might be due to degradation of the individual sub-data-set quality owing to nonspecific radiation damage, i.e. more sub-data sets are required for equivalent data quality. The lack of completeness at low resolution (inner shell) of Expo. 5 and Expo. 6 for thaumatin could be attributed to an orientation bias of the crystals caused by the sample holder that was used and to the fact that only small oscillations were performed. High-resolution limits were selected based on the statistics of the last data set (‘Expo. 6’ for thaumatin and ‘After_UV’ for insulin), and the same resolution limits were used for all other data sets.
Table 1

Thaumatin X-ray RIP sub-data-set data-collection statistics

Exposures (Expo.) 1–6 were obtained by successive data collection executed through a single diffractive map determined by the MeshAndCollect workflow.

Data-set name | Expo. 1 | Expo. 2 | Expo. 3 | Expo. 4 | Expo. 5 | Expo. 6
Space group | P4₁2₁2 | P4₁2₁2 | P4₁2₁2 | P4₁2₁2 | P4₁2₁2 | P4₁2₁2
a, b, c (Å) | 58.31, 58.31, 150.98 | 58.33, 58.33, 151.13 | 58.42, 58.42, 151.06 | 58.43, 58.43, 151.21 | 58.52, 58.52, 151.34 | 58.31, 58.31, 150.96
α, β, γ (°) | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90
Cumulative dose per sub-data set (MGy) | 0.72 | 1.16 | 1.74 | 2.32 | 2.90 | 3.48
No. of sub-data sets (100 crystals collected) | 22 | 25 | 24 | 36 | 33 | 32
Resolution range (Å)
 Overall | 100–1.40 | 100–1.40 | 100–1.40 | 100–1.40 | 100–1.40 | 100–1.40
 Inner shell | 100–6.26 | 100–6.26 | 100–6.26 | 100–6.26 | 100–6.26 | 100–6.26
 Outer shell | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40
Total No. of reflections
 Inner shell | 9601 | 10575 | 10527 | 15744 | 14777 | 13935
 Overall | 806228 | 915459 | 878916 | 1324552 | 1219289 | 1172721
 Outer shell | 57767 | 65557 | 62992 | 94768 | 87347 | 83749
No. of unique reflections
 Overall | 52243 | 50116 | 50825 | 52478 | 48493 | 48745
 Inner shell | 706 | 653 | 677 | 707 | 548 | 632
 Outer shell | 3788 | 3656 | 3670 | 3798 | 3708 | 3535
Completeness (%)
 Inner shell | 99.9 | 92.4 | 95.6 | 99.6 | 76.9 | 89.4
 Outer shell | 100.0 | 96.2 | 96.3 | 99.7 | 96.8 | 93.4
 Overall | 99.9 | 95.7 | 96.8 | 99.8 | 91.9 | 93.3
Multiplicity
 Inner shell | 13.59 | 16.19 | 15.55 | 22.27 | 26.96 | 22.04
 Outer shell | 15.25 | 17.93 | 17.16 | 24.95 | 23.55 | 23.69
 Overall | 15.34 | 18.26 | 17.29 | 25.24 | 25.14 | 24.06
R merge (%)
 Inner shell | 5.6 | 5.4 | 6.1 | 5.5 | 5.6 | 5.2
 Outer shell | 224.0 | 218.2 | 222.6 | 262.7 | 262.4 | 276.1
 Overall | 18.1 | 21.2 | 17.6 | 20.3 | 18.7 | 20.5
R meas (%)
 Inner shell | 5.8 | 5.5 | 5.6 | 5.6 | 5.8 | 5.3
 Outer shell | 231.8 | 224.4 | 229.2 | 268.0 | 268.0 | 281.8
 Overall | 18.7 | 21.8 | 18.1 | 20.7 | 19.1 | 20.9
〈I/σ(I)〉
 Inner shell | 36.61 | 39.29 | 41.91 | 45.28 | 55.72 | 48.90
 Outer shell | 1.06 | 1.22 | 1.22 | 1.10 | 1.16 | 1.10
 Overall | 10.59 | 11.87 | 12.15 | 12.71 | 13.58 | 13.38
CC1/2 (%)
 Inner shell | 99.9* | 99.9* | 99.9* | 99.9* | 99.9* | 99.9*
 Outer shell | 28.1* | 35.2* | 38.7* | 28.2* | 31.6* | 33.1*
 Overall | 99.8* | 99.7* | 99.9* | 99.8* | 99.9* | 99.9*
Anomalous correlation coefficient
 Inner shell914313514
 Outer shell | 0 | −2 | 0 | −1 | 3 | −1
 Overall | −1 | −2 | 0 | 0 | 1 | 1
SigAno
 Inner shell | 0.800 | 0.893 | 0.844 | 0.945 | 0.896 | 0.978
 Outer shell | 0.660 | 0.663 | 0.682 | 0.664 | 0.688 | 0.651
 Overall | 0.769 | 0.773 | 0.784 | 0.780 | 0.790 | 0.785

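R merge = $\sum_{hkl}\sum_{i}|I_{i}(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_{i}I_{i}(hkl)$ and R meas = $\sum_{hkl}\{N(hkl)/[N(hkl) - 1]\}^{1/2}\sum_{i}|I_{i}(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_{i}I_{i}(hkl)$, where $I_{i}(hkl)$ is the ith observation of reflection hkl, $\langle I(hkl)\rangle$ is the mean of all observations of that reflection and $N(hkl)$ is the number of observations.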

CC1/2 values that are significant at the 0.1% level are marked by an asterisk.
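As a worked illustration of the merging R factors reported in the table, the sketch below computes both quantities from repeated intensity measurements using their standard definitions: R merge is the summed absolute deviation of each observation from the mean of its reflection divided by the summed intensities, and R meas applies an additional factor [N/(N − 1)]^(1/2) per reflection, where N is the number of observations. The function name `merging_r_factors` and the toy intensities are assumptions; a data-reduction program such as XDS computes these statistics internally, with resolution binning and outlier handling that are not reproduced here.

```python
from math import sqrt

def merging_r_factors(observations):
    """observations maps hkl -> list of repeated intensity measurements.

    Returns (R_merge, R_meas) as fractions, not percentages.
    """
    num_merge = num_meas = denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # a single observation carries no merging information
        mean = sum(intensities) / n
        dev = sum(abs(i - mean) for i in intensities)
        num_merge += dev
        num_meas += sqrt(n / (n - 1)) * dev   # multiplicity-corrected term
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom

# Hypothetical measurements for two reflections.
toy = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 1): [50.0, 54.0]}
r_merge, r_meas = merging_r_factors(toy)
print(f"R merge = {100 * r_merge:.1f}%, R meas = {100 * r_meas:.1f}%")
```

Because the correction factor is always greater than 1, R meas is never smaller than R merge for the same observations, and the two converge as multiplicity increases.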

Table 2

Cubic insulin UV RIP sub-data-set data-collection statistics

Before_UV.1 is the first data set obtained before UV-light exposure. Before_UV.2 is a second data set, without UV light to control for the effects of X-ray damage. After_UV is the data set obtained after UV-light exposure. Note that not all data sets from the MeshAndCollect procedure were used. For each final data set, the selection of sub-data sets to merge was performed using a genetic algorithm.

Data-set name | Before_UV.1 | Before_UV.2 | After_UV
Space group | I2₁3 | I2₁3 | I2₁3
a, b, c (Å) | 78.92, 78.92, 78.92 | 78.78, 78.78, 78.78 | 78.88, 78.88, 78.88
α, β, γ (°) | 90, 90, 90 | 90, 90, 90 | 90, 90, 90
Cumulative dose per sub-data set (MGy) | 0.43 | 0.86 | 1.29
No. of sub-data sets | 91 | 76 | 88
Resolution range (Å)
 Overall | 100–1.4 | 100–1.4 | 100–1.5
 Inner shell | 100–6.26 | 100–6.26 | 100–6.71
 Outer shell | 1.44–1.40 | 1.44–1.40 | 1.54–1.50
Total No. of reflections
 Inner shell | 18850 | 15562 | 14660
 Overall | 1616572 | 1333165 | 1219845
 Outer shell | 121080 | 98710 | 91770
No. of unique reflections
 Inner shell | 355 | 356 | 295
 Overall | 31315 | 31275 | 25409
 Outer shell | 2358 | 2336 | 1926
Completeness (%)
 Inner shell | 99.7 | 100.0 | 100.0
 Outer shell | 100.0 | 100.0 | 100.0
 Overall | 100.0 | 100.0 | 100.0
Multiplicity
 Inner shell | 53.09 | 43.71 | 49.69
 Outer shell | 51.34 | 42.25 | 47.65
 Overall | 51.62 | 42.62 | 48.08
R merge (%)
 Inner shell | 12.7 | 13.5 | 18.0
 Outer shell | 318.4 | 394.6 | 522.6
 Overall | 22.5 | 25.1 | 50.0
R meas (%)
 Inner shell | 12.8 | 13.7 | 18.2
 Outer shell | 321.6 | 399.4 | 528.1
 Overall | 22.7 | 25.4 | 50.5
〈I/σ(I)〉
 Inner shell | 52.63 | 47.62 | 36.45
 Outer shell | 2.21 | 1.64 | 1.62
 Overall | 18.67 | 16.47 | 13.45
CC1/2 (%)
 Inner shell | 100.0* | 99.7* | 99.8*
 Outer shell | 71.0* | 58.6* | 57.1*
 Overall | 99.9* | 99.8* | 99.8*
Anomalous correlation coefficient
 Inner shell | 29 | 24 | 10
 Outer shell | −7 | 1 | −2
 Overall | 1 | 2 | −1
SigAno
 Inner shell | 1.199 | 1.082 | 0.943
 Outer shell | 0.706 | 0.723 | 0.696
 Overall | 0.821 | 0.817 | 0.783

CC1/2 values that are significant at the 0.1% level are marked by an asterisk.

For each final data set, the selection of which sub-data sets to merge was performed using a genetic algorithm. This accounts for some of the variability in the statistics between successive data sets. Furthermore, because some orientations of crystals are preferred because of the harvesting method (crystals mounted on meshes), this can lead to lower completeness in some cases. For later data sets this, in combination with the fact that completeness is weighted less heavily than 〈I/σ(I)〉 and R meas in the GA, led to a reduction in the completeness (in all resolution shells), but with a concomitant increase in multiplicity and 〈I/σ(I)〉. This could be owing to crystals in less common orientations not being selected by the GA because of lower average 〈I/σ(I)〉 values resulting from radiation damage. Examination of sub-data sets included in Expo. 1 but missing in Expo. 5 and Expo. 6 indeed revealed lower 〈I/σ(I)〉 values and higher R meas values.

RIP signal

The dispersive signal increases as a function of dose (Supplementary Fig. S2). This is an important metric of the RIP signal, but we have focused our analysis on RIP peak heights, which are a more sensitive indicator of the intensity of the RIP signal. It should be emphasized that this is a ‘post mortem’ analysis, which requires a high-quality phase set. In order to determine RIP peak heights, model phases are used to calculate an F before − F after difference map using the scaled F A (the structure-factor amplitudes for the substructure atoms) values from SHELXC. This difference map is then searched for peaks. The location of the peaks reveals which atoms in the structure are damaged, and the peak height indicates the magnitude of the damage and thereby the strength of the RIP signal. In the thaumatin X-ray RIP experiment, the strongest peaks can be found over the Cys126 S atom. Fig. 1 ▸ depicts the average maximum peak heights as a function of dose. A large amount of RIP signal is present, even at relatively modest doses (for example 1.16 MGy). This signal increases dramatically when the dose is increased to 1.74 and 2.32 MGy, but only modest gains are observed above this dose (Figs. 1 ▸ a and 2 ▸ a–2 ▸ e). Negative peaks can also occur in a RIP difference map, which correspond to the shifting of atoms to new positions. A well known example of this is the movement of the Sγ position in a disulfide bond to a new position. These negative peaks are generally of a lower magnitude than the positive peaks, probably because when an Sγ is in a disulfide there are fewer possible rotamers than without the thiol linkage. Inspection of negative peaks in the difference map nevertheless also reveals large peaks: up to 14.24 standard deviations above the mean difference density (Figs. 2 ▸ f–2 ▸ j). 
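Expressing peak heights in standard deviations amounts to normalizing each map value by the mean and standard deviation of the whole difference map. The following sketch shows that normalization on a flat list of density values; it is only an illustration and not how ANODE is implemented, and the function name `peak_heights_in_sigma` and the toy map values are assumptions.

```python
from statistics import mean, pstdev

def peak_heights_in_sigma(density):
    """Express each map value as standard deviations from the map mean."""
    mu = mean(density)
    sigma = pstdev(density)
    return [(value - mu) / sigma for value in density]

# Hypothetical difference map: mostly flat, one positive and one negative feature.
toy_map = [0.0] * 98 + [5.0, -3.0]
z = peak_heights_in_sigma(toy_map)
strongest_positive = max(z)   # candidate site of lost density (damaged atom)
strongest_negative = min(z)   # candidate site of density gained after damage
```

In a real map the search would be carried out on a three-dimensional grid with symmetry taken into account, and peaks would be assigned to the nearest atoms of the refined model.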
Although there was no evidence of anomalous signal in the merging statistics, we calculated anomalous peak heights using ANODE but found that there were no peaks above 4.8 standard deviations above the mean density value. Therefore, no RIPAS (RIP with anomalous scattering) analysis was performed. For the UV RIP experiment, in order to distinguish between X-ray and UV damage, a second set of sub-data sets was collected before UV exposure (control). The average RIP peak height between the first two X-ray data sets (Before_UV.1 and Before_UV.2 in Table 2 ▸) was 4.24 standard deviations above the mean, showing that there was very little X-ray radiation damage between these data sets (Figs. 3 ▸ b and 3 ▸ d). However, comparing the third data set (After_UV in Table 2 ▸, which occurred after UV-LED exposure and had the same data-collection parameters and dose as the previous two data sets) with the first data set (Before_UV.1) revealed significant peaks in the RIP maps (Figs. 3 ▸ a and 3 ▸ c). The maximum and minimum peak heights were 23.34 and −8.99 standard deviations above the mean, respectively, with the largest differences over Cys7 and Cys20 around the Cys Sγ atom. As in the X-ray RIP experiment, there was very little anomalous signal, with the highest peak being 6.7 standard deviations above the mean density value.
Figure 1

RIP peak height as a function of dose in thaumatin. (a) Maximum and (b) minimum peak heights in the model-phased F before − F after difference electron-density map in standard deviations above the mean. The point for each value corresponds to the average value of the peak height for all K values used in SHELX. The error bars represent the standard deviation of the peak height.

Figure 2

Model-phased RIP difference electron-density maps calculated for the thaumatin X-ray RIP data. (a)–(e) represent increasing dose points (Expo. 2, Expo. 3, Expo. 4, Expo. 5 and Expo. 6, respectively) subtracted from the first data set (Expo. 1). Difference density is shown as a green mesh contoured at 6σ. The disulfide bond between Cys126 and Cys177 shows the highest electron density. (f)–(j) are the same difference maps as (a)–(e) but contoured at −6.5σ in the vicinity of Cys66.

Figure 3

Model-phased RIP difference map for the cubic insulin UV-RIP experiment. Positive difference electron density contoured at 6σ is represented as a green mesh surrounding cysteine S atoms. Negative electron-density difference is contoured at −5σ as a red mesh in the vicinity of cysteine S atoms. (a) and (c) are RIP difference maps calculated between data sets Before_UV.1 and After_UV. (b) and (d) are RIP difference maps calculated between data sets Before_UV.1 and Before_UV.2, i.e. before UV-light exposure. The X-ray-only difference maps show little evidence of radiation damage, whereas the before UV illumination–post UV illumination difference map shows strong positive peaks at cysteine Sγ positions as well as a new peak appearing near Cys20.

Determination of RIP substructures can be difficult owing to the generally large number of atoms in the radiation-damage substructure. Indeed, one of the primary heuristics used in experimental phasing with SHELXD, analysis of the plot of CC(all) versus CC(weak), is of limited use for RIP except in very high signal cases (Supplementary Fig. S3). However, one metric of substructure-solution success that can be applied a posteriori is to compare experimental substructures with a pseudo-atom reference substructure. The pseudo-atom substructure was calculated with SHELXC and ANODE using the highest RIP peak heights and the refined model. Peaks above the threshold value of six standard deviations above the mean difference value are retained. This reference can then be compared with the final substructures produced by SHELXD. Comparison of the reference and the experimentally determined substructures results in a percentage correctness. For cubic insulin the reference contained six positive and negative sites, while for thaumatin there were 14 positive and negative sites. Both thaumatin (X-ray RIP) and cubic insulin (UV RIP) produced substructures that could be used to produce interpretable phases. Because we have previously shown that down-weighting of the after data-set intensities after an initial scaling can improve all steps of RIP phasing, we evaluated a range of K values (Nanao et al., 2005 ▸; de Sanctis et al., 2016 ▸; Zubieta & Nanao, 2016 ▸). Because SHELXC/D/E were conceived for pipelines, it is feasible to evaluate a large number of K values automatically via a simple script. For each K value, we determined the percentage of substructure correctness as described above, as well as its average across all K values (average substructure correctness). For cubic insulin, the average substructure correctness was 57.67% (Fig. 4 ▸). For the most favourable thaumatin dose (3.48 MGy), the average substructure correctness was 29.47% (Figs. 4 ▸ and 5 ▸). 
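The percentage correctness can be illustrated with a simplified matching routine. This sketch is not phenix.emma: it assumes that both substructures are already expressed in Cartesian coordinates on the same origin and hand, and it ignores crystallographic symmetry, all of which a real comparison has to handle. The function name `substructure_correctness` and the toy coordinates are assumptions; only the 1.5 Å tolerance is taken from the Methods.

```python
from math import dist
from statistics import mean

def substructure_correctness(reference, trial, tolerance=1.5):
    """Percentage of reference sites with an unused trial site within `tolerance` Å."""
    unused = list(trial)
    matched = 0
    for ref in reference:
        close = [site for site in unused if dist(ref, site) <= tolerance]
        if close:
            # Pair the reference site with its nearest trial site, once only.
            unused.remove(min(close, key=lambda site: dist(ref, site)))
            matched += 1
    return 100.0 * matched / len(reference)

# Hypothetical reference (four sites) and one trial substructure.
reference = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (5.0, 5.0, 5.0)]
trial = [(0.5, 0.0, 0.0), (10.0, 1.0, 0.0), (20.0, 20.0, 20.0)]
correctness = substructure_correctness(reference, trial)

# The average substructure correctness is the mean over all K values tested;
# the per-K percentages below are placeholders, not results from this work.
average = mean([correctness, 25.0, 75.0])
```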
While the quality of insulin substructures was uniformly high and was relatively unaffected by the scaling factor K, the thaumatin substructures could be greatly improved by applying K values of 0.97421, 0.98474 and 0.99737, which produced 46% correct substructures compared with 6% at K = 1.01 (Fig. 4 ▸). Interestingly, despite the small differences in RIP difference-map peak height at the higher doses (Fig. 1 ▸), only the highest dose data set produced correct substructures for thaumatin (Fig. 5 ▸ and Supplementary Fig. S4). For thaumatin, we used a 〈d′/σ(d′)〉 value of 1.3–1.5 to determine the high-resolution cutoff in SHELX. However, using one of the best K values (0.97421) and re-running the same SHELXD substructure determination at different maximal resolutions, we found that the optimal resolution cutoff appeared around 2.8–3.5 Å. This corresponds to 〈d′/σ(d′)〉 values of 2–2.5 (Supplementary Fig. S5). This reinforces the notion that rather than relying solely on a cutoff based on difference statistics, it is sometimes advisable to try different resolution cutoffs. Because of the strong RIP signal in cubic insulin, the entire positive substructure was determined across all runs from 1.5 to 4.0 Å.
Figure 4

Quality of substructure determination with insulin and thaumatin data sets. The correctness of the substructure is expressed as the percentage of conserved sites in the experimental substructure compared with the reference structure (the reference model was determined by identifying peaks in a model-phased RIP difference map). Green dots correspond to the cubic insulin substructures. Blue stars correspond to thaumatin substructures for the highest dose (3.48 MGy). Below the red dashed line, the substructure correctness is less than 45%.

Figure 5

Quality of substructure determination of thaumatin as a function of X-ray dose. For each dose, the best substructure from a range of K values is compared against the reference. Calculation of the substructure correctness is performed as described previously. Below and including 2.9 MGy the substructure is not determinable.

Phase calculation

RIP phasing proceeds in a manner similar to SIR, with the major difference being the existence of negatively occupied sites. Since no substructure-determination programs can currently determine substructures that include both positively and negatively occupied sites, the full substructure must be obtained by bootstrapping. This can be performed iteratively by rounds of phase improvement and the identification of peaks (positive and negative) in difference Fourier maps. In RIP, this process can be critical because of the starting incompleteness of the substructure (Nanao et al., 2005 ▸). However, the signal in the cubic insulin UV RIP was high enough to show very little dependency on scaling K (Fig. 4 ▸), which has previously been observed for other UV RIP experiments (Nanao & Ravelli, 2006 ▸). Weighted mean phase errors (wMPEs) calculated from the phases determined in SHELXE using the final bootstrapped substructure compared with a refined model were uniformly excellent, with an average wMPE across all K of 18.5° (Fig. 6 ▸). As has previously been observed for other phasing methods, solution of the structure is likely when the correlation coefficient of the partially automatically built SHELXE model exceeds 25% and the average number of residues per fragment is greater than 10 residues. By contrast, the phase calculation for thaumatin is more sensitive to K values. At even the highest dose (3.48 MGy), only a few values yielded interpretable electron-density maps (Fig. 6 ▸). Phasing analysis was only performed at this dose point in view of this difficulty in phasing even with substructures that were approximately four times more complete than lower dose points (Fig. 5 ▸). Interestingly, despite the fact that the RIP peak height flattened out at a dose of 2.3 MGy, phasing and substructure determination were not successful at this dose or even at 2.9 MGy, but only at 3.48 MGy (Fig. 7 ▸ and Supplementary Fig. S6).
Figure 6

Phase errors of experimental phasing as a function of the scaling factor K. The wMPE is the best phase error compared with a refined model. Green dots correspond to cubic insulin and blue stars correspond to thaumatin for a dose of 3.48 MGy. The red dashed line indicates a phase error of 35°, below which maps are of excellent quality.

Figure 7

Phase errors of X-ray RIP experimental phasing of thaumatin as a function of dose.

Influence of multiplicity

Obtaining data sets with high multiplicity and completeness is at odds with controlling radiation damage. For this reason, especially in cases of small crystals and/or low symmetry, it can be difficult to obtain the two complete data sets required for X-ray RIP from a single crystal (de Sanctis & Nanao, 2012 ▸). Therefore, RIP has not been able to benefit from the advantages to phasing of high-multiplicity data sets (Usón et al., 2003 ▸; Pike et al., 2016 ▸). Because SSX RIP multiplicity is limited only by the diversity and number of crystals, SSX offers the possibility of obtaining much higher multiplicity data sets for both ‘damaged’ and ‘undamaged’ states. We therefore were interested in the effect of multiplicity on the various metrics and stages of phasing. For these analyses we started with the very high multiplicity data sets discussed earlier (thaumatin Expo. 1 and 6, and insulin Before_UV.1 and After_UV) and reduced their multiplicity incrementally to create new data sets (Tables 3 ▸ and 4 ▸). While there are many potential strategies to reduce the multiplicity, such as decreasing the number of images in each sub-data set or removing sub-data sets based on specific criteria such as I/σ(I), we have taken a practical approach to the reduction of multiplicity and randomly omitted sub-data sets. Enough data sets were removed incrementally to reduce the multiplicity by 1.5–2-fold at a time. Furthermore, in order to reduce the effects of resolution, we used the same resolution range for all data sets, even if it produced poor statistics in some of the outer resolution shells.
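The random omission of sub-data sets can be sketched as follows. The function name `nested_random_subsets` and the file-style names are hypothetical, and the nesting (each reduced set drawn from the previous one) is an assumption, since the text states only that sub-data sets were randomly omitted. The counts used in the example are those listed for the Before series in Table 3, starting from the 22 sub-data sets of Expo. 1 in Table 1.

```python
import random

def nested_random_subsets(subsets, counts, seed=0):
    """Randomly omit sub-data sets step by step.

    `counts` must be non-increasing; each selection is drawn from the previous
    one, so every reduced data set is a subset of the one before it.
    """
    rng = random.Random(seed)
    current = list(subsets)
    series = []
    for n_keep in counts:
        current = rng.sample(current, n_keep)
        series.append(current)
    return series

names = [f"sub_{i:02d}" for i in range(22)]
series = nested_random_subsets(names, [14, 12, 10, 8, 6, 3])
sizes = [len(selection) for selection in series]
```

Each selection would then be merged and phased in the same way as the full data sets, so that any change in the phasing metrics can be attributed to multiplicity rather than to a different choice of crystals.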
Table 3

Thaumatin X-ray RIP overall data-collection statistics after multiplicity reduction

Each data set has had its multiplicity artificially reduced compared with the original data set (Expo. 1–Expo. 6 in Table 1 ▸) by removing enough images to reduce the multiplicity by 1.5–2-fold. For the After series, a larger number of sub-data sets are used compared with the Before series, because the starting full data sets also required more sub-data sets to achieve 〈I/σ(I)〉 values comparable to the earlier dose points, possibly because of a degradation in data quality after X-ray damage. The same resolution ranges were used for all data sets, which caused the outer shell statistics to degrade in some cases.

X-ray RIP
Data-set name | Before_A | Before_B | Before_C | Before_D | Before_E | Before_F | After_A | After_B | After_C | After_D | After_E | After_F
No. of sub-data sets | 14 | 12 | 10 | 8 | 6 | 3 | 24 | 22 | 20 | 18 | 16 | 13
Resolution range (Å)
 Overall | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4
 Outer shell | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40
Total No. of reflections
 Overall | 513560 | 440049 | 366446 | 292681 | 219403 | 109888 | 879971 | 806757 | 733172 | 660329 | 586699 | 477028
 Outer shell | 36745 | 31479 | 26224 | 20990 | 15743 | 7911 | 62810 | 57587 | 52356 | 47156 | 41926 | 34088
No. of unique reflections
 Overall | 51569 | 51534 | 50962 | 50817 | 50307 | 43461 | 48742 | 48736 | 48597 | 48596 | 47950 | 47937
 Outer shell | 3704 | 3704 | 3698 | 3679 | 3631 | 3083 | 3535 | 3534 | 3518 | 3518 | 3484 | 3484
Completeness (%)
 Inner shell | 97.6 | 97.6 | 96.3 | 96.0 | 96.0 | 80.8 | 89.4 | 89.4 | 89.3 | 89.3 | 85.9 | 85.9
 Outer shell | 97.8 | 97.8 | 97.6 | 97.1 | 95.9 | 81.4 | 93.4 | 93.4 | 93.0 | 93.0 | 92.1 | 92.1
 Overall | 98.6 | 98.6 | 97.5 | 97.2 | 96.2 | 83.1 | 93.2 | 93.2 | 93.0 | 93.0 | 91.7 | 91.7
Multiplicity
 Inner shell | 8.90 | 7.71 | 6.42 | 5.12 | 3.82 | 2.24 | 16.47 | 15.16 | 13.84 | 12.43 | 11.50 | 9.32
 Outer shell | 9.90 | 8.49 | 7.09 | 5.70 | 4.33 | 2.56 | 17.76 | 16.29 | 14.88 | 13.40 | 12.03 | 9.78
 Overall | 9.95 | 8.53 | 7.19 | 5.76 | 4.36 | 2.52 | 18.05 | 16.55 | 15.08 | 13.58 | 12.23 | 9.95
Rmerge (%)
 Inner shell | 5.5 | 5.5 | 5.7 | 5.4 | 5.6 | 4.6 | 5.1 | 5.1 | 5.0 | 4.9 | 5.0 | 4.8
 Outer shell | 201.3 | 199.1 | 244.9 | 184.0 | 205.7 | 174.7 | 278.9 | 273.9 | 285.9 | 279.9 | 284.5 | 263.6
 Overall | 17.2 | 17.1 | 17.7 | 16.3 | 17.0 | 14.7 | 20.4 | 20.3 | 20.8 | 20.8 | 20.9 | 19.5
Rmeas (%)
 Inner shell | 5.8 | 5.9 | 6.1 | 6.0 | 6.5 | 5.7 | 5.2 | 5.2 | 5.2 | 5.2 | 5.2 | 5.1
 Outer shell | 212.2 | 211.9 | 219.8 | 202.1 | 232.4 | 211.0 | 286.5 | 282.3 | 295.3 | 290.2 | 296.2 | 277.1
 Overall | 18.1 | 18.2 | 19.0 | 17.9 | 19.2 | 17.7 | 21.0 | 20.9 | 21.5 | 21.5 | 21.8 | 20.4
〈I/σ(I)〉
 Inner shell | 31.49 | 29.18 | 26.70 | 23.90 | 18.85 | 15.14 | 45.53 | 41.84 | 40.02 | 38.66 | 37.64 | 35.12
 Outer shell | 0.98 | 0.93 | 0.85 | 0.82 | 0.61 | 0.48 | 0.94 | 0.92 | 0.85 | 0.82 | 0.77 | 0.75
 Overall | 9.13 | 8.51 | 7.74 | 7.05 | 5.53 | 4.25 | 11.72 | 11.32 | 10.69 | 10.27 | 9.77 | 9.20
CC1/2 (%)
 Inner shell | 99.8* | 99.8* | 99.8* | 99.8* | 99.7* | 99.4* | 99.9* | 99.9* | 99.9* | 99.9* | 99.9* | 99.8*
 Outer shell | 23.4* | 22.7* | 18.8* | 18.3* | 12.7* | 10.9* | 28.3* | 29.2* | 28.3* | 26.3* | 23.3* | 22.5*
 Overall | 99.7* | 99.7* | 99.6* | 99.5* | 99.3* | 99.2* | 99.8* | 99.8* | 99.8* | 99.8* | 99.8* | 99.8*

CC1/2 values that are significant at the 0.1% level are marked by an asterisk.
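As a consistency check on Table 3, the overall multiplicity can be recomputed as the ratio of the total number of reflections to the number of unique reflections. The short sketch below does this for three of the thaumatin data sets, using counts copied from the table; the function name is introduced here for illustration only. The ratios agree with the tabulated multiplicities to within about 0.01, a small residual that presumably reflects how the processing software counts observations.

```python
# Overall (total, unique) reflection counts copied from Table 3.
counts = {
    "Before_A": (513560, 51569),
    "Before_F": (109888, 43461),
    "After_A": (879971, 48742),
}

# Overall multiplicities as tabulated in Table 3, for comparison.
tabulated = {"Before_A": 9.95, "Before_F": 2.52, "After_A": 18.05}


def multiplicity(total, unique):
    """Average number of observations per unique reflection."""
    return total / unique


for name, (total, unique) in counts.items():
    value = multiplicity(total, unique)
    print(f"{name}: computed {value:.2f}, tabulated {tabulated[name]:.2f}")
```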

The effect of multiplicity on RIP signal and phasing

In cubic insulin, the effect of multiplicity is readily apparent. In previous RIP experiments, multiplicities of approximately fourfold to 1.5 Å resolution (Nanao et al., 2005 ▸) were typically achieved. Because multiple crystals can be used in SSX, the multiplicity can be greatly increased. In particular, we observed exponential gains in RIP peak signal, as assessed by the maximal peak height in RIP difference maps, up to 12-fold multiplicity for insulin (Figs. 8 ▸ a and 8 ▸ b) and up to sevenfold multiplicity for thaumatin (Figs. 8 ▸ c and 8 ▸ d). These gains can be seen in both positive and negative peak heights. The point of diminishing returns occurs around 25-fold multiplicity for insulin and eightfold multiplicity for thaumatin. This trend continues into phase determination. A threshold of signal strength occurs at a multiplicity of four for cubic insulin (UV RIP; Figs. 9 ▸ a and 9 ▸ b). As has been seen for other phasing methods, there is a ‘grey area’ where there is sufficient signal to determine interpretable phases, but there is not sufficient signal to determine correct RIP substructures. In other words, if the known substructure is used as a starting point in SHELXE then phasing succeeds, but obviously this is an artificial situation. For thaumatin X-ray RIP, we observed a similar shift in the multiplicity requirements for substructure determination and phasing: substructure determination and phasing required a multiplicity of six, which could be reduced to four when starting from the known substructure.
Figure 8

The effect of artificially reducing data-set multiplicity on average model-phased RIP difference-map peak height. (a) and (b) correspond to the maximum and minimum peak heights in the model-phased F before − F after difference electron-density map for the cubic insulin UV RIP data. (c) and (d) correspond to the maximum and minimum peak heights for the thaumatin X-ray RIP data. Peak heights are averaged across all K values. Red points correspond to the original data set without multiplicity reduction. The error bars represent the standard deviation of the peak heights across different K values for scaling.
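The averaging described in this caption amounts to taking, for each data set, the mean and standard deviation of the peak height over the K values tested. A minimal sketch is given below; the peak heights are placeholders rather than measured values, and the use of the sample standard deviation is an assumption, since the caption does not specify which estimator was used.

```python
from statistics import mean, stdev

# Placeholder peak heights (in sigma units) for one data set, keyed by the
# scale factor K. These numbers are illustrative only, not measured values.
peaks_by_k = {0.97: 21.0, 0.98: 23.5, 0.99: 24.0, 1.00: 22.5}


def summarize(peaks):
    """Return (mean, sample standard deviation) of peak heights across K."""
    values = list(peaks.values())
    return mean(values), stdev(values)


avg, spread = summarize(peaks_by_k)
print(f"mean peak height {avg:.2f}, standard deviation {spread:.2f}")
```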

Figure 9

Experimental phasing for insulin UV RIP (a) and thaumatin X-ray RIP (b) starting from the known (blue stars) or experimentally determined substructures (green circles). The best wMPE across all trials is reported compared with a refined model.

Discussion

RIP offers a complementary method to traditional anomalous and isomorphous methods for the experimental determination of phases. Although RIP can also be used in combination with anomalous and isomorphous methods, it is a useful method on its own, particularly when heavy-atom derivatization or seleno­methionine substitution is difficult. Recent advances in multiple-crystal techniques have made it practical to determine high-resolution structures from X-ray data acquired from a large number of crystals. Here, we show that a serial approach yields sufficient signal to determine phases de novo by both X-ray RIP and UV RIP for these two test systems. In this study, we have assembled low-dose and high-dose data sets independently; however, we are also exploring the possibility of improving RIP signal by optimizing both data sets simultaneously. In this way, the isomorphous signal could be improved depending on which sub-data sets are selected. Because of the relatively high symmetry of thaumatin and insulin, we have simply used the strongest sub-data set as a reference for indexing other sub-data sets during processing. However, in some cases an individual sub-data set might not contain enough reflections for this purpose. In these cases, alternate methods for indexing (and resolving indexing ambiguity) might become necessary, for example using the method developed by Brehm & Diederichs (2014 ▸). For very incomplete sub-data sets, scaling becomes impossible owing to a lack of common reflections, unless a reference data set is available. For X-ray RIP, we show that improvements can be made to the RIP signal up to 4 MGy. This suggests a guideline for the design of serial RIP experiments. For example, at the ESRF this information can be easily used in the MeshAndCollect workflow within MXCuBE (Gabadinho et al., 2010 ▸; Zander et al., 2015 ▸). 
Specifically, once the diffractive map has been constructed, an estimation of dose rate is provided and the user can modify not only the individual data-collection parameters but also the number of times that each position is re-collected. The user could therefore change the experimental parameters to provide 1 MGy per sub-data set and collect each position four times, for a cumulative dose of 4 MGy. Aside from the ease of the experiment, one key advantage of the serial approach is that much higher data quality at a given dose can be achieved per final data set compared with single crystals. This increase in the number of diffraction patterns facilitates the collection of high-multiplicity data sets. High multiplicity has in turn been shown to be critical for phasing success in many types of experimental phasing, particularly SAD (Cianci et al., 2008 ▸). However, because the traditional approach to RIP calls for extremely low dose ‘before’ and ‘after’ data sets, RIP has not typically benefitted from high-multiplicity data collections. Indeed, in some cases it can be challenging even to collect two complete low-multiplicity data sets. Here, we show that the serial approach can be used to produce high-multiplicity data sets with excellent statistics and furthermore that exponential increases in RIP peak heights occur as a function of multiplicity up to a point of diminishing returns of eightfold and 25-fold multiplicity for thaumatin and insulin, respectively. As has been previously shown for single-crystal X-ray RIP, initial scaling by conventional methods followed by downscaling of the high-dose data sets can significantly improve substructure solution (Nanao et al., 2005 ▸). There is still no way to determine the best scale factor K a priori other than by trying multiple K values, but the scriptability of SHELXC and its direct support for this parameter make the process straightforward. 
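Because K has to be found by trial, the search is naturally expressed as a loop over a grid of candidate values. The sketch below shows only that bookkeeping. The grid limits and step, the function names and the toy scoring function are all assumptions made for illustration; in practice the score for each K would come from a full phasing run at that K, for example a figure of merit reported by SHELXE.

```python
def k_grid(start=0.95, stop=1.00, step=0.005):
    """Evenly spaced candidate K values from start to stop inclusive."""
    n_steps = round((stop - start) / step)
    # Round to suppress floating-point noise in the generated values.
    return [round(start + i * step, 6) for i in range(n_steps + 1)]


def best_k(k_values, score):
    """Evaluate score(K) for every candidate and return the best (K, score)."""
    results = [(k, score(k)) for k in k_values]
    return max(results, key=lambda pair: pair[1])


def toy_score(k):
    """Stand-in for a real phasing run; peaks at K = 0.98 by construction."""
    return -(k - 0.98) ** 2


k, value = best_k(k_grid(), toy_score)
print(f"best K on this grid: {k}")
```

Keeping the scoring function separate from the grid search means that the same loop can be reused with any per-K figure of merit.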
It is possible that other methods, such as adjusting the K value to maximize non-origin Patterson peak heights, might also be effective. Furthermore, while running SHELXE for each K value adds computing time, it is compensated by the calculation of two critical statistics from SHELXE: the correlation coefficient of the partially automatically built model against the native data and the average fragment size. These two parameters are highly predictive of phasing success for RIP, as with other phasing methods, and are the primary means by which one can evaluate the success of RIP in new systems. Because these are test systems, we do not yet know whether these patterns will be borne out in cases of low symmetry or large non-isomorphism. It is worth noting that we have focused on the most well known X-ray-sensitive groups: disulfides. However, in future work we hope to extend this to other radiation-sensitive atoms such as oxygen atoms in carboxylates and heavy atoms such as selenium, in which the anomalous signal can be combined with the RIP signal as previously described for single crystals (Schiltz et al., 2004 ▸; Ravelli et al., 2005 ▸).

Supplementary Figures. DOI: 10.1107/S2059798318001535/di5013sup1.pdf

Table 4

Insulin UV RIP overall data-collection statistics after multiplicity reduction

Each original data set (Before_UV.1–After_UV; Table 2 ▸) has its maximal multiplicity artificially reduced compared with the starting data set. For the After series, a larger number of sub-data sets are used compared with the Before series, because the starting full data sets also required more sub-data sets to achieve 〈I/σ(I)〉 values comparable to the earlier dose points, possibly because of a degradation in data quality after X-ray damage. The same resolution ranges were used for all data sets, which caused the outer shell statistics to degrade in some cases.

UV RIP
Data-set name | Before_Ai | Before_Bi | Before_Ci | Before_Di | Before_Ei | Before_Fi | Before_Gi | Before_Hi
No. of sub-data sets | 61 | 41 | 21 | 11 | 6 | 5 | 4 | 3
Resolution range (Å)
 Overall | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4 | 100–1.4
 Outer shell | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40 | 1.44–1.40
Total No. of reflections
 Overall | 1082251 | 726413 | 374875 | 196084 | 106764 | 89232 | 71445 | 53423
 Outer shell | 80952 | 54294 | 28131 | 31231 | 8075 | 6733 | 5394 | 4030
No. of unique reflections
 Overall | 31317 | 31317 | 31291 | 14766 | 29308 | 28118 | 26159 | 24211
 Outer shell | 2358 | 2358 | 2358 | 2357 | 2279 | 2131 | 1979 | 1857
Completeness (%)
 Inner shell | 100.0 | 100.0 | 99.2 | 98.9 | 76.1 | 71.1 | 65.4 | 64.9
 Outer shell | 100.0 | 100.0 | 100.0 | 100.0 | 97.1 | 90.8 | 84.4 | 79.2
 Overall | 100.0 | 100.0 | 100.0 | 99.8 | 93.6 | 89.8 | 83.6 | 77.4
Multiplicity
 Inner shell | 35.72 | 24.05 | 12.44 | 6.53 | 4.59 | 4.13 | 3.58 | 2.63
 Outer shell | 34.33 | 23.02 | 11.93 | 6.26 | 3.54 | 3.16 | 2.72 | 2.17
 Overall | 34.56 | 23.19 | 11.98 | 6.28 | 3.64 | 3.17 | 2.73 | 2.20
Rmerge (%)
 Inner shell | 13.8 | 12.1 | 9.4 | 8.4 | 7.6 | 7.9 | 7.8 | 6.9
 Outer shell | 291.9 | 232.5 | 170.1 | 154.2 | 147.6 | 139.5 | 148.1 | 134.3
 Overall | 22.7 | 19.2 | 14.1 | 12.1 | 10.4 | 10.2 | 10.2 | 9.8
Rmeas (%)
 Inner shell | 14.0 | 12.5 | 9.8 | 9.2 | 8.5 | 9.0 | 9.1 | 8.3
 Outer shell | 296.3 | 237.7 | 177.9 | 168.6 | 172.3 | 164.0 | 177.9 | 167.0
 Overall | 23.0 | 15.19 | 14.8 | 13.2 | 12.0 | 11.9 | 12.1 | 12.0
〈I/σ(I)〉
 Inner shell | 44.17 | 39.86 | 29.99 | 22.59 | 19.35 | 17.39 | 15.93 | 13.66
 Outer shell | 2.11 | 2.11 | 1.78 | 1.31 | 0.89 | 0.90 | 0.76 | 0.74
 Overall | 16.32 | 15.19 | 11.71 | 8.79 | 6.39 | 5.97 | 5.30 | 4.81
CC1/2 (%)
 Inner shell | 99.8* | 99.6* | 99.5* | 99.2* | 99.3* | 98.9* | 98.8* | 99.5*
 Outer shell | 69.0* | 67.0* | 55.5 | 38.7 | 26.4 | 28.8* | 27.1 | 25.2*
 Overall | 99.9* | 99.8* | 99.5* | 99.2* | 99.2* | 99.1* | 99.0* | 99.1*

UV RIP
Data-set name | After_Ai | After_Bi | After_Ci | After_Di | After_Ei | After_Fi | After_Gi | After_Hi
No. of sub-data sets | 61 | 41 | 21 | 11 | 6 | 5 | 4 | 3
Resolution range (Å)
 Overall | 100–1.5 | 100–1.5 | 100–1.5 | 100–1.5 | 100–1.5 | 100–1.5 | 100–1.5 | 100–1.5
 Outer shell | 1.54–1.50 | 1.54–1.50 | 1.54–1.50 | 1.54–1.50 | 1.54–1.50 | 1.54–1.50 | 1.54–1.50 | 1.54–1.50
Total No. of reflections
 Overall | 847024 | 572864 | 299805 | 156750 | 85897 | 71422 | 56916 | 42477
 Outer shell | 63380 | 432243 | 23078 | 12058 | 6604 | 5477 | 4335 | 3224
No. of unique reflections
 Overall | 25406 | 25411 | 25405 | 25388 | 24902 | 24201 | 23105 | 21081
 Outer shell | 1925 | 1926 | 1926 | 1926 | 1897 | 1848 | 1762 | 1613
Completeness (%)
 Inner shell | 100.0 | 100.0 | 99.7 | 99.3 | 96.3 | 90.2 | 85.8 | 78.3
 Outer shell | 100.0 | 100.0 | 100.0 | 100.0 | 98.8 | 96.2 | 91.8 | 84.0
 Overall | 100.0 | 100.0 | 100.0 | 99.9 | 98.0 | 95.2 | 90.9 | 83.0
Multiplicity
 Inner shell | 34.65 | 33.40 | 11.93 | 6.30 | 3.52 | 3.09 | 2.57 | 2.07
 Outer shell | 32.92 | 22.45 | 11.98 | 6.26 | 3.48 | 2.96 | 2.46 | 2.00
 Overall | 33.34 | 22.54 | 11.80 | 6.17 | 3.45 | 2.95 | 2.46 | 2.01
Rmerge (%)
 Inner shell | 18.1 | 14.3 | 13.2 | 13.8 | 8.5 | 8.2 | 8.0 | 7.6
 Outer shell | 534.3 | 438.7 | 301.8 | 315.2 | 371.6 | 476.4 | 775.5 | 2747.3
 Overall | 50.4 | 40.6 | 26.2 | 26.1 | 22.1 | 23.4 | 25.5 | 28.3
Rmeas (%)
 Inner shell | 18.5 | 14.7 | 14.1 | 15.1 | 10.0 | 9.8 | 9.8 | 9.7
 Outer shell | 542.6 | 448.8 | 315.3 | 344.0 | 436.1 | 572.2 | 959.7 | 3514.1
 Overall | 51.2 | 41.6 | 27.4 | 28.6 | 25.8 | 28.0 | 31.4 | 35.9
〈I/σ(I)〉
 Inner shell | 31.52 | 29.23 | 22.56 | 17.17 | 13.18 | 11.86 | 10.52 | 9.39
 Outer shell | 1.49 | 1.48 | 1.32 | 0.95 | 0.62 | 0.52 | 0.32 | 0.12
 Overall | 11.89 | 11.05 | 8.91 | 6.60 | 4.78 | 4.16 | 3.43 | 2.47
CC1/2 (%)
 Inner shell | 99.8* | 99.9* | 78.4* | 99.3* | 99.2* | 98.5* | 98.5* | 98.8*
 Outer shell | 53.5* | 51.8* | 43.5* | 22.0 | 8.7 | 8.9 | 2.2 | 2.4
 Overall | 99.7* | 99.7* | 94.4* | 96.1* | 98.4* | 97.8* | 97.2* | 97.1*

CC1/2 values that are significant at the 0.1% level are marked by an asterisk.

  41 in total

1.  Structural effects of radiation damage and its potential for phasing.

Authors:  Sankaran Banumathi; Petrus H Zwart; Udupi A Ramagopal; Miroslawa Dauter; Zbigniew Dauter
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2004-05-21

2.  Phasing in the presence of severe site-specific radiation damage through dose-dependent modelling of heavy atoms.

Authors:  M Schiltz; P Dumas; E Ennifar; C Flensburg; W Paciorek; C Vonrhein; G Bricogne
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2004-05-21

3.  Radiation-induced site-specific damage of mercury derivatives: phasing and implications.

Authors:  Udupi A Ramagopal; Zbigniew Dauter; Radhakannan Thirumuruhan; Elena Fedorov; Steven C Almo
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2005-08-16

4.  Optimization of data collection taking radiation damage into account.

Authors:  Gleb P Bourenkov; Alexander N Popov
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2010-03-24

5.  Features and development of Coot.

Authors:  P Emsley; B Lohkamp; W G Scott; K Cowtan
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2010-03-24

6.  Serial Synchrotron X-Ray Crystallography (SSX).

Authors:  Kay Diederichs; Meitian Wang
Journal:  Methods Mol Biol       Date:  2017

7.  Automatic processing of macromolecular crystallography X-ray diffraction data at the ESRF.

Authors:  Stéphanie Monaco; Elspeth Gordon; Matthew W Bowler; Solange Delagenière; Matias Guijarro; Darren Spruce; Olof Svensson; Sean M McSweeney; Andrew A McCarthy; Gordon Leonard; Max H Nanao
Journal:  J Appl Crystallogr       Date:  2013-05-15       Impact factor: 3.304

8.  Exploiting the anisotropy of anomalous scattering boosts the phasing power of SAD and MAD experiments.

Authors:  Marc Schiltz; Gérard Bricogne
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2008-06-18

9.  Towards phasing using high X-ray intensity.

Authors:  Lorenzo Galli; Sang-Kil Son; Thomas R M Barends; Thomas A White; Anton Barty; Sabine Botha; Sébastien Boutet; Carl Caleman; R Bruce Doak; Max H Nanao; Karol Nass; Robert L Shoeman; Nicusor Timneanu; Robin Santra; Ilme Schlichting; Henry N Chapman
Journal:  IUCrJ       Date:  2015-09-30       Impact factor: 4.769

10.  MeshAndCollect: an automated multi-crystal data-collection workflow for synchrotron macromolecular crystallography beamlines.

Authors:  Ulrich Zander; Gleb Bourenkov; Alexander N Popov; Daniele de Sanctis; Olof Svensson; Andrew A McCarthy; Ekaterina Round; Valentin Gordeliy; Christoph Mueller-Dieckmann; Gordon A Leonard
Journal:  Acta Crystallogr D Biol Crystallogr       Date:  2015-10-31