Oliver B Zeldin1, Aaron S Brewster2, Johan Hattne3, Monarin Uervirojnangkoorn1, Artem Y Lyubimov1, Qiangjun Zhou1, Minglei Zhao1, William I Weis1, Nicholas K Sauter2, Axel T Brunger1.
Abstract
Ultrafast diffraction at X-ray free-electron lasers (XFELs) has the potential to yield new insights into important biological systems that produce radiation-sensitive crystals. An unavoidable feature of the 'diffraction before destruction' nature of these experiments is that images are obtained from many distinct crystals and/or different regions of the same crystal. Combined with other sources of XFEL shot-to-shot variation, this introduces significant heterogeneity into the diffraction data, complicating processing and interpretation. To enable researchers to get the most from their collected data, a toolkit is presented that provides insights into the quality of, and the variation present in, serial crystallography data sets. These tools operate on the unmerged, partial intensity integration results from many individual crystals, and can be used on two levels: firstly to guide the experimental strategy during data collection, and secondly to help users make informed choices during data processing.
Keywords: Data Exploration Toolkit; X-ray free-electron lasers; ultrafast diffraction
Year: 2015 PMID: 25664746 PMCID: PMC4321488 DOI: 10.1107/S1399004714025875
Source DB: PubMed Journal: Acta Crystallogr D Biol Crystallogr ISSN: 0907-4449
The five command-line applications in the Data Exploration Toolkit
| | Uses hierarchical clustering to visualize and cluster the unit cells output from the integration step. |
| | Visualizes the orientational distribution of the real-space crystal axes, revealing any bias that may be present. Laboratory-frame orientations of the |
| | Aggregates data of partial intensities over all images. Generates a scatter plot of slope and intercept for all images, and histograms of gradient and standard errors on the fits. Also creates a super-plot of all of the partial log intensities |
| | Generates a plot of log( |
| | Convenience utility to provide an aggregate of the unit cell, orientations and intensity histograms in a single frame with a single command. |
Figure 1. Hierarchical clustering of the test data, with both a linear (top) and a log (bottom) y axis. Each branch of the tree below the threshold (5000 Å²) is defined as a cluster and colored individually. Single-element clusters are labeled in blue. The two crystal forms are shown in green and black, representing the long-cell and short-cell forms, respectively. The median of each cluster can then be used as a target in order to obtain significantly higher indexing rates.
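The clustering step described in this caption can be illustrated with a minimal standard-library sketch. It is not the toolkit's implementation: the plain Euclidean distance between G6 vectors, the single-linkage rule, and the function names `g6_vector`, `cluster_cells` and `median_cell` are all assumptions made for illustration. Only the 5000 Å² threshold and the use of the cluster median as an indexing target come from the text.

```python
import math
from statistics import median


def g6_vector(a, b, c, alpha, beta, gamma):
    """Return the G6 vector (units of Å^2) of a unit cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return (a * a, b * b, c * c, 2 * b * c * ca, 2 * a * c * cb, 2 * a * b * cg)


def cluster_cells(cells, threshold=5000.0):
    """Group unit cells by single-linkage clustering cut at `threshold` (Å^2).

    Cutting a single-linkage tree at a given height is equivalent to taking
    the connected components of the graph that links every pair of cells
    whose distance does not exceed that height, so a union-find suffices.
    Assumption: plain Euclidean distance between G6 vectors.
    """
    vectors = [g6_vector(*cell) for cell in cells]
    parent = list(range(len(cells)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if math.dist(vectors[i], vectors[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, cell in enumerate(cells):
        clusters.setdefault(find(i), []).append(cell)
    # Largest cluster first; single-element clusters end up last.
    return sorted(clusters.values(), key=len, reverse=True)


def median_cell(cluster):
    """Per-parameter median of a cluster, usable as an indexing target cell."""
    return tuple(median(cell[k] for cell in cluster) for k in range(6))
```

Calling `cluster_cells` on a list of `(a, b, c, alpha, beta, gamma)` tuples returns the clusters ordered by size, and `median_cell` applied to one of them gives a candidate target cell. The pairwise loop is quadratic in the number of cells, which is acceptable for a sketch but would need replacing for very large data sets.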
Figure 2. Orientational distribution of real-space axes (a axis, top; b, middle; c, bottom) from the test sample, showing a significant amount of bias, probably owing to the preferred orientations of a crystal within the loop. The unit vector describing the direction of the crystal axis is projected onto a unit sphere, and its orientation relative to the beam is shown in terms of latitude/longitude for ease of interpretation. Thus (0, 0) is along the beam, North/South represent up/down and East/West represent right/left. This tool is complementary to the PHENIX reflection viewer (phenix.data_viewer; Adams et al., 2010), which visualizes any missing wedges in reciprocal space, since it provides a direct reference back to the laboratory frame, allowing experimental adjustments to be made where possible. For each crystal, the laboratory-frame direction of the three real-space axes is shown as a yellow spot. The averaged density of real-space axes is added to help in the interpretation of trends.
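The latitude/longitude mapping in this caption can be sketched as follows. The laboratory-frame convention used here (z along the beam, y up, x to the right) and the function name `axis_to_lat_lon` are assumptions for illustration; the caption only fixes that (0, 0) lies along the beam, that North/South mean up/down and that East/West mean right/left.

```python
import math


def axis_to_lat_lon(vector):
    """Map a laboratory-frame axis direction to (latitude, longitude) in degrees.

    Assumed frame: z along the beam, y up, x to the right, so a vector
    along the beam maps to (0, 0), 'up' maps to latitude +90 and 'right'
    maps to longitude +90.
    """
    x, y, z = vector
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("axis vector must have non-zero length")
    # Project onto the unit sphere before taking angles.
    x, y, z = x / norm, y / norm, z / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    latitude = math.degrees(math.asin(max(-1.0, min(1.0, y))))
    longitude = math.degrees(math.atan2(x, z))
    return latitude, longitude
```

Applying this to the a, b and c axis vectors of every indexed crystal gives the points that would be scattered on a plot of this kind. A strong concentration of points in one region indicates preferred orientation.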
Figure 3. Intensity statistics for the partial unmerged data integrated without a target cell. The main plot shows a monotonically decreasing trend for intensities as a function of increasing scattering angle. The plots on the left, aggregated from the per-frame pseudo-Wilson plots (examples are shown in Supplementary Fig. S2), show a number of outliers, consistent with the presence of outliers in Fig. 1. A similar plot, but for when the long cell was used as a crystal target, exhibits fewer outliers and is shown as Supplementary Fig. S3. Extreme values of either G (scale factor; intercept in the main plot) or −2B (the slope in the right-hand plot) may be caused by mis-indexing, and a filter to remove these can be applied by using the clustering toolkit from within a Python script within the cctbx.xfel environment.
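A per-frame pseudo-Wilson fit and a simple outlier filter of the kind described in this caption might look like the sketch below. It does not use the cctbx.xfel API. The abscissa s = 1/(4d²), the straight-line model ln(I) = G − 2B·s, the median-absolute-deviation criterion, and the names `wilson_fit` and `flag_outliers` are assumptions made for illustration.

```python
import math
from statistics import median


def wilson_fit(d_spacings, intensities):
    """Least-squares fit of ln(I) = G - 2B * s with s = 1 / (4 d^2).

    Returns (intercept, slope), i.e. (G, -2B) in the caption's notation.
    Non-positive intensities are skipped because their log is undefined.
    """
    points = [
        (1.0 / (4.0 * d * d), math.log(i))
        for d, i in zip(d_spacings, intensities)
        if i > 0
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positive intensities")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("all reflections have the same resolution")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def flag_outliers(values, n_mads=5.0):
    """Flag values further than `n_mads` median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0.0:
        # Degenerate spread: anything that differs from the median is flagged.
        return [v != med for v in values]
    return [abs(v - med) > n_mads * mad for v in values]
```

Running `wilson_fit` on each frame and passing the collected slopes (or intercepts) to `flag_outliers` marks frames whose fitted values are extreme, which the caption suggests may correspond to mis-indexed images. The cut-off of five deviations is an arbitrary illustrative choice.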