Filip Leonarski, Aldo Mozzanica, Martin Brückner, Carlos Lopez-Cuenca, Sophie Redford, Leonardo Sala, Andrej Babic, Heinrich Billich, Oliver Bunk, Bernd Schmitt, Meitian Wang.
Abstract
In this paper, we present a data workflow developed to operate the adJUstiNg Gain detector FoR the Aramis User station (JUNGFRAU) adaptive-gain, charge-integrating pixel-array detectors at macromolecular crystallography (MX) beamlines. We summarize current achievements in operating a 4 Mpixel JUNGFRAU at a 1.1 kHz frame rate, which produces a 9 GB/s data rate, and preparations to operate a 10 Mpixel JUNGFRAU at 2.2 kHz, which will produce 46 GB/s. In this context, we highlight the challenges for computer architecture and how these challenges can be addressed with innovative hardware, including IBM POWER9 servers and field-programmable gate arrays. We also discuss data science challenges, showing the effect of rounding and lossy compression schemes on the MX JUNGFRAU detector images.
Year: 2020 PMID: 32128347 PMCID: PMC7044001 DOI: 10.1063/1.5143480
Source DB: PubMed Journal: Struct Dyn ISSN: 2329-7778 Impact factor: 2.920
FIG. 1. Data flow envisioned for kilohertz frame rate JUNGFRAU detectors at the Swiss Light Source MX beamlines. Blue arrows represent the flow of x-ray images (the most throughput critical), red arrows the flow of metadata, and the yellow arrow the flow of sensor information.
Summary of data rates in GB/s for large format JUNGFRAU detectors used for macromolecular crystallography at the Paul Scherrer Institute.
| Application | Detector size (Mpixel) | Number of modules | Frame rate (kHz) | Data rate (GB/s) |
|---|---|---|---|---|
| SwissFEL | 16 | 32 | 0.1 | 3.4 |
| Swiss Light Source (2018) | 4 | 8 | 1.1 | 9.2 |
| Swiss Light Source (2021) | 10 | 20 | 2.2 | 46.1 |
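
The data rates in the table can be reproduced with a short calculation. The sketch below assumes that each JUNGFRAU module has 512 × 1024 pixels and that each pixel is read out as a 16-bit value; neither figure is stated in this summary, so both are assumptions, and the function name `data_rate_gb_per_s` is illustrative.

```python
# Assumptions (not stated in the summary above):
#   - one JUNGFRAU module has 512 x 1024 pixels
#   - each pixel is read out as a 16-bit (2-byte) word
PIXELS_PER_MODULE = 512 * 1024
BYTES_PER_PIXEL = 2


def data_rate_gb_per_s(modules, frame_rate_khz):
    """Uncompressed detector data rate in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_frame = modules * PIXELS_PER_MODULE * BYTES_PER_PIXEL
    frames_per_second = frame_rate_khz * 1e3
    return bytes_per_frame * frames_per_second / 1e9


if __name__ == "__main__":
    # (application, number of modules, frame rate in kHz) taken from the table
    rows = [
        ("SwissFEL", 32, 0.1),
        ("Swiss Light Source (2018)", 8, 1.1),
        ("Swiss Light Source (2021)", 20, 2.2),
    ]
    for name, modules, khz in rows:
        print(f"{name}: {data_rate_gb_per_s(modules, khz):.1f} GB/s")
```

Under these assumptions the calculation agrees with the tabulated values to one decimal place.
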
FIG. 2. Pseudo-code for the JUNGFRAU data conversion procedure without frame summation.
FIG. 3. Roofline analysis is a method to compare the performance of a current implementation (loop, function) with the best achievable on given hardware. Two values are taken into account: arithmetic intensity, i.e., the number of floating-point operations per volume of data (X-axis), and performance, i.e., the number of floating-point operations per unit of time (Y-axis). Dotted lines represent “ceilings”: horizontal lines correspond to limits on the number of CPU operations, while diagonal lines represent bandwidth limitations of memory and CPU cache. The purple dot represents the performance of Loop3 in Fig. 2 (no frame summation). Since the dot is positioned above the DDR memory ceiling, the procedure is using the full performance of the level 3 (L3) CPU cache. Both loop performance and roofline limits are measured with Intel Advisor 2019 and are aggregated over 48 cores. The number of floating-point operations per second is calculated over loop execution time only.
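
The ceilings described in the caption follow the standard roofline relation: attainable performance is the smaller of the compute peak and the product of arithmetic intensity and bandwidth. The sketch below illustrates this relation only; the peak and bandwidth numbers are hypothetical and are not the values measured in the paper.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gb_per_s):
    """Roofline ceiling in GFLOP/s.

    arithmetic_intensity: FLOP per byte moved
    peak_gflops:          horizontal (compute) ceiling
    bandwidth_gb_per_s:   slope of the diagonal (memory or cache) ceiling
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gb_per_s)


if __name__ == "__main__":
    # Hypothetical machine, for illustration only.
    peak = 1000.0          # GFLOP/s
    ddr_bandwidth = 200.0  # GB/s
    l3_bandwidth = 800.0   # GB/s
    intensity = 0.25       # FLOP/byte for some loop

    ddr_ceiling = attainable_gflops(intensity, peak, ddr_bandwidth)
    l3_ceiling = attainable_gflops(intensity, peak, l3_bandwidth)
    measured = 120.0       # hypothetical measured loop performance

    # A measured point above the DDR ceiling cannot be served from DDR alone,
    # so the loop must be drawing its data from a faster cache level.
    print(f"DDR ceiling: {ddr_ceiling} GFLOP/s, L3 ceiling: {l3_ceiling} GFLOP/s")
    print("above DDR ceiling:", measured > ddr_ceiling)
```
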
Data quality indicators as a function of rounding the JUNGFRAU pixel readout value to a multiple of the photon count for the lysozyme crystal dataset collected at the Swiss Light Source X06SA beamline with the JUNGFRAU 4M at 1.1 kHz using 12.4 keV x-rays, 100% beam transmission, and 0.088°/880 μs steps. 2045 images (180° rotation) were taken for data analysis. The low resolution shell is defined as 50–3.31 Å, while the high resolution shell is 1.18–1.11 Å. The size of the dataset was calculated after compression with Bitshuffle/LZ4.
| Rounding to a multiple (photons) | Rmeas low/high res. shell (%) | Mean anomalous peak height for S (σ) | Refinement statistics Rwork/Rfree (%) | Bitshuffle/LZ4 compression (bits/pixel) |
|---|---|---|---|---|
| 1/8 | 2.2/18.2 | 15.3 | 11.7/13.6 | 5.0 |
| 1/4 | 2.2/18.3 | 15.1 | 11.7/13.4 | 4.1 |
| 1/2 | 2.1/18.5 | 15.2 | 11.6/13.4 | 3.1 |
| 1 | 2.1/18.9 | 15.2 | 11.6/13.5 | 2.3 |
| 2 | 2.2/22.8 | 14.7 | 11.8/13.5 | 1.5 |
| 4 | 2.1/27.8 | 14.3 | 12.5/14.5 | 0.90 |
| 8 | 2.1/30.6 | 13.9 | 15.4/17.6 | 0.39 |
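
The rounding applied in this table can be sketched as snapping each converted pixel value to the nearest multiple of a chosen step, expressed in photons. The helper below is a minimal illustration with a hypothetical name; how the authors treat ties or negative values is not stated here.

```python
def round_to_multiple(photons, multiple):
    """Round a pixel value (in photons) to the nearest multiple of `multiple`.

    Note: Python's built-in round() resolves exact ties to the even integer.
    """
    return round(photons / multiple) * multiple


if __name__ == "__main__":
    pixel_values = [0.07, 0.3, 2.3, 3.2, 17.6]
    for step in (0.125, 0.5, 1, 2, 8):
        rounded = [round_to_multiple(v, step) for v in pixel_values]
        # Coarser steps leave fewer distinct values, which a lossless
        # compressor can encode in fewer bits per pixel.
        print(step, rounded, "distinct values:", len(set(rounded)))
```

A coarser step discards more of the sub-photon information, which is consistent with the trade-off in the table between compressed size and the high resolution shell statistics.
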
Comparison of lossless compression algorithms in terms of the total time to generate a converted HDF5 file (reading raw gain + ADC frames, conversion to photons, frame summation, compression, and writing converted data to SSDs), processing time with XDS, and compression factor for a large lysozyme dataset collected at the Swiss Light Source X06SA beamline with the JUNGFRAU 4M at 1.1 kHz using 12.4 keV x-rays, 100% beam transmission, and 0.088°/880 μs steps. 2045 images (180° rotation) were taken for data analysis. For writing, the corresponding frame frequency is also given. Writing time was averaged over 10 runs, while processing time was averaged over 20 runs because the differences between algorithms are smaller.
| Compression algorithm | Writing time/frequency | Processing time | Compression (bits/pixel) |
|---|---|---|---|
| No compression | 12.9 s/158 Hz | 73.5 s | 16.0 |
| LZ4 | 6.4 s/320 Hz | 73.3 s | 6.8 |
| Bitshuffle/LZ4 | 3.7 s/550 Hz | 74.3 s | 2.3 |
| Zstd | 6.3 s/324 Hz | 75.4 s | 2.8 |
| Bitshuffle/Zstd | 5.8 s/351 Hz | 73.2 s | 1.8 |
| Gzip | 66.7 s/31 Hz | 75.1 s | 2.4 |
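
The frequency column appears to be the number of images divided by the writing time, and the bits/pixel column can be turned into a compression ratio relative to the 16-bit uncompressed value. The sketch below shows both calculations; the function names are illustrative, and small deviations from the tabulated frequencies are expected because the tabulated times are rounded.

```python
N_IMAGES = 2045              # number of images in the dataset (from the caption)
RAW_BITS_PER_PIXEL = 16.0    # "No compression" row of the table


def write_frequency_hz(writing_time_s, n_images=N_IMAGES):
    """Average number of images written per second."""
    return n_images / writing_time_s


def compression_ratio(bits_per_pixel, raw_bits=RAW_BITS_PER_PIXEL):
    """Size reduction relative to uncompressed 16-bit pixels."""
    return raw_bits / bits_per_pixel


if __name__ == "__main__":
    # (algorithm, writing time in s, bits per pixel) copied from the table
    rows = [
        ("No compression", 12.9, 16.0),
        ("LZ4", 6.4, 6.8),
        ("Bitshuffle/LZ4", 3.7, 2.3),
        ("Zstd", 6.3, 2.8),
        ("Bitshuffle/Zstd", 5.8, 1.8),
        ("Gzip", 66.7, 2.4),
    ]
    for name, seconds, bits in rows:
        print(f"{name}: {write_frequency_hz(seconds):.0f} Hz, "
              f"ratio {compression_ratio(bits):.1f}x")
```
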
Data quality indicators with SZ lossy compression for the lysozyme crystal dataset collected at the Swiss Light Source X06SA beamline with the JUNGFRAU 4M at 1.1 kHz using 12.4 keV x-rays, 100% beam transmission, and 0.088°/880 μs steps. 2045 images (180° rotation) were taken for data analysis. The low resolution shell is defined as 50–3.31 Å, while the high resolution shell is 1.18–1.11 Å. The number of reflection observations accepted per resolution shell is taken from the output of the XDS CORRECT step.
| Absolute error bound in SZ | Rmeas low/high res. shell (%) | Number of accepted observations; low/high res. shell | Mean anomalous peak height for S (σ) | Refinement statistics Rwork/Rfree (%) | Compression factor (bits/pixel) |
|---|---|---|---|---|---|
| 0.0 | 2.1/18.9 | 20 784/17 040 | 15.2 | 11.6/13.5 | 2.3 |
| 1.0 | 2.5/31.1 | 21 047/16 792 | 13.1 | 12.7/14.2 | 1.1 |
| 2.0 | 2.3/22.4 | 20 656/12 275 | 13.0 | 13.8/16.0 | 0.43 |
| 4.0 | 2.3/44.8 | 21 438/13 096 | 9.7 | 14.1/16.0 | 0.11 |
| 8.0 | 2.9/54.0 | 21 489/13 360 | 8.7 | 14.5/16.1 | 0.042 |
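
An absolute error bound means that every reconstructed pixel value differs from the original by at most that bound. The sketch below is not the SZ algorithm; it is a plain uniform quantizer, used here only to illustrate what such a guarantee means. All names are hypothetical.

```python
def quantize_with_error_bound(values, error_bound):
    """Uniform quantization with bin width 2 * error_bound.

    Each value is replaced by the centre of its bin, so the reconstruction
    error never exceeds error_bound. A bound of 0 leaves the data unchanged.
    This is a stand-in illustration, NOT the SZ compressor itself.
    """
    if error_bound == 0:
        return list(values)
    step = 2 * error_bound
    return [step * round(v / step) for v in values]


def max_abs_error(original, reconstructed):
    """Largest pointwise deviation between two equally long sequences."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


if __name__ == "__main__":
    pixels = [0.2, 1.7, 3.4, 10.9, 250.3]   # made-up pixel values in photons
    for bound in (0.0, 1.0, 2.0, 4.0, 8.0):
        approx = quantize_with_error_bound(pixels, bound)
        print(bound, approx, "max error:", max_abs_error(pixels, approx))
```
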
FIG. 4. Conceptual design of the data acquisition system for the JUNGFRAU 10M for MX beamlines at the Paul Scherrer Institute, with an IBM AC 922 system, two Alpha-Data 9H3 FPGA boards, and a single 2 × 100 GbE Mellanox Connect-X network card. The detector operates at a 2.2 kHz frame rate. The maximal possible bandwidth of each interface is marked in parentheses according to hardware specifications (U: unidirectional bandwidth, B: bidirectional bandwidth). The design assumes a compression factor of 4 with Bitshuffle/LZ4.