Yulong Zhuang1,2, Salah Awel3, Anton Barty3, Richard Bean4, Johan Bielecki4, Martin Bergemann4, Benedikt J Daurer5, Tomas Ekeberg6, Armando D Estillore3, Hans Fangohr1,2,4,7, Klaus Giewekemeyer4, Mark S Hunter8, Mikhail Karnevskiy4, Richard A Kirian9, Henry Kirkwood4, Yoonhee Kim4, Jayanath Koliyadu4, Holger Lange10,11, Romain Letrun4, Jannik Lübke3,10,12, Abhishek Mall1,2, Thomas Michelat4, Andrew J Morgan13, Nils Roth3,12, Amit K Samanta3, Tokushi Sato4, Zhou Shen5, Marcin Sikorski3, Florian Schulz10,14, John C H Spence9, Patrik Vagovic3,4, Tamme Wollweber1,2,10, Lena Worbs3,12, P Lourdu Xavier1,3,10, Oleksandr Yefanov3, Filipe R N C Maia6,15, Daniel A Horke3,10,16, Jochen Küpper3,10,12,17, N Duane Loh5,18, Adrian P Mancuso4,19, Henry N Chapman3,10,12, Kartik Ayyer1,2,10.
Abstract
One of the outstanding analytical problems in X-ray single-particle imaging (SPI) is the classification of structural heterogeneity, which is especially difficult given the low signal-to-noise ratios of individual patterns and the fact that even identical objects can yield patterns that vary greatly when orientation is taken into consideration. Proposed here are two methods which explicitly account for this orientation-induced variation and can robustly determine the structural landscape of a sample ensemble. The first, termed common-line principal component analysis (PCA), provides a rough classification which is essentially parameter free and can be run automatically on any SPI dataset. The second method, utilizing variational auto-encoders (VAEs), can generate 3D structures of the objects at any point in the structural landscape. Both these methods are implemented in combination with the noise-tolerant expand-maximize-compress (EMC) algorithm, and their utility is demonstrated by applying them to an experimental dataset from gold nanoparticles with only a few thousand photons per pattern. Both discrete structural classes and continuous deformations are recovered. These developments diverge from previous approaches of extracting reproducible subsets of patterns from a dataset and open up the possibility of moving beyond the study of homogeneous sample sets to addressing open questions on topics such as nanocrystal growth and dynamics, as well as phase transitions which have not been externally triggered. © Yulong Zhuang et al. 2022.
Keywords: XFELs; coherent X-ray diffractive imaging (CXDI); single particles
Year: 2022 PMID: 35371510 PMCID: PMC8895023 DOI: 10.1107/S2052252521012707
Source DB: PubMed Journal: IUCrJ ISSN: 2052-2525 Impact factor: 4.769
Figure 1. (a) Examples of diffraction patterns in the dataset. The colour scale maximizes at four photons. (b) The workflow of the CLPCA method. (c) The upper row shows some typical frame averages in the dataset on a logarithmic scale, while the lower row shows their corresponding real-space 2D density projections via phase retrieval. The real-space field of view is 132.4 nm.
Figure 2. (a) An illustration of the common line between two patterns of similar-shaped objects but with different orientations. (b) The distribution of 1000 averages in the 3D CLPCA space. Different colours are manually divided groups from the first two components: contaminants (blue), cubes (red), transition (yellow) and spheres (green). Typical patterns for averages in each group are also shown on a logarithmic scale.
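The common-line idea in panel (a) can be illustrated with a toy correlation score: two patterns of similarly shaped objects share at least one matching radial line, so the best match over all angle pairs is informative about shape similarity. The sketch below is illustrative only (the function name and normalization are assumptions, not the paper's exact metric):

```python
import numpy as np

def common_line_score(polar_a, polar_b):
    """Similarity of two diffraction patterns via their best-matching
    radial lines (a toy 'common line' search).

    polar_a, polar_b: (n_radial, n_angular) polar-resampled intensities.
    Returns the maximum normalized correlation over all angle pairs.
    """
    # Normalize each angular line (a radial profile) to unit norm
    a = polar_a / (np.linalg.norm(polar_a, axis=0, keepdims=True) + 1e-12)
    b = polar_b / (np.linalg.norm(polar_b, axis=0, keepdims=True) + 1e-12)
    # Correlation between every radial line of A and every radial line of B
    corr = a.T @ b                      # shape (n_angular, n_angular)
    return corr.max()
```

By construction the score is invariant to an in-plane rotation of either pattern (a rotation only permutes the angular lines), which is the property that makes common-line comparisons orientation tolerant.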
Figure 3. A brief illustration of the absolute embedding neural network (NN) model. (a) The average pattern from Dragonfly. (b) A polar representation of the pattern. (c) A stack of 1D Fourier transform magnitudes along the angular axis for each radial bin. The odd frequency components (due to inversion symmetry) and the higher frequencies for signals at smaller radii have been removed. These represent the feature vectors for the neural network. (d) An example of using absolute CLPCA on a selected cubic subset of frames (red dots). The grey dots represent the embedding of the pattern averages from the whole dataset. (e) Training labels from CLPCA.
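The feature construction described in panels (b) and (c) can be sketched roughly as follows, assuming the polar resampling has already been done. The helper name and the per-radius order cutoffs `max_orders` are illustrative parameters, not the paper's exact implementation:

```python
import numpy as np

def angular_fft_features(polar, max_orders):
    """Build a rotation-insensitive feature vector from a polar-resampled
    diffraction pattern.

    polar: (n_radial, n_angular) array of intensities on a polar grid.
    max_orders: per-radial-bin cutoff on the number of angular Fourier
                orders kept (fewer at small radii, where fine angular
                structure is not resolvable).
    """
    feats = []
    for r, cut in enumerate(max_orders):
        f = np.abs(np.fft.rfft(polar[r]))   # FT magnitude along the angular axis
        f = f[0:2 * cut:2]                  # keep even orders only: Friedel
                                            # (inversion) symmetry makes odd
                                            # angular orders vanish
        feats.append(f)
    return np.concatenate(feats)
```

Using Fourier magnitudes along the angular axis discards the phase, and with it the in-plane rotation of the pattern, so the resulting feature vector depends only on the shape content of each radial bin.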
Figure 4. (a) A schematic diagram of the variational auto-encoder (VAE). It consists of two convolutional neural networks as encoder and decoder, and the model is fitted by comparing the similarity between the input and decoder-generated 2D slices with additional regularization by assuming the latent parameter follows an N(0, 1) distribution. (b) The distribution of the 10 000 bootstrapped 2D average patterns in CLPCA space (grey). Red dots are selected average patterns along the melting sequence used for the VAE analysis. (c) CLPCA-2 plotted against gain for the selected patterns from panel (b) along the melting sequence.
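The two generic VAE ingredients named in panel (a), a reparameterized latent sample and the regularizer pulling the latent distribution toward N(0, 1), can be sketched numerically as below. The function names are illustrative, and the paper's convolutional encoder/decoder networks are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, 1): sampling written so that
    # gradients can flow through mu and log_var during training
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dims;
    # this is the regularization term added to the reconstruction loss
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)
```

The KL term is zero exactly when the encoder outputs mu = 0 and log_var = 0, i.e. when the latent posterior already matches the N(0, 1) prior.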
Figure 5. (a) VAE-encoded 1D latent number z plotted against CLPCA-2 for the input sample patterns. The colour coding is the gain parameter of each pattern. (b) A histogram of latent number z. (c) A histogram of CLPCA-2. (d) The volume ‘evolution’ along z, with volumes calculated using a density threshold equal to 10⁻⁴ of the total mass. The dashed horizontal line is the size of the support volume in phase retrieval. Vertical grey lines show the locations of the 12 selected z numbers. (e) The top three rows are the VAE-generated intensity volumes from the 12 selected z numbers on a logarithmic scale, showing slices along the 〈100〉, 〈110〉 and 〈111〉 directions, respectively (rows 1 to 3). The bottom three rows show their corresponding density projections in real space.
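The threshold-based volume measurement in panel (d) could be computed along these lines. This is a hypothetical helper under the stated definition (threshold = 10⁻⁴ of the total mass); the paper's exact voxel bookkeeping may differ:

```python
import numpy as np

def support_volume(density, frac=1e-4, voxel_volume=1.0):
    """Volume of a reconstructed object, measured as the number of voxels
    whose density exceeds a fixed fraction of the total mass."""
    threshold = frac * density.sum()
    return np.count_nonzero(density > threshold) * voxel_volume
```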
Figure 6. (a) The distribution of the dataset in 2D latent space, colour coded by CLPCA-2. Black crosses in the plot are the 200 selected latent numbers z. (b) The 〈100〉 direction projections of the 200 real-space densities retrieved from the logarithmic intensity volumes generated from the 200 selected latent numbers z [black crosses in panel (a)]. Semi-transparent slices are volumes generated from latent regions with no or very few data points.