
Harnessing interpretable and unsupervised machine learning to address big data from modern X-ray diffraction.

Jordan Venderley¹, Krishnanand Mallayya¹, Michael Matty¹, Matthew Krogstad², Jacob Ruff³, Geoff Pleiss⁴, Varsha Kishore⁴, David Mandrus⁵, Daniel Phelan², Lekhanath Poudel^6,7, Andrew Gordon Wilson⁸, Kilian Weinberger⁴, Puspa Upreti^2,9, Michael Norman², Stephan Rosenkranz², Raymond Osborn², Eun-Ah Kim¹.

Abstract

The information content of crystalline materials becomes astronomical when collective electronic behavior and their fluctuations are taken into account. In the past decade, improvements in source brightness and detector technology at modern X-ray facilities have allowed a dramatically increased fraction of this information to be captured. Now, the primary challenge is to understand and discover scientific principles from big datasets when a comprehensive analysis is beyond human reach. We report the development of an unsupervised machine learning approach, X-ray diffraction (XRD) temperature clustering (X-TEC), that can automatically extract charge density wave order parameters and detect intraunit cell ordering and its fluctuations from a series of high-volume X-ray diffraction measurements taken at multiple temperatures. We benchmark X-TEC with diffraction data on a quasi-skutterudite family of materials, (CaxSr1−x)3Rh4Sn13, where a quantum critical point is observed as a function of Ca concentration. We apply X-TEC to XRD data on the pyrochlore metal, Cd2Re2O7, to investigate its two much-debated structural phase transitions and uncover the Goldstone mode accompanying them. We demonstrate how unprecedented atomic-scale knowledge can be gained when human researchers connect the X-TEC results to physical principles. Specifically, we extract from the X-TEC-revealed selection rules that the Cd and Re displacements are approximately equal in amplitude but out of phase. This discovery reveals a previously unknown involvement of [Formula: see text] Re, supporting the idea of an electronic origin to the structural order. Our approach can radically transform XRD experiments by allowing in operando data analysis and enabling researchers to refine experiments by discovering interesting regions of phase space on the fly.

Entities: Chemical

Keywords: X-ray scattering; big data; machine learning

Year: 2022 PMID: 35679347 PMCID: PMC9214512 DOI: 10.1073/pnas.2109665119

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779

Since their earliest days, X-ray diffraction (XRD) experiments have been used to access atomic-scale information in crystalline materials. The primary challenge has always been how to interpret the angle-dependent scattering intensities of the resultant diffraction patterns (Fig. 1). Bragg and Bragg’s initial insights into how to interpret such data (1) enabled the direct determination of crystal structures for the first time, and they were duly awarded a Nobel prize. Since the phase of the X-ray photon is lost in the measurement, the most common approach to interpreting XRD data is to employ forward modeling using the increasingly sophisticated tools of crystallography developed over the past century. These have been remarkably successful in determining the structure of highly crystalline materials, from simple inorganic solids to complex protein crystals. However, subtle structural changes can be difficult to determine when they only result in marginal changes in intensities without any change in peak locations (2). Furthermore, thermal and quantum fluctuations captured in diffuse scattering away from the Bragg peaks are beyond the reach of conventional crystallographic analysis. The information-rich diffuse scattering is typically weaker than Bragg scattering by several orders of magnitude and can be difficult to differentiate from background noise.

Fig. 1.

(A) Schematic geometry of the X-ray scattering measurements. A monochromatic X-ray beam is incident on the sample, which rotates about the orthogonal axis while images are captured on a fast area detector. The reciprocal space map shows the coverage of a single plane in the 3D volume after capturing images over a full sample rotation. A 3D volume of reciprocal space covered by the X-ray scattering is shown on the right. Each red dot is a single Bragg peak. With an X-ray energy of 87 keV, a volume of over 10,000 Å-3 is measured, containing over 10,000 BZs if the unit cell dimension is 10 Å. Real space positions of atoms (Top) and the corresponding scattering intensities (Bottom) calculated from simulated 1D crystals with a unit cell containing two atoms, illustrating (B) a high-symmetry phase, with (C) distortions due to CDW order, (D) IUC order, and (E) short-range IUC order. In B, the high-symmetry phase produces peaks at integer . In C, displacements of the orange atoms by double the size of the unit cell producing additional superlattice peaks at half-integer as well as changes in the other peak intensities. See Fig. 2 for X-TEC–aided detection of CDW order in (CaSr)3Rh4Sn13. In D, IUC distortions of the orange atoms by change the peak intensities without producing additional superlattice peaks. In E, every orange atom is displaced by , with a 70% probability of nearest neighbors having the same displacement. This finite correlation length has a small impact on the total scattering (black) but produces broad diffuse scattering (blue, ×70,000 scale compared to total scattering). See Fig. 3 for the detection of IUC distortions and their diffuse scattering in Cd2Re2O7 with X-TEC. (F) Bond patterns on the pyrochlore lattice associated with an E distortion as inferred in Cd2Re2O7. The two space groups refer to the two different components of E with each bond color denoting a different bond length. 
The amount of distortion of each bond from the average bond (gray) is indicated by etc., along with the respective bond color. A Mexican hat potential energy E governs the fluctuations between the two E components in the broken symmetry phase. See Fig. 4 for the X-TEC–aided resolution of the two E components and their fluctuations in Cd2Re2O7.
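
The 1D illustration in panels B to D can be reproduced qualitatively with a few lines of Python. The sketch below is not the authors' simulation: the form factors (1.0 and 0.6), the displacement of 0.02 unit cells, the chain length of 64 cells, and all function names are illustrative assumptions. It shows that an alternating displacement (CDW) creates half-integer superlattice peaks, whereas a uniform intraunit cell displacement only changes the intensities of the integer peaks.

```python
import cmath
import math

def chain(n_cells, delta=0.0, pattern="none"):
    """Return (position, form_factor) pairs for a 1D chain with two atoms per cell.

    Atom A sits at integer positions; atom B sits near the cell midpoint.
    Form factors and displacements are hypothetical illustration values.
    """
    f_a, f_b = 1.0, 0.6
    atoms = []
    for n in range(n_cells):
        if pattern == "cdw":      # alternating sign: the period doubles
            shift = delta * (-1) ** n
        elif pattern == "iuc":    # same shift in every cell: period unchanged
            shift = delta
        else:                     # high-symmetry phase
            shift = 0.0
        atoms.append((float(n), f_a))
        atoms.append((n + 0.5 + shift, f_b))
    return atoms

def intensity(atoms, q, n_cells):
    """Scattering intensity at momentum q (reciprocal lattice units), per cell squared."""
    amplitude = sum(f * cmath.exp(2j * math.pi * q * x) for x, f in atoms)
    return abs(amplitude) ** 2 / n_cells ** 2

N = 64
high_sym = chain(N)
cdw = chain(N, delta=0.02, pattern="cdw")
iuc = chain(N, delta=0.02, pattern="iuc")

# Compare an integer peak (q = 1) with a half-integer position (q = 1.5).
for q in (1.0, 1.5):
    print(q, intensity(high_sym, q, N), intensity(cdw, q, N), intensity(iuc, q, N))
```

Only the CDW chain gives appreciable intensity at q = 1.5, while the uniformly displaced chain differs from the high-symmetry chain only at integer q.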

Fig. 2.

Illustration and benchmarking of X-TEC. (A) A flowchart describing the execution of X-TEC. The steps are described in the Implementation of X-TEC section and further detailed in . (B) Raw XRD image showing a slice of the reciprocal space in the plane, at T = 30 K (Left) and T = 220 K (Right). The CDW superlattice peaks are visible at T = 30 K and are absent at T = 220 K. (C and D) X-TECs results of the Sr3Rh4Sn13 XRD data with spanning the reciprocal space where reciprocal lattice units (r.l.u.). The clustering assignments are color-coded as blue, brown, and gray. In C, the lines represent cluster means, and the shaded region shows 1 SD, interpolated between 24 temperature points of measurement. D shows the pixels at in the plane that passed the thresholding (), colored according to their cluster assignments. X-TEC correctly identifies the blue clusters with the CDW super lattice peaks, brown clusters with Bragg peaks, and gray clusters with diffuse scattering. The blue cluster mean (solid line) in C represents the rescaled intensity trajectories of all CDW peaks in the data. (E) An order parameter like quantity is estimated from the CDW (blue) cluster and is shown for four samples at different values of Ca doping x. The is estimated from the cluster means by subtracting the minimum from each cluster mean and appropriate normalization. (F) extracted from a manually selected CDW peak at for the four Ca doping x shows a qualitatively similar trajectory to that of X-TEC in E. (G) The critical temperatures estimated from the X-TEC extracted (yellow filled circles) overlaid onto the known phase diagram from ref. 26 based on phase boundaries from thermodynamic measurements and transport.

Fig. 3.

X-TEC analysis of Cd2Re2O7 XRD data. (A) Crystal structure of Cd2Re2O7 showing only Cd and Re, in the high-temperature cubic phase. (B) Temperature dependence of the specific heat of Cd2Re2O7, showing the second-order phase transition at 200 K and the first-order phase transition at 113 K (). Three temperature ranges are marked as phase I (), phase II (), and phase III (). (C) X-TEC results on the cubic forbidden Bragg peaks from high-resolution XRD data, showing temperature dependence of the mean intensity of each cluster (the cluster assignments are obtained from 30 K 150 K data; see , for details). The lines are average intensity trajectories of their respective cluster assignments from all cubic forbidden Bragg peaks in the data. The solid lines show three-cluster (K = 3) X-TEC-d trajectories, color coded as black, red, and blue. The dashed line shows two-cluster (K = 2) X-TEC-s (peak averaged) trajectories, colored yellow and green. The temperatures of the two structural phase transitions are shown as dotted lines. (D) The X-TEC-d cluster color assignments (black, red, and blue) of the thresholded pixels, as well as X-TEC-s cluster assignments of the Bragg peaks (marked as yellow and green squares centering the Bragg peaks), in a section of the h = 0 plane, where k and l are in r.l.u. The color coding of the clusters is the same as in C. (E and F) The regions in the vicinity of two Bragg peaks at (Left) and (Right) are magnified to show that the peak centers in both belong to the black cluster, while halos form two distinct clusters (red and blue, respectively) separated from their peak centers. X-TEC-d and X-TEC-s together show that red (blue) diffuse halos and the yellow (green) Bragg peaks lock into a strict one-to-one correspondence with both exhibiting a rigid selection rule. 
The raw intensity plotted for (Left) and (Right) along a line cut (the gray dashed line shown in the respective zoom-ins) confirm the temperature dependence of the red and blue halo intensities represented by the cluster means in C. Specifically, the peak has enhanced diffuse scattering above K, consistent with the temperature dependence of the red cluster mean. The peak shows an anomaly near and a suppressed diffuse scattering above, consistent with the temperature dependence of the blue cluster mean.

Fig. 4.

Order parameters and their fluctuations inferred from X-TEC analysis of cubic forbidden Bragg peaks. (A) The filled symbols are the two-cluster mean intensity trajectories of peak averaged data (yellow and green trajectories from Fig. 3), and solid lines are fits to these cluster means based on the model assuming δx displacements (yellow) and δz displacements (green) of cations to vary as , with a common order parameter exponent of as discussed in . (B) Schematic diagram of the relative z axis displacements of cation sublattices for the Cd (orange) and Re (gray) with respect to the cubic phase, inferred from the fit in A. The X-TEC–discovered selection rule and the fit establish the approximately equal magnitude but out-of-phase displacements and . (C) The characteristic temperature dependences of the diffuse clusters are revealed by the z-scored intensities (for each intensity, subtract their mean over T and then divide their SD in T). The red and blue trajectories correspond to the respective cluster average of the z-scored intensities. Lines are guides for the eyes. (D) The calculated Landau mode intensities as a function of T (). Outside of the critical region near (200 K), the intensity is dominated by the Goldstone mode intensity. Note the resemblance of the calculated intensity to the diffuse trajectory in C. (E) Main panel shows the temperature dependence of the diffuse scattering line-cut profiles near Bragg peak, whose intensities are integrated within the manually selected dashed lines in the plane, shown in Inset. From a visual inspection of their temperature dependence, the shaded gray region is excluded from diffuse scattering. Inset shows the intensity distribution in plane at 100 K around the Bragg peak. The red curves enclose the X-TEC-d determined region for diffuse scattering. The red boundary cleverly avoids the diagonal Bragg streak which is not a part of the diffuse scattering (matching the shaded gray region near the peak in the main panel). 
(F) The temperature trajectories of diffuse scattering intensities (dotted lines) and their average intensity (solid line) near the peak, from the manually selected regions of diffuse scattering in E. Vertical dashed lines mark (200 K) and (113 K). The trajectories show the same qualitative features of the X-TEC-d red (square symbol) diffuse trajectory in C, with strong scattering at and enhanced intensity above 113 K reflecting the stronger Goldstone fluctuations from z axis displacements shown in B.

The massive data that modern facilities generate, spanning three-dimensional (3D) reciprocal space volumes that include Brillouin zones (BZs) (Fig. 1), at rates of GB/h should capture the systematics of such subtle atomic-scale information. Yet the sheer quantity of data presents a major challenge. Overcoming this challenge is of paramount importance especially in searching for an unknown order parameter and its fluctuations. Specifically, two types of orders and their fluctuations are targets of XRD (see the illustration for a 1D system in Fig. 1 ): those that change the size of the unit cell, such as charge density waves (CDW), and those that involve intraunit cell (IUC) distortions. XRD evidence of CDW order is the emergence of new superlattice peaks, which can be weak and fluctuating, often requiring a targeted search (3, 4). XRD evidence of IUC order is even subtler changes in structure factors of Bragg peaks (5), unless there are changes in extinction rules. However, the ubiquity of electronic nematic order (6, 7) has turned the study of electronically driven IUC order into an increasingly important scientific objective. Electronically driven IUC order and related hidden order phases typically have profound consequences for the electronic structure as revealed by various probes, yet are often accompanied by subtle structural distortions. Examples range from 3d oxides like cuprates, to 4d and 5d oxides like ruthenates and iridates, to 4f and 5f heavy fermion materials like URu2Si2. These small distortions can challenge conventional crystallographic structural refinement that only tracks Bragg peaks and deduce the structural symmetry by fitting all the atomic positions in a forward model. As an example of proposed CDW order, the quasi-skutterudite family, (CaSr)3X4Sn13, where X is a transition metal ion like Co, Rh, or Ir, exhibits marginal Fermi liquid behavior. 
Much like in cuprates and heavy fermion materials such as YbRh2Si2, this order can be suppressed to very low temperatures, leading to a linear in temperature resistivity over a large range in temperature. As an example of IUC distortion, in the pyrochlore, Cd2Re2O7, a very subtle structural distortion is associated with large changes in the specific heat and susceptibility. This led Fu (8) to propose the presence of spin nematic order, and some evidence for this was provided by subsequent nonlinear optics measurements (9). Moreover, the inversion breaking structural order itself is novel, whose candidate description by an E tensor could support pseudo-Goldstone fluctuations between its two components, and (Fig. 1) (10). Interestingly, both of these examples exhibit superconductivity at low temperatures, leading to the question of how superconductivity is related to these orders. To extract atomic-scale information encoded in massive XRD data volumes, much needed is a versatile, interpretable, and scalable approach that can reveal order parameters and fluctuations associated with CDW orders and IUC orders: the vision behind XRD temperature clustering (X-TEC). For the analysis of complex experimental data, dimension reduction and machine learning techniques are increasingly employed (11–18), with an emphasis on supervised learning using hypothesis-driven synthetic data (11–13). To date, most applications of unsupervised techniques to materials data have been limited to exploration of compositional phase diagrams of alloys (19–21). However, an interpretable and unsupervised approach aiming at discovering interaction-driven emergent phenomena in quantum materials such as order parameters and fluctuations can greatly benefit scientific progress. For versatility, we opted for an unsupervised approach guided by a fundamental principle of statistical mechanics: the balance between the energy (E) and entropy (S) resting on the temperature (T). 
A change in the collective state of a system occurs in the direction of minimizing the Helmholtz free energy F (22):

$$F = E - TS. \tag{1}$$

When the temperature T is lowered below a certain threshold, the entropy S gives way to the ordered state dominated by the system Hamiltonian. Hence, the temperature (T) evolution of the XRD intensity at a reciprocal space point $\mathbf{q}$ must be qualitatively different if the given reciprocal space point reflects order parameters or their fluctuations. Tracking the temperature evolution of thousands of BZs to identify systematic trends and correlations in any comprehensive manner is impossible to achieve manually without selection bias. X-TEC embodies the principle of Eq. 1 by clustering the temperature series associated with a given $\mathbf{q}$ according to qualitative features in the temperature dependence, as in high-dimensional clustering approaches that learn qualitative differences in the voice trains for speaker verification (23). X-TEC achieves interpretability and scalability by using a simple Gaussian mixture model (GMM) (24) at its core and incorporates correlation among nearby points and within and across BZs using label smoothing similar to how signals from different cameras can be correlated for computer vision (25).

Implementation of X-TEC

In Fig. 2, we provide a flowchart giving a bird’s-eye view of the X-TEC execution. We briefly describe the steps below and provide further details in . Comprehensive XRD temperature series data are obtained for each point spanning grid points in a 3D reciprocal space, at 10 to 30 temperatures (step A). The raw data are first passed through a thresholding algorithm that identifies and removes the overwhelming low-intensity background (step B). Next, the intensities, at points that passed the thresholding, undergo a rescaling to reduce the dynamic range of the intensity scale (step C). At this point, the user has to decide between two modes of rescaling depending on the nature of the data of interest. To focus on intensities that show a large variation in temperature, the user selects a mean based rescaling: , where is the mean value of the temperature trajectory at . On the other hand, if the focus is on subtle changes in the intensity–temperature trajectories (low-variance trajectories), one selects a variance-based rescaling (z-scoring) given by , where is the SD of the temperature trajectory at . The preprocessed data are now ready for the X-TEC clustering. At this point, the user sets the number of clusters K, starting with an initial guess (step D).
There are two modes for X-TEC clustering: X-TEC smoothed (X-TEC-s) and X-TEC detailed (X-TEC-d). X-TEC-d assigns cluster labels independently to the trajectories at , while X-TEC-s incorporates label smoothing among neighboring q points within and across BZs. X-TEC-s is best suited for detecting order parameters reflected in the peak centers, while X-TEC-d can probe finer details in the diffuse scattering and reveal the nature of fluctuations in high-resolution data. The user makes a decision (step E) to choose X-TEC-s for order parameters or X-TEC-d for their fluctuations. Using X-TEC-s and X-TEC-d in tandem can reveal systematic correlations between order parameters captured by peak centers and fluctuations captured by diffuse scattering in an unprecedented manner. For X-TEC-s (step E.2), the user can choose the label smoothing approach to enforce local correlations in the cluster label assignments of neighboring .
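If the size of the dataset is large, the user can opt for a faster and rudimentary version of label smoothing enforced through peak averaging, where intensities of connected pixels in reciprocal space are replaced by their pixel-averaged intensity. Following the X-TEC clustering, the results are visualized and interpreted (step F). The user observes the K distinct temperature trajectories of the clustered data as well as the cluster labels assigned to the points in reciprocal space. The visual interpretation aids the user to arrive at the optimal number of clusters K such that increasing K does not reveal any more distinct trajectories (step G). The clustered trajectories and their labels in reciprocal space are now ready for interpretation to aid possible new discoveries such as the identification of hidden orders and selection rules. At the heart of X-TEC-d is the standard GMM applied to the temperature series, $\mathbf{I}_{\mathbf{q}} = \{I_{\mathbf{q}}(T_1), \ldots, I_{\mathbf{q}}(T_d)\}$, treated as a point in the d-dimensional space. With the number of clusters K, X-TEC-d attempts to model each point in the dataset to be independently and identically drawn from a weighted sum of K distinct multivariate normal distributions. The hyperparameters to be learned are the mixing weights $\pi_k$, d-dimensional means $\boldsymbol{\mu}_k$, and $d \times d$-dimensional covariances $\boldsymbol{\Sigma}_k$. The associated model log-likelihood is

$$\ell = \sum_{\mathbf{q}} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{I}_{\mathbf{q}} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k).$$

Here $\mathcal{N}(\mathbf{I}_{\mathbf{q}} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ is the probability density for the kth multivariate Gaussian with mean $\boldsymbol{\mu}_k$ and covariance $\boldsymbol{\Sigma}_k$ evaluated at $\mathbf{I}_{\mathbf{q}}$, i.e.,

$$\mathcal{N}(\mathbf{I}_{\mathbf{q}} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) = \frac{1}{\sqrt{(2\pi)^d \det \boldsymbol{\Sigma}_k}} \exp\left[-\tfrac{1}{2} (\mathbf{I}_{\mathbf{q}} - \boldsymbol{\mu}_k)^{\mathsf{T}} \boldsymbol{\Sigma}_k^{-1} (\mathbf{I}_{\mathbf{q}} - \boldsymbol{\mu}_k)\right].$$

The probability, $w_{\mathbf{q}}^{k}$, that the temperature series labeled by $\mathbf{q}$ belongs to the kth cluster is

$$w_{\mathbf{q}}^{k} = \frac{\pi_k \, \mathcal{N}(\mathbf{I}_{\mathbf{q}} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{k'=1}^{K} \pi_{k'} \, \mathcal{N}(\mathbf{I}_{\mathbf{q}} \mid \boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})}$$

according to Bayes’ theorem. X-TEC learns the hyperparameters using a stepwise expectation maximization (EM) algorithm (27) (). Much like mean-field theory familiar to physicists, the EM algorithm iteratively searches for the saddle point of the lower bound of the log-likelihood

$$\tilde{\ell} = \sum_{\mathbf{q}} \sum_{k=1}^{K} w_{\mathbf{q}}^{k} \log \frac{\pi_k \, \mathcal{N}(\mathbf{I}_{\mathbf{q}} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{w_{\mathbf{q}}^{k}} + \lambda \left(1 - \sum_{k=1}^{K} \pi_k\right),$$

where λ is a Lagrange multiplier. The cluster assignment of a given reciprocal space point $\mathbf{q}$ is then determined by the converged value of the clustering expectation $w_{\mathbf{q}}^{k}$.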
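
The preprocessing and clustering steps described above can be sketched end to end on synthetic trajectories. This is a minimal illustration rather than the X-TEC implementation: the fixed mean-intensity cutoff stands in for the thresholding algorithm, the mean-based rescaling is assumed to take the form I/mean − 1, the Gaussians use diagonal covariances for brevity, and plain batch EM replaces the stepwise EM of ref. 27. All function names and the toy data are hypothetical.

```python
import math

def rescale(traj, mode="mean"):
    """Rescale one intensity-temperature trajectory."""
    mu = sum(traj) / len(traj)
    if mode == "mean":
        return [x / mu - 1.0 for x in traj]      # assumed form of mean-based rescaling
    sd = math.sqrt(sum((x - mu) ** 2 for x in traj) / len(traj))
    return [(x - mu) / sd for x in traj]         # z-scoring: subtract mean, divide by SD

def log_gauss(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (simplifying assumption)."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def fit_gmm(data, k, n_iter=30, var_floor=1e-6):
    """Batch EM for a K-component mixture; returns hard labels and cluster means."""
    n, d = len(data), len(data[0])
    ordered = sorted(data, key=lambda x: x[0])   # deterministic initial means
    means = [list(ordered[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E step: responsibilities from Bayes' theorem, computed in log space
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            resp.append([math.exp(v - norm) for v in logs])
        # M step: update mixing weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty clusters
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, data)) / nj
                        for t in range(d)]
            variances[j] = [max(var_floor,
                                sum(r[j] * (x[t] - means[j][t]) ** 2
                                    for r, x in zip(resp, data)) / nj)
                            for t in range(d)]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means

# Synthetic data: order-parameter-like peaks, slowly varying Bragg-like peaks,
# and a weak flat background that the cutoff should remove.
temps = list(range(30, 241, 30))                 # 8 temperatures in kelvin
order = [[50.0 * (i + 1) * max(0.0, 1.0 - t / (150.0 + 2 * i)) + 1.0 for t in temps]
         for i in range(5)]
bragg = [[500.0 * (i + 1) * (1.0 - 0.001 * t) for t in temps] for i in range(5)]
background = [[0.05 + 0.01 * i] * len(temps) for i in range(5)]
raw = order + bragg + background

cutoff = 1.0                                     # hypothetical mean-intensity threshold
kept = [traj for traj in raw if sum(traj) / len(traj) >= cutoff]
data = [rescale(traj, mode="mean") for traj in kept]
labels, means = fit_gmm(data, k=2)
print(labels)
```

Because the rescaling removes the overall intensity scale, trajectories that differ by orders of magnitude in brightness but share a temperature dependence end up in the same cluster, which is the property the clustering relies on.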
For X-TEC-s with label smoothing, the algorithm first constructs a nearest neighbor graph in momentum space, connecting reciprocal space points that share similar momenta. For each point, the neighbors are weighted by their distance in momentum space and the weights normalized. Label smoothing averages the cluster assignments of a point with its (weighted) neighbors. We incorporate this smoothing step between the E and M step of the GMM.
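
The smoothing step can be illustrated with a toy version that averages the cluster responsibilities of each point with those of its neighbors, as would be done between the E and M steps. The Gaussian distance weighting, the cutoff radius, and the brute-force neighbor search are assumptions made for brevity; the text specifies only that neighbors are weighted by distance with normalized weights, and this sketch does not connect equivalent points across BZs.

```python
import math

def smooth_labels(points, resp, radius=1.5, length=1.0):
    """Average each point's cluster responsibilities with those of nearby points.

    points: list of coordinate tuples in reciprocal space
    resp:   list of per-point responsibility lists (each sums to 1)
    radius, length: hypothetical neighbor cutoff and weighting scale
    """
    smoothed = []
    for p in points:
        acc = [0.0] * len(resp[0])
        total = 0.0
        for q, r in zip(points, resp):
            dist = math.dist(p, q)
            if dist <= radius:                        # includes the point itself
                w = math.exp(-(dist / length) ** 2)   # assumed distance weighting
                total += w
                acc = [a + w * rk for a, rk in zip(acc, r)]
        smoothed.append([a / total for a in acc])     # normalize the weights
    return smoothed

# Five pixels along a line; the middle one is weakly assigned to the "wrong" cluster.
points = [(float(i), 0.0, 0.0) for i in range(5)]
resp = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]]
smoothed = smooth_labels(points, resp)
labels = [max(range(2), key=lambda k: row[k]) for row in smoothed]
print(labels)
```

In this example the isolated middle pixel is pulled into the cluster of its neighbors, which is the sense in which smoothing keeps the assignments connected around each peak. A production version would replace the quadratic neighbor search with a spatial index.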

CDW Order and X-TEC Benchmarking

In order to demonstrate the power of X-TEC in action and benchmark its results, we first analyze a collection of data in the vicinity of a putative CDW quantum critical point. Sr3Rh4Sn13 is a quasi-skutterudite compound that has a CDW transition at ∼138 K and a superconducting transition at 4.7 K (26). Doping with calcium applies chemical pressure that suppresses the CDW transition, and electrical resistivity and heat capacity experiments on (CaxSr1−x)3Rh4Sn13 provided evidence of a quantum critical point at a composition of x = 0.9 (Fig. 2), corresponding to a peak in the superconducting dome (26), reminiscent of the cuprate phase diagram (28). This interpretation was supported both by inelastic X-ray measurements of soft phonon modes (29) and, more recently, X-ray measurements of the CDW order parameter in the related family, (CaxSr1−x)3Ir4Sn13 (30). We have been developing highly efficient methods of mapping out such phase diagrams using high-energy X-rays on Sector 6-ID-D at the Advanced Photon Source using a monochromatic X-ray energy of 87 keV (31). Images are collected on a fast area detector (Pilatus 2M CdTe) at a frame rate of 10 Hz while the sample is continuously rotated through 360° at a speed of 1°/s (Fig. 1). These rotation scans are repeated twice to fill in gaps between the detector chips, so a single measurement represents an uncompressed data volume of over 100 GB collected in under 20 min. This allows comprehensive measurements of the temperature dependence of a material in 12 h or less. Using a cryostream, we are able to vary the temperature from 30 to 300 K. The rotation scans sweep through a large volume of reciprocal space, containing over 10,000 BZs (Fig. 1); when the data are transformed into reciprocal space coordinates, the 3D arrays are typically reduced in size by an order of magnitude. More details of both the measurement and data reduction workflow are given in ref. 31; see also and ref. 32. Fig. 
2 shows the raw XRD images in the plane, at T = 30 and T = 220 K. At T = 30 K, the CDW superlattice peaks are clearly seen at and symmetry equivalents with respect to the cubic Bragg peaks, which are absent at the higher temperature. In (CaSr)3Rh4Sn13, we applied X-TEC to the XRD data on four compounds, , to map out the phase diagram as a function of both temperature and doping automatically. In Fig. 2, we present cluster means and variances of the three-cluster (K = 3) results for undoped Sr3Rh4Sn13. The optimal number of clusters is obtained as the minimum number needed to separate the distinct temperature trajectories (). The temperature dependence of the learned means of the blue, brown, and gray clusters makes it evident that the blue cluster represents the order parameter and the temperature at which it falls to 0 is the critical temperature, K. The clustering results can be interpreted by locating the cluster assignments in reciprocal space, as shown in Fig. 2. The location of the blue pixels (which correspond to the blue cluster) identifies the ordering wave vector q as expected from the raw images in Fig. 2. The diffuse scattering is captured by the gray clusters, while the Bragg peaks are captured by the brown clusters. The three clusters are first identified from an X-TEC-d clustering (see , for the X-TEC-d results), and the label smoothing is applied to the blue and brown clusters (peak centers) after excluding the gray diffuse scattering. Label smoothing keeps the clustering output to be smoothly connected in the vicinity of each peak, simplifying interpretation. Plotting the CDW order parameters extracted automatically by X-TEC at each doping, we can track the evolution of the critical temperature T as a function of chemical pressure (Fig. 2), allowing us to map out the quantum phase diagram associated with the CDW ordering in (CaSr)3Rh4Sn13, in a similar way to ref. 30, without any prior knowledge of the wave vectors or transition temperatures. 
A comparison of the X-TEC extracted CDW order parameter (Fig. 2) with that from a manually selected superlattice peak (Fig. 2) shows excellent agreement. In the past, we would have analyzed such data by manually identifying a few superlattice peaks, with the assumption that they are representative of the whole, and fitting their temperature dependence. This may be justified in many cases, but in doing so, we would be ignoring over 99% of the data, limiting the statistical precision available from such comprehensive datasets and potentially missing secondary components of the order parameter. X-TEC eliminates the danger of selection bias in such analyses. The large data volume also allows us to utilize the 3D-Δ PDF method (31), in order to determine the nature of the atomic distortions both below and above T, which will be discussed in a future publication.
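
The extraction of an order-parameter-like quantity from a cluster mean, and of a critical temperature from it, can be sketched as follows. The caption of Fig. 2 states only that the minimum is subtracted and an "appropriate normalization" applied, so the division by the maximum, the tolerance used to decide when the trajectory has fallen to zero, the function names, and the synthetic trajectory are all assumptions.

```python
def order_parameter(cluster_mean):
    """Subtract the minimum and normalize to the maximum (assumed normalization)."""
    lo, hi = min(cluster_mean), max(cluster_mean)
    if hi == lo:
        raise ValueError("flat trajectory: no order parameter to extract")
    return [(v - lo) / (hi - lo) for v in cluster_mean]

def estimate_tc(temps, cluster_mean, tol=0.02):
    """Lowest measured temperature at which the normalized trajectory is within tol of zero."""
    op = order_parameter(cluster_mean)
    for t, v in sorted(zip(temps, op)):
        if v <= tol:
            return t
    return None  # the trajectory never reaches zero in the measured range

# Synthetic rescaled cluster mean that vanishes at 138 K, sampled every 10 K.
temps = list(range(30, 241, 10))
cluster_mean = [max(0.0, 1.0 - t / 138.0) - 0.5 for t in temps]
print(estimate_tc(temps, cluster_mean))
```

The estimate is limited by the temperature grid: with 10 K steps the sketch can only bracket the transition between two neighboring measurements, so denser sampling near the transition directly improves the result.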

IUC Order, Fluctuations, and Selection Rules

We now employ X-TEC-s and X-TEC-d in tandem to study hidden IUC order and order parameter fluctuations in the pyrochlore metal Cd2Re2O7 (33–35) (Fig. 3), whose low-temperature phases have recently attracted much interest and controversies (9, 36–41). The Cd2Re2O7 goes through a second-order transition at K from the cubic pyrochlore structure (phase I) to a structure that breaks inversion symmetry (phase II), with a large thermodynamic signature in the specific heat (Fig. 3). Most studies conclude that the space group of phase II is the component of E symmetry (37). At a lower temperature, a first-order transition at K (phase III) is observed and is proposed to arise from the other component of E, which is the space group (37). An additional transition at 80 K is posited following recent Raman data showing line splittings consistent with a lowering to orthorhombic symmetry (speculated to be an F222 space group) (42). The results for phase II are consistent with the picture where and are the two components of the E order parameter, a rank-2 tensor. The degeneracy between these two states is lifted at sixth order in Landau theory (43), resulting in a pseudo-Goldstone mode encoding fluctuations between the two phases (44, 45) (Fig. 1). Raman scattering (10) shows a strong central peak that appears to be the Goldstone mode, along with a higher-frequency mode which appears to be the Higgs mode [although this has been recently questioned based on pump–probe measurements (41)]. 
The uniqueness of this situation is that although pseudo-Goldstone modes have been seen in other materials, notably ferroelectrics, they typically exist at much higher frequencies (45). The fact that this is not the case for Cd2Re2O7 indicates that the anisotropy in the Landau free energy is anomalously small. Confirmation of such low-frequency fluctuations has so far remained beyond the reach of XRD.

However, the E structural order of phase II is now questioned after the discovery of a purported electronic order from second harmonic generation (SHG) (9). While the SHG data also show the E structural order, they reveal the surprising fact that the E order does not have the expected temperature dependence of a primary order parameter, unlike the signal, which does (9, 38–40). The proposed space group of phase III is also controversial in that earlier SHG data (36) did not show the expected rotation of the signal from to that should accompany such a phase transition. A combination of small atomic displacements with crystallographic twinning (46) has made it challenging to determine the true structure of these low-symmetry states using traditional crystallographic approaches (47, 48). The relationship between the E structural order and the proposed hidden order indicated by the SHG data has also remained elusive to XRD probes.

We performed X-ray scattering measurements over a wide temperature range (30 K K) on a single crystal of Cd2Re2O7, which our measurements show is untwinned, at least in phase II. This may be due to the small volume (400 × 200 × 50 µm³) required for our synchrotron measurements. We first performed scans using an X-ray energy of 87 keV, which contained scattering spanning nearly 15,000 BZs, in order to search for previously undetected peaks and determine the systematic (HKL) dependence of the Bragg peak intensities at each temperature ().
To better understand the order parameter fluctuations, we then reduced the energy to 60 keV to improve the resolution and increased the number of temperatures, particularly near the phase transitions. We comprehensively analyzed the resulting datasets (32) with a combined volume of nearly 8 TB using X-TEC-s and X-TEC-d in a time frame of a few minutes (see , for details on preprocessing and CPU times for X-TEC analysis). We illustrate the sharp characteristics of the order parameter and its fluctuations by focusing on the cubic-forbidden peaks in Figs. 3 and 4 (see , for the clustering results that select cubic-forbidden peaks as the order parameter of phase II). Fig. 3 shows the K = 2 clustering means of X-TEC-s and K = 3 clustering means of X-TEC-d on all the cubic-forbidden peaks in the data over the temperature range of [30 K, 150 K]. Both outcomes presented big surprises. First, the X-TEC-s outcome separated the cubic forbidden peaks that behave like the order parameter of phase II into two subgroups: one that quickly flattens in phase II and abruptly rises in phase III (yellow) and the other that continues to rise in phase II and abruptly drops in phase III (green). Second, X-TEC-d clustering separates out the diffuse regions associated with each of the subgroups of cubic-forbidden peaks to define their own clusters with temperature dependencies that are qualitatively different (red and blue in Fig. 3) and distinct from the temperature dependencies of the peak centers.

Fig. 4. Order parameters and their fluctuations inferred from X-TEC analysis of cubic forbidden Bragg peaks. (A) The filled symbols are the two-cluster mean intensity trajectories of peak averaged data (yellow and green trajectories from Fig. 3), and solid lines are fits to these cluster means based on the model assuming δx displacements (yellow) and δz displacements (green) of cations to vary as , with a common order parameter exponent of as discussed in . (B) Schematic diagram of the relative z axis displacements of cation sublattices for the Cd (orange) and Re (gray) with respect to the cubic phase, inferred from the fit in A. The X-TEC–discovered selection rule and the fit establish the approximately equal magnitude but out-of-phase displacements and . (C) The characteristic temperature dependences of the diffuse clusters are revealed by the z-scored intensities (for each intensity, subtract its mean over T and then divide by its SD in T). The red and blue trajectories correspond to the respective cluster average of the z-scored intensities. Lines are guides for the eyes. (D) The calculated Landau mode intensities as a function of T (). Outside of the critical region near (200 K), the intensity is dominated by the Goldstone mode intensity. Note the resemblance of the calculated intensity to the diffuse trajectory in C. (E) Main panel shows the temperature dependence of the diffuse scattering line-cut profiles near Bragg peak, whose intensities are integrated within the manually selected dashed lines in the plane, shown in Inset. From a visual inspection of their temperature dependence, the shaded gray region is excluded from diffuse scattering. Inset shows the intensity distribution in plane at 100 K around the Bragg peak. The red curves enclose the X-TEC-d determined region for diffuse scattering. The red boundary cleverly avoids the diagonal Bragg streak which is not a part of the diffuse scattering (matching the shaded gray region near the peak in the main panel). (F) The temperature trajectories of diffuse scattering intensities (dotted lines) and their average intensity (solid line) near the peak, from the manually selected regions of diffuse scattering in E. Vertical dashed lines mark (200 K) and (113 K). The trajectories show the same qualitative features as the X-TEC-d red (square symbol) diffuse trajectory in C, with strong scattering at and enhanced intensity above 113 K reflecting the stronger Goldstone fluctuations from z axis displacements shown in B.

The reciprocal space distribution of the clusters reveals precise selection rules and a tight correlation between the order parameter tracked in X-TEC-s and the fluctuations revealed in X-TEC-d. Due to the orders of magnitude differences in intensity scales, X-TEC-s is dominated by the peak centers. X-TEC-d separated out the peak centers from the halos of diffuse regions. Combining the two results, we present the X-TEC-s outcome through the color of the peak centers detected in X-TEC-d. The (HKL) assignments of the two subgroups in X-TEC-s, and their associated diffuse halos in X-TEC-d (Fig. 3), reveal strict selection rules. Yellow peaks (with red halos) are of the form , while green peaks (with blue halos) have or , in the cubic indices of phase I. The mean intensity trajectories of red and blue clusters in Fig. 3 indicate that the red halo sustains intensity throughout phase II to only dive down at K while the blue halo picks up intensity at around to abruptly die out at around 90 K. The temperature evolution of representative line cuts shown in Fig. 3 confirms these observations in the raw data.
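The z-scoring used to compare diffuse trajectories of very different brightness (subtract the mean over T, then divide by the SD in T) can be illustrated with a minimal sketch. The function names and the toy intensities below are hypothetical, and the use of the population SD is an assumption; the text does not say which SD convention is used.

```python
from statistics import mean, pstdev


def zscore_trajectory(intensities):
    """Rescale one intensity-vs-temperature trajectory to zero mean and unit SD."""
    mu = mean(intensities)
    sigma = pstdev(intensities)  # population SD over temperature (assumed convention)
    if sigma == 0:
        # A perfectly flat trajectory carries no temperature dependence
        return [0.0 for _ in intensities]
    return [(x - mu) / sigma for x in intensities]


def cluster_mean(trajectories):
    """Average equal-length trajectories temperature point by temperature point."""
    n = len(trajectories)
    return [sum(column) / n for column in zip(*trajectories)]


# Hypothetical pixels: same temperature dependence, intensities differing by 1000x
strong = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
weak = [1.0, 2.0, 3.0, 4.0, 5.0]

z_strong = zscore_trajectory(strong)
z_weak = zscore_trajectory(weak)
z_mean = cluster_mean([z_strong, z_weak])
```

After z-scoring, the two trajectories coincide, so a cluster average reflects the shape of the temperature dependence and is not dominated by the brightest pixels.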

Discussion

The systematics in the temperature dependencies of different cubic-forbidden peaks and their diffuse halos revealed using the two modes of X-TEC on the entire 8 TB of data present an unprecedented opportunity to extract atomic-scale clues regarding the hidden order. First, we can extract from the X-TEC-s mean trajectories an order parameter critical exponent, associated with the structural transition, that reflects the entire dataset. Fig. 4 shows the temperature dependence of the two peak averaged clusters (yellow and green) of cubic-forbidden peaks and their fits, in which we treat the displacements as order parameters with a common exponent β (). Both clusters fit to the common exponent of close to . This is close to the value expected for a 2D-XY system (49). This is a surprise in that the E signal observed by SHG scales linearly in , which is instead of the expected indicated by theory (38), whereas it is the signal that scales like .

Second, we can convert the selection rule revealed by X-TEC into atomic distortions. The selection rule shows that the two clusters correspond to two distinct classes of structure factor, whose values only depend on the distortions of the Cd and Re sublattices: the yellow cluster consists of peaks that are dominated by z axis displacements , and those in the green cluster are dominated by in-plane displacements, along x or y depending on the Wyckoff position, () (Fig. 4). The flat temperature dependence of the yellow cluster below 180 K results from out-of-phase distortions of the Cd and Re sublattices. The refined values of and are approximately equal and opposite (Fig. 4). This is another surprising result. Previous refinements (50) indicate that the Re displacements are small, and this is consistent with a density functional theory study (42).
Small Re displacements are expected if the 5d electrons in Re play a passive role in the structural transition as the Re are in an almost ideally bonded octahedral environment, compared to Cd which is underbonded because of its two short Cd–O and six long Cd–O bonds. Therefore, a large displacement of Re implies that this is a consequence of the configuration of Re being unstable to spin nematic order that should lead to valence bond ordering (different Re–Re bonds, as illustrated in Fig. 1) in a given Re tetrahedron as proposed in other pyrochlores (51).

Third, the connection between the two diffuse halo clusters (red and blue) and the selection rule for the peak centers draws us to the unusual and distinct temperature dependence of the diffuse regions (Fig. 4). Strong critical scattering at is clear in both clusters, but the diffuse contribution is much stronger in the red halo throughout phase II. The roles of the two halos reverse at . We attribute the fluctuations reflected in the sustained intensity of the red halo to the Goldstone mode manifest through strong z axis fluctuations.

To investigate this further, we turn to a description of the various modes (see , for more details of the calculations). Above , one has a soft mode whose energy should go to zero at . Below this, the soft mode splits into a Higgs mode (fluctuations in the amplitude of the E order) and a Goldstone mode (fluctuations in the phase, that is, fluctuations between and ). The latter would be at zero energy if there were no anisotropy. In Landau theory, the first anisotropy term appears at sixth order and the next one at eighth order in the free energy. These two must be of opposite sign in order to have a second transition at (43). Their difference changes sign at . The net result is that one has a Goldstone mode that starts at zero energy at , rises slightly with lowering T, then dips down again at , and then rises again below this.
This can be appreciated by the intensities associated with the various modes (Fig. 4), noting that the Goldstone mode’s coupling to the X-rays is quadratic in the E order parameter (52) reflecting the fact that it does not exist above (the analog of the soft mode below is the Higgs mode). From the calculated intensities, one sees that the Goldstone mode completely dominates outside of the critical region near . The calculated behavior is remarkably similar to the XRD data (Fig. 4), with a pronounced cusp at . This is a strong indication that the diffuse scattering is indeed due to structural fluctuations associated with the Goldstone mode.

We now benchmark the X-TEC findings of order parameter fluctuations and their coupling to the Bragg peaks against the conventional approach. In the conventional manual approach, one would be forced to select a few Bragg peaks, carefully identify their diffuse regions, and hope that this hand-picked subset of the data is representative. Identifying the diffuse region in this approach requires tracking the temperature dependence of line cuts to separate the diffuse region from the Bragg peak, background scattering, and other streaking artifacts. Fig. 4, Inset, shows that the diffuse region automatically identified by X-TEC is faithful to the conventional definition of the diffuse region. Such a manual approach is laborious at best, as is apparent from Fig. 4, and can potentially miss the selection rules governing different Bragg peaks and their diffuse scattering, which are apparent only from an extensive analysis of both the diffuse scattering and the Bragg peaks. It also limits the statistical precision available from such comprehensive datasets.
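The idea behind the first point of this Discussion, extracting an order parameter exponent β from an intensity trajectory, can be illustrated with a deliberately simplified sketch. It assumes a generic power law in which the displacement varies as (1 − T/T_c)^β and the intensity is quadratic in the displacement, so that a straight-line fit in log-log coordinates has slope 2β. This is not the structure-factor model used for the fits in Fig. 4. The function name, the synthetic data, and the exponent value 0.3 are all hypothetical; only the 200 K transition temperature is taken from the text.

```python
import math


def fit_beta(temps, intensities, t_c):
    """Estimate beta from I ~ (1 - T/t_c)**(2*beta) by least squares in log-log space."""
    xs = [math.log(1.0 - t / t_c) for t in temps]  # requires every T < t_c
    ys = [math.log(i) for i in intensities]        # requires every I > 0
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope / 2.0  # intensity is assumed quadratic in the order parameter


# Synthetic, noise-free trajectory below the 200 K transition (illustrative only)
T_C = 200.0
BETA_TRUE = 0.3  # arbitrary value chosen for the demonstration
temps = [120.0, 140.0, 160.0, 180.0, 190.0]
intensities = [5.0 * (1.0 - t / T_C) ** (2 * BETA_TRUE) for t in temps]

beta_est = fit_beta(temps, intensities, T_C)
```

With real data, the fit would be restricted to temperatures below the transition and outside any region where the background dominates, and a nonlinear fit with T_c as a free parameter may be preferable.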

Summary

In summary, we developed X-TEC, an unsupervised and interpretable ML algorithm for voluminous XRD data that is guided by the fundamental role temperature plays in emergent phenomena. By analyzing the entire dataset over many BZs and making use of temperature evolutions, X-TEC can pick up subtle features representing both order parameters and fluctuations from higher-intensity backgrounds. The two modes, X-TEC-s and X-TEC-d, allow for discovery of systematics in order parameters and their fluctuations despite orders of magnitude differences in intensities. The algorithm is fast, with O(10) minutes of run time for the tasks presented here. Using X-TEC, we discovered that the superconductor family (CaxSr1−x)3Rh4Sn13 exhibits CDW order, and we mapped out its phase diagram. In Cd2Re2O7, we conclusively identified the primary order parameter of the 200 K transition. We further revealed the nature of the IUC atomic distortions in a way that has eluded crystallographic analysis until now. Finally, we revealed XRD evidence of a structural Goldstone mode. The unprecedented degree of microscopic information we have been able to unearth from the XRD is fitting for such comprehensive data but would have been impossible to obtain by manual inspection. Instead of determining critical exponents by fitting a handful of peaks, X-TEC provides a means of including the entire data volume by clustering peak intensities from thousands of BZs, producing an analysis that is both robust and rapid and that can be used in future studies of such phase diagrams. Once X-TEC is integrated into the experimental workflow at the beamline, it can guide the measurements through a real-time analysis of the temperature dependencies. An exciting prospect is to direct the X-TEC extracted data toward automated approaches to the inverse scattering problem to efficiently identify the underlying microscopic models (53). Given the general structure of X-TEC, we anticipate it to be broadly applicable to other fields beyond XRD.

Methods

Installing X-TEC, Codes, and Tutorials.

The X-TEC codes can be installed through the Python Package Index (PyPI) distribution or from the GitHub source https://github.com/KimGroup/XTEC. The GitHub repository provides instructions to install X-TEC as well as three Jupyter notebook tutorials on X-TEC-d, X-TEC-s with label smoothing, and X-TEC-s with peak averaging.

The X-TEC Pipeline.

Further details on the X-TEC machinery are provided in , describing the X-ray data collection, the X-TEC processing for the (CaxSr1−x)3Rh4Sn13 data, and the EM algorithm for GMM. , provides another X-TEC benchmarking example with a CDW material: TiSe2. The details about the Cd2Re2O7 analysis are provided in .
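As a rough illustration of the EM algorithm for a GMM applied to temperature trajectories, the following standard-library sketch clusters a handful of synthetic trajectories into two groups. It is not the X-TEC implementation: the function `gmm_em`, the diagonal covariance, the deterministic initialization, the variance floor, and the toy data are all assumptions made for this example.

```python
import math


def gmm_em(data, k, n_iter=50, var_floor=1e-6):
    """Fit a k-component diagonal-covariance Gaussian mixture by EM; return (labels, means)."""
    n, d = len(data), len(data[0])
    # Deterministic initialization (a choice for this sketch): evenly spaced samples as means
    means = [list(data[(i * (n - 1)) // max(k - 1, 1)]) for i in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each trajectory
        resp = []
        for x in data:
            logp = []
            for j in range(k):
                ll = math.log(weights[j])
                for xi, m, v in zip(x, means[j], variances[j]):
                    ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                logp.append(ll)
            top = max(logp)  # subtract the maximum before exponentiating, for stability
            probs = [math.exp(value - top) for value in logp]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update mixing weights, means, and per-temperature variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against an empty component
            weights[j] = nj / n
            for c in range(d):
                means[j][c] = sum(r[j] * x[c] for r, x in zip(resp, data)) / nj
                spread = sum(r[j] * (x[c] - means[j][c]) ** 2 for r, x in zip(resp, data)) / nj
                variances[j][c] = max(spread, var_floor)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means


# Toy trajectories at five temperatures: order-parameter-like (falling) versus flat
ordering = [[1.0 + 0.01 * i, 0.8, 0.6, 0.4, 0.2] for i in range(4)]
flat = [[0.5 + 0.01 * i, 0.5, 0.5, 0.5, 0.5] for i in range(4)]
labels, means = gmm_em(ordering + flat, k=2)
```

Each fitted mean is itself a temperature trajectory, which is what makes the clustering interpretable: the cluster means can be plotted against T and read as candidate order parameters or fluctuation signals.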
