Luis Cerdán, Daniel Roca-Sanjuán.
Abstract
The theoretical prediction of molecular electronic spectra by means of quantum mechanical (QM) computations is fundamental to gaining deep insight into many photophysical and photochemical processes. A computational strategy that is attracting significant attention is the so-called Nuclear Ensemble Approach (NEA), which relies on generating a representative ensemble of nuclear geometries around the equilibrium structure, computing the vertical excitation energies (ΔE) and oscillator strengths (f), and phenomenologically broadening each transition with a line-shape function of empirical full width δ. Frequently, δ is chosen by visually seeking a trade-off between artificial vibronic features (small δ) and over-smoothing of electronic signatures (large δ). This approach is unsatisfactory: it relies on subjective perception and may lead to spectral inaccuracies, especially when the number of sampled configurations is limited by an excessive computational burden (high-level QM methods, complex systems, solvent effects, etc.). In this work, we have developed and tested a new approach to reconstruct NEA spectra, dubbed GMM-NEA, based on Gaussian Mixture Models (GMMs), a probabilistic machine learning algorithm, which circumvents the phenomenological broadening assumption and, in turn, the use of δ altogether. We show that GMM-NEA systematically outperforms other data-driven models for automatically selecting δ, especially for small datasets. In addition, we report the use of an algorithm to detect anomalous QM computations (outliers) that can affect the overall shape and uncertainty of the NEA spectra. Finally, we apply GMM-NEA to predict the photolysis rate for HgBrOOH, a compound involved in Earth's atmospheric chemistry.
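The conventional broadening step described in the abstract can be illustrated with a minimal sketch. It assumes a unit-area Gaussian line shape, omits the physical prefactors of the cross-section (so the result is in arbitrary units), and uses a synthetic ensemble; the function names and all numerical values are hypothetical and are not taken from the paper.

```python
import math
import random


def gaussian_line(energy, center, fwhm):
    """Unit-area Gaussian line shape with full width at half maximum `fwhm`."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    z = (energy - center) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def broadened_spectrum(grid, delta_e, f_osc, fwhm):
    """Ensemble average of f-weighted line shapes (arbitrary units)."""
    n = len(delta_e)
    return [
        sum(f * gaussian_line(e, de, fwhm) for de, f in zip(delta_e, f_osc)) / n
        for e in grid
    ]


# Synthetic ensemble (assumption): 250 geometries, one transition near 5 eV.
rng = random.Random(0)
delta_e = [rng.gauss(5.0, 0.15) for _ in range(250)]      # vertical energies, eV
f_osc = [abs(rng.gauss(0.10, 0.02)) for _ in range(250)]  # oscillator strengths
grid = [3.5 + 0.005 * i for i in range(601)]              # 3.5 to 6.5 eV

# Small delta gives a spiky curve; large delta gives an over-smoothed one.
narrow = broadened_spectrum(grid, delta_e, f_osc, fwhm=0.01)
wide = broadened_spectrum(grid, delta_e, f_osc, fwhm=0.30)
```

Both curves integrate to the same area (the mean oscillator strength in this normalization). Only their shape depends on δ, which is why the choice of δ is a purely phenomenological one.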
Year: 2022 PMID: 35481363 PMCID: PMC9097286 DOI: 10.1021/acs.jctc.2c00004
Source DB: PubMed Journal: J Chem Theory Comput ISSN: 1549-9618 Impact factor: 6.578
Figure 1. Sample of 500 observations (points) drawn from an unknown distribution and the estimated joint PDF (shaded contours) assuming K = 2 components for the GMM. The diamonds mark the location of the mixture means.
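For reference, the joint PDF of a K-component GMM has the standard form below. The notation (weights π_k, means μ_k, covariances Σ_k) is generic and is an assumption of this note rather than the paper's own symbols.

$$
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
$$

With K = 2, the two means μ_1 and μ_2 are the points marked by diamonds, and each covariance Σ_k sets the size, shape, and orientation of the corresponding contour ellipses.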
Figure 2. Electronic absorption cross-section spectrum of benzene reconstructed from 250 geometries using (a) GMM-NEA and (b) auto-δ. The shaded areas represent the reconstruction of 95% CIs. The target spectrum (black lines) is included for comparison purposes.
Optimal Model Parameters for Each of the Bands/Transitions Used to Reconstruct the Spectra in Figure
| Band/transition | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| δ (a) | 0.035 | 0.039 | 0.025 | 0.025 | 0.026 | 0.026 | 0.023 | 0.024 | 0.025 | 0.021 |
| δ (b) | 0.094 | 0.118 | 0.074 | 0.066 | 0.069 | 0.072 | 0.058 | 0.075 | 0.071 | 0.064 |
| K, model (c) | 6, VVV | 2, EVE | 3, EEE | 2, VVI | 3, VVI | 2, VVI | 3, VVE | 3, VVI | 3, VVE | 3, VVE |

(a) Empirical bandwidths for the target spectrum.
(b) Empirical bandwidths for the auto-δ spectrum.
(c) Number of mixtures (K) and GMM models for the GMM-NEA spectrum. VVV: ellipsoidal, varying volume, shape, and orientation; EVE: ellipsoidal, equal volume, and orientation; EEE: ellipsoidal, equal volume, shape, and orientation; VVI: diagonal, varying volume, and shape; VVE: ellipsoidal and equal orientation. For a visualization of these model constraints, see Table 3 and Figure 2 in the mclust documentation.[59]
Figure 3. Electronic absorption cross-section spectrum for each of the transitions in benzene reconstructed from 250 geometries using GMM-NEA (red lines) and auto-δ (green lines). The shaded areas represent the reconstruction of 95% CIs. The target spectrum (black lines) is included for comparison purposes.
Figure 4. Dependence of (a) bRIC and (b) RIC on the number of geometries used for reconstructing the electronic absorption spectra of benzene using GMM-NEA (red points) and auto-δ (green crosses). The RIC values reported for the spectra reconstructed using the KREG model[30] (black stars) have been included in (b) for comparison purposes. The markers and error bars indicate the average and standard deviation over 25 independent random draws. The same y-scale has been used in both panels for the sake of better comparison.
Figure 5. Evolution of bRICseq with the number of geometries used for reconstructing the electronic absorption spectra of benzene using (a) GMM-NEA and (b) auto-δ. Each line represents an independent experiment. The markers indicate the average over these experiments. The horizontal dotted lines mark the location of bRICseq = (0.1, 0.05, 0.025).
Figure 6. Electronic absorption cross-section spectrum of the U6OH radical reconstructed from 100 geometries using GMM-NEA (red lines) and auto-δ (green lines) in the presence (a) and absence (b) of outliers. The shaded areas represent the reconstruction of 95% CIs.
Figure 7. (a) Electronic absorption cross-section spectrum of HgBrOOH reconstructed from 200 geometries using a single empirical bandwidth for all transitions (δ = 0.05 eV). (b) Same as (a) but using GMM-NEA (red lines) and auto-δ (green lines). The shaded areas in (a,b) represent the reconstruction of 95% CIs. The inset in (b) details the contribution of three spectra in the region of maximum solar radiation. (c) Evolution of the photolysis rate J with the number of geometries used for reconstructing the electronic absorption spectra of HgBrOOH using GMM-NEA. Each line represents an independent experiment. The markers indicate the average over these experiments.
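A photolysis rate such as J in panel (c) is conventionally obtained by integrating the product of the absorption cross-section, the photolysis quantum yield, and the actinic flux over wavelength. The sketch below shows that integral with a trapezoidal rule. It assumes a wavelength-independent quantum yield of 1 and flat toy inputs; the function name, units, and numbers are hypothetical illustrations, not data or code from the paper.

```python
def photolysis_rate(wavelength_nm, cross_section_cm2, actinic_flux, quantum_yield=1.0):
    """Trapezoidal estimate of J = integral of sigma * phi * F over wavelength.

    Assumed units: cross-section in cm^2 per molecule, actinic flux in
    photons cm^-2 s^-1 nm^-1, wavelength in nm, so that J comes out in s^-1.
    """
    integrand = [
        sigma * quantum_yield * flux
        for sigma, flux in zip(cross_section_cm2, actinic_flux)
    ]
    total = 0.0
    for i in range(1, len(wavelength_nm)):
        width = wavelength_nm[i] - wavelength_nm[i - 1]
        total += 0.5 * (integrand[i] + integrand[i - 1]) * width
    return total


# Toy inputs (assumption): flat cross-section and flat flux from 300 to 400 nm.
wavelengths = [300.0 + i for i in range(101)]
sigma = [1.0e-18] * len(wavelengths)
flux = [1.0e14] * len(wavelengths)

j_rate = photolysis_rate(wavelengths, sigma, flux)
```

Recomputing `j_rate` from spectra reconstructed with increasing numbers of geometries would give a convergence curve of the kind shown in panel (c).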