Kunal Ghosh, Annika Stuke, Milica Todorović, Peter Bjørn Jørgensen, Mikkel N Schmidt, Aki Vehtari, Patrick Rinke.
Abstract
Deep learning methods for the prediction of molecular excitation spectra are presented. For the example of the electronic density of states of 132k organic molecules, three different neural network architectures are trained and assessed: multilayer perceptron (MLP), convolutional neural network (CNN), and deep tensor neural network (DTNN). The inputs for the neural networks are the coordinates and charges of the constituent atoms of each molecule. Even the MLP is able to learn spectra, but its root mean square error (RMSE) is still as high as 0.3 eV. The learning quality improves significantly for the CNN (RMSE = 0.23 eV) and reaches its best performance for the DTNN (RMSE = 0.19 eV). Both CNN and DTNN capture even small nuances in the spectral shape. In a showcase application of this method, the structures of 10k previously unseen organic molecules are scanned and instant spectra predictions are obtained to identify molecules for potential applications.
Keywords: DFT calculations; artificial intelligence; excitation spectra; neural networks; organic molecules
Year: 2019 PMID: 31065514 PMCID: PMC6498126 DOI: 10.1002/advs.201801367
Source DB: PubMed Journal: Adv Sci (Weinh) ISSN: 2198-3844 Impact factor: 16.806
Figure 1. a) Atomic structure of the N‐methyl‐N‐(2,2,2‐trifluoroethyl)formamide molecule and b) its corresponding Coulomb matrix representation.
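The record does not spell out how the Coulomb matrix in Figure 1 is built. As a rough sketch, assuming the commonly used definition (diagonal entries 0.5·Z_i^2.4, off-diagonal entries Z_i·Z_j/|R_i − R_j|, in atomic units), it could be computed from atomic charges and coordinates as follows. The function name `coulomb_matrix` and the H2 toy input are illustrative assumptions, and any padding or row sorting used in the paper is not shown.

```python
import math


def coulomb_matrix(charges, coords):
    """Return the Coulomb matrix as a list of rows.

    Assumed definition (not stated in this record):
      C[i][i] = 0.5 * Z_i ** 2.4
      C[i][j] = Z_i * Z_j / |R_i - R_j|   for i != j

    charges: nuclear charges Z_i
    coords:  one (x, y, z) tuple per atom, in bohr
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted self-interaction term of atom i
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical toy input: H2 with a bond length of 1.4 bohr
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

The resulting matrix is symmetric, and its size grows with the number of atoms, which is why fixed-size network inputs usually require padding smaller molecules with zeros.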
Figure 2. Canonical illustration of the three neural network types: a) the multilayer perceptron (MLP); b) the convolutional neural network (CNN); and c) the deep tensor neural network (DTNN). Green circles to the left represent the molecular input and yellow circles to the right the output (here 16 excitation energies or the molecular excitation spectrum). The gray blocks are schematics for fully connected hidden layers, convolutional blocks, pooling layers, and state vectors. Nodes corresponding to atom types in the DTNN are represented as blue squares and the distance matrix between different atoms as pink squares. Parameter tensors (red squares) project the vectors encoding atom types and the interatomic distance matrix into a vector with the same dimensions as the atom type encodings. The DTNN is evaluated iteratively, building up more complex interactions between atoms with each iteration.
Figure 3. Root mean square error (RMSE) and squared correlation (R²) for the sixteen molecular excitations for the different neural network architectures and data sets. The states are labeled in descending order from the highest occupied molecular orbital (state number 0).
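For readers who want to reproduce the two metrics named in Figure 3, the following is a minimal sketch that takes R² to mean the squared Pearson correlation between predicted and reference energies. The function names and the toy energy values are illustrative assumptions, not data from the paper.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def squared_correlation(predicted, reference):
    """Squared Pearson correlation coefficient (assumed meaning of R²)."""
    n = len(reference)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum((p - mean_p) * (r - mean_r)
                     for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return covariance ** 2 / (var_p * var_r)


# Hypothetical energies in eV for one state across four molecules
reference_ev = [-9.0, -10.0, -11.0, -12.0]
predicted_ev = [-9.1, -10.2, -10.9, -12.2]

state_rmse = rmse(predicted_ev, reference_ev)
state_r2 = squared_correlation(predicted_ev, reference_ev)
```

Evaluating both functions once per state (0 to 15) would give per-state curves of the kind the caption describes.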
Summary of the RMSE for the 16 excitations and the RSE for spectra for the 6k and the 132k datasets. The results are averages over 5 runs, except for the spectra predictions of the 132k dataset, which were averaged over 3 runs. The resulting statistical error is at most ±0.003 and has therefore been omitted from this table.
| Model | 6k: Levels [eV] | 6k: Spectra | 132k: Levels [eV] | 132k: Spectra |
|---|---|---|---|---|
| MLP | 0.317 | NA | NA | NA |
| CNN | 0.304 | 0.057 | 0.231 | 0.039 |
| DTNN | 0.251 | 0.051 | 0.186 | 0.029 |
Figure 4. Comparison of CNN and DTNN spectra predictions: the first column depicts RSE histograms for 13 000 test molecules from the 132k dataset. The following three columns show the spectra of the best, an average, and one of the worst predictions compared to the corresponding reference spectrum. The colored circles mark the histogram positions of the selected molecules.
Figure 5. Spectral scan of the 10k diastereoisomer dataset performed with the DTNN: a) histogram of molecules that have spectral intensity at a certain energy. The four molecules in the inset are outliers that give rise to the peak with lowest energy. b) The six molecules that have the highest ionization energy. c) Average spectrum of all molecules in the dataset (red line). The gray lines mark the averages of the ±1 confidence level of the DTNN predictions.