Tobias Morawietz, Nongnuch Artrith.
Abstract
Atomistic simulations have become an invaluable tool for industrial applications ranging from the optimization of protein-ligand interactions for drug discovery to the design of new materials for energy applications. Here we review recent advances in the use of machine learning (ML) methods for accelerated simulations based on a quantum mechanical (QM) description of the system. We show how recent progress in ML methods has dramatically extended the applicability range of conventional QM-based simulations, making it possible to calculate industrially relevant properties with enhanced accuracy, at reduced computational cost, and for length and time scales that would otherwise not be accessible. We illustrate the benefits of ML-accelerated atomistic simulations for industrial R&D processes by showcasing relevant applications from two very different areas, drug discovery (pharmaceuticals) and energy materials. Writing from the perspective of both a molecular and a materials modeling scientist, this review aims to provide a unified picture of the impact of ML-accelerated atomistic simulations on the pharmaceutical, chemical, and materials industries and gives an outlook on the exciting opportunities that could emerge in the future.
Keywords: Drug discovery; Energy materials; Industrial applications; Machine learning; Neural networks; Quantum mechanics
Year: 2020 PMID: 33034008 PMCID: PMC8018928 DOI: 10.1007/s10822-020-00346-6
Source DB: PubMed Journal: J Comput Aided Mol Des ISSN: 0920-654X Impact factor: 3.686
Fig. 1 Atomistic simulation methods can be broadly categorized into two classes depending on how the system is described: quantum mechanical (QM) calculations based on the electronic structure, or molecular mechanics (MM) methods with predefined functional forms. Due to their higher computational cost, QM-based simulations are limited to smaller systems, while MM-based methods are more efficient but rely on many approximations and are often parametrized using experimental input. The goal of QM-based machine learning is to raise the efficiency of QM methods without sacrificing their transferability, predictive power, and ability to describe complex bonding patterns, including the breaking and forming of chemical bonds.
Fig. 2 Workflow for machine learning-accelerated atomistic simulations: first, reference calculations are performed for a set of configurations using a quantum mechanical (QM) method such as density-functional theory (DFT). The resulting QM energies (and potentially forces) are then used to train a machine learning model that maps the atomic structure to its corresponding energy and thereby learns the potential-energy surface (PES) of the atomistic system. Once trained, the ML model yields a continuous representation of the PES that can be evaluated efficiently, making it possible to perform molecular dynamics (MD) or Monte Carlo (MC) simulations for larger systems and on longer time scales than are feasible with direct QM-based simulations.
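The three steps of the Fig. 2 workflow (reference calculations, model training, simulation with the trained model) can be illustrated in a deliberately minimal one-dimensional form. This is a sketch under stated assumptions, not the method of any reviewed work: a Morse curve with hypothetical parameters stands in for the QM reference, a least-squares polynomial (`PolynomialSurrogate`, a hypothetical name) stands in for the ML model, and the trained model drives velocity-Verlet MD of a single bond coordinate. Units are eV, Å, and amu, so time is measured in Å·sqrt(amu/eV), about 10.18 fs.

```python
import math

def qm_reference(r):
    """Stand-in for an expensive QM calculation: a Morse curve in eV (hypothetical parameters)."""
    d, a, r0 = 4.7, 1.9, 0.74
    return d * (1.0 - math.exp(-a * (r - r0))) ** 2

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]
    b = list(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    x = [0.0] * n
    for row in reversed(range(n)):
        x[row] = (b[row] - sum(a[row][k] * x[k] for k in range(row + 1, n))) / a[row][row]
    return x

class PolynomialSurrogate:
    """Toy stand-in for an ML model: least-squares polynomial in r scaled to [-1, 1]."""

    def __init__(self, rs, energies, degree=6):
        self.lo, self.hi = min(rs), max(rs)
        ts = [self._scale(r) for r in rs]
        n = degree + 1
        # Normal equations of the least-squares problem
        gram = [[sum(t ** (j + k) for t in ts) for k in range(n)] for j in range(n)]
        rhs = [sum(e * t ** j for t, e in zip(ts, energies)) for j in range(n)]
        self.coeffs = solve(gram, rhs)

    def _scale(self, r):
        return (2.0 * r - self.lo - self.hi) / (self.hi - self.lo)

    def energy(self, r):
        t = self._scale(r)
        return sum(c * t ** j for j, c in enumerate(self.coeffs))

    def force(self, r):
        # F = -dE/dr = -(dE/dt) * (dt/dr), where dt/dr = 2 / (hi - lo)
        t = self._scale(r)
        dedt = sum(j * c * t ** (j - 1) for j, c in enumerate(self.coeffs) if j > 0)
        return -dedt * 2.0 / (self.hi - self.lo)

def velocity_verlet(model, r, v, mass, dt, steps):
    """Propagate one coordinate on the model PES; return the total energy after each step."""
    f = model.force(r)
    totals = []
    for _ in range(steps):
        v += 0.5 * dt * f / mass
        r += dt * v
        f = model.force(r)
        v += 0.5 * dt * f / mass
        totals.append(model.energy(r) + 0.5 * mass * v * v)
    return totals

# 1) Reference "QM" energies on a grid of bond lengths (Angstrom)
train_r = [0.59 + 0.45 * i / 29 for i in range(30)]
train_e = [qm_reference(r) for r in train_r]
# 2) Train the surrogate on the reference energies
model = PolynomialSurrogate(train_r, train_e)
# 3) Run MD with the cheap surrogate instead of the reference method
traj_e = velocity_verlet(model, r=0.84, v=0.0, mass=0.5, dt=0.005, steps=2000)
```

Because the forces are the analytic derivative of the model energy, the total energies in `traj_e` should stay nearly constant; a large drift would indicate a time step that is too long for the fitted surface.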
Fig. 3 Diagram of the high-dimensional neural network that combines the atomic ANNs of all atoms in an N-atom structure. The output is the total energy $E = \sum_{i=1}^{N} E_i$, the sum of the individual atomic energy contributions $E_i$, which are in turn the outputs of atomic feed-forward ANNs.
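The energy decomposition of Fig. 3, $E = \sum_i E_i$, can be sketched in a few lines. Assumptions: each atomic network has a single tanh hidden layer (the reviewed potentials typically use deeper networks), the descriptor vectors are already computed, and all weights are hypothetical placeholders rather than trained values. One network per element is stored in a dictionary, in line with the per-species networks of Fig. 5.

```python
import math

def atomic_nn(descriptor, weights):
    """Feed-forward ANN with one tanh hidden layer: descriptor vector -> atomic energy E_i."""
    w_hidden, b_hidden, w_out, b_out = weights
    hidden = [math.tanh(sum(w * g for w, g in zip(row, descriptor)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def total_energy(descriptors, species, networks):
    """E = sum_i E_i, evaluating each atom with the network of its chemical element."""
    return sum(atomic_nn(g, networks[s]) for g, s in zip(descriptors, species))

# Hypothetical (untrained) weights: one network per element
networks = {
    "H": ([[0.5, -0.3], [0.1, 0.8]], [0.0, 0.1], [1.2, -0.7], -0.5),
    "O": ([[0.2, 0.4], [-0.6, 0.3]], [0.05, -0.1], [0.9, 0.4], -2.0),
}
# Hypothetical two-component descriptors for an O-H-H structure
descriptors = [[0.3, 1.1], [0.7, 0.2], [0.5, 0.9]]
species = ["O", "H", "H"]
e_total = total_energy(descriptors, species, networks)
```

Because the total is a plain sum over atoms, reordering the atoms leaves $E$ unchanged and duplicating the system doubles it.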
Fig. 4 Diagram of the high-dimensional neural network potential for multicomponent systems: the total energy of the system is obtained as the sum of a short-range energy ($E_{\text{short}}$), obtained as shown in Fig. 3, and a long-range electrostatic energy ($E_{\text{elec}}$), which is calculated from atomic charges $Q_i$. Both the short-range atomic energies and the atomic charges depend on the local atomic environments and are constructed by atomic ANNs [96].
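The decomposition of Fig. 4 can be written as $E = E_{\text{short}} + E_{\text{elec}}$ with $E_{\text{elec}} = \sum_{i<j} Q_i Q_j / (4\pi\varepsilon_0 r_{ij})$. The sketch below evaluates this for a finite cluster, assuming the short-range atomic energies and the charges $Q_i$ have already been predicted by the atomic ANNs (the values are hypothetical). It uses a bare Coulomb sum; periodic systems require a lattice summation such as the Ewald method, and practical schemes may also screen the interaction at short distances, neither of which is included here.

```python
import math

COULOMB_EV_ANGSTROM = 14.399645  # e^2 / (4 pi eps0) expressed in eV * Angstrom

def electrostatic_energy(positions, charges):
    """E_elec = sum_{i<j} Q_i Q_j / r_ij for isolated point charges (no periodicity)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += COULOMB_EV_ANGSTROM * charges[i] * charges[j] / r
    return energy

def total_energy(short_range_atomic_energies, positions, charges):
    """E = E_short + E_elec, with E_short the sum of short-range atomic ANN energies."""
    return sum(short_range_atomic_energies) + electrostatic_energy(positions, charges)

# Hypothetical ANN outputs for a two-atom system: atomic energies (eV) and charges (e)
e = total_energy([0.1, 0.2], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, -1.0])
```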
Fig. 5 Schematic of the radial and angular descriptors used to represent local atomic environments (left). The descriptor functions extract features that serve as input values for the atomic energy ANNs. A separate ANN is trained for each atomic species (chemical element), so that the total energy of a binary material consists of two terms (right).
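The radial and angular descriptors of Fig. 5 can be illustrated with the widely used Behler-Parrinello symmetry functions: a radial term $G^2_i = \sum_{j \ne i} e^{-\eta (r_{ij} - R_s)^2} f_c(r_{ij})$ and an angular term built from $(1 + \lambda\cos\theta_{ijk})^\zeta$, both damped by a cosine cutoff $f_c$. Whether the reviewed potentials use exactly these forms is an assumption here; the angular sum runs over unique neighbor pairs $j < k$ (some implementations count each pair twice), element-resolved variants are omitted, and the geometry and parameters are hypothetical.

```python
import math

def cutoff(r, rc):
    """Cosine cutoff function: decays smoothly to zero at r = rc."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0

def radial_g2(i, positions, eta, rs, rc):
    """Radial descriptor of atom i: Gaussians of neighbor distances, damped by the cutoff."""
    total = 0.0
    for j, pj in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pj)
        total += math.exp(-eta * (r - rs) ** 2) * cutoff(r, rc)
    return total

def angular_g4(i, positions, eta, zeta, lam, rc):
    """Angular descriptor of atom i, summed over unique neighbor pairs j < k."""
    total = 0.0
    n = len(positions)
    for j in range(n):
        for k in range(j + 1, n):
            if i in (j, k):
                continue
            rij = math.dist(positions[i], positions[j])
            rik = math.dist(positions[i], positions[k])
            rjk = math.dist(positions[j], positions[k])
            vij = [b - a for a, b in zip(positions[i], positions[j])]
            vik = [b - a for a, b in zip(positions[i], positions[k])]
            cos_t = sum(a * b for a, b in zip(vij, vik)) / (rij * rik)
            total += ((1.0 + lam * cos_t) ** zeta
                      * math.exp(-eta * (rij ** 2 + rik ** 2 + rjk ** 2))
                      * cutoff(rij, rc) * cutoff(rik, rc) * cutoff(rjk, rc))
    return 2.0 ** (1.0 - zeta) * total

# Hypothetical water-like geometry (Angstrom): O, H, H
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
g_o = [radial_g2(0, water, eta=1.0, rs=0.0, rc=6.0),
       angular_g4(0, water, eta=0.1, zeta=2, lam=-1.0, rc=6.0)]
```

The descriptors depend only on distances and angles, so they are unchanged by rotating or translating the structure, which is what allows them to serve as ANN inputs.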
Fig. 6 The Chebyshev descriptor (implemented in ænet [127]) enables the simulation of multicomponent compositions with many different chemical species. (a) Basis functions of Eqs. (9) and (10) (Chebyshev polynomials) up to order 5 for a cutoff radius of 8.0 Å; the polynomial of order 0 is the constant 1 and is not shown. (b) and (c) show the accuracy of artificial neural network (ANN) potentials, in terms of the root-mean-squared error (RMSE) relative to the QM reference method (DFT), as a function of the size of the structural fingerprint (descriptor) for (b) an inorganic solid with an increasing number of chemical species (from the set Li, O, Ti, Ni, Mn, Sc, V, Cr, Fe, Co, and Cu) and (c) a data set with conformations of the 20 proteinogenic amino acids (5 chemical species: H, C, N, O, S; green diamonds) and their complexes with divalent cations (amino acid data taken from Ref. [131]). (Reproduced with permission from Ref. [132]. Copyright (2017), American Physical Society.)
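The idea behind Fig. 6, expanding an atom's radial neighbor distribution in Chebyshev polynomials $T_\alpha(2r/R_c - 1)$ and adding a second, species-weighted expansion so that the descriptor length does not grow with the number of elements, can be sketched as follows. This is a simplified illustration, not the ænet implementation: the cosine cutoff, the normalization of the coefficients, the omission of the angular part, and the element weights are all assumptions.

```python
import math

def chebyshev(order, x):
    """Chebyshev polynomials T_0..T_order at x in [-1, 1], via T_{n+1} = 2x T_n - T_{n-1}."""
    t = [1.0, x]
    for n in range(1, order):
        t.append(2.0 * x * t[n] - t[n - 1])
    return t[: order + 1]

def cutoff(r, rc):
    """Cosine cutoff function: decays smoothly to zero at r = rc."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0

def chebyshev_radial_descriptor(distances, species, weights, order=5, rc=8.0):
    """Structure coefficients plus element-weighted composition coefficients for one atom."""
    structure = [0.0] * (order + 1)
    composition = [0.0] * (order + 1)
    for r, s in zip(distances, species):
        if r >= rc:
            continue
        fc = cutoff(r, rc)
        # Map r in [0, rc] onto the Chebyshev domain [-1, 1]
        for a, t in enumerate(chebyshev(order, 2.0 * r / rc - 1.0)):
            structure[a] += t * fc
            composition[a] += weights[s] * t * fc
    return structure + composition

# Hypothetical neighbors of one atom: distances (Angstrom) and elements
weights = {"Li": 1.0, "O": -1.0}
desc = chebyshev_radial_descriptor([2.0, 2.5, 9.0], ["Li", "O", "O"], weights)
```

Adding elements changes only the weight table; the length of `desc` stays at 2 × (order + 1) = 12.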
Fig. 7 Illustration of the systematic construction of ML potentials through refinement of the reference data set in an active learning setup. The error, i.e., the difference between the reference DFT and ANN energies, for structures obtained in MD simulations decreases with each iteration, from Fit 1 to Fit 3, as the sampling of the configurational space improves. (Adapted with permission from Ref. [102])
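The loop of Fig. 7 (fit, sample new structures, compare against the reference, add poorly described structures, refit) can be sketched in one dimension. All ingredients are stand-ins chosen for brevity: a sine curve plays the DFT reference, piecewise-linear interpolation plays the ANN, and a fixed grid of candidates plays the structures sampled by MD. For simplicity the sketch evaluates the reference on every candidate, which a real workflow would avoid, for example by selecting structures according to the disagreement of several models.

```python
import bisect
import math

def reference_energy(x):
    """Stand-in for an expensive DFT calculation on a hypothetical 1D energy surface."""
    return math.sin(3.0 * x)

def fit(train):
    """'Train' a toy model: piecewise-linear interpolation through the reference data."""
    xs = sorted(train)
    ys = [train[x] for x in xs]

    def model(x):
        k = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
        x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return model

def active_learning(pool, initial, rounds=5, per_round=3):
    """Refit, locate the worst-described candidates, add their reference data, repeat."""
    train = {x: reference_energy(x) for x in initial}
    history = []
    for it in range(rounds + 1):
        model = fit(train)
        errors = {x: abs(model(x) - reference_energy(x)) for x in pool}
        history.append(max(errors.values()))
        if it == rounds:
            break
        worst = [x for x in sorted(pool, key=errors.get, reverse=True) if x not in train]
        for x in worst[:per_round]:
            train[x] = reference_energy(x)
    return history, train

# Candidate "configurations" (a grid standing in for MD snapshots) and a sparse initial set
pool = [i * 3.0 / 60 for i in range(61)]
history, train = active_learning(pool, initial=[0.0, 1.5, 3.0])
```

`history` holds the maximum error over the candidate pool after each refit, the analogue of the shrinking error from Fit 1 to Fit 3.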
Fig. 8 The computational complexity of ANN potentials scales linearly with the number of atoms. The plot shows the evaluation time per atom as a function of the number of atoms for periodic structures of increasing size, up to one million atoms. (Reproduced with permission from Ref. [127])
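The linear scaling in Fig. 8 follows from locality: each atomic energy depends only on neighbors within the cutoff radius, so the cost per atom is constant once the neighbors are known. A standard way to find neighbors in O(N) time at fixed density is a cell list, sketched below for a non-periodic cluster. Periodic boundary conditions, needed for the structures in Fig. 8, are omitted, and nothing here reflects how ænet implements its neighbor search.

```python
import itertools
import math
import random
from collections import defaultdict

def neighbor_pairs_cell_list(positions, rc):
    """All pairs (i, j) with i < j closer than rc, found with a cell list."""
    # Bin atoms into cubic cells of edge rc, so all neighbors lie in adjacent cells
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple(math.floor(c / rc) for c in p)].append(idx)
    pairs = set()
    for cell, members in cells.items():
        # Only the 27 surrounding cells (including this one) need to be searched
        for offset in itertools.product((-1, 0, 1), repeat=3):
            neighbor_cell = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(neighbor_cell, ()):
                    if i < j and math.dist(positions[i], positions[j]) < rc:
                        pairs.add((i, j))
    return pairs

def neighbor_pairs_brute_force(positions, rc):
    """O(N^2) reference used to check the cell list."""
    n = len(positions)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(positions[i], positions[j]) < rc}

# Hypothetical random cluster: 500 atoms in a 20 Angstrom cube
rng = random.Random(0)
positions = [tuple(rng.uniform(0.0, 20.0) for _ in range(3)) for _ in range(500)]
pairs = neighbor_pairs_cell_list(positions, rc=3.0)
```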
Table 1 Examples of properties calculated from machine learning (ML) potential simulations or using ML models based on quantum mechanical reference data, compared to reference values where available
| Property | System | ML Prediction | Reference value | Year | Refs. |
|---|---|---|---|---|---|
| Drug discovery | | | | | |
| Reaction free energy | Glycine proton transfer | 7.7 kcal/mol | DFT: 8.1 kcal/mol | 2018 | [ |
| Reaction barrier | Glycine proton transfer | 9.9 kcal/mol | DFT: 10.2 kcal/mol | 2018 | [ |
| Solvation free energy | Acetic acid | | DFT: | 2019 | [ |
| | Acetamide | | DFT: | 2019 | [ |
| | Acetone | | DFT: | 2019 | [ |
| | Benzene | | DFT: | 2019 | [ |
| | Ethanol | | DFT: | 2019 | [ |
| | Methylamine | | DFT: | 2019 | [ |
| | Aqueous LiF pair | | Exp. [ | 2020 | [ |
| Li-ion batteries | | | | | |
| Li diffusivity | | | Exp. [ | 2019 | [ |
| Activation energy | | 0.5–0.8 eV | N/A | 2019 | [ |
| | | 1.21–1.46 eV | Exp. [ | 2020 | [ |
| | Amorphous- | 0.55 eV | Exp. [ | 2017 | [ |
| | | 0.16 eV | Exp. [ | 2020 | [ |
| | | 0.2–0.22 eV | Exp. [ | 2020 | [ |
| | | 0.56 ± 0.05 eV | N/A | 2020 | [ |
| | | 0.62 ± 0.04 eV | Exp. [ | 2020 | [ |
| | | 0.79 ± 0.10 eV | Exp. [ | 2020 | [ |
| | LiCl | 1.11 ± 0.13 eV | Exp. [ | 2020 | [ |
x is the relative lithium content in the amorphous Li–Si alloys and varies during battery charge and discharge.
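The Li diffusivities in the table above are extracted from MD trajectories; a common route is the Einstein relation, $\mathrm{MSD}(t) \approx 6Dt$ in three dimensions. The sketch below assumes unwrapped coordinates, uses only $t = 0$ as the time origin (production analyses average over many origins and fit only the linear regime), and fits the MSD with a line through the origin. It is checked on a synthetic lattice random walk with hypothetical step length $a$ and time step $\Delta t$, for which the expected value is $D = a^2/(2\Delta t)$.

```python
import random

def mean_squared_displacement(trajectories):
    """MSD(t) relative to each particle's starting point, averaged over particles."""
    n_frames = len(trajectories[0])
    msd = []
    for t in range(n_frames):
        total = 0.0
        for traj in trajectories:
            total += sum((x - x0) ** 2 for x, x0 in zip(traj[t], traj[0]))
        msd.append(total / len(trajectories))
    return msd

def einstein_diffusivity(msd, dt):
    """D from MSD(t) = 6 D t in 3D, via a least-squares line through the origin."""
    times = [i * dt for i in range(len(msd))]
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / 6.0

# Synthetic check: 3D lattice random walk with step a per axis and time step dt
rng = random.Random(1)
a, dt = 0.1, 1.0
trajectories = []
for _ in range(1000):
    pos = (0.0, 0.0, 0.0)
    traj = [pos]
    for _ in range(200):
        pos = tuple(x + rng.choice((-a, a)) for x in pos)
        traj.append(pos)
    trajectories.append(traj)
d_est = einstein_diffusivity(mean_squared_displacement(trajectories), dt)
# Expected value: D = 3 a^2 / (6 dt) = a^2 / (2 dt) = 0.005
```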
Fig. 9 Machine learning simulations for free energy calculations: a Intramolecular proton transfer reaction of glycine in water by Shen and Yang [137] using a QM/MM-NN setup in which a machine learning potential (MLP) is iteratively trained (top) to represent the energy difference between a low-level (DFTB) and a high-level (B3LYP) QM method. In the final iteration (bottom), the MLP correctly predicts the zwitterionic glycine tautomer as the predominant form, correcting the inaccurate description obtained with the low-level method. b Solvation free energy of LiF in water by Jinnouchi et al. [140] obtained from MLP-accelerated simulations trained only on the thermodynamic endpoints. The top panels show snapshots from thermodynamic integration simulations corresponding to the fully interacting system (left) and to the system with weak interactions (right). In the bottom panel, pair-correlation functions of LiF in water obtained from the MLP (black line) are compared to results from QM simulations (red dashed line). A comparison of the ion solvation free energies is reported in Table 1.
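Free energies such as the solvation free energies in Fig. 9b are commonly obtained by thermodynamic integration, $\Delta A = \int_0^1 \langle \partial U/\partial\lambda \rangle_\lambda \, d\lambda$, where each ensemble average would come from an MLP-driven simulation at a fixed coupling parameter $\lambda$. The sketch below performs only the quadrature step, using composite Simpson's rule over evenly spaced windows (an assumption; the cited work may use a different quadrature), and checks it on a classical harmonic oscillator whose $\langle \partial U/\partial\lambda \rangle_\lambda$ is known analytically, in place of simulation data.

```python
import math

def simpson(values, h):
    """Composite Simpson's rule for equally spaced samples (even number of intervals)."""
    n = len(values) - 1
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    return h / 3.0 * (values[0] + values[-1]
                      + 4.0 * sum(values[1:n:2]) + 2.0 * sum(values[2:n - 1:2]))

def thermodynamic_integration(mean_du_dlambda, n_windows=20):
    """Delta A = integral over lambda in [0, 1] of <dU/dlambda>_lambda."""
    lambdas = [i / n_windows for i in range(n_windows + 1)]
    return simpson([mean_du_dlambda(lam) for lam in lambdas], 1.0 / n_windows)

# Analytic test case: U_lambda(x) = 0.5 * k(lambda) * x^2, k(lambda) = k0 + lambda * (k1 - k0).
# Classically <x^2>_lambda = kT / k(lambda), hence
# <dU/dlambda>_lambda = 0.5 * (k1 - k0) * kT / k(lambda) and Delta A = 0.5 * kT * ln(k1 / k0).
kT, k0, k1 = 0.025, 1.0, 4.0  # eV, hypothetical force constants

def harmonic_mean_du(lam):
    return 0.5 * (k1 - k0) * kT / (k0 + lam * (k1 - k0))

delta_a = thermodynamic_integration(harmonic_mean_du)
exact = 0.5 * kT * math.log(k1 / k0)
```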
Fig. 10 Machine learning prediction of spectroscopic properties: a IR spectrum of the protonated alanine tripeptide by Gastegger et al. [168], obtained from a composite ML approach in which the interatomic potential and the molecular dipoles are represented by separate ML models (Reproduced with permission from Ref. [168], published by The Royal Society of Chemistry). In the top panel, the calculated spectra obtained from ML models representing two different QM methods (BP86 and BLYP) are compared to the experimental spectrum [169]. The bottom panels show the spectral contributions from the three main conformers. b Temperature-dependent Raman spectra of liquid water by Morawietz et al. [170, 171] calculated from MLP simulations and compared to experimental measurements. As shown in the top panels (Reprinted with permission from Ref. [170]. Copyright 2018 American Chemical Society), MLP-based simulations accurately capture subtle spectral features such as the bimodal OH stretching region, and they make it possible to identify molecules in overcoordinated environments by linking vibrational motion to structural parameters (bottom panel, Reprinted with permission from Ref. [171]. Copyright 2019 American Chemical Society).
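In composite approaches like the one in Fig. 10a, the IR spectrum follows from the time autocorrelation of the dipole moment along an MD trajectory, with the dipole at each frame predicted by the ML dipole model. The minimal sketch below assumes a single scalar dipole component, a finite-difference time derivative, a Hann window, and no prefactors or quantum corrections, so it returns an unnormalized lineshape. It is checked on a synthetic dipole signal oscillating at a known frequency, which should produce a peak at that frequency.

```python
import math

def autocorrelation(series, max_lag):
    """C(tau) = <a(t) a(t + tau)>, averaged over all available time origins t."""
    n = len(series)
    return [sum(series[t] * series[t + lag] for t in range(n - lag)) / (n - lag)
            for lag in range(max_lag)]

def ir_lineshape(dipoles, dt, frequencies, max_lag):
    """Unnormalized IR lineshape from the dipole-derivative autocorrelation function."""
    # Finite-difference time derivative of one scalar dipole component
    ddip = [(dipoles[t + 1] - dipoles[t]) / dt for t in range(len(dipoles) - 1)]
    corr = autocorrelation(ddip, max_lag)
    # Hann window that decays to zero at max_lag, damping truncation ripples
    window = [0.5 * (1.0 + math.cos(math.pi * lag / max_lag)) for lag in range(max_lag)]
    return [sum(c * w * math.cos(2.0 * math.pi * nu * lag * dt)
                for lag, (c, w) in enumerate(zip(corr, window)))
            for nu in frequencies]

# Synthetic dipole signal oscillating at f0 (arbitrary units), standing in for ML dipoles
f0, dt = 1.0, 0.05
dipoles = [math.sin(2.0 * math.pi * f0 * i * dt) for i in range(2000)]
freqs = [0.2 + 0.05 * k for k in range(37)]
spectrum = ir_lineshape(dipoles, dt, freqs, max_lag=400)
peak = freqs[spectrum.index(max(spectrum))]
```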
Fig. 11 Machine learning prediction of spectroscopic properties: anharmonic Raman spectra of the paracetamol crystal in forms I and II by Raimbault et al. [172], calculated with an ML model (SA-GPR) of the polarizability tensor trained on form I only. The top panels show the low- and high-frequency parts of the Raman spectrum for form I compared to the reference QM results (ab initio). ML results were obtained from an ensemble of 16 models, from which uncertainties were estimated (shaded area). The results for form II in the bottom panels demonstrate the high transferability of the ML model, which accurately reproduces the overall lineshape of this unseen crystal form.
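The shaded uncertainty bands in Fig. 11 come from an ensemble of models. One common way to build such a committee, assumed here and not necessarily the procedure of the cited work, is to train each member on a bootstrap resample of the training data and to report the ensemble mean as the prediction and the ensemble standard deviation as the uncertainty. In the toy below the 16 members are straight-line fits to hypothetical noisy data on [0, 1]; their spread grows when the committee extrapolates far outside that range, which is what makes such estimates useful when a model is applied to unseen structures.

```python
import random
import statistics

def fit_line(points):
    """Ordinary least-squares straight line through (x, y) points, returned as a callable."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def bootstrap_committee(points, n_models=16, seed=0):
    """Train each committee member on a bootstrap resample (with replacement) of the data."""
    rng = random.Random(seed)
    return [fit_line(rng.choices(points, k=len(points))) for _ in range(n_models)]

def predict_with_uncertainty(models, x):
    """Committee mean as the prediction, committee standard deviation as the uncertainty."""
    preds = [m(x) for m in models]
    return statistics.fmean(preds), statistics.stdev(preds)

# Hypothetical noisy training data on [0, 1]
rng = random.Random(42)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in (i / 19 for i in range(20))]
committee = bootstrap_committee(data)
mean_in, sigma_in = predict_with_uncertainty(committee, 0.5)    # interpolation
mean_out, sigma_out = predict_with_uncertainty(committee, 5.0)  # far extrapolation
```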
Fig. 12 ML-based simulations for the exploration of phase diagrams of inorganic materials. a Temperature- and pressure-dependent phase diagram of potassium obtained from MD simulations using an ML potential [192]. Each point in the figure represents the result of an individual ML-based MD simulation in the NVT statistical ensemble. Symbols distinguish between different equilibrium phases. (Reproduced with permission from Ref. [192]) b Phase diagram of gallium nucleation from the melt obtained from metadynamics MD simulations with an ML potential [193]. The predicted phase diagram (red lines) is compared to the experimentally measured phase diagram (blue lines). c Crystal structures of the CuZr alloys and of the Cu and Zr constituents used to train an ANN potential by Andolina et al. [194]. The ANN potential trained on the crystalline phases was shown to predict the properties of the amorphous CuZr alloy with remarkable accuracy.
Fig. 13 ML potential simulation of catalyst materials: a surface phase diagrams of the low-index surfaces of the alloy with different terminations (Au, Cu, and mixed) as a function of the Au/Cu chemical potentials, as predicted by DFT (top) and by an ANN potential (bottom). Symbols denote different facets, and surface terminations are indicated by line types and colors (yellow = Au terminated, blue = mixed, green = Cu terminated). Exemplary Wulff constructions corresponding to three different chemical potentials are also shown. (Reproduced with permission from Ref. [202]) b Formation energies and convex hull construction for CuAu nanoparticles with 55 atoms. Different colors and point sizes indicate different chemical potentials used in grand canonical MC simulations. (Reproduced with permission from Ref. [203]) c Low-energy structures of Pt nanoparticles in a hydrogen atmosphere. The energies of the particle structures are shown relative to the most stable configuration. Statistics of the Pt-Pt nearest-neighbor distances and of the average Pt coordination number as a function of the relative energy are shown in panel (d). (Reproduced with permission from Ref. [205])
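The convex hull in Fig. 13b is built from formation energies per atom measured relative to the pure-element references, $E_f = [E(\mathrm{A}_{n_A}\mathrm{B}_{n_B}) - n_A\varepsilon_A - n_B\varepsilon_B]/(n_A + n_B)$; at zero temperature, the structures on the lower hull are the thermodynamically stable ones. The sketch below assumes the per-atom reference energies $\varepsilon_A$ and $\varepsilon_B$ are given and uses Andrew's monotone-chain algorithm for the lower hull. All numbers are hypothetical, and the grand canonical MC sampling that produced the structures in Fig. 13b is not shown.

```python
def formation_energy(e_total, n_a, n_b, e_a, e_b):
    """Formation energy per atom relative to pure-A and pure-B per-atom reference energies."""
    return (e_total - n_a * e_a - n_b * e_b) / (n_a + n_b)

def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_convex_hull(points):
    """Lower hull of (composition, formation energy) points via Andrew's monotone chain."""
    hull = []
    for p in sorted(points):
        # Drop the last hull point while it lies on or above the segment to p
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

# (fraction of B, formation energy in eV/atom), hypothetical values
points = [(0.0, 0.0), (0.25, -0.05), (0.5, -0.12), (0.5, -0.08), (0.75, -0.03), (1.0, 0.0)]
hull = lower_convex_hull(points)
```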
Fig. 14 Lithium transport in amorphous silicon anodes for lithium-ion batteries. a Atomic structures of an alloy nanoparticle during delithiation (battery discharge) [145]. The change in composition in the core of the nanoparticle is shown in subfigure (a.I), and the changes in the Si coordination numbers are shown in (a.II) and (a.III). Panels (a.IV) and (a.V) show an Arrhenius plot of the temperature-dependent lithium diffusivity in bulk amorphous LiSi alloys and representative bulk structures for different Li:Si ratios, respectively. b Formation energies of amorphous LiSi structures as predicted by two different ANN potential approaches (ANN and INN, implanted neural networks) compared to the DFT reference energies [213]. c Arrhenius plot for Li diffusion in different amorphous silicon structures (left) and visualization of the electron localization function (ELF) for different structural motifs in amorphous LiSi: Li bonded to an undercoordinated Si atom (top) and Li bonded to a fully coordinated Si atom (bottom). The numbers indicate the Bader charges of the Li and Si atoms. (Reproduced with permission from Ref. [147])
Fig. 15 Machine learning simulations for solid-state batteries: a Arrhenius plot of Li diffusivities obtained from ab initio MD (AIMD) simulations and from simulations with a learning-on-the-fly (LOTF) ML potential based on the MTP method [153]. The ML potential simulations make lower temperatures, closer to room temperature, accessible, whereas conventional AIMD simulations are limited to very high temperatures that are not relevant for battery operation. (Reproduced with permission from Ref. [153]) b Representative structure (I) and DFT phase diagram (II) of LiPON near-ground-state crystal structures [214]. Two different composition lines for nitrogen doping are indicated in yellow (Li replacement) and green (Li addition). Panel (III) shows the corresponding defect formation energies for nitrogen doping, as calculated with ANN-potential-augmented sampling and DFT calculations. All defect structures are predicted to be unstable with respect to decomposition into competing phases, showing that amorphous LiPON is metastable. Nitrogen doping via Li replacement is thermodynamically favored over doping via Li addition. (Reproduced with permission from Ref. [214])
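The activation energies in the table and the Arrhenius plots in Figs. 14 and 15 rest on the relation $D(T) = D_0 \exp[-E_a/(k_B T)]$: a linear fit of $\ln D$ against $1/T$ yields $E_a$ from the slope, and the fit can then be extrapolated toward room temperature. The extrapolation is only meaningful if the diffusion mechanism does not change over the temperature range, which is why the lower simulation temperatures reachable with ML potentials matter. A minimal least-squares sketch, checked on hypothetical noise-free data:

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_fit(temperatures, diffusivities):
    """Least-squares fit of ln D = ln D0 - Ea / (kB T); returns (Ea in eV, D0)."""
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(d) for d in diffusivities]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope * K_B, math.exp(my - slope * mx)

def diffusivity(ea, d0, temperature):
    """Arrhenius law D(T) = D0 exp(-Ea / (kB T))."""
    return d0 * math.exp(-ea / (K_B * temperature))

# Hypothetical noise-free "simulation" data at elevated temperatures (K)
temps = [600.0, 700.0, 800.0, 900.0, 1000.0]
ds = [diffusivity(0.25, 1e-3, t) for t in temps]
ea, d0 = arrhenius_fit(temps, ds)
d_room = diffusivity(ea, d0, 300.0)  # extrapolation to room temperature
```

With real MD data the diffusivities carry statistical noise, so the fitted $E_a$ should be reported with an uncertainty, as several entries in the table above are.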