MLatom 2: An Integrative Platform for Atomistic Machine Learning
Pavlo O. Dral, Fuchun Ge, Bao-Xin Xue, Yi-Fan Hou, Max Pinheiro, Jianxing Huang, Mario Barbatti.
Abstract
Atomistic machine learning (AML) simulations are used in chemistry at an ever-increasing pace. A large number of AML models have been developed, but their implementations are scattered among different packages, each with its own conventions for input and output. Thus, here we give an overview of our MLatom 2 software package, which provides an integrative platform for a wide variety of AML simulations by implementing models from scratch and by interfacing existing software for a range of state-of-the-art models. These include kernel method-based model types such as KREG (native implementation), sGDML, and GAP-SOAP as well as neural-network-based model types such as ANI, DeepPot-SE, and PhysNet. The theoretical foundations behind these methods are overviewed too. The modular structure of MLatom allows for easy extension to more AML model types. MLatom 2 also has many other capabilities useful for AML simulations, such as the support of custom descriptors, farthest-point and structure-based sampling, hyperparameter optimization, model evaluation, and automatic learning curve generation. It can also be used for such multi-step tasks as Δ-learning, self-correction approaches, and absorption spectrum simulation within the machine-learning nuclear-ensemble approach. Several of these MLatom 2 capabilities are showcased in application examples.
Keywords: Gaussian process regression; Kernel ridge regression; Machine learning; Neural networks; Quantum chemistry
Year: 2021 PMID: 34101036 PMCID: PMC8187220 DOI: 10.1007/s41061-021-00339-5
Source DB: PubMed Journal: Top Curr Chem (Cham) ISSN: 2364-8961
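The kernel-based model types listed in the abstract (KREG, sGDML, GAP-SOAP) all build on kernel ridge regression (KRR) or the closely related Gaussian process regression. A minimal, self-contained sketch of KRR with a Gaussian kernel, the regression backbone of the KREG model type, illustrates the core idea; the toy 1D data, hyperparameter values, and function names below are illustrative and are not taken from MLatom's code:

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Pairwise Gaussian kernel matrix between rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_train(X, y, sigma=1.0, lam=1e-8):
    """Solve (K + lam*I) alpha = y for the regression coefficients."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma=1.0):
    """Predict at new points from the trained coefficients."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy 1D example: learn y = x^2 from 20 evenly spaced points
X = np.linspace(-1, 1, 20).reshape(-1, 1)
y = (X ** 2).ravel()
alpha = krr_train(X, y, sigma=0.5)
pred = krr_predict(X, alpha, np.array([[0.3]]), sigma=0.5)
```

In real AML models the input rows would be molecular descriptors (e.g., the RE descriptor in KREG) rather than raw coordinates, and sigma and lam would be tuned on a validation set.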
Fig. 1 Overview of tasks performed by MLatom
Fig. 2 Creating a machine learning (ML) model with MLatom can involve automatic model selection (hyperparameter tuning) using different types of splits of the training set into sub-training and validation sets and different sampling procedures
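The model-selection loop in Fig. 2 can be sketched in a few lines: split the training data into sub-training and validation sets, train once per candidate hyperparameter, and keep the value with the lowest validation error. The trivial ridge model, the 80/20 split, and the candidate values below are illustrative stand-ins; MLatom applies the same idea to full ML potentials:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(0, 0.1, 100)   # noisy linear toy data

idx = rng.permutation(100)
sub, val = idx[:80], idx[80:]                   # sub-training / validation split

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: (X^T X + lam*I)^-1 X^T y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

best_lam, best_rmse = None, np.inf
for lam in [1e-6, 1e-2, 1e2]:                   # candidate regularization strengths
    w = ridge_fit(X[sub], y[sub], lam)
    rmse = np.sqrt(np.mean((X[val] @ w - y[val]) ** 2))
    if rmse < best_rmse:
        best_lam, best_rmse = lam, rmse
```

After selection, the final model is typically retrained on the full training set with the chosen hyperparameter.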
Fig. 3 Estimating the accuracy of the ML model
Fig. 4 Flowchart for the learning curve task
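The learning-curve task of Fig. 4 reduces to a double loop: for each training-set size, repeat training with different random splits and record the mean test error. The sketch below uses a 1-nearest-neighbour regressor on synthetic data purely as a stand-in model; the loop variables mirror MLatom's lcNtrains and lcNrepeats keywords, but nothing else here is MLatom's code:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 2000)
y = np.sin(2 * np.pi * X)                    # toy target function

def nn1_rmse(x_tr, y_tr, x_te, y_te):
    """Test RMSE of a 1-nearest-neighbour regressor."""
    pred = y_tr[np.abs(x_te[:, None] - x_tr[None, :]).argmin(axis=1)]
    return np.sqrt(np.mean((pred - y_te) ** 2))

lc = {}
for n in [10, 50, 250]:                      # cf. lcNtrains
    errs = []
    for rep in range(3):                     # cf. lcNrepeats
        idx = rng.permutation(len(X))
        tr, te = idx[:n], idx[n:n + 500]
        errs.append(nn1_rmse(X[tr], y[tr], X[te], y[te]))
    lc[n] = float(np.mean(errs))             # mean test RMSE per size
```

Plotting lc on log-log axes gives the familiar learning curve, whose slope characterizes how efficiently a model learns from data.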
Fig. 5 Left: Schematic representation of the machine learning–nuclear ensemble approach (ML-NEA). Right: Implementation of ML-NEA for calculating absorption spectra. Blue: quantum chemical (QC) data; orange: ML
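The final assembly step of a nuclear-ensemble spectrum, whether the excitation energies and oscillator strengths come from QC or from ML, is a Gaussian broadening of one line per ensemble geometry. The sketch below shows only that broadening step with toy line data; prefactors and unit conversions of the full NEA cross-section formula are omitted:

```python
import numpy as np

def nea_spectrum(grid, dE, f, delta=0.05):
    """Ensemble average of Gaussian-broadened lines.

    grid  : energy grid (eV)
    dE    : excitation energy of each ensemble geometry (eV)
    f     : oscillator strength of each geometry
    delta : Gaussian broadening width (eV)
    """
    g = np.exp(-((grid[:, None] - dE[None, :]) ** 2) / (2 * delta ** 2))
    g /= delta * np.sqrt(2 * np.pi)          # area-normalized Gaussians
    return (g * f[None, :]).mean(axis=1)     # average over the ensemble

grid = np.linspace(3.0, 5.0, 201)            # 0.01 eV grid spacing
dE = np.array([3.8, 4.0, 4.2])               # toy excitation energies
f = np.array([0.1, 0.3, 0.1])                # toy oscillator strengths
spec = nea_spectrum(grid, dE, f, delta=0.05)
```

Because each line is area-normalized, the integrated spectrum equals the ensemble-averaged oscillator strength, which is a convenient sanity check.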
Fig. 6 Flowchart for interfaces
Interfaced third-party software with their versions and ML model types tested here
| Program (version) | ML model types | Developers | Language | URL |
|---|---|---|---|---|
| sGDML | sGDML | Chmiela et al. | Python | |
| GAP (1598976566) QUIP (5c61598e4) | GAP | Csányi, Bartók, Kermode et al. | Fortran | github.com/libAtoms/QUIP |
| TorchANI | ANI | Gao, Ramezanghorbani, Smith, Isayev, Roitberg et al. | Python | github.com/aiqm/torchani |
| DeePMD-kit | DPMD, DeepPot-SE | Wang, Zhang, Han, E et al. | C++, Python | github.com/deepmodeling/deepmd-kit |
| PhysNet | PhysNet | Unke, Meuwly | Python | github.com/MMunibas/PhysNet |
Main program developers, programming languages of the majority of the code, and URLs for accessing the programs are provided. References to the ML model types (and to the third-party packages themselves, where available) are given in the original article
Main tunable hyperparameters in the Gaussian approximation potential (GAP) model type and their corresponding keywords in the gap_fit program
| Keyword | Description | Default values in MLatom^a |
|---|---|---|
| default_sigma | List of regularization parameters for energy, force, virial, and Hessian | {0.0005, 0.001, 0.1, 0.1} |
| zeta | Power of kernel | 4 |
| delta | Scaling of kernel | 1 |
| cutoff | Cutoff radius | 6 |
| cutoff_transition_width | Cutoff transition width | 0.5 |
| n_max | Number of radial basis functions | 6 |
| l_max | Number of angular basis functions | 6 |
| atom_sigma | Gaussian smearing width of atom density | 0.5 |
^a Values chosen to provide reasonable accuracy for a small molecule (ethanol) by manual testing on the MD17 data set [48]
Main tunable hyperparameters in the ANI model type related to the local AEV descriptor and their corresponding keywords in the TorchANI program
| Keyword | Description | Default values in MLatom^a |
|---|---|---|
| Rcr | Radial cutoff radius | 5.3 |
| Rca | Angular cutoff radius | 3.5 |
| EtaR | Radial smoothness in radial part | {16} |
| ShfR | List of radial shifts in radial part | {0.90, 1.17, 1.44, 1.71, 1.98, 2.24, 2.51, 2.78, 3.05, 3.32, 3.59, 3.86, 4.12, 4.39, 4.66, 4.93} |
| EtaA | Radial smoothness in angular part | {8} |
| ShfA | List of radial shifts in angular part | {0.90, 1.55, 2.20, 2.85} |
| ShfZ | List of angular shifts | {0.19, 0.59, 0.98, 1.37, 1.77, 2.16, 2.55, 2.95} |
| Zeta | Angular smoothness | {32} |
Hyperparameters for neural networks are not listed
^a Taken from the example script on the website of the program (https://aiqm.github.io/torchani-test-docs/examples/nnp_training.html)
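The radial part of ANI's atomic environment vector (AEV) can be sketched directly from the hyperparameters in the table above: a set of Gaussians exp(-EtaR*(R - ShfR)^2) centered on the radial shifts, modulated by a smooth cosine cutoff that vanishes at Rcr. This is a hedged illustration; the exact normalization and pairwise summation in TorchANI may differ:

```python
import numpy as np

def cutoff(R, Rc):
    """Smooth cosine cutoff: 1 at R=0, 0 at and beyond Rc."""
    return np.where(R < Rc, 0.5 * np.cos(np.pi * R / Rc) + 0.5, 0.0)

def radial_aev(R, Rcr=5.3, EtaR=16.0, ShfR=None):
    """Radial AEV contributions of one neighbour at distance R (Angstrom)."""
    if ShfR is None:
        ShfR = np.linspace(0.90, 4.93, 16)   # the 16 radial shifts from the table
    return np.exp(-EtaR * (R - ShfR) ** 2) * cutoff(R, Rcr)

g = radial_aev(2.0)                          # one neighbour at 2.0 Angstrom
```

In the full AEV, such terms are summed over all neighbours of a given element, so the descriptor is invariant to permutations of like atoms by construction.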
Main hyperparameters in the DeepPot-SE model type and their corresponding keywords in the DeePMD-kit program. Hyperparameters for neural networks are not listed
| Keyword | Description | Default values in MLatom^a |
|---|---|---|
| filter_neuron | List of numbers of neurons in filter network | {30, 60} |
| n_axis_neuron | Number of columns in | 6 |
| n_neuron | List of numbers of neurons in fitting net | {80, 80, 80} |
| rcut | Cutoff radius | 6.5 |
| rcut_smth | Radius at which the cutoff transition starts | 6.3 |
| sel_a | Maximum numbers of neighboring atoms | 10 for each element |
^a Taken from [31]
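The roles of rcut and rcut_smth in the table above can be illustrated with the smooth 1/r weighting used by DeepPot-SE-type descriptors: the weight is left untouched inside rcut_smth, switched smoothly to zero at rcut, and zero beyond. A cosine switch is used here for clarity; the exact functional form implemented in DeePMD-kit may differ:

```python
import numpy as np

def s(r, rcut_smth=6.3, rcut=6.5):
    """Smoothly truncated 1/r weight (defaults from the table above)."""
    r = np.asarray(r, dtype=float)
    switch = np.where(
        r < rcut_smth, 1.0,
        np.where(r < rcut,
                 0.5 * np.cos(np.pi * (r - rcut_smth) / (rcut - rcut_smth)) + 0.5,
                 0.0))
    return switch / r
```

The smooth transition matters because an abrupt cutoff would make energies, and especially forces, discontinuous when atoms cross the cutoff sphere during dynamics.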
Main tunable hyperparameters in PhysNet, and their corresponding keywords
| Keyword | Description | Default values in MLatom^a |
|---|---|---|
| num_features | Number of input features | 128 |
| num_basis | Number of radial basis functions | 64 |
| num_blocks | Number of modules | 5 |
| num_residual_atomic | Number of residual blocks after interaction | 2 |
| num_residual_interaction | Number of residual blocks in interaction | 3 |
| num_residual_output | Number of residual blocks in output block | 1 |
| cutoff | Cutoff radius | 10 |
^a Taken from [22]
Root-mean-square errors (RMSEs) in energies and energy gradients for the DeepPot-SE potential of the ethanol potential energy surface, trained on 1 k random training points and evaluated on an independent test set of 20 k randomly chosen points, for the hyperparameters start_lr and decay_rate taken from the literature (Sets A^a and B^b) and optimized using MLatom's interfaces^c
| Hyperparameter | Set A^a | Set B^b | Optimized |
|---|---|---|---|
| start_lr (starting learning rate) | 0.005 | 0.001 | 0.005675 |
| decay_rate (decay rate) | 0.96 | 0.95 | 0.9688 |
| RMSE in energies (kcal/mol) | 0.96 | 3.20 | 0.74 |
| RMSE in gradients (kcal/mol/Å) | 2.53 | 6.36 | 1.77 |
^a Hyperparameters are taken from the DeepPot-SE model used for the MD17 data set
^b Hyperparameters are taken from the DPMD model used for the MD17 data set
^c In DeePMD-kit, a step decay schedule is used for learning-rate decay. The related hyperparameters, the starting learning rate (start_lr) and the decay rate (decay_rate), were optimized, while the decay steps (decay_steps) were fixed to 200 with a stopping batch (stop_batch) set to 40,000. The search space was from 0.0001 to 0.01 for the starting learning rate and from 0.9 to 0.99 for the decay rate. Both spaces were linear, with 10 search attempts. The geometric mean of the RMSEs in energies and gradients was used as the validation error
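The validation objective in footnote c, the geometric mean of the energy and gradient RMSEs, keeps either error from dominating the hyperparameter search. The sketch below illustrates it over the same linear search ranges (0.0001 to 0.01 for start_lr, 0.9 to 0.99 for decay_rate, 10 points each); a plain grid search stands in for hyperopt, and the toy error surface is entirely synthetic (in real use, each trial point requires training DeepPot-SE):

```python
import numpy as np

def objective(rmse_e, rmse_g):
    """Geometric mean of energy and gradient RMSEs."""
    return np.sqrt(rmse_e * rmse_g)

def toy_errors(start_lr, decay_rate):
    """Synthetic stand-in for (RMSE_energy, RMSE_gradient) of a trained model."""
    rmse_e = (np.log10(start_lr) + 2.25) ** 2 + (decay_rate - 0.97) ** 2 + 0.7
    return rmse_e, 2.4 * rmse_e

best = min(
    ((lr, dr) for lr in np.linspace(1e-4, 1e-2, 10)
              for dr in np.linspace(0.90, 0.99, 10)),
    key=lambda p: objective(*toy_errors(*p)))
```

Hyperopt replaces the exhaustive grid with a tree-structured Parzen estimator, but the objective it minimizes has exactly this shape: a scalar combining both validation errors.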
Fig. 7 Part of the input and output of MLatom for hyperparameter optimization of the DeepPot-SE model using the interfaces to the hyperopt and DeePMD-kit packages
Fig. 8 a Input file for the learning curve task using the permuted RE descriptor with kernel ridge regression (KRR) (used with the Gaussian kernel, i.e., the ML model type is a permutationally invariant KREG). The scheme for the learning curve is defined with the keywords lcNtrains and lcNrepeats. b A three-dimensional (3D) representation of an ethanol molecule. Atoms are numbered by their order in the MD17 data set [41]. Hydrogen atoms in the methyl and methylene groups are permuted separately, as defined in the input using the option permInvNuclei=4-5.6-7-8. c Model performance with different descriptors and training set sizes. Hyperparameter optimization was performed throughout. Markers and error bars show the mean and standard deviation of RMSEs in predictions for 20 k independent test points. All data sets were randomly sampled
Fig. 9 a A 3D representation of the CH3Cl molecule. Atoms are numbered by their order in the data set from reference [9]. Inside CH3Cl, hydrogen atoms 3, 4, and 5 are indistinguishable, and thus their permutations should result in no difference in molecular properties. b Sample input script for training ML models using Δ-learning and structure-based sampling [5] for the selection of the training set. c ML energies vs. reference CCSD(T)/CBS energies. ML models were trained on 10% of the points of the whole data set and tested on the remaining 90%. R² approaches 1 in all cases, with slightly larger values for the more accurate models, and is thus not shown for clarity. Right column: Δ-learning models with MP2/aug-cc-pVQZ energies as the baseline. Left column: ML model trained on reference CCSD(T)/CBS energies directly. Bottom row: data sets split by random sampling. Top row: data sets split by structure-based sampling
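The Δ-learning scheme of Fig. 9 can be reduced to a few lines: learn the difference between the accurate target and a cheap baseline, then add the baseline back at prediction time. Because the baseline-to-target correction is much smoother than the target itself, even a crude model captures it well. The synthetic "levels of theory" and the polynomial stand-in model below are illustrative only:

```python
import numpy as np

x = np.linspace(0, 1, 40)
baseline = np.sin(2 * np.pi * x)            # cheap method (MP2-like stand-in)
target = baseline + 0.05 * x ** 2           # accurate method (CCSD(T)-like stand-in)

# Delta-learning: fit only the smooth correction, not the full target
delta = target - baseline
coef = np.polyfit(x, delta, 2)              # quadratic "ML model" of the correction
pred = baseline + np.polyval(coef, x)       # baseline + learned correction
rmse_delta = np.sqrt(np.mean((pred - target) ** 2))

# Direct learning with the same model capacity, for comparison
coef_dir = np.polyfit(x, target, 2)
rmse_direct = np.sqrt(np.mean((np.polyval(coef_dir, x) - target) ** 2))
```

The comparison mirrors the left vs. right columns of Fig. 9c: with a fixed model capacity, learning the correction to a baseline is far easier than learning the target property directly.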
Fig. 10 a Structure of the acridophosphine derivative molecule investigated here, the MLatom input file with the list of additional required files for the ML-NEA calculations, and the resulting spectrum. QC calculation details are defined in the Gaussian input files; alternatively, the user can provide pre-calculated results (useful to refine spectra). The number of training points (200) was determined automatically by MLatom, and the resulting ML-NEA cross section is compared to the cross section obtained using traditional QC-NEA with the same points and to the single-point convolution approach (QC-SPC), which is based on broadening lines only for the equilibrium geometry. The broadening factor is 0.05 eV for QC-NEA and 0.3 eV for QC-SPC. The reference (ref) spectrum is the experimental cross section from [60]. b ML-NEA spectra with sample input file for 200, 250, 300, and 2 k training points. c Sample input file and spectra calculated with 50 k, 100 k, 150 k, 200 k, 300 k, 400 k, 500 k, and 1 M points in the nuclear ensemble. The spectra are shifted vertically for clarity