Huilin Ye1, Weikang Xian1, Ying Li1,2. 1. Department of Mechanical Engineering, University of Connecticut, Storrs, Connecticut 06269, United States. 2. Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, Connecticut 06269, United States.
Abstract
Machine learning (ML) has emerged as one of the most powerful tools transforming all areas of science and engineering. The nature of molecular dynamics (MD) simulations, complex and time-consuming calculations, makes them particularly suitable for ML research. This review article focuses on recent advances in developing efficient and accurate coarse-grained (CG) models using various ML methods, in terms of regulating the coarse-graining process, constructing adequate descriptors/features, generating representative training data sets, and optimizing the loss function. Two classes of CG models are introduced: bottom-up and top-down CG methods. To illustrate these methods and the open methodological questions, we survey several important principles in constructing CG models and show how they are incorporated into ML methods and improved with specific learning techniques. Finally, we discuss some key aspects of developing machine-learned CG models with high accuracy and efficiency, describe how these aspects are tackled in state-of-the-art methods, and identify which remain to be addressed in the near future. We expect that machine-learned CG models can address the thermodynamic consistency, transferability, and representability issues of classical CG models.
Over the past decades,
molecular dynamics (MD) simulations have
become one of the most important computational techniques to study
the relationship between the properties of materials and the interactions
of atoms, especially with the advances of computational resources,
i.e., high-performance computing (HPC).[1] Although MD has achieved great success, the progress driven by the
demand to model more complex systems across multiple spatial and temporal
scales is severely restrained by computational limitations.[2] As shown in Figure 1a, the sweet spot for MD simulations is confined
by the boundaries of memory limit (size of the system), communication
limit (time across multiscale), and parallel limit (HPC). To model
a larger system or a longer time, coarse-graining is required to overcome
the current limitations and push the boundaries of MD simulations
for a broad range of applications, such as self-assembly of organic
molecules and crystallization of polymers.[3−5]
Figure 1
(a) Boundaries of all-atom
molecular dynamics (MD) simulation using
high performance computing facilities in terms of system size and
time scale. (b) Schematic to show two types of coarse-grained methods:
bottom-up and top-down in different time and length scales. Besides,
the coarse-graining process of a polypeptide, polyalanine, is shown
in the inset.
The central problem for coarse-graining
is how to link simulations
of detailed models with simulations of coarse-grained (CG) ones through
the propagation of information. Note that the CG model enables efficient simulations covering different spatial and temporal scales, from quantum mechanics, with the highest level of accuracy, to the macroscopic continuum model, which depends on a theoretical constitutive law or an experimental empirical relationship. According to the direction of
information propagation, CG methods can be classified into two different
categories: bottom-up[3] and top-down[6] approaches. For the bottom-up methods, the fundamental
physical principles at the smaller scales are used to parametrize
a model at the CG scale, whereas in top-down CG methods the behavior at larger scales is used to inform the interactions at smaller scales.
When designing a CG model, the first step is to define a pseudo atom
or CG site. These sites can be designed to represent combined groups
of multiple atoms or functional groups. For example, Figure 1b shows the coarse-graining
process of polyalanine. The pseudo atom sites can be either groups
of hydrogens or functional groups. The second key aspect in developing
the CG model is to define an effective energy function U for the CG model, which determines the interactions between pseudo
atoms.

In the top-down CG methods, the energy function U is constructed and parametrized by either atomistic simulations
or experiments to reproduce target properties, such as density, diffusivity,
and partition energy. For example, Marrink et al.[6] developed the MARTINI model for biomolecular simulations,
especially for the modeling of lipids and proteins. In the MARTINI
model, the force field is parametrized in a systematic way to reproduce
the free energies between polar and apolar phases of a large number
of chemical compounds through increasing the number of possible interaction
levels of the CG sites. On the contrary, in the bottom-up CG methods,
the target properties are not used to optimize the potentials; instead,
they are predicted by the CG potentials. Typical bottom-up methods
use structure- and force-based approaches, depending on the target
quantities. If the method aims to reproduce the structural distributions given by all-atom (AA) simulations, it is structure-based, as in the iterative Boltzmann inversion (IBI) method, the inverse Monte Carlo (IMC) method, and the relative entropy method.[5] The force-matching method instead aims to match the force distribution of the CG model to that obtained from AA molecular simulations. Each method has its advantages and disadvantages: the IBI method can reproduce the correct structural distributions and conformations but misses the right thermodynamics and many-body free-energy landscape, while the force-matching method captures many-body effects and proper dynamics but can misrepresent the structural distributions.
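To make the structure-based route concrete, the minimal sketch below performs one IBI refinement of a tabulated pair potential; the temperature, radial grid, and synthetic RDFs are hypothetical stand-ins for data from actual AA and CG runs.

```python
import numpy as np

kT = 2.494  # kJ/mol at 300 K (assumed state point)

def ibi_update(U, g_cg, g_target, damping=0.2, eps=1e-8):
    # IBI update rule: U_{n+1}(r) = U_n(r) + kT * ln(g_n(r) / g_target(r));
    # a damping factor < 1 stabilizes the iteration.
    return U + damping * kT * np.log((g_cg + eps) / (g_target + eps))

# Synthetic placeholder RDFs on a radial grid (nm)
r = np.linspace(0.3, 1.5, 240)
g_target = 1.0 + 0.8 * np.exp(-((r - 0.50) / 0.08) ** 2)  # from AA simulation
g_cg = 1.0 + 0.6 * np.exp(-((r - 0.55) / 0.10) ** 2)      # from current CG run

U0 = -kT * np.log(np.maximum(g_target, 1e-8))  # Boltzmann-inverted initial guess
U1 = ibi_update(U0, g_cg, g_target)            # one refinement step
```

In practice, the CG simulation is rerun with the updated potential table, and the loop repeats until the CG RDF matches the AA target.

Although these CG models are straightforward and can model much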
larger systems with reasonable computation costs, they still suffer
from some important issues in terms of thermodynamic representability,
transferability, and consistency.[5] The CG model cannot fully reproduce the properties of the AA system because degrees of freedom are lost during the CG process. Properties that are highly sensitive to small-scale phenomena will be lost in simulations using CG models.[7] Thus, the mapping from AA coordinates to CG sites strongly depends on an understanding of the physical problem and on chemical intuition, and a poorly chosen mapping discards important information. Moreover,
both CG methods take distributions from AA molecular simulations for
a specific thermodynamic state. Therefore, the derived effective CG
potentials have limited thermodynamic transferability to other conditions,
such as temperature and pressure. As a result, the derived CG potentials are applicable only to a subset of thermodynamic states. Last but not least, thermodynamic consistency can be affected by the CG process. The most common inconsistency arises from the dilemma of balancing structural and dynamic properties. Taking the IBI method as an example, the forces acting on the CG sites are not rigorously reproduced, because the optimization objective is preservation of the structural distributions rather than of the forces, as in the force-matching method. Consequently, a CG model optimized with the IBI method usually shows faster dynamics and requires a dynamic rescaling factor, despite faithfully preserving structural distributions and properties.[5]

Recently, machine learning (ML) of potentials is
emerging as an
alternative approach, in comparison with classical force fields with
explicit functions. This approach represents potential-energy surfaces by training on large data sets from density-functional theory (DFT) calculations.
Machine-learned AA potentials have excellent representability, accuracy,
efficiency, and thermodynamic transferability, in comparison to DFT
simulations. Similarly, machine-learned CG potentials are proposed
to represent the free-energy landscape of AA molecular models with
high efficiency and accuracy. Thus, ML of CG potentials becomes a
new way to bridge the gap between accurate but computationally expensive ab initio methods and approximate but computationally cheap
CG methods.[8−19] In particular, machine-learned CG potentials can predict the physical
properties of complex molecular systems with ab initio accuracy. The general idea of this approach is to use statistical
learning techniques, such as artificial neural network (ANN),[13] deep neural network (DNN),[12] and convolutional neural network (CNN),[18] to name a few, to formulate complex potentials with many
model parameters, which are undetermined and optimized using ab initio or AA molecular simulation results as references.
Similarly, based on the classes of CG methods, the machine-learned
CG methods fall into two categories: bottom-up machine-learned CG methods[9−11] and top-down machine-learned CG methods,[12−16] as listed in Table 1.
Table 1
Classification of Different Machine-Learned CG Models^a

method | architecture | key technique | training data
autoencoder-based[15] | encoder and decoder | Gumbel-softmax reparametrization | all-atom MD simulation
kernel-based[14] | kernel optimization | Hessian kernel mean force estimation | atomistic simulation
graph-based[16] | graph-theoretic principles | adjacency matrix structure of material | atomistic simulation

^a (p, q) in the architecture column are the numbers of nodes in the layers of the neural network.

For the top-down machine-learned CG method, an empirical
expression
of the interactions between CG particles is given a priori, such as the Tersoff–Brenner potential for CG water[9] and the Lennard-Jones potential for simple molecular
liquids,[10] to perform MD simulations. Usually,
the potential parameters in these expressions are undetermined and
optimized by statistical learning techniques to match the structural
properties (e.g., radial distribution functions, angular distribution
functions) and dynamics properties (e.g., diffusion coefficient) with
the corresponding experimental/computational results. On the contrary,
in the bottom-up machine-learned CG method, an unknown CG potential U should be introduced and U is related
to the representations (features associated with the coordinates of
CG particles) through high-dimensional neural networks.[12] The neural networks can be trained and optimized
by using the data set from AA molecular simulations. For instance,
the mean force is used to regularize the optimization of the loss
function in the learning process. In this study, we will review some
of the typical machine-learned CG methods in these two categories.
We will focus on the ML framework, the validation of the
trained model, and its applications. We will also discuss the open
problems and challenges facing the ML of CG models and possible solutions.
We expect machine-learned CG models can play an important role in
understanding the physical and mechanical properties of complex molecular
systems and hope that this work inspires future studies in this field.
Top-down Machine-Learned CG Methods
Machine-Learned Bond Order Potential for CG Water
Accurate and efficient molecular models for water are highly desirable for understanding its phase transformation behaviors at different temperatures and pressures. However, developing such robust and efficient models remains challenging.
Recently, Chan and co-workers[9] have introduced
an ML workflow to train the CG models that can accurately describe
the behaviors of ice, supercooled, and normal liquid water at the
mesoscopic scale. Two bond-order CG models are utilized: bond-order
potential (BOP) and BOP with on-the-fly dihedrals (BOPdih), both based on the Tersoff–Brenner potential, with 7 and 11 undetermined parameters to be optimized, respectively.

In conventional MD simulations, these parameters in the potential functions are given a priori. This limits the applicability of the potential, and modeling systems at different thermodynamic states requires recalibration of the potential parameters. In
this study, the CG potential parameters are optimized by a multilevel evolutionary strategy, the hierarchical objective genetic algorithm (HOGA), as shown in Figure 2a. A two-stage optimization technique is introduced to find the global minimum in the objective-value landscape. During the global optimization, the genetic algorithm[20] is used. It begins with random sampling of N parameter sets. The objective value of each parameter set is evaluated, and convergence is checked; if the convergence criteria are not met, a new list of N parameter sets is generated using genetic operations. Typically, this stage returns a list of close-to-optimal parameter sets. Starting from these parameter sets, the second-stage local optimization then searches for the final parameter set with the Nelder–Mead simplex algorithm.
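The two-stage logic can be caricatured in a few lines; this is a minimal sketch rather than the authors' HOGA implementation, and the objective function is a hypothetical stand-in for the hierarchical objectives evaluated from CG-MD runs.

```python
import numpy as np
from scipy.optimize import minimize

def objective(params):
    # Placeholder: the real workflow runs a CG-MD simulation here and scores
    # errors in, e.g., RDF, density, and melting point (hierarchical objective)
    return np.sum((params - np.array([1.2, 0.8, 2.5])) ** 2)

rng = np.random.default_rng(0)

# Stage 1: genetic-algorithm-style global search over N parameter sets
pop = rng.uniform(0.0, 4.0, size=(40, 3))
for _ in range(50):
    scores = np.array([objective(p) for p in pop])
    parents = pop[np.argsort(scores)[:10]]           # selection of the fittest
    pop = (parents[rng.integers(0, 10, 40)]
           + rng.normal(0.0, 0.1, size=(40, 3)))     # mutation of offspring

# Stage 2: Nelder-Mead local refinement from the close-to-optimal sets
scores = np.array([objective(p) for p in pop])
best = min((minimize(objective, s, method="Nelder-Mead")
            for s in pop[np.argsort(scores)[:5]]),
           key=lambda res: res.fun)
print(best.x)   # final optimized parameter set
```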
Figure 2
(a) Workflow for the ML CG model of water molecules through hierarchical
hierarchical
objective genetic algorithm. Two stages: global minimization and local
minimization are shown. (b) Comparisons of the results of density
anomaly, diffusion coefficients, radial distribution functions and
heat capacity between the experiments and predictions by ML CG model.
(c) Application of ML model to predict the nucleation and crystallization
process of water (Reprinted with permission from the work of Chan
et al.[9] Copyright 2019 Springer Nature
Limited).
The extensive training data set
of energies and structural properties
for ice and liquid water is taken from both an atomistic model (TIP4P/2005)
through MD simulations and experimental data. The data set is chosen
with the requirement to ensure the adequate representation of the
diverse configurational space of ice and liquid water. The trained
CG model is first validated by comparing the structural and dynamic
properties with experimental data, such as radial distribution function
(RDF), angular distribution function (ADF), and diffusion coefficient.
From Figure 2b, it
is found that the ML-BOP models can successfully capture the best-known
thermodynamic anomaly and the existence of a density maximum at 277
K. Also, it can correctly describe the freezing/melting transition
at 273 ± 1 K, densities of ice (140–273 K), and water
(243–273 K) within 1.4% of experiments. For the transport properties
of water, the room-temperature diffusivity is calibrated as ~3 × 10^-5 cm^2 s^-1, which is close to the experimental value (2.3 × 10^-5 cm^2 s^-1). Regarding the structural property, the O–O RDFs for ice at 77 K and water at 254 K are compared with the experiments. As shown in Figure 2b, the locations and intensities of the peaks of these RDFs are in good agreement with the corresponding experimental results. Finally, the heat capacities C for liquid water are investigated. The model reproduces the thermodynamic anomaly indicated by the sharp increase in C of supercooled water (cf. Figure 2b). After the validation,
a representative test case is performed on multimillion water molecules
to study nucleation of supercooled water, leading up to the formation
and growth of grains, as shown in Figure 2c. The water is slowly cooled from 275 to 210 K over 130.4 ns. After the first nucleus forms, the temperature is held at 210 K for another 100 ns; the first nucleation event is followed by a slow transformation (150 ns), an accelerated transformation of a large number of nuclei (200 ns), and completion of grain growth to form polycrystalline ice.

This workflow demonstrates the
high accuracy of the machine-learned
CG model in capturing the structural and dynamic properties of water,
in comparison to the experimental results. Besides, this workflow
shows almost the same computational cost as the other CG method—the
monatomic water (mW) model—but with higher accuracy. Nevertheless,
this workflow suffers from some limitations. First, it is partially
temperature-transferable: it can only reproduce the thermodynamics within the regime covered by the training data set. Second,
it cannot fully reproduce all the physical properties. For example,
to improve the predictions on diffusion coefficient and temperature
of maximum density, the prediction accuracy of the melting point is
sacrificed.
Transfer-Learning-Based CG Method for Simple Molecular Fluids
In MD simulations, once the interatomic
potential is specified, many properties like RDF and ADF are determined.
However, the inverse problem, parametrization of the potential for a specific property, is not straightforward. Moradzadeh et al.[10] adopted a DNN to study this inverse problem, in terms of the relation between the RDF and the Lennard-Jones (LJ) potential parameters. Figure 3a shows the schematic of the DNN. The training data are taken from
MD simulations with the potential parameters and thermodynamic states
(density ρ and temperature T) sampled over
a wide range. Here, the LJ potential has the form

U(r) = C12/r^12 − C6/r^6,

where C12 and C6 are potential parameters. After performing the MD simulations, the RDFs g(r) under specific thermodynamic states (ρ, T) are collected. The complex relationship between the potential parameters (C12, C6) and the RDFs can then be expressed as (C12, C6) = f(g(r); ρ, T), where f is
a vector function. In this work, a feed-forward neural network (FFNN) is adopted to represent this function, based on the universal approximation theorem. The input vector x_i is the concatenation of the RDF of system i (of size n) with its thermodynamic-state variables ρ and T (each of size p), for a data set with a total of m systems. Each node in the DNN applies a linear transformation to the incoming signal followed by a nonlinear activation function, producing an output signal. The DNN can thus be expressed mathematically, layer by layer, as

y^(k) = ϕ_k(W^(k) y^(k-1) + b^(k)),

where ϕ_k is the nonlinear activation function of layer k and W^(k) and b^(k) are its weights and biases. The FFNN is trained with a loss function given by the mean-squared error (MSE) between the ground-truth and predicted parameters,

L(θ) = (1/m) Σ_i |v_GT,i − v_DL,i(θ, x_i)|^2,

where θ represents the free parameters, i.e., the weights and biases, and v_GT and v_DL(θ, x) are the ground-truth and predicted LJ interaction parameters.
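A minimal sketch of such an inverse network is shown below (PyTorch); the layer widths, RDF bin count, and random stand-in data are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

n_bins = 100                         # assumed number of RDF bins in g(r)
model = nn.Sequential(
    nn.Linear(n_bins + 2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),                # output: predicted (C12, C6)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()               # MSE between ground-truth and prediction

# Hypothetical batch: each input concatenates an RDF with (rho, T)
x = torch.randn(32, n_bins + 2)      # stand-in for simulated data
y_true = torch.randn(32, 2)          # stand-in ground-truth (C12, C6)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y_true)
    loss.backward()
    opt.step()
```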
Figure 3
(a) Schematic
of a deep neural network (DNN). (b) Comparison of
the pair potential parameters determined from the DNN with the ground-truth
values for training, validation, testing, and transferability data
sets. (c) Comparison of RDFs obtained with DNN, relative entropy,
and simplex CG models and the all-atom (AA) model. Three types of
molecules are studied: CO, F2, and CH4 (Reproduced
from the work of Moradzadeh et al.[10] Copyright
2019 American Chemical Society).
The performance of the DNN is examined by investigating two cases.
The first one is the generalizability and transferability of the interatomic
potential parameters. The generalizability refers to the use of DNN
to estimate the case, which is not part of the training data set,
but falls within the range of the training data set; the transferability
refers to the use of DNN to estimate the case, which is outside the
range of the training data set. One-to-one comparisons between the ground-truth (denoted by subscript GT) and predicted (denoted by subscript DL) potential parameters are shown in Figure 3b. The left four and right four panels are for C12 and C6, respectively. In each panel, the first plot is for the training process, the second shows the validation, the third gives the generalizability, and the last demonstrates the transferability. The accuracy is quantified by the mean absolute percentage error (MAPE), defined as MAPE = (100%/m) Σ_i |(v_GT,i − v_DL,i)/v_GT,i|. The solid black line indicates predicted parameters exactly consistent with the ground-truth parameters, and the dashed black lines are the boundaries within which ~99% of the prediction points lie, in comparison to the ground truth of the training data set. The low MAPE in Figure 3b demonstrates that the DNN has good generalizability and transferability.

The second case is
transfer learning, which is studied by developing
single-bead CG models for simple molecular liquids such as carbon
monoxide, fluorine, and methane. The corresponding RDFs predicted
from the DNN, simplex, and relative entropy methods and from AA molecular simulations are shown in Figure 3c. These comparisons indicate that the DNN prediction has high accuracy. However, this high accuracy is limited to simple
molecular liquids. For complex molecules that cannot be described
by the simple LJ potential functions, the CG process to a single LJ
particle will lose plenty of information, leading to the wrong predictions
of both structural and dynamic properties.
ANN-Assisted Particle Swarm Optimization for Transferable CG Models
Particle swarm optimization (PSO),
inspired by nature and first applied to simulate social behaviors,
is an iterative global optimization technique powered by population
migration. Although it is capable of nonlinear optimization, the convergence
becomes slow when applied to force field development because of the
increasing amount of expensive MD simulations. In addition, due to
its population-based nature, its optimization path relies on only
the local bests and the global bests, instead of all the MD results.
Therefore, a cost-effective way to harness all the costly simulation
data is desirable. Recently, Bejagam et al.[11] developed an artificial neural network (ANN) assisted-PSO framework
to optimize the potential parameters for CG models.In a traditional
PSO optimization, a particle represents a state in the multidimensional
optimization space. In the case of CG potential development, the state
represents the potential parameters. In each iteration of the PSO,
MD simulations are first performed with particle-specified potential
parameters. By comparing each of the MD results and target properties,
a fitness value, which quantifies the particle-specific deviation
from the target properties, is assigned to the particle as a personal
score. For each particle, the local best is updated if the current fitness value is better than all previous values. The best of all the local bests is assigned as the global optimum for the current iteration and used to guide the next iteration. The optimization proceeds until the global best particle (the best potential parameters) is capable of reproducing the target properties with a satisfactorily small error, usually 2–5%.

On top
of the classic PSO framework, an ANN is used to accelerate
the convergence of the PSO. As shown in Figure 4a, in each iteration of the PSO, after the
MD runs, all the MD data are used to train a dynamic ANN. The ANN
then makes a prediction of an extra particle (a set of potential parameters).
This extra particle is fed back to the group of particles in the PSO.
In the next iteration, the new group of particles is used to advance
the PSO. Thus, the combination of PSO and ANN algorithms could dramatically
accelerate the searching process for the best solution for CG potential
parameters.
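The sketch below gives one simple realization of this idea: a surrogate network is fitted to all evaluated (parameter, fitness) pairs, and its minimizer is injected as the extra particle. The fitness function, swarm settings, and network size are hypothetical placeholders, not the authors' setup.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.neural_network import MLPRegressor

def fitness(params):
    # Placeholder for an MD run scored against targets (density, diffusivity)
    return np.sum((params - np.array([0.65, 3.1])) ** 2)

rng = np.random.default_rng(1)
pos = rng.uniform(0.1, 5.0, (8, 2))           # 8 particles, 2 parameters each
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([fitness(p) for p in pos])
hist_x, hist_f = list(pos), list(pbest_f)     # every MD evaluation is kept

for _ in range(10):
    gbest = pbest[np.argmin(pbest_f)]
    vel = (0.7 * vel
           + 1.5 * rng.random(pos.shape) * (pbest - pos)
           + 1.5 * rng.random(pos.shape) * (gbest - pos))
    pos = pos + vel
    f = np.array([fitness(p) for p in pos])
    better = f < pbest_f                      # lower error = better fitness
    pbest[better], pbest_f[better] = pos[better], f[better]
    hist_x.extend(pos); hist_f.extend(f)

    # Surrogate ANN trained on *all* evaluations proposes an extra particle
    ann = MLPRegressor((32, 32), max_iter=2000).fit(np.array(hist_x),
                                                    np.array(hist_f))
    extra = minimize(lambda p: ann.predict(p.reshape(1, -1))[0],
                     gbest, method="Nelder-Mead").x
    worst = np.argmax(pbest_f)
    if fitness(extra) < pbest_f[worst]:       # inject the ANN suggestion
        pos[worst], pbest[worst], pbest_f[worst] = extra, extra, fitness(extra)
```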
Figure 4
Artificial neural network (ANN) assisted particle swarm optimization
(PSO). (a) Flowchart of the ANN-PSO. In every iteration, the ANN trained
by data collection of all the previous iterations provides a new particle
(a set of force field parameters). (b) Molecules D2O and
DMF coarse-grained to nonpolar beads. (c) Optimization of the D2O. Error histories of systems with 40, 8, and 4 particles.
(d) Optimization of the DMF. Error histories of systems with 40, 8,
and 4 particles (Reproduced from the work of Bejagam et al.[11] Copyright 2018 American Chemical Society).
Molecules of D2O and DMF, as depicted
in Figure 4b, are used to demonstrate the competence of the proposed framework. The molecules are coarse-grained as one and two nonpolarizable beads, respectively. The single-bead CG model for D2O needs two parameters, i.e., the ϵ and σ of the 12–6 LJ potential, to describe the interaction between beads. The model for DMF needs five parameters: k_b, the strength of the bond between the AM bead and the CGD2 bead; ϵ_AM and σ_AM for the AM bead; and ϵ_CGD2 and σ_CGD2 for the CGD2 bead. The results of the optimization for D2O and DMF are shown in Figure 4c and d, respectively. The error of each
iteration is calculated as the root-mean-square error between the
optimization and the experimental target values of the densities and
diffusion coefficients. The particle numbers used in the optimization
are 40, 8, and 4 from the left to the right, respectively. In all
the cases, the ANN-PSO method shows better predictions and faster convergence, with smaller errors, than the plain PSO method.

In this study, the ANN is used to exploit all the MD results that the PSO produces to accelerate the convergence of the PSO, reducing the demand for expensive MD simulations. However, one still needs to predefine the CG potential form (the 12–6 LJ form in this study). As a result, this framework can only preserve limited structural and dynamic properties of the CG models.
Bottom-up Machine-Learned CG Approaches
DeePCG: Constructing a CG Model via DNN
Defining an accurate free
energy function in the space of CG variables
is always the key to developing a CG model. However, it is also the most difficult part, requiring substantial physical and chemical intuition.
ML methods can address this problem in a more accurate and automated
way. Nevertheless, most of the ML approaches so far focus on the representation
of the potential-energy surface, such as the deep potential method.[21] It allows us to perform MD simulations with accuracy comparable to ab initio molecular dynamics (AIMD) but at the cost of classical empirical force fields. To
find a good ML approach to represent the free energy surface, Zhang
et al.[12] developed the deep coarse-grained
potential (DeePCG) scheme. Note that the ML of AA and CG models tends
to represent potential-energy and free-energy surfaces, respectively,
by high-dimensional neural networks.

In DeePCG, a neural network representation U_w(ξ) is adopted for the CG potential U(ξ) (see Figure 5a). Here, the ξ variables are the coordinates of the CG particles. The CG potential is assumed to be the sum of local contributions of the CG particles, i.e., U_w(ξ) = Σ_i U_w,i(ξ), where U_w,i is the potential contribution of CG particle i. It is constructed in two steps: (1) first, a local frame transformation produces the descriptor {D_i}, which includes radial distance and angular information (Figure 5b); (2) the descriptor {D_i} is then given as input to a fully connected FFNN (four layers in total, with two hidden layers) to compute the potential contribution of CG particle i (see Figure 5a). Because the CG potential in DeePCG is a free energy that is not directly available, optimization of the CG potential is required. Here, a force-matching scheme is employed that fits accurate mean forces from AIMD simulations. Therefore, the loss function in DeePCG is defined as

L(w) = (1/(3D)) Σ_ξ |F(ξ) + ∇_ξ U_w(ξ)|^2,

where D is the number of CG configurations in the data set and F(ξ) is the mean force estimated from AIMD simulations. The loss function is highly nonconvex, corresponding to a rugged landscape in the large parameter space, and is minimized by the stochastic gradient descent (SGD) method.
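A minimal force-matching step in the spirit of DeePCG can be sketched as follows (PyTorch). A flat descriptor vector stands in for the local-frame descriptors, and gradients are taken with respect to the network inputs for brevity, whereas DeePCG maps them back to the CG coordinates via the chain rule.

```python
import torch
import torch.nn as nn

# Toy energy network: a flat descriptor vector per configuration -> energy.
net = nn.Sequential(nn.Linear(30, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))
opt = torch.optim.SGD(net.parameters(), lr=1e-3)

xi = torch.randn(128, 30, requires_grad=True)   # stand-in CG descriptors
F_ref = torch.randn(128, 30)                    # stand-in mean forces (AIMD)

for _ in range(200):
    opt.zero_grad()
    U = net(xi).sum()                           # total learned CG free energy
    # Predicted CG forces as the negative gradient of the learned potential
    F_pred = -torch.autograd.grad(U, xi, create_graph=True)[0]
    loss = ((F_pred - F_ref) ** 2).mean()       # force-matching loss
    loss.backward()
    opt.step()
```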
Figure 5
(a) Schematic of the
deep neural network (DNN). (b) Example for
CG of water molecules and the local transformation of coordinates
of CG sites; the two descriptor sets {D} contain radial and angular information, respectively. (c) (upper) comparison of the O–O RDFs for liquid
water between AIMD and DeePMD and DeePCG simulation; (lower) deviations
of DeePMD and of two DeePCG models relative to the AIMD result. (d)
O–O–O ADFs of liquid water from AIMD, DeePMD, and DeePCG
simulations with four different cutoff radii (Reprinted with permission
from the work of Zhang et al.[12] Copyright
2018 AIP Publishing LLC).
To demonstrate the capability of DeePCG, the CG model of liquid
water is adopted. The training data set is obtained from the AIMD
simulations. The AIMD data set consists of a total of 40 000
snapshots, during a 20 ps-long trajectory in the NVT ensemble with the system size N = 192 atoms (64
H2O molecules) at T = 300 K. After the
training process, the NVT simulation with the trained DeePCG model
is performed on the CG waters. The results of O–O RDF and O–O–O
ADFs are compared with those from AIMD and DeePMD (see Figure 5c). It is found that DeePCG reproduces the same RDF as AIMD and DeePMD. The ADF results are also examined for the CG system with different cutoff radii R_c, as shown in Figure 5d. In DeePCG, the CG site is the oxygen atom, and the cutoff determines the maximum distance over which the surrounding CG particles interact with each other. It is found that, as the cutoff distance R_c increases, the ADF from DeePCG becomes more consistent with those from AIMD and DeePMD. This means that a short cutoff distance may lose information
contained in the interactions among the CG particles.

From the modeling of liquid water, DeePCG demonstrates the same accuracy as AIMD. It is also faster than the corresponding AIMD, since DeePCG only considers local contributions to the potential, similar to classical atomistic MD simulations (see Figure 5b). Nevertheless, in the DeePCG simulations, a force discontinuity is found due to the sharp cutoff and the limitation of the descriptors for angular information (see the ADF results in Figure 5d). Moreover, the thermodynamic transferability of the developed CG model remains to be further examined with DeePCG.
CGnet: ML CG Force Fields
Another
force-matching based ML approach for CG potential is named CGnet,
as shown in Figure 6a. The difference is that in CGnet, the CG potential U(x) is represented by an ANN. Similarly,
the ANN includes a transformation y = g(x) from CG particle’s coordinates x to a set of features y that contains the invariance
of the free energy. The invariance is conserved by using the features:
distances between all pairs of CG particles and the angles between
three consecutive CG particles, like those shown in Figure 6b. The features y are then given as inputs to the ANN. The loss function of the ANN is defined as

L(θ) = ⟨|ξ(F(R)) + ∇U(ξ(R); θ)|^2⟩,

where θ are the parameters of the ANN, ξ(R) are the matrices of CG particle coordinates obtained by mapping the atomistic coordinates R, and ξ(F(R)) are the instantaneous force components projected onto the CG coordinates, with F(R) the instantaneous atomistic forces. This loss function reflects the potential of mean force (PMF) error between the projected instantaneous forces ξ(F(R)) and the CG force predicted by the negative gradient of the CG potential, −∇U(ξ(R); θ).
To minimize this error, another gradient layer is added into the ANN
to compute the derivatives with respect to the input coordinates/features.
With this ANN at hand, the trained CG potential form can be used to
perform MD simulations that will produce new CG coordinates. When
part of the new coordinates are outside the training data set, it
is possible that the ANN can generate unphysical predictions. To avoid
the unphysical predictions, a regularized CGnet is introduced by utilizing
“regularization” methods (see Figure 6a). In the regularized CGnet, a baseline
(prior) energy U0(x) is added
into the energy function as U(x; θ)
= U0(x) + Unet(x; θ), where Unet is the neural network free energy as used in CGnet.
The role of U0(x) is to enforce U → ∞ for unphysical states,
in which the new CG coordinates are outside the training data set.
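A minimal sketch of the regularized energy U(x; θ) = U0(x) + Unet(x; θ) is given below (PyTorch), assuming a harmonic prior on the invariant features; the feature size, prior stiffness, and network width are illustrative choices, not those of the published CGnet.

```python
import torch
import torch.nn as nn

class RegularizedCGnet(nn.Module):
    """Energy U = U0 + Unet, with a harmonic prior U0 that grows away from
    reference features y0, pushing U upward for unphysical states outside
    the training data (stiffness k is an assumed value)."""
    def __init__(self, n_feat, y0, k=100.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_feat, 64), nn.Tanh(),
                                 nn.Linear(64, 1))
        self.y0, self.k = y0, k

    def forward(self, y):  # y: invariant features (distances, angles)
        u_prior = 0.5 * self.k * ((y - self.y0) ** 2).sum(-1, keepdim=True)
        return u_prior + self.net(y)

model = RegularizedCGnet(10, torch.zeros(10))
y = torch.randn(4, 10, requires_grad=True)
forces = -torch.autograd.grad(model(y).sum(), y)[0]  # forces via gradient layer
```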
Figure 6
(a) ML
schemes of CGnet and regularized CGnet. (b) Comparisons
of force and free energy between predictions from CGnet, feature regression,
and the exact results for the two-dimensional toy model. (c) Free
energy profiles of alanine dipeptide using all-atom and machine-learned
CG models (Reproduced from the work of Wang et al.[13] Copyright 2019 American Chemical Society).
A two-dimensional toy model is used as an illustration to
show
the predicted potential energies using different CG models. The two-dimensional potential energy has a known analytical form, shown in Figure 6b. The CG mapping is defined by the projection of the two-dimensional model onto the x-axis (see Figure 6b). A long simulation
trajectory of the two-dimensional model is obtained, which is used
as the training data set. The CG potential and mean force are compared
between those from feature regression, i.e., least-squares regression,
and those from CGnet and regularized CGnet. It is found that within
the training data set, both feature regression and CGnets (both CGnet
and regularized CGnet) can accurately capture the mean force and free
energy. However, outside the training data set, the predictions from CGnet deviate severely from the exact results, whereas the regularized CGnet can still approximately capture the exact results, owing to the introduction of the prior energy (here, a harmonic form). To further demonstrate the application of CGnet, the coarse-graining of alanine dipeptide in an explicit solvent is studied. For comparison, another CG model named the “spline model” is also studied.[2] The free energy profiles from these three CG models are displayed together with the result from the AA molecular model (see Figure 6c). Only the regularized CGnet model correctly reproduces the positions of all the main free-energy minima. In contrast, the spline model cannot capture the energy minima corresponding to positive values of the dihedral angle ϕ, and the CGnet can only reproduce part of the free-energy minima.

The above results demonstrate that the CGnets (CGnet and regularized
CGnet) can be used to reproduce effective free energies for CG models,
which can capture the equilibrium distribution of a specific atomistic
model. However, the CGnet here is not transferable to the study of
different systems, since it is designed ad hoc for
a specific molecule. Besides, CGnet can only reproduce the structural/configurational properties of the system; the dynamic properties remain to be explored using alternative approaches.
Kernel-Based ML of the CG Model for Efficient MD Simulation
Since current ML models for CG force fields
suffer from high computational cost at every integration time step,
Scherer et al.[14] proposed a kernel-based
ML of the CG model for efficient simulations of molecular liquids.
The central idea is to utilize a kernel machine to learn the mapping Q → F, where Q is the representation of the system, i.e., a transformation of the coordinates of the atoms, and F is the force. The kernel function is K̂ = K(Q, Q′). Directly learning this mapping is very challenging, due to the large interpolation size of Q, even within a region defined by the cutoff distance r_cut. In this work, Q is decomposed into two terms, q^(2) and q^(3), which represent two-body pair and three-body triplet interactions, respectively. To predict a rotationally invariant property, the two-body pair representation is chosen as the interparticle distance, q^(2) = r_ij, and the three-body triplet representation is chosen as a vector of three interparticle distances, q^(3) = (r_ij, r_ik, r_jk) (see the interaction part of Figure 7a). Accordingly, the force F can be split into two terms, f^(2) and f^(3), respectively (see the body expansion in Figure 7a). Therefore, the problem is simplified to learning the local mappings q^(b) → f^(b), b = 2, 3, with the local kernels k̂^(b). Once the local kernels are trained, the ML model can be used to predict the local interactions f^(b)*. After correlating the local kernels k̂^(b) with the global kernel K̂, the new ML model can be used to predict the global force F*.
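A drastically simplified two-body analogue of this idea is sketched below: Gaussian kernel ridge regression from the pair representation q^(2) = r_ij to the radial force, with Lennard-Jones forces as stand-in training data. The Hessian-kernel machinery and three-body terms of the actual method are not reproduced here.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.1):
    # k(q, q') = exp(-(q - q')^2 / (2 sigma^2)) on scalar pair distances
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

# Training data: pair distances and radial force components; a Lennard-Jones
# force law stands in for forces collected from reference simulations.
r_train = np.linspace(0.95, 2.5, 60)
f_train = 24.0 * (2.0 / r_train**13 - 1.0 / r_train**7)

K = gaussian_kernel(r_train, r_train)
alpha = np.linalg.solve(K + 1e-8 * np.eye(len(r_train)), f_train)  # ridge fit

r_test = np.linspace(1.0, 2.4, 200)
f_pred = gaussian_kernel(r_test, r_train) @ alpha   # predicted pair forces
```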
Figure 7
(a) (left) Description of the body expansion on pairs
and triplets.
Two- and three-body forces are shown. (right) Representations used
for two- and three-body interactions. (b) Learning curves for two-body
LJ fluid with different kernel matrices. (c) Learning curves for three-body
SW fluids with different kernel matrices. (d) Comparison of MD simulation
of tabulated three-body kernel predictions with the reference SW simulation,
including RDF and ADF results. (e) Comparisons of RDF and ADF between
atomistic simulations, the CG model using the force-matching scheme,
and the SW model (Reproduced from the work of Scherer et al.[14] Copyright 2020 American Chemical Society).
One of the key aspects of this ML framework is the construction of the
to construct the
kernel matrix K̂. The basic energy conservation
requires the kernel matrix is curl-free and the property rotates with
the sample. Based on these requirements, three approaches are proposed:
explicit rotations, integration over SO(3), and Hessian kernel. The
performances of these three types of kernels are compared by studying
the learning process for a two-body Lennard-Jones fluid (Figure b) and three-body
Stillinger–Weber (SW) fluid (Figure c). It is found that, for the two-body interaction,
there is no difference between the performances of the three kernels.
However, for the three-body SW fluid, the Hessian kernel outperforms
the other two kernels. Besides, by sorting and permutating the triplet
representation vector q(3), the learning
process will be significantly accelerated. Before performing MD simulations
using the trained ML model, a switch function is used close to the
cutoff distance to avoid numerical instabilities and maintain energy
conservation. Furthermore, to reduce the error induced by the kernel
predictions, the training data size is increased by the covariant
meshing technique.

To demonstrate this ML framework, the trained CG model is adopted
to run MD simulations. The RDF and ADF results are compared with those from the reference SW model. It is found that all the curves are almost identical, which means the kernel predictions lead to the correct
sampling of the canonical ensemble. In addition, the computational
costs of these simulations are comparable with the original SW potential,
without the restriction on the explicit functional form of three-body
potentials.

This ML framework is further applied to the learning of the CG force field. After the CG mapping R = R(r), the instantaneous collective force (ICF) on CG bead I is expressed as F_I(r) = Σ_{i∈I} f_i(r), where the f_i(r) are the atomistic forces. Here, the mapping is not directly from R to the mean force. Instead, a two-step procedure is provided: first, the two-body force F_2-body is obtained by the force-matching scheme; second, the three-body forces are averaged as the residual force between the target and the two-body force, ΔF = F − F_2-body. To learn these residual forces, a binned three-body Hessian kernel is adopted. The simulation results for liquid water with three different CG models are compared in Figure 7e. The Hessian
kernel-based ML model is found to be very close to the atomistic results
of RDF and ADF. More interestingly, it is found that the ML model is even more accurate than the force-matched SW model.

This kernel-based ML framework demonstrates that, by adding a switch function close to the cutoff distance and covariantly meshing the training data set, it can accurately capture two-body and three-body interactions at a computational cost comparable to that of the original SW model. The extension to CG liquids confirms that it is a promising technique for constructing efficient two- and three-body models for a wide range of CG applications. However, for higher many-body interactions, e.g., the four-body (dihedral) interaction, more complex and efficient kernel functions should be introduced. The computational efficiency could also become a significant challenge as more interactions are included in the kernel matrix within the cutoff distance.
Autoencoders for CG Molecular Models
The selection
of an appropriate mapping from AA to CG models plays
a critical role in developing CG models to reproduce consistent dynamics,
structural correlation, and thermodynamics.[5] In general, the criteria for selecting CG mappings depend strongly on prior physical and chemical knowledge and intuition. Although many efforts have been devoted to developing forward- and back-mapping algorithms, the statistical connections needed to reversibly bridge resolutions across different scales are missing. To solve this issue, a powerful unsupervised learning technique, the variational autoencoder (VAE), is introduced. VAEs compress the data through an information bottleneck that first continuously maps complex data into a low-dimensional latent space and then probabilistically infers the real data distribution via a generating process.[22] Accordingly,
Wang et al.[15] proposed an autoencoder-based
generative modeling framework that includes four steps: (1) learning
discrete CG variables in latent space and decoding back to atomistic
detail via geometric back-mapping; (2) using a reconstruction
loss to help capture collective features from AA reference data; (3)
regularizing the CG space with a semisupervised mean instantaneous
force minimization to obtain a smoothed CG free-energy landscape;
and (4) variationally finding the highly complex CG potential that
matches the instantaneous mean force acting on the AA training data. Figure 8a shows the autoencoding
framework. The atomistic data are first reconstructed by encoding
atomistic trajectories through a low-dimensional bottleneck, and the
CG mapping is parametrized by using the Gumbel-softmax reparametrization.
In the training process, the encoder and decoder are variationally
optimized to minimize the loss

L = L_recon + ρ_h L_force,

where the first term on the right-hand side is the atom-wise reconstruction loss and the second term is the average instantaneous mean force regularization; ρ_h is the weighting hyperparameter. Using this loss function, the force fluctuations of the encoded space are minimized by the instantaneous force regularizer, and the CG free-energy landscape is smoothed.
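The structure of this combined objective can be sketched as follows (PyTorch). The linear encoder/decoder, placeholder forces, and value of ρ_h are illustrative assumptions; in the actual framework, the CG mapping is the Gumbel-softmax encoder and the instantaneous forces are projected through it.

```python
import torch
import torch.nn as nn

n_atoms, n_cg = 22, 3
encoder = nn.Linear(3 * n_atoms, 3 * n_cg)  # stand-in for the Gumbel-softmax map
decoder = nn.Linear(3 * n_cg, 3 * n_atoms)  # stand-in geometric back-mapping

x = torch.randn(64, 3 * n_atoms)            # AA configurations (placeholder)
f_inst = torch.randn(64, 3 * n_cg)          # instantaneous CG forces (placeholder)

rho_h = 0.01                                # regularization hyperparameter
x_rec = decoder(encoder(x))
recon = ((x_rec - x) ** 2).mean()           # atom-wise reconstruction loss
force_reg = (f_inst ** 2).mean()            # mean instantaneous force magnitude
loss = recon + rho_h * force_reg
loss.backward()                             # trains encoder and decoder jointly
```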
Figure 8
(a) CG autoencoding framework
including the encoder, decoder, and
the reconstruction of original all-atom data. (b) Reconstruction and
mean force losses in Auto-Encoder training of gas-phase molecules
with different resolutions. (c) Coarse-graining encoding and decoding
for alanine dipeptide and the free energy profiles with different
CG resolutions, compared to the atomistic simulation. (d) Comparison
of the simulation results on RDF, bond distance distribution, and
mean squared displacement (MSD) between atomistic and CG models for
liquid ethane using a CG resolution of 2 per molecule (Reprinted with
permission from the work of Wang et al.[15] Copyright 2019 Springer Nature Limited).
This unsupervised autoencoding process is first demonstrated for gas-phase
ortho-terphenyl (OTP) (Figure 8b) and alanine dipeptide (Figure 8c). In the case of OTP, it is found that if the number of degrees of freedom is less than 4, for example, in a 3-bead mapping that partitions each of the phenyl rings, the encoding loses the configurational information that describes the relative rotation of the two side rings and thus leads to higher errors for the decoded structures. In the case of the small peptide molecule, latent variables of 8 CG beads are adequate to recover the heavy-atom positions, although the arrangement of the hydrogen atoms is not fully reproduced.

With the minimization of the
average of instantaneous mean forces
as a regularization term, the learning of a smooth CG free energy
surface is realized. The applicability of the framework is demonstrated
by bulk simulations of liquids. Here, an example of liquid ethane is shown in Figure 8d. The CG resolution is 2 beads per molecule. An autoencoder is trained to obtain the latent CG variables, and a neural network-based CG force field is then trained with a force-matching scheme. Finally, the CG simulations are performed at the same density and temperature as the atomistic simulation. From Figure 8d, it is found that the RDFs of both COM–COM and CG1–CG1 are captured accurately, compared to those in the atomistic simulations. The bond length distributions are also in good agreement with those from atomistic simulations. The dynamic property informed by the mean squared displacement (MSD) is shown in Figure 8d as well. The CG model exhibits faster dynamics than the atomistic simulations, which is the typical issue in classical CG models.[5]

This
ML framework proposes to use the CG coordinates as latent
variables that can be regularized with force. Through the training
of the encoding mapping, the deterministic decoding, and a CG potential, larger systems can be simulated for longer times, thus accelerating the MD simulations. However, this framework has its limitations. First,
the deterministic CG mapping leads to an irreversible loss of information
that is reflected in the reconstruction of average AA molecular structures.
Second, due to the force-matching scheme, the individual pair correlation
cannot be fully recovered compared to the atomistic trajectories.
Lastly, like other bottom-up approaches based on force-matching,
this framework can only reproduce structural correlation function
at one point in thermodynamic space. It is not transferable among
different thermodynamic conditions, which should be further addressed.[18]
Graph-Based CG Molecular Model
The
representability, in terms of describing the ground-truth system,
has drawn less attention than transferability. However,
its importance still deserves special notice because it is the baseline
of building CG models and the success of a CG model relies on a good
representation.[5] Webb et al.[16] proposed a graph-based coarse-graining (GBCG)
scheme. The method essentially uses graph theory to describe the chemical
connectivity of an organic molecule, mapping the ground-truth system
of the molecule to a coarser description by the basic graph operation of edge contraction, with a controllable degree of coarse-graining.

Generally, a unique adjacency matrix is a complete description
of the chemical structure of an organic molecule. The specific adjacency
matrix includes vertexes, representing the atoms in the molecule,
and edges, standing for the topological connections of these atoms.
With this knowledge, the GBCG is essentially a series of grouping and contraction operations, as illustrated in Figure 9a. In the grouping process, related vertexes are assigned to new groups, while in the contraction, vertexes in the new groups are combined to form coarser sites. Here, a molecule with the SMILES string [C(=O)1OCC(CO)C(CC)C1] is coarse-grained into models with 12, 6, 2, and 1 sites in a sequential and systematic manner.
Figure 9
CG mapping
scheme based on the graphic description of organic molecules.
(a) Logic of the graph-based coarse-graining (GBCG). The groupings
and contractions produce different levels of CG descriptions of the
original molecule. (b) (dashed-line box) Adjacency matrix A. (solid-line
box) Grouping method contracts the adjacency matrix A. The spectral
grouping ensures that those topologically important atoms are coarse-grained
later than those less important atoms. (c) Example of toluene. Sequential
CG mappings are shown. (d) Structural properties predicted by the
CG models with potentials developed based on the different levels
of coarse-graining. The results suggest the graphic CG mappings preserve
the structural properties well (Reproduced from the work of Webb et
al.[16] Copyright 2019 American Chemical
Society).
The essential questions to be answered in the GBCG are how
and in what sequence one should combine the vertexes. Generally speaking, there are different methods; in the following, a simple but effective protocol, spectral grouping, which uses only the adjacency matrix, is adopted as a demonstration. An example of dimethyl carbonate
is shown in Figure 9b. All the hydrogen atoms are first discarded, and the remaining atoms are treated in united-atom fashion, for simplicity. The 6 united atoms are defined as 6 vertexes. A 6 × 6 symmetric adjacency matrix A, highlighted in the dashed-line box, is used to describe the connectivity of the system, with the element A_ij equal to either 1 or 0, representing whether the connection between vertexes N_i and N_j exists or not.
The grouping logic is in the solid-line box. The rank-order is defined
by the eigenvector corresponding to the largest eigenvalue of the
adjacency matrix A, based on the eigenvector centrality
that counts the contributions of the vertexes to the overall connectivity.
The grouping follows the queue, in which vertexes with a low-rank
order lie in the front. Every vertex is grouped with its direct neighbor.
The contraction of the atoms here is by the center of mass for simplicity.
According to the listed ranks, the carbon atoms 1 and 5 are first merged with their neighbors, the oxygen atoms 2 and 4. Then the oxygen atom 6 is combined with its neighbor, the carbon atom
3. Following this protocol, a toluene molecule is coarse-grained as
shown in Figure 9c.
The original molecule is labeled as TOL0, and it is sequentially coarse-grained
into CG models TOL1, TOL2, and TOL3 that have 4, 2, and 1 beads, respectively.
Validation of the example is shown in Figure 9d, where the RDFs of the CG models at three different levels, equipped with potentials derived from the IBI and force-matching methods, are shown. These results suggest that the graph method gives legitimate CG mappings that capture the essential structural properties, in comparison to the ground-truth atomistic system.
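The spectral-grouping pass on dimethyl carbonate described above can be reproduced with a short NumPy sketch; the single-pass loop and the first-unmerged-neighbor choice are simplifications of the published protocol.

```python
import numpy as np

# Dimethyl carbonate united atoms: C1-O2-C3(-O6)-O4-C5 (hydrogens discarded)
A = np.zeros((6, 6), dtype=int)             # adjacency matrix
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]:
    A[i, j] = A[j, i] = 1

# Eigenvector centrality: the eigenvector of the largest eigenvalue ranks
# vertexes by their contribution to the overall connectivity.
w, v = np.linalg.eigh(A)
centrality = np.abs(v[:, np.argmax(w)])
queue = np.argsort(centrality)              # low-rank vertexes grouped first

groups = {i: {i} for i in range(6)}
merged = set()
for i in queue:
    if int(i) in merged:
        continue
    nbrs = [j for j in np.flatnonzero(A[i]) if int(j) not in merged]
    if nbrs:                                # contract the edge to a neighbor
        j = int(nbrs[0])                    # (real protocol picks by rank order)
        groups[j] |= groups.pop(int(i))
        merged |= {int(i), j}
print(list(groups.values()))                # three CG sites after one pass
```

Running this pass merges the terminal carbons with oxygen atoms 2 and 4 and then oxygen atom 6 with carbon atom 3, reproducing the three-site grouping described above.

It is worth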
noting the difference between GBCG and the intuitive definition of CG sites as centers of mass of the original molecule: the mappings in this method form a sequence that follows the topological connectivity of the chemical structure. Consequently, the GBCG gives a systematic way to derive a CG mapping that retains the chemical essence.
Discussion and Remarks
CG molecular simulations offer a unique opportunity to address
challenging problems, such as phase separation, self-assembly, and
crystallization of organic molecules and polymers. However, classical CG models are limited in thermodynamic transferability and consistency.[5] Effective potential functions for a CG model are usually derived and optimized at one set of thermodynamic conditions. Thus, a CG model derived at one thermodynamic state is not transferable to other conditions, limiting its application to a small range of thermodynamic states. Moreover, during the CG process the free-energy landscape of the system changes dramatically, which leads to thermodynamic inconsistency between AA and CG models. To overcome
these limitations, the ML-based CG models provide an alternative solution,
which can be optimized by using an extensive training data set of
forces and structures from AA molecular simulations. In direct comparison
with classical CG models, we believe that the ML-based CG models could
have the following advantages: (i) thermodynamic consistency, as neural networks can in principle approximate any function to arbitrary accuracy, capturing the roughness and highly nonconvex free-energy landscape of CG molecules; (ii) temperature/pressure transferability, as neural networks have multiple layers of nonlinear functions and many model parameters, which can be optimized with extensive training data from AA simulations at different thermodynamic states; (iii) representability, as neural networks can take two- and multibody molecular descriptors as inputs, representing the many-body interactions that are typically ignored in classical CG models. We should emphasize
that the ML of AA and CG models tends to represent potential-energy
and free-energy surfaces, respectively, by high-dimensional neural
networks. During the ML of the AA model, both energies and forces
are obtained directly from DFT simulations, leading to a fast and
accurate neural network construction for the potential-energy surface.
However, for the ML of CG models, only structures and forces are available from AA molecular simulations, and the free-energy surface is challenging to estimate. Thus, most machine-learned CG models are formulated based
on the force-matching scheme.[12−15] Note that both structure and force distributions
are related to the free-energy landscape of the CG model. Therefore,
the structural distributions (e.g., RDF and ADF) should be used to
train the ML-based CG models in the near future. We also expect that
these ML-based CG models can unify both structure- and force-matching
during the optimization of neural network parameters.
Analogous
to the typical ML of AA models for potential-energy surface
construction, the ML of a CG model includes four major steps: (1)
generating an extensive training data set using AA molecular simulations
at different thermodynamic states; (2) mapping from an AA model to a CG model and extracting target structure or force distributions; (3) transforming the Cartesian coordinates of CG particles into a suitable set of input coordinates (descriptors, features, or symmetry functions)
for the training of neural networks; (4) defining the architecture
of neural networks and formulating a fitting procedure to optimize
the weights and minimize the loss function. Each step can dramatically
affect the accuracy and efficiency of the machine-learned CG models,
in terms of reproducing the correct structural or dynamical properties
of AA molecular systems. In the following, we discuss the efforts
and unsolved issues in each step.
Generation of Training Data Set
ML is usually a highly nonlinear fitting process with an enormous number of parameters to be determined in order to minimize the loss function of a neural network. Theoretically, an ML model can fit an arbitrary system, provided that the training data set is sufficiently representative and diverse and that the computational resources are adequate. In reality, the training data cannot cover all thermodynamic conditions, and in practice we are usually interested only in a system within a specific regime of thermodynamic states. Nevertheless, the training data set within that regime should be representative and diverse, meaning that it needs to cover all the typical behaviors occurring within the regime across the relevant thermodynamic states (e.g., pressure and temperature). An insufficient data set will result in poor thermodynamic transferability and consistency: a CG model trained on data from a specific regime of temperature or pressure cannot predict physical states outside the trained regime (no extrapolation or exploration). In general, the sampling of training data within the regime of interest should be uniform, except for extreme events.
For example, two CG water beads can approach each other closely. If this scenario is not included in the training data set, the trained ML model will produce unstable MD simulations whenever two CG beads come close to each other: frequent extrapolation becomes necessary, which can generate unphysical results. Therefore, the training data set should include enough information about different events, particularly extreme ones. An alternative approach to alleviating a lack of representation or diversity in the training data set is to apply physical constraints to the trained ML model, as in physics-informed ML. For instance, in the CGnet model,[13] an a priori energy form is adopted to regularize the motion of CG particles in the simulations when extrapolation is needed. This ensures correct physical behavior near the boundary of the training data set, as shown in Figure b.
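As an illustration of such a physics-informed constraint, the sketch below adds prior energy terms to a learned free-energy network so that the total energy rises steeply wherever the network must extrapolate. This is a minimal PyTorch sketch inspired by the CGnet idea; the harmonic and repulsive forms and all parameter values are illustrative assumptions, not the published implementation.

```python
import torch
import torch.nn as nn

class PriorRegularizedCG(nn.Module):
    """Free energy U = U_net(features) + U_prior(distances).

    The prior terms dominate outside the training distribution, keeping
    simulations stable where the network would otherwise extrapolate.
    """
    def __init__(self, n_features, k_bond=100.0, r0=1.5, sigma=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )
        self.k_bond, self.r0, self.sigma = k_bond, r0, sigma

    def forward(self, features, bond_r, pair_r):
        # Learned many-body contribution to the free energy.
        u_net = self.net(features).sum(-1)
        # Harmonic prior keeps bonded CG beads near a reference length r0.
        u_bond = 0.5 * self.k_bond * ((bond_r - self.r0) ** 2).sum(-1)
        # Repulsive prior penalizes nonbonded beads that approach closely,
        # covering the "two beads overlap" extreme event discussed above.
        u_rep = ((self.sigma / pair_r) ** 12).sum(-1)
        return u_net + u_bond + u_rep
```

Since the simulation forces are the negative gradient of this total energy, the priors contribute restoring forces precisely in the regions where training data are sparse.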
Mapping from AA to CG Models: Representation
As mentioned before, the CG process from atomistic coordinates to CG sites strongly depends on an understanding of the physical problem and on chemical intuition.[5] Coarse-graining reduces the system's degrees of freedom and loses some important information that is observable only in atomistic simulations. A poor CG representation loses even more important information, which cannot be recovered no matter how efficiently and accurately the CG model performs in the subsequent steps. Typically, for well-known
systems, for example, water molecules and peptides, it is easy to
define the CG site on the oxygen atom and the backbone carbon atom,
respectively. However, for unfamiliar systems, the center of mass
is the most commonly used definition of the CG site, which is straightforward
at first glance. However, the ambiguous nature of this method may jeopardize its usefulness. For example, when coarse-graining a linear flexible polymer chain, a CG site based on the center of mass of several monomers can be an inferior choice, because the thermodynamic fluctuation of the center of mass potentially leads to undesirable variation of the CG site.[5] Therefore, rigorous methods
that give consistent and effective definitions of the mapping from
AA to CG models are particularly important. To address this issue,
the VAE-based ML framework can learn discrete CG variables in the
latent space and decode them back to atomistic detail with high accuracy.[15] In addition, graph-based CG adopts graph theory to describe the chemical connectivity of an organic molecule.[16] It maps the ground-truth system of the molecule to a coarser description through the basic graph operation of edge contraction, which provides a promising and efficient route for mapping from AA to CG models.
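For the commonly used linear mapping, each CG coordinate is a mass-weighted average of its constituent atomistic coordinates. Below is a minimal NumPy sketch; the atom-to-bead assignment shown is the dimethyl carbonate grouping discussed above, used purely for illustration.

```python
import numpy as np

def com_mapping(x, masses, bead_of_atom, n_beads):
    """Map atomistic coordinates onto CG sites by center of mass.

    x            : (n_atoms, 3) atomistic coordinates
    masses       : (n_atoms,) atomic masses
    bead_of_atom : (n_atoms,) index of the CG bead each atom belongs to
    """
    M = np.zeros(n_beads)
    R = np.zeros((n_beads, 3))
    np.add.at(M, bead_of_atom, masses)               # accumulate bead masses
    np.add.at(R, bead_of_atom, masses[:, None] * x)  # mass-weighted sums
    return R / M[:, None]

# Example: the six united atoms of dimethyl carbonate mapped onto three
# CG beads, following the grouping {1,2}, {3,6}, {4,5} described above
# (zero-indexed here).
bead_of_atom = np.array([0, 0, 1, 2, 2, 1])
```

The same operation can be written as a sparse linear operator acting on the atomistic coordinates, a form commonly assumed in bottom-up coarse-graining theory.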
Inputs of ML Models: Descriptors, Features, or Symmetry Functions
The potential energy of a CG model should depend only on the internal interactions and be invariant with respect to translation, rotation, or permutation of the entire molecules.[8] ML is a numerical fitting method, and the output of an ML model depends on the absolute values of its inputs. Thus, the Cartesian coordinates of the CG particles are not suitable as inputs for ML models. Accordingly, molecular descriptors, features, or symmetry functions should be used to train these ML models.[23] The most straightforward input for an ML-based CG model is the distance between two CG particles. However, this choice is not unique and cannot capture many-body interactions. Currently, there are different transformations of the Cartesian coordinates of CG particles into inputs of ML models, such as local coordinate systems,[12] two- or three-body correlation functions,[14] permutation-invariant distance metrics,[24] and VAE-based encoder-decoders.[15] Apart from the energy invariance, the symmetry functions
for ML models should be first- and second-order continuous, so that the force can be derived as the gradient of the CG potential with respect to the coordinates. Furthermore, the symmetry functions should be broad enough to capture the interactions between two nearby CG particles and to ensure that the energy decays to zero as the distance between two CG particles approaches the cutoff. Therefore, symmetry functions that are too simple or too limited can lower the accuracy of machine-learned CG models.[8]
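A standard example satisfying these requirements is a Behler–Parrinello-type radial symmetry function. The sketch below uses the tanh³ cutoff, whose value and first two derivatives all vanish at the cutoff radius; the specific parameter choices are illustrative.

```python
import numpy as np

def cutoff(r, r_c):
    """tanh^3 cutoff: value, first, and second derivatives all vanish
    smoothly at r = r_c, meeting the continuity requirement above."""
    return np.where(r < r_c, np.tanh(1.0 - r / r_c) ** 3, 0.0)

def radial_symmetry_function(r_ij, eta, r_s, r_c):
    """G = sum_j exp(-eta * (r_ij - r_s)^2) * f_c(r_ij).

    Invariant to translation, rotation, and permutation of neighbors.
    r_ij : distances from one CG particle to its neighbors
    eta  : Gaussian width; r_s : Gaussian center; r_c : cutoff radius
    """
    return np.sum(np.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c))
```

A set of (eta, r_s) pairs probes different radial shells, turning raw coordinates into a fixed-length, invariant fingerprint of each particle's local environment.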
Optimization of Weights in Neural Networks
In the development of a machine-learned CG model, the optimization of parameters is always the most time-consuming part. For neural networks, a variety of gradient-based optimization algorithms are available, from simple gradient descent (back-propagation) and variants like RPROP (resilient back-propagation) or Adam to higher-order methods, such as L-BFGS and the extended Kalman filter (EKF).[25] The choice of optimization algorithm depends on the specific problem. For example, EKF performs very well in training networks with one or two hidden layers of fewer than 100 neurons each, but poorly in training networks with four or more hidden layers. In addition, considering the available HPC resources, we need to choose an optimization algorithm that can train on a large data set efficiently.
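As a concrete example of such gradient-based optimization, the following minimal PyTorch force-matching loop uses Adam. It assumes a model that maps CG coordinates to a scalar free energy (computing invariant features internally, as discussed above); full-batch training and the hyperparameters are simplifying assumptions.

```python
import torch

def train_force_matching(model, coords, aa_forces, n_epochs=100, lr=1e-3):
    """Optimize network weights so that -dU/dR matches the mapped AA forces.

    coords    : (n_frames, n_beads, 3) CG coordinates from mapped AA frames
    aa_forces : (n_frames, n_beads, 3) AA forces mapped onto the CG beads
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(n_epochs):
        optimizer.zero_grad()
        x = coords.clone().requires_grad_(True)
        energy = model(x).sum()
        # The CG force is the negative gradient of the free energy;
        # create_graph=True lets the loss backpropagate through it.
        cg_forces = -torch.autograd.grad(energy, x, create_graph=True)[0]
        loss = torch.mean((cg_forces - aa_forces) ** 2)
        loss.backward()
        optimizer.step()
    return model
```

Swapping `torch.optim.Adam` for `torch.optim.LBFGS` (with the closure interface it requires) changes only a few lines, which makes it straightforward to compare optimizers on the same problem.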
Future Opportunities
So far, machine-learned CG models have typically concentrated on only one or two of the aspects discussed above to improve accuracy, which has limited the applications of these models. For example, the kernel-based and graph-based CG models focus on the representation aspect and provide different protocols to retain as much of the information contained in the molecular structure as possible, while their optimization remains limited to the force-matching scheme. This can lead to poor reproduction of structural properties, such as the RDF and ADF. A potential way to resolve this issue is to add another loss term (target), such as the RDF, when optimizing the CG models. Given the expressiveness of high-dimensional neural networks, both structural and force distributions could be reproduced simultaneously by machine-learned CG models.
a wide range of temperature states.[9] Meanwhile,
pressure-transferability remains to be studied. If we include more
train data sets at different pressures, the ML-BOP model may realize
pressure-transferability and, then, can predict the nucleation process
of the water molecules more accurately. For the choice of symmetry
functions, in general, it depends on the experience of trial-and-error.
On one side, the symmetry functions should be more than enough to
capture all the typical interactions among CG particles. On the other
side, symmetry functions should be small enough to have good performance
in terms of computational efficiency. Therefore, we need to pay close
attention to how to choose appropriate symmetry functions. In conclusion,
we are still facing enormous challenges and opportunities for constructing
an accurate and efficient machine-learned CG model to be thermodynamically
consistent, transferable, and representative.