Current machine learning (ML) models aimed at learning force fields are plagued by their high computational cost at every integration time step. We describe a number of practical and computationally efficient strategies to parametrize traditional force fields for molecular liquids from ML: the particle decomposition ansatz to two- and three-body force fields, the use of kernel-based ML models that incorporate physical symmetries, the incorporation of switching functions close to the cutoff, and the use of covariant meshing to boost the training set size. Results are presented for model molecular liquids: pairwise Lennard-Jones, three-body Stillinger-Weber, and bottom-up coarse-graining of water. Here, covariant meshing proves to be an efficient strategy to learn canonically averaged instantaneous forces. We show that molecular dynamics simulations with tabulated two- and three-body ML potentials are computationally efficient and recover two- and three-body distribution functions. Many-body representations, decomposition, and kernel regression schemes are all implemented in the open-source software package VOTCA.
Machine learning (ML) techniques have a long history of interpolating
the high-dimensional potential energy surface (PES) of molecular systems.[1−4] Access to an accurate PES enables the simulation of the system’s
dynamics. In comparison to standard molecular dynamics (MD),[5] ML does not explicitly rely on fixed force field
functions. This has recently resulted in impressive MD studies for
various materials and molecular systems.[6−12]

To get there, it does not quite suffice to gather reference
configurations
and train a Gaussian process (GP) or neural network—the interpolation
space is simply too large for all but the simplest of systems. Instead,
incorporating physical symmetries in the ML model effectively reduces
the interpolation space, thereby helping convergence. An early realization
of this aspect was demonstrated by Behler and Parrinello, who relied
on symmetry functions to incorporate translational and rotational
invariance:[13] If the physics of the problem
does not depend on arbitrary translations, encode particle geometries
by means of relative distances. A number of molecular
representations have since built on these ideas to offer more efficient
learning performance.[11,14−16] Other aspects
that aim at incorporating physics include local dynamical symmetries
found in molecules[17] and the learning of
tensorial properties.[18−20]

Incorporating symmetries does not have to be
done at the level
of the representation: one can also build it into the ML architecture. When building kernel-based ML models for force fields, the GP aims
at inferring a vector-valued function.[21] The underlying vector field ought to remain curl-free—directly
related to energy conservation in simple Hamiltonians—which
can be implemented by learning the Hessian of the kernel.[22,23] Several variants of this strategy have by now been implemented to
learn energy-conserving force fields.[6,10,24,25] In parallel, the learning
of forces has sparked interest in building so-called covariant kernels, meaning that the prediction rotates with the sample in
three-dimensional space. Glielmo et al. have derived an analytic expression
for a covariant Gaussian kernel.[26] It effectively
builds on the integration over all actions of the rotation group by
means of a Haar measure.[27] Covariant ML
models have more recently been developed for neural networks.[28,29]

In addition to symmetries, further physical intuition can be incorporated to build better ML models. Rather than
learning the full PES at once, it can prove easier to converge a series
of ML models that follow a body decomposition of the interactions,
akin to a cluster expansion or virial equation for a real gas.[6,11,30−32] Parametrizing
ML models for each body term is not necessarily straightforward, because
reference calculations typically contain all contributions
at once. The contribution of each body is instead often inferred by
means of a decomposition ansatz, which assumes the target property
to be decomposable in terms of a linear combination of function values.[24,33]

Once trained, an ML model is ready to be used as a force predictor
coupled to an MD integrator. In the context of ab initio MD simulations
based on quantum-mechanical calculations, an ML model leads to significant
performance improvements: While the former scales with the number
of electrons raised to some power (≥3), the ML model will typically
scale almost linearly with the number of nuclei,
albeit with a large prefactor. John and Csányi[30] pointed out that the situation is very different when running
coarse-grained (CG) simulations[34,35]—lumping together
atoms into superparticles or beads. Since they are typically parametrized
from classical atomistic simulations, the scaling will be similar,
apart from a small reduction due to the reduced number of particles.[30] To remedy this situation and recover some computational
efficiency, one can resort to projecting the ML model onto standard tabulated potentials, as suggested by Glielmo et al.[31] This is the objective of the present study.

The scope
of this paper is to summarize key aspects of applying
GP regression to parametrize force fields of molecular liquids in
a computationally efficient setting. A decomposition of the PES to
two- and three-body interactions is presented, ultimately projected
onto tabulated potentials to avoid making ML predictions at each integration
time step. We compare several kernels that encode various levels of
physics. Further technical refinements are incorporated, such as a
smooth decay close to the interaction cutoff.[6] We also discuss performance during learning and structural accuracy
in the condensed phase of a liquid. Results are presented for two
example liquids: a two-body Lennard-Jones system and a three-body
Stillinger–Weber system.

Finally, we extend our implementation
to the coarse-graining of
molecular liquids. CG is an appealing resolution in soft matter due
to its ability to reach longer length and time scales.[35,36] We specifically focus on bottom-up strategies that systematically
derive a CG model from higher-level microscopic information.[37] Consistency between the equilibrium probability
densities of the two models leads to the many-body potential of mean
force (MB-PMF), which effectively replaces the coveted PES.[38] One practical approach to building a CG model
is force matching (FM) or multiscale coarse-graining (MS-CG),[38−40] which projects the MB-PMF into the space of force fields defined
by the CG basis set. Both kernel-based and neural-network approaches
have recently been applied to solving the MS-CG problem.[30,41,42] Akin to the study of John and
Csányi,[30] we solve the MS-CG problem
using kernels on both two- and three-body interactions, but we project
the resulting decompositions onto computationally efficient tabulated
potentials. We discuss both parametrization strategies and structural
accuracy of the resulting condensed phase of liquid water.
Kernel-Based Regression for Force Fields
We first recapitulate
the terminology of a Gaussian process, keeping
in mind physical quantities relevant to liquids. A sample i in our data set will represent particle i and its environment, which we will specify later. A particle can
be an atom or a molecule.

We propose to learn a mapping $\mathcal{X}_i \to O_i$ between the representation of sample $i$, $\mathcal{X}_i$, and an observable, $O_i = O(\mathcal{X}_i)$. The observable may correspond to the sample’s energy or force, for instance. Training a kernel model is then equivalent to solving the set of linear equations $\mathbf{O} = \hat{K}\boldsymbol{\alpha}$, where the kernel function $\hat{K}_{ij} = K(\mathcal{X}_i, \mathcal{X}_j) = \mathrm{Cov}(O_i, O_j)$ measures the similarity between samples $\mathcal{X}_i$ and $\mathcal{X}_j$, and $\boldsymbol{\alpha}$ is the vector of weight coefficients. The latter is optimized by inverting the regularized problem
$$\boldsymbol{\alpha} = (\hat{K} + \lambda \hat{1})^{-1}\mathbf{O},$$
where $\lambda$ implements Tikhonov regularization. Prediction of the observable for a configuration $\mathcal{X}_*$ is then given by an expansion of the kernel evaluations on the training set
$$O_* = \sum_{i=1}^{N} K(\mathcal{X}_*, \mathcal{X}_i)\,\alpha_i,$$
summing over the $N$ training points.
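To make the regression step concrete, the following minimal NumPy sketch implements the training and prediction equations above for a scalar observable. The Gaussian kernel choice, the hyperparameter defaults, and all function names are ours for illustration only, not part of the VOTCA implementation.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # K(x_i, x_j) = exp(-|x_i - x_j|^2 / (2 sigma^2)) for rows of X1, X2.
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train(X, O, sigma=1.0, lam=1e-8):
    # Solve (K + lam * 1) alpha = O: Tikhonov-regularized inversion.
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), O)

def predict(X_star, X_train, alpha, sigma=1.0):
    # O_* = sum_i K(x_*, x_i) alpha_i over the N training points.
    return gaussian_kernel(X_star, X_train, sigma) @ alpha
```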
Particle-Decomposition Ansatz
The
above-mentioned formulation can, in principle, be applied to learn
local properties of a liquid, such as forces, $\mathbf{f}_i$. Reference high-level simulations provide the force of a particle embedded in its environment, represented by $\mathcal{X}_i$. A kernel machine can then be set out to learn the mapping $\mathcal{X}_i \to \mathbf{f}_i$. Learning this mapping directly is in general challenging, due to the interpolation size of $\mathcal{X}_i$. Instead, we aim at breaking down the problem to ease convergence of the kernel and avoid extrapolation.

We first limit the size of our particle environments or configurations, $\mathcal{X}_i$, by setting a cutoff, $r_{\mathrm{cut}}$. In other words, for a sample $i$ only the neighbors $j$ with $r_{ij} \le r_{\mathrm{cut}}$ are added to the environment $\mathcal{X}_i$. Note that this decomposition assumes that $r_{\mathrm{cut}}$ is large enough to account for the physical interaction range, which is not always possible, e.g., for unscreened long-range Coulomb interactions.

The environments $\mathcal{X}_i$ are further decomposed in body terms, $\mathcal{X}_i^{(b)}$, which consist of interparticle interactions: from pairwise, to triplets, all the way to fully many-body. In the following we limit our approach to the first two terms, though the method generalizes in a straightforward manner.

The potential benefit of the body expansion is its reasonable convergence, which helps to reduce the size of $\mathcal{X}_i^{(b)}$. We denote the body decomposition using a superscript: (2) and (3) for vertices of pair and triplet interactions. These interparticle interactions can be associated with a set of local forces, $\mathbf{f}^{(2)}$ and $\mathbf{f}^{(3)}$, which are a priori unknown. The present framework thus motivates several mappings: the global mapping $\mathcal{X}_i \to \mathbf{f}_i$ and the local mappings $\mathcal{X}_m^{(b)} \to \mathbf{f}_m^{(b)}$, where $\hat{K}$ and $\hat{k}^{(b)}$ denote the global and local kernels and $b$ runs over pairwise and triplet models.

While learning a local model is a more attractive option, we do not have direct access to the local forces $\mathbf{f}^{(2)}$ and $\mathbf{f}^{(3)}$. The particle-decomposition ansatz assumes that the force acting on a particle due to its environment can be approximated by a set of constitutive interparticle interactions
$$\mathbf{f}_i \approx \sum_{p=1}^{P} \mathbf{f}^{(2)}(\mathcal{X}_p^{(2)}) + \sum_{t=1}^{T} \mathbf{f}^{(3)}(\mathcal{X}_t^{(3)}),$$
where $p$ and $t$ run over $P$ pair and $T$ triplet interactions. Each pair interaction is between a pair of particles, $p = (ab)$, and each triplet is between three particles, $t = (abc)$, where $a$, $b$, and $c$ are the indices of particles in a pair or triplet; $o$ is the index of the central particle of configuration $i$, in a pair $p$ or triplet $t$. The body expansion and the indexing are illustrated in Figure 1. The total number of interparticle interactions for sample $i$ is thus $M = P + T$.
Figure 1
(left) Illustration of the body expansion of a configuration $i$ on pairs and triplets, where the reference force acts on the central particle. The two-body terms are shown by interparticle vectors (green lines) and result in the total pairwise force $\sum_p \mathbf{f}^{(2)}(\mathcal{X}_p^{(2)})$. A similar decomposition for triplets is used to fit the residual, i.e., total minus two-body, force. (right) Representations used for pairs and triplets. $a$, $b$, and $c$ are the particles of the pair or triplet, and $o$ is the central particle of the sample, which can be either $a$, $b$, or $c$. For triplets, two situations are possible: For obtuse triplets, with one edge larger than the cutoff, the force on the central particle $o = b$ is due to the reaction force on particle $a$. For acute triplets with all edges below the cutoff, the force on the central particle $o = a$ is a sum of two forces, which depend on all three angles, $\theta_a$, $\theta_b$, and $\theta_c$. For vector differences, we use the standard convention, $\mathbf{r}_{ab} = \mathbf{r}_b - \mathbf{r}_a$.
The particle-decomposition ansatz formally links the kernels $\hat{K}$, $\hat{k}^{(2)}$, and $\hat{k}^{(3)}$. We will
use the target properties available for the global kernel, $\hat{K}$, in order to infer an ML model for the local interactions. To this end, we concatenate a set of $N$ particle environments together with the corresponding set of $M$ local interactions,
$$\mathbf{f} = \hat{L}\,\mathbf{f}^{(b)},$$
where we introduce the mapping matrix $\hat{L}$. Its component, $\hat{L}_{im}$, is a 3 × 3 identity matrix, $\hat{1}$, if particle environment $i$ contains the pair or triplet interaction $m$, and a 3 × 3 zero matrix, $\hat{0}$, otherwise. $\hat{L}$ is the bookkeeping matrix that connects the local interactions with the right particle environments. It has dimension $3N \times 3M$ due to the three spatial dimensions.

We can now relate the particle-environment kernel $\hat{K}$ to its local counterpart, $\hat{k}$, by
$$\hat{K} = \hat{L}\,\hat{k}\,\hat{L}^{\mathsf{T}}.$$
Similar to $\hat{L}$, both $\hat{K}$ and $\hat{k}$ have a block matrix structure, where elements $\hat{K}_{ij}$ and $\hat{k}_{ml}$ are 3 × 3 matrices.

Training a model then corresponds to solving the equation $\mathbf{f} = \hat{K}\boldsymbol{\alpha}$ and taking advantage of the relationship between global and local kernels,
$$\boldsymbol{\alpha} = (\hat{L}\,\hat{k}\,\hat{L}^{\mathsf{T}} + \lambda \hat{1})^{-1}\mathbf{f},$$
with a regularization term $\lambda$ and a $3N$-dimensional vector of weight coefficients $\boldsymbol{\alpha}$.

Once trained, the ML model can be used for predictions of the local interactions, $\mathbf{f}^{(b)*}$. An analogous link between the covariance matrices of the global and local kernels for prediction only yields
$$\mathbf{f}^{(b)*} = \hat{k}^{*}\,\hat{L}^{\mathsf{T}}\boldsymbol{\alpha},$$
where $\hat{k}^*$ denotes the local kernel between training and test data. Note that the prediction for entire particle environments yields a new mapping matrix $\hat{L}^*$ for the test data, $\mathbf{f}^{*} = \hat{L}^{*}\,\hat{k}^{*}\,\hat{L}^{\mathsf{T}}\boldsymbol{\alpha}$.
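The bookkeeping can be illustrated with a short sketch that assembles $\hat{L}$ and solves the regularized problem above. The membership list is a hypothetical input format, and dense matrices are used only for clarity; a production code would exploit sparsity.

```python
import numpy as np

def build_mapping_matrix(membership, N, M):
    # Block (i, m) of the 3N x 3M matrix L is the 3x3 identity if particle
    # environment i contains local interaction m, and zero otherwise.
    # `membership` is a list of (i, m) index pairs (assumed input format).
    L = np.zeros((3 * N, 3 * M))
    for i, m in membership:
        L[3 * i:3 * i + 3, 3 * m:3 * m + 3] = np.eye(3)
    return L

def train_local(L, k_local, f, lam=1e-8):
    # alpha = (L k L^T + lam * 1)^(-1) f, with f the 3N-dim force vector.
    K = L @ k_local @ L.T
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), f)

def predict_local(k_star, L, alpha):
    # f^(b)* = k* L^T alpha: local forces for the test interactions.
    return k_star @ L.T @ alpha
```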
Representations
It is useful to describe local interactions in terms of internal coordinates, as they are invariant with respect to translations. For pairwise interactions we therefore use the interparticle vector of a pair $m$, $\mathbf{r}_m = \mathbf{r}_b - \mathbf{r}_a$, to represent a pair of particles $a$ and $b$. For triplet interactions we use the three interparticle vectors, $\mathbf{r}_{ab}$, $\mathbf{r}_{bc}$, and $\mathbf{r}_{ca}$.

When predicting a rotationally invariant property, such as energy, or when building a covariant kernel from a rotationally invariant one, it is sufficient to use a rotationally invariant representation. For a pair $m$ with particles $(a, b)$ we use the interparticle distance
$$\mathcal{X}_m^{(2)} = r_{ab},\qquad(10)$$
while for a triplet $m$ with particles $(a, b, c)$ we use a vector of three interparticle distances
$$\mathcal{X}_m^{(3)} = (r_{ab},\, r_{bc},\, r_{ca}).\qquad(11)$$
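The two descriptors translate into a few lines of code; the particle-position arguments are hypothetical inputs.

```python
import numpy as np

def pair_representation(r_a, r_b):
    # Eq 10: the interparticle distance r_ab.
    return np.linalg.norm(r_b - r_a)

def triplet_representation(r_a, r_b, r_c):
    # Eq 11: the three interparticle distances (r_ab, r_bc, r_ca).
    return np.array([np.linalg.norm(r_b - r_a),
                     np.linalg.norm(r_c - r_b),
                     np.linalg.norm(r_a - r_c)])
```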
Covariant,
Conservative Fields
If the target property is a vector, such as a force, we expect the kernel to behave like a second-rank tensor upon rotations of coordinates. Formally, this can be expressed as[26]
$$\hat{k}(\hat{R}\mathcal{X}_m,\, \hat{R}'\mathcal{X}_l) = \hat{R}\,\hat{k}(\mathcal{X}_m, \mathcal{X}_l)\,\hat{R}'^{\mathsf{T}}$$
for any two rotation matrices $\hat{R}$ and $\hat{R}'$. This condition imposes restrictions on the form of the kernel $\hat{k}$, forcing it to become covariant.

Apart from covariance, we are interested in conservative force fields, for which the total energy of a particle is conserved. Mathematically, the force field has to be curl-free or be a gradient of a scalar function (energy).

We now discuss three different approaches to construct a matrix-valued kernel, for which the target property rotates with the sample and is curl-free.
Explicit Rotations
We first propose to encode covariance in explicit rotations of individual local representations. To do this, we use the covariance of local kernels,
$$\hat{k}(\mathcal{X}_m, \mathcal{X}_l) = \hat{R}_m^{\mathsf{T}}\,\hat{k}(\hat{R}_m\mathcal{X}_m,\, \hat{R}_l\mathcal{X}_l)\,\hat{R}_l,$$
where for every local interaction $m$ we introduced a rotation matrix $\hat{R}_m$.

To relate all $M$ pairs or triplets with their local counterparts, we introduce a block-diagonal matrix of rotation matrices, $\hat{R} = \mathrm{diag}(\hat{R}_1, \ldots, \hat{R}_M)$. We can then link the $3N \times 3N$ dimensional global kernel, $\hat{K}$, with the $3M \times 3M$ dimensional local kernels, $\hat{k}^{(2)}$ and $\hat{k}^{(3)}$,
$$\hat{K} = \hat{L}\,\hat{R}^{\mathsf{T}}\,\hat{k}\,\hat{R}\,\hat{L}^{\mathsf{T}},$$
which is a variation of the relation above that includes rotations of individual local interactions. As such, training a kernel model amounts to the same modification of the training equation, and prediction becomes
$$\mathbf{f}^{(b)*} = \hat{R}^{*\mathsf{T}}\,\hat{k}^{*}\,\hat{R}\,\hat{L}^{\mathsf{T}}\boldsymbol{\alpha}.$$
We now choose rotation matrices such that the local kernel becomes invariant with respect to sample rotations, by rotating the individual pairs or triplets into a fixed axis system. In practice, we align pairs along the $z$-axis, and triplets are aligned in the $zy$ plane, as explained in the Supporting Information.

In the fixed frame, interaction representations become rotationally invariant, and we can use the representations given by eqs 10 and 11 and a standard Gaussian kernel to compare them:
$$\hat{k}(\mathcal{X}_m, \mathcal{X}_l) = \exp\!\left(-\frac{|\mathcal{X}_m - \mathcal{X}_l|^2}{2\sigma^2}\right)\hat{1}.$$
The covariance in this scheme is encoded in the explicit rotation matrix $\hat{R}$. In practice, $\hat{R}$ and $\hat{L}$ can be combined by replacing the unit 3 × 3 matrices in $\hat{L}$ by the corresponding rotation matrices.

Note that each local kernel element is matrix-valued. We can assume that all three components are independent. However, since such a kernel does not yield a conservative force, the ML model will have to learn it empirically. In the Supporting Information we derive an explicit expression that results in a conservative three-body kernel.
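A possible construction of the alignment rotations is sketched below, assuming Rodrigues' formula for the pair alignment and an azimuthal rotation about $z$ for the triplet. The paper's own construction is given in its Supporting Information, so this is only a plausible sketch.

```python
import numpy as np

def rotation_to_z(v):
    # Rotation matrix R such that R @ v points along +z (Rodrigues' formula).
    v = v / np.linalg.norm(v)
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(v, z)
    s, c = np.linalg.norm(axis), np.dot(v, z)
    if s < 1e-12:                      # already (anti)parallel to z
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    k = axis / s
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

def align_triplet(r_ab, r_ac):
    # Rotate a triplet so r_ab lies on z and r_ac falls in the zy plane.
    R1 = rotation_to_z(r_ab)
    u = R1 @ r_ac
    phi = np.arctan2(u[0], u[1])       # azimuth measured from the y-axis
    c, s = np.cos(phi), np.sin(phi)
    R2 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return R2 @ R1
```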
Integration over SO(3)
A covariant matrix-valued kernel can also be constructed by integrating a scalar kernel over all rotation matrices, effectively summing over all actions of the rotation group:[26,31]
$$\hat{k}^{(b)}(\mathcal{X}_m, \mathcal{X}_l) = \int \mathrm{d}\hat{R}\; \hat{R}\, k_b(\mathcal{X}_m,\, \hat{R}\mathcal{X}_l),$$
where $\hat{k}^{(b)}$ is the covariant 3 × 3 matrix-valued kernel, while $k_b$ represents a base scalar-valued kernel. Glielmo et al. derived an analytical solution for pairs, starting from a Gaussian kernel, $k_b^{(2)}(\mathbf{r}_m, \mathbf{r}_l) = \exp(-(\mathbf{r}_m - \mathbf{r}_l)^2/2\sigma^2)$, yielding a closed-form kernel proportional to the tensor product $\hat{\mathbf{r}}_m\hat{\mathbf{r}}_l^{\mathsf{T}}$ with a scalar prefactor expressed through $\alpha = (r_m^2 + r_l^2)/4\sigma^2$ and $\gamma = r_m r_l/2\sigma^2$, where $\hat{\mathbf{r}} = \mathbf{r}/r$.[26] Note that the covariance of $\hat{k}^{(2)}$ is encoded in the tensor product between the two interparticle vectors, $\hat{\mathbf{r}}_m\hat{\mathbf{r}}_l^{\mathsf{T}}$.
Hessian of the Scalar Kernel
An alternative statement consists in recognizing that the local forces are spatial derivatives of the potential energy of a respective pair or triplet, $\mathbf{f} = -\nabla E$. As such, a kernel between two forces can be expressed through the energies involved, and the spatial derivatives can be taken out of the scalar kernel function:
$$\hat{k}_{ml} = \sum_{\alpha\beta} \frac{\partial \mathcal{X}_{m\alpha}}{\partial \mathbf{r}_{o_m}}\, \frac{\partial^2 k(\mathcal{X}_m, \mathcal{X}_l)}{\partial \mathcal{X}_{m\alpha}\,\partial \mathcal{X}_{l\beta}} \left(\frac{\partial \mathcal{X}_{l\beta}}{\partial \mathbf{r}_{o_l}}\right)^{\!\mathsf{T}},$$
where we neglected the second term, which contains a first-order partial derivative of the kernel function, because it will be zero around training points. Here, $\alpha$ and $\beta$ run over the different components of the representation vector; $o$ is the central particle of the samples, as illustrated in Figure 1.

The Hessian-based kernel is proportional to the tensor product $(\partial \mathcal{X}_{m\alpha}/\partial \mathbf{r}_{o_m}) \otimes (\partial \mathcal{X}_{l\beta}/\partial \mathbf{r}_{o_l})$ and is therefore covariant, similar to the covariant kernel of the previous section. In addition, it is energy conserving, since it ensures a curl-free force field.

We now choose a Gaussian kernel as the scalar-valued kernel. For pairwise interactions, the representation (10) results in an energy-conserving force kernel:
$$\hat{k}^{(2)}_{ml} = \left(\frac{1}{\sigma^2} - \frac{(r_m - r_l)^2}{\sigma^4}\right) \exp\!\left(-\frac{(r_m - r_l)^2}{2\sigma^2}\right) \hat{\mathbf{e}}_m\hat{\mathbf{e}}_l^{\mathsf{T}},$$
where $\hat{\mathbf{e}}_m = (\delta_{o_m b} - \delta_{o_m a})\,\mathbf{r}_m/r_m$; the expression for $\hat{\mathbf{e}}_l$ can be obtained by exchanging $m$ with $l$. Note that the Kronecker deltas only change the sign of the kernel element, depending on whether the central particle of the sample, $o$, coincides with particle $a$ or $b$. The tensor product, $\hat{\mathbf{e}}_m\hat{\mathbf{e}}_l^{\mathsf{T}}$, is responsible for the covariance of the kernel, similar to the kernel obtained by integrating over SO(3). The scalar prefactor is, however, different.

For a three-body kernel, the representation (11) has three components, and the Hessian-based kernel becomes a sum over the three pair distances in a triplet:
$$\hat{k}^{(3)}_{ml} = \sum_{\alpha,\beta=1}^{3} \frac{\partial^2 k(\mathcal{X}_m^{(3)}, \mathcal{X}_l^{(3)})}{\partial \mathcal{X}_{m\alpha}\,\partial \mathcal{X}_{l\beta}}\; \hat{\mathbf{e}}_{m\alpha}\hat{\mathbf{e}}_{l\beta}^{\mathsf{T}},$$
with $\hat{\mathbf{e}}_{m\alpha} = \partial \mathcal{X}_{m\alpha}/\partial \mathbf{r}_{o_m}$. Its covariance and conservativeness are now explicit: it is a sum of three tensor products and three radial fields.
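For the pair case, the kernel element above can be evaluated directly. The sketch below assumes the Hessian of a Gaussian in the pair distance, with the signed unit vectors supplied by the caller; the default σ is a placeholder.

```python
import numpy as np

def hessian_pair_kernel(r_m, r_l, e_m, e_l, sigma=0.5):
    # Scalar part: d^2/dr_m dr_l of exp(-(r_m - r_l)^2 / (2 sigma^2)).
    u = r_m - r_l
    scalar = (1.0 / sigma**2 - u**2 / sigma**4) \
        * np.exp(-u**2 / (2.0 * sigma**2))
    # Covariance enters through the tensor product of signed unit vectors.
    return scalar * np.outer(e_m, e_l)
```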
Generalization to any descriptor and scalar kernel is given in the Supporting Information.

We now compare the performance of three ML models for Lennard-Jones (two-body) and Stillinger–Weber (three-body) fluids.
Two-Body Lennard-Jones Fluid
Simulation details of
the Lennard-Jones (LJ) fluid are given in
the Supporting Information. In Figure 2 we compare the learning curves of the integration, Hessian, and rotation kernels. The LJ force field is learned accurately and efficiently irrespective of the kernel. The ML model saturates at around 100 particle environments or configurations in the training set, corresponding to roughly 2900 reference pairwise interactions, with an error of ΔF ≈ 4 × 10⁻⁴ kcal/mol/Å per configuration. For rcut = 6.0 Å the average number of pairs per particle environment is 29, implying a mean error per pair of ΔF/√M ≈ 8 × 10⁻⁵ kcal/mol/Å. Here we assume that the M individual pair forces of each configuration i are independent and that their errors have the same magnitude, so the per-configuration error is reduced by a factor of √M. The per-pair error is close to the output precision of the reference forces in the simulation trajectory, 10⁻⁵ kcal/mol/Å.
There is virtually no difference between the performances of the three
different kernels, suggesting that they all encode the same amount
of physics relevant to this system—they are covariant and curl-free—which
is evident for radial fields.
Figure 2
Learning curves of the pair-force kernels applied
to the LJ fluid.
ΔF is the force error per configuration; N is the number of training configurations. The average
number of pairs per configuration is 29. (inset) Predicted force fields, f(2)(r), are virtually identical
to the original LJ reference. Each ML prediction is based on 400 training
configurations.
The prediction of the force field
projected on the pairwise distance r leads to a quantitative
recovery of the reference potential,
as shown in the inset of Figure 2 for N = 400 configurations.
Three-Body Stillinger–Weber Fluid
Next, we compare
the learning of the three-body forces of the Stillinger–Weber
(SW) fluid. Simulation details of the SW fluid are given in the Supporting Information. Note that we only compare
the Hessian and the rotation kernels, since an analytic expression
for the integration kernel exists only for the scalar case.[31] In Figure 3 we compare the learning curves for the Hessian (green),
rotation (blue), and conservative rotation (black) kernels. One can
clearly see that the nonconservative kernel performs significantly
worse than the Hessian kernel, but once we make it conservative, its
performance becomes comparable to that of the Hessian kernel.
Figure 3
Learning curves
of the three-body force kernels applied to the
SW fluid. ΔF is the force error per configuration; N is the number of training configurations. The average
number of triplets per configuration is 43 in all cases except the
last curve. The Hessian kernel with ordered representation and all
permutations has on average 52 triplets per particle, due to a different
decomposition of the configuration into three-body interactions.
In spite of the fact that the conservative rotation
and Hessian
kernels encode all physical symmetries of the force field, one can further improve the ML model. To do this, we recall that
the efficiency of the body decomposition relies, in part, on the interaction
cutoffs: only those triangles which have two edges smaller than the
cutoff contribute to the body expansion. If the central vertex of
the triangle, o, is linked to an edge longer that
the cutoff (e.g., bc in the obtuse triplet shown
in Figure ), the force
on this particle is only due to the shorter edge. In other words,
a representation that emphasizes the shorter edge would lead to a
faster learning. One can enforce this asymmetry by sorting the triplet
representation vector, eq , with respect to the edge lengths. Indeed, the corresponding
learning curve, Figure , orange, learns faster than the model with a simple descriptor.Similarly, for triangles with all three edges below the cutoff,
the force on every vertex is a sum of two forces originating from
the neighboring edges, as depicted for an acute triplet in Figure 1. Each force depends on the three angles of the triangle, and their “individual” representations differ only by permutations of elements of the representation vector, eq 11. Hence, the optimal representation for the total force on a particle should include three permutations of this representation. In fact, adding these permutations, in addition to ordering the descriptor, leads to an even better ML model, with the best learning curve shown in Figure 3 (red symbols). In other words, this representation captures well the internal decomposition of the three-body force on the three vertices of the triangle. This is the model we will use in the remainder of this work, referring to it as the Hessian kernel model. For this model, 600 training configurations result in an average error of 0.04 kcal/mol/Å per configuration, or 0.006 kcal/mol/Å per triplet, given an average of 52 triplets per configuration.
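A sketch of the ordered-plus-permuted descriptor is given below. Whether the three permutations are cyclic is our assumption, as the permutation set is not spelled out here.

```python
import numpy as np

def ordered_triplet_representations(x):
    # Sort the descriptor (eq 11) by edge length, then generate the three
    # cyclic permutations used to represent the total force on a vertex.
    x = np.sort(np.asarray(x, dtype=float))
    return [np.roll(x, k) for k in range(3)]
```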
Cutoffs and Switching Functions
To avoid numerical instabilities and ensure energy conservation
in an MD simulation, the short-range forces ought to smoothly decay
to zero at the cutoff distance, rcut.
This is indeed the case for the LJ force field learned above; see
the inset of Figure 2. If we, however, use the ML model for the SW fluid and predict forces for a triangle with fixed angles, the force shows some fluctuations at rcut, as shown in Figure 4a. This is due to the limited training set
size.
Figure 4
Comparison of the three-body Hessian kernel applied to the SW fluid,
without and with switching function for d = 0.1 and d = 0.4 Å. In each case the ML models were based on
600 configurations. (a) Force scan of particle a along r(3). (b) Learning curves for
the different ML models.
One way to correct for
these artifacts is to enforce smoothness via a switching function,[6] $f_{\mathrm{cut}}(r)$, which goes smoothly to zero in the transition region of $r_{\mathrm{cut}} - d < r < r_{\mathrm{cut}}$.

In the case of a standard force field, one would multiply the interparticle force with the cutoff function $f_{\mathrm{cut}}(r)$ with an appropriate transition region $d$. Applying $f_{\mathrm{cut}}(r)$ to the ML model means that each entry of the mapping matrix is multiplied by $f_{\mathrm{cut}}(r)$ for pairs and, for triplets, by the product of the switching functions of the two edges joined at the central particle. The same distance-dependent weight has to be applied to the mapping matrix of the training, $\hat{L}$, and prediction, $\hat{L}^*$. As such, pairs and triplets that are close to the cutoff distance contribute less to the GP regression, guaranteeing a smooth decay of the prediction to zero at $r_{\mathrm{cut}}$.

In Figure 4a we
show force cross sections of triplets trained on N = 600 configurations of the three-body SW fluid using the Hessian
kernel for different smoothing functions. One can see that incorporating fcut suppresses fluctuations around the cutoff.
This is also reflected in the learning curves, shown in Figure 4b. The lowest test error per particle environment is about ΔF = 0.03 kcal/mol/Å for N = 600 configurations, obtained with d = 0.4 Å, corresponding to a mean test error per triplet of about 0.004 kcal/mol/Å. Though the accuracy is not significantly improved for the SW liquid, using the switching function is essential for coarse-grained force fields, discussed in the Coarse-Graining section, to ensure that the mean force decays to zero at the cutoff.
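For concreteness, a cosine-shaped switching function of the kind commonly used with ML potentials is sketched below. The exact functional form used here is not recoverable from the text, so the cosine profile, as well as the triplet weighting over the two central edges, is an assumption.

```python
import numpy as np

def f_cut(r, r_cut=3.7, d=0.4):
    # 1 below r_cut - d, smooth cosine decay inside the transition region,
    # 0 beyond r_cut (assumed functional form, not the paper's equation).
    x = np.clip((np.asarray(r, dtype=float) - (r_cut - d)) / d, 0.0, 1.0)
    return 0.5 * (np.cos(np.pi * x) + 1.0)

def triplet_weight(r1, r2, r_cut=3.7, d=0.4):
    # Triplet entries of the mapping matrix get the product of the switching
    # functions of the two edges joined at the central particle.
    return f_cut(r1, r_cut, d) * f_cut(r2, r_cut, d)
```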
Covariant Meshing
The error of the three-body force
kernel predictions could be further
decreased by increasing the training set size. This, however, is not
possible due to the size of the kernel matrix, which reaches the computer
memory limits: While the maximum number of configurations N is relatively small (hundreds), they involve tens of thousands
of triplet interactions M. To bound the size of the
kernel, we propose to mesh the representation. To make meshing efficient,
we make use of the kernel covariance: All samples that are only rotated
with respect to each other are assigned to the same bin. In other
words, we mesh the rotationally invariant representation and supplement
it with the rotation matrices. Apart from reducing the size of the
kernel, meshing averages over forces, which, as we will see in the Coarse-Graining section, results in smoother coarse-grained force fields.

In practice, we precompute the
pair or triplet kernel k̂bins on
a fixed grid, limiting the number of kernel elements
to Mbins × Mbins for training. Due to the covariance of the force kernels,
one can choose an arbitrary reference orientation of the binned pair
and triplet representations. For the two-body kernel, we use a linear
binning with Mbins between rmin and rmax and an orientation
of the pair vector along the z-axis. For the three-body
kernel, we choose a uniform binning on a three-dimensional grid, as
described in the Supporting Information.

The binned pairs and triplets are rotated to the pairs and triplets of each sample with the help of the mapping matrix $\hat{L}$, where the index $i$ denotes the $N$ configurations and $m$ denotes the $M_{\mathrm{bins}}$ pairs or triplets in the set of binned representations. Each matrix element is a 3 × 3 matrix,
$$\hat{L}_{im} = \sum_{p_i} w_{p_i m}\, \hat{R}_{p_i}^{\mathsf{T}},$$
where $p_i$ denotes pairs or triplets of configuration $i$, and $\hat{R}_{p_i}$ is the rotation matrix that rotates pair or triplet $p_i$ to the fixed orientation of the binned representation $m$.

We apply a Gaussian smearing, $w$, so that a pair or triplet is related to several binned representations. For pairs,
$$w_{p_i m} = \exp\!\left(-\frac{(r_{p_i} - r_m^{\mathrm{bins}})^2}{2\sigma_w^2}\right),$$
where $r_{p_i}$ and $r_m^{\mathrm{bins}}$ are the pair distances of the actual and binned pairs. For triplets the weights are a product of three Gaussians over the three components of the representation vector. The learning and prediction equations can be readily derived from the training and prediction relations above and are given in the Supporting Information.

In Figure 5 we compare the meshed ML models for the LJ (two-body) and SW (three-body) liquids.
As expected, reducing the number of bins systematically saturates
the error to higher values, since it lowers the resolution of the
predicted force field. Covariant meshing does allow us to include
another 2 orders of magnitude in training data, but the high-resolution
1000-bin (LJ) or 9900-bin kernels (SW) do not result in smaller errors
than the original kernel. We also do not observe a faster rate of learning. We therefore conclude that the covariant
meshing is not needed to reduce the error for ML models of simple
systems. However, as stated above, machine learning of coarse-grained force fields requires much larger training set sizes. We will therefore
see that meshing is unavoidable and is, in fact, the key to building an accurate coarse-graining model.
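The smearing weights from the previous paragraphs translate directly into code; the smearing width sigma_w is an assumed hyperparameter, not a value from the text.

```python
import numpy as np

def pair_bin_weights(r, r_bins, sigma_w=0.05):
    # Gaussian smearing of an actual pair distance r over the binned
    # distances r_bins (assumed smearing width sigma_w).
    return np.exp(-(np.asarray(r_bins) - r) ** 2 / (2.0 * sigma_w ** 2))

def triplet_bin_weights(x, x_bins, sigma_w=0.05):
    # Product of three Gaussians over the three distances of the descriptor.
    # x has shape (3,); x_bins has shape (M_bins, 3).
    return np.prod(np.exp(-(np.asarray(x_bins) - x) ** 2
                          / (2.0 * sigma_w ** 2)), axis=1)
```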
Figure 5
(a) Learning curves of pair force kernels for the LJ fluid: comparison of test errors for the Hessian kernel with σ = 0.5 Å, without binning and with different numbers of bins from 200 to 1000. (b) Learning curves of three-body force kernels for the SW fluid: comparison of test errors for the Hessian kernel with switching function (d = 0.4 Å), without binning and with different numbers of bins from 3600 to 9900. The optimized hyperparameters are σ1 = 0.7 Å, σ2 = 0.7 Å, and σ3 = 1.4 Å.
Tabulated
Force Fields
We now use the ML force field to run MD simulations.
To balance
the flexibility of ML with the necessity for quick force evaluations,[31] we project the ML model on tabulated two- and
three-body potentials, as described in the Supporting Information.

Figure 6 compares the distribution functions of the original SW model with the tabulated kernel predictions of the three-body forces. All curves are virtually
identical, confirming that the kernel predictions projected on a tabulated
potential lead to the correct sampling of the canonical ensemble.
Note that the computational cost of these simulations is comparable
to the original three-body SW potential, but there is no restriction
on the functional form of the three-body potential, which in this
case exactly matches the SW potential. The performance on a 20-core node with MPI parallelization is about 130 ns/day for the SW potential,
about 140 ns/day for the tabulated kernel prediction with binning,
and about 144 ns/day for the tabulated kernel prediction without binning.
Evaluating the kernel prediction at each simulation time step would
be more computationally expensive by orders of magnitude.[43] While a kernel prediction at each MD time step
of an all-atom simulation is competitive with quantum-mechanical calculations,
the benefit of a coarse-grained simulation with respect to its atomistic
counterpart is not necessarily straightforward.
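A minimal sketch of the projection step, assuming a trained pair-force predictor and a plain two-column table format (the actual table formats expected by LAMMPS/VOTCA differ):

```python
import numpy as np

def tabulate_pair_force(predict_pair_force, r_min, r_cut, n_out, path):
    # Evaluate the trained ML pair force on a distance grid and write a
    # two-column table (r, f) for run-time interpolation by the MD code.
    r = np.linspace(r_min, r_cut, n_out)
    f = np.array([predict_pair_force(ri) for ri in r])
    np.savetxt(path, np.column_stack([r, f]), header="r f(r)")
```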
Figure 6
Comparison of MD simulation
of tabulated three-body kernel predictions
with the reference SW simulation. g(r) is the pair and p(θ) the angular distribution function. p(θ) is calculated using a cutoff
of 3.7 Å. We use a Hessian kernel with a switching function and d = 0.4 Å, 600 configurations without binning, and
10 000 configurations with 9900 bins.
Coarse-Graining
We move on to the challenging task
of learning a coarse-grained
(CG) force field for molecular liquids. The idea of coarse-graining
is to reduce the number of degrees of freedom by combining several
atoms into one CG particle. We assume that the position of the CG
particle $i$, $\mathbf{R}_i$, is its center of mass,
$$\mathbf{R}_i = \frac{\sum_{j \in i} m_j \mathbf{r}_j}{\sum_{j \in i} m_j},$$
where the sums run over the atoms $j$ assigned to bead $i$.

We focus on a one-bead CG model of liquid SPC/E water.[44] Water is a particularly interesting liquid, as it has a strong local orientational order and has therefore attracted significant attention also regarding CG modeling. It is well-known that especially for CG water models pair potentials alone fail to fully capture the structural correlations of the atomistic system. To do so, at least three-body interaction terms are required.[45−48]

Mapping the atomistic degrees of freedom to a CG representation
leads to a degeneracy: many fine-grained configurations map to a single
point in the conformational space of the CG model.[35] We will refer to the reference coarse-grained atomistic
forces as instantaneous collective forces (ICFs),
$$\mathbf{F}_i = \sum_{j \in i} \mathbf{f}_j,$$
where $\mathbf{f}_j$ are the atomistic forces of each atom $j$ assigned to bead $i$.

We now introduce the $3N$-dimensional block vector of the $N$ instantaneous collective forces of the $N$ CG particles, $\mathbf{F}$. Noid and co-workers showed that the mean force, $\langle \mathbf{F} \rangle_{\mathbf{R}}$, gives the required canonical distribution of coarse-grained variables.[38] The subscript $\mathbf{R}$ denotes the canonical average over all fine-grained configurations that map to a specific CG configuration $\mathbf{R}$. Hence, one wants to learn the mapping between $\mathbf{R}$ and the mean force (MF), not the ICF. In force matching, the ensemble averaging is performed implicitly when solving the linear regression problem.[39,49,50] Alternatively, the mean force can be obtained explicitly by performing additional atomistic simulations with constrained CG coordinates.[30]

Here, we pursue a different approach and average ICFs using covariant meshing, introduced above. In fact, a representation that depends exclusively on the coarse-grained variables will automatically average the ICF into the MF. In practice, for the two-body kernel, we bin the center-of-mass distances. The resulting ML model should be identical to the two-body force matching (FM) results.[48,50]

Figure 7 illustrates
the CG two-body force kernel predictions for 10⁴ configurations with 1000 bins and a switching function with d = 0.4 Å. The use of switching is essential here: the MD simulations of ML models with and without switching are compared in the Supporting Information. We also show an average over 10 independent training sets, as is often done in minibatch training in deep learning.[51] In all cases the binned kernel predictions are close to the FM results. Averaging over the resulting output tables helps to reduce the error by smoothing the force curves, but is not critical for MD simulations, where the stochastic integrators of thermostats compensate for small force fluctuations.
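The CG mapping and ICF evaluation can be sketched as follows, assuming a bead_index array that assigns each atom to a bead (a hypothetical input format):

```python
import numpy as np

def map_to_cg(positions, forces, masses, bead_index, n_beads):
    # Bead positions as centers of mass; ICFs as plain sums of the
    # atomistic forces of the atoms assigned to each bead.
    R = np.zeros((n_beads, 3))
    F = np.zeros((n_beads, 3))
    M = np.zeros(n_beads)
    for j, i in enumerate(bead_index):
        R[i] += masses[j] * positions[j]
        F[i] += forces[j]
        M[i] += masses[j]
    return R / M[:, None], F
```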
Figure 7
CG SPC/E water. Two-body
force matching is compared to a kernel
prediction based on 10⁴ configurations with 1000 bins and
an average over 10 independent kernel predictions. Optimized hyperparameter
σ = 0.5 Å; output on a grid up to rcut = 6 Å.
We now apply this averaging
scheme to the three-body forces and
train ML on the residual force between target and
pairwise model:[48]
$$\Delta\mathbf{F}_i = \mathbf{F}_i - \mathbf{F}_i^{\text{2-body}},$$
where $\mathbf{F}_i$ is the ICF on CG particle $i$ as defined above and $\mathbf{F}_i^{\text{2-body}}$ is the CG pairwise force.[48,50] We should make clear that $\mathbf{F}_i^{\text{2-body}}$ is the pairwise force from force matching, as we want to compare learning the same residual forces as in our previous FM parametrization. To learn these residual forces, we use the binned three-body Hessian kernel including all permutations and a switching function with d = 0.4 Å. We use 9900 bins and hyperparameters of σ1 = 0.7 Å, σ2 = 0.7 Å, and σ3 = 1.4 Å. We then project the ML models onto tabulated force fields as described above, with a total Nout = 27 900 table entries. Finally, we average over the 10 independent table predictions.

In Figure 8, we
show the liquid water distribution functions for three different simulations:
the original atomistic simulations, the force matching, and the Hessian
kernel model. All schemes use a cutoff for the three-body forces of rcut = 3.7 Å and employ the same FM two-body
forces with a cutoff of rcut = 12 Å.
Figure 8
CG SPC/E
water system. Comparison of MD simulations for the atomistic
simulations, coarse-grained model using force matching to parametrize
the SW model, and Hessian kernel averaged over 10 predictions with
9900 bins. Both pair, g(r), and
angular, p(θ), distribution functions are shown.
The ML model is very close to the atomistic curve
and reproduces
the second peak of the RDF better than the FM parametrization of the
SW potential. The angular distribution is also more accurate than
the FM result. This stems from the additional flexibility of the fitting
procedure: the FM fit imposes a fixed functional form on the radial
dependence of the three-body force, while the ML model is fully flexible.
The average pressure at the CG level is slightly too high at the atomistic
density for both the ML and FM models, 4 and 3 kbar, respectively.
We showed that one can correct for this by adding a small linear perturbation
to the pair force without altering the structural properties.[48]
Software
All simulations
are performed with the LAMMPS package.[52] A custom implementation of the features presented
here can be found online.[53] It includes
the pair styles sw/table and 3b/table as user packages. The coarse-graining
procedure relies on the open-source software package VOTCA,[50] also available online.[54]
Conclusions
We have described several practical
ingredients of the kernel-based
regression for classical simulations of liquids. Our approach is to
first train a machine learning model by using body decomposition and
then project this model onto tabulated potentials.

First, we
explain in detail the general idea of the decomposition
ansatz and use it to link the accessible particle-environment kernel
with the local interaction kernels. We then concentrate on covariant
and conservative kernels for pair and triplet interactions and compare
three approaches: an analytic integration over the rotation group,
the Hessian of a scalar kernel, and a kernel that makes predictions
in a fixed axis system. The three kernels perform equally well on
a two-body Lennard-Jones fluid, quickly converging to the precision
of the reference data. Their limitations become evident when learning
the three-body Stillinger–Weber (SW) fluid, where the lack
of energy conservation is detrimental to the learning performance.
Overall, we find that the best performing Hessian kernel is also the
easiest to adapt and implement.

Several technical points help
us to further improve the learning
performance. The use of a switching function close to the interaction
cutoff reduces noisy features and is a useful addition for coarse-grained
interactions. A postaveraging procedure over multiple ML models smoothens
the resulting force fields. Covariant meshing of training configurations
helps to increase the training set size. We present and implement
these linear models in the general framework provided by the decomposition
ansatz, which also offers the potential to build hybrid models that
would learn only parts of the local interaction.

Last, the extension to coarse-grained liquids has proven to be the
most challenging. By switching from the potential energy surface to
the many-body potential of mean force, coarse-graining introduces
degeneracy in the reference configurations that map to a single point
at the coarse-grained level. This introduces noise in the learning
procedure. Rather than relying on extra simulations with constrained
coarse-grained degrees of freedom, we propose to use covariant meshing
to average the instantaneous force into a mean force. Using this approach
we recover accurate two- and three-body structural distribution functions
of the atomistic liquid water. The covariant meshing in combination
with tabulation is a promising technique to construct computationally
cheap and flexible two- and three-body models for a range of CG applications.

All machine learning models and coarse-graining procedures are
integrated in the VOTCA[54] and LAMMPS[53] packages which are publicly available.