Literature DB >> 34930005

Energy Derivatives in Real-Space Diffusion Monte Carlo.

Jesse van Rhijn¹, Claudia Filippi¹, Stefania De Palo², Saverio Moroni².

Abstract

We present unbiased, finite-variance estimators of energy derivatives for real-space diffusion Monte Carlo calculations within the fixed-node approximation. The derivative dλE is fully consistent with the dependence E(λ) of the energy computed with the same time step. We address the issue of the divergent variance of derivatives related to variations of the nodes of the wave function both by using a regularization for wave function parameter gradients recently proposed in variational Monte Carlo and by introducing a regularization based on a coordinate transformation. The essence of the divergent variance problem is distilled into a particle-in-a-box toy model, where we demonstrate the algorithm.

Entities: Chemical

Year: 2021 PMID： 34930005 PMCID： PMC8757439 DOI： 10.1021/acs.jctc.1c00496

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Variational Monte Carlo (VMC) and diffusion Monte Carlo (DMC) are numerical stochastic approaches based on a real-space representation of a correlated trial wave function to study many-body quantum systems, including electronic structure problems. VMC calculates expectation values of quantum operators on the trial function, which in turn is optimized via minimization of a suitable cost function such as the variational energy. DMC further improves the VMC results through a stochastic implementation of the power method, which projects the lowest-energy component of the trial function. Its accuracy, in the fixed-node (FN) approximation almost invariably adopted to avoid the sign problem, is ultimately limited by the error in the nodal surface of the trial function.[1] In the past decade, the efficient calculation of analytic energy derivatives,[2,3] leveraging modern optimization methods,[4−6] spawned impressive progress in both accuracy and scope of the VMC method.[7−10] DMC largely benefits from advances in VMC because improved trial functions tend to have better nodes. However, it would be desirable to have efficient and unbiased estimators of derivatives in DMC as well to perform such tasks as the direct optimization of the nodal surface or DMC structural relaxation. This is still an open issue, with the latest developments featuring uncontrolled approximations and/or a very low efficiency.[11−13] Here, we present an algorithm to calculate unbiased energy derivatives in FN-DMC with finite variance and demonstrate it on a model where the effect of the nodes is dominant. In the Supporting Information, we also include an application to the lithium dimer.

Energy Derivatives

In both VMC and DMC, the energy is calculated aswhere R represents the coordinates of all the particles, EL(R) = HΨ(R)/Ψ(R) is the local energy of the trial function Ψ(R), and P(R) is proportional to the underlying probability distribution: in VMC, P(R) = Ψ2(R), and in DMC, P(R) = Ψ(R)Φ(R) with Φ(R) being the FN solution. The derivative with respect to a parameter λ is The variance of this naïve estimator is 0 if both Ψ and its derivative dλΨ are exact; however, for an approximate trial function, EL(R) diverges at the nodes as 1/d(R), where d = |Ψ|/∥∇Ψ∥, and the variance diverges as well.[14,15]

Regularized Estimators

In VMC, this problem was fully solved by Attaccalite and Sorella[14] with a reweighting scheme, hereafter dubbed AS, whereby one samples the square of a modified trial function Ψ̃ which differs from Ψ only for d smaller than a cutoff parameter ϵ and stays finite on the nodal surface of Ψ. A similar sampling scheme was proposed by Trail.[16] The AS estimator has the same average of the bare estimator of eq for any value of ϵ and finite variance. An alternative regularized estimator, recently proposed by Pathak and Wagner,[15] simply consists in multiplying the term in brackets of eq by the polynomial fϵ(x) = 7x6 – 15x4 + 9x2, with x = d/ϵ, whenever x < 1. This estimator, hereafter dubbed PW, has finite variance for any finite ϵ and a bias which vanishes as ϵ → 0. The polynomial is chosen in such a way to remove from the bias the linear term, shown[15] to be ∝∫01(fϵ – 1)dx, and to be continuously differentiable in (0, 1). The odd parity of fermionic wave functions near a node then implies a cubic leading term in the bias. Since various values of ϵ can be used in the same simulation, the bias can be eliminated at no cost by extrapolation. The PW estimator has been proposed for parameter gradients in the VMC optimization of Ψ, but it is equally applicable to VMC interatomic forces and, as will be shown, to generic derivatives in DMC. We introduce a third regularized estimator, which we denote as “warp” by analogy with the space–warp transformation of ref (17) devised to reduce the statistical noise of the forces. There, as a nucleus is displaced, a transformation is applied to the coordinates of nearby electrons, in such a way to maintain the electron–nucleus distances approximately constant. Here, the goal is to maintain constant the value of d(R) when the nodal surface is displaced by a variation of the parameter λ. In this way, the diverging term in the local energy does not change, and the variance of the derivative is finite. The warp transformation, illustrated in Figure , is defined aswhere primed quantities are calculated for the value λ′ of the parameter, n′ is the unit vector in the direction of ∇Ψ′(R), and u(d) is a cutoff function with support [0, ϵ] which decreases smoothly from 1 to 0, restricting the warp transformation to a region close to the nodal surface. We use the quintic polynomial with zero first and second derivatives at the boundaries of the support.

Figure 1

Schematic picture of the warp transformation. The curve S is the nodal surface for the value λ0 of the parameter λ. The value of d for the current configuration R is pictorially represented as the distance from S (this is strictly true only if Ψ is linear, i.e., close enough to the node). When a variation of λ from λ0 to λ′ shifts S to S′, the value of d changes to d′. Hence, we displace R by an amount Δ = d – d′ to R̅ in the direction of sign(Ψ′)∇Ψ′(R) so that the value of |Ψ′(R̅)|/∥∇Ψ′(R̅)∥ is approximately equal to d. For a finite increment of λ, the energy iswhere J = detJ = det∂R̅/∂R is the Jacobian of the transformation. For the practical implementation, we formulate a differential expression of the finite energy difference E′ – E so that no actual transformation is needed: the analytic derivative dλE, calculated at the value of the parameter λ = λ0, is This warp regularized estimator has finite variance and no bias for any value of ϵ (see the Supporting Information). Note that all the functions in eq are evaluated at λ = λ0, where R̅ = R and J = 1. Therefore, the warp transformation only contributes to the estimator through the derivatives ∂λR̅|λ and ∂λln J|λ, while the sampling is done over one and the same distribution P(R) for any parameter we may vary. Furthermore, for λ = λ0, the cofactors of the Jacobi matrix J are M = δ, and the seemingly awkward derivative of the Jacobian greatly simplifies, ∂λln J|λ = ∑M∂λJ|λ = ∑∂λJ|λ, so that the implementation of eq is not overly complicated. In particular, most of the derivatives needed are already present—or very similar to those already present—in VMC codes with analytic derivatives for structural and full variational optimization. The only exceptions are the off-diagonal components of the Hessian ∂2Ψ/∂R∂R and their derivatives with respect to λ, which contribute to ∂λJ. These extra derivatives are only needed when the nodal distance is smaller than the cutoff ϵ, that is, for a very small fraction of the sampled configurations. Furthermore, we will show (heuristically) that the bias incurred by neglecting these terms can be extrapolated out at no cost.

Variational Monte Carlo

Before addressing the derivatives in DMC, we compare the three regularized estimators PW, AS, and warp in VMC. To this purpose, it is expedient to consider a system stripped of all complexities of external and interparticle potentials so that we can focus exclusively on the divergence of the local energy at the nodal surface. Our toy model is a free particle in an elliptic box with hard walls, meant to represent the configuration of a generic system within a nodal pocket. Atomic units are used throughout. We choose Ψ = Ψ0(x,y) = a2 – x2/C – y2/(C – 1) with C = [cosh(1)]2 = 2.3810978, which is positive inside the ellipse, vanishes at the border, and is not the true ground state. Therefore, the ellipse is defined through the wave function, and we take the derivative with respect to the parameter a, which changes the size of the ellipse at constant eccentricity. The average and variance of the various estimators, calculated by quadrature, are shown in Figure . None of the regularized estimators entails uncontrolled approximations. In particular, although there is no rigorous way to establish the range where the leading correction in ϵ is sufficient, or the number of powers in ϵ needed over an arbitrary range, PW can be accurately extrapolated to the unbiased result. Here, the bias of the PW estimator has a leading contribution of ϵ2 because Ψ0 does not have odd parity across the node. The second-order bias can be removed with a different choice of the polynomial, for example, f̃ϵ(x) = 60x2 – 200x3 + 225x4 – 84x5, at the expense of a larger variance for the given value of the cutoff ϵ. The right panel shows that the relative efficiency may depend on the system at hand; in this example, it varies over the range of a considered.

Figure 2

VMC calculations of dE, with integrals performed by quadrature. Left panel: derivative at a = 1 obtained with the warp, PW, and AS estimators as a function of the cutoff ϵ, compared to the “exact” result dEfit, defined as the derivative of a fit to E(a) calculated separately for several values of a. AS and warp, shown here only near ϵ = 0.2, are unbiased; PW can be extrapolated to the unbiased result either including only the leading term in ϵ, here ϵ2, on a sufficiently small range (dashed line) or using a sufficiently large number of terms on an extended range (solid line). Middle panel: variance (in a logarithmic scale) of the various regularized estimators as a function of the cutoff. As ϵ vanishes, all schemes regress to the infinite variance of the bare estimator. Right panel: variance (in a logarithmic scale) of the various regularized estimators as a function of the parameter a. For the warp and AS estimators, ϵ = 0.2; for PW, data are reported for ϵ = 0.0125, 0.025, 0.05, 0.15, and 0.2 in the order of decreasing variance. Common data in the middle and the right panels are circled. The bias of the warp estimator when the off-diagonal elements of the Hessian are neglected is compared to the bias of the PW estimator in Figure . Since we need a non-diagonal Hessian to start with, we consider (i) a rotated, more eccentric ellipse with a further non-symmetrical distortion of the wave function, and (ii–iv) the positive lobe of a wave function limited to half of the original ellipse by the nodal line y + sin(αx) = 0, with α = 0.5, 1, and 1.5 (see the contour plots in Figure ). The derivative is taken with respect to a for Ψ1 and with respect to α for Ψ2–Ψ4. Within this (very limited) set of test cases, the bias is smaller and less system-dependent for the warp than for the PW estimator. The important result is that neither involves uncontrolled approximations as both can be extrapolated to the unbiased result in a single run. We have also verified that the warp estimator with the full Hessian is unbiased for finite ϵ.

Figure 3

Bias of the VMC derivative for PW and warp with approximate Hessian, calculated by quadrature for various wave functions Ψ1–Ψ4. In each case, we show a contour plot of the normalized Ψ with level lines from 0 in steps of 0.2 (the contour plot of Ψ0 in the top left inset defines the (x, y) scale). The colored labels near each curve indicate the powers in ϵ needed to extrapolate to the unbiased value with a five-digit accuracy. The bias of PW for Ψ2–Ψ4 has a leading term ϵ3 because of the use of the modified polynomial f̃ϵ. Empirically, we see that the warp estimator has a cubic leading term in all cases.

Diffusion Monte Carlo

We now consider the derivative in DMC. The FN-DMC algorithm is a branching random walk of many weighted walkers, generated by a short-time approximation G(R′,R) to the importance-sampled Green’s function, which asymptotically samples the distribution P(R) = Ψ(R)Φ(R).[1] The problem with the derivative estimator, eq , is the presence of the logarithmic derivative of P(R), which is not a known function of R. However, P is the marginal distribution of the joint probability density Pjoint of the whole random walk, which does have an explicit expression as a product of Green’s functions,where P0 is the (largely arbitrary) probability distribution of the initial configuration R0. Therefore, it is sufficient to consider the estimator of dλE in eq as an average over the whole trajectory of the random walk, rather than over the current configuration, to bring an explicitly known probability distribution to the fore.[13] This is similar to the calculation of forces in path integral Monte Carlo.[18] In practice, the inclusion of the entire trajectory in the estimator is not necessary. As shown in ref (13), the logarithmic derivative of the DMC density distribution P in the estimator of eq can be replaced with the summationover the last k steps of the random walk, with R being the current configuration R of eq . The omitted term,[13] ⟨[EL(R) – E]dλ ln P(R)⟩, vanishes for sufficiently large k because EL(R) and dλ ln P(R) become statistically independent variables and ⟨EL – E⟩ = 0. In eq , G(R′,R) is the transition rule from R to R′ of the random walk. It includes a Metropolis test to reduce the time-step error;[1] therefore, R′ is the configuration proposed when the walker is at R, and the next configuration R is R′ or R if the move is accepted or rejected, respectively. Note that in the formal expression of Pjoint, eq , the arguments of Green’s functions are integrated over, whereas in the contribution to the estimator of the derivative, eq , they are the particular values of the particles’ coordinates effectively sampled by the random walk. Correspondingly, the actual value taken by G(R′, R) iswhere T is the a priori transition probability, p is the probability of accepting the move, and W is the branching factor (see below). The inclusion of rejected configurations in eq and of the factor p or 1 – p in eq in the derivative of the full Green’s function is instrumental to obtain an estimate of dλE completely consistent with the DMC energy E(λ) calculated at the same time step. Their omission still can give an unbiased result in the limit τ → 0, but it may cause an unacceptably large time step error on the derivative.[13] The functions T, p, and W are standard[19]Here, τ is the time step, V = ∇lnΨ is the so-called velocity, and is the damping factor of its divergence near the nodes; the logarithm of the branching factor is also damped at the nodes, S(R) = [Eest – EL(R)]F(R) – ln(N/N0), where Eest is the best current estimate of the energy and N and N0 are the current and the target number of walkers. The presence of Eest, the Monte Carlo estimate of E, in the branching term S implies that the calculation of dλE includes a contribution proportional to dλE itself, This does not require prior knowledge of the result: we can calculate the factor F̅ and (dλE)0, the derivative when the contribution of eq is omitted, and combine them to get the unbiased result as dλE = (dλE)0 + F̅dλE, or dλE = (dλE)0/(1 – F̅). The main technical complication in DMC is the need to store a few quantities for each derivative and for each value of ϵ over the last k steps of each walker, namely, dλ ln G and ELdλ ln G, to implement eq with the probability distribution P of eq and F and ELF to implement eq . The AS regularized estimator has been applied to approximate DMC forces in refs (13) and (20). However, it pushes a finite density of walkers on the nodes, which is presumably not optimal in DMC. Furthermore, unlike in VMC, it requires[13] an extrapolation to ϵ → 0 which—at difference with the PW and warp estimators—cannot be done in a single run. Therefore, for the DMC derivatives, we consider only the PW and the warp estimators. For the former, we insert eq into eq and multiply each term of the resulting summation by a polynomial fϵ calculated at the appropriate configuration. For the latter, we just insert eqs in 5. For the warp estimator, the argument of the Jacobian needs some care: we evaluate ∂λ ln J in the proposed configuration R′ for both accepted and rejected moves; alternatively, we can include ∂λ ln J only for the accepted moves, provided the warp transformation is not considered in the derivatives at R′ when the move is rejected. We present results of DMC simulations with the wave function Ψ0, time step τ = 0.1, and target number of walkers N0 = 100. We have verified that in the limit τ → 0, we recover the analytic results[21] for the ground-state energy, E = 2q/a2 with q = 0.825352549, within a statistical error of less than 1 part in 10,000. To this level of accuracy, the population control bias[1] is negligible. Figure exposes the drawback of the bare estimator: the probability distribution p(ξ) for the block averages of the derivative features a right heavy tail, consistent with the expected[11,22] leading decay ∝|ξ – ξ0|–5/2. In the data trace, shown in the inset, heavy tails result in large spikes that would mar the smooth convergence of structural or variational optimization. Meaningful averages and statistical uncertainties of heavy-tailed distributions with known tail indices can be computed with a tail regression analysis.[23] This technique, however, requires a heavy post-processing not very practical for large-scale applications. The regularized estimators PW and warp, instead, have nearly Gaussian distributions amenable to standard statistical analysis with significantly smaller statistical errors and, most importantly, no large spikes in the data trace.

Figure 4

Histogram of 46,000 block averages, each of which taken over 10,000 steps of 100 walkers in a DMC calculation of dE|. The warp and PW estimators (with ϵ = 0.2 and 0.0125, respectively) have nearly Gaussian distributions. The bare estimator has a heavy-tailed distribution; the largest value in the present sample exceeds x = 10. Inset: the data trace of the first 1000 block averages. The central result of this work is shown in Figure . We calculate the energy E and its derivative dE for a set of values of a and compare the DMC derivatives with the derivative of a fit to the DMC energies. All the estimators (bare, extrapolated PW, and warp) are unbiased, which demonstrates the correctness of the proposed algorithm. For comparison, the variational drift-diffusion (VD) approximation of ref (13) gives for a = 1 a bias of ∼0.2, twice the full scale of the right panel, and it gets even worse for smaller time steps (although the VD approximation is devised to exploit good wave functions, while our Ψ0 is poor on purpose to test the unbiased estimators).

Figure 5

DMC calculations of dE with τ = 0.1. Left panel: the warp and PW results compare favorably with the “exact” result, defined as the derivative dEfit of a fit to DMC calculations of E(a). The analytic result dE = −4q/a3 differs from dEfit because the latter has a finite time step error. Right panel: the difference of the calculated derivatives with dEfit is shown on an expanded scale by subtracting the latter (hence, the solid black line is minus the time step error of dEfit). The PW derivatives are extrapolated to ϵ → 0. We also include the bare-estimator derivatives, with averages and statistical uncertainties obtained with the tail regression estimator analysis toolkit made available in ref (23). Small horizontal shifts are applied to the same—a data for clarity. For given a, the same run is used for all the estimators. Therefore, the statistical error is a direct measure of the square root of their relative efficiency. The statistical errors of the bare, PW, and warp estimators, averaged over the values of a shown in Figure , are in the ratio of 4.9:2.4:1. These figures may belittle the PW estimator somewhat because in this particular example, a large quadratic bias needs to be eliminated by extrapolation, but they convey the relevant message that both PW and warp are significantly more efficient than the bare estimator.

Conclusions

In summary, we have presented an algorithm to calculate unbiased, finite-variance derivatives in DMC. The estimate of the derivative with respect to a given parameter is fully consistent with the dependence on that parameter of the FN energy, calculated with the same time step. The tail regression statistical analysis[23] can cope with the problem of the infinite variance of the bare estimator. Alternatively, and more efficiently, both the recently proposed PW regularization[15] and the warp regularization introduced in this work can be used to good effect to eliminate the divergence of the variance.

15 in total

1. Weak binding between two aromatic rings: feeling the van der Waals attraction by quantum Monte Carlo methods.

Authors: Sandro Sorella; Michele Casula; Dario Rocca
Journal: J Chem Phys Date: 2007-07-07 Impact factor: 3.488

2. Alternative sampling for variational quantum Monte Carlo.

Authors: J R Trail
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2008-01-08

3. Heavy-tailed random error in quantum Monte Carlo.

Authors: J R Trail
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2008-01-08

4. Alleviation of the Fermion-sign problem by optimization of many-body wave functions.

Authors: C J Umrigar; Julien Toulouse; Claudia Filippi; S Sorella; R G Hennig
Journal: Phys Rev Lett Date: 2007-03-15 Impact factor: 9.161

5. Stable liquid hydrogen at high pressure by a novel Ab initio molecular-dynamics calculation.

Authors: Claudio Attaccalite; Sandro Sorella
Journal: Phys Rev Lett Date: 2008-03-17 Impact factor: 9.161

6. Chaotic versus nonchaotic stochastic dynamics in Monte Carlo simulations: a route for accurate energy differences in N-body systems.

Authors: Roland Assaraf; Michel Caffarel; A C Kollias
Journal: Phys Rev Lett Date: 2011-04-11 Impact factor: 9.161

7. Tail-regression estimator for heavy-tailed distributions of known tail indices and its application to continuum quantum Monte Carlo data.

Authors: Pablo López Ríos; Gareth J Conduit
Journal: Phys Rev E Date: 2019-06 Impact factor: 2.529

8. Simple formalism for efficient derivatives and multi-determinant expansions in quantum Monte Carlo.

Authors: Claudia Filippi; Roland Assaraf; Saverio Moroni
Journal: J Chem Phys Date: 2016-05-21 Impact factor: 3.488

9. QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion quantum Monte Carlo.

Authors: P R C Kent; Abdulgani Annaberdiyev; Anouar Benali; M Chandler Bennett; Edgar Josué Landinez Borda; Peter Doak; Hongxia Hao; Kenneth D Jordan; Jaron T Krogel; Ilkka Kylänpää; Joonho Lee; Ye Luo; Fionn D Malone; Cody A Melton; Lubos Mitas; Miguel A Morales; Eric Neuscamman; Fernando A Reboredo; Brenda Rubenstein; Kayahan Saritas; Shiv Upadhyay; Guangming Wang; Shuai Zhang; Luning Zhao
Journal: J Chem Phys Date: 2020-05-07 Impact factor: 3.488

10. TurboRVB: A many-body toolkit for ab initio electronic simulations by quantum Monte Carlo.

Authors: Kousuke Nakano; Claudio Attaccalite; Matteo Barborini; Luca Capriotti; Michele Casula; Emanuele Coccia; Mario Dagrada; Claudio Genovese; Ye Luo; Guglielmo Mazzola; Andrea Zen; Sandro Sorella
Journal: J Chem Phys Date: 2020-05-29 Impact factor: 3.488