
Accelerated Convergence of Contracted Quantum Eigensolvers through a Quasi-Second-Order, Locally Parameterized Optimization.

Scott E Smart1, David A Mazziotti1.   

Abstract

A contracted quantum eigensolver (CQE) finds a solution to the many-electron Schrödinger equation by solving its integration (or contraction) to the two-electron space─a contracted Schrödinger equation (CSE)─on a quantum computer. When applied to the anti-Hermitian part of the CSE (ACSE), the CQE iterations optimize the wave function, with respect to a general product ansatz of two-body exponential unitary transformations that can exactly solve the Schrödinger equation. In this work, we accelerate the convergence of the CQE and its wave function ansatz via tools from classical optimization theory. By treating the CQE algorithm as an optimization in a local parameter space, we can apply quasi-second-order optimization techniques, such as quasi-Newton approaches or nonlinear conjugate gradient approaches. Practically, these algorithms result in superlinear convergence of the wave function to a solution of the ACSE. Convergence acceleration is important because it can both minimize the accumulation of noise on noisy intermediate-scale quantum (NISQ) computers and achieve highly accurate solutions on future fault-tolerant quantum devices. We demonstrate the algorithm, as well as some heuristic implementations relevant for cost-reduction considerations, comparisons with other common methods such as variational quantum eigensolvers, and a Fermionic-encoding-free form of the CQE.


Year:  2022        PMID: 36048172      PMCID: PMC9476659          DOI: 10.1021/acs.jctc.2c00446

Source DB:  PubMed          Journal:  J Chem Theory Comput        ISSN: 1549-9618            Impact factor:   6.578


Introduction

The contracted Schrödinger equation (CSE)[1−3] describes the projection of the molecular Schrödinger equation for an N-electron system onto a two-electron space, which generates the stationary-state condition of the two-electron reduced density matrix (2-RDM), instead of the wave function. Satisfaction of the anti-Hermitian part of the contracted Schrödinger equation (ACSE) by a quantum state is equivalent to its invariance, with respect to all infinitesimal two-body unitary transformations.[4−6] Solution of the ACSE for the 2-RDM was initially challenging because the equation is dependent on both the 2-RDM and three-electron RDM (3-RDM), making it indeterminate without additional information. However, the ACSE has been practically solved by reconstructing the 3-RDM by its cumulant expansion as a functional of the 2-RDM.[4,5,7,8] While the ACSE does not strictly imply the CSE,[9] the accuracy of its solution has been observed to approach that of full configuration interaction in the absence of reconstruction approximations.[4,10−12] The ACSE has been applied to computing strongly correlated ground and excited states in both chemical reactions and conical intersections.[12−15] Recently, the ACSE has been solved on quantum devices with applications to hydrogen chains, as well as the benzyne isomers.[10,16,17] On a quantum computer, the ACSE algorithm, known as a contracted quantum eigensolver (CQE), iteratively minimizes the residual of the ACSE, in contrast to the variational quantum eigensolvers (VQE) that minimize the energy, with respect to parameters according to the Rayleigh-Ritz variational principle. Instead of propagating only the 2-RDM as in the classical algorithm, we propagate a wave function through state preparation on a quantum computer. Thus, we avoid reconstructed RDMs, as the 2-RDM can be directly measured from the quantum state while the ACSE residual can be directly measured from an auxiliary quantum state, or a measured 3-RDM. 
The resulting ACSE algorithm is a potentially exact RDM approach that scales polynomially in the size of the molecular system. In this paper, we accelerate the convergence of the CQE for the ACSE by developing quasi-second-order algorithms with superlinear convergence. Convergence acceleration is important for avoiding the accumulation of noise on noisy intermediate-scale quantum (NISQ) computers, as well as achieving highly accurate solutions on future fault-tolerant quantum devices. We draw upon research on optimization algorithms on manifolds,[18−20] which have applications across science and engineering from vision to robotics, as well as related algorithms in electronic structure for orbital optimization.[21,22] We specifically develop and implement a quasi-Newton scheme with the Broyden–Fletcher–Goldfarb–Shanno update and nonlinear conjugate gradient algorithms. The quasi-second-order algorithms avoid storage of the Hessian matrix while providing superlinear convergence. We also demonstrate convergence properties as well as some approximate implementations of the search direction and finally compare the resulting CQE algorithms with classes of common quantum algorithms, including the variational quantum eigensolver.

THEORY

We discuss the solution of the ACSE via the CQE, the local parametrization of the wave function, the quasi-second-order accelerations, and resource optimization in the following subsections.

Solution of the ACSE

Given a molecular system with Hamiltonian Ĥ, we can write a contraction of the Schrödinger equation onto the two-particle space, known as the contracted Schrödinger equation (CSE):[1,3]

⟨Ψ| âi† âj† âl âk Ĥ |Ψ⟩ = E 2D^(ij;kl)

The CSE can be split into a Hermitian and an anti-Hermitian part, the latter of which is called the anti-Hermitian CSE, or ACSE:[4,5]

2A^(ij;kl) = ⟨Ψ| [âi† âj† âl âk, Ĥ] |Ψ⟩ = 0

Here, 2A is the residual of the ACSE, which is necessarily zero when |Ψ⟩ is an eigenstate of the Hamiltonian. We can also obtain the ACSE by considering unitary transformations generated by a parameter ϵ and a two-body anti-Hermitian operator 2P̂:

|Ψ(ϵ)⟩ = e^(ϵ 2P̂) |Ψ⟩

The derivative of the energy, with respect to the elements of the operator 2P, yields

∂E/∂2P^(ij;kl) |ϵ=0 = ⟨Ψ| [âi† âj† âl âk, Ĥ] |Ψ⟩

which we observe is equal to the residual of the ACSE. In the solution of the ACSE,[5] the energy and 2-RDM can be expressed as a system of differential equations, in terms of a discretized, timelike parameter λ that controls the transformation of the implicit wave function to minimize the energy. As λ increases, we approach a solution of the ACSE. On a quantum computer,[10] we have a potentially exponential advantage, in terms of simulating the exact 2-RDM (or 3-RDM) and 2A matrices.
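The equivalence between the energy derivative and the commutator expectation can be checked numerically with a dense-matrix toy model. This is a sketch with generic Hermitian and anti-Hermitian matrices standing in for Ĥ and 2P̂, not the paper's Fermionic operators:

```python
# Toy check: for Hermitian H and anti-Hermitian A, the derivative of
# E(eps) = <psi| exp(-eps A) H exp(eps A) |psi> at eps = 0
# equals <psi| [H, A] |psi>, the quantity that plays the role of the
# ACSE residual.  Dense random matrices stand in for the Fermionic operators.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 6
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2                 # Hermitian "Hamiltonian"
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (G - G.conj().T) / 2                 # anti-Hermitian generator
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)               # normalized state

def energy(eps):
    # exp(eps*A) is unitary because A is anti-Hermitian
    phi = expm(eps * A) @ psi
    return (phi.conj() @ H @ phi).real

# Analytic derivative: <psi|[H, A]|psi> (real because [H, A] is Hermitian)
analytic = (psi.conj() @ (H @ A - A @ H) @ psi).real
eps = 1e-5
numeric = (energy(eps) - energy(-eps)) / (2 * eps)   # central difference
assert abs(analytic - numeric) < 1e-6
```

The same identity, applied element-by-element to a basis of two-body generators, is what makes the ACSE residual the gradient of the energy in the local parameter space.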

Local Parameterization of the Contracted Quantum Eigensolver

We can describe the generic problem in the variational quantum eigensolver[23−26] for finding the ground-state wave function of a quantum system as

E = min over θ ∈ ℝ^ν of ⟨Ψ(θ)| Ĥ |Ψ(θ)⟩

where θ represents a vector of ν real parameters and we assume the wave function is properly normalized. Upon convergence, the following equation is satisfied for all k,

∂E/∂θk = 0

indicating that the gradient, with respect to all parameters, vanishes. However, there can be significant problems associated with describing the appropriate parametrization of Ψ. The exact solution of the wave function scales exponentially, which might imply that an exponential number of parameters is necessary in a variational scheme. Using an operator ansatz such as unitary coupled cluster provides an exponential parametrization, but because the mapping from the Euclidean space to the unitary space is nonlinear, there can be singularities or unphysical minima in the parameter space.[18,27,28] In addition, it has been shown that high-dimensional parametrizations in a random variational ansatz generate barren plateaus, where the variance in the energy gradients vanishes as the system size increases.[29,30] Because an exponentially scaling parametrization is not feasible for larger systems, and limited-excitation ansätze such as UCC singles and doubles are not sufficiently accurate, a slew of iterative schemes based on the VQE and UCC schemes that deviate from the traditional CC formalism have been proposed, providing scalable approaches that generally repeat or extend upon certain ansatz fragments.[31−33] One such approach is the adaptive derivative assembled pseudotrotterized VQE method, or ADAPT-VQE, which takes elements of the ACSE (or generalized UCCSD) to generate an increasingly more-complex variational problem.[34] In the CQE approach, we instead forego the global parametrization of the state and use an atlas of local parametrizations,[18] describing the trajectory of the state, with the parameter space being dependent on the contracted eigenvalue equation (which, here, is the ACSE). The term “local” here refers to the fact that our parameters describe information solely in a small neighborhood around the state at the current iteration and do not contain parameters from previous iterations. Each local parametrization in the atlas is concretely generated by the exponential transformation of a two-body anti-Hermitian operator, providing a map between the Euclidean parameter space and the space of unitary transformations. Importantly, this compact mapping avoids oddities or singularities that can arise from a nonlinear mapping. Thus, the optimization is no longer defined by a fixed reference set of parameters but rather by a local parametrization at each iteration n:

En+1 = min over 2P ∈ ℝ^μ, ∥2P∥ ≤ δ, of ⟨Ψn| e^(−2P̂) Ĥ e^(2P̂) |Ψn⟩

where μ denotes the dimension of the two-body operator space, and the norm constraint on 2P indicates that we are staying within a neighborhood around the wave function (replacing the ϵ in previous formulations). Our optimization is then satisfied if we have a Ψ such that, for all two-body anti-Hermitian operators 2P̂,

⟨Ψ| [Ĥ, 2P̂] |Ψ⟩ = 0

which implies that we have fulfilled the ACSE. At each iteration n, our current state is mapped to a new state through the exponential mapping. On the classical computer, the wave function is replaced by RDMs, and the transformation requires an approximate reconstruction of the 3-RDM, in terms of the 2-RDM. On a quantum computer, the transformation can be realized through standard means of exponential computations, where we transform the Fermionic operator to the Pauli basis and implement the exponential, in terms of elementary gates through its trotterized form. While it might be thought that the restriction of 2P̂ to a set of two-body operators is too restrictive, as clearly the two-body operator space does not locally parametrize the unitary group, the current approach iteratively constructs higher-order excitations from the reference wave function[5,9,35] (see section II.E. of ref (5) for a discussion of the ACSE ansatz). 
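The iterated local exponential update can be illustrated with a dense-matrix toy model. Here the commutator of the Hamiltonian with the density matrix stands in for the residual-based two-body generator (this is a sketch of the exponential-map iteration, not the paper's Fermionic implementation):

```python
# Toy CQE-style iteration: at each step, build an anti-Hermitian generator
# from the commutator [H, rho] (a stand-in for the ACSE residual) and apply
# the local unitary exp(-eta*A).  The energy decreases monotonically toward
# an eigenstate of H.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 6
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2                  # toy Hermitian "Hamiltonian"
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
e0 = (psi.conj() @ H @ psi).real          # initial energy

for _ in range(5000):
    rho = np.outer(psi, psi.conj())
    A = H @ rho - rho @ H                 # anti-Hermitian "residual" generator
    psi = expm(-0.03 * A) @ psi           # local exponential update (unitary)

energy = (psi.conj() @ H @ psi).real
assert energy <= e0                                           # descent
assert abs(energy - np.linalg.eigvalsh(H).min()) < 1e-3       # near ground state
```

The first-order descent above converges only linearly; the quasi-second-order methods of the next section accelerate exactly this kind of iteration.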
As we discuss in the next section, because the ACSE is solved iteratively, we can construct quasi-second-order algorithms if we choose each 2P̂n in the ansatz by considering not only the gradient of the current iteration but also the gradients of previous iterations, which contain information about the curvature.

Quasi-Second-Order CQE Algorithms

Previous algorithms for solving the ACSE use path-following[5] or descent[8] algorithms based on the gradient. While these algorithms are robust to reconstruction errors, gradient-descent algorithms are first-order algorithms with generally linear convergence. To accelerate convergence, we can consider choosing the search direction 2P by a second-order approach such as the Newton–Raphson method:

2Pn = −Hn⁻¹ 2An

where H is the Hessian matrix. Within a certain region of the state space, we are guaranteed quadratic convergence.[36] However, the elements of the Hessian are evaluated according to

H(ij;kl),(pq;st) = ⟨Ψ| [âi† âj† âl âk, [Ĥ, âp† âq† âs ât]] |Ψ⟩

which requires the 4-RDM or its approximation. To address this issue, we consider the BFGS quasi-Newton method, which uses the Broyden–Fletcher–Goldfarb–Shanno (BFGS) update within Davidon’s method[37,38] and is summarized in Table 1. At each step of the BFGS method, we update an approximate Hessian matrix through a secant equation, where the update is designed to keep the Hessian positive definite. By including a direction based on the approximate Hessian, the BFGS method achieves a superlinear rate of convergence near the solution.
Table 1

Quasi-Newton CQE Algorithm

Quasi-Newton CQE
Set 0 → n
Initialize |Ψ0⟩, 2A0, and B0
Continue until ∥2An∥ < δ:
  Step 1: Update 2Pn = −Bn⁻¹ 2An
  Step 2: minαn En(αn)
    ◦ |Ψn+1⟩ = e^(αn 2Pn) |Ψn⟩
    ◦ sn = αn 2Pn
  Step 3: Evaluate 2An+1
    ◦ yn = 2An+1 − 2An
  Step 4: Calculate Bn+1⁻¹ from the BFGS update
  Step 5: n + 1 → n
At any particular iteration, given a wave function |Ψ⟩, the ACSE residual 2A, and the inverse of the approximate Hessian B⁻¹, we define the step direction as 2P = −B⁻¹ 2A. We next minimize the energy by a line search to obtain a step length α that satisfies the conditions of sufficient descent and curvature (Wolfe conditions). In our local frame, we have the following auxiliary BFGS vectors: sn = αn 2Pn and yn = 2An+1 − 2An. Using these, we calculate Bn+1⁻¹, according to the BFGS formula:

Bn+1⁻¹ = (I − ρn sn ynᵀ) Bn⁻¹ (I − ρn yn snᵀ) + ρn sn snᵀ,  ρn = 1/(ynᵀ sn)

We then increase n and continue until the gradient norm ∥2A∥ satisfies a convergence threshold. Step 4 in Table 1 can be replaced with a suitable alternative update, and, in the results, we demonstrate the use of a limited-memory BFGS implementation, denoted l-BFGS.[38] In practice, because of the symmetry of 2-RDM elements, we can store the 2A matrix as a vector in a compact representation and then store the B⁻¹ matrix exactly. For larger systems, it is likely that even this would be prohibitive; instead, the limited-memory approach would be necessary, where we store only the k previous steps, which is equivalent to storing k 2-RDMs. On a classical device, the algorithm would be very similar, although instead of updating the wave function, we would have a 2-RDM update step. Both the 2-RDM and 2A updates would require the classical evaluation of the ACSE and, hence, a reconstructed 3-RDM. On a quantum computer, we have significant advantages in that we are not reconstructing the 3-RDM. This means that we generate (up to statistical and noise-related errors) pure 2-RDMs at each step, and we do not have to consider the N-representability of the 2-RDM or the step size in the solution of the differential equations. An important note is that if we want to evaluate the ACSE residual in the classical part of the algorithm, rather than in the quantum part by tomography of an auxiliary state, we must be sure that the residual is sufficiently accurate to estimate the curvature. 
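The inverse-Hessian update of Step 4 can be sketched with the standard BFGS algebra (textbook formula; the helper name is ours, not from the paper). Its defining property is the secant equation: the updated inverse Hessian maps the gradient difference y back onto the step s.

```python
# Sketch of the BFGS inverse-Hessian update used in Step 4 of Table 1.
# s is the step (alpha * 2P) and y is the gradient difference (2A_{n+1} - 2A_n).
import numpy as np

def bfgs_inverse_update(B_inv, s, y):
    """Return the updated inverse-Hessian approximation (BFGS formula)."""
    rho = 1.0 / (y @ s)               # requires the curvature condition y.s > 0
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ B_inv @ V.T + rho * np.outer(s, s)

rng = np.random.default_rng(1)
n = 8
B_inv = np.eye(n)                     # initial inverse Hessian (identity)
s = rng.normal(size=n)
y = s + 0.1 * rng.normal(size=n)      # small perturbation keeps y.s > 0
B_new = bfgs_inverse_update(B_inv, s, y)

# Secant equation: the new inverse Hessian maps y onto s
assert np.allclose(B_new @ y, s)
# Symmetry of the approximation is preserved
assert np.allclose(B_new, B_new.T)
```

Because the update acts on the compact vectorized residual, it never requires forming (or measuring) the exact Hessian, which is what lets the CQE sidestep the 4-RDM.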
While an approximated 3-RDM can give good enough information to obtain chemical accuracy in many instances, for rigorous convergence, the cumulant portion of the 3-RDM should be measured by tomography. Practically, this would entail alternating evaluations of the 3-RDM (for the ACSE) and the 2-RDM (for energy evaluations). As an alternative to the quasi-Newton approaches, we can instead use nonlinear conjugate gradient (CG) approaches.[38] The nonlinear CG method does not require the storage of a Hessian or approximate Hessian and instead involves only a simple update step governing the contribution of the previous search direction 2Pn. The generic CQE algorithm with a CG solver is described in Table 2. There are numerous modifications to the conjugate gradient method, which we do not explore here.[39] Such modifications include preconditioning schemes and modified update coefficients, as well as additional criteria on resets and step lengths.
Table 2

Conjugate Gradient CQE Algorithm

Conjugate Gradient CQE
Set 0 → n
Initialize |Ψ0⟩, 2A0
Step 1: Set 2P0 = −2A0
While ∥2An∥ > δ:
  Step 2: minαn En(αn)
    ◦ |Ψn+1⟩ = e^(αn 2Pn) |Ψn⟩
  Step 3: Evaluate 2An+1
  Step 4: Calculate βn+1
  Step 5: 2Pn+1 = −2An+1 + βn+1 2Pn
  Step 6: n + 1 → n
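The loop in Table 2 can be sketched on a quadratic surrogate for the energy, using the Fletcher–Reeves coefficient βn+1 = ∥2An+1∥²/∥2An∥² (a stand-in problem with an exact line search, not the ACSE itself):

```python
# Fletcher-Reeves conjugate gradient on a quadratic surrogate
# f(x) = 0.5 x.Q.x - b.x, whose gradient Q x - b plays the role of 2A.
# With exact line searches this reduces to linear CG and terminates
# in at most n iterations.
import numpy as np

rng = np.random.default_rng(2)
n = 10
M = rng.normal(size=(n, n))
Q = M @ M.T + n * np.eye(n)          # positive-definite "Hessian"
b = rng.normal(size=n)

x = np.zeros(n)
g = Q @ x - b                        # gradient (residual)
p = -g                               # Step 1: initial direction = -gradient
for _ in range(n):
    alpha = -(g @ p) / (p @ Q @ p)   # Step 2: exact line search on a quadratic
    x = x + alpha * p
    g_new = Q @ x - b                # Step 3: evaluate the new gradient
    beta = (g_new @ g_new) / (g @ g) # Step 4: Fletcher-Reeves coefficient
    p = -g_new + beta * p            # Step 5: new conjugate direction
    g = g_new                        # Step 6: advance the iteration

assert np.linalg.norm(Q @ x - b) < 1e-6   # converged within n steps
```

In the CQE setting, the exact line search is replaced by a Wolfe-condition line search on the measured energy, but the β update and direction recursion are the same.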

Resource-Optimized CQE Search Directions

In light of the current representation of the CQE algorithm, one can see that the resources demanded on the quantum computer depend heavily on the selection and implementation of the search direction, as each term must be implemented individually. However, any modification to the search direction can be detrimental to the rate of convergence (and can potentially also negate the theoretical results). Thus, we would like to find a tradeoff between reducing the number of terms and preserving a descent direction.[38] Therefore, we focus on two approximations that still preserve the essential nature of the ideal CQE approach: p-depth and operator sparsification. First, we introduce a more formal way of describing the ACSE ansatz at a given iteration n. Let {2Â1, 2Â2, ..., 2Âm} be an ordered set of anti-Hermitian two-body operators. In the exact CQE approach, each iteration adds a new two-body operator to the set, and these collectively are the stored ansatz vectors. Using this, we can write the ACSE ansatz at the Nth iteration as

|ΨN⟩ = e^(2Âm) ⋯ e^(2Â2) e^(2Â1) |Ψ0⟩

where m is the total number of two-body exponential operators we are implementing. We define the p-depth as follows. Given a search direction 2P, we iterate over its elements and the set of operators from l = m − p to l = m. If an element was included in a previous operator 2Âl, we update that two-body operator by folding the new element into it. From the definition, the element will only be in one of the p previous operators. If elements cannot be assigned to a previous 2Âl, a new operator is appended. The ACSE only provides information on the gradient around the exterior of the wave function, and because, generally, each iteration does not commute with previous iterations for p > 0, this can be considered an approximate scheme of implementing the CQE. As an example, if all terms are included in the initial 2Â1, then we have a single exponential form in the approximate linear region, and any p-depth greater than 1 will be equivalent. 
Note that such a case is similar to a generalized UCCSD ansatz with different ordering under the first-order trotterization; however, the method of updating does not reflect the true gradient terms and, as such, provides an approximation of the operator in the linear region. The second approach is an operator sparsification scheme, which effectively reduces the number of new terms appended at each step. The method of ordering is of particular importance, and we investigate two options. First, we can sort the elements according to their absolute value, which is important for implementation purposes (i.e., we can remove the smallest elements first). Alternatively, we can sort elements according to their energy contribution in the descent direction, obtained element-wise as the product (2An)^(ij;kl) (2Pn)^(ij;kl). For gradient-descent approaches, these two criteria are equivalent, and, in previous work, only the former approach was used. We can also control the number of terms that are removed through a constant c ∈ [0, 1]. This is a constant scaling factor, where c = 1 (strict truncation) indicates that only the largest term in 2P is included and c = 0 (no truncation) indicates that all terms in 2P are included. Another potentially useful control, instead of a constant scaling factor (which we do not explore here), would be to limit explicitly the number of elements included in each operator. The last approximation is a boolean INCLUDE = {True, False} option, whereby elements of the search direction that would be assigned to the previous p operators (due to the p-depth specified) are also included in the sparsification scheme. The controls here can be likened to techniques used in trotterization or compact forms of the two-body operator, with a goal of efficiently implementing the two-body ansatz.[40−42] They can be applied broadly to any optimizer instance and could have different effects with different optimizers.
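The absolute-value sparsification with the scaling factor c can be sketched as follows (a hypothetical helper illustrating the relative cutoff; the paper does not specify this exact form):

```python
# Sketch of absolute-value sparsification of a search direction P:
# keep only elements whose magnitude is at least c * max|P|.
# c = 1 keeps only the largest term (strict truncation); c = 0 keeps all.
import numpy as np

def sparsify(P, c):
    """Zero out entries of the search direction below the relative cutoff c."""
    cutoff = c * np.abs(P).max()
    return np.where(np.abs(P) >= cutoff, P, 0.0)

P = np.array([0.50, -0.40, 0.05, 0.30, -0.02])   # toy vectorized 2P
assert np.count_nonzero(sparsify(P, 1.0)) == 1   # strict truncation
assert np.count_nonzero(sparsify(P, 0.5)) == 3   # keeps 0.50, -0.40, 0.30
assert np.count_nonzero(sparsify(P, 0.0)) == 5   # no truncation
```

The descent-condition variant would sort on the element-wise product of 2A and 2P instead of |P|, but the thresholding logic is the same.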

APPLICATIONS AND RESULTS

In this section, we look at applications of the optimized CQE scheme. First, we investigate the role of different optimizers. Second, we discuss schemes related to practical implementations and modifications of the search direction. Third, we show results for linear hydrogen chains and lithium hydride with a CQE utilizing the unencoded ACSE. Finally, we compare our results to variational quantum eigensolvers, including the ADAPT-VQE algorithm.

Implementation of Optimized CQE

We begin by comparing the exact (up to a first-order trotterization) implementations of different methods for the H4 system at different bond lengths, corresponding to differing degrees of electron correlation. In this work, our convergence criterion is typically taken to be the Frobenius norm of the 2A matrix. For the conjugate gradient approach, we do not utilize any preconditioning, and we use the update strategy of Fletcher and Reeves.[38] The limited-memory BFGS strategy utilizes three previously stored steps. Figure 1 displays our results. Further computational details are included in the Appendix.
Figure 1

Comparison of methods for generating the search direction at different H–H distances for linear H4. Near equilibrium, similar patterns can be seen, but away from equilibrium, gradient-descent-based approaches slow down considerably. In particular, when optimizing the 1-D step size according to a quadratic fit, an oscillatory pattern can be seen in the norm of 2A, indicating a potential valley in a direction between the oscillating gradients. The conjugate gradient approach allows for slightly more flexibility but exhibits stronger oscillations in the gradient. The quasi-Newton approaches offer quick convergence across all regions of dissociation.

While at the bond distance near equilibrium (D = 1 Å) the different optimizers in the ACSE appear to offer no apparent advantage, the more correlated distances show strong deviations between the different approaches. Particularly, in the 1.5 Å case, the convergence flattens when gradient-descent-based approaches are used. For the 2.0 Å case, this is accentuated, as we appear to have entered a region where the gradient is quite shallow. The quadratic step appears to be even slightly worse in optimizing the norm of the 2A matrix than a simple gradient descent. The conjugate gradient approach appears to be better but shows strong oscillations in the gradient norm (note that if we loosen the update parameter, these oscillations decrease, but we do not observe a significant increase in convergence). The most successful approaches are the quasi-Newton methods, which are able to achieve high-accuracy results in only a few iterations. The l-BFGS offers a reliable approach as well, with a quality between the conjugate gradient and full BFGS methods. While this approach is the closest to the ideal implementation and offers some advantages for a classical quantum simulation (i.e., similar to the classical ACSE, except instead of the 2-RDM, we only need to store the state vector), for near-term applications, there are several constraints. 
In a noiseless regime (noise is also relevant, but, since it changes the optimization strategy, it is not addressed directly here), addressing the compactness or efficiency of the ansatz is an important problem. In particular, we would like to know whether we can reduce the number of terms that are added at each iteration, using the ideas mentioned above. These result in an approximate search direction, which we constrain to represent a descent direction. We use the 1.5 Å case, which contains nontrivial electron correlation and starts to differentiate between different optimizers, to look at ways that we can modify the search direction. Table 3 explores the number of iterations and the CNOT gate cost for a variety of options with the BFGS optimizer. In particular, we examine the absolute norm or energy contribution for the sparsity operator acting on the search direction, as well as the inclusion or exclusion of terms that appear in the p-depth addition scheme (specified by INCLUDE). For each of these criteria, we look at different p-depths and values of the sparse scheme.
Table 3

Reported Number of Iterations and CNOT Gates (in brackets, × 103) for Approximate Implementation Schemes for the CQE with BFGS Optimizer with Linear H4 at a Distance of 1.5 Åa

Entries are iterations [CNOT gates × 10³] for p-depths 9, 7, 5, 3, and 1.

|Pn^(ij;kl)| criterion, INCLUDE = False
sparse[c]   p-depth:  9           7           5           3           1
0.9                   *           *           *           *           *
0.5                   *           *           *           *           *
0.25                  26 [1.6]    28 [3.4]    28 [4.6]    27 [5.8]    31 [11]
0.125                 22 [1.5]    22 [1.5]    22 [1.5]    23 [7.0]    24 [13]

|Pn^(ij;kl)| criterion, INCLUDE = True
sparse[c]   p-depth:  9           7           5           3           1
0.9                   76 [1.8]    73 [2.5]    *           65 [4.0]    117 [8.9]
0.5                   25 [0.90]   25 [0.90]   42 [2.6]    *           *
0.25                  24 [1.3]    24 [1.3]    24 [2.2]    28 [6.5]    29 [11]
0.125                 21 [1.5]    21 [1.5]    22 [5.6]    21 [6.5]    24 [13]

(An Pn)^(ij;kl) criterion, INCLUDE = False
sparse[c]   p-depth:  9           7           5           3           1
0.9                   ×           ×           283 [18]    ×           ×
0.5                   71 [0.90]   139 [5.4]   123 [6.6]   119 [7.6]   75 [13]
0.25                  63 [0.93]   67 [0.93]   70 [3.7]    81 [6.1]    76 [15]
0.125                 47 [1.4]    56 [3.9]    74 [6.3]    44 [7.8]    54 [16]

(An Pn)^(ij;kl) criterion, INCLUDE = True
sparse[c]   p-depth:  9           7           5           3           1
0.9                   47 [1.6]    48 [2.2]    60 [4.0]    80 [4.0]    107 [8.1]
0.5                   26 [0.91]   35 [1.9]    35 [2.5]    39 [3.3]    63 [7.6]
0.25                  26 [1.2]    26 [1.2]    30 [3.3]    36 [5.1]    46 [8.8]
0.125                 24 [1.5]    26 [3.2]    26 [3.7]    29 [5.5]    38 [8.8]

We varied the inclusion of terms in the selection of P (sorted by either |P|, or by the descent condition), the manner of truncation of P, as well as varying the sparsity and the p-depth for each condition. The asterisk symbol (*) represents a run that did not converge, and the cross symbol (×) indicates runs that reached the maximum iterations (300).

Many interesting trends emerge. First, there is a difference between applying the sparsification operator to elements according to their energy contribution and according to their absolute value. Namely, when using large c for the absolute value, problems in the optimization can occur. These are instances where the search direction has little overlap with the gradient, and the largest selected term is ordered in such a way that it does not yield a strict descent. The descent condition, however, is able to converge across every configuration, albeit at different rates of convergence. This also leads to a stratification in the rates of convergence for the descent condition, which can be seen in Figure 2. While the absolute-value condition leads to accelerated convergence in almost all instances, it does seem more sensitive to using restricted operators. The inclusion of previous terms does seem to alleviate this problem, and because it seems to occur when the search direction is nearly orthogonal to the gradient, resetting the optimization might allow the optimization to continue. It is also possible that the order of magnitude of c for the descent condition should be lower than for the absolute-value condition, although this could vary significantly based on the system.
Figure 2

Low-threshold (∥2A∥ = 10⁻⁵) results for varying values of sparsity with different p-depths and acceptance criteria, utilizing the BFGS optimizer for molecular H4 at the intermediate distance (D = 1.5 Å), with the descent condition. In the top two plots, we use ACCEPT = True, where elements below the sparsity threshold are automatically included if they are included in one of the previous p terms. The bottom plot has ACCEPT = False. Only the first 125 iterations are shown.

Second, the INCLUDE option has a strong impact on the rate of convergence. For both selection criteria, the inclusion of previous terms clearly assists the overall convergence. As this can be considered a way of increasing the pool of operators at each step, depending on both c and the p-depth, the advantage here makes sense. Third, the p-depth appears to have a 2-fold role. Most generally, it serves to reduce the total number of terms needed in the ansatz; that is, as the p-depth increases, the number of total terms in the ansatz decreases. In addition, for a given c, we do see numerous instances where the total number of iterations decreases as the p-depth increases when INCLUDE = True. When INCLUDE = False, the trends are somewhat unclear, and the optimization has a tendency to be more sensitive. Interestingly, for c = 0.5, 0.25, and 0.125, with INCLUDE = False, the total iterations appear to increase as the p-depth increases, and then decrease. Sparser truncations (see c = 0.9, 0.5) result in slower convergence, which can be aided with the INCLUDE option, but not completely mitigated. While these results are obviously not generalizable to every system, it is likely that some of these trends can be seen elsewhere. We expect that the p-depth can lead to lower circuit depths. In addition, while sparser operators are desirable from a NISQ perspective, optimization with respect to a single parameter is clearly detrimental to the rate of convergence. 
This can be mitigated through expanding the pool directly with more terms and the sparsification operator, or indirectly through the p-depth.

Encoding-Free CQE

The encoding-free (or unencoded) CQE approaches for preparing states as an alternative to Fermionic state preparation have recently been explored[33,43−45] within the VQE framework, as well as in attempts to understand the success of heuristic and non-Fermionic ansatz preparation. Recent work by the present authors showed that the Fermionic 2-RDM can be functionalized from a qubit-particle wave function.[46] As long as Fermionic tomography is performed on an N-qubit-particle state, the 2-RDM represents a valid Fermionic 2-RDM. In addition, in recent work, we introduced an encoding-free CQE algorithm that evaluates the anti-Hermitian component of the two-qubit-particle contraction of the Schrödinger equation. Table 4 displays calculations for several bond distances of H5, as an example system, for both the encoded and unencoded CQE with the BFGS algorithm.
Table 4

Comparisons of Total Iterations, Total CNOT Gate Count, and Average CNOT Gates Per Iteration for Unencoded and Encoded CQE, Using the BFGS Optimizer for H5 at Various Bond Distances from Equilibrium in the Minimal STO Basisa

D − Deq (Å)   iterations (CQE, UCQE)   total CNOT ×10⁴ (CQE, UCQE)   ⟨CNOTk⟩ ×10² (CQE, UCQE)
−0.25         26, 33                   1.7, 1.5                      6.4, 4.5
+0.00         29, 36                   2.1, 1.8                      7.4, 5.0
+0.25         47, 49                   3.8, 2.7                      8.2, 5.5
+0.50         40, 34                   4.3, 2.6                      11, 7.8
+0.75         39, 28                   5.7, 2.8                      15, 9.9
+1.00         42, 43                   6.9, 4.4                      16, 10.
+1.25         47, 80                   8.0, 7.4                      17, 9.3

The accuracy of both approaches is largely similar across the dissociation curve, although, for +1.00 and +1.25 distance separation, the unencoded CQE requires more iterations.

The unencoded CQE under optimization matches the Fermionic case in most instances, and it consistently has a smaller average number of CNOT gates per iteration. For the two longest separation lengths, the number of iterations required does increase, leading to similar CNOT counts for the total ansatz. A future goal would be to incorporate compilation schemes or adjust the set of ACSE or unencoded ACSE excitations to favor a largely commuting pool.

Comparison with VQE

Finally, we compare the CQE approach utilizing a BFGS optimization with other known quantum algorithms. While similarities between the iterative nature of the ACSE and ADAPT-VQE were discussed in previous work,[16] here we provide example calculations of VQE, ADAPT-VQE, and the ACSE that demonstrate fundamental differences among these algorithms. These are included in Figure 3, as well as in Table 5.
Figure 3

Simulations of molecular H4 and H6 at separations of 1 Å and 2 Å utilizing the CQE, VQE, and ADAPT-VQE with a BFGS optimizer. For H4, a minimal ansatz is reached in ADAPT through the VQE subroutine, and for H6, the CQE algorithm slightly outperforms ADAPT. The VQE, here using the UCCSD ansatz, provides rapid convergence for near-equilibrium states but has significant errors at larger separations, which are overcome by the iterative ADAPT-VQE and CQE algorithms.

Table 5

Comparison of Gradient Element (VQE) and Residual Element (ACSE) Evaluations for the CQE, VQE, and ADAPT-VQE Methods, Corresponding to Simulations of Molecular H4 and H6 at Separations of 1 and 2 Å

method | quantity | H4, 1 Å | H4, 2 Å | H6, 1 Å | H6, 2 Å
CQE | Iterations | 13 | 25 | 46 | 60
 | Residuals | 1950 | 3750 | 38640 | 50400
VQE | Iterations | 7 | 19 | 9 | 22
 | Gradients | 182 | 494 | 1053 | 2574
ADAPT-VQE | Macro | 10 | 12 | 51 | 84
 | Micro | 71 | 167 | 1221 | 5854
 | Gradients | 362 | 1226 | 43112 | 354988
 | Residuals | 660 | 792 | 16830 | 27720

Here, note that the ADAPT-VQE has a symmetry-adapted pool of operators, which is not implemented here in the CQE or VQE approaches. The VQE tolerance is also taken to be quite low, i.e., 10⁻³ in the norm of the parameter vector (whereas the tolerance of the VQE subroutine in the ADAPT procedure is generally lower).
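As a rough consistency check, the evaluation counts in Table 5 factor into (iterations) × (fixed per-iteration cost) for each method. The per-iteration sizes below (ACSE residual dimension, UCCSD parameter count, ADAPT pool size) are inferred from the table's ratios, not stated explicitly in the text, so they should be read as our assumptions:

```python
# Sanity check: Table 5 counts factor as iterations x per-iteration cost.
# The per-system sizes (150/840 residual elements, 26/117 UCCSD parameters,
# 66/330 pool operators) are inferred from the ratios, an assumption.
cqe_iters    = [13, 25, 46, 60]          # H4 1 A, H4 2 A, H6 1 A, H6 2 A
residual_dim = [150, 150, 840, 840]
assert [n * d for n, d in zip(cqe_iters, residual_dim)] == [1950, 3750, 38640, 50400]

vqe_iters  = [7, 19, 9, 22]
ucc_params = [26, 26, 117, 117]
assert [n * p for n, p in zip(vqe_iters, ucc_params)] == [182, 494, 1053, 2574]

adapt_macro = [10, 12, 51, 84]
pool_size   = [66, 66, 330, 330]
assert [m * p for m, p in zip(adapt_macro, pool_size)] == [660, 792, 16830, 27720]
print("Table 5 evaluation counts are internally consistent")
```

The fixed per-iteration residual cost of the CQE, versus the pool-times-macro cost of ADAPT, is the structural difference discussed below.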

While the VQE results in Figure 3 are not surprising, given the use of the unitary coupled cluster ansatz, some interesting comparisons still emerge. For equilibrium distances, UCCSD provides a good ansatz, and numerous methods explore the UCC ansatz.[31,47] By comparison, with ADAPT we are able to obtain seemingly arbitrary convergence, matching previous work. However, note that the iterative cost of ADAPT is much higher than that of either the VQE or the CQE. The CQE, on the other hand, performs quite well in a variety of instances, with the most challenging case being dissociated H6, where higher-order excitations dominate and the system is strongly correlated. While the number of macro iterations for the ADAPT procedure may look only slightly worse than for the CQE or VQE approaches, when the VQE cost is taken into account (i.e., micro + macro iterations), the length of the ADAPT procedure is somewhat unwieldy, owing to the VQE subroutine. The total number of gradient and residual evaluations for each of these instances is given in Table 5.
We can also examine a qubit implementation of the ADAPT scheme, which here follows the qubit-particle excitation-based scheme of Yordanov et al.,[45] and not the quasi-particle approach taken by the original authors. We find a result similar to previous work, presented in Figure 4. Namely, for stretched LiH, conserving the particle number and projected spin leads to similar qubit-based excitations. The main difference in these results (mostly a consequence of the relative scale) is the CNOT cost of the unencoded CQE approach, although a similar decrease in the CNOT cost of the IQEB approach exists as well. While the approaches appear to be inversely related in the rate of convergence (through total iterations) and the CNOT count (where the CQE schemes are more costly), the number of parameter evaluations differs substantially. Here we do not distinguish between the residuals of the ACSE as parameters and the VQE parameters, since these can be obtained in many different ways (for instance, the ACSE residuals can be taken from either the 2-RDM with a quantum solver or the 3-RDM on the quantum computer, and numerous methods of measuring VQE gradients exist as well); for larger systems, however, limits on the number of parameters should be considered.
Figure 4

Simulations of molecular lithium hydride at a separation of 2 Å utilizing the CQE, VQE, Fermionic ADAPT-VQE, and the iterative qubit-excitation-based ADAPT-VQE (IQEB), essentially an unencoded ADAPT-VQE algorithm. The upper left shows the macro iterations of each scheme, which are nearly identical for the encoded and unencoded forms and only slightly slower for the ADAPT. The upper right shows the total iterations, including micro iterations in the VQE procedure with a threshold of 10⁻³ in the parameter vector. The lower left compares the energy convergence with the ansatz cost, showing that the ADAPT does produce a more compact representation of the ansatz. The lower right shows the number of parameter evaluations (a lower bound on the number of gradients evaluated at each optimizer step), which scales linearly with the number of iterations for the ACSE procedure but quadratically for ADAPT.
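The linear-versus-quadratic behavior described for the lower-right panel can be reproduced with a rough cost model. The model below is our illustrative sketch, not data from the paper, and the residual-vector and pool sizes are hypothetical: the CQE measures a fixed-size residual vector once per iteration, while the k-th ADAPT macro-iteration screens the full pool and then optimizes an ansatz with roughly k parameters.

```python
# Rough cost model (illustrative assumption, not data from the paper):
# cumulative parameter evaluations as a function of iteration count.
def cqe_evals(n_iter, residual_dim=150):
    # fixed-size ACSE residual measured once per iteration -> linear growth
    return n_iter * residual_dim

def adapt_evals(n_macro, pool_size=150):
    # pool screening plus a VQE over a ~k-parameter ansatz at macro step k;
    # the sum of k terms contributes a quadratic component
    return sum(pool_size + k for k in range(1, n_macro + 1))

print(cqe_evals(40))    # linear in the number of iterations
print(adapt_evals(40))  # contains a quadratic term from the growing ansatz
```

Doubling the iteration count doubles the CQE total exactly, whereas the ADAPT total more than doubles, matching the qualitative scaling in the figure.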

From all of these examples, we see that despite its quicker convergence, the main drawback of the CQE algorithm is the iteratively increasing CNOT cost, which is due to the additional gates appended at each step. While this should be reasonable for high-performing quantum devices, for near-term devices further reduction of the CNOT count is important. Nevertheless, the gain in performance from performing a quasi-second-order optimization in the local parameter space is quite significant. In addition, when compared to the CNOT costs of the gradient-descent-based methods (as in Figure ), the optimized ACSE allows for more flexibility in constructing a compact ansatz.

Discussion

While generic algorithms for the problem of quantum simulation have been known for a while, calculations involving increasingly complex systems have only recently begun to emerge. These require the advancement and development of new quantum algorithms, much as the past century required the development of classical quantum chemistry algorithms. The CQE offers an approach that is potentially applicable in the near term and provides a strong alternative to the VQE. The present work improves the convergence of the CQE method for quantum simulation by introducing a quasi-second-order, local parametrization of the state space. While existing algorithms for solving the ACSE employ a local, first-order parametrization, in which the state is iteratively updated by a series of gradient-based unitary transformations,[4,5,8] here we present a local, quasi-second-order parametrization in which the state is iteratively updated by a series of unitary transformations that approximately incorporate energy-curvature information. Generally, this allows for more rapid convergence in the vicinity of the solution, as well as more flexibility in the construction of CQE algorithms. The current work, focusing on the BFGS optimizer,[37,38] is one example of such an application and is comparable to published work with other methods, but other optimizers and techniques can be readily implemented. The largest limitation in translating this work to NISQ devices is the presence of noise. Both sampling and device errors can negatively impact the state, appearing either as errors in pure-state N-representability, in that the measured state no longer represents a pure quantum state, or as misapplications of the target operator. Device errors are especially challenging because they corrupt the gate sequences underlying quantum simulation algorithms. Both additional sampling and noisy optimization techniques can be employed to mitigate the effects of noise.
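To make the local, quasi-second-order parametrization concrete, consider a minimal classical sketch (our illustration, not the paper's implementation): a reference state is rotated by the exponential of an anti-Hermitian generator, and a quasi-Newton (BFGS) optimizer minimizes the energy over the local rotation parameter. For a real 2×2 "Hamiltonian," the orbit of the reference vector covers all real unit vectors, so the local BFGS solve reaches the exact ground-state energy:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

# Toy sketch: minimize E(t) = <psi(t)|H|psi(t)> with psi(t) = exp(t*G) psi0,
# where G is anti-Hermitian (here a real antisymmetric generator, so that
# exp(t*G) is a rotation), using the quasi-Newton BFGS optimizer.
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])        # simple Hermitian "Hamiltonian"
G = np.array([[0.0, 1.0],
              [-1.0, 0.0]])        # anti-Hermitian generator
psi0 = np.array([1.0, 0.0])        # reference state

def energy(t):
    psi = expm(t[0] * G) @ psi0    # locally parametrized unitary update
    return psi @ H @ psi

res = minimize(energy, x0=[0.0], method="BFGS")
exact = np.linalg.eigvalsh(H)[0]   # exact ground-state eigenvalue
assert abs(res.fun - exact) < 1e-6
print(res.fun, exact)
```

In the actual CQE, the generator is the two-body anti-Hermitian residual of the ACSE and the curvature information is accumulated across iterations, but the structure of the local update is the same.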
Because the CQE is iterative, it can also, in principle, adjust in future iterations for some errors generated by noise. Future work will further examine the effect of noise on the CQE algorithms. As the fidelity of quantum devices increases, the CQE will become more accurate and, thereby, more applicable to larger molecular systems. From the discussion in section , we can see that the VQE and the current CQE algorithm are similar in that each solves an optimization problem, but they differ in the goal of the minimization. In the VQE, we typically minimize the energy of a state through a global parametrization of some wave function. With an increasingly large parameter space, barren plateaus, or regions where the optimization fails, become likely, and hence the suitability (i.e., over- or under-parametrization) of the state is often in question. As an example, the UCCSD and generalized UCCSD ansätze, which both have approximately O(r⁴) parameters, are not sufficient to parametrize the state. Iterative constructions of the ansatz, such as the iterative generalized singles pair-cluster CC[31] or the ADAPT-VQE, use fewer parameters initially but place no bounds on the number of required parameters. In the ACSE algorithm, the local parametrization leads to a locally updated optimization whose number of parameters is constant across iterations. Importantly, in a VQE, the optimization converges toward a solution of the VQE problem, which is not the ACSE. As the VQE subproblem becomes larger and larger (i.e., in adaptive or iterative schemes), the VQE solution eventually can (but by design will not necessarily) satisfy the ACSE. In the CQE approach presented here, we have convergence toward our contracted eigenvalue problem, not a variational subproblem.
From our calculations, we can also understand some elements of the ADAPT-VQE algorithm, since it is related to the CQE algorithm. The ADAPT-VQE method chooses the largest ACSE residual at each macro iteration. This leads to a flexible and efficient ansatz that, when not restarted, will, by construction, improve the energy in the VQE. However, the reoptimized state is often not close to the previous state, highlighting the strongly variational nature of the ADAPT algorithm. Restarting the VQE optimization, which has been done in some ADAPT work, can lead to a suboptimal solution of the ACSE or cause the VQE subroutine to fail. Recent work by Liu et al.[48] used a reconstructed 3-RDM in the ACSE to obtain approximate residuals to seed the ADAPT algorithm. Because these approximate gradients can differ substantially from the exact gradients, more terms are needed, which significantly increases the variational flexibility of the ansatz. As a result, these calculations exhibited faster convergence and required fewer iterations than traditional ADAPT-VQE.
However, unlike Liu et al.,[48] it is important not to use ACSE residuals from reconstructed 3-RDMs in the CQE, because these will likely lead to convergence issues. The 3-RDM can still be directly obtained on a quantum computer through a variety of techniques. Work involving qubit-particle approaches also shows promise, with the additional advantage of the increased number of commuting terms among qubit-particle excitations, as opposed to Fermionic-particle excitations. In terms of memory, the CQE primarily stores the vectors of eq . The set is composed of l 2-RDM-sized vectors, or O(lr⁴) elements at most, although, as seen in the preceding discussion, each vector can be quite sparse. The optimization occurs over a space of dimension O(r⁴), which is not unreasonable for highly accurate electronic structure methods and is equivalent to the number of parameters in a UCC generalized singles-and-doubles variational ansatz. However, the gradient evaluations in the VQE and the CQE are quite different. For the ACSE, the gradients are evaluated in the worst case using the O(r⁶) 3-RDM, which can be measured more efficiently through circuit tomography. In contrast, VQE gradients scale with the number of parameters times the cost of an energy evaluation, which in some instances is O(r⁸). The ADAPT-VQE is difficult to analyze in terms of the number of parameters required, but because each iteration involves an evaluation of the entire pool (which can range from a minimal set to a full two-body pool), in the standard Fermionic application the total number of gradient evaluations is strictly worse than for the CQE. Another element that can be overlooked is that the ACSE is not necessarily equivalent to the CSE, except when all higher-order excitations are included. Despite this limitation, in the exponential form of the ACSE, higher-order excitations emerge naturally through products of exponential two-body operations.
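The scaling statements above can be made concrete with back-of-the-envelope numbers; the values of r (orbitals) and l (stored update vectors) below are illustrative choices, not values from the paper:

```python
# Back-of-the-envelope resource counts for the scalings quoted above;
# r (number of orbitals) and l (stored update vectors) are hypothetical.
r, l = 20, 10
storage   = l * r**4   # O(l r^4) elements stored by the CQE (before sparsity)
acse_cost = r**6       # worst-case 3-RDM measurement for the ACSE residual
vqe_cost  = r**8       # VQE gradient cost can reach O(r^8) in some instances
print(storage, acse_cost, vqe_cost)
assert storage < acse_cost < vqe_cost
```

Even before exploiting sparsity, the O(lr⁴) storage term is dominated by the per-iteration measurement costs, which is why the gradient-evaluation scaling, not memory, drives the comparison with the VQE.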
In addition, by considering information on the curvature of the space beyond the gradient, we can also include contributions from triple and higher excitations in our selection of operators to propagate the wave function. In practice, the use of the CQE for solving the ACSE leads to a highly accurate solution. The primary drawback of the CQE compared to an algorithm such as the ADAPT-VQE is the large number of CNOT gates. Even with low-error CNOT gates, efficient and noise-robust means of obtaining accurate gradients and 2-RDMs will be necessary. All of these factors affect the success or failure of the underlying optimization algorithm, and so exploring noise-tolerant approaches will also be critical for near-term applications.

Conclusion

In this work, we address the convergence of the contracted quantum eigensolver using tools from optimization theory. By using methods beyond traditional gradient descent, we achieve superlinear convergence, allowing us to propagate the wave function rapidly toward a solution of the ACSE. Practical implementations in which the search direction is modified to conserve quantum resources show promising reductions in the cost of the algorithm, and we expect further simplification schemes aimed at improving the efficiency of the CQE approach. In addition, the present work provides a basis for understanding approaches that use the ACSE in pool selection and could lead to further hybrid optimization schemes for use in NISQ applications.