Scott E Smart1, David A Mazziotti1. 1. Department of Chemistry and The James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States.
Abstract
A contracted quantum eigensolver (CQE) finds a solution to the many-electron Schrödinger equation by solving its integration (or contraction) to the two-electron space─a contracted Schrödinger equation (CSE)─on a quantum computer. When applied to the anti-Hermitian part of the CSE (ACSE), the CQE iterations optimize the wave function, with respect to a general product ansatz of two-body exponential unitary transformations that can exactly solve the Schrödinger equation. In this work, we accelerate the convergence of the CQE and its wave function ansatz via tools from classical optimization theory. By treating the CQE algorithm as an optimization in a local parameter space, we can apply quasi-second-order optimization techniques, such as quasi-Newton approaches or nonlinear conjugate gradient approaches. Practically, these algorithms result in superlinear convergence of the wave function to a solution of the ACSE. Convergence acceleration is important because it can both minimize the accumulation of noise on near-term intermediate-scale quantum (NISQ) computers and achieve highly accurate solutions on future fault-tolerant quantum devices. We demonstrate the algorithm, as well as some heuristic implementations relevant for cost-reduction considerations, comparisons with other common methods such as variational quantum eigensolvers, and a Fermionic-encoding-free form of the CQE.
A contracted quantum eigensolver (CQE) finds a solution to the many-electron Schrödinger equation by solving its integration (or contraction) to the two-electron space─a contracted Schrödinger equation (CSE)─on a quantum computer. When applied to the anti-Hermitian part of the CSE (ACSE), the CQE iterations optimize the wave function, with respect to a general product ansatz of two-body exponential unitary transformations that can exactly solve the Schrödinger equation. In this work, we accelerate the convergence of the CQE and its wave function ansatz via tools from classical optimization theory. By treating the CQE algorithm as an optimization in a local parameter space, we can apply quasi-second-order optimization techniques, such as quasi-Newton approaches or nonlinear conjugate gradient approaches. Practically, these algorithms result in superlinear convergence of the wave function to a solution of the ACSE. Convergence acceleration is important because it can both minimize the accumulation of noise on near-term intermediate-scale quantum (NISQ) computers and achieve highly accurate solutions on future fault-tolerant quantum devices. We demonstrate the algorithm, as well as some heuristic implementations relevant for cost-reduction considerations, comparisons with other common methods such as variational quantum eigensolvers, and a Fermionic-encoding-free form of the CQE.
The contracted Schrödinger
equation (CSE)[1−3] describes the
projection of the molecular Schrödinger equation for a N-electron system onto a two-electron space, which generates
the stationary-state condition of the two-electron reduced density
matrix (2-RDM), instead of the wave function. Satisfaction of the
anti-Hermitian part of the contacted Schrödinger equation (ACSE)
by a quantum state is equivalent to its invariance, with respect to
all infinitesimal two-body unitary transformations.[4−6] Solution of
the ACSE for the 2-RDM was initially challenging because the equation
is dependent on both the 2-RDM and three-electron RDM (3-RDM), making
it indeterminate without additional information. However, the ACSE
has been practically solved by reconstructing the 3-RDM by its cumulant
expansion as a functional of the 2-RDM.[4,5,7,8] While the ACSE does
not strictly imply the CSE,[9] the accuracy
of its solution has been observed to approach that of full configuration
interaction in the absence of reconstruction approximations.[4,10−12] The ACSE has been applied to computing strongly correlated
ground and excited states in both chemical reactions and conical intersections.[12−15]Recently, the ACSE has been solved on quantum devices with
applications
to hydrogen chains, as well as the benzyne isomers.[10,16,17] On a quantum computer, the ACSE algorithm,
known as a contracted quantum eigensolver (CQE), iteratively minimizes
the residual of the ACSE, in contrast to the variational quantum eigensolvers
(VQE) that minimize the energy, with respect to parameters according
to the Rayleigh-Ritz variational principle. Instead of propagating
only the 2-RDM as in the classical algorithm, we propagate a wave
function through state preparation on a quantum computer. Thus, we
avoid reconstructed RDMs, as the 2-RDM can be directly measured from
the quantum state while the ACSE residual can be directly measured
from an auxiliary quantum state, or a measured 3-RDM. The resulting
ACSE algorithm is a potentially exact RDM approach that scales polynomially
in the size of the molecular system.In this paper, we accelerate
the convergence of the CQE for the
ACSE by developing quasi-second-order algorithms with superlinear
convergence. Convergence acceleration is important for avoiding the
accumulation of noise on near-term intermediate-scale quantum (NISQ)
computers, as well as achieving highly accurate solutions on future
fault-tolerant quantum devices. We draw upon research on optimization
algorithms on manifolds,[18−20] which have applications across
science and engineering from vision to robotics, as well as related
algorithms in electronic structure for orbital optimization.[21,22] We specifically develop and implement a quasi-Newton scheme with
the Broyden–Fletcher–Goldfarb–Shanno update and
nonlinear conjugate gradient algorithms. The quasi-second-order algorithms
avoid storage of the Hessian matrix while providing superlinear convergence.
We also demonstrate convergence properties as well as some approximate
implementations of the search direction and finally compare the resulting
CQE algorithms with classes of common quantum algorithms, including
the variational quantum eigensolver.
THEORY
We discuss the solution of the ACSE via CQE in section , the local parametrization
of the wave function in section , the quasi-second-order accelerations in section , and resource
optimization in section .
Solution of the ACSE
Given a molecular
system with Hamiltonian ,
we can write a contraction of the Schrödinger
equation onto the two-particle space, known as the contracted Schrödinger
equation (CSE).[1,3]The CSE
can be split into a Hermitian
and an anti-Hermitian part, the latter of which is called the anti-Hermitian
CSE, or ACSE:[4,5]Here, is the residual of the ACSE, which is necessarily
zero when |Ψ⟩ is an eigenstate of the wave function.
We can also obtain the ACSE in eq by considering unitary transformations generated by a parameter ϵ
and a
two-body anti-Hermitian operator :The derivative of the energy, with
respect to the elements of the
operator 2P, yieldswhich we observe is equal
to the residual of the ACSE. In the solution of the ACSE,[5] the energy and 2-RDM can be expressed as a system
of differential equations, in terms of a discretized, timelike parameter
λ that controls the transformation of the implicit wave function
to minimize the energy. As λ increases, we approach a solution
of the ACSE. On a quantum computer,[10] we
have a potentially exponential advantage, in terms of simulating the
exact 2-RDM (or 3-RDM) and 2A matrices.
Local Parameterization of the Contracted
Quantum Eigensolver
We can describe the generic problem in
the variational quantum eigensolver[23−26] for finding the ground-state
wave function of a quantum system aswhere represents a vector of ν
real parameters
and we assume the wave function is properly normalized. Upon convergence,
the following equation is satisfied for all k,indicating that the gradient, with
respect
to all parameters, vanishes. However, there can be significant problems
associated with describing the appropriate parametrization of Ψ.
The exact solution of the wave function scales exponentially, which
might imply that an exponential number of parameters is necessary
in a variational scheme. Using an operator such as unitary coupled
cluster provides an exponential ansatz, but because the mapping from
the Euclidean space to the unitary space is nonlinear, there can be
singularities or unphysical minima in the parameter space.[18,27,28] In addition, it has been shown
that high-dimensional parametrizations in a random variational ansatz
generate barren plateaus where the variance in the energy gradients
vanishes as the system size increases.[29,30] Because an
exponential scaling parametrization is not feasible for larger systems,
and limited excitation ansatz such as UCC singles and doubles are
not sufficiently accurate, a slew of iterative schemes based on the
VQE and UCC schemes that deviate from the traditional CC formalism
have been proposed, providing scalable approaches that generally repeat
or extend upon certain ansatz fragments.[31−33] One such approach
is the adaptive derivative assembled pseudotrotterized VQE method,
or ADAPT-VQE, which takes elements of the ACSE (or generalized UCCSD)
to generate an increasingly more-complex variational problem.[34]In the CQE approach, we instead forego
the global parametrization of the state and use an atlas of local
parametrizations,[18] describing the trajectory
of the state, with the parameter space being dependent on the contracted
eigenvalue equation (which, here, is the ACSE). The term “local”
here refers to the fact that our parameters are describing information
solely in a small neighborhood around the state at the current iteration
and do not contain parameters from previous iterations. Each local
parametrization in the atlas is concretely generated by the exponential
transformation of a two-body anti-Hermitian operator, providing a
map between the Euclidean parameter space and the space of unitary
transformations. Importantly, this compact mapping avoids oddities
or singularities that can arise from a nonlinear mapping. Thus, the
optimization is no longer defined by a fixed reference set of parameters
but rather by a local parametrization at each iteration n with the of eq :where μ denotes the dimension
of the
two-body operator space, and the norm on P indicates that we are staying within a neighborhood
around the wave function (replacing the ϵ in previous formulations).
Our optimization is then satisfied if we have a Ψ such that, for all two-body operators,which implies that we have fulfilled the ACSE.
At each iteration n, our current state is mapped
to a new state through the exponential mapping. On the classical computer,
the wave function is replaced by RDMs, and the transformation requires
an approximate reconstruction of the 3-RDM, in terms of the 2-RDM.
On a quantum computer, the transformation can be realized through
standard means of exponential computations, where we transform the
Fermionic operator to the Pauli basis and implement the exponential,
in terms of elementary gates through its trotterized form.While
it might be thought that the restriction of to a set of two-body operators
is too restrictive,
as clearly the two-body operator space does not locally parametrize
the unitary group, the current approach iteratively constructs higher-order
excitations from the reference wave function[5,9,35] (see section II.E. of ref (5) for a discussion of the
ACSE ansatz). As we discuss in the next section, because the ACSE
is solved iteratively, we can construct quasi-second-order algorithms
if we choose each in
the ansatz by considering not only the
gradient of the current iteration but also the gradients of previous
iterations, which contain information about the curvature.
Quasi-Second-Order CQE Algorithms
Previous algorithms
for solving the ACSE use path-following[5] or descent[8] algorithms
based on the gradient. While these algorithms are robust to reconstruction
errors, gradient-descent algorithms are first-order algorithms with
generally linear convergence. To accelerate convergence, we can consider
choosing the search direction 2P by a second-order approach such as the Newton–Raphson
method:where H is the Hessian
matrix. Within a certain region of the state
space, we are guaranteed quadratic convergence.[36] However, the elements of the Hessian are evaluated according
towhich requires the 4-RDM
or its approximation.To address this issue, we consider the
BFGS quasi-Newton method, which uses the Broyden–Fletcher–Goldfarb–Shanno
(BFGS) update within Davidon’s method[37,38] and is summarized in Table . At each step of the BFGS method, we update an approximate
Hessian matrix through a secant equation where the update is designed
to keep the Hessian positive definite. By including a direction based
on the approximate Hessian, the BFGS method achieves a superlinear
rate of convergence near the solution.
Table 1
Quasi-Newton
CQE Algorithm
Quasi-Newton CQE
Set 0 → n
Initialize |Ψ0⟩, 2A0, and B0
Continue until ∥2An∥ < δ:
Step 1: Update 2Pn = Bn–12An
Step 2: minαnEn(αnP̂)
◦ |Ψn+1⟩ = eαnP̂n|Ψn⟩
◦ sn = αn2Pn
Step 3: Evaluate 2An+1
◦ yn = 2An+1 – 2An
Step 4: Calculate Bn+1–1
Step 5:n + 1
→ n
At any particular iteration, given a wave function
|Ψ⟩, the ACSE residual , and the inverse of the
approximate Hessian B, we define the step direction
asWe next minimize the energy
by a line search to obtain a direction
α that satisfies the conditions
of sufficient descent and curvature (Wolfe conditions). In our local
frame, we have the following auxiliary BFGS functions: s = and y = . Using these, we calculate , according
to the BFGS formula:We then increase n and continue until the gradient
norm ∥2A∥ satisfies a convergence
threshold. Step 4 in Table can be replaced with a suitable update replacement, and,
in the results, we demonstrate the use of a limited-memory BFGS implementation,
denoted l-BFGS.[38]In practice, because of the symmetry of 2-RDM elements, we can
store the 2A matrix as a vector in a compact
representation, and then store the B–1 matrix exactly. For larger systems, it is likely that even this
would be prohibitive, and instead the limited-memory approach would
be necessary, where we store only k previous steps,
which is equivalent to k 2-RDMs. On a classical device,
the algorithm would be very similar, although instead of updating
the wave function, we would have a 2-RDM update step. Both the 2-RDM
and 2A updates would require the classical
evaluation of the ACSE and hence, a reconstructed 3-RDM.On
a quantum computer, we have significant advantages in that we
are not reconstructing the 3-RDM. This means that we generate (up
to statistical and noise-related errors) pure 2-RDMs at each step,
and we do not have to consider the N-representability
of the 2-RDM, or the step size in the solution of the differential
equations. An important note is that if we want to evaluate the ACSE’s
residual in the classical part of the algorithm rather than in the
quantum part by tomography of an auxiliary state, we must be sure
that the residual is sufficiently accurate to estimate the curvature.
While an approximated 3-RDM can give good enough information to obtain
chemical accuracy in many instances, for rigorous convergence, the
cumulant portion of the 3-RDM should be measured by tomography. Practically,
this would entail alternating evaluations of the 3-RDM (for the ACSE)
and the 2-RDM (for energy evaluations).As an alternative to
the quasi-Newton approaches, we can instead
use the nonlinear conjugate gradient (CG) approaches.[38] The nonlinear CG method does not require the storage of
a Hessian or approximate Hessian, and instead involves only a simple
update step governing the contribution of the previous search direction . A description of the
generic CQE algorithm
with a CG solver is described in Table . There are numerous modifications to the conjugate
gradient method, which we do not explore here.[39] Such modifications include preconditioning schemes and
modified update coefficients, as well as additional criteria on resets
and step lengths.
Table 2
Conjugate Gradient CQE Algorithm
Conjugate Gradient CQE
Set 0 ← n
Initialize |Ψ0⟩, 2A0
While ∥2An∥ > δ:
Step 2: minαnEn(αnP̂)
Step 3: Evaluate 2An+1
Step 4: Calculate
βn+1
Step 5:2Pn+1 = −2An+1 + βn+12Pn
Step 6:n + 1 → n
Resource-Optimized CQE
Search Directions
In light of the current representation
of the CQE algorithm, one
can see that the resources that are demanded on the quantum computer
will be heavily dependent on the selection and implementation on the
search direction, as each term must be implemented individually. However,
any modification to the search direction will be detrimental to the
rate of convergence (and can potentially also negate the theoretical
results). Thus, we would like to find a tradeoff between potentially
reducing the number of terms and preserving a descent direction.[38]Therefore, we focus on two approximations
that still preserve the essential nature of the ideal CQE approach: p-depth and operator sparsification. First, we introduce
a more formal way of describing the ACSE ansatz at a given iteration n. Let be an ordered set of
anti-Hermitian two-body
operators:In the exact CQE approach, each iteration adds
a new two-body operator
to the set, and these collectively are the stored ansatz vectors.
Using this, we can write the ACSE ansatz at the Nth iteration aswhere m is the total
number
of two-body exponential operators we are implementing. We define the p-depth as follows. Given a search direction , we iterate over the
elements and the set
of operators from l = m – p to l = m. If an element was included
in a previous operator, we
update that two-body operator asFrom the definition, the element
will only be in one of the p-previous operators.
If elements cannot be assigned to
a previous , a new operator is appended. The ACSE only
provides information on the gradient around the exterior of the wave
function, and because, generally, each iteration does not commute
with previous iterations for p > 0, this can be
considered
an approximate scheme of implementing the CQE. As an example, if all
terms are included in the initial , then we have a single exponential form
in the approximate linear region, and any p-depth
greater than 1 will be equivalent. Note that such a case is similar
to a generalized UCCSD ansatz with different ordering under the first-order
trotterization; however, the method of updating does not reflect the
true gradient terms and, as such, provides an approximation of the
operator in the linear region.The second approach is an operator
sparsification scheme, which
effectively reduces the number of new terms appended at each step.
The method of ordering is of particular importance, and we investigate
two options. First, we can sort the elements according to their absolute
value, which is important for implementation purposes (i.e., we can
remove the smallest elements first). Alternatively, we can sort elements
according to the energy contribution in the descent direction. This
value is obtained element-wise as the product . For gradient-descent approaches, these
two criteria are equivalent, and, in previous work, only the former
approach was used.We can also control the number of terms that
are removed through
a constant c ∈ [0, 1]. This is a constant
scaling factor where c = 1 (strict truncation) indicates
only the largest term in is
included and c = 0
(no truncation) indicates that all terms in are included. Another potentially
useful
control instead of a constant scaling factor (which we do not explore
here) would be to limit explicitly the number of terms included in
each term. The last approximation is a boolean INCLUDE = {True, False}
option, whereby elements of the search direction that would be assigned
to the previous p-operators due to the p-depth specified are included in the sparsification scheme. The controls
here can be likened to techniques used in trotterization or compact
forms of the two-body operator, with a goal of efficiently implementing
the two-body ansatz.[40−42] They can be applied broadly to any optimizer instance,
and could have different effects with different optimizers.
APPLICATIONS AND RESULTS
In this section, we look at
applications of the optimized CQE scheme.
First, we investigate the role of different optimizers. Second, We
present discuss schemes related to practical implementations and modifications
to the search direction. Third, we show results for linear hydrogen
chains and lithium hydride with a CQE utilizing the unencoded ACSE,
and finally, we compare our results to variational quantum eigensolvers,
including the ADAPT-VQE algorithm.
Implementation of Optimized
CQE
We begin by comparing the exact (up to a first-order
trotterization)
implementations of different methods for the H4 system
at different bond lengths, corresponding with differing degrees of
electron correlation. In this work, our convergence criteria is typically
taken to be the Frobenius norm of the 2A matrix. For the conjugate gradient approach, we do not utilize any
preconditioning, and we use the update strategies of Fletcher and
Reeves.[38] The limited BFGS strategy utilizes
three previously stored steps. Figure displays our results. Further computational details
are included in the Appendix.
Figure 1
Comparison of methods
for generating at
different H–H distances for linear
H4. Near equilibrium, similar patterns can be seen, but
away from equilibrium, gradient-descent-based approaches slow down
considerably. In particular, when optimizing the 1-d step size, according
to a quadratic fit, an oscillatory pattern can be seen in the norm
of 2A, indicating a potential valley in
a direction between the oscillating gradients. The conjugate gradient
approach allows for slightly more flexibility but exhibits stronger
oscillations in the gradient. The quasi-Newton approaches offer quick
convergence across all regions of dissociation.
Comparison of methods
for generating at
different H–H distances for linear
H4. Near equilibrium, similar patterns can be seen, but
away from equilibrium, gradient-descent-based approaches slow down
considerably. In particular, when optimizing the 1-d step size, according
to a quadratic fit, an oscillatory pattern can be seen in the norm
of 2A, indicating a potential valley in
a direction between the oscillating gradients. The conjugate gradient
approach allows for slightly more flexibility but exhibits stronger
oscillations in the gradient. The quasi-Newton approaches offer quick
convergence across all regions of dissociation.While for the bond distance near equilibrium D =
1 Å, the different optimizers in the ACSE appear to have
no apparent advantage, the more correlated distances show strong deviations
between the different approaches. Particularly, in the 1.5 Å
case, the convergence flattens when gradient-descent-based approaches
are used. For the 2.0 Å case, this is accentuated, as we appear
to have entered a region where the gradient is quite shallow. The
quadratic step appears to be even slightly worse in optimizing the
norm of the 2A matrix than a simple gradient
descent. The conjugate gradient approach appears to be better but
shows strong oscillations in the gradient norm (note if we loosen
the update parameter, these oscillations decrease, but we do not observe
a significant increase in convergence). The most successful approaches
are the quasi-Newton methods, which are able to achieve high accuracy
results in only a few iterations. The l-BFGS offers
a reliable approach as well, with a quality between the conjugate
gradient and full BFGS methods.While this approach is the closest
to the ideal implementation
and offers some advantages for a classical quantum simulation (i.e.,
similar to the classical ACSE, except instead of the 2-RDM, we only
need to store the state vector), for near-term applications, there
are several constraints. In a noiseless regime (note that the importance
of noise is also relevant, but, since this also changes the optimization
strategy, it is not addressed directly here), addressing the compactness
or efficiency is an important problem. In particular, we would however
like to know if we can reduce the amount of terms that are added at
each iteration, using the ideas mentioned above. These result in an
appproximate search direction, which we constrain to represent a descent
direction.We use the 1.5 Å case, which contains nontrivial
electron
correlation and starts to differentiate between different optimizers,
to look at ways that we can modify the search direction. Table explores the number
of iterations and CNOT gate cost for a variety of options with the
BFGS optimizer. In particular, we examine the absolute norm or energy
contribution for the sparsity operator acting on the search direction,
as well as the inclusion or exclusion of terms that appear in the p-depth addition scheme (specified by include). For each
of these criteria, we look at different p-depths
and values of the sparse scheme.
Table 3
Reported Number of
Iterations and
CNOT Gates (in brackets, × 103) for Approximate Implementation
Schemes for the CQE with BFGS Optimizer with Linear H4 at
a Distance of 1.5 Åa
|Pnij;kl|
(AnPn)klij
INCLUDE
= False
INCLUDE
= True
INCLUDE
= False
INCLUDE
= True
sparse[c]
p-depth
p-depth
p-depth
p-depth
9
7
5
3
1
9
7
5
3
1
9
7
5
3
1
9
7
5
3
1
0.9
*
*
*
*
*
76
73
*
65
117
×
×
283
×
×
47
48
60
80
107
[1.8]
[2.5]
[4.0]
[8.9]
[18]
[1.6]
[2.2]
[4.0]
[4.0]
[8.1]
0.5
*
*
*
*
*
25
25
42
*
*
71
139
123
119
75
26
35
35
39
63
[0.90]
[0.90]
[2.6]
[0.90]
[5.4]
[6.6]
[7.6]
[13]
[0.91]
[1.9]
[2.5]
[3.3]
[7.6]
0.25
26
28
28
27
31
24
24
24
28
29
63
67
70
81
76
26
26
30
36
46
[1.6]
[3.4]
[4.6]
[5.8]
[11]
[1.3]
[1.3]
[2.2]
[6.5]
[11]
[0.93]
[0.93]
[3.7]
[6.1]
[15]
[1.2]
[1.2]
[3.3]
[5.1]
[8.8]
0.125
22
22
22
23
24
21
21
22
21
24
47
56
74
44
54
24
26
26
29
38
[1.5]
[1.5]
[1.5]
[7.0]
[13]
[1.5]
[1.5]
[5.6]
[6.5]
[13]
[1.4]
[3.9]
[6.3]
[7.8]
[16]
[1.5]
[3.2]
[3.7]
[5.5]
[8.8]
We varied the
inclusion of terms
in the selection of P (sorted by either |P|, or
by the descent condition), the manner of truncation of P, as well as varying the sparsity and the p-depth
for each condition. The asterisk symbol (*) represents a run that
did not converge, and the cross symbol (×) indicates runs that
reached the maximum iterations (300).
We varied the
inclusion of terms
in the selection of P (sorted by either |P|, or
by the descent condition), the manner of truncation of P, as well as varying the sparsity and the p-depth
for each condition. The asterisk symbol (*) represents a run that
did not converge, and the cross symbol (×) indicates runs that
reached the maximum iterations (300).Many interesting trends emerge. First, there is a
difference in
application of the sparsification operator acting on elements according
to their energy contribution or absolute value. Namely, when using
large c for the absolute value, problems in the optimization
can occur. These are instances where the search direction has little
overlap with the gradient, and the largest term selected is ordered
in such a way that it is not strictly increasing. However, the descent
condition is able to converge across every configuration, albeit at
different rates of convergence. This also leads to a stratification
in the rates of convergence for the descent condition, which can be
seen in Figure . While
the absolute value condition leads to accelerated convergence in almost
all instances, it does seem more sensitive to using restricted operators.
Of course, the inclusion of previous terms does seem to alleviate
this problem, and because it seems to occur when the search direction
is nearly orthogonal to the gradient, resetting the optimization might
allow the optimization to continue. It is also possible that the order
of magnitude for the descent condition should be lower than the absolute
value condition, although this could vary significantly based on the
system.
Figure 2
Low threshold (∥2A∥ =
10–5) results for varying values of sparsity with
different p-depths and acceptance criteria, utilizing
the BFGS optimizer for molecular H4 at the intermediate
distance (D = 1.5 Å), with the descent condition.
In the top two plots, we use ACCEPT = True, where elements below the
sparsity threshold are automatically included if they are in included
in one of the previous p-terms. The bottom plot has
ACCEPT = False. Only the first 125 iterations are shown.
Low threshold (∥2A∥ =
10–5) results for varying values of sparsity with
different p-depths and acceptance criteria, utilizing
the BFGS optimizer for molecular H4 at the intermediate
distance (D = 1.5 Å), with the descent condition.
In the top two plots, we use ACCEPT = True, where elements below the
sparsity threshold are automatically included if they are in included
in one of the previous p-terms. The bottom plot has
ACCEPT = False. Only the first 125 iterations are shown.Second, the INCLUDE option has a strong impact on the rate
of convergence.
For both selection criteria, inclusion of previous terms clearly helps
in assisting the overall convergence. As this can be considered as
a way of increasing the pool of operators at each step, depending
on both c and the p-depth, the advantage
here makes sense. Third, the p-depth appears to have
a 2-fold role. More generally, it serves to reduce the total number
of terms needed in the ansatz. That is, as the p-depth
increases, the number of total terms in the ansatz decreases. In addition,
for a given c, we do see numerous instances where
the total number of iterations decreases as the p-depth increases when INCLUDE = True. When INCLUDE = False, the trends
are somewhat unclear, and the optimization has a tendency to be more
sensitive. Interestingly, for c = 0.5, 0.25, and
0.125, with INCLUDE = False, the total iterations appear to increase
as the p-depth increases, and then decrease. More
sparse (see c = 0.9, 0.5) truncations result in slower
convergence, which can be aided with the INCLUDE option, but not completely
mitigated.While these results are obviously not generalizable
to every system,
it is likely that some of these trends can be seen elsewhere. We expect
that the p-depth can lead to lower circuit depths.
In addition, while sparser operators are desirable from a NISQ perspective,
optimization, with respect to a single parameter, is clearly detrimental
to the rate of convergence. This can be mitigated through expanding
the pool directly with more terms and the sparsification operator,
or indirectly through the p-depth.
Encoding-Free CQE
The encoding-free
(or unencoded CQE) approaches for preparing states as an alternative
to Fermionic state preparation have recently been explored,[33,43−45] within the VQE framework as well as in attempting
to understand the success of heuristic and nonfermionic ansatz preparation.
Recent works by the present authors showed that the Fermionic 2-RDM
can be functionalized from a qubit-particle wave function.[46] As long as Fermionic tomography is performed
on a N-qubit particle state, the 2-RDM represents
a valid Fermionic 2-RDM. In addition, in recent work, we introduce
an encoding-free CQE algorithm that evaluates the anti-Hermitian component
of the two-qubit-particle contraction onto the Schrödinger
equation. Table displays
calculations for several bond distances of H5 for both
the encoded and unencoded CQE cases with the BFGS algorithm as an
example system.
Table 4
Comparisons of Total Iterations, Total
CNOT Gate Count, and Average CNOT Gates Per Iteration for Unencoded
and Encoded CQE, Using the BFGS Optimizer for H5 at Various
Bond Distances from Equilibrium in the Minimal STO Basisa
D – Deq (Å)
iterations (CQE,UCQE)
total CNOT × 104 (CQE,UCQE)
⟨CNOTk⟩
× 102 (CQE,UCQE)
–0.25
26, 33
1.7, 1.5
6.4, 4.5
+0.00
29, 36
2.1, 1.8
7.4, 5.0
+0.25
47, 49
3.8, 2.7
8.2, 5.5
+0.50
40, 34
4.3, 2.6
11, 7.8
+0.75
39, 28
5.7, 2.8
15, 9.9
+1.00
42, 43
6.9, 4.4
16, 10.
+1.25
47, 80
8.0, 7.4
17, 9.3
The accuracy of both approaches
is largely similar across the dissociation curve, although, for +1.00
and +1.25 distance separation, the unencoded CQE requires more iterations.
The accuracy of both approaches
is largely similar across the dissociation curve, although, for +1.00
and +1.25 distance separation, the unencoded CQE requires more iterations.The unencoded CQE under optimization
matches the Fermionic case
in most instances, and it consistently has a smaller average number
of CNOT gates per iteration. For the two longest separation lengths,
the number of iterations required does increase, leading to similar
CNOT counts for the total ansatz. A future goal would be to incorporate
compilation schemes or adjust the set of ACSE or unencoded ACSE excitations
to favor a largely commuting pool.
Comparison
with VQE
Finally, we
compare the CQE approach utilizing a BFGS optimization with other
known quantum algorithms. While in previous work,[16] similarities between iterative nature of the ACSE and ADAPT-VQE
were discussed, here, we provide example calculations of VQE, ADAPT-VQE,
and the ACSE that demonstrate fundamental differences in these algorithms.
These are included in Figure , as well as in Table .
Figure 3
Simulations of molecular H4 and H6 for separations
of 1 Å, and 2 Å utilizing the CQE, VQE, and ADAPT-VQE with
a BFGS optimizer. For H4 a minimal ansatz is reached in
the ADAPT through the VQE subroutine, and for H6 the CQE
algorithm slightly outperforms the ADAPT. VQE, here using the UCCSD
anastz, provides rapid convergence for near equilibrium states, but
has significant errors at larger separations, which are overcome by
the iterative ADAPT-VQE and CQE algorithms.
Table 5
Comparison of Gradient Element (VQE)
and Residual Element (ACSE) Evaluations for the CQE, VQE, and ADAPT-VQE
Methods, Corresponding to Simulations for Molecular H4 and
H6 at Separations of 1 and 2 Åa
method
quantity
H4, 1 Å
H4, 2 Å
H6, 1 Å
H6, 2 Å
CQE
Iterations
13
25
46
60
Residuals
1950
3750
38640
50400
VQE
Iterations
7
19
9
22
Gradients
182
494
1053
2574
ADAPT-VQE
Macro
10
12
51
84
Micro
71
167
1221
5854
Gradients
362
1226
43112
354988
Residuals
660
792
16830
27720
Here, note tha the ADAPT-VQE
has a symmetry-adapted pool of operators, which are not implemented
here in the CQE or VQE approaches. The VQE tolerance is also taken
to be quite low, i.e., 10–3 in the norm of the parameter
vector (whereas the VQE subroutine in the ADAPT procedure is generally
lower).
Simulations of molecular H4 and H6 for separations
of 1 Å, and 2 Å utilizing the CQE, VQE, and ADAPT-VQE with
a BFGS optimizer. For H4 a minimal ansatz is reached in
the ADAPT through the VQE subroutine, and for H6 the CQE
algorithm slightly outperforms the ADAPT. VQE, here using the UCCSD
anastz, provides rapid convergence for near equilibrium states, but
has significant errors at larger separations, which are overcome by
the iterative ADAPT-VQE and CQE algorithms.Here, note tha the ADAPT-VQE
has a symmetry-adapted pool of operators, which are not implemented
here in the CQE or VQE approaches. The VQE tolerance is also taken
to be quite low, i.e., 10–3 in the norm of the parameter
vector (whereas the VQE subroutine in the ADAPT procedure is generally
lower).While the VQE results
in Figure are not
that surprising, based on the use of the unitary
coupled cluster ansatz, we still can see some interesting comparisons.
For equilibrium distances, UCCSD provides a good ansatz, and there
are numerous methods exploring the UCC ansatz.[31,47] By comparison, with ADAPT, we are able to obtain seemingly arbitrary
convergence, matching previous work. However, note that the iterative
cost of the ADAPT is much higher than either VQE or the CQE. The CQE,
on the other hand, performs quite well in a variety of instances,
with the most challenging case being dissociated H6, where
higher-order excitations dominate and the system is strongly correlated.
While the number of macro iterations for the ADAPT procedure might
look only slightly worse than the CQE or VQE approaches, when taken
into account with the VQE cost, (i.e., micro + macro iterations),
the length of the ADAPT procedure is somewhat unwieldy, due to the
VQE subroutine. The total number of gradient and residual evaluations
for each of these instances is seen in Table .We can also look at qubit implementation
of the ADAPT scheme, which
here follows the qubit-particle excitation-based scheme of Yordanov
et al.,[45] and not the quasi-particle approach
taken by the original authors. We find a similar result to previous
work, presented in Figure . Namely, for stretched LiH, conserving the particle number
and projected spin leads to similar qubit-based excitations. The main
difference seen (as a result of the relative scale mostly) in these
results is for the CNOT cost of the unencoded CQE approach, although
a similar decrease in the CNOT cost of the IQEB approach exists as
well. While the approaches appear to be inversely related in the rate
of convergence (through total iterations), and the CNOT count (where
the CQE schemes are more costly), the number of parameter evaluations
differs substantially. While here we do not distinguish between the
residuals of the ACSE as parameters and the VQE parameters, since
these can be obtained many different ways (for instance, the ACSE
residuals can be taken from either the 2-RDM with a quantum solver
or the 3-RDM on the quantum computer, and numerous methods of measuring
VQE gradients exist as well), for larger system limits on the number
of parameters should be considered.
Figure 4
Simulations
of molecular lithium hydride at a separation of 2 Å
utilizing CQE, VQE, Fermionic ADAPT-VQE and the iterative qubit-excitation
based ADAPT-VQE (IQEB), essentially a unencoded APAPT-VQE algorithm.
The upper-left shows the macro iterations of each scheme, which are
nearly identical for the encoded and unencoded forms, and are only
slightly slower for the ADAPT. The upper-right shows the total iterations,
including micro iterations in the VQE procedure with a threshold of
10–3 in the parameter vector. The lower-left compares
the energy convergence with the ansatz cost, showing that the ADAPT
does produce a more compact representation of the ansatz. The lower-right
shows the number of parameter evaluations (a lower bound on the number
of gradients evaluated at each optimizer step), which is linear scaling
with the ACSE procedure but for ADAPT becomes quadratic, with respect
to the number of iterations.
Despite achieving quicker
convergence, from all of these examples,
we see that the main drawback to the CQE algorithm is the iteratively
increasing CNOT cost, which is due to the use of additional gates
at each step. While this should be reasonable for high performing
quantum devices, for near-term devices, further reduction of the CNOT
count is important. However, the gain in performance we see by performing
a quasi-second order optimization in the local parameter space is
quite significant. In addition, when compared to CNOT gates of the
gradient-descent-based methods (as in Figure ), the optimized-ACSE allows for more flexibility
in constructing the compact ansatz.
Discussion
While generic algorithms have been known for approaching the problem
of quantum simulation for awhile, calculations involving increasingly
complex systems have only recently begun to emerge. These require
the advancement and development of new quantum algorithms, similar
to the past century of classical quantum chemistry algorithms. The
CQE offers an approach that is potentially applicable in the near-term,
and provides a strong alternative to VQE.The present work improves
the convergence of the CQE method for
quantum simulation by introducing a quasi-second-order, local parametrization
of the state space. While existing algorithms for solving the ACSE
employ a local, first-order parametrization in that the state is iteratively
updated by a series of gradient-based, unitary transformations,[4,5,8] here we present a local, quasi-second-order
parametrization in which the state is iteratively updated by a series
of unitary transformations that approximately incorporate energy-curvature
information. Generally, this allows for more rapid convergence in
the vicinity of the solution as well as more flexibility in the construction
of CQE algorithms. The current work focusing on the BFGS optimizer[37,38] is an example of potential applications, and also is comparable
to published work with other methods, but other optimizers and techniques
can be readily implemented.The largest limitation when translating
this work to NISQ devices
is the presence of noise. Both sampling and device errors can negatively
impact the state. These are present as errors in pure-state N-representability, since the state no longer represents
a pure quantum state, or in misapplications of the target operator.
Device errors are especially challenging since they corrupt the gate
sequences underlying quantum simulation algorithms. Both additional
sampling and noisy optimization techniques can be employed to mitigate
the effects of noise. Because the CQE is iterative, it can also, in
principle, adjust for some errors generated by noise in future iterations.
Future work will further examine the effect of noise in the CQE algorithms.
As the fidelity of quantum devices increases, the CQE will become
more accurate and thereby, more applicable to larger molecular systems.From the discussion in section , we can see that the variational principle used in
the VQE and in the current CQE algorithm are similar in that they
solve an optimization problem, but differ in the goal of the minimization.
In the VQE, we are often trying to minimize the energy of a state
through a global parametrization of some wave function. In addition,
barren plateaus or regions where the optimization fail become likely
with an increasingly large parameter space and hence, the suitability
(i.e., over or under parametrization) of the state is often in question.
As an example, the UCCSD or generalized UCCSD ansatz, which both have
approximately O(r4) parameters,
are not sufficient to parametrize the state. Iterative constructions
of the ansatz, such as the iterative generalized singles pair-cluster
CC,[31] or the ADAPT-VQE, offer fewer parameters
initially, but no bounds on the number of required parameters. In
the ACSE algorithm, the local parametrization leads to a locally updated
optimization, which is constant, with respect to the number of parameters.
Importantly, in a VQE, the optimization converges toward a
solution of the VQE problem, which is not the ACSE. As the
VQE subproblem becomes larger and larger (i.e., in adaptive or iterative
schemes), eventually the VQE solution can (but by design will not
necessarily) satisfy the ACSE. In the CQE approach presented here,
we have convergence toward our contracted eigenvalue problem, and not a variational subproblem.Simulations
of molecular lithium hydride at a separation of 2 Å
utilizing CQE, VQE, Fermionic ADAPT-VQE and the iterative qubit-excitation
based ADAPT-VQE (IQEB), essentially a unencoded APAPT-VQE algorithm.
The upper-left shows the macro iterations of each scheme, which are
nearly identical for the encoded and unencoded forms, and are only
slightly slower for the ADAPT. The upper-right shows the total iterations,
including micro iterations in the VQE procedure with a threshold of
10–3 in the parameter vector. The lower-left compares
the energy convergence with the ansatz cost, showing that the ADAPT
does produce a more compact representation of the ansatz. The lower-right
shows the number of parameter evaluations (a lower bound on the number
of gradients evaluated at each optimizer step), which is linear scaling
with the ACSE procedure but for ADAPT becomes quadratic, with respect
to the number of iterations.From our calculations, we can
also understand some elements of
the ADAPT-VQE algorithm, since it is related to the CQE algorithm.
The ADAPT-VQE method chooses the largest ACSE residual at each macro
iteration. This leads to a flexible and efficient ansatz, that, when
not restarted, will, by construction, improve the energy in the VQE.
However, the reoptimized state is often not close to the previous
state, highlighting the strong variational nature of the ADAPT algorithm.
Restarting the VQE optimization, which has been done in some ADAPT
work, can lead to a suboptimal solution of the ACSE, or for the VQE
subroutine to fail. Recent work by Liu et al.[48] used a reconstructed 3-RDM in the ACSE to obtain approximate residuals
to seed the ADAPT algorithm. Because these approximate gradients can
differ in a substantial way from the exact gradients, more terms are
needed, which significantly increases the variational flexibility
of the ansatz. As a result, these calculations exhibited faster convergence
and required fewer iterations than traditional ADAPT-VQE.However,
unlike Liu et al.,[48] it is
important not to use ACSE residuals from reconstructed
3-RDMs in the CQE, because these will likely lead to convergence issues.
The 3-RDM can still be directly obtained on a quantum computer through
a variety of techniques. Work involving qubit-particle approaches
also shows promise, with another advantage being the increased number
of commuting terms that exists between qubit-particle excitations,
as opposed to Fermionic-particle excitations.In terms of memory
storage, the CQE primarily stores the vectors
of eq . The set of vectors
is composed of l 2-RDMs, or O(lr4) vectors
at most, although, as seen in the preceding discussion, each vector
can be taken to be quite sparse. The optimization occurs over the
dimension of , which here is O(r4) and not unreasonable for highly
accurate
electronic structure methods. This is equivalent to the number of
parameters in a UCC generalized singles and doubles variational ansatz.
However, the gradient evaluations in VQE and the CQE are quite different.
For the ACSE, the gradients are evaluated at worst case using the O(r6) 3-RDM, and which can be
measured more efficiently through circuit tomography. In contrast,
VQE gradients scale with the number of parameters and the energy evaluation,
which, in some instances, is O(r8). The ADAPT-VQE is difficult to analyze in terms of
the number of parameters required, but because each iteration involves
an evaluation of the entire pool (which can range from a minimal set
to an entire two-body pool), in the standard Fermionic application
the total number of gradient evaluations are strictly worse than the
CQE.Another element that can be overlooked is that the ACSE
is not
necessarily equivalent to the CSE, except when all higher-order excitations
are included. Despite this limitation, in the exponential form of
the ACSE, higher-order excitations can be seen to emerge naturally
through products of exponential two-body operations. In addition,
by considering information on the curvature of the space beyond the
gradient, we also should include contributions from triple and higher
excitations in our selection of operators to propagate the wave function.
In practice, the use of the CQE for solving the ACSE leads to a highly
accurate solution.The primary drawback of the CQE when compared
to an algorithm such
as the ADAPT-VQE is the large number of CNOT gates. Even with low-error
CNOT gates, efficient and noise-robust means of obtaining accurate
gradients and 2-RDMs will be necessary. All of these also affect the
success or failure of the underlying optimization algorithm, and so
exploring noise-tolerant approaches will also be critical for near-term
applications.
Conclusion
In this
work, we address the convergence of the contracted quantum
eigensolver using tools from optimization theory. By using methods
beyond traditional gradient descent, we achieve superlinear convergence,
allowing us to propagate the wave function rapidly toward a solution
of the ACSE. Practical implementations where the search direction
is modified to conserve quantum resources show promising reductions
in the cost of the algorithm, and we expect further simplification
schemes to be attempted aimed at improving the efficiency of the CQE
approach. In addition, the present work provides a basis for understanding
approaches that use the ACSE in pool selection, and could lead to
further hybrid optimization schemes for use in NISQ applications.