Sarai Dery Folkestad1, Eirik F Kjønstad1, Linda Goletto1, Henrik Koch1,2. 1. Department of Chemistry, Norwegian University of Science and Technology, N-7491, Trondheim, Norway. 2. Scuola Normale Superiore, Piazza dei Cavaleri 7, Pisa, 56126, Italy.
Abstract
We present efficient implementations of the multilevel CC2 (MLCC2) and multilevel CCSD (MLCCSD) models. As the system size increases, MLCC2 and MLCCSD exhibit the scaling of the lower-level coupled cluster model. To treat large systems, we combine MLCC2 and MLCCSD with a reduced-space approach in which the multilevel coupled cluster calculation is performed in a significantly truncated molecular orbital basis. The truncation scheme is based on the selection of an active region of the molecular system and the subsequent construction of localized Hartree-Fock orbitals. These orbitals are used in the multilevel coupled cluster calculation. The electron repulsion integrals are Cholesky decomposed using a screening protocol that guarantees accuracy in the truncated molecular orbital basis and reduces computational cost. The Cholesky factors are constructed directly in the truncated basis, ensuring low storage requirements. Systems for which Hartree-Fock is too expensive can be treated by using a multilevel Hartree-Fock reference. With the reduced-space approach, we can handle systems with more than a thousand atoms. This is demonstrated for paranitroaniline in aqueous solution.
We present efficient implementations of the multilevel CC2 (MLCC2) and multilevel CCSD (MLCCSD) models. As the system size increases, MLCC2 and MLCCSD exhibit the scaling of the lower-level coupled cluster model. To treat large systems, we combine MLCC2 and MLCCSD with a reduced-space approach in which the multilevel coupled cluster calculation is performed in a significantly truncated molecular orbital basis. The truncation scheme is based on the selection of an active region of the molecular system and the subsequent construction of localized Hartree-Fock orbitals. These orbitals are used in the multilevel coupled cluster calculation. The electron repulsion integrals are Cholesky decomposed using a screening protocol that guarantees accuracy in the truncated molecular orbital basis and reduces computational cost. The Cholesky factors are constructed directly in the truncated basis, ensuring low storage requirements. Systems for which Hartree-Fock is too expensive can be treated by using a multilevel Hartree-Fock reference. With the reduced-space approach, we can handle systems with more than a thousand atoms. This is demonstrated for paranitroaniline in aqueous solution.
The scaling properties of the coupled
cluster hierarchy of methods
severely limits the systems for which it is applicable.[1] The methods have polynomial computational scaling, , where N is a measure
of system size and n increases with accuracy of the
method. Memory and disk space requirements also increase significantly
as one moves up through the hierarchy.The development of reduced
cost and reduced scaling coupled cluster
methods has been an active topic for decades. Arguably, the most popular
approach has emerged from the work of Pulay and Sæbø.[2,3] They demonstrated that dynamical electronic correlation could be
compactly described using localized orbitals rather than canonical
orbitals; specifically, they used localized occupied molecular orbitals
(MOs), such as Boys[4] or Pipek-Mezey[5] orbitals, and projected atomic orbitals[2,3] (PAOs) to span the virtual space. Their local correlation approach
was later applied to coupled cluster theory by Hampel, Werner, and
Schütz.[6,7] Other local coupled cluster methods
include the local pair natural orbital[8,9] and the orbital-specific-virtual[10] coupled cluster methods. Whereas the success
of these local coupled cluster methods in the description of the ground
state correlation energy is indisputable, their extension to excited
states has turned out to be more complicated.[11−16]A different approach originates from the multireference coupled
cluster method of Oliphant and Adamowicz.[17−19] While introduced
to describe multireference character, the method is formulated in
the framework of single reference coupled cluster theory. An active
orbital space is used, and higher order excitation operators (e.g.,
triple or quadruple excitations) are included with some indices restricted
to the active space. Köhn and Olsen[20] recognized that the method could be used to reduce the cost for
single reference systems, and this was further demonstrated by Kállay
and Rolik.[21] The multilevel coupled cluster
(MLCC) approach, introduced by Myhre et al.,[22−24] is closely
related to this active space approach.In MLCC, the goal is
to accurately describe excitation energies
and other intensive properties, rather than extensive properties such
as correlation energies. This is done by restricting the higher order
excitation operators to excite within an active orbital space. For
example, in the multilevel CCSD (MLCCSD)[22,25] method, the double excitation operator is restricted to excite out
of active occupied orbitals and into active virtual orbitals. In this
work, we demonstrate the available computational savings of the multilevel
CC2 (MLCC2) and MLCCSD models introduced in ref (25); for sufficiently large
inactive spaces, we show that the cost is dominated by the lower-level
method. This has previously been demonstrated for multilevel CC3 (MLCC3)
by Myhre et al.[24]The scaling of
the lower-level model cannot, however, be avoided.
Therefore, in order to use these methods for large systems, they must
be combined with other multilevel or multiscale approaches. For instance,
MLCC could be used within a QM/MM[26,27] framework
or with the polarizable continnum model.[28,29] Here, we have chosen to perform MLCC calculations in a significantly
truncated MO basis. The truncation of the MO basis in coupled cluster
calculations is used routinely. For example, the frozen core approximation
falls into this category, and there are several examples of truncation
of natural orbitals, both of the virtual and occupied spaces.[30−35] The LoFEx[36,37] and CorNFLEx[38] approaches are also notable reduced space coupled cluster
approaches that target accuracy in the excited states. In these approaches,
a mixed orbital basis consisting of natural transition orbitals (NTOs)[39−41] and localized orbitals is used. The active space is expanded until
the excitation energies have converged to within a predefined threshold.
One drawback of LoFEx and CorNFLEx is that they are state specific
methods, that is, several subsequent calculations with different truncated
MO bases must be performed to obtain a set of excitation energies.
As a consequence, the calculation of transition moments between excited
states is complicated by the fact that the states are nonorthogonal
and interacting. An orbital selection procedure similar to that of
LoFEx and CorNFLEx has also been used for reduced scaling second-order
algebraic diagrammatic construction (ADC(2)) calculations by Mester
et al.[35,42]Here, we use a truncation scheme for
the MOs where semilocalized
Hartree–Fock orbitals (virtual and occupied) are constructed
and used to calculate localized intensive properties in large molecular
systems. When the region of interest is sufficiently small compared
to the full system, the number of MOs (nMO) in the coupled cluster calculation is much smaller than the number
of atomic orbitals (AOs) denoted NAO.
This reduced space approach has previously been used with standard
coupled cluster models,[43,44] and a very similar
approach has been used together with local coupled cluster models.[45] For sufficiently large systems, the cost of
Hartree–Fock can become a limiting factor. When this is the
case, we handle it by combining the reduced space MLCC approach with
a multilevel Hartree–Fock[46,47] (MLHF) reference
wave function.The MLCC2 and MLCCSD implementations are based
on Cholesky decomposed
electron repulsion integrals.[48,49] We use the two-step
Cholesky decomposition algorithm introduced in ref (50). In this algorithm, the
Cholesky basis and the Cholesky vectors are determined in two separate
steps. We have implemented a direct construction of the Cholesky vectors
in the truncated MO basis. This reduces the memory requirement of
the vectors from to , making it possible to efficiently perform
reduced space calculations on systems with several thousands of basis
functions. We use n as a measure of the size of the
active space, which does not scale with the system. It should be noted
that storage of the Cholesky vectors in the AO basis, albeit temporary,
can only be avoided in a decomposition algorithm that determines the
Cholesky basis and the Cholesky vectors in separate steps. In the
Cholesky decomposition, we also use the MO screening procedure that
was introduced in ref (50). This MO screening leads to fewer Cholesky vectors, further reducing
the memory requirement of the Cholesky vectors to .
Theory
In coupled cluster theory, the wave function is defined
aswhere |HF⟩ is the Hartree–Fock
reference, X are the cluster operator, xμ are cluster amplitudes, and τμ are the excitation operators. The standard models within the coupled
cluster hierarchy are defined by restricting X to
include the excitation operators up to a certain order. In the CCn models, such as CC2[51] and CC3,[52] the nth order excitations are
treated perturbatively.In the following, the indices α,
β, γ, ... and p, q, r, ... refer to
spatial atomic and molecular orbitals, respectively, and the indices i, j, k, ... and a, b, c, ... refer to
occupied and virtual orbitals. The total number of occupied and virtual
orbitals are denoted by N and N, respectively,
and the number of active occupied and active virtual orbitals are
denoted by na and na.The MLCC2 cluster operator is given bywhere the single excitation
operator, X1, is unrestricted, that is,
defined for all
orbitals, whereas the double excitation operator, S2, is restricted to excite within an active orbital space.
As in standard CC2, S2 is treated perturbatively.
The MLCC2 ground state equations are given bywhere Ĥ is the X1-transformed Hamiltonian and F is the Fock operator. The doubles projection space, {⟨μ2|}, is associated with S2. Except for
the restriction of S2 and the projection
space, these equations are equivalent to the standard CC2 ground state
equations. The MLCC2 equations are solved in a basis where the active-active
blocks of the occupied-occupied and virtual-virtual Fock matrices
are diagonal. In this semicanonical basis, eq can be solved analytically
for the S2 amplitudes in each iteration.
The double amplitudes are inserted into eq , which is solved with a DIIS-accelerated[53] quasi-Newton solver[54] to obtain X1. The MLCC2 equations are
formulated in terms of the Cholesky vectors in the X1-basis. See Appendix A for detailed
expressions.If we consider a fixed active space, the overall
scaling of the MLCC2 ground state equations is : the X1-transformation
of the Cholesky vectors scales as , as does the computation of the Fock matrix
in the X1-basis and the correlation energy.
The construction of Ω scales as .The MLCC2 excitation energies are determined
as the eigenvalues
of the Jacobian matrix,Here, is a double excitation included in S2. The excited state equations also assume the
same form as in standard CC2, except for the restrictions of S2, and the same strategies can therefore be
used to solve the MLCC2 equations.[51,55] The most expensive
term in the transformation by MLCC2 appears at the CCS level of theory (see Appendix A); these terms scale as and no indices are restricted to the active
space. Thus, the overall scaling is .In MLCCSD, one defines two sets of active
orbitals, where one is
a subset of the other. The cluster operator has the formwhere X1 is unrestricted, S2 is restricted
to the larger active orbital
space, and T2 is restricted to the smaller
active orbital space. The S2 operator
is treated perturbatively (as in CC2 and MLCC2) and T2 acts as a correction to S2 in the smaller active space. This framework is flexible, since it
allows for both two-level calculations (CCS/CCSD and CC2/CCSD) and
three-level calculations (CCS/CC2/CCSD). Previously, we have found
that the less expensive and significantly simpler CCS/CCSD method
performs very well.[25] In the CCS/CCSD method,
the MLCCSD cluster operator reduces toand only the active space for T2 is needed. In this work, we only consider the CCS/CCSD
method.The MLCCSD (CCS/CCSD) ground state equations arewhere
the doubles projection space, {⟨μ2|}, is associated
with T2. Equations and 9 are equivalent
to the standard CCSD equations, except for the restriction
of T2 and the projection space. The construction
of the singles part of Ω, eq , has the same cost as constructing the MLCC2 Ω
(). When the active space is fixed, the construction
of the doubles part of Ω, eq , scales as due to the calculation
of the integrals
from the Cholesky vectors; all orbital indices are restricted to the
active space.The excitation energies are obtained as the eigenvalues
of the
MLCCSD Jacobian,where is a double excitation included in T2.In addition to the terms of the transformation
by MLCC2 that enter the
transformation by MLCCSD, there are terms which
scale as , , and . Integral construction for the different
terms scales, depending on the number of restricted indices, as , , or . See Appendix A for detailed
expressions.
Multilevel CC2 and CCSD
Partitioning the Orbital Space
Selecting
the active
orbital space for a multilevel coupled cluster calculation is not
trivial. Generally, the canonical Hartree–Fock orbitals must
be transformed—through occupied-occupied and virtual-virtual
rotations—to an orbital basis that can be intuitively partitioned.
To determine the type of orbitals to use, both the targeted property
and the system must be considered. There are two main approaches to
select the active spaces. If the property of interest is adequately
described at a lower level of theory, then the information from that
lower level can be exploited to partition the orbitals. An example
is the use of correlated NTOs (CNTOs).[25,41] If the property
of interest is spatially localized, then localized or semilocalized
orbitals can be applied. For instance, Cholesky orbitals[44,56] have been used in multilevel coupled cluster calculations by Myhre
et al.[23,24,57]The
CNTOs are constructed using excitation vectors, , from a lower-level calculation. The matricesare diagonalized; the
matrices that diagonalize and are
the transformation matrices of the occupied and virtual orbitals,
respectively. From eqs and 12 it may seem that the lower level method
must include double excitation amplitudes in its parametrization.
However, CNTOs can be generated from CCS excitation vectors by constructing
approximate double excitation vectors:Here, ωCCS is the CCS excitation
energy, and ϵ = ϵ + ϵ – ϵ–ϵ, where the ϵ are orbital energies.
The integrals g̅ are defined aswhere g = ∑LL is the electronic
repulsion integrals in the MO basis and (I are elements of a rank-4 tensor). Equations and 14 were suggested
by Baudin and Kristensen[38] and are based
on CIS(D).[58] In our previous work, we have
found that the CNTOs obtained from a CCS calculation (using eqs and 14) perform well, considering accuracy and cost, compared to
CNTOs from a CC2 calculation.[25] It should
be noted, however, that these orbitals are not expected to perform
well for states dominated by double excitations with respect to the
reference.The active space is selected by considering the eigenvalues
of and : active
orbitals result from the eigenvectors corresponding to the largest
eigenvalues. In this work, we either explicitly select the number
of active occupied and active virtual orbitals (na and na) or we select na and let the number of active virtual orbitals be determined
from the total fraction of virtual to occupied orbitals; that is,Alternatively,
one can use the selection criterion
given in ref (41).
The latter approach is more suitable for production calculations;
on the other hand, eq is convenient for testing the models. Several excited states can
be considered simultaneously by diagonalizing sums of and matrices generated
from the individual excitation vectors (eqs and 12).[25]Cholesky orbitals[44,56] are obtained by a restricted
Cholesky decomposition of the Hartree–Fock densities (occupied
and virtual); the pivots of the decomposition procedure are restricted
to correspond to AOs centered on active atoms.As an alternative
to Cholesky orbitals for the virtual space, one
can use projected atomic orbitals[3] (PAOs).
To construct the PAOs, the occupied orbitals are projected out of
the AOs, {χα}, centered on the active atoms:Here, is the
orbital coefficient matrix, is the
idempotent Hartree–Fock density, and is the AO overlap matrix. The orbital coefficient matrix
for the active PAOs is therefore PAO = – , where is
rectangular and contains the columns
of which correspond to AOs on active
atomic centers. The PAOs are nonorthogonal and linearly dependent.
To remove linear dependence and orthonormalize the active virtual
orbitals, we use the Löwdin canonical orthonormalization procedure.[59] The inactive virtual orbitals are obtained in
a similar way: the occupied orbitals, as well as the active virtual
orbitals, are projected out of the AOs and the resulting orbitals
are finally orthonormalized.After the orbitals have been partitioned—regardless
of which
orbitals are used—we transform to the semicanonical MO basis
that is used in MLCC2 and MLCCSD calculations. This transformation
involves block-diagonalizing the virtual-virtual and occupied-occupied
Fock matrices such that the active-active and inactive-inactive blocks
become diagonal.
Reduced Space Multilevel Coupled Cluster
Recall that
MLCC methods exhibit the scaling of the lower-level coupled cluster
model. To overcome this limitation, we apply a reduced space approach
in which only a subregion of the molecule is described at the coupled
cluster level. The orbitals in this subregion are divided into active
and inactive sets for the MLCC calculation. The rationale behind this
approach is that localized intensive properties can be described by
using accurate and expensive methods only for the region of interest.
In particular, it is assumed that the effect of the more distant environment
is sufficiently well captured through contributions to the Fock matrix.
A few numerical results[43,44] indicate that excitation
energies can be described accurately with this frozen Hartree–Fock approach. However, a comprehensive study has not yet been published.To perform reduced space MLCC calculations, we must first choose
the region of the molecular system to be treated with MLCC. After
the Hartree–Fock calculation, localized occupied and virtual
orbitals are constructed for the active region. Any localization procedure
can be employed; however, we use Cholesky orbitals for the occupied
space and PAOs for the virtual space. This set of orbitals enters
the MLCC calculation. The remaining occupied orbitals enter the equations
through their contributions to the Fock matrix,Here, N is the number of frozen
occupied orbitals and the index I denotes a frozen
occupied orbital. The multilevel coupled
cluster calculation now has nMO ≪ NAO, but the procedure is otherwise unchanged:
the reduced set of MOs is partitioned into active and inactive sets
and the MLCC equations are solved. We write nMO (with lower case n) to indicate that the
number of MOs does not scale with the system in such calculations.A multilevel Hartree–Fock[46,47] (MLHF) reference
can also be used. As in MLCC, one first determines the active orbitals:
a set of active atoms is selected, and the active occupied orbitals
are obtained through a partial limited Cholesky decomposition of the
initial idempotent density; PAOs can be used to determine the active
virtual orbitals. Only the active orbitals are optimized in the Roothan–Hall
procedure, which is performed in the MO basis.[47] The inactive orbitals enter the optimization through an effective Fock matrix that assumes the same form as in eq . The inactive two-electron
contribution (f) is only
computed once at the beginning of the calculation and is subsequently
transformed to the updated MO basis in every iteration (for details,
see ref (46)).The reduced space MLCC approach relies on the definition of levels
of active regions of the system, see Figure . We must first select which atoms are active
in the Hartree–Fock (HF) calculation. If all atoms are active,
we have a standard HF reference. Second, we must determine which atoms
enter the MLCC calculation. Lastly, if we use Cholesky/PAOs to partition
the orbitals in the MLCC calculation, we must determine which atoms
should be treated with the higher level coupled cluster method. This
is not necessary when CNTOs are used. Note that the active atom sets
for higher level methods are contained within the active atom sets
of lower level methods (see Figure ).
Figure 1
Different levels of active atoms used in reduced space
MLCC calculations.
Left panels show active atoms configurations of reduced space MLCC
calculation with an HF reference. Right panels show active atom configurations
of reduced space MLCC calculation with an MLHF reference. The two
lower panels show the active atom configurations when Cholesky/PAOs
are used to determine the active orbitals of the MLCC calculation.
Different levels of active atoms used in reduced space
MLCC calculations.
Left panels show active atoms configurations of reduced space MLCC
calculation with an HF reference. Right panels show active atom configurations
of reduced space MLCC calculation with an MLHF reference. The two
lower panels show the active atom configurations when Cholesky/PAOs
are used to determine the active orbitals of the MLCC calculation.Since these methods rely on selecting active regions,
they are
especially well suited for solute/solvent systems. They may also be
used for other large systems where the region of interest is known.
Integral Handling for Reduced Space Calculations
When nMO ≪ NAO and NAO is large, as is often the case in reduced
space calculations, the electron repulsion integrals must be handled
carefully to avoid prohibitive scaling with total system size. In
the AO basis, the Cholesky vectors, L, have a storage requirement of ; as demonstrated by Røeggen and Wisløff-Nilssen,[60] the number of Cholesky vectors, N, is approximately MNAO when a decomposition threshold of 10– is used. For example, with a loose decomposition
threshold of 10–2, about 28 TB of memory is needed
to store the Cholesky vectors of a molecular system with 12000 AOs—assuming
double precision and no screening.We have previously suggested
a two-step Cholesky decomposition algorithm[50] in which the Cholesky basis (i.e., the set of pivots), , is determined
in the first step. The Cholesky
vectors are constructed in the second step through an RI-like expression,where the matrix is the Cholesky factor of the matrix S = (K|L) for .
This two-step algorithm makes it possible
to directly construct the Cholesky vectors in the MO basis:We emphasize that
it is not possible to avoid storing the AO Cholesky
vectors with a one-step Cholesky decomposition of
the AO electron repulsion integral matrix. Alternatively, the MO electron
repulsion integrals can be constructed from the AO integrals. To reduce
the scaling, one can combine screening on the AO integrals and the
MO-coefficients.Below we outline an algorithm to construct
and store the vectors
directly in the MO basis (see Algorithm 1). This is done after the
elements of the basis have been determined, has been constructed and decomposed, and has been inverted. When the MO Cholesky
factor, , is too large to store in
memory, L is constructed
for a maximum number of p indices (resulting in several
batches, P1, P2, ..., P). The direct
construction of the Cholesky
vectors in the MO basis reduces the storage requirement to . Note that this is linear, rather than
cubic, in NAO.Algorithm 1 is designed to avoid the IO
operations involved in temporary storage and reordering of the intermediate . Alternatively, can be constructed and stored on disk before is constructed in batches over p or q. With the latter approach, the integrals (αβ|K) are never recalculated.
It should be noted, however, that when nMO ≪ NAO, batching over p is typically not necessary.The number of Cholesky vectors, N, can—through a method-specific screening—be
made to scale with nMO rather than NAO. Consequently, the storage requirements become . Method-specific decompositions were first
considered by Boman et al.[61] We use the
active space screening given in ref (50). In a given iteration of the Cholesky decomposition
procedure, the next element of the basis is determined by considering
the updated diagonal of the integral matrixHere,
the sum is over the current elements
of the basis. In the standard decomposition algorithm, the next element
of the basis is selected as the K = αβ corresponding to the largest element of . The decomposition procedure is terminated whenwhere τ is the decomposition threshold.
In the spirit of method specific Cholesky decomposition,[61] one can consider the Cholesky decomposition
of the matrix with elementsThe positive semidefiniteness of follows
directly from the positive semidefiniteness of . The diagonal of ,is bound from above bywhereand where a is the MO coefficient matrix of the
reduced space MLCC calculation.
We can modify the procedure to determine the Cholesky basis. The selection
and termination criteria are changed by considering the screened diagonalinstead of . Using eq , we obtain
a smaller Cholesky basis compared to the standard decomposition. The
MO integrals are, thus, given bywhere the errors Δ are less than τ.Finally, let us briefly consider the computational scaling of the
decomposition procedure. Except for the initial integral cutoff screening,
which scales as in our implementation, the MO-screened
decomposition algorithm scales as . The prescreening step can be implemented
with a lower scaling; however, this step is not time-limiting in any
of the reported calculations.
Results and Discussion
The MLCC2 and MLCCSD methods have been implemented in a development
version of the e program.[43] The following thresholds are applied, unless
otherwise stated: the Hartree–Fock equations are solved to
within a gradient threshold of 10–8; the Cholesky
decomposition threshold is 10–3; the coupled cluster
amplitude equations are solved such that |Ω| < 10–6; the excited state
equations are solved to within a residual threshold of 10–4; and occupied Cholesky orbitals are constructed using a threshold
of 10–2 on the pivots. The frozen core approximation
is used throughout. All geometries are available from ref (62).
Performance and Scaling
The MLCC2 and MLCCSD methods
can be used to obtain excitation energies of CC2 and CCSD quality,
at significantly reduced cost. This is demonstrated for rifampicin
and adenosine, see Figure . For rifampicin, the lowest excitation energy is calculated
at the MLCC2/aug-cc-pVDZ and CC2/aug-cc-pVDZ levels of theory. For
adenosine, the three lowest excitation energies are calculated at
the MLCCSD/aug-cc-pVDZ and CCSD/aug-cc-pVDZ levels of theory. We have
used CNTOs to partition the orbitals. The results are given in Tables and 2, respectively. These show that the error in the MLCCSD and
MLCC2 excitation energies with respect to CC2 and CCSD is smaller
than the expected error of CC2 and CCSD.[63,64] Furthermore, the cost is drastically reduced in all cases.
Figure 2
Rifampicin on the left and adenosine on
the right.
Table 1
MLCC2/aug-cc-pVDZ and CC2/aug-cc-pVDZ
Calculations for Rifampicina
method
noa
nva
ω [eV]
tgs [h]
tes [h]
tCNTO [h]
PMU [GB]
MLCC2
40
400
2.78
0.3
0.9
1.9
500.0
60
600
2.65
0.5
4.8
2.0
500.0
80
800
2.59
0.9
12.3
1.9
500.0
CC2
161
1645
2.57
32.2
183.8
498.3
na and na are the number of active occupied
and
virtual orbitals, and ω is the lowest excitation energy. The
wall times to solve the ground and excited state equations (tgs and tes) and
to construct the CNTOs (tCNTO) are also
given. The calculations were performed on two Intel Xeon E5-2699 v4
processors using 44 threads. The calculations were performed with
500 GB memory available. Peak memory usage (PMU) is given in GB.
Table 2
MLCCSD/aug-cc-pVDZ
and CCSD/aug-cc-pVDZ
Calculations on Adenosinea
method
noa
nva
ω1 [eV]
ω2 [eV]
ω3 [eV]
tgs [min]
tes [h]
tCNTO [min]
PMU [GB]
MLCCSD
25
225
5.26
5.37
5.41
2.3
0.8
1.9
40.3
30
270
5.25
5.36
5.41
4.9
1.8
1.9
77.2
35
315
5.25
5.35
5.41
9.9
4.0
1.8
138.0
CCSD
51
484
5.25
5.35
5.41
75.3
38.3
288.1
na and na are the number
of active occupied and
virtual orbitals, and ω is the ith excitation energy. The wall times to solve the ground
and excited state equations (tgs and tes) and to construct the CNTOs (tCNTO) are also given. The calculations were performed
on two Intel Xeon Gold 6138 processors with 40 threads and 355 GB
memory available. Peak memory usage (PMU) is given in GB.
na and na are the number of active occupied
and
virtual orbitals, and ω is the lowest excitation energy. The
wall times to solve the ground and excited state equations (tgs and tes) and
to construct the CNTOs (tCNTO) are also
given. The calculations were performed on two Intel Xeon E5-2699 v4
processors using 44 threads. The calculations were performed with
500 GB memory available. Peak memory usage (PMU) is given in GB.Rifampicin on the left and adenosine on
the right.In Tables –3, we have given the available
memory and peak memory used in these calculations. Note that the calculations
may be performed with less memory since the models are implemented
with batching for the memory intensive terms to use no more storage
than the specified available memory.
Table 3
MLCCSD/aug-cc-pVDZ
Calculations for
Rifampicina
noa
nva
ω [eV]
tgs [h]
tes [h]
tCNTO [h]
PMU [GB]
40
400
3.04
8.5
5.6
7.5
354.5
50
500
3.02
13.1
9.3
7.6
354.5
60
600
3.00
14.1
21.1
5.6
354.5
na and na are the number of active occupied
and
virtual orbitals, and ω is the lowest excitation energy. The
wall times to solve the ground and excited state equations (tgs and tes) and
to construct the CNTOs (tCNTO) are also
given. The calculations were performed on two Intel Xeon Gold 6138
processors with 40 threads and 355 GB memory available. Peak memory
usage (PMU) is given in GB.
na and na are the number
of active occupied and
virtual orbitals, and ω is the ith excitation energy. The wall times to solve the ground
and excited state equations (tgs and tes) and to construct the CNTOs (tCNTO) are also given. The calculations were performed
on two Intel Xeon Gold 6138 processors with 40 threads and 355 GB
memory available. Peak memory usage (PMU) is given in GB.na and na are the number of active occupied
and
virtual orbitals, and ω is the lowest excitation energy. The
wall times to solve the ground and excited state equations (tgs and tes) and
to construct the CNTOs (tCNTO) are also
given. The calculations were performed on two Intel Xeon Gold 6138
processors with 40 threads and 355 GB memory available. Peak memory
usage (PMU) is given in GB.The lowest excitation energy of rifampicin was also calculated
with MLCCSD/aug-cc-pVDZ, see Table . Since the system has 1806 MOs, a full CCSD calculation
would be demanding; therefore, we do not present a reference CCSD
calculation. However, the variation of the excitation energy is less
than 0.05 eV for the different active spaces and can therefore
be considered converged. In our experience, MLCCSD excitation energies
converge smoothly to the CCSD values.[25] Note that the MLCC2 and MLCCSD timings cannot be compared as the
calculations were performed on different processors.To demonstrate
the scaling properties, we consider a system of
PNA and water molecules. The size of the active space is fixed—with
36 occupied and 247 virtual orbitals—and the system size is
increased by adding water molecules (see Figure ). We use the aug-cc-pVDZ basis set.
Figure 3
PNA and water.
PNA and water.In Figure , we
show the overall wall times of the Hartree–Fock calculation,
the CNTO construction, and the MLCC ground and excited state calculations.
The steep scaling of the CNTO construction
is apparent:
for the largest system, it is the most expensive step. The ground
and excited state MLCC equations scale as ; however, for the larger systems we have
considered, the Hartree–Fock calculation is seen to be more
expensive. This must be understood in the context of system size and
the use of an augmented basis set. For sufficiently large inactive
spaces, the terms of MLCC2 and MLCCSD will
become more
expensive than Hartree–Fock.
Figure 4
Timings for MLCC2 (top) and MLCCSD (bottom)
calculations on PNA
and water. Nw is the number of water molecules, tHF is the full Hartree–Fock calculation
time, tGS,MLCC2 and tGS,MLCCSD are the MLCC ground state calculation times, tES,MLCC2 and tES,MLCCSD are the MLCC excited state calculation times to obtain a single
excited state, tERI is the time to Cholesky
decompose the electron repulsion integrals, and tCNTO is the time to construct the CNTOs. The calculations
were performed on two Intel Xeon E5-2699 v4 processors using 44 threads
and with 1.4 TB memory available.
Timings for MLCC2 (top) and MLCCSD (bottom)
calculations on PNA
and water. Nw is the number of water molecules, tHF is the full Hartree–Fock calculation
time, tGS,MLCC2 and tGS,MLCCSD are the MLCC ground state calculation times, tES,MLCC2 and tES,MLCCSD are the MLCC excited state calculation times to obtain a single
excited state, tERI is the time to Cholesky
decompose the electron repulsion integrals, and tCNTO is the time to construct the CNTOs. The calculations
were performed on two Intel Xeon E5-2699 v4 processors using 44 threads
and with 1.4 TB memory available.In Figure , we
present a timing breakdown of an iteration to solve the MLCC ground
state equations. The iteration is dominated by the step to construct the X1-transformed Cholesky vectors. The calculation of the
energy, and the necessary blocks of the Fock matrix in the X1-basis, also scale as , but the prefactor is lower for these operations.
The construction of the Ω-vector scales as . In MLCCSD, the Ω-vector
contains additional contractions, compared to MLCC2, that scale as or (see Appendix A).
Figure 5
Timing breakdown of the MLCC2 (top) and MLCCSD (bottom)
ground
state iteration for PNA and water. Nw is
the number of water molecules, tEnergy is the time to compute the MLCC correlation energy, t is the time to construct the necessary
blocks of the Fock matrix in the X1-basis, tΩ is the time to construct the Ω-vector, and t is the time to X1-transform the
Cholesky vectors. The calculations were performed on two Intel Xeon
E5–2699 v4 processors using 44 threads and with 1.4 TB memory
available.
Timing breakdown of the MLCC2 (top) and MLCCSD (bottom)
ground
state iteration for PNA and water. Nw is
the number of water molecules, tEnergy is the time to compute the MLCC correlation energy, t is the time to construct the necessary
blocks of the Fock matrix in the X1-basis, tΩ is the time to construct the Ω-vector, and t is the time to X1-transform the
Cholesky vectors. The calculations were performed on two Intel Xeon
E5–2699 v4 processors using 44 threads and with 1.4 TB memory
available.In Figure , we
plot the wall time of the Jacobian matrix transformation together
with the time spent on terms that arise at the CCS, CC2, and CCSD
level of theory. The CCS terms scale more steeply (), and for MLCC2, we see that these terms
dominate when the inactive space is sufficiently large. For MLCCSD,
the CCS terms are significant, but they do not dominate for any of
the systems.
Figure 6
Wall times of the linear transformation by the MLCC2 (top)
and
MLCCSD (bottom) Jacobian matrices ( and ) for systems of PNA and water.
The contribution
from terms that arise at the CCS level and at the CC2 or CCSD level
of theory are plotted separately. Nw is
the number of water molecules. The calculations were performed on
two Intel Xeon E5–2699 v4 processors using 44 threads and with
1.4 TB memory available.
Wall times of the linear transformation by the MLCC2 (top)
and
MLCCSD (bottom) Jacobian matrices ( and ) for systems of PNA and water.
The contribution
from terms that arise at the CCS level and at the CC2 or CCSD level
of theory are plotted separately. Nw is
the number of water molecules. The calculations were performed on
two Intel Xeon E5–2699 v4 processors using 44 threads and with
1.4 TB memory available.
Reduced Space Calculations
We now consider a larger
PNA-in-water system. The geometry is extracted from a single snapshot
of a molecular dynamics simulation taken from ref (65). The PNA-in-water system
is restricted to a sphere centered on PNA with a 15 Å
radius and includes 499 water molecules, see Figure .
Figure 7
PNA with 499 water molecules.
PNA with 499 water molecules.To assess the accuracy of the MO screening procedure of eqs and 26, we consider the lowest MLCCSD-in-HF excitation energy of
the system, which corresponds to a charge transfer process in PNA.
We compare the MO-screened Cholesky decomposition with the standard
Cholesky decomposition. Note that we use the partitioned Cholesky
decomposition (PCD) algorithm, described in ref (50), with two batches. In
these MLCCSD calculations, the atoms within a sphere of 5 Å
are included in the MLCC region (rCCS =
5 Å) and the atoms within a sphere of radius 3.5 Å
are defined as active at the CCSD level of theory (rCCSD = 3.5 Å). The orbitals are partitioned with
the Cholesky/PAO approach. For the CCSD/CCS/HF levels of theory, we
use the aug-cc-pVDZ/cc-pVDZ/STO-3G basis sets. The total number of
basis functions is 3971, and in the MLCCSD-in-HF calculations, we
have nCCSD = 90, nCCSD = 287, nCCS = 57, and nCCS = 219, that is, nMO = 653.
The results are given in Table .
Table 4
Lowest MLCCSD-in-HF Excitation Energy
of the PNA-in-Water System Obtained with Regular and MO-Screened Cholesky
Decompositiona
standard
MO-screened
τ
NJ
NJ/NAO
ϵ [au]
ω [eV]
NJ
NJ/NMO
ϵ [au]
ω [eV]
10–2
8434
2.1
1.1 × 10–2
4.0501
606
0.9
4.77
no convergence
10–3
12297
3.1
1.1 × 10–3
4.0771
1440
2.2
4.77
4.1055
10–4
15474
3.9
1.7 × 10–4
4.0761
2445
3.7
4.77
4.0785
10–6
24826
6.3
1.6 × 10–6
4.0753
5378
8.2
4.76
4.0754
The PCD algorithm is used. The
threshold, τ, the number of Cholesky vectors, N, and the largest error in the approximated
matrix in the AO basis, ϵ, are given. There are 3971 basis functions.
The PCD algorithm is used. The
threshold, τ, the number of Cholesky vectors, N, and the largest error in the approximated
matrix in the AO basis, ϵ, are given. There are 3971 basis functions.The MO screening yields significantly
fewer Cholesky vectors without
introducing large errors in the excitation energies. As expected,
the number of Cholesky vectors, N, is seen to be on the same order of magnitude as NAO and nMO for the standard
and MO-screened decomposition algorithms, respectively. Fewer Cholesky
vectors reduces the cost of the coupled cluster calculation, for which
the Cholesky vectors are either used to construct the integrals or
applied directly in Cholesky vector-based algorithms. Moreover, the
decomposition time is reduced when the MO screening is employed; for
instance, with a threshold of 10–4, the decomposition
time was 1161 s without screening and 286 s with screening.
In any case, the decomposition time is not a bottleneck in any of
these calculations.The largest error in the approximated AO
integral matrix, ϵ,
is also given in Table . For standard PCD, the errors are comparable to the decomposition
threshold. With MO screening, ϵ is large because AO integrals
that do not contribute to the MO integrals are not described by the
Cholesky vectors. Without MO screening, a Cholesky decomposition threshold
of 10–2 or 10–3 is typically sufficient.[50] For MLCC2 or MLCCSD in a reduced space calculation,
the MO screening can be used and a threshold of 10–4 seems suitable. In the calculation with MO screening and a threshold
of 10–2, the MLCCSD calculation did not converge.We have also performed MLCC calculations on the PNA-in-water system
in Figure with larger
basis sets. In Table , we present timings for MLCC2-in-HF/MLHF and MLCCSD-in-HF/MLHF calculations
with rCCS = 6.0 Å and rCC2/CCSD = 3.5 Å. The aug-cc-pVDZ basis is used for
all atoms included in the CC active region, and cc-pVDZ is used on
the remaining atoms. In total, there are 12669 AOs and 1498 MOs in
the coupled cluster calculation. The Cholesky decomposition is performed
with MO screening using a threshold of 10–4. For
the reference calculations, a gradient threshold of 10–6 is used.
Table 5
Lowest Excitation Energy (ω)
of the PNA-in-Water System, Calculated with MLCC2-in-HF, MLCC2-in-MLHF,
MLCCSD-in-HF, and MLCCSD-in-MLHF Using the Frozen Core Approximationa
MLCC2
MLCCSD
ref
tref [h]
ω [eV]
tMLCC [h]
PMU [GB]
ω [eV]
tMLCC [h]
PMU [GB]
HF
48.1
3.821
3.1
370
4.075
6.7
382
MLHF
33.6
3.832
3.1
370
4.083
6.8
382
The atoms within a radius of
6 Å are included in the MLCC calculation, and the atoms
within a radius of 3.5 Å are treated with the higher level
coupled cluster method (CC2 or CCSD). In the MLHF reference calculation,
the atoms within a radius of 10 Å are active. The aug-cc-pVDZ
basis is used on the atoms that are included in the MLCC calculation,
and cc-pVDZ is used on the remaining atoms. The wall times for the
reference calculation (tref) and the MLCC
calculation (tMLCC) are also given. The
calculations were performed on two Intel Xeon Gold 6152 processors
with 44 threads and 1.4 TB memory available. The peak memory usage
(PMU) is given in GB.
The atoms within a radius of
6 Å are included in the MLCC calculation, and the atoms
within a radius of 3.5 Å are treated with the higher level
coupled cluster method (CC2 or CCSD). In the MLHF reference calculation,
the atoms within a radius of 10 Å are active. The aug-cc-pVDZ
basis is used on the atoms that are included in the MLCC calculation,
and cc-pVDZ is used on the remaining atoms. The wall times for the
reference calculation (tref) and the MLCC
calculation (tMLCC) are also given. The
calculations were performed on two Intel Xeon Gold 6152 processors
with 44 threads and 1.4 TB memory available. The peak memory usage
(PMU) is given in GB.Comparing Tables and 5, we see that the MLCCSD-in-HF excitation
energies do not change significantly with a larger basis and an increased rCCS. For the calculations presented in Table , the reference calculation
is the most expensive step. Since the active region of the MLHF calculation
is large (10.0 Å), we do not obtain large savings using
an MLHF reference. However, this can be achieved by reducing rHF. Furthermore, MLHF is applicable for systems
where standard Hartree–Fock is not computationally feasible.
The CC2-in-HF calculation for this system, with a CC2 radius of 6 Å,
yields ω = 3.732 eV. Hence, the error of using MLCC2, compared
to CC2, is approximately 0.1 eV. The effect of extending the
CCS radius to rCCS = 8.0 Å is to
increase the excitation energy by 0.003 eV to ω = 3.824
eV.Solvation effects can be estimated by performing calculations
on
a series of snapshots from a molecular mechanics simulation, for instance
using the QM/MM approach for the individual snapshots, such as in
ref (65). The calculations
in this paper demonstrate that a fully quantum mechanical approach—MLCC-in-HF
and MLCC-in-MLHF—can be used to determine such solvation effects.
For the former of these approaches, the Hartree–Fock calculation
is likely to be the time limiting step.The MLCC-in-HF and MLCC-in-MLHF
approaches are not only applicable
to solute–solvent systems. They can also be used for large
molecules. As a proof of concept, we present MLCC-in-HF calculations
for the lowest excitation in rifampicin in Table . In Figure , we have plotted the NTOs from a CCS/aug-cc-pVDZ calculation.
The excitation is seen to be located in a subregion of the molecule.
It can therefore be treated with CC-in-HF or MLCC-in-HF. In Figure , we have also plotted
the Hartree–Fock density of the active occupied orbitals treated
with CC or MLCC. The active atoms in the CC and MLCC calculations
were selected by hand by inspecting the NTOs. CNTOs were used to partition
the orbitals in the case of MLCC2/MLCCSD. The shift observed by going
from X to X-in-HF is about 0.1 eV
in all the presented calculations. It should be noted that this system
is too small to be suitable for CC-in-HF and MLCC-in-HF, but is chosen
because the reference CC2 calculations are available. The MLCC2 or
MLCCSD methods are preferable for systems of this size since, as can
be seen from Tables and 3, these calculations can be performed
with ease.
Table 6
X and X-in-HF Calculations for the
Lowest Excitation Energy in Rifampicin, with X = {MLCC2, CC2, MLCCSD}a
method
noa
nva
ω [eV]
MLCC2
60
600
2.65
MLCC2-in-HF
60
600
2.77
CC2
161
1645
2.57
CC2-in-HF
131
981
2.70
MLCCSD
50
500
3.02
MLCCSD-in-HF
50
500
3.13
MLCCSD-in-HFb
50
500
3.16
The aug-cc-pVDZ
basis is used,
unless otherwise stated. The active atoms in the CC and MLCC calculations
were selected by hand (see Figure ). CNTOs were used to partition the orbitals in the
MLCC calculations.
aug-cc-pVTZ
on active atoms.
Figure 8
Dominant occupied (left)
and virtual (center) NTOs (CCS/aug-cc-pVDZ)
of the lowest excited state of rifampicin. The Hartree–Fock
density of the active occupied MOs used in the CC-in-HF/aug-cc-pVDZ
and MLCC-in-HF/aug-cc-pVDZ calculations (right); the active atoms
are illustrated with “ball-and-stick” and the inactive
atoms as “sticks”.
The aug-cc-pVDZ
basis is used,
unless otherwise stated. The active atoms in the CC and MLCC calculations
were selected by hand (see Figure ). CNTOs were used to partition the orbitals in the
MLCC calculations.aug-cc-pVTZ
on active atoms.Dominant occupied (left)
and virtual (center) NTOs (CCS/aug-cc-pVDZ)
of the lowest excited state of rifampicin. The Hartree–Fock
density of the active occupied MOs used in the CC-in-HF/aug-cc-pVDZ
and MLCC-in-HF/aug-cc-pVDZ calculations (right); the active atoms
are illustrated with “ball-and-stick” and the inactive
atoms as “sticks”.
Concluding Remarks
We have demonstrated the computational
savings that can be obtained
with MLCC2 and CCS/CCSDMLCCSD. These multilevel methods can be used
for systems that are too large to be described at the CC2 and CCSD
level. However, the MLCC2 and MLCCSD models are limited by the underlying
scaling of the lower-level coupled cluster method (CCS). We have therefore
presented a framework of reduced-space MLCC that can be used for systems
with several thousand AOs. In this layered approach, MLCC is only
applied to a restricted region of the molecular system; the environment
is optimized with Hartree–Fock, or multilevel Hartree–Fock,
and only contributes to the MLCC calculation through the Fock matrix.
Efficient implementation of this framework requires careful handling
of the electron repulsion integrals. We have implemented a direct
construction of MO Cholesky vectors that reduces the storage requirement
to . With an additional screening during the
Cholesky decomposition algorithm, we further reduce this requirement
to , making the storage requirement independent
of the size of the environment. Exploiting the Cholesky factorization
in this manner, we can handle systems with several thousand basis
functions using existing MLCC implementations. The MLCC-in-HF/MLHF
framework is therefore suited to accurately model solvation effects
on intensive properties on the solute. It can also be used for chromophores
in biomolecules.
Authors: Sarai D Folkestad; Eirik F Kjønstad; Rolf H Myhre; Josefine H Andersen; Alice Balbi; Sonia Coriani; Tommaso Giovannini; Linda Goletto; Tor S Haugland; Anders Hutcheson; Ida-Marie Høyvik; Torsha Moitra; Alexander C Paul; Marco Scavino; Andreas S Skeidsvoll; Åsmund H Tveten; Henrik Koch Journal: J Chem Phys Date: 2020-05-14 Impact factor: 3.488
Authors: Linda Goletto; Eirik F Kjønstad; Sarai D Folkestad; Ida-Marie Høyvik; Henrik Koch Journal: J Chem Theory Comput Date: 2021-11-07 Impact factor: 6.006