Laurens D M Peters1, Jörg Kussmann1, Christian Ochsenfeld1,2. 1. Chair of Theoretical Chemistry, Department of Chemistry , University of Munich (LMU) , Butenandtstr. 7 , D-81377 München , Germany. 2. Max Planck Institute for Solid State Research , Heisenbergstr. 1 , D-70569 Stuttgart , Germany.
Abstract
Nonadiabatic molecular dynamics (NAMD) simulations of molecular systems require the efficient evaluation of excited-state properties, such as energies, gradients, and nonadiabatic coupling vectors. Here, we investigate the use of graphics processing units (GPUs) in addition to central processing units (CPUs) to efficiently calculate these properties at the time-dependent density functional theory (TDDFT) level of theory. Our implementation in the FermiONs++ program package uses the J-engine and a preselective screening procedure for the calculation of Coulomb and exchange kernels, respectively. We observe good speed-ups for small and large molecular systems (comparable to those observed in ground-state calculations) and reduced (down to sublinear) scaling behavior with respect to the system size (depending on the spatial locality of the investigated excitation). As a first illustrative application, we present efficient NAMD simulations of a series of newly designed light-driven rotary molecular motors and compare their S1 lifetimes. Although all four rotors show different S1 excitation energies, their ability to rotate upon excitation is conserved, making the series an interesting starting point for rotary molecular motors with tunable excitation energies.
Nonadiabatic molecular dynamics (NAMD) simulations of molecular systems require the efficient evaluation of excited-state properties, such as energies, gradients, and nonadiabaticcoupling vectors. Here, we investigate the use of graphics processing units (GPUs) in addition to central processing units (CPUs) to efficiently calculate these properties at the time-dependent density functional theory (TDDFT) level of theory. Our implementation in the FermiONs++ program package uses the J-engine and a preselective screening procedure for the calculation of Coulomb and exchange kernels, respectively. We observe good speed-ups for small and large molecular systems (comparable to those observed in ground-state calculations) and reduced (down to sublinear) scaling behavior with respect to the system size (depending on the spatial locality of the investigated excitation). As a first illustrative application, we present efficient NAMD simulations of a series of newly designed light-driven rotary molecular motors and compare their S1 lifetimes. Although all four rotors show different S1 excitation energies, their ability to rotate upon excitation is conserved, making the series an interesting starting point for rotary molecular motors with tunable excitation energies.
Nonadiabaticprocesses
such as electronic excitations,[1] radiationless
transitions,[2] and electron transfer[3] are of
key importance in chemistry and biology. One of the most prominent
examples in these fields is the rhodopsin protein, whose chromophore
undergoes a photoisomerization when exposed to light.[4] This energy conversion of light to mechanical motion has
inspired chemists to design synthetic light-driven rotary molecular
motors,[5,6] for which Bernard Feringa was awarded the
Nobel Prize in chemistry in 2016 together with Jean Pierre Sauvage
and Sir James Fraser Stoddart.[7] The description
of theses nonadiabatic processes is an ongoing challenge in modern
quantum chemistry (e.g., ref (8)). This is mainly due to the fact that (1) excited states
have to be taken into account and (2) the dynamics of the nuclei need
to be considered.To tackle the first challenge, several quantum-chemical
methods
have been developed. Examples include the complete active space self-consistent
field (CASSCF) method,[9] the algebraic-diagrammaticconstruction (ADC(2)),[10] several coupled
cluster methods (e.g., CC2),[11] and time-dependent
density functional theory (TDDFT).[12,13] The latter
serves (despite its well-known limitations using today’s functionals[13,14]) as a good compromise between effort and accuracy.[15,16] As a consequence of this, not only excited-state energies but also
excited-state gradients[17] and nonadiabaticcoupling vectors (NACVs)[18−29] have been implemented to provide access to molecular properties
at the TDDFT level of theory. Having excited-state energies and properties
at hand, trajectory surface hopping (TSH)[30,31] is a straightforward way to conduct nonadiabatic molecular dynamics
(NAMD) simulations including several electronic states.Despite
many advances in the field of NAMD[32−34] and their broad
field of application,[35−44] it remains difficult or even impossible to investigate large molecular
systems. The reason for this is that TSH requires, because of its
stochastic nature, a large set of independent trajectories including
many time steps, which involve expensive (even at the TDDFT level
of theory) excited-state energy and property calculations. One way
to accelerate NAMD simulations is the use of exciton models.[45] Another approach is the use of graphics processing
units (GPUs) in addition to central processing units (CPUs) for the
calculations. It was shown that this leads to significant speed-ups
for ground-state energy and forces evaluations[46−53] and was also successfully applied to excited-state energies and
properties[45,54−56] as well as
ab initio multiple spawning[57,58] and NAMD[59,60] simulations.In this work, we present efficient NAMD simulations
with TDDFT
energies, gradients, and NACVscalculated on hybrid CPU/GPU architectures
using the FermiONs++ program package.[50−52] We start with a brief
summary of the theory behind TSH as well as TDDFT energies, gradients,
and NACVs in Section and discuss their implementation on GPUs, featuring the hybrid CPU/GPU
engine[52] for the two-electron integrals
as well as the preselective screening procedure for the evaluation
of exchange kernels.[50,51] Computational details are given
in Section . In Section , we discuss the
accuracy and the performance of our GPU-based excited-state routines,
investigating the scaling of our integral evaluations and contractions.
We show timings for selected molecules containing more than 500 atoms.
As a prototypical application, we investigate the photoinduced rotation
of four newly designed rotary molecular motors via NAMD in Section , followed by a
conclusion and an outlook.
Theory
All equations in this section
use the standard notation for orbitals: i, j, k... denote occupied, a, b, c... denote virtual,
and p, q, r...
denote arbitrary molecular orbitals, while μ, ν, λ...
denote basis functions. I, J, ...
denote different electronic states. vxc and fxc are the first- and second-order
exchange-correlation functional derivatives, respectively. cx is the amount of exact exchange. F and S are the Kohn–Sham and the overlap matrices,
respectively. P is the ground-state density, and h is the one-electron core Hamiltonian matrix. The two-electron
integrals are written using Mulliken notation (..|..). Superscript
ξ denotes derivatives with respect to the nuclear coordinates,
and * and † symbolize complex conjugation and adjungation,
respectively. For the sake of simplicity, we use the same symbol for
matrices in the atomic orbital (AO) and molecular orbital (MO) bases
with different indices (e.g., P vs P).
Trajectory Surface Hopping
In nonadiabatic
molecular dynamics (NAMD), the electronic time-dependent wave function
of the system is assumed to be a linear combination of the time-independent
electronic wave functions of the individual electronic states weighted
by the state amplitudes (c⃗)where x represents
nuclear coordinates and {r} is the set of electroniccoordinates. Having the state energies (ω, see Section ) and nonadiabaticcoupling vectors (NACVs,
τξ, see Section ) at hand, it
is possible to propagate the state amplitudes along with the nuclei
during the molecular dynamics (MD) simulation using a unitary propagator:[36]Here, ω is the energy difference between two
states (ω – ω). The applied δt for the propagation of c⃗ is normally 3 orders of magnitude smaller than the time step of
the MD simulation (Δt).[31] In the fewest-switches surface hopping algorithm,[30,31]c⃗ is used to calculate the probability g of the system
switching from its current state I to another state J at every time step:If a random number between
zero and one exceeds g, then the MD simulation is
continued on the potential energy surface of state J and the nuclear velocity is rescaled along τξ. Observables (e.g., lifetimes of states
or relaxation pathways) can be drawn from ensembles of trajectories
using a different set of random numbers. In the following sections,
we will briefly summarize the calculation of the necessary ingredients
of TSH (excited-state energies, gradients, and NACVs) at the time-dependent
density functional theory (TDDFT) level of theory.
Excited-State Energies
Excitation
energies can be calculated from linear response TDDFT by solving the
TDDFT or the random phase approximation (RPA) equation[12,61,62]A ± B are the orbital rotation Hessianswith ϵ being the energy of orbital p. X and Y are the transition densities for excitation and de-excitation,
respectively. Neglecting B is known as the Tamm–Dancoff
approximation (TDA) of TDDFT,[63] simplifying eq toPlease note that eqs and 7 are
equivalent to the time-dependent Hartree–Fock (TDHF) and configuration
interaction singles (CIS) equations, respectively, when cx is set equal to 1 and all exchange-correlation terms
are neglected. TDA is computationally less demanding than TDDFT because
fewer two-electron integrals have to be evaluated. It may also be
more suitable for NAMD simulations because it is more stable (regarding
its convergence) and thus typically delivers better results than TDDFT
in the vicinity of conical intersections involving the ground state.[39,64]TDDFT is nowadays widely used for the calculation of excitation
energies, being a good compromise between accuracy and computational
cost.[15,16] Shortcomings are, however, the inability
to calculate double excitations and the poor description of charge-transfer
excitations.[13,14] To tackle the latter, range-corrected
functionals (e.g., ωB97[65]) have been
introduced.
Excited-State Energy Gradients
To
determine the energy gradient of an excited state I, one has to calculate the change in the excitation energy with respect
to the nuclear coordinates (ωξ). The final equation in
the AO basis has been derived by Furche et al.[17]The calculations of the relaxed
difference density matrix (P), the energy-weighted difference density matrix (W), and the two-particle difference density matrix
(Γ) are shown in the Appendix.
The first two require an iterative solution of a Z-vector equation.[17]R is X + Y or X in the case of TDDFT or TDA, respectively.
Nonadiabatic Coupling Vectors
The
NACV between two states (I and J) is defined asIt thus describes the change
of the overlap of the wave functions of I and J with respect to the nuclear coordinates. The first formulation
of the NACV involving the ground state (τ0 → ξ) derived by Chernyak and Mukamel[18] was corrected for finite basis set effects by Send et al.[23] in 2010:Equations for the density
matrices (P0, W0, and Γ0) are again given in the Appendix. They can be calculated directly
from the TDDFT transition densities so that no Z-vector
equation needs to be solved. S[ denotes an antisymmetric overlap derivativewhereas γ0 is defined aswith L being X – Y or X in the case of TDDFT or TDA, respectively.The calculation of NACVs between two excited states has been tackled
by many publications showing expressions based on linear and quadratic
response at different levels of theory.[25,26,29,66−71] The main conclusion is that the use of quadratic response theory
leads to unphysical poles when the energy difference between excited
states matches the excitation energy of another state.[66−70] The use of linear response theory or the so-called pseudowave function
approach[25−27,29] is therefore recommended.
The resulting equation in the AO basis is as follows:Equations for the density
matrices (P, W, and Γ) can again be found in the Appendix. γ is defined asIt was shown by Fatehi et
al.[25] that the antisymmetric overlap derivatives
(eq ) introduce translational
variance into the NACVcalculations. Neglecting these terms in eqs and 13 is equivalent to adding electron-translational factors (ETFs).
However, trajectory surface hopping (TSH) simulations using NACVs
with and without ETFs lead to nearly identical results for larger
molecular systems.[72]
Graphics Processing Units
Graphics
processing units (GPUs) significantly accelerate quantum-chemical
calculations because they allow for an efficient evaluation and contraction
of two-electron integrals (eq ) and their derivatives with respect to the nuclear coordinates
(eq ).[46−52]J and K denote the Coulomb and exchange parts, respectively, and M and N denote general density matrices. For
the Coulomb part, large speed-ups can be observed when the J-engine[73,74] is applied to the rearranged shell-pair data.[47] To see comparable speed-ups for the exchange part, we apply
an additional preselective screening (preLinK):[50,51]It determines the significant
shell pairs (those with an expected value above the given thresholds
ϑpre and ϑpre∇) before their distribution to and calculation
on the GPUs. The two-electron integral evaluation can be done even
more efficiently when the workload is spread among both GPUs and CPUs.[52]In ground-state calculations, M and N are solely the ground-state density (P), whereas for excited states M and N can
additionally be P, R, or L. However, the discussed procedure (shell pair rearrangement,
J-engine, and preLinK) is still valid and can easily be adapted to
excited-state routines when keeping in mind that M is
not always equal to N and may be nonsymmetric.
Computational Details
General Remarks
The FermiONs++ program
package[50−52] was used for all calculations presented in this work.
It was compiled using the Intel compiler (2019),[75] and the Intel Math Kernel Library (MKL). Routines on AMD
GPUs were compiled with the AMD APPSDK compiler. In addition, LibXC
library v4.0.1[76,77] was used. In all calculations,
we used the gm5 grid[78] (with the modified
Becke weighting scheme described in ref (78)) and tight thresholds for the SCFconvergence
(ϑSCF = 10–7 using the FP-commutator),
the integrals (ϑINT = 10–10), and
the Z-vector equation convergence (ϑZ = 10–5). Throughout all calculations, we neglected
the symmetry of the molecules and solely calculated singlet excitations.
Excited-state energies, gradients, and nonadiabaticcoupling vectors
(with electron transition factors) at the TDDFT level of theory were
calculated using eqs , 8, 10, and 13, respectively.
Preparation and Calculation of Systems I,
II, IIIn, IVn, V, VI, and VII
To illustrate
the performance and/or accuracy of our implementation on GPUs, we
use protonated formaldimine (I), the Schiff base of retinal
(II), a series of linear polyethynes (III) and dialkylethenes (IV), a motorized nanocar (without “wheels”, V),[79] and one (VI)
and three (VII) pores of a covalent organic framework[80] as example molecules. The structures of I, II, and V have been optimized
at the PBE0[81−84]/def2-SVP[85,86] level of theory, while III and IV have not been optimized to maintain their linear
structures. VI and VII have been prepared
according to ref (80). All structures are available at https://www.cup.uni-muenchen.de/pc/ochsenfeld/download/.TDDFT energies and properties of I, II, III, IV, V, VI, and VII were calculated at the PBE0/def2-SVP or PBE/def2-TZVP
level of theory. We used tight thresholds for preLinK (ϑpre = 10–4), the preLinK gradient (ϑpre∇ = 10–10), and the TDDFT convergence (ϑTDDFT = 10–6). For I, II, III, IV, V, VI, and VII, four, six, five, three, six, seven, and seven states were taken
into account, respectively.When investigating the accuracy,
we use a calculation performed
on CPUs with tight thresholds (ϑTDDFT = 10–7) as a reference. To allow for a fair comparison, we employed the
continuous fast multipole method (CFMM)[87,88] and the LinK
scheme[89,90] for Coulomb and exchange kernels (and their
derivatives) on CPUs when comparing CPU and GPU performance. Coulomb
and exchange kernels on GPUs were calculated with the J-engine[73,74] and the preLinK scheme,[50,51] respectively. Timings
of integral evaluations and entire routines were determined as an
average over five independent calculations on two Intel Xeon CPU E5
2640 v4 @ 2.20 GHz (20 threads) CPUs and four AMD FirePro 3D W8100
GPUs. The scaling behavior is determined as the slope of the corresponding
log–log plots (Supporting Information) using the timings of III and IV with n = 40, 50, 75, 100. The parallel efficiency is determined as the
ratio between the measured and ideal speed-ups.
Preparation and Calculation of the Rotary
Molecular Motors
The structures of the four rotary molecular
machines (C, N, S, and O) have been optimized at the ωB97[65]/def2-SVP level of theory. Excited-state properties and
timings were calculated at the TDA (ωB97/def2-SVP) level of
theory using accurate thresholds (ϑpre = 10–3, ϑpre∇ = 10–10, and ϑTDDFT = 10–5). NAMD simulations were conducted at the same level
of theory. The propagation of the nuclei was calculated using the
Velocity Verlet algorithm.[91,92] The extended Lagrangian
method[93] for the extrapolation of ground-state
density was used to accelerate SCFconvergence. Transition densities
and relaxed difference densities of the previous step were used as
guesses for the TDA and Z-vector equation, respectively.Twenty initial geometries and velocities (available
at httpn>s://www.cup.uni-muenchen.de/pc/ochsenfeld/download/) were
drawn from a 5 ps ground-state NVT simulation (200 fs equilibration,
0.2 fs step size, velocity rescaling thermostat[94]) at the same level of theory. From each initial condition,
five independent (different series of random numbers) NAMD simulations
were conducted for 1 ps (0.2 fs step size) without equilibration,
thermostat, or decoherence correction, starting at the first excited
singlet state (S1). At every step of the simulation, the
overall rotation and translation of the molecule was removed. Three
excited states were taken into account, whereas only the coupling
vector from the ground state to the first excited state was calculated
and used for the propagation of the state amplitudes (eq ) and the calculation of the hopping
probability (eq ).
Performance
Accuracy and Thresholds
We start with an analysis of the errors of excited-state
energies and propn>erties introduced by their calculation on GPUs and
the use of the preLinK scheme[51,52] for integrals and integral
derivatives. Therefore, we compare calculations of the smallest Schiff
base (protonated formaldimine, CH2NH2+, I) and the Schiff
base of retinal (C20H30N+, II) with GPUs and preLinK to calculations on CPUs without
prescreening or the continuous fast multipole method (CFMM). (For
computational details, see Section .) The mean absolute errors for different thresholds
are listed in Table .
Table 1
Mean Absolute Errors (MAE, in Atomic
Units) of Excitation Energies (ω), Gradients (ωξ), and Nonadiabatic
Coupling Vectors (τ) of Protonated Formaldimine (I) and the Schiff
Base of Retinal (II) Calculated at the PBE0/def2-SVP
Level of Theory on GPUs, Employing the preLinK Scheme and Using Different
Thresholds for preLinK (ϑpre), the preLinK Gradient
(ϑpre∇), and the TDDFT Convergence (ϑTDDFT)a
screening thresholds and convergence criteria
ϑpre
10–3
10–4
10–4
10–5
10–5
10–5
10–5
10–5
ϑpre∇
10–11
10–11
10–11
10–11
10–11
10–9
10–10
10–11
ϑTDDFT
10–5
10–5
10–6
10–6
10–7
10–7
10–7
10–7
A calculation on CPUs with ϑTDDFT = 10–7 and without CFMM or preLinK
is used as a reference. Throughout this work, we will use accurate
(ϑpre = 10–3, ϑpre∇ = 10–10, and ϑTDDFT = 10–5) or tight thresholds (ϑpre = 10–4, ϑpre∇ = 10–10, and ϑTDDFT = 10–6).
A calculation on CPUs with ϑTDDFT = 10–7 and without CFMM or preLinK
is used as a reference. Throughout this work, we will use accurate
(ϑpre = 10–3, ϑpre∇ = 10–10, and ϑTDDFT = 10–5) or tight thresholds (ϑpre = 10–4, ϑpre∇ = 10–10, and ϑTDDFT = 10–6).The choice of the preLinK threshold (ϑpre) for
excited-state calculations should always depend on the applied convergence
threshold for the TDDFT equation (ϑTDDFT). If the
chosen threshold is too loose, then the iterative solution of the
TDDFT equation does not converge (especially for large molecular systems),
while a ϑpre that is too tight does not improve the
result when ϑTDDFT is not adjusted accordingly (Table ). The latter is also
attributed to the small effect of preLinK on these relatively small
systems.[50,51] In our calculations with the FermiONs++
program package, the “ideal” ϑpre is
2 orders of magnitude larger than ϑTDDFT. A tightening
of these two parameters (left side of Table ) systematically leads to smaller errors.
However, the errors in the coupling vectors of the large system do
not fall below 10–4 a.u., marking the numerical
limit of these second- to third-order properties. The use of the tight
thresholds (ϑTDDFT = 10–6, ϑpre = 10–4) should thus, in general, be sufficient
because all errors are below 10–3 a.u. A looser
preLinK threshold (ϑpre = 10–3 a.u.),
which is expected to give μH accuracy for ground-state properties,
is still accurate for excited-state properties and could be used in
extensive application calculations such as NAMD simulations, where
observables are determined as ensemble averages.The effect
of the preLinK gradient threshold (ϑpre∇) on the
error is not straightforward (right side of Table ). However, we recommend it to be at least
as tight as the integral threshold (ϑINT = 10–10). Because the difference between ϑpre∇ = 10–11 and ϑpre∇ = 10–10 is negligibly
small, we apply the latter throughout this work.
Scaling with the System Size
To compn>are
timings on CPUs and GPUs and to investigate the effective scaling
behavior of the excited-state integral routines on GPUs, we use integral
timings of linear polyethynes (III) and dialkylethenes (IV) (for structures, see Figure ). In Figure , we compare CPU and GPU timings of Coulomb (Figure a,c) and exchange
(Figure b,d) calculations
at the PBE0/def2-SVP level of theory. We show contractions of the
ground-state density (J/K(P)), the transition density of the first excited state (J/K(X1 + Y1)) (Figure a,b), the integral derivatives involving the ground
state and the relaxed difference density of the first excited state
(J/K(P, P1)), and the transition density of the first excited state
(J/K(X1 + Y1, X1 + Y1)) (Figure c,d). The same integrals are shown in Figure , where we compare the timings of III and IV on GPUs. The effective scaling behaviors of the integral
evaluations are shown in the Supporting Information. Definitions of the integral contractions are given in eqs and 16. Details of the calculations are given in Sections and 3.2.
Figure 1
Structures of the linear polyethynes (III) and dialkylethenes (IV).
Figure 2
Timings of (a, c) Coulomb and (b, d) exchange integral
evaluations
(a, b; eq ) and their
derivatives with respect to the nuclear coordinates (c, d; eq ) of polyethyne (III) with n =
1–100 calculated at the PBE0/def2-SVP level of theory on CPUs
(dashed lines) and GPUs (solid lines). For details on the calculations
and the computational setup, see Sections and 3.2.
Figure 3
Timings of (a, c) Coulomb and (b, d) exchange integral
evaluations
(a, b; eq ) and their
derivatives with respect to the nuclear coordinates (c, d; eq ) of polyethine (III, solid line) and dialkylethene (IV, dashed line) with n = 1–100
calculated at the PBE0/def2-SVP level of theory on GPUs. For details
on the calculations and the computational setup, see Sections and 3.2.
Structures of the linear polyethynes (III) and dialkylethenes (IV).Timings of (a, c) Coulomb and (b, d) exchange integral
evaluations
(a, b; eq ) and their
derivatives with respect to the nuclear coordinates (c, d; eq ) of polyethyne (III) with n =
1–100 calculated at the PBE0/def2-SVP level of theory on CPUs
(dashed lines) and GPUs (solid lines). For details on the calculations
and the computational setup, see Sections and 3.2.Timings of (a, c) Coulomb and (b, d) exchange integral
evaluations
(a, b; eq ) and their
derivatives with respect to the nuclear coordinates (c, d; eq ) of polyethine (III, solid line) and dialkylethene (IV, dashed line) with n = 1–100
calculated at the PBE0/def2-SVP level of theory on GPUs. For details
on the calculations and the computational setup, see Sections and 3.2.For the Coulomb and exchange kernels of the transition
densities,
we observe similar speed-ups (up to a factor of 5) as for the ground-state
kernels (Figure a,b).
The accelerations of the gradient kernels stem (nearly exclusively)
from their Coulomb part, which is extremely efficient on GPUs (Figures c). We also see
minor speed-ups for K(X1 + Y1, X1 + Y1) (Figures d). The evaluation of K(P, P1) is significantly slower because the GPU routine
has to be called twice. This is due to the fact that the analytical
exchange evaluation on GPUs[48−51] does not exploit the full symmetry of the two-electron
integral. Please note that this problem can be solved by applying
a seminumerical exchange scheme, which is currently developed for
GPUs in our group.[95] For the largest molecule
(III100), a speed-up of two for the entire
determination of an excited-state gradient involving calculations
of the ground-state and excited-state energy and gradient is observed.
Here, it should be stressed that the linear algebra and the evaluation
of the exchange-correlation kernels are performed entirely on CPUs.When compn>aring the effective scaling behavior of the Coulomb integrals
of III (Supporting Information), one can see that the scaling of the
GPU integrals is slightly larger than the scaling of the CPU integrals.
This is due to the formal quadratic scaling of the J-engine employed
on GPUs, in comparison to the (asymptotically linear scaling) CFMM
method. This larger scaling behavior is, however, irrelevant as the
prefactor of the routines is greatly reduced (Figure a,c). The scaling of the exchange integrals
is slightly reduced by exploiting the preLinK method on GPUs. We observe
∼1.5 and ∼1.0 scalings for the integrals and the integral
derivatives of these system sizes, respectively.The expan class="Chemical">citation
in III is delocalized over
the entire molecule, leading to similar scaling
behavior for the ground-state and excited-state properties. To show
the performance of a system with local excitation, we also investigate IV, where only one double bond
instead of a conjugated system is excited. This leads to massive speed-ups
(Figure ) of the excited-state
exchange integrals and reduces their scaling significantly (Supporting Information). For the exchange integral
derivatives, we even observe sublinear scaling behavior as a result
of the preLinK screening. The effect on the Coulomb integrals is smaller,
mainly because of the fact that these routines take only a few seconds
even for the largest investigated molecules.
Example Calculations
To demonstrate
the applicability of our excited-state properties routines in the FermiONs++ program package,[50−52] we study four
molecules of interest in modern excited-state research and show their
timings at the PBE0/def2-SVP and PBE/def2-TZVP levels of theory on
GPUs (Table ). We
investigate the Schiff base of retinal (II), a model
system of the chromophore in rhodopsin,[4] a nanocar[79] (V) using a
rotary molecular motor, and one (VI) and three (VII) pores of a covalent organic framework,[80] which catalyze the formation of hydrogen from water when
exposed to light. Details of the calculations and the computational
setup are again listed in Sections and 3.2.
Table 2
Molecular Structures and Computational
Times (t(s)) of the Schiff Base of Retinal (II), a Motorized Nanocar (V), and One (VI) and Three (VII) Pores of a Covalent Organic
Framework, Calculating Ground-State Energies (E0) and Gradients (E0ξ), Excited-State Energies (ω) and Gradients (ωξ), and Nonadiabatic Coupling Vectors (τ) at the PBE0/def2-SVP and PBE/def2-TZVP
Levels of Theory on GPUsa
II
V
VI
VII
formula
C20H30N+
C51H28S2
C144H102N30
C312H213N69
Natoms
51
81
276
594
PBE0/def2-SVP
t(E0)/step
1.3
4.4
17.8
67.7
t(ω1)/(step and state)
1.6
6.3
34.4
179.5
t(E0ξ)
12.36
54.5
140.9
334.3
t(ω1ξ)
65.4
247.9
668.4
1392.6
t(τ0→1)
24.3
123.4
375.2
914.8
t(τ1→2)
82.7
349.4
959.0
1858.0
PBE/def2-TZVP
t(E0)/step
1.8
1.5
51.5
272.6
t(ω1)/(step and state)
2.1
1.7
30.2
95.3
t(E0ξ)
6.0
5.2
65.3
241.9
t(ω1ξ)
52.5
47.9
576.7
1942.9
t(τ0→1)
12.3
10.1
124.4
377.0
t(τ1→2)
56.1
52.7
523.2
1060.3
Please note that t(E0) and t(ω1) are given per step (and state). For details on the calculations
and the computational setup, see Sections and 3.2.
Please note that t(E0) and t(ω1) are given per step (and state). For details on the calculations
and the computational setup, see Sections and 3.2.Table shows that
with the presented implementation on GPUs, excited-state propn>erties
and dynamics become accessible even for large systems and when applying
DFT methods with exact exchange (e.g., PBE0) or triple-ζ basis
sets. In our examples, the calculations of excited-state properties
at the PBE/def2-TZVP level of theory are even faster than at the PBE0/def2-SVP
level of theory mainly due to the fast Coulomb contractions discussed
above. The only exception is t(ω1ξ) of VII, for which the Z-vector equation converges slowly at the
PBE/def2-TZVP level of theory (seven instead of four iterations).When compn>aring CPU and GPU timings of the entire calculations,
we observe speed-ups of three and eight for VI and VII, respectively. This clearly shows that the use of GPUs
becomes more attractive with increasing system size, where the integral
evaluations dominate the overall computational time. However, even
for relatively small molecule II we already obtain a
speed-up of two. These speed-ups exclusively stem from the Coulomb
and exchange calculations as the linear algebra and the evaluation
of the exchange-correlation kernels are performed entirely on CPUs.
The entire calculations of II and VII at
the PBE0/def2-SVP level of theory take ∼5 min and ∼5
h, respectively, enabling NAMD simulations of II and
the calculation of excited-state properties of VII on
a reasonable time scale and yet good accuracy.
Scaling with the Computational Resources
In Figure , we
show the parallel efficiency of J/K(X + Y) (Figure a) and J/K(P, P) (Figure b) for II, VI, and rotary molecular machine C (C10H13FN), which will be investigated in the next
section using up to four GPUs. The Coulomb integrals of II and C are not analyzed because their computational
time is too short.
Figure 4
Parallel efficiency of selected GPU integral routines
for the calculations
of C, II, and VI. The computational
time of the integral evaluation using four GPUs is given in parentheses.
For details on the calculations and the computational setup, see Section .
Parallel efficiency of selected GPU integral routines
for the calculations
of C, II, and VI. The computational
time of the integral evaluation using four GPUs is given in parentheses.
For details on the calculations and the computational setup, see Section .Figure again shows the suitability of GPUs for
large molecular systems.
For the time-consuming integral evaluations of VI, we
observe a nearly perfect scaling of >0.9. This also indicates that
adding even more GPUs will still lead to decent speed-ups of the calculations.
The parallel efficiency observed in the case of the smaller molecules
is lower but still remarkable considering that some computational
times are <1 s. This strong scaling
makes the use of GPUs also attractive for small to medium-sized molecules,
as shown in the rotary molecular machine example presented in this
work.
Illustrative Examples: Rotary Molecular Machines
As a prototypical exampn>le, we investigate the propn>erties and the
dynamics of four newly designed rotary molecular machines. For structures
and definitions, see Figure . Similar to previous ab initio studies,[41−44] our machines contain a C=N+ motif, which is also present in the chromophore of rhodopsin.
Upon excitation, the molecule should rotate around the central C–C
double bond. The fluoride substituent should accelerate this rotation
because of the steric repulsion, whereas the puckering of the six-membered
ring should influence the direction of the rotation.[43,44] Here, we want to investigate the influence of the atom or group
X adjacent to the central double bond on the light-driven rotation
of the molecule. We use CH2, NH, S, and O as X denoted
as C, N, S, and O, respectively.
Figure 5
(a) Structures of rotary molecular machines C, N, S, and O and (b) definitions
of the X–F distance (d(X–F)), the dihedral
angle (γ), and the direction of rotation: clockwise (CW) and
counterclockwise (CCW).
(a) Structures of rotary molecular machines C, N, S, and O and (b) definitions
of the X–F distance (d(X–F)), the dihedral
angle (γ), and the direction of rotation: clockwise (CW) and
counterclockwise (CCW).When calculating the excited-state properties and
dynamics of the
rotary molecular machines, we switch to the Tamm–Dancoff approximation
(TDA).[63] As discussed in Section , this accelerates the calculation
and leads to more stable trajectories close to conical intersections.[39,64] A comparison of the excited-state energies and properties at the
TDA and RPA levels of theory is presented in the Supporting Information, where we show mean differences as
well as plots of the difference density, the first excited-state gradient,
and the NACV between the ground state and the first excited state.
We observe an average difference below 10–2 a.u.,
with the state ordering and the shape of relaxed difference densities
not being affected. The comparison of TDA and CASSCF energies (Supporting Information) also proves the suitability
of TDA for the investigated problem. Trends between the systems and the energies close to the
conical intersection agree remarkably well. Moreover, we apply the
looser accurate thresholds (ϑpre = 10–3, ϑpre∇ = 10–10, and ϑTDDFT = 10–5) because they introduce only a maximum average error
of a few 10–5 a.u. (Supporting Information). Additional details on the calculations are listed
in Section .In Figure , we
show the excitation energies of C, N, S, and O. Changing the substituent has an influence
on the bright S1 state of the rotary molecular motor. While
the difference density plots look similar for all molecules, the excitation
energies increase (S < C < N < O) and the directions of the gradients and NACVschange slightly. This is due to the different electronegativities
of the substituents (C ≈ S < N < O) and the fact that
S, N, and O atoms have free electron pairs.
Figure 6
Singlet excitation energies
of the four rotary molecular machines
(C, N, S, and O) calculated at the TDA (wB97/def2-SVP) level of theory. For details
on the calculations, see Section .
Singlet excitation energies
of the four rotary molecular machines
(C, N, S, and O) calculated at the TDA (wB97/def2-SVP) level of theory. For details
on the calculations, see Section .To study their effect on the light-induced rotation,
we conducted
NAMD simulations (105 trajectories for each rotor). In Table , we list the percentage of
the trajectories that showed a rotation (η) as well as the ratio
of clockwise and counterclockwise rotations (r).
The direction of the rotation has been determined using the dihedral
γ. (See the Supporting Information for plots.) The time-dependent occupation of the S1 state
(calculated as an average over all trajectories) and d(X–F) for the trajectory with the fastest rotation of C, N, S, and O, respectively,
are shown in Figure .
Table 3
Efficiency (η, Determined from
the Percentage of Rotating Molecules in the NAMD Simulations) of the
Four Rotary Molecular Machines (C, N, S, and O) and Ratio [r = n(CW)/n(CCW)] of CW and CCW Rotations
C
N
S
O
η
70%
61%
54%
56%
r
0.68
0.78
1.19
1.68
Figure 7
(a) Time-dependent decay of the occupancy of the S1 state
determined as a mean of all nonadiabatic molecular dynamics simulations.
(b) Change in d(X–F) during one selected trajectory
of C, N, S, and O, respectively, showing the rotation of the molecule around the central
C–C bond. The maxima (180° rotation) are marked by the
vertical dashed lines.
(a) Time-dependent decay of the occupancy of the S1 state
determined as a mean of all nonadiabatic molecular dynamics simulations.
(b) Change in d(X–F) during one selected trajectory
of C, N, S, and O, respectively, showing the rotation of the molecule around the central
C–C bond. The maxima (180° rotation) are marked by the
vertical dashed lines.For all system, we observed (1) the expected rotation
around the
central C–C bond, which can be detected using either γ
or d(X–F) and (2) the relaxation from S1 to the ground state (S0). Movies of the rotations
are available at https://www.cup.uni-muenchen.de/pc/ochsenfeld/download/. The decay of the S1 population (Figure a) seems to correlate
with the rotational speed of the corresponding rotary molecular machine
(Figure b). C shows the fastest rotation and the fastest decay, followed
by S and O, which behave nearly identically.
The rotation of N is the slowest, so is the decay of
the S1 occupancy. The trend in the rotational speeds (N < S < O < C) can easily be explained. O rotates slower than C because the nuclear repulsion between X and F is smaller.
In the case of S, the repulsion of the S atom should
be similar to that of the CH2 moiety, but the time for
the rotation increases because of its larger mass. The rotation of N is even slower because of the hydrogen bond between the
NH moiety and the F atom.However, this trend cannot be observed
when looking at the rotational
efficiency (η, see Table ). Although C shows
the largest number of trajectories featuring a rotation (70%), the
η of S and O (∼55%) is significantly
smaller than the η of N (61%). The reason for this
might be the slightly different coupling vectors between the ground
state and the first excited state (Supporting Information). In contrast to refs (43) and (44), we do not observe an influence of the puckering of the
ring on the rotation. The ratios of CW and CCW rotations (r) are close to 1 for all four rotors, and no general trend
is visible in the γ plots (Supporting Information).Changing X thus has an interesting effect
on the rotary molecular
machines. While the excitation energy changes significantly, the ability
of the molecule to rotate around the central C–C bond is (nearly)
preserved. These systems may thus be a good starting point for the
design of a series of molecular rotors with tunable excitation energy.
Our GPU-based routines aide these investigations by accelerating NAMD
simulations even for these small systems. Because of the strong scaling
(shown in Figure ),
we observe speed-ups in Coulomb and exchange integral evaluations
(even when their computing time is shorter than 1 s), leading to a
total speed-up of two with respect to a calculation entirely on CPUs.
One trajectory took ∼1 day on two Intel Xeon CPU E5 2640 v4
@ 2.20 GHz (20 threads) CPUs and four AMD FirePro 3D W8100 GPUs.
Conclusions
Throughout this work, we
have examined the use of graphics processing
units (GPUs) for the evaluation of two-electron integrals in excited-state
energies and property (gradients and nonadiabaticcoupling vectors)
calculations at the time-dependent density functional theory (TDDFT)
level of theory. Similar to ground-state calculations, we observe
that the use of GPUs along with the J-engine[73,74] and the preLinK scheme[50,51] leads to decent speed-ups
of the integral calculations while additionally showing a reduced
scaling behavior depending on the locality of the examined excitation.
These speed-ups may even become larger with our currently developed
seminumerical exchange scheme.[95] By using
GPUs, nonadiabatic molecular dynamics become more efficient for large but also small
molecules (as the a result of the strong scaling) without a loss of
accuracy or the introduction of further assumptions. As a first example,
we investigated a series of newly designed rotary molecular machines
showing that one can tune the excitation energy of these systems without
losing their ability to rotate by changing the hetereocycle. For future
applications, we consider extending the current implementation toward
decoherence corrections[96] and triplet states.[97]
Authors: Vijay S Vyas; Frederik Haase; Linus Stegbauer; Gökcen Savasci; Filip Podjaski; Christian Ochsenfeld; Bettina V Lotsch Journal: Nat Commun Date: 2015-09-30 Impact factor: 14.919