Hsin-An Chen1, Chun-Wei Pao1. 1. Research Center for Applied Sciences, Academia Sinica, Taipei 11529, Taiwan.
Abstract
Hybrid organic-inorganic perovskite materials are promising materials for photovoltaic and optoelectronic applications. Nevertheless, the construction of a computationally efficient potential model for atomistic simulations of perovskite with high fidelity to ab initio calculations is not a trivial task given the chemically complex nature of perovskite in terms of its chemical components and interatomic interactions. In the present study, we demonstrate that artificial neural network (ANN) models can be employed for efficient and accurate potential energy evaluation of MAPbI3 perovskite materials. The ANN models were trained using training sets composed of thousands of atomic images of tetragonal MAPbI3 crystals, with their respective energies and atomic forces obtained from ab initio calculations. The trained ANN models were validated by predicting the lattice parameters and energies/atomic forces of cubic MAPbI3 perovskite and had excellent agreement with ab initio calculations. The phonon modes could also be extracted using the trained ANN model with good agreement with ab initio calculations, provided that the atomic forces were incorporated into the training processes. Finally, we demonstrate that for a given system size, the trained ANN model offers energy evaluations 10^4 to 10^5 times faster than ab initio calculations using the Vienna Ab initio Simulation Package, demonstrating the potential of the ANN model for exhaustively sampling the configuration spaces of chemically complex materials for predictions of thermodynamic properties and phase stabilities.
Hybrid
organic–inorganic perovskites have recently stimulated
significant research effort worldwide because of their extraordinary
optical and optoelectronic properties, including their high optical
absorption coefficient in the visible light region,[1,2] tunable
band gap energy,[3,4] low exciton binding energy, and
long, balanced carrier diffusion lengths.[5,6] Perovskite
materials have been widely employed as the active material in solar
cells,[7−9] light-emitting
diodes,[10] lasers,[11] photodetectors,[12] or photocatalysts.[13] Solar cells made from perovskites have reached
remarkably high power conversion efficiencies of more than 20%,[14,15] making them promising photovoltaic materials for replacing silicon
and reducing both device fabrication costs and weight. Among all perovskites,
methylammonium lead iodide (CH3NH3PbI3 or MAPbI3) is one of the most widely studied materials,
with a band gap energy of ∼1.6 eV[16,17] and an
absorption spectrum closely matched to the solar spectrum.[18,19] In devices fabricated from perovskite materials, such as solar cells,
the performance of the device critically relies on the morphology
and grain size of perovskite materials. The morphologies of perovskite
films are sensitive to their fabrication protocols;[7,14,20−22] hence, elucidation of the processing–structure–property
(PSP) relationships for perovskite films is the key to controlling
the morphology of the film for commercial deployment of perovskite-based
devices. Atomistic scale simulation is a powerful tool for elucidating
the PSP relationship of perovskite materials. Ab initio calculations
are considered to be the most accurate of atomistic simulation methods;
however, the system sizes of interest for perovskites are of the order
of at least several tens of nanometers, which is well beyond the reach
of ab initio calculations. On the other hand, classical molecular
dynamics (MD) simulation can handle systems of tens to hundreds of thousands
of atoms, which is ideal for studying the structural and morphological
properties of perovskite materials. However, the reliability of classical
MD simulation depends on the accuracy of the classical interatomic
force fields, and achieving such accuracy becomes challenging with
the increasing complexity of the chemical space. In the MAPbI3 perovskite, five elements are present in the system (Pb,
I, C, N, and H atoms) with a mixture of covalent, van der Waals, ionic,
and hydrogen bonding interactions. Several earlier works have parametrized
classical force fields for MAPbI3 crystals by decomposing
interatomic interactions into analytical functions describing the aforementioned
interactions.[23−26] The accuracy of a force field therefore critically
relies on the interaction functions selected, and contributions from
effects such as atomic polarization or hydrogen bonding were not taken
into account in these classical force fields. As a result, although
these force fields in general show reasonable agreement with
experiments or first-principles calculations for certain perovskite
properties such as lattice parameters, benchmark testing against first-principles
calculations reveals that there is still considerable room for improvement.[25] Furthermore, these force fields cannot be utilized
to study complex structural transitions such as the two-step formation processes
of perovskite[21,22] or perovskite materials with
complex compositions such as mixed-ion perovskites alloying FA, MA,
and Cs cations with Br or Cl anions.[27−29] Hence, the development of an atomistic force field
that can describe the interactions of a chemically complex system
with high fidelity to first-principles calculations is mandatory for
understanding the structure and morphology of perovskite materials. By harnessing the power of machine learning, interatomic force
fields based on artificial neural networks (ANN) have drawn increasing
attention.[30−32] It has been
demonstrated that a well-trained ANN model can successfully replicate
the system energies of given structures obtained from first-principles
calculations. In contrast to the aforementioned classical force fields,
ANN potential models, backed by the universal approximation theorem,
allow encapsulation of all interatomic interactions into one neural
network model, offering force fields with high fidelity to first-principles
calculations. To train an ANN model of a given material, a training
set composed of thousands of images of the material, with atomistic configurations labeled with respective energies from first-principles
calculations, is required. Datasets such as the Materials Project[33] compile a massive amount of materials data
from first-principles calculations and have been successfully utilized
to train models to predict the properties of materials with more
than three constituent elements;[34,35] however, large
amounts of atomistic configurations labeled with first-principles
energies for one given material are not yet available from these datasets.
Therefore, to date, the chemical components of ANN potential models
have been limited to materials with three or fewer elements,
such as simple binary alloys, inorganic ceramic materials, or Al–Mg–Si
ternary alloys.[36−38] In
the present manuscript, we demonstrate that the ANN model can be utilized
to accurately predict the energies and atomic forces of MAPbI3 perovskite materials composed of five elements with a mixture of covalent,
ionic, and van der Waals interactions. Furthermore, we demonstrate
that by incorporating atomic forces into the training processes, we
can obtain a phonon density of states (PDOS) of MAPbI3 and
phonon modes in good agreement with those obtained from density
functional theory (DFT) calculations. Hence, the present study demonstrates
that the ANN-based energy/force evaluator is efficient and reliable
and can be readily implemented in systems with complex chemical compositions,
such as mixed-ion perovskite materials.
Materials and Methods
Neural Network
Function and Descriptor
The ANN is a computational model
inspired by the biological neural network.[39] In the present work, we employed the energy partitioning scheme
proposed by Behler and Parrinello to construct an ANN model for efficiently
computing system energies.[30] One of the
advantages of this energy partitioning scheme is the transferability
of the ANN model; namely, the trained ANN model can, in principle,
be utilized for atomistic simulations of systems with identical chemical compositions and an arbitrary number of atoms. Figure 1a depicts the scheme of the ANN model employed
in the present study. The objective of the ANN model is to predict
the system energy from given atomic coordinates with high fidelity
to the corresponding ab initio calculations. As depicted in the upper
panel of Figure 1a,
in the first step, the atomic coordinates of all of the atoms are transformed into a set of rotationally/translationally
invariant descriptor (or fingerprint) functions that serve as the input feature
vector for the atomic neural network function. The output of the atomic
neural network of atom i is its atomic energy E_i, and the summation of the
atomic energies gives the overall system energy E, namely, E = Σ_i E_i. The architecture of each atomic neural
network is schematically displayed in the lower panel of Figure 1a. The atomic neural
network is composed of the input layer, the hidden layers, and the
output layer. The input layer is composed of a series of atomic descriptor
functions transformed from the Cartesian coordinates of the individual
atoms and their neighbors that depict their atomistic chemical environments.
The output layer outputs the atomic energy of each atom. Each
of the layers in the neural network is composed of a finite number
of nodes (neurons), and all of the nodes are connected (see the arrows
in the lower panel of Figure 1a) via a set of weighting parameters w, so
that the vector of all of the nodes in the ith layer
can be expressed as

y^(i) = f_a(W^(i) · y^(i−1))

where W^(i) is the weighting matrix connecting the nodes in layer i and layer i − 1. The atomic energy
of the ith atom obtained for an atomic neural network
(the lower panel of Figure 1a) with M hidden layers can be written as

E_i = N(I_i, {W})

where N(I_i, {W}) is called the neural network function and I_i is the feature vector of the ith atom. We note that each chemical species in the system
(e.g., Pb, I, C, N, and H in the present study) has its
own atomic neural network function N (namely, N_Pb, N_I, N_C, N_N, and N_H in the present study). In the present study,
we selected the hyperbolic tangent function tanh(x) as the activation function f_a. Note
that the rectifier linear unit (ReLU) activation function has been
extensively employed as the activation function of deep neural networks
because it effectively suppresses gradient vanishing during training;
however, in the present study, the hyperbolic tangent function was
employed to prevent discontinuous derivatives of the neural network function.
In performing atomistic simulations, the derivatives of the ANN potential
model with respect to atomic coordinates must be evaluated
to obtain the atomic forces exerted on each atom. The ReLU function may induce
discontinuities in the atomic forces, which is unfavorable for atomistic simulations. Continuous ReLU alternatives such as the softplus function
can avoid the discontinuous force problem; however, in the Behler
and Parrinello energy partitioning scheme, the outputs of the atomic neural
network functions N(I_i, {W}) are atomic energies, which are usually negative;
as a result, it would be impossible to implement the ReLU activation
function directly. To implement ReLU activation functions in the
energy partitioning scheme employed in the present study, substantial
modifications of the overall neural network architecture would be
required, such as including a separate sum pooling layer or the linear
regression employed in SchNet and HIP-NN while utilizing ReLU activation
functions in the atom-wise or interaction layers.[40,41] This
could potentially further promote the training of ANN models for complex
systems and is a promising future research direction. The weighting
matrix sets {W} connecting the hidden
layers in the neural network are key for efficient energy/force predictions,
and these parameters must be trained to yield a valid model that
can be utilized for subsequent atomistic simulations.
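As a concrete illustration, the Behler–Parrinello evaluation described above (per-element tanh networks whose atomic energies are summed into the system energy) can be sketched in a few lines of NumPy. The layer sizes and parameter values below are illustrative placeholders, not the trained parameters of the present models.

```python
import numpy as np

def atomic_nn(fingerprint, weights, biases):
    """Evaluate one atomic neural network: tanh hidden layers, linear output.

    fingerprint: 1D feature vector (descriptor values for one atom).
    weights/biases: per-layer parameters; the final layer maps the last
    hidden layer to a scalar atomic energy E_i.
    """
    y = np.asarray(fingerprint, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        y = np.tanh(W @ y + b)  # tanh keeps derivatives (forces) continuous
    return float(weights[-1] @ y + biases[-1])  # linear output layer

def system_energy(fingerprints, species, params):
    """Behler-Parrinello partition: E = sum_i E_i, one network per element."""
    return sum(atomic_nn(fp, *params[s]) for fp, s in zip(fingerprints, species))
```

A call such as `system_energy(fps, ["Pb", "I", ...], params)` with one `(weights, biases)` entry per element then plays the role of the trained energy evaluator.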
Figure 1
(a) Schematic of the
network architecture in the ANN scheme. (b) Structure of tetragonal
perovskite MAPbI3 (dark gray: Pb, brown: I, light gray:
C, white: H, and blue: N).
The feature
vector I_i of the ith atom is composed of a series of descriptor functions
that transform the Cartesian atomic coordinates of the ith atom into translationally/rotationally invariant fingerprints
specifying its chemical environment. In this work, the Gaussian descriptor
functions proposed by Behler[42] were employed
as the descriptor functions for the input layer of the atomic neural
network functions N. In the present study, the Gaussian
descriptor functions were divided into two categories, namely, the
radial descriptor G^II and the angular
descriptor G^IV. The radial and angular
descriptors can be expressed as

G_i^II = Σ_{j≠i} exp(−η R_ij²) f_c(R_ij)

G_i^IV = 2^(1−ζ) Σ_{j,k≠i} (1 + λ cos θ_ijk)^ζ exp[−η(R_ij² + R_ik² + R_jk²)] f_c(R_ij) f_c(R_ik) f_c(R_jk)

where R_c is the cutoff distance for the descriptor functions, θ_ijk is the angle of the atomic triplet centered on atom i, and η, ζ, and λ are predefined parameters for the descriptors. The cutoff
function f_c can be expressed as

f_c(R) = 0.5[cos(πR/R_c) + 1] for R ≤ R_c, and f_c(R) = 0 for R > R_c.

The parameters of the Gaussian descriptors
employed in the present study are compiled in Tables S1 (radial G^II-type descriptors)
and S2 (angular G^IV-type descriptors) in
the Supporting Information. The cut-off
radius for the potential training was set to 6.5 Å, which
is approximately twice the Pb–I bond length, to ensure that the second-nearest
neighbors in the backbone are included in the fingerprint evaluations.
The feature vector for the atomic neural network is composed of twenty G^II-type descriptor functions and sixty G^IV-type descriptor functions. The feature vectors I_i were computed by accumulating
contributions to all eighty descriptors from each neighboring atom
within the cut-off sphere of atom i, as compiled in Tables
S1 and S2 in the Supporting Information. Note that in Tables S1 and S2, descriptors
with identical parameters were used to specify contributions from neighboring
atoms of different atomic species (for radial descriptors)
or pair combinations (for angular descriptors, because atom i serves as the center atom of the atomic triplet forming
the angle). Taking a radial descriptor (Table S1) as an example, descriptor no. 1 (2) in Table S1 accumulates contributions from all neighboring C (H) atoms
(within the cut-off radius) with the specified parameter (η = 5)
and returns a zero value if there are no neighboring C (H) atoms
inside the cut-off sphere. Because there were five elements in MAPbI3 (C, N, H, Pb, and I), for each set of radial Gaussian descriptor parameters,
there were five possible elements as neighbors. This is the reason
why there were five radial descriptors with identical parameters.
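The species-resolved accumulation rule just described for a radial descriptor can be sketched as follows. The cosine cutoff matches the f_c defined above; the exact Gaussian form (e.g., any center shift or normalization used in Tables S1/S2) is an assumption here.

```python
import numpy as np

def f_cut(r, rc=6.5):
    """Cosine cutoff: 0.5*(cos(pi*r/rc)+1) inside rc, zero outside."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def g2_radial(center, neighbors, neighbor_species, target_species, eta, rc=6.5):
    """Accumulate a G^II radial descriptor over neighbors of one element type.

    Returns 0 if no neighbor of `target_species` lies inside the cutoff
    sphere, mirroring the behavior described for descriptors no. 1 and 2.
    """
    total = 0.0
    for pos, sp in zip(neighbors, neighbor_species):
        if sp != target_species:
            continue  # each descriptor only counts one neighbor element
        rij = np.linalg.norm(np.asarray(pos) - np.asarray(center))
        if rij <= rc:
            total += np.exp(-eta * rij ** 2) * f_cut(rij, rc)
    return float(total)
```

Evaluating this function once per (parameter set, neighbor element) pair reproduces the "five radial descriptors with identical parameters" bookkeeping described above.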
Similarly, angular descriptors with identical parameters were used to accumulate
contributions from the atom pairs specified in Table S2 in the Supporting Information. The ANN model must
be trained to enable its use for predicting system energies/forces
with high accuracy. During the training processes, the weighting parameter
set {W} is optimized by minimizing the
quadratic error function

ε = Σ_σ [(E_σ − E_σ^ref)²/N_σ² + (α/3N_σ) Σ_i Σ_k (F_{σ,ik} − F_{σ,ik}^ref)²]

where σ is a structure within the training set, E_σ^ref is
the respective reference total energy, F_{σ,ik}^ref is the reference component k of the atomic
force of atom i, and N_σ is the number of atoms in the structure σ, respectively. We
note that the first and the second terms on the right-hand side refer
to the quadratic errors in the system energies and atomic forces,
respectively. The parameter α is the weighting coefficient of
the force, which can be set to zero if only system energies are used
as the training targets. The weighting parameter sets {W} were obtained by minimizing the quadratic error
function ε. In the present study, we employed the limited-memory
BFGS for bound-constrained optimization (l-BFGS-b) minimizer implemented
in the Atomistic Machine-Learning Package[43] to train the ANN model. The energy and force convergence criteria
for the l-BFGS-b minimization processes are described in the following
paragraph. In the present manuscript, tetragonal perovskite MAPbI3 was selected as the structure to train the ANN model
(Figure 1b). The lattice
parameters of the optimized tetragonal cell were a = 12.26 Å and c = 12.77 Å, close to the
results of previous ab initio calculations[17,44] and
experimental results.[45,46] The atomic structures in the
training set were generated by performing ab initio MD (AIMD) simulations
under the NVT ensemble using the Langevin thermostat.
Because the neural network function as well as its derivatives with
respect to atomic coordinates needs to be evaluated for each atom
(up to 10^5 atoms) every atomistic simulation step (up to
10^5 to 10^6 steps), the neural network architectures
(i.e., number of hidden layers and number of nodes per layer) were
chosen to ensure a balance between computational efficiency and model
accuracy. In the present study, we trained neural network models of
incrementally reduced network size, starting from a network of [10, 10, 10]
architecture (namely, three hidden layers and ten neurons per hidden
layer), and the architectures specified in Table 1 were chosen because they were among the
simplest architectures with accuracy comparable to the [10, 10,
10] model. Note that three different settings for training were considered
(Table 1). In settings
1 and 2, we tested the effects of the network architecture on the
trained ANN model, whereas in settings 2 and 3, we fixed the neural network
architectures and compared training the ANN model using only system
energies (setting 2) with training the ANN model using both energies
and atomic forces (setting 3). The energy convergence criteria for
settings 1–3 were set to 0.001 eV/atom, and the force convergence
criterion was set to 0.1 eV/Å in setting 3. To perform atomistic simulations, we implemented the ANN potential model into the Large-scale
Atomic/Molecular Massively Parallel Simulator (LAMMPS).[47] The LAMMPS input files for the ANN potential
models can be found in the Supporting Information, and the code for the implemented ANN pair style is available on
request.
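The quadratic error function minimized during training, an energy term plus an α-weighted force term, with α = 0 recovering the energy-only settings 1 and 2, can be sketched as below. The per-atom normalization convention and the default α value are assumptions chosen for illustration, not the exact form used in the training runs.

```python
import numpy as np

def ann_loss(e_pred, e_ref, f_pred, f_ref, n_atoms, alpha=0.04):
    """Quadratic training loss summed over structures sigma.

    e_pred/e_ref: predicted and reference total energies per structure.
    f_pred/f_ref: per-structure arrays of shape (N_sigma, 3) of forces.
    n_atoms:      number of atoms N_sigma in each structure.
    alpha:        force weighting coefficient (0 -> energy-only training).
    """
    loss = 0.0
    for Ep, Er, Fp, Fr, n in zip(e_pred, e_ref, f_pred, f_ref, n_atoms):
        loss += ((Ep - Er) / n) ** 2  # per-atom energy error, squared
        if alpha:
            Fp, Fr = np.asarray(Fp), np.asarray(Fr)
            # force error averaged over the 3N force components
            loss += alpha * np.sum((Fp - Fr) ** 2) / (3 * n)
    return loss
```

Minimizing this scalar with an l-BFGS-b optimizer over the weight set {W} corresponds to the training procedure described above.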
Table 1
Neural Network Architectures and Training Targets Employed in the Present Study^a

#   Pb, I      C, N, H    energy training   force training
1   10, 5, 5   5, 5, 5    O                 X
2   10, 5, 5   5, 5       O                 X
3   10, 5, 5   5, 5       O                 O

^a The atomic neural network architectures of the Pb, I atoms (2nd column)
and the C, N, H atoms (3rd column) are specified, respectively. Note that
the numbers separated by commas in the network architectures denote
the number of nodes in each hidden layer.
Training Set Preparation
Twenty-two separate AIMD simulations of tetragonal MAPbI3 subjected to hydrostatic strains that incrementally increased from
−5 to +5% were performed to expand the span of the atomic feature
vectors and thereby ensure the transferability of the ANN model. For each AIMD
simulation, the system temperature and step size were set to 300 K
and 0.5 fs, respectively, and a trajectory of 1000 AIMD steps was
collected. Hence, a total of 22 000 images were collected.
Two thousand images were randomly selected from the pool of 22 000
images as the training set for potential training, and another two
thousand images were selected for validation. Note that although the
images in the validation set were extracted from the same pool as
the training set, there was little or no correlation
between images in the training and validation sets, because validation
images from one particular AIMD trajectory were not correlated with
training images from the other 21 AIMD trajectories. In addition, although
validation images from one AIMD trajectory are not totally uncorrelated
with training images from the same trajectory, their atomic coordinates
deviate from those of the training images by atomic displacements
of the order of angstroms, thereby allowing the trained ANN
models to be tested using inputs outside the images selected for the training processes
for model validation purposes.
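The random selection of disjoint training and validation images from the pooled AIMD snapshots can be sketched as follows; the image indexing by (trajectory, step) and the seed are illustrative.

```python
import random

def split_images(n_traj=22, steps=1000, n_train=2000, n_val=2000, seed=0):
    """Draw disjoint training and validation images from the snapshot pool.

    The pool holds n_traj * steps images (here 22 * 1000 = 22 000), each
    identified by its (trajectory, step) index; shuffling once and slicing
    guarantees the two sets share no image.
    """
    pool = [(t, s) for t in range(n_traj) for s in range(steps)]
    rng = random.Random(seed)
    rng.shuffle(pool)
    return pool[:n_train], pool[n_train:n_train + n_val]
```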
Ab Initio Molecular Simulations
All AIMD calculations and
structural minimization were performed using the Vienna Ab initio
Simulation Package (VASP).[48−50] The generalized gradient approximation was used with the Perdew–Burke–Ernzerhof
exchange–correlation functional[51,52] and the projector
augmented wave[53,54] pseudopotentials. The cutoff
energy was set to be 400 eV, and a 3 × 3 × 3 Monkhorst–Pack k-point mesh was employed. The self-consistent field convergence
criterion was set to dE < 10–5 eV, where dE is the total energy difference between
two steps. The DFT-D3 with Becke–Johnson damping[55,56] was employed to incorporate the van der Waalsinteractions between
the atoms.
Results and Discussion
Validation and Testing of the Trained ANN Potentials
The ANN models were trained based on three different training settings
(Table 1) and the training set composed of 2000 randomly picked system images.
Then, the trained ANN models were validated using the validation set,
which consisted of another 2000 randomly picked system images. Figure 2a displays the correlations
(or parities, left panels) and errors (right panels) between the potential
energies from the three trained ANN models and the VASP calculations for
the validation set. In the left panels of Figure 2a (the parity plots), the potential energies
from the trained ANN models lie on the diagonal line with very small
fitting errors. The RMSEs were ∼0.001 eV/atom and the maximum
absolute errors (MAE) were less than 0.005 eV/atom in the three training
settings—much lower than the atomic thermal energy at 300
K (ca. 0.026 eV/atom), demonstrating that these trained ANN models
can successfully predict the energies of tetragonal MAPbI3 crystals without performing computationally expensive DFT calculations.
The learning curves of the energy RMSE with respect to the validation
set as a function of training set size for the three training settings
can be found in Figure S1 in the Supporting Information. From the learning curves, we can see that for all three training
settings a training set of 2000 images already gives a very low RMSE
on the validation set (∼0.001 eV/atom) with no sign of overfitting.
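The per-atom validation metrics quoted above (RMSE and maximum absolute error, judged against the ∼0.026 eV/atom thermal energy scale at 300 K) can be computed as in this small sketch:

```python
import numpy as np

KT_300K = 0.026  # approximate thermal energy per atom at 300 K, eV

def energy_errors(e_ann, e_dft, n_atoms):
    """Per-atom RMSE and maximum absolute error between ANN and DFT energies."""
    err = (np.asarray(e_ann, float) - np.asarray(e_dft, float)) \
          / np.asarray(n_atoms, float)
    return float(np.sqrt(np.mean(err ** 2))), float(np.max(np.abs(err)))
```

Comparing the returned RMSE against `KT_300K` reproduces the "much lower than thermal energy" check used in the text.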
Figure 2
(a) Validations of the
trained ANN models with
different training settings; the left panels display the parity plots
and the right panels display the corresponding errors. (b,c) Potential
energies of (b) cubic and (c) orthorhombic perovskite MAPbI3 with respect to the imposed hydrostatic strains obtained using VASP
and the trained ANN models, where the lowest potential energy was
set to be 0. Note that the dashed line is fitted with the potential
energies from VASP calculations.
To test the transferability of the trained ANN models for
different crystal polymorphs of MAPbI3, we performed separate
DFT calculations for cubic and orthorhombic perovskite MAPbI3
(the ANN models were trained using a tetragonal structure). We first
applied the trained ANN models to evaluate the potential energy curves
of cubic and orthorhombic perovskite MAPbI3 against
hydrostatic strain incremented in steps of 0.5% from −5 to +5% and compared the
respective potential energy curves with those obtained from VASP calculations. Figure 2b,c displays the
potential energies of the cubic (Figure 2b) and orthorhombic (Figure 2c) phases from VASP (with fitted potential energy
curves) and the potential energies from the trained ANN models with
changing hydrostatic strain. We observed that the potential energies
from VASP and the trained ANN models almost overlapped, demonstrating
that the ANN models provide excellent predictions of potential energies
with high fidelity to the corresponding VASP calculations. Next,
we performed separate testing AIMD simulations of cubic perovskite MAPbI3 with a lattice parameter of 12.5 Å and compared
the potential energies and atomic forces obtained from the AIMD simulations
with those from the trained ANN models. In the testing of the AIMD
simulations, the system temperature was set to 300 K, and the step
size and the total number of steps of the testing runs were 0.5 fs
and 2000 steps (1.0 ps in total), respectively. The stepwise system
energy evolution and respective error (between trained ANN models
and VASP) are displayed in the left panels of Figure 3a–c. Note that the potential energies
and atomic forces from the trained ANN models were obtained by feeding
atomic coordinates from the respective AIMD trajectories into the trained
ANN models. The advantage of this approach is that it ensures the inputs
of both the VASP and ANN models are identical, thereby allowing direct
comparison between VASP and the ANN potential models. The right panels
in Figure 3a–c
display the corresponding atomic force parities. We note that all
three components of each atomic force were included in the atomic
force parity plots. It is observed from the left panels of Figure 3a–c that the
system energies predicted using the trained ANN models (red lines
in the left panels) are in good agreement with those from the AIMD
simulations (black lines). The RMSE values for the energy per atom
are 0.00123, 0.00174, and 0.00181 eV/atom in Sets 1, 2, and 3, respectively,
with corresponding MAEs in the energy per atom of less than 0.005
eV in all cases. We note that, in all cases, the RMSE values for the
energy per atom were much smaller than the thermal energy at 300 K
(∼0.026 eV/atom), demonstrating the transferability of the
trained ANN models for predicting system energies. The RMSE for the
energy per atom of Set 1 is slightly smaller than those of Sets 2 and
3, which can be attributed to the more complicated network architecture
in Set 1 for the C, N, and H atoms relative to those of Sets 2 and
3. On the other hand, for the prediction of atomic forces, the RMSE
value of the atomic force components is much smaller for Set 3 (0.105
eV/Å) than those for Set 1 (0.75 eV/Å) and Set 2 (0.81 eV/Å),
as is also evident from an examination of the parity plots of the
atomic force components in the right panels of Figure 3a–c. The better atomic force predictions
for Set 3 can be attributed to the inclusion of atomic force training
during the training process. The present study hence demonstrates
that simple neural network architectures can predict the energies/forces
of MAPbI3 with high fidelity to DFT calculations. We
anticipate that more complicated neural network architectures would
further improve accuracy; however, we have to balance computational
accuracy and efficiency because energy/force evaluation is the most
computationally intensive part of atomistic simulations. It must be
noted that although the ANN potential models presented here can successfully
predict the energies of both tetragonal and cubic MAPbI3, at
this moment these ANN models cannot be extended to mixed-ion perovskite
materials such as FAxMA1–xPbI3, because images of FAxMA1–xPbI3 from AIMD simulations were not included in the training sets in
the present study. The present study already demonstrates the capability
of the ANN potential model in predicting the energy of an organometal halide perovskite with mixed covalent, ionic, and vdW interactions
between atoms, and we are actively working on extending the ANN potential
model to the FAxMA1–xPb(IyBr1–y)3 mixed-ion perovskite material.
Figure 3
(a–c)
Left panels: potential energy evolution
from both the trained ANN model and AIMD trajectory and the respective
errors of Sets 1–3; right panels: respective atomic force component
parity plots. (d) Parity plot of the atomic force magnitude and distribution
of the error of the atomic force direction for Set 3.
The atomic force parities
between the trained ANN model and the respective VASP calculations were
further examined for Set 3 (energy/force training). Figure 3d displays the parities of
the atomic force magnitudes and the error distribution of the atomic
force directions of the ANN model trained using training setting 3
(energy/force training). The RMSE of the atomic force magnitudes
is 0.0746 eV/Å with an MAE of 0.836 eV/Å. For the force
direction parities, 91.5% of the atomic force directions predicted
from the ANN model (Set 3) deviate from those obtained by the VASP
calculations by less than 20°. Hence, the atomic forces from
the ANN model trained using both energies and forces (Set 3) are in
very good agreement with those obtained from the VASP calculations.
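The force-direction statistic reported here (the fraction of atoms whose ANN force direction deviates from the VASP force by less than 20°) can be evaluated with a short sketch:

```python
import numpy as np

def force_angle_errors(f_ann, f_ref):
    """Angle (degrees) between predicted and reference force vectors, per atom."""
    f_ann, f_ref = np.asarray(f_ann, float), np.asarray(f_ref, float)
    dots = np.sum(f_ann * f_ref, axis=1)
    norms = np.linalg.norm(f_ann, axis=1) * np.linalg.norm(f_ref, axis=1)
    cosang = np.clip(dots / norms, -1.0, 1.0)  # clip guards against round-off
    return np.degrees(np.arccos(cosang))

def fraction_within(angles, threshold=20.0):
    """Fraction of atoms whose force direction deviates by less than threshold."""
    return float(np.mean(np.asarray(angles) < threshold))
```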
PDOS Obtained Using the Trained ANN Model
We computed the PDOS and extracted the corresponding normal modes
of tetragonal MAPbI3 using the ANN model trained using
both energies and atomic forces (Set 3 in Table 1). The finite displacement method[57] was employed for the PDOS calculations; the
supercell size in the VASP calculations was 2 × 2 × 2, whereas
both 2 × 2 × 2 and 6 × 6 × 6 supercells were used
for PDOS calculations using the ANN potential model. Figure 4a shows the PDOS of the tetragonal perovskite for phonon frequencies of up to 20 meV computed from both the
trained ANN models with different supercell sizes and DFT calculations
using VASP. Characteristic peaks at 4, 10, 15, and 20 meV and valleys
at 8 and 18 meV can be observed in the PDOS diagram obtained using
the ANN models, and these characteristic frequencies are in good
agreement with VASP calculations as well as earlier works.[58,59] Note that the characteristic peak heights of the VASP and ANN models
are in good agreement when the supercell sizes are identical (2
× 2 × 2) for both VASP and ANN model; hence, the discrepancies
between VASP and ANN model (6 × 6 × 6 supercell) can be
attributed to the difference in simulation cell sizes. Normal modes
with a negative or very low frequency correspond to the translations
and rotations of MA cations with Pb and I atoms nearly fixed. Such
low frequencies result in low activation energies for the vibrations
and rotations of MA cations in the octahedral cages of perovskites, creating
degenerate states. Normal modes with frequencies of approximately
4 meV (∼30 cm⁻¹) can be attributed to the
off-axis displacements of I atoms with modest coupling vibrations
of MA cations, as shown in Figure 4b and Movie M1 in the Supporting
Information. Normal modes with frequencies of approximately 10 meV
(∼80 cm⁻¹) are due to the on-axis displacements
of the Pb atoms and I atoms of the backbone, as shown in Figure 4c (and Movie M2 in the Supporting Information). The
strong coupling of MA cation libration
with the out-of-axis displacements of I atoms (Figure 4d and Movie M3 in the Supporting Information) corresponds to the normal modes with
frequencies of approximately 15 meV (∼120 cm⁻¹). Finally, the normal modes with frequencies of approximately 20
meV (∼160 cm⁻¹) can be attributed to the
libration of MA cations without the displacement of the backbone atoms,
as shown in Figure 4e and Movie M4 in the Supporting Information.
Hence, the trained ANN model can be utilized to extract the phonon
modes that are in good agreement with the phonon modes obtained by
the DFT calculations.
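The finite displacement workflow used above can be illustrated on a toy system. The sketch below is a minimal, hypothetical stand-in (all names and parameter values are our own, and a 1-D periodic harmonic chain replaces the real ANN/DFT force calculator): it displaces each atom in turn, records the induced forces, assembles the force-constant matrix, diagonalizes the mass-weighted matrix, and histograms the resulting phonon frequencies into a density of states.

```python
import numpy as np

def chain_forces(u, k=1.0):
    """Forces on a periodic 1-D harmonic chain given displacements u
    (toy stand-in for an ANN or DFT force calculator)."""
    return k * (np.roll(u, -1) - 2.0 * u + np.roll(u, 1))

def finite_displacement_pdos(n_atoms=64, k=1.0, mass=1.0, d=1e-4, bins=40):
    """Finite displacement method: displace each atom by d, record the
    forces on all atoms, assemble the force-constant matrix, and
    histogram the phonon frequencies into a density of states."""
    phi = np.zeros((n_atoms, n_atoms))
    for j in range(n_atoms):
        u = np.zeros(n_atoms)
        u[j] = d
        phi[:, j] = -chain_forces(u, k) / d   # Φ_ij = -∂F_i/∂u_j
    # Mass-weighted dynamical matrix; its eigenvalues are ω²
    omega2 = np.linalg.eigvalsh(phi / mass)
    omega = np.sqrt(np.clip(omega2, 0.0, None))  # clip tiny negative noise
    hist, edges = np.histogram(omega, bins=bins, density=True)
    return omega, hist, edges
```

For this chain the analytic dispersion is ω = 2√(k/m)·|sin(q/2)|, so the maximum frequency of 2√(k/m) provides a direct check on the numerics; in production, the same displace-and-collect loop is simply driven by the ANN or VASP force evaluations on a supercell.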
Figure 4
(a) PDOS obtained
using the trained ANN model (red and green for 6 × 6 × 6
and 2 × 2 × 2 supercell, respectively) and VASP (blue).
(b–e) Atomic displacements of the normal mode vibrations extracted
using the trained ANN model. (f) PDOS computed using the ANN model
with force training (red curve, Set 3) and without force training
(blue curve, Set 2).
We compare the PDOS computed with the
trained ANN models obtained using energy training (Set 2 in Table ) and using energy/force
training (Set 3 in Table ), as shown in Figure 4f. We note that even though both trained ANN models have identical
neural network architectures, the PDOS computed from the two training
settings yields distinct results. Relative to the energy/force trained
ANN model, the ANN model trained without force training underestimates
the frequencies of the low-frequency modes (the backbone modes), while
it overestimates the frequencies of the high-frequency modes (MA modes
or strongly coupled MA modes). The discrepancies between the PDOSs
obtained from the trained ANN models can be attributed to the relatively
poor agreement of the atomic forces with the DFT calculations in the
ANN model trained without forces. Hence, even though the training
using system energies leads to an ANN model that yields accurate predictions
of energies, the PDOS is in poor agreement with the PDOS obtained
from DFT calculations. Therefore, to obtain an accurate evaluation
of PDOS and thermal transport properties, the atomic forces must be
included in the training sets during the training processes.
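The conclusion that forces must enter the training can be made concrete with a combined loss function. The sketch below shows a generic energy/force loss of the kind commonly used to train ANN potentials; the weighting hyperparameter `w_force` and all names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def energy_force_loss(e_pred, e_ref, f_pred, f_ref, w_force=0.1):
    """Combined energy/force training loss (cf. Set 3).

    e_pred, e_ref: scalar system energies (eV).
    f_pred, f_ref: (N, 3) atomic force arrays (eV/Å).
    The force term constrains the local gradients of the potential
    energy surface, which the phonon modes and PDOS depend on;
    energy-only training (cf. Set 2) corresponds to w_force = 0.
    """
    loss_energy = (e_pred - e_ref) ** 2
    loss_force = np.mean(np.sum((f_pred - f_ref) ** 2, axis=1))
    return loss_energy + w_force * loss_force
```

With `w_force = 0` the model can fit the energies well yet leave the force errors, and hence the second derivatives that determine the phonon frequencies, poorly constrained, which is consistent with the PDOS discrepancy observed for Set 2.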
Benchmark of Time Consumption in Energy Evaluations
with Ab Initio Calculations
We demonstrated that the trained
ANN models can predict system energies with high fidelity to the corresponding
ab initio calculations. To benchmark the efficiency of the trained
ANN models in time consumption per energy evaluation relative to ab
initio calculations, we carried out a series of atomistic calculations
for tetragonal perovskite crystals with different system sizes using
both the trained ANN models (Sets 1 and 2) and VASP. Figure 5 displays the comparison of
the time consumed per energy evaluation (the vertical axis) using
the ANN model (Set 1 in Table ) and VASP calculations. We note that the ANN model calculations
were carried out only using 16 CPU cores (Intel Xeon E5-2630), whereas
the VASP benchmark calculations were performed using both CPUs (16
and 64 cores) and GPUs (1 and 4 NVIDIA Tesla P100). Note that for
the ANN models, neural network Set 2 is about 10 to 20% faster than Set
1, which can clearly be attributed to one fewer hidden layer for the C,
N, and H atoms in Set 2 than in Set 1 (see Table ). For a system of fewer than 10⁴ atoms, the trained ANN model offers a 10⁴ to 10⁵ increase in speed relative to VASP calculations. We note that in
the present study, the system size for VASP calculations was limited
to 2592 atoms due to computational resource limitations, whereas the
largest system size for ANN calculations was near 10⁸ atoms.
Hence, the ANN model not only yields high accuracy relative to ab initio
calculations but also provides a significant increase in computational speed,
allowing efficient sampling of the configurational spaces for thermodynamic
property calculations and the construction of phase diagrams using
atomistic Monte Carlo simulations. We anticipate that a further computational
boost can be achieved by implementing the ANN models on GPUs, which
can potentially facilitate simulations of models of hundreds of thousands
of atoms; however, this is beyond the scope of the present study.
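A benchmark of this kind reduces to timing repeated calls of an energy function at a given system size. The sketch below is a minimal, hypothetical harness, not the benchmark actually used in the study; the toy pairwise energy merely stands in for a trained ANN potential or a VASP single-point calculation.

```python
import time
import numpy as np

def time_per_energy_evaluation(energy_fn, positions, n_repeat=5):
    """Average wall-clock time per energy evaluation over n_repeat calls.

    energy_fn: any callable mapping an (N, 3) position array (Å) to a
    scalar energy, e.g. a trained ANN potential wrapper.
    """
    t0 = time.perf_counter()
    for _ in range(n_repeat):
        energy_fn(positions)
    return (time.perf_counter() - t0) / n_repeat

def toy_energy(pos):
    """Illustrative O(N²) pairwise energy standing in for a real model."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return np.sum(np.exp(-d[np.triu_indices_from(d, k=1)]))
```

Sweeping the number of atoms and plotting the timings on log-log axes exposes the scaling of each method, which is how speedup factors such as the 10⁴ to 10⁵ reported here are typically read off.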
Figure 5
Benchmark of time consumed
per energy evaluation
of the trained ANN model and VASP calculations using both CPUs and
GPUs.
Conclusions
In summary, in the
present study, we successfully trained ANN models of MAPbI3 perovskite materials by feeding a training set composed of thousands
of images of tetragonal perovskite crystals obtained from ab initio
calculations using VASP into the ANN model. We demonstrated that despite
the complex nature of both the chemical composition (Pb, I, C, N,
and H atoms) and interatomic interactions (covalent, van der Waals,
and ionic interactions), the trained ANN model can successfully predict
the energies of cubic perovskite crystals for several different ANN
network architectures and training schemes. Furthermore, the phonon
modes can be computed using the trained ANN models, and good agreement
with the PDOS obtained by ab initio calculations can be achieved by training
the ANN using both system energies and atomic forces. We benchmarked
the time consumption for both the trained ANN model and VASP calculations
and demonstrated that for systems of given sizes, the ANN model offers
an increase in speed of a factor of 10⁴ to 10⁵ in energy evaluations, thereby allowing exhaustive sampling of the
configurational spaces of perovskite for thermodynamic property calculations.
Hence, the present study demonstrates that the ANN model can be trained
to model chemically complex systems with high fidelity with respect
to ab initio calculations as well as provide a significant acceleration
in computational efficiency and can be utilized for predictions
of the thermodynamic and structural properties of chemically complex
materials, such as mixed-ion perovskites or high entropy alloys.