Dorothea Golze1,2, Markus Hirvensalo2, Patricia Hernández-León3, Anja Aarva3, Jarkko Etula4, Toma Susi5, Patrick Rinke2, Tomi Laurila3,4, Miguel A Caro3. 1. Faculty of Chemistry and Food Chemistry, Technische Universität Dresden, 01062 Dresden, Germany. 2. Department of Applied Physics, Aalto University, 02150 Espoo, Finland. 3. Department of Electrical Engineering and Automation, Aalto University, 02150 Espoo, Finland. 4. Department of Chemistry and Materials Science, Aalto University, 02150 Espoo, Finland. 5. University of Vienna, Faculty of Physics, Boltzmanngasse 5, 1090 Vienna, Austria.
Abstract
We present a quantitatively accurate machine-learning (ML) model for the computational prediction of core-electron binding energies, from which X-ray photoelectron spectroscopy (XPS) spectra can be readily obtained. Our model combines density functional theory (DFT) with GW and uses kernel ridge regression for the ML predictions. We apply the new approach to disordered materials and small molecules containing carbon, hydrogen, and oxygen and obtain qualitative and quantitative agreement with experiment, resolving spectral features within 0.1 eV of reference experimental spectra. The method only requires the user to provide a structural model for the material under study to obtain an XPS prediction within seconds. Our new tool is freely available online through the XPS Prediction Server.
We present a quantitatively accurate machine-learning (ML) model for the computational prediction of core-electron binding energies, from which X-ray photoelectron spectroscopy (XPS) spectra can be readily obtained. Our model combines density functional theory (DFT) with GW and uses kernel ridge regression for the ML predictions. We apply the new approach to disordered materials and small molecules containing carbon, hydrogen, and oxygen and obtain qualitative and quantitative agreement with experiment, resolving spectral features within 0.1 eV of reference experimental spectra. The method only requires the user to provide a structural model for the material under study to obtain an XPS prediction within seconds. Our new tool is freely available online through the XPS Prediction Server.
X-ray spectroscopy techniques
are routinely used for structural
characterization of materials[1] and molecules.[2] Among the different X-ray spectroscopies, X-ray
photoelectron spectroscopy (XPS) is arguably the most widespread.
In XPS, a material sample is irradiated with monochromatic X-rays
to probe the binding energy (BE) of its core electrons. When a core
electron absorbs an X-ray photon with enough energy, it leaves the
sample and its kinetic energy can be measured. Since the X-ray incident
energy is known, the difference between the kinetic and the incident
energy is the core–electron BE. This energy is characteristic
of the chemical environment of the core-excited atom. XPS spectra
are therefore frequently used for structural characterization.[2,3]Carbon-based materials, such as amorphous and disordered carbons,
graphene, graphene derivatives, and nanotubes are an important material
class in industry and research.[4−6] Furthermore, emerging applications
are envisioned, such as energy storage and conversion,[7] electronics,[8] electrocatalysis,[9,10] and biosensing.[11] Unfortunately, the
atomic structure of carbon-based materials is often not completely
known because, in addition to their possibly disordered nature, they
also often contain a wide variety of defects and surface chemical
functionalizations. XPS is one of the most commonly used spectroscopy
tools for structural characterization of carbon-based materials.[11,12] However, it is generally difficult and sometimes impossible to establish
the precise origin of each peak in an XPS spectrum due to the lack
of well-defined reference data.[13] In addition,
the link between atomic structure and a particular XPS spectrum is
often imprecise since the core–electron BEs of two atoms in
different chemical environments can be the same. Further ambiguities
are introduced by the peak fitting procedure, which must be applied
to resolve overlapping features in the experimental spectrum and which
often relies on a number of assumptions, such as the total number
of peaks.[13,14] These limitations impact the interpretation
of experimental XPS spectra. Inferring the atomic structure from the
XPS spectrum is referred to as the “backward” route.
An alternative strategy is to generate XPS spectra from candidate
structural models. The best match between generated and measured spectra
then provides the best structural model. We call this the “forward”
direction.For the forward route to be feasible, we require
a pool of candidate
structures and a theoretical or computational approach that is able
to accurately predict core–electron BEs from the atomic structure
alone. Candidate structures can be generated computationally. The
difficulty with the forward direction therefore lies with the availability
and computational cost of accurate XPS prediction tools. Currently,
XPS modeling is almost exclusively based on Kohn–Sham density
functional theory (DFT) employing so-called Δ-methods, such
as the Delta self-consistent field (ΔSCF)[15] or Delta Kohn–Sham (ΔKS) frameworks.[16] DFT is, by design, a ground-state theory and
does not provide systematic access to excited-state properties. However,
for small molecules, relative and absolute core-level BEs from ΔSCF
and ΔKS generally compare well to experiment, in particular
when employing meta-generalized gradient approximation (meta-GGA)
functionals.[17−20] Recently, ΔSCF calculations with meta-GGAs were, in combination
with finite-size correction schemes, also successfully applied to
simple solids.[21]While the functional
dependence of molecular core-level excitations
is moderate for absolute[22,23] and even negligible
for relative BEs,[17] it can be more severe
for complex materials.[24−26] As a consequence of the self-interaction error, the
accuracy of DFT-based Δ-methods deteriorates with increasing
system sizes, which has been comprehensively discussed for valence
excitations[27] and has been also observed
for core states.[28] The GW approximation[29] to many-body perturbation
theory overcomes these limitations of DFT and provides a rigorous
quantum-mechanical framework for the photoemission process,[30] see also refs (28) and (63) for a discussion of the limitations of Δ-methods.
However, the computational cost of a GW calculation
is orders of magnitude larger than for DFT, making a straightforward
application of GW to complex materials difficult.The computational prediction of XPS spectra of amorphous structures
requires sampling over all atoms in the structure, which is demanding
even with DFT and impossible with GW. Machine-learning
(ML) methods are a promising strategy to bridge the gap between high
accuracy and computational efficiency. While the application of ML
in spectroscopy is still in its infancy,[31−34] the first proof-of-concept applications
for the prediction of valence[35] and core-level
spectra are emerging.[36−38] In this paper, we advance these ideas for real-world
applications and develop a powerful XPS prediction tool by combining
ΔKS calculations and highly accurate GW predictions
with ML models. We show that our method can provide access to quantitatively
accurate predictions of XPS spectra of complex disordered materials
and small molecules containing carbon, hydrogen, and oxygen (CHO).The remainder of this article is structured as follows: We discuss
the generation of structural models for CHO-containing compounds in section . We proceed
with the details of the electronic-structure methods used to generate
computational XPS data in section . The architecture of our ML models is discussed in section . The performance
of the ML models for C 1s and O 1s excitations of molecular and extended
CHO structures is presented in section , followed by XPS spectra predictions for selected
CHO materials. Finally, we introduce our XPS Prediction Server as
a freely available online tool and draw conclusions in section .
Methods
Structural Models of CHO-Containing Compounds
Due to carbon’s versatility in chemical bond formation,
the composition and configuration space for carbon-based materials
and molecules is vast.[39,40] Diverse examples of CHO compounds
are small molecules (water, methane, methanol, etc.), large molecules
and polymers (lipids, sugars, cellulose, etc.), and solid-state materials.
The main focus of this work are CHO materials. While two distinct
CHO materials will differ from each other when regarded as a whole,
they are made of the same (or very similar) individual building blocks,
or atomic motifs. By building a library of structural models for CHO
materials, we can identify the individual atomic motifs most representative
of the ensemble.[41] Effective motif selection,
discussed in more detail in section , is essential to obtain a compact and
manageable representation of large structural databases.
CHO Structural Databases
We have
generated two CHO structure databases for the prediction of XPS data:
one for CHO materials and another one for small CHO-containing molecules.
The structural database for CHO materials was constructed from computer-generated
model structures of amorphous carbon (a-C), hydrogenated a-C (a-C:H),
oxygen-enriched a-C (a-C:O), functionalized a-C, oxygenated amorphous
carbon (a-CO), graphene (G), and reduced
graphene oxide (rGO). All of the computational structural models for
these materials are taken from the available literature.[41−47] An exception are the oxygen-rich a-CO models, which were generated using DFT molecular dynamics following
ref (47). The CHO materials
in our structural database cover a broad range of structural building
blocks, which provides the necessary foundation to map all the characteristic
atomic motifs centered on O and C (H lacks an atomic core) to their
corresponding core–electron binding energies.Our structural
database of small CHO molecules is a subset of the QM9 data set,[48] which contains in total 134k organic compounds.
Our QM9 subset consists of 2089 CHO molecules with 3–29 atoms
and its size distribution follows that of those QM9 molecules which
contain exclusively C, H, and O. The full QM9 database also includes
molecules with N and F. Those molecules were not considered for the
subset generation. The molecules in our subset contain up to 9 “heavy”
atoms (in this case C or O), which amounts to a total of 14 707
C 1s and 1 865 O 1s excitations. We found that the relationship
between local molecular structure and XPS properties is not transferable
to extended CHO structures, i.e., these data were not used for the
generation of the ML models for CHO materials. However, since these
molecules are small, calculating their XPS spectra is computationally
comparatively inexpensive. This allows us to both benchmark our methodology
in the limit of abundant data and produce a useful reference database
of computational XPS spectra of small CHO-containing molecules, which
is valuable on its own.
Structure Classification Based on Representative
Atomic Motifs
The computational structures for the CHO materials
contain many C and O sites. Computing the 1s core–electron
BE for each site is, unlike for molecules, computationally too expensive,
in particular at the GW level of theory. Therefore,
to make the computational effort tractable, we have identified the
most representative atomic motifs in the entire CHO materials database,
using the data clustering methodology presented in ref (41). In short, a many-body
atomic descriptor known as the “smooth overlap of atomic positions”
(SOAP) is used to encode the atomic structure surrounding each atomic
site in the database of structures.[49] These
descriptors allow us to construct kernel functions, which can also
be understood as measures of similarity between the atomic environments.
From these, we can build a distance matrix that can then be fed to
a data classification algorithm. This algorithm, k-medoids in our case,[50] clusters data
(here the atomic environments) into groups that share similarities
and assigns a “medoid” (also called a “centroid”
in barycenter-based clustering methods). This medoid corresponds to
the most representative atomic motif within each data cluster. We
can preselect the number of data clusters to build, based on our estimate
of available CPU power, and perform the core–electron calculations
only on those. This leads to efficient charting of configuration space,
since we avoid repeatedly calculating motifs that are overrepresented
in the database of atomic structures.The composition of the
database is visualized in Figure , where we show a map of chemical and structural similarities
between the present atomic motifs using a low-dimensional embedding
tool, cl-MDS,[54] which combines ML atomic
descriptors[49,51] and multidimensional scaling
(a dimensionality reduction technique)[41,52,53] with data clustering.[50] Here, an atomic motif is constructed from a central C, H, or O atom
embedded within diverse CHO environments. Atomic motifs are associated
in classical terms with coordination environments (e.g., sp, sp2, and sp3 in carbon[41]) or with chemical groups, such as keto, epoxide, hydroxyl groups,
etc. The central atom is either part of this group or adjacent to
it, see Figure . The
environment within the immediate vicinity of the central atom includes
all the neighbor atoms within a given cutoff sphere. To simplify visualization,
we used a cutoff radius of ∼2.25 Å to generate the similarities
in Figure . However,
this radius is too small when selecting structures for training the
ML models. Motivated by the convergence studies for finite systems
discussed in section , we retain structural information within a cutoff sphere
of radius 4.25 Å for all the ML models trained in this study.
Figure 1
Cluster-based
multidimensional scaling map of the CHO materials
database used in this work. The graph is partitioned into three sections,
depending on whether a C, H, or O atom is at the center of the environment.
For instance, a hydroxyl group is viewed differently depending on
whether the atomic environment descriptor uses C, H, or O as the origin.
The distance between data points on the map is inversely proportional
to the degree of similarity between the corresponding atomic environments.
This similarity is established using SOAP many-body descriptors,[49,51] as explained in more detail in refs (41, 52, and 53). The
gray triangles indicate the motifs selected for the GW calculations.
Cluster-based
multidimensional scaling map of the CHO materials
database used in this work. The graph is partitioned into three sections,
depending on whether a C, H, or O atom is at the center of the environment.
For instance, a hydroxyl group is viewed differently depending on
whether the atomic environment descriptor uses C, H, or O as the origin.
The distance between data points on the map is inversely proportional
to the degree of similarity between the corresponding atomic environments.
This similarity is established using SOAP many-body descriptors,[49,51] as explained in more detail in refs (41, 52, and 53). The
gray triangles indicate the motifs selected for the GW calculations.The subset of atomic motifs in the database selected
for GW calculations is indicated with gray triangles
in Figure . ΔKS
calculations
are performed on an extended subset of the whole database that includes
the GW environments. Note that core-level calculations
are only performed for heavy atoms, and we include thus only motifs
with central C or O atom. Using the structures selected for GW as an example (the gray triangles), Figure visually highlights two aspects
of motif selection: (i) The diversity of our CHO database is preserved
in the selection of those structures used for ML model construction,
i.e., we draw samples from all over the map. (ii) This selection needs
to be carried out according to the longer cutoff, since a classification
based on a shorter cutoff, while useful in visualization, does not
contain enough information to train a predictive ML model (the drawn
samples are not homogeneously distributed on the map). If we drew
the same map according to similarities based on the 4.25 Å cutoff, Figure would look very
different and it would be impossible to make an intuitive connection
between the map and classical chemical group classification. See Figure S5 of the Supporting Information for a
cl-MDS map with the larger cutoff.
Carving out the Structures
Another
issue pertains to the size of the input structures, which are model
“supercells” with up to 355 atoms with periodic boundary
conditions. The cost of DFT calculations scales cubically with the
number of atoms in the system. The cost of core-level GW calculations is even more formidable and scales with the fifth power
of the number of atoms.[28] We therefore
resort to moderately sized cluster models of around 100 atoms for
our carbon structures. Cluster models are justified for core-level
excitations, since local atomic properties, such as the core–electron
binding energy, are expected to converge with respect to the number
of neighbors explicitly included in the calculation. By only keeping
the atoms in the immediate surrounding of the core-excited atom, we
can significantly speed up the core-level calculations. At the same
time, we remove the need for periodic boundary conditions, which are
currently incompatible with our core-level GW implementation.The cluster models need to be constructed with care, since creation
of dangling bonds or radicals at the cluster surface can affect the
overall spin state of the system. We use a “carving”
technique introduced in refs (43 and 55), where a spherical portion of the material centered on a specific
site is carved out of the bulk material. Broken C–C bonds are
passivated with H atoms. All other bonds (C–O, O–H,
and C–H) are preserved. The procedure is exemplified in Figure , and our carving
code is freely available online.[56]
Figure 2
(Top) Example
of the cluster carving procedure for an a-CO structure, where the cluster is contained
within a sphere centered on the carbon atom highlighted with the red
circle where the core hole is created. (Middle) Dependence of the
core–electron BE of the cluster on the cutoff radius, together
with the periodic reference, for the example structure in the top
panel (ΔKS values). (Bottom) Comparison for the whole set of
ΔKS data points in our database for which both cluster (rcut = 4.25 Å) and periodic ΔKS0 values are available, showing that the main difference is
simply a small vertical shift in the energies. This strongly indicates
that the carved structures are good surrogate models for the extended
(periodic) systems. ΔKS+ refers to calculations where
the excited core electron is removed from the system, whereas in ΔKS0 it is promoted to the conduction band, see section . The root-mean-square
and mean-absolute errors, once the rigid shift is taken into account,
are 0.41 and 0.32 eV, respectively, with a maximum error of 1.69 eV.
(Top) Example
of the cluster carving procedure for an a-CO structure, where the cluster is contained
within a sphere centered on the carbon atom highlighted with the red
circle where the core hole is created. (Middle) Dependence of the
core–electron BE of the cluster on the cutoff radius, together
with the periodic reference, for the example structure in the top
panel (ΔKS values). (Bottom) Comparison for the whole set of
ΔKS data points in our database for which both cluster (rcut = 4.25 Å) and periodic ΔKS0 values are available, showing that the main difference is
simply a small vertical shift in the energies. This strongly indicates
that the carved structures are good surrogate models for the extended
(periodic) systems. ΔKS+ refers to calculations where
the excited core electron is removed from the system, whereas in ΔKS0 it is promoted to the conduction band, see section . The root-mean-square
and mean-absolute errors, once the rigid shift is taken into account,
are 0.41 and 0.32 eV, respectively, with a maximum error of 1.69 eV.We find that the core-level BEs converge quickly
with respect to
the cluster radius for ΔKS, as shown in Figure (middle) and also for GW, see Figure S1 in the Supporting Information.
Convergence is typically reached for rcut = 4.25 Å, which is the cluster radius we use for the data acquisition
of our ML models. Figure (bottom) also indicates that the carved structures represent
a good surrogate model for the periodic structures, since the respective
predicted core–electron binding energies closely follow each
other. The main difference is a constant shift of 0.27 eV, which is
easy to correct for. We attribute the systematic 0.27 eV upward shift
to finite-size (or particle-in-a-box) effects upon promotion of the
core electron to the conduction band minimum. Since the electron localization
length is necessarily reduced in the 4.25 Å cluster compared
to the extended structure, the energy of the excited state increases
accordingly, by 0.27 eV on average in this case. The validity of the
cluster approach is further discussed in section .
DFT and GW Calculations
DFT
We carry out the DFT calculation
of core–electron BEs using the ΔKS[16] total energy method. ΔKS is computationally more
affordable and therefore more amenable to high-throughput calculations
than its similarly accurate all-electron variant, the ΔSCF[15] method (see the Supporting Information for an explanation regarding the difference between
the ΔKS and ΔSCF methods). In ΔKS, the core-level
BE is given as the difference between core-excited and ground-state
total energies. In the excited-state calculation, the C 1s or O 1s
electron is removed from the core, which is modeled via a special
projector augmented-wave (PAW[57]) potential,
and only the valence electrons are relaxed. These valence electrons
can be relaxed either in the presence of the excited electron (neutral
calculation, ΔKS0) or in its absence (charged calculation,
ΔKS+). For molecules, the ΔKS+ approximation
can be applied directly since the vacuum level is well-defined. For
materials, the ΔKS0 calculation allows one to align
the computed BEs similarly as in experiment, i.e., with respect to
the Fermi level. In addition to the periodic ΔKS0 calculations, which we need for our ML model, we carried out ΔKS0 calculations also for carved clusters to validate that these
clusters are indeed good surrogate models for the periodic structures,
as shown in Figure (bottom). However, the ΔKS+ values are those directly
comparable to our GW cluster calculations.We performed open-shell DFT calculations with VASP[58−60] using the Perdew–Burke–Ernzerhof
(PBE) functional.[61] See the Supporting Information for a discussion on the
choice of functional. We apply a constant correction based on GPAW[16,62] results to convert the relative ΔKS values (i.e., the chemical
shifts) from VASP to absolute ΔKS values. Disordered carbon
materials often exhibit local atomic magnetization,[11] which makes the determination of the ground state challenging.
We expand on the procedure how to determine the lowest energy magnetic
configuration, details of the DFT calculations, and the definition
of the reference level for comparison to the experiment in the Supporting Information. Further discussions of
energy referencing can also be found in section and in ref (13).
GW
The GW approximation[29] is a highly accurate electronic-structure method
that can be applied to predict photoemission spectra. The central
object of GW is the self-energy Σ, which is
computed from the Green’s function G and the
screened Coulomb interaction W, where Σ = iGW, hence the name GW. The self-energy
contains all quantum mechanical correlation and exchange interactions
between the electrons and the hole created upon photoemission. GW offers access to quasiparticle energies, which directly
correspond to the negative of the vertical ionization potentials.GW has become the gold standard for the computation
of band structures of solids and is now also increasingly applied
to molecular valence excitations.[30] Recently,
we advanced the GW methodology and implementation
for application to deep core excitations by combining exact numeric
algorithms in the real frequency domain[28] with partial self-consistency[63] and relativistic
corrections.[64] We showed that GW reproduces absolute molecular 1s excitations within 0.3 eV of experiment
and relative binding energies with average deviations below 0.2 eV.[63] Our core-level GW approach
was recently also applied to simple solids,[74] yielding first promising results. In addition, its extension to
the Bethe–Salpeter equation (BSE@GW) was lately
also successfully used for the prediction of molecular K-edge transition energies.[65]GW calculations are several orders of magnitude
more expensive computationally than DFT calculations with GGA and
even hybrid functionals. Nevertheless, GW is nowadays
routinely applied to predict valence excitations of systems with several
hundred atoms.[66−71] However, the application of GW to deep core excitations
is computationally more expensive than for valence states. First,
core-level GW calculations require more advanced
numerical schemes,[28] increasing the conventional
scaling with respect to system size N from O(N4) (valence states) to O(N5) (core states). This unfavorable
scaling restricts the accessible system size in core-level GW to around 100 atoms. Second, an all-electron treatment
is necessary, which we efficiently realize by an implementation with
localized basis sets. The implementation of GW in
localized basis set codes is a rather recent development of the past
decade,[30] for which the efficient implementation
of periodic boundary conditions is still the subject of ongoing work.[72−74] Our core-level GW implementation[28] is thus currently restricted to cluster calculations. The
largest GW calculation in this work was performed
for an a-C cluster with 112 atoms on more than 8000 CPU cores.For the GW calculations, we use the FHI-aims program
package[75,76] and follow the procedure developed in ref (63). We employ a single-shot G0W0 approach in
combination with the PBEh(α = 0.45) functional for the underlying
DFT calculation, where α is the amount of exact Hartree–Fock
exchange.[77] The α value was tuned
to reproduce the results of computationally more demanding eigenvalue-self-consistent GW methods.[63] We performed a
screening for the lowest-energy configuration at the PBEh(α)
level to ensure that the open-shell G0W0 calculations are performed on top
of the DFT ground state. All GW results are extrapolated
to the complete basis set limit, and relativistic corrections[64] are added for the O 1s excitations. Further
details are given in the Supporting Information. To support open data-driven materials science,[78] we uploaded the input and output files of all GW calculations of the a-C clusters to the Novel Materials Discovery
(NOMAD) repository.[79]
Machine-Learning Model
We develop
ML models for the prediction of either a core-level BE for atom i, BE, or the difference Δ between GW and DFT predicted
core-level BEs, Δ = BE – BEDFT. Since all our ML models use
the same architecture, we denote the quantity to be learned generically
as γ. Separate models are trained
for C 1s and O 1s, which is motivated by the structure of our data:
core-level BEs are strongly species dependent and separated by more
than 100 eV for different atomic species. C 1s excitations occur around
290 eV, whereas O 1s photoelectrons are ejected at approximately 540
eV. The chemical shifts due to different local atomic environments
around a carbon or oxygen core are 2–3 orders of magnitude
smaller.Our ML model is based on kernel ridge regression (KRR),
using kernels constructed from soap_turbo descriptors,
which are a modification[51] of the SOAP
many-body atomic descriptor,[49] providing
improved speed and accuracy. Here, we used the Python interface (“Quippy”)
to the soap_turbo library provided by the QUIP
and GAP codes.[80,81] Briefly, KRR replaces the nonlinear
problem of expressing γ for the
core of atom i as a function of atomic positions,
γ = γ({r ∈ S}), where j runs through all atoms
within an environment of i, S, with a linear problem. Using the “kernel
trick”, the same quantity, γ, is expressed as a linear combination of kernel functions, k(i, t):Here, q are the SOAP-type many-body atomic descriptors that we use
to encode the atomic information about the environment of atom i, and t denotes a number of reference
environments in the training set. The dot-product SOAP kernel (where ζ = 2 in our case) provides
a measure of similarity between i and t that is rotationally and translationally invariant.[49] δ is a parameter, given in eV, which controls the
energy scale, and e0 is a constant reference
energy subtracted during training and then added during prediction.
Conceptually and methodologically, the present approach is similar
to that of the Gaussian approximation potential (GAP) formalism[80,82] and to our previous models of adsorption energetics in carbon-based
materials.[41] The SOAP descriptors encode
atomic structural information up to a certain cutoff radius from the
central atom i. Thus, we implicitly make the assumption
of locality for the binding energies. That is, we assume that only
the arrangement of atoms in the immediate vicinity of atom i affects the core levels of that atom. The validity of
this assumption is illustrated in Figure (middle), where we show that the core-level
BE of the central C atom quickly converges with the cluster size.
Further confirmation is obtained from Figure (bottom), which shows that the difference
between the C 1s excitation from periodic and cluster models is mainly
a constant shift.Model training consists essentially in the
inversion of eq (or,
more precisely, on
a least-squares based optimization of the α) for a set of reference calculations, i.e., during training
both i and t run over the same set
of atomic environments. To prevent overfitting, we use regularization.
Since our data sets for the CHO materials have very few entries (most
notably our GW data set), to collect error statistics
we test all of our models using n-fold cross validation,
where n models are trained, each time leaving out
one of the data points, and the model is tested on the particular
entry that is left out. For the models based on QM9 data, for which
many more training points are available, we train 10 different models
for a given training set size, randomizing each time over training
configurations and test on the remaining configurations. We do not
perform explicit hyperparameter optimization.The basic ML model
architecture used throughout this work is given
by eq . The application
of this model is straightforward for learning the GW and DFT predicted molecular C 1s and O 1s BEs of our QM9 subset.
For the latter, we have a large amount of GW and
DFT data, for which we can even train models based on the GW data alone. In addition, the data sets are “coherent”
in the sense that the DFT and GW data sets are of
equal size and that the computational data in both sets are well-defined
for isolated structures.However, model training and utilization
become more intricate for
the CHO materials, where we have few GW data, which
are in addition only available for the carved clusters. Our main objective
for the CHO materials is to combine DFT and GW data
to (i) improve upon the accuracy of DFT and (ii) overcome the current
limitation of GW calculations to nonperiodic systems.
This implies that we must combine two or more data sets and potentially
also two or more ML models. We propose to compute a corrected binding
energy (BEc) for atom i aswhere {rext} denotes the
atomic environment of i within a periodic DFT calculation
of the extended structures and {rcarv} denotes
a truncated representation of this environment, i.e., the one given
by a carved cluster centered on i. We have therefore
split the input data in eq into two terms, a baseline given by and a correction given by .The rationale for using eq is the following. First,
the Fermi level alignment (important
in experimental solid-state XPS because the sample and detector are
shorted) is provided by a neutral ΔKS calculation of the extended
structure. In this neutral calculation, the BE is computed for the
transition of an electron from the core level to the Fermi level;
this is the type of ΔKS calculation usually performed for solid-state
samples.[13,14,16,24] Second, the correction to this ΔKS BE, that
is, the difference between a GW and a ΔKS calculation
performed on exactly the same system, is assumed to be (i) local (justifying
the use of carved structures) and (ii) independent of whether the
core electron is excited to the vacuum level or the Fermi level. While
arguably intuitive, there is no formal reason, a priori, why these assumptions should hold true. Instead, we verify their
validity from the agreement between computational and experimental
spectra reported in section .With the data partition in eq , we can choose two different routes for predicting
BEc, either (i) train an ML model from {b + Δ}
or (ii)
train an ML model from {b} and another from {Δ}, then obtain
BEc as the sum of both predictions. It is not straightforward
to determine a priori which option gives more accurate
predictions, since the learning rates and available amount of data
points are different for each data set. We explore both strategies
for the CHO materials studied in this work. The outlined hybrid ML
model architecture, i.e., combining data sets from both periodic and
cluster calculations, is further described in section .
Results and Discussion
We start with
a comparison of the GW and DFT predicted
excitations for the CHO molecules and cluster models. Next, the performance
of the ML models for the molecular excitations is discussed. We proceed
with results for the ML models, which are the building blocks for
our hybrid ML architecture of the CHO materials. We then demonstrate
that the hybrid approach is key to achieve quantitatively accurate
XPS predictions of CHO materials for three showcases and introduce
our XPS Prediction Server.
Comparison of ΔKS and GW Excitations
The results of the ΔKS+ and GW calculations are shown in Figure and compared to each other for both the
cluster models of the solid-state CHO materials and our subset of
small CHO-containing molecules from the QM9 data set. Figure allows a direct comparison
between the GW and DFT predictions, since they are
computed on the exact same finite structures and are aligned both
at the respective vacuum levels. In all cases depicted in Figure , the leading difference
between GW and DFT BEs, which we can identify as
the leading error in the DFT prediction, is a systematic underestimation
of the BE. However, this leading error is data set specific. It is
in the range of 1.1–1.2 eV for the QM9 subset and around 1.6–1.7
eV for the CHO clusters. In addition, there are some subtle, but important,
nonsystematic differences. In the remainder of this section, we illustrate
how these subtleties and nonsystematic differences in the data can
be absorbed by our ML models as well as how these models can combine
data sets to improve the accuracy of the predictions.
Figure 3
Comparison between the GW and ΔKS+ results obtained for carved
clusters and QM9 molecules. The dashed
line indicates a linear fit, where the constant vertical shift gives
the leading difference between GW and DFT data. This
shift is specific to each data set and listed in the legend of each
panel.
Comparison between the GW and ΔKS+ results obtained for carved
clusters and QM9 molecules. The dashed
line indicates a linear fit, where the constant vertical shift gives
the leading difference between GW and DFT data. This
shift is specific to each data set and listed in the legend of each
panel.
Learning Molecular Core–Electron Binding
Energies
In the following, we demonstrate how to infer GW-quality core-level BEs from a DFT calculation based on
our QM9 subset of small CHO molecules. Even though molecular excitations
are not the target of this manuscript, the discussion of CHO molecules
is instructive because, even at the GW level, these
systems are small enough that plenty of data can be generated and
trends in ML accuracy and learning rates can be closely monitored.
We explore three different ways how to avoid an expensive GW calculation while retaining GW accuracy.
(1) A ΔKS+ calculation is performed followed by a
rigid shift of the obtained BE. (2) An ML model for the difference
between GW and ΔKS+ results is developed,
and the ML predicted difference is added to the result of the ΔKS+ calculation. (3) An ML model is trained that learns the GW data directly. The expected mean absolute errors (MAEs)
and root-mean-square errors (RMSE) with respect to the GW reference are shown for all three approaches in Table (the method for error estimation
is detailed below).
Table 1
Expected Errors and Timings When Computing
the Core-Electron BE of a CHO-Containing Molecule with Four Different
Approaches: (i) Direct GW Calculation; (ii) ΔKS
Calculation Followed by a Correction Based on a Constant Shift to
Account for the Difference between GW and ΔKS;
(iii) ΔKS Calculation Followed by a Correction Based on an ML
Model of the Difference between GW and ΔKS;
and (iv) Prediction of a ML Model That Learns the GW Result Directlya
GW
ΔKS++
GWML
shift
(GW – ΔKS+)ML
C 1s MAE (meV)
0
75
15
27
C 1s RMSE (meV)
0
105
24
38
O 1s MAE (meV)
0
77
17
37
O 1s RMSE (meV)
0
95
25
61
CPU time (s)
∼300k
∼5k
<1
The errors are (linearly) extrapolated
from the learning curves in Figure to the full size of our CHO-QM9 database: 14 707
and 1 865 unique atomic environments for C and O, respectively.
For simplicity, the GW error is taken as zero, and
the other three approaches are designed to match the GW prediction. The CPU time refers to the average computational cost
per molecule.
The errors are (linearly) extrapolated
from the learning curves in Figure to the full size of our CHO-QM9 database: 14 707
and 1 865 unique atomic environments for C and O, respectively.
For simplicity, the GW error is taken as zero, and
the other three approaches are designed to match the GW prediction. The CPU time refers to the average computational cost
per molecule.
Figure 4
Learning
curves for different ML models based on BE data for CHO-containing
molecules in the QM9 database. GW and ΔKS+ models show very similar learning rates, and the ΔML
model based on the difference between GW and ΔKS+ demonstrates an extremely good learning ability. For each
training set size n, 10 different models are trained,
each of which is constructed from n randomly chosen
training configurations. The errors are then computed by testing the
models on all the structures not used for training, i.e., for a given
model with n training samples, the test set contains
14 707 – n and 1865 – n samples for C and O, respectively. The error bars are
computed by averaging the errors over these 10 different models.
The first approach is motivated by the results in section , where we found
that the
leading difference between a GW and ΔKS prediction
is a constant shift of the energies. If we take a molecule from our
QM9 subset and shift the ΔKS+ results by +1.129 (C
1s) and +1.190 eV (O 1s), the prediction deviates on average by 75
and 77 meV from the GW result, respectively; see
also Table for the
RMSE. These errors are already quite small. However, this approach
still requires a full ab initio calculation at the
DFT level, which is much cheaper than GW but also
becomes computationally unfeasible for large disordered carbon structures.We can improve the speed of the prediction and/or improve the accuracy
of the prediction by using ML models. The learning curves for BE and the difference are reported in Figure for the C 1s and O 1s excitations of the CHO-QM9 subset.
Displayed are the ML errors (MAE and RSME) dependent on the number
of data points used during training. The errors are computed by testing
the models on the portion of the entire database not used for training.
For completeness, we included also the learning curves for the ΔKS
computed BEs in Figure . The models for BE and exhibit similar learning rates with quickly
decreasing errors. The MAE is <40 meV for both ML models when extrapolated
to the limit of all the available data (training plus testing). These
errors are summarized in Table . Figure also
shows that learning the difference, Δ, is easier than directly learning the core-level BEs, as indicated
by an extrapolated MAE of <20 meV. Moreover, the ΔML model
achieves the same accuracy with 50 data points as the GWML model with 2000 data points. Finally, we note that
it is easier to learn the O 1s data than it is to learn C 1s data.
This is due to the higher diversity of possible C atomic environments.Learning
curves for different ML models based on BE data for CHO-containing
molecules in the QM9 database. GW and ΔKS+ models show very similar learning rates, and the ΔML
model based on the difference between GW and ΔKS+ demonstrates an extremely good learning ability. For each
training set size n, 10 different models are trained,
each of which is constructed from n randomly chosen
training configurations. The errors are then computed by testing the
models on all the structures not used for training, i.e., for a given
model with n training samples, the test set contains
14 707 – n and 1865 – n samples for C and O, respectively. The error bars are
computed by averaging the errors over these 10 different models.Assuming the GW value to be the
“golden”
standard, the most accurate prediction for CHO molecules is obtained
from the ΔKS+ + ΔML approach, which reduces
the prediction errors to values in the vicinity of 20 meV. Such errors
are negligible since they are an order of magnitude smaller than the
overall instrumental broadening in XPS experiments and also smaller
than the smallest chemical shifts that can be resolved by analyzing
experimental XPS spectra (i.e, those that do not overlap). The ML
model trained from GW data yields the next-best predictions.
The worst approach is a ΔKS+ calculation followed
by a rigid shift, which has the same computational cost as the ΔML-based
scheme with an error that is ∼5 times larger; see Table . Once the models
are trained, the computationally cheapest prediction is obtained from
the GWML model, offering the best compromise
between accuracy and speed. However, all three strategies are computationally
much cheaper than performing an actual GW calculation,
which is already for small molecules almost 2 orders of magnitude
more expensive than a ΔKS calculation (Table ).
Learning Binding Energies of CHO-Containing
Materials
The generation of ΔKS and, in particular, GW data is significantly more expensive for CHO materials
than for molecules for the following reasons: (i) The size of configuration
space, i.e., the different ways in which CHO atoms can be arranged
in space, is much larger. This implies that more data are necessary
to train ML models of similar quality. (ii) More atoms per site need
to be considered to capture the effect of the chemical environment
on the excitation energy. This is true even when employing carved
structure models, where the number of atoms per atomic environment
can be still of the order of 100–200. If the scaling of the
method of choice is N, this means a cluster calculation
is between ∼5 and 10 times more expensive than a QM9 molecule calculation,
since the cluster will contain 5–10 times more atoms than the
largest molecules in the QM9 data set.Another problem is that
the difference in the computational cost between DFT and GW increases with growing system size. In fact, the highest scaling
steps in GW only start to dominate the calculation
for structures larger than 30–50 atoms,[28] and hardly affect the computational cost for the CHO molecules.
For example, the computational time for a GW calculation
of a carved cluster with 96 atoms (out of which 38 are C atoms) is
400 000 CPU hours, whereas the ΔKS calculation takes
approximately 22 CPU hours for the same system. Compared to the molecular
case (see Table ),
the difference in computational cost between GW and
DFT increased from a factor of 60 to 20 000.For CHO
materials, our ΔKS and GW databases
are 1 and 2 orders of magnitude smaller, respectively, than for the
CHO-QM9 molecules. However, the analysis of the CHO molecules in section reveals the
solution to this problem. We have seen that a ΔML model, based
on the difference between GW and ΔKS predictions,
can be trained to high accuracy with less data than directly training
a model for the BEs at the ΔKS or GW level.
This justifies the strategy of developing hybrid ML architectures
as outlined in section . Starting from eq , we consider two options: (i) We learn a DFT baseline for
the extended (“ext”) structures and apply an ML-predicted
correction based on the difference between GW and
DFT for the carved (“carv”) structures as shown in eq . (ii) We learn the ΔKSext0 baseline and
the Δ term simultaneously as in eq :The performance of the ML models required
to construct the hybrid
ML architectures in eqs and 4 is shown in Figure . We start with the discussion of the ingredients
for eq , i.e., the models for the BEs of the extended CHO
structures (Figure g,i) and the ΔML models for the carved structures (Figure e,f). For comparison,
we also trained ML models for BEs of the carved structures based on GW (Figure a,c) and ΔKS+ data (Figure b,d). We observe that the convergence of
the GW and ΔKS+ models for the BEs
of the carved structures is much slower than for the molecular case.
For instance, the best ΔKScarv+ model for the C 1s excitations (Figure b) still shows a significantly
larger error with over 1300 training samples (MAE = 264 meV), when
compared to the corresponding CHO-QM9 model in Figure (MAE ∼ 65 meV), corresponding to
a 4-fold relative increase of the error. This is easily ascribed to
the much more complex configuration space spanned by CHO materials
compared to small CHO molecules, as discussed before. Nevertheless,
we find that we can train ΔML models (Figure e,f) of reasonably good quality (MAE ∼300
meV) for the carved structures with as little as 150 (C 1s) and 37
(O 1s) data points. This is in line with the observation made for
molecules that less data are needed for the ΔML models. However,
the leading error in eq will originate from the model (Figure g,i) with an MAE of ∼400 meV for both
C 1s and O 1s BEs. Its learning behavior is in fact similar to the model, and the same arguments regarding
the complex configuration space apply.
Figure 5
Performance of the different
ML models for C 1s (black) and O 1s
(red) BEs trained as part of this work. “+” and “0”
refer to how the ΔKS simulation is carried out in practice,
i.e., by either removing the core electron from the sample or promoting
it to the conduction band, respectively. “Ext” stands
for “extended” (periodic) structures, as opposed to
carved structures (“carv”). See text for a detailed
discussion of the figure. The errors were obtained by n-fold cross validation, due to the small size of the training sets.
Performance of the different
ML models for C 1s (black) and O 1s
(red) BEs trained as part of this work. “+” and “0”
refer to how the ΔKS simulation is carried out in practice,
i.e., by either removing the core electron from the sample or promoting
it to the conduction band, respectively. “Ext” stands
for “extended” (periodic) structures, as opposed to
carved structures (“carv”). See text for a detailed
discussion of the figure. The errors were obtained by n-fold cross validation, due to the small size of the training sets.The performance of the ML model where we train
the DFT baseline
and Δ term at the same time (eq ), is shown in Figure h,j. Despite the small size of our training set (150/36
data points for C 1s/O 1s), we obtain MAEs that are with 500 meV in
the range of the overall instrumental broadening in regular XPS experiments
(synchrotron-based XPS experiments can achieve better resolution).The panels k and l in Figure display ΔML models where we compare the core-level
BEs from ΔKS0 calculations of carved clusters to
those of the corresponding periodic structures . The results in Figure (bottom) already indicated that the main
difference between core-level BEs of carved and periodic structures
is a constant shift. The purpose of training these ΔML models
is to assess the validity of the locality assumption inherent to the
carving process in more detail (see ref (83) for a general discussion of locality in atomistic
modeling and ref (45) for a discussion in the context of GAP force fields). Compared to
all other models displayed in Figure a–j, we find that the ML models in panels k
and l show the poorest relative performance, i.e.,
the largest errors relative to the spread of input values. The MAEs
of 284 meV (C 1s) and 216 meV (O 1s) are statistically significant
measures for the intrinsic errors due to the carving procedure. These
MAEs quantify the influence of the discarded portion of the periodic
structure on the core-level BE. In other words, by representing extended
structures via carved clusters truncated at 4.25 Å, we will not
be able to obtain predictions more accurate than these errors, even
in the limit of infinite data. Fortunately, even though individual errors from the carving procedure can be expected on the order of
200 meV, the error in the statistical distribution of the predictions
is more significant for XPS prediction, since an XPS spectrum is constructed
out of the superposition of many individual BE contributions. In addition,
we never require ΔKScarv to be an accurate approximation
of ΔKSext. Instead, we need the difference between
ΔKScarv and GWcarv to
be an accurate approximation of the difference between ΔKSext and a hypothetical GWext calculation,
which we cannot carry out because periodic GW core–electron
BE calculations are currently unavailable.Taking the arguments
for experimental broadening and statistical
distribution into account, we can indeed conclude that the clusters
are reasonably good surrogate models for the extended structures,
a result that will be corroborated in section for actual XPS spectra predictions.A final observation is that, while it may appear that it is easier
to learn C 1s BEs than O 1s BEs, this is solely due to the size of
the training sets, which is in turn dictated by the number of available
C and O environments in the database. As we saw in Figure for the CHO-QM9 molecules,
it is in fact easier to learn O 1s BEs. This is likely due to the
higher diversity of possible atomic motifs for carbon[41] than for oxygen in the CHO system.
Predicting XPS Spectra from the Models
While linking molecular XPS spectra to the computationally predicted
BEs from GW and ΔKS+ is straightforward,
this connection is not so clear for materials. There are two main
differences that pose significant additional challenges. The first
difference is that, in experimental XPS of solid-state samples, the
vacuum level is not an easily accessible reference and the experimental
BEs are typically reported with respect to the Fermi level of the
sample. In the context of electronic structure theory, the Fermi level
is only well-defined for metallic systems. For semiconductors and
insulators, we need to rely on the thermodynamic definition, in which
the Fermi level is given as the derivative of the total (free) energy
with respect to the number of electrons in the system. As discussed
in section and the Supporting Information, one possible
way to estimate the core-level BE with the Fermi level as reference
is to perform a ΔKS0 calculation, where we add the
excited electron to the conduction band and relax the electronic structure.
This is the strategy we follow here and the reason why we learn a
DFT baseline at the ΔKS0 level in eqs and 4.The second difference arises precisely from the need for a ΔKS0 calculation. The electron that we added to the conduction
band will interact with the core hole via the (screened) Coulomb potential
leading to a spurious bound exciton. Compared to valence band holes
in semiconductors, the core hole is extremely localized and the exciton
BE will therefore be quite large (on the order of 0.5 to 1 eV for
CHO materials).[84] Since exciton binding
stabilizes the system, the spurious exciton BE lowers the ΔKS
prediction, compared to the actual core electron BE, i.e., the one
that should be compared to experimental XPS. Dynamical core-hole screening
effects are also unaccounted for, which can further complicate direct
comparison with experiment. Fortunately, these exciton BEs tend to
be highly material-specific and lead to a constant shift of the whole
computational XPS spectra (toward lower values). Our future work will
aim at quantitative estimation of these exciton BEs for improved core–electron
BE prediction. Nevertheless, we find our models to be satisfactorily
accurate, even in the absence of excitonic corrections, for the purpose
of comparing between computational and experimental spectra. We thus
speculate that the contribution of excitonic effects to the chemical
shifts in disordered carbon materials may be small enough to not affect
this comparison.The XPS spectra of the CHO materials are computed
as the superposition
of the individual, experimentally broadened signals of each atom in
a given atomic structure (or “supercell”),where E is an energy in the
spectrum. Each signal E is given for atom i by an ML model for periodic
(“extended”) structures as a function of its atomic
environment S. The latter
is characterized via soap_turbo many-body atomic
descriptors. The smearing function δ(E – E; σ) is chosen to account
for thermal and instrumental broadening. An appropriate choice of
broadening function is, e.g., a normalized Gaussian with width σ
≈ 0.5 eV.[14]We will test three
different models for E in eq , all of which
implicitly use the Fermi level as reference.
(i) The model is employed to predict the BEs followed
by a rigid shift. The shift is obtained from Figure (top): C 1s excitations are shifted by 1.641
eV and O 1s excitations by 1.702 eV. (ii) The hybrid ML model introduced
in eq is used to compute
the GW-corrected BEc. (iii) Equation is employed to obtain an
ML prediction for BEc. The leading physical assumption
for approaches (ii) and (iii) is that the GW correction
to the charged excitation energies carries over to the neutral excitation
case for extended structures. Which of the two GW-corrected ML models is optimal strongly depends on the amount of
available GW data compared to ΔKS data. Even
though eq has two sources
of error, the error in the first term can be made very small with
enough ΔKSext0 data. In eq , the amount of ΔKSext0 data that can be used is limited to those
structures for which GW data is also available, thus
limiting the amount of training data that can be reused.For
the amount of training data that we managed to gather for this
work, both GW-corrected ML models perform very similarly.
We estimate the RMSE for models based on eqs and 4 to be ∼0.697
and 0.685 eV, respectively, for C 1s predictions, where the error
for eq is estimated
as the square root of the sum of the individual squared errors (i.e.,
assuming the individual errors are normally distributed). For O 1s,
the estimated RMSEs are 0.662 and 0.608 eV for eqs and 4, respectively.
The similar performance manifests also in the prediction of the XPS
spectrum of the CHO materials. Figure shows that the predicted peak positions and overall
spectrum shape are very similar. For the prediction of the XPS spectrum
of selected CHO materials in section , we will use eq since this ML model has currently more potential
to be trained to even higher accuracy by gathering more ΔKSext0 data, whereas
a performance improvement with eq would also require additional GW calculations.
Figure 6
Comparison
of the GW-corrected ML models for an
a-CO sample with 19 at-% O.
Comparison
of the GW-corrected ML models for an
a-CO sample with 19 at-% O.
XPS Spectra Predictions for Selected CHO Materials
The ultimate test for the models presented in this paper are predictions
of XPS spectra for realistic structural models of CHO materials and
subsequent comparison to experiment. We present XPS predictions for
three classes of CHO materials: (1) a-C throughout the full range
of deposition energies, which in turn covers the full range of sp2/sp3 ratios observed experimentally; (2) oxygenated
amorphous carbon (a-CO) with different
amounts of oxygen content; and (3) rGO also with varying oxygen concentrations.Experimentally, a-C thin films are grown by a number of physical
deposition methods,[4] where the main deposition
parameter is the kinetic energy of the deposited atoms. Therefore,
to model a-C realistically, we use computational structures generated
in previous work for deposition energies in the range 1–100
eV.[55,85] The XPS predictions of the a-C structures
are shown in Figure a–d, where we have focused on two different regions of thin-film
structures: the bulk of the film (panels a and b), on the one hand,
and the surface layer (panels c and d), on the other. We present XPS
predictions using eq in combination with (i) the + shift model and (ii) the ML model from eq . In the following, we
refer to these models as ΔKS + shift and ΔKS + GWcorr, respectively.
Figure 7
XPS predictions based
on our new methodology for C 1s spectra of
(a,b) a-C bulk and (c,d) a-C surfaces at different deposition energies
and for (e–g) C 1s and (h–j) O 1s spectra of a-CO. The panels g and j show site-resolved contributions
to the spectra. For example, the light-gray colored C atoms in panel
g contribute to the light-gray regions in the spectrum, whereas dark-gray
C atoms contribute to the dark-gray spectral regions. The experimental
a-CO C 1s data was taken from Santini
et al.[8] Compared are two models: predictions corrected by (i) a constant
shift (+1.641 eV for C 1s and +1.701 eV for O 1s) and (ii) GW. All GW-corrected predictions in this
panel are obtained with eq ; see the Supporting Information for a comparison of eq and eq for a-CO.
XPS predictions based
on our new methodology for C 1s spectra of
(a,b) a-C bulk and (c,d) a-C surfaces at different deposition energies
and for (e–g) C 1s and (h–j) O 1s spectra of a-CO. The panels g and j show site-resolved contributions
to the spectra. For example, the light-gray colored C atoms in panel
g contribute to the light-gray regions in the spectrum, whereas dark-gray
C atoms contribute to the dark-gray spectral regions. The experimental
a-CO C 1s data was taken from Santini
et al.[8] Compared are two models: predictions corrected by (i) a constant
shift (+1.641 eV for C 1s and +1.701 eV for O 1s) and (ii) GW. All GW-corrected predictions in this
panel are obtained with eq ; see the Supporting Information for a comparison of eq and eq for a-CO.The deposition energies control the mass density
and sp2/sp3 content of the a-C films. High deposition
energies
yield films with high mass densities and sp3 contents,
whereas low deposition energies correspond to low mass densities and
high sp2 content.[4,55] This is also apparent
from Figure b,d, where
we observe a pronounced transition between an sp2-dominated
XPS spectrum (peak at ≈284.5 eV) for low deposition energy
and an sp3-dominated (peak at ≈285.6 eV) spectrum
at higher deposition energy. The turning point for the transition
is between 5 and 6.5 eV incident atom energy. For the bulk, we have
a clear transition and the sp3 peak at high deposition
energies has the same intensity as the sp2 peak at low
deposition energies. The transition is not fully developed for the
surface layer, where the intensity of the sp3 peak is less
pronounced. The reason is that the surface of the a-C film generally
contains lower coordinated atoms compared to the bulk. a-C films are
sp2-rich at the surface even for very high densities and
may contain significant numbers of undercoordinated sp C motifs, which
are present only in negligible amounts in the bulk.[55,85]Comparing the ΔKS + shift and the ΔKS + GWcorr spectra in Figure a–d, we find that the main effect
of the GW correction is to reduce the width of the
predictions.
The ΔKS + shift model predicts the sp3 peak to be
located 0.5 eV higher than the GW-corrected model.
Experimentally, the separation between sp2 and sp3 features in a-C has been determined to be of the order of 1.1 eV.[86] This is the same separation predicted by our GW-corrected model, whereas the ΔKS model predicts
a separation of ∼1.5 eV. The relative shifts from DFT-based
Δ-methods typically agree well with experiment for small molecules.[17] However, this result indicates that the accuracy
of the relative shifts deteriorates for larger systems, which is a
consequence of the delocalization error in DFT, demonstrating the
need for the GW correction.We discuss next
the XPS predictions for a-CO with different
oxygen contents (15, 19, and 36 at-% O). Figure shows the C 1s (parts
e–g) and O 1s (parts h–j) excitations employing the
ΔKS + shift and ΔKS + GWcorr model from eq . The
main effect of the GW correction is again the reduction
of the spread of the predictions, especially for the O 1s spectrum.
For a comparison between the ΔKS + GWcorr models from eq and eq , see Figure and Figure S3 in the Supporting Information. The peak alignments are very
similar between the two GW correction schemes, with
slight differences regarding the relative intensity and spread of
the lower lying peaks.The correspondence between excitation
energy and atomic motifs
is highlighted in panels g and j of Figure with color codings: light-gray (dark-gray)
colored C atoms in panel g contribute to light-gray (dark-gray) regions
in the C 1s spectrum and light-red (dark-red) colored O atoms in panel
j to light-red (dark-red) regions in the O 1s spectrum. For C 1s spectra,
the lower energy contributions correspond to carbon–carbon
bonds, followed by an increase in the BE as the number of neighboring
O atoms increases. The XPS spectra for a-CO materials with higher oxygen content have consequently more
features at higher energies since the number of epoxide and ether
(C–O–C), keto (C=O), and ester (R–COO–R′)
groups increases. The core-level BEs of the C atoms in these groups
increase also in that order, where the largest C 1s excitation energies
at around 289–290 eV are observed for carboxyl C atoms. For
the O 1s spectra, the distribution is essentially bimodal. At lower
energies, we observe a peak corresponding to carbonyl O atoms from
keto or ester groups. The peak at higher energies originates from
contributions of the O atoms in epoxides and ethers and the hydroxyl
(singly bound) O atom in the ester groups. The relative intensity
of these peaks strongly depends on the oxygen content. In our computational
samples, epoxides, ketos, and esters are present in approximately
13:61:26, 42:40:18, and 8:60:32 percentage ratios at 15, 19, and 36
at-% O, respectively.A comparison to experimental a-CO data
is available for the C 1s spectrum from Santini et al.[8] We observe good agreement for the relative position of
the different peaks present in the C 1s spectra with our ΔKS
+ GWcorr prediction. The agreement is
less good for the peak intensities, but the likely reason is that
the relative concentrations of functional groups in the computational
and experimental samples are different. We can infer from this direct
comparison an oxygen content somewhere in between 19 and 36 at-% (Santini
et al. report ≈37% for this sample) and suggest that a combination
of those two simulated curves would lead to better agreement with
experiment. This in turn suggests that the experimental sample may
be inhomogeneous with respect to the oxygen content distribution.
Reproducing the experimental structure more closely would require
deposition simulations similar to (but more complex than) those in
refs (55 and 85), which are nontrivial
and beyond the scope of this work. In any case, determining the precise
atomic percentages experimentally is difficult because there are instrumental
issues (such as calibration), sample issues (heterogeneity, surface
roughness), methodological issues (e.g., regarding how the peaks are
fitted or how the background was subtracted) and many more.[87−89] Experimental XPS-derived compositions will also often disagree with
other methods, such as X-ray absorption spectroscopy (XAS) or elastic-recoil
detection analysis (ERDA), because of sample inhomogeneity and different
accessible depths. We show below for rGO that, when more candidate
computational structures are available, the atomic percentages can
be resolved more precisely by matching predicted and experimentally
measured spectra. Therefore, the ability of ML-based XPS predictions
to accurately quantify atomic percentages in CHO materials may prove
very useful in guiding and interpretation of experiments.With
our third application, rGO, we demonstrate how our developed
methodology can be used to assess the validity of candidate structural
models for materials. The rGO structures in our database were taken
from Kumar et al.[42] and contain different
amounts of oxygen in the range from 10 to 20 at-%. They were either
generated from COOH-rich GO (series 1) or OH-rich GO (series 2) precursor
structures. Altogether, there are 240 rGO structures with approximately
210 atoms each. We computed the XPS spectra of all of them using the GW-corrected ML model of eq . Note that, altogether, these ML predictions take
only a few minutes on a desktop computer.In Figure , we
compare the predicted spectra for series 1 (top) and series 2 (bottom)
to the experimental rGO spectrum from ref (90). From Figure , we can identify the candidate model structure whose
XPS spectrum best matches the experimental one. The spectra of candidate
structures with a low oxygen content of 10 or 11 at-% clearly differ
from experiment, while the ones with high-O content of 15–20
at-% agree best with the experimental XPS. Clearly, the experimental
sample must contain a large fraction of oxygen. However, it is also
evident that the structural models with high oxygen content are missing
the functional groups that contribute to the feature at ∼289.5
eV in the experimental spectrum. As we saw for the a-CO example, this feature corresponds to carboxyl C
atoms. These specific groups are not present in the rGO reference
database of Kumar et al.,[42] even though
the rGO structures from series 1 were generated from COOH-rich GO
starting configurations. However, unlike hydroxyl groups, the COOH
groups are thermodynamically unstable in the computational structure
generation process and are not present in the final structural models.
Our analysis thus indicates that the composition of the experimental
sample is similar, but not identical, to the structural models with
high O-content. In particular, it sheds light onto the missing bits
of information, in this case, the presence of COOH groups.
Figure 8
Comparison
of C 1s spectra between the computational predictions
made with the present XPS model (with GW corrections, eq ) for the rGO structural
models of ref (42) and
an experimental rGO XPS spectrum from ref (90). “Series 1” and “Series
2” indicate that the computational rGO models were derived
from different initial compositions of the GO precursor; series 1
was generated from COOH-rich GO and series 2 from OH-rich GO.[42] Note that the final content of COOH in rGO for
these models is negligible, regardless of the starting configuration
used.
Comparison
of C 1s spectra between the computational predictions
made with the present XPS model (with GW corrections, eq ) for the rGO structural
models of ref (42) and
an experimental rGO XPS spectrum from ref (90). “Series 1” and “Series
2” indicate that the computational rGO models were derived
from different initial compositions of the GO precursor; series 1
was generated from COOH-rich GO and series 2 from OH-rich GO.[42] Note that the final content of COOH in rGO for
these models is negligible, regardless of the starting configuration
used.
The XPS Prediction Server
We have
set up an online tool that utilizes the different ML models described
throughout this paper. The XPS Prediction Server is available for
free at nanocarbon.fi/xps. The user can upload a model structure in any format readable by
the Atomic Simulation Environment (ASE),[91] and the server will execute a Python script that runs the descriptor
construction (via calls to Quippy[81]) and
performs the kernel regression according to a model of choice. At
the moment, only the CHO models described herein are available, but
models for other materials can be uploaded in the future as they are
developed. The tool works in a fully automated way, and for systems
of usual sizes in the context of DFT modeling of materials (a couple
hundreds of atoms), a prediction can be obtained within seconds. It
is our hope to extend this concept of ML-based computational prediction
to other materials and experimental observables, most notably other
spectroscopic techniques.
Summary and Outlook
We have presented
an ML-based methodology to predict quantitatively
accurate XPS spectra for CHO-containing molecules and materials. We
generated a comprehensive database of computational core-level BEs
from DFT and GW calculations. By careful combination
of DFT and GW data, accurate ML models were trained
for C 1s and O 1s excitations from relatively small data sets. For
molecular BEs, we showed that the errors in the ML predictions can
be reduced to less than 50 meV. The ML models were then applied to
generate XPS spectra of selected CHO materials, namely, a-C thin films,
a-CO, and rGO with different oxygen concentrations.
Our predictions show excellent qualitative and quantitative agreement
with experiment, resolving spectral shapes and features within 0.1
eV for the selected disordered carbon-based materials. We also showed
that ML models trained with DFT data alone cannot reach this level
of predictive power and that data from the more accurate GW approach are indeed crucial. For disordered materials, we expect
that whenever a suitably constructed database with more GW data is available, an XPS ML model can be trained to provide accuracy
close to the practical resolution of common XPS experimental equipment.We demonstrated the potential and suitability of our computational
XPS tool to, e.g., quantify the atomic percentages in a-CO or identify shortcomings in candidate structure
models of rGO. We have made this new methodology freely available
to the public through the XPS Prediction Server. Such a computational
tool may prove valuable in guiding and interpreting experimental work
and in validating computational structural models of materials. We
hope to extend our ML models to other material classes and spectroscopic
techniques (XAS, Raman, IR, NMR, etc.) in the future.
Authors: Ask Hjorth Larsen; Jens Jørgen Mortensen; Jakob Blomqvist; Ivano E Castelli; Rune Christensen; Marcin Dułak; Jesper Friis; Michael N Groves; Bjørk Hammer; Cory Hargus; Eric D Hermes; Paul C Jennings; Peter Bjerre Jensen; James Kermode; John R Kitchin; Esben Leonhard Kolsbjerg; Joseph Kubal; Kristen Kaasbjerg; Steen Lysgaard; Jón Bergmann Maronsson; Tristan Maxson; Thomas Olsen; Lars Pastewka; Andrew Peterson; Carsten Rostgaard; Jakob Schiøtz; Ole Schütt; Mikkel Strange; Kristian S Thygesen; Tejs Vegge; Lasse Vilhelmsen; Michael Walter; Zhenhua Zeng; Karsten W Jacobsen Journal: J Phys Condens Matter Date: 2017-03-21 Impact factor: 2.333