Xue Jiang1, Yong Wang2, Baorui Jia2, Xuanhui Qu2, Mingli Qin2. 1. Beijing Advanced Innovation Center for Materials Genome Engineering, Collaborative Innovation Center of Steel Technology, University of Science and Technology Beijing, Beijing 100083, People's Republic of China. 2. Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, People's Republic of China.
Abstract
Transition metal (such as Fe, Co, and Ni) oxides are excellent systems in the oxygen evolution reaction (OER) for the development of non-noble-metal-based catalysts. However, direct experimental evidence and the physical mechanism of a quantitative relationship between physical factors and oxygen evolution activity are still lacking, which makes it difficult to theoretically and accurately predict the oxygen evolution activity. In this work, a data-driven method for the prediction of overpotential (OP) for (Ni-Fe-Co)Oₓ catalysts is proposed via machine learning. The physical features that are more related to the OP for the OER have been constructed and analyzed. The random forest regression model works exceedingly well on OP prediction with a mean relative error of 1.20%. The features based on first ionization energies (FIEs) and outermost d-orbital electron numbers (DEs) are the principal factors and their variances (δFIE and δDE) exhibit a linearly decreasing correlation with OP, which gives direct guidance for an OP-oriented component design. This method provides novel and promising insights for the prediction of oxygen evolution activity and physical factor analysis in (Ni-Fe-Co)Oₓ catalysts.
Catalysts play a critical role
in oxygen electrochemical processes for renewable energy storage and
conversion devices such as fuel cells, artificial photosynthesis,
and metal–air batteries.[1−6] For example, the oxygen evolution reaction (OER) involves a four-electron
(4e⁻) transfer, and such a complicated process results
in sluggish reaction kinetics.[7−9] Therefore, the OER is considered
as the main bottleneck toward the practical implementation of polymer
electrolyte membrane (PEM) electrolysis and water splitting. Noble-metal
(Ru and Ir) catalysts exhibit high OER activity and can reduce the
required overpotential; however, their scarcity and high cost limit
their wide application.[10−12] Transition metal (such as Fe,
Co, and Ni) oxides are excellent systems in the OER for the development
of non-noble-metal-based catalysts.[13] The
quantitative prediction of OER activity is critical for transition
metal oxide catalyst design.[14] Because direct experimental evidence for the
quantitative relationship between physical factors and oxygen evolution
activity, and a physical understanding of its mechanism, are still lacking,
OER activity prediction remains an unsolved challenge.[13] Usually,
scientists calculate the adsorption energetics of different chemical
structures by density functional theory and indirectly infer the OER
activity, which is more suitable for rationalizing observed activity
trends and facts rather than predicting them in advance from the large
potential chemical space. Consequently, a predictive method that expresses OER
activity as a function of the latent influencing factors is needed.

Machine-learning (ML) approaches are transforming materials research
by changing the paradigm from “trial and error” to a
data-driven methodology, thereby accelerating the discovery of new
materials.[15−25] Recently, the catalyst community has begun to utilize ML tools to
accelerate the overpotential prediction of the oxygen evolution reaction
for single atoms,[26] forecast Ni-Co-Fe-Ce
water oxidation catalysts[20] and evaluate
the perovskite chemistry factors of OER activity.[14] Various factors for oxide perovskite catalysts have been
demonstrated over the past 60 years, such as the reaction free energy
and e_g occupancy, which were obtained by DFT calculations.
A simple factor, μ/t, was derived from symbolic
regression for perovskite catalysts.[27] A
good factor should be simple and yet provide physical insight, so that it can
guide and accelerate the discovery of new oxide OER catalysts. Despite
this great potential, the use of ML has been notably absent in transition
metal oxide systems such as (Ni-Fe-Co)Oₓ catalysts.

In this work, we introduce a data-driven method to predict the
overpotential (OP) for (Ni-Fe-Co)Oₓ catalysts
using a machine-learning algorithm. The relationship between multiple
physical features and OP properties was successfully determined, by
considering valence electron number, relative atomic mass, atomic
number, atomic radius (nonbonded), covalent radius, ionization energies
(first), electron affinity, electronegativity (Pauling scale), and
outermost d-orbital electron number. The random forest regression
model works exceedingly well with a mean relative error of 1.20% evaluated
on a hold-out set. The importance of physical features has been further
analyzed. The first ionization energies (FIEs) and outermost d-orbital
electron numbers (DE) are the principal factors, and their variances
(δFIE and δDE) exhibit a linearly
decreasing correlation with OP. These variances give direct guidance for an
OP-oriented component design for (Ni-Fe-Co)Oₓ catalysts. Our work aims to provide novel and promising physical
insight into the OER activity of (Ni-Fe-Co)Oₓ catalysts.

All of the data used in this work were collected and screened from
the published high-throughput experimental studies of Haber.[27] We consider the oxide catalysts that belong
to the Ni_xCo_yFe_z system, where the mole fractions
x, y, and z of the three elements are constrained by x + y + z = 100%. The data set consists of 496 entries
(see the Supporting Information for details),
covering the elemental compositions (in percent) of the different
(Ni-Fe-Co)Oₓ materials together with
their overpotentials (OPs), measured by 10 s chronopotentiometry
at 10 mA/cm² in O₂-saturated 1.0 M NaOH(aq).[27] The overall data set therefore possesses three
features (input variables), namely the contents of the three
elements, and one target, the OP (output variable). Figure 1 visualizes the whole data set; the plot was
produced with the Python programming language[28] and the statistical data visualization library Seaborn.[29] The content of each of Ni, Co, and Fe ranges
from 0 to 100 atom % in even steps of 3.33 atom %, so the data cover the whole composition
space that may be formed. As the content of each of Ni, Co, and
Fe increases, the OP shows an overall trend of first decreasing and
then increasing. The visualization therefore indicates the composition
range that minimizes the OP of the metal oxide catalysts. Complete
and comprehensive data provide a reliable basis for a prediction model
and physical factor analysis.
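
The size of the data set follows directly from the composition grid described above. The following minimal sketch enumerates every ternary composition on a grid of 30 even steps (about 3.33 atom % each) and confirms that 496 points result. The function name `composition_grid` and the constant `STEPS` are illustrative and do not come from the original work.

```python
from itertools import product

# Assumption: 100 atom % is divided into 30 even steps of ~3.33 atom % each.
STEPS = 30


def composition_grid(steps=STEPS):
    """Return all (Ni, Co, Fe) mole-fraction triples on the grid that sum to 1."""
    grid = []
    for i, j in product(range(steps + 1), repeat=2):
        k = steps - i - j  # the Fe share is fixed once Ni and Co are chosen
        if k >= 0:
            grid.append((i / steps, j / steps, k / steps))
    return grid


grid = composition_grid()
print(len(grid))  # number of distinct compositions on the grid
```

A grid with 31 levels per element gives 31 × 32 / 2 = 496 compositions, which matches the number of entries reported for the data set.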
Figure 1
Original data set visualization by categorical plots of Ni, Co, Fe, and overpotential.

Physical features are
critical for representing the intrinsic relationship
between the latent factors and the OP. At the
atomic level, nine accessible primary physical features, chosen from empirical
experience and listed in Table 1, are used: the valence electron number, relative
atomic mass, atomic number, nonbonded atomic radius, covalent radius,
first ionization energy, electron affinity, Pauling-scale electronegativity,
and outermost d-orbital electron number. For Ni, Co, and Fe,
the values of these properties were collected
from the Royal Society of Chemistry's interactive periodic
table database.[30] The composition (C) and the associated elemental
properties (P) were used to represent each catalyst sample numerically
through the feature transformation functions of eqs 1 and 2, which
convert the original chemical element space into a primary physical
feature space. For each catalyst sample in the 496 entries, X̅ is the composition-weighted average of a given physical feature over the
constituent elements, and δ is the corresponding variance,
reflecting how much the elements in the sample differ in that feature. In eqs 1 and 2, C is the mole fraction
of each element and P is the corresponding elemental property. Thus,
18 descriptive features that may be physically relevant to the OP
are constructed from X̅ and δ. The original data set is thereby transformed into a new
data set of shape 496 × 18 (see the Supporting Information for details).
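
The transformation from composition to physical features can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the weighted average follows the description in the text, while the variance is assumed here to be the composition-weighted squared deviation from that average, which may differ from the exact form of eq 2. Only three of the nine properties are included; the atomic numbers and d-electron counts are exact, and the first ionization energies are rounded values that should be checked against the cited database. All names (`PROPERTIES`, `weighted_mean`, `weighted_variance`, `featurize`) are hypothetical.

```python
# Elemental properties for Ni, Co, and Fe.
# AN and DE are exact; FIE values (kJ/mol) are rounded approximations.
PROPERTIES = {
    "AN": {"Ni": 28, "Co": 27, "Fe": 26},
    "DE": {"Ni": 8, "Co": 7, "Fe": 6},
    "FIE": {"Ni": 737.1, "Co": 760.4, "Fe": 762.5},
}


def weighted_mean(comp, prop):
    """Composition-weighted average of one elemental property."""
    return sum(comp[el] * prop[el] for el in comp)


def weighted_variance(comp, prop):
    """Assumed form: sum over elements of C * (P - mean)^2."""
    mean = weighted_mean(comp, prop)
    return sum(comp[el] * (prop[el] - mean) ** 2 for el in comp)


def featurize(comp):
    """Map a composition (mole fractions summing to 1) to mean/variance features."""
    features = {}
    for name, prop in PROPERTIES.items():
        features["mean_" + name] = weighted_mean(comp, prop)
        features["delta_" + name] = weighted_variance(comp, prop)
    return features


sample = {"Ni": 0.5, "Co": 0.2, "Fe": 0.3}
features = featurize(sample)
for key, value in features.items():
    print(key, round(value, 4))
```

With all nine properties in `PROPERTIES`, the same loop would produce the 18 features per sample described above. Under the assumed variance, a single-element sample always has δ = 0, so δ grows only when elements with different property values are mixed.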
Table 1
Material Physical Features, Abbreviations, Units and the Transformation Formulas

| features | abbreviation | unit | formula |
|---|---|---|---|
| valence electron number | VEN | | |
| relative atomic mass | RAM | | |
| atomic number | AN | | |
| atomic radius, nonbonded | RA | Å | |
| covalent radius | RC | Å | |
| first ionization energy | FIE | kJ mol⁻¹ | |
| electron affinity | EA | kJ mol⁻¹ | |
| electronegativity (Pauling scale) | EP | - | |
| outermost d-orbital electron number | DE | - | |

To remove features that are linearly correlated with one another,
Pearson correlation coefficients were calculated before machine
learning. The Pearson correlation coefficients can be described as

$$R_{ij} = \frac{C_{ij}}{\sqrt{C_{ii} C_{jj}}}$$

where R is the correlation coefficient matrix and
C is the covariance matrix. Covariance indicates the degree to which
two variables vary together, and each element of R lies in [−1, 1]. The closer
the |R| value is to
1, the higher the linear correlation between the two variables is.
The heat map of Pearson correlation coefficients for the physical features
and OP is shown in Figure 2. Feature pairs with an absolute correlation coefficient
greater than 0.95 were considered highly correlated (the boxes in
dark blue and dark red in Figure 2). In a highly correlated feature pair, one feature can
be linearly expressed and replaced by the other. For example, the
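correlation coefficient between one such pair of features is 1;
thus, one feature of the pair is retained
instead of the other. Accordingly, three of the weighted-average features X̅, together with δVEN, δRAM, δAN, δRA, and δEA, are excluded. Finally, the data set contains 496 entries with 10
features (six weighted-average features X̅, plus δRC, δFIE, δEP and δDE) and 1 target property.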
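
The correlation-based screening described above can be sketched in a few lines. This is a hedged illustration, not the procedure actually used: the text does not say which member of a correlated pair is kept, so the greedy rule below (keep the first feature encountered) is an assumption, and the toy columns `a`, `b`, and `c` are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)  # undefined for constant columns


def drop_correlated(columns, threshold=0.95):
    """Greedy filter: keep a column only if |R| with every kept column
    does not exceed the threshold (assumed tie-breaking rule)."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy feature columns, for illustration only.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * a, so R(a, b) = 1
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(columns)
print(kept)
```

Here `b` is dropped because it is a linear function of `a`, while `c` is kept because its correlation with `a` is weak.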
Figure 2
Heat map of Pearson correlation coefficients for the physical features and OP.

The OP is plotted as a function
of each of the 10 retained features (the six weighted-average features X̅ and δRC, δFIE, δEP and δDE), as
shown in Figure 3.
OP first decreases and then rises as two of the weighted-average
features increase. The
optimal values of these two features may be set at around 750 and 7, respectively, to achieve
the lowest OP. As δRC, δFIE, δEP, and δDE increase, OP decreases, and the
trend is more pronounced for δFIE and δDE.
Figure 3
OP as a function of the 10 retained features (the six weighted-average features X̅ and δRC, δFIE, δEP, and δDE).

Machine-learning algorithms can fit a data-driven overpotential
model with the selected physical features as the input and the target
property OP as the output. Before model construction, model selection
and parameter optimization determine which algorithm, together with which
parameters, performs best. The
data set with 10 features and 1 target property was split into a training set of 80% (397
data entries) and a testing set (hold-out set) of the remaining 20% (99 data entries). Several well-known machine-learning
algorithms were used: stochastic gradient descent regression
(SGDR) with L1 and L2 penalties, lasso regression (Lasso), elastic
net regression (ElasticNet), multilayer perceptron (MLP), tree regression
(TreeR), AdaBoost regression (AdaBR), gradient boosting regression
(GBR), random forest regression (RFR), logistic regression (LogisticR),
kernel ridge regression (KernelRidge), Bayesian ridge regression (BayesRidge),
support vector regression (SVR) with a radial basis function kernel,
and k-nearest neighbor regression (KNR). For each machine-learning model, parameter
tuning was performed by a grid search with 5-fold
cross validation on the training set, and the parameter set
with the best average mean squared error was selected. Then, the
model was trained with the best parameters on the training set. Figure 4 shows the mean squared
errors (MSEs) and the standard deviations for the different models during
model selection. The GBR, RFR, KernelRidge, and KNR models exhibit low
MSEs and small standard deviations, and among them RFR has the lowest MSE of 40.6.
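
The selection workflow (80/20 split, then a grid search scored by 5-fold cross-validated MSE on the training set only) can be illustrated with a dependency-free sketch. It uses a small k-nearest neighbor regressor as a stand-in for the models listed above and synthetic data in place of the real data set, so the numbers it prints have no relation to the reported results. All function names and the parameter grid are assumptions made for illustration.

```python
import random


def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cv_mse(X, y, k, n_folds=5):
    """Average validation MSE of a k-NN regressor over n_folds folds."""
    folds = [list(range(i, len(X), n_folds)) for i in range(n_folds)]
    scores = []
    for fold in folds:
        held = set(fold)
        tr_X = [X[i] for i in range(len(X)) if i not in held]
        tr_y = [y[i] for i in range(len(X)) if i not in held]
        preds = [knn_predict(tr_X, tr_y, X[i], k) for i in fold]
        scores.append(mse([y[i] for i in fold], preds))
    return sum(scores) / n_folds


# Synthetic stand-in data (NOT the paper's data set): 496 rows, 2 features.
rng = random.Random(0)
X = [(rng.random(), rng.random()) for _ in range(496)]
y = [400 + 50 * (a - 0.5) ** 2 + 30 * b + rng.gauss(0, 1) for a, b in X]

# 80/20 split: 397 training rows and 99 hold-out rows.
n_train = round(0.8 * len(X))
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Grid search over k, scored by 5-fold CV on the training set only.
grid = [1, 3, 5, 9, 15]
cv_scores = {k: cv_mse(X_train, y_train, k) for k in grid}
best_k = min(cv_scores, key=cv_scores.get)

# The hold-out set is touched once, after the parameter has been chosen.
test_preds = [knn_predict(X_train, y_train, x, best_k) for x in X_test]
print(best_k, cv_scores[best_k], mse(y_test, test_preds))
```

Keeping the hold-out rows out of the grid search is what makes the later test-set error an estimate of performance on unseen compositions.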
Figure 4
Mean squared errors for different models during model selection.

Then, we retrained the RFR model with the optimized parameters
on the training set and evaluated the MSE and the mean relative
error (MRE) on the 20% testing set. Figure 5a shows the diagonal scatter plot of the
predicted OP against the ground truth for the RFR model during training
and testing on the basis of the transformed and selected data set.
On the testing set, RFR reaches an MRE of 1.20% and an MSE
of 49.79. Figure 5b
shows the contour map of the overpotential predicted by the RFR model
for different compositions. The RFR model performs well on the “unseen”
hold-out data set, whose MSE is only 9.19 higher than that in
the training process (49.79 versus 40.6), illustrating that this model can be used for OP
prediction with generalization capability in (Ni-Fe-Co)Oₓ catalysts.
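
The two metrics quoted above can be written out explicitly. The MSE definition is standard. The MRE is assumed here to be the mean of the absolute prediction error divided by the true value, since its formula is not given in the text. The overpotential values in the example are hypothetical and serve only to exercise the functions.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_relative_error(y_true, y_pred):
    """Assumed definition: mean of |prediction - truth| / |truth|."""
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical overpotential values, for illustration only.
y_true = [400.0, 420.0, 450.0, 380.0]
y_pred = [404.0, 415.0, 455.0, 376.0]

test_mse = mean_squared_error(y_true, y_pred)
test_mre = mean_relative_error(y_true, y_pred)
print(test_mse)
print(f"{100 * test_mre:.2f}%")
```

Because the MSE is expressed in squared OP units, its square root is easier to compare with the measured values: the reported hold-out MSE of 49.79 corresponds to a root-mean-square error of about 7.06 in the units of OP.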
Figure 5
Machine-learning model by RFR. (a) Diagonal scatter plot for the predicted OP and the ground truth by RFR. (b) Contour map of the predicted overpotential by the RFR model under different compositions. (c) Physical feature importance ranking by the RFR model.

Next, the contributions each physical feature makes to the high-precision
OP prediction model are examined. On the basis of the trained
RFR model, the 10 physical features are ranked by their feature importance
in Figure 5c. The greater
the contribution a feature makes to high-precision OP prediction, the
higher its importance index is. δFIE and δDE are the most critical factors according to the RFR model,
and from the perspective of model precision, δFIE is more important than δDE. The importance indices
of two further features are lower than those of δFIE and δDE but higher than those of the others. When
the trends of the physical features (including δFIE and δDE) in Figures 3 and 5 are combined, the δFIE and δDE values can be used to give direct
guidance for designing an optimal Ni, Co, and Fe composition with a low
OP, by adjusting the contents of the different elements to make the two
simple factors δFIE and δDE greater.

In summary, a data-driven approach to predict the OP for (Ni-Fe-Co)Oₓ catalysts is proposed via machine learning.
The physical features that are more related to the catalyst overpotential
for the OER are constructed, covering valence electron number, relative
atomic mass, atomic number, atomic radius (nonbonded), covalent radius,
first ionization energy, electron affinity, electronegativity (Pauling
scale), and outermost d-orbital electron number. The random forest
regression model works exceedingly well with a mean relative error
of 1.20%. The simple and easily accessed factors δFIE and δDE, the variances of the first ionization energy
(FIE) and the outermost d-orbital electron number (DE), are
identified as the most important features and exhibit a linearly decreasing correlation with OP. They
give direct guidance for the OP-oriented component design of (Ni-Fe-Co)Oₓ catalysts. Our work aims to provide novel
and promising physical insights into the OER activity of (Ni-Fe-Co)Oₓ catalysts.