Jie Zhao1, Xiaoyan Wang2. 1. College of Chemical Engineering, Nanjing Tech University, Nanjing, Jiangsu 211816, People's Republic of China. 2. School of Information Engineering, Nanjing Audit University, Nanjing, Jiangsu 211815, People's Republic of China.
Abstract
Perovskite oxides are attractive candidates for various scientific applications because of their outstanding structure flexibilities and attractive physical and chemical properties. However, labor-intensive and high-cost experimental and density functional theory calculation approaches are normally used to screen candidate perovskites. Herein, a machine learning method is employed to identify perovskites from ABO3 combinations formulated as constraint satisfaction problems based on the restrictions of charge neutrality and Goldschmidt tolerance factor. By eliminating five features based on their correlation and importance, 16 features refined from 21 features are employed to describe 343 known ABO3 compounds for perovskite formability and stability model training. It is found that the top three features for predicting formability are structural features of the A-O bond length, tolerance, and octahedral factors, whereas the top nine features for predicting the stability are elemental and structural features related to the B-site elements. The precision and recall of the two models are 0.983, 1.00 and 0.971, 0.943, respectively. The formability prediction model categorizes 2229 ABO3 combinations into 1373 perovskites and 856 nonperovskites, whereas the stability prediction model distinguishes 430 stable perovskites from 1799 unstable ones. Three hundred thirty-eight combinations are recognized as both formable and stable perovskites for future investigation.
In the past few decades,
substantial attention has been devoted
to perovskite oxides because of their fascinating electrical, optical,
magnetic, dielectric, thermal properties, etc.[1−6] For example, their mixed electronic and ionic conductivities enable
them to be used as electrode materials of solid oxides fuel cells.[6] Due to their visible light response capability
and intrinsic activity for oxygen evolution reaction, some of the
perovskites have been used as photo-electrochemical and electrochemical
catalysts for water splitting.[7] Their multiferroic
responses also make them promising actuators and sensors.[8]

Another reason for the wide application
of perovskite oxides
is their structure flexibility, through which their physical and chemical
properties can be easily tuned. Although their chemical formula ABO3 is simple, the 12-fold coordinated A site can be occupied
with low-valent and large-sized alkali, alkali earth, and rare earth
metal cations, while the B site coordinated with 6-fold oxygen anions
can be filled with high-valent and small-sized transition metal cations.[9] Therefore, tens of elements can be accommodated
at the A and B sites because of the structural flexibility of perovskite
oxides.[10] More ABO3 perovskites
can be acquired by doping the A and/or B sites with more than one
cation based on charge balance.[5]

However, not all stable ABO3 compounds are perovskite
oxides. For example, CsNbO3 is stable but is not a perovskite
oxide because of its larger A–O and smaller B–O bond lengths.[11] Therefore, it is indispensable to investigate
the formability and stability of ABO3 perovskite oxides
before their utilization. Traditionally, trial-and-error approaches
are used for this purpose. The oxides are first synthesized from their
corresponding stoichiometric precursors and then characterized to
check their formability and stability.[9] However, these methods depend heavily on researchers’ knowledge
and intuition, and they are time-consuming and labor-intensive.[12] It is impractical for researchers to check all
possible ABO3 combinations using experimental methods.

Density functional theory (DFT) calculations are also potential
approaches to predict the formability and stability of perovskite
oxides, achieved by solving the Kohn–Sham equations.[12] Jacobs et al.[13] screened
2145 perovskites using high-throughput DFT calculations to confirm
their stability and catalytic activity for the oxygen reduction reaction.
Fifty-two potential candidates combining good stability
with high activity were singled out. Emery et al.[14] investigated the thermodynamic stability and oxygen vacancy formation
energy of 5329 perovskites using high-throughput DFT calculations.
They discerned 139 favorable perovskites for thermochemical water
splitting. Tezsevin et al.[15] used high-throughput
DFT calculation to screen cubic perovskites for solid oxide fuel cell
cathode materials. Thirty-one candidates were picked out from 270
ABO3 compounds. Although DFT calculations are effective
for screening potential perovskites for various applications,
the large number of calculations required results
in high computational expense.[16,17] Therefore, it is highly
desirable to screen potential perovskite oxides with a more economical
and practical method.

Machine learning (ML) approaches use both
successful and failed
experimental and computational data to train models. The models are
then used to forecast whether ABO3 compounds are perovskites
or not.[18] For example, Talapatra et al.[10] used available experimental and computational
data of 1505 single perovskites and 3469 double perovskites to train
classification models for predicting new formable and stable perovskites.
They found that 414 compounds are promising candidates for future
evaluation. Li et al.[16] trained ML models
with 1929 perovskite oxide energies calculated with DFT to predict
the energy above the convex hull (Ehull) and phase stability of perovskite oxides. Liu et al.[19] trained an ML model for predicting the formability
of perovskites with 397 known ABO3 compounds. The model
was then used to classify 891 ABO3 compounds from the Materials
Project (MP) database into perovskites and nonperovskites. The stability
of the predicted perovskites was described with Ehull. Sharma et al.[20] investigated the feasibility and stability of defect formation
in perovskite oxides using ML methods, aiming to determine the
factors that affect the defect formation energy of perovskite oxides
during substitution.

For the ML methods, appropriate features
related to the formability
and stability of perovskite oxides should be carefully selected for
model training. To date, many features have been employed as
indicators of formability and stability.[10,16,19,20] The Goldschmidt tolerance factor t was initially proposed to geometrically describe the likelihood
of perovskite formation.[21] It is defined
as t = (rA + rO)/(√2(rB + rO)),
where rA, rB, and rO are the Shannon ionic radii
of the A, B, and O ions,[22] respectively.
For the ideal cubic perovskite structure, t is close
to 1. Experimental evidence suggests that the tolerance factor of
the cubic perovskite structure is in the range of 0.9–1.0,[10] and the range of t can be extended
through structure distortion.[10,23] Nevertheless, some
ABO3 compounds with 0.8 < t < 0.9
do not have perovskite structures, which indicates that additional
features should be used in addition to tolerance factor.[24] The octahedron BO6 is a basic component
for perovskite structure.[24] As the ratio
of rB and rO is limited to a certain range for the BO6 octahedron, it is appropriate to use this ratio, the octahedral factor (μ
= rB/rO), to
describe the formability and stability of perovskites. Another tolerance
factor defined as tBV = dA–O/(√2 dB–O) was also proposed.[23] Rather than using Shannon ionic radii of the A, B, and
O ions, it utilized A–O and B–O bond lengths (dA–O and dB–O) based on the bond-valence model to calculate tBV. Features with respect to elemental properties of A-
and B-site atoms such as highest occupied molecular orbital (HOMO),
lowest unoccupied molecular orbital (LUMO) energies, ionization energy
(IE), electronegativity (X), and Mendeleev numbers
are also applied for predicting the formability and stability of perovskites.[10,19] Instead of single features, some complex features were also effective
in describing the structural formability of ABO3 perovskites.[19,25]

In this work, formable and stable perovskite oxides are screened
from unexplored ABO3 combinations using ML approaches.
The unexplored combinations are generated by a constraint satisfaction
problem (CSP) technique from 73 elements based on the restrictions
of charge neutrality and the Goldschmidt tolerance factor. An input
data set composed of 343 known ABO3 compounds, which are,
respectively, described by 21, 16, and 17 features, is employed to
train ML models for predicting the formability and stability of perovskites.
The 16 and 17 features are refined from the 21 features on the basis
of their correlations and importance. The performance of the six models
is then evaluated using a confusion matrix. The formability prediction
model trained with 16 features classifies the unexplored ABO3 combinations into 1373 perovskites and 856 nonperovskites, while
the stability prediction model trained with those features categorizes
them into 430 stable and 1799 unstable perovskites.
Results and Discussion
Feature Correlation and Importance
In this work, 21 features were employed to describe
the input data
set for ML model training. To refine them and eliminate those with
high correlation and less importance, the pairwise Pearson correlation
coefficients of the 21 features were calculated, and their importance
was evaluated using the random forest (RF) algorithm and the recursive
feature elimination (RFE) feature selection approach, respectively.

Figure 1 shows the
Pearson correlation coefficient (r, r ∈ [−1, 1]) for the 21 features. The number in each square
represents the coefficient of each pair of features. The positive
values mean that the two features are positively correlated; on the
contrary, the negative values signify that the two features have negative
correlations. In addition, the bigger the absolute value of the coefficient,
the stronger the correlation between the features. Normally, |r| > 0.8 indicates a very strong relationship. Accordingly,
the feature pairs (XA, MA), (XA, ΔXAO*rA/rO), (B-IE, XB), (XB, ΔXBO*rB/rO), (B-ZR, ΔXBO*rB/rO), (μ, dB–O),
(μ, ΔXBO*rB/rO), and (MA, ΔXAO*rA/rO) are strongly correlated,
with coefficients of 0.91, 0.88, 0.85, 0.86, −0.86,
0.86, −0.81, and 0.83, respectively. Because of these strong linear correlations, some of these
features can be eliminated.
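The thresholding described above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature names `f1`, `f2`, `f3` and the synthetic data are hypothetical stand-ins, not the 21 features or the 343 compounds of this study, and `strongly_correlated_pairs` is a helper name introduced here.

```python
import numpy as np

def strongly_correlated_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for feature pairs with |r| > threshold."""
    r = np.corrcoef(X, rowvar=False)  # columns of X are features
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Synthetic illustration (hypothetical data, not the study's features)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + 0.1 * rng.normal(size=200)  # almost a linear function of a
c = rng.normal(size=200)                  # independent of a and b
X = np.column_stack([a, b, c])

pairs = strongly_correlated_pairs(X, ["f1", "f2", "f3"])
for name_i, name_j, r in pairs:
    print(f"{name_i} ~ {name_j}: r = {r:.2f}")
```

One feature from each flagged pair is then a candidate for removal; which one to drop is a separate decision, made in this work with the help of the feature importance.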
Figure 1
Pearson correlation coefficients between the 21 features.

To explore the importance of the 21 features, we collect their
importance values from the results of constructing 100 formability
and stability prediction models using the RF algorithm. As shown in Figure 2a, rather than the
normally accepted t,[10,20] dA–O is recognized as the most important feature
in predicting the formability of perovskites. The structural features
of dA–O, t, and
μ are the top three features. Most elemental features are inferior
to the structural features though some of them such as A-ZR and B-ZR
show relatively high importance. Therefore, the perovskite formability
is dominantly determined by the structural features regarding both
A- and B-site elements. Figure 2b displays the feature importance for predicting the stability
of perovskites. It is remarkably different from that for predicting
the formability of perovskites. Instead of dA–O and t, B-HOMO and B-ZR are two
of the most important features. The top nine features are all elemental
and structural features with respect to the B-site elements, indicating
that the B-site element properties are determining factors for predicting
the stability of perovskites. Most of the features related to the
A-site element rank at the bottom.
Figure 2
Feature importance of the 21 features in predicting the (a) formability and (b) stability of perovskites.

Therefore, based on the feature correlation and importance investigation,
the 21 features were refined to 16 features for training the
perovskite formability and stability prediction models
by removing the feature sets [XA, XB, ΔXAO*rA/rO, ΔXBO*rB/rO, dB–O] and [dB–O, ΔXBO*rB/rO, A-IE, XA, MA], respectively.

For comparison, the RFE feature selection approach was also used
to investigate the importance of the 21 features. It was found that
A-HOMO, B-EA, B-LUMO, and A-EA were the least important features for
the perovskite formability model training, while A-HOMO, A-EA, A-LUMO,
and A-ZR were eliminated most often for the perovskite stability
model training. These four features were therefore removed from the 21 features in each case,
leaving 17 features for perovskite formability and stability model
training.
Predicting the Formability of Perovskites
The refined 16 and 17 features (FG-2 and FG-3) were used for the
formability model training, and the models were named model 2 and
model 3, respectively (Figure 7b). For comparison, the 21 features (FG-1) were also adopted
for model training, and this model was named model 1 (Figure 7b). Confusion matrices for
the three models are shown in Figure 3a–c. The accuracy and precision
are 0.977 and 0.967 for model 1, 0.988 and 0.983 for model 2, and 0.942 and 0.950 for model 3, respectively.
The corresponding recalls and F1 scores are 1.00 and 0.983, 1.00 and 0.992, and 0.966
and 0.958, respectively. The 100% recall of models 1 and 2 signifies that all of the
perovskites in the test data set are correctly identified. The almost
100% F1 score implies the outstanding reliability of our RF classifier
models to distinguish perovskites from nonperovskite compounds. Figure 3d–f portrays
the receiver operating characteristic (ROC) curves of those formability
prediction models. The area under the ROC curve (AUC) denotes a measure
of separability. The higher the AUC, the better the classification
model. The AUCs of the three models are 0.995, 0.999, and 0.9856,
respectively. The performance of the models for predicting the formability
of perovskites in this study is better than or comparable to that in other
studies. For example, the accuracy, precision, recall, and AUC of
a perovskite formability prediction model were 0.9401, 0.9344, 0.9913,
and 0.96, respectively.[10] The accuracy
of another RF model for predicting the formability of the ABX3 perovskite was 0.9655.[26]
Figure 7
Workflow for
the prediction of the formability and stability of
perovskites in this study. (a) Three thousand fifty-seven ABO3 combinations are generated from 129 A and B cations and oxygen
ions using a CSP method. The combinations are then compared with the
ABO3 compounds in the Materials Project[27] database. Two thousand two hundred twenty-nine ABO3 combinations are found to be not included in the database.
(b) Three formability classification models are trained with the input
data set of 343 known ABO3 compounds, and each compound
is described with three groups of features (FG-1, FG-2, and FG-3),
respectively. Model 2 is then used to investigate the formability
of the 2229 ABO3 combinations. (c) Three hundred five ABO3 compounds from the input data set are selected for stability
prediction model training. The stability status for them is described
with Ehull, and the feature groups of
FG-1, FG-4, and FG-5 are also used to describe the compounds. Model
5 is then used to predict the stability of the 2229 ABO3 combinations.
Figure 3
Confusion matrixes (a–c) and ROC curves (d–f) for the perovskite formability prediction models trained with 21, 16, and 17 features, respectively.

Moreover, it can be seen that, compared with the other two models,
the ML model trained with FG-2 displayed the highest accuracy, precision,
F1 scores, and AUC, indicating its excellent performance in distinguishing
formable perovskite oxides from ABO3 compounds, probably
owing to the elimination of the features with less importance and
strong correlations.

Due to its better performance, model 2
was used to predict the
formability of the 2229 ABO3 combinations. It was found
that 1373 and 856 combinations were classified as perovskites and
nonperovskites, respectively. The probability heat map for the classified
perovskites is shown in Figure 4. Each small rectangle in it represents an ABO3 combination. The deeper the red of the rectangles, the higher the
probability. It can be seen that the heat map is divided into several
regions by A = [Re, Li, Ca, Ag, Al, Zn, W, Mg, I] and B = [Re, Rb,
Cs, Ba, Sr, I]. This should be attributed to the exclusion of ABO3 compounds in the MP database from our ABO3 combinations
and the low likelihood of these elements occupying the A and B sites. Furthermore,
Nb, Bi, Tl, In, and Zr are more likely to be the A-site elements,
while Lu, Y, Ce, Zr, and Sc tend to be the B-site elements.
Figure 4
Probability heat map of the classified 1373 ABO3 perovskite oxides with a probability of more than 50%.
Predicting the Stability of Perovskites
Models 4–6 (Figure 7c) for predicting the stability of perovskites were trained
with 305 known compounds from the input data set. Each compound was
described with 21 features (FG-1), the refined 16 features (FG-4), and the refined 17 features (FG-5),
respectively. Apart from those features, Ehull was also used to describe the thermodynamic stability
of the compounds. The confusion matrixes and ROC curves for the three
models are shown in Figure 5. In general, our models exhibited excellent performance in
distinguishing stable perovskites from ABO3 compounds,
especially for model 5. The accuracy, precision, recall, F1 score,
and AUC of model 4 were 0.948, 0.970, 0.914, 0.941, and 0.985, respectively,
while those for model 5 were 0.961, 0.971, 0.943, 0.956, and 0.983,
respectively. Hence, model 5 was more robust in differentiating stable
and unstable ABO3 perovskites than model 4, possibly owing
to the elimination of redundant features with high correlations. The
accuracy, precision, recall, F1 score, and AUC of model 6 were 0.961,
1.00, 0.914, 0.955, and 0.966, respectively. Overall, model 5 also
performed better than model 6, although its precision was slightly
lower. The models for forecasting the perovskite
stability in this work demonstrated performance better than or comparable
to that of the models in other studies. For instance, the accuracy,
precision, recall, and AUC of an RF model for predicting the perovskite
stability were 0.941, 0.933, 0.936, and 0.98, respectively.[10] Li et al.[16] developed
several ML models for predicting perovskite thermodynamic stability
using several machine learning algorithms, and the best model showed
an accuracy of 0.93 and a precision of 0.89.
Figure 5
Confusion matrixes (a–c) and ROC curves (d–f) for the perovskite stability prediction models trained with 21, 16, and 17 features, respectively.

As model 5 showed better performance than the other two models,
it was used to predict the thermodynamic stability of the 2229 ABO3 combinations. It was found that 430 of them were classified
as stable perovskites. The probability heat map for the predicted
stable combinations is displayed in Figure 6. It could be seen that ABO3 combinations
with A = [Mg, Mn, Pb, Zr, Sm] and B = [Zr, Hf, Nb] were stable with
high probabilities.
Figure 6
Probability heat map of classified 430 stable perovskites with a probability of more than 50%.

On the basis of the prediction results, we found that 338 ABO3 combinations were predicted to be both formable and stable
perovskites. Of these, 17 combinations had probabilities higher
than 0.8, as shown in Table 1. These compounds would be promising candidates for further
evaluation.
Table 1
Formable and Stable Perovskite Oxides with Probability Higher than 0.8

ABO3     formable probability   stable probability   ABO3     formable probability   stable probability
HfVO3    1                      1                    ErCeO3   1                      0.818
OsVO3    0.909                  1                    VZrO3    0.909                  0.818
TaCrO3   0.909                  0.909                CoNbO3   0.909                  0.818
CdZrO3   1                      0.909                PdHfO3   0.909                  0.818
PtCrO3   1                      0.909                TcNbO3   0.909                  0.818
PdVO3    1                      0.909                AuNbO3   0.818                  0.818
NiNbO3   0.818                  0.909                MoScO3   0.909                  0.818
TcMnO3   1                      0.818                RhScO3   0.909                  0.818
TaMnO3   0.909                  0.818
Methods
Design Workflow
The overarching ML
workflow for the prediction of the formability and stability of ABO3 perovskites in this study is shown in Figure 7. Initially, 129 cations and oxygen ions were used to generate
ABO3 combinations by a CSP technique based on the restrictions
of charge neutrality (m + n = 6, where m and n are the valences of the A and B cations)
and tolerance factor (0.76 ≤ t ≤ 1.44),
and 3057 ABO3 combinations were therefore obtained (Figure 7a). The combinations
were then compared with the ABO3 compounds existing in
the Materials Project (MP)[27] database.
Surprisingly, 2229 ABO3 combinations were not included
in the database, and they are used for further investigation.

An input data set composed of
343 ABO3 compounds was
employed to train machine learning models for predicting the formability
of perovskites (Figure 7b), which is an intersection of the data sets gathered by Talapatra
et al.,[10] Liu et al.,[19] and Zhang et al.[23] Of these,
218 compounds were perovskites, while 125 compounds were nonperovskites.
Each compound was described with three groups of features. Feature
group 1 (FG-1) consisted of 21 features, while feature group 2 (FG-2)
and feature group 3 (FG-3) were composed of 16 and 17 features, respectively,
which were refined from FG-1 based on their correlation and importance.
The models trained with FG-1, FG-2, and FG-3 were named model 1, model
2, and model 3, respectively. The performance of the three models
was then evaluated and compared. Afterward, model 2 was used to screen
the data set of 2229 ABO3 combinations (Figure 7a).

Three hundred
five ABO3 compounds from the input data
set, which were described with FG-1, feature group 4 (FG-4), and feature
group 5 (FG-5), were selected to train stability classification models
(Figure 7c). FG-4 and
FG-5 were acquired from FG-1 by removing five and four features,
respectively, on the basis of their correlation and importance.
Apart from these feature groups, the compounds were also described
with Ehull. An Ehull threshold of 50 meV/atom was used to distinguish thermodynamically
stable and unstable compounds. The models trained with FG-1, FG-4,
and FG-5 were named model 4, model 5, and model 6, respectively, and
model 5 was then utilized to sieve stable perovskites from the 2229
ABO3 combinations.
Generation of ABO3 Combinations
Due to the structural flexibility
of ABO3 perovskite
oxides, A and B sites can accommodate various cations. Generating
ABO3 combinations and screening potential perovskite oxides
from them with high accuracy and efficiency is therefore of great
importance. In this study, the generation of new ABO3 combinations
was formulated as a constraint satisfaction problem (CSP), a well-established
problem-solving paradigm in artificial intelligence. A CSP comprises a set
of variables, a finite domain of values for each variable, and a set of constraints
on those variables. The CSP is solved by finding assignments that satisfy all of the constraints. Formally,
the combination problem is defined as a tuple ⟨V, D, C⟩, where V is the set of variables (the A and B sites), D consists of the domains AD and BD of possible
A-site and B-site cations, and C is the set of constraints:
the sum of the A and B valence values equals 6, and the tolerance factor
is in the range of 0.76 to 1.44. The backtracking technique was used
to search for satisfactory solutions (new ABO3 combinations).
Seventy-three elements forming 129 cations due to the multivalence
of some elements were used for generating the ABO3 combinations
(Figure 8). With the
restrictions of charge neutrality and tolerance factor, 3057 ABO3 combinations were generated. After eliminating the combinations
that exist in the MP database, 2229 combinations were left, which
were further used for predicting the formability and stability of
perovskites.
Figure 8
Elements used for generating the ABO3 combination using a CSP technique.
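The CSP formulation described in this section can be illustrated with a small backtracking search. The sketch below is not the implementation used in this work. The two three-cation domains and their ionic radii are approximate values chosen only for illustration, and the function names (`tolerance_factor`, `consistent`, `backtrack`) are introduced here as assumptions. Only the two constraints themselves (valence sum of 6 and 0.76 ≤ t ≤ 1.44) come from the text.

```python
import math

R_O = 1.40  # approximate ionic radius of O2- in angstrom (illustrative)

# (symbol, valence, approximate ionic radius in angstrom); hypothetical
# miniature domains standing in for the 129 cations used in the study.
A_DOMAIN = [("K", 1, 1.64), ("Sr", 2, 1.44), ("La", 3, 1.36)]
B_DOMAIN = [("Al", 3, 0.535), ("Ti", 4, 0.605), ("Nb", 5, 0.64)]

def tolerance_factor(r_a, r_b, r_o=R_O):
    """Goldschmidt tolerance factor t = (rA + rO) / (sqrt(2) * (rB + rO))."""
    return (r_a + r_o) / (math.sqrt(2) * (r_b + r_o))

def consistent(assignment):
    """Check the constraints once both variables are assigned."""
    if "A" in assignment and "B" in assignment:
        (_, q_a, r_a), (_, q_b, r_b) = assignment["A"], assignment["B"]
        if q_a + q_b != 6:  # charge neutrality with three O2- anions
            return False
        if not 0.76 <= tolerance_factor(r_a, r_b) <= 1.44:
            return False
    return True

def backtrack(variables, domains, assignment=None, solutions=None):
    """Depth-first search that undoes an assignment when it is inconsistent."""
    assignment = {} if assignment is None else assignment
    solutions = [] if solutions is None else solutions
    if len(assignment) == len(variables):
        solutions.append(dict(assignment))
        return solutions
    var = variables[len(assignment)]
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):
            backtrack(variables, domains, assignment, solutions)
        del assignment[var]
    return solutions

solutions = backtrack(["A", "B"], {"A": A_DOMAIN, "B": B_DOMAIN})
formulas = [f"{s['A'][0]}{s['B'][0]}O3" for s in solutions]
print(formulas)
```

With only two variables the search is equivalent to a filtered double loop, but the backtracking form extends naturally if further variables or constraints are added.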
Feature Selection and Reduction
Until
now, many features have been used to indicate the formability and
stability of perovskite oxides. For example, Zhang et al.[23] applied bond lengths of A–O and B–O
(dA–O and dB–O) to fundamentally show the formability and stability
of perovskite oxides. Liu et al.[19] used
a group of nine features to train a Gradient Boosting Decision Tree
model for predicting the formability of perovskites, which were the
Goldschmidt tolerance factor, octahedron factor (μ = rB/rO), radius ratio
of A to O (rA/rO), dA–O and dB–O, electronegativity difference
between A(B) and O (ΔXAO and ΔXBO) multiplied by rA/rO (rB/rO), and the Mendeleev numbers of A and B (MA and MB). In another
work, Talapatra et al.[10] adopted a group
of 14 features (reduced from 28 features due to the elimination of
mismatch factors for ABO3 combinations) to train ML models
for predicting the formability and stability of perovskites. The structural
features include the Goldschmidt tolerance factor (t) and octahedral factor (μ), while the elemental features contain
the highest occupied molecular orbital (HOMO) energy, lowest unoccupied
molecular orbital (LUMO) energy, ionization energy (IE), electronegativity
(X), electron affinity (EA), and Zunger’s
pseudopotential radius (ZR) for A- and B-site cations.

Although
the features in refs [10], [19], and [23] were employed to predict
the formability and/or stability of perovskite oxides, they were feature sets of different
dimensions. This indicated that the formability and stability
of perovskite oxides could be described with feature sets of various dimensions.
Therefore, in this study, the intersection of the features (21 features)
from those references was adopted, and their correlations and importance
were investigated to get rid of the redundant and less important features.
The correlations of the features were investigated by calculating
the Pearson correlation coefficient between each pair of them, while
the feature importance was evaluated using the RF algorithm and the
RFE approach.
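The two importance estimates mentioned above can be sketched with Scikit-learn as follows. This is a minimal illustration, not the authors' script. The data are synthetic (`make_classification`), the forests are averaged over 10 seeds rather than the 100 models reported in the text, and the forest size and the number of features kept by RFE are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the feature table (hypothetical data)
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=2, random_state=0)

# Impurity-based importances averaged over several independently seeded forests
importances = np.mean(
    [RandomForestClassifier(n_estimators=50, random_state=seed)
     .fit(X, y).feature_importances_ for seed in range(10)],
    axis=0,
)
ranking_by_importance = np.argsort(importances)[::-1]  # most important first

# RFE: refit the forest and drop the weakest feature until 6 remain
rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=6, step=1).fit(X, y)
kept = np.flatnonzero(rfe.support_)  # column indices of the retained features

print("importance ranking:", ranking_by_importance.tolist())
print("features kept by RFE:", kept.tolist())
```

Averaging over several forests reduces the run-to-run scatter of the importance values, which is presumably why repeated model construction is used in this work.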
Classification Model Training
Classification
is a supervised learning task that categorizes a set of data into
classes. In our study, the input data set was employed to train the
formability and stability classification models using the RF classifier (RFC)
implemented in the Scikit-learn package. An RF algorithm builds
many decision trees and merges their predictions to improve its classification
accuracy and stability. It uses bootstrapping to select the training data
for each tree. To tune the models for better fit and
prediction, a grid search combined with a 10-fold cross-validation
method is normally used for parameter selection and evaluation. Grid
search optimizes the model by traversing parameter combinations. Cross-validation
repeatedly splits the data set into different training and test
subsets. The training
subset is used to train the RFC model, while the test subset is used
for model evaluation. Our formability and stability classification
models were both trained on 80% of the input data set and then tested
on the residual 20%. The number of trees and the maximum tree depth
in the forest for the formability and stability classification models
were 11, 12, 10, and 7, respectively.
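The training procedure described above (80/20 split, grid search, 10-fold cross-validation, RF classifier) can be sketched as follows. The data are synthetic, and the parameter grid, the stratified split, the accuracy scoring, and the random seeds are assumptions made for illustration. The text does not state which grid or seeds were actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in with the same shape as the FG-2 table (343 x 16)
X, y = make_classification(n_samples=343, n_features=16, n_informative=6,
                           random_state=0)

# Hold out 20% of the samples for the final test (stratification is assumed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hypothetical grid over the two hyperparameters named in the text
param_grid = {"n_estimators": [7, 11, 15], "max_depth": [5, 7, 12]}

# Each parameter combination is scored by 10-fold cross-validation
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

best_model = search.best_estimator_  # refitted on the whole training set
test_accuracy = best_model.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the 20% test set outside the grid search means the reported test metrics are not influenced by the hyperparameter selection.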
Model Evaluation
Confusion matrixes
were used to evaluate the performance of the RFC models. There are
two rows and two columns for each matrix. Each row signifies the samples
in an actual class, while each column denotes the samples in a
predicted class. Therefore, four blocks are included for each confusion
matrix, which are true positive (TP), false positive (FP), true negative
(TN), and false negative (FN).

Four metrics, accuracy, precision,
recall, and F1 score, calculated from the confusion matrixes were
used for classification model evaluation. Accuracy is the ratio of correct
predictions to the total number of samples in the test data set (eq 1). Precision is the proportion
of positive predictions that are actually correct (eq 2). It reflects the model’s
ability to distinguish the positive class. Recall is defined as
the ratio of TP to the sum of TP and FN (eq 3). The higher the recall, the stronger the
model’s ability to recognize positive samples. The F1 score is
the harmonic mean of the precision and recall, as shown in eq 4. It balances
precision and recall. The higher the F1 score, the
more robust the model.

accuracy = (TP + TN)/(TP + TN + FP + FN)   (1)
precision = TP/(TP + FP)   (2)
recall = TP/(TP + FN)   (3)
F1 = 2 × precision × recall/(precision + recall)   (4)
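The four metrics defined in this section can be computed directly from the confusion-matrix counts. The helper below is a minimal sketch. The function name and the example counts are hypothetical and are not taken from the confusion matrixes of this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that are recovered
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only
metrics = classification_metrics(tp=40, fp=2, tn=25, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch assumes that at least one positive prediction and one actual positive exist. A production version would need to guard against division by zero.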
Conclusions
In summary, a machine learning method was employed to screen formable
and stable perovskite oxides. We introduced a constraint satisfaction
technique in artificial intelligence to generate ABO3 combinations
that meet the constraints of charge neutrality and tolerance factor.
An input data set of 343 known compounds was used to train machine
learning models using a random forest classifier. The models were
then adopted to screen formable and thermodynamically stable perovskite
oxides from the ABO3 combinations. Based on the findings
in this study, the following conclusions can be drawn: The constraint
satisfaction technique is efficient in generating new ABO3 combinations. The refined 16 features were sufficient to predict the
perovskite formability and thermodynamic stability. The perovskite
formability was dominantly determined by the structural features regarding
the A- and B-site elements, whereas the perovskite stability was primarily
governed by the elemental features related to the B-site element.
Three hundred thirty-eight combinations that do not exist in the Materials
Project database were recognized as promising candidate perovskites
for future evaluation.