Tuning compositional complexity via multicomponent alloying gives access to enhanced polar properties and expands the uncharted materials space of lead-free ferroelectrics for smart digital technologies. Here, the role of isovalent A-site substitution in binary potassium niobate alloys, (K,A)NbO3, is investigated using first-principles calculations. Specifically, various alloy compositions of (K,A)NbO3 are considered, and their mixing thermodynamics and associated polar properties are examined. To establish structure-property design rules for high-performance ferroelectrics, the sure independence screening and sparsifying operator (SISSO) method is employed to extract the key features that explain the A-site driven polarization in (K,A)NbO3. Using a new metric of agreement via feature-assisted regression and classification, the SISSO model is further extended to predict A-site driven polarization in multicomponent systems as a function of alloy composition, reducing the prediction errors to less than 1%. With the machine learning model outlined in this work, a polarity-composition map is established to aid the development of new multicomponent lead-free polar oxides, which can offer up to a 25% boost in A-site driven polarization and achieve more than 150% of the total polarization of pristine KNbO3. This study offers a rational, design-based route to develop lead-free multicomponent ferroelectric oxides for niche information technologies.
Ferroelectrics are an important class of functional materials used in a variety of information and digital technologies, including multilayer ceramic capacitors, ferroelectric random access memory, ferroelectric photovoltaic devices, and energy converters. In addition, ferroelectric materials inherently exhibit piezoelectricity and pyroelectricity, which makes them suitable for many related technological applications. Given this broad applicability, high-performance ferroelectric materials with enhanced spontaneous polarization are in high demand.

Notably, due to increasing environmental concerns, lead-free ferroelectrics have been in the limelight of eco-friendly materials design and development. Potassium niobate (KNbO3; hereafter also referred to as KNO), one of the more promising ferroelectric oxides, and its alloys are widely investigated because their physico-chemical properties are highly tunable via multi-cation substitution, and this family of KNO-based alloys has grown steadily over time.

Specifically, (K,Na)NbO3 (in short, KNN) alloys at various compositions are well-known emerging functional ferroelectrics owing to their high Curie temperature, Tc, and attractive electromechanical properties. In this regard, the successful tuning of ferroelectric properties by A-site substitution has set a strong platform for multicomponent A-site engineering of KNO for enhanced ferroelectric and piezoelectric performance.

In the same vein, experiments on the (K,Li,Na)NbO3 ternary alloy system have shown a promising enhancement of electromechanical properties. Furthermore, other chemical variants of KNO-based alloys, such as (K,Na,Rb)NbO3 and (K,Na,Cs)NbO3, have been investigated for their improved single-crystal growth, achieved by balancing the ionic radius differences between the co-dopants. It has also been proposed that the multicomponent solubility limit in these "high entropy ceramic alloys" may be raised even further (i.e., potentially beyond what has been reported so far) due to the multicomponent entropy stabilizing effect.

In spite of all these advantages and the notable interest in developing functional lead-free ferroelectrics, an atomistic design rule for the precise engineering of these multicomponent polar materials is still lacking. High-throughput experiments are often plagued by synthesis challenges, and simple first-principles models alone are deemed inadequate for these highly complex multicomponent alloys.

To address this problem, many studies now rely on big-data-driven machine learning (ML) algorithms to search the vast chemical space and predict key material properties, with promising results. For instance, Tian et al. have very recently employed active learning and accelerated search in a high-dimensional virtual space of multicomponent phase diagrams for BaTiO3-based ferroelectrics. However, as for many materials science and engineering problems, the performance (i.e., the error and efficiency) of these classical ML methods is highly sensitive to the size of the available data, and very large data sets are usually needed for accurate and efficient training and prediction.

In contrast to these conventional ML approaches, the recently developed sure independence screening and sparsifying operator (SISSO) is a very promising method for identifying the best physically interpretable descriptor of a target property. By recursively searching extensive nonlinear feature spaces generated from combinations of algebraic and functional operations, SISSO is able to extract effective descriptors from even relatively sparse data. With this sparse-data-driven technique, one may therefore derive simple (yet accurate) and physically interpretable ML models for the prediction of complex multicomponent ferroelectric alloys.

In this work, using first-principles density-functional perturbation theory (DFPT) calculations and an improved SISSO ML model, we systematically examine the structural, thermodynamic, and polar properties of binary KNO alloys in which K on the A-site is gradually substituted with A (where A = Li, Na, Rb, and Cs). To illustrate our coupled DFPT+SISSO ML model, we focus on the A-site polarization boosting obtained in these binary KNO alloys by mixing A-site cations, and establish a simple and physically intuitive descriptor for the prediction and classification of its values. Because this descriptor contains only the atomic primary features of the constituent species, we then build an improved ML model (using a new metric of agreement) for multicomponent KNO-based alloys and validate it against DFPT calculations, the well-known Vegard's relation, and the conventional SISSO model. Last, we provide an accurate, feature-derived polarity-composition map to aid experimentalists in their search for high-performance multicomponent ferroelectrics in the vast uncharted chemical space of KNO-based alloys.
Results and Discussion
Solid Solutions: Crystal Structures and Stability Predictions
Potassium niobate (KNbO3; abbreviated as KNO) undergoes a series of polymorphic phase transitions to lower-symmetry phases with decreasing temperature. The most commonly reported polymorphs are the cubic (Pm3̄m), tetragonal (P4mm), orthorhombic (Amm2), and rhombohedral (R3m) phases. In this work, we limit our investigation to the tetragonal and orthorhombic phases, since these are the phases known experimentally to exist at room temperature (300 K) and above. The calculated KNO lattice constants agree with available experimental lattice parameters to within 1% (Table S1, Supporting Information).

To engineer and derive new properties from KNO, it is common to replace a certain percentage of the K atoms (at the A-site in KNO) with isovalent Group 1 alkali metals. Using the special quasi-random structure (SQS) approach, the optimized atomistic models of the idealized solid-solution K1−xAxNbO3 alloys (where A = Li, Na, Rb, and Cs) are shown in Figure 1. To assess the formability of these alloyed perovskites, we use only elemental information, namely the Shannon ionic radii (where rLi < rNa < rK < rRb < rCs; cf. Table S3, Supporting Information) and the formal charges (nA, where a formal charge of +1 is assumed), to calculate the conventional Goldschmidt tolerance factor, t, and the new tolerance factor, τnew. More details on the Goldschmidt and new tolerance factors can be found in the Supporting Information.
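The formability screening can be illustrated with a short sketch. It assumes the textbook Goldschmidt form t = (rA + rO) / (√2 (rB + rO)) and, for the new tolerance factor, the commonly used form τ = rO/rB − nA (nA − (rA/rB) / ln(rA/rB)); the text itself defers both definitions to the Supporting Information. The A-site radii are those quoted in this work, whereas the Nb and O radii (0.64 and 1.40 Å) and the composition-weighted A-site average are assumptions made here for illustration, so the numbers need not reproduce Figure 1b exactly.

```python
import math

# A-site Shannon radii (in Å) as quoted in the text.
R_A = {"Li": 1.25, "Na": 1.39, "K": 1.64, "Rb": 1.72, "Cs": 1.88}
# Assumed radii (not given in the text): Nb(5+) and O(2-).
R_NB = 0.64
R_O = 1.40


def site_averaged_radius(x, dopant):
    """Composition-weighted A-site radius of K(1-x)A(x)NbO3 (assumed averaging)."""
    return (1.0 - x) * R_A["K"] + x * R_A[dopant]


def goldschmidt_t(r_a, r_b=R_NB, r_o=R_O):
    """Goldschmidt tolerance factor t = (r_A + r_O) / (sqrt(2) * (r_B + r_O))."""
    return (r_a + r_o) / (math.sqrt(2.0) * (r_b + r_o))


def new_tolerance_factor(r_a, n_a=1, r_b=R_NB, r_o=R_O):
    """Assumed form: tau = r_O/r_B - n_A * (n_A - (r_A/r_B) / ln(r_A/r_B))."""
    ratio = r_a / r_b
    return r_o / r_b - n_a * (n_a - ratio / math.log(ratio))


if __name__ == "__main__":
    for dopant in ("Li", "Na", "Rb", "Cs"):
        for x in (0.0, 0.25, 0.5, 0.75, 1.0):
            r_a = site_averaged_radius(x, dopant)
            print(f"{dopant} x={x:.2f}  t={goldschmidt_t(r_a):.3f}  "
                  f"tau={new_tolerance_factor(r_a):.3f}")
```

With these assumed radii, pure CsNbO3 lies above the t = 1.11 cutoff while all end members stay below τ = 4.18, in line with the qualitative picture described in the text.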
Figure 1
a) Atomic structural models of K1−xAxNbO3 (where A = Li, Na, Rb, and Cs) for x = 0.00, 0.25, 0.33, 0.50, 0.66, 0.75, and 1.00. The K, A, Nb, and O atoms are depicted by white, blue, gray, and red spheres, respectively. b) The calculated Goldschmidt (t) and new (τnew) tolerance factors of K1−xAxNbO3, weighted by the atomic size of the A-site cations. c) The A-site cation-normalized volume of K1−xAxNbO3 as a function of the composition x. In (b) and (c), the markers for the corresponding A-site cations follow the color scheme in the legend.
In Figure 1b, we find that most of the solid solutions are formable, with the exception of pristine CsNbO3, whose t exceeds the cutoff of t = 1.11 (shown by the vertical dotted line). This instability can be attributed to the large size of the Cs cation; formability improves as the K content increases and t returns to the range 0.76 < t < 1.11. In addition, taking τnew < 4.18 as the suggested cutoff for formability, we conclude that all proposed K1−xAxNbO3 alloy structures are formable. An additional thermodynamic analysis (including configurational entropy considerations) in Figure S2, Supporting Information, corroborates these findings. We caution that our restriction to the parent tetragonal and orthorhombic phases of KNO is motivated by experimental reports, and that possible secondary phase segregation and associated defects are not included.

Besides formability, we have also examined the impact of cation exchange on the volume (which is known to influence the polar properties of ferroelectrics). In Figure 1c, we plot the normalized volume (per A-site cation) as a function of composition for the substituents Li, Na, Rb, and Cs. For these binary K1−xAxNbO3 alloys, the linear dependence of the normalized volume on composition reflects the well-known Vegard's relation commonly reported for alloys and solid solutions. Specifically, for K1−xLixNbO3 and K1−xNaxNbO3, the normalized volume decreases with increasing Li or Na concentration, which can be rationalized by the smaller cationic sizes of Li and Na compared to K. Conversely, the normalized volumes of K1−xRbxNbO3 and K1−xCsxNbO3 increase with increasing Rb or Cs concentration, reflecting the much larger cationic sizes of Rb and Cs.

In fact, besides the normalized volume, the lattice constants also depend linearly on composition (cf. Figure S3, Supporting Information). In short, the structural properties of binary K1−xAxNbO3 alloys obtained via alkali metal cation exchange are well described by Vegard's relation.
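A Vegard-like interpolation of this kind is simple to write down. The sketch below is illustrative only: the function names are hypothetical, and any volumes passed in are placeholders to be replaced by calculated values rather than data from this work.

```python
import numpy as np


def vegard(x, prop_kno, prop_ano):
    """Linear (Vegard-like) interpolation between the KNbO3 and ANbO3 end members."""
    x = np.asarray(x, dtype=float)
    if np.any((x < 0.0) | (x > 1.0)):
        raise ValueError("composition x must lie in [0, 1]")
    return (1.0 - x) * prop_kno + x * prop_ano


def max_deviation_from_vegard(x, values):
    """Largest absolute departure of the data from the straight line joining
    the two end points (smallest and largest x in the data)."""
    x = np.asarray(x, dtype=float)
    values = np.asarray(values, dtype=float)
    lo, hi = np.argmin(x), np.argmax(x)
    # Rescale x so that the end points of the data map onto 0 and 1.
    scaled = (x - x[lo]) / (x[hi] - x[lo])
    reference = vegard(scaled, values[lo], values[hi])
    return float(np.max(np.abs(values - reference)))
```

A small `max_deviation_from_vegard` relative to the end-member difference indicates that a property (normalized volume or lattice constant) follows the linear trend.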
Trends in Spontaneous Polarization
After addressing the formability of these KNO-based alloys, we turn to the impact of isovalent cation exchange on their spontaneous polarization (Ps). The Ps values calculated for pristine tetragonal and orthorhombic KNO are 36.4 and 41.2 μC cm−2, respectively. These agree well with the previously reported experimental values of 37 μC cm−2 for tetragonal KNO and 41 μC cm−2 for orthorhombic KNO. The small deviations are attributed to possible thermal effects, since the reported experimental values were obtained at room temperature while our calculations are performed for 0 K.

By definition (cf. Equation (11)), Ps is strongly correlated with the individual atomic displacements and their associated local electron density fluctuations. The total Ps can therefore be decomposed into partial Ps contributions (from a specific atom or a collective group of atoms). In this work, it is natural to inspect the contributions of the A-site cation and of the NbO6 octahedron (where the B-site Nb cation and the surrounding O anions act collectively). It has previously been shown that in pristine KNO, K contributes minimally to the total Ps while the major contribution comes from the NbO6 octahedra. This suppression of the polarization response of K reflects the strong ionic interactions between the K and O ions.

In Figure 2, we plot the total and partial Ps (due to the A-site cations and the NbO6 octahedra) for the KNO-based alloys in the tetragonal (upper panels) and orthorhombic (lower panels) phases. The general trends for both polymorphic phases are very similar. In Figure 2a, for the K1−xLixNbO3 and K1−xNaxNbO3 alloys, Ps is clearly driven by both the NbO6 octahedra and the A-site cations, while in Figure 2b, the A-site cations in the K1−xRbxNbO3 and K1−xCsxNbO3 alloys do not contribute to the total Ps. For the latter systems, Nb off-centering distortions in the NbO6 octahedra are the main mechanism for ferroelectricity.
Figure 2
Spontaneous polarization, Ps, of tetragonal (upper panel) and orthorhombic (lower panel) K1−xAxNbO3 (where A = Li, Na, Rb, and Cs) as a function of the A-site cation concentration, x. To illustrate the site-specific contributions to Ps, the A- and B-site driven polarization is shown in (a), while that due predominantly to the B-site (i.e., due to the NbO6 octahedron distortions) is shown in (b). Data points for the total Ps, the B-site cations' contribution, the K ions' contribution, and the A-site cations' contribution are represented by gray, blue, red, and yellow markers, respectively.
More specifically, the above finding can be explained by the smaller cationic sizes of Li (1.25 Å) and Na (1.39 Å) compared to that of K (1.64 Å). This is consistent with the study of Bilc and Singh, in which a high concentration of Li was substituted into KNO (similar to our case). Observing the tilt instabilities due to A-site disorder in the solid solutions, they proposed that a strong A-site driven Ps boost can be rationalized by the polar distortion of Li (in agreement with our atomic displacement analysis presented in Figure S4, Supporting Information).

As for the larger Rb (1.72 Å) and Cs (1.88 Å) cations, the larger A-site occupant allows the Nb–O bonds in the NbO6 octahedra to elongate and distort further, with an overall larger volume expansion (see Figure 1c). This leads to further Nb off-centering distortions via a second-order Jahn–Teller effect, as identified by the recently proposed anisotropic bond elongation index (Figure S5, Supporting Information). In summary, steric effects (from the varying sizes of the A-site cations) can be used to enhance polar distortions of both the A- and B-site cations. These results offer an additional engineering rule, tuning the A-site occupancy via alloying, alongside the known intra-octahedral distortions in perovskites.

Besides these binary K1−xAxNbO3 alloys, more complex ternary alloys have been investigated for their further enhanced Ps and piezoelectric performance, for example, (K,Li,Na)NbO3 in contrast to (K,Na)NbO3. (K,Na,Rb)NbO3 and (K,Na,Cs)NbO3 ternary alloys have also been explored for their improved single-crystal growth via compensation of the different co-dopant ionic radii. Going further to "high entropy ceramic alloys" of KNO, it has been postulated that the solubility limit in these complex multicomponent alloys may be improved by the multicomponent entropy stabilizing effect.
Feature‐Assisted Machine Learning Workflow
SISSO Method
From both the experimental and computational viewpoints, the precise, atom-by-atom design of these complex multicomponent polar ceramics remains very challenging, despite the strong interest in employing them for targeted digital and information technologies. Progress is often impeded by synthesis challenges and by the inadequacies of conventional first-principles models. To mitigate this problem, we extend our study of binary KNO alloys with modern machine learning (ML) models (namely, feature engineering via the SISSO method), using our calculated DFPT results as inputs, to examine the possibility of a high ferroelectric response in multicomponent KNO alloys.

As one of the state-of-the-art ML approaches for new materials design, SISSO allows one to extract physically insightful features of a target property. It distinguishes itself from other ML methods by providing interpretable and explainable results, thus overcoming the "impenetrable black box" of generic conventional ML models. Furthermore, SISSO can extract key features from even relatively sparse data, unlike classical ML approaches, for which big data is often needed. We therefore proceed to perform feature extraction via the SISSO method for polar property prediction in multicomponent KNO alloys.
Targeted Property of Interest:
In our ML workflow (Figure 3), we start by defining the total spontaneous polarization, Ps, of the KNO-based alloys as the main property of interest. To streamline the process, we apply two physical constraints that confine the study to the more insightful features:

1) We assume that the KNO lattice parameters of the tetragonal or orthorhombic polymorphic phases are preserved at relatively small dopant concentrations (on the A-site), based on the multicomponent entropy stabilization effect.
2) We consider the ratio of the projected spontaneous polarization due to the A-site cations to the total spontaneous polarization of the alloy. This ratio can be regarded as an A-site boosted polarization factor, expressed as a percentage of Ps.

As evident from Figure 2, although the contribution of the NbO6 octahedra to Ps is dominant and well understood (i.e., via the intra-octahedral tilting mechanism), the A-site contribution is clearly what drives the enhanced Ps in, for example, K1−xLixNbO3 alloys, which motivates the choice of this ratio as the target property.

Figure 3
a) Feature-assisted SISSO machine learning workflow to predict the A-site polarization boosting for multicomponent KNO-based alloys. b) For the tetragonal phase, the root-mean-square error (RMSE) of the predicted A-site polarization boosting (in %) as a function of the training data set size, Nt, using the supervised data sampling method. The inset presents the RMSE values as a box plot, where the standard deviation is used for the whiskers and the quartiles are determined via the Tukey method. Outlier points are indicated by diamond markers. c) The same as (b) for the orthorhombic phase. d) For the tetragonal phase, comparison between the A-site polarization boosting predicted by the SISSO model (ML) and the calculated values (DFT+VL). The data are presented for both the supervised (markers) and unsupervised (density contour map) sampling methods. The corresponding RMSE values are also listed. e) The same as (d) for the orthorhombic phase. Note that the reported RMSEs in (b–d) are based on the 2004 test data points generated via a Vegard's law-like interpolation. More details can be found in the main text.
Primary Features and Pearson Correlation Coefficients
To avoid unwanted bias in the choice of site-averaged primary feature classes (PFs), we filter the PFs using Pearson correlation coefficients (PCCs) in the feature selection step of our workflow (Figure 3a). The calculated PCCs of the PFs against the A-site polarization boosting are tabulated in Table S4 and presented in Figure S6, Supporting Information. To prevent a computationally intractable feature space, we omit the less correlated PFs by applying a cutoff of |PCC| ⩾ 0.85; that is, PFs with |PCC| < 0.85 are excluded.

Based on this selection, the following PFs are chosen for this study: the electron affinity (EA, with PCC = 0.95), the Pauling electronegativity (χ, with PCC = 0.88), and the atomic radius (r, with PCC = −0.88). In the construction of the initial feature space, the chosen PFs (EA, χ, and r) are then extended to each site of the perovskite structure, that is, PFA, PFNb, and PFO, to train the SISSO model.

Considering the influence of the A-site on the polarization boosting, we rationalize the chemical origin of the strongly correlated A-site features, EA and χ, by examining the chemical bonding characteristics between the A-site cations and the neighboring O anions. The electron affinity, EA, is defined as the energy released when an electron is added to a neutral atom, while the Pauling electronegativity, χ, of an atom is its power to attract the shared electrons within a chemical bond.

For the lighter Group 1 elements, the A–O bonds (i.e., Li–O and Na–O) have a more covalent nature due to their higher EA values compared to that of K, facilitating a back donation of electrons from O to the A-site cations. In addition, their lower χ differences (χO − χLi = 2.46 and χO − χNa = 2.51) compared to that of the K–O bond (χO − χK = 2.62) lend support to the more dominant covalent bonding character in Li–O and Na–O. These chemical bonding characteristics are also reflected in the strong negative correlation between the primary feature r and the A-site polarization boosting.
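The PCC-based filtering step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the workflow's actual code: the function names are hypothetical, and the feature arrays supplied by the caller stand in for the site-averaged primary features and target values tabulated in the Supporting Information.

```python
import numpy as np


def pearson(u, v):
    """Pearson correlation coefficient between two 1D samples."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    du, dv = u - u.mean(), v - v.mean()
    return float(du @ dv / np.sqrt((du @ du) * (dv @ dv)))


def select_features(features, target, cutoff=0.85):
    """Keep only the features whose |PCC| against the target reaches the cutoff.

    `features` maps a feature name to its values over the data set;
    the returned dict maps each retained name to its PCC.
    """
    scores = {name: pearson(vals, target) for name, vals in features.items()}
    return {name: s for name, s in scores.items() if abs(s) >= cutoff}
```

Applying `select_features` with `cutoff=0.85` retains both strongly positive and strongly negative correlations, which is why a feature such as r (PCC = −0.88) survives the filter alongside EA and χ.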
Training Data Sets: Supervised versus Unsupervised Sampling Methods
To perform the SISSO feature extraction for the binary K1−xAxNbO3 alloys, data from the tetragonal and orthorhombic phases are used together to minimize phase dependency. In contrast, the actual regression uses phase-dependent training sets to evaluate the performance of the descriptor sets. In particular, the SISSO feature extraction uses the 42 DFT data points, while for the validation (via linear regression based on Equation (14)) an additional 2004 interpolated data points per phase are included. These are obtained by assuming an almost linear, Vegard-like (VL) relation between the DFT data points in Figure 2. More detailed information on these SISSO models and processes can be found in the Supporting Information.

In Figure 3b,c, we demonstrate the convergence of the RMSE of the SISSO regression as a function of the training data size, Nt, for the tetragonal and orthorhombic phases, respectively. Using the 42 DFT-calculated data points presented in Figure 2, we propose two ways to build the training data for the convergence tests: 1) the supervised sampling method, where a subset of the 42 DFT data points is chosen systematically while considering the linearity in the polarization trends (details are outlined in the Supporting Information); and 2) the unsupervised sampling method, which mimics statistical randomness by generating 100 unique combinations of this subset for each Nt.

Using this approach to rationalize our data sampling strategy, we show that the RMSE of the predicted A-site polarization boosting is converged for the Nt used in this study. For Nt < 26 in each phase, both the supervised and unsupervised sampling methods yield high RMSE values. However, with the supervised sampling method at Nt ⩾ 26, very small and converged RMSE values of 0.39% and 0.15% are achieved for the tetragonal and orthorhombic K1−xAxNbO3 alloys, respectively. The unsupervised random sampling method gives higher corresponding median (Q2) RMSE values of 0.99% and 1.20% for the tetragonal and orthorhombic phases.

From the insets of Figure 3b,c and Table 1, the standard deviation (σ) and the quartile information (from the box plots) underscore the highly dispersed RMSE values of the unsupervised sampling method. We caution that, although the unsupervised sampling method may eventually yield a low RMSE with increasing Nt (close to that obtained by the supervised data sampling approach), a single RMSE value from this method may not be representative, given the large variances observed. It is also worth noting that the very high σ values observed for the unsupervised sampling method in Table 1 originate from its highly skewed distribution with a very long tail.
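The unsupervised sampling protocol can be sketched as follows. This is a minimal illustration, not the authors' implementation: a single-descriptor linear fit y = a·d + b stands in for Equation (14), whose exact form is not reproduced here; the function names are hypothetical; and the quartiles use NumPy's default percentile interpolation rather than the Tukey method mentioned in the Figure 3 caption.

```python
import math

import numpy as np


def fit_and_rmse(d_train, y_train, d_test, y_test):
    """Least-squares fit y = a*d + b on the training set; RMSE on the test set."""
    a, b = np.polyfit(d_train, y_train, deg=1)
    residual = np.asarray(y_test) - (a * np.asarray(d_test) + b)
    return float(np.sqrt(np.mean(residual ** 2)))


def random_subset_rmse(d, y, d_test, y_test, n_train, n_draws=100, seed=0):
    """RMSE statistics over `n_draws` unique random training subsets of size n_train."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    if math.comb(len(d), n_train) < n_draws:
        raise ValueError("not enough unique subsets for the requested n_draws")
    rng = np.random.default_rng(seed)
    seen, rmses = set(), []
    while len(rmses) < n_draws:
        idx = tuple(sorted(int(i) for i in
                           rng.choice(len(d), size=n_train, replace=False)))
        if idx in seen:  # enforce unique combinations
            continue
        seen.add(idx)
        sel = list(idx)
        rmses.append(fit_and_rmse(d[sel], y[sel], d_test, y_test))
    rmses = np.array(rmses)
    q1, q2, q3 = np.percentile(rmses, [25, 50, 75])
    return {"Q1": float(q1), "Q2": float(q2), "Q3": float(q3),
            "sigma": float(rmses.std())}
```

Reporting the quartiles together with σ, rather than a single RMSE, is what exposes the skewed, long-tailed distribution discussed above.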
Table 1
Evaluation of SISSO descriptors as a function of the training data set size, Nt, using both the supervised and unsupervised data sampling methods for the tetragonal and orthorhombic phases. The respective RMSE values (for the predicted A-site polarization boosting, in %) are calculated by fitting the data to determine a and b in Equation (14). For the unsupervised data sampling method, the first quartile (Q1; 25% of the data), the second quartile (Q2, also known as the median; 50% of the data), the third quartile (Q3; 75% of the data), and the standard deviation of the RMSE (σ) are tabulated. In passing, we note that the test data set used here to assess the RMSEs is generated via a Vegard's law-like interpolation. This is in contrast to the traditional machine learning approach, where part of the training data set is normally withheld for testing.

| Phase | Nt | Supervised RMSE [%] | Unsupervised RMSE [%]: Q1 | Q2 | Q3 | σ |
|---|---|---|---|---|---|---|
| Tetra | 10 | 17.80 | 2.48 | 4.89 | 37.54 | 2.03 × 10^14 |
| Tetra | 18 | 1.50 | 0.96 | 2.10 | 6.94 | 3.19 × 10^12 |
| Tetra | 26 | 0.39 | 0.61 | 0.99 | 1.92 | 4.94 × 10^12 |
| Ortho | 10 | 12.86 | 2.89 | 5.17 | 36.21 | 1.98 × 10^14 |
| Ortho | 18 | 1.22 | 1.12 | 1.83 | 7.09 | 7.76 × 10^12 |
| Ortho | 26 | 0.15 | 0.78 | 1.20 | 1.99 | 3.26 × 10^12 |
Upon closer inspection of the SISSO-derived RMSE and feature ranks for different Nt values under the supervised sampling approach, we find that not only are the RMSE values well converged, but the first-ranked SISSO descriptors are also exactly the same for Nt = 26 and Nt = 42. This supports the view that the RMSE difference between Nt = 26 and Nt = 42 is only numerical, while the essential physical insight of the SISSO model is already captured at Nt = 26. It underscores that the SISSO method (which is based on a sparse-data compressed-sensing formulation) has successfully "reproduced a high-quality reconstructed signal starting from a very small set of observations", so that Nt = 26 is sufficient to train the SISSO model in this study.

That said, we emphasize that the test set for the binary alloys is generated from the linear Vegard-like (VL) relation, and thus the observed linear correlation is inevitable (but physical). Hence, the RMSE observed for the binary alloy systems is considerably lowered, and the performance of the SISSO models here should be regarded as an overestimate. To provide a better performance metric for these SISSO models, we report the RMSE for the multicomponent alloy systems later in this work.
Optimized SISSO Model: Descriptors and Prediction
Considering the first-ranked SISSO regression model (R1), the R1 SISSO descriptors for Nt = 26 (and Nt = 42) are given by Equations (1) and (2). Using Equation (14), the A-site polarization boosting of the tetragonal and orthorhombic K1−xAxNbO3 alloys can then be predicted using Equations (3) and (4), respectively.

Thus, using Equations (3) and (4), we plot the relationship between the SISSO-predicted and the DFT+VL-derived A-site polarization boosting in Figure 3d,e for the tetragonal and orthorhombic K1−xAxNbO3 alloys, respectively. Most importantly, by showing both the supervised (markers) and unsupervised (density contour maps) training data sampling methods, we can illustrate this agreement graphically and highlight the advantage of the supervised sampling method over the unsupervised one. With a well-converged SISSO-derived regression model based on the supervised training data sets for the binary K1−xAxNbO3 alloys, we now proceed to address the A-site polarization boosting in multicomponent KNO-based alloys.
Predicting A‐Site Polarization Boosting for Multicomponent Alloys
To demonstrate the use of this SISSO model to predict the A‐site polarization boosting for multicomponent KNO‐based alloys, we have chosen the tetragonal penternary K1−x(LilNamRbnCso)xNbO3 system as an example. This is motivated by previous reports in which less complex ternary KNO‐based alloys (e.g., (K,Na,Rb)NbO3 and (K,Na,Cs)NbO3) have been reported and proposed to be stabilized via the multicomponent entropy effect.[,,,] We anticipate that the more complex penternary K1−x(LilNamRbnCso)xNbO3 alloy will also benefit from the high‐entropy stabilization mechanism.
To predict the A‐site polarization boosting via our SISSO machine learning model, we first generate a multidimensional grid of the site‐averaged primary features as a function of x, l, m, n, and o, subject to the constraints 0.2 ⩽ x ⩽ 0.5 and l + m + n + o = 1. Using these features for each composition, the two descriptors are evaluated via Equations (1) and (2) and used as inputs to Equation (3) to predict the boosting value for that composition. This yields a multidimensional map of the A‐site polarization boosting, which is plotted in Figure 4 in the form of a quaternary diagram for each fixed K concentration (1 − x). Here, we find that, especially for Li‐rich penternary KNO‐based alloys, up to 25% A‐site polarization boosting can be achieved. A similar plot for orthorhombic K0.5(LilNamRbnCso)0.5NbO3 is shown in Figure S7, Supporting Information. As with the binary alloys, the values for the tetragonal alloys are somewhat higher than those for the orthorhombic ones.
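The composition grid described above can be sketched as follows. This is only an illustration of the two stated constraints (0.2 ⩽ x ⩽ 0.5 and l + m + n + o = 1); the grid resolution `steps`, the chosen x values, and the function name `composition_grid` are assumptions and are not taken from the original work.

```python
from itertools import product


def composition_grid(x_values=(0.2, 0.3, 0.4, 0.5), steps=10):
    """Enumerate (x, l, m, n, o) with l + m + n + o = 1 on a regular grid.

    `steps` is a hypothetical resolution: every fraction is a multiple of
    1/steps. Integer counts are used internally so that the sum constraint
    is satisfied exactly before converting to fractions.
    """
    grid = []
    for x in x_values:
        for l, m, n in product(range(steps + 1), repeat=3):
            o = steps - l - m - n
            if o < 0:
                continue  # this (l, m, n) lies outside the composition simplex
            grid.append((x, l / steps, m / steps, n / steps, o / steps))
    return grid


points = composition_grid()
print(len(points), "grid compositions")
```

Each tuple can then be turned into site‐averaged primary features and passed through the descriptor equations; a finer `steps` gives a denser map at the cost of more evaluations.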
Figure 4
Predicted A‐site polarization boosting values for the multicomponent tetragonal penternary K1−x(LilNamRbnCso)xNbO3 alloy using Equation (3) derived from the first ranked SISSO regression model (R1). Here, plots are presented as tetrahedron‐shaped quaternary diagrams for a particular fixed K concentration (1 − x, where x = 0.2, 0.3, 0.4, and 0.5). SQS‐DFPT values (and their corresponding VL‐estimated values) for five alloys are also listed to assist in the validation of the SISSO R1 predicted values.
To validate the values predicted by our SISSO model, we have used the aforementioned SQS method to calculate the A‐site polarization boosting for three penternary alloys (K0.5(Li0.55Na0.16Rb0.05Cs0.22)0.5NbO3, K0.5(Li0.27Na0.38Rb0.11Cs0.22)0.5NbO3, and K0.5(Li0.38Na0.27Rb0.22Cs0.11)0.5NbO3) and two quaternary alloys (K0.5(Li0.88Na0.06Rb0.06)0.5NbO3 and K0.5(Li0.72Na0.16Rb0.16)0.5NbO3) in both the tetragonal and orthorhombic phases. This results in a total of ten DFPT calculations, with up to 180 atoms in the supercells considered. These DFPT data are then used to validate the values derived from both the trained SISSO model and the classic Vegard's law (VL) model. The VL model is based on the linear interpolation of PA and Ps as a function of K concentration. As listed alongside the corresponding structures used for the validation, the values obtained via the VL model are typically underestimated (≈4.53%) when compared to the DFT‐derived values, while those predicted by the SISSO model perform very well, with an error of only 1.61% relative to the DFT ones.
Extracting Physical Insights from the Machine Learning Models
By construction, SISSO is an ideal machine learning method to derive physically intuitive descriptors.[] The descriptors of a SISSO regression model are thus, in principle, physically intuitive. However, when the model is used for linear fitting, the descriptors are multiplied by fitted coefficients (cf. Equations (3) and (4), for instance), masking the direct physical interpretation of the descriptors (cf. Equations (1) and (2), for example). In contrast, the descriptors of a SISSO classification model are used directly to set the boundaries, thus mitigating this loss of interpretation. Within the constraints of linearity, a SISSO regression model trained on a small data set may achieve a higher level of prediction accuracy, while a SISSO classification model, which depends intricately on the dividing boundaries, will generally require a larger data set to train.[,]
In this study, by setting a lower boundary for the A‐site polarization boosting, the SISSO model generates 6 336 486 797 different possible descriptors for this condition. To ensure that our SISSO model remains tractable, we have limited our study to the top 10 000 descriptors. These are then assessed by a linear support vector classification (LSVC) algorithm. Owing to the clear separation of the data points and the high linearity, the classification and regression models are all converged and perform well. We therefore need a method that not only allows us to make accurate predictions using the regression model but also presents a physically interpretable classification model. The updated machine learning workflow is presented in Figure 5a.
Figure 5
a) Revised feature‐assisted SISSO machine learning workflow for predictions that includes the calculation of the agreement index, α(Ri, Cj), where the level of agreement between the SISSO regression model (Ri) and the SISSO classification model (Cj) is examined via the agreement matrix. b) The agreement matrix for the (R1, C4972) pair. The SISSO classification plot for C4972 using Equations (7) and (8) is also presented. c) Similarly, for the highest value of α, the agreement matrix for the (R2524, C5064) pair is depicted. Likewise, the SISSO classification plot for C5064 using Equations (9) and (10) is also plotted.
To do this, we first introduce a new concept of agreement, represented by the agreement index, α(Ri, Cj), which quantifies the level of agreement between the regression model (Ri) and the classification model (Cj) via the agreement matrix. This agreement matrix resembles the well‐known confusion matrix that has been widely used to assess classification models. Mathematically, α(Ri, Cj) is defined in Equation (5),
where e represents a unique alloy composition point, including its associated primary features, in an evaluation data set drawn from the K1−x(LilNamRbnCso)xNbO3 polarization diagram (Figure 4), Ne is the total number of evaluation points, and H(e, i, j) is a pair parameter that judges the agreement between each pair of regression model (Ri) and classification model (Cj). Here, H(e, i, j) is expressed by Equation (6), where Ri(e) represents the A‐site polarization boosting predicted by the given regression model (Ri) for point e, the boundary value is the desired threshold (e.g., 10%), and Cj(e) represents the classification result from the classification model (Cj) for point e, taking +1 when true and −1 when false. Using this definition of H(e, i, j) (see Figure 5a) in combination with the 2911 points of K1−x(LilNamRbnCso)xNbO3, we can now calculate α(Ri, Cj) (cf. Equation (5)) and determine the agreement matrix for a given boundary condition, as shown in Figure 5b,c.
To illustrate this concept of agreement, in Figure 5b, we first take the first ranked regression model, R1, and calculate α(R1, Cj) iteratively for the 10 000 generated Cj under the boundary condition considered. The highest α is found for the (R1, C4972) pair, and its agreement matrix is plotted in the bottom left corner of Figure 5b. Here, C4972 provides two descriptors, given by Equations (7) and (8).
To extract physical insights from this SISSO classification model, we take a closer look at Equation (7). For this descriptor to shift to more negative values (i.e., moving from the right (red) to the left (blue) region), a primary feature in one of its terms has to take on larger values, because the accompanying quantity is always negative. Thus, a more negative descriptor value, arising from higher values of this feature, predicts a higher A‐site polarization boosting.
Turning our attention to Equation (8), we notice that a small value of the relevant primary feature shifts the descriptor to larger values owing to the cubic term, noting that the other term is always positive. Thus, a smaller value of this feature suggests a higher A‐site polarization boosting, in line with our Pearson correlation analysis, in which the two are highly correlated (see Table S4, Supporting Information).
Given that the determined α(R1, C4972) is not the highest value for our data set, we remove the constraint of relying solely on R1 and proceed to calculate the α(Ri, Cj) values for all possible (Ri, Cj) pair combinations of 10 000 regression models and 10 000 classification models to uncover the highest possible value of α. After a rigorous assessment of 10^8 (10 000 × 10 000) unique combinations of (Ri, Cj) pairs, the highest agreement index value of 99.86% is found for the (R2524, C5064) pair under the boundary condition considered. The agreement matrix and the SISSO classification plot are presented in Figure 5c.
With a very high α(R2524, C5064) value of 99.86%, we now turn our attention to the descriptors of C5064, given by Equations (9) and (10). We note that the corresponding R2524 equations are provided in the Supporting Information. On closer inspection of Equation (9), given that one of its terms is always negative and another always takes a positive value, smaller values of the primary feature in the exponential term shift the descriptor from the left (red) to the right (blue) region. This provides a feature‐assisted prediction: a smaller value of this feature indicates a higher likelihood of A‐site boosted polarization. This corroborates the Pearson correlation analysis, in which this feature is one of the key primary features that are highly correlated with the target property.
On the other hand, in Equation (10), an increase in the descriptor (i.e., shifting from the bottom (red) to the upper (blue) region) is a consequence of a corresponding increase in the term for which a quadratic dependence is established. Similarly, a higher value of the primary feature in this term predicts a larger A‐site polarization boosting, again lending support to the high Pearson correlation coefficient established between this feature and the target property.
In essence, by selecting the representative (Ri, Cj) pair, SISSO provides a consistent approach to suggest physically insightful descriptors (aligning with the Pearson correlation analysis). High A‐site boosted polarization in multicomponent KNO‐based alloys can be designed by choosing A‐site cation dopants that combine small values of the features highlighted in Equations (8) and (9) with large values of the features highlighted in Equations (7) and (10).
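The agreement index and agreement matrix can be sketched in a few lines. Equations (5) and (6) are not reproduced in this text, so the form used here is an assumption consistent with the definitions above: H(e, i, j) is taken as 1 when the regression prediction falls on the same side of the boundary as the classifier label and 0 otherwise, and α is the percentage of evaluation points that agree. The function names and the demonstration values are hypothetical.

```python
def pair_agreement(r_pred, boundary, c_label):
    """Assumed H(e, i, j): 1 if regression and classification agree, else 0.

    r_pred   : regression prediction R_i(e) of the A-site boosting (in %)
    boundary : desired boundary value (e.g., 10 for 10%)
    c_label  : classifier output C_j(e), +1 (true) or -1 (false)
    """
    r_label = 1 if r_pred >= boundary else -1
    return 1 if r_label == c_label else 0


def agreement_index(r_preds, c_labels, boundary):
    """Assumed alpha(R_i, C_j) in percent: mean of H over all evaluation points."""
    if len(r_preds) != len(c_labels) or not r_preds:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(pair_agreement(r, boundary, c) for r, c in zip(r_preds, c_labels))
    return 100.0 * hits / len(r_preds)


def agreement_matrix(r_preds, c_labels, boundary):
    """2x2 counts keyed by (regression label, classification label)."""
    counts = {(1, 1): 0, (1, -1): 0, (-1, 1): 0, (-1, -1): 0}
    for r, c in zip(r_preds, c_labels):
        counts[(1 if r >= boundary else -1, c)] += 1
    return counts


# Made-up values for illustration only; these are not data from the study.
r_demo = [12.0, 8.5, 15.2, 9.9]
c_demo = [1, -1, 1, 1]
alpha_demo = agreement_index(r_demo, c_demo, boundary=10.0)
matrix_demo = agreement_matrix(r_demo, c_demo, boundary=10.0)
print(alpha_demo, matrix_demo)
```

Scanning all (Ri, Cj) pairs then amounts to calling `agreement_index` once per pair and keeping the maximum; the off‐diagonal entries of the matrix show where the two models disagree.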
Perspective: Toward Higher Polarity in Lead‐Free Oxides for Ferroelectric Applications
In this work, we have discussed the training of a feature‐assisted SISSO model for multicomponent KNO‐based alloys, with the aim of achieving high A‐site polarization boosting as the target property. We have shown that, through the use of the supervised data sampling method and by applying the concept of agreement, our predictive SISSO model can determine the A‐site polarization boosting of multicomponent KNO‐based alloys with a higher level of statistical confidence while remaining a physically interpretable ML model.
Through this work, we have also challenged the conventional Vegard's relation (VL), which is commonly used for determining various properties of alloys and solid solutions. In Figure 6a, it is clear that when the A‐site polarization boosting is determined via the VL model alone and compared to the actual DFPT‐calculated values, an underestimation (with an RMSE of 4.53%) is found. The SISSO‐based R1 and R2524 models provide much closer agreement with the DFPT values, with RMSEs of 1.61% and 0.89%, respectively. The further improvement found for R2524 stems from the higher agreement index, where an analytical comparison between the SISSO regression and SISSO classification models offers higher accuracy via the regression model and a physically interpretable classification model on the same footing.
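For reference, the RMSE used to compare the models against the DFPT values can be computed as below. This is the standard root‐mean‐square error; the numbers in the example are placeholders chosen for illustration and are not values from this study.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between model predictions and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder numbers (in %), purely illustrative.
reference_demo = [10.0, 20.0, 30.0]
model_demo = [11.0, 19.0, 31.0]
rmse_demo = rmse(model_demo, reference_demo)
print(rmse_demo)
```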
Figure 6
a) Comparison between the DFPT‐calculated A‐site polarization boosting values of multicomponent KNO‐based alloys and the model‐predicted values using the conventional Vegard's relation (VL), the first ranked SISSO regression model (R1), and the SISSO regression model R2524, which yielded the highest agreement index value. The inset shows the RMSE values of the boosting as predicted by these models. b) Plot of the DFPT‐calculated total spontaneous polarization (Ps) versus the A‐site polarization boosting for the various binary and multicomponent KNO‐based alloys in both the tetragonal and orthorhombic phases, illustrating the potential enhancement of ferroelectric properties in these oxide alloys. The markers are weighted according to the ideal mixing entropy at T = 300 K, where larger markers are deemed to be more entropy stabilized. For comparison with currently well‐known lead‐free ferroelectrics from the literature, their experimental Ps values are also shown alongside the plot (the Ps values in μC cm−2 are taken from refs. [52, 53]). The colormaps used for the data markers follow those in Figure 4.
So far, we have focused only on the reduced target property indicator, the A‐site polarization boosting factor. However, given that the dominant contribution to the total spontaneous polarization, Ps, comes mainly from the NbO6 octahedral distortions (as seen in Figure 2), a high boosting factor may not guarantee a high value of Ps, which is a key ingredient for ferroelectric applications. Thus, using DFPT calculations, we now present both Ps and the corresponding boosting factor for binary, quaternary, and penternary KNO‐based alloys in both the tetragonal and orthorhombic phases; the resulting scatter plot, weighted by the ideal entropy stabilization (at T = 300 K), is shown in Figure 6b.
The calculated total spontaneous polarization, Ps, is found to vary almost linearly with the A‐site polarization boosting factor. Interestingly, multicomponent KNO‐based alloys can indeed afford a strong enhancement of the overall spontaneous polarization, for example, by up to 150% (referenced to pristine KNbO3) in K0.5(Li0.88Na0.06Rb0.06)0.5NbO3. Besides a promising enhancement in Ps, it is also worth mentioning that the multicomponent KNO‐based alloys exhibit a general tendency toward higher stability via the high‐entropy stabilizing effect.[,,] For instance, in the case of tetragonal quaternary K0.5Li0.44Na0.03Rb0.03NbO3 and binary K0.5Li0.5NbO3, we find that both alloys have relatively high Ps values of 61.21 and 62.52 μC cm−2, respectively. However, the quaternary alloy is determined to have a 30% higher mixing entropy than the binary alloy, which leads to the conjecture that the quaternary alloy will be more entropy‐stabilized.
To provide a perspective on how these multicomponent K1−x(LilNamRbnCso)xNbO3 alloys fare compared to experimentally reported lead‐free ferroelectrics,[,] we have listed their experimentally determined Ps values alongside our scatter plot in Figure 6b for comparison (detailed values are provided in Table S6, Supporting Information). The lead‐free multicomponent KNO‐based alloys proposed in this work are promising and may be strong contenders to outperform the currently known experimental ones.
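The entropy comparison between the quaternary and binary alloys can be checked with a short calculation. The sketch assumes that "ideal mixing entropy" refers to the standard configurational expression per A site, S = −kB Σ ci ln ci, evaluated with the A‐site fractions of the two alloys quoted above; the function and variable names are illustrative.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def ideal_mixing_entropy(fractions):
    """Ideal mixing entropy per A site in units of k_B: -sum(c * ln c)."""
    if abs(sum(fractions) - 1.0) > 1e-6:
        raise ValueError("site fractions must sum to 1")
    return -sum(c * math.log(c) for c in fractions if c > 0)


# A-site fractions of K0.5Li0.5NbO3 and K0.5Li0.44Na0.03Rb0.03NbO3.
s_binary = ideal_mixing_entropy([0.5, 0.5])
s_quaternary = ideal_mixing_entropy([0.5, 0.44, 0.03, 0.03])

ratio = s_quaternary / s_binary
ts_quaternary = 300.0 * K_B_EV_PER_K * s_quaternary  # T*S at 300 K, eV per A site
print(f"entropy ratio = {ratio:.2f}, T*S = {ts_quaternary:.4f} eV per A site")
```

Under this assumption the quaternary‐to‐binary ratio comes out at roughly 1.3, in line with the approximately 30% higher mixing entropy quoted in the text.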
Conclusions
Through the use of ab initio density‐functional perturbation theory calculations and physically interpretable feature‐assisted machine learning models, we have systematically investigated the origins of the A‐site enhanced polarization mechanism in multicomponent KNbO3‐derived K1−x(LilNamRbnCso)xNbO3 alloys. Starting from the simpler binary analogs K1−xAxNbO3 (where A = Li, Na, Rb, and Cs) generated by the SQS method, we determine that they are entropy‐stabilized and exhibit large values of spontaneous polarization (comparable to or higher than that of BaTiO3). Using the SISSO method to extract physically meaningful and interpretable descriptors based on primary elemental features for predicting the polarization enhancement due to the A‐site cation, we demonstrate numerical convergence for our data set size and provide a statistical analysis via both supervised and unsupervised data sampling methods, achieving a low RMSE value of less than 1.61%. Using the SISSO‐determined descriptors for the binary alloys, we have naturally extended this to multicomponent alloys and have provided a multidimensional prediction diagram. We cross‐validate our SISSO predictions using both the conventional Vegard's relation and ab initio DFPT values. Using a new metric of agreement via both SISSO regression and classification models, we have further narrowed the prediction RMSE to 0.89%. Importantly, through this feature‐driven machine learning scheme, we have demonstrated that precise engineering of the A‐site cation composition in KNbO3 can result in a very high boost to the total spontaneous polarization (more than 150% when compared to pristine KNbO3), and that these lead‐free multicomponent K1−x(LilNamRbnCso)xNbO3 alloys are strong potential contenders against currently known lead‐free counterparts for next‐generation ferroelectric applications such as ferroelectric random access memory and piezoelectric energy converters.
Experimental Section
Density‐Functional Theory Calculations
Density‐functional theory (DFT) calculations were performed using periodic boundary conditions, employing the projector augmented wave (PAW)[,] method as implemented in the Vienna Ab initio Simulation Package code.[,] The Kohn–Sham orbitals were expanded using a plane‐wave basis set with a kinetic energy cutoff of 700 eV. The 1s, 2s, and 2p states of Li, the 2p and 3s states of Na, the 3s, 3p, and 4s states of K, the 4s, 4p, and 5s states of Rb, the 5s, 5p, and 6s states of Cs, the 4p, 4d, and 5s states of Nb, and the 2s and 2p states of O were explicitly considered as valence states within the PAW approach.
For the DFT exchange‐correlation (xc) functional, a previous work[] reported that the Perdew–Burke–Ernzerhof xc functional revised for solids (PBEsol)[] provided the best agreement with the experimental lattice parameters of KNbO3 polymorphs, and it was therefore used here. Brillouin zone integrations were sampled with a Γ‐centered k‐point mesh of 8 × 8 × 8 for the primitive unit cells of the tetragonal and orthorhombic phases; for the larger supercells used in this work, the k‐point meshes were folded equivalently.
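One common way to fold a k‐point mesh for a supercell is to divide the primitive‐cell mesh by the number of repeats along each axis. The sketch below illustrates that idea only; the rounding rule (rounding up, with a minimum of one k‐point) and the helper name `folded_kmesh` are assumptions, not the settings used in this work.

```python
import math


def folded_kmesh(primitive_mesh, supercell_repeats):
    """Scale a k-point mesh by the supercell repeats along each lattice vector.

    Rounding up keeps the k-point density at least as high as in the
    primitive cell (an assumed convention).
    """
    return tuple(
        max(1, math.ceil(k / n)) for k, n in zip(primitive_mesh, supercell_repeats)
    )


# Hypothetical 2 x 2 x 3 supercell of a primitive cell sampled with 8 x 8 x 8.
mesh_demo = folded_kmesh((8, 8, 8), (2, 2, 3))
print(mesh_demo)
```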
Structure Modeling for Solid Solutions
To construct structural models of the solid solutions, K1−xAxNbO3 (where A = Li, Na, Rb, and Cs), the special quasi‐random structures (SQSs)[,] method was used. SQSs are periodic structures with atomic distributions selected via the cluster correlation approach, such that the randomized atomic arrangement mimics the most disordered structure among all inequivalent configurations for a given composition. The SQS method is widely used for modeling solid solutions and alloy formation.[,,] Based on the parent structures of the primitive KNbO3 tetragonal and orthorhombic polymorphs, the integrated cluster expansion toolkit (ICET) code[] was employed to search for the optimal supercell that best represents this random structure.
The initial lattice constants of the SQSs were approximated by a weighted average of the optimized lattice constants of the parent oxides based on Vegard's law.[] The atomic coordinates and volume of the constructed alloys were then fully relaxed. Their randomness was verified by inspecting the atomic pair correlation function (APCF) of these SQSs. Further details and the corresponding supercell configurations can be found in the Supporting Information (e.g., see Table S2, Supporting Information).
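The Vegard's‐law starting guess for the lattice constants is a composition‐weighted average of the parent oxides. A minimal sketch for a binary K1−xAxNbO3 alloy follows; the lattice constants in the example are placeholders rather than the optimized values from this work, and `vegard_lattice` is a hypothetical helper name.

```python
def vegard_lattice(host, dopant, x):
    """Vegard's-law estimate of (a, b, c) for K(1-x)A(x)NbO3.

    host, dopant : lattice constants of KNbO3 and ANbO3 in the same phase
    x            : A-site substitution fraction, 0 <= x <= 1
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    return tuple((1.0 - x) * h + x * d for h, d in zip(host, dopant))


# Placeholder lattice constants in Angstrom, for illustration only.
host_demo = (4.0, 4.0, 4.0)
dopant_demo = (3.9, 3.9, 4.1)
guess_demo = vegard_lattice(host_demo, dopant_demo, x=0.25)
print(guess_demo)
```

The result serves only as an initial geometry; the cell is subsequently relaxed, so deviations from linearity are captured by the DFT optimization.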
Spontaneous Polarization
The spontaneous polarization (Ps) was obtained by considering the displacement of each atom (δd) from its position in the ideal non‐polar centrosymmetric structure and the averaged values of the Born effective charge (BEC) tensor of each atom (Z*).[,] Formally, Ps took the form
$$P_\mathrm{s} = \frac{e}{\Omega}\sum_i Z_i^{*}\,\delta d_i$$
where i denotes the ith atom, Z*i is the Born effective charge derived from density‐functional perturbation theory calculations[,] associated with the ith atomic displacement (δdi) of the ions from their positions in the unpolarized (non‐polar) structure, e is the charge of an electron, and Ω is the cell volume considered. By comparison with the non‐polar reference structure, the atomic displacements responsible for ferroelectricity could thus be analyzed.
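As a worked illustration, the polarization can be evaluated numerically assuming the standard Born‐effective‐charge expression Ps = (e/Ω) Σi Z*i δdi, which matches the symbol definitions above. The sketch treats a single polar axis with scalar (averaged) Born charges; the charges, displacements, and volume in the example are placeholders, not results from this study.

```python
# 1 e/Angstrom^2 = 16.02176634 C/m^2 = 1602.176634 microC/cm^2
E_PER_A2_TO_UC_PER_CM2 = 1602.176634


def spontaneous_polarization(born_charges, displacements, volume):
    """P_s along one axis in microC/cm^2.

    born_charges  : averaged Born effective charges Z*_i (units of e)
    displacements : displacements delta_d_i from the non-polar reference (Angstrom)
    volume        : cell volume Omega (Angstrom^3)
    """
    if len(born_charges) != len(displacements):
        raise ValueError("one displacement is needed per Born charge")
    dipole = sum(z * d for z, d in zip(born_charges, displacements))  # e * Angstrom
    return E_PER_A2_TO_UC_PER_CM2 * dipole / volume


# Placeholder five-atom perovskite cell (A, B, O, O, O); values are illustrative.
z_demo = [1.0, 9.0, -6.0, -2.0, -2.0]
d_demo = [0.0, 0.02, -0.03, -0.01, -0.01]
p_demo = spontaneous_polarization(z_demo, d_demo, volume=64.0)
print(f"P_s = {p_demo:.2f} microC/cm^2")
```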
Compressed‐Sensing Machine Learning: Descriptors and Features
Material characteristics and attributes (more commonly termed features) play an important role in determining the accuracy of a descriptor‐based machine learning model.[,,] In general, the elemental, structural, electronic, or other features of materials could be considered. In this work, the focus was only on the primary (elemental/atomic) features that allow prediction of the target property of interest here, that is, the A‐site element contribution to the total polarization. Table S5, Supporting Information, tabulates the primary features (PFs) considered in this work.
For the K1−xAxNbO3 cation solid solutions (where A = Li, Na, Rb, and Cs), the descriptors for each site were transformed into an occupancy‐weighted average, where β denotes the A, B, or X site of a typical ABX3 perovskite, the PFs are averaged over the species occupying each site, and the coefficient ζ (which takes a value between 0 and 1) denotes the fractional occupancy of each species on that site in the alloy. This was done to ensure that the compositional variance in the data was retained, thus also allowing the polarization to be mapped for any fractional stoichiometry based on essential physical factors.[] In this paper, the A‐site average varied due to the A‐site substitution, while the B‐site and X‐site averages were constants.
To assess the overall relation of the primary features to the target property, a primary feature class was defined. Based on the transformed descriptors (cf. Equation (13)), Pearson correlation coefficients (PCCs) were then surveyed to select the relevant features with a higher correlation with the target property (here, those with |PCC| ⩾ 0.85 were chosen) while removing less correlated features (i.e., those with |PCC| < 0.85). This pre‐processing step made the SISSO process more effective by preventing non‐distinctive features from needlessly increasing the dimensions of the models, and has been shown to greatly reduce the computational memory required during feature generation.[,] It is important to note that the primary feature class itself may not retain all the physical information and only assists in the pre‐processing step of the SISSO process.
With the chosen PFs, SISSO generated combinations of features by iteratively applying several mathematical operators (+, −, ×, /, exp, exp(−), ^−1, ^2, ^3, √, ∛, log, and |−|). To construct arbitrarily large feature spaces, three iterations (or dimensions) were considered, thereby generating the feature spaces Φ1, Φ2, and Φ3 (where Φn corresponds to the nth dimension feature space). Note that a given feature space, Φn, automatically includes the lower‐dimension feature spaces. Finally, for this work, SISSO generated 126 122 065 features and ranked them according to the root‐mean‐square error (RMSE) to find the best descriptor sets. Since a 2D regression was applied to the given training set, the final regression result is expressed by Equation (14), where aj and bj are the regression coefficients fitted to the training set using the two descriptors of the jth‐ranked descriptor set. To obtain the ranking of the SISSO descriptors, the data of both the tetragonal and orthorhombic phases were used to minimize phase dependency, in contrast to the actual regression, where the phase‐dependent training set was used to evaluate the performance of the descriptor sets.
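The two pre‐processing steps described above, occupancy‐weighted site averaging and Pearson‐based feature screening, can be sketched as follows. This is an illustration under assumptions, not the workflow used in the study: the feature names, the toy values, and the helper functions are hypothetical, and only the |PCC| ⩾ 0.85 threshold is taken from the text.

```python
import math


def site_average(feature_values, occupancies):
    """Occupancy-weighted average of one primary feature on a single site."""
    if abs(sum(occupancies) - 1.0) > 1e-6:
        raise ValueError("site occupancies must sum to 1")
    return sum(z * f for z, f in zip(occupancies, feature_values))


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(feature_table, target, threshold=0.85):
    """Keep the features whose |PCC| with the target meets the threshold."""
    return [
        name
        for name, values in feature_table.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Toy data with hypothetical feature names; not values from the study.
avg_demo = site_average([1.0, 2.0], [0.5, 0.5])
features_demo = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [1.0, -1.0, 1.0, -1.0],
}
target_demo = [2.0, 4.0, 6.0, 8.5]
kept_demo = select_features(features_demo, target_demo)
print(avg_demo, kept_demo)
```

Only the surviving features would then be passed to SISSO, which keeps the generated feature space, and hence the memory footprint, smaller.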
Statistical Analysis
The data in this manuscript can be divided into three main sets: i) DFT‐generated polarization values for 42 different binary alloys; ii) linearly interpolated data based on a Vegard's law‐like model (resulting in 2004 points per phase for each of the tetragonal and orthorhombic phases; details regarding the interpolated data can be found in the Supporting Information); and iii) ten validation samples of polarization values for selected multicomponent alloys. The data in (i) were used for the training of SISSO, while the 2004 interpolated data points in (ii) were used to test the SISSO results for the binary alloys. In (iii), the ten DFT‐calculated polarization values for selected multicomponent alloys were used to validate the accuracy of SISSO and to establish its predictive nature for the multicomponent alloys.
We note that, in conventional machine learning with DFT data, the DFT data are normally split into training and test sets. However, due to computational limitations, all DFT data for the binary alloys (in (i)) were used for the training process, and the validity of the training results was then cross‐checked using the data from (ii) and (iii). For all of the data in (ii) and (iii), no additional pre‐processing was applied, to prevent any loss of physical meaning. For the regression modeling, the Python package sklearn was used. For the SISSO calculations, the SISSO package developed by Ouyang et al.[] was employed.
Conflict of Interest
The authors declare no conflict of interest.
Author Contributions
S.‐H.V.O. and W.H. contributed equally to this work. S.‐H.V.O., W.H., and K.K. constructed the atomistic models and performed the calculations. A.S. and J.‐H.L. conceptualized and supervised this work. All authors were involved in the drafting of the manuscript.