
Machine-Learning Rationalization and Prediction of Solid-State Synthesis Conditions.

Haoyan Huo^1,2, Christopher J Bartel^1,2, Tanjin He^1,2, Amalie Trewartha^2, Alexander Dunn^1,3, Bin Ouyang^1,2, Anubhav Jain^3, Gerbrand Ceder^1,2.

Abstract

There currently exist no quantitative methods to determine the appropriate conditions for solid-state synthesis. This not only hinders the experimental realization of novel materials but also complicates the interpretation and understanding of solid-state reaction mechanisms. Here, we demonstrate a machine-learning approach that predicts synthesis conditions using large solid-state synthesis data sets text-mined from scientific journal articles. Using feature importance ranking analysis, we discovered that optimal heating temperatures have strong correlations with the stability of precursor materials quantified using melting points and formation energies ($\Delta G_f$, $\Delta H_f$). In contrast, features derived from the thermodynamics of synthesis-related reactions did not directly correlate with the chosen heating temperatures. This correlation between optimal solid-state heating temperature and precursor stability extends Tammann's rule from intermetallics to oxide systems, suggesting the importance of reaction kinetics in determining synthesis conditions. Heating times are shown to be strongly correlated with the chosen experimental procedures and instrument setups, which may be indicative of human bias in the data set. Using these predictive features, we constructed machine-learning models with good performance and general applicability to predict the conditions required to synthesize diverse chemical systems.


Year: 2022 PMID: 36032555 PMCID: PMC9407029 DOI: 10.1021/acs.chemmater.2c01293

Source DB: PubMed Journal: Chem Mater ISSN: 0897-4756 Impact factor: 10.508

Introduction

While solid-state synthesis is the prevailing approach for making inorganic solids, the determination of synthesis conditions for new solids is mostly based on heuristics and human-acquired experience, with no analytical predictive approaches.[1,2] Recent work has focused on rationalizing solid-state reaction pathways observed in in situ experiments[3−7] by decomposing them into a sequence of phase evolution steps[1] that can be modeled using thermodynamic calculations.[8−11] To design synthesis routes for new materials, it is essential to understand why certain conditions are preferred and to develop models that predict these conditions (e.g., temperature, time). While thermodynamic calculations have been used to rationalize synthesis conditions in specific chemical systems,[8,12] a synthesis condition predictor with broad applicability for general inorganic compounds is still elusive.

Here, we use statistical machine-learning (ML) methods to systematically learn and quantitatively evaluate synthesis condition predictors from a large set of experimental data. Such ML approaches require large, high-quality synthesis data sets covering many chemistries, which have only recently become available through the application of natural language processing (NLP) and information retrieval techniques to the large body of scientific literature.[13−19] In this work, using the data set of over 30 000 text-mined solid-state synthesis reactions (denoted as the text-mined “recipes” or the TMR data set in this paper),[16] we demonstrate an inductive ML approach that learns synthesis conditions from the knowledge parsed from the past literature. The overall pipeline of our ML approach is shown in Figure 1. Data sets of synthesis conditions compiled from NLP/text-mined data sets are used to train ML models. Each synthesis reaction was represented using a set of human-designed features, which will be discussed in more detail in subsequent sections. Interpretable ML models were trained on these features to predict two key solid-state synthesis conditions that must be specified for any reaction: heating temperature and heating time.

Figure 1

Schematic of the ML methods developed in this work for predicting solid-state synthesis conditions.

Throughout this paper, the prediction of solid-state synthesis conditions is defined as regression (point estimation) of the two experimental condition variables: temperature and time. Several important assumptions have been made: (a) Good synthesizability is assumed;[20−23] i.e., when a publication reports the synthesis of some material at a specified set of conditions, we assume that this reaction was successful. (b) Synthesis experiments are performed in a one-shot fashion; i.e., reactants react and form the target compound in a single heating step, such that a simple synthesis route of “mix and heat” would be sufficient. (c) The ML models predict the “optimal” synthesis conditions as implicitly defined by the consensus of training data.

Note that the above assumptions oversimplify the synthesis condition prediction problem and are often violated in practical solid-state syntheses. For example, a simple one-shot reaction route can thermodynamically favor an impurity phase which can only be avoided by using a multistep synthesis with specific intermediate compounds;[11,24] solid-state syntheses are often performed with many more degrees of freedom, such as special heating schedules,[8,24] special mixing devices,[25] different sintering aids,[26] etc. Moreover, the heating atmosphere strongly affects target material formation by changing the chemical potentials of gas species.[27] ML models require sufficient and consistent data to draw statistically significant conclusions,[28,29] whereas the data set used in this work is too imbalanced with respect to these additional labels. For example, fewer than 5% of the reactions in the TMR data set have non-air synthesis atmospheres. Therefore, the aforementioned conditions, although present in the TMR data set, are not predicted by the ML models in this work. Modeling of these factors may become possible as text-mined data sets become abundant in the future.[30]

In this work, we considered 133 synthesis features describing four aspects of solid-state syntheses: (1) precursor properties, (2) composition of the target material, (3) reaction thermodynamics, and (4) experimental procedure setup. We ranked these features according to their predictive power using dominance importance (DI) analysis.[31] The features were used to train linear and nonlinear (tree-based) regressors for synthesis heating temperature and time. For all models, we split the data set into reactions with carbonate precursors and reactions without carbonate precursors. This splitting is necessary because the release of CO2 gas from carbonate precursor materials systematically shifts the reaction driving forces for this subset and, consequently, the coefficients of the related features in linear models. Grouping the data set into carbonate and noncarbonate reactions thus fits two sets of coefficients that account for this shift and improves the overall performance. We performed leave-one-out cross-validation (LOOCV) to diagnose model performance. We also used out-of-sample (OOS) evaluation on Pearson’s Crystal Data[32] (another synthesis data set independently extracted from the literature, denoted as the PCD data set in this paper) to test model generalizability on unseen data sets. The detailed data preprocessing and model construction can be found in the Methods section.

Our ML results achieve a goodness-of-fit of $R^2 \sim 0.5$–$0.6$ and a mean absolute error (MAE) of ∼140 °C for heating temperature prediction. For comparison, typical heating temperatures used in solid-state synthesis range from ∼500 °C to ∼1500 °C. For heating time prediction, the time variable is transformed into a new prediction variable representing reaction speed: $t \to \log_{10}(1/t)$. The goodness-of-fit for this new time variable is $R^2 \sim 0.3$, and the MAE is ∼0.3 $\log_{10}(\mathrm{h}^{-1})$ (e.g., if the predicted time is $t$, the MAE corresponds to a range of $[10^{-0.3}\,t,\ 10^{0.3}\,t]$, or approximately $[0.5t,\ 2t]$). Analysis of the model predictive power reveals that heating temperature prediction is dominated by precursor properties, which we hypothesize to be linked to reaction kinetics. Heating time prediction is dominated by experimental operations, which may be indicative of human selection bias. The ML methods developed and applied in this work provide a statistically rigorous approach toward learning robust synthesis predictors from large data sets mined from the scientific literature.
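As a worked illustration of the time transform and the error band quoted above, the following minimal Python sketch converts a heating time to the transformed variable and back, and turns an MAE of 0.3 in $\log_{10}(\mathrm{h}^{-1})$ into a range of hours. The function names are hypothetical and are not taken from the paper.

```python
import math

def to_rate(t_hours):
    """Transform a heating time t (hours) into the regression target log10(1/t)."""
    return math.log10(1.0 / t_hours)

def to_hours(rate):
    """Invert the transform: t = 10 ** (-rate)."""
    return 10.0 ** (-rate)

def time_band(t_hours, mae=0.3):
    """Times whose transformed value lies within +/- mae of the prediction."""
    return (10.0 ** (-mae) * t_hours, 10.0 ** mae * t_hours)

low, high = time_band(2.0)
print(f"{low:.2f} h to {high:.2f} h")  # prints "1.00 h to 3.99 h"
```

Because the error is additive in log space, it becomes a multiplicative factor of about 2 in hours, which is why the band is asymmetric around the predicted time.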

Results

Synthesis Feature Selection Using Dominance Analysis

In total, we created 133 features in four categories: (1) precursor properties: 12 features calculated from melting points, standard enthalpy of formation $\Delta H_{300\,\mathrm{K}}$, and standard Gibbs free energy of formation $\Delta G_{300\,\mathrm{K}}$ of precursors; (2) composition of the target material: 74 indicator variables representing the presence (1) or absence (0) of different chemical elements in the target compound; (3) reaction thermodynamics: 33 descriptive features of the driving forces for synthesis-relevant reactions constructed by decomposing synthesis into multistep phase evolution paths using previously developed principles;[7,8] and (4) experiment-adjacent features: 14 indicator variables representing whether certain devices, procedures, and/or additives were used in the synthesis procedure. See Methods for a more detailed description of how each of these classes of features was computed.

We first use DI analysis[31] to rank the predictive power of these features. In DI analysis, one constructs many linear models that predict outcomes using subsets of features, called submodels. DI analysis then calculates the incremental effect of a feature f on submodels that do not use f in three different ways. The average partial dominance importance (APDI) value for f is computed as the average increase of model performance, measured by $R^2$, when f is added to any submodel that does not include f. In other words, APDI measures the averaged gain of predictive power obtained by including a feature. Individual dominance importance (IDI) values are the $R^2$ of models trained using only one feature and quantify the predictive power of the features by themselves. Interactional dominance importance (IADI) values are the decrease of model $R^2$ when a feature is removed from the whole model that uses all features, and therefore measure the gain of predictive power by a feature over all other features. All three DI values are computed for both heating temperature and time prediction models and are shown in Figure 2. We split the data set into carbonate reactions (reactions with at least one carbonate precursor) and noncarbonate reactions (reactions with no carbonate precursors). This is necessary because these two subsets have dissimilar distributions of reaction thermodynamic driving forces, which must be separated to be modeled in linear regression.[33,34]
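The three DI values described above can be sketched for a small number of features with plain NumPy. This is an illustrative sketch on synthetic data, not the authors' implementation: the function names are hypothetical, ordinary least squares is used for every submodel, and APDI is assumed here to average over submodels containing between 1 and p − 2 of the other features (so that it stays distinct from IDI and IADI). Enumerating all submodels is only feasible for a handful of features.

```python
from itertools import combinations
import numpy as np

def r2(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def dominance(X, y):
    """Return IDI, IADI and APDI for every column of X (needs >= 2 columns)."""
    p = X.shape[1]
    full = r2(X, y)
    out = {}
    for f in range(p):
        others = [j for j in range(p) if j != f]
        idi = r2(X[:, [f]], y)              # feature used on its own
        iadi = full - r2(X[:, others], y)   # loss when f is dropped from the full model
        gains = []
        for k in range(1, p - 1):           # assumed submodel sizes: 1 .. p-2
            for sub in combinations(others, k):
                cols = list(sub)
                gains.append(r2(X[:, cols + [f]], y) - r2(X[:, cols], y))
        apdi = float(np.mean(gains)) if gains else float("nan")
        out[f] = {"IDI": idi, "IADI": iadi, "APDI": apdi}
    return out

# Synthetic example: x1 is nearly a copy of x0, x2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = x0 + 0.1 * rng.normal(size=200)
x2 = rng.normal(size=200)
y = x0 + 0.5 * x2 + 0.1 * rng.normal(size=200)
di = dominance(np.column_stack([x0, x1, x2]), y)
for f, values in di.items():
    print(f, {name: round(v, 3) for name, v in values.items()})
```

In this toy setup the redundant feature x1 has a high IDI but a near-zero IADI, which mirrors the behavior of highly correlated features discussed in the text.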

Figure 2

DI values and rankings of the top 15 synthesis features for heating temperature models (a and b) and heating time models (c and d). The data set is split into carbonate reactions (reactions with at least one carbonate precursor) (a and c) and noncarbonate reactions (reactions with no carbonate precursors) (b and d). Interactional DI (IADI): decrease of model $R^2$ when a feature is removed from the whole model that uses all features. Individual DI (IDI): $R^2$ of models trained using only one feature. Average partial DI (APDI): average $R^2$ increase when a feature is added to a submodel. Features are ordered according to the sum of all three DI values.

We first evaluate the predictive powers of the features by themselves, as demonstrated by the IDI values in Figure 2. For heating temperature prediction, Figure 2a,b shows that the IDI values of the average precursor melting points are significantly higher than those of other features. Average precursor melting points alone achieve $R^2 \sim 0.2$–$0.3$ for heating temperature prediction. Other features, such as the experimental Gibbs free energy of formation at standard conditions $\Delta G_{300\,\mathrm{K}}$ and the experimental enthalpy of formation at standard conditions $\Delta H_{300\,\mathrm{K}}$ of precursors, are also highly predictive as measured by IDI. Note that precursor melting points, $\Delta G_{300\,\mathrm{K}}$, and $\Delta H_{300\,\mathrm{K}}$ are likely to be good proxy variables for precursor reactivity. The next set of predictive features as ranked by IDI are compositional indicator variables (e.g., indicating the presence/absence of Li, Mo, Bi, etc.). These features can be understood as chemistry-specific corrections to heating temperatures. Note that ML models aim to reduce prediction errors for the whole training data set, which is dominated by the elements that are characteristic of large application fields, such as Li (Li-ion batteries) and Ba (perovskite oxides). It is thus not surprising that these most frequently synthesized chemical systems appear at the top of the list in Figure 2a,b.

For heating time prediction, Figure 2c,d shows that the IDI values of experiment-adjacent features (e.g., indicators of polycrystal synthesis, phosphors, and usage of ball-milling devices) completely outweigh those of precursor property features. This suggests that heating time is largely controlled by the desired applications (e.g., the need for dense pellets, small particles, single crystals, etc.) and experimental setups rather than reaction mechanisms. Meanwhile, compositional indicator variables still rank second after the experiment-adjacent features, again acting as chemistry-specific corrections.

The blue bars in Figure 2 are IADI values. IADI values measure the gain of predictive power by a feature over all other features. For heating temperature prediction, Figure 2a,b shows that IADI values are very small for most features. A low IADI value is usually due to high correlation among features, e.g., average precursor melting points and maximal precursor melting points. These high correlations suggest it is necessary to use feature selection to choose the strongest feature among highly correlated features, as will be discussed in the next section. Nevertheless, a few features have relatively higher IADI values, a sign that they bring unique extra information over all other features. For example, describing syntheses using the word “sintering” may suggest the experimenters actively chose higher heating temperatures. As a consequence, the experiment-adjacent feature of “sintering” has the highest IADI value for temperature prediction models.

The green bars in Figure 2 are APDI values. APDI values are the average $R^2$ increase contributed by a feature to all submodels. Thus, APDI estimates the general usefulness of a feature. APDI and IDI values are therefore two important factors in ranking feature importance. For example, in Figure 2a, even though average precursor melting point and $\Delta G_{300\,\mathrm{K}}$ both have high IDI values, $\Delta G_{300\,\mathrm{K}}$ has smaller APDI values and is less important because of correlation with alternative features. By ranking all features according to the summation of DI values, we are able to consistently select the most uniquely predictive features.

While, in general, synthesis temperature and time together determine the overall reaction kinetics, they are not ranked as top predictive features in Figure 2 when included as features to predict each other (also see Table S1). This seems contrary to the expectation that they would be strongly correlated because elevated temperatures can lead to faster reactions by promoting atomic diffusion. We hypothesize that the low correlation between time and temperature may be due to a variety of reasons: (1) As opposed to sampling many synthesis conditions for a specific chemical system, the TMR data set spans diverse chemistries. There are usually fewer than 5 reported syntheses for a majority (>60%) of the chemical systems, which is not enough to reveal a stronger correlation. (2) The TMR data set is text-mined from journal articles in which synthesis conditions, especially synthesis time, are generally not optimized but are determined by other external factors, such as the desired applications or the researcher’s convenience. These external factors make the time variable noisier and less correlated with temperature than it might be in a variationally constrained set of data (e.g., the collection of shortest times for each temperature).

To summarize, the overall rankings in Figure 2 suggest each prediction variable is dominated by two types of features. For heating temperature prediction, precursor material properties have the highest feature importance, while compositional features act as secondary corrections. For heating time prediction, experiment-adjacent features dominate the prediction, while compositional features also provide secondary corrections. Contrary to the common application of decomposing synthesis reactions into multistep phase evolution paths using thermodynamic principles,[8,10−12] Figure 2 shows that the phase evolution thermodynamic driving force features, developed using similar principles in this work, provide little predictive power for heating temperature and time. We hypothesize that this is because the TMR data set contains only positive experimental results for which researchers actively optimize for reasonable reaction kinetics. Therefore, reaction driving forces are less useful, as these features are more likely to indicate whether something is synthesizable (e.g., if reactions to form a target are thermodynamically spontaneous) than to indicate at what conditions reactions may occur quickly. We will revisit this finding in more detail in the Discussion section.

Building and Interpreting Linear Regression Models

To build regression models, we start with linear regressors as baseline models because their good interpretability allows one to focus on feature engineering and decipher the relations between features and synthesis conditions. To balance high predictive power against possible overfitting, we add features in the order of DI rankings and drop any feature that increases the model's Bayesian information criterion (BIC) value.[29] In total, four linear models (heating temperature and time prediction models for carbonate and noncarbonate reactions) were trained using weighted least-squares (WLS).[29] The scatter plots of the predicted synthesis conditions versus the reported conditions are shown in Figure 3a,b. For heating temperature prediction, the $R^2$ values of the models are 0.55 on carbonate reactions and 0.56 on noncarbonate reactions, while the MAE values are 134 and 147 °C, respectively. For heating time prediction, the $R^2$ values of the models are 0.31 on carbonate reactions and 0.33 on noncarbonate reactions, while the MAE values are 0.30 $\log_{10}(\mathrm{h}^{-1})$ and 0.32 $\log_{10}(\mathrm{h}^{-1})$, respectively. Because we predict the transformed time variable $\log_{10}(1/t)$, such an MAE means that the time prediction is within the range $[10^{-0.3}\,t,\ 10^{0.3}\,t]$, or approximately $[0.5t,\ 2t]$ (e.g., for a 2 h experiment, the expected prediction range is 1–4 h). Note that these metrics are evaluated on training data. Thus, they may not reflect the model performance when applied to unseen data. We will perform cross-validation and discuss the results in later sections.
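The selection procedure described above (add features in ranked order, keep a feature only if BIC does not increase, fit by WLS) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the sample weights and data are synthetic, and BIC is computed with the common Gaussian form n·ln(RSS/n) + k·ln(n) using the weighted residual sum of squares.

```python
import numpy as np

def wls_bic(X, y, w):
    """Fit weighted least squares with an intercept; return (coefficients, BIC)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    s = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into OLS
    beta, *_ = np.linalg.lstsq(A * s[:, None], y * s, rcond=None)
    rss = np.sum(w * (y - A @ beta) ** 2)
    bic = n * np.log(rss / n) + A.shape[1] * np.log(n)  # assumed Gaussian BIC form
    return beta, bic

def select_by_bic(X, y, w, ranked):
    """Add features in `ranked` order; skip any feature that raises BIC."""
    kept = []
    _, best = wls_bic(X[:, kept], y, w)  # intercept-only baseline
    for j in ranked:
        _, bic = wls_bic(X[:, kept + [j]], y, w)
        if bic < best:
            kept.append(j)
            best = bic
    return kept

# Synthetic data: only columns 0 and 2 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=300)
w = rng.uniform(0.5, 1.5, size=300)  # hypothetical sample weights
kept = select_by_bic(X, y, w, ranked=[0, 1, 2, 3])
print(kept)
```

The ln(n) penalty per coefficient is what lets BIC reject features whose improvement in fit is too small to justify the added parameter.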

Figure 3

Regression result of linear models. The scatter plots show reported conditions vs predicted conditions for temperature prediction (a) and time prediction (b). Opacity of the markers indicates the weights of data points. Histograms of prediction errors are also shown.

In a linear regressor $\hat{y} = \sum_j \beta_j x_j$, the feature coefficients $\beta_j$ quantify how the regression target variable responds to unit changes of $x_j$. As a special case, when $x_j \in \{0, 1\}$ are indicator variables (e.g., compositional and experiment-adjacent features), $\beta_j$ can be interpreted as the additive effect on the prediction target variable when $x_j = 1$. For all compositional features, the effects are shown in Figure 4a,b. Note that these values are relative to the “average” according to the training data set and must be interpreted as relative values. For example, if Li is present in the target compound, Figure 4a suggests the heating temperature will decrease by 360 °C on average for noncarbonate reactions. On the other hand, the presence of N will increase the heating temperature by 260 °C on average. Therefore, Figure 4a,b shows maps that associate different chemistries with their effect on optimal synthesis conditions. Such maps can be used as empirical “synthesis rules” that are helpful for designing synthesis routes to new materials.

Figure 4

Average effect of each chemical element on predicted heating temperatures (a) and times (b) in trained linear models. The values are coefficients of the corresponding features in the linear models, quantifying how much the predicted value changes if a chemical element is added to (or removed from) the synthesis.

The learned coefficients in Figure 4a,b are sparse because some elements appear only a few times or are even missing in the training data set, precluding a confident estimate of their effect (assessed by the p-values of the coefficients with a 5% significance level[35]). In Figure 4, we observe more consistent compositional effects across similar element periods and groups for temperature predictions than for heating time predictions. The lack of correlation with compositional effects for time prediction matches the DI analysis result in Figure 2c,d, which suggests compositional features are less helpful for predicting heating time. Moreover, the compositional effects are less consistent between carbonate reactions and noncarbonate reactions for heating time prediction. These observations suggest that the compositional effects are generally less reliable for heating time prediction and must be used with more caution.

Training and Cross-Validating Nonlinear Models

Having used DI analysis and linear models to probe the synthesis prediction features, we next aim to systematically cross-validate ML models to understand their generalizability or propensity for overfitting. Figure 5 shows the model performance versus the number of features, in terms of the training $R^2$ and the LOOCV Pseudo-$R^2$ (a metric comparable to $R^2$, see Methods) scores of the linear models as more features are included in training. In Figure 5, features are added into the models in the order of DI value rankings. Figure 5 shows that both training and LOOCV scores increase quickly when the number of features is less than 10. This result is consistent with the DI values in Figure 2, as the first few features have the highest feature importance. The model performance continues to improve as we include all other features, although the marginal improvement decreases rapidly. The training and LOOCV curves for linear models exhibit very similar performances, suggesting that these linear models have little risk of overfitting.
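The comparison between training scores and LOOCV scores can be sketched for a linear model with NumPy. The exact Pseudo-$R^2$ definition is given in the paper's Methods, which are not reproduced here, so this sketch assumes the common form 1 − Σ(y_i − ŷ_(−i))² / Σ(y_i − ȳ)², where ŷ_(−i) is the prediction for sample i from a model fitted without it. Function names and data are hypothetical.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, then predict on X_test."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def pseudo_r2(y, pred):
    """Assumed definition: 1 - SSE / total sum of squares."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def loocv_pseudo_r2(X, y):
    """Predict every sample from a model trained on all the other samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        preds[i] = fit_predict(X[mask], y[mask], X[i:i + 1])[0]
    return pseudo_r2(y, preds)

# Synthetic data with a strong linear signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.5 * rng.normal(size=60)
train_score = pseudo_r2(y, fit_predict(X, y, X))
loo_score = loocv_pseudo_r2(X, y)
print(round(train_score, 3), round(loo_score, 3))
```

A small gap between the two scores, as in this toy case, is the pattern the text interprets as a low risk of overfitting; a large gap would point the other way.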

Figure 5

Model performance versus number of training features for both linear and nonlinear (gradient boosting tree regressor) models. The x-axis shows the number of features used. The features are added in the order of DI value rankings. The first row shows performances of temperature prediction models trained on carbonate reactions (a) and noncarbonate reactions (b). The second row shows performances of time prediction models trained on reactions with (c) and without (d) carbonate precursors.

The linear model may be incapable of capturing nonlinear correlations among features and synthesis conditions. We next use advanced ML models that are capable of modeling nonlinear relations on the same set of features as for the linear models. Among the many ML models we attempted during preliminary experiments, gradient boosted regression trees (GBRT), implemented in the XGBoost package,[36] demonstrated the best LOOCV scores after proper hyperparameter tuning. XGBoost models use a large number of weak tree learners to build a strong ensemble regressor and are able to learn nonlinear effects. Indeed, we observe in Figure 5 that XGBoost training Pseudo-$R^2$ (red dashed curves) results are significantly higher than linear model results. However, the LOOCV Pseudo-$R^2$ scores of the XGBoost models (teal crosses in Figure 5) improve much less over the LOOCV scores of the linear models (green stars), suggesting an increased level of overfitting by XGBoost models. One advantage of XGBoost over linear models is improved utilization of a small number of features, as shown by the steeper curves when the number of features is less than 10 in Figure 5a,b, although the advantage diminishes once sufficiently many features are used.

Finally, to help better understand the uncertainties of the models, we visualize the error distributions of synthesis conditions in Figure 6 using violin plots, where we mark the interquartile range (IQR), representing 50% of the errors, and 1.5× IQR, representing the range of prediction errors beyond which the errors are considered outliers.

Figure 6

LOOCV prediction error distributions of synthesis temperature and time. Plotted are prediction error median values (shown with white dots), interquartile ranges (IQR, or the spread of errors between the 25th and 75th percentiles, shown with thick lines), and 1.5× IQR (shown with thin lines). Shaded areas are probability density estimates of the errors. Our models are expected to make prediction errors within the IQR approximately half of the time and within 1.5× IQR most of the time.
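The summary statistics marked on such violin plots can be computed as in the sketch below. The error values are made up for illustration only, the function name is hypothetical, and the 1.5× IQR range is assumed to follow the usual box-plot convention of extending 1.5× IQR below the 25th percentile and above the 75th percentile.

```python
import numpy as np

def error_summary(errors):
    """Median, IQR, and 1.5x IQR limits of a set of prediction errors."""
    errors = np.asarray(errors, dtype=float)
    q1, median, q3 = np.percentile(errors, [25, 50, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed box-plot convention
    outliers = errors[(errors < lower) | (errors > upper)]
    return {"median": median, "q1": q1, "q3": q3, "iqr": iqr,
            "lower": lower, "upper": upper, "outliers": outliers}

# Hypothetical temperature prediction errors in degrees Celsius.
summary = error_summary([-310, -120, -60, -25, 0, 15, 40, 90, 150, 600])
print(summary["median"], summary["iqr"], summary["outliers"])
```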

Testing Model Generalizability Using the PCD Data Set

When applied to unseen data sets, ML model predictions tend to have larger errors due to data set shift; i.e., unseen data sets have a different distribution than the training data sets.[37] In particular, the relations between features and outcomes may change for unseen data, leading to concept drift, which degrades model generalizability and limits model applicability. The TMR data set mostly contains syntheses for inorganic oxide materials and is dominated by target materials containing Ti, Sr, Li, Ba, La, Nb, Fe, etc., reflecting popular materials in the inorganic materials research community such as perovskite oxides and battery materials. The TMR data set also contains a large fraction of solid solutions or doped materials. To estimate and understand how the ML models trained on the TMR data set generalize to unseen data sets, we utilized the PCD data set as an additional test. The original PCD (Pearson’s Crystal Data) collection contains inorganic materials syntheses that were manually extracted from the literature in a semistructured natural language form.[32] We processed the PCD collection using the same text-mining pipeline and only kept oxide syntheses such that the final PCD data set has a similar chemistry distribution to the TMR data set. To ensure there are no duplicate syntheses, we removed any entry in the PCD data set whose digital object identifier (DOI) is present in the TMR data set (i.e., syntheses in the same papers are not allowed, but the same compositions from different papers are allowed). Compared to the TMR data set, the PCD data set shares a similar distribution of chemical systems and synthesis conditions, as indicated by similar sets of popular chemical elements (i.e., Ti, Fe, Sr, Ba, Si, etc.) and average synthesis temperatures around 1200 °C; see Figure S3. The PCD data set thus represents a reasonable benchmark data set for our ML models.
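The DOI-based de-duplication described above amounts to a set-membership filter. The sketch below is illustrative only: the record layout (dictionaries with `doi` and `target` keys), the function name, and the placeholder DOIs are all hypothetical, and keeping entries that lack a DOI is an assumption rather than something the text specifies.

```python
def drop_shared_dois(pcd_entries, tmr_entries):
    """Keep only PCD entries whose DOI does not occur in the TMR data set."""
    # DOI names are case-insensitive, so compare them in lowercase.
    tmr_dois = {e["doi"].lower() for e in tmr_entries if e.get("doi")}
    return [
        e for e in pcd_entries
        if not e.get("doi") or e["doi"].lower() not in tmr_dois
    ]

# Placeholder records; the DOIs are invented and do not resolve.
tmr = [{"doi": "10.0000/example.a", "target": "BaTiO3"}]
pcd = [
    {"doi": "10.0000/EXAMPLE.A", "target": "BaTiO3"},  # same paper: removed
    {"doi": "10.0000/example.b", "target": "BaTiO3"},  # same target, other paper: kept
    {"doi": "10.0000/example.c", "target": "SrTiO3"},
]
filtered = drop_shared_dois(pcd, tmr)
print([e["doi"] for e in filtered])  # ['10.0000/example.b', '10.0000/example.c']
```

Filtering by paper rather than by composition keeps independent reports of the same compound, which is what allows the second data set to test generalization across sources.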
However, because many reactions in the PCD data set do not have heating times extracted, we only predicted heating temperatures for the PCD data set. To establish an upper bound of the model performance, we performed the same training/validation procedure using the PCD data set as was used on the TMR data set. Figure 7 shows the performance of the ML models versus the number of features. The green stars and teal crosses in Figure 7 are the LOOCV scores of linear and XGBoost models, respectively. XGBoost models achieve LOOCV Pseudo-$R^2$ values of 0.5–0.6, which are considerably better than those of linear models (0.4–0.5). Moreover, XGBoost shows a steeper performance increase when few synthesis features are used. Compared to Figure 5, the advantage of the nonlinear models is much more substantial for the PCD data set than for the TMR data set. This clear advantage of XGBoost models indicates they are more robust than linear models against possible data set shift effects.

Figure 7

Performance of the models versus the number of features evaluated on the PCD data set. X-axes show the number of features used in each model. Features are added in the order of DI value rankings as in Figure 5. The left panels (a) and (c) show models trained on carbonate reactions, and the right panels (b) and (d) show models trained on noncarbonate reactions. The top panels (a) and (b) show the performance of models trained and evaluated on the PCD data set, which represents the upper bound of the OOS scores. The bottom panels (c) and (d) show the OOS performance of the models trained on the TMR data set. A higher OOS score indicates better model generalizability.

Next, we performed tests to understand how well ML models trained on the TMR data set generalize to the PCD data set. The purple diamonds and yellow-brown triangles in Figure 7 show the OOS performances of the linear and XGBoost models trained using the TMR data set but evaluated on the PCD data set. It is interesting to note that XGBoost and linear models have very similar OOS scores for carbonate reactions, but XGBoost clearly outperforms linear models for noncarbonate reactions when more (>30) features are used. Upon further investigation, the features #30 to #40 used on noncarbonate reactions are mostly related to thermodynamic properties of the reactions. The performance drop after feature #30 suggests that relations between thermodynamic features and heating temperatures learned on the TMR data set by linear models do not transfer well to the PCD data set. On the other hand, XGBoost models seem to be able to consistently maintain good performance regardless of the number of features used.

In Figure 7, the difference between LOOCV scores and OOS scores confirms the ML models have degraded prediction performance ($R^2$ drops by 0.1) when applied to a different data set. The performance degradation caused by the data set shift is often inevitable and requires regularly retraining the ML models in order to adapt to the new data sets. However, Figure 7 suggests XGBoost models are more robust against the data set shift and have better generalizability. We hypothesize this is due to the strong regularization and therefore recommend that ML synthesis condition predictors be built with XGBoost or similarly regularized models.

Discussion

ML predictions must be statistically evaluated using large data sets, so this work has focused heavily on reducing the expected prediction errors and improving the coefficient of determination R2. We do not optimize models for any particular reaction but aim at predicting the synthesis conditions over a data set of several thousand synthesis reactions. As demonstrated by the cross-validation and OOS evaluations in Figure and Figure , our models achieve R2 ∼ 0.5–0.6 (MAE ∼ 140 °C) for heating temperature predictions and R2 ∼ 0.3 (MAE ∼ 0.3 log10(h–1)) for heating time predictions. When evaluating these R2 values, it is important to consider that heating temperature and time do not have a single value for a synthesis reaction, as compounds can often be synthesized over a broad range of times and temperatures. As such, our models may be more successful at predicting reaction conditions that successfully created the target, as surmised from the R2 scores. On the basis of the ranking of DI values in Figure , the deciding factors for the synthesis conditions can be organized into a two-level hierarchy. Synthesis temperature prediction is dominated by precursor properties, which we speculate are proxies for reactivity stemming from the mobility of ions, with additional corrections learned for different chemistries. Synthesis time prediction is dominated by experiment-adjacent features that are linked to experimental setups/intentions, also with corrections according to chemistry. The features used in this work to account for reaction thermodynamics were inspired by recent efforts to understand phase evolution during synthesis.[7−9,12,38] These features involve decomposing overall synthesis reactions into a sequence of phase evolution reactions between pairs of compounds and quantifying the grand potential thermodynamic driving force for these phase evolution reactions. 
This approach has proved especially useful for understanding phase evolution pathways observed in in situ experiments. However, in this work, these features are shown to provide little predictive power for synthesis conditions and even cause the models to generalize poorly on OOS data sets (as demonstrated in Figure ). This discrepancy will be discussed in more detail in the subsequent sections.

Synthesis Adjacent Information

We use the particular synthesis of BaTiO3 from BaCO3 and TiO2 precursors to demonstrate how ML models combine synthesis adjacent information with the other regressors. BaTiO3 is a popular compound with many applications in materials science and appears more than 100 times as the synthesis target in the TMR data set. A variety of synthesis temperatures have been reported for BaTiO3 in the literature. For example, BaTiO3 has been synthesized at 1000 °C,[39] 1100 °C,[40] 1200 °C,[41] 1300 °C,[42] and 1400 °C.[43] Here we focus on the effect of how many heating steps are used in the synthesis of BaTiO3. Figure shows the distribution of heating temperatures for all the reactions, BaTiO3 with a single heating step, and BaTiO3 with multiple heating steps in the training data set. It is clear that the reported heating temperatures with a single heating step have a lower center around 1100 °C (for example, see ref (40)), while the entries with multiple heating steps have a higher center around 1300–1400 °C (for example, see ref (43)).

Figure 8

Curves are the estimated distribution of heating temperatures for each group of reactions in the training data set. The dashed/dotted lines show temperature distributions for the reaction TiO2 + BaCO3 → BaTiO3 + CO2 (red dashed line for single-heating reactions and blue dotted line for multiple-heating reactions). Green solid line shows the temperature distribution for the entire data set.

As a result, adding the target composition and experiment-adjacent features allows ML models to identify different groups of data as in Figure and optimize the predicted heating temperature within each group. For example, if 0 means single heating and 1 means multiple heating, then the ML model should have a coefficient for the feature of “is multiple heating” of about 250 °C, roughly equal to the difference between the centers of the two temperature distributions in Figure .
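The role of the “is multiple heating” indicator can be illustrated with a minimal sketch on synthetic data. The group centers (1100 and 1350 °C), the spread, and the sample sizes below are assumptions chosen for illustration, not values from the TMR data set, and ordinary least squares stands in for the regressors used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (made-up) heating temperatures in deg C for two groups of recipes.
n_single, n_multi = 200, 200
t_single = rng.normal(1100.0, 60.0, n_single)  # single heating step
t_multi = rng.normal(1350.0, 60.0, n_multi)    # multiple heating steps

# Design matrix: intercept column plus the 0/1 indicator "is multiple heating".
is_multi = np.concatenate([np.zeros(n_single), np.ones(n_multi)])
X = np.column_stack([np.ones_like(is_multi), is_multi])
y = np.concatenate([t_single, t_multi])

# Ordinary least squares fit of y = intercept + coef * is_multi.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, indicator_coef = coef

# Difference between the two group means, for comparison with the coefficient.
group_gap = t_multi.mean() - t_single.mean()
print(f"intercept = {intercept:.0f} C, indicator coefficient = {indicator_coef:.0f} C")
```

With a single binary regressor, the fitted coefficient equals the difference between the two group means, which is why a value near 250 °C is expected in this toy setting.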

Connection to Tamman’s Rule

Our finding that the average precursor melting point is the most predictive feature for heating temperatures is reminiscent of Tamman’s rule.[44,45] Tamman’s rule can be formulated as predicting that the synthesis temperature of metal alloys should be more than 1/3 (for example, 1/2–2/3) of the precursor melting points. This rule is derived from the observation that atomic diffusion quickly ceases below 1/3 of melting temperatures.[46] Tamman’s empirical rule was never formally defined. It is also questionable whether the rule is applicable to the synthesis of ionic compounds (e.g., oxides) in addition to intermetallics. Nevertheless, variants of Tamman’s rule are still used to help determine solid-state synthesis conditions. For example, Becker and Dronskowski[47] used 2/3 of the melting point of the most “volatile” compound; other values, such as 1/2, have also been used.[45] Our ML framework allows us to formally model and test Tamman’s rule within a statistical approach. We start with Tamman’s original formulation and fit a linear model without an intercept term:

$$T_{\mathrm{Tamman}} = \alpha \cdot \min(T_{\mathrm{melt}}) + \varepsilon$$

where $T_{\mathrm{Tamman}}$ is the predicted heating temperature, $\min(T_{\mathrm{melt}})$ is the minimum of precursor melting points, α is a parameter to be learned, and ε is an error term. Both the prediction and the melting points are expressed in kelvin. The fitted linear model finds α = 1.2 when trained on carbonate reactions and α = 0.8 when trained on noncarbonate reactions. These α values are larger than the commonly used values for Tamman’s rule, such as 1/2 and 2/3, suggesting that the temperatures required for atoms to diffuse significantly are higher in ionic compounds than in intermetallics or that, for ionic compounds, Tamman’s rule is a surrogate for a property other than diffusion. The above linear model is not the model with the highest predictive power (R2 values). As shown in Figure , using average precursor melting points (instead of minimum precursor melting points) yields the highest prediction performance. 
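For a model without an intercept, the least-squares slope has the closed form α = Σ(x·y)/Σ(x²). The sketch below applies it to hypothetical melting points and heating temperatures (placeholders, not data from this work); `fit_through_origin` is an illustrative helper name. Temperatures are kept in kelvin because a purely proportional model is not invariant to shifting the zero of the temperature scale.

```python
import numpy as np

def fit_through_origin(x, y):
    """Least-squares slope alpha for the model y = alpha * x (no intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))

# Hypothetical minimum precursor melting points and heating temperatures, in K.
min_tmelt_k = np.array([1000.0, 1200.0, 1500.0, 1800.0])
t_heat_k = np.array([900.0, 1000.0, 1150.0, 1500.0])

alpha = fit_through_origin(min_tmelt_k, t_heat_k)
print(f"alpha = {alpha:.3f}")
```

In practice the fit in this work was weighted (WLS); the unweighted form is used here only to keep the example short.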
Therefore, we update Tamman’s rule to give the optimal synthesis temperature $T_{\mathrm{Tamman}}$ as proportional to the average of precursor melting points, $\mathrm{avg}(T_{\mathrm{melt}})$, plus a constant. Mathematically, the predictor is defined as

$$T_{\mathrm{Tamman}} = \alpha \cdot \mathrm{avg}(T_{\mathrm{melt}}) + \beta + \varepsilon$$

where α and β are parameters to be learned and ε is an error term. As demonstrated in Figure , fitting a linear model reveals a slope of ∼1/3. Because we used the average of precursor melting points, the predicted heating temperatures should be generally larger than 1/3 of the minimal precursor melting point, agreeing with Tamman’s original observation.[44] The predicted versus reported heating temperatures and the histogram of prediction errors are shown in Figure a. The parameters of the fitted linear model are shown in Figure b. The large F-statistic values and very small p-values show strong statistical significance of the model, although this is contrasted by the low coefficient of determination (R2 ∼ 0.2–0.3). Tamman’s rule is not a perfect predictor and has larger prediction errors at low temperatures. However, it contributes more than 1/3 of the maximal predictive power developed in this work.
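A large F-statistic and a low R2 are not contradictory when the sample is large, as the following sketch on synthetic data suggests. The slope of 1/3, the offset, the noise level, and the sample size are assumptions made for illustration, and plain OLS replaces the WLS fit used in this work. The overall F-statistic is computed from R2 with the standard single-regressor formula F = (R2/k)/((1 − R2)/(n − k − 1)).

```python
import numpy as np

def fit_line(x, y):
    """OLS fit of y = alpha * x + beta; returns (alpha, beta, r2, f_stat)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([x, np.ones_like(x)])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - (alpha * x + beta)
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    n, k = len(y), 1  # k = number of regressors besides the intercept
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return float(alpha), float(beta), r2, f_stat

rng = np.random.default_rng(1)
n = 3000
avg_tmelt = rng.uniform(800.0, 2800.0, n)   # synthetic average melting points
noise = rng.normal(0.0, 330.0, n)           # synthetic scatter in reported temperatures
t_heat = avg_tmelt / 3.0 + 900.0 + noise    # assumed linear trend plus noise

alpha, beta, r2, f_stat = fit_line(avg_tmelt, t_heat)
print(f"alpha = {alpha:.2f}, R2 = {r2:.2f}, F = {f_stat:.0f}")
```

Here the slope is recovered reliably (hence the large F) even though most of the variance in the target remains unexplained (hence the low R2).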

Figure 9

Fitting result of Tamman’s rule, i.e., synthesis temperature is proportional to the average precursor melting point. (a) Scatter plot of the reported vs predicted synthesis temperatures and histogram of prediction error. Opacity indicates data point weights. (b) Regression parameters and F-test for model significance. A very small p-value indicates that it is extremely unlikely the result is due to random noise.

Roles of Phase Evolution Reaction Analysis in Synthesis Condition Prediction

Predicting heating temperature is of major scientific interest. In solid-state synthesis, the final products are more sensitive to the heating temperature than to the heating time, because temperatures that are too low or too high lead to incomplete reactions, impurities, or the complete absence of a desired target phase. Thus, heating temperatures are more carefully optimized than heating times, which are often chosen for convenience (e.g., to run overnight). There have been many successful examples where solid-state synthesis pathways are rationalized using the thermodynamics of reactions occurring during heating. For example, thermodynamic driving forces have been used to understand and control phase evolution pathways in Y–Mn–O oxides,[12,38] Y–Ba–Cu–O superconductors,[8] Na–Co–O layered oxides,[7] and MgCr2S4 thiospinel compounds.[9] Inspired by this work, we computed features as numerical transformations of the thermodynamic driving forces obtained by decomposing the synthesis into multistep phase evolution paths. Contrary to the success in reconciling experimental observations in the aforementioned systems, these features are shown to provide no observable predictive power for general synthesis condition predictions in this work (as shown in Figure and Figure ). A low contribution of predictive power does not necessarily negate the effectiveness of phase evolution reaction analysis for understanding solid-state synthesis. It simply suggests that the features developed in this work are not correlated with the synthesis time and temperature over the diverse data sets evaluated in this work. We hypothesize this arises for a few reasons. First, the scale of the reaction driving force may dictate the decision boundary of synthesizable/nonsynthesizable conditions (e.g., synthesis should not occur at temperatures where the target phase is unstable with respect to decomposition). 
However, the data set used here only contains positive experimental results, so the thermodynamic stability of the target under the chosen synthesis conditions is likely already achieved for all data points. Indeed, in the rationalization of in situ synthesis, thermodynamic analysis has been used more to explain the phases observed along the reaction path rather than the specific conditions.[7,8,38] Second, once we are in the region of synthesizable conditions, the reaction driving force might become insufficient for determining synthesis conditions that lead to “fast” reactions. Because a typical lab synthesis needs to be completed in a reasonable period of time, experimenters may decide to raise heating temperatures to facilitate better reaction rates. Indeed, if we calculate the temperature $T_{\Delta\Phi=0}$ at which the reaction driving force is zero for the overall synthesis reaction (using the grand potential, ΔΦ = 0) for all the reactions, we find that this theoretical lower bound of heating temperatures is generally much lower than the reported experimental temperature $T_{\mathrm{exp}}$. This suggests experimenters actively use $T_{\mathrm{exp}} \gg T_{\Delta\Phi=0}$ to achieve better kinetics. Unfortunately, reaction driving force analyses do not directly provide kinetic information, which is also chemistry-specific. On the other hand, precursor melting points and formation energies (ΔG300K, ΔH300K) may be correlated to ion transport kinetics, as they are indicative of the relative strength of bonds in the solid precursors. This may explain why precursor material properties are the top predictive features for heating temperatures. Previously, we demonstrated that precursor melting points (akin to Tamman’s rule) provide the most predictive power for heating temperatures if only one feature is allowed (see IDI values in Figure ). We note here that the effectiveness of Tamman’s rule may also be due to the aforementioned selection bias[48] toward fast solid-state syntheses (as well as community knowledge of Tamman’s rule). 
This selection bias is inherent in the synthesis data set used in this work as the literature only reports “fast” and successful solid-state reactions. We note that some recent investigations of solid-state synthesis mechanisms[8,49] have put more emphasis on modeling reaction speeds. In addition, with the recent developments of autonomous synthesis robots,[50−53] data on synthesizability and reaction speeds could be collected at the same time with a much higher throughput. Such data will be valuable for decorrelating selection bias and developing broadly applicable synthesis condition predictors.

Challenges of Predicting Synthesis Conditions Using Text-Mined Data

The performance of the ML models in this work is reasonable, but there is still much room for improvement before they can be applied broadly in practical synthesis design efforts. Below, we summarize a few aspects that could increase model performance in the future.

Better Synthesis Features

Features are limiting factors in creating ML models with high predictive power. This work used 133 features spanning four categories: precursor material properties, target material compositions, reaction thermodynamics, and experiment-adjacent features. Beyond these, another useful set of features may be factors that indicate the intention of the synthesis. For example, the application for which the target compound is created (battery materials vs thermoelectric materials), the desired microstructure or morphology of the target material (single-crystal or spin-coated materials), etc. may all play a role in the determination of synthesis conditions. These features are expressed in papers in more subtle ways and could potentially be text-mined using advanced NLP techniques in the future.[54,55]

Improved NLP Data Collection

As a result of the probabilistic nature of the text-mining pipeline that extracted the data sets in this work, errors in the training data are inevitable.[16] Manual inspection reveals that 5% of heating temperatures and 16% of heating times were incorrectly extracted. Improved text-mining algorithms can thus improve data quality and increase ML model performance.

Modeling Nonuniqueness

In this work, we modeled synthesis condition predictions as point value regression problems. However, this may be suboptimal, as the conditions where a given synthesis can proceed are nonunique and often span a range of values. Consequently, there is not a unique ground truth of optimal synthesis conditions, which brings irreducible error to ML models. The issue of nonuniqueness is even more problematic for heating time prediction. If the synthesis finishes within t0, then any heating time t > t0 will yield the desired compound, if it is thermodynamically stable at the synthesis conditions and no selective evaporation of elements occurs. As a result, heating time is seldom optimized but is instead based heavily on furnace heating schedules, lab shifts, etc. Indeed, in Figure , our ML models have larger errors for predicting heating time than for predicting heating temperature. Modeling synthesis conditions as distributions, e.g., generalized linear models,[56] could in principle solve this issue. Note that sufficient training samples must be collected to get accurate condition distribution estimations (as well as uncertainties). Ideally, there would be several conditions sampled for each target that was synthesized in the data set. However, in the TMR data set, even when expanding the search to chemical systems (any targets having the same set of elements), more than 60% contain fewer than 5 reported syntheses. Furthermore, the distribution learned from the TMR data set may be biased by external factors. For example, for popular Li-ion cathode/anode materials in our data set, the distribution of different synthesis conditions may be correlated with the desired microstructure for a particular electrochemical performance. Decorrelating these factors requires mining of other features/properties beyond the synthesis reactions themselves.

Negative Samples

Negative experimental results are rarely reported in papers. Nevertheless, from an ML point of view, negative data are extremely useful for learning the exact decision boundaries of synthesis conditions. Besides, negative data can be used in other classification tasks, such as predicting the type of synthesis techniques, heating atmospheres, etc.

Finally, we note that the models in this work focused primarily on oxides, which make up a substantial fraction of inorganic compounds but not all.[57] Transferring predictive models trained on oxides to other chemistries is challenging because of significant concept drift. For example, the bonding of other types of compounds, such as nonoxide chalcogenides and intermetallics, is fundamentally different from that of oxides, leading to different self-diffusion and interdiffusion rates. This difference modifies the distributions of feature values significantly (e.g., melting points are systematically lower for metal precursors compared to oxides). If simply applied to other chemistries without any retraining, the parameters fit for oxide compounds would systematically mispredict the synthesis conditions. However, if sufficient data become available for nonoxide materials classes of interest, the methods used in this work would be useful for training and interpreting these new models.

Conclusion

In this work, we developed an interpretable ML method for predicting solid-state synthesis heating temperatures and times on over 6300 synthesis reactions, drawn from a larger (over 30 000) synthesis data set text-mined from scientific literature.[16] The goodness-of-fit values are R2 ∼ 0.5–0.6 for temperature prediction and R2 ∼ 0.3 for time prediction. However, interpretation of such R2 values has to consider the fact that there is no single exact time or temperature for a typical synthesis. For heating temperature prediction, which is an important parameter for solid-state synthesis, the prediction MAE of our model is ∼140 °C, comparable to a similar study using a generative conditional variational autoencoder (CVAE).[19] Heating time prediction has an MAE of ∼0.3 log10(h–1), which translates to a prediction range [0.5t, 2t] if the predicted time is t. The expected prediction errors can be estimated from Figure . Analysis of the ML models reveals that melting points and formation energies of precursors are good predictors for heating temperatures, which led us to extend Tamman’s rule from intermetallics to oxide compounds for predicting heating temperatures as linearly proportional to the average precursor melting point. One may use this extended Tamman’s rule to set quick, yet reasonable, initial heating temperatures for new solid-state reactions. The maps of compositional effects (Figure ) can be further used as guides to choose synthesis conditions with better accuracy given the chemistries of interest. Our model was trained and validated on a diverse set of materials and thus has broad applicability. Moreover, the ML methodologies developed in this work can be applied for learning synthesis conditions on other large synthesis data sets, such as solution-based synthesis of inorganic compounds and nanoparticles,[58,59] or even other tasks where strong model interpretability is preferred.
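The conversion from an MAE of ∼0.3 in log10 units to the range [0.5t, 2t] follows from 10^0.3 ≈ 2. A minimal sketch of this arithmetic is given below; `time_range` is an illustrative helper name, and the 12 h prediction is an arbitrary example rather than a result from this work.

```python
def time_range(t_hours, mae_log10=0.3):
    """Range of heating times within one MAE of a prediction, in hours.

    An error of +/- mae_log10 in log10(1/t) is the same size as an error
    in log10(t), so the bounds are t divided and multiplied by 10**mae_log10.
    """
    factor = 10.0 ** mae_log10
    return t_hours / factor, t_hours * factor

factor = 10.0 ** 0.3
low, high = time_range(12.0)  # 12 h is an arbitrary example prediction
print(f"factor = {factor:.3f}; 12 h -> [{low:.1f} h, {high:.1f} h]")
```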

Methods

Curation of Synthesis Training Data

We used the data set of text-mined synthesis recipes that consists of 30 004 solid-state synthesis records[16] to generate the TMR data set. We took the synthesis conditions of the last heating step in the experimental procedures as the target of prediction. The synthesis heating temperatures were predicted in degrees Celsius. The reported heating times were transformed to log10(1/t), which is not only a better variable for measuring reaction speed but also shows smaller skewness and long-tailedness, which is better predicted by statistical ML models.[29] Note that the TMR data set is extracted using ML models and contains errors in synthesis conditions. On the basis of manual inspection, about 5% of the heating temperatures and 16% of the heating times were incorrectly extracted. To preprocess the data set, we first removed all entries with no extracted synthesis heating temperatures and times. To obtain thermodynamic data for all targets, we utilized the Materials Project (MP) database.[57] For targets that appear as entries in MP, we simply used the reported thermodynamic information. For targets without a direct match to an MP entry, we performed interpolation by representing them using linear combinations of the most similar entries in MP as measured by the difference in composition (see Supporting Information for calculation details). The 0 K thermodynamic data was then transformed to finite-temperature Gibbs free energies of formation using the previously developed method.[60] Using the finite-temperature ΔG(T) predictions and thermodynamic properties of gases, we computed reaction driving forces, i.e., the grand potential change for the synthesis reactions, ΔΦ, by assuming the system is open to atmospheric partial pressures of O2 and CO2.[61−63] The reactions were then decomposed into phase evolution steps by selecting pairs of reactants with the largest grand potential change in each step. 
Details of the thermodynamic quantity calculation and phase evolution construction can be found in the Supporting Information and can be reproduced using the provided code. We removed the reactions that cannot be handled by the above thermodynamic calculations (e.g., missing relevant MP entries or containing gases other than O2 and CO2), leading to 7562 remaining reactions. As a result of the release of CO2 gas from carbonate precursor materials, the reaction driving forces have systematically shifted distributions for reactions with and without carbonate precursors. Grouping the data set into carbonate and noncarbonate reactions thus fits two sets of coefficients that account for this shift and improves the overall performance. Therefore, in our analysis, we split the data set into carbonate reactions and noncarbonate reactions.

The original Pearson’s Crystal Data (PCD) collection is semistructured, containing chemical formulas of input/output materials and a natural language description of the synthesis procedure. We used the same approach as in the generation of the TMR data set to balance synthesis reactions and calculate phase evolution reaction thermodynamic driving forces. The synthesis procedure description text is used to text-mine synthesis operations that contain synthesis condition values. To give the PCD data set a chemistry distribution similar to that of the TMR data set, we only kept oxide syntheses, as the TMR data set is dominated by oxide syntheses. We also ensured there are no duplicates by removing any entries in the PCD data set that are also in the TMR data set by matching their article DOIs.

Features for Synthesis Prediction

For each reaction in the curated training data, we computed four types of synthesis features (133 features in total).

Precursor Compound Properties

The first type of features (12 in total) are the average/minimum/maximum/difference of melting points, standard enthalpy of formation ΔH300K, and standard Gibbs free energy of formation ΔG300K of the precursors. The melting points were retrieved from the NIST Chemistry WebBook[64] and PubChem databases,[65] while the thermodynamic properties were retrieved from the FREED database,[66] an electronic compilation of experimentally obtained thermodynamic data from the U.S. Bureau of Mines (USBM).
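The count of 12 follows from four statistics applied to three precursor properties. The sketch below shows one way to assemble such features; the function names are illustrative, the property values are placeholders rather than database entries, and “difference” is assumed here to mean maximum minus minimum, which the text does not state explicitly.

```python
from statistics import mean

def precursor_stats(values):
    """Average, minimum, maximum, and (assumed) max-minus-min difference."""
    return {
        "avg": mean(values),
        "min": min(values),
        "max": max(values),
        "diff": max(values) - min(values),
    }

def precursor_features(props):
    """props maps a property name to the list of per-precursor values."""
    feats = {}
    for name, values in props.items():
        for stat, value in precursor_stats(values).items():
            feats[f"{name}_{stat}"] = value
    return feats

# Placeholder values for a two-precursor reaction (NOT real database entries).
example = {
    "melting_point": [1100.0, 2100.0],
    "dH_300K": [-1200.0, -900.0],
    "dG_300K": [-1100.0, -850.0],
}
feats = precursor_features(example)
print(len(feats), "features")
```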

Target Compound Compositional Features

The second type of features are 74 indicator variables representing the presence (1) or absence (0) of different chemical elements in the target compound. We did not use more differentiating features such as the fractional compositions of each element because more than 60% of the chemical systems in the TMR data set have fewer than 5 samples, and more differentiating features make ML models prone to overfitting. Note that this may not be true if training data were to become relatively abundant for each chemical system, in which case numerical encoding of the compositions may be a better approach.
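A presence/absence encoding of this kind can be sketched as follows. The six-element vocabulary is a shortened stand-in for the 74 elements used in this work, and the regular expression only identifies element symbols in a formula string; it is an illustration, not the parser used by the authors.

```python
import re

# Short illustrative vocabulary; the paper uses 74 elements.
ELEMENTS = ["Ba", "Ti", "O", "Li", "Co", "Mn"]

def elements_in(formula):
    """Set of element symbols appearing in a formula string."""
    return set(re.findall(r"[A-Z][a-z]?", formula))

def indicator_vector(formula, vocabulary=ELEMENTS):
    """1 if the element is present in the target, 0 otherwise."""
    present = elements_in(formula)
    return [1 if element in present else 0 for element in vocabulary]

print(indicator_vector("BaTiO3"))
print(indicator_vector("LiCoO2"))
```

Because stoichiometric coefficients are ignored, targets in the same chemical system (for example BaTiO3 and Ba2TiO4) share one encoding, which is the intended coarse-graining.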

Reaction Thermodynamics Features

We used 33 thermodynamic features, including the total reaction driving force $\Delta\Phi_{\mathrm{total}}$, first and last pairwise reaction driving forces $\Delta\Phi_{\mathrm{first}}$ and $\Delta\Phi_{\mathrm{last}}$, and the ratio between first/last pairwise reaction driving force and the total reaction driving force, evaluated at different temperatures T = 800, 900, 1000, 1100, 1200, and 1300 °C. We also calculated the slopes of $\Delta\Phi_{\mathrm{total}}$, $\Delta\Phi_{\mathrm{first}}$, and $\Delta\Phi_{\mathrm{last}}$ by assuming they are linear with respect to temperature and used the slopes as additional features.
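One accounting consistent with the total of 33 is five quantities at six temperatures plus three slopes (5 × 6 + 3). The sketch below builds features this way from hypothetical driving-force values in arbitrary units; the feature names and the use of `np.polyfit` for the slopes are assumptions, not the authors' implementation.

```python
import numpy as np

TEMPS_C = np.array([800, 900, 1000, 1100, 1200, 1300], dtype=float)

def thermo_features(total, first, last):
    """Build 33 features from driving forces evaluated at TEMPS_C."""
    total, first, last = (np.asarray(a, dtype=float) for a in (total, first, last))
    series = {
        "total": total,
        "first": first,
        "last": last,
        "first_over_total": first / total,
        "last_over_total": last / total,
    }
    feats = {}
    # Five quantities at six temperatures: 30 features.
    for name, values in series.items():
        for temp, value in zip(TEMPS_C, values):
            feats[f"{name}_{int(temp)}C"] = float(value)
    # Slopes of the three driving forces from a linear fit: 3 features.
    for name in ("total", "first", "last"):
        feats[f"{name}_slope"] = float(np.polyfit(TEMPS_C, series[name], 1)[0])
    return feats

# Hypothetical driving forces (arbitrary units), linear in temperature.
total = -0.2 - 0.001 * TEMPS_C
feats = thermo_features(total, 0.6 * total, 0.4 * total)
print(len(feats), "features")
```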

Experiment-Adjacent Features

The fourth type of features are 14 experiment-adjacent features, i.e., indicator variables representing whether certain devices (zirconia balls for ball-milling), experimental procedures (sintering, ball-milling, multiple heating steps, homogenization, repeated grinding, diameter measurement, polycrystalline preparation), and additives (binder materials, distilled water and other liquid additives, phosphors, poly(vinyl alcohol)) were used in the synthesis.

Because we used WLS, which is sensitive to outliers, we applied outlier detection algorithms to the feature values and removed around 10% of the reactions. The final training data consist of two data sets totaling 6325 reactions: 3182 carbonate reactions and 3143 noncarbonate reactions.

Training and Evaluation of ML Models

We used linear and nonlinear regressors to train the ML models. For linear models, we used WLS, a weighted version of ordinary least squares, as implemented in the Python packages scikit-learn[67] and statsmodels.[35] For nonlinear models, we used the XGBoost package[36] and trained GBRT models. To evaluate the model goodness-of-fit, we used the coefficient of determination, R-squared (or R2). For nonlinear regressors and out-of-sample evaluations, R2 is poorly defined, and Efron’s extended version[68] of Pseudo-R2 was used. Pseudo-R2 is calculated as 1 – (mean square error/variance of data) and is directly comparable to R2 values. We implemented DI analysis, a model-agnostic method that calculates the average increase of model R2 to rank features according to their contribution of predictive power. Three types of DI values (APDI, IDI, and IADI values) were computed according to Azen and Budescu.[31] However, computing the exact APDI values for all 133 features would require training $2^{133}$ (sub)models, which is computationally prohibitive. Instead, we estimated APDI values as the average increase in R2 over 200 randomly sampled submodels for each feature. All the features were ranked according to the sum of the APDI, IDI, and IADI values. This ranking measures the relative predictive powers of the features and was used to sort all features into an ordered list, as in Figure . We next used the ranking of predictive power to perform forward feature selection for the ML models. Specifically, we started with a linear model with no features but the intercept term. Features were sequentially added into the linear model according to the ranking of predictive power. In this process, we calculated the BIC value of the linear models and removed any feature that would increase the BIC value (an indicator of overfitting). The final list of features was then used in training the models in Figures and 7. We performed LOOCV to cross-validate regressors and detect overfitting. 
To test model generalizability, we applied out-of-sample prediction by evaluating model performances on another synthesis condition data set compiled from the PCD data set.[32]
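The Pseudo-R2 definition and the BIC-guided forward selection described above can be sketched as follows. This is an illustration under stated assumptions: plain OLS replaces WLS, the BIC is taken in its common Gaussian form n·ln(RSS/n) + k·ln(n) (the source does not give the exact expression), and the data are synthetic, with column 1 constructed as a redundant copy of column 0. All function names are hypothetical.

```python
import numpy as np

def pseudo_r2(y_true, y_pred):
    """1 - (mean square error / variance of data)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.mean((y_true - y_pred) ** 2) / np.var(y_true)

def bic(y, y_pred, n_params):
    """Gaussian-likelihood BIC up to an additive constant (assumed form)."""
    n = len(y)
    rss = float(np.sum((y - y_pred) ** 2))
    return n * np.log(rss / n) + n_params * np.log(n)

def fit_predict(X, y, cols):
    """OLS with an intercept on the chosen columns; returns fitted values."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

def forward_select(X, y, ranked_cols):
    """Add features in ranked order; skip any that would increase the BIC."""
    selected = []
    best = bic(y, fit_predict(X, y, selected), 1)  # intercept-only model
    for c in ranked_cols:
        trial = selected + [c]
        score = bic(y, fit_predict(X, y, trial), len(trial) + 1)
        if score < best:
            selected, best = trial, score
    return selected

rng = np.random.default_rng(2)
n = 200
x0, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([x0, x0.copy(), x2])  # column 1 duplicates column 0
y = 3.0 * x0 + 2.0 * x2 + rng.normal(size=n)

selected = forward_select(X, y, ranked_cols=[0, 1, 2])
r2 = pseudo_r2(y, fit_predict(X, y, selected))
print(selected, round(r2, 2))
```

The duplicated column leaves the residuals unchanged while adding one parameter, so the BIC rises and the feature is rejected, which mirrors the overfitting check described in the text.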

25 in total

1. The dominance analysis approach for comparing predictors in multiple regression.

Authors: Razia Azen; David V Budescu
Journal: Psychol Methods Date: 2003-06

2. Observing and Modeling the Sequential Pairwise Reactions that Drive Solid-State Ceramic Synthesis.

Authors: Akira Miura; Christopher J Bartel; Yosuke Goto; Yoshikazu Mizuguchi; Chikako Moriyoshi; Yoshihiro Kuroiwa; Yongming Wang; Toshie Yaguchi; Manabu Shirai; Masanori Nagao; Nataly Carolina Rosero-Navarro; Kiyoharu Tadanaga; Gerbrand Ceder; Wenhao Sun
Journal: Adv Mater Date: 2021-05-05 Impact factor: 30.849

3. Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks.

Authors: Edward Kim; Zach Jensen; Alexander van Grootel; Kevin Huang; Matthew Staib; Sheshera Mysore; Haw-Shiuan Chang; Emma Strubell; Andrew McCallum; Stefanie Jegelka; Elsa Olivetti
Journal: J Chem Inf Model Date: 2020-01-28 Impact factor: 4.956

4. Text-mined dataset of gold nanoparticle synthesis procedures, morphologies, and size entities.

Authors: Kevin Cruse; Amalie Trewartha; Sanghoon Lee; Zheren Wang; Haoyan Huo; Tanjin He; Olga Kononova; Anubhav Jain; Gerbrand Ceder
Journal: Sci Data Date: 2022-05-26 Impact factor: 8.501

5. Dataset of solution-based inorganic materials synthesis procedures extracted from the scientific literature.

Authors: Zheren Wang; Olga Kononova; Kevin Cruse; Tanjin He; Haoyan Huo; Yuxing Fei; Yan Zeng; Yingzhi Sun; Zijian Cai; Wenhao Sun; Gerbrand Ceder
Journal: Sci Data Date: 2022-05-25 Impact factor: 8.501

6. Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science.

Authors: Amalie Trewartha; Nicholas Walker; Haoyan Huo; Sanghoon Lee; Kevin Cruse; John Dagdelen; Alexander Dunn; Kristin A Persson; Gerbrand Ceder; Anubhav Jain
Journal: Patterns (N Y) Date: 2022-04-08

7. Mechanistic insight of KBiQ₂ (Q = S, Se) using panoramic synthesis towards synthesis-by-design.

Authors: Rebecca McClain; Christos D Malliakas; Jiahong Shen; Jiangang He; Chris Wolverton; Gabriela B González; Mercouri G Kanatzidis
Journal: Chem Sci Date: 2020-11-23 Impact factor: 9.825

8. Machine-learned and codified synthesis parameters of oxide materials.

Authors: Edward Kim; Kevin Huang; Alex Tomala; Sara Matthews; Emma Strubell; Adam Saunders; Andrew McCallum; Elsa Olivetti
Journal: Sci Data Date: 2017-09-12 Impact factor: 6.444

9. PubChem in 2021: new data content and improved web interfaces.

Authors: Sunghwan Kim; Jie Chen; Tiejun Cheng; Asta Gindulyte; Jia He; Siqian He; Qingliang Li; Benjamin A Shoemaker; Paul A Thiessen; Bo Yu; Leonid Zaslavsky; Jian Zhang; Evan E Bolton
Journal: Nucleic Acids Res Date: 2021-01-08 Impact factor: 16.971

10. Kinetically Stabilized Cation Arrangement in Li₃ YCl₆ Superionic Conductor during Solid-State Reaction.

Authors: Hiroaki Ito; Kazuki Shitara; Yongming Wang; Kotaro Fujii; Masatomo Yashima; Yosuke Goto; Chikako Moriyoshi; Nataly Carolina Rosero-Navarro; Akira Miura; Kiyoharu Tadanaga
Journal: Adv Sci (Weinh) Date: 2021-06-17 Impact factor: 16.806