
Crystal graph attention networks for the prediction of stable materials.

Jonathan Schmidt, Love Pettersson, Claudio Verdozzi, Silvana Botti, Miguel A L Marques.

Abstract

Graph neural networks for crystal structures typically use the atomic positions and the atomic species as input. Unfortunately, this information is not available when predicting new materials, for which the precise geometrical information is unknown. We circumvent this problem by replacing the precise bond distances with embeddings of graph distances. This allows our networks to be applied directly in high-throughput studies based on both composition and crystal structure prototype without using relaxed structures as input. To train these networks, we curate a dataset of over 2 million density functional calculations of crystals with consistent calculation parameters. We apply the resulting model to the high-throughput search of 15 million tetragonal perovskites of composition ABCD2. As a result, we identify several thousand potentially stable compounds and demonstrate that transfer learning from the newly curated dataset reduces the required training data by 50%.

Year:  2021        PMID: 34860548      PMCID: PMC8641929          DOI: 10.1126/sciadv.abi7948

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.136


INTRODUCTION

Machine learning methods have found increasing success in materials science and solid-state physics (–). Requiring orders of magnitude less computation time than traditional approaches such as density functional theory (DFT), machine learning methods allow for the prediction of material properties with close to ab initio accuracy. In the past few years, machines were developed to predict a plethora of physical properties, ranging from bandgaps (–) and hardness () to magnetic transition temperatures (). A particularly interesting property is the energy, which ultimately determines the stability of a given material. It is therefore not unexpected that predicting the energy is essential for the challenging task of finding new stable compounds.

The modern theoretical approach to finding new materials involves scanning the whole composition space of one crystal structure, optimizing each crystal with DFT, and then comparing the DFT energy with all possible decomposition channels. Binary composition spaces are easily within the reach of DFT and have been extensively explored in the past (). However, there are already around 10⁵ different ternary combinations of chemical elements, and these can exist in a large number of different stoichiometries. Quaternary and higher prototypes are simply out of the reach of systematic DFT searches.

Machine learning strategies have shown a lot of promise in speeding up this process (–), with different approaches being proposed in the past. The main and most efficient approach to high-throughput searches is to calculate the distance to the convex hull of thermodynamic stability for all compositions of a single prototype (–). This step can be substantially accelerated by a machine learning model trained for the specific prototype, requiring separate training data for every prototype (, , ). An alternative is the development of composition-based models that are prototype agnostic (, ). These models can determine potentially stable compositions; however, they do not yield any information about the crystal structure of the material. There is furthermore a large number of message passing networks (MPNs) (–, –) that predict formation or absolute energies based on atomic positions and compositions. These networks usually achieve very high accuracy but, unfortunately, require a priori knowledge of the crystal structures (both lattice vectors and positions of the atoms) that is, in general, not available when searching for new materials.

In this work, we go beyond these approaches by developing a model that predicts the distance to the convex hull based on both the composition and the generic structure prototype, but without requiring knowledge of the precise crystal structure. A previous approach following the same philosophy is (), where handcrafted structural features based on the Voronoi tessellation of the unit cell are used as input for random forests. A more recent work () also used Voronoi features, but, unfortunately, the error increases markedly for nonrelaxed structures [see the supplementary material of ()]. To circumvent these problems, instead of using handcrafted features, we combine techniques from previous deep MPNs (, , , , , ). The goal of this work is to speed up prototype-based high-throughput searches beyond the possibilities of previous machine learning models. We remark that, as our model does not make use of the complete structural information, it does not give us access to forces and stresses and therefore cannot be used as a generic force field.
Just as important as the choice of model is, of course, the dataset. The datasets most commonly used for machine learning are obtained from the materials project () and the open quantum materials database (OQMD) (). The former is often used as a benchmark and as a training set for stability prediction [see, e.g., ()]. However, as it contains mostly stable (or close to stable) compounds, one cannot realistically evaluate the error in the distance to the convex hull (or the formation energy) of a model trained exclusively on this dataset. The OQMD, on the other hand, is hard to combine with other large datasets because of the use of incompatible parameters in the DFT calculations. To construct a large dataset that allows for good transfer learning performance, we accumulated and curated data from various sources. In this way, we obtained a dataset that includes more than 2 million DFT calculations of both stable and unstable materials in a large variety of crystal structures.

The remainder of this manuscript is structured as follows. In the next section, we present the developed model and the accumulated dataset. We then show the power of our model by studying in detail a quaternary family of perovskites. After the discussion of our results, we go over the details of the work in Materials and Methods.

RESULTS

Crystal graph attention networks

The crystal structure prototype will enter our model as a crystal graph. To incorporate the neighborhood information, each vertex is labeled by an embedding for the elemental species, and each edge by an embedding for the graph distance (see Fig. 1). The edge embeddings are initialized completely randomly, while the vertex embeddings are pretrained embeddings from (). For the latter, an extra one-layer fully connected network is used to allow for possible changes in the embedding.
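As an illustration of this construction, the following pymatgen-based sketch (our own illustration; the 24-neighbor limit follows Materials and Methods) labels each edge with the neighbor order rather than the bond distance, so an unrelaxed prototype suffices as input:

```python
# Sketch of the crystal-graph construction described above: edges carry only
# the neighbor order (first, second, ... neighbor), not the bond distance.
# Illustrative only; neighbors at equal distance could share a rank, but here
# we simply use the ordered index.
from pymatgen.core import Structure

def crystal_graph(structure: Structure, max_nbr: int = 24, cutoff: float = 8.0):
    """Return (species, edges); each edge is (i, j, rank), with rank the
    neighbor order of j around i, which is embedded instead of the distance."""
    species = [site.specie.symbol for site in structure]   # vertex labels
    edges = []
    for i, site in enumerate(structure):
        nbrs = sorted(structure.get_neighbors(site, cutoff),
                      key=lambda n: n.nn_distance)[:max_nbr]
        for rank, n in enumerate(nbrs):
            edges.append((i, n.index, rank))               # rank -> edge embedding
    return species, edges
```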
Fig. 1.

The crystal structure is transformed into a graph.

In this case, the crystal structure is a mixed perovskite, and we consider the five nearest neighbors. Here, blue edges represent first neighbors, black edges represent second neighbors, and green edges represent third neighbors. During the message passing steps, each individual edge and vertex embedding is updated based on its neighborhood.

As in the MegNet model (), during the message passing phase, the information of each vertex and its corresponding neighbors and edges is combined to calculate a new representation for each vertex and edge. This process is repeated several times until we arrive at a final representation for each vertex. This representation is then pooled, taking into account a global context vector along the ideas of (). In the following, we discuss the mathematical details of the message passing phase; a sketch of the update equations is given below. Vectors are denoted by bold letters.

Each material starts with a representation for each atom i. This representation is updated through a message passing approach in which an update function U combines the previous representation of the vertex with the representations of its neighbors and of the edges connecting the neighbors to the vertex. The neighbors are ordered by distance, and the edges are assigned corresponding embeddings for first neighbor, second neighbor, etc. These embeddings start as randomly initialized vectors and are trained together with the rest of the network. Naturally, one has to use periodic boundary conditions and a cutoff radius, as we consider solids. The update function is based on the attention mechanism that has revolutionized natural language processing () and has also found application in graph neural networks (, , ). Previous graph attention networks applied to materials science used simple fully connected neural networks (FCNNs) to calculate a number of coefficients from the concatenation ‖ of the two vertex representations and the edge representation; here, and in the following, the index n counts the FCNNs. These coefficients are normalized with a softmax function. In contrast to these previous works, we use vectors instead of scalar coefficients, which effectively results in a separate attention coefficient for each element of the representation of each node/message. The resulting attention vectors are used to weight the messages when the representation is updated. Every such pair of attention and message networks can be considered one attention head. Ideally, each attention head learns to direct its focus to different features. This message passing procedure is repeated a number of times. At each message passing step, the edge embeddings are updated in a similar manner. The FCNNs involved (HFCNNs) are hypernetworks along the lines of (), where the parameters of each network depend on the starting state of the node/edge and the state at step t.

Recently, () suggested using an extra global compositional vector for a last attention-based pooling layer. We follow this approach and use an extra roost () model that calculates a representation vector C of the total composition, also based on a graph attention network, where the composition enters as a complete graph. This vector is then concatenated with the representation of each atom and used to calculate a final representation of the compound. In this way, the network can once again evaluate the importance of the different elements using learned knowledge of the whole composition.
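In symbols, the message passing just described can be sketched as follows. This LaTeX block is our reconstruction from the prose above; the notation and the exact functional forms (such as the residual form of the updates and the per-head averaging) are assumptions rather than the authors' original formulas.

```latex
% Sketch of the update equations described in the text (reconstructed).
% h_i^t: representation of atom i at step t; e_ij^t: embedding of the edge
% between i and its neighbor j; || denotes concatenation.
\begin{align*}
  \mathbf{h}_i^{t+1} &= U_t\!\left(\mathbf{h}_i^{t},
    \{\mathbf{h}_j^{t}\}_{j \in N(i)}, \{\mathbf{e}_{ij}^{t}\}_{j \in N(i)}\right)
    && \text{generic update} \\
  \mathbf{a}_{ij}^{n} &= \operatorname*{softmax}_{j \in N(i)}
    \mathrm{FCNN}_{a}^{n}\!\left(\mathbf{h}_i^{t} \,\|\, \mathbf{h}_j^{t} \,\|\, \mathbf{e}_{ij}^{t}\right)
    && \text{vector attention, head } n \\
  \mathbf{h}_i^{t+1} &= \mathbf{h}_i^{t}
    + \frac{1}{N_{\mathrm{heads}}} \sum_{n} \sum_{j \in N(i)}
    \mathbf{a}_{ij}^{n} \odot
    \mathrm{FCNN}_{m}^{n}\!\left(\mathbf{h}_i^{t} \,\|\, \mathbf{h}_j^{t} \,\|\, \mathbf{e}_{ij}^{t}\right)
    && \text{weighted messages} \\
  \mathbf{e}_{ij}^{t+1} &= \mathbf{e}_{ij}^{t}
    + \mathrm{HFCNN}\!\left(\mathbf{h}_i^{t} \,\|\, \mathbf{h}_j^{t} \,\|\, \mathbf{e}_{ij}^{t}\right)
    && \text{edge update}
\end{align*}
```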
A fully connected network with residual connections (FCNN) () is used to obtain the final output. While the study of Bartel et al. () recommends learning formation energies, other machine learning studies came to different conclusions () and showed that it is advantageous to predict the distance to the convex hull directly. In particular, in (), the authors obtained lower errors when learning formation energies; however, they showed that stability predictions were more successful when distances to the hull were predicted. For simplicity, we decided to predict directly the quantity we are interested in: the distance to the convex hull of stability.
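To make the vector attention concrete, the following minimal PyTorch sketch implements one such message passing step under our assumptions (single linear layers in place of the FCNNs, no hypernetwork parameterization, no edge update); it is an illustration, not the authors' CGAT implementation.

```python
# Minimal sketch of one vector-attention message passing step as described
# above. Illustration only: single linear layers stand in for the FCNNs.
import torch
import torch.nn as nn

class VectorAttentionStep(nn.Module):
    def __init__(self, dim: int, heads: int = 6):
        super().__init__()
        # one attention and one message network per head, acting on the
        # concatenation (vertex i || vertex j || edge ij)
        self.attn = nn.ModuleList([nn.Linear(3 * dim, dim) for _ in range(heads)])
        self.msg = nn.ModuleList([nn.Linear(3 * dim, dim) for _ in range(heads)])

    def forward(self, h, edge_index, e):
        # h: (n_atoms, dim); edge_index: (2, n_edges) as (source, target);
        # e: (n_edges, dim) neighbor-order (graph distance) embeddings
        src, dst = edge_index
        z = torch.cat([h[dst], h[src], e], dim=-1)
        out = torch.zeros_like(h)
        for attn, msg in zip(self.attn, self.msg):
            w = torch.exp(attn(z))                    # unnormalized, per feature
            norm = torch.zeros_like(h).index_add_(0, dst, w)
            alpha = w / (norm[dst] + 1e-12)           # softmax over the neighbors
            out.index_add_(0, dst, alpha * msg(z))    # weighted messages
        return h + out / len(self.attn)               # residual update
```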

Data

As already discussed in Introduction, the datasets most commonly used for machine learning are the materials project () and the OQMD (), as they use internally consistent parameters. The largest freely available database, Automatic FLOW for Materials Discovery (AFLOW) (), is used less commonly even though it contains over 3 million compounds. Unfortunately, the OQMD cannot easily be combined with either the materials project or AFLOW, as its calculation parameters are most often not compatible with the other two. To construct a consistent dataset, we therefore decided to use all data from the materials project, as well as all data from AFLOW that used functionals, pseudopotentials, and Hubbard U values consistent with the materials project. Furthermore, we added about 1.3 million compatible calculations from our group (, , ). The final dataset, after filtering all unsuitable materials, contained 2,093,838 compounds. Most of these belong to large groups of prototypes, such as cubic ternary perovskites (∼230,000 systems), tetragonal mixed perovskites (∼340,000 systems), chalcopyrites (∼100,000 systems), and delafossites (∼30,000 systems). This has to be considered during the evaluation, as the out-of-group prediction error () will be larger than for a randomly selected training and validation set.

The distribution of the distances to the convex hull of the final dataset is displayed in Fig. 2. It is evident that the materials project data consist mostly of compounds that are stable or close to stable, with a mean, median, and SD of 220, 50, and 530 meV/atom, respectively. Because of its consistency in calculation parameters and ease of access, it is very commonly used as a benchmark set for machine learning algorithms. However, because it focuses on stable compounds, the distribution of distances to the convex hull is very different from the distribution of a random sample of compounds. Therefore, it is clear that this dataset is not suitable for training general machines to search for new materials. On the other hand, it is an ideal benchmark set for other properties, such as bandgaps, as these are mostly relevant for stable materials.
Fig. 2.

Schema depicting the workflow for the creation of the dataset and the resulting energy distribution.

A total of 2.7 million calculations from AFLOW, 0.14 million from the materials project, and 1.3 million from our group were accumulated and curated, leaving in the end 2.09 million data points (0.96 million from AFLOW, 0.10 million from the materials project, and 1.02 million from our group). A histogram depicting the distance to the convex hull of the final dataset is shown on the right. M, million.

The curated AFLOW data follow a typical skewed Gaussian distribution with a mean, median, and SD of 530, 440, and 400 meV/atom, respectively. The data from our group consist of multiple subsets, as shown in Fig. 3. The highest peak arises from the data of (), in which chemical elements were substituted by similar ones in stable materials. As we can see, this method leads to materials very close to stability. The green peak originates from machine learning–guided high-throughput calculations () (which resulted in relatively stable compounds), together with ~40,000 compounds from random training sets. The rest of the data consist of various traditional high-throughput searches of perovskites (), mixed perovskites, chalcopyrites, Heusler compounds, and delafossites.
Fig. 3.

Distribution of the distances to the convex hull for our group’s data.

In orange, we show mostly stable or close to stable compounds resulting from substitutions of chemically similar elements into stable structures (). High-throughput studies of several prototypes with all compositions are in blue/red. Machine learning (ML)–guided high-throughput study including the training set in green.

We will focus specifically on a dataset of mixed perovskites (see Fig. 1 for the crystal structure), as we apply the models developed in this work to a high-throughput search of their compositional space. We used a training set of around 180,000 randomly selected compositions. A further dataset of ∼64,000 low-energy mixed perovskites was later selected by the machine learning models. This dataset was not considered in the calculation of the hull to avoid any leak of information into the training set.

Experiment

The network was first trained on the newly accumulated large dataset (minus the ∼180,000 mixed perovskites used in the next section) with a training/validation/test split of 80%/10%/10%. The network achieved a mean absolute error (MAE) of 30 meV/atom in the training set for our large dataset. As a large part of the dataset is composed of high-throughput calculations for a small number of crystal structure prototypes, this error does not properly represent the abilities of the model. Consequently, we will perform an in-depth validation in the next section using a family of quaternary perovskites.

We also investigated the ability of the model to differentiate the stability of different polymorphs. To accomplish this, we selected all structures whose composition appeared at least twice in the test set. We then checked whether the model predicts the correct relative stability for each pair of structures. In Fig. 4, we plot the percentage of correct orderings as a function of the minimal difference in energy between the polymorphs. In the inset, we show the distribution of compositions with a given number of polymorphs in the test set. As expected from a random test set, the majority of repeated compositions appear only twice, with around 7500 appearing three times, roughly 2800 four times, and 850 five times. Our model correctly predicts the relative stability of two polymorphs 93% of the time, increasing to 97% for polymorphs separated by more than 10 meV/atom and to 99% for more than 100 meV/atom.
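A minimal sketch of this pairwise ranking check (the data layout is assumed) could look as follows:

```python
# Sketch of the polymorph-ranking check described above: for every pair of
# structures sharing a composition, count how often the model orders their
# DFT energies correctly. Illustrative only.
from collections import defaultdict
from itertools import combinations

def ranking_accuracy(entries, min_gap=0.0):
    """entries: list of (composition, e_dft, e_pred) tuples.
    min_gap: minimum |DFT energy difference| (eV/atom) for a pair to count."""
    groups = defaultdict(list)
    for comp, e_dft, e_pred in entries:
        groups[comp].append((e_dft, e_pred))
    correct = total = 0
    for polymorphs in groups.values():
        for (d1, p1), (d2, p2) in combinations(polymorphs, 2):
            if abs(d1 - d2) < min_gap:
                continue  # pair below the energy-difference threshold
            total += 1
            correct += (d1 < d2) == (p1 < p2)
    return correct / total if total else float("nan")
```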
Fig. 4.

Percentage of correctly predicted relative stability between polymorphs versus the minimum difference in energy between the compared polymorphs.

The main plot shows the percentage of correctly predicted relative stability between polymorphs as a function of the minimum difference in energy between the compared polymorphs. Inset: The distribution of chemical compositions that have a given number of polymorphs. These data include all compositions that appear at least twice in the test set.

To achieve some comparability with other works, we also tested the model on the materials project dataset for formation energies from (). Using a training/validation/test split of 60%/20%/20%, we achieved an MAE of 41 meV/atom on the test set. The learning rate and batch size were changed in comparison to the other datasets because of the different dataset size. For the same set (), crystal graph convolutional networks achieve an MAE of 33.2 meV/atom, while MEGNet achieves an MAE of 32.7 meV/atom. We note that our result is for a single validation split, while in (), the error was averaged over five different splits. This slightly worse error is not unexpected, as our networks do not use the complete information of the optimized crystal structure. Furthermore, we trained our model on the quaternary Heusler dataset of (). We used a training/validation/test split of 85%/5%/10%, to be compared with the 90%/10% split of (). We obtained a test error of 9 meV/atom, in comparison to the best validation error of 37 meV/atom in (), demonstrating the quality of our network.

Validation for mixed perovskites

In this section, we analyze in detail the ability of the model trained in the previous section to be used in high-throughput searches. We concentrate on a family of perovskites that can be obtained by alloying the Wyckoff 3d position of the cubic ABX3 system, leading to the quaternary ABX2Y composition. Among these, we find the well-known mixed-anionic oxynitride and oxyfluoride perovskites () that have found interesting applications, such as in optoelectronics (). The basic tetragonal crystal structure of our mixed perovskites is depicted in Fig. 1. We note that the size of the compositional space for the chemical elements we took into account is around 15 million. This includes not only the mixed-anionic systems but also inverted perovskites, where the nonmetal occupies the center of the octahedra and the vertices are formed by a mixture of two metals. In view of the large number of stable inverted ternary perovskites () and the large number of possibilities for alloying two metals, we expect that the number of stable inverted quaternary perovskites will dwarf the number of mixed-anionic compounds. All elements up to bismuth, with the exception of the noble gases and the lanthanides, were used.

We did not use traditional stability criteria, such as the Goldschmidt tolerance factor or charge neutrality, in order not to bias the machine. If we consider compounds close to stability, we discover that the majority are not charge neutral in the standard (most common) oxidation states of the constituent chemical elements. If we use pymatgen to assign oxidation states to the structures below 100 meV/atom, we find that only 1056 cases have a charge neutral configuration, corresponding to 13% of the compounds. Considering these results, we believe that it is important not to bias the training data using empirical rules but to rely exclusively on DFT results.

We started by constructing a dataset of around 173,900 random quaternary compositions. This dataset was split into a training set of 139,123 compounds and a validation and a test set of 17,390 entries each. The MAE of the general machine discussed in the previous section on this test set is 508 meV/atom, which is considerably higher than the 30 meV/atom obtained for the large training set. This is not unexpected, considering that the pretraining dataset is extremely biased toward a few crystal prototypes and has a mean distance to the convex hull of 590 meV/atom, while the mixed perovskite set is extremely unstable, with a mean of 1445 meV/atom. If we consider mixed perovskites closer to stability, e.g., below 500 meV/atom, then the MAE becomes much more reasonable at 132 meV/atom.
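The text does not specify which pymatgen routine was used for the oxidation-state assignment; one plausible implementation relies on Composition.oxi_state_guesses(), which returns only charge-balanced assignments of common oxidation states:

```python
# Sketch of the charge-neutrality check described above, using pymatgen.
# This is one plausible way to do it, not necessarily the authors' exact code.
from pymatgen.core import Composition

def is_charge_neutral(formula: str) -> bool:
    """True if at least one charge-neutral combination of common oxidation
    states exists for the composition."""
    # oxi_state_guesses() returns only charge-balanced assignments,
    # ordered by how common the oxidation states are.
    return len(Composition(formula).oxi_state_guesses()) > 0

candidates = ["BaTiO2N", "LaWN2H"]  # hypothetical example compositions
neutral = [f for f in candidates if is_charge_neutral(f)]
print(f"{len(neutral)}/{len(candidates)} charge neutral")
```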

Transfer learning

A way to improve the general model presented before is to use transfer learning. For this, we create sets containing between 2.5 and 80% of the 173,903 mixed quaternary compounds and use them to retrain our previous general model. In this case, the learning rate was reduced by a factor of 10, and no further hyperparameters were changed or optimized. For comparison, we also trained models only on these training sets (i.e., without pretraining) and with a three-dimensional (3D) ElemNet model (see Materials and Methods). The validation and test sets contained 17,390 compounds each, as discussed before. We also applied a Representation Learning from Stoichiometry (ROOST) model () to the perovskite data. However, because of the inability of ROOST to differentiate between different compounds with the same overall composition, the error of the trained models remained stubbornly high.

In Fig. 5, we show that the MAEs for the pretrained model are improved for all training set sizes and that the purely composition-based ElemNet is outclassed by the graph networks. However, the advantage of using a pretrained model drops off as the training set size increases and the majority of the information learned during the pretraining is already included in the training data. With pretraining, we arrive at an error of 62 meV/atom with a training set size of 17,400 systems, while more than 35,000 samples are required to achieve the same test error without pretraining. As this error is sufficient to start a machine learning–guided high-throughput search, we see that transfer learning can easily reduce the required training data by a factor of two.
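In code, the retraining amounts to reloading the general model and fine-tuning it with a tenfold smaller learning rate; a minimal sketch (checkpoint name, data loader, and target field are assumptions):

```python
# Sketch of the transfer learning loop described above: reuse the general
# model's weights and retrain on prototype-specific data with the learning
# rate reduced by a factor of 10.
import torch

def finetune(model, loader, epochs: int = 100):
    # pretraining used lr = 0.000125 (see Materials and Methods); reduce 10x
    opt = torch.optim.AdamW(model.parameters(), lr=0.000125 / 10, weight_decay=1e-6)
    loss_fn = torch.nn.L1Loss()
    for _ in range(epochs):
        for batch, e_hull in loader:           # target: distance to convex hull
            opt.zero_grad()
            loss = loss_fn(model(batch), e_hull)
            loss.backward()
            opt.step()
    return model

# model = torch.load("cgat_general.pt")        # hypothetical checkpoint
# model = finetune(model, perovskite_loader)   # hypothetical DataLoader
```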
Fig. 5.

Test MAE versus number of systems in the training set.

We display the MAE for the test set of mixed perovskites as a function of the number of mixed perovskites in the training set, for a pretrained and a nonpretrained crystal graph attention network and for a 3D ElemNet model.

We also trained a model using the scalar attention coefficients of (, ) on the mixed-perovskite dataset. The resulting test MAE was 11% higher, demonstrating the superior quality of the vectorized attention operation used in this work. In Fig. 6, we display the MAE for all mixed perovskites containing a given chemical element. Note that we left vanadium-containing compounds out of the training set to investigate the transferability across the periodic table without extra training data (see the next section). As discussed in previous studies (, ), magnetic elements like chromium, manganese, and iron, as well as some first-row elements, show a higher MAE. The former is most likely due to the magnetic interactions not being described properly by the DFT calculations, which makes learning difficult, while the latter is caused by the well-known first-row anomaly (, ) of the periodic table.
Fig. 6.

MAE for structures containing each element.

MAE for the mixed perovskites in the test set containing each chemical element.


High-throughput search

As a further independent validation of our machines, we selected all mixed perovskites predicted to have a distance to the hull below 200 meV/atom. To select these compounds, we used two different models: the best pretrained model from the previous section and the ElemNet model. Our mixed perovskites have a tetragonal crystal structure. As such, the list of neighbors used as input features to our model depends to some extent on the ratio of the cell constants, which is unfortunately unavailable without performing the actual ab initio calculation. To circumvent this problem, we constructed prototypes with different ratios and used the lowest value of the distance to the convex hull as the actual prediction of the machine learning model. In practice, we used ratios of 0.85, 0.9, 1.1, and 1.15 for c/a while keeping a = b to maintain the symmetry of the system (see the sketch below). In total, 64,914 compounds were selected below the cutoff of 200 meV/atom. This choice is higher than the 70 to 100 meV/atom often used [see, for example, ()] and is motivated by (), where it was found that the large majority of experimentally known mixed perovskites are below 200 meV/atom when computed in the five-atom tetragonal unit cell. As before, we optimized the geometries of these compounds with DFT and calculated the distance to the convex hull of thermodynamic stability. The MAE of the pretrained machine learning model for this new validation set, using the DFT-relaxed structures, was 33.5 meV/atom, which is comparable to the error in the validation set. The error for the unrelaxed structures decreases with the number of considered structures, from 45.7 meV/atom for one unrelaxed structure to 36.5 meV/atom for the four considered structures. Furthermore, for 88% of the materials, the model predicted correctly whether a > c or a < c was the more stable phase, illustrating the structure sensitivity of the model.

We are now also in a position to determine the generalization error of the machine for vanadium-containing compounds. The MAE for these materials is 87 meV/atom for the best pretrained model. This should be compared with (see Fig. 6) 34 meV/atom for Ti, 84 meV/atom for Cr, and 85 meV/atom for Fe. We can see that the error is still perfectly acceptable, showing that the machine can reliably interpolate in the periodic table.

Last, in Fig. 7, we depict the distribution of the distance to the convex hull (calculated with DFT). If we consider various thresholds, we arrive at 21,333 materials below 150 meV/atom, 8681 materials below 100 meV/atom, 2405 materials below 50 meV/atom, 404 below 5 meV/atom, and 325 below the convex hull. We note that we demonstrated in a recent work () that many mixed perovskites can be stabilized by a substantial amount through structural distortions (like the rotation and tilting of the octahedra) or by considering different arrangements of the C and D atoms. Some systems are stabilized by more than 150 meV/atom. Furthermore, configurational entropy, which was not considered in (), is an additional stabilizing factor. Last, the experimentally known mixed perovskite with the highest DFT distance to the convex hull is LaZrO2N at 260 meV/atom (). As such, we believe that there is a very good chance that a large majority of the compounds below 150 meV/atom can be synthesized experimentally.
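A sketch of this multi-ratio evaluation, assuming a pymatgen Structure and a generic `predict` callable standing in for the trained model (the site assignment of the C and D atoms is one possible choice):

```python
# Sketch of the multi-prototype trick described above: build the tetragonal
# five-atom cell for several c/a ratios and keep the lowest predicted distance
# to the hull. Illustrative; the predictor interface is assumed.
from pymatgen.core import Lattice, Structure

def predict_e_hull(species, predict, a=4.0):
    """species: (A, B, C, C, D) occupying the five perovskite sites.
    predict: callable mapping a Structure to a predicted E_hull."""
    sites = [  # fractional coordinates; one possible ABC2D arrangement
        [0.0, 0.0, 0.0],   # A (corner)
        [0.5, 0.5, 0.5],   # B (center of the octahedron)
        [0.0, 0.5, 0.5],   # C
        [0.5, 0.0, 0.5],   # C
        [0.5, 0.5, 0.0],   # D
    ]
    best = float("inf")
    for ratio in (0.85, 0.90, 1.10, 1.15):          # c/a ratios from the text
        lattice = Lattice.tetragonal(a, a * ratio)  # a = b, c = ratio * a
        best = min(best, predict(Structure(lattice, species, sites)))
    return best
```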
Fig. 7.

Distances to the convex hull for the mixed perovskites predicted to be below 200 meV/atom.

We show the distances to the convex hull calculated with DFT for the mixed perovskites predicted to be below 200 meV/atom by the machine learning model, and, in red, the percentage of systems below the energy on the x axis.

In Fig. 8, we show the distribution of elements A, B, C, and D for the potentially stable ABC2D perovskites. In this case, we plotted all 8681 compounds that were calculated to have a distance to the convex hull smaller than 100 meV/atom. Concerning the A atom, we find mainly alkali atoms and metals around indium. This is in agreement with the findings of () for ternary perovskites. The distribution for the B atom (in the center of the octahedra) is especially notable, with 66% of the materials containing hydrogen, carbon, nitrogen, or oxygen. This points to the fact that the majority of the discovered materials are inverted perovskites. Last, for the C and D atoms (at the vertices of the octahedra), we find a large collection of metals and also a few halogens. Obviously, the former corresponds to inverted perovskites, and the latter corresponds to normal (noninverted) halide perovskites with two halogens alloyed in the vertices of the octahedra.
Fig. 8.

Number of potentially stable perovskites with a specific element in the A, B, C, or D position.

Perovskites ABC2D that are predicted by DFT to have a distance to the convex hull smaller than 100 meV/atom and that have a certain chemical element in positions A, B, C, or D.

The overwhelming majority of the compounds are metallic, with only 8% of the systems exhibiting a bandgap above 0.1 eV. As expected, the largest bandgaps are observed for halide perovskites or for systems with a halogen occupying the C position and H in the D position. On the other hand, more than 15% of the tetragonal compounds exhibit a magnetic polarization, owing to the abundance of magnetic 3d metals in the stable compositions. In the following, we look in more detail into some of these compounds. We note that we restrict our discussion to the tetragonal unit cell and that the properties of this crystal phase might be different from those of a cell including structural deformations (octahedra tilting and rotation, etc.). Moreover, our five-atom unit cell is not capable of describing different arrangements of the C and D atoms, such as the cis arrangement, which is known to be favorable in some oxynitrides, oxyhalides, and oxysulphides (, ), or more complex orderings (). We also do not analyze oxynitride, oxyfluoride, or nitrofluoride compounds, as these were discussed in a previous work that performed an exhaustive study of these systems ().

We start with the normal perovskites, where the C and D atoms belong to the same nonmetallic group. This case is rather important, as alloying the nonmetallic sites allows for the tuning of electronic properties, such as the bandgap. We could not find any system below 100 meV/atom that alloys two pnictogens. This is not unexpected, as it is already very difficult to form nitride perovskites because of the very high oxidation state of nitrogen (, ). When C and D are elements of the chalcogenide family, we find 12 systems, among which six oxysulfides of compositions Ba{Zr,Nb,Hf,Ta}S2O and Na{Nb,Ta}O2S and six S─Se alloys with compositions Ba{Ti,Zr,Hf}S2Se and Ba{Ti,Zr,Hf}Se2S. These are mostly nonmagnetic semiconductors with Perdew-Burke-Ernzerhof (PBE) bandgaps that go up to 1.26 eV for BaHfS2O.

We found many more compounds with C and D belonging to the halogen group. We recall that halide perovskites revolutionized research on solar energy, with solar power conversion efficiencies that reached up to 22.1% in only 6 years. In modern solar cells, one often alloys I with Br, as this was found to improve the stability of these compounds (, ). Of the 580 compounds we found, 229 contained F, 354 contained Cl, 335 contained Br, and 286 contained I. Obviously, it is easier to alloy Br─Cl than, for example, F─I, and that is exactly what we find in our results, with 149 systems for the former and 60 for the latter. The large majority of the compounds turn out to be nonmagnetic semiconductors, with bandgaps going up to 5.7 eV (for CsCaCl2F). We note that we find in this list the inorganic perovskites that are relevant for photovoltaics, such as CsPbCl2F and CsPbF2Cl, for 11 different combinations of the nonmetals, showing the miscibility of the halogens in these compounds.

Another very interesting type of system is the hydride perovskites (). These have attracted some interest recently as possible materials for, e.g., hydrogen storage (–). We found a series of exotic hydrogen-containing mixed perovskites, where H is combined with a group 15, 16, or 17 element. Up to 100 meV/atom, we find 42 hydronitride systems, the most stable being La{Cr, Mo, Tc, W, Re}N2H and La{Mn, Fe, Mo, Tc}H2N.
Assuming the standard oxidation states of the nonmetals and of La, we see that the metal in the B site should have an oxidation state of +4 and +2, respectively, for the N2H and H2N perovskites. This is true for many of the systems we found, but we also find less common oxidation states in the data. These systems are metallic and can be magnetic when the B atom is a magnetic d metal. Looking now at group 16, we find 12 hydroxides () and 7 hydrosulfides, with mostly Sr or Ba in the A position and a +2 or +3 metal in the B position. There are several semiconductors in this list, reaching a maximum PBE gap of 2.43 eV for BaYO2H. Last, we find 49 hydrohalide compounds with a +1 metal in the A position (either an alkali metal or Tl) and a +2 metal in the B position. There are also several hydride perovskites with similar compositions that were previously studied experimentally and computationally (), suggesting the possibility of exchanging the hydrogen for a halide ion. These are nonmagnetic and can have rather large bandgaps, reaching a maximum of 5.39 eV (in the PBE approximation) for KMgF2H.

We now turn our attention to inverted or antiperovskites. These materials are finding diverse applications in battery technology, magnetism, superconductivity, etc. (). As mentioned before, we find a very large number of such systems, which is not unexpected, as the majority of stable ternary perovskites are also inverted () and it is relatively easy to alloy two C and D metals. We start by looking at inverted hydrides, of which we find 3007 below 100 meV/atom of the convex hull of stability. The most common elements in the A position in these compounds are noble metals such as Pt (201 systems), Au (163), Ir (125), etc., or an element from groups 13 or 14, such as Sn (148), Ga (143), In (125), Al (100), etc. Naturally, most compounds are metallic, but we find a handful of semiconductors, mostly of the form {Se, Te} and {P, As, Sb, Bi}, where A(1) and A(2) are alkali metals and A(3) is an alkaline earth metal. We note that the former belong to the family of inverted ternary hydride perovskites with both A and B sites occupied by anions, which were proposed as fast alkali ionic conductors (). We also found a couple of more exotic systems, such as TeH{Rh, Pd}2Li, TeHLi2Pd, and TeHPd2Sc.

Our list contains 223 inverted boride perovskites, with the A position predominantly filled by Y (31 systems), Sc (30), In (28), Mg (20), etc. The C and D positions, on the other hand, often contain a metal like Ru, Rh, Pd, Ir, Pt, etc., or a magnetic metal like Co or Ni. Not unexpectedly, these are all metals. Carbide antiperovskites have attracted some attention, especially since the discovery of superconductivity in MgCNi3 (). There are 1002 (metallic) carbide antiperovskites in our list. Mg is the most common metal we find in the A position (with 82 systems), followed by In (76), Zn (74), Cs (72), etc. Of the 82 systems with Mg, 12 also include Ni in the C or D position (alloyed with another 3d metal or with Li, Rh, and Ir). In the C position, the most common metals are Sc (228 systems), Y (200), and La (200), while in the D position, we find mostly a 3d magnetic element or Ru, Ir, Tc, etc. Inverted nitride perovskites are metal-rich compounds with particularly interesting electronic and magnetic properties (). We find 1284 such systems with distances to the hull below 100 meV/atom.
In the A position, we encounter mostly late transition and noble metals, such as Ag, Au, Pt or Zn, Cd, Hg, etc., while in the corners of the octahedra, we find metals such as Ca, Se, La, Ba, etc. (in the C position) or Li, Co, Ni, Rh, etc. (in the D position). These tetragonal systems turn out to be mostly metallic, but with a handful of exceptions when the A position is occupied by another nonmetal, such as a chalcogen or a halogen. The maximum PBE gap we find is for SNCa2Li, with a direct bandgap of 1.04 eV, 39 meV/atom above the convex hull. We also find more than 300 magnetic systems with magnetic 3d elements in the C or D position.

With O in the B position, we find 519 systems. Among the most stable ones, we find a series of Li-containing compounds, such as {Al, Ga, In}O{Sc, Y}2Li or {Sn, Pb}{Sc, Y}2Li, but there is a large variety in the compositions. Recently, ternary oxysilicides and oxygermanides of the type {Si,Ge}O{Ca,Sr}3 were proposed as candidates for nontoxic infrared semiconductors, as they exhibit sharp absorption edges below 1 eV (). We find a number of quaternary mixed perovskites of this family, namely, 7 oxysilicides, 11 oxygermanides, and 21 oxystannites, with Sr, Ca, Ba, but also Y, La, etc., alloyed in the C and D positions. All these compounds turn out to be metallic in our calculations, but this is very likely due to the use of the small five-atom tetragonal cell and the PBE approximation. Similarly to the hydrides, we also find a number of semiconducting systems with both the A and B positions occupied by anions; examples are {S, Se, Te} and {Cl, Br, I}. We find PBE bandgaps in a rather large range, up to 2.88 eV (for SOLi2Ca).

DISCUSSION

In this study, we developed a machine learning model that predicts the energy of a material as a function of the composition and the structure prototype. In contrast with previous approaches, our input features do not require precise knowledge of the geometry, so our model can be used to accelerate the discovery of new materials with high-throughput methods based on DFT. Our machine relies on crystal graph attention neural networks, in which, during the message passing steps, each individual edge and vertex embedding is updated based on its neighborhood. To train this machine, we compiled and curated a large dataset of more than 2 million density functional calculations. These include data points from online databases and from our own calculations that were obtained with compatible parameters. Despite its large size, this dataset is somewhat biased, as many of the calculations are for a relatively small number of different crystal prototypes. To circumvent this problem, we propose a transfer learning approach in which our general purpose model is retrained for specific crystal structures. We experimented with this idea for a set of quaternary perovskites, showing that transfer learning can reduce the required training data by a factor of two. By omitting vanadium-containing compounds from the training, we also showed that the network can reliably extrapolate to unknown regions of the periodic table. Last, we used our trained model to predict stable quaternary perovskites, predictions that were then validated with DFT. It turns out that there are more than 20,000 materials that have a good chance of being synthesized experimentally. These are mostly inverted perovskites, with hydrogen, carbon, or nitrogen in the center of the octahedra and with two metals alloyed in the vertices of the octahedra. In view of the above, we believe that our model, combined with transfer learning techniques, will allow us to explore a large domain of the vast chemical space in the search for new stable crystalline compounds.

MATERIALS AND METHODS

Data accumulation and filtering

As previously discussed, we combined data from the materials project, our group, and AFLOW. Concerning AFLOW, it has to be noted that a number of calculations in the database are “ill-calculated,” as noted internally in the code of the AFLOW-CHULL tool (). Accordingly, all prototypes denoted as “_DEVIL_PROTOTYPES_” and all pseudopotential/prototype combinations that are known to be ill-converged were removed from the AFLOW data. Furthermore, outliers from AFLOW were removed following the strategy explained in (). Last, all actinides and noble gases were removed from the data, as they are not relevant for most applications and some of the energies of the former are questionable.

Duplicates were removed by checking for structures with the same composition, space group, and total energy (rounded to the fourth digit). The space group was determined with pymatgen, with the “symprec” parameter set to 0.1. The convex hull was then constructed with pymatgen (). All energies were corrected according to the materials project compatibility scheme. Furthermore, the distance to the hull was evaluated for each compound by removing the compound itself from the convex hull. Stable systems then have negative distances to the hull instead of being truncated at zero. This, in our opinion, should improve the learning, as we are predicting a smoother quantity. All elementary substances were removed from the dataset, as the roost model used for the global pooling only works for multinary structures. A few hundred compounds that did not have 24 neighbors within the cutoff radius were also removed from the dataset.
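A plausible pymatgen-based implementation of this leave-one-out hull distance (the text does not give the exact code) is sketched below; it assumes that the remaining entries still span the chemical system:

```python
# Sketch of the distance-to-hull evaluation described above: the compound is
# removed from the hull before measuring its distance, so stable entries get
# negative values.
from pymatgen.analysis.phase_diagram import PhaseDiagram

def distance_to_hull(entry, all_entries):
    """entry: ComputedEntry for the compound; all_entries: all entries of the
    chemical system. Negative return values indicate stable compounds."""
    # build the hull without the compound itself; the remaining entries must
    # still contain the elemental references of the system
    others = [e for e in all_entries if e is not entry]
    pd = PhaseDiagram(others)
    # allow_negative=True returns E_hull < 0 for entries below the hull
    _, e_hull = pd.get_decomp_and_e_above_hull(entry, allow_negative=True)
    return e_hull
```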

Calculation parameters

For all density functional calculations first published in this work, we optimized the geometry and calculated the energy with the code vasp (). Calculation parameters were chosen to be compatible with the materials project database (). We used the projector augmented wave () datasets of version 5.2 with a cutoff of 520 eV. The Brillouin zone was sampled by Γ-centered k-point grids with a uniform density calculated to yield 1000 k-points per reciprocal atom. All forces were converged to better than 0.005 eV/Å. Calculations were performed with spin polarization using the Perdew-Burke-Ernzerhof () exchange-correlation functional, with the exception of oxides and fluorides containing Co, Cr, Fe, Mn, Mo, Ni, V, and W, where an on-site Coulomb repulsive interaction U with a value of 3.32, 3.7, 5.3, 3.9, 4.38, 6.2, 3.25, and 6.2 eV, respectively, was added to correct the d states.
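The settings listed above correspond to those encoded in pymatgen's materials-project-compatible input sets; a sketch of how such inputs can be generated (our illustration, not necessarily the authors' workflow):

```python
# Sketch: generate materials-project-compatible VASP inputs with pymatgen.
# MPRelaxSet encodes the PBE PAW potentials, the 520 eV cutoff, the k-point
# density, and the materials project +U values described in the text.
from pymatgen.core import Structure
from pymatgen.io.vasp.sets import MPRelaxSet

structure = Structure.from_file("POSCAR")   # hypothetical input structure
vasp_input = MPRelaxSet(structure)
vasp_input.write_input("relax_run")         # writes INCAR/KPOINTS/POSCAR/POTCAR
```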

Implementation

The model was implemented in PyTorch () and PyTorch Geometric (), using PyTorch Lightning () for convenience. Code from () and () was reused, demonstrating the importance of sharing code together with a paper. The code developed here will be distributed on GitHub (https://github.com/hyllios/CGAT).

Crystal graph attention network hyperparameters

The size and number of layers of the FCNNs, the number of attention heads, the number of message passing steps, the size of the embeddings, the learning rate, the batch size, the optimizer and its hyperparameters, the maximum number of neighbors, and the cutoff radius, for both the graph attention network and the roost model used for the global composition representation, are all hyperparameters that had to be optimized over a number of runs. Because of the high training cost, they were not optimized automatically but rather by hand, increasing the number of parameters in terms of attention heads, message passing steps, and embedding sizes until a further increase ceased to efficiently improve the error. Afterward, the batch size and learning rate were optimized. AdamW () in combination with a cyclical learning rate scheduler was used for the training of the model. As loss function, we used the expanded MAE of () that includes an estimate of the aleatoric uncertainty. Using hypernetworks instead of normal fully connected networks resulted in only a very small gain in validation error; however, for larger datasets, the validation error converged after fewer learning rate cycles, reducing the training time.

The output ResNet consisted of seven hidden layers with sizes 1024, 1024, 512, 512, 256, 256, and 128 and rectified linear unit (ReLU) activation functions (). The remaining hyperparameters used to train the crystal graph attention network were as follows:

optimizer: AdamW
learning rate: 0.000125
starting embedding: matscholar-embedding
nbr-embedding-size: 512
msg-heads: 6
batch size: 512
max-nbr: 24
epochs: 390
loss: L1 loss
momentum: 0.9
weight decay: 1e-06
atom-fea-len: 128
message passing steps: 5
roost message passing steps: 3
other roost parameters: default
vector attention: True
edges: updated
learning rate schedule: cyclical, (0.1, 0.05), period 130
hypernetwork: 3 hidden layers, size 128
hypernetwork activ. funct.: tanh
FCNN: 1 hidden layer, size 512
FCNN activ. funct.: leaky ReLU (76)
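To our understanding, the expanded MAE mentioned above is the robust L1 loss used in roost, in which the network predicts both a value and a log-uncertainty; a minimal sketch under that assumption:

```python
# Sketch of a robust L1 ("expanded MAE") loss with a learned aleatoric
# uncertainty, in the roost-style formulation we assume is meant here.
import torch

def robust_l1(pred, log_std, target):
    """pred, log_std: network outputs (value and log uncertainty).
    The log_std term penalizes over- and under-confident predictions."""
    loss = (2.0 ** 0.5) * torch.abs(pred - target) * torch.exp(-log_std) + log_std
    return loss.mean()
```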

ElemNet

For comparison purposes, we use a model with a composition-based input, in this case a modified version of ElemNet. The original ElemNet was a standard FCNN with dropout and batch normalization layers. Naturally, such a network is not able to predict energies for mixed perovskites, as it cannot distinguish between, e.g., BaTiPbO2 and TiBaPbO2. To circumvent this problem, we tried two different representations along the ideas of (, ): we used multiple input channels for each crystallographic position, and we ordered the input in the form of a periodic table (Fig. 9). Naturally, this kind of network is fixed to one specific crystal prototype. The ElemNet network started with one 3D convolutional layer [1 input channel, 92 output channels, kernel size (1, 6, 3), stride 1, and padding 0]. The resulting tensor was flattened and fed into a 17-layer fully connected network with ReLU () activation functions and layer sizes of 5520, 1024, 1024, 1024, 1024, 512, 512, 512, 256, 256, 256, 128, 128, 128, 64, 64, and 32. The hyperparameters for the training of the ElemNet model were as follows:

optimizer: AdamW
learning rate: 0.001
batch size: 200
epochs: until no improvement for 50 epochs
loss: L2 loss
momentum: 0.9
weight decay: 1e-06
learning rate schedule: stepwise reduction (by a factor of 0.5 every 50 epochs)
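A minimal PyTorch sketch of this 3D ElemNet variant is given below; the input layout (how the crystallographic positions map onto stacked periodic tables) is not fully specified in the text, so the input shape is an assumption, while the convolution settings and layer sizes follow the description.

```python
# Minimal sketch of the 3D ElemNet variant described above. The input shape
# is an assumption; the flattened size is therefore inferred at construction
# time (the paper's layout yields 5520 features).
import torch
import torch.nn as nn

class ElemNet3D(nn.Module):
    def __init__(self, in_shape=(1, 4, 9, 18)):  # (channels, positions, rows, cols); assumed
        super().__init__()
        self.conv = nn.Conv3d(1, 92, kernel_size=(1, 6, 3), stride=1, padding=0)
        with torch.no_grad():  # infer the flattened size from a dummy pass
            n_flat = self.conv(torch.zeros(1, *in_shape)).flatten(1).shape[1]
        sizes = [n_flat, 1024, 1024, 1024, 1024, 512, 512, 512,
                 256, 256, 256, 128, 128, 128, 64, 64, 32]
        layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(n_in, n_out), nn.ReLU()]
        self.fc = nn.Sequential(*layers, nn.Linear(32, 1))  # scalar energy output

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))
```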
Fig. 9.

ElemNet representations with multiple dimensions.

(A) Representation of ElemNet (). (B and C) ElemNet representation extended to two and three dimensions.

REFERENCES

1.  First-Principles Computational Screening of Perovskite Hydrides for Hydrogen Release.

Authors:  Yuanyuan Li; Jin Suk Chung; Sung Gu Kang
Journal:  ACS Comb Sci       Date:  2019-10-15       Impact factor: 3.784

2.  Projector augmented-wave method.

Authors:  P E Blöchl
Journal:  Phys Rev B Condens Matter       Date:  1994-12-15

3.  Unsupervised word embeddings capture latent knowledge from materials science literature.

Authors:  Vahe Tshitoyan; John Dagdelen; Leigh Weston; Alexander Dunn; Ziqin Rong; Olga Kononova; Kristin A Persson; Gerbrand Ceder; Anubhav Jain
Journal:  Nature       Date:  2019-07-03       Impact factor: 49.962

4.  Materials Screening for the Discovery of New Half-Heuslers: Machine Learning versus ab Initio Methods.

Authors:  Fleur Legrain; Jesús Carrete; Ambroise van Roekeghem; Georg K H Madsen; Natalio Mingo
Journal:  J Phys Chem B       Date:  2017-08-16       Impact factor: 2.991

5.  Predicting the Band Gaps of Inorganic Solids by Machine Learning.

Authors:  Ya Zhuo; Aria Mansouri Tehrani; Jakoah Brgoch
Journal:  J Phys Chem Lett       Date:  2018-03-19       Impact factor: 6.475

6.  Inverse Perovskite Oxysilicides and Oxygermanides as Candidates for Nontoxic Infrared Semiconductor and Their Chemical Bonding Nature.

Authors:  Naoki Ohashi; David Mora-Fonz; Shigeki Otani; Takeshi Ohgaki; Masashi Miyakawa; Alexander Shluger
Journal:  Inorg Chem       Date:  2020-12-10       Impact factor: 5.165

7.  Antiperovskites with Exceptional Functionalities. (Review)

Authors:  Yonggang Wang; Hao Zhang; Jinlong Zhu; Xujie Lü; Shuai Li; Ruqiang Zou; Yusheng Zhao
Journal:  Adv Mater       Date:  2019-12-09       Impact factor: 30.849

8.  Superconductivity in the non-oxide perovskite MgCNi3.

Authors:  T He; Q Huang; A P Ramirez; Y Wang; K A Regan; N Rogado; M A Hayward; M K Haas; J S Slusky; K Inumara; H W Zandbergen; N P Ong; R J Cava
Journal:  Nature       Date:  2001-05-03       Impact factor: 49.962

9.  Machine learning for molecular and materials science. (Review)

Authors:  Keith T Butler; Daniel W Davies; Hugh Cartwright; Olexandr Isayev; Aron Walsh
Journal:  Nature       Date:  2018-07-25       Impact factor: 49.962
