Achintha Ihalage¹, Yang Hao¹
¹ School of Electronic Engineering and Computer Science, Queen Mary University of London, Mile End Rd, London, E1 4NS, United Kingdom.
Abstract
The success of machine learning (ML) in materials property prediction depends heavily on how the materials are represented for learning. Two dominant families of material descriptors exist, one that encodes crystal structure in the representation and the other that only uses stoichiometric information with the hope of discovering new materials. Graph neural networks (GNNs) in particular have excelled in predicting material properties within chemical accuracy. However, current GNNs are limited to only one of the above two avenues owing to the little overlap between respective material representations. Here, a new concept of formula graph which unifies stoichiometry-only and structure-based material descriptors is introduced. A self-attention integrated GNN that assimilates a formula graph is further developed and it is found that the proposed architecture produces material embeddings transferable between the two domains. The proposed model can outperform some previously reported structure-agnostic models and their structure-based counterparts while exhibiting better sample efficiency and faster convergence. Finally, the model is applied in a challenging exemplar to predict the complex dielectric function of materials and nominate new substances that potentially exhibit epsilon-near-zero phenomena.
The quest for functional materials has intensified over the last few years, marking a paradigm shift in materials design, which has traditionally been a laborious process. One major landmark is the emergence of computational methods such as density functional theory (DFT) that can accurately estimate material properties from the smallest repeating unit of constituent atoms. This has shown great promise in discovering materials with target properties by eliminating redundant experimental cycles. However, DFT requires an experimentally or computationally characterized crystal structure to perform calculations, which is unavailable for the majority of hypothetical compounds. Predicting the ground-state crystal structure requires powerful numerical approaches such as evolutionary algorithms, together with high-performance computing, which cannot be scaled up for structures of high complexity. Moreover, the computational effort of DFT scales as the cube of system size, $O(n^3)$, presenting a serious handicap in material exploration. Diversifying the applications of known materials and discovering new ones with desired properties are the highways to advance technologies. This calls for a fast and unified approach that enables us to predict material properties only from the stoichiometry, and also infer uncharacterized properties when the crystal structure is available.
Machine learning (ML) has made rapid inroads into materials science in the form of surrogate models that can learn a complex mapping from a fixed-shape material descriptor to a target property. The dominance of ML in materials informatics, propelled by curated databases, is evident not only because of successful instances in new materials discovery, but also due to its significant impact on every step of the material design hierarchy. This includes replacing first-principles calculations, optimal design of experiments, material characterization, and improved understanding of material phenomena. While hand-crafted material descriptors may warrant uniqueness and invariance to translations, rotations, and permutations of constituents, the performance of ML models is heavily reliant on how fine the descriptor is and the level of chemical and structural information captured. Structure-agnostic descriptors enable new materials discovery without the need for a crystal structure. However, this may require a unique mapping from composition to property, often achieved by accounting only for the most stable polymorph (i.e., the crystal structure with the lowest energy per atom). Therefore, metastable polymorphs are not attainable. Structure-based descriptors, on the other hand, can encode polymorphs and generally yield much better ML performance, but they are restricted to characterized crystals. Many compounds have an arbitrary number of atoms and element types. This means conventional material representations are inevitably limited by the efficiency of the varying-size to fixed-size conversion.
The ordered nature of crystalline materials gives rise to a natural graph in which atoms represent nodes and the interactions between them indicate edges. A graph neural network (GNN) is an ideal candidate to obtain a global representation of a material by exchanging information between neighboring nodes and edges while preserving the original graph through several layers. This alleviates the drawback of prior descriptors by automatically learning a material encoding based on data. To this end, seminal works have proposed various graph convolutional architectures to learn chemical and geometric features of molecules and/or crystals. Notably, the MPNN, SchNet, CGCNN, MEGNet, ALIGNN, and DeeperGATGNN models have shown excellent performance in predicting diverse material properties. Both the GNN architecture and the hyperparameter setting were found to have a significant impact on the model performance. The structure-based representation domain is further enriched by a plethora of other GNN models and physically intuitive descriptors.
While a crystal structure can be directly mapped into a graph, devising a graph from the stoichiometry alone requires intuitive reasoning. Roost is a structure-agnostic GNN model that represents the stoichiometric formula as a dense weighted graph between elements. It has achieved impressive error values when predicting the properties of bulk materials. Unfortunately, the two types of GNN models that exist in the materials literature are not interchangeable. That is because structure-based GNNs expect atomic spacing as an input, whereas structure-agnostic models intentionally disregard the crystal structure. Therefore, current practice is to maintain separate GNN models for the respective domains. This technology gap has hindered domain transferability and direct evaluation of the effect of crystal structure on prediction performance on top of what is achievable by stoichiometry-based models, because the ML architectures adopted in both processes are simply different.
Here, we introduce the formula graph, a versatile representation of crystalline materials based on the chemical formula that can also take crystal structure into consideration when available. In the structure-agnostic domain, our key intuition is to obtain the integer formula of the material and treat every atom as an individual node in a fully connected graph. The edge weights are estimated during training. Such a process ensures that the stoichiometry is preserved and the edge predictions work toward improving the overall performance. On the other hand, a crystal graph can be generalized as a formula graph containing the unit cell formula. Because geometry information is available in this case, the edge attribute is characterized by the actual distance between the two atoms that form the edge. This simple distinction between formula-only and structure-based representations permits us to design a more general GNN that can bridge the gap between the two avenues of materials property prediction.
We hereby develop a universal ML model, Finder (Formula graph self-attention network for materials discovery), to predict material properties using the formula alone or by accounting for the crystal structure, independently.
Finder is a message passing GNN that adopts a variant of the self-attention mechanism in the transformer architecture. The attention mechanism has been adapted in several ML architectures for materials property prediction with improved accuracy. We show that Finder can outperform some state-of-the-art stoichiometry-only models such as Roost and compete with crystal graph models such as MEGNet and CGCNN on diverse benchmark databases curated from the Materials Project (MP) repository. Compared to other models revisited in this work, our model displays faster convergence and achieves lower errors at all training set sizes explored.
Finally, as a challenging application, we investigate Finder's competence in predicting the frequency-dependent dielectric constant of materials from the JARVIS DFT repository. Subsequently, we identify promising epsilon-near-zero (ENZ) materials with operating frequencies ranging from the near-infrared (NIR) to the ultraviolet (UV) regions. Our results highlight the compounds containing vanadium oxoanions as an exciting class of materials for low-loss ENZ candidacy. ENZ materials display exotic properties such as nonlinear electro-optical phenomena that facilitate harmonic generation, wave mixing, ultrafast optical switching, and phase-tunable metasurface design. Despite the limited size of the training database, our model can accurately predict the dielectric function of materials without the use of crystal structure, making it a powerful materials discovery platform at any given scale.
Results
An important virtue of our representation is that it accounts for every atom in the chemical formula as a separate node, as opposed to canonical descriptors that couple element types with their molar fraction. A crystal unit cell contains one or more integer formula units ($Z$). This is the motivation behind our formula graph representation, as it provides a means to unify various graph-based material descriptors. The formula graph allows neighboring node information to flow through edges toward the parent node via a series of message passing operations. Message passing is a powerful feature extraction method that consolidates some simple mathematical operations applied on the graph with function approximators learned from data. Therefore, node embeddings learned after a stack of message passing layers will be globally aware of the constituent atomic species as well as the data context. In what follows, we use the term “formula graph” to denote the integer formula graph or the unit cell formula graph (i.e., crystal graph) in general, whereas specific terms will be used when needed to differentiate between these two concepts.
We first initialize each node of the formula graph with an atom-specific numeric vector, identified as the node attribute. Node attributes can be manually derived based on element properties, or they can be extracted as learned element embeddings from an ML model trained on a vast amount of materials data. While the former representation is more interpretable, the latter ensures that the vectors are properly normalized and compressed, and that some chemical and contextual information about the elements is captured. In the structure-agnostic case, our formula graph is fully connected. The structure-based formula graph is obtained by connecting the atoms that are located at a distance less than a threshold radius. Depending on the crystal structure, this may or may not yield a fully connected graph.
Figure 1 displays the conversion of an example material, Cu2Ag2O3, to its formula graphs. Cu2Ag2O3 crystallizes in the tetragonal $I4_1/amd$ structure, as shown in Figure 1a. Its primitive unit cell contains two formula units ($Z = 2$), indicating that the crystal graph can be much larger than its integer formula counterpart, and therefore selective bonding of atoms is necessary to minimize computational complexity. Figure 1b shows the binary adjacency matrix extracted by applying the threshold radius over the distance matrix of the crystal. The region highlighted in cyan indicates the atoms that are connected in the crystal graph. The integer formula graph of the example material, as depicted in Figure 1c, contains seven atoms, whereas the crystal graph (Figure 1d) consists of fourteen atoms connected with strong bonds, reflecting the unit cell of the material.
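The two graph constructions described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the parser only handles simple formulas without parentheses or fractional amounts, self-loops are assumed to be excluded, and the radius-based adjacency works on a plain distance matrix and ignores periodic images. The 2.5 Å default is the cutoff quoted in the Figure 1 caption.

```python
import math
import re
from functools import reduce


def integer_formula(formula):
    """Parse a simple formula such as 'Cu4Ag4O6' into a reduced integer formula."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    divisor = reduce(math.gcd, counts.values())
    return {element: n // divisor for element, n in counts.items()}


def fully_connected_formula_graph(formula):
    """Structure-agnostic case: one node per atom, one directed edge per ordered pair."""
    counts = integer_formula(formula)
    nodes = [element for element, n in counts.items() for _ in range(n)]
    # Both (i, j) and (j, i) are kept because edge attributes are direction dependent.
    edges = [(i, j) for i in range(len(nodes)) for j in range(len(nodes)) if i != j]
    return nodes, edges


def radius_adjacency(distances, cutoff=2.5):
    """Structure-based case: connect atoms closer than `cutoff` (same unit as `distances`)."""
    n = len(distances)
    return [
        [1 if i != j and distances[i][j] < cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


nodes, edges = fully_connected_formula_graph("Cu2Ag2O3")
print(len(nodes), len(edges))
```

For Cu2Ag2O3 the structure-agnostic graph has seven nodes, in line with Figure 1c, whereas a unit cell with $Z = 2$ would contribute fourteen nodes to the crystal graph.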
Figure 1
Formula graph representation. a) Crystal structure of Cu2Ag2O3 in the $I4_1/amd$ space group symmetry. b) Adjacency matrix of the crystal. Shaded cells indicate pairs of atoms with atomic spacing less than 2.5 Å. c) Integer formula graph of the example material. Each node carries an element-specific node attribute. The edge attributes are predicted by a neural network whose input is an aggregation of the associated node attributes. Note that although a single connection between any two nodes is shown for simplicity, the integer formula graph is directed and every pair of nodes is connected in both directions. This means, for example, that $e_{\text{Cu-Ag}}$ is not necessarily equal to $e_{\text{Ag-Cu}}$. d) Simplified crystal graph constructed from the adjacency matrix in (b). Here, we use the Gaussian expansion of the actual distance between atoms as the edge attribute. This is an undirected graph.
Finder's core architecture is designed with several attention-integrated message passing layers followed by a global pooling layer to learn a context-specific material descriptor from the formula graph. The computation inside our message passing layer involves three steps: message, aggregate, and update. The node attributes are updated according to Equation (1), where $v_i^r$ is the node attribute vector of node $i$ after $r$ updates, $C_i$ is the set of neighbors of node $i$, and $f_m$, $f_a$, and $f_u$ are the message, aggregate, and update functions, respectively. During the message step, a message vector between two connected nodes $i$ and $j$ is generated. The first move toward obtaining this message is to determine the edge attribute vector $e_{ij}$ between nodes $i$ and $j$. In the structure-agnostic domain, we estimate $e_{ij}$ on the fly during training according to Equation (2), where $\phi_e$ is a feed-forward neural network with two hidden layers, $N_i$ is the total number of neighbors of node $i$, and $\oplus$ and $\|$ denote the element-wise summation and concatenation operators, respectively. This edge predictive function ensures that the edge attribute draws information not only from the two atoms that form the edge, but also from all other atoms available in the formula graph. If the crystal structure is considered, $e_{ij}$ is simply calculated as the Gaussian expansion $G$ of the atomic distance $d_{ij}$ between nodes $i$ and $j$.
Finder employs a variant of the self-attention mechanism to compute an alignment score vector between every pair of nodes $(i, j)$. Self-attention has excelled especially in natural language processing (NLP) by allotting a certain attention to different words in a sequence in order to obtain a more robust representation of the same sequence. Because the ordering of atoms is irrelevant in our formula graph, we calculate element-wise alignment scores that account for the importance of other constituent atoms when creating the message. This involves several steps in which $F_Q$, $F_K$, and $F_V$ denote single-hidden-layer neural networks applied on the neighboring node attributes to obtain query, key, and value vectors, respectively, $d_K$ is the dimension of the key vector, and $\odot$ symbolizes the element-wise product. In this respect, the proposed attention mechanism deviates from the scalar dot-product attention in the transformer model. In this work, we use a single attention head. Each entry in the alignment score vector is normalized considering all nodes ($N$) in the formula graph. Note that the score vector for the pair $(i, j)$ is not necessarily equal to that for $(j, i)$. We may try to physically interpret the score vector for $(i, j)$ as follows: given all atoms in the formula, how much attention should be placed on atom $j$ when updating the attribute vector of atom $i$. Finally, the message is obtained by processing all attribute vectors involved in forming a message via another two-hidden-layer neural network $\phi_m$ and regularizing its output by the alignment scores.
We notice that using the element-wise mean as a function to merge two node attributes (e.g., the mean of $v_i^r$ and $v_j^r$) yields better performance than the concatenation function (e.g., $v_i^r \,\|\, v_j^r$) because the mean is naturally permutation-invariant, normalized, and it retains the original dimension of the vectors.
In the aggregate step, messages around each node are aggregated using another permutation-invariant function $\Delta_{agg}$. We use the element-wise mean as the aggregate function. Finally, the node attribute is updated by adding the aggregated vector to the current node attribute transformed through a trainable weight matrix $W_{int}$ to equate the dimensions. Figure 2 summarizes the operations within a message passing layer and the overall architecture.
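For orientation, one possible way to write a single layer that agrees with the verbal definitions above is sketched below. It is an assumed form rather than the article's own equations: the symbols $q_i$, $k_j$, $u_j$, $s_{ij}$, $a_{ij}$, and $m_{ij}$, the exact inputs of $\phi_e$ and $\phi_m$, the point at which the value vector enters, and the softmax normalization are all illustrative choices.

```latex
% Assumed sketch of one attention-integrated message passing layer.
\begin{align*}
  % generic message / aggregate / update recursion
  v_i^{r+1} &= f_u\Big(v_i^{r},\; f_a\big(\{ f_m(v_i^{r}, v_j^{r}) : j \in C_i \}\big)\Big) \\
  % structure-agnostic edge: merged end nodes concatenated with a summary of the neighbors of i
  e_{ij} &= \phi_e\Big( \tfrac{1}{2}\big(v_i^{r} \oplus v_j^{r}\big) \,\Big\|\, \tfrac{1}{N_i}\bigoplus_{k \in C_i} v_k^{r} \Big) \\
  % structure-based edge: Gaussian expansion of the interatomic distance
  e_{ij} &= G(d_{ij}) \\
  % query, key, and value vectors
  q_i &= F_Q(v_i^{r}), \qquad k_j = F_K(v_j^{r}), \qquad u_j = F_V(v_j^{r}) \\
  % element-wise (vector-valued) alignment scores, normalized entry by entry over all N nodes
  s_{ij} &= \frac{q_i \odot k_j}{\sqrt{d_K}}, \qquad
  a_{ij} = \frac{\exp(s_{ij})}{\sum_{n=1}^{N} \exp(s_{in})} \\
  % message regularized by the alignment scores
  m_{ij} &= a_{ij} \odot \phi_m\Big( \tfrac{1}{2}\big(v_i^{r} \oplus v_j^{r}\big) \,\Big\|\, e_{ij} \,\Big\|\, u_j \Big) \\
  % element-wise mean aggregation followed by the update through W_int
  v_i^{r+1} &= W_{int}\, v_i^{r} + \frac{1}{N_i} \sum_{j \in C_i} m_{ij}
\end{align*}
```

In this sketch $a_{ij}$ is normalized over its second index only, so $a_{ij} \neq a_{ji}$ in general, which is consistent with the asymmetry noted in the text. Likewise, $\phi_e$ sees a summary of the neighborhood of node $i$ but not of node $j$, so the predicted $e_{ij}$ and $e_{ji}$ may differ.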
Figure 2
The architecture of Finder and message passing layer operations. Finder expects a formula graph as input, which is processed through several message passing layers followed by a post-processing neural network. Each message passing layer is coupled with a global attention pooling layer to enable residual connections to latter layers. The message phase executes the core operations of our architecture. Specifically, by predicting all directional edge attributes $e_{ij}$, we allow information to cascade from neighboring nodes to the edges. These edge features, along with the end-node attributes $v_i$ and $v_j$, contribute to a message vector. Each entry of the message vector is already weighted by a self-attention mechanism that quantifies the importance of other nodes for the current message vector. The aggregate step summarizes all messages around a given node via a local pooling function. Finally, at the update step, the aggregated message vector is added to the initial node attribute, completing one cycle of information flow.
After $P$ message passing layers, we apply an attention-based pooling layer, attn_pool, that is invariant to the indexing of atoms to obtain a fixed-length global representation $V_M$ of the material. Our attn_pool layer is inspired by that of Roost; however, element weighting is not required in our model because the formula graph already carries this information.
We probe every message passing layer through separate attn_pool layers in order to make residual connections to latter layers of the network. Therefore, our model has $P$ global pooling layers that help propagate features extracted at different levels of abstraction. Residual connections enable deeper model training by shortening the effective path of gradient flow. Such connections were recently employed to design a very deep GNN for materials property prediction. However, deeper GNNs in particular suffer from the over-smoothing issue, requiring additional resolving strategies. Therefore, we keep $P$ to a low value ($P \leq 3$), minimizing the risk of over-smoothing the node attributes in our model.
The learned representation $V_M$ is then sent through a standard convolutional layer and a set of fully connected layers with residual connections to produce the final output. The model is trained to minimize the $L_1$ robust loss between the predictions and the targets. We note that the robust loss is less sensitive to outliers and yields better performance compared to the standard mean absolute error (MAE) or mean squared error (MSE) loss functions. It also enables quantifying the aleatoric uncertainty of predictions (i.e., the inherent uncertainty due to probabilistic variability). This approach evaluates model uncertainty in a single run, whereas quantifying the epistemic uncertainty caused by a lack of knowledge about the best model requires several runs and can be too computationally expensive. Nevertheless, we use the MAE as the performance metric to benchmark our model against those from the literature.
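The readout and the training objective described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. In the model the per-node gate scores would come from a learned network, whereas here they are passed in directly. The text does not spell out the robust loss, so the sketch assumes a Laplace-type negative log-likelihood in which the network outputs a mean and a log standard deviation for each sample. The signature of `attn_pool` and the name `robust_l1_loss` are hypothetical.

```python
import math


def attn_pool(node_vectors, gate_scores):
    """Softmax-weighted sum of node vectors; gate_scores holds one scalar per node."""
    top = max(gate_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in gate_scores]
    total = sum(weights)
    dim = len(node_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, node_vectors)) / total
        for d in range(dim)
    ]


def robust_l1_loss(means, log_stds, targets):
    """Assumed Laplace-style loss: the absolute error is scaled by a predicted spread."""
    losses = [
        math.sqrt(2.0) * abs(mean - target) * math.exp(-log_std) + log_std
        for mean, log_std, target in zip(means, log_stds, targets)
    ]
    return sum(losses) / len(losses)


# The pooled vector does not depend on the order in which the atoms are listed.
print(attn_pool([[1.0, 2.0], [3.0, 4.0]], [0.2, -0.1]))
print(attn_pool([[3.0, 4.0], [1.0, 2.0]], [-0.1, 0.2]))
```

Because the softmax weights sum to one and the sum runs over all nodes, the readout is invariant to atom indexing. In the assumed loss, a larger predicted spread down-weights the absolute error at the price of the additive log term, which is what allows a per-sample aleatoric uncertainty to be read off in a single run.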
Benchmark Datasets
We curate six datasets from the Materials Project (MP) database relating to DFT-computed properties: formation energy per atom ($E_f$), final energy per atom ($E_{DFT}$), band gap ($E_g$), refractive index ($n$), bulk modulus ($K_{VRH}$), and shear modulus ($G_{VRH}$). Because the composition-to-property mapping should be uniquely defined, we only record the property value for the lowest-energy ($E_{DFT}$) polymorph. Therefore, none of these datasets contains other polymorphs or duplicate compositions. Using the same databases for both structure-based Finder and its structure-agnostic counterpart facilitates direct comparison of these two models as well. All datasets are split into 70% training, 15% validation, and 15% test sets. The models are trained on the training set, the best model is selected by evaluating on the validation set, and finally the performance (MAE) on the test set is reported. The distributions of train, validation, and test data of all benchmark databases are shown in Figure S1, Supporting Information. We further evaluate our model on the standard Matbench v0.1 test suite, with the details provided later in this paper.
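The curation steps above (one entry per composition, then a 70/15/15 split) can be sketched with the standard library. The record fields `formula` and `e_dft`, the function names, and the seeded shuffle are assumptions made for illustration. The article does not state how the split was drawn, and the sketch assumes that formulas are already written in a canonical reduced form.

```python
import random


def keep_lowest_energy(entries):
    """Keep one entry per composition: the polymorph with the lowest energy per atom."""
    best = {}
    for entry in entries:
        key = entry["formula"]  # assumed to be a canonical reduced formula
        if key not in best or entry["e_dft"] < best[key]["e_dft"]:
            best[key] = entry
    return list(best.values())


def split_70_15_15(items, seed=0):
    """Shuffle once with a fixed seed, then cut into train, validation, and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_70_15_15(range(1000))
print(len(train), len(val), len(test))
```

Integer arithmetic is used for the split sizes so that rounding never drops or duplicates a sample. Any remainder goes to the test set.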
Structure‐Agnostic Model Evaluation
As the ML toolbox is constantly enriched with more powerful architectures such as GNNs, the worth of classical ML in materials informatics, or even of convolutional neural networks (CNNs) for that matter, is sometimes overlooked. Here, we employ a random forest model trained with Magpie composition representations as the baseline model (RF_Magpie). Magpie features carry a wealth of known information about the elements in a composition. We further implement a deep residual neural network optimized with several standard 1D convolutional layers followed by a series of fully connected layers as the deep learning baseline (ResCNN). This model only takes a vector of element fractions as its input. In this section, we evaluate the structure-agnostic Finder model on six benchmark datasets and compare it with the Roost, ResCNN, and RF_Magpie models.
Table 1 summarizes the MAEs of Finder and other structure-agnostic models on the same test set. It can be observed that Finder outperforms all other models irrespective of material property or dataset size. Roost produces impressive results too. We identify several focal points of our model that elevate its performance above that of Roost. First, our formula graph inherently encodes the stoichiometry, whereas the representation of Roost requires propagating the fractional element weights to the message passing layer, and the model performance might depend on where and how the fractional weights are injected. Second, our edge predictive function fetches information from all other atoms in the formula graph to estimate edge attributes. This is inspired by the fact that the actual distance between two atoms in a crystal structure depends on other atoms in the crystal. Third, the transformer-based self-attention component in our model includes trainable function approximators that enable abstracting the node attributes in different subspaces. This allows gaining a more vivid representation of the constituent atomic species before assigning an attention score to each of them.
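To make the contrast between descriptors concrete, the element-fraction input used by the ResCNN baseline can be sketched as below. The short `ELEMENTS` list and the function name are hypothetical, since a real model would index every element it supports, and the parser only accepts simple formulas with integer amounts.

```python
import re

# Hypothetical, deliberately short element vocabulary used only for this illustration.
ELEMENTS = ["H", "O", "Ti", "Cu", "Sr", "Ag", "Ba"]


def element_fraction_vector(formula, elements=ELEMENTS):
    """Fixed-length vector holding the molar fraction of each element in `elements`."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in elements:
            raise ValueError(f"element {symbol} is not in the vocabulary")
        counts[symbol] = counts.get(symbol, 0.0) + (float(number) if number else 1.0)
    total = sum(counts.values())
    return [counts.get(element, 0.0) / total for element in elements]


print(element_fraction_vector("Cu2Ag2O3"))
```

A vector of this kind has a fixed length regardless of the compound and carries the stoichiometry only through the fraction values, whereas the formula graph encodes the same information through the number of nodes per element.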
Table 1
MAEs of structure-agnostic models in predicting six benchmark properties. The results show the mean and the standard deviation (in parentheses) of the MAE for three repeated runs with randomly initialized models. The same training, validation, and test sets are used in evaluating all models. The number of samples in each set is shown in the last column. The best performing model is marked with an asterisk. A p-value_FR < 0.05 reflects that the difference between the results of Finder and Roost is statistically significant. The training progress comparison of Finder and Roost is shown in Figure S2, Supporting Information.
Property [unit] | Finder | Roost | ResCNN | RF_Magpie | p-value_FR | Train-validation-test
E_f [eV per atom] | 0.0858(4)* | 0.0913(8) | 0.1131(11) | 0.1434(1) | 0.0004 | 68699-14721-14722
E_DFT [eV per atom] | 0.0896(1)* | 0.0960(14) | 0.1229(12) | 0.2058(1) | 0.0017 | 68699-14721-14722
E_g [eV] | 0.2911(9)* | 0.3278(53) | 0.3207(11) | 0.3321(3) | 0.0003 | 68699-14721-14722
n | 0.1726(40)* | 0.1866(169) | 0.1975(21) | 0.3238(70) | 0.2345 | 3920-840-840
log(K_VRH) [GPa] | 0.0835(6)* | 0.0854(9) | 0.0871(19) | 0.0934(2) | 0.0372 | 7024-1506-1506
log(G_VRH) [GPa] | 0.1153(14)* | 0.1226(19) | 0.1314(16) | 0.1235(2) | 0.0059 | 6699-1445-1441
We calculate the p-value between the MAE distributions of Finder and Roost to validate the statistical significance of our results (see Section 4 for more details). The p-value is a measure of the probability that an observed difference is merely due to random chance. Assuming that the cut-off value for the p-value to reject the null hypothesis is 0.05, we can conclude that the results of our model are statistically significant for all properties except for refractive index (see Table 1).
Notably, ResCNN displays good MAE values despite operating on extremely simple material descriptors that only contain element fractions. This is possibly because standard convolutional layers are still remarkable feature extractors, and the descriptor-to-property mapping function is potentially simplified by the use of a simple descriptor. Nevertheless, the performance of ResCNN and RF_Magpie, which use fixed-length descriptors, is nearly always lower than that of graph-based models, demonstrating the power of GNNs in representing diverse material compositions.
Materials data, especially experimental measurements, are often limited in size. This raises concerns about the competence of GNNs, or ML models in general, to learn from largely undersampled datasets and yet provide fairly accurate out-of-database predictions. To evaluate the sample efficiency of Finder, we observe its performance under different training set sizes. Figure 3 depicts the formation energy prediction MAE curves of all structure-agnostic models. Finder achieves the lowest error scores at all training set levels ranging from $10^2$ to $\approx 7 \times 10^4$. Classical ML models that use explainable features, such as RF_Magpie, are generally known to work well with small data. Despite belonging to the deep learning regime, our model starts outperforming RF_Magpie as the training set size hits $10^2$. While the MAE curve of Finder always hovers below that of Roost, the two error curves have a similar gradient.
Figure 3
Sample efficiency evaluation of structure-agnostic models on the MP formation energy dataset. Both axes are in log scale. The shaded region indicates the standard deviation of each model obtained from three repeated runs of randomly initialized models. The predictions in general abide by the power law. By fitting to a power law function as shown in the figure, we obtain similar gradients for the error curves of Finder and Roost ($m = -0.21$). The absolute gradient is understandably smaller for ResCNN and RF_Magpie.
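A power-law fit of this kind can be sketched as an ordinary least-squares line in log-log space. The function name is hypothetical, and the numbers in the usage lines are synthetic values generated from an exact power law purely to exercise the code. They are not data from the article.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error = prefactor * size**m by least squares on log10-transformed values."""
    xs = [math.log10(size) for size in sizes]
    ys = [math.log10(error) for error in errors]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx                # gradient of the line on the log-log plot
    intercept = y_mean - m * x_mean
    return m, 10 ** intercept    # gradient and prefactor


# Synthetic learning curve following an exact power law (illustration only).
sizes = [100, 1000, 10000, 70000]
errors = [2.0 * size ** -0.21 for size in sizes]
m, prefactor = fit_power_law(sizes, errors)
print(round(m, 3), round(prefactor, 3))
```

On a log-log plot the fitted $m$ is the slope of the straight line, so two models with equal $m$ improve at the same relative rate as data is added, and any gap between them sits in the prefactor.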
The $E_f$ parity plot shown in Figure 4 indicates that structure-agnostic Finder makes acceptable individual predictions, especially when the target value is negative. This is because computational materials databases tend to report more stable materials, which typically have a negative $E_f$. As expected, the aleatoric uncertainty of relatively inaccurate predictions is higher than that of the samples lying close to the perfect prediction line.
Figure 4
Parity plots of a) structure-agnostic Finder and b) its structure-based variant as obtained for the formation energy test set. The inclusion of spatial distances in the formula graph has significantly reduced both the error and the uncertainty in predictions. The marginal distributions of test data and the predictions are shown on the secondary axes. RMSE: root mean squared error; $R^2$ score: coefficient of determination. Parity plots for the other properties are included in Figures S3 and S4, Supporting Information. c) Edge attribute matrices (EAMs) of the perovskite materials SrTiO3, BaTiO3, KNbO3, and CsPbI3 visualized by probing the $E_f$ and $n$ Finder models. The EAM is not necessarily symmetric because $e_{ij}$ is not always equal to $e_{ji}$ (see Equation (2)).
One key advancement of our network architecture is the simultaneous prediction of edge attributes from the associated node embeddings during training. We investigate whether these predicted edge values indeed capture chemical or structural insights that are not explicitly fed to the model. Figure 4c presents edge attribute matrices (EAMs) of four well‐known perovskites, obtained from the final message passing layer of the trained Ef and n models, respectively. One might think of the EAM as the formula‐domain relative of the crystal‐domain distance matrix. However, as it stands, such a comparison is not straightforward because a crystal structure may contain several formula units. Nevertheless, from the Ef model, we find that compositionally and structurally similar materials such as SrTiO3 and BaTiO3 have similar EAMs. Intriguingly, the compositionally different yet structurally similar perovskite KNbO3 is found to have an EAM comparable to the ones above. The EAM of the halide perovskite CsPbI3 is considerably different from those of its oxide counterparts. A consistent trend is observed from the refractive index model, although it yields different EAMs for the same material. EAM entries are therefore determined by both the constituent element types and the data context. Although individual edge attributes carry no physical meaning, certain analogies between compositions can still be recovered from the EAM. Recently, we have shown that quantifying materials analogies can accelerate target‐driven discovery of materials. Such analogies rely on stoichiometry‐derived global material embeddings. Incorporating EAMs that reflect interactions between atoms adds another dimension for materials similarity analysis.
Structure‐Based Model Evaluation
In our formula graph representation, shifting from the structure‐agnostic domain to the structure‐based domain is as simple as replacing the edge attributes with Gaussian‐expanded atomic spacings and de‐densifying the graph by connecting only the atoms located within a certain distance of each other. This permits us to use the same message passing architecture, so any improvement in performance over the structure‐agnostic results is attributable solely to the addition of crystal structure, more specifically, the atomic spacings. In this section, we compare the structure‐based variant of Finder with other materials graph networks such as MEGNet and CGCNN. Table 2 lists the MAEs of all structure‐based models on the same benchmark datasets. Finder outperforms CGCNN in all properties and MEGNet in four out of six properties.
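The two operations described above, a distance cutoff and Gaussian expansion of atomic spacings, can be sketched as follows. This is a simplified illustration and not the Finder implementation: periodic images are ignored, and the number of Gaussian centers and their width are assumed values chosen only for the example.

```python
import numpy as np

def gaussian_expand(distances, d_min=0.0, d_max=4.0, n_centers=20, width=0.5):
    """Expand each distance onto a grid of Gaussian basis functions.

    n_centers and width are illustrative assumptions, not values from the paper.
    """
    centers = np.linspace(d_min, d_max, n_centers)
    d = np.asarray(distances, dtype=float)[..., None]
    return np.exp(-((d - centers) ** 2) / width ** 2)

def local_edges(positions, cutoff=4.0):
    """Return (i, j, d_ij) for all ordered atom pairs closer than the cutoff.

    Simplification: Cartesian positions only, no periodic boundary conditions.
    """
    pos = np.asarray(positions, dtype=float)
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep pairs within the cutoff and drop self-connections on the diagonal.
    mask = (dist < cutoff) & ~np.eye(len(pos), dtype=bool)
    i, j = np.nonzero(mask)
    return i, j, dist[i, j]

# Three atoms on a line (coordinates in angstrom); only the first two are neighbors.
positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [7.0, 0.0, 0.0]]
src, dst, dists = local_edges(positions, cutoff=4.0)
edge_attr = gaussian_expand(dists)
print(edge_attr.shape)
```

Each retained edge then carries a fixed‐length vector rather than a single scalar distance, which is the form of edge attribute a message passing layer can consume.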
Table 2
MAEs of structure‐based models in predicting six benchmark properties. p‐valueFM < 0.05 means that the difference between the results of Finder and MEGNet is statistically significant.
| Property [unit] | Finder | MEGNet | CGCNN | p‐valueFM | Train‐validation‐test |
| --- | --- | --- | --- | --- | --- |
| Ef [eV per atom] | 0.0342(3) | 0.0368(12) | 0.0425(6) | 0.022 | 68699‐14721‐14722 |
| EDFT [eV per atom] | 0.0351(1) | 0.0332(12) | 0.0890(17) | 0.0523 | 68699‐14721‐14722 |
| Eg [eV] | 0.2627(10) | 0.2609(7) | 0.2948(26) | 0.063 | 68699‐14721‐14722 |
| n | 0.1554(33) | 0.1654(39) | 0.2564(92) | 0.0270 | 3920‐840‐840 |
| log(KVRH) [GPa] | 0.0728(3) | 0.0732(26) | 0.0829(19) | 0.8043 | 7024‐1506‐1506 |
| log(GVRH) [GPa] | 0.1028(13) | 0.1091(7) | 0.1164(24) | 0.0018 | 6699‐1445‐1441 |
Both Finder and MEGNet produce errors within quantum chemical accuracy (1 kcal mol⁻¹ or equivalently 43 meV per atom) for energy predictions. However, one should be mindful that the MP database contains many similar samples such as perovskites (which are split across both the training and test sets), and therefore the test errors reported here may not reflect the actual error of energy predictions for out‐of‐database samples, including many hypothetical crystal structures. Figure 4b demonstrates the generalizability of our model for the Ef test set. Notably, incorporating crystal structure lowers Ef and EDFT prediction errors by over 60%, a significant improvement over the structure‐agnostic results. The Ef and EDFT datasets are sufficiently large and well assorted, with 224 space group symmetries, to facilitate a more granular learning of structural features.
Surprisingly, structure‐based band gap prediction sees only about a 10% reduction in error compared to its formula‐only counterpart. This underlines the fact that band gap is a difficult quantity to predict even with modern DFT energy functionals, demanding a certain degree of empiricism to counterbalance DFT errors. It is particularly encouraging that, overall, the error of the structure‐agnostic version of our model is similar to the error of CGCNN for all properties excluding formation energy.
The learning efficiency of an ML model is determined by how fast it converges. An efficient learning model should ideally produce a lower error than competing models at any given point in the training process. Our model exhibits superior learning efficiency compared to CGCNN and MEGNet, as corroborated by the formation energy training curves in Figure 5a. Finder reaches chemical accuracy in about 75 min (83 training epochs), whereas MEGNet takes about 500 min (285 epochs) to reach that level. CGCNN settles just around chemical accuracy after about 620 min (800 epochs). Figure 5b shows the cumulative distribution functions (CDFs) of the absolute errors of the three structure‐based models. 79.1% of the predictions made by Finder are within chemical accuracy, higher than MEGNet (77.4%) and CGCNN (69.8%). The CDF of our model stays above those of the other models until all curves start to overlap near an error value of 0.12 eV/atom, implying that all models find it equally difficult to predict the Ef of the remaining portion of materials.
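The CDF comparison above reduces to sorting absolute errors and asking what fraction falls below a threshold. A minimal sketch, assuming plain arrays of targets and predictions (the numbers below are made up for illustration and are not taken from the paper):

```python
import numpy as np

CHEMICAL_ACCURACY = 0.043  # eV per atom (1 kcal/mol), as quoted in the text

def fraction_within(y_true, y_pred, threshold=CHEMICAL_ACCURACY):
    """Fraction of predictions whose absolute error is at most the threshold."""
    abs_err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return float(np.mean(abs_err <= threshold))

def error_cdf(y_true, y_pred):
    """Return sorted absolute errors and the empirical CDF evaluated at each of them."""
    abs_err = np.sort(np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)))
    cdf = np.arange(1, abs_err.size + 1) / abs_err.size
    return abs_err, cdf

# Illustrative values only.
y_true = [-1.00, -0.50, 0.20, -2.00]
y_pred = [-1.02, -0.60, 0.21, -2.04]
share = fraction_within(y_true, y_pred)
errs, cdf = error_cdf(y_true, y_pred)
print(share)
```

Plotting `cdf` against `errs` for each model gives curves of the kind compared in Figure 5b.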
Figure 5
Learning efficiency evaluation and t‐SNE/PCA visualizations of material embeddings for the Ef test set. a) The training progress curves of Finder, MEGNet, and CGCNN. The shaded region represents the standard deviation of MAEs. b) Cumulative distribution plots illustrating the portion of predictions within a given absolute error. Scatter plots show t‐SNE projections of the latent embeddings of crystals. The embeddings are taken from the final attention pooling layer of c) structure‐agnostic and d) structure‐based versions of our model. The scatters are color‐coded according to the predicted Ef. Identical perplexity (30) and the same random initializations are used when creating the t‐SNE plots. e) Scatter plot shows the correlation between the first principal components of structure‐agnostic and structure‐based material embeddings, and f) depicts the same trend between the second principal components. Each scatter in plots (c)–(e) represents a material from the Ef test set.
Figure 5c,d shows the t‐distributed stochastic neighbor embedding (t‐SNE) visualizations of the internal representations of materials in the Ef test set, assembled from the structure‐agnostic and structure‐based Finder models, respectively. Interestingly, the two latent maps resemble each other quite closely. Because t‐SNE is a visualization‐oriented algorithm that involves a non‐linear projection, we further perform principal component analysis (PCA) on the same data and observe a similar trend (PCA plots are provided in Figure S5, Supporting Information). We investigate whether the location of the same material in both latent spaces is approximately similar by pairing the corresponding principal components from the two domains. Figure 5e displays the correlation between the first principal components of the material embeddings obtained with and without crystal structure. Figure 5f shows the same data for the second principal components. We find that the structure‐agnostic and structure‐based principal components are highly correlated, with a Pearson's correlation coefficient (r) of 0.96 between the first components and 0.94 between the second components. This means that the two Finder models in general produce linearly related material embeddings where one is inferable from the other, although a few exceptions exist.
In the structure‐based domain, the latent vectors are expected to encode crystal structure details, and materials in proximity are likely to be compositionally and structurally alike. This is quite compelling because it allows domain transferability at a reasonable fidelity. For example, the structure‐agnostic embedding of an undiscovered compound may be placed on the structure‐based latent map, and the neighboring materials may be analyzed to understand the possible crystal structure and properties of the hypothetical compound. This transferability was not possible between previous crystal graph and stoichiometry‐only models due to contrasting ML model architectures and material descriptors. However, dedicated research is necessary to quantify the accuracy and validity of this strategy.
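The comparison of principal components described above can be sketched as below. This is an illustrative reconstruction, not the authors' analysis script: it assumes two embedding matrices with one row per material in the same order, and it takes the absolute value of r because the sign of a principal component is arbitrary. The synthetic embeddings are constructed so that one is a rotated, slightly noisy copy of the other.

```python
import numpy as np

def first_pcs(embeddings, n_components=2):
    """Project mean-centered embeddings onto their leading principal components."""
    x = np.asarray(embeddings, dtype=float)
    x = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T

def pc_correlations(emb_a, emb_b, n_components=2):
    """Absolute Pearson r between matching principal components of two embeddings."""
    pcs_a = first_pcs(emb_a, n_components)
    pcs_b = first_pcs(emb_b, n_components)
    return [abs(np.corrcoef(pcs_a[:, k], pcs_b[:, k])[0, 1]) for k in range(n_components)]

# Synthetic stand-ins for structure-agnostic and structure-based embeddings.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(200, 8)) * np.array([5, 3, 1, 1, 1, 1, 1, 1])
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
emb_b = emb_a @ rotation + 0.01 * rng.normal(size=(200, 8))
print(pc_correlations(emb_a, emb_b))
```

The two embedding spaces do not need to share a dimension for this comparison, since only the projected components are correlated.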
Likewise, such experiments should be performed in the original high‐dimensional space rather than in the t‐SNE or PCA reduced space for a more concrete analysis.
We note that training batch size has a considerable impact on the MAEs of all structure‐based models. A batch size of 24 yields optimal results for both Finder and CGCNN in predicting formation energy, whereas MEGNet requires a relatively large batch size of 128 to stabilize training. Small batch sizes introduce noise in the error gradient estimation because each gradient is estimated from fewer samples and the model weights are updated more frequently. This might be desirable in some cases to escape local minima in the error surface. A batch size of 128 moderately increases the formation energy MAEs of Finder and CGCNN to 0.0365 and 0.0455, respectively. Refractive index prediction uses an optimal batch size of 24 for all models, and a default batch size of 128 works well for all other properties.
It is worth mentioning that the MAEs reported in this work for the structure‐based models are slightly higher than those reported in the original works. This is likely because we eliminate structural polymorphs from our databases. This removes the advantage of seeing the same composition with slightly different properties arising from the polymorphs, eventually increasing the prediction error by a small degree. Different sizes and random generation of training and test databases may also be a contributing factor. A standard materials database like the Matbench suite, which also contains structural polymorphs, may provide a wider platform to benchmark our approach against those from the literature.
Evaluation on Matbench v0.1 Suite
Matbench serves as a common test set for ML models and includes 13 materials property prediction tasks. Here, we select eight regression tasks where the crystal structure is available. Two recent composition‐only algorithms, AtomSets and CrabNet, and two structure‐based models, MEGNet and CGCNN, are selected for comparison. We follow the fivefold nested cross validation (NCV) strategy with the same random seed recommended in the original study to evaluate our algorithm. Briefly, the current NCV approach runs an outer test loop with 20% test and 80% training+validation data. For each outer NCV fold, there is an algorithm‐dependent internal validation process.
It is known that non‐graph models such as Automatminer and MODNet, which perform extensive hyperparameter tuning and feature selection steps in the inner loop of NCV, typically outperform GNNs on small datasets. Once the internal optimization is complete, such models are fit on the entire fold so that no validation data is left out. This approach is somewhat different from the common practice in GNNs, where the validation data is held out for model selection, which reduces the amount of data available for training. This is the strategy used in ref. [54] in evaluating MEGNet and CGCNN, where the overall split for each fold is 60% training, 20% validation, and 20% test. We use the same splitting ratio when evaluating our structure‐based model to be consistent with previous results. Structure‐agnostic Finder follows the 72% training, 8% validation, and 20% test split used in AtomSets.
Table 3 shows the performance comparison on the Matbench suite. Finder achieves the best MAE scores in five out of eight structure‐agnostic tasks. In particular, Finder performs well on both small and large datasets. While CrabNet displays impressive results for small datasets (<10⁴), our model outperforms CrabNet by up to 13% as the database size reaches ≈10⁵. AtomSets achieves the best MAE for the perovskite formation energy dataset, perhaps due to the use of a simple model and descriptor. Furthermore, Finder settles to better MAEs than alternative composition‐only models such as SkipAtom and ElemNet.
Finder leads five out of eight tasks in the structure‐based domain as well, consistent with our conclusions on the MP datasets with no polymorphs.
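The two splitting schemes described above (60/20/20 and 72/8/20) both follow from holding out one of five outer folds for testing and then carving a validation set out of the remainder. The sketch below only illustrates that arithmetic; it does not use the Matbench package, and the function name, seed, and shuffling are assumptions rather than the benchmark's prescribed folds.

```python
import numpy as np

def nested_split(n_samples, n_folds=5, val_fraction=0.25, seed=0):
    """Yield (train, val, test) index arrays for each outer fold.

    val_fraction is taken from the 80% that remains after removing the test fold:
    0.25 gives a 60/20/20 split, 0.10 gives a 72/8/20 split.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        test = folds[k]
        rest = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        n_val = int(round(val_fraction * rest.size))
        yield rest[n_val:], rest[:n_val], test

for train, val, test in nested_split(1000, val_fraction=0.25):
    print(len(train), len(val), len(test))
```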
Table 3
Performance comparison on the Matbench suite. The best performing models (within 1% tolerance) in each domain are indicated in bold. The composition‐only results for AtomSets are taken from ref. [61] and those for MEGNet and CGCNN are taken from ref. [54]. CrabNet's performance metrics are reported in Matbench leaderboard at https://matbench.materialsproject.org/ (accessed on 1 March 2022). It should be noted that the datasets are used as‐is (e.g. preprocessing such as removing duplicate compositions and outliers have not been applied) for consistent comparison. jdft2d ‐ exfoliation energy; phonons ‐ phonon DOS peak frequency; dielectric ‐ refractive index; log_gvrh ‐ log of shear moduli; log_kvrh ‐ log of bulk moduli; perovskites ‐ perovskite formation energy; mp_gap ‐ band gap; mp_e_form ‐ formation energy.
| Property [unit] | Structure‐agnostic: Finder | Structure‐agnostic: AtomSets [61] | Structure‐agnostic: CrabNet [39] | Structure‐based: Finder | Structure‐based: MEGNet [13] | Structure‐based: CGCNN [12] | Dataset size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| jdft2d [meV per atom] | 48 | 52 | 45.6 | 46.1 | 55.9 | 49.2 | 636 |
| phonons [cm⁻¹] | 46.6 | 63 | 55.1 | 50.7 | 36.9 | 57.8 | 1265 |
| dielectric | 0.3204 | 0.36 | 0.3234 | 0.3197 | 0.478 | 0.599 | 4764 |
| log_gvrh [GPa] | 0.0996 | 0.11 | 0.1014 | 0.091 | 0.0914 | 0.0895 | 10987 |
| log_kvrh [GPa] | 0.0764 | 0.08 | 0.0758 | 0.0693 | 0.0712 | 0.0712 | 10987 |
| perovskites [eV per atom] | 0.645 | 0.082 | 0.407 | 0.032 | 0.0417 | 0.0452 | 18928 |
| mp_gap [eV] | 0.231 | 0.26 | 0.266 | 0.219 | 0.235 | 0.228 | 106113 |
| mp_e_form [eV per atom] | 0.0839 | 0.094 | 0.0862 | 0.0342 | 0.0327 | 0.0332 | 132752 |
Ablation Experiments
We perform ablation experiments to understand the contribution of different components of our model and to optimize its architecture. Table 4 shows the results of the ablation study on the Ef database. While the number of message passing layers does not have a noticeable impact on the performance of the structure‐agnostic model, the structure‐based model sees a significant performance gain with multiple message passing layers compared to a single layer. A further increase in the number of message passing layers comes at the cost of degraded accuracy on small databases. For instance, four message passing layers in the composition‐only domain increase the error in bulk modulus prediction by 5.3% relative to the default architecture (shown in Table S1, Supporting Information).
Table 4
Ef MAEs of different model architectures considered in ablation study. The reference model is indicated in bold. Other models differ from the default architecture of Finder as follows. Model 1 ‐ uses one‐hot node embeddings; Model 2 ‐ post‐processing network removed; Model 3 ‐ all residual connections removed; Model 4 ‐ only the residual connections coming from message passing layers removed; Model 5 ‐ self‐attention component in message passing layer replaced with a soft‐attention mechanism; Model 6 ‐ self‐attention component removed from message passing layer; Model 7 ‐ soft‐attention component removed from global layer.
| Domain | 1 message passing layer | 2 message passing layers | 3 message passing layers | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Structure‐agnostic | 0.0861 | 0.0858 | 0.0867 | 0.0876 | 0.1056 | 0.0898 | 0.0846 | 0.0871 | 0.0931 | 0.0873 |
| Structure‐based | 0.0393 | 0.0342 | 0.0342 | 0.0398 | 0.0403 | 0.0373 | 0.036 | 0.0366 | 0.0345 | 0.0386 |
Element embeddings of our model are transferred from ref. [71]. Alternatively, we investigate one‐hot element embeddings and observe a substantial dip in accuracy, particularly when the crystal structure is considered (see Model 1 in Table 4). This indicates that node attributes capturing prior knowledge still help in navigating to a lower minimum in the error surface, although this accuracy gap is expected to narrow as the database size grows. The post‐processing neural network is an essential component of our model, as an exclusively message passing architecture inflates the error by up to 23% (Model 2). Including one standard convolutional layer in the post‐processing network yields optimal results, while adding more such layers impairs the performance. The number of rear dense layers and their widths are calibrated heuristically.
In Model 3, we remove all residual connections and observe a substantial error increase in both the structure‐based and composition‐only models. We then remove only the residual connections going from the message passing layers to the post‐processing network and keep the rest of the residual connections intact (Model 4). Somewhat unexpectedly, this reduces the error in the structure‐agnostic model while increasing the error in the structure‐based model. This means that while the reference architecture performs well overall, it is possible to achieve lower errors with domain‐specific hyperparameter tuning. Note that both Models 3 and 4 reduce the number of layers from P to one, applied right after the last message passing layer.
In order to verify that it is indeed the proposed formula graph representation that leads to improved performance, we replace the self‐attention block in our message passing layers with a Roost‐like soft‐attention mechanism in Model 5. This makes material representation the most important distinction between Model 5 and Roost, as the rest of the functions such as local pooling and node update are mostly standard message passing operations. The MAE of Model 5 is 0.0871 whereas that of Roost is 0.0913, indicating that the formula graph is a more complete representation of compositions. The slightly increased MAE of Model 5 compared to the default Finder model implies that the proposed self‐attention variant contributes to the performance. We confirm this in Model 6 by removing the self‐attention component from all message passing layers, which significantly increases the structure‐agnostic MAE to 0.0931. We note that the self‐attention part is vital in the composition‐only domain, while it has only a marginal effect on our structure‐based results. Finally, we find that it is the soft‐attention component in the global layer that leads to improved performance compared to MEGNet and CGCNN. This can be observed in Model 7, where a global sum pooling layer without the attention section increases the structure‐based error by over 12%.
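The percentage changes quoted in this ablation discussion follow directly from the MAEs in Table 4, taking the two‐layer default architecture as the reference. A small worked check (the helper name is ours, not part of Finder):

```python
def relative_change(mae, reference):
    """Percentage change of an MAE relative to the reference model's MAE."""
    return 100.0 * (mae - reference) / reference

# Reference MAEs of the default two-layer architecture (Table 4).
reference = {"structure_agnostic": 0.0858, "structure_based": 0.0342}

# Model 2: post-processing network removed (structure-agnostic).
model_2 = relative_change(0.1056, reference["structure_agnostic"])
# Model 6: self-attention removed from message passing (structure-agnostic).
model_6 = relative_change(0.0931, reference["structure_agnostic"])
# Model 7: soft-attention removed from the global layer (structure-based).
model_7 = relative_change(0.0386, reference["structure_based"])

print(f"{model_2:.1f}%, {model_6:.1f}%, {model_7:.1f}%")
```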
Epsilon‐Near‐Zero Materials Discovery as an Application of Finder
Undoubtedly, the natural correspondence between chemical structures and GNNs produces excellent results in predicting various material properties. However, GNNs may still find it challenging to predict material properties that take the form of a complex function, which poses a multi‐output regression problem. Recently, GNNs have been successfully applied to predict the absorption spectra of three‐cation metal oxides and the phonon density of states. In a similar vein, a multi‐class classification GNN has been implemented to predict protein functions. Here, we employ structure‐agnostic Finder to predict the frequency‐dependent dielectric constant of inorganic compounds and eventually locate ENZ candidates. ENZ materials possess a vanishingly small permittivity at a certain frequency that induces exceptional properties, some of which are still being tested experimentally following theoretical predictions. While structural ENZ materials such as metamaterials have been extensively studied, they achieve ENZ behavior only as an effective property occurring at wavelengths larger than the size of the structural unit, not to mention the increased fabrication cost and complexity. Hence, there is a growing interest in natural materials that exhibit ENZ phenomena, especially with low dielectric loss.
We extract a database of real (εre) and imaginary (εim) dielectric functions from the JARVIS repository. These are calculated using the OptB88vdW (OPT) DFT functional and have been shown to agree well with experimental data. Duplicate compositions have been removed from this database by selecting the most stable polymorph, resulting in a total of 12 353 materials. While the dielectric constant can be anisotropic, the quantities for different directions such as xx, yy, and zz usually follow a similar trend and show equivalent resonance frequencies. Therefore, we focus only on the xx direction. The εre and εim databases are separately divided into 80% training, 10% validation, and 10% test sets. We investigate the dielectric function prediction performance of Finder and ResCNN on the test set. Both models achieve respectable performance metrics, with Finder outperforming ResCNN as expected (see Table 5).
Table 5
Performance metrics of Finder and ResCNN on frequency‐dependent dielectric function prediction. Both models are trained with the L1 loss. MAD stands for mean absolute deviation.
| Model | εre MAE | εre RMSE | εre R² score | εre MAD:MAE | εim MAE | εim RMSE | εim R² score | εim MAD:MAE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Finder | 0.69 | 3.05 | 0.81 | 4.1 | 0.72 | 3.18 | 0.81 | 5.2 |
| ResCNN | 0.76 | 3.39 | 0.82 | 3.7 | 0.77 | 3.45 | 0.83 | 4.8 |
Figure 6a,b depicts representative predictions for two selected materials from the test set. Note that Finder successfully captures several dielectric resonances of compositionally diverse materials (e.g., Ba4LiCu(CO5)2 as shown in Figure 6b). What is more intriguing is that the predictions are based on a relatively small training database (9882 samples), and no structural information is incorporated in the process. Our model thus learns the complex mapping from composition to dielectric function, bypassing the need for crystal structure and the computational cost of DFT, which is further increased by accurate functionals such as OPT.
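For reference, the four metrics listed in Table 5 can be computed as in the sketch below. How the errors are aggregated over frequency points is not stated here, so flattening all outputs into one vector is an assumption of this example, as is the function name.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, R2 score, and MAD:MAE ratio over all flattened outputs."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # MAD is the mean absolute deviation of the targets from their mean, i.e.
    # the MAE of a model that always predicts the mean; a larger MAD:MAE is better.
    mad = np.mean(np.abs(y_true - y_true.mean()))
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAD:MAE": mad / mae}

# Toy targets and predictions, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])
print(metrics)
```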
Figure 6
Epsilon‐near‐zero materials discovery from the MP database. Frequency‐dependent εre and εim functions of two representative materials a) K2TeBr6 and b) Ba4LiCu(CO5)2 as predicted by Finder and ResCNN. The x‐axis represents the frequency in the units of photon energy. c) Scatter plot shows the imaginary permittivity at the crossover energy point inferred from the predicted dielectric functions. d) Network plot indicates element pairs that are present together in at least five predicted ENZ compositions. Each edge represents a pair of elements and the edge width is proportional to the number of element‐pair appearances. e,f) Show the predicted εre and εim of promising ENZ materials CaV2P2O9 and NaCr2FeO8 along with their crystal structure.
The frequency range where |εre| < 1 is identified as the ENZ region. This is usually a narrow band positioned around the crossover energy point, ωco, the frequency at which the real part of the permittivity crosses zero. While all metals achieve the ENZ condition at the bulk plasma resonance, typically in the UV band, present studies are focused on investigating ENZ materials in the NIR range that is close to the telecommunications wavelength (1550 nm), as well as the visible range that is directly accessible by optical experiments. Although the ENZ condition relies only on the real permittivity, large imaginary permittivity values, that is, high loss, severely suppress the ENZ effect. In this work, we predict the εre and εim functions of materials in the MP database and report low‐loss ENZ candidates (εim < 2) in the NIR to UV band (0.5–12.4 eV) that are potentially stable (energy above convex hull, Ehull < 25 meV).
Figure 6c depicts the inferred εim versus ωco dispersion. Evidently, the number of compositions with εim < 2 is extremely small compared to the composition space, making low‐loss ENZ materials discovery even more challenging. We found 353 compositions from the MP database that satisfy the above conditions. The candidate list covers 80 periodic table elements. Interestingly, the predicted ENZ compositions include alkaline earth metal vanadates such as Mg2V2O7 and Ca2V2O7 that relate to recently identified low‐loss ENZ materials, namely, CaVO3 and SrVO3. Strong electron‐electron interactions present in such transition metal oxides can be capitalized on to achieve the ENZ condition in the visible spectrum. Unexpectedly, we found that 49 predicted ENZ materials contain vanadium and oxygen together, the most for any pair of periodic table elements. Other commonly appearing element pairs include Ca–O, Na–O, and Fe–F (see Figure 6d). Linking these observations with already characterized ENZ correlated metals, we foresee vanadate compounds as an exciting class of materials for ENZ candidacy.
Materials that feature a zero‐permittivity wavelength in the NIR band are of great importance in telecommunications. We identify two potentially stable new compositions, CaV2P2O9 and NaCr2FeO8, that are predicted to exhibit low‐loss ENZ properties in the NIR region. The real and imaginary parts of the dielectric function as predicted by our model are shown in Figure 6e,f for the respective materials. Ehull and εim of both materials are well within the tolerable margins specified above. The full list of predicted ENZ compositions is provided as Supporting Information. We believe the search for functional ENZ materials has an expansive future, especially considering the demonstrated potential of correlated metals.
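The screening criteria above (a zero crossing of εre within 0.5–12.4 eV and εim < 2 at the crossover) can be expressed as a short filter over a predicted spectrum. The sketch below is a hypothetical illustration rather than the authors' pipeline: the crossover is located by linear interpolation between grid points, exact zeros that fall on a grid point are not handled, the stability check on Ehull is left out, and the toy spectrum is made up.

```python
import numpy as np

def wavelength_nm_to_ev(wavelength_nm):
    """Convert a vacuum wavelength in nm to photon energy in eV (hc = 1239.84 eV nm)."""
    return 1239.841984 / wavelength_nm

def find_crossovers(energy, eps_re):
    """Photon energies where eps_re changes sign, found by linear interpolation."""
    energy = np.asarray(energy, dtype=float)
    eps_re = np.asarray(eps_re, dtype=float)
    idx = np.nonzero(np.sign(eps_re[:-1]) * np.sign(eps_re[1:]) < 0)[0]
    frac = eps_re[idx] / (eps_re[idx] - eps_re[idx + 1])
    return energy[idx] + frac * (energy[idx + 1] - energy[idx])

def is_low_loss_enz(energy, eps_re, eps_im, e_min=0.5, e_max=12.4, loss_max=2.0):
    """True if any crossover lies in [e_min, e_max] with eps_im below loss_max there."""
    for e_co in find_crossovers(energy, eps_re):
        if e_min <= e_co <= e_max and np.interp(e_co, energy, eps_im) < loss_max:
            return True
    return False

# Toy spectrum: eps_re falls linearly through zero at 2.05 eV with constant loss.
energy = np.linspace(0.0, 5.0, 51)
eps_re = 2.05 - energy
eps_im = np.full_like(energy, 1.0)
print(find_crossovers(energy, eps_re), is_low_loss_enz(energy, eps_re, eps_im))
print(wavelength_nm_to_ev(1550.0))
```

The last line shows that the 1550 nm telecommunications wavelength corresponds to a photon energy of roughly 0.8 eV, inside the screened energy window.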
Discussion
Both the structure‐based and structure‐agnostic branches of materials property prediction come with inherent pros and cons. Current GNNs are limited to only one of the two branches owing to the little overlap between the material representations adopted in these two domains. In our unified approach, we aim not only to achieve state‐of‐the‐art materials property prediction performance in both domains but also to enable transferability between composition and crystal structure embeddings, which may lead to further research opportunities such as crystal structure prediction and structure prototype selection for DFT.
In support of this objective, we propose the formula graph, a systematic generalization of composition‐only and crystal structure dependent material representations. Our intuition is to denote individual atoms in a chemical formula as nodes in a graph. The only distinguishing factor between composition‐only and structure‐based formula graphs is the edge attribute, which is readily available as atomic spacings for the latter and predicted during training for the former. We construct a self‐attention driven message passing GNN and demonstrate that our model outperforms some previously reported models, irrespective of the representation domain, in predicting various material properties.
Moreover, our model displays better sample efficiency and learning efficiency compared to other models. This makes it a frontrunner for small data learning tasks that are abundant in materials science. Our deep learning baseline model produces respectable results in many tasks. We reckon standard CNNs still have some scope left in materials informatics, especially owing to the use of simple descriptors and easier implementation.
Finally, we expose Finder to the challenging task of predicting the frequency‐dependent dielectric constant of inorganic compounds. Subsequently, we identify promising low‐loss ENZ materials that are of technological importance, especially in optics and antenna engineering, demonstrating a real‐world materials discovery application.
Our framework is not restricted to stoichiometric compounds. It can represent alloys, non‐stoichiometric compounds, or doped substances by converting the fractional element contributions to integer values. However, very small doping ratios multiply the size of the formula graph and increase time and memory complexities, similar to how generating too large a supercell can make DFT calculations intractable. The same is true for alloys. As such, transforming a fractional formula of A0.33B0.67 to AB2 as its integer form makes more sense than a naïve conversion to A33B67.
We view Finder as a potential distance matrix predictor that may help discover new crystal structures. This may be achieved, for example, by minimizing the error between the predicted edge attribute and the actual distance between the corresponding atoms, instead of predicting a global material property. Alternatively, one might attempt to find a general mapping from EAM to distance matrix given Z. However, the existence of such a function is not known a priori. Yet another potential avenue of improvement is transfer learning from structure‐based Finder to a structure‐agnostic task, as opposed to the same‐domain transfer learning common in materials informatics. Nonetheless, these are recognized as future research directions. We believe domain‐invariant frameworks such as Finder, which incorporate methodological successes from other disciplines including NLP and computer vision, inaugurate a truly interdisciplinary avenue of research in materials science.
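The conversion from fractional to integer formulas discussed above can be sketched with rational approximation. This is one possible approach under stated assumptions, not necessarily how Finder does it: the function name is ours, and `max_denominator` is a hypothetical knob that trades formula fidelity against formula graph size.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def to_integer_formula(fractions_by_element, max_denominator=10):
    """Convert fractional element contributions to small integer counts."""
    # Approximate each fraction by the closest ratio with a bounded denominator.
    ratios = {el: Fraction(x).limit_denominator(max_denominator)
              for el, x in fractions_by_element.items()}
    # Scale all ratios by the least common multiple of their denominators.
    common = reduce(lambda a, b: a * b // gcd(a, b),
                    (r.denominator for r in ratios.values()), 1)
    counts = {el: int(r * common) for el, r in ratios.items()}
    # Reduce the counts to the smallest equivalent integer formula.
    divisor = reduce(gcd, counts.values())
    return {el: n // divisor for el, n in counts.items()}

compact = to_integer_formula({"A": 0.33, "B": 0.67})
naive = to_integer_formula({"A": 0.33, "B": 0.67}, max_denominator=100)
print(compact, naive)
```

With the small denominator bound the formula graph has three nodes, whereas the naïve conversion would need one hundred.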
Experimental Section
The architecture and hyperparameters of Finder were tuned to an adequate level by heuristically selecting a pool of hyperparameters that allowed sufficient degrees of freedom yet remained computationally tolerable. The authors eventually settled on an architecture composed of two message passing layers followed by a postprocessing residual neural network with one convolutional‐1D layer and four dense layers having 512, 1024, 1024, and 256 units, respectively. The element embeddings, each having a dimension of 200, were adopted from ref. [71]. It was found that keeping the same dimension through the message passing layers improves the performance, although this was exposed as a user‐specified hyperparameter equivalent to the output shape of the preprocessing weight matrix $W$.
The function approximator networks $\phi_e$ and $\phi_m$ contained two hidden layers carrying 128 and 64 units. The $F_Q$, $F_K$, and $F_V$ networks were all composed of individual weight matrices with a tunable output dimension defaulting to 200. Our global attention pooling component had pre‐ and postprocessing layers, each having 256 units. $L_2$ regularization of $10^{-6}$ was applied to all weights. Tensor clipping was employed to avoid the exploding gradient problem, which is one of the pitfalls of the robust $L_1$ loss despite its high empirical performance.
Rectified linear units (ReLU) were used as the hidden layer activation function, with a linear activation for the output layer. We used the Adam optimiser with an initial learning rate of $3 \times 10^{-4}$, which was reduced by a factor of 0.999 at every iteration to allow finer convergence. Structure‐agnostic models usually converged within 500 epochs with a batch size of 128, while structure‐based models required about 1000 epochs to converge. Per‐epoch timing for the E database was about 35 s for the former and 55 s for the latter on an RTX 2080 Ti graphics processing unit. Finder is implemented in Keras on top of the Spektral graph deep learning library.
A cutoff distance of 4 Å, as recommended in MEGNet, was used to derive the crystal graph. The structure‐based edge attribute was of length 20, obtained from the Gaussian expansion of the spatial distance with the basis $\exp(-(r - r_0)^2/\sigma^2)$ centered at 20 equidistant points between 0 and 5, where $\sigma = 0.5$.
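The Gaussian expansion of an interatomic distance into a length‐20 edge attribute can be sketched as follows. This is an illustrative reimplementation rather than the Finder or MEGNet code: the function name `gaussian_expand` is hypothetical, and it is assumed that both endpoints (0 and 5) are included among the 20 equidistant centers.

```python
import math


def gaussian_expand(r, r_min=0.0, r_max=5.0, n_centers=20, sigma=0.5):
    """Expand a scalar distance r on a Gaussian basis exp(-(r - r0)^2 / sigma^2).

    The centers r0 are n_centers equidistant points between r_min and r_max,
    endpoints included (an assumption of this sketch).
    """
    step = (r_max - r_min) / (n_centers - 1)
    centers = [r_min + i * step for i in range(n_centers)]
    return [math.exp(-((r - r0) ** 2) / sigma ** 2) for r0 in centers]


if __name__ == "__main__":
    edge_attribute = gaussian_expand(2.1)  # hypothetical distance value
    print(len(edge_attribute))
    print(max(range(20), key=edge_attribute.__getitem__))  # index of nearest center
```

Each distance thus becomes a smooth, fixed‐length vector that peaks at the center closest to the distance, which is generally easier for a network to consume than a raw scalar.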
Roost, MEGNet, and CGCNN models were trained with the recommended parameters from their respective repositories. A small batch size of 24 was investigated in addition to the default value of 128 for all structure‐based models. The models were trained for 1200 epochs or until stopped by an early stopping criterion. It was noted that because MEGNet discards crystal graphs with isolated atoms, its training set size in this work is slightly smaller (67814) relative to the full training set size (68699). However, the authors retained the default cut‐off distance of 4 Å because increasing this value to 6 Å degrades the performance, as found in the original work, while increasing the computational complexity almost threefold. According to the power law, such a marginal difference in training set sizes at that scale should have a negligible effect on the performance.
The RF_Magpie model adopted the implementation from scikit‐learn with default parameters, and the Magpie features were acquired from the Matminer package.
The ResCNN model was optimized to have four convolutional‐1D layers containing 64, 128, 256, and 256 filters, respectively. A global max pooling layer was then placed to reduce the dimensionality and introduce local translation invariance. A set of postprocessing dense layers similar to that of Finder was appended with skip connections to complete the deep learning baseline architecture. The default parameters of Finder and ResCNN are given in Tables S1 and S2, Supporting Information, respectively.
Statistical Analysis
The training, validation, and test sets of the MP datasets used in this work were kept intact for all algorithms to allow fair comparison. The target property values were z‐score normalized based on training data as follows
$$\hat{x} = \frac{x - \mu_{tr}}{\sigma_{tr}}$$
where $\hat{x}$ is the normalized value of the original quantity $x$, and $\mu_{tr}$ and $\sigma_{tr}$ represent the mean and standard deviation of the training data, respectively. The predictions were denormalized accordingly. The performance metrics MAE, RMSE, $R^2$ score, and $r$ were obtained using the scikit‐learn Python package.
A t‐test was performed to calculate the two‐tailed p‐value between two observed means $\bar{x}_1$ and $\bar{x}_2$. The t‐statistic ($t$) is calculated as follows
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where $n_1$ and $n_2$ are the number of samples in each group. In this experiment, $n_1 = n_2 = 3$. $s_1$ and $s_2$ are the two standard deviations, and $s_1/\sqrt{n_1}$ and $s_2/\sqrt{n_2}$ denote the standard errors. The significance level ($\alpha$) was set to 0.05. The p‐value was calculated in Python using the scipy.stats package. The dielectric constant data was preprocessed to have a fixed dimension of 3000 points per sample. This corresponded to 3000 equally spaced points of photon energy ranging from 0 to 30 eV.
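The normalization and the t‐statistic described above can be sketched with the Python standard library alone. The helper names are hypothetical, and the MAE values in the usage example are made up for illustration. Two assumptions are made: the population standard deviation is used for normalization, and the unpooled standard error is used in the t‐statistic, which coincides with the pooled form when both groups have the same size. The two‐tailed p‐value itself would come from a t‐distribution, for instance via `scipy.stats.ttest_ind`, which is not reproduced here.

```python
import math
import statistics


def fit_zscore(train_values):
    """Return the mean and standard deviation of the training targets."""
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values)  # population std (assumption)
    return mu, sigma


def normalize(x, mu, sigma):
    """z-score normalization using training statistics only."""
    return (x - mu) / sigma


def denormalize(z, mu, sigma):
    """Map a normalized prediction back to the original scale."""
    return z * sigma + mu


def t_statistic(group_1, group_2):
    """Two-sample t-statistic with an unpooled standard error."""
    n1, n2 = len(group_1), len(group_2)
    var1 = statistics.variance(group_1)  # sample variance s_1^2
    var2 = statistics.variance(group_2)  # sample variance s_2^2
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (statistics.fmean(group_1) - statistics.fmean(group_2)) / standard_error


if __name__ == "__main__":
    mu, sigma = fit_zscore([1.0, 2.0, 3.0])
    z = normalize(2.5, mu, sigma)
    print(z, denormalize(z, mu, sigma))
    # Hypothetical MAEs from three runs of two models (n1 = n2 = 3).
    print(t_statistic([0.050, 0.052, 0.051], [0.060, 0.061, 0.059]))
```

Fitting the statistics on the training split only, and reusing them for validation and test targets, avoids leaking information from held‐out data into the normalization.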
Conflict of Interest
The authors declare no conflict of interest.
Author Contributions
A.I. designed the machine learning framework, performed data analysis, and wrote the paper. Y.H. directed and coordinated the research. All authors discussed the results and reviewed the manuscript.
Code Availability
The codes required to reproduce the results of this study are available at https://github.com/ihalage/Finder.