Duong-Nguyen Nguyen (1), Tien-Lam Pham (1, 2), Viet-Cuong Nguyen (3), Tuan-Dung Ho (1), Truyen Tran (4), Keisuke Takahashi (5), Hieu-Chi Dam (1, 5, 6).
Abstract
A method has been developed to measure the similarity between materials with respect to specific physical properties. The information obtained can be used to understand the underlying mechanisms and to support the prediction of the physical properties of materials. The method consists of three steps: variable evaluation based on nonlinear regression, regression-based clustering, and similarity measurement with a committee machine constructed from the clustering results. Three data sets of well-characterized crystalline materials, represented by critical atomic predicting variables, are used as test beds. Herein, the focus is on the formation energy, lattice parameter and Curie temperature of the examined materials. Based on the similarity information obtained, a hierarchical clustering technique is applied to learn cluster structures of the materials that facilitate interpretation of the mechanisms, and the regression models used to predict the physical properties of the materials are improved. The experiments show that rational and meaningful group structures can be obtained and that the prediction accuracy for the materials' physical properties can be significantly increased, confirming the rationality of the proposed similarity measure.
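The third step, similarity measurement with a committee machine, can be illustrated with a short sketch. It assumes that each partition model produced by the regression-based clustering is stored as a list of cluster labels (one per material) and that each model "votes" that two materials are similar when it places them in the same cluster, so the affinity is the fraction of votes (a co-association matrix). This voting rule, the function name `committee_affinity` and the toy labels are assumptions for illustration; the paper's exact committee rule may differ.

```python
from itertools import combinations


def committee_affinity(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of partition
    models (committee members) that place materials i and j in the same cluster."""
    if not partitions:
        raise ValueError("need at least one partition model")
    n = len(partitions[0])
    votes = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("every partition must label the same materials")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:  # this committee member votes "similar"
                votes[i][j] += 1
                votes[j][i] += 1
    m = len(partitions)
    return [[1.0 if i == j else votes[i][j] / m for j in range(n)] for i in range(n)]


# Hypothetical example: three partition models over five materials.
partitions = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
affinity = committee_affinity(partitions)
for row in affinity:
    print([round(v, 2) for v in row])
```

Materials 0 and 1 are grouped together by all three hypothetical models and receive an affinity of 1.0, whereas materials 0 and 3 are never grouped together and receive 0.0.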
Keywords: data mining; first-principles calculations; machine learning; materials informatics; physical properties of materials; similarity
Year: 2018; PMID: 30443367; PMCID: PMC6211525; DOI: 10.1107/S2052252518013519
Source DB: PubMed; Journal: IUCrJ; ISSN: 2052-2525; Impact factor: 4.769
Figure 1. The data flow in our proposed method for measuring similarity between materials with respect to specific target physical properties, expressed in the MapReduce representation language. The process consists of two subprocesses: (a) an exhaustive test of all predicting-variable combinations, from which we select the best combinations, yielding the most likely regression models, and (b) use of the regression-based clustering technique to search for partition models that break the data set down into separate smaller data sets, so that the target variable in each can be predicted by a different linear model. Taking an ensemble average of the models yielded in (a) gives a prediction model with higher predictive accuracy. We use the partition models obtained in (b) to construct a committee machine that votes on the similarity between materials.
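Subprocess (b), regression-based clustering, can be sketched as a clusterwise linear regression in the spirit of k-means: fit one linear model per group, reassign every material to the model that predicts it best, and repeat until the assignment stops changing. This is a minimal sketch under simplifying assumptions: a single predicting variable, plain least squares, and a deterministic initialisation that splits materials by their residual from one global fit. The paper works with multiple predicting variables and its search procedure may differ; `fit_line` and `regression_clustering` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # all x identical: fall back to a constant model
        return 0.0, my
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_clustering(xs, ys, k, max_iter=100):
    """Split (x, y) pairs into k groups, each described by its own linear model."""
    n = len(xs)
    global_model = fit_line(xs, ys)

    def predict(model, x):
        return model[0] * x + model[1]

    def fit_all(labels):
        models = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                models.append(fit_line([xs[i] for i in members], [ys[i] for i in members]))
            else:  # empty group: reuse the global model
                models.append(global_model)
        return models

    # Assumed initialisation: rank points by their residual from one global
    # fit and cut the ranking into k equal-sized bins.
    resid = [y - predict(global_model, x) for x, y in zip(xs, ys)]
    order = sorted(range(n), key=lambda i: resid[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n

    for _ in range(max_iter):
        models = fit_all(labels)
        # Reassign each point to the model with the smallest squared residual.
        new_labels = [
            min(range(k), key=lambda c: (y - predict(models[c], x)) ** 2)
            for x, y in zip(xs, ys)
        ]
        if new_labels == labels:
            break
        labels = new_labels

    models = fit_all(labels)
    sse = sum((y - predict(models[c], x)) ** 2 for x, y, c in zip(xs, ys, labels))
    return labels, models, sse


# Hypothetical data: two groups obeying different linear laws.
xs = list(range(10)) * 2
ys = [2 * x + 1 for x in range(10)] + [30 - x for x in range(10)]
labels, models, sse = regression_clustering(xs, ys, k=2)
print(labels, models, sse)
```

Like k-means, this alternating scheme only reaches a local optimum, so in practice it is usually run from several initialisations and the partition with the lowest error is kept.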
The designed predicting variables describing the intrinsic properties of the constituent elements and the structural properties of the materials in the E_form prediction problem
The A and B elements comprise the AB materials with a binary cubic structure identical to that of the symmetry group.
| Category | Predicting variables |
|---|---|
| Atomic properties of | |
| Atomic properties of | |
| Structural information | |
Figure 2. The number of predicting-variable combinations that yield prediction models with R² larger than 0.90 for each problem: (a) prediction of E_form for the AB materials, (b) prediction of L_const for the b.c.c. AB materials and (c) prediction of the magnetic phase-transition temperature T_C for the rare earth–transition metal alloys.
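The exhaustive test counted in Figure 2 can be sketched as follows: every non-empty combination of predicting variables is scored by the leave-one-out R² of a Gaussian kernel regression, and combinations scoring above 0.90 are kept. Assumptions: GKR is taken here to be Gaussian kernel ridge regression with fixed, hand-picked hyperparameters `gamma` and `lam`; the paper's hyperparameters and validation scheme are not given in this section, and all helper names are hypothetical. A small pure-Python linear solver keeps the sketch dependency-free.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def gauss_kernel(u, v, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def loo_r2(X, y, cols, gamma=10.0, lam=1e-3):
    """Leave-one-out R^2 of Gaussian kernel ridge regression on the columns `cols`."""
    pts = [[row[c] for c in cols] for row in X]
    n = len(y)
    preds = []
    for i in range(n):
        train = [j for j in range(n) if j != i]
        K = [[gauss_kernel(pts[a], pts[b], gamma) + (lam if a == b else 0.0)
              for b in train] for a in train]
        alpha = solve(K, [y[j] for j in train])
        preds.append(sum(al * gauss_kernel(pts[i], pts[j], gamma)
                         for al, j in zip(alpha, train)))
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def exhaustive_search(X, y, threshold=0.90):
    """Score every non-empty variable combination and keep those above `threshold`."""
    p = len(X[0])
    scores = {cols: loo_r2(X, y, cols)
              for r in range(1, p + 1) for cols in combinations(range(p), r)}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, [cols for cols, s in ranked if s > threshold]


# Hypothetical data: variable 0 determines the target, variable 1 is uninformative.
X = [[i / 19, 0.5] for i in range(20)]
y = [i / 19 for i in range(20)]
ranked, good = exhaustive_search(X, y)
print(ranked)
print(good)
```

Because the number of combinations grows as 2^p - 1 for p predicting variables, this brute-force search is only practical when p is modest.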
Figure 3. (a) The affinity matrix between the AB materials yielded by the regression-based committee voting machine. (b) Enlarged views of highly similar elements in the G1 and G2 regions of the affinity matrix, shown with dashed lines in panel (a). (c) Confusion matrices measuring linear similarities among materials in G1 and G2, as well as dissimilarities between models generated for materials in different groups.
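The confusion matrices in panel (c) compare how well a linear model fitted to one group describes another group. A minimal sketch, assuming each group is summarised by one ordinary least-squares line on a single predicting variable and that the entry for a pair of groups is the R² of one group's model evaluated on the other group's data; the paper's exact scoring may differ, and `cross_group_r2` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # all x identical: fall back to a constant model
        return 0.0, my
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2(ys, preds):
    """Coefficient of determination of `preds` against observed `ys`."""
    mean = sum(ys) / len(ys)
    ss_res = sum((t - p) ** 2 for t, p in zip(ys, preds))
    ss_tot = sum((t - mean) ** 2 for t in ys)
    return 1.0 - ss_res / ss_tot


def cross_group_r2(groups):
    """Entry [i][j] is the R^2 of the linear model fitted on group i,
    evaluated on the data of group j."""
    models = [fit_line(xs, ys) for xs, ys in groups]
    return [[r2(ys, [slope * x + icpt for x in xs]) for xs, ys in groups]
            for slope, icpt in models]


# Hypothetical groups that follow different linear relations.
xs = [0, 1, 2, 3, 4]
groups = [
    (xs, [2 * x + 1 for x in xs]),  # plays the role of "G1"
    (xs, [10 - x for x in xs]),     # plays the role of "G2"
]
matrix = cross_group_r2(groups)
for row in matrix:
    print([round(v, 3) for v in row])
```

High diagonal entries with low (here negative) off-diagonal entries indicate groups that each follow their own linear relation, which is the pattern the confusion matrices are meant to reveal.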
Prediction accuracy (PA) values for the E_form, L_const and T_C prediction problems
The results obtained with and without using the similarity measure (SM) information are shown for comparison. MAE values are given in eV for E_form, Å for L_const and K for T_C.
| Prediction method | Metric | E_form, without SM | E_form, with SM | L_const, without SM | L_const, with SM | T_C, without SM | T_C, with SM |
|---|---|---|---|---|---|---|---|
| GKR with all variables | R² | 0.929 | 0.954 | 0.982 | 0.986 | 0.893 | 0.929 |
| | MAE | 0.189 | 0.154 | 0.022 | 0.018 | 78.80 | 58.09 |
| GKR with the best variable combination | R² | 0.967 | 0.978 | 0.989 | 0.992 | 0.968 | 0.988 |
| | MAE | 0.122 | 0.110 | 0.014 | 0.013 | 42.74 | 25.76 |
| Ensemble of GKRs with top selected best variable combinations | R² | 0.972 | 0.982 | 0.991 | 0.992 | 0.974 | 0.991 |
| | MAE | 0.117 | 0.101 | 0.013 | 0.011 | 37.87 | 24.16 |
Figure 4. (From left to right) Observed versus predicted target variables for the ensemble average of the 139 (E_form problem), 57 (L_const problem) and 59 (T_C problem) best prediction models that include similarity measure information. The ensemble models yield PAs with R² scores of 0.982 (MAE: 0.101 eV) for the E_form problem, 0.992 (MAE: 0.011 Å) for the L_const problem and 0.991 (MAE: 24.16 K) for the T_C problem.
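The ensemble averaging behind Figure 4, together with the two PA metrics reported in the table (R² and MAE), can be illustrated in a few lines. This sketch assumes an unweighted mean of the member models' predictions; whether the paper weights its members is not stated here, and the observed values and predictions below are hypothetical.

```python
def ensemble_average(predictions):
    """Average the predictions of several models, material by material."""
    m = len(predictions)
    return [sum(vals) / m for vals in zip(*predictions)]


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed values and predictions from three member models.
y_true = [1.0, 2.0, 3.0, 4.0]
preds = [
    [1.2, 1.9, 3.3, 3.8],
    [0.9, 2.2, 2.8, 4.1],
    [0.9, 1.9, 2.9, 4.1],
]
avg = ensemble_average(preds)
for p in preds + [avg]:
    print(round(r2_score(y_true, p), 4), round(mae(y_true, p), 4))
```

In this contrived example the members' errors cancel almost exactly; in general, averaging helps only to the extent that the member models' errors are not strongly correlated.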
The designed predicting variables describing the intrinsic properties of the constituent elements and the structural properties of the materials in the lattice parameter prediction problem
A and B are elements of the binary AB b.c.c. materials.
| Category | Predicting variables |
|---|---|
| Atomic properties of metals | |
| Atomic properties of metals | |
| Structural and additional information | ρ, |
Figure 5. (a) The similarity matrix between materials for the L_const prediction problem yielded by the regression-based committee voting machine. This similarity matrix can be approximated as three disjoint groups of materials denoted G1, G2 and G3. (b) Confusion matrices measuring linear similarities among materials in each group, as well as dissimilarities between models generated for materials in different groups.
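Reading disjoint groups such as G1, G2 and G3 off a similarity matrix can be sketched with average-linkage agglomerative clustering, one common form of the hierarchical clustering mentioned in the abstract. Assumptions: the similarity matrix is symmetric with values in [0, 1], the number of groups `k` is chosen by the user, and average linkage is an illustrative choice; the paper's linkage criterion and stopping rule are not stated in this section. The matrix below is hypothetical.

```python
def average_linkage(sim, k):
    """Agglomerative clustering on a similarity matrix: repeatedly merge the pair
    of clusters with the highest average pairwise similarity until k remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical similarity matrix for six materials with a block structure.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
    [0.9, 1.0, 0.85, 0.0, 0.1, 0.0],
    [0.8, 0.85, 1.0, 0.1, 0.0, 0.2],
    [0.1, 0.0, 0.1, 1.0, 0.95, 0.1],
    [0.0, 0.1, 0.0, 0.95, 1.0, 0.2],
    [0.1, 0.0, 0.2, 0.1, 0.2, 1.0],
]
groups = average_linkage(sim, k=3)
print(groups)
```

Recording the merge order instead of stopping at `k` would yield the full dendrogram, from which groups at any similarity level can be read.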
The designed predicting variables describing the intrinsic properties of the constituent elements and the structural properties in the problem of predicting T_C for the rare earth–transition metal alloys
| Category | Predicting variables |
|---|---|
| Atomic properties of transition metals | |
| Atomic properties of rare earth metals | |
| Structural information | |
Figure 6. (a) The similarity matrix between the rare earth–transition metal alloys yielded by the regression-based committee voting machine. (b) Enlarged views of highly similar elements in the G1, G2 and G3 regions of the similarity matrix, shown with dashed lines in panel (a). (c) Confusion matrices measuring linear similarities among alloys in each group, as well as dissimilarities between models generated for alloys in different groups.