Yong Zhao1, Yuxin Cui1, Zheng Xiong1, Jing Jin1, Zhonghao Liu1, Rongzhi Dong2, Jianjun Hu1,2.
Abstract
Structural information of materials, such as crystal systems and space groups, is highly useful for analyzing their physical properties. However, the enormous composition space of materials makes experimental X-ray diffraction (XRD) or first-principles-based structure determination methods infeasible for large-scale material screening in the composition space. Herein, we propose and evaluate machine-learning algorithms for determining the structure type of materials, given only their compositions. We couple random forest (RF) and multilayer perceptron (MLP) neural network models with three types of features: Magpie, atom vector, and one-hot encoding (atom frequency) for the crystal system and space group prediction of materials. Four types of models for predicting crystal systems and space groups are proposed, trained, and evaluated: one-versus-all binary classifiers, multiclass classifiers, polymorphism predictors, and multilabel classifiers. The synthetic minority over-sampling technique (SMOTE) is applied to mitigate the effects of imbalanced data sets. Our results demonstrate that RF with Magpie features generally outperforms other algorithms for binary and multiclass prediction of crystal systems and space groups, while MLP with atom frequency features performs best for structural polymorphism prediction. For multilabel prediction, MLP with atom frequency and binary relevance with Magpie models are the best for predicting crystal systems and space groups, respectively. Our analysis of the related descriptors identifies a few key contributing features for structural-type prediction, such as electronegativity, covalent radius, and Mendeleev number. Our work thus paves the way for fast composition-based structural screening of inorganic materials via predicted material structural properties.
Year: 2020 PMID: 32118175 PMCID: PMC7045551 DOI: 10.1021/acsomega.9b04012
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
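The atom frequency features named in the abstract can be illustrated with a short sketch. It assumes that the encoding is a fixed-length vector of per-element atomic fractions; the record does not state whether raw or normalized counts are used, and the element list `ELEMENTS` and the helper names below are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, truncated element vocabulary (assumption for this sketch).
# A real encoding would cover every element that occurs in the dataset.
ELEMENTS = ["H", "Li", "O", "Na", "Si", "Ti", "Fe", "Sr"]

# An element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Count atoms in a flat formula such as 'SrTiO3' (no parentheses)."""
    counts = Counter()
    for symbol, amount in _TOKEN.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    return counts


def atom_frequency(formula, elements=ELEMENTS):
    """Return the atomic fraction of each element in `elements`."""
    counts = parse_formula(formula)
    unknown = set(counts) - set(elements)
    if unknown:
        raise ValueError(f"elements not in vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    return [counts[e] / total for e in elements]


if __name__ == "__main__":
    vector = atom_frequency("SrTiO3")
    print(dict(zip(ELEMENTS, vector)))
```

Such a vector depends only on the composition, which is the setting the abstract describes: no structural input is needed at prediction time.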
Figure 3. Distribution of crystal systems in the dataset.
Performance of RF for Predicting Crystal Systems
| crystal system | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| Cubic | 0.753/0.538 | 0.775/0.457 | |
| Hexagonal | 0.647/0.374 | 0.704/0.433 | |
| Monoclinic | 0.670/0.360 | 0.730/0.467 | |
| Orthorhombic | 0.611/0.297 | 0.705/0.425 | |
| Tetragonal | 0.654/0.388 | 0.723/0.477 | |
| Triclinic | 0.686/0.412 | 0.644/0.337 | |
| Trigonal | 0.616/0.320 | 0.703/0.436 | |
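The two scores reported in these tables, the F1-score and the Matthews correlation coefficient (MCC), can both be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions for a single one-versus-all classifier; the labels are made up for illustration, and whether the published F1 values are averaged over classes is not stated in this record.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient, in [-1, 1]; 0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up labels: 1 = "cubic", 0 = "not cubic".
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(f1_score(tp, fp, fn), mcc(tp, fp, fn, tn))
```

Unlike the F1-score, the MCC also uses the true negatives, which makes it the more conservative of the two on imbalanced classes.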
Performance of MLP for Predicting Crystal Systems
| crystal system | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| Cubic | 0.815/0.632 | 0.805/0.612 | |
| Hexagonal | 0.774/0.553 | 0.741/0.486 | |
| Monoclinic | 0.699/0.399 | 0.698/0.396 | |
| Orthorhombic | 0.692/0.385 | 0.689/0.380 | |
| Tetragonal | 0.767/0.536 | 0.743/0.488 | |
| Triclinic | 0.663/0.331 | 0.676/0.353 | |
| Trigonal | 0.701/0.409 | 0.705/0.412 | |
Performance of RF for Predicting Crystal Systems by Over-Sampling
| crystal system | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| Cubic | 0.846/0.693 | 0.779/0.557 | 0.777/0.556 |
| Hexagonal | 0.808/0.622 | 0.714/0.428 | 0.674/0.361 |
| Monoclinic | 0.750/0.500 | 0.707/0.418 | 0.725/0.450 |
| Orthorhombic | 0.739/0.485 | 0.667/0.336 | 0.698/0.405 |
| Tetragonal | 0.803/0.613 | 0.720/0.441 | 0.695/0.409 |
| Triclinic | 0.714/0.429 | 0.690/0.383 | 0.707/0.419 |
| Trigonal | 0.742/0.494 | 0.680/0.360 | 0.693/0.402 |
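The over-sampling used for these results is SMOTE, which creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that idea, not the implementation used in the study; the function name `smote`, its parameters, and the toy points are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


if __name__ == "__main__":
    # Toy two-dimensional feature vectors for an under-represented class.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, n_new=3, k=2):
        print(point)
```

Over-sampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.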
Performance of MLP for Predicting Crystal Systems by Over-Sampling
| crystal system | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| Cubic | 0.806/0.613 | 0.788/0.575 | 0.820/0.640 |
| Hexagonal | 0.752/0.507 | 0.717/0.435 | 0.759/0.518 |
| Monoclinic | 0.701/0.410 | 0.696/0.393 | 0.731/0.463 |
| Orthorhombic | 0.682/0.365 | 0.678/0.358 | 0.727/0.454 |
| Tetragonal | 0.749/0.501 | 0.725/0.450 | 0.757/0.517 |
| Triclinic | 0.677/0.368 | 0.682/0.366 | 0.705/0.410 |
| Trigonal | 0.683/0.372 | 0.686/0.372 | 0.722/0.445 |
Figure 1. Ranking of Magpie features for crystal system prediction.
Performance for Multiclass Prediction of the Crystal System
| model | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| RF | 0.511/0.445 | 0.575/0.511 | |
| MLP | 0.559/0.486 | 0.559/0.489 | 0.615/0.551 |
| RF-oversample | 0.644/0.585 | 0.524/0.448 | 0.562/0.494 |
| MLP-oversample | 0.509/0.424 | 0.541/0.469 | 0.598/0.533 |
Performance for Crystal System Polymorphism Prediction
| model | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| RF | 0.652/0.350 | 0.610/0.293 | 0.668/0.354 |
| MLP | 0.646/0.308 | 0.642/0.289 | |
| RF-oversample | 0.670/0.343 | 0.636/0.272 | 0.672/0.348 |
| MLP-oversample | 0.646/0.304 | 0.633/0.267 | 0.699/0.399 |
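A polymorphism predictor needs a binary target for each composition. One plausible construction, assumed here because this record does not give the authors' exact labeling procedure, is to mark a composition as polymorphic when it occurs with more than one crystal system. The records and function names below are illustrative only.

```python
from collections import defaultdict

# Illustrative (formula, crystal system) records; not taken from the dataset.
RECORDS = [
    ("TiO2", "tetragonal"),
    ("TiO2", "orthorhombic"),
    ("NaCl", "cubic"),
    ("SiO2", "trigonal"),
    ("SiO2", "hexagonal"),
    ("SiO2", "tetragonal"),
]


def label_sets(records):
    """Group the observed crystal systems by composition."""
    systems = defaultdict(set)
    for formula, system in records:
        systems[formula].add(system)
    return dict(systems)


def polymorphism_labels(records):
    """Return 1 if a composition has more than one crystal system, else 0."""
    return {f: int(len(s) > 1) for f, s in label_sets(records).items()}


if __name__ == "__main__":
    print(polymorphism_labels(RECORDS))
```

The grouped sets returned by `label_sets` could also serve as targets for a multilabel classifier, where each composition is assigned every crystal system it is observed in.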
Performance for Multilabel Crystal System Prediction
| metric | AF + MLP | Magpie + BR | Magpie + CC | Magpie + LP |
|---|---|---|---|---|
| exact MR | 0.579 | 0.469 | 0.534 | |
| accuracy | 0.631 | 0.504 | 0.568 | |
| precision | 0.660 | 0.531 | 0.601 | |
| recall | 0.516 | 0.574 | 0.649 | |
| F1-score | 0.650 | 0.516 | 0.580 | |
AF = atom frequency, BR = BinaryRelevance, CC = ClassifierChain, LP = LabelPowerset.
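The row names in this table correspond to common example-based multilabel metrics. The sketch below assumes the usual set-based definitions (exact match ratio, Jaccard-style accuracy, and per-example precision, recall, and F1 averaged over examples); the record does not confirm that the authors used exactly these formulas, and the label sets are made up.

```python
def multilabel_metrics(y_true, y_pred):
    """Example-based multilabel metrics for lists of label sets."""
    n = len(y_true)
    exact = acc = prec = rec = f1 = 0.0
    for t, p in zip(y_true, y_pred):
        inter = len(t & p)
        union = len(t | p)
        exact += 1.0 if t == p else 0.0
        # Two empty sets agree perfectly, so they score 1.0.
        acc += inter / union if union else 1.0
        prec += inter / len(p) if p else 0.0
        rec += inter / len(t) if t else 0.0
        f1 += 2 * inter / (len(t) + len(p)) if (t or p) else 1.0
    return {
        "exact MR": exact / n,
        "accuracy": acc / n,
        "precision": prec / n,
        "recall": rec / n,
        "F1-score": f1 / n,
    }


if __name__ == "__main__":
    truth = [{"cubic"}, {"cubic", "tetragonal"}, {"monoclinic"}]
    preds = [{"cubic"}, {"cubic"}, {"triclinic"}]
    for name, value in multilabel_metrics(truth, preds).items():
        print(f"{name}: {value:.3f}")
```

The exact match ratio is the strictest of the five, since a prediction earns credit only when the whole label set is correct.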
Average Performance for Predicting the Space Group Using RF and MLP
| model | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| RF | 0.765/ | 0.649/0.365 | 0.722/0.470 |
| MLP | 0.751/0.507 | 0.729/0.461 | |
| RF-oversample | 0.787/0.579 | 0.726/0.454 | 0.725/0.459 |
| MLP-oversample | 0.743/0.493 | 0.718/0.437 | 0.753/0.508 |
Performance for Multi-Class Prediction of Space Groups
| model | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| RF | 0.519/0.501 | 0.576/0.556 | |
| MLP | 0.571/0.540 | 0.540/0.517 | 0.616/0.591 |
| RF-oversample | 0.643/0.619 | 0.531/0.505 | 0.566/0.543 |
| MLP-oversample | 0.557/0.528 | 0.525/0.502 | 0.597/0.573 |
Performance for Space Group Polymorphism Prediction
| model | Magpie (F1-score/MCC) | atom vector (F1-score/MCC) | atom frequency (F1-score/MCC) |
|---|---|---|---|
| RF | 0.610/0.273 | 0.540/0.147 | 0.614/0.253 |
| MLP | 0.582/0.205 | 0.591/0.190 | |
| RF-oversample | 0.651/0.305 | 0.607/0.218 | 0.635/0.275 |
| MLP-oversample | 0.626/0.267 | 0.597/0.198 | 0.663/0.326 |
Figure 2. Magpie feature importance ranking for space group polymorphism prediction.
Performance for Multilabel Space Group Prediction Using MLP
| metric | AF + MLP | Magpie + BR | Magpie + CC | Magpie + LP |
|---|---|---|---|---|
| exact MR | 0.569 | 0.446 | 0.472 | |
| accuracy | 0.612 | 0.467 | 0.491 | |
| precision | 0.633 | 0.485 | 0.510 | |
| recall | 0.634 | 0.472 | 0.495 | |
| F1-score | 0.626 | 0.474 | 0.498 | |
AF = atom frequency, BR = BinaryRelevance, CC = ClassifierChain, LP = LabelPowerset.
Distribution of Materials with Respect to the No. of Elements
| no. of elements | no. of compounds |
|---|---|
| 2 | 14,026 |
| 3 | 41,751 |
| 4 | 22,798 |
| 5 | 6,585 |
| 6 | 874 |
| 7 | 67 |
| 8 | 5 |
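The skew in this distribution is easier to see as proportions. The short calculation below uses only the counts listed in the table; the total is simply the sum of the rows shown and is not a figure quoted from the paper.

```python
# Number of compounds per number of constituent elements, from the table.
COUNTS = {2: 14026, 3: 41751, 4: 22798, 5: 6585, 6: 874, 7: 67, 8: 5}

TOTAL = sum(COUNTS.values())
SHARES = {n: c / TOTAL for n, c in COUNTS.items()}

if __name__ == "__main__":
    print(f"total compounds in the listed rows: {TOTAL}")
    for n, share in SHARES.items():
        print(f"{n} elements: {share:.1%}")
```

Ternary and quaternary compounds together make up roughly three quarters of the listed entries, while compounds with six or more elements account for only about one percent.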
Figure 4. Distribution of space groups in the dataset.