Abstract
Models that can rapidly predict material properties enable the screening of large numbers of materials and facilitate the design of new materials. One of the leading challenges for computational researchers is determining how best to analyze large material data sets to identify models that can rapidly predict a given property. In this paper, we demonstrate the use of genetic programming to generate simple models of dielectric breakdown based on 82 representative dielectric materials. We identified the band gap Eg and phonon cut-off frequency ωmax as the two most relevant features, and uncovered new classes of models featuring functions of Eg and ωmax. The genetic programming approach was found to outperform other approaches for generating models, and we discuss some of the advantages of this approach.
Year: 2017 PMID: 29242566 PMCID: PMC5730619 DOI: 10.1038/s41598-017-17535-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Eight feature properties related to dielectric breakdown strength.
| Name | Symbol | Name | Symbol |
|---|---|---|---|
| Band Gap | Eg | Dielectric constant | |
| Phonon frequency (max) | ωmax | Dielectric constant (electron) | |
| Phonon frequency (mean) | | Nearest Neighbor Distance | |
| Density | | Bulk Modulus | |
Figure 1. The frequency with which each of the eight features appears on the Pareto frontier (parameter f). Error bars are from eight parallel Eureqa runs.
Figure 2. Distribution of the frequency of occurrence (parameter p) of products of two features.
Figure 3. Average (a) RMSE and (b) PCC performance of models on the total input data (training plus validation datasets) obtained with MAE, RMSE, and PCC optimization, compared with the LASSO solution. The error bars are the standard deviations calculated by averaging over 16 parallel Eureqa runs.
Figure 4. Average (a) RMSE and (b) PCC performance of models on the validation and test sets obtained with MAE, RMSE, and PCC optimization, compared with the LASSO solution. The error bars are the standard deviations calculated by averaging over 16 parallel Eureqa runs.
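
The three metrics named in the captions of Figures 3 and 4 (MAE, RMSE, and PCC) can be computed as in the minimal sketch below. It assumes NumPy and uses made-up arrays. The function names `mae`, `rmse`, and `pcc` are illustrative and are not taken from Eureqa or from the paper.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pcc(y_true, y_pred):
    """Pearson correlation coefficient; NaN when either input is constant."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    dt = y_true - y_true.mean()
    dp = y_pred - y_pred.mean()
    denom = np.sqrt(np.sum(dt ** 2) * np.sum(dp ** 2))
    if denom == 0.0:
        # Zero variance in one input: the correlation is undefined.
        return float("nan")
    return float(np.sum(dt * dp) / denom)

# Made-up reference and predicted values, for illustration only.
y_ref = np.array([1.0, 2.0, 3.0, 4.0])
y_model = np.array([1.5, 2.5, 3.5, 4.5])
print(mae(y_ref, y_model), rmse(y_ref, y_model), pcc(y_ref, y_model))

# A constant predictor has no variance, so its PCC is undefined.
print(pcc(y_ref, np.full(4, 2.0)))
```

A model that predicts a single constant has zero variance, so its PCC is undefined. This is consistent with the N/A entries reported for the complexity-1 model in the performance table.
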
Figure 5. Complexity versus number of appearances (Nc) plots after parameter re-optimization based on training and validation data. Higher coverage indicates a better model as evaluated against the test data. Zero coverage represents (a) the highest RMSE, 1.27 ln(MV/m) and (b) the lowest PCC, 0.61. Full coverage represents (a) the lowest RMSE, 0.39 ln(MV/m) and (b) the highest PCC, 0.90.
Figure 6. Comparison of RMSE performance of models on (a) training + validation and (b) test data between LASSO, the universal Pareto frontier and models from all Pareto frontiers. The arrow in (a) indicates the point at which there is a relatively large change in the slope along the Pareto frontier.
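
The idea of a Pareto frontier over model complexity and error, as used in Figure 6, can be illustrated with a short sketch. It assumes a candidate is kept only if no other candidate is at most as complex and at most as erroneous while being strictly better in at least one of the two. The function name `pareto_frontier` is hypothetical, and the (complexity, RMSE) pairs are the all-data values listed in the performance table of this record. This is an illustration of the concept, not Eureqa's internal procedure.

```python
def pareto_frontier(candidates):
    """Return the non-dominated (complexity, error) pairs, sorted by complexity."""
    frontier = []
    best_error = float("inf")
    # Sorting by (complexity, error) puts the best model at each complexity first.
    for complexity, error in sorted(candidates):
        # Keep a candidate only if it beats every simpler (or equally simple) one.
        if error < best_error:
            frontier.append((complexity, error))
            best_error = error
    return frontier

# (complexity, RMSE on all data) pairs copied from the performance table.
models = [(1, 639), (3, 524), (5, 325), (7, 321), (8, 248), (10, 235),
          (13, 233), (14, 229), (15, 227), (17, 225), (19, 224), (20, 217),
          (21, 189), (22, 174), (25, 169)]
lasso = (15, 692)  # the LASSO solution (S4) from the same table

frontier = pareto_frontier(models + [lasso])
print(lasso in frontier)
print(frontier == models)
```

With these numbers the LASSO solution is dominated by the genetic-programming model of the same complexity, so it does not appear on the frontier.
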
Figure 7. Dielectric breakdown strength Fb predicted by machine learning and density functional theory (DFT) for (a) the model with complexity 9 on the universal Pareto frontier and (b) the LASSO solution. Blue squares represent the predictions for training and validation data and red circles represent the predictions for test data. The black solid line indicates a perfect match between machine learning and DFT.
The performance in predicting dielectric breakdown strength on all data ε, training data ε, and validation data ε for models on the universal Pareto frontier constructed using training, validation, and test data.
| Complexity | Model | Benchmark | All data | Training data | Validation data |
|---|---|---|---|---|---|
| 1 | 395 | RMSE | 639 | 627 ± 148 | 614 ± 157 |
| | | PCC | N/A | N/A | N/A |
| 3 | 34.7 | RMSE | 524 | 492 ± 183 | 487 ± 188 |
| | | PCC | 0.58 | 0.63 ± 0.17 | 0.63 ± 0.17 |
| 5 (S1) | | RMSE | 325 | 321 ± 48 | 320 ± 48 |
| | | PCC | 0.87 | 0.86 ± 0.05 | 0.85 ± 0.07 |
| 7 | | RMSE | 321 | 319 ± 51 | 315 ± 53 |
| | | PCC | 0.87 | 0.86 ± 0.05 | 0.86 ± 0.06 |
| 8 (S2) | | RMSE | 248 | 248 ± 51 | 238 ± 51 |
| | | PCC | 0.92 | 0.91 ± 0.05 | 0.91 ± 0.05 |
| 10 (S3) | | RMSE | 235 | 235 ± 45 | 226 ± 43 |
| | | PCC | 0.93 | 0.91 ± 0.05 | 0.92 ± 0.05 |
| 13 | | RMSE | 233 | 231 ± 50 | 223 ± 49 |
| | | PCC | 0.93 | 0.92 ± 0.05 | 0.92 ± 0.05 |
| 14 | | RMSE | 229 | 227 ± 50 | 221 ± 48 |
| | | PCC | 0.93 | 0.92 ± 0.05 | 0.92 ± 0.05 |
| 15 | | RMSE | 227 | 227 ± 49 | 217 ± 47 |
| | | PCC | 0.94 | 0.92 ± 0.05 | 0.92 ± 0.05 |
| 17 | | RMSE | 225 | 226 ± 44 | 215 ± 44 |
| | | PCC | 0.94 | 0.92 ± 0.04 | 0.93 ± 0.05 |
| 19 | | RMSE | 224 | 226 ± 43 | 214 ± 42 |
| | | PCC | 0.94 | 0.92 ± 0.04 | 0.93 ± 0.05 |
| 20 | | RMSE | 217 | 219 ± 37 | 210 ± 35 |
| | | PCC | 0.94 | 0.93 ± 0.04 | 0.93 ± 0.05 |
| 21 | | RMSE | 189 | 185 ± 38 | 165 ± 42 |
| | | PCC | 0.96 | 0.95 ± 0.04 | 0.96 ± 0.03 |
| 22 | | RMSE | 174 | 184 ± 32 | 156 ± 39 |
| | | PCC | 0.96 | 0.95 ± 0.04 | 0.96 ± 0.03 |
| 25 | | RMSE | 169 | 165 ± 38 | 166 ± 36 |
| | | PCC | 0.96 | 0.96 ± 0.03 | 0.95 ± 0.03 |
| 15 (S4) (LASSO) | | RMSE | 692 | 674 ± 242 | 614 ± 259 |
| | | PCC | 0.74 | 0.78 ± 0.08 | 0.79 ± 0.10 |
Figure 8. RMSE performance of models on all data (i.e., training + validation + test) for LASSO, the universal Pareto frontier, and models from all Pareto frontiers trained on all data when using dielectric breakdown strength as the output value.
Figure 9. Contour plot of dielectric breakdown strength Fb predicted by machine learning along with a scatter plot of the values calculated by density functional theory (DFT) for (a) solution S1, (b) solution S2, (c) solution S3 and (d) solution S4 (the LASSO solution). The circles are training and validation data, and the squares are test data. The circles and squares share the same color-coding scheme as the contour plot. The black dashed lines indicate contour levels labeling machine-learning-predicted Fb values at 200, 500, 1000, 2000, and 4000 MV/m.