Frits Daeyaert, Fengdan Ye, Michael W Deem.
Abstract
We report a machine-learning strategy for design of organic structure directing agents (OSDAs) …
Keywords: OSDA; machine learning; neural network; zeolite beta
Year: 2019 PMID: 30733290 PMCID: PMC6397530 DOI: 10.1073/pnas.1818763116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Top two sets of hyperparameters selected from models 1–4
| Model | Number of intensities | | | | Total number of weights | | | | |
| 1a | 24 | 0.500 | 49 | 5 | 256 | 1.52 (0.03) | 1.79 (0.07) | 1.45 | 1.41 |
| 1b | 8 | 0.500 | 17 | 8 | 153 | 1.59 (0.02) | 1.75 (0.06) | 1.52 | 1.47 |
| 2a | 24 | 0.500 | 49 | 4 | 205 | 1.66 (0.04) | 1.83 (0.08) | 1.50 | 1.65 |
| 2b | 8 | 0.500 | 17 | 8 | 153 | 1.68 (0.02) | 1.84 (0.07) | 1.59 | 1.59 |
| 3a | 8 | 0.500 | 17 | 2 | 39 | 1.61 (0.07) | 1.68 (0.14) | 1.50 | 1.64 |
| 3b | 32 | 0.500 | 65 | 1 | 68 | 1.55 (0.04) | 1.75 (0.13) | 1.55 | 1.68 |
| 4a | 32 | 0.500 | 65 | 5 | 336 | 1.90 (0.05) | 1.92 (0.07) | 1.87 | 1.87 |
| 4b | 24 | 0.250 | 97 | 2 | 199 | 1.91 (0.05) | 1.95 (0.09) | 1.88 | 1.89 |
The values in parentheses are the corresponding SDs.
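The "Total number of weights" values in the table above (256, 153, 205, ...) can be checked arithmetically. The sketch below assumes, since the column headers were lost, that the two integer columns immediately before the total are the number of input nodes and the number of hidden nodes, and that each model is a fully connected network with one hidden layer, one output node, and a bias on every hidden and output node. These are assumptions, not statements from the source; they happen to reproduce all eight reported totals. The function name `total_weights` is hypothetical.

```python
def total_weights(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Count weights plus biases of a fully connected net with one hidden layer."""
    hidden_layer = (n_inputs + 1) * n_hidden   # +1 accounts for each hidden node's bias
    output_layer = (n_hidden + 1) * n_outputs  # +1 accounts for each output node's bias
    return hidden_layer + output_layer


# (assumed input nodes, assumed hidden nodes, reported total) copied from the table rows
rows = {
    "1a": (49, 5, 256),
    "1b": (17, 8, 153),
    "2a": (49, 4, 205),
    "2b": (17, 8, 153),
    "3a": (17, 2, 39),
    "3b": (65, 1, 68),
    "4a": (65, 5, 336),
    "4b": (97, 2, 199),
}

for model, (n_in, n_hid, reported) in rows.items():
    computed = total_weights(n_in, n_hid)
    print(f"{model}: computed={computed} reported={reported} match={computed == reported}")
```

If the assumed architecture were wrong (for example, no bias terms), the computed totals would not match the reported ones, so the agreement is at least consistent with this reading of the columns.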
Fig. 1. Scatter plots of MD-calculated versus ML-predicted stabilization energies for the OSDAs in the validation set for the eight models (A–H). Models 1a and 1b were trained on all compounds without weighting. Models 2a and 2b were trained on all compounds with weighting. Compared with models 1a and 1b, models 2a and 2b predict OSDAs with MD-calculated energies below −15 kJ/(mol Si) more accurately. Models 3a and 3b were trained on charged compounds only, without weighting. No charged OSDA has an MD-calculated energy below −17.5 kJ/(mol Si), which limited the ability of the neural network to find favorable OSDAs. Models 4a and 4b used a linear activation function in the output node.
Fig. 2. Results for OSDA design using model 1b. (A) The top five molecules produced. The molecule scores in this figure are the ML-determined binding energies in kJ/(mol Si). (B) Proposed synthesis route to the first molecule shown in A. The outcome of the synthesis route is listed together with the acronym of the reaction used (ALKYLATENP), as well as the structures and catalog names of the proposed reagents.
Best OSDA found with its ML-predicted and MD-calculated stabilization energy, number of compounds with an ML-predicted stabilization energy below −15 kJ/(mol Si), the total number of molecules for which the stabilization energy was predicted, and the total number of unique molecules generated in each run
For each of the eight in silico materials design runs: the number of compounds with ML-predicted energies below −15 kJ/(mol Si); the number of compounds with ML-predicted energies between −15 and −14 kJ/(mol Si) and, among these, the number with MD-calculated energies below −17 kJ/(mol Si); the number of TP; and the prediction precision
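| Model | ML-predicted E ≤ −15 (number of MD energies) | ML-predicted E between −15 and −14 (number and share with MD-calculated E below −17) | TP (precision) |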
| 1a | 1,058 (1,054) | 839 (32, 3.8%) | 812 (76.7%) |
| 1b | 1,179 (1,177) | 625 (6, 0.9%) | 865 (73.4%) |
| 2a | 836 (832) | 696 (33, 4.7%) | 690 (82.5%) |
| 2b | 910 (908) | 550 (14, 2.5%) | 672 (73.8%) |
| 3a | 1,857 (1,840) | 915 (60, 6.6%) | 727 (39.1%) |
| 3b | 1,280 (1,280) | 1,204 (104, 8.6%) | 660 (51.6%) |
| 4a | 712 (695) | 827 (34, 4.1%) | 538 (75.6%) |
| 4b | 599 (599) | 805 (57, 7.1%) | 484 (80.8%) |
In the TP column, the value in parentheses is the prediction precision, defined as TP/(number with ML-predicted E ≤ −15) ≡ TP/(TP + FP), where TP is true positive and FP is false positive.
In the second column, the value in parentheses is the number of MD energies obtained, as some MD evaluations failed.
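The precision definition above, TP/(TP + FP) with the denominator taken as the number of compounds whose ML-predicted energy is at or below −15 kJ/(mol Si), can be verified against the table rows. The following is a minimal sketch; the function name `precision_percent` and the dictionary layout are illustrative choices, and the numbers are copied from the table.

```python
def precision_percent(tp: int, n_predicted_positive: int) -> float:
    """Precision in percent: true positives over all predicted positives (TP + FP)."""
    return 100.0 * tp / n_predicted_positive


# model: (number with ML-predicted E <= -15, TP, reported precision in %)
runs = {
    "1a": (1058, 812, 76.7),
    "1b": (1179, 865, 73.4),
    "2a": (836, 690, 82.5),
    "2b": (910, 672, 73.8),
    "3a": (1857, 727, 39.1),
    "3b": (1280, 660, 51.6),
    "4a": (712, 538, 75.6),
    "4b": (599, 484, 80.8),
}

for model, (n_pos, tp, reported) in runs.items():
    computed = precision_percent(tp, n_pos)
    fp = n_pos - tp  # false positives implied by the definition
    print(f"{model}: TP={tp} FP={fp} precision={computed:.1f}% (reported {reported}%)")
```

Note that the reported percentages are reproduced when the denominator is the full count of predicted positives, not the smaller count of successful MD evaluations given in parentheses.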
Cross-section of the putative OSDAs generated in different runs with ML-predicted stabilization energies E ≤ −15 kJ/(mol Si)
| Run | 1a | 1b | 2a | 2b | 3a | 3b | 4a | 4b | In training set |
| 1a | 1,058 | 749 | 630 | 560 | 477 | 452 | 497 | 453 | 13 |
| 1b | | 1,179 | 585 | 691 | 445 | 446 | 402 | 384 | 10 |
| 2a | | | 836 | 565 | 386 | 374 | 419 | 435 | 11 |
| 2b | | | | 910 | 320 | 312 | 339 | 328 | 7 |
| 3a | | | | | 1,857 | 1,051 | 322 | 254 | 21 |
| 3b | | | | | | 1,280 | 354 | 311 | 12 |
| 4a | | | | | | | 712 | 386 | 17 |
| 4b | | | | | | | | 599 | 11 |
Total unique molecules: 3,062
Column 10 lists the number of molecules generated in one run that are present in the training or validation set.
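A table of this shape (run sizes on the diagonal, pairwise overlaps above it, a column counting molecules already in the training or validation set, and a total of unique molecules) can be produced with plain set operations. The sketch below uses a tiny made-up data set of SMILES-like strings; the function name `overlap_table`, the run labels, and the molecules are all hypothetical and are not the authors' data or code.

```python
from itertools import combinations


def overlap_table(runs, known):
    """Summarize overlaps between sets of generated molecules.

    runs  : dict mapping run name -> set of molecule identifiers
    known : set of identifiers present in the training or validation set
    """
    sizes = {name: len(mols) for name, mols in runs.items()}            # diagonal
    pairwise = {
        (a, b): len(runs[a] & runs[b]) for a, b in combinations(runs, 2)  # upper triangle
    }
    in_known = {name: len(mols & known) for name, mols in runs.items()}  # last column
    total_unique = len(set().union(*runs.values()))                      # footer line
    return sizes, pairwise, in_known, total_unique


# Toy example with made-up identifiers
runs = {
    "1a": {"CCN", "CCO", "CCC"},
    "1b": {"CCO", "CCC", "CNC"},
    "2a": {"CCC", "c1ccccc1"},
}
known = {"CCN"}

sizes, pairwise, in_known, total_unique = overlap_table(runs, known)
print(sizes)
print(pairwise)
print(in_known)
print(total_unique)
```

Because only each unordered pair of runs is counted once, the result fills the upper triangle of the table and leaves the lower triangle empty, as in the layout above.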