Zach Jensen, Edward Kim, Soonhyoung Kwon, Terry Z H Gani, Yuriy Román-Leshkov, Manuel Moliner, Avelino Corma, Elsa Olivetti.
Abstract
Zeolites are porous aluminosilicate materials with many industrial and "green" applications. Despite their industrial relevance, many aspects of zeolite synthesis remain poorly understood, requiring costly trial-and-error synthesis. In this paper, we create natural language processing techniques and text markup parsing tools to automatically extract synthesis information and trends from zeolite journal articles. We further engineer a data set of germanium-containing zeolites to test the accuracy of the extracted data and to discover potential opportunities for zeolites containing germanium. We also create a regression model that predicts a zeolite's framework density from the synthesis conditions. This model has a cross-validated root mean squared error of 0.98 T/1000 Å³, and many of the model decision boundaries correspond to known synthesis heuristics in germanium-containing zeolites. We propose that this automatic data extraction can be applied to many different problems in zeolite synthesis and enable novel zeolite morphologies.
Year: 2019 PMID: 31139725 PMCID: PMC6535764 DOI: 10.1021/acscentsci.9b00193
Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553
Figure 1. Schematic overview of zeolite data engineering, including (1) literature extraction from sources such as NLP from body text, parsing of HTML tables, and regex matching between text and tables; (2) regression modeling; and (3) zeolite structure prediction.
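The table-parsing and regex-matching steps named in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' pipeline: the class `TableParser`, the functions `parse_gel` and `extract_records`, the pattern `GEL_TERM`, and the sample HTML (including the placeholder product name "zeolite X") are all hypothetical, and only the Python standard library is used.

```python
import re
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML table as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # row currently being filled, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <sub> is appended to the same cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical pattern: a number followed by a species name, e.g. "0.25 GeO2".
GEL_TERM = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z][A-Za-z0-9]*)")


def parse_gel(text):
    """Map each species in a gel-composition string to its molar amount."""
    return {species: float(amount) for amount, species in GEL_TERM.findall(text)}


def extract_records(html):
    """Turn the first row into column names and parse the gel column of each row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    records = []
    for row in body:
        record = dict(zip(header, row))
        record["gel"] = parse_gel(record["gel composition"])
        records.append(record)
    return records


# Invented sample table; the values are placeholders, not data from the paper.
HTML = """
<table>
  <tr><th>sample</th><th>gel composition</th><th>product</th></tr>
  <tr><td>A</td><td>1 SiO<sub>2</sub> : 0.25 GeO<sub>2</sub> : 0.5 OSDA : 10 H<sub>2</sub>O</td><td>zeolite X</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in extract_records(HTML):
        gel = record["gel"]
        print(record["product"], "Si/Ge =", gel["SiO2"] / gel["GeO2"])
```

The pattern assumes every species is preceded by an explicit coefficient; a term written without one would need extra handling.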
Figure 2. Pairwise plot of gel composition data automatically extracted from zeolite tables.
Excerpt of the Data Set of Germanium-Containing Zeolitesᵃ
| Si/Ge | Si/H₂O | Si/F⁻ | OSDA | product | reference |
|---|---|---|---|---|---|
| 4 | 0.08 | 1.6 | 1,2-dimethyl-3-(3-methylbenzyl)imidazolium | CIT-13 | ( |
| 30 | 0.19 | 1.9 | hexamethonium | ITQ-13 | ( |
| 2 | 0.67 | 2.7 | benzyltriethylammonium | ITQ-44 | ( |
| 1 | 0.1 | 1 | 1-methyl-3-(2′-methylbenzyl)imidazolium | NUD-2 | ( |
| 7.5 | 0.13 | 1.76 | pentamethyldiethylenetriamine | amorph | ( |
ᵃ The full data set is available online (see Supporting Information).
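As a small illustration of how rows like those in the excerpt might be handled programmatically, the sketch below stores the five rows shown above and derives two quantities from them. It assumes the tabulated ratios are molar ratios; the class `Synthesis` and its property names are hypothetical and not part of the published data set. The reference column is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synthesis:
    """One row of the excerpt; ratios are assumed to be molar ratios."""
    si_ge: float
    si_h2o: float
    si_f: float
    osda: str
    product: str

    @property
    def ge_fraction(self) -> float:
        """Ge / (Si + Ge), which equals 1 / (1 + Si/Ge)."""
        return 1.0 / (1.0 + self.si_ge)

    @property
    def h2o_per_si(self) -> float:
        """H2O/Si, the reciprocal of the tabulated Si/H2O ratio."""
        return 1.0 / self.si_h2o


# The five rows of the excerpt, copied from the table above.
EXCERPT = [
    Synthesis(4, 0.08, 1.6, "1,2-dimethyl-3-(3-methylbenzyl)imidazolium", "CIT-13"),
    Synthesis(30, 0.19, 1.9, "hexamethonium", "ITQ-13"),
    Synthesis(2, 0.67, 2.7, "benzyltriethylammonium", "ITQ-44"),
    Synthesis(1, 0.1, 1, "1-methyl-3-(2′-methylbenzyl)imidazolium", "NUD-2"),
    Synthesis(7.5, 0.13, 1.76, "pentamethyldiethylenetriamine", "amorph"),
]

# Keep only syntheses that yielded a named framework rather than an amorphous solid.
crystalline = [s for s in EXCERPT if s.product != "amorph"]

if __name__ == "__main__":
    for s in sorted(crystalline, key=lambda s: s.ge_fraction, reverse=True):
        print(f"{s.product:8s} Ge/(Si+Ge) = {s.ge_fraction:.2f}  H2O/Si = {s.h2o_per_si:.1f}")
```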
Figure 3. Germanium-containing zeolite data extracted with our pipeline. (a) Framework density clusters corresponding to different classes of germanium-containing zeolites. (b) Trade-off between Ge content and the amount of F⁻ ions required to stabilize different zeolites. The three-letter codes refer to specific zeolite framework structures defined by the IZA. ADOR is an interzeolite transformation synthesis method.[73]
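For readers unfamiliar with the unit used for framework density (T/1000 Å³), it is conventionally the number of tetrahedrally coordinated framework atoms (T atoms, here Si or Ge) per 1000 Å³ of unit-cell volume. The symbols N_T (T atoms per unit cell) and V_cell (unit-cell volume in Å³) are notation introduced here for illustration and do not appear in the source:

```latex
\mathrm{FD} = \frac{1000\, N_T}{V_{\mathrm{cell}}}
\qquad \left[\text{T atoms per } 1000\ \text{\AA}^3\right]
```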
Figure 4. Random forest regression model predicting zeolite framework density from synthesis conditions. (a) Cross-validation results for the random forest model showing the actual experimental vs model-predicted values for framework density. (b) A single decision tree regression model trained to predict framework density. Sample values correspond to the percentage of data passing through a node. Density refers to the average framework density value passing through each node. Vol SDA = the volume of the OSDA.
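The node annotations in Figure 4b (percentage of samples and average density per node) and the cross-validated error in Figure 4a can be illustrated with a much simpler stand-in. The sketch below is not the paper's random forest: it fits a depth-1 regression tree (a "stump") on one feature, using only the standard library. The function names are hypothetical, and the data are invented toy values, not results from the paper.

```python
import math
from statistics import mean

# Hypothetical toy data: (Si/Ge ratio, framework density in T/1000 Å^3).
# These numbers are invented for illustration and are NOT from the paper.
DATA = [
    (1.0, 12.0), (2.0, 12.5), (3.0, 13.0), (4.0, 13.5),
    (10.0, 17.0), (15.0, 17.5), (20.0, 18.0), (30.0, 18.5),
]


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(rows):
    """Fit a depth-1 regression tree by choosing the threshold with minimal SSE."""
    rows = sorted(rows)
    best = None
    for (x_lo, _), (x_hi, _) in zip(rows, rows[1:]):
        if x_lo == x_hi:
            continue  # no threshold can separate identical feature values
        threshold = (x_lo + x_hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, left, right)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, threshold, left, right = best
    n = len(rows)

    def node(ys):
        # Same two annotations as in the figure: share of samples, mean density.
        return {"samples_pct": 100 * len(ys) / n, "density": mean(ys)}

    return {"threshold": threshold, "left": node(left), "right": node(right)}


def predict(stump, x):
    """Return the mean density of the leaf that x falls into."""
    side = "left" if x <= stump["threshold"] else "right"
    return stump[side]["density"]


def cross_validated_rmse(rows, k=4):
    """k-fold cross-validation: each row is predicted by a stump that never saw it."""
    squared_errors = []
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        stump = fit_stump(train)
        squared_errors += [(predict(stump, x) - y) ** 2 for x, y in test]
    return math.sqrt(mean(squared_errors))


if __name__ == "__main__":
    print(fit_stump(DATA))
    print("cross-validated RMSE:", cross_validated_rmse(DATA))
```

A random forest averages many deeper trees, each grown on a resampled subset of the data and features, so its decision boundaries and error will differ from this single-split model.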