
Discovering Relationships between OSDAs and Zeolites through Data Mining and Generative Neural Networks.

Zach Jensen1, Soonhyoung Kwon2, Daniel Schwalbe-Koda1, Cecilia Paris3, Rafael Gómez-Bombarelli1, Yuriy Román-Leshkov2, Avelino Corma3, Manuel Moliner3, Elsa A Olivetti1.   

Abstract

Organic structure directing agents (OSDAs) play a crucial role in the synthesis of micro- and mesoporous materials, especially zeolites. Despite the wide use of OSDAs, their interaction with zeolite frameworks is poorly understood, and researchers rely on synthesis heuristics or computationally expensive techniques to predict whether an organic molecule can act as an OSDA for a given zeolite. In this paper, we take a data-driven approach to uncover generalized OSDA–zeolite relationships using a comprehensive database of 5,663 synthesis routes for porous materials. To build this database, we use natural language processing and text mining techniques to extract OSDAs, zeolite phases, and gel chemistry from the scientific literature published between 1966 and 2020. Through structural featurization of the OSDAs using weighted holistic invariant molecular (WHIM) descriptors, we relate OSDAs described in the literature to different types of cage-based, small-pore zeolites. Lastly, we adapt a generative neural network capable of suggesting new molecules as potential OSDAs for a given zeolite structure and gel chemistry. We apply this model to CHA and SFW zeolites, generating several alternative OSDA candidates to those currently used in practice. These molecules are further vetted with molecular mechanics simulations to show that the model generates physically meaningful predictions. Our model can automatically explore the OSDA space, reducing the amount of simulation or experimentation needed to find new OSDA candidates.
© 2021 The Authors. Published by American Chemical Society.


Year:  2021        PMID: 34079901      PMCID: PMC8161479          DOI: 10.1021/acscentsci.1c00024

Source DB:  PubMed          Journal:  ACS Cent Sci        ISSN: 2374-7943            Impact factor:   14.553


Introduction

Zeolites and related zeotype materials are crystalline, microporous materials extensively used in a variety of industrial applications.[1−3] Among their physicochemical properties, the crystalline structure and building unit geometry are critical in determining their suitability for a target application, because they govern structure-dependent molecular shape selectivity, diffusivity, and confinement. Although there are over 250 recognized zeolite structures,[4] the exact mechanisms of zeolite nucleation and crystallization are still not fully understood,[5−8] which makes a priori prediction of the zeolite phase obtained from a given set of initial conditions inexact and difficult. For this reason, the discovery of new zeolite structures has historically relied on trial-and-error synthesis guided by accumulated human knowledge and chemical intuition.[9] Variables known to influence zeolite formation include the types and amounts of framework atoms, mineralizing agents, and inorganic/organic structure directing agents.[1,9,10] Organic structure directing agent (OSDA) molecules play a crucial role in zeolite synthesis. Their effects range from charge balancing and space filling to a templating, lock-and-key relationship with the framework.[11] As a result, OSDA specificity varies widely: some OSDAs can crystallize many different zeolite phases, while others direct the formation of only a limited number of phases.
The size, flexibility, hydrophilicity, and charge of the OSDA, among other factors, play an important role in zeolite crystallization kinetics and phase specificity.[12−14] Indeed, experimental heuristics within the zeolite community associate larger OSDAs with larger zeolite pores, and more rigid OSDAs with higher specificity (i.e., formation of fewer zeolite phases), although designing OSDAs from these heuristics remains challenging.[12] Researchers have used computational approaches, including density functional theory and molecular dynamics, to suggest candidate OSDAs for specific zeolite structures,[15−17] but these approaches are typically limited to a single zeolite system, are computationally expensive, and focus on pure-silica systems. More recently, a strategy involving the “ab initio” design of OSDAs that mimic the transition states of industrially relevant catalytic reactions has gained attention,[18,19] but this technique relies on computationally expensive density functional theory calculations, hindering its widespread implementation. More efficient and comprehensive modeling approaches are therefore needed to advance OSDA design. Data-driven approaches have been used to study porous materials,[20−23] but data-driven zeolite synthesis studies are limited in scope and rely on overly simplified OSDA–zeolite interactions that cannot capture the complexity of the system.
Machine learning (ML) and data mining have been used in studies that do not require explicitly modeling the OSDA–zeolite interaction, including studies of specific zeolites within limited regions of chemical space,[24,25] OSDA-free zeolite systems,[26] and interzeolite transformations.[27] Studies that have attempted to model this interaction either simplify the OSDA representation to basic properties such as molecular volume[28] or are limited to a single zeolite structure,[16] suggesting that more advanced ML techniques and larger data sets are needed to better model the OSDA–zeolite relationship.[29] The literature provides a comprehensive data set of known OSDA–zeolite pairs, and recent studies have provided natural language processing (NLP) frameworks that can be adapted to extract OSDA, zeolite, and chemistry information, including all the elemental species present in the synthesis gel.[28,30−32] Combining literature-extracted data with advanced ML models of the OSDA–zeolite relationship could expand the scope of data-driven zeolite studies. ML also enables inverse design of both porous materials[33−35] and organic molecules. One approach to inverse design uses generative neural network models, which have been successful in many applications including drug discovery,[36] property optimization,[37] synthesis prediction,[38] and molecular design.[39,40] These models typically learn a latent representation of organic molecules by compressing the training data into a multidimensional Gaussian distribution and learning to reconstruct molecules from vectors sampled from it. This latent space can then be explored to generate novel organic molecules that resemble the training distribution.
These new samples are then converted into standard molecular representations such as the “simplified molecular-input line-entry system” (SMILES) format.[41] Recent models have been trained to generate molecules directly from a molecule’s physical and chemical properties.[42] During inference, researchers input the desired properties and generate molecules that possess them. Such a model can be adapted to zeolite data by training it to generate organic molecules from specific zeolite structures and gel chemistry. In this paper, we use a data-driven approach to examine the relationships between OSDAs, qualitative gel chemistry, and resulting zeolite structures. We present an exhaustive data set of OSDAs, zeolites, and qualitative synthesis conditions extracted through NLP and text mining techniques. We use structural descriptors of the OSDAs to reduce the dimensionality of the chemical space and visualize trends in the crystallization of certain zeolites. Finally, we adapt a generative neural network model, trained on this extracted data set, to suggest potential OSDA molecules conditioned on specific zeolite structures and synthesis conditions. The data, models, and resulting analyses provide opportunities for the community to further expedite zeolite research and represent an important first step toward a high-throughput zeolite research pipeline.

Results and Discussion

Extracted Data Set

We extract a data set of OSDAs, gel chemistry, and zeolite phases from across the zeolite literature using automated techniques.[28,30,31] The data set covers articles from over 15 publishers and 140 journals published between 1966 and 2020. It contains 5,663 synthesis routes from 1,384 articles, each recording the OSDA, the qualitative synthesis gel components, and the resulting zeolite phase. In total, the data set covers 758 distinct OSDA molecules and 205 zeolite phases. Of the synthesis routes, 3,085 describe traditional zeolites (pure Si, Si/Al, and Si/B frameworks), 1,274 describe aluminophosphate (AlPO)-type materials, and the remaining 1,304 describe other zeotypes, including germanium-based and metal-containing (Ti, Sn, or Zr, among others) microporous structures. The distributions of zeolite structures and synthesis gel chemistry in the data set are shown in Figures S1 and S2. Figure 1a shows the distribution of average molecular volume for the OSDAs in the data set; volumes range from about 30 to 1000 Å³. Figure 1a also shows that larger OSDAs are associated with the synthesis of zeolites rather than AlPO-type materials. This observation is consistent with the weaker correlation observed experimentally between organic molecules and the pores/cages of AlPO-type materials[12] and with the limited stability of large-pore AlPO-type materials compared with their aluminosilicate counterparts. This limited stability has largely precluded heuristic studies of AlPO-type synthesis with bulky and expensive OSDA molecules.
Figure 1

Overview of the automatically extracted data set. (a–c) Average molecular volume, OSDA specificity, and charge distributions for all OSDAs in the data set. (d) Shows the five OSDAs known to make the most zeolite structures. (e) Shows the five zeolites that can be made with the most OSDAs.

The majority of the OSDAs have high specificity, producing fewer than 5 zeolite phases, while a few outliers can make more than 20 phases (Figure 1b). These low-specificity OSDAs are typically small, simple alkylammonium cations, such as tetramethylammonium (TMA) or tetraethylammonium (TEA), shown in Figure 1d. These molecules act as space fillers that provide charge balance to the framework and generally do not exert a true templating effect. Other low-specificity OSDAs, such as hexamethonium, are highly flexible, with many rotatable bonds (Figure 1d). The zeolites that have been obtained experimentally with the largest number of different organic molecules are MFI, MTW, *BEA, CHA, and MOR (Figure 1e). These topologies, along with FAU and FER, are among the most widely used in industrial applications and have therefore attracted more fundamental research aimed at improving their physicochemical properties and cost effectiveness.[43] The number and distribution of ionic charges within the OSDAs play an important role in the nucleation and crystallization processes and, together with the presence or absence of alkali cations, are crucial for positioning the negatively charged heteroatoms at specific framework sites. Heteroatom location has been shown to drastically alter the catalytic properties of the materials.[44−46] In zeolite synthesis, most OSDAs carry one or two positive charges, generally as mono- or dicationic ammonium species (Figure 1c).[12−14] Although neutral amines have also been reported for the synthesis of zeolite-type materials, these molecules mostly act as pore fillers.
In contrast, AlPO-type materials are preferentially synthesized with amines as OSDAs (blue bar at zero charge in Figure 1c), which are protonated in the neutral or acidic media of a typical AlPO-type synthesis gel.
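OSDA specificity, as plotted in Figure 1b, can be read as the number of distinct zeolite phases reported for each OSDA. The sketch below assumes, hypothetically, that each extracted synthesis route is available as an (OSDA identifier, IZA code) pair. The routes shown are made-up placeholders, not entries from the data set.

```python
from collections import defaultdict


def osda_specificity(routes):
    """Number of distinct zeolite phases reported for each OSDA.

    routes: iterable of (osda, iza_code) pairs, one per synthesis route;
    repeated routes to the same phase are counted once.
    """
    phases = defaultdict(set)
    for osda, zeolite in routes:
        phases[osda].add(zeolite)
    return {osda: len(codes) for osda, codes in phases.items()}


def specificity_histogram(specificity):
    """How many OSDAs make exactly k phases, for each observed k (cf. Figure 1b)."""
    hist = defaultdict(int)
    for k in specificity.values():
        hist[k] += 1
    return dict(sorted(hist.items()))


# Hypothetical routes (placeholder OSDA names, not entries from the data set).
routes = [
    ("osda_A", "MFI"), ("osda_A", "*BEA"), ("osda_A", "MFI"),
    ("osda_B", "CHA"),
    ("osda_C", "CHA"), ("osda_C", "AEI"), ("osda_C", "LEV"),
]
spec = osda_specificity(routes)
print(spec)                         # prints {'osda_A': 2, 'osda_B': 1, 'osda_C': 3}
print(specificity_histogram(spec))  # prints {1: 1, 2: 1, 3: 1}
```

Counting phases through a set means duplicate reports of the same OSDA/zeolite pair do not inflate an OSDA's apparent versatility.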

Literature-Mined OSDA/Zeolite Correlations

Because of the complex interactions between OSDAs and the resulting zeolite frameworks (Figures S3 and S4), simple descriptors such as molecular volume and/or flexibility parameters (e.g., nConf20)[47] are insufficient to describe links between specific OSDAs and zeolite structures. We also consider nonstructural properties of the OSDAs and their effect on zeolite structure (Figure S5), but these features also describe the OSDA–zeolite relationship inadequately. To capture molecular shape matching, we need a more informative structural descriptor that captures not only the size of the molecule but also other structural features such as folding and charge distribution. Weighted holistic invariant molecular (WHIM)[48] descriptors meet this need. They encode the size, shape, symmetry, and atom distribution of a molecule and depend on its three-dimensional conformation. Depending on the flexibility of the molecule, different conformations can have drastically different WHIM representations. For example, a long linear molecule can either stretch out or fold, giving two different three-dimensional representations (Figure S6). To address this, we calculate a conformation-averaged WHIM descriptor from geometries obtained with RDKit,[49] capturing the range of three-dimensional shapes each molecule can adopt. Because WHIM is a high-dimensional descriptor, we use principal component analysis (PCA) to reduce the dimensionality of the WHIM descriptor space and enable visualization of all the OSDAs in the data set. The first principal component (PCA 1) accounts for 58% of the variance and correlates with molecular volume, as it contains the WHIM features corresponding to the longest axial dimension and the global dimension of the molecule. The second principal component (PCA 2) accounts for 15% of the variance and is composed of global dimension and symmetry features.
The third principal component (PCA 3) accounts for 13% of the variance and has many contributing features, including all three axial dimensions and the global dimensions (Figures S7 and S8). Comparing the PCA WHIM projection of the OSDAs with a sample of the broader organic chemical space highlights how limited the chemical space of known OSDAs is (Figure S9). These dimensionally reduced WHIM descriptors reveal relationships between OSDAs and zeolite phases. We select five cage-based small-pore zeolites, LEV, CHA, AEI, LTA, and AFX (Figure 2a), to evaluate OSDA–zeolite correlations through WHIM featurization and PCA (Figure 2b,c). In cage-based zeolites, the three-dimensional structure of the OSDA correlates strongly with the shape of the cage, which makes these zeolites good candidates for this analysis. Since gel composition also affects the relationship between the OSDA and the zeolite, we use the extracted qualitative synthesis gel information to restrict the data set to syntheses of the selected zeolites that use conventional zeolite chemistry. We also explore this relationship for selected large-pore zeolites to examine how well the approach generalizes to other zeolite systems (Figure S10).
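The featurization pipeline described above can be sketched in a few lines of NumPy. It computes WHIM descriptors per conformer, averages them over conformers, and then applies PCA. This is a minimal illustration, not the authors' code. It computes only a handful of unweighted WHIM-style quantities: the eigenvalues λ1 ≥ λ2 ≥ λ3 of the coordinate covariance matrix, the shape ratios, and the global T, A, V, and K indices. RDKit's `rdMolDescriptors.CalcWHIM`, by contrast, returns the full descriptor vector with several atomic weighting schemes. The PCA step follows the standardize-then-project procedure given in the Experimental Section, written here with an SVD instead of scikit-learn. The function names and random coordinates are hypothetical.

```python
import numpy as np


def whim_subset(coords, weights=None):
    """A few WHIM-style descriptors for one 3D conformer.

    coords: (n_atoms, 3) Cartesian coordinates in angstrom.
    weights: optional per-atom weights; unit weights if None.
    Returns [lambda1, lambda2, lambda3, nu1, nu2, T, A, V, K].
    """
    x = np.asarray(coords, dtype=float)
    w = np.ones(len(x)) if weights is None else np.asarray(weights, dtype=float)
    centered = x - np.average(x, axis=0, weights=w)
    # Weighted 3 x 3 covariance matrix of the atomic coordinates.
    cov = (centered * w[:, None]).T @ centered / w.sum()
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]      # size along principal axes
    total = lam.sum()                                  # T: global size
    nu = lam / total                                   # directional shape ratios
    area = lam[0] * lam[1] + lam[0] * lam[2] + lam[1] * lam[2]  # A
    volume = np.prod(1.0 + lam) - 1.0                  # V
    shape_k = 0.75 * np.abs(nu - 1.0 / 3.0).sum()      # K: 0 spherical, 1 linear
    return np.array([*lam, nu[0], nu[1], total, area, volume, shape_k])


def average_whim(conformers):
    """Average the descriptor vector over all conformers of one molecule."""
    return np.mean([whim_subset(c) for c in conformers], axis=0)


def standardized_pca(X, n_components=3):
    """Standardize each feature, then project onto the leading principal axes.

    Returns (scores, explained_variance_ratio). Component signs are arbitrary.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                   # leave constant features unscaled
    Z = (X - X.mean(axis=0)) / std
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    return Z @ vt[:n_components].T, ratio[:n_components]


# Hypothetical usage: 6 "molecules", each with 5 random 10-atom conformers.
rng = np.random.default_rng(0)
molecules = [[rng.normal(size=(10, 3)) for _ in range(5)] for _ in range(6)]
X = np.vstack([average_whim(confs) for confs in molecules])
scores, ratio = standardized_pca(X, n_components=3)
print(scores.shape, ratio.round(2))
```

The averaging step addresses the stretched-versus-folded problem. A linear four-atom chain has K = 1, while the same atoms folded into a square have K = 0.5. A molecule that can adopt both shapes therefore gets an intermediate descriptor rather than an arbitrary one.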
Figure 2

Principal component analysis (PCA) WHIM vector representation of OSDA molecules used in five cage-based small-pore zeolite systems. PCA 1, 2, and 3 represent the first three principal component axes. The gray points represent all of the OSDAs extracted from the literature.

For these five zeolites, Figure 2 shows that each zeolite topology is associated with specific and distinct OSDA characteristics. The cluster locations differ, particularly along PCA 2 and PCA 3, likely because differences in cage size and shape require OSDAs with different molecular structures. The OSDAs for LTA show larger variability in the PCA coordinates than those for the other zeolites (purple crosses in Figure 2b). High-silica LTA has preferentially been reported with large aromatic molecules[50,51] (Figure S11). Low-silica LTA, by contrast, has been synthesized with small organic molecules such as tetramethylammonium or diethyldimethylammonium (Figure S11), which act as pore fillers in combination with additional alkali cations. This difference in OSDA size between high- and low-silica LTA is likely responsible for the large variability of the LTA cluster. The CHA and AEI clusters lie very close together in the PCA WHIM representation (blue diamonds and red circles in Figure 2b,c) and have much smaller variance than LTA. Their overlap suggests that the OSDAs used to synthesize these frameworks are structurally similar. In fact, some of these molecules can yield either framework depending on the synthesis conditions (Figure S12). This is expected, given that AEI and CHA share many structural features, including a cage-based, three-dimensional small-pore system and an identical framework density (15.1 T/1000 Å³).
However, because these materials have cavities with different shapes (Figure 2a), elongated and symmetrical in CHA (11.7 × 10.2 Å) and basket-shaped in AEI (12.6 × 11.2 Å), specifically shaped OSDA molecules can selectively fit the CHA or AEI cavities and thereby guide the preferential crystallization of one framework (Figure S12).
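The qualitative observations above can be quantified with simple statistics in the PCA-reduced WHIM space: LTA forms a broad cluster, while the CHA and AEI clusters are tight and overlapping. The sketch below is minimal and assumes each zeolite's OSDAs are available as an array of PCA coordinates. The spread measure, the root-mean-square distance to the cluster centroid, is an illustrative choice and not a metric reported in this work.

```python
import numpy as np


def cluster_spread(points_by_zeolite):
    """Centroid and RMS spread of OSDA coordinates for each zeolite.

    points_by_zeolite: dict mapping a zeolite code to an (n_osdas, n_pcs) array.
    """
    summary = {}
    for code, pts in points_by_zeolite.items():
        pts = np.asarray(pts, dtype=float)
        centroid = pts.mean(axis=0)
        # Root-mean-square Euclidean distance of the OSDAs from their centroid.
        rms = float(np.sqrt(np.mean(np.sum((pts - centroid) ** 2, axis=1))))
        summary[code] = (centroid, rms)
    return summary


def centroid_distance(summary, a, b):
    """Euclidean distance between the OSDA centroids of zeolites a and b."""
    return float(np.linalg.norm(summary[a][0] - summary[b][0]))


# Hypothetical PCA coordinates for two zeolites (placeholders, not real data).
clusters = {
    "zeo_1": [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    "zeo_2": [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
}
summary = cluster_spread(clusters)
print(summary["zeo_1"][1], summary["zeo_2"][1], centroid_distance(summary, "zeo_1", "zeo_2"))
# prints 1.0 0.0 1.0
```

In this reading, a zeolite whose OSDAs span very different sizes (as LTA does) gets a large spread. Two zeolites that share OSDA families (as CHA and AEI do) get a small centroid distance relative to their spreads.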

Suggesting New Candidate OSDAs through Generative Modeling

We adapt a generative neural network model published by Kotsias et al.[42] to suggest alternative organic molecules for use as OSDAs. The model is trained on the extracted literature data to output a SMILES string for an OSDA molecule given a zeolite phase and gel chemistry as input (the architecture and training procedure are described in the Experimental Section). This model allows us to move beyond mining relationships from the literature toward discovering new OSDAs for particular zeolite structures. Training a useful model of this type requires a large quantity of data,[52] which the size of our extracted data set makes possible. Quantitative performance and benchmarking metrics for the model are discussed in the Supporting Information (Table S1 and Figure S13). With this model, we generate potential OSDA molecules for CHA, one of the cage-based zeolites featured above, chosen for its industrial relevance. A total of 10,000 samples are drawn from the model using different gel chemistry variations, including pure Si, Si–Al, and Si–B, together with Na⁺ and K⁺ cations and F⁻ as a mineralizer. This procedure generates 408 unique OSDAs for CHA. To filter the generated molecules, we compare them with the OSDA currently used in industry for CHA, N,N,N-trimethyladamantammonium (TMAda). We take the PCA-reduced WHIM coordinates of TMAda and construct an ellipsoid around that point whose extent along each of the first three principal component axes is 5% of the range along that axis (Figure 3a,b). Of the 408 generated CHA molecules, 57 fall within the TMAda ellipsoid. Another 11 OSDAs previously reported in the literature for CHA and 24 OSDAs reported for other topologies also fall within this region. Organic molecules within the ellipsoid are expected to be structurally similar to TMAda and therefore may be suitable alternative OSDAs, as we explore further below.
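The TMAda ellipsoid filter can be written as a point-in-ellipsoid test. The sketch below assumes an axis-aligned ellipsoid whose semi-axis along each of the first three principal components is 5% of that component's range over all OSDAs in the data set. The text does not say whether 5% of the range sets the semi-axis or the full axis, so `fraction` should be treated as an adjustable assumption. The coordinates in the example are toy values.

```python
def component_ranges(points):
    """Range (max - min) of each coordinate over all points."""
    return [max(col) - min(col) for col in zip(*points)]


def ellipsoid_filter(points, center, ranges, fraction=0.05):
    """Indices of points inside an axis-aligned ellipsoid around center.

    Semi-axis along component k is fraction * ranges[k] (an assumption);
    every range must be nonzero.
    """
    semi_axes = [fraction * r for r in ranges]
    inside = []
    for i, p in enumerate(points):
        # Standard ellipsoid test: sum of squared normalized offsets <= 1.
        s = sum(((x - c) / a) ** 2 for x, c, a in zip(p, center, semi_axes))
        if s <= 1.0:
            inside.append(i)
    return inside


# Toy coordinates on the first three PCs; index 0 plays the role of TMAda.
points = [(0, 0, 0), (0.6, 0, 0), (0.8, 0.8, 0),
          (10, 0, 0), (-10, 0, 0), (0, 10, 0), (0, -10, 0), (0, 0, 10), (0, 0, -10)]
print(ellipsoid_filter(points, points[0], component_ranges(points)))  # prints [0, 1]
```

Scaling each axis by its own range keeps the filter from being dominated by PCA 1, whose spread is much larger than that of PCA 2 or PCA 3. A stricter per-axis box test (|x − c| ≤ a on every component) is an alternative reading of "within the ellipsoid in all three PCA dimensions".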
Figure 3

Comparing literature OSDAs and generated OSDAs of a CHA zeolite. (a) Shows the position of TMAda (shown with the blue star) relative to the rest of the OSDAs in the PCA WHIM space. (b) A zoomed-in view of the ellipse surrounding it. (c) The blue square contains literature CHA OSDAs that fall within the ellipse. (d) The orange square contains examples of generated OSDAs for CHA that fall within the ellipse.

Figure 3 shows this workflow and some of the resulting generated organic molecules for CHA (Figure 3d). The highlighted points in the WHIM space are OSDAs that fall within the ellipsoid in all three PCA dimensions. Qualitatively, the generated OSDAs share many features with the OSDAs reported in the literature for the synthesis of CHA (Figure 3c). For instance, several rigid, adamantyl-type molecules are predicted (row 1 in Figure 3d), in good agreement with the experimentally described TMAda, considered the most effective template to stabilize the CHA cavity.[53−55] Beyond adamantyl-type molecules, the model proposes several alkyl-substituted spiro and piperidinium molecules as OSDAs for CHA (rows 2 and 3, respectively, in Figure 3d), which share structural features with some reported CHA OSDAs. The model also generates two simple tetraalkylammonium cations (row 4 in Figure 3d). Notably, tetraethylammonium has recently been reported as an OSDA for the synthesis of CHA in its silicoaluminate form.[56] Other generated molecules are not directly seen in the literature (row 5 in Figure 3d) but nonetheless share commonly observed features, including a single positively charged nitrogen atom and cyclic structures. These generated molecules demonstrate the model’s ability to inject domain- and data-informed chemical noise into the OSDA space in a way that enables intelligent prediction of potential OSDA candidates.
We also evaluate the generated OSDA candidates for a zeolite that is less studied than CHA. We choose the SFW framework, which, according to our data set, has been synthesized as a Si–Al zeolite using three different OSDA molecules, and which is of high potential interest as an effective catalyst for NO abatement.[57,58] SFW is structurally similar to CHA: it has the same framework density (15.1 T/1000 Å³) and is cage-based, with the gme cage replacing the cha cage. Since few OSDAs are known for SFW, we gauge the model’s predictive ability with molecular mechanics simulations of the binding energy of each generated molecule in the SFW framework, rather than by comparison with known molecules as for CHA. The atomistic simulations follow the procedures of Schwalbe-Koda and Gómez-Bombarelli[59,60] (see also the Experimental Section). The simulations show that many of the molecules generated by our model are suitable OSDA candidates for SFW. Of the generated molecules, 60% have binding energies within the range of the literature OSDAs (−9.98 to −7.48 kJ/mol SiO₂), and an additional 7% have lower (i.e., stronger) binding energies than the known OSDAs. Figure 4a shows the molecules generated for SFW in the reduced WHIM space. The blue stars represent the three OSDAs known to synthesize SFW: N-ethyl-N-(2,4,4-trimethylcyclopentyl)pyrrolidinium, N-ethyl-N-(3,3,5-trimethylcyclohexyl)pyrrolidinium, and N,N-diethyl-5,8-dimethyl-azonium bicyclo[3.2.2]nonane. The orange points represent generated molecules. Figure 4b shows the binding energy of each of the three literature OSDAs. We select five of the generated molecules, shown in Figure 4c. Molecules 1, 2, and 3 are structurally similar to the known OSDAs and have very low binding energies. These strong binding energies support a link between proximity in WHIM space and OSDA potential.
Molecules 4 and 5 are chosen for their strong binding energies despite being structurally different from the known OSDAs. Molecule 4 is significantly larger than the known OSDAs, suggesting that a single, well-fitting OSDA per cage could also exert a strong templating effect toward SFW. Molecule 5, in contrast, is significantly smaller and would require more molecules to be packed into each cage. These two molecules demonstrate the model’s ability to suggest molecules that are structurally dissimilar from the known OSDAs.
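The 60% and 7% figures above come from bucketing each generated molecule's binding energy against the range spanned by the three literature OSDAs (−9.98 to −7.48 kJ/mol SiO₂). More negative values mean stronger binding. The sketch below is a minimal version of that bookkeeping, and the energies in the example are hypothetical.

```python
LIT_MIN, LIT_MAX = -9.98, -7.48  # literature SFW OSDA range, kJ/mol SiO2


def binding_fractions(energies, lit_min=LIT_MIN, lit_max=LIT_MAX):
    """Fractions of molecules binding more strongly than, within, or more
    weakly than the literature range (more negative = stronger binding)."""
    if not energies:
        raise ValueError("no binding energies given")
    counts = {"stronger": 0, "within": 0, "weaker": 0}
    for e in energies:
        if e < lit_min:
            counts["stronger"] += 1
        elif e <= lit_max:
            counts["within"] += 1
        else:
            counts["weaker"] += 1
    return {k: v / len(energies) for k, v in counts.items()}


# Hypothetical binding energies (kJ/mol SiO2) for five generated molecules.
print(binding_fractions([-10.5, -9.0, -8.0, -7.48, -7.0]))
# prints {'stronger': 0.2, 'within': 0.6, 'weaker': 0.2}
```

The range endpoints are treated as inclusive here. The text does not say how molecules exactly at −9.98 or −7.48 kJ/mol SiO₂ were counted.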
Figure 4

OSDAs for SFW obtained from literature and generated by our model. (a) PCA-reduced WHIM locations for the three OSDAs known to make SFW (blue stars) and five selected molecules generated by our model (orange stars). (b) Minimum conformer binding energy with SFW for the three literature OSDAs. (c) Binding energy with SFW for the five selected generated molecules.

While the model generates physically meaningful suggestions for SFW, it has performance limitations. We probe its ability to produce different distributions of molecules depending on the zeolite and chemistry. First, we compare the OSDAs generated for SFW with those generated for LAU. LAU is structurally very different from SFW: it has a higher framework density (18.0 T/1000 Å³), a one-dimensional 10-membered-ring channel system, and no composite building units in common with SFW. Furthermore, LAU is typically synthesized as an M–(Al/Ga)PO (M = Co, Mn, Zn, Fe)-type material,[61,62] whereas SFW is a conventional zeolite,[57,58] so the two also differ chemically. The WHIM distributions of the molecules generated for the two systems are clearly different, indicating that the model distinguishes between the structures during prediction (Figure S14a). Figure S14b shows the distributions of the minimum distance in WHIM space to the known SFW and LAU OSDAs. We also generate LAU OSDAs using the SFW gel chemistry to assess the effect of chemistry on the model. As expected, using similar chemistry shifts the generated distributions closer together, although they remain distinct. We also compare the SFW binding energies of the generated OSDAs with those of OSDAs from the entire zeolite literature (Figure S15). These distributions are very similar, indicating that the model may have a limited ability to predict OSDAs specific to each zeolite system. However, the model does match the literature distribution, which contains molecules known to be suitable OSDAs.
Taken together, these results demonstrate the model’s ability to generate different OSDA suggestions by injecting chemical noise into the OSDA space while still matching the performance of known literature OSDAs. This suggests that the generated molecules may have potential as OSDAs for several structurally similar zeolite systems. Pairing this model with binding energy simulations could help in selecting among the predicted OSDAs.
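The minimum-distance distributions in Figure S14b can be computed by finding the nearest known OSDA for each generated molecule. The sketch below is minimal and uses Python's `math.dist`. It assumes the distances are taken in the PCA-reduced WHIM coordinates, although the same code works on full WHIM vectors. The coordinates are toy values.

```python
import math
from statistics import median


def min_distances(generated, known):
    """Distance from each generated molecule to its nearest known OSDA."""
    if not known:
        raise ValueError("need at least one known OSDA")
    return [min(math.dist(g, k) for k in known) for g in generated]


# Toy 2D coordinates (hypothetical).
known = [(0.0, 0.0), (3.0, 4.0)]
generated = [(0.0, 1.0), (3.0, 0.0), (6.0, 8.0)]
d = min_distances(generated, known)
print(d, median(d))  # prints [1.0, 3.0, 5.0] 3.0
```

Running the function separately for the generated SFW and LAU sets, each against each set of known OSDAs, yields distributions that can then be compared through their medians or histograms.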

Conclusion

We have extracted and featurized data on OSDAs, zeolite phases, and gel chemistry from across the zeolite literature, resulting in a large, comprehensive data set of zeolite synthesis parameters. We then mined these literature data to uncover relationships between the structure of the OSDA and the resulting zeolite phase using a calculated three-dimensional descriptor, WHIM. Finally, we modeled the interaction between the OSDA, zeolite, and gel chemistry using a generative neural network. This model can suggest novel organic molecules with binding energies comparable to or lower than those of their known literature counterparts. While all of the chemistry data extracted in this paper are qualitative, a promising avenue for future work is to extract quantitative information about the gel chemistry. Such information would enable more detailed thermodynamic and kinetic studies of zeolite synthesis. Additional atomistic simulations could further aid the selection of the OSDAs with the greatest potential to experimentally form the target zeolite. This model and data set could be combined with more advanced, rapid simulation techniques and experimental optimization to develop a high-throughput zeolite synthesis pipeline.

Experimental Section

Data Extraction, Processing, and Validation

Over 3.5 million chemistry and materials science journal articles were scanned for keywords related to zeolite materials, including “zeolite”, “osda”, “aluminophosphate”, and “molecular sieve”, yielding a corpus of approximately 90,000 papers. From this corpus, OSDA names, zeolite structures, and synthesis gel components were extracted from the tables and synthesis sections of each paper using regular expressions and domain-specific keyword matching. This approach extracts raw zeolite data with very high recall, but it has difficulty resolving specific OSDA–zeolite–synthesis combinations, especially in papers that report multiple experimental samples. For this reason, each extracted paper was manually checked to ensure the integrity and accuracy of the extracted synthesis routes.
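The two extraction stages described above can be illustrated with Python's `re` module: a keyword screen over the full-text corpus, followed by pattern matching for framework codes. This is a hypothetical sketch, not the authors' extraction code. The keyword list comes from the text. The regular expression for IZA codes (three capital letters, optionally prefixed by `*` or `-`, as in *BEA) and the `known` set of codes are assumptions.

```python
import re

# Keywords from the corpus screen described above.
KEYWORDS = ("zeolite", "osda", "aluminophosphate", "molecular sieve")

# Assumed pattern for IZA framework codes: three capital letters, optionally
# prefixed by "*" (e.g. *BEA) or "-", not embedded in a longer word.
IZA_CODE = re.compile(r"(?<![A-Za-z*-])[*-]?[A-Z]{3}(?![A-Za-z])")


def is_candidate(text):
    """Keyword screen: keep a paper if any keyword appears (case-insensitive)."""
    lowered = text.lower()
    return any(k in lowered for k in KEYWORDS)


def candidate_codes(text, known_codes):
    """Three-letter tokens in text that match a known IZA framework code."""
    return sorted({m for m in IZA_CODE.findall(text) if m in known_codes})


known = {"CHA", "AEI", "*BEA", "MFI"}  # hypothetical subset of IZA codes
sentence = "The zeolite CHA and *BEA were obtained; DFT confirmed the MFI-type product."
print(is_candidate(sentence), candidate_codes(sentence, known))
# prints True ['*BEA', 'CHA', 'MFI']
```

Filtering regex hits against a list of valid codes discards acronyms such as "DFT" that share the three-capital-letter shape. The manual check described above remains necessary to assign codes to the correct synthesis route.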

Data Normalization and Featurization

Because authors use a variety of chemical names to describe both OSDA molecules and zeolite structures, the extracted text data had to be normalized so that different naming schemes did not affect the final representation. For OSDAs, the CIRpy Python package (an interface to the Chemical Identifier Resolver) was used to determine the IUPAC name and SMILES string. If the OSDA name was not given in the paper, a chemistry expert determined the correct IUPAC name and SMILES. Each zeolite material was normalized to its International Zeolite Association (IZA) framework code using the IZA list of known materials. Materials not in the IZA database were manually assigned the correct three-letter code. RDKit[49] was used to featurize the OSDA molecules. In addition to the canonical SMILES representation, physical and chemical properties of the organic molecules were calculated, including molecular volume, surface area, charge, and WHIM descriptors. For each molecule, 2,000 gas-phase conformers were generated, embedded, and optimized with the MMFF94 force field,[63] and average WHIM descriptors were calculated from the WHIM descriptors of all conformers. PCA transformations of the WHIM vectors were calculated using scikit-learn after each WHIM feature was standardized to zero mean and unit variance. Zeolite structures were featurized with structural data obtained from the IZA database, including framework density, maximum ring size, channel dimensionality, maximum volume of an included sphere, accessible volume, maximum channel area, and minimum channel area. Qualitative gel chemistry was one-hot encoded to describe the important components of zeolite synthesis. The one-hot categories were the presence of Si, Al, Ge, P, Ti, B, Ga, Fe, Na, K, F, additional framework elements, additional cations, extra solvents in addition to or instead of water, acid used in the synthesis, and other synthesis components.
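The gel-chemistry encoding amounts to a fixed-length vector of binary presence flags. Strictly it is a multi-hot vector, since several components are usually present at once. The generative model's input then concatenates these flags with the numeric zeolite descriptors. The sketch below is minimal. The category labels are hypothetical names for the sixteen categories listed above, and no feature scaling is applied because the text does not describe any.

```python
# Hypothetical labels for the sixteen gel-chemistry categories listed above.
GEL_CATEGORIES = [
    "Si", "Al", "Ge", "P", "Ti", "B", "Ga", "Fe", "Na", "K", "F",
    "other_framework_element", "other_cation", "extra_solvent", "acid", "other",
]


def encode_gel(components):
    """Binary presence flags, one per category, in a fixed order."""
    unknown = set(components) - set(GEL_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown gel components: {sorted(unknown)}")
    return [1.0 if c in components else 0.0 for c in GEL_CATEGORIES]


def model_input(zeolite_features, components):
    """Concatenate numeric zeolite descriptors with the gel-chemistry flags."""
    return [float(x) for x in zeolite_features] + encode_gel(components)


# CHA framework density (15.1 T/1000 A^3), maximum ring size (8), and channel
# dimensionality (3); the remaining IZA descriptors are omitted for brevity.
x = model_input([15.1, 8, 3], {"Si", "Al", "Na"})
print(len(x), x[:3])  # prints 19 [15.1, 8.0, 3.0]
```

Rejecting unknown component labels keeps the vector length and category order fixed, which the trained network relies on.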

Generative OSDA Model

The generative neural network borrowed heavily in both architecture and training protocol from Kotsias et al.,[42] but instead of organic molecular descriptors, the model used zeolite and synthesis gel features as inputs. For each extracted synthesis route, the zeolite and gel chemistry are featurized and concatenated into the input vector, and the SMILES string of the OSDA is the output. To augment the training data, up to 100 different noncanonical SMILES strings are generated for each OSDA, resulting in training sets of approximately 150,000 points for the different train/test splits. This data augmentation has been shown to increase the accuracy of generative models for organic molecules.[64] The input is passed through 6 dense layers of 256 units with ReLU activation and then through three unidirectional LSTM layers of 256 units. Finally, this output passes through a feedforward dense layer with 35 units and a softmax activation. Batch normalization is used on the first dense layers and the LSTM layers. The model was implemented in Keras v2.2.4 with a TensorFlow-GPU v2.0.0 backend and trained on two NVIDIA Titan Xp GPUs. The model was trained for 100 epochs with teacher forcing[65] on a variety of train/test splits to test different aspects of the generative model. The Adam optimizer with default parameters was used with a batch size of 128. A custom learning rate scheduler kept the learning rate at an initial value of 10⁻³ for 50 epochs and then decayed it exponentially each epoch down to 10⁻⁶. Four different training and testing splits were used to train the model. (1) The training and test split was chosen at random, with 80% of the data used for training and 20% for testing. (2) The split was chosen so that all data points resulting in CHA were isolated in the test set. This results in 5,398 training points and 265 (5%) testing points. (3) Data was split in the same manner as (2) but using AEI.
This results in 5,555 training points and 108 (2%) testing points. (4) The final model was trained on the entire data set with no held out test set. Splits 1, 2, and 3 are used to evaluate the model’s performance, while split 4 is used to look at specific zeolite systems CHA and SFW. Holding out an entire zeolite structure from the training tests the model’s capability of suggesting new OSDA candidates for previously unseen zeolites and can confirm that the model is not memorizing pairs of OSDAs and zeolites, which can occur when randomly splitting. CHA and AEI were chosen due to their cage-like structure, industrial relevance, and presence of enough data to construct a large-enough test set for benchmarking. OSDA generation followed the procedure outlined in Kotsias et al.[42] very closely. All generation occurred with multinomial sampling with the temperature parameter set equal to 1. Specific zeolite phases were manually chosen and paired with the appropriate chemistry conditions. For example, when looking at CHA zeolites, the CHA phase is paired with Si/F, Si/B, Si/Na, Si/K, Si/Al/Na, Si/Al/K, and Si/Al/F. For each zeolite/chemistry pair, 10,000 molecules are generated along with the negative log-likelihoods of generating that molecule.
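The augmentation step caps each OSDA at 100 noncanonical SMILES strings. A minimal sketch of that cap follows. It assumes a caller-supplied function that returns one randomly rooted SMILES string per call; the name `augment_smiles` and its parameters are hypothetical. Molecules with few distinct spellings simply yield fewer variants, which is one reason the cap reads "up to" 100.

```python
import itertools
from typing import Callable, List


def augment_smiles(random_smiles: Callable[[], str],
                   max_variants: int = 100,
                   max_attempts: int = 1000) -> List[str]:
    """Collect up to `max_variants` distinct SMILES strings for one molecule.

    `random_smiles` is a hypothetical callable returning a randomly rooted
    (noncanonical) SMILES string for the same molecule on every call.
    Sampling stops once the cap is reached; molecules with few distinct
    spellings stop after `max_attempts` with fewer variants.
    """
    variants: List[str] = []
    seen = set()
    for _ in range(max_attempts):
        smiles = random_smiles()
        if smiles not in seen:  # keep only new spellings, in discovery order
            seen.add(smiles)
            variants.append(smiles)
            if len(variants) == max_variants:
                break
    return variants


# Toy sampler: ethanol has only a handful of SMILES spellings.
spellings = itertools.cycle(["CCO", "OCC", "C(C)O", "C(O)C"])
print(augment_smiles(lambda: next(spellings)))
```

With RDKit, such a sampler could be written as `lambda: Chem.MolToSmiles(mol, canonical=False, doRandom=True)`. Which toolkit the authors used for this step is not stated here.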
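Training with teacher forcing means that at each step the LSTM receives the true previous SMILES token and is scored on predicting the next true token, rather than being fed its own earlier predictions. The sketch below builds such shifted input/target pairs. The regex tokenizer and the `^`/`$` start and end tokens are assumptions for illustration, not the authors' vocabulary. In a token-level model of this kind, the width of the final softmax layer (35 units here) corresponds to the size of the token vocabulary.

```python
from __future__ import annotations

import re

# Bracket atoms ([N+], [C@@H]), two-letter halogens, %nn ring closures,
# and otherwise single characters.
SMILES_TOKEN = re.compile(r"(\[[^\]]+\]|Br|Cl|%\d{2}|.)")
START, END = "^", "$"  # hypothetical sequence-boundary tokens


def tokenize(smiles: str) -> list[str]:
    return SMILES_TOKEN.findall(smiles)


def teacher_forcing_pair(smiles: str) -> tuple[list[str], list[str]]:
    """Return (input, target) token sequences offset by one position.

    At step t the network is fed the true token input[t] and is trained to
    predict target[t], the next true token, whatever it would have
    generated itself.
    """
    tokens = [START, *tokenize(smiles), END]
    return tokens[:-1], tokens[1:]


def build_vocab(smiles_list: list[str]) -> dict[str, int]:
    """Map every token seen in the data (plus boundaries) to an integer id."""
    tokens = {START, END}
    for s in smiles_list:
        tokens.update(tokenize(s))
    return {tok: i for i, tok in enumerate(sorted(tokens))}


inputs, targets = teacher_forcing_pair("C[N+](C)(C)C")
print(inputs)
print(targets)
```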
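The learning rate schedule described above can be written as a function of the epoch index. This sketch assumes 0-indexed epochs and a constant per-epoch decay factor chosen so that the rate reaches exactly 10⁻⁶ at the 100th epoch. The source gives only the endpoints, so the exact factor is an assumption.

```python
INITIAL_LR = 1e-3
FINAL_LR = 1e-6
HOLD_EPOCHS = 50    # epochs 0-49 use INITIAL_LR
TOTAL_EPOCHS = 100  # epochs 50-99 decay exponentially

# Assumed constant per-epoch factor that lands on FINAL_LR at the last epoch:
# (1e-6 / 1e-3) ** (1 / 50) = 10 ** -0.06, roughly 0.871.
DECAY = (FINAL_LR / INITIAL_LR) ** (1.0 / (TOTAL_EPOCHS - HOLD_EPOCHS))


def lr_at(epoch, lr=None):
    """Learning rate for a 0-indexed epoch; `lr` is accepted but ignored."""
    if epoch < HOLD_EPOCHS:
        return INITIAL_LR
    decayed = INITIAL_LR * DECAY ** (epoch - HOLD_EPOCHS + 1)
    return max(FINAL_LR, decayed)  # never go below the floor


for epoch in (0, 49, 50, 75, 99):
    print(epoch, f"{lr_at(epoch):.2e}")
```

The optional second argument lets a function of this shape be passed to Keras's `LearningRateScheduler` callback, which calls the schedule with the epoch index and, in recent versions, the current rate.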
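The random and held-out-zeolite splits can be sketched as below, assuming each synthesis route is a record with a `zeolite` field (a hypothetical schema). The held-out counts quoted above are consistent with the 5,663-route database. For CHA, 5,398 + 265 = 5,663; for AEI, 5,555 + 108 = 5,663. The ratios 265/5,663 ≈ 4.7% and 108/5,663 ≈ 1.9% round to the quoted 5% and 2%.

```python
import random
from typing import Dict, List, Tuple

Route = Dict[str, str]  # hypothetical record with at least a "zeolite" key


def random_split(routes: List[Route], train_frac: float = 0.8,
                 seed: int = 0) -> Tuple[List[Route], List[Route]]:
    """Split (1): shuffle, then take the first `train_frac` for training."""
    shuffled = list(routes)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def holdout_split(routes: List[Route],
                  zeolite: str) -> Tuple[List[Route], List[Route]]:
    """Splits (2)/(3): every route yielding `zeolite` goes to the test set,
    so the model never sees that framework during training."""
    train = [r for r in routes if r["zeolite"] != zeolite]
    test = [r for r in routes if r["zeolite"] == zeolite]
    return train, test


# Consistency check against the counts quoted in the text.
N_ROUTES = 5663
for name, n_train, n_test in [("CHA", 5398, 265), ("AEI", 5555, 108)]:
    assert n_train + n_test == N_ROUTES
    print(name, f"{100 * n_test / N_ROUTES:.1f}% held out")
```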
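Multinomial sampling at temperature T draws each token from softmax(logits / T), so T = 1 samples directly from the model's predicted distribution. The negative log-likelihood of a generated molecule is then the sum of −log p over its sampled tokens, including the end token. The sketch below uses a hypothetical `next_logits(prefix)` callable standing in for the trained network. It illustrates the sampling rule, not the authors' code.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max avoids overflow."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(next_logits: Callable[[List[str]], Sequence[float]],
                    vocab: Sequence[str], end_token: str = "$",
                    temperature: float = 1.0, max_len: int = 100,
                    rng: Optional[random.Random] = None) -> Tuple[str, float]:
    """Draw tokens one at a time until `end_token`; return (SMILES, NLL).

    With temperature = 1 the NLL equals -log p(molecule) under the model.
    """
    rng = rng or random.Random()
    prefix: List[str] = []
    nll = 0.0
    for _ in range(max_len):
        probs = softmax(next_logits(prefix), temperature)
        idx = rng.choices(range(len(vocab)), weights=probs, k=1)[0]
        nll -= math.log(probs[idx])  # accumulate -log p(token | prefix)
        if vocab[idx] == end_token:
            break
        prefix.append(vocab[idx])
    return "".join(prefix), nll


# Hypothetical stand-in for the trained network: uniform over "C" and "$".
smiles, nll = sample_sequence(lambda prefix: [0.0, 0.0], ["C", "$"],
                              rng=random.Random(0))
print(smiles, round(nll, 3))
```

Repeating this call 10,000 times for one zeolite/chemistry input vector would mirror the generation protocol described above.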

Atomistic Simulations

Molecular mechanics simulations were performed with the General Utility Lattice Program (GULP),[66,67] version 5.1.1, through the GULPy package.[59] The Dreiding force field[68] was used to model interactions between the zeolite and the OSDA. The initial structure of the SFW zeolite was retrieved from the International Zeolite Association database and optimized with the Sanders–Leslie–Catlow force field.[69] OSDAs were docked in SFW with the VOID package using its default parameters.[60] Pose optimizations were performed at constant volume, and binding energies were calculated following the frozen pose method.[59]
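The frozen pose method is only cited here. As a hedged illustration, assume the binding energy is E(complex) − E(zeolite) − E(OSDA), with the isolated zeolite and OSDA energies evaluated at their geometries in the optimized complex rather than re-relaxed. This is an assumption, not a description of the GULPy implementation. Under a pairwise-additive potential, that difference reduces to the zeolite–OSDA interaction terms alone, because every intra-zeolite and intra-OSDA term cancels. The toy 12-6 Lennard-Jones potential, its parameters, and the coordinates below are made up and stand in for the Dreiding force field. None of this is GULP.

```python
import itertools
import math


def lj(r, epsilon=0.1, sigma=3.0):
    """12-6 Lennard-Jones pair energy (toy parameters, arbitrary units)."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 * sr6 - sr6)


def total_energy(atoms):
    """Sum of pair energies over all atom pairs in one coordinate set."""
    return sum(lj(math.dist(a, b)) for a, b in itertools.combinations(atoms, 2))


def frozen_pose_binding_energy(host, guest):
    """E(complex) - E(host) - E(guest), all evaluated at the complex geometry."""
    return total_energy(host + guest) - total_energy(host) - total_energy(guest)


def interaction_energy(host, guest):
    """Only the host-guest cross terms."""
    return sum(lj(math.dist(h, g)) for h in host for g in guest)


host = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]  # toy "zeolite"
guest = [(2.0, 2.0, 3.0), (2.0, 2.0, 6.0)]                   # toy "OSDA"
print(frozen_pose_binding_energy(host, guest), interaction_energy(host, guest))
```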