
Quantitative Structure-Property Relationship Analysis for the Prediction of Propylene Adsorption Capacity in Pure Silicon Zeolites at Various Pressure Levels.

Li Zhao, Qi Zhang, Chang He, Qinglin Chen, Bing J Zhang.

Abstract

This work develops quantitative structure-property relationship (QSPR) models using various regression analyses to predict propylene (C3H6) adsorption capacity at various pressures in zeolites from the topologically diverse International Zeolite Association database. Based on univariate and multilinear regression analysis, the accessible volume and the largest cavity diameter are the most crucial factors determining C3H6 uptake at high and low pressures, respectively. An artificial neural network (ANN) model with five structural descriptors is sufficient to predict C3H6 uptake at high pressures. For combined pressures, an ANN model that also includes the pore size distribution gives satisfactory predictions. The isosteric heat of adsorption (Qst) significantly improves the prediction of low-pressure gas adsorption and allows zeolites to be classified accurately as high or low C3H6 adsorbers. Combining high-throughput screening with QSPR models makes it possible to prescreen the database rapidly and accurately for top performers and to reserve detailed, computationally intensive molecular simulations for these candidates in other gas adsorption applications.
© 2022 The Authors. Published by American Chemical Society.

Year:  2022        PMID: 36188274      PMCID: PMC9520561          DOI: 10.1021/acsomega.2c02779

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

Nanoporous materials are defined as materials with interpenetrating channels and pore sizes below 100 nm, which are often comparable to the size of a molecule.[1] Nanoporous materials assembled from a vast variety of molecular building blocks, such as zeolites, porous carbons, metal–organic frameworks (MOFs), zeolitic imidazolate frameworks (ZIFs), and covalent organic frameworks (COFs), exhibit superior chemical and geometrical tunability and a diverse range of surface areas, pore surfaces, and void fractions, which is why they are regarded as a next-generation technology.[2] This wide range of properties has promoted diverse applications of nanoporous materials, including but not limited to gas storage, separation, catalysis, drug delivery, and sensing.[3] In the realm of nanoporous materials, the goal is always an optimal material tailored to a specific application. In recent years, an exponential increase in published lab-synthesized and computer-generated hypothetical nanoporous materials has provided a library of tens of thousands of potentially interesting new materials.[4] These new materials provide an ideal platform for understanding how to tailor-make an optimal material for a given application. High-throughput screening techniques using brute-force experimental and simulation methods could generate the thermodynamic data needed to predict the performance of these materials for specific applications.[5,6] However, with the advent of such large databases, it is no longer feasible to synthesize and characterize materials one by one through traditional trial-and-error experiments, which are time-consuming and expensive.
To accelerate high-throughput screening, several theoretical computing tools have been widely adopted for the characterization of novel materials over the last few years, for instance, first-principles (ab initio) methods, density functional theory, and molecular simulations (Monte Carlo and molecular dynamics simulations).[7] Among them, grand canonical Monte Carlo (GCMC) simulation is particularly well suited to adsorption studies, and prior work has confirmed that its adsorption capacities agree well with experimental results for many systems.[8] So far, GCMC simulation has been extensively applied to methane storage,[9] hydrogen storage,[10] carbon dioxide (CO2) capture,[11] ethanol purification,[12] propylene/propane (C3H6/C3H8) separation,[13] and other problems[14−16] by many groups. Additionally, preliminary structure–property relationships have been revealed. Undoubtedly, the time, cost, and human effort required to characterize material properties by GCMC simulation are greatly reduced compared to experimental methods. Although high-throughput screening studies based on GCMC simulation have been helpful in guiding experimental synthesis, this brute-force approach cannot keep pace with the almost unlimited number of new structures because of its high computational cost. At present, the lack of efficient computational tools has gradually become a bottleneck in the rapid development of novel materials; consequently, alternative screening methodologies are urgently needed.[2] Machine learning methods such as decision trees, support vector machines, and neural networks have become powerful tools to prescreen high-performing materials and accelerate large-scale simulation in the materials field. Considerable time is saved when detailed, time-consuming calculations are performed only on the candidates prescreened by machine learning.
Explicitly, machine learning methods devise complex models to produce reliable predictions about unknown data through learning from relationships in the dataset provided, which makes the screening of countless nanoporous materials practicable. Quantitative structure–property relationship (QSPR) models[17,18] trained by data-driven machine learning methods can systematically correlate structural features of materials (referred to as descriptors) to their functional properties in quantitative terms, which would be expected to play a crucial role in material screening. For nanoporous materials, one of the most desirable properties to predict is the adsorption capacity of guest molecules at the required temperature and pressure.[19] Naturally, the morphology of the pores described by various descriptors is essential for the adsorption behavior of guest molecules, where adsorbates are located and interact with the material surface.[20] The selection of highly predictive descriptors for determining adsorption capacity is prominent. Standard structural descriptors for pore morphology, such as mass density, surface area, void fraction, largest cavity diameter, and others, have been used to construct feature vectors for systems under certain thermodynamic conditions and provided satisfactory results. For instance, Durá et al.[21] innovatively made a reasonable fitting for CO2 adsorption capacity in porous carbons through a simple regression approach derived from microporous and mesoporous volumes. Subsequently, artificial neural network (ANN) methods using microporous and mesoporous volumes and Brunauer–Emmett–Teller surface area were used to predict CO2 uptake,[22] nitrogen (N2) uptake under ambient conditions, and CO2/N2 selectivity[23,24] of porous carbons. 
Fernandez et al.[31] accurately predicted methane (CH4) uptake of MOFs at 100 bar based only on the dominant pore diameter, the maximum pore diameter, the void fraction, the gravimetric surface area, the volumetric surface area, and the framework density. Furthermore, results for N2,[25] CO2 working capacities, and the CO2/CH4 selectivity[26] of MOFs were also studied using machine learning algorithms and standard structural descriptors. Similarly, simple textural descriptors were regarded as the MOF fingerprints to predict hydrogen (H2) adsorption uptake and CO2/H2 selectivity.[27] Recently, Lin et al.[28] systematically screened hypothetical pure-silica zeolites and identified 230 pre-eminent zeolites for effective removal of linear siloxanes and derivatives using a random forest method based on simple structural descriptors. Analogously, high-performing zeolites for anion removal from water were identified.[29] Guest molecules occupy almost the entire void space of nanoporous materials at high pressures, so these descriptors capturing global porosity characteristics are often popular. Although the prediction of high-pressure gas adsorption performance is encouraging, there is no clear principle guiding the selection of appropriate descriptors, especially for low-pressure gas adsorption with poor predictive performance. At low pressures, guest molecules are usually adsorbed in the strongly binding regions of the material’s pore, which cannot be captured well by simple structural descriptors.[30] To address this problem, some specific descriptors have been continuously developed by researchers for obtaining a universal model for adsorption behavior prediction at different pressures. 
A novel atomic property weighted radial distribution function descriptor accounting for the topological diversity was tailored by Fernandez et al.[25,31,32] and combined with traditional structural descriptors for the prediction of CH4, CO2, and H2 uptakes at low pressures with pleasing results. Lee and co-workers[2,33,34] developed a new descriptor for nanoporous materials by using topological data analysis to quantify similarity of pore structures and successfully predicted CH4 uptake at low pressures. A vectorized persistence diagram for topology analysis was also expected to be applied to the screening of various materials.[19,35] Although detailed pore structure information is captured by a topological descriptor, it is incapable of reflecting the relative proportion of pores with a special pore size, let alone the guest–host interaction. Accordingly, the Voronoi energy descriptor[36] takes into account both geometrical structural information and the energetics information, to be highly predictive of xenon/krypton (Xe/Kr) separation performance. Later, a histogram of the guest–host energy was regarded as the feature for machine learning, and the predicted gas adsorption capacities in MOFs were in good agreement with GCMC simulations.[8,10] Fanourgakis et al.[37] proposed to treat the probabilities of a set of different probe atoms adsorbed by materials as new descriptors for fast screening of large databases. Recently, heat of adsorption and Henry’s coefficient of special adsorbates were combined with traditional structural descriptors for gas adsorption and separation study.[38,39] For nanoporous materials with a wide variety of elements, especially MOFs, chemical descriptors considering types and contents of chemical elements were used to predict gas adsorption performance at low pressures.[40−42] In spite of the continuous generation of new descriptors, the selection of appropriate descriptors for a particular application remains an open scientific issue. 
As an essential component of various household plastic products, C3H6 is obtained by an energy-intensive cryogenic distillation process.[43] The development of alternative separation technologies with low energy consumption is of great value. A physical adsorption process, especially pressure-swing adsorption (PSA) technology with high gas purity, is a promising choice.[44] In the PSA process, guest molecules are adsorbed at high pressures and desorbed at low pressures. The selection of high-performance adsorbents is the key to achieving an efficient separation process. Various porous materials have been reported for the PSA process so far, such as zeolites,[45,46] MOFs,[47] and ZIFs.[48] Owing to their uniform pore systems, high porosity, and excellent thermal and chemical stability, zeolites have proven to be promising for gas adsorption.[49] In this work, we present a comprehensive QSPR analysis of a database of 232 zeolites. Correlations between various descriptors and C3H6 adsorption capacity (NC) at pressures of 5,065, 1,013, 303.9, 202.6, 101.3, and 50.65 kPa, at 300 K, were determined using multilinear regression analysis, quadratic regression analysis, and ANN models. In addition, the pore size distribution (PSD) and the isosteric heat of adsorption (Qst) were introduced to predict NC at low pressures, and the prediction performance was further evaluated by receiver operating characteristic (ROC) curve analysis. These QSPR models allowed the accurate identification of high-performing zeolites, and rapid material prescreening significantly reduced the number of computationally intensive GCMC simulations. Finally, we described the relative importance of the various descriptors in determining NC at different pressures, which provided insights into structure–performance relationships at the atomic level.

Models and Methods

Molecular Models

First, 232 ordered pure silicon zeolites (Si/O = 1:2) considered in this study were obtained from the International Zeolite Association (IZA) database. The zeolite framework types were generated using a library of 49 composite building units. The framework atoms were described by Lennard-Jones (LJ) and electrostatic potentials:[50,51]

$$U_{ij} = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

where $\varepsilon_{ij}$ and $\sigma_{ij}$ are the well depth and collision diameter, respectively, $r_{ij}$ is the distance between atoms i and j, $q_i$ is the charge of atom i, and $\varepsilon_0$ = 8.8542 × 10⁻¹² C² N⁻¹ m⁻² is the permittivity of vacuum. The adsorbate C3H6 was represented by a united-atom model with each CHx group as a single interaction site. The LJ potential parameters and atomic charges of zeolites and C3H6 were adopted from the COMPASS force field, which has predicted gas adsorption fairly well in a wide variety of zeolites.[52,53] The Lorentz-Berthelot combining rules were employed to calculate the cross LJ parameters.
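As a minimal sketch of how the pair potential and the Lorentz-Berthelot combining rules fit together, the snippet below evaluates the LJ 12-6 plus Coulomb energy for one pair of sites. The numerical parameters are placeholders chosen for illustration and are not COMPASS values; the function names and the choice of units (kcal/mol, Å, elementary charges) are assumptions.

```python
import math

# Coulomb prefactor 1/(4*pi*eps0) in kcal mol^-1 Å e^-2 (unit choice is an assumption).
COULOMB_K = 332.0637

def lorentz_berthelot(eps_i, sigma_i, eps_j, sigma_j):
    """Cross LJ parameters: geometric mean for epsilon, arithmetic mean for sigma."""
    return math.sqrt(eps_i * eps_j), 0.5 * (sigma_i + sigma_j)

def pair_energy(r, eps_ij, sigma_ij, q_i=0.0, q_j=0.0):
    """LJ 12-6 plus Coulomb energy (kcal/mol) for two sites separated by r (Å)."""
    sr6 = (sigma_ij / r) ** 6
    lj = 4.0 * eps_ij * (sr6 * sr6 - sr6)
    coulomb = COULOMB_K * q_i * q_j / r
    return lj + coulomb

# Placeholder single-site parameters (hypothetical, NOT taken from COMPASS).
eps_ij, sigma_ij = lorentz_berthelot(0.10, 3.0, 0.20, 4.0)

# For uncharged sites the LJ minimum sits at r = 2^(1/6) * sigma with depth -eps.
r_min = 2 ** (1 / 6) * sigma_ij
u_min = pair_energy(r_min, eps_ij, sigma_ij)
```

For uncharged sites the energy is zero at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, which is a quick sanity check on any implementation of this functional form.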

Simulation Methods

Before the adsorption simulations, the C3H6 molecule and the zeolite frameworks were geometrically optimized, and the resulting minimum-energy structures were used in the subsequent adsorption simulations. GCMC[54] simulations in the Sorption module of Materials Studio 2018[55] were conducted to evaluate the adsorption performance of the 232 zeolites toward C3H6. GCMC is a statistical mechanical approach in which the adsorption process is explored by random sampling and probabilistic interpretation in the adsorbent framework. The adsorption was simulated at 300 K and six pressure levels (5,065, 1,013, 303.9, 202.6, 101.3, and 50.65 kPa). During simulation, C3H6 was treated as an ideal gas with negligible intermolecular interactions, so its fugacity was equal to the pressure.[56] Zeolite frameworks were assumed to be rigid, with atomic positions held constant. A spherical cut-off of 15.5 Å was used to calculate the LJ interactions, whereas the electrostatic interactions were calculated using the Ewald summation method. The cell lengths of each zeolite were expanded to at least 31 Å (twice the cut-off distance) along all three dimensions, and periodic boundary conditions were applied. For each zeolite, the GCMC simulation was run for 1.1 × 10⁶ cycles, with 1 × 10⁵ cycles for equilibration and the remainder for ensemble averaging. Each cycle consisted of n trial moves (n: the number of adsorbate molecules), including translation, rotation, regrowth, and swap. To verify the suitability of the COMPASS force field and the above assumptions, Figure 1 shows the adsorption isotherms of pure C3H6 in CHA and STT and of pure C3H8 in MFI and DDR. Good agreement is observed between the simulated and published data,[46,57−60] which suggests that the selected force field is reliable.
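The supercell rule described above (every cell length at least twice the 15.5 Å cut-off) can be sketched as follows. This is an illustration only: it assumes an orthogonal cell, where edge lengths equal the perpendicular widths, and the cell dimensions in the example are hypothetical. Materials Studio handles this step internally, so the helper name is not part of any real API.

```python
import math

CUTOFF = 15.5             # Å, LJ cut-off stated in the text
MIN_LENGTH = 2 * CUTOFF   # Å, minimum supercell edge (31 Å)

TOTAL_CYCLES = 1_100_000
EQUILIBRATION_CYCLES = 100_000
PRODUCTION_CYCLES = TOTAL_CYCLES - EQUILIBRATION_CYCLES

def replication_factors(a, b, c, min_length=MIN_LENGTH):
    """Smallest integer repeats so that each supercell edge is >= min_length.

    Assumes an orthogonal cell; for a triclinic cell the perpendicular
    widths would have to be used instead of the edge lengths.
    """
    return tuple(math.ceil(min_length / edge) for edge in (a, b, c))

# Hypothetical unit cell edge lengths in Å.
reps = replication_factors(20.1, 19.9, 13.4)
```

With these hypothetical edges the unit cell would be repeated 2 × 2 × 3 times, and one million of the 1.1 million cycles remain for ensemble averaging.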
Besides GCMC simulation for the adsorption of pure C3H6, the Qst of C3H6 at infinite dilution was estimated. For this case, one adsorbate molecule (C3H6) was added into a zeolite and simulation was conducted in a canonical ensemble.
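The text does not give the expression used to obtain Qst from the single-molecule canonical simulation. A commonly used infinite-dilution estimate is Qst = RT − ⟨U_hg⟩, where ⟨U_hg⟩ is the average host–guest energy; the sketch below applies that relation under the assumption that the adsorbate's intramolecular energy is the same in the gas and adsorbed phases. The energy samples are made up for illustration.

```python
R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def qst_infinite_dilution(host_guest_energies, temperature=300.0):
    """Estimate Qst (kJ/mol) as RT - <U_hg> from sampled host-guest energies (kJ/mol)."""
    mean_u = sum(host_guest_energies) / len(host_guest_energies)
    return R * temperature - mean_u

# Made-up host-guest energies from a hypothetical NVT run with one C3H6 molecule.
samples = [-38.0, -41.5, -40.0, -39.5, -41.0]
qst = qst_infinite_dilution(samples)
```

A more negative average host–guest energy gives a larger Qst, which is the sense in which Qst measures the strength of the adsorbate–framework interaction.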
Figure 1

Comparison between simulated and open published adsorption isotherms of (a) pure C3H6 and (b) pure C3H8 in zeolites.


Descriptor Selection

In this work, in addition to the simulated C3H6 adsorption data, our QSPR analysis used five general 1D structural descriptors, namely the largest cavity diameter (LCD), pore-limiting diameter (PLD), accessible surface area (ASA), accessible volume (AV), and density (ρ), together with a 2D structural descriptor (PSD) and an energy descriptor (Qst). As described in Figure S1, the five 1D structural descriptors were selected from an initial set of seven on the basis of two principles: each descriptor should correlate at least moderately with NC, and no two descriptors should correlate strongly with each other. The LCD corresponds to the maximum of the PSD, and the PLD refers to the largest characteristic guest molecule size for which there is a nonzero AV. LCD and PLD determine whether a specific guest molecule can enter the zeolite window, whereas ASA and AV reflect the void space that guest molecules can reach. The total void space of a zeolite is reflected indirectly by ρ. The PSD provides information about the fraction of void space that is occupied by pores of a certain size, and the Qst value reflects the energetics of the adsorption process. These diverse descriptors, which carry strong structure–performance relationships between guest molecules and zeolites, reveal the features of zeolites from various aspects and can therefore support accurate machine learning predictions.[17,26,38,61] These descriptors are relatively easy to measure, and they can be used directly to guide the synthesis and application of zeolites. In this work, ρ was obtained directly from the zeolite crystal structure, and LCD, PLD, ASA, AV, and PSD were determined by the Zeo++ program, in which a probe radius of 1.2 Å[28] was used for ASA and AV, and a bin size of 0.1 Å[62] was used to obtain PSD histograms. The Qst value was calculated by NVT Monte Carlo simulation.
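The two selection principles (sufficient correlation with NC, no strong correlation between descriptors) can be expressed as a simple greedy filter. The sketch below is one possible reading, not the procedure used for Figure S1: the thresholds, the greedy ordering, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def select_descriptors(X, y, names, min_target_r=0.3, max_pair_r=0.9):
    """Keep descriptors that correlate with y and are not redundant with each other.

    Thresholds are hypothetical. Candidates are visited in order of decreasing
    |r| with the target; a candidate is dropped if it correlates too strongly
    with a descriptor that was already kept.
    """
    r_target = [abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(X.shape[1])]
    kept = []
    for i in np.argsort(r_target)[::-1]:
        if r_target[i] < min_target_r:
            continue
        pair_r = [abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) for j in kept]
        if all(r < max_pair_r for r in pair_r):
            kept.append(int(i))
    return [names[i] for i in kept]

# Synthetic data: "A_copy" nearly duplicates "A"; "noise" is unrelated to y.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
a_copy = a + 0.01 * rng.normal(size=200)
b = rng.normal(size=200)
noise = rng.normal(size=200)
y = a + b + 0.1 * rng.normal(size=200)

X = np.column_stack([a, a_copy, b, noise])
selected = select_descriptors(X, y, ["A", "A_copy", "B", "noise"])
```

On this synthetic set the filter keeps "B" and exactly one of the two near-duplicate columns, and it discards the unrelated column.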

Multilinear and Quadratic Regression Models

Multilinear regression analysis is performed when the relationship between multiple descriptors and NC of the 232 zeolites is assumed to be linear. The general form of the multilinear regression model is as follows:

$$y_k = \beta_0 + \sum_{i} \beta_i x_{ik} + \gamma$$

Quadratic regression is performed by adding binary interaction terms to the multilinear regression model. The general form of the quadratic regression model is as follows:

$$y_k = \beta_0 + \sum_{i} \beta_i x_{ik} + \sum_{i}\sum_{j} \beta_{ij} x_{ik} x_{jk} + \gamma$$

where $y_k$ and $x_{ik}$ refer to the target value and the ith input value of sample k; $\beta_{ij}$ and $\beta_i$ are the binary and linear coefficients, respectively; and $\beta_0$ and $\gamma$ are the constant and error term, respectively.
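Both regression models are linear in their coefficients, so they can be fitted by ordinary least squares on a suitable design matrix. The sketch below is a minimal illustration on synthetic data; it assumes that the binary terms include squared terms (i = j) as well as cross terms, which the text does not state explicitly, and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def design_matrix(X, quadratic=False):
    """Columns: constant, linear terms, and (optionally) all products x_i * x_j with i <= j."""
    n_desc = X.shape[1]
    cols = [np.ones(len(X))] + [X[:, i] for i in range(n_desc)]
    if quadratic:
        cols += [X[:, i] * X[:, j]
                 for i, j in combinations_with_replacement(range(n_desc), 2)]
    return np.column_stack(cols)

def fit(X, y, quadratic=False):
    """Least-squares estimate of the regression coefficients."""
    beta, *_ = np.linalg.lstsq(design_matrix(X, quadratic), y, rcond=None)
    return beta

def predict(beta, X, quadratic=False):
    return design_matrix(X, quadratic) @ beta

# Synthetic example with two descriptors and one interaction term.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1]

beta_lin = fit(X, y)                   # constant + 2 linear coefficients
beta_quad = fit(X, y, quadratic=True)  # plus x0*x0, x0*x1, x1*x1
```

Because the synthetic target contains an interaction term, only the quadratic model reproduces it exactly; the multilinear fit leaves a residual.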

Neural Network Models

To clarify the role of the structural descriptors and the energy descriptor in adsorption capacity, the above descriptors were used as neurons of the input layer and passed in sequence through the hidden layers to the output layer. The information was transferred through the ANN model in a feed-forward process to predict NC of the 232 zeolites. As a typical machine learning algorithm, the ANN model was trained iteratively by comparing the GCMC-simulated values with the calculated output values and then adjusting the weights and thresholds to decrease the error, with the mean squared error as the cost function. The optimization of the cost function was carefully monitored to determine the optimal number of epochs so that overfitting to the training data was avoided. Figure 2 shows the architecture of the ANN model used in this paper; the model was implemented on the MATLAB R2020a platform. ANN models with different descriptors as input nodes, two hidden layers, and seven nodes in each hidden layer were built to predict the GCMC-simulated NC of zeolites.
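The architecture described here (input descriptors, two hidden layers of seven nodes, one output, mean squared error cost) can be sketched in a few lines of NumPy. This is not the MATLAB implementation used in the paper: the tanh activation, plain full-batch gradient descent, learning rate, and synthetic data are assumptions, and epoch monitoring for early stopping is omitted.

```python
import numpy as np

def init_params(n_in, hidden=(7, 7), seed=0):
    """Random weights and zero biases for n_in -> 7 -> 7 -> 1."""
    rng = np.random.default_rng(seed)
    sizes = (n_in, *hidden, 1)
    return [(rng.normal(0.0, 0.5, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the activations of every layer; tanh hidden units, linear output."""
    acts = [X]
    for k, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if k == len(params) - 1 else np.tanh(z))
    return acts

def mse(params, X, y):
    return float(np.mean((forward(params, X)[-1].ravel() - y) ** 2))

def train(params, X, y, lr=0.05, epochs=500):
    """Full-batch gradient descent on the mean squared error."""
    target = y.reshape(-1, 1)
    for _ in range(epochs):
        acts = forward(params, X)
        delta = 2.0 * (acts[-1] - target) / len(X)   # gradient at the linear output
        for k in reversed(range(len(params))):
            W, b = params[k]
            grad_W = acts[k].T @ delta
            grad_b = delta.sum(axis=0)
            if k > 0:
                # Propagate through the tanh of the previous layer.
                delta = (delta @ W.T) * (1.0 - acts[k] ** 2)
            params[k] = (W - lr * grad_W, b - lr * grad_b)
    return params

# Synthetic stand-in for 232 zeolites with five normalized descriptors.
rng = np.random.default_rng(1)
X = rng.uniform(size=(232, 5))
y = 0.6 * X[:, 3] + 0.4 * X[:, 0] * X[:, 1]

params = init_params(n_in=5)
initial_mse = mse(params, X, y)
params = train(params, X, y)
final_mse = mse(params, X, y)
```

In practice the cost on held-out data would be tracked during training to choose the number of epochs, as the text describes.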
Figure 2

Architecture of an ANN model.


Performance Criteria

During the training process, the data set was first randomly divided into two parts: 80% was used for training, and the remaining 20% was used to test the generalization ability of the model. Moreover, to reduce the uncertainty caused by a particular partition of the input data and to minimize overfitting, a fivefold cross validation (CV) approach was also employed for the ANN model, and the average of the five results was taken as the model performance. All input and output data were normalized to speed up the training process. The quality of the training and test results was evaluated by the coefficient of determination (R2), the root mean square error (RMSE), and the mean absolute error (MAE) as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $n$, $y_i$, $\hat{y}_i$, and $\bar{y}$ refer to the number of samples, target value, predicted value, and average target value, respectively.
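The three performance criteria and a fivefold partition can be written directly from their definitions. The sketch below uses only the standard library; the shuffling seed and the way indices are dealt into folds are illustrative assumptions, since the text does not specify how the folds were drawn.

```python
import math
import random

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def kfold_indices(n, k=5, seed=0):
    """Return k (train, test) index pairs; every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]

# Small worked example.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
scores = (r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))

splits = kfold_indices(232, k=5)
```

For the worked example the squared errors sum to 0.5 against a total sum of squares of 5, giving R2 = 0.9, RMSE ≈ 0.354, and MAE = 0.25. With 232 samples, each test fold holds 46 or 47 zeolites.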

Results and Discussion

Univariate Analysis

A comprehensive understanding of the relationship between the structural and energy descriptors and the performance of the zeolites used for C3H6 adsorption would be conducive to the identification of potential materials. We initially performed a simple univariate analysis in which we looked for correlations between a single descriptor and the simulated NC at 300 K and at 5,065 and 50.65 kPa; the top 20% of the data were classified as zeolites with high adsorption capacity, and the remaining 80% as zeolites with low adsorption capacity. Figures 3a–d and S2a–d show that four structural descriptors (LCD, PLD, ASA, and AV) are positively correlated with NC. When LCD is less than 3.75 Å, adsorption of guest molecules is impeded due to unfavorable potential overlap with the framework, and NC can be considered to be almost zero. As LCD gradually increases, the larger void space raises the adsorption capacity overall at 5,065 and 50.65 kPa. ASA and AV are also positively correlated with NC at 5,065 and 50.65 kPa. The relationship between NC and ρ shows a similar linear trend in Figures 3e and S2e, but NC gradually decreases as ρ increases. There is no evident trend in the relationship between NC and Qst, which shows that Qst cannot explain NC well in a linear fashion. The explanatory power of every single descriptor for NC is lower at 50.65 kPa than at 5,065 kPa, even though similar linear trends are observed. From the perspective of univariate analysis, the simple structural descriptors rank by correlation with NC as AV > LCD > ρ > ASA > PLD at 5,065 kPa (high pressure), which differs from the ranking at 50.65 kPa (low pressure), LCD > AV > ρ > ASA > PLD. It can also be observed that NC is not uniquely determined by a single descriptor.
Overall, a single structural or energy descriptor can only determine the individual performance relationship and cannot explain synergies between various descriptors that might contribute significantly to the performance of zeolites.
Figure 3

Relationships between (a) NC ∼ LCD, (b) NC ∼ PLD, (c) NC ∼ ASA, (d) NC ∼ AV, (e) NC ∼ ρ, and (f) NC ∼ Qst at 300 K and 5,065 kPa.

A heat map of the correlation coefficient matrix for the descriptors and NC at various pressures is shown in Figure 4. Among the descriptors, a strong positive correlation is observed between LCD and AV, with a Pearson correlation coefficient r of 0.75; moreover, a strong negative correlation exists between ρ and ASA or AV, with an absolute value of r greater than 0.71. However, no significant correlation is observed between Qst and the structural descriptors. For NC, three structural descriptors (AV, LCD, and ρ) are strongly related to it, with an absolute value of r greater than 0.62, and two structural descriptors (ASA and PLD) are moderately correlated with it, with r greater than 0.37. In contrast, there is little correlation between the energy descriptor (Qst) and NC, with an absolute value of r lower than 0.10. Figure 4 also clearly shows that the contribution of a single descriptor to NC gradually decreases as the adsorption pressure decreases.
Figure 4

Pearson correlation matrix for all the set of descriptors and NC at various pressures (the color bars represent the size of the Pearson correlation coefficients).

Because univariate analysis cannot identify synergies, the five structural descriptors (LCD, PLD, ASA, AV, and ρ) that correlate linearly with NC in the univariate analysis were used for multilinear regression and quadratic regression analysis. The predicted results of the two models for NC at different pressures are included in the Supporting Information. Although the prediction accuracy of the two models is not excellent, both confirm that AV plays the most critical role in NC at 5,065 kPa. Furthermore, multilinear regression analysis confirms that LCD is the most important descriptor for determining NC at 50.65 kPa.

ANN Models

Five Structural Descriptors

Machine learning has been widely used to predict the adsorption performance of materials. In this section, ANN models with five input nodes (LCD, PLD, ASA, AV, and ρ), two hidden layers, and seven nodes in each hidden layer were built to predict the GCMC-simulated NC of zeolites. During the training process, both random validation and fivefold CV were implemented, and the results of random validation are shown in the Supporting Information. At 300 K and 5,065 kPa, the predictions of the ANN model are in good agreement with the GCMC-simulated NC, with R2 = 0.910 for fivefold CV, as shown in Figure 5, indicating that these structural descriptors correlate well with NC at high adsorption pressures. The explanation is that adsorption at high pressures is mainly governed by molecular stacking, so these five descriptors, which describe the global porosity characteristics of the zeolites, can largely determine NC. To examine this more closely, we select the actual top 20% (46) of the zeolites in the database as determined by simulations, which form the subset to the right of the solid red line. Likewise, the top 20% of the zeolites as predicted by the ANN model are also selected, which are the zeolites above the dashed red line in Figure 5. The intersection contains the top-performing zeolites that are recovered by the model. To recover all of the actual top 20% of zeolites with excellent NC, one only needs to examine the top 31.0% of the zeolites predicted by the ANN model, which greatly shortens the time needed to find optimal zeolites for C3H6 adsorption. This also makes it possible to significantly accelerate the screening of millions of materials for high-performing candidates, a task that is extremely time-consuming by molecular simulations alone.
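The recovery analysis described above can be reproduced from any pair of simulated and predicted uptake lists. The sketch below is an illustration with made-up numbers; the function name and the rounding used to size the top group (round(0.2 × 232) = 46, matching the count in the text) are assumptions.

```python
def screening_fraction(simulated, predicted, top_fraction=0.2):
    """Return (recovered, fraction).

    recovered: how many actual top performers also fall in the predicted top group.
    fraction:  smallest share of the predicted ranking that contains every
               actual top performer.
    """
    n = len(simulated)
    n_top = round(top_fraction * n)
    by_sim = sorted(range(n), key=lambda i: simulated[i], reverse=True)
    by_pred = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    actual_top = set(by_sim[:n_top])
    recovered = len(actual_top & set(by_pred[:n_top]))
    worst_rank = max(by_pred.index(i) for i in actual_top) + 1
    return recovered, worst_rank / n

# Made-up uptakes for ten hypothetical materials.
simulated = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
predicted = [9.5, 7.2, 8.0, 7.5, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
result = screening_fraction(simulated, predicted)
```

In this toy case the model places only one of the two actual top materials in its own top 20%, and the predicted ranking must be followed down to the fourth position (40%) to capture both. The 31.0% reported in the text plays the same role for the 232 zeolites.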
Figure 5

Comparison of ANN model predictions using five structural descriptors (LCD, PLD, ASA, AV, and ρ) with NC obtained by GCMC simulation (fivefold CV approach) at 300 K and 5,065 kPa.

As demonstrated in Figure 6a–e, although the ANN model predicts markedly better than the multilinear regression and quadratic regression models, its performance still deteriorates as pressure decreases. With the gradual decrease of adsorption pressure, the external driving force is no longer large enough for the guest molecules to fill the entire void space of the zeolites. At low pressures, guest molecules are usually adsorbed in a portion of the void regions with strong binding sites that are not sufficiently captured by these general global descriptors, which causes the undesirable performance at 50.65 kPa in Figure 6e (R2 = 0.740). Accordingly, a specific descriptor is needed to enhance the prediction accuracy of the model for low-pressure gas adsorption. For the combined pressures (see Figure 7a), the R2 (0.887) of the ANN model is much higher than that of the previous models (R2 = 0.722 for the multilinear regression model and R2 = 0.803 for the quadratic regression model), which indicates that the strong nonlinear capability of the ANN model allows the pressure descriptor (P) to capture its effect on NC across the various pressures. Additionally, the good predictive performance of the ANN model is further demonstrated by the normally distributed residual histogram shown in Figure 7b.
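For the combined-pressure model, each zeolite contributes one sample per pressure, with P appended to its structural descriptors. A minimal sketch of that data layout is shown below; the array contents are placeholders, and the function name and column order are assumptions.

```python
import numpy as np

# Pressure levels from the text, in kPa.
PRESSURES_KPA = [5065.0, 1013.0, 303.9, 202.6, 101.3, 50.65]

def stack_pressures(descriptors, uptake_by_pressure, pressures=PRESSURES_KPA):
    """Build one sample per (zeolite, pressure) pair.

    descriptors:        shape (n_zeolites, n_descriptors)
    uptake_by_pressure: shape (n_zeolites, n_pressures)
    Returns X with P as the last column and y flattened in the same order.
    """
    n_zeolites, n_pressures = uptake_by_pressure.shape
    X = np.repeat(descriptors, n_pressures, axis=0)
    P = np.tile(np.asarray(pressures), n_zeolites).reshape(-1, 1)
    return np.hstack([X, P]), uptake_by_pressure.reshape(-1)

# Placeholder arrays standing in for 232 zeolites with five descriptors.
descriptors = np.arange(232 * 5, dtype=float).reshape(232, 5)
uptake = np.arange(232 * 6, dtype=float).reshape(232, 6)

X_all, y_all = stack_pressures(descriptors, uptake)
```

With 232 zeolites and six pressures this yields 1,392 samples with six inputs each, which would normally be normalized before training.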
Figure 6

Comparison of ANN model predictions using five structural descriptors (LCD, PLD, ASA, AV, and ρ) with NC obtained by GCMC simulation (fivefold CV approach) at 300 K and (a) 1,013 kPa, (b) 303.9 kPa, (c) 202.6 kPa, (d) 101.3 kPa, and (e) 50.65 kPa.

Figure 7

(a) Comparison of ANN model predictions using five structural descriptors (LCD, PLD, ASA, AV, and ρ) and P with combined NC obtained by GCMC simulation (fivefold CV approach) at six different pressures; (b) residual statistical histogram for all sets.


Six Descriptors with PSD

Simple structural descriptors provide global porosity characteristics of the zeolites, which are not enough to predict low-pressure gas adsorption. Consequently, descriptors carrying more implicit information are worth exploring, and the PSD, which refers to the rate of change of pore volume with pore size, is a good choice. The PSD describes the pore morphology through the upper and lower bounds of the pore diameters and their relative proportions, and it is extremely sensitive to small changes in the pore diameter, although it cannot capture subtle changes in pore surface texture and other features. Notably, only the PSD histogram over the pore diameter range of 3–6 Å needs to be calculated, because the strong attractive potential between C3H6 molecules and framework O atoms arises in this range, as demonstrated in Figure 8; the corresponding LJ parameters are shown in Table S4. Additionally, expanding the pore diameter range does not enhance the prediction of the ANN model. This strategy significantly reduces the complexity of the input data while minimizing the impact on accuracy.
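Restricting the PSD to the 3–6 Å window with 0.1 Å bins leaves 30 histogram values per zeolite. The sketch below shows one way to slice such a window and append it to the five scalar descriptors; it assumes a histogram whose k-th bin covers [k × 0.1, (k + 1) × 0.1) Å, which is a hypothetical layout rather than the actual Zeo++ output format, and all numbers are placeholders.

```python
def psd_window(histogram, bin_width=0.1, d_min=3.0, d_max=6.0):
    """Slice the bins covering [d_min, d_max) Å.

    Assumes histogram[k] covers pore diameters [k*bin_width, (k+1)*bin_width) Å.
    """
    start = round(d_min / bin_width)
    stop = round(d_max / bin_width)
    return list(histogram[start:stop])

# Placeholder PSD covering 0-10 Å in 0.1 Å bins (100 values).
full_psd = [float(k) for k in range(100)]
window = psd_window(full_psd)

# Placeholder scalar descriptors in the order LCD, PLD, ASA, AV, rho.
scalars = [6.3, 4.5, 1200.0, 0.12, 1.8]
features = scalars + window
```

Each zeolite is then represented by 35 inputs (five scalars plus 30 PSD bins), or 36 once pressure is added for the combined-pressure model.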
Figure 8

LJ potential surfaces for C3H6···O as a function of distance, where the O atom is assumed to be a concentration of LJ potential energy for zeolites.

The five structural descriptors mentioned above and the PSD serve as inputs of ANN models to predict NC at the individual pressures. As shown in Figure S9, the addition of PSD does not improve the prediction accuracy compared with the ANN model using the five simple structural descriptors alone, which shows that adding PSD has little effect on the generalization of the model at the individual pressures. Surprisingly, for combined pressures, the R2 value of the ANN model with six structural descriptors and the P descriptor is 0.967 (see Figure 9a), an enhancement of 9.0% compared with the model without PSD. Moreover, the normally distributed residual histogram shown in Figure 9b also verifies the excellent prediction performance. This suggests that the PSD, with its many data points, is well matched to larger data sets and can then deliver good fitting accuracy.
Figure 9

(a) Comparison of ANN model predictions using six structural descriptors (LCD, PLD, ASA, AV, ρ, and PSD) and P with combined NC obtained by GCMC simulation (fivefold CV approach) at six different pressures; (b) residual statistical histogram for all sets.


Six Descriptors with Qst

The ideal zeolite for C3H6 adsorption has a high uptake at the adsorption pressure and a low uptake at the desorption pressure. So far, the prediction of high-pressure gas adsorption has been very good, but the desired accuracy has not been achieved at low pressures. Consequently, we continued to search for a descriptor that could explain the adsorption mechanism at low pressures. Given the local nature of gas adsorption at low pressures, the chemistry of the pores is likely to be a dominating factor in determining the low-pressure adsorption of C3H6 and probably of other gases, a view shared by Burner.[63] As an essential parameter for characterizing the heterogeneity of the adsorption surface, Qst reveals information about the chemical interactions between the adsorbate and adsorbent. An excessively large Qst value for C3H6 in a zeolite indicates a very strong interaction with the framework, which makes Qst a complement to PSD. Five general structural descriptors (LCD, PLD, ASA, AV, and ρ) together with an energy descriptor (Qst) are used as inputs to train the ANN model. As shown in Figure S10a, adding Qst improves the accuracy of the model (R2 = 0.929) by only 2.1% at 300 K and 5,065 kPa, which means that the five structural descriptors alone are sufficient to predict NC at high pressures if concise input data are required. The proportion of void space filled by guest molecules decreases as the pressure declines, and a gradual improvement in R2 (1.3% ∼ 5.1%) is observed relative to the case where only the five structural descriptors are used (see Figure S10b–e). At 50.65 kPa, the model’s R2 (0.809) in Figure 10 is 9.3% higher than without the energy descriptor (Qst), which is quite remarkable and further confirms that Qst is indispensable for low-pressure gas adsorption prediction.
The actual top 20% and the predicted top 20% of zeolites are selected to further examine the deviation of the ANN model predictions in Figure 10. The predicted top 41.8% of zeolites must be carried forward to subsequent research in order to capture the real top 20% of zeolites, which is satisfactory. Accordingly, the Qst value does capture the local porosity characteristics of gas adsorption at low pressures. For combined pressures, although the prediction performance (R2 = 0.902) of the ANN model containing Qst in Figure S11a is inferior to that of the ANN model with PSD (R2 = 0.967), its advantage over the ANN model with the five general structural descriptors (R2 = 0.887) is not negligible. Furthermore, the residual histogram shown in Figure S11b follows a normal distribution. For clarity, the R2, RMSE, and MAE values for the 232 zeolites obtained with the three ANN models using the fivefold CV approach are summarized in Table 1, which makes the contribution of the various descriptors to the predictive performance of the ANN models clear. Additionally, the R2 values on the training and test sets of the three ANN models using the random validation approach at various pressures are summarized in Table S5 in the Supporting Information.
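The "predicted top 41.8% to capture the real top 20%" figure can be read as a ranking statistic: walk down the list of zeolites sorted by predicted NC until every member of the simulated top 20% has been seen. The sketch below implements that reading. The function name and the rounding used to size the top group are assumptions for illustration; the source does not spell out how the percentage was computed.

```python
def predicted_fraction_needed(actual, predicted, top_frac=0.20):
    """Smallest fraction of the predicted ranking that contains every
    structure belonging to the actual top `top_frac` (one plausible
    reading of the screening statistic, not the authors' stated method).
    """
    n = len(actual)
    k = max(1, round(top_frac * n))
    by_actual = sorted(range(n), key=lambda i: actual[i], reverse=True)
    target = set(by_actual[:k])
    by_predicted = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    found = 0
    for rank, i in enumerate(by_predicted, start=1):
        if i in target:
            found += 1
            if found == k:
                return rank / n
    return 1.0


# Toy values (not from the paper): the second-best structure is
# under-predicted, so more of the predicted list must be kept.
simulated = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
model_out = [9, 5, 8, 7, 6, 10, 4, 3, 2, 1]
fraction = predicted_fraction_needed(simulated, model_out)
```

A perfect ranking returns exactly `top_frac`; larger values mean more candidates must be passed on to detailed simulation.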
Figure 10

Comparison of ANN model predictions using five structural descriptors (LCD, PLD, ASA, AV, and ρ) and an energy descriptor (Qst) with NC obtained by GCMC simulation (fivefold CV approach) at 300 K and 50.65 kPa.

Table 1

R2, RMSE, and MAE Values for 232 Zeolites from Three ANN Models (Fivefold CV Approach) at Various Pressures, with S Representing a General Structural Descriptor

                 ANN (5S)                ANN (5S + PSD)          ANN (5S + Qst)
pressure (kPa)   R2      RMSE    MAE     R2      RMSE    MAE     R2      RMSE    MAE
5,065            0.910   0.342   0.256   0.902   0.356   0.276   0.929   0.304   0.236
1,013            0.907   0.339   0.265   0.898   0.354   0.278   0.919   0.316   0.241
303.9            0.856   0.389   0.309   0.861   0.382   0.313   0.878   0.358   0.283
202.6            0.823   0.408   0.323   0.815   0.418   0.328   0.851   0.375   0.293
101.3            0.788   0.412   0.327   0.771   0.429   0.351   0.828   0.372   0.291
50.65            0.740   0.422   0.337   0.734   0.428   0.358   0.809   0.362   0.293
combined         0.887   0.644   0.265   0.967   0.348   0.132   0.902   0.601   0.246
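For readers who want to reproduce this kind of table, the sketch below shows one common way to obtain R2, RMSE, and MAE from a fivefold CV: every structure is predicted once by a model trained on the other four folds, and the pooled out-of-fold predictions are scored. It assumes R2 is the coefficient of determination, which the source does not state explicitly. The straight-line `fit_line` model is only a stand-in for the ANN, and all function names are hypothetical.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE); R2 is taken as the coefficient of determination."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_predict(x, y, fit, k=5, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that
    never saw it during training."""
    preds = [None] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = model(x[i])
    return preds


def fit_line(xs, ys):
    """Least-squares straight line; a simple stand-in for the ANN."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((v - mx) ** 2 for v in xs)
    slope = sum((v - mx) * (w - my) for v, w in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda v: slope * v + intercept


# Toy data (not from the paper): an exactly linear relationship.
x = [float(i) for i in range(20)]
y = [2.0 * v + 1.0 for v in x]
r2, rmse, mae = regression_metrics(y, cross_val_predict(x, y, fit_line))
```

Scoring the pooled out-of-fold predictions, rather than averaging per-fold scores, gives a single value per metric for all 232 structures, which matches the layout of the table.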
A comparison of the labels predicted by the ANN model with the labels from the GCMC simulation at different pressures is depicted using an ROC plot, which illustrates the ability of the model to label zeolites correctly. The ROC curve depicts the relative trade-off between true positives (benefits) and false positives (costs). A perfect prediction method would yield a point in the upper left corner of the ROC plot, with 100% sensitivity (no false negatives), 100% specificity (no false positives), and an area under the curve (AUC) of 1. To examine closely the performance of the ANN model with the five structural descriptors and Qst on NC at low pressures, we first selected a cut-off criterion of 20% based on the simulated NC and then labeled each of the 232 zeolites as “positive” or “negative” depending on whether its predicted NC was in the top 20% or in the remaining 80%, using the random validation and fivefold CV approaches at 300 K and 50.65 kPa, as shown in Figure 11. The ANN model displays outstanding performance, with an AUC of 0.955 for random validation and 0.951 for fivefold CV at 300 K and 50.65 kPa. To account for the possible impact of the labeling criterion, AUC values for random validation and fivefold CV at 50.65 kPa were also calculated for several classification criteria. As illustrated in Figure 12, the AUC values exceed 0.9 for both validation methods regardless of the criterion. This excellent performance certifies the superiority of the ANN model in classifying zeolites into high or low C3H6 adsorbers at 50.65 kPa based on five structural descriptors (LCD, PLD, ASA, AV, and ρ) and an energy descriptor (Qst).
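The labeling and AUC calculation described above can be sketched as follows. This is a minimal illustration, assuming the "positive" labels come from the simulated NC (top 20%) and the predicted NC is used as the ranking score; the AUC is computed through its pairwise (Mann-Whitney) interpretation. The function names are hypothetical and the numbers are toy values, not data from the paper.

```python
def top_fraction_labels(values, frac=0.20):
    """Label the top `frac` of values as 1 ("positive") and the rest as 0."""
    n = len(values)
    k = max(1, round(frac * n))
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    labels = [0] * n
    for i in order[:k]:
        labels[i] = 1
    return labels


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy values: simulated NC defines the labels, predicted NC is the score.
simulated = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
predicted = [9, 5, 8, 7, 6, 10, 4, 3, 2, 1]
auc = roc_auc(top_fraction_labels(simulated), predicted)
```

Varying `frac` and recomputing the AUC gives the kind of sensitivity check on the labeling criterion that the text describes.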
Figure 11

Receiver-operator-curve (ROC) plots that show how well the ANN model labels zeolites as “positive” or “negative” depending on whether the NC at 300 K and 50.65 kPa is in the top 20% or in the remaining 80% for (a) random validation and (b) fivefold CV.

Figure 12

Relationships between AUC values and proportion of “positive” samples for random validation and fivefold CV at 300 K and 50.65 kPa.

Subsequently, the influence of the six descriptors (LCD, PLD, ASA, AV, ρ, and Qst) on NC in the ANN model at various pressures was further explored. A common feature importance measure, namely, permutation feature importance (PFI),[64] was calculated. To obtain the PFI of a descriptor such as Qst, we randomly permute the Qst values and use them to predict NC while keeping all other input features unpermuted, which reduces the prediction accuracy. The PFI value for Qst is the ratio of the error measure obtained with Qst randomly permuted to the error measure of the original model.[65] These steps are repeated 5 times, and the average is taken as the final PFI value of Qst to reduce the influence of the randomness of the permutation. All input features were evaluated in this way, with MAE as the error measure. The PFI values of each descriptor at different pressures are depicted in Figure 13. The PFI values of the ANN model follow the order AV > ASA > Qst > LCD > ρ > PLD at 300 K and 5,065 kPa. Undoubtedly, AV is the most crucial porosity characteristic determining NC at high pressures. This was also revealed by the previous multilinear regression and quadratic regression models, and it again confirms the molecular filling mechanism for high-pressure gas adsorption. As the pressure declines, the importance of AV decreases, while the significance of Qst and LCD becomes increasingly prominent. The order of PFI values of the ANN model is Qst > LCD > AV > PLD > ρ > ASA at 300 K and 50.65 kPa.
When the pressure is not high enough to squeeze the guest molecules into the channels, LCD becomes the vital element preventing the guest molecules from entering the window. In this situation, guest molecules are no longer adsorbed into the void space in a stacked state but are selectively adsorbed in the regions with strong interaction, and Qst reflects exactly this local porosity characteristic. In short, AV is the most critical feature at high pressures, while Qst and LCD become the two most meaningful features at low pressures, which is consistent with the literature.[19,38]
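The PFI procedure described above (permute one descriptor, recompute MAE, take the ratio to the unpermuted MAE, average over 5 repeats) can be sketched as below. The function names, the row-of-lists data layout, and the toy model are assumptions for illustration; any trained predictor, such as the ANN, could be passed in as `predict`.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, repeats=5, seed=0):
    """PFI = MAE_perm / MAE_orig for one input column, averaged over
    `repeats` random permutations. `X` is a list of feature rows (lists)
    and `predict` maps one row to a predicted value.
    """
    rng = random.Random(seed)
    base = mean_absolute_error(y, [predict(row) for row in X])
    if base == 0:
        raise ValueError("baseline MAE is zero; the ratio is undefined")
    ratios = []
    for _ in range(repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Replace only the chosen column; all other features stay in place.
        X_perm = [row[:column] + [v] + row[column + 1:]
                  for row, v in zip(X, shuffled)]
        perm = mean_absolute_error(y, [predict(row) for row in X_perm])
        ratios.append(perm / base)
    return sum(ratios) / repeats


# Toy setup (not the paper's model): the target depends only on feature 0.
X = [[float(i), float((7 * i) % 20)] for i in range(20)]
y = [2.0 * row[0] + (0.1 if i % 2 else -0.1) for i, row in enumerate(X)]
toy_model = lambda row: 2.0 * row[0]
pfi_used = permutation_importance(toy_model, X, y, column=0)
pfi_unused = permutation_importance(toy_model, X, y, column=1)
```

A descriptor the model ignores gives a PFI of 1, and larger values indicate that the model relies more heavily on that descriptor.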
Figure 13

PFI maps of different descriptors with NC at different pressures based on ANN prediction, where PFI = MAEperm/MAEorig.

The prediction performance of the ANN model is indeed enhanced for low-pressure gas adsorption after Qst is added, yet it is worth examining which factor is responsible for this improvement: the change in the degrees of freedom of the input data or Qst itself. To settle this question, ASA, which has the minimum PFI value, was removed when Qst was introduced, so that the number of inputs stayed unchanged for the ANN model prediction at 300 K and 50.65 kPa. As illustrated in Figure 14, in the absence of ASA, the prediction performance of the ANN model is still good (R2 = 0.795). It decreases slightly, by 1.7%, compared with the ANN model with six descriptors, but is 7.4% higher than without the energy descriptor (Qst). Furthermore, the predicted top 49.6% of zeolites must be carried forward to further research to capture the real top 20% of zeolites, which indicates that this ANN model is somewhat more affected for zeolites with low NC. The above analysis shows that the enhancement of prediction performance for low-pressure gas adsorption is due to Qst rather than to the increase in the degrees of freedom of the input data, and that ASA is not completely ineffective. In light of the simplicity and non-negligible contribution of ASA for low-pressure gas adsorption, the ANN model with five structural descriptors (LCD, PLD, ASA, AV, and ρ) and an energy descriptor (Qst) is still considered suitable for predicting NC at low pressures.
Figure 14

Comparison of ANN model predictions using four structural descriptors (LCD, PLD, AV, and ρ) and an energy descriptor (Qst) with NC obtained by GCMC simulation (fivefold CV approach) at 300 K and 50.65 kPa.


Conclusions

A comprehensive QSPR analysis of NC in 232 ordered pure silicon zeolites (Si/O = 1:2) at different pressures was carried out with the aid of a variety of regression models. Univariate analysis and multilinear regression analysis consistently showed that AV and LCD were the most significant factors determining NC at 5,065 and 50.65 kPa, respectively. Owing to the inclusion of binary interaction terms, the prediction performance of the quadratic regression model was significantly superior to that of multilinear regression, with enhancements of 10.1% at 5,065 kPa and 31.2% at 50.65 kPa. For high-pressure gas adsorption with a molecular stacking mechanism, the NC predicted by the ANN model with only five structural descriptors (LCD, PLD, ASA, AV, and ρ) was in good agreement with the GCMC-simulated NC. A good prediction (R2 = 0.967) was obtained for combined pressures with the introduction of PSD. At low pressures, the prediction performance of the ANN model increased by 9.3% with the addition of Qst, which reflects the chemical interactions between the adsorbate and adsorbent. ROC analysis certifies the superiority of the ANN model with Qst in classifying zeolites into high or low C3H6 adsorbers at 50.65 kPa. Based on the relative weight analysis, AV is the most critical feature at high pressures, while LCD and Qst become the two most meaningful features at low pressures. Moreover, the enhancement of model performance for low-pressure gas adsorption is due to the addition of Qst rather than to the increase in the degrees of freedom of the input data. The insights from this QSPR analysis provide new ideas for understanding structure–performance relationships at the atomic level and for designing high-performance zeolites for C3H6 adsorption.