Li Zhao1, Qi Zhang1, Chang He1, Qinglin Chen1, Bing J Zhang1. 1. School of Materials Science and Engineering, Guangdong Engineering Center for Petrochemical Energy Conservation, The Key Laboratory of Low-carbon Chemistry & Energy Conservation of Guangdong Province, Sun Yat-sen University, Xiaoguwei Island, Panyu District, Guangzhou, 510006, P. R. China.
Abstract
This work develops quantitative structure-property relationship (QSPR) models, using various regression analyses, to predict the propylene (C3H6) adsorption capacity of zeolites at various pressures, based on the topologically diverse International Zeolite Association database. Univariate and multilinear regression analyses show that the accessible volume and the largest cavity diameter are the most crucial factors determining C3H6 uptake at high and low pressures, respectively. An artificial neural network (ANN) model with five structural descriptors is sufficient to predict C3H6 uptake at high pressures. For combined pressures, an ANN model that additionally uses the pore size distribution predicts well. The isosteric heat of adsorption (Qst) significantly improves the prediction of low-pressure gas adsorption and allows zeolites to be finely classified into high or low C3H6 adsorbers. Combining high-throughput screening with QSPR models makes it possible to prescreen the database rapidly and accurately for top performers, so that detailed, computationally intensive molecular simulations need to be performed only on these candidates; the same approach can serve other gas adsorption applications.
Nanoporous materials are
defined as materials with interpenetrating
channels and pore sizes less than 100 nm which is often comparable
to the size of a molecule.[1] Nanoporous
materials composed of countless molecular building blocks in their
synthesis, such as zeolites, porous carbons, metal–organic
frameworks (MOFs), zeolitic imidazolate frameworks (ZIFs), and covalent
organic frameworks (COFs), exhibit superior chemical and geometrical
tunability and a diverse range of surface areas, pore surfaces, and void
fractions, which makes them candidates for next-generation technology.[2] This wide range of properties has successfully
promoted diverse applications of nanoporous materials including, but
not limited to gas storage, separation, catalysis, drug delivery,
and sensing.[3] In the realm of nanoporous
materials, it is always expected that an optimal material is tailored
to a specific application. In recent years, an exponential increase
in published lab-synthesized and computer-generated hypothetical nanoporous
materials has provided us with a library of tens of thousands of potentially
interesting new materials.[4] These new materials
provide an ideal platform for understanding thoroughly how to tailor-make
an optimal material for a given application.

High-throughput screening techniques using brute-force
experimental and simulation methods can generate the thermodynamic
data needed to predict the performance of these materials for specific
applications.[5,6] With the advent and growth of large databases, however,
it is impractical to synthesize and characterize materials one by one
through traditional trial-and-error experiments, which are
time-consuming and expensive. To accelerate high-throughput screening,
several outstanding theoretical computing tools have been widely adopted
for characterization of novel materials over the last few years, for
instance, first principles (ab initio) methods, density functional
theory, and molecular simulations (Monte Carlo and molecular dynamics
simulations).[7] In particular, grand canonical
Monte Carlo (GCMC) simulation is well suited to adsorption
studies, and the adsorption capacities it predicts are in good agreement
with experimental results for many systems, as confirmed
in prior work.[8] So far, GCMC simulation
methods have been extensively applied to methane storage,[9] hydrogen storage,[10] carbon dioxide (CO2) capture,[11] ethanol purification,[12] propylene/propane
(C3H6/C3H8) separation,[13] and other aspects[14−16] by many groups. Additionally,
preliminary structure–property relationships have been revealed.
Undoubtedly, the time, cost, and human effort required for the characterization
of material properties using GCMC simulation are greatly reduced compared
to experimental methods. Although high-throughput screening studies
based on GCMC simulation have been helpful in guiding experimental
synthesis, this brute-force approach cannot keep pace with an almost unlimited
number of new structures because of its high computational
cost. At present, the lack of efficient computational tools has gradually
become a bottleneck in the rapid development of novel materials; consequently,
the development of alternative screening methodologies is urgent.[2]

Machine learning methods such as decision
trees, support vector
machines, and neural networks have become powerful tools to prescreen
high-performing materials and accelerate large-scale simulation in
materials science. Considerable time is saved when detailed
and time-consuming calculations are performed only on candidates prescreened by
machine learning. Specifically, machine learning methods build complex
models to produce reliable predictions about unknown data through
learning from relationships in the dataset provided, which makes the
screening of countless nanoporous materials practicable. Quantitative
structure–property relationship (QSPR) models[17,18] trained by data-driven machine learning methods can systematically
correlate structural features of materials (referred to as descriptors)
to their functional properties in quantitative terms, which would
be expected to play a crucial role in material screening. For nanoporous
materials, one of the most desirable properties to predict is the
adsorption capacity of guest molecules at the required temperature
and pressure.[19] Naturally, the morphology
of the pores described by various descriptors is essential for the
adsorption behavior of guest molecules, where adsorbates are located
and interact with the material surface.[20] The selection of highly predictive descriptors for determining adsorption
capacity is therefore critical.

Standard structural descriptors for pore
morphology, such as mass
density, surface area, void fraction, largest cavity diameter, and
others, have been used to construct feature vectors for systems under
certain thermodynamic conditions and provided satisfactory results.
For instance, Durá et al.[21] innovatively
made a reasonable fitting for CO2 adsorption capacity in
porous carbons through a simple regression approach derived from microporous
and mesoporous volumes. Subsequently, artificial neural network (ANN)
methods using microporous and mesoporous volumes and Brunauer–Emmett–Teller
surface area were used to predict CO2 uptake,[22] nitrogen (N2) uptake under ambient
conditions, and CO2/N2 selectivity[23,24] of porous carbons. Fernandez et al.[31] accurately predicted methane (CH4) uptake of MOFs at
100 bar based only on the dominant pore diameter, the maximum pore
diameter, the void fraction, the gravimetric surface area, the volumetric
surface area, and the framework density. Furthermore, results for
N2,[25] CO2 working
capacities, and the CO2/CH4 selectivity[26] of MOFs were also studied using machine learning
algorithms and standard structural descriptors. Similarly, simple
textural descriptors were regarded as the MOF fingerprints to predict
hydrogen (H2) adsorption uptake and CO2/H2 selectivity.[27] Recently, Lin et
al.[28] systematically screened hypothetical
pure-silica zeolites and identified 230 pre-eminent zeolites for effective
removal of linear siloxanes and derivatives using a random forest
method based on simple structural descriptors. Analogously, high-performing
zeolites for anion removal from water were identified.[29] Guest molecules occupy almost the entire void
space of nanoporous materials at high pressures, so descriptors
that capture global porosity characteristics are widely used. Although
the prediction of high-pressure gas adsorption performance is encouraging,
there is no clear principle guiding the selection of appropriate descriptors,
especially for low-pressure gas adsorption, where predictive performance is poor.

At low pressures, guest molecules are usually adsorbed in the strongly
binding regions of the material’s pore, which cannot be captured
well by simple structural descriptors.[30] To address this problem, some specific descriptors have been continuously
developed by researchers for obtaining a universal model for adsorption
behavior prediction at different pressures. A novel atomic property
weighted radial distribution function descriptor accounting for the
topological diversity was tailored by Fernandez et al.[25,31,32] and combined with traditional
structural descriptors for the prediction of CH4, CO2, and H2 uptakes at low pressures with good
results. Lee and co-workers[2,33,34] developed a new descriptor for nanoporous materials by using topological
data analysis to quantify similarity of pore structures and successfully
predicted CH4 uptake at low pressures. A vectorized persistence
diagram for topology analysis was also expected to be applied to the
screening of various materials.[19,35] Although detailed pore
structure information is captured by a topological descriptor, it
is incapable of reflecting the relative proportion of pores with a
given pore size, let alone the guest–host interaction. To address this,
the Voronoi energy descriptor[36] takes into
account both geometrical and energetic
information and is highly predictive of xenon/krypton (Xe/Kr) separation
performance. Later, a histogram of the guest–host energy was
regarded as the feature for machine learning, and the predicted gas
adsorption capacities in MOFs were in good agreement with GCMC simulations.[8,10] Fanourgakis et al.[37] proposed to treat the probabilities of a set of different probe
atoms adsorbed by materials as new descriptors for fast screening
of large databases. Recently, heat of adsorption and Henry’s
coefficient of special adsorbates were combined with traditional structural
descriptors for gas adsorption and separation study.[38,39] For nanoporous materials with a wide variety of elements, especially
MOFs, chemical descriptors considering types and contents of chemical
elements were used to predict gas adsorption performance at low pressures.[40−42] In spite of the continuous generation of new descriptors, the selection
of appropriate descriptors for a particular application remains an
open scientific issue.

C3H6, an essential feedstock for various
household plastic products,
is obtained by an energy-intensive cryogenic
distillation process.[43] The development
of alternative separation technologies with low energy consumption
is of great value. A physical adsorption process, especially pressure-swing
adsorption (PSA) technology with high gas purity, is a promising choice.[44] In the PSA process, guest molecules are adsorbed
at high pressures and desorbed at low pressures. The selection of
high-performance adsorbents is the key to achieving an efficient separation
process. Various porous materials have been reported for the PSA process
so far, such as zeolites,[45,46] MOFs,[47] and ZIFs.[48] Considering the
uniform system of pores, high porosity, and excellent thermal and
chemical stability, zeolites have been proven to be promising for
gas adsorption.[49]

In this work, we
present a comprehensive QSPR analysis of the database
of 232 zeolites. Correlations between various descriptors and C3H6 adsorption capacity (NC) at pressures of 5,065, 1,013, 303.9,
202.6, 101.3, and 50.65 kPa, at 300 K, were determined using multilinear
regression analysis, quadratic regression analysis, and ANN models.
In addition, pore size distribution (PSD) and isosteric heat of adsorption
(Qst) were introduced to predict NC at low pressures,
and the prediction performance was further evaluated by receiver operating characteristic
(ROC) analysis. These QSPR models allowed for the accurate identification
of high-performing zeolites, and rapid material prescreening significantly
reduced the number of computationally intensive GCMC simulations. Finally,
we described the relative importance of various descriptors determining NC at different pressures,
which provided insight into structure–performance
relationships at the atomic level.
Models and Methods
Molecular Models
First, 232 ordered
pure silicon zeolites (Si/O = 1:2) considered in this study were obtained
from the International Zeolite Association (IZA) database. The zeolite
framework types were generated using a library of 49 composite building
units. The framework atoms were described by Lennard-Jones (LJ) and
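electrostatic potentials:[50,51]

$$u_{ij}(r_{ij}) = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}$$

where $\varepsilon_{ij}$ and $\sigma_{ij}$ are the
well depth and collision diameter, respectively, $r_{ij}$ is the distance between atoms i and j, $q_i$ is the charge of atom i, and $\varepsilon_0 = 8.8542 \times 10^{-12}$ C$^2$ N$^{-1}$ m$^{-2}$ is the
permittivity of vacuum. The adsorbate C3H6 was
represented by a united-atom model with each CHx group as a single interaction site. The LJ potential parameters and atomic
charges of zeolites and C3H6 were adopted from
the COMPASS force field, which has predicted gas adsorption fairly well
in a wide variety of zeolites.[52,53] The Lorentz-Berthelot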
combining rules were employed to calculate the cross LJ parameters.
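As a minimal sketch of the pair potential and combining rules described above, the following Python snippet evaluates a 12-6 LJ plus Coulomb energy for a single atom pair. The 12-6 form, the unit conventions, the function names, and the numerical parameters are assumptions made for illustration; the parameters are placeholders, not COMPASS values.

```python
import math

EPS0 = 8.8542e-12           # vacuum permittivity, C^2 N^-1 m^-2 (value quoted in the text)
E_CHARGE = 1.602176634e-19  # elementary charge, C
AVOGADRO = 6.02214076e23    # 1/mol


def lorentz_berthelot(eps_i, sig_i, eps_j, sig_j):
    """Cross LJ parameters: geometric mean for epsilon, arithmetic mean for sigma."""
    return math.sqrt(eps_i * eps_j), 0.5 * (sig_i + sig_j)


def pair_energy(r, eps_ij, sig_ij, q_i, q_j):
    """12-6 LJ plus Coulomb energy in kJ/mol.

    r and sig_ij are in angstrom, eps_ij in kJ/mol, charges in units of e.
    """
    sr6 = (sig_ij / r) ** 6
    lj = 4.0 * eps_ij * (sr6 * sr6 - sr6)
    r_m = r * 1e-10  # angstrom -> metre
    coulomb_joule = q_i * q_j * E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * r_m)
    return lj + coulomb_joule * AVOGADRO / 1000.0


# Placeholder parameters for two hypothetical sites (illustrative only).
eps_ij, sig_ij = lorentz_berthelot(0.4, 3.7, 0.6, 3.0)
energy = pair_energy(4.0, eps_ij, sig_ij, 0.0, -0.5)
```

With these rules the LJ term vanishes at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, which gives a quick sanity check on any parameter set.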
Simulation Methods
Before adsorption
simulation, C3H6 molecules and zeolite frameworks
were geometrically optimized to obtain the configurations with stable
structures, and the optimized structures with minimum energy were
used in the subsequent adsorption simulation process. GCMC[54] simulation in the sorption module of Materials
Studio 2018[55] was conducted to evaluate
the adsorption performance of 232 zeolites toward C3H6. GCMC is a statistical mechanical approach, in which the
adsorption process is explored depending on random sampling and probabilistic
interpretation in the adsorbent framework. The adsorption was assumed
to be conducted at 300 K and 6 different pressure levels (5,065, 1,013,
303.9, 202.6, 101.3, and 50.65 kPa). During simulation, the C3H6 molecule was considered as an ideal gas with
negligible interactions, whose fugacity was equal to pressure.[56] Zeolite atoms were assumed to be rigid, and
their positions remained constant. A spherical cut-off of 15.5 Å
was used to calculate the LJ interactions, whereas the electrostatic
interactions were calculated using the Ewald summation method. The
cell lengths of each zeolite were expanded to at least 31 Å (twice
the cut-off distance) along all three dimensions, and periodic
boundary conditions were applied. For each zeolite, the GCMC simulation
was run for $1.1 \times 10^{6}$ cycles, with the first $1 \times 10^{5}$ for equilibration and the remainder for ensemble averaging.
Each cycle consisted of n trial moves (n: the number of adsorbate molecules), including translation, rotation,
regrowth, and swap. To verify the suitability of the COMPASS force
field and the above assumptions used in this study, Figure 1 shows the adsorption isotherms
of pure C3H6 in CHA and STT and of pure C3H8 in MFI and DDR. Good agreement is observed
between the simulated and published data,[46,57−60] which supports the reliability of the selected force field. Besides
GCMC simulation for the adsorption of pure C3H6, the Qst of C3H6 at infinite dilution was estimated. For this case, one adsorbate
molecule (C3H6) was added into a zeolite and
simulation was conducted in a canonical ensemble.
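The cell-expansion rule used in the simulation setup (each supercell edge at least twice the 15.5 Å cut-off) can be sketched as follows. This is only an illustration that assumes an orthogonal unit cell; the function name and the cell lengths are hypothetical and do not come from the Materials Studio workflow.

```python
import math

CUTOFF = 15.5  # LJ cut-off in angstrom, as stated in the text


def replication_factors(cell_lengths, cutoff=CUTOFF):
    """Smallest integer repeats so that each supercell length is >= 2 * cutoff.

    Assumes an orthogonal cell; for a triclinic cell the perpendicular
    widths would have to be used instead of the edge lengths.
    """
    min_length = 2.0 * cutoff
    return tuple(math.ceil(min_length / length) for length in cell_lengths)


# Hypothetical unit-cell edge lengths in angstrom (illustrative values only).
cell = (20.0, 13.5, 31.0)
factors = replication_factors(cell)
```

For the hypothetical cell above the factors are (2, 3, 1), so the supercell edges become 40.0, 40.5, and 31.0 Å.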
Figure 1
Comparison between simulated
and published adsorption isotherms
of (a) pure C3H6 and (b) pure C3H8 in zeolites.
Descriptor Selection
In this work,
in addition to the simulated C3H6 adsorption data,
five general 1D structural descriptors, namely the largest cavity
diameter (LCD), pore-limiting diameter (PLD), accessible surface area
(ASA), accessible volume (AV), and density (ρ), together with a 2D structural
descriptor (PSD) and an energy descriptor (Qst), were selected for our QSPR analysis. As described in Figure S1, we selected
the above five general 1D structural descriptors from an initial set of seven
on the basis of two criteria: at least moderate correlation
between each descriptor and NC, and no strong correlation among descriptors.

The LCD corresponds to the maximum of the PSD, and the PLD refers
to the largest characteristic guest molecule size for which there
is a nonzero AV. LCD and PLD determine whether a specific guest molecule
can enter the zeolite window; furthermore, ASA and AV reflect the
void space that guest molecules can reach. All void space of a zeolite
is reflected by ρ indirectly. The PSD provides information about
the fraction of void space that is occupied by pores of a certain
size, and the Qst value reflects the energy
information related to the adsorption process. These diverse descriptors
capture strong structure–performance relationships between guest
molecules and zeolites and characterize the zeolites from various
aspects, which makes them suitable inputs for accurate machine
learning prediction.[17,26,38,61] These descriptors are relatively easy to measure,
and they can be used directly to guide the synthesis and application
of zeolites. In this work, ρ was obtained directly through zeolite
crystalline structure, and LCD, PLD, ASA, AV, and PSD were determined
by the Zeo++ program, in which the radius of a probe (1.2 Å)[28] was used for ASA and AV, and a bin size of 0.1
Å[62] was used to obtain PSD histograms.
The Qst value was calculated by the NVT-Monte
Carlo simulation.
Multilinear and Quadratic Regression Models
Multilinear regression analysis is performed when the relationship
between multiple descriptors and NC of 232 zeolites is assumed to be linear. The
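general form of the multilinear regression model is as follows:

$$y_k = \beta_0 + \sum_{i} \beta_i x_{ik} + \gamma$$

The quadratic regression
is performed by adding binary interaction terms to the multilinear
regression model. The general form of the quadratic regression model
is as follows:

$$y_k = \beta_0 + \sum_{i} \beta_i x_{ik} + \sum_{i}\sum_{j} \beta_{ij} x_{ik} x_{jk} + \gamma$$

where $y_k$ and $x_{ik}$ refer to the target
value and the input value of descriptor $i$ for sample $k$; $\beta_{ij}$ and $\beta_i$ are the binary
and linear coefficients, respectively; and $\beta_0$ and $\gamma$ are the constant and error term, respectively.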
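To make the difference between the two regression models concrete, the sketch below builds the feature row of the quadratic model for one sample: a constant, the linear terms, and the binary interaction terms. It assumes that only cross terms with i < j are included; whether squared terms also enter the model is not stated here, and the function names and descriptor values are hypothetical.

```python
from itertools import combinations


def quadratic_features(x):
    """Return [1, x_1..x_m, x_i * x_j for i < j] for one sample.

    Only cross terms (i < j) are generated; including squared terms
    would be a different modelling choice.
    """
    row = [1.0] + [float(v) for v in x]
    row += [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return row


def predict(beta, x):
    """Evaluate the model as the dot product of coefficients and features."""
    return sum(b * f for b, f in zip(beta, quadratic_features(x)))


# Hypothetical sample with five descriptor values (illustrative numbers).
sample = [6.0, 4.0, 1.5, 0.2, 1.8]
features = quadratic_features(sample)
```

With five descriptors this gives 1 + 5 + 10 = 16 coefficients, compared with 6 for the multilinear model; dropping the interaction columns recovers the multilinear case.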
Neural Network Models
To clarify
the role of the structural descriptors and the energy descriptor in adsorption
capacity, the above descriptors were assigned to the neurons of
the input layer and passed sequentially through the hidden layers
to the output layer, so that the information was
propagated in a feed-forward manner to predict NC of the 232 zeolites.
As a typical machine learning algorithm, the ANN model was trained
iteratively by comparing the GCMC-simulated and ANN-calculated output values
and then adjusting the weights and thresholds to decrease the error,
where the mean squared error was used as the cost function. The optimization
of the cost function was carefully monitored to determine the optimal
number of epochs so that overfitting to the training data was avoided. Figure 2 shows the architecture
of the ANN model used in this paper; the model was implemented
in MATLAB R2020a. ANN models with different descriptors
as nodes, two hidden layers, and seven nodes for each layer were built
to predict the GCMC-simulated NC of zeolites.
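The feed-forward pass through a network of this shape (input nodes, two hidden layers of seven nodes each, one output) can be sketched in plain Python as below. This is not the MATLAB implementation used in the paper: the tanh hidden activation, the linear output, the random initial weights, and all function names are assumptions made for illustration, and no training step is shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, activation):
    """Apply one layer: activation(W x + b)."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Feed-forward pass: tanh hidden layers, linear output layer."""
    hidden = inputs
    for layer in layers[:-1]:
        hidden = dense(hidden, layer, math.tanh)
    return dense(hidden, layers[-1], lambda z: z)[0]


rng = random.Random(0)
sizes = [5, 7, 7, 1]  # five descriptors, two hidden layers of seven nodes, one output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Hypothetical normalized descriptor vector (LCD, PLD, ASA, AV, density).
prediction = forward([0.6, 0.4, 0.5, 0.7, 0.3], layers)
```

A 5-7-7-1 network of this kind has 106 adjustable weights and biases, which is small relative to 232 samples per pressure but still large enough that monitoring the number of epochs, as described above, matters.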
Figure 2
Architecture of an ANN model.
Performance Criteria
During the training
process, the data set was first randomly divided into two parts:
80% was used for training, and the remaining 20% was used
to test the generalization ability of the model. Moreover, to reduce
the uncertainty arising from data set partitioning and to minimize overfitting,
a fivefold cross validation (CV) approach was also employed for the
ANN model, and the average of the results of the five calculations
was taken as the model performance. All input and output data sets
were preprocessed by a normalization method to speed up the training
process. The quality of the training and test results was evaluated
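by the coefficient of determination (R2), the
root mean square error (RMSE), and the mean absolute error (MAE) as
follows:

$$R^2 = 1 - \frac{\sum_{k=1}^{n} (y_k - \hat{y}_k)^2}{\sum_{k=1}^{n} (y_k - \bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n} (y_k - \hat{y}_k)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n} \left| y_k - \hat{y}_k \right|$$

where n, y, ŷ, and y̅ refer to the number of samples, target value, predicted
value, and average target value, respectively.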
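The three performance criteria and a fivefold partition can be sketched in a few lines of Python. The metric definitions below are the standard ones; the shuffling seed, the function names, and the toy values are assumptions for illustration and do not reproduce the MATLAB workflow.

```python
import math
import random


def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy target and predicted values (illustrative only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
folds = kfold_indices(232, k=5)
```

Each fold serves once as the test set while the other four are used for training, and the reported score is the mean over the five runs.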
Results and Discussion
Univariate Analysis
A comprehensive
understanding of the relationship between the structural and energy
descriptors and the performance of zeolites for C3H6 adsorption would be conducive to the identification
of promising materials. We initially performed a simple univariate
analysis in which we looked for correlations between a single descriptor
and the simulated NC at 300 K and at 5,065 and 50.65 kPa; the top 20% of the
zeolites were classified as having high adsorption capacity, and
the remaining 80% as having low adsorption capacity. Figures 3a–d and S2a–d show that four structural descriptors
(LCD, PLD, ASA, and AV) are positively correlated with NC. When LCD is less than 3.75
Å, adsorption of guest molecules is impeded due to unfavorable
potential overlap with the framework, and NC could be considered to be almost zero.
With the gradual increase of LCD, the large void space makes the adsorption
capacity increase overall at 5,065 and 50.65 kPa. ASA and AV are also
strongly correlated with NC at 5,065 and 50.65 kPa. The relationship between NC and ρ also
shows a linear trend in Figures 3e and S2e, but NC gradually decreases
as ρ increases. There is no evident trend in the relationship
between NC and Qst, which shows that Qst alone cannot explain NC linearly. Although
similar linear trends are observed at both pressures, every single descriptor explains NC
less well at 50.65 kPa than at 5,065 kPa. From the perspective of univariate
analysis, the strength of correlation of the simple structural descriptors follows the order AV >
LCD > ρ > ASA > PLD at 5,065 kPa (high pressure), which
differs from the order LCD > AV > ρ > ASA > PLD at 50.65 kPa (low pressure).
It can also be observed that NC is not uniquely determined by a single
descriptor.
Overall, a single structural or energy descriptor can only capture
its individual relationship with performance and cannot explain synergies
between various descriptors that might contribute significantly to
the performance of zeolites.
Figure 3
Relationships between (a) NC ∼ LCD, (b) NC ∼ PLD, (c) NC ∼ ASA, (d) NC ∼ AV, (e) NC ∼ ρ, and (f) NC ∼ Qst at 300 K and 5,065 kPa.
A heat map of the correlation coefficient matrix
for the various descriptors
and NC at various
pressures is shown in Figure 4. Among the descriptors,
a strong positive correlation is observed between LCD and AV, with
the Pearson correlation coefficient value of r equal
to 0.75; moreover, a strong negative correlation exists between ρ
and ASA or AV, with an absolute value of r greater
than 0.71. However, no significant correlation is observed between Qst and structural descriptors. For NC, three structural descriptors
(AV, LCD, and ρ) are strongly related to it with the absolute
value of r greater than 0.62, and two structural
descriptors (ASA and PLD) have a moderate correlation with r greater than 0.37. In contrast, there is little correlation between
the energy descriptor (Qst) and NC, with an absolute
value of r lower than 0.10. It can also be clearly
observed from Figure 4 that, as the adsorption pressure decreases, the contribution
of each single descriptor to NC gradually decreases.
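The Pearson coefficients behind such a correlation matrix can be computed as in the following sketch. The column names and the numbers are hypothetical placeholders, not values from the zeolite database.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson coefficients for named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical descriptor columns for four zeolites (illustrative numbers).
data = {
    "rho": [1.4, 1.6, 1.8, 2.0],
    "AV": [0.30, 0.22, 0.15, 0.05],
    "uptake": [3.1, 2.4, 1.2, 0.3],
}
matrix = correlation_matrix(data)
```

In this toy data the density column is negatively correlated with both the accessible volume and the uptake, mirroring the sign pattern described above.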
Figure 4
Pearson correlation matrix for the full
set of descriptors and NC at various pressures
(the color bars represent the size of the Pearson correlation coefficients).
Due to the limitations of univariate analysis in
identifying synergies,
five structural descriptors (LCD, PLD, ASA, AV, and ρ) with
strong linear correlation with NC in univariate analysis were used for multilinear
regression and quadratic regression analysis. The predicted results
of the two models for NC at different pressures are included in the Supporting Information. Although the prediction accuracy of
the two models is limited, both confirm
that AV plays the most critical role in determining NC at 5,065 kPa. Furthermore, multilinear
regression analysis confirms that LCD is the most important
descriptor for determining NC at 50.65 kPa.
ANN Models
Five Structural Descriptors
Machine
learning has been widely used to predict the adsorption performance
of materials. In this section, ANN models with five input nodes (LCD,
PLD, ASA, AV, and ρ), two hidden layers, and seven nodes for
each layer were built to predict the GCMC-simulated NC of zeolites. During the training
process, both random validation and fivefold CV were implemented,
and the results of random validation are shown in the Supporting Information. At 300 K and 5,065 kPa,
the predicted outcomes of the ANN model are in good agreement with
the GCMC-simulated NC with R2 = 0.910 for fivefold CV,
as shown in Figure 5, indicating that these structural descriptors correlate well with NC at high adsorption
pressures. The explanation for this good agreement is that
adsorption at high pressures is governed mainly by molecular packing,
and these five descriptors, which describe the global porosity characteristics
of the zeolites, can largely determine NC. To examine this more closely,
we select the actual top 20% (46) of the zeolites in the database as determined
by simulations, which form the subset to the right of the solid red
line. Likewise, the top 20% of the zeolites as predicted by the ANN model
are selected, which are the zeolites above the dashed red line
in Figure 5. The intersection
of the two subsets contains the top-performing zeolites that are recovered by the model.
To recover the actual top 20% of zeolites with excellent NC, one only needs to carry forward
the top 31.0% of the zeolites predicted by the ANN model for further research,
which greatly shortens the time to find optimal zeolites for C3H6 adsorption. A significant acceleration of the
screening process for identifying high-performing candidates from
millions of materials is also made possible, whereas it is extremely
time-consuming by means of molecular simulations alone.
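One way to read the prescreening statistic above is as the smallest share of the model's ranking that must be carried forward so that every actual top performer is included. The sketch below computes that share; the function names and the toy numbers are hypothetical and this is not necessarily the exact procedure used to obtain the 31.0% figure.

```python
import math


def top_indices(values, fraction):
    """Indices of the highest-valued samples; the count is rounded down."""
    k = math.floor(len(values) * fraction)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])


def fraction_needed(simulated, predicted, fraction=0.2):
    """Smallest share of the predicted ranking containing all actual top performers."""
    actual_top = top_indices(simulated, fraction)
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    found = 0
    for rank, idx in enumerate(order, start=1):
        if idx in actual_top:
            found += 1
            if found == len(actual_top):
                return rank / len(predicted)
    return 1.0


# Toy uptakes for ten materials (illustrative only).
simulated = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
predicted = [9.5, 6.5, 9.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
share = fraction_needed(simulated, predicted, fraction=0.2)
```

In the toy example the second-best material is ranked fourth by the model, so the top 40% of predictions must be kept to recover the actual top 20%.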
Figure 5
Comparison
of ANN model predictions using five structural descriptors
(LCD, PLD, ASA, AV, and ρ) with NC obtained by GCMC simulation (fivefold
CV approach) at 300 K and 5,065 kPa.
As demonstrated in Figure 6a–e, although the prediction
performance of the ANN model is significantly better than that of the multilinear and quadratic regression
models, it still deteriorates as the pressure decreases. With the gradual decrease of adsorption pressure, the external
force is no longer large enough for the guest molecules to fill the
entire void space of zeolites. At low pressures, guest molecules are
usually adsorbed in a portion of the void regions with strong binding
sites that are not sufficiently captured by these general global descriptors,
which results in poorer performance at 50.65 kPa in Figure 6e (R2 = 0.740).
Accordingly, a specific descriptor is needed to enhance the prediction
accuracy of the model for low-pressure gas adsorption.
Considering the combined pressures (see Figure 7a), the R2 (0.887)
of the ANN model is much higher than that of the previous models
(R2 = 0.722 for the multilinear regression
model and R2 = 0.803 for the quadratic
regression model), which indicates that the strong nonlinear fitting capacity
of the ANN model enables the pressure (P) descriptor to capture how NC varies across the different pressures.
Additionally, the good predictive performance of the ANN model is
further demonstrated by the residual histogram, which follows a normal
distribution, shown in Figure 7b.
Figure 6
Comparison of ANN model predictions using five structural descriptors
(LCD, PLD, ASA, AV, and ρ) with NC obtained by GCMC simulation (fivefold
CV approach) at 300 K and (a) 1,013 kPa, (b) 303.9 kPa, (c) 202.6
kPa, (d) 101.3 kPa, and (e) 50.65 kPa.
Figure 7
(a) Comparison of ANN model predictions using five structural
descriptors
(LCD, PLD, ASA, AV, and ρ) and P with combined NC obtained by GCMC simulation
(fivefold CV approach) at six different pressures; (b) residual statistical
histogram for all sets.
Six Descriptors with PSD
Simple
structural descriptors provide global porosity characteristics of
the zeolites, which are not enough to predict low-pressure gas adsorption.
Consequently, we explored descriptors that carry more detailed
information; PSD, which refers to the rate of
change of pore volume with pore size, is a good choice. PSD describes the pore morphology
through the upper and lower bounds of the pore diameters and their relative proportions,
and it is extremely sensitive to small changes in the pore diameter,
although it cannot capture subtle changes in pore surface texture and
other features. Notably, we only need to calculate the PSD histogram
over the pore diameter range of 3–6 Å because the strong attractive
potential between C3H6 molecules and framework O atoms
emerges in this range, as demonstrated in Figure 8; the corresponding LJ parameters are listed
in Table S4. Additionally, we found
that expansion of the pore diameter range does not enhance the prediction
of the ANN model. The implementation of this strategy significantly
reduces the complexity of the input data while minimizing the impact
on the accuracy.
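Restricting the PSD to the 3–6 Å window with a 0.1 Å bin size turns it into a fixed-length feature vector of 30 values. The sketch below illustrates this under the assumption that the PSD is available as a list of sampled pore diameters with weights; in practice the histogram would come from the Zeo++ output, whose file format is not reproduced here, and the function name and sample values are hypothetical.

```python
def psd_features(pore_diameters, weights, lo=3.0, hi=6.0, bin_size=0.1):
    """Histogram of pore diameters restricted to [lo, hi).

    Each bin holds the fraction of the total weight (all diameters,
    including those outside the window) that falls into it.
    """
    n_bins = round((hi - lo) / bin_size)
    hist = [0.0] * n_bins
    for diameter, weight in zip(pore_diameters, weights):
        if lo <= diameter < hi:
            idx = min(int((diameter - lo) / bin_size), n_bins - 1)
            hist[idx] += weight
    total = sum(weights)
    return [h / total for h in hist] if total > 0 else hist


# Hypothetical sampled pore diameters in angstrom with equal weights.
diameters = [3.05, 3.05, 4.55, 7.2]
features = psd_features(diameters, [1.0, 1.0, 1.0, 1.0])
```

Here the 7.2 Å sample falls outside the window, so the 30 features sum to 0.75 rather than 1; widening the window would add inputs without, according to the text, improving the ANN prediction.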
Figure 8
LJ potential surfaces for C3H6···O
as a function of distance, where the O atom is assumed to be a concentration
of LJ potential energy for zeolites.
The five structural descriptors mentioned above and the PSD serve as inputs
to ANN models that predict NC at the various pressures. As shown in Figure S9,
adding PSD does not improve the prediction accuracy compared with the ANN
model using the five simple structural descriptors alone, which shows that
PSD has little effect on the generalization of the model at the individual
pressures. Surprisingly, for combined pressures, the R2 value of the ANN model
with six structural descriptors and the P descriptor is 0.967 (see Figure 9a),
an enhancement of 9.0% compared with the model without PSD. Moreover, the
normally distributed residual histogram shown in Figure 9b also verifies the
excellent prediction performance. A plausible explanation is that the PSD,
which contributes many data points, is well matched to a large data set and
therefore yields good fitting accuracy.
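The 9.0% figure is consistent with a relative change in R2, taking the combined-pressure value of the five-descriptor model in Table 1 (0.887) as the baseline. The short check below assumes that reading of the percentages; the helper name `relative_change_percent` is introduced here only for illustration.

```python
def relative_change_percent(new, baseline):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Combined pressures: ANN (5S + PSD) versus ANN (5S), R2 values from Table 1
print(round(relative_change_percent(0.967, 0.887), 1))  # 9.0
```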
Figure 9
(a) Comparison of ANN model predictions using six structural
descriptors
(LCD, PLD, ASA, AV, ρ, and PSD) and P with
combined NC obtained
by GCMC simulation (fivefold CV approach) at six different pressures;
(b) residual statistical histogram for all sets.
Six Descriptors with Qst
The ideal zeolite for C3H6 adsorption has a high uptake at the adsorption
pressure and a low uptake at the desorption pressure. So far, the prediction
of high-pressure gas adsorption has been very satisfactory; however, the
desired accuracy has not been achieved at low pressures. Consequently, we
continued to search for a descriptor that could explain the adsorption
mechanism at low pressures. Given the local nature of gas adsorption at low
pressures, the chemistry of the pores is likely to be a dominating factor in
determining the low-pressure adsorption of C3H6 and likely other gases, which
is also Burner’s opinion.[63] As an essential parameter for characterizing the
heterogeneity of the adsorption surface, Qst discloses essential information
about the chemical interactions between the adsorbate and adsorbent: an
excessively large Qst value for C3H6 in a zeolite indicates a very strong
interaction with the framework. Qst is therefore a complement to PSD.

Five general structural descriptors (LCD, PLD, ASA, AV, and ρ) together with
an energy descriptor (Qst) are used as inputs to train the ANN model. As shown
in Figure S10a, adding Qst improves the accuracy of the model (R2 = 0.929) by
only 2.1% at 300 K and 5,065 kPa, which means that the five structural
descriptors alone are sufficient to predict NC at high pressures if concise
input data are required. The proportion of void space filled by guest
molecules decreases as the pressure declines, and a gradual improvement in R2
(1.3% ∼ 5.1%) is observed compared with the case where only the five
structural descriptors are used (see Figure S10b–e). At 50.65 kPa, the model’s
R2 (0.809) is enhanced by 9.3% relative to the model without the energy
descriptor (Qst), as shown in Figure 10, which is quite remarkable and further
confirms the indispensability of Qst for low-pressure gas adsorption
prediction. The actual top 20% and the predicted top 20% of zeolites in
Figure 10 are both selected to further examine the deviation of the ANN
predictions. The predicted top 41.8% of zeolites must be retained for
subsequent research in order to recover the real top 20% of zeolites, which is
satisfactory. Accordingly, the Qst value does accurately capture the local
porosity characteristics of gas adsorption at low pressures. For combined
pressures, although the prediction performance (R2 = 0.902) of the ANN model
containing Qst in Figure S11a is inferior to that of the ANN model with PSD
(R2 = 0.967), it is still noticeably better than that of the ANN model with
five general structural descriptors (R2 = 0.887). Furthermore, the residual
histogram shown in Figure S11b is approximately normally distributed. For
clarity, the R2, RMSE, and MAE values for the 232 zeolites obtained with the
three ANN models using the fivefold CV approach are summarized in Table 1,
which makes the contribution of the various descriptors to the predictive
performance of the ANN models clear. Additionally, the R2 values on the
training and test sets of the three ANN models using the random validation
approach at various pressures are summarized in Table S5 in the Supporting
Information.
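One way to read the "predicted top 41.8%" statistic is as the smallest fraction of zeolites, ranked by predicted NC, that contains every member of the simulated top 20%. The sketch below implements that reading as an assumption; the function name `fraction_needed` and the ten-structure toy data are hypothetical.

```python
def fraction_needed(simulated, predicted, top_fraction=0.2):
    """Smallest fraction of structures, ranked by predicted uptake, that
    contains every structure in the simulated top `top_fraction`."""
    n = len(simulated)
    n_top = max(1, round(top_fraction * n))
    by_sim = sorted(range(n), key=lambda i: simulated[i], reverse=True)
    targets = set(by_sim[:n_top])
    by_pred = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    found = 0
    for rank, i in enumerate(by_pred, start=1):
        if i in targets:
            found += 1
            if found == n_top:
                return rank / n
    return 1.0


# Toy data: ten structures, the simulated top 20% are indices 0 and 1
simulated = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
good_model = [9, 10, 5, 8, 7, 6, 4, 3, 2, 1]     # both found within the top 2
weaker_model = [10, 6.5, 8, 7, 6, 5, 4, 3, 2, 1]  # index 1 only appears at rank 4
print(fraction_needed(simulated, good_model))    # 0.2
print(fraction_needed(simulated, weaker_model))  # 0.4
```

A value of 0.2 would mean a perfect ranking of the top performers; larger values indicate how far the prescreening net has to be cast.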
Figure 10
Comparison
of ANN model predictions using five structural descriptors
(LCD, PLD, ASA, AV, and ρ) and an energy descriptor (Qst) with NC obtained by GCMC simulation (fivefold CV approach)
at 300 K and 50.65 kPa.
Table 1
R2, RMSE, and MAE Values for the 232 Zeolites from Three ANN Models (Fivefold
CV Approach) at Various Pressures, with S Representing the General Structural
Descriptor
                  ANN (5S)               ANN (5S + PSD)         ANN (5S + Qst)
pressure (kPa)    R2     RMSE   MAE      R2     RMSE   MAE      R2     RMSE   MAE
5,065             0.910  0.342  0.256    0.902  0.356  0.276    0.929  0.304  0.236
1,013             0.907  0.339  0.265    0.898  0.354  0.278    0.919  0.316  0.241
303.9             0.856  0.389  0.309    0.861  0.382  0.313    0.878  0.358  0.283
202.6             0.823  0.408  0.323    0.815  0.418  0.328    0.851  0.375  0.293
101.3             0.788  0.412  0.327    0.771  0.429  0.351    0.828  0.372  0.291
50.65             0.740  0.422  0.337    0.734  0.428  0.358    0.809  0.362  0.293
combined          0.887  0.644  0.265    0.967  0.348  0.132    0.902  0.601  0.246
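For readers who want to reproduce metrics of the kind listed in Table 1, the following is a minimal sketch of fivefold cross-validation with pooled out-of-fold predictions. It makes several assumptions that are not from this work: a one-variable least-squares line stands in for the ANN, the data are synthetic, and R2 is computed as the coefficient of determination, which may differ from the definition used by the authors.

```python
import random
from math import sqrt


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares line; a simple stand-in for the ANN regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cv_predict(xs, ys, k=5, seed=0):
    """Predict every sample with a model that never saw it during fitting."""
    preds = [None] * len(xs)
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in fold:
            preds[i] = slope * xs[i] + intercept
    return preds


def metrics(y_true, y_pred):
    """R2 (coefficient of determination), RMSE, and MAE."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"R2": 1 - ss_res / ss_tot, "RMSE": sqrt(ss_res / n), "MAE": mae}


# Synthetic data: one descriptor x and a noisy linear response y
rng = random.Random(1)
x = [rng.uniform(0.0, 1.0) for _ in range(50)]
y = [3.0 * xi + 0.5 + rng.gauss(0.0, 0.1) for xi in x]
scores = metrics(y, cv_predict(x, y))
print(scores)
```

Pooling the out-of-fold predictions before scoring gives one R2, RMSE, and MAE per model and pressure, which is the shape of the entries in Table 1.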
A comparison of the labels predicted by the ANN model with the labels from the
GCMC simulation at different pressures is depicted using a receiver operating
characteristic (ROC) plot, which illustrates the ability of the model to
correctly label zeolites. The ROC curve depicts the relative trade-off between
true positives (benefits) and false positives (costs). A perfect prediction
method would yield a point in the upper left corner of the ROC plot, with 100%
sensitivity (no false negatives), 100% specificity (no false positives), and
an area under the curve (AUC) of 1. To examine closely the performance of the
ANN model with the five structural descriptors and Qst on NC at low pressures,
we first selected a cut-off criterion of 20% based on the simulated NC and
then labeled zeolites as “positive” or “negative” depending on whether the
predicted NC was in the top 20% or in the remaining 80% of the 232 zeolites,
using the random validation and fivefold CV approaches at 300 K and 50.65 kPa,
as shown in Figure 11. The ANN model displays outstanding performance, with an
AUC of 0.955 for random validation and 0.951 for fivefold CV at 300 K and
50.65 kPa. To account for the possible impact of the labeling criterion, AUC
values for random validation and fivefold CV at 50.65 kPa were also calculated
for several classification criteria. As illustrated in Figure 12, the AUC
values exceed 0.9 for both validation methods regardless of the criterion.
This excellent performance confirms the ability of the ANN model to classify
zeolites into high or low C3H6 adsorbers at 50.65 kPa based on five structural
descriptors (LCD, PLD, ASA, AV, and ρ) and an energy descriptor (Qst).
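The AUC can be computed without drawing the curve, since it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below assumes that the true labels come from the simulated top 20% and that the predicted NC serves as the score; the function names and toy values are hypothetical.

```python
def top_fraction_labels(values, fraction=0.2):
    """Label the top `fraction` of values as positive (True)."""
    n_top = max(1, round(fraction * len(values)))
    cutoff = sorted(values, reverse=True)[n_top - 1]
    return [v >= cutoff for v in values]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: ten structures, true labels from the simulated top 20%
simulated = [5.0, 4.0, 3.0, 2.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
predicted = [4.5, 2.5, 3.0, 2.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
labels = top_fraction_labels(simulated, 0.2)
auc = roc_auc(labels, predicted)
print(auc)  # 0.9375
```

Changing the `fraction` argument corresponds to varying the classification criterion, as done for the AUC values plotted against the proportion of positive samples.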
Figure 11
Receiver operating characteristic (ROC) plots that show how well the ANN
model labels zeolites as “positive” or “negative”
depending on whether the NC at 300 K and 50.65 kPa is in the top 20% or in the
remaining 80% for (a) random validation and (b) fivefold CV.
Figure 12
Relationships between AUC values and proportion of “positive”
samples for random validation and fivefold CV at 300 K and 50.65 kPa.
Subsequently, the influence of the six descriptors (LCD, PLD, ASA, AV, ρ, and
Qst) on NC in the ANN model at various pressures was further explored. A
common feature importance measure, namely, permutation feature importance
(PFI),[64] was calculated. To obtain the PFI of a descriptor such as Qst, we
randomly permute the Qst values and use them to predict NC while keeping all
other input features unpermuted, which reduces the prediction accuracy. The
PFI of Qst is the ratio of the error obtained with Qst randomly permuted to
the original error.[65] These steps are repeated 5 times, and the average is
taken as the final PFI value of Qst to reduce the influence of the randomness
of the permutation. All input features have been considered for PFI, with MAE
as the error measure. The PFI values of each descriptor at different pressures
are depicted in Figure 13. At 300 K and 5,065 kPa, the PFI values of the ANN
model follow the order AV > ASA > Qst > LCD > ρ > PLD. Undoubtedly, AV is the
most crucial porosity characteristic determining NC at high pressures, which
was also revealed by the previous multilinear regression and quadratic
regression models and again confirms the molecular filling mechanism for
high-pressure gas adsorption. As the pressure declines, the importance of AV
decreases, while the significance of Qst and LCD becomes increasingly
prominent. At 300 K and 50.65 kPa, the order of PFI values is Qst > LCD > AV >
PLD > ρ > ASA. When the pressure is not high enough to squeeze the guest
molecules into the channels, LCD becomes the vital element preventing the
guest molecules from entering the window. In this situation, guest molecules
are no longer adsorbed into the void space in a stacked state but are
selectively adsorbed in the regions with strong interaction, and Qst reflects
exactly this local porosity characteristic. In short, AV is the most critical
feature at high pressures, while Qst and LCD become the two most meaningful
features at low pressures, which coincides with the comments in the
literature.[19,38]
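The PFI procedure described above can be sketched in a few lines. The example below follows the definition PFI = MAEperm/MAEorig with five repeats, but everything else is an assumption made for illustration: the fixed linear function `toy_model` stands in for a trained ANN, and the three-column data set is synthetic.

```python
import random


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, column, repeats=5, seed=0):
    """PFI = MAE_perm / MAE_orig, averaged over `repeats` random permutations
    of one input column while all other columns stay untouched."""
    rng = random.Random(seed)
    base = mae(y, [predict(r) for r in rows])
    ratios = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        ratios.append(mae(y, [predict(r) for r in permuted]) / base)
    return sum(ratios) / repeats


def toy_model(row):
    """Stand-in for a trained model: strong use of column 0, weak use of
    column 1, no use of column 2."""
    return 2.0 * row[0] + 0.1 * row[1]


# Synthetic data with a small amount of noise so that MAE_orig is nonzero
rng = random.Random(42)
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(100)]
y = [toy_model(r) + rng.gauss(0.0, 0.05) for r in rows]

pfi = [permutation_importance(toy_model, rows, y, c) for c in range(3)]
print(pfi)
```

A descriptor the model ignores gives a PFI of exactly 1, and the more the model relies on a descriptor, the further its PFI rises above 1.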
Figure 13
PFI maps of different descriptors with NC at different pressures based on ANN prediction,
where PFI = MAEperm/MAEorig.
The prediction performance of the ANN model is indeed enhanced for
low-pressure gas adsorption after the addition of Qst, yet it is worth
examining carefully which factor is responsible for this improvement: the
change in the degrees of freedom of the input data or Qst itself. To resolve
this question, ASA, which has the minimum PFI value, was removed when Qst was
introduced so that the number of inputs remained unchanged for the ANN model
prediction at 300 K and 50.65 kPa. As illustrated in Figure 14, in the absence
of ASA, the prediction performance of the ANN model is still satisfactory
(R2 = 0.795): it decreases slightly, by 1.7%, compared with the ANN model with
six descriptors, while it is enhanced by 7.4% compared with the model without
the energy descriptor (Qst). Furthermore, we only need to identify the
predicted top 49.6% of zeolites for further research to get the real top 20%
of zeolites, demonstrating this ANN model’s slightly stronger impact on
zeolites with low NC. The above analysis shows that the enhancement of
prediction performance for low-pressure gas adsorption is due to Qst rather
than the increase in the degrees of freedom of the input data, and that ASA is
not completely ineffective. In light of the simplicity and non-negligible
contribution of ASA for low-pressure gas adsorption, the ANN model with five
structural descriptors (LCD, PLD, ASA, AV, and ρ) and an energy descriptor
(Qst) is still considered suitable for predicting NC at low pressures.
Figure 14
Comparison
of ANN model predictions using four structural descriptors
(LCD, PLD, AV, and ρ) and an energy descriptor (Qst) with NC obtained by GCMC simulation (fivefold CV approach) at 300
K and 50.65 kPa.
Conclusions
A comprehensive QSPR analysis of NC in 232 ordered pure-silica zeolites
(Si/O = 1:2) at different pressures was carried out with the aid of a variety
of regression models. Univariate analysis and multilinear regression analysis
consistently showed that AV and LCD were the most significant factors
determining NC at 5,065 and 50.65 kPa, respectively. Owing to the inclusion of
binary interaction terms, the prediction performance of the quadratic
regression model was significantly superior to that of multilinear regression,
with enhancements of 10.1% at 5,065 kPa and 31.2% at 50.65 kPa.

For high-pressure gas adsorption with a molecular stacking mechanism, the NC
predicted by the ANN model with only five structural descriptors (LCD, PLD,
ASA, AV, and ρ) was in good agreement with the GCMC-simulated NC. A good
prediction (R2 = 0.967) was obtained with the introduction of PSD for combined
pressures. At low pressures, the prediction performance of the ANN model
increased by 9.3% with the addition of Qst, which reflects chemical
interactions between the adsorbate and adsorbent. ROC analysis confirms the
ability of the ANN model with Qst to classify zeolites into high or low C3H6
adsorbers at 50.65 kPa. Based on the relative weight analysis, AV is the most
critical feature at high pressures, while LCD and Qst become the two most
meaningful features at low pressures. Moreover, the enhancement of model
performance for low-pressure gas adsorption is due to the addition of Qst
rather than the increase in the degrees of freedom of the input data. The
insights from the QSPR analysis in this work will provide new ideas for
understanding structure–performance relationships at the atomic level and for
designing high-performance zeolites for C3H6 adsorption.