Hilal Daglar1, Seda Keskin1. 1. Department of Chemical and Biological Engineering, Koc University, Rumelifeneri Yolu, Sariyer, 34450 Istanbul, Turkey.
Abstract
Due to the enormous increase in the number of metal-organic frameworks (MOFs), combining molecular simulations with machine learning (ML) would be a very useful approach for the accurate and rapid assessment of the separation performances of thousands of materials. In this work, we combined these two powerful approaches, molecular simulations and ML, to evaluate MOF membranes and MOF/polymer mixed matrix membranes (MMMs) for six different gas separations: He/H2, He/N2, He/CH4, H2/N2, H2/CH4, and N2/CH4. Single-component gas uptakes and diffusivities were computed by grand canonical Monte Carlo (GCMC) and molecular dynamics (MD) simulations, respectively, and these simulation results were used to assess gas permeabilities and selectivities of MOF membranes. Physical, chemical, and energetic features of MOFs were used as descriptors, and eight different ML models were developed to predict gas adsorption and diffusion properties of MOFs. Gas permeabilities and membrane selectivities of 5249 MOFs and 31,494 MOF/polymer MMMs were predicted using these ML models. To examine the transferability of the ML models, we also focused on computer-generated, hypothetical MOFs (hMOFs) and predicted the gas permeability and selectivity of 1000 hMOF/polymer MMMs. The ML models that we developed accurately predict the uptake and diffusion properties of He, H2, N2, and CH4 gases in MOFs and will significantly accelerate the assessment of separation performances of MOF membranes and MOF/polymer MMMs. These models will also be useful to direct the extensive experimental efforts and computationally demanding molecular simulations to the fabrication and analysis of membrane materials offering high performance for a target gas separation.
Keywords:
gas separation; machine learning; mixed matrix membrane; permeability; selectivity
Metal-organic frameworks
(MOFs) have become a well-known class
of materials to solve energy-related gas separation challenges due
to their high porosities, large surface areas, and easy-to-modify
structural properties.[1,2] Due to the virtually unlimited
combinations of metal parts and organic ligands, an enormous number
of MOFs (>105,000) have been synthesized to date.[3] MOFs have been widely investigated for gas storage and
separation applications such as H2 storage, CH4 storage, CO2 capture, H2 purification, and
separation of CO2 from natural gas and flue gas.[4−7] Due to the environmental and economic advantages of membrane-based
gas separations,[8] MOFs have been studied
as membrane materials.[9] Experimental fabrication
and testing of each MOF membrane for a target gas separation are not
practical in terms of time and cost; thus, computational screening
plays an important role in assessing the gas separation performances
of a large number of MOFs to identify the top promising membranes.[10−12] Several computational screening studies, which use molecular simulations
to assess MOF membranes for various gas separations, CO2/N2, CO2/CH4, H2/CO2, O2/N2, and Xe/Kr, have been reported.[13−17] However, performing computationally demanding grand canonical Monte
Carlo (GCMC) and molecular dynamics (MD) simulations for several thousands
of MOFs, analyzing and interpreting the very large amount of simulated
data while keeping up with the fast progress of discovery of new MOFs
are the current challenges in this field.

Machine learning (ML)
is an excellent approach to analyzing a large
amount of simulated material data since establishing structure–performance
relations for MOFs can lead to the design and development of new MOF
materials with better performances.[18] In
the last several years, ML algorithms have been used to study MOFs
for various adsorption-based gas separations such as CO2 capture,[19−21] H2O/(O2 + N2),[22] H2S/CH4,[23] propane/propylene,[24] and Xe/Kr[25] separations. On the other hand, ML has been
used to study MOF membranes in a very limited number of studies due
to the difficulty of generating gas permeability data using computationally
demanding MD simulations. Zhou et al.[26] used different ML algorithms to predict the D2/H2 selectivity of MOF membranes at infinite dilution, 77 K,
and found that the D2/H2 membrane selectivity
of the best MOFs is one order of magnitude higher than those previously
reported in the literature. Qiao et al.[27] used the ML approach to compute the relative importance of MOF features
on the predicted membrane selectivities and showed that porosity and
the largest cavity diameter (LCD) have high importance. Zhong et al.[28] developed an ML model to predict i-C4H8 permeability and i-C4H8/C4H6 selectivity of 601
covalent organic framework (COF) membranes at 1 bar, 298 K, and showed
that porosity and pore limiting diameter (PLD) are key factors controlling
the selectivity and permeability of COF membranes. Bai et al.[29] recently developed eight different ML algorithms
to predict H2 permeability, H2/CH4 membrane selectivity, and trade-off multiple selectivity and permeability
(TMSP) of MOFs and showed that two ML models are the most suitable
ones for predicting the H2 separation performances of MOFs.
In our recent study, ML models were trained to predict O2/N2 adsorption, diffusion, and membrane selectivities
of 5632 MOFs and 137,953 hypothetical MOFs (hMOFs) at 1 bar, 298 K,
to identify the hMOFs with high O2/N2 selectivity.[30]

Compared to MOF membranes, a much larger
variety of MOF/polymer
mixed matrix membranes (MMMs) have been fabricated and the incorporation
of MOFs as fillers into polymers to generate MMMs has been shown to
improve the gas permeability and/or selectivity of the pure polymer
in several experimental and computational studies.[31,32] The gas adsorption and diffusion data of MOFs obtained from GCMC
and MD simulations have been used to estimate the gas permeability
of the MOF/polymer MMMs in computational studies,[14,33] and this approach has been shown to provide accurate predictions
for CO2/N2,[32] CO2/CH4,[34] and H2/N2[35] separation performances
of MOF/polymer MMMs. Although a large number of MOF/polymer MMM studies
exist in the literature, no ML study has been reported to predict
the gas permeabilities of these MMMs to date.

In this study,
we combined the ML and large-scale molecular simulation
approaches to assess the potential of both MOF membranes and MOF/polymer
MMMs for six different gas separations, He/H2, He/CH4, He/N2, H2/CH4, H2/N2, and N2/CH4. We first performed
GCMC and MD simulations to obtain the adsorption and diffusion properties
of He, H2, N2, and CH4 gases for
a total of 5249 MOFs at 1 bar, 298 K. We then developed ML models
that can accurately predict the uptake and diffusivities of the gases
in MOFs. By using the ML-predicted gas uptake and diffusivity, we
calculated the gas permeabilities and selectivities of a total of
5249 MOF membranes and 31,494 different MOF/polymer MMMs composed
of six polymers for six different gas separations. We finally investigated
the transferability of our ML models to an unseen, computer-generated
hMOF data set for predicting the gas permeability and selectivity
of 1000 hMOF/polymer MMMs composed of 500 hMOFs and 2 polymers. The
ML models that we developed in this work will be very useful to accurately
and rapidly predict gas permeabilities and selectivities of MOF membranes
and MOF/polymer MMMs without performing computationally demanding
molecular simulations. These predictions will be useful to accelerate
both the identification and fabrication of the best-performing MOF
membranes and MOF/polymer MMMs for various types of gas separations.
The ML models that we developed also revealed the most important MOF
features for high gas permeabilities and selectivities so that they
will shed light on the design of new high-performing membrane materials
that have not been fabricated yet.
Methods
Our computational methodology
combining molecular simulations and
ML to examine gas separation performances of MOF-based MMMs is illustrated
in Figure 1. We first
filtered the MOF database by setting two criteria related to pore
size and surface area of MOFs to enable the adsorption of gases in
the MOFs’ pores (step 1). Gas adsorption and diffusion in MOFs
were then investigated by performing molecular simulations, GCMC and
MD, respectively (step 2a), which were used as target data in our
ML models. The physical, chemical, and energetic features of MOFs
such as pore size, pore geometry, atom types, metallic percentage,
and heat of adsorption of gases in MOFs were analyzed (step 2b), and
these features were used as input variables for training ML models
to predict the target data, namely gas uptake and diffusivity in MOFs. Using
input variables and target data, we trained and developed ML models.
ML-predicted gas adsorption and diffusion data were compared with
the simulated data of MOFs to determine the accuracy of these ML models
(step 3).
Figure 1
Computational workflow of this study: (1) selection of the MOFs
based on the pore sizes and accessible surface areas; (2a) performing
molecular simulations to obtain the adsorption and diffusion data
of He, H2, CH4, and N2 in MOFs; (2b)
analyzing features and determining the physical and chemical descriptors
of MOFs; (3) comparing the ML-predicted uptake, diffusion, and permeability
of gases with the simulated results of MOF
membranes and MOF/polymer MMMs; (4) predicting the uptake, diffusion,
and permeability of gases for the unseen hMOF data set using the ML
models generated; and (5) evaluating the accuracy of ML models for
the unseen hMOFs by comparing ML-predicted data with the simulated
data of unseen hMOF.
We then obtained gas permeability and selectivity
of MOF membranes
and MOF-based MMMs using the gas adsorption and diffusion data computed
from molecular simulations and predicted from ML models (step 3).
The ML models were finally used to predict the gas adsorption and
diffusion properties of unseen hypothetical MOFs (hMOF) (step 4) by
repeating the same steps (steps 1–3) for them. Molecular simulation
results were compared with the ML predictions for hMOF membranes and
hMOF/polymer MMMs. More details about the data refinement, molecular
simulations, and generation of ML models are given below.
Curation of the MOF Data Set
In this
study, we used the most recent database of experimentally synthesized
MOFs (CoRE MOF 2019), which consists of 12,020 materials.[36] As shown in Figure 1, we narrowed down the CoRE MOF data set
by focusing on the MOFs with PLD > 3.8 Å and accessible surface
area (SA) > 0 m²/g so that all gas molecules that we studied
(He, H2, N2, and CH4) can pass through
the MOFs’ pores. Since the output of GCMC simulations (loading
and positions of the gas molecules in MOFs) was used as the initial
states of MD simulations, we only studied the MOFs for which GCMC
simulations resulted in at least one molecule of adsorbed gas per
structure. After MD simulations, we only considered the MOFs exhibiting
gas self-diffusivities > 10⁻⁸ cm²/s, the
limit to accurately characterize molecular diffusion in MOFs using
MD.[37] In training ML models, we defined
the cutoff threshold values for uptakes and diffusivities of He, H2, N2, and CH4, as shown in Table S1, to refine the data and increase the
accuracy of ML models. Using these threshold values, a small number
of MOFs (0.2, 0.6, 0.8, and 1.7% of all MOFs for He, N2, CH4, and H2, respectively) was identified
as outliers and eliminated. For the ML models developed for He and
H2 diffusion, we calculated the difference between the
simulated and ML-predicted diffusivities for each MOF and computed the standard
deviation of the training data. If this difference was greater than twice
the standard deviation for any MOF, then
this MOF was not used in the training of the models. We finally note that
the MOF set used to train ML models for adsorption and diffusion was
identical for a given gas. Having gone through these steps, we ended
up with 677 MOFs for training ML models for He, 2715 MOFs for H2, 5215 MOFs for CH4, and 5224 MOFs for N2.
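The screening criteria described above can be illustrated with a short Python sketch. It is not the authors' code: the record fields (`pld`, `asa`, `n_adsorbed`, `d_self`) and the three example entries are hypothetical, and only the threshold values are taken from the text.

```python
# Hypothetical MOF records; field names and numbers are illustrative only.
mofs = [
    {"name": "MOF-A", "pld": 4.5, "asa": 1200.0, "n_adsorbed": 3.2, "d_self": 2.0e-4},
    {"name": "MOF-B", "pld": 3.1, "asa": 800.0, "n_adsorbed": 5.0, "d_self": 1.0e-5},
    {"name": "MOF-C", "pld": 6.0, "asa": 2500.0, "n_adsorbed": 1.4, "d_self": 5.0e-9},
]

# Thresholds quoted in the text.
PLD_MIN = 3.8    # pore limiting diameter, in angstrom
ASA_MIN = 0.0    # accessible surface area, in m2/g
N_MIN = 1.0      # adsorbed gas molecules per structure from GCMC
D_MIN = 1.0e-8   # self-diffusivity limit for MD, in cm2/s


def passes_screening(mof):
    """Return True if a record satisfies all four screening criteria."""
    return (
        mof["pld"] > PLD_MIN
        and mof["asa"] > ASA_MIN
        and mof["n_adsorbed"] >= N_MIN
        and mof["d_self"] > D_MIN
    )


selected = [m["name"] for m in mofs if passes_screening(m)]
print(selected)
```

In this toy set, MOF-B is rejected by the pore-size criterion and MOF-C by the diffusivity limit. The gas-specific outlier thresholds of Table S1 are not reproduced here.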
Molecular Simulations and Membrane Calculations
We computed gas uptakes (N) and self-diffusivities
(D) of He, H2, N2, and CH4 by performing GCMC and MD simulations, respectively,
at 1 bar, 298 K. All simulations were performed using RASPA software.[38] Dispersion interactions between MOF–gas
and gas–gas were described with Lennard-Jones 12-6 (LJ) potentials.
The universal force field (UFF)[39] parameters
were used for the framework atoms. While CH4,[40] H2,[41] and
He[40] were modeled as single, spherical,
and nonpolar atoms, N2 was modeled as a three-site molecule:
two N atoms and a dummy atom as the center of mass.[42] N2 has a quadrupole moment, so electrostatic
interactions between N2 and the MOFs were considered. The charge
equilibration method (Qeq)[43] as implemented
in RASPA was used to estimate the partial atomic charges of MOFs.
The Ewald summation was used to calculate the long-range electrostatic
interactions.[44] The potential parameters
of gases are listed in Table S2. In GCMC
simulations, we used 2 × 10⁴ cycles for initialization
and another 2 × 10⁴ cycles for taking the ensemble
averages. In MD simulations, the NVT ensemble was used at 298 K with a time step
of 1 fs and a total simulation time of 5 ns.
We ran MD simulations for 5 × 10⁶ cycles, using 10³ cycles for initialization and 10⁴ cycles for the
equilibration of each MOF. More details of simulations can be seen
in our previous works.[14,33] By using simulated gas adsorption
and diffusion data, gas permeabilities of MOFs were calculated using $P_i = c_i \times D_i / f_i$, where
$c_i$, $D_i$, and $f_i$ represent the adsorbed concentration, self-diffusivity,
and feed side pressure of gas i, respectively. The
feed (permeate) side of the membrane was assumed to be at 1 bar (under
vacuum).[45] Then, ideal membrane selectivities
were calculated as the ratio of single-component gas permeabilities, $S_{\mathrm{mem},i/j} = P_i / P_j$.

MOF-based MMMs were studied
for six different separations, He/H2, He/N2,
He/CH4, H2/N2, H2/CH4, and N2/CH4. For each separation, we
selected at least three polymers representing membranes with high,
medium, and low gas permeabilities, which defined Robeson’s
upper bound.[46] Experimentally reported
gas permeabilities of these polymers are listed in Table S3. To predict the gas permeabilities of the MOF-based
MMMs, we used the Maxwell model[47] since
it was previously shown that the simulated gas permeability of MOF-based
MMMs calculated by this model agrees well with the experimental data.[14,33] The Maxwell model uses simulated gas permeability data of MOFs and experimentally
measured gas permeability data of polymers to compute the gas permeability
of a MOF/polymer MMM as follows:

$$P_i^{\mathrm{MMM}} = P_i^{\mathrm{P}} \times \left[ \frac{P_i^{\mathrm{MOF}} + 2P_i^{\mathrm{P}} - 2\phi \left( P_i^{\mathrm{P}} - P_i^{\mathrm{MOF}} \right)}{P_i^{\mathrm{MOF}} + 2P_i^{\mathrm{P}} + \phi \left( P_i^{\mathrm{P}} - P_i^{\mathrm{MOF}} \right)} \right]$$

Here, $P_i^{\mathrm{MMM}}$, $P_i^{\mathrm{MOF}}$, and $P_i^{\mathrm{P}}$ represent the gas permeability of MMM, MOF, and polymer, respectively.
ϕ is the volume fraction of MOF fillers in the polymer and was
set to 0.2 throughout this study. We calculated the He permeabilities
of 2031 MMMs, H2 permeabilities of 10,860 MMMs, CH4 permeabilities of 26,075 MMMs, and N2 permeabilities
of 31,344 MOF-based MMMs. The ratio of gas permeabilities was used
to compute the selectivities of MMMs, $S_{\mathrm{MMM},i/j} = P_i^{\mathrm{MMM}} / P_j^{\mathrm{MMM}}$.
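The membrane calculations of this section reduce to a few arithmetic steps, sketched below in Python. The sketch assumes the commonly used form of the Maxwell model for spherical fillers and consistent units for all inputs. The function names and the numerical inputs are illustrative assumptions and are not taken from the study.

```python
def permeability(c, d, f):
    """Membrane permeability from adsorbed concentration c,
    self-diffusivity d, and feed-side pressure f: P = c * D / f."""
    return c * d / f


def selectivity(p_i, p_j):
    """Ideal selectivity as the ratio of single-component permeabilities."""
    return p_i / p_j


def maxwell_permeability(p_mof, p_polymer, phi=0.2):
    """Maxwell-model permeability of a MOF/polymer MMM.

    phi is the filler volume fraction (0.2 in the text). The functional
    form below is the standard Maxwell expression, assumed here.
    """
    numerator = p_mof + 2.0 * p_polymer - 2.0 * phi * (p_polymer - p_mof)
    denominator = p_mof + 2.0 * p_polymer + phi * (p_polymer - p_mof)
    return p_polymer * numerator / denominator


# Illustrative inputs in arbitrary but consistent units (not real data).
p_gas1_mmm = maxwell_permeability(p_mof=1.0e4, p_polymer=100.0)
p_gas2_mmm = maxwell_permeability(p_mof=2.0e3, p_polymer=5.0)
print(p_gas1_mmm, p_gas2_mmm, selectivity(p_gas1_mmm, p_gas2_mmm))
```

Two limiting cases are useful sanity checks: with ϕ = 0 the model returns the polymer permeability, and for a filler far more permeable than the polymer the enhancement saturates at (1 + 2ϕ)/(1 − ϕ), which is 1.75 for ϕ = 0.2.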
Feature Analysis of MOFs
The ML models
aim to establish the relations between MOF descriptors and the target
data, which are the gas uptake and diffusivity data of MOFs at 1 bar,
298 K. Ideally, descriptors should be easy to obtain/calculate and
have low dimensionality and correlation with the target data to some
extent. We extracted 20 different features as potential descriptors,
which are listed in Table 1. LCD, PLD,
and their ratio (LCD/PLD) were shown to affect the adsorption and
diffusion of gases in MOFs.[15,32,48,49] We also considered the features
of the pore geometry such as pore volume, porosity, density, and SA,
which are commonly used in ML studies.[50−53]
Table 1
Descriptors Used to Construct a Feature Vector for ML Models

| group (a) | feature (unit) | symbol |
|---|---|---|
| A | largest cavity diameter (Å) | LCD |
|   | pore limiting diameter (Å) | PLD |
|   | pore size ratio | LCD/PLD |
| B | density (g/cm³) | ρ |
|   | pore volume (cm³/g) | PV |
|   | porosity | φ |
|   | surface area (m²/g) | SA |
| C | carbon percentage | C% |
|   | hydrogen percentage | H% |
|   | nitrogen percentage | N% |
|   | oxygen percentage | O% |
|   | halogen (Br, Cl, F, I) percentage | halogen% |
|   | metalloid (As, B, Ge, Te, Sb, Si) percentage | metalloid% |
|   | ametal (Se, S, P) percentage | ametal% |
|   | metal percentage | metal% |
| D | total degree of unsaturation | TDU |
|   | degree of unsaturation | DU |
|   | metallic percentage (#of metal/#of C atoms) | MP |
|   | oxygen-to-metal ratio | O-to-M |
| E | heat of adsorption (kJ/mol) | Qst0 |

(a) The features are separated into
five groups. A–E represent features of the pore size, pore
geometry, atom types, and chemical and energy-based descriptors, respectively.

To further improve the predicting power of ML models,
we also used
the atom types in the frameworks, expressed as the number of atoms of a specified
element divided by the total number of atoms in a unit cell of the MOF,
multiplied by 100, such as C%, H%, and metal%. Degree of unsaturation
(DU), which indicates the total number of π bonds and rings,
total degree of unsaturation (TDU), metallic percentage (MP), and
oxygen-to-metal ratio (O-to-M) are essential chemical descriptors
describing the molecular structures.[54] While
the descriptors related to pore size and geometry such as PLD, LCD,
and porosity were calculated using Zeo++ software,[55] atom type and chemical descriptors were extracted from
the crystallographic information files (CIFs) of MOFs taken from the
CoRE MOF database. A nitrogen probe with a radius of 1.86 Å and
2 × 10³ trials were used for the surface area calculations.
Geometric pore volumes were computed using a probe radius of 0 Å
and 5 × 10⁴ trials. We finally used the isosteric
heat of adsorption values (Qst0) of gases computed at infinite
dilution, 298 K, using the Widom insertion method as the energy descriptor
in ML models developed for N2 adsorption and diffusion.[37] Details for computing Qst0 using molecular
simulations can be found in our previous work.[14]

The Pearson correlation coefficient (r) was used
to determine the feature correlations, which can be expressed as

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \, \sqrt{\sum_i (y_i - \bar{y})^2}}$$

where x and y are the features, and x̅ and y̅ are the means of x and y. If
two descriptors are strongly correlated, this can cause problems such
as multicollinearity and overtraining of ML models.[56] To avoid these, we computed the r values
between each pair of descriptors and removed one descriptor of any pair having a strong correlation
(r > 0.90).
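The correlation filter described above can be sketched in plain Python as follows. This is an illustrative assumption of how such a filter might be written, not the authors' implementation: the helper names and the toy feature columns are hypothetical, and the rule of keeping the first descriptor of a correlated pair is a choice made here for simplicity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_correlated(features, threshold=0.90):
    """Keep a descriptor only if its r with every already-kept descriptor
    does not exceed the threshold (r > 0.90 in the text)."""
    kept = []
    for name, values in features.items():
        if all(pearson_r(values, features[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy descriptor columns (hypothetical values, one entry per MOF).
features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],   # nearly proportional to "a"
    "c": [1.0, -1.0, 1.0, -1.0],
}
print(drop_correlated(features))
```

Here descriptor "b" is removed because it is almost perfectly correlated with "a", while "c" is retained.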
Machine Learning
We used the tree-based
pipeline optimization tool (TPOT)[57] in
auto-machine learning[58] to efficiently
select the best algorithm and optimize the model parameters. TPOT
is based on the evolutionary algorithm (EA) optimization and includes
three steps of ML: feature engineering, model generation, and model
evaluation. In TPOT, a randomized singular value decomposition
variant of principal component analysis (PCA), called randomized PCA,[59]
is used for feature extraction. A comparison of
the CH4 working capacities of 403,959 hypothetical COFs predicted
using the algorithms defined by TPOT and traditional ML models such
as decision tree (DT), random forest (RF), and support vector machine
(SVM) showed that the accuracy of ML predictions obtained from TPOT
is higher than that of traditional ML models.[56] For the model selection in TPOT, the regression algorithms
in the scikit-learn toolkit[59] were used.
A stratified sampling method was implemented to keep the feature distribution
in training and test data as consistent as possible. The data was
split into two sets, 80% as a training set and 20% as a test set.
We also used fivefold cross-validation to avoid overfitting. The TPOT
parameters are listed in a table provided on GitHub (https://github.com/hdaglar/MOF-basedMMMs_ML). We compared the range of descriptors in the training and test
sets for He, H2, N2, and CH4 in Figures S1 and S2 and showed that the feature
distribution in the training and test sets is similar for each gas
species. Results also highlighted that the MOFs in the training set
are representative of the entire MOF set, providing more accurate
predictions for the test set with similar characteristics.

To evaluate the model accuracy, we used the coefficient of determination
(R2), mean absolute error (MAE), and root-mean-square
error (RMSE) as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{M} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{M} (y_i - \bar{y})^2}, \quad \mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \left| y_i - \hat{y}_i \right|, \quad \mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} (y_i - \hat{y}_i)^2}$$

Here, M represents the
number of samples, $y_i$ and $\hat{y}_i$ represent the simulated (true) value and predicted value, respectively,
and $\bar{y}$ denotes the average of the simulated
values. As RMSE and MAE increase, the accuracy of models
decreases. We also used the Spearman rank correlation coefficient
(SRCC) to calculate the ranking correlation between simulated and
ML-predicted data using

$$\mathrm{SRCC} = 1 - \frac{6 \sum D^2}{n (n^2 - 1)}$$

where D is the difference
between paired ranks and n is the number of observations.
SRCC is an important tool to understand how well the two rankings
agree. As the value of SRCC increases, the similarity between the
two rankings and the accuracy of models increase. Based on RMSE, MAE,
and R2, the results of the ML algorithms
with their optimized parameters are presented in Table S4. The best ML algorithms for predicting the adsorption
and diffusion properties of He, H2, CH4, and
N2 in MOFs were found to be LassoLarsCV, Extra Trees Regressor,
Gradient Boosting Regressor, and Random Forest Regressor. The last
three are tree-based ensemble methods, while LassoLarsCV is a regularized
linear regression model implemented using the least angle regression
(Lars) algorithm and cross-validation (CV). We note that these models
(Lasso,[7] Random Forest,[24,30] Gradient Boosting[20]) have been commonly
used to train ML models for MOFs.

After developing the ML models
for predicting the gas separation
performances of the MOF membranes and MOF/polymer MMMs, we focused
on the hypothetical MOF (hMOF) database,[60] which includes 137,593 computer-generated materials to test the
transferability of our ML models to a different material database.
We eliminated the hMOFs with nonaccessible SA and PLD < 3.8 Å
and ended up with 102,926 hMOFs. Performing molecular simulations
for that many materials is computationally very expensive. Therefore,
we ranked 102,926 hMOFs based on their LCDs and created a representative
subset composed of 500 materials, which includes the 1st hMOF and every
205th hMOF thereafter. Figure S3 shows
that the ranges of all features of our representative hMOF set (500
hMOFs) are similar to those of the complete hMOF set (102,926 hMOFs).
Then, we predicted He, H2, N2, and CH4 uptakes and diffusivities in 500 hMOFs using the ML models that
we developed for MOFs. GCMC and MD simulations were then performed
to compute He, H2, N2, and CH4 adsorption
and diffusion in 500 hMOFs following the simulation methods described
in the Molecular Simulations and Membrane Calculations section. ML-predicted
(simulated) gas permeabilities of hMOFs were obtained using the ML-predicted
(simulated) gas uptakes and diffusivities. Finally, we compared the
simulated and ML-predicted gas permeabilities and selectivities of
1000 hMOF/polymer MMMs composed of 2 polymers and 500 hMOFs.
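The selection of a representative hMOF subset by LCD ranking can be sketched as a systematic sample. The snippet below is a hedged illustration on synthetic data: the record layout and LCD values are hypothetical. With 102,926 entries, a stride of 205 yields 503 picks, so the sketch truncates to 500. The text does not state how exactly 500 materials were obtained, so the truncation is an assumption.

```python
N_HMOFS = 102_926   # hMOFs remaining after the PLD and SA filters (from the text)
STEP = 205          # take the 1st hMOF and every 205th thereafter
SUBSET_SIZE = 500   # size of the representative subset

# Synthetic, deterministic LCD values in angstrom (illustrative only).
hmofs = [
    {"id": i, "lcd": 4.0 + ((i * 7919) % N_HMOFS) * 1.0e-4}
    for i in range(N_HMOFS)
]


def systematic_subset(items, key, step, size):
    """Rank items by key, pick every step-th item, and keep the first size picks."""
    ranked = sorted(items, key=key)
    return ranked[::step][:size]


subset = systematic_subset(hmofs, key=lambda h: h["lcd"], step=STEP, size=SUBSET_SIZE)
print(len(subset), subset[0]["lcd"], subset[-1]["lcd"])
```

Because the picks are evenly spaced along the ranked list, the subset spans the LCD range of the full set, which is the property that Figure S3 is said to confirm for all features.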
Results and Discussion
Feature Correlation and Univariate Analysis
After the descriptors were determined, relations between these
descriptors and the simulated gas adsorption and diffusion data of
MOFs were examined. We focused on two features in each group of the
descriptors: LCD and PLD for the pore size, pore volume and density
for the pore geometry, C% and metal% for the atom types, and O-to-M
and TDU for the chemical descriptors. Figure 2 illustrates the correlations between these
features and uptakes for He and CH4. Figure 2a shows that the He uptake in MOFs generally
increases as the LCDs and PLDs increase. Not surprisingly, the MOF density
and He uptake have an inverse relationship, implying that high pore
volume generally leads to high He uptake, as shown in Figure 2b. Figure 2c,d shows that He adsorption is typically
favored in the MOFs having high C% and low metal%. Figure 2e shows that MOFs
with narrow pores are favorable for high CH4 uptake.
For many MOFs, CH4 uptake increases as the framework density
increases up to 1.5 g/cm³ and generally decreases in denser
MOFs (>1.5 g/cm³), as shown in Figure 2f. While CH4 uptake generally
increases as the C% increases, there is an inverse relation between
the metal% and CH4 uptake, as shown in Figure 2g. There is almost no observable
correlation between the CH4 uptake and chemical descriptors
in Figure 2h. We observed
similar results for H2 and N2 uptakes, as shown
in Figure S4. Overall, some features correlate
with the gas uptake of MOFs, but many exceptions exist, complicating
the explanation of the structure–performance relations.
Figure 2
Effect of features
on gas adsorption: simulated He uptakes in 677
MOFs as a function of (a) pore size (LCD, PLD), (b) pore geometry
(density, pore volume), (c) atom types (C%, metal%), and (d) chemical
descriptors (O-to-M, TDU). Simulated CH4 uptakes in 5215
MOFs as a function of (e) pore size (LCD, PLD), (f) pore geometry
(density, pore volume), (g) atom types (C%, metal%), and (h) chemical
descriptors (O-to-M, TDU).
Figure S5 represents
the relations between
He and CH4 diffusion in MOFs and material features. He
self-diffusivity in MOFs increases as PLD and LCD increase in Figure S5a. While there is a linear correlation
between the pore volume and He diffusion, an inverse relation between
density and diffusivity is observed in Figure S5b. Atom types and chemical descriptors weakly correlate with
He diffusivity in Figure S5c,d. High CH4 diffusion is generally observed in MOFs having large PLD,
large LCD, low density, high pore volume, low-medium C%, and high
metal%, as shown in Figure S5e–g. There is almost no correlation between the chemical descriptors
and CH4 diffusivity, as shown in Figure S5h. Similar results were obtained for self-diffusivities of
H2 and N2, as shown in Figure S6. We inferred that, compared to the gas uptake, there is
a weaker relation between MOF features and gas diffusivities since
the movement of the gas molecules through the MOFs’ pores is
generally more complicated than the adsorption of gas molecules in
the pores of MOFs.
Predictions of ML Models for MOF Membranes
Considering the results of the previous section, we employed the
pore size, pore geometry, chemical descriptors, and atom types (shown
in Table 1) to train
eight ML models to describe the uptakes and diffusivities of He, H2, N2, and CH4 in MOFs. Figure S7 shows the heatmap with the Pearson correlations
across different features of MOFs. Although there are strong correlations
between some features, such as pore volume and porosity (r: 0.82) and LCD and PLD (r: 0.77), no pair of features
is overly correlated (r > 0.9), suggesting that all
features can be used as input variables while training the ML models.[56] Therefore, we considered all of the features
given in Table 1 to
investigate how the descriptor selection affects the accuracy of ML
models.

Table 2 lists R2, MAE, RMSE, and SRCC of the
training and test sets based on the feature groups. While our simplest
ML model was established using only pore size (group A), other features
were added to build extended, more predictive/accurate models such
as A+B, A+B+C, A+B+C+D, and A+B+C+D+E. For example, when pore size
and pore geometry (A+B) were used to predict the CH4 uptake
in MOFs, R2 of the test set was computed
as 0.6. When atom types and chemical descriptors were added to the
feature list, R2 of the test set increased
to 0.73. This shows the supportive effect of the atom types and the
chemical descriptors in multivariate analysis, while they have almost
no correlation with the gas uptake and/or diffusivity in univariate
analysis, as previously shown in Figure 2. Table 2 also shows that pore size and pore geometry are the
dominant features determining the accuracy of ML models for the gas
uptake and diffusivity predictions. Incorporating the atom types and
chemical descriptors into the ML models improved the accuracy of predictions
only marginally. There can be slightly different trends (increase
or decrease) in the calculated SRCC and R2 values of the training and test sets given in Table 2, which can be considered acceptable. The
most pronounced change was observed for the H2 uptake model,
where SRCC and R2 values decreased
from 0.999 to 0.986 and from 0.999 to 0.962 in the training set, while
these values increased from 0.58 to 0.86 and from 0.38 to 0.80
in the test set, respectively. This might be due to overfitting
in the ML model using only the A group of descriptors for H2 uptake. As shown in Table 2, when we used the A+B+C+D group of descriptors, R2 and SRCC of ML models for N2 uptake and diffusivity
are not as high as those obtained for other gases. Therefore, we also
included Qst0 in ML models for N2 uptake and
diffusivity to improve the accuracy. We note that since experimental
measurements and molecular simulations to determine Qst0 require
more time and more inputs compared to other structural properties
that we used, we did not use Qst0 in ML models for He, H2, and CH4 uptakes and diffusivities. Based on the analysis
presented in Table , we used A+B+C+D (A+B+C+D+E) descriptor groups to train the ML models
for predicting the uptake and diffusivity of He, H2, and
CH4 (N2) in MOFs.
Table 2
Selection of Descriptor Groups for ML Models

| Property | Descriptor groups | Training RMSE | Training MAE | Training SRCC | Training R2 | Test RMSE | Test MAE | Test SRCC | Test R2 |
|---|---|---|---|---|---|---|---|---|---|
| He Uptake | A | 1.12 × 10^-2 | 7.35 × 10^-3 | 0.816 | 0.688 | 1.12 × 10^-2 | 8.32 × 10^-3 | 0.711 | 0.56 |
| He Uptake | A+B | 2.06 × 10^-3 | 1.68 × 10^-3 | 0.985 | 0.989 | 2.20 × 10^-3 | 1.78 × 10^-3 | 0.979 | 0.98 |
| He Uptake | A+B+C | 1.55 × 10^-3 | 1.20 × 10^-3 | 0.992 | 0.994 | 1.75 × 10^-3 | 1.34 × 10^-3 | 0.984 | 0.99 |
| He Uptake | A+B+C+D | 1.54 × 10^-3 | 1.18 × 10^-3 | 0.992 | 0.994 | 1.70 × 10^-3 | 1.30 × 10^-3 | 0.984 | 0.99 |
| He Diffusion | A | 4.00 × 10^-4 | 3.11 × 10^-4 | 0.859 | 0.751 | 6.06 × 10^-4 | 4.79 × 10^-4 | 0.592 | 0.41 |
| He Diffusion | A+B | 3.53 × 10^-4 | 2.85 × 10^-4 | 0.87 | 0.805 | 4.81 × 10^-4 | 4.05 × 10^-4 | 0.719 | 0.63 |
| He Diffusion | A+B+C | 3.20 × 10^-4 | 2.56 × 10^-4 | 0.902 | 0.84 | 4.63 × 10^-4 | 3.89 × 10^-4 | 0.758 | 0.64 |
| He Diffusion | A+B+C+D | 3.29 × 10^-4 | 2.63 × 10^-4 | 0.894 | 0.831 | 4.76 × 10^-4 | 3.90 × 10^-4 | 0.747 | 0.65 |
| H2 Uptake | A | 1.0 × 10^-6 | 1.0 × 10^-6 | 0.999 | 0.999 | 1.65 × 10^-2 | 1.20 × 10^-2 | 0.576 | 0.38 |
| H2 Uptake | A+B | 4.57 × 10^-3 | 2.63 × 10^-3 | 0.976 | 0.954 | 1.03 × 10^-2 | 6.56 × 10^-3 | 0.818 | 0.75 |
| H2 Uptake | A+B+C | 2.67 × 10^-3 | 1.26 × 10^-3 | 0.993 | 0.985 | 9.71 × 10^-3 | 5.82 × 10^-3 | 0.846 | 0.78 |
| H2 Uptake | A+B+C+D | 4.21 × 10^-3 | 2.03 × 10^-3 | 0.986 | 0.962 | 9.23 × 10^-3 | 5.45 × 10^-3 | 0.862 | 0.80 |
| H2 Diffusion | A | 5.62 × 10^-4 | 4.43 × 10^-4 | 0.602 | 0.499 | 6.35 × 10^-4 | 4.94 × 10^-4 | 0.542 | 0.35 |
| H2 Diffusion | A+B | 2.26 × 10^-4 | 1.71 × 10^-4 | 0.952 | 0.919 | 4.73 × 10^-4 | 3.67 × 10^-4 | 0.734 | 0.64 |
| H2 Diffusion | A+B+C | 2.50 × 10^-4 | 1.99 × 10^-4 | 0.936 | 0.901 | 4.41 × 10^-4 | 3.47 × 10^-4 | 0.773 | 0.69 |
| H2 Diffusion | A+B+C+D | 1.79 × 10^-4 | 1.35 × 10^-4 | 0.973 | 0.951 | 4.40 × 10^-4 | 3.43 × 10^-4 | 0.768 | 0.70 |
| CH4 Uptake | A | 5.25 × 10^-1 | 4.04 × 10^-1 | 0.587 | 0.322 | 6.03 × 10^-1 | 4.68 × 10^-1 | 0.39 | 0.14 |
| CH4 Uptake | A+B | 2.34 × 10^-1 | 1.60 × 10^-1 | 0.94 | 0.865 | 4.12 × 10^-1 | 2.86 × 10^-1 | 0.792 | 0.60 |
| CH4 Uptake | A+B+C | 4.67 × 10^-2 | 2.75 × 10^-2 | 0.998 | 0.995 | 3.38 × 10^-1 | 2.11 × 10^-1 | 0.872 | 0.72 |
| CH4 Uptake | A+B+C+D | 8.57 × 10^-2 | 4.79 × 10^-2 | 0.995 | 0.981 | 3.39 × 10^-1 | 2.12 × 10^-1 | 0.874 | 0.73 |
| CH4 Diffusion | A | 9.17 × 10^-5 | 6.19 × 10^-5 | 0.793 | 0.59 | 1.17 × 10^-4 | 7.97 × 10^-5 | 0.62 | 0.31 |
| CH4 Diffusion | A+B | 3.56 × 10^-5 | 2.11 × 10^-5 | 0.974 | 0.938 | 7.46 × 10^-5 | 4.66 × 10^-5 | 0.861 | 0.72 |
| CH4 Diffusion | A+B+C | 1.22 × 10^-5 | 6.76 × 10^-6 | 0.997 | 0.993 | 6.72 × 10^-5 | 4.10 × 10^-5 | 0.889 | 0.77 |
| CH4 Diffusion | A+B+C+D | 2.62 × 10^-5 | 1.43 × 10^-5 | 0.987 | 0.967 | 6.70 × 10^-5 | 4.09 × 10^-5 | 0.89 | 0.78 |
| N2 Uptake | A | 2.44 × 10^-1 | 1.50 × 10^-1 | 0.461 | 0.18 | 2.51 × 10^-1 | 1.56 × 10^-1 | 0.288 | 0.01 |
| N2 Uptake | A+B | 1.35 × 10^-1 | 6.42 × 10^-2 | 0.926 | 0.749 | 2.11 × 10^-1 | 1.17 × 10^-1 | 0.671 | 0.34 |
| N2 Uptake | A+B+C | 3.29 × 10^-2 | 1.38 × 10^-2 | 0.994 | 0.985 | 1.83 × 10^-1 | 8.90 × 10^-2 | 0.792 | 0.49 |
| N2 Uptake | A+B+C+D | 8.06 × 10^-2 | 2.96 × 10^-2 | 0.985 | 0.911 | 1.89 × 10^-1 | 9.49 × 10^-2 | 0.768 | 0.47 |
| N2 Uptake | A+B+C+D+E | 2.99 × 10^-2 | 1.84 × 10^-2 | 0.991 | 0.988 | 1.04 × 10^-1 | 5.62 × 10^-2 | 0.936 | 0.84 |
| N2 Diffusion | A | 1.13 × 10^-4 | 7.34 × 10^-5 | 0.738 | 0.538 | 1.22 × 10^-4 | 8.30 × 10^-5 | 0.623 | 0.40 |
| N2 Diffusion | A+B | 5.71 × 10^-5 | 3.34 × 10^-5 | 0.935 | 0.882 | 9.33 × 10^-5 | 5.75 × 10^-5 | 0.791 | 0.65 |
| N2 Diffusion | A+B+C | 2.39 × 10^-5 | 1.13 × 10^-5 | 0.993 | 0.979 | 7.29 × 10^-5 | 4.80 × 10^-5 | 0.843 | 0.76 |
| N2 Diffusion | A+B+C+D | 2.40 × 10^-5 | 1.08 × 10^-5 | 0.994 | 0.979 | 7.38 × 10^-5 | 4.72 × 10^-5 | 0.844 | 0.76 |
| N2 Diffusion | A+B+C+D+E | 3.75 × 10^-5 | 2.35 × 10^-5 | 0.966 | 0.949 | 7.05 × 10^-5 | 4.46 × 10^-5 | 0.860 | 0.80 |
Figure 3
Comparison of the ML-predicted
adsorption of (a) He, (b) H2, (c) CH4, and (d)
N2 in MOFs with the
simulation results. Blue (red) symbols represent the training (test)
data.
We then compared the ML-predicted adsorption and diffusion properties of He, H2, and CH4 (N2) with the simulation results using the 19 (20) descriptors, as listed in Table . Figure 3 presents the scatter plots with marginal histograms for the gas adsorption properties of MOFs. The predictive power of the ML models is generally good. Figure 3a shows the highest accuracy, observed for He adsorption, with SRCC: 0.98 and R2: 0.99. Figure 3b also shows quite good agreement between the ML-predicted and simulated H2 adsorption data of MOFs, with SRCC: 0.86 and R2: 0.80 in the test set. Although the lowest R2 value in the test set was observed for CH4 uptake, the predictive power of the ML model can be considered good (R2: 0.73) in Figure 3c. In the case of CH4 uptake, the ML models overpredicted (underpredicted) the simulation results at low (high) uptakes of <1.5 mol/kg (>1.5 mol/kg). Figure 3d demonstrates the high accuracy of the ML model for N2 uptake prediction, with an R2 of 0.84 and an SRCC of 0.94 in the test set. Overall, with the lowest SRCC value of 0.86, the rankings of MOFs based on the ML-predicted gas uptakes are strongly correlated with those based on the simulation results in the test set for all gases.

We then trained ML models to predict the
gas diffusion in MOFs. R2 and SRCC values of the test set for He, H2, N2, and CH4 were computed to be in the ranges of 0.65–0.80 and 0.75–0.89, respectively, as shown in Figure 4. For comparison, some R2 and SRCC values reported in the literature are as follows. R2 values for the three ML models developed for predicting N2 diffusivity (O2/N2 adsorption selectivity) in MOFs were reported to be in the range of 0.74–0.80 (0.32–0.55).[61] R2 (SRCC) values of ML models trained for predicting the C3H8 uptake, Henry’s constant of C3H8, and adsorption selectivity for C3H8/C3H6 separation were reported as 0.82 (0.89), 0.93 (0.96), and 0.73 (0.76) in the test set, respectively.[24] As discussed before, gas diffusivity depends on more complex parameters than gas uptake; thus, ML models predicting diffusivity in MOFs have not been widely studied. In our recent work, R2 values of ML models were reported as 0.74 for N2 diffusivity and 0.76 for O2 diffusivity in MOFs for O2/N2 separation.[30] Overall, we showed that although the level of agreement between the ML predictions and simulation results is lower for the gas diffusivities than for the gas uptakes, the accuracy of the ML models is still acceptable based on the previous literature. The predictive power of the ML models for He and H2 diffusivities shown in Figure 4a,b is lower than that for N2 and CH4 diffusivities, as shown in Figure 4c,d. Among the diffusivities of He, H2, N2, and CH4, the best prediction was made for N2 diffusivities (Figure 4d), resulting in a high R2 of 0.80, an SRCC of 0.86, and a low RMSE of 7.1 × 10^-5. The lower accuracy for He and H2 can be attributed to the fact that gas molecules with smaller kinetic diameters (He, H2) diffuse easily, with less dependency on the pore geometry of the MOF, compared to molecules with larger kinetic diameters (N2, CH4).
Figure 4
Comparison of the ML-predicted
diffusion of (a) He, (b) H2, (c) CH4, and (d)
N2 in MOFs with the simulated
ones. Blue (red) symbols represent the training (test) data.
Figure 5 shows the feature importance analysis for all target variables. The relative importance of the features varies across the ML models developed to predict the adsorption and diffusion properties of gases in MOFs. While the pore size and geometry are more important for training ML models for H2 adsorption, atom types and chemical descriptors significantly affect CH4 and N2 adsorption. For the development of the ML model to predict N2 uptake, Qst0 was also considered as the energy descriptor and played the most important role in describing the N2 uptake. The importance of the pore size and geometry in the models predicting gas diffusivities is generally higher than in those predicting gas uptakes. In particular, the pore size ratio (LCD/PLD) is generally more important in the ML models used to predict N2 and CH4 diffusivities than in those used to estimate the gas uptakes. Porosity is the most important descriptor to accurately predict N2 diffusivities, and Qst0 also has an impact. Overall, we concluded that physical features such as pore size and geometry of MOFs are important to train the ML models for both gas adsorption and diffusion data. Compared to the gas diffusivity, predictions for gas uptakes are much more affected by the inclusion of chemical descriptors, atom types, and energy descriptors in the ML models. We finally note that He uptake is not shown in Figure 5 because the ML models for all target data except He uptake were trained with tree-based algorithms, which were constructed using the Gini index that determines the relative importance of features.
Figure 5
Feature
importance for the gas adsorption and diffusion properties
of MOFs. The width range of each color shows the importance of the
related feature. The colors were taken from the same palette for each
group.
Next, we calculated the ML-predicted gas permeabilities and compared them with the simulated permeabilities in Figure 6. We note that the term “ML-predicted permeability” is used for the permeability calculated using ML-predicted adsorption and diffusion data and “simulated permeability” is used for the permeability calculated using simulated gas adsorption and diffusion data. To the best of our knowledge, these are the first ML models developed to predict He, H2, N2, and CH4 permeabilities of MOFs at realistic conditions (1 bar, 298 K). Figure 6a,b shows that there is good agreement between ML-predicted and simulated permeabilities, especially for He and H2. Figure 6c,d shows that ML-predicted N2 and CH4 permeabilities are generally lower than the simulated ones in the high gas permeability range (>10^6 Barrer), but the agreement is good in the low permeability range. We also showed the ratios of the ML-predicted gas uptakes, diffusivities, and permeabilities to the simulated ones for the training and test sets in Figure S8. The average ratio is close to unity for gas uptakes, indicating good agreement between ML and simulations. The range of the ratios (0.11–47.5) is larger for gas diffusivities; therefore, deviations between ML-predicted and simulated gas permeabilities were more pronounced than those for uptakes and diffusivities.
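The way uptake and diffusivity errors combine in the permeability can be written out explicitly. The relation below is a sketch under an assumption: it takes the permeability of gas i as the product of its adsorbed concentration c_i and self-diffusivity D_i divided by the feed fugacity f_i. This section does not restate the permeability formula, so the expression and its symbols are illustrative rather than quoted from the method.

```latex
P_i = \frac{c_i \, D_i}{f_i}
\quad\Longrightarrow\quad
\frac{P_i^{\mathrm{ML}}}{P_i^{\mathrm{sim}}}
= \frac{c_i^{\mathrm{ML}}}{c_i^{\mathrm{sim}}}
\times
\frac{D_i^{\mathrm{ML}}}{D_i^{\mathrm{sim}}}
```

Under this assumption the fugacity cancels because both permeabilities refer to the same conditions, so the permeability ratio is the product of the uptake ratio and the diffusivity ratio. A diffusivity ratio far from unity therefore carries over directly into the permeability ratio, and it is amplified whenever the uptake ratio deviates in the same direction.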
Figure 6
Comparison
of the ML-predicted (a) He, (b) H2, (c) CH4,
and (d) N2 permeability of the MOFs with the
simulated ones. Blue (red) symbols represent the training (test) data.
The inset figures represent the data in the dashed boxes in the log–log
scale.
In addition to the gas permeability, selectivity is an important metric to assess membranes’ separation performances. We calculated He/H2, He/N2, He/CH4, H2/N2, H2/CH4, and N2/CH4 membrane selectivities of MOFs. Since permeability data for two different gases are needed to calculate the membrane selectivity of a MOF, we calculated selectivities only for the MOFs present in the test sets of both gases. Figure S9 shows that there is good agreement between the ML-predicted and simulated membrane selectivities of MOFs for the six gas separations that we considered. Overall, the results so far suggest that the ML models that we developed in this work for predicting gas adsorption and diffusion properties of MOFs can accurately estimate gas permeabilities and selectivities of MOF membranes; therefore, they would be very useful for the initial assessment of MOF membranes for a target gas separation before the experimental efforts.
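The restriction to MOFs that appear in the test sets of both gases can be sketched as a set intersection. The example assumes the membrane selectivity is the ratio of the two single-component permeabilities; the function name, MOF labels, and permeability values are hypothetical.

```python
def membrane_selectivity(perm_gas_i, perm_gas_j):
    """Return P_i / P_j for every MOF that has a permeability for both gases.

    Each argument maps a MOF identifier to its permeability (same units).
    MOFs missing from either mapping are skipped, which is why fewer MOFs
    are available for selectivity than for permeability comparisons.
    """
    common = sorted(set(perm_gas_i) & set(perm_gas_j))
    return {mof: perm_gas_i[mof] / perm_gas_j[mof] for mof in common}


# Hypothetical test-set permeabilities in Barrer (not data from this work).
he_test = {"MOF-A": 2.0e5, "MOF-B": 8.0e4, "MOF-C": 1.5e5}
ch4_test = {"MOF-B": 4.0e4, "MOF-C": 5.0e4, "MOF-D": 9.0e4}

he_ch4_selectivity = membrane_selectivity(he_test, ch4_test)
```

Only the two MOFs shared by both hypothetical test sets receive a selectivity, even though each gas has three test-set permeabilities.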
Predictions of ML Models for MOF/Polymer MMMs
Motivated by the good agreement between the ML-predicted and simulated
gas permeabilities of pristine MOFs, we calculated the permeability
and selectivity of MOF/polymer MMMs using both the ML models and results
of molecular simulations. Figure 7 shows that there is good agreement between the ML-predicted and simulated gas permeabilities and selectivities of MMMs. ML predictions were found to be in strong agreement with the simulations for the MMMs composed of polymers having low or medium gas permeability (polypropylene, PBOI-2-Cu+). On the other hand, the accuracy of ML predictions was found to be lower for the MMMs composed of highly permeable polymers (TeflonAF-2400, PTMSP). Figure 7a shows that ML-predicted permeabilities of MMMs span a wider range when polymers having high gas permeabilities (>10^3 Barrer) are used than when polymers with relatively low permeabilities (<10^3 Barrer) are used. The most significant difference between the ML-predicted and simulated permeabilities was observed for MMMs composed of the two highly permeable polymers, TeflonAF-2400 and PTMSP. Thus, we focused on MOF/TeflonAF-2400 and MOF/PTMSP MMMs in Figure 7c.
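One way to see why errors in the MOF permeability matter more for highly permeable polymers is to propagate a hypothetical error through a permeation model. This section does not state which model was used for the MMMs; the sketch below assumes the Maxwell model, a common choice for dilute filler loadings, and all permeability values, the filler loading, and the function names are illustrative.

```python
def maxwell_permeability(p_polymer, p_filler, phi):
    """Maxwell-model permeability of an MMM (assumed model, see lead-in).

    p_polymer: permeability of the pure polymer
    p_filler:  permeability of the MOF filler (same units)
    phi:       volume fraction of the filler, between 0 and 1
    """
    num = p_filler + 2.0 * p_polymer - 2.0 * phi * (p_polymer - p_filler)
    den = p_filler + 2.0 * p_polymer + phi * (p_polymer - p_filler)
    return p_polymer * num / den


def mmm_ratio(p_polymer, p_mof_sim, p_mof_ml, phi=0.2):
    """Ratio of the MMM permeability from an ML-predicted MOF permeability
    to the MMM permeability from the simulated MOF permeability."""
    predicted = maxwell_permeability(p_polymer, p_mof_ml, phi)
    simulated = maxwell_permeability(p_polymer, p_mof_sim, phi)
    return predicted / simulated


# Hypothetical case: the ML model underpredicts the MOF permeability
# by a factor of two (1.0e5 Barrer simulated, 5.0e4 Barrer predicted).
ratio_low = mmm_ratio(p_polymer=10.0, p_mof_sim=1.0e5, p_mof_ml=5.0e4)
ratio_high = mmm_ratio(p_polymer=1.0e4, p_mof_sim=1.0e5, p_mof_ml=5.0e4)
```

In this sketch the same factor-of-two error in the MOF permeability barely changes the MMM permeability when the polymer is far less permeable than the filler, while it shifts the result by several percent when the polymer permeability approaches that of the filler.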
Figure 7
Comparison
of the ML-predicted and simulated (a) He and H2 and (b)
N2 and CH4 permeabilities of MOF-based
MMMs. (c) Comparison of the ML-predicted and simulated selectivities
of MOF/polymer MMMs for He/H2, He/N2, He/CH4, H2/CH4, H2/N2, and N2/CH4 separations. Blue (red) symbols
represent the training (test) set. The data for the test set are shown
with smaller symbols than those for the training set in panels (a–c)
to make all data visible.
For He-related separations (He/H2, He/N2, and He/CH4), the ML-predicted and simulated selectivities of MMMs are in strong agreement. For example, the ratios of the ML-predicted He/CH4 selectivity to the simulated one for MOF/TeflonAF-2400 MMMs in the test set were 0.98–1.07, suggesting that our ML models can accurately predict the He/CH4 selectivity of these MMMs. The ratios of the ML-predicted N2/CH4, H2/CH4, and H2/N2 selectivities to the simulated selectivities in the test set were calculated to be in wider ranges, 0.70–1.33, 0.72–1.29, and 0.72–1.31, respectively, for MOF/PTMSP MMMs. The ML-predicted selectivity of MMMs for most MOFs in the test set was generally lower than the simulated selectivity when a polymer having a high gas permeability was used. This is expected due to the overestimation of the gas permeabilities by the ML models, as discussed in Figure 7a,b. We note that we considered the common MOFs in training and test sets for each gas pair; therefore, the number of MOFs used for selectivity predictions is lower than that used for permeability predictions. For example, 677 and 2715 MOFs were used to develop ML models for predicting He and H2 permeabilities, but a much smaller number of MOFs, 382 and 28 (in the training and test sets, respectively), was used for the evaluation of the ML models to predict the He/H2 selectivity of the MMMs.
Comparing ML Predictions with Experimental Data
So far, we have compared the ML-predicted and simulated gas separation performances of MOF membranes and MOF/polymer MMMs. Despite the scarcity of reported experimental gas permeabilities of pure MOF membranes, there are several MOF/polymer MMMs that were tested for different gas separations in the literature.[12] To make a comprehensive comparison between ML predictions, molecular simulations, and experiments, we collected the experimental He, H2, N2, and CH4 permeabilities of the MOF membranes and MOF-based MMMs from the literature. We note that simulated and ML-predicted gas permeabilities of MOF-based MMMs were calculated using the same filler loading as the corresponding experiments. These experimental permeability data of MOF membranes and MMMs are presented in Figure 8 together with our corresponding ML predictions and simulation results. Figure 8a presents the ML-predicted, simulated, and experimentally measured gas permeabilities of two MOFs, Cu-BTC and MIL-96, which were in our material database used for training ML models. Simulated and ML-predicted gas permeabilities of the MOFs strongly agree, but they generally overestimate the experimental gas permeabilities of Cu-BTC[62,63] and MIL-96.[64] As previously discussed in the literature,[16] MOFs were modeled as perfect, defect-free crystal structures in the molecular simulations, which leads to high permeabilities, but defects may exist in the fabricated membranes.
Figure 8
Comparison of ML-predicted and simulated gas permeabilities with
the available experimental data for (a) MOF membranes and (b) MOF/polymer
MMMs. Blue lines show the experimental gas permeabilities collected
from the literature. The number of the blue lines on each column represents
the number of experimental data at (a) 1 bar, 298 K, for MOF membranes
and (b) 0.5–5 bar, 298–308 K, for MOF/polymer MMMs.
The values in parentheses in panel (b) represent the volume fraction
of MOF fillers. * (**) represents that the MOF was taken from the
test (training) set.
Even though our ML models somewhat overpredicted the gas permeabilities of MOF membranes, the rankings of MOFs based on the ML-predicted gas adsorption and diffusion data agree well with the simulated ones (SRCC in the range of 0.75–0.98), as discussed above. These rankings can be useful to experimentalists for selecting the best candidates from a large group of MOFs for membrane fabrication and testing. Figure 8b shows He, H2, N2, and CH4 permeabilities of three different MMMs[65−67] composed of well-known MOFs (Cu-BTC, Mg-MOF74, and MIL-53) with different volume fractions and polymers (Matrimid, PIM-1). Simulated, ML-predicted, and experimental gas permeabilities all agree well, showing the strength of our ML models in predicting the gas separation performances of the MOF/polymer MMMs. This is an important result because, considering the existence of thousands of MOFs and hundreds of polymers, a theoretically infinite number of MOF/polymer MMMs can be generated, and accurate estimates for the gas separation performances of all of these possible MMMs using the ML models that we developed will significantly accelerate the design and fabrication of new MMMs for a variety of gas separations.
Transferability of ML Models
One of the main advantages of developing ML models for a set of materials is the ability to transfer these models to a different set of new, unexplored materials and make accurate predictions for these unseen materials. Motivated by the good agreement between the ML predictions, molecular simulations, and experiments, we used our ML models, which were originally developed for experimentally synthesized MOFs, to predict the separation performances of hMOFs. hMOFs have not been synthesized yet; thus, no experimental gas adsorption, diffusion, and/or permeability data are available for them. After determining the ML-predicted adsorption and diffusion properties of hMOFs for He, H2, N2, and CH4, we performed GCMC and MD simulations for hMOFs to compare ML predictions with simulation results. The heatmap with the Pearson correlations across different features of hMOFs is shown in Figure S10, which indicates that the correlations are generally similar to those observed for MOFs. Figure 9 shows the comparison of the ML-predicted and simulated uptakes and diffusivities of He and H2 in 500 hMOFs. We also computed the MAE, R2, SRCC, and RMSE of the ML-predicted gas uptake and diffusivity in hMOFs, as shown in Table S5. Figure 9a,b shows that the ML-predicted He and H2 uptakes agree well with the corresponding simulated uptakes. On the other hand, ML-predicted uptakes of most hMOFs for CH4 and N2 (71 and 88% of all hMOFs, respectively) are higher than the simulated uptakes, as shown in Figure S11a,b. It is important to note that the ranges of simulated He, H2, and CH4 uptakes of hMOFs are similar to those predicted by ML models in MOFs (as shown in Figure 3), but the range of simulated N2 uptakes in hMOFs is narrower than that in MOFs.
Figure 9
Comparison of ML-predicted (a, c) uptake
and (b, d) diffusivity
for He and H2 in 500 hMOFs with the simulated ones. The
black line represents x = y. (e)
The ratio of ML-predicted permeability and selectivity values to that
of simulated ones for 1000 MMMs. The left (right) side of the figure
represents the results related to hMOF/PTMSP (hMOF/TeflonAF-2400)
MMMs. Boxes show the quartiles of the data set, while whiskers extend
to show the rest of the distribution, except for outliers that were
defined as values more than 1.5 × IQR (IQR = interquartile range) from
either end of the box.
In Figure 9c,d, it is shown that for most of the hMOFs, the ML-predicted He and H2 diffusivities are similar to the simulated ones. The ML models consistently underestimated the simulated gas diffusivities in a small number of hMOFs exhibiting diffusivities above certain values (>4 × 10^-3 cm^2/s for He diffusivities and >5 × 10^-3 cm^2/s for H2 diffusivities). This can be attributed to the fact that tree-based algorithms, by construction, cannot extrapolate to unseen data. In other words, they cannot capture trends for cases lying outside the range of the training data.[68] Similar results were observed for the self-diffusivity predictions of N2 and CH4, as shown in Figure S11c,d. Overall, these results showed that the ML models that we trained for MOFs can predict the gas uptakes and diffusivities of hMOFs fairly well, suggesting the transferability of ML models to different membrane materials.

We finally investigated the applicability of ML models to predict the gas permeability and selectivity of hMOF/polymer MMMs. Since the lowest predictive power of the ML models was obtained for the MOF/polymer MMMs having highly permeable polymers (previously shown in Figure 7a,b), we focused on 1000 hMOF/polymer MMMs composed of 500 different hMOFs and 2 highly permeable polymers, TeflonAF-2400 and PTMSP, for He/CH4 and H2/CH4 separations. Figure 9e shows the ratio of the ML-predicted H2 (He) permeability and H2/CH4 (He/CH4) selectivity of 500 hMOF/PTMSP (hMOF/TeflonAF-2400) MMMs to the simulated ones. For He/CH4 separation, the ratios for hMOF/TeflonAF-2400 MMMs were found to be between 0.85 and 1.1 for He permeability and between 0.87 and 1.07 for He/CH4 selectivity. Similarly, even for one of the most permeable polymers (PTMSP), the ratios were found to be close to unity for H2 permeability (0.85–1.26) and H2/CH4 selectivity (0.73–1.07). Thus, we can conclude that the ML models developed to predict the gas uptake and diffusivity of MOFs lead to accurate gas permeability and selectivity predictions for the unseen hMOF-based MMMs.
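The extrapolation limit of tree-based models noted earlier in this section for the high-diffusivity hMOFs can be reproduced with a toy example. The code below is not the ETR, GBR, or RFR models of this work; it is a minimal single regression tree on one hypothetical feature, written only to show that a leaf can never return a value above the largest training target. Ensembles that average such trees inherit the same bound.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(xs, ys, depth=3):
    """Fit a tiny 1-D regression tree.

    Internal nodes are (threshold, left, right) tuples; leaves are floats
    holding the mean training target of the samples that reached them.
    """
    mean = sum(ys) / len(ys)
    if depth == 0 or len(xs) < 2:
        return mean
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = _sse(left) + _sse(right)
        if best is None or score < best[0]:
            best = (score, i)
    if best is None:
        return mean
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left_x, left_y = [x for x, _ in pairs[:i]], [y for _, y in pairs[:i]]
    right_x, right_y = [x for x, _ in pairs[i:]], [y for _, y in pairs[i:]]
    return (threshold,
            fit_tree(left_x, left_y, depth - 1),
            fit_tree(right_x, right_y, depth - 1))


def predict(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


# Hypothetical training data following y = 2x on the range 0..9.
train_x = [float(v) for v in range(10)]
train_y = [2.0 * v for v in train_x]
tree = fit_tree(train_x, train_y)

inside = predict(tree, 9.0)    # largest feature value seen in training
outside = predict(tree, 20.0)  # far outside the training range (true y = 40)
```

Here `outside` equals `inside` and stays at or below the largest training target, which mirrors the systematic underestimation observed for hMOFs whose simulated diffusivities exceed the range covered by the training MOFs.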
Conclusions
In this study, we investigated the gas separation performances of MOF membranes and MOF/polymer MMMs by combining molecular simulations and machine learning for six different separations: He/H2, He/N2, He/CH4, H2/N2, H2/CH4, and N2/CH4. Using 20 different physical, chemical, and energy-based descriptors of MOFs, we developed eight different ML models including LassoLarsCV, ETR, GBR, and RFR algorithms to predict the uptake and diffusivity of He, H2, N2, and CH4 in MOFs. The accuracy of the ML models was found to be high for both the gas uptake and diffusion properties of MOFs, with R2 of 0.73–0.99 and 0.65–0.80, respectively, and SRCC of 0.86–0.98 and 0.75–0.89, respectively. The feature importance analysis revealed that physical properties such as porosity are more critical for the accurate prediction of gas adsorption and diffusion data of MOFs than chemical descriptors such as atom types and degree of unsaturation. ML-predicted gas uptake and diffusivity data were used to compute He, H2, CH4, and N2 permeabilities of a total of 5249 MOF membranes and a total of 31,494 MOF/polymer MMMs, and the results were shown to be in good agreement with the permeabilities computed from the simulations. Comparisons between the ML-predicted, simulated, and experimentally reported gas permeabilities of different MOF membranes and MOF/polymer MMMs showed that our ML models will be very useful to estimate gas separation performances of MOF-based membranes in a rapid and accurate manner. Finally, the transferability of ML models developed for real MOFs to hMOFs was examined, and the results showed that the ML models can successfully predict gas permeabilities of hMOF/polymer MMMs. Overall, the ML models that we developed in this work to predict the gas uptake and diffusion properties of MOFs will be very useful to evaluate the gas separation performance of a large number and variety of MOF membranes and MOF/polymer MMMs by saving an enormous amount of computational time for molecular simulations and a great deal of effort for the experimental fabrication and testing of membranes. These rapid and accurate models will also be beneficial for allocating experimental efforts, resources, and time to the most promising membrane materials.