Literature DB >> 35010109

Machine Learning-Assisted Computational Screening of Metal-Organic Frameworks for Atmospheric Water Harvesting.

Lifeng Li1, Zenan Shi1, Hong Liang1, Jie Liu2, Zhiwei Qiao1.   

Abstract

Atmospheric water harvesting by strong adsorbents is a feasible method of solving the shortage of water resources, especially for arid regions. In this study, a machine learning (ML)-assisted high-throughput computational screening is employed to calculate the capture of H2O from N2 and O2 for 6013 computation-ready, experimental metal-organic frameworks (CoRE-MOFs) and 137,953 hypothetical MOFs (hMOFs). Through the univariate analysis of MOF structure-performance relationships, Qst is shown to be a key descriptor. Moreover, three ML algorithms (random forest, gradient boosted regression trees, and neighbor component analysis (NCA)) are applied to hunt for the complicated interrelation between six descriptors and performance. After the optimizing strategy of grid search and five-fold cross-validation is performed, three ML can effectively build the predictive model for CoRE-MOFs, and the accuracy R2 of NCA can reach 0.97. In addition, based on the relative importance of the descriptors by ML, it can be quantitatively concluded that the Qst is dominant in governing the capture of H2O. Besides, the NCA model trained by 6013 CoRE-MOFs can predict the selectivity of hMOFs with a R2 of 0.86, which is more universal than other models. Finally, 10 CoRE-MOFs and 10 hMOFs with high performance are identified. The computational screening and prediction of ML could provide guidance and inspiration for the development of materials for water harvesting in the atmosphere.

Entities:  

Keywords:  absorption; algorithm; metal-organic frameworks; molecular simulation; water harvesting

Year:  2022        PMID: 35010109      PMCID: PMC8746952          DOI: 10.3390/nano12010159

Source DB:  PubMed          Journal:  Nanomaterials (Basel)        ISSN: 2079-4991            Impact factor:   5.076


1. Induction

As we all know, 71% of the earth’s surface is covered by water, and the remaining 29% is land. At first glance, we have a lot of water resources; in fact, the water we use is mainly fresh water, and fresh water only accounts for 2.5% of all water resources on the earth [1]. Among them, approximately 69% of the fresh water is enclosed in the ice layer of Antarctica and Greenland, and the remaining 30% is stored in the ground, so the fresh water (such as river water and fresh water lakes) that humans can directly use accounts for only 0.4% of all water resources [1]. As population growth and living standards improve, water resources are becoming increasingly scarce, especially daily water for residents of arid regions. Currently, one third of the world’s population live in regions with medium and high water shortages. It is estimated that two thirds of the world will face water shortages by 2050 [2]. Therefore, the lack of fresh water has become one of the major crises to be resolved. At present, several technologies are being used to address this issue. Desalination is one of the main ways to develop new fresh water resources, but the construction of this infrastructure requires a lot of money and the production process is highly energy intensive [3]. In addition, since the main arid and water-scarce areas are far inland, there are never sea water resources available in these places. It is therefore especially important to develop a fresh water technology that can be used in arid areas. The atmosphere of our planet contains a lot of water, which exists in the form of droplets and vapor. It accounts for about 10% of all fresh water [4]. Therefore, if atmospheric water can be efficiently harvested and used, water shortages can be greatly alleviated. At present, the two main methods for collecting water vapor from the air are vapor condensation and adsorption–desorption technology [5]. Condensation conditions are relatively harsh, usually requiring very high amounts of water vapor from the air (relative humidity, RH = 100%) and a large amount of energy, which is unrealistic when the relative humidity is less than 50% [4]. On the contrary, the adsorption–desorption technology can adsorb the water vapor from the air at low temperature, then, using low-grade energy such as natural sunlight or waste heat, it can desorb and condense it into liquid water. The entire process does not require additional energy. Obviously, adsorption–desorption technology is more convenient and energy-saving, and can be used in relatively low humidity regions. The key to adsorption–desorption technology is to select a suitable adsorbent. Currently, the main adsorbents are polymers, zeolites, and silica gels. Unfortunately, these adsorbents have some shortcomings that cannot be ignored, such as low adsorption capacity and the fact that regeneration requires a lot of energy [6]. AMetal-organic frameworks (MOFs), a porous crystalline material, is self-assembled from metal ions and organic ligands [7]. Due to its characteristics of high porosity, large specific surface area and adjustable structure, it is regarded as a candidate for traditional adsorption. Although MOFs have long been used in the fields of gas adsorption [8], separation [9,10], storage [11,12], catalysis [13], and heat pumps [14,15], the use of MOFs to harvest water vapor from the atmosphere has only been proposed in recent years [1,16,17]. This is because structural stability of MOFs may deteriorate after absorbing water. As more MOFs with good water stability join the MOF family, the use of MOFs for atmospheric water harvesting has received widespread attention. Furukawa et al. [16] studied and evaluated the water adsorption properties of 23 materials including six newly synthesized zirconium-based MOFs and determined that MOF-841 can be used as a candidate for atmospheric water harvesting in arid regions. In 2017, Kim et al. [4] developed a heat pump system using the MOF as an adsorbent to collect water from the atmosphere. The principle is that the heat pump absorbs water at night, and uses low-grade energy such as natural sunlight to desorb and condense liquid water during the day. The system harvested 2.8 L of water per kilogram of MOF (MOF-801) per day at room temperature with relative humidity below 20%. Recently, Hanikel et al. [18] summarized the progress of atmospheric water harvesting and the use of MOF-designed water-collection equipment. Therefore, it is feasible to use MOFs to capture water vapor from the atmosphere. We need to screen out some super-hydrophilic MOFs in a huge MOF database. The water vapor in the atmosphere is extremely small compared to N2 and O2, so we need super-hydrophilic and highly selective MOFs to efficiently capture water from the atmosphere. It is very difficult to select suitable MOFs from so huge a database by experimental verification. The emergence of high-throughput computational screening (HTCS) provides a possibility for solving the problem. Qiao et al. [19] used HTCS to select MOFs suitable for separating CO2/N2 and CO2/CH4 from 4764 computation-ready, experimental MOF (CoRE-MOF) databases. In fact, HTCS includes Monte Carlo simulations (MC) and machine learning (ML) [20], and previous studies on HTCS have usually relied on molecular simulation. In recent years, ML has been widely applied to various fields, including image recognition [21], natural language processing [22], data classification and mining [23,24], and material performance prediction [25]. In our previous work, Shi et al. [26] combined MS and ML to screen MOFs with good performance that can be used for adsorption heat pumps. In addition, according to two ML methods with good prediction effects, we determined that the heat of adsorption is the key descriptor that determines the performance of the heat pump. In the work of Dureckova et al. [27], a gradient boosting regression tree (GBRT) model was applied to predict the CO2 working capacity and CO2/H2 adsorption selectivity of carbon capture. It was found that the R2 values of predictive CO2 working capacity and CO2/H2 selectivity were 0.944 and 0.872, respectively. Hypothetical MOFs (hMOFs) can be automatically generated by different metals, linkers. and topologies in computer software. Wilmer et al. [28] generated 137,953 hMOFs from a library of 102 building blocks and screened 300 hMOFs with a higher capacity for methane storage than known CoRE-MOFs. Wu et al. [29] formed a new data set with 130,397 hMOFs and 37 feature descriptors including Henry’s coefficient, atomic number density, and functional group number density. They found that the hMOFs with optimal methane-storage capacities exhibit ϕ of 0.65–0.88, VSA of ~2250 m2·cm–3, etc. The combination of machine learning and molecular simulation of HTCS has greatly increased the speed of discovering new materials [27,30], because ML suitable for specific systems will reduce the number of simulated materials, especially for the updated database of material. Recently, Pardakhti et al. [31] used the trained random forest (RF) of the ML model to predict the methane adsorption of ~130,000 hMOFs. The results showed that the speed of ML was several orders of magnitude faster than traditional MS. The combination of MS and ML has developed into the current main method of screening materials. In Shi et al.’s review [7], several ML methods were considered to possess better prediction performance, such as back propagation neural network and random forest. Therefore, in this work, we selected these methods, as well as gradient boosting regression tree and neighbor component analysis, on the basis that they have good predictive performance for water harvesting on MOFs. In the present work, we apply MC and three ML models to study the performance of water harvesting on MOFs. Based on the established structure-performance relationship, all three types of machine learning achieve a relatively good predictive effect. Then we obtained the main descriptors that played an important role in the performance of MOFs for the capture of water, and finally obtained super hydrophilic MOFs. This may provide guidance for experimental workers to synthesize available MOFs.

2. Models and Methods

2.1. Molecular Models

The crystal structures of the version 2017 of 6013 CoRE-MOFs were collected and established by Chung et al. [32,33] removing the free and coordinated solvent molecules. A large crystallographic dataset of 137,953 hMOFs was designed by Wilmer et al. [28] using 102 building blocks and six different topologies. Five structural descriptors including the largest cavity diameter (LCD), pore-limiting diameter (PLD), volumetric surface area (VSA), void fraction (ϕ), density (ρ), and an energy descriptor of heat of adsorption (Qst), were used to quantitatively describe the structure of the MOF. The reasons for the selection of these six descriptors are as follows: (1) they possessed the strong structure-performance relationships between gas and MOFs, confirmed by many previous works [28,32,34,35] of high-throughput calculation of MOFs, which means that these six MOF descriptors have a greater possibility of achieving the accuracy prediction in ML models than the thousands of other descriptors that could have been used; (2) these six descriptors could be applied in accuracy prediction of ML, which coincide with many ML works [36,37,38,39]; (3) these descriptors are relatively easy to measure in the experiment, and they can be used directly to guide the synthesis and application of MOF. The LCD and PLD in each CoRE-MOF were estimated using Zeo++ [40]. The VSA and ϕ were determined using the diameter of 0.364 nm and 0.258 nm of N2 and He as a probe under the RASPA package, respectively [41]. The Qst was calculated by the NVT-Monte Carlo (MC) with the Widom method in RASPA under infinite dilution conditions, where N, V, and T are the number of particles, the volume of system, and the temperature of the system, respectively [41]. The partial atomic charges of MOFs were rapidly estimated and evaluated using the new MEPO-Qeq [42] method trained to reproduce density function theory (DFT), the extended electrostatic potential fitted charges using the Repeating Electrostatic Potential Extracted Atomic (REPEAT) method [43]. The LJ potential parameters of all CoRE-MOFs were obtained from the universal force field (UFF) [44], as listed in Table S1. In our previous work, it was shown that combining the force fields and MEPO-Qeq method can accurately and quickly predict the adsorption and capture of gases in various MOFs [26,45]. The force field parameters of N2 and O2 molecules were described by the transferable potentials for phase equilibria (TraPPE) force field [46], as demonstrated in Table S2. The TIP4P-Ew [47] model was used to simulate H2O molecules with LJ sites on the O and H atoms, along with the partial charges on H atoms and a dummy atom. A three-site model was applied to mimic a CO2 molecule, which has a C-O bond length of 0.116 nm and a bond angle ∠OCO of 180° [48]. Similarly, an N2 molecule was modeled as a three-site model with the N-N bond length of 0.110 nm.

2.2. Monte Carlo Simulations

To capture water from the air, the Henry’s constants of H2O, N2, and O2 were calculated at 298 K using the Widom particle insertion method [49], and then the selectivity S0[H was calculated by the Henry’s constant of three gas molecules. In this study, the MOF with the larger Henry’s constant of H2O (KH2O) and higher S0[H is regarded as excellent candidate. Notice that Henry’s constant of water is calculated based on the interaction of a water molecule with the framework, which is mainly designed to simulate the extremely low water-molecule content in extreme environments such as deserts (it can be regarded as only one molecule of water in the air). It is noteworthy that, although grand canonical MC (GCMC) is an accurate estimation for the adsorption performance of MOFs, it is difficult to accurately calculate the adsorption loading of H2O. This is because although the structure of a water molecule is very simple, a molecular H-O-H hydrogen bond angle and dipole moment can change continuously during the adsorption process, which further complicates the adsorption [50]. Thus, the H2O adsorption isotherm in most adsorbing has a jump in a narrow range of vapor pressure. This jump is very difficult to calculate during the GCMC simulation. Currently, there was still not a suitable force field or H2O model, which could be used to screen the adsorption loading of H2O in most CoRE-MOFs by GCMC. After the GCMC simulation was repeatedly tested, only several MOFs could be accurately predicted with a relatively good level of agreement with the experimental isotherm [46,50,51,52] Therefore, for a large scale of screening of CoRE-MOFs, the Ki was used to calculate the adsorption selectivity of H2O in this work. For further explanation, see the supporting information (SI). The simulation unit cell extended to at least 2.4 nm along each dimension, and periodic boundary conditions were applied in the three dimensions. It was assumed that the framework atoms of MOFs were rigid and fixed during the simulations. To calculate the LJ interaction, the long-range corrected spherical cut-off radius was set to 1.2 nm. The Ewald summation [53] method was used to estimate the electrostatic interaction between the frameworks and gas molecules as well as between the gas molecules. The number of MC cycles was 100,000; the first 50,000 cycles were performed to the simulation of the equilibrium system, and the last 50,000 cycles were run for ensemble averages. After testing, it was shown that the effect of increasing the MC cycle on the adsorption results was negligible. All simulations were carried out under the RASPA package [41].

2.3. Machine Learning Method

To find out which of the machine-learning (ML) models is suitable for predicting the relationship between the six descriptors (LCD, ϕ, VSA, PLD, ρ, Qst) and the selectivity S0[H of MOFs, further information was sought by ML models. The three kinds of ML employed for the prediction of S0[H were random forest (RF), gradient boosting regression tree (GBRT), and neighbor component analysis (NCA), which were run in Statistics and Machine Toolbox Learning under Matlab2019a software. Because the magnitude of the selectivity data span was very large and cannot be predicted directly, it needed to be pre-processed first; that is, the value of S0[H was taken by the logarithm (log10(S0[H)) to narrow the enormous difference in the various data. In this work, after the five-fold cross-validation evaluated all possible values of each parameter, three ML algorithms programmatically selected the optimal parameter values for the final calculation and prediction, in which the six descriptors were regarded as the input variable and log10(S0[H) as output variable of ML. The key parameters were optimized by five-fold cross-validation and grid-search, as listed in Table S3. All data were divided into five folds. For each cycle, four-fifths of the data were selected randomly as a training set, and one-fifth of the data as a test set. The ML model was run five times for each group value of key parameters by the five-fold cross-validation. The average determinate coefficient (R) of test sets in five-fold cross-validation was adopted to indicate the performance of the model built by different parameter groups. where n, y, f, and refer to the number of MOFs, simulated value, ML predicted value, and average ML predicted value, respectively. In view of the maximum average R2, the optimal parameters could be automatically obtained by the strategy of parameter optimization. Except the optimized parameters, the other parameters were the default values, as listed in Table S4. Secondly, the entire data set of 6013 CoRE-MOFs was adopted to train the model with optimal parameter values. Finally, the data of 10,000 hMOFs were tested. Among them, NCA [54,55] is a supervised learning algorithm that learns the feature weights using a diagonal adaptation. RF is made of multiple decision trees to achieve comparatively higher robustness, and its output is the average of the prediction results of multiple trees [31]. Similarly, GBRT is also an aggregation method by decision tree, which creates the optimal split criterion by continuously minimizing the least squares-regression error for the reduction of computing residual last time. More details of the three MLs are listed in the SI.

3. Results and Discussion

3.1. Univariate Analysis

To explore the effect of the six MOF descriptors on water harvesting performance, we used univariate analysis to understand the relationship between each descriptor (LCD, ϕ, VSA, ρ, PLD, and Qst) and the selectivity S0[H. In Figure 1, the scale of S0[H is very large, because the adsorption behavior of vapor water is very special; it is different from most gases. It is a typical multilayer adsorption. The adsorption of vapor water in MOFs can be divided into two stages. Firstly, based on the interaction between vapor water and MOFs, the water molecules are gradually adsorbed in the pore wall of MOFs. Second, with the increasing of water molecules entering into the framework, strong hydrogen bonds are formed between water and water molecules, leading to remarkable multilayer adsorption. Thus, it is extremely important in the adsorption process of vapor water that the first layer of water is successfully adsorbed in MOFs. Therefore, the difference in selectivity of H2O between hydrophilic and hydrophobic MOFs is extremely large, leading to the data with very high value. In addition, the content of water vapor in the atmosphere is very small, especially in desert areas. The V-shaped adsorption isotherm and very high selectivity of water could be helpful to achieve the capture of H2O in these extreme environments. Figure 1a shows the relationship of the S0[H and LCD. The selectivity is close to 0 in the range of LCD less than 0.27 nm, which may be because the molecules of H2O with the dynamic diameter of 0.264 nm cannot enter the pores of the MOF. As the LCD continues to increase, the S0[H gradually decreases and eventually stabilizes at less than 1 (approximately 0.01). The S0[H is less than 1, indicating that the MOF does not have the ability to selectively adsorb H2O vapor, but preferentially adsorbs N2 and O2 in the atmosphere. This process reflects the change from shape selective to inverse-shape selective adsorption. The relationship of the S0[H and VSA is shown in Figure 1b. In the region where VSA is close to 0, it shows a higher selectivity. When VSA continues to increase, the selectivity reaches its highest point, then gradually decreases. This is because when the VSA is small, the pores of the MOF can accommodate H2O molecules, and when the VSA is too large, the accessible surface of all molecules in the MOF increases, so the contact probability of the N2 and O2 with their optimal adsorption sites increases. Therefore, the selectivity will decrease; that is, the selective separation of H2O vapor cannot be achieved. The super hydrophilic MOFs with high selectivity and high Henry’s constants have a VSA of less than 1000 m2·cm−3, except that the VSA of HUZSUR01 is 1422.66 m2·cm−3.
Figure 1

The relationship of selectivity S0[H versus (a) LCD, (b) VSA, (c) ρ, and (d) PLD. The color code represents void fraction ϕ.

Figure 1c shows the relationship of the S0[H and density ρ. Density and void fraction are correlated descriptors. It is worth noting that as the selectivity increases, the density and void fraction change in opposite directions. This is not difficult to understand. The larger the porosity, the larger the pore volume of the MOF and the lower its density. When the ρ is less than 1260 kg·m−3, the S0[H increases as the density increases, and then the selectivity decreases with the increase of the ρ. In Figure 1, the void fraction ϕ is mapped in the subplots as color codes. The MOFs with high selectivity have a medium-range of ϕ (0.20–0.62), except for MOF HEWFUL (ϕ = 0.16). This is because too large and too small pores are not suitable for selective separation. The pore is too small to prevent the molecule of H2O from entering, thus hindering the adsorption. Conversely, if the pore is too large, the interaction between the adsorbed molecules and the MOF will be weakened, which is not conducive to selective separation. The S0[H versus PLD is shown in Figure 1d. This trend is similar to the relationship of the S0[H and LCD. The highest S0[H are observed at PLD with 0.4 nm, which is approximately equaled to the kinetic diameter of N2 and O2 (0.364 nm and 0.346 nm, respectively). Although the PLD of some MOFs is smaller than the dynamic diameter of H2O, it is still possible for water molecules to enter these MOF channels. On the one hand, MOF pores are not regular, H2O molecules may enter from other larger pores; on the other hand, the kinetic diameter of H2O is estimated by empirical estimation, which is usually larger than the actual size [56], so the molecule of adsorbate may be adsorbed in the MOF. Figure 2a shows that the selectivity increases with the heat of adsorption, which shows a monotonic upward trend. The trend is almost linear, indicating that the isosteric heat of adsorption and the selectivity are strongly correlated variables. When the range of Qst is 270–480 kJ·mol−1, the MOF has its highest S0[H. Since we simulated the adsorption of a single H2O molecule in MOF at infinite dilution, so the heat of adsorption can characterize the strength of the adsorption. Therefore, Qst may be a key descriptor for determining S0[H. This phenomenon also appeared in the CO2 [57] adsorption and thiol capture [45] from the air in our previous works. Figure 2b plots the relationship of the S0[H and the Henry’s constants of water KH2O [58]. On a logarithmic scale, the scatter plot of the S0[H and the KH2O shows an upward trend. The Henry’s constant is a parameter that measures the affinity between the optimal adsorption site of the adsorbent and the adsorbate. The larger Henry’s constant indicates that the interaction between adsorbent and adsorbate molecule is stronger, making adsorption-based separation achievable. Thus, it is necessary that the MOF with large KH2O is required to harvest H2O vapor from the air in arid areas (RH ≈ 20%). From Figure 1 and Figure 2, the Qst seems to be the most important descriptor, and its relationship with selectivity is the most obvious. After linear, binomial, and trinomial fitting for S0[H~Qst, the R2 of binomial fitting could achieve 0.97 and remain stable by using the trinomial fitting, as is shown in Figure S2. The deviation of two points with highest S in the linear fitting makes a relatively lower R2 than both the binomial and trinomial fitting. In fact, the R2 for linear fitting is only 0.93, and it has no accuracy prediction for data in the range of log S > 47. In view of the fitting, we can simply estimate and understand the structure-property relationships of MOFs for the atmospheric water harvesting. Therefore, Qst is very worthy of attention during the screening process.
Figure 2

The relationship of selectivity S0[H versus (a) Qst, (b) KH2O. The color code represents void fraction ϕ.

3.2. Machine Learning

At present, ML has been widely used to predict the performance of materials. Through univariate analysis, only the influence of a single descriptor can be obtained, and ML can not only predict the relationship of structure-performance, but also obtain the common impact of multiple descriptors on performance. In our study, the optimal parameters were obtained by five-fold cross-validation and grid-search. The average R2 of test sets in five-fold cross-validation was adopted to indicate the performance of the model built by different parameter groups. The final model trained by all 6013 pieces of data and optimal parameters. The results are showed in Figure 3a–c. The order of three ML is NCA > GBRT > RF. Compared to GBRT, NCA performs better in the range of high log10(S0[H), while compared to RF, NCA performs better in the range of both low and high log10(S0[H). The reason for this may be the different learning styles of the model, such as feature learning.
Figure 3

Model (a) NCA, (b) GBRT, and (c) RF trained by 6013 CoRE-MOFs. The color represents a base-e logarithm of the number of MOFs.

To further understand the relative importance of the six descriptors for the S0[H, we calculated the weight of each descriptor by three MLs. The weight of the descriptors was calculated while the model was being constructed. The value of relative importance was computed by the normalization of the weight of the six descriptors, as shown in Figure 4 and Table S5. Due to the different characteristics of models, ML shows the relative importance of the descriptors in different ways. However, they all have a point of comparison, which is that the proportion of Qst is more than 50%, especially for GBRT almost only built by a variable (Qst). The order of the six descriptors is Qst > ϕ > ρ > LCD ≈ VSA > PLD. Qst seems to govern the MOF performance in this work, because the concentration of vapor water in air is close to the condition of infinite dilution. The result shows that Qst holds an absolute advantage importance relative to others, as in Section 3.1, which provides a guide for designing the best MOFs of adsorption of water vapor in the experiment.
Figure 4

Relative importance of the six descriptors on the selectivity of MOFs by three ML prediction.

Furthermore, the predictive ML model should be used to accelerate the new HTCS for the other MOF database. Of course, both the simple binomial/trinomial fitting and ML model could achieve this H2O-MOF system, because of the strong relativity of Qst, but ML model would possess higher universality for the other gas-MOF system. Thus, we have added the prediction of a new MOF database (137,953 hMOFs) [28] by ML model, which was trained by 6013 CoRE-MOF datasets. First, Qst was calculated for all 137,953 hMOFs, and then we selected 10,000 hMOFs with the highest Qst for the new prediction, because Qst has the highest importance. As shown in Figure 5a–c, after the predicted results were compared with simulated results by molecular simulation, R2 of the prediction in NCA could reach 0.86. The reasons for the differences of performance between training and predicting are that there exist some differences between the CoRE-MOF and hMOF databases. For examples, there are more than 350 topologies in the CoRE-MOFs database, while there are only six topologies in the hMOFs database, which leads to a diversity gap in those databases; CoRE-MOFs contain much more open metal sites or non-skeleton ions than hMOFs [28]. In this work, the establishment and evaluation of models are finished by 6013 CoRE-MOFs. Ten thousand hMOFs are the extra data, which are different from CoRE-MOFs in some aspects and do not participate in the establishment and evaluation of models. The difference between NCA and GBRT/RF could be that GBRT overemphasizes the importance of Qst (relative importance ≈ 97% in Figure 4); that is, the GBRT model is almost only built by a variable (Qst) and RF may fail to grasp the importance of features other than Qst. Therefore, GBRT and RF may be suitable for the prediction of CoRE-MOFs but not hMOFs, which also means NCA is more universal. Nevertheless, the prediction of NCA for 10,000 hMOFs still shows the sufficient predictive ability of the model, but it is usually not as effective as the original dataset [59]. Moreover, it can be found that, when a hMOF possesses high selectivity (log10S > 5.3), the model performs very well. Thus, the ML model obtained by the CoRE-MOF database can pre-screen out low-performance MOFs to greatly reduce the running time of molecular simulation. Based on the ML algorithm, 80 hMOFs with high performance (log10S > 5.3) could be precisely screened out, and then only the selected 80 hMOFs would have to have their simulated adsorption behavior calculated, as opposed to 137,953 hMOFs, saving a considerable amount of time and computing resource. Finally, the optimal hMOFs were listed in Table S6.
Figure 5

Model (a) NCA, (b) GBRT, and (c) RF prediction for 10,000 hMOFs. The color represents a base-e logarithm of the number of MOFs.

4. Best CoRE-MOFs

According to the principle that both the Henry’s constants of H2O and the selectivity of excellent MOFs are large, we selected 10 optimal CoRE-MOFs for harvesting water from the air based on the order of the selectivity of MOFs from high to low, as listed in Table 1. Among them, the best MOF is QUTHAP, whose KH2O and S0[H are 2.78 × 10124 and 4.14 × 10128, respectively. The range of LCD, ϕ, VSA, PLD, ρ, and Qst of 10 MOFs is 0.035–0.988 nm, 0.16–0.62, 10.46–1422.66 m2·cm−3, 0.264–0.867 nm, 842.16–2912.23 kg·m−3, and 261.01–479.91 kJ·mol−1, respectively. The range of Qst, VSA, and ϕ shows significant agreement with the analysis of Section 3, while the others show less because the relationships between them and selectivity are less obvious. Obviously, the Henry’s constants and selectivity show a proportional trend, which is consistent with our univariate analysis. Moreover, we give more excellent hydrophilic MOFs for the further test. The top 200 CoRE-MOFs and top 200 hMOFs were listed in the Excel file of SI.
Table 1

Top 10 CoRE-MOFs with optimal performance of water harvesting.

No.CSD Code aLCD (nm) ϕ VSA(m2·cm3)PLD (nm)ρ (kg·m−3)Qst (kJ·mol−1)KH2O(mol·kg−1·Pa−1) S 0[H2O/(N2+O2)]
1 QUTHAP0.5690.44654.970.4411257.79479.91 ± 8.312.78 × 101244.14 × 10128
2 CAJWIV0.6200.49856.370.3801078.91307.79 ± 10.194.30 × 10776.85 × 1082
3 PIBLUJ0.5950.39473.390.3521304.13261.19 ± 5.985.81 × 10412.35 × 1046
4 LIRVAK0.3550.2217.880.3011535.50318.82 ± 7.427.16 × 10443.11 × 1045
5 HUZSUR010.9880.621422.660.867842.16255.45 ± 8.233.18 × 10392.48 × 1043
6 HEWFUL0.5580.16293.030.4941665.78271.03 ± 3.111.63 × 10361.31 × 1042
7 YUJWAD0.3880.2616.620.2641429.41265.55 ± 9.256.35 × 10369.73 × 1041
8 YUJWAD010.3980.2842.730.2701409.42261.01 ± 9.771.73 × 10361.30 × 1041
9 - b0.3840.2610.460.2711414.41254.38 ± 9.031.85 × 10344.26 × 1039
10 ECUFEP0.5320.22491.720.4472912.23247.36 ± 7.132.53 × 10321.17 × 1039

CSD code is the number of MOFs in the Cambridge Structural Database. This MOF came from Tominaka et al.’s work [60].

5. Conclusions

In summary, we simulated the adsorption behaviors of H2O, N2, and O2 on 6013 CoRE-MOFs and 137,953 hMOFs by HTCS and ML. Then, after the relationships between selectivity and six MOF descriptors (LCD, ϕ, VSA, ρ, PLD and Qst) were analyzed, respectively, Qst of H2O was shown to possess a strong correlation with the MOF ability for the capture of H2O. Furthermore, three ML algorithms were employed to predict the adsorption performance for each CoRE-MOF, indicating that NCA with a five-fold cross-validation accuracy of R2 = 0.97 is the best algorithm for the prediction of selectivity and that the rank of their predictive ability is NCA > GBRT> RF. Continuously, the relative importance of the six descriptors by MLs could demonstrate that the Qst took the absolute predominance for designing MOFs with optimal selectivity of H2O/air. In addition, from the three models applied to predict the selectivity of hMOFs, it was found that the predicted R2 of NCA can reach 0.86; NCA is more universal for gas-MOFs systems than other models. Finally, the ten MOFs with the best performance were screened out by the statistical methods. They were potential candidates for the capture of H2O from air, especially for QUTHAP. The bottom-up microscopic insights obtained from this study offer experimentalists the guidelines for the development of MOFs with high performance for atmospheric water harvesting.
  20 in total

1.  Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew.

Authors:  Hans W Horn; William C Swope; Jed W Pitera; Jeffry D Madura; Thomas J Dick; Greg L Hura; Teresa Head-Gordon
Journal:  J Chem Phys       Date:  2004-05-22       Impact factor: 3.488

2.  Large-scale screening of hypothetical metal-organic frameworks.

Authors:  Christopher E Wilmer; Michael Leaf; Chang Yeon Lee; Omar K Farha; Brad G Hauser; Joseph T Hupp; Randall Q Snurr
Journal:  Nat Chem       Date:  2011-11-06       Impact factor: 24.427

3.  Water harvesting from air with metal-organic frameworks powered by natural sunlight.

Authors:  Hyunho Kim; Sungwoo Yang; Sameer R Rao; Shankar Narayanan; Eugene A Kapustin; Hiroyasu Furukawa; Ari S Umans; Omar M Yaghi; Evelyn N Wang
Journal:  Science       Date:  2017-04-13       Impact factor: 47.728

Review 4.  Metal-Organic Framework Materials for the Separation and Purification of Light Hydrocarbons.

Authors:  Wen-Gang Cui; Tong-Liang Hu; Xian-He Bu
Journal:  Adv Mater       Date:  2019-05-20       Impact factor: 30.849

5.  Robots that can adapt like animals.

Authors:  Antoine Cully; Jeff Clune; Danesh Tarapore; Jean-Baptiste Mouret
Journal:  Nature       Date:  2015-05-28       Impact factor: 49.962

Review 6.  Dependency distance: A new perspective on syntactic patterns in natural languages.

Authors:  Haitao Liu; Chunshan Xu; Junying Liang
Journal:  Phys Life Rev       Date:  2017-03-27       Impact factor: 11.025

Review 7.  MOF water harvesters.

Authors:  Nikita Hanikel; Mathieu S Prévot; Omar M Yaghi
Journal:  Nat Nanotechnol       Date:  2020-05-04       Impact factor: 39.213

8.  Water adsorption in porous metal-organic frameworks and related materials.

Authors:  Hiroyasu Furukawa; Felipe Gándara; Yue-Biao Zhang; Juncong Jiang; Wendy L Queen; Matthew R Hudson; Omar M Yaghi
Journal:  J Am Chem Soc       Date:  2014-03-11       Impact factor: 15.419

9.  Tuning Water Sorption in Highly Stable Zr(IV)-Metal-Organic Frameworks through Local Functionalization of Metal Clusters.

Authors:  Yong-Zheng Zhang; Tao He; Xiang-Jing Kong; Xiu-Liang Lv; Xue-Qian Wu; Jian-Rong Li
Journal:  ACS Appl Mater Interfaces       Date:  2018-08-08       Impact factor: 9.229

10.  High methane storage capacity in aluminum metal-organic frameworks.

Authors:  Felipe Gándara; Hiroyasu Furukawa; Seungkyu Lee; Omar M Yaghi
Journal:  J Am Chem Soc       Date:  2014-03-24       Impact factor: 15.419

View more
  1 in total

1.  Combining Machine Learning and Molecular Simulations to Unlock Gas Separation Potentials of MOF Membranes and MOF/Polymer MMMs.

Authors:  Hilal Daglar; Seda Keskin
Journal:  ACS Appl Mater Interfaces       Date:  2022-07-11       Impact factor: 10.383

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.