Yuting Guo, Jianzhong Wang, Na Gao, Miao Qi, Ming Zhang, Jun Kong, Yinghua Lv.
Abstract
The relationship between synthetic factors and the resulting structures is critical for the rational synthesis of zeolites and related microporous materials. In this paper, we develop a new feature selection method for synthetic factor analysis of (6,12)-ring-containing microporous aluminophosphates (AlPOs). The proposed method is based on a maximum weight and minimum redundancy criterion: it selects the feature subset whose features are most relevant to the synthetic structure while keeping the redundancy among the selected features minimal. Based on the database of AlPO synthesis, we use (6,12)-ring-containing AlPOs as the target class and incorporate 21 synthetic factors, covering gel composition, solvent and organic template, to predict the formation of such structures. From these 21 features, 12 are selected as the optimized features for distinguishing (6,12)-ring-containing AlPOs from AlPOs without such rings. The prediction model achieves a classification accuracy rate of 91.12% using the optimal feature subset. Comprehensive experiments demonstrate the effectiveness of the proposed algorithm, and an in-depth analysis is given of the synthetic factors selected by the proposed method.
Year: 2013 PMID: 24217226 PMCID: PMC3856056 DOI: 10.3390/ijms141122132
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
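The maximum weight and minimum redundancy idea described in the abstract can be illustrated with a small greedy selection routine. This is only a sketch under stated assumptions, not the authors' algorithm: it assumes the weight is a Fisher score, the redundancy is the mean absolute Pearson correlation coefficient (PCC) with the features already chosen, and the two are combined as a simple difference. The function names `fisher_scores` and `select_features` are hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero

def select_features(X, y, k):
    """Greedily pick k feature indices with high weight and low redundancy.

    Assumed criterion (not taken from the paper): normalized Fisher score
    minus mean |PCC| with the already selected features.
    Constant columns would yield NaN correlations and should be dropped first.
    """
    weights = fisher_scores(X, y)
    weights = weights / weights.max()  # put weights on the same [0, 1] scale as |PCC|
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(weights))]  # start from the most relevant feature
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        scores = weights[remaining] - redundancy
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

With a criterion of this kind, a near-duplicate of an already selected feature is penalized even when its own relevance is high, which is the behavior the abstract attributes to the proposed method.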
Confusion matrix.
| Hypothesis | Actual positive | Actual negative |
|---|---|---|
| Hypothesized positive | True positive (TP) | False positive (FP) |
| Hypothesized negative | False negative (FN) | True negative (TN) |
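The four cells of the confusion matrix are what the accuracy rate and F-measure reported in the tables are computed from. A minimal sketch follows, assuming the F-measure is the balanced (F1) harmonic mean of precision and recall; the source does not state which variant was used, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Unlike accuracy, the F-measure ignores true negatives, so it is the more informative figure when the target class is much smaller than the rest.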
Figure 1. Comparison of the original and proposed feature selection methods. (a) Using Nearest Neighbor as classifier; (b) Using Naive Bayes as classifier.
Highest classification accuracy rates reached by the original and the proposed feature selection methods.
| Method | Nearest Neighbor: Highest Acc_Rate | Nearest Neighbor: Dimension | Naive Bayes: Highest Acc_Rate | Naive Bayes: Dimension |
|---|---|---|---|---|
| F | 0.9080 | 20 | 0.8736 | 3 |
| FI | 0.9112 | 12 | 0.8767 | 5 |
| R | 0.9088 | 17 | 0.8608 | 21 |
| RI | 0.9096 | 17 | 0.8608 | 3 |
| G | 0.9080 | 21 | 0.8624 | 19 |
| GI | 0.9096 | 13 | 0.8648 | 13 |
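The "Highest Acc_Rate" and "Dimension" columns pair the best accuracy found with the number of top-ranked features at which it occurs. The sketch below shows one way such a sweep could be run for the Nearest Neighbor case. It assumes leave-one-out evaluation with Euclidean distance, which the source does not specify, and the names `loo_1nn_accuracy` and `best_dimension` are hypothetical.

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (Euclidean)."""
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(dist, np.inf)  # a sample may not be its own neighbor
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(y[nearest] == y))

def best_dimension(X, y, ranking):
    """Sweep the top-d ranked features and return (highest accuracy, d).

    Ties keep the smaller d, so the reported dimension is the smallest
    subset that reaches the highest accuracy.
    """
    best_acc, best_dim = -1.0, 0
    for d in range(1, len(ranking) + 1):
        acc = loo_1nn_accuracy(X[:, ranking[:d]], y)
        if acc > best_acc:
            best_acc, best_dim = acc, d
    return best_acc, best_dim
```

Here `ranking` is a list of feature indices ordered from most to least preferred by whichever selection method is being evaluated.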
Highest F-measure reached by the original and the improved feature selection methods.
| Method | Highest F-measure (Nearest Neighbor) | Highest F-measure (Naive Bayes) |
|---|---|---|
| F | 0.8144 | 0.7817 |
| FI | 0.8586 | 0.8071 |
| R | 0.8585 | 0.7851 |
| RI | 0.8599 | 0.7851 |
| G | 0.8518 | 0.7640 |
| GI | 0.8579 | 0.8003 |
Figure 2. The features selected by Fisher score and by Fisher score combined with PCC. (a) Synthesis factors of different categories are shown in different colors; (b) features selected by Fisher score; (c) features selected by Fisher score combined with PCC in our algorithm.
Figure 3. Performance comparison of the proposed algorithm and some popular feature selection methods. (a) Using Nearest Neighbor as classifier; (b) Using Naive Bayes as classifier.
Highest classification accuracy rates reached by different feature selection methods.
| Method | Nearest Neighbor: Highest Acc_Rate | Nearest Neighbor: Dimension | Naive Bayes: Highest Acc_Rate | Naive Bayes: Dimension |
|---|---|---|---|---|
| Constraint Score | 0.908 | 21 | 0.8639 | 19 |
| Ttest | 0.9096 | 19 | 0.8728 | 2 |
| FCBF | 0.8072 | / | 0.8584 | / |
| MRMR | 0.908 | 21 | 0.868 | 1 |
| Our algorithm (FI) | 0.9112 | 12 | 0.8767 | 5 |
Optimal F-measure values reached by different feature selection methods.
| Method | F-measure (Nearest Neighbor) | F-measure (Naive Bayes) |
|---|---|---|
| Constraint Score | 0.8388 | 0.7588 |
| Ttest | 0.8046 | 0.7825 |
| FCBF | 0.5416 | 0.7730 |
| MRMR | 0.7723 | 0.7721 |
| Our algorithm (FI) | 0.8586 | 0.8071 |
Description of the input synthetic factors.
| Category | ID | Description |
|---|---|---|
| Gel composition | F1 | The molar amount of Al2O3 in the gel composition |
| | F2 | The molar amount of P2O5 in the gel composition |
| | F3 | The molar amount of solvent in the gel composition |
| | F4 | The molar amount of template in the gel composition |
| Solvent | F5 | The density |
| | F6 | The melting point |
| | F7 | The boiling point |
| | F8 | The dielectric constant |
| | F9 | The dipole moment |
| | F10 | The polarity |
| Organic template | F11 | The longest distance of organic template |
| | F12 | The second longest distance of organic template |
| | F13 | The shortest distance of organic template |
| | F14 | The Van der Waals volume |
| | F15 | The dipole moment |
| | F16 | The ratio of |
| | F17 | The ratio of |
| | F18 | The ratio of |
| | F19 | The Sanderson electronegativity |
| | F20 | The number of freely rotatable single bonds |
| | F21 | The maximal number of protonated H atoms |
The feature selection process of the proposed method.
| Input: The original data sample |