
A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information.

Li Zhang

Abstract

Feature selection is a key step in the analysis of high-dimensional small-sample data. Its core is to analyse and quantify the correlation between features and class labels and the redundancy between features. However, most existing feature selection algorithms consider only the classification contribution of individual features and ignore the influence of interfeature redundancy and correlation. This paper therefore proposes a feature selection algorithm based on nonlinear dynamic conditional relevance (NDCRFS), built on a study and analysis of existing feature selection ideas and methods. First, redundancy and relevance between features, and between features and class labels, are discriminated by mutual information, conditional mutual information, and interaction mutual information. Second, the selected features and candidate features are dynamically weighted using an information gain factor. Finally, to evaluate its performance, NDCRFS was validated against 6 other feature selection algorithms on three classifiers, using 12 different data sets, comparing both the variability between the selected feature subsets and their classification metrics. The experimental results show that NDCRFS can improve the quality of the feature subsets and obtain better classification results.
Copyright © 2021 Li Zhang.

Year:  2021        PMID: 34992644      PMCID: PMC8727115          DOI: 10.1155/2021/3569632

Source DB:  PubMed          Journal:  Comput Intell Neurosci


1. Introduction

In the era of big data, the dimensionality of small-sample data has increased dramatically, leading to the curse of dimensionality. In the preprocessing stage, irrelevant and redundant features need to be handled with data dimensionality reduction techniques: high-dimensional data contain many irrelevant and redundant features, which not only increase computational complexity but also reduce the accuracy and efficiency of classification methods. Feature selection [1-5] differs from other dimensionality reduction techniques such as feature extraction [6] in that it focuses on analysing the relevance and redundancy in high-dimensional data, removing as many irrelevant and redundant features as possible while retaining the relevant original physical features. This not only improves data quality and classification performance but also reduces model training time and makes the model more interpretable [7-9]. Feature selection methods can be classified into three types: filter methods [10, 11], wrapper methods [12], and embedded methods [13]. Because of their high computational efficiency and generality, filter methods are easily applied even to ultra-high-dimensional data sets, and this paper adopts the filter approach. Filter feature selection methods can in turn be classified, by their metrics, into rough-set [14], statistics-based [15], and information-based [16] methods. Among these, information-theoretic feature selection is currently the most popular research direction for filter methods. Information-theoretic feature selection algorithms are usually further divided into mutual information metrics [17, 18], conditional mutual information metrics [1, 19], interaction mutual information metrics [20-22], and so on.
These methods, however, judge whether features are redundant or relevant only under a single condition, so the optimal feature subset cannot be obtained. The main differences between feature extraction in deep learning and information-theoretic filter feature selection can be described in two ways: (1) from a business perspective, feature selection algorithms can analyse features, whereas feature extraction can only perform pattern mapping, not correlation analysis; (2) from an efficiency perspective, feature extraction requires more computational resources and longer training time, whereas feature selection can run on a low-performance server. In a high-dimensional small-sample environment, dynamically identifying redundant and correlated features is an open problem given the diversity and high dimensionality of the data. This paper proposes a feature selection algorithm based on nonlinear dynamic conditional relevance (NDCRFS). The innovations and contributions of this paper are as follows. First, the correlation between individual features and class labels is calculated by mutual information. Second, the correlation between the candidate features and the selected features under the class label is calculated using conditional mutual information. Finally, the correlation and redundancy between features are judged by the interaction information. This method solves the problem of how to measure the relevance and redundancy between selected features and candidate features; the interaction information is normalized by an information gain factor to keep its value dynamically balanced.
Experimental comparison on 12 benchmark data sets with k-nearest neighbour (KNN), decision tree (C4.5), and support vector machine (SVM) classifiers showed that the NDCRFS algorithm outperformed other feature selection algorithms (Mutual Information Maximization (MIM) [23], Interaction Gain-Recursive Feature Elimination (IG-RFE) [24], Interaction Weight Feature Selection (IWFS) [21], Conditional Mutual Information Maximization (CMIM) [25], Dynamic Weighting-based Feature Selection (DWFS) [26], and Conditional Infomax Feature Extraction (CIFE) [23]). The experimental results demonstrate that the proposed NDCRFS algorithm is an effective criterion for selecting feature subsets with good classification performance. The rest of the paper is organised as follows. Section 2 reviews mutual information and conditional mutual information. Section 3 surveys related work on filter feature selection algorithms. Section 4 defines independent feature relevance and redundancy, new classification information relevance, and interaction feature dependency relevance and redundancy. Section 5 describes the NDCRFS algorithm and its implementation in detail. Section 6 validates the effectiveness of the NDCRFS algorithm through a comprehensive evaluation on 12 data sets from ASU and UCI, together with a discussion. Section 7 summarises the paper and points out the shortcomings and future development of the NDCRFS algorithm.

2. Mutual Information and Conditional Mutual Information

Let X, Y, and Z be three discrete variables [27], where X = {x_1, x_2, …, x_n}, Y = {y_1, y_2, …, y_m}, and Z = {z_1, z_2, …, z_l}. The mutual information between X and Y is defined as

I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log ( p(x, y) / (p(x) p(y)) ),     (1)

where p(x, y) is the joint distribution and p(x) and p(y) are the marginal distributions. Similarly, the conditional mutual information of X and Y given Z is defined as

I(X; Y | Z) = Σ_{x∈X} Σ_{y∈Y} Σ_{z∈Z} p(x, y, z) log ( p(x, y | z) / (p(x | z) p(y | z)) ).     (2)
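The two definitions above can be sketched as plug-in (empirical) estimators over discrete samples. This is an illustrative implementation, not the paper's code; the distributions are estimated simply by counting value combinations:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical I(X; Y) in bits from two aligned sequences of discrete values."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts for p(x, y)
    px, py = Counter(xs), Counter(ys)   # marginal counts for p(x), p(y)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z) in bits; uses p(x,y|z)/(p(x|z)p(y|z)) = p(x,y,z)p(z)/(p(x,z)p(y,z))."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz, pyz, pz = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum((c / n) * log2((c * pz[z]) / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), c in pxyz.items())

# X determines Y exactly here, so I(X; Y) = H(X) = 1 bit.
print(round(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]), 6))  # 1.0
# XOR case: X and Y look independent, but given Z they share 1 bit.
print(round(conditional_mutual_information([0, 1, 0, 1], [0, 1, 1, 0], [0, 0, 1, 1]), 6))  # 1.0
```

The XOR example illustrates why conditional mutual information matters for interaction features: I(X; Y) is 0, yet I(X; Y | Z) is 1 bit.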

3. Related Work

A large number of filter feature selection algorithms have been proposed. Most use forward search to find the optimal feature subset, evaluating the relevance between features and class labels and the redundancy between features under their respective criteria. Let F be the original feature set and S the best feature subset, S ⊂ F; J(·) denotes the evaluation criterion, f a candidate feature, and f_select a selected feature, with f ∈ F, f ∉ S, and f_select ∈ S. Lewis et al. proposed the MIM algorithm, which selects the k features most relevant to the class labels using the criterion

J_MIM(f) = I(f; C).     (3)

Lin et al. studied the limitations of the MIM algorithm and proposed the CIFE algorithm, evaluated with the criterion

J_CIFE(f) = I(f; C) − Σ_{f_select ∈ S} [ I(f; f_select) − I(f; f_select | C) ].     (4)

In J_CIFE, in addition to the redundancy I(f; f_select) between features, the redundancy within class labels, I(f; f_select | C), is measured. Yang et al. [28] proposed the Joint Mutual Information (JMI) algorithm, evaluated with the criterion

J_JMI(f) = I(f; C) − (1/|S|) Σ_{f_select ∈ S} [ I(f; f_select) − I(f; f_select | C) ],     (5)

where J_JMI(f) differs from J_CIFE(f) only by the weighting factor 1/|S|, with |S| the size of the selected feature subset. Fleuret et al. proposed the CMIM algorithm according to the maximum-minimum criterion:

J_CMIM(f) = min_{f_select ∈ S} I(f; C | f_select).     (6)

The difference between J_CMIM(f) and J_CIFE(f) is that J_CMIM(f) uses a nonlinear (minimum) criterion, while J_CIFE(f) uses a linear cumulative summation. Sun et al. sought a nonlinear criterion with low computational cost and proposed DWFS, whose dynamically weighted criterion W_DWFS(f) (equation (7)) treats I(f; C | f_select) > I(f; C) as indicating relevance and I(f; C | f_select) < I(f; C) as indicating redundancy. Hu et al.
[29] proposed the Dynamic Relevance and Joint Mutual Information Maximization (DRJMIM) algorithm (equation (8)) based on the DWFS and JMIM algorithms. It mainly addresses the definition of feature relevance, that is, how to distinguish the relevance of candidate features from the relevance of selected features; in its criterion, C_Ratio(f, f_select) = 2 × (I(f; C | f_select) − I(f; C)) / (H(f) + H(C)). Xiao et al. [30] argued that the redundancy between features can be used to further improve classification accuracy and on this basis proposed the Dynamic Weights Using Redundancy (DWUR) algorithm (equation (9)), whose criterion W_DWUR(f) has one additional factor, (1 − β × I(f; f_select)), compared with W_DWFS(f). In summary, analysis of equations (3) to (9) shows that the existing feature selection algorithms all have some of the following problems: (1) redundant and irrelevant features are not completely eliminated; (2) interdependent features are often removed as redundant features because they are highly correlated with each other, that is, these algorithms ignore judgements about the relevance and redundancy of interdependent features; (3) the dependency relevance and redundancy of interaction features could be judged by the difference between conditional mutual information and mutual information, but existing criteria do not exploit this. Therefore, the study of better feature selection criteria is an urgent problem to be solved.
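For concreteness, the classical criteria reviewed above (equations (3) to (6)) can be sketched as score functions over precomputed information quantities. The dictionary-based lookups and the toy values below are assumptions for illustration, not quantities from the paper:

```python
# Precomputed (estimated) information quantities, keyed by feature index:
#   I_fC[f]       = I(f; C)          relevance to the class label
#   I_ff[(f, g)]  = I(f; g)          between-feature redundancy
#   I_ffC[(f, g)] = I(f; g | C)      class-conditional redundancy
#   I_fCg[(f, g)] = I(f; C | g)      conditional relevance

def j_mim(f, S, I_fC, I_ff, I_ffC, I_fCg):
    # MIM (eq. 3): relevance to the class label only.
    return I_fC[f]

def j_cife(f, S, I_fC, I_ff, I_ffC, I_fCg):
    # CIFE (eq. 4): relevance minus cumulative redundancy, with the
    # class-conditional term I(f; f_select | C) added back.
    return I_fC[f] - sum(I_ff[f, fs] - I_ffC[f, fs] for fs in S)

def j_jmi(f, S, I_fC, I_ff, I_ffC, I_fCg):
    # JMI (eq. 5): CIFE's cumulative sum rescaled by 1/|S|.
    if not S:
        return I_fC[f]
    return I_fC[f] - sum(I_ff[f, fs] - I_ffC[f, fs] for fs in S) / len(S)

def j_cmim(f, S, I_fC, I_ff, I_ffC, I_fCg):
    # CMIM (eq. 6): nonlinear max-min criterion, keeping the worst-case
    # conditional relevance over the selected features.
    return min([I_fC[f]] + [I_fCg[f, fs] for fs in S])

# Tiny worked example: score candidate feature 0 against selected S = [1, 2].
S = [1, 2]
I_fC = {0: 0.50, 1: 0.40, 2: 0.30}
I_ff = {(0, 1): 0.20, (0, 2): 0.10}
I_ffC = {(0, 1): 0.10, (0, 2): 0.05}
I_fCg = {(0, 1): 0.30, (0, 2): 0.45}
print(round(j_mim(0, S, I_fC, I_ff, I_ffC, I_fCg), 6))   # 0.5
print(round(j_cife(0, S, I_fC, I_ff, I_ffC, I_fCg), 6))  # 0.5 - (0.1 + 0.05) = 0.35
print(round(j_cmim(0, S, I_fC, I_ff, I_ffC, I_fCg), 6))  # min(0.5, 0.3, 0.45) = 0.3
```

Note how the linear criteria (CIFE, JMI) accumulate penalties over all selected features, while CMIM keeps only the single worst-case conditional relevance, which is the nonlinearity the text contrasts.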

4. Evaluation Basis for Feature Selection

Bennasar et al. [31] argued that a feature f is useful if it is related to the class label C and useless otherwise. This assumption treats features as completely independent of one another. In reality, the correlation between feature f and label C varies as different features are added; there are interdependencies between features, and the relevance and redundancy between feature f and class label C change dynamically. In this section, the relevance and redundancy of independent and dependent features are analysed and discussed. Let f_i ∈ F − S, f_j ∈ F − S, f_i ≠ f_j.

4.1. Independent Feature Relevance and Redundancy Analysis

Mutual information I(f; C) is often used to assess the correlation between a feature f and the class label C: the stronger the correlation, the closer I(f; C) is to 1; conversely, the weaker the correlation, the closer it is to 0. If I(f_i; C) > I(f_j; C), the correlation between feature f_i and the class label C is stronger than that between f_j and C; if I(f_i; C) < I(f_j; C), it is weaker. Similarly, the mutual information I(f_i; f_j) is often used to assess the correlation between features f_i and f_j: if the correlation between f_i and f_j is high, the redundancy between the features is strong; conversely, it is weak. When I(f_i; f_j) = 0, the features f_i and f_j are independent of each other; when I(f_i; f_j) = 1, the features are redundant, and one of f_i and f_j should be deleted.

4.2. Relevance Analysis of New Classification Information

If I(f; C|f_select) > 0, the candidate feature f can provide new classification information beyond the selected feature f_select. If I(f; C|f_select) = 0, the candidate feature f cannot provide any useful classification information, and f and f_select are independent of each other. If I(f_i; C|f_select) > I(f_j; C|f_select), feature f_i provides more classification information than feature f_j.

4.3. Relevance and Redundancy of Interaction Feature Dependencies

According to the literature [6, 18, 29], if I(f; f_select|C) > I(f_select; C), the relevance of the selected feature f_select to the class label C becomes stronger after the candidate feature f is added, indicating that f can provide more classification information. If I(f; f_select|C) < I(f_select; C), the correlation between f_select and the class label C weakens after f is added, indicating that f and f_select are redundant with each other.

5. NDCRFS Algorithm Description and Pseudocode Implementation

The feature selection algorithm seeks a set of features that is closely related to the class labels. To measure this relevance more accurately, the NDCRFS algorithm measures the relevance and redundancy of features in three ways: (1) I(f; C) measures the relevance of feature f to the class label C; (2) I(f; f_select|C) measures the relevance of feature f to the selected feature f_select under the class label C; (3) I(f; f_select|C) − I(f_select; C) measures the interaction correlation and redundancy between f and f_select under the class label C. In the resulting criterion J_NDCRFS(f) (equation (10)), CU(f_select, f) = 2 / (H(f_select|C) + H(f|C)) is used as an information gain factor to normalize I(f; f_select|C) − I(f_select; C); here f denotes a candidate feature and f_select a selected feature, with f ∈ F, f ∉ S, and f_select ∈ S. According to equation (10), the NDCRFS algorithm first selects the least redundant features from J_NDCRFS(f) based on the correlation analysis between the selected features f_select and the candidate features f, and then iteratively adds the features most relevant to the optimal feature subset S. In Algorithm 1, line 1 initializes the set S and the counter k; lines 2 to 7 calculate the mutual information of each feature in the set F; lines 8 to 10 remove the selected optimal feature f from F and add it to S, at which point the candidate feature f becomes a selected feature f_select; lines 11 to 18 calculate the values of I(f; C|f_select), I(f; f_select|C), and I(f_select; C). The NDCRFS algorithm consists of two "for" loops and one "while" loop; its time complexity is therefore O(Tmn), where T is the number of selected features, n the number of all features, and m the number of samples, with T ≪ n.
The complexity of the NDCRFS algorithm is higher than that of the MIM, IWFS, CMIM, DWFS, and CIFE algorithms, mainly because NDCRFS must additionally compute CU(f_select, f), I(f; f_select|C) − I(f_select; C), and I(f; C|f_select); it is, however, lower than that of the IG-RFE algorithm.
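The forward search just described can be sketched as follows. Since the closed form of criterion (10) is not reproduced above, the scoring line below combines the three quantities the text names (I(f; C), the conditional terms computed in lines 11 to 18, and the CU-normalized interaction term) in one plausible additive way; treat it as an illustration of the search structure, not the paper's exact criterion:

```python
# Hedged sketch of the NDCRFS greedy forward search. All information
# quantities are passed in as callables, so the sketch is independent of
# any particular estimator:
#   mi_fC(f)             -> I(f; C)
#   cmi_fC_given(f, g)   -> I(f; C | g)
#   cmi_ff_given_C(f, g) -> I(f; g | C)
#   cond_ent_C(f)        -> H(f | C)

def ndcrfs_sketch(features, k, mi_fC, cmi_fC_given, cmi_ff_given_C, cond_ent_C):
    candidates = list(features)
    # Seed S with the feature most relevant to the class label (lines 1-10).
    S = [max(candidates, key=mi_fC)]
    candidates.remove(S[0])
    while candidates and len(S) < k:
        def score(f):
            # Illustrative additive combination of the three quantities;
            # the paper's equation (10) may combine them differently.
            total = mi_fC(f)
            for fs in S:
                cu = 2.0 / (cond_ent_C(fs) + cond_ent_C(f))  # gain factor CU
                total += cmi_fC_given(f, fs) + cu * (cmi_ff_given_C(f, fs) - mi_fC(fs))
            return total
        best = max(candidates, key=score)
        S.append(best)
        candidates.remove(best)
    return S

# Toy run with hand-made quantities: feature 0 is most relevant, feature 1
# adds new class information, feature 2 is largely redundant.
mi = {0: 0.9, 1: 0.5, 2: 0.1}.__getitem__
new_info = lambda f, fs: 0.2 if f == 1 else 0.0   # stand-in for I(f; C | fs)
inter = lambda f, fs: 0.0                          # stand-in for I(f; fs | C)
h_given_C = lambda f: 1.0                          # stand-in for H(f | C)
print(ndcrfs_sketch([0, 1, 2], 2, mi, new_info, inter, h_given_C))  # [0, 1]
```

The O(Tmn) complexity quoted above is visible in the structure: the while loop runs T times, each scoring pass touches all remaining features, and each information estimate costs O(m) over the samples.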

6. Experiments and Results

6.1. Introduction to the Data Set

To verify the effectiveness of the NDCRFS algorithm, a total of 12 data sets were used in the experiments, selected from the internationally known UCI [3] and ASU [14] repositories and described in detail in Table 1. From Table 1, the sample sizes range from 60 to 7494, the feature counts from 16 to 19,993, and the number of class labels from 2 to 20. The data sets cover biomedical data (Lymphography, Dermatology, Cardiotocography, Lung, Lymphoma, Nci9, SMK-CAN-187, and Carcinom), face image data (COIL20 and Pixraw10P), and text data (PCMAC and Pendigits).
Table 1

Experimental data set description.

No. | Data set         | Samples | Features | Categories | Data source
1   | Lymphography     | 148     | 18       | 8          | UCI
2   | Dermatology      | 358     | 34       | 6          | UCI
3   | Cardiotocography | 2126    | 41       | 3          | UCI
4   | Pendigits        | 7494    | 16       | 10         | UCI
5   | Lung             | 203     | 3312     | 5          | ASU
6   | Carcinom         | 174     | 9182     | 11         | ASU
7   | Nci9             | 60      | 9712     | 9          | ASU
8   | PCMAC            | 1943    | 3289     | 2          | ASU
9   | Pixraw10P        | 100     | 10,000   | 10         | ASU
10  | SMK-CAN-187      | 187     | 19,993   | 2          | ASU
11  | Lymphoma         | 96      | 4026     | 9          | ASU
12  | COIL20           | 1440    | 1024     | 20         | ASU

6.2. Experimental Environment Setup

NDCRFS was compared with six feature selection algorithms (MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE) to verify its effectiveness. The experiments ran KNN, SVM, and C4.5 on the same feature subsets. The number of selected features K was set to 10 for Lymphography and Pendigits and to 30 for the remaining data sets. The experimental environment was an Intel i7 processor with 8 GB RAM, and the simulation software was Python 2.7. Five-fold cross-validation was used to obtain the average classification accuracy of each classifier for each feature selection algorithm. Incomplete samples were deleted, and, following Kuarga [32], the class attribute dependence maximization method was used to discretize continuous data.
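The evaluation protocol just described (a fixed feature subset scored by k-fold cross-validation) can be sketched in a dependency-free form. The tiny 1-NN classifier and the toy data below are stand-ins for illustration, not the paper's KNN/C4.5/SVM setup:

```python
# Minimal sketch of 5-fold cross-validation on a fixed feature subset.

def kfold_indices(n, k=5):
    # Deterministic interleaved folds (no shuffling, for reproducibility).
    return [list(range(i, n, k)) for i in range(k)]

def one_nn_predict(train_X, train_y, x):
    # Nearest neighbour by squared Euclidean distance.
    sq = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(zip(train_X, train_y), key=lambda p: sq(p[0], x))[1]

def cv_accuracy(X, y, subset, k=5):
    # Restrict every sample to the chosen feature subset first.
    Xs = [[row[j] for j in subset] for row in X]
    accs = []
    for fold in kfold_indices(len(Xs), k):
        test = set(fold)
        tr_X = [Xs[i] for i in range(len(Xs)) if i not in test]
        tr_y = [y[i] for i in range(len(Xs)) if i not in test]
        hits = sum(one_nn_predict(tr_X, tr_y, Xs[i]) == y[i] for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / len(accs)

# Toy data: feature 0 separates the classes perfectly; feature 1 is noise.
X = [[0, 5], [10, 1], [0, 2], [10, 7], [0, 3],
     [10, 9], [0, 8], [10, 4], [0, 6], [10, 0]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(cv_accuracy(X, y, subset=[0]))  # 1.0
```

In the paper's setup, the selected feature indices produced by each feature selection algorithm would play the role of `subset`, and the accuracies over the 5 folds are averaged exactly as here.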

6.3. Discussion and Analysis of Experimental Results

6.3.1. Comparison of Algorithm Variability

This paper measures the difference between two selected feature subsets with the Jaccard method, where S1 ⊂ F, S2 ⊂ F, S1 ≠ S2; S1 is the feature subset selected by the NDCRFS algorithm and S2 the subset selected by another feature selection algorithm. The specific difference measure is given in equation (11); it equals 0 for identical subsets and grows toward 1 as the subsets share fewer features. As can be seen in Table 2, the mean differences between NDCRFS and MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE are 0.355, 0.389, 0.261, 0.222, 0.286, and 0.166, respectively, indicating that, in how it ranks features, the NDCRFS algorithm differs significantly from the other feature selection algorithms.
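A common instantiation of such a subset-difference measure is the Jaccard distance on feature-index sets; note this is an assumption for illustration, since the paper's exact normalization in equation (11) is not reproduced here:

```python
def jaccard_difference(s1, s2):
    """Jaccard distance between two feature subsets:
    1 - |S1 ∩ S2| / |S1 ∪ S2|; 0 means identical subsets, 1 means disjoint."""
    s1, s2 = set(s1), set(s2)
    return 1 - len(s1 & s2) / len(s1 | s2)

print(jaccard_difference({1, 2, 3}, {2, 3, 4}))  # 0.5
```

Any measure of this family is symmetric in S1 and S2 and bounded in [0, 1], which is what makes the per-data-set averages in Table 2 comparable across algorithm pairs.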
Table 2

The difference between NDCRFS and the comparison algorithms.

No.     | MIM   | IG-RFE | IWFS  | CMIM  | DWFS  | CIFE
1       | 0.667 | 0.818  | 0.333 | 0.333 | 0.429 | 0.176
2       | 0.935 | 0.935  | 0.765 | 0.765 | 0.818 | 0.765
3       | 0.538 | 0.579  | 0.5   | 0.463 | 0.5   | 0.5
4       | 0.818 | 0.818  | 0.333 | 0.333 | 0.25  | 0.25
5       | 0.017 | 0.017  | 0.017 | 0.0   | 0.132 | 0.0
6       | 0.0   | 0.017  | 0.017 | 0.0   | 0.034 | 0.091
7       | 0.579 | 0.622  | 0.053 | 0.224 | 0.017 | 0.034
8       | 0.429 | 0.5    | 0.224 | 0.395 | 0.25  | 0.091
9       | 0.034 | 0.017  | 0.017 | 0.017 | 0.091 | 0.017
10      | 0.091 | 0.017  | 0.818 | 0.0   | 0.765 | 0.0
11      | 0.132 | 0.132  | 0.034 | 0.132 | 0.071 | 0.071
12      | 0.017 | 0.2    | 0.017 | 0.0   | 0.071 | 0.0
Average | 0.355 | 0.389  | 0.261 | 0.222 | 0.286 | 0.166

6.4. Comparison of Classification Accuracy

Tables 3 to 5 show the average classification accuracy on the 12 data sets using KNN, C4.5, and SVM. The highest accuracy among the feature selection algorithms for each data set is highlighted. Tables 3-5 show that the NDCRFS algorithm achieved the highest average classification accuracy of 88.734%, 81.574%, and 79.213%, respectively. "Wins/Ties/Losses" gives the number of wins, ties, and losses of NDCRFS against MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE.
Table 3

Average classification accuracy (%) of KNN classifier.

Data set              | NDCRFS  | MIM    | IG-RFE | IWFS   | CMIM   | DWFS   | CIFE
Lymphography          | *38.3   | 34.78  | 35.59  | 35.59  | 34.88  | 35.28  | 34.78
Dermatology           | *97.769 | 92.164 | 92.164 | 88.512 | 90.79  | 96.68  | 87.139
Cardiotocography      | *98.589 | 98.401 | 98.401 | 98.401 | 98.401 | 98.589 | 98.401
Pendigits             | *97.919 | 97.145 | 97.145 | 97.238 | 97.505 | 98.159 | 97.625
Lung                  | *88.636 | 88.064 | 83.712 | 76.391 | 81.678 | 87.681 | 74.922
Carcinom              | *85.48  | 68.037 | 32.255 | 60.035 | 65.84  | 67.026 | 31.952
Nci9                  | *76.69  | 75.44  | 74.012 | 69.024 | 76.119 | 48.429 | 57.25
PCMAC                 | *87.648 | 85.538 | 86.155 | 82.348 | 84.765 | 85.743 | 78.952
Pixraw10P             | *93.0   | 88.0   | 91.0   | 88.0   | 92.0   | 88.0   | 92.0
SMK-CAN-187           | *70.014 | 68.393 | 69.004 | 70.0   | 65.747 | 68.421 | 58.876
Lymphoma              | *95.667 | 84.722 | 84.75  | 69.806 | 90.083 | 72.056 | 82.833
COIL20                | *84.662 | 80.733 | 79.743 | 71.667 | 77.114 | 72.024 | 60.652
Average accuracy rate | *88.734 | 84.24  | 76.994 | 75.584 | 83.64  | 76.507 | 71.28
Wins/Ties/Losses      |         | 12/0/0 | 12/0/0 | 12/0/0 | 12/0/0 | 12/0/0 | 12/0/0

The "Average accuracy rate" row gives each algorithm's average over all data sets; * marks the reported best value for each data set.

Table 5

Average classification accuracy (%) of SVM classifier.

Data set              | NDCRFS  | MIM     | IG-RFE | IWFS    | CMIM    | DWFS   | CIFE
Lymphography          | *45.147 | 42.499  | 43.329 | 41.45   | 42.825  | 43.329 | 42.825
Dermatology           | *98.317 | 93.777  | 93.824 | 93.283  | 94.079  | 97.761 | 93.53
Cardiotocography      | *98.448 | 98.401  | 98.401 | 98.401  | 98.401  | 98.401 | 98.401
Pendigits             | *63.331 | 63.331  | 63.331 | 55.35   | 59.741  | 56.979 | 57.219
Lung                  | 84.788  | 77.89   | 78.391 | 77.891  | *86.203 | 85.311 | 77.402
Carcinom              | *87.964 | 50.998  | 25.028 | 50.447  | 51.545  | 55.773 | 20.915
Nci9                  | 76.512  | *78.119 | 76.69  | 62.595  | 74.429  | 57.929 | 58.821
PCMAC                 | *85.589 | 85.588  | 85.486 | 82.194  | 85.333  | 85.382 | 80.394
Pixraw10P             | *92.0   | 91.0    | 91.0   | 91.0    | 91.0    | 91.0   | 91.0
SMK-CAN-187           | 70.982  | 70.569  | 62.532 | *71.593 | 65.32   | 71.053 | 57.255
Lymphoma              | 85.5    | 81.278  | 79.611 | 67.056  | 81.972  | 72.194 | *86.194
COIL20                | *68.352 | 63.886  | 62.067 | 52.824  | 55.933  | 48.638 | 40.905
Average accuracy rate | *79.213 | 73.363  | 71.641 | 70.226  | 73.898  | 71.979 | 65.333
Wins/Ties/Losses      |         | 10/1/1  | 12/0/0 | 12/0/0  | 11/0/1  | 10/0/2 | 11/0/1

The "Average accuracy rate" row gives each algorithm's average over all data sets; * marks the reported best value for each data set.

Table 4

Average classification accuracy (%) of C4.5 classifier.

Data set              | NDCRFS  | MIM    | IG-RFE | IWFS   | CMIM    | DWFS    | CIFE
Lymphography          | *43.935 | 41.893 | 41.473 | 41.347 | 42.322  | 43.002  | 42.322
Dermatology           | *95.021 | 94.434 | 94.149 | 94.187 | 95.021  | 93.337  | 94.727
Cardiotocography      | *98.401 | 98.401 | 98.401 | 98.401 | 98.401  | 98.401  | 98.401
Pendigits             | *94.569 | 94.343 | 94.196 | 93.782 | 93.768  | 94.222  | 93.675
Lung                  | *87.774 | 79.918 | 85.113 | 75.964 | 83.842  | 84.157  | 77.236
Carcinom              | *70.604 | 54.586 | 25.79  | 48.292 | 56.822  | 53.999  | 24.3
Nci9                  | 69.929  | 61.012 | 65.095 | 60.667 | *71.083 | 57.929  | 60.226
PCMAC                 | *87.906 | 86.464 | 86.515 | 82.502 | 85.897  | 86.669  | 80.805
Pixraw10P             | *99.0   | 97.0   | 96.0   | 92.0   | 95.0    | 92.0    | 95.0
SMK-CAN-187           | 64.125  | 62.006 | 61.494 | 63.656 | 62.077  | *65.747 | 57.852
Lymphoma              | *87.75  | 79.75  | 80.0   | 69.528 | 82.806  | 69.417  | 86.917
COIL20                | *79.876 | 67.614 | 72.762 | 63.186 | 62.895  | 70.629  | 58.295
Average accuracy rate | *81.574 | 76.452 | 75.082 | 73.626 | 77.495  | 75.792  | 72.48
Wins/Ties/Losses      |         | 11/1/0 | 11/1/0 | 11/1/0 | 10/1/1  | 10/1/1  | 11/1/0

The "Average accuracy rate" row gives each algorithm's average over all data sets; * marks the reported best value for each data set.
From Table 3, the NDCRFS algorithm outperforms the MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE algorithms on 12, 12, 12, 12, 12, and 12 data sets, respectively. In Figure 1(a), the classification accuracy of the NDCRFS algorithm is the highest among the seven algorithms (97.769% with 23 selected features), exceeding the six comparison algorithms by 5.605%, 5.605%, 9.257%, 6.979%, 1.089%, and 10.63%, respectively. In Figure 1(b), NDCRFS attains the highest accuracy (98.589% with 5 selected features), exceeding the comparison algorithms by 0.188%, 0.188%, 0.188%, 0.188%, 0.0%, and 0.188%, respectively. In Figure 1(c), NDCRFS attains the highest accuracy (76.69% with 28 selected features), exceeding the comparison algorithms by 1.25%, 2.678%, 7.666%, 0.571%, 28.261%, and 19.44%, respectively. In Figure 1(d), NDCRFS attains the highest accuracy (70.014% with 15 selected features), exceeding the comparison algorithms by 1.621%, 1.01%, 0.014%, 4.267%, 1.593%, and 11.138%, respectively.
Figure 1

Comparison of accuracy in KNN classifier.

From Table 4, the NDCRFS algorithm outperforms the MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE algorithms on 11, 11, 11, 10, 10, and 11 data sets, respectively. In Figure 2(a), the classification accuracy of the NDCRFS algorithm is the highest among the seven algorithms (43.935% with 7 selected features), exceeding the six comparison algorithms by 2.042%, 2.462%, 2.588%, 1.613%, 0.933%, and 1.613%, respectively. In Figure 2(b), NDCRFS attains the highest accuracy (94.569% with 10 selected features), exceeding the comparison algorithms by 0.226%, 0.373%, 0.787%, 0.801%, 0.347%, and 0.894%, respectively. In Figure 2(c), NDCRFS attains the highest accuracy (87.774% with 30 selected features), exceeding the comparison algorithms by 7.856%, 2.661%, 11.81%, 3.932%, 3.617%, and 10.538%, respectively. In Figure 2(d), NDCRFS attains the highest accuracy (87.75% with 4 selected features), exceeding the comparison algorithms by 8.0%, 7.75%, 18.222%, 4.944%, 18.333%, and 0.833%, respectively.
Figure 2

Comparison of accuracy in C4.5 classifier.

From Table 5, the NDCRFS algorithm outperforms the MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE algorithms on 10, 12, 12, 11, 10, and 11 data sets, respectively. In Figure 3(a), the classification accuracy of the NDCRFS algorithm is the highest among the seven algorithms (87.964% with 28 selected features), exceeding the six comparison algorithms by 36.966%, 62.936%, 37.517%, 36.419%, 32.191%, and 67.049%, respectively. In Figure 3(b), NDCRFS attains the highest accuracy (85.589% with 20 selected features), exceeding the comparison algorithms by 0.001%, 0.102%, 3.394%, 0.255%, 0.206%, and 5.194%, respectively. In Figure 3(c), NDCRFS attains the highest accuracy (92% with 5 selected features), exceeding each comparison algorithm by 1%. In Figure 3(d), NDCRFS attains the highest accuracy (68.352% with 24 selected features), exceeding the comparison algorithms by 4.466%, 6.285%, 15.528%, 12.419%, 19.714%, and 27.447%, respectively.
Figure 3

Comparison of accuracy in SVM classifier.

6.5. Runtime Analysis of the Algorithm

The running time of a feature selection algorithm is also one of the criteria for measuring its practicality. The running times of the NDCRFS, MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE algorithms are therefore compared. Table 6 reports the total runtime each algorithm takes to rank all features of the 12 data sets. The NDCRFS algorithm's runtimes are well within acceptable limits.
Table 6

The runtimes of different feature selection algorithms.

All values are runtimes in seconds.

Data set         | NDCRFS   | MIM     | IG-RFE   | IWFS    | CMIM    | DWFS    | CIFE
Lymphography     | 0.141    | 0.089   | 0.171    | 0.078   | 0.062   | 0.078   | 0.09
Dermatology      | 1.373    | 0.712   | 1.576    | 0.811   | 0.671   | 0.843   | 0.824
Cardiotocography | 9.952    | 5.976   | 12.215   | 6.303   | 5.523   | 6.38    | 5.599
Pendigits        | 5.725    | 4.177   | 6.568    | 3.588   | 3.198   | 3.807   | 3.878
Lung             | 216.292  | 155.033 | 322.127  | 134.73  | 127.425 | 166.766 | 131.861
Carcinom         | 629.731  | 577.148 | 744.337  | 351.026 | 315.636 | 407.857 | 502.515
Nci9             | 149.744  | 130.371 | 167.166  | 100.876 | 81.922  | 104.424 | 133.998
PCMAC            | 1206.53  | 1130.49 | 1689.445 | 878.968 | 615.348 | 836.969 | 1133.675
Pixraw10P        | 335.022  | 242.977 | 415.235  | 216.42  | 188.65  | 171.897 | 259.263
SMK-CAN-187      | 1649.124 | 731.813 | 1905.724 | 812.859 | 727.913 | 995.003 | 749.035
Lymphoma         | 102.755  | 45.368  | 113.09   | 96.2    | 43.591  | 94.084  | 248.495
COIL20           | 414.124  | 307.934 | 570.717  | 290.075 | 273.888 | 264.382 | 248.495
Average          | 393.376  | 277.674 | 495.698  | 240.995 | 198.652 | 254.374 | 284.811
The 5-fold cross-validation experiments on the ASU and UCI data sets show that the proposed NDCRFS algorithm can select feature subsets with better classification performance, further improving the discriminative ability of the data after dimensionality reduction.

7. Conclusion

Feature selection is an important tool in the data preprocessing phase for high-dimensional small sample data. Its main objective is to select an optimal feature subset that achieves high classification accuracy. This paper therefore proposed a nonlinear dynamic conditional relevance feature selection algorithm. The algorithm first uses mutual information, conditional mutual information, and interaction mutual information to identify the relevance and redundancy of independent and dependent features. Second, the "max-min" principle is used to iteratively eliminate redundant and irrelevant features from the original feature set. Finally, the effectiveness of the algorithm is verified through experiments, which demonstrate that the NDCRFS algorithm significantly outperforms the MIM, IG-RFE, IWFS, CMIM, DWFS, and CIFE feature selection algorithms on most of the data sets. However, the NDCRFS algorithm still selects unsatisfactory feature subsets on some data sets. In the future, NDCRFS will need to be optimized and validated in further application fields.