
Unsupervised feature selection based on incremental forward iterative Laplacian score.

Jiefang Jiang1,2, Xianyong Zhang1,2,3, Jilin Yang2,4.   

Abstract

Feature selection facilitates intelligent information processing, and the unsupervised learning of feature selection has become important. In terms of unsupervised feature selection, the Laplacian score (LS) provides a powerful measurement and optimization method, and good performance has been achieved using the recent forward iterative Laplacian score (FILS) algorithm. However, there is still room for advancement. The aim of this paper is to improve the FILS algorithm, and thus, feature significance (SIG) is mainly introduced to develop a high-quality selection method, i.e., the incremental forward iterative Laplacian score (IFILS) algorithm. Based on the modified LS, the metric difference in the incremental feature process motivates SIG. Therefore, SIG offers a dynamic characterization by considering initial and terminal states, and it promotes the current FILS measurement on only the terminal state. Then, both the modified LS and integrated SIG acquire granulation nonmonotonicity and uncertainty, especially on incremental feature chains, and the corresponding verification is achieved by completing examples and experiments. Furthermore, a SIG-based incremental criterion of minimum selection is designed to choose optimization features, and thus, the IFILS algorithm is naturally formulated to implement unsupervised feature selection. Finally, an in-depth comparison of the IFILS algorithm with the FILS algorithm is achieved using data experiments on multiple datasets, including a nominal dataset of COVID-19 surveillance. As validated by the experimental results, the IFILS algorithm outperforms the FILS algorithm and achieves better classification performance.


Keywords:  Feature selection; Feature significance; Forward iterative Laplacian score; Granulation nonmonotonicity and uncertainty; Incremental forward iterative Laplacian score; Unsupervised learning

Year:  2022        PMID: 36160366      PMCID: PMC9484723          DOI: 10.1007/s10462-022-10274-6

Source DB:  PubMed          Journal:  Artif Intell Rev        ISSN: 0269-2821            Impact factor:   9.588


Introduction

Machine learning facilitates the research and application of artificial intelligence (Hai et al 2021; Lim et al 2022; Tham et al 2022; Van et al 2022), and feature selection (FS) is a basic topic in machine learning (Cai et al 2018; Tubishat et al 2021). The aim of FS is to extract valid features and reduce dimensionality, and it has been deeply researched in data mining (El-Hasnony et al 2020; Nguyen et al 2020; Sun et al 2021), pattern recognition (Gunal and Edizkan 2008), etc. Moreover, FS has practical applications (Kou et al 2020; Remeseiro and Bolon-Canedo 2019). FS is usually classified into three cases based on the availability of decision category information, i.e., supervised FS, semisupervised FS, and unsupervised FS. The supervised and semisupervised cases require objects with completely labeled and partially labeled categories, respectively. Currently, unlabeled data exist universally across diverse fields; therefore, unsupervised FS works well for anomaly detection (Amarbayasgalan et al 2020; Yuan et al 2022), clustering learning (Chen et al 2020; Zhou et al 2020), medical analysis (García-Díaz et al 2020; Rostami et al 2020), etc. Unsupervised FS can be divided into three main methods: wrapper, filter, and hybrid (Alelyani et al 2018; Solorio-Fernández et al 2020). In terms of the FS strategy, wrapper methods utilize clustering algorithms (Breaban and Luchian 2011) but filter methods do not (Dadaneh et al 2016; Luo et al 2017), while hybrid approaches reach a compromise by combining the above two methods (Solorio-Fernández et al 2016). Filter unsupervised FS depends on the intrinsic properties of data and thus exhibits good characteristics and applicability, and basic filter learning is motivated by the Laplacian score (LS). He et al (2005) proposed the LS to evaluate feature importance, and thus, FS was implemented by the LS algorithm.
Then, Zhao et al (2008) used the LS for face recognition comparison in semisupervised FS, and Huang et al (2018) presented a manifold-based constraint LS for multilabel learning. Moreover, Zhu et al (2012) proposed an LS-based FS algorithm called the iterative LS (ILS) algorithm, in which the nearest neighbor graph is iteratively updated by discarding the least relevant feature; thus, better experimental results were achieved using the ILS algorithm than with the LS algorithm. To enhance both the LS and ILS algorithms, Pang and Zhang (2020a) recently proposed an improved FS algorithm, i.e., the forward iterative LS (FILS) algorithm. More specifically, the LS is modified by both parameter concretization (on the K nearest neighbors (KNN)) and feature subset assessment, and then, the FILS algorithm introduces a forward strategy and a selective criterion by optimally extracting the most important feature in each iteration. Meanwhile, Pang and Zhang (2020b) introduced the modified LS to linearly combine the neighborhood discrimination index, and the corresponding semisupervised FS was further implemented. This paper refocuses on the FILS algorithm (Pang and Zhang 2020a), which utilizes the modified LS assessment and forward iteration strategy to achieve better performance than the LS and ILS algorithms; nevertheless, this algorithm still has room for improvement. In the FILS algorithm, the feature correlation and selection criterion require uncertainty measurements on the subset transfer from the initial subset to its extension, and this core process actually adheres to dynamic learning with a feature increment; however, only the measurement of the terminal state subset is of concern in the FILS algorithm, whereas the initial state A, which undergoes dynamic change, is also worth considering.
Thus, in this paper, feature significance (SIG) is introduced through the difference between the two process states, and the relevant uncertainty characterization can more systematically evaluate and effectively promote forward iterations with dynamics. The in-depth utilization of SIG has already been applied in the optimal selection and heuristic reduction of rough set reasoning (Wang et al 2017; Xu et al 2021; Yuan et al 2021; Zhang et al 2021; Zhang and Yao 2022). In this paper, the modified LS and forward iteration are still utilized; however, the SIG technology is added to implement (filter) unsupervised FS. The relevant algorithm is referred to as the incremental forward iterative LS (IFILS) algorithm, and it truly improves the FILS algorithm, which itself improves the LS and ILS algorithms. Thus, better learning performance on classification recognition is achieved using the IFILS algorithm, as validated by data experiments. Moreover, both the basic modified LS and the developmental SIG acquire in-depth measurement properties, mainly granulation nonmonotonicity and uncertainty. In summary, this paper implements unsupervised feature selection with the purpose of improving the current FILS algorithm (Pang and Zhang 2020a); regarding its novelty, SIG and its granulation nonmonotonicity and uncertainty are deeply mined, and thus an improved algorithm, i.e., the IFILS algorithm, is designed to achieve better classification performance.

The remainder of this paper is organized as follows. In Sect. 2, unsupervised FS based on FILS is reviewed. In Sect. 3, unsupervised FS based on the IFILS algorithm is established. In Sect. 4, an illustrative example is provided for the relevant measures and algorithms. In Sect. 5, comparative experiments on the FILS and IFILS algorithms are performed. Finally, in Sect. 6, this paper is concluded.

Unsupervised feature selection based on forward iterative Laplacian score (FILS)

The LS effectively evaluates feature description abilities; therefore, it facilitates FS, especially in unsupervised learning. In LS-based FS, if samples with similar distances in the original feature space maintain their neighbor relationships in a single feature dimension, then that feature is considered to maintain the local structures of the data; that is, FS utilizes the nearest neighbor graph to explore the local structures. Of course, different LS forms and treatments may cause different algorithms and effects of unsupervised FS. Next, the relevant FILS algorithm is reviewed based on the basic LS and the improved LS (Pang and Zhang 2020a). Consider a decision information system whose universe X carries u samples, whose condition set F contains n features, and whose decision set D supports only the eventual identification of learning effects; each entry of the data matrix denotes the value of the r-th feature on the i-th sample. By this separation, the conditional part of the information system is mainly used for unsupervised FS.

Definition 1

(LS (He et al 2005)) The information system induces a nearest neighbor graph: if a sample is in the KNN of another, then the corresponding nodes i and j are connected by an edge. By sample circulation in universe X, the weight matrix S is obtained as follows:

S_ij = exp(−‖x_i − x_j‖² / t) if x_i ∈ KNN(x_j) or x_j ∈ KNN(x_i), and S_ij = 0 otherwise.

Here, t is an adjustable parameter, ‖x_i − x_j‖ is the Euclidean distance between x_i and x_j, and KNN(x_j) denotes the KNN set of x_j. Now, introduce the matrix expressions

D = diag(S·1), L = D − S, where 1 = (1, …, 1)^T.

Then, the LS of feature f_r is defined as

LS(f_r) = (f̃_r^T L f̃_r) / (f̃_r^T D f̃_r), where f̃_r = f_r − (f_r^T D 1 / 1^T D 1)·1.

The basic LS can be used to guide unsupervised FS, as realized by the LS algorithm (He et al 2005), in which the importance of a feature is evaluated based on its locality preservation ability. Furthermore, the ILS algorithm iteratively updates the nearest neighbor graph and removes poor-performing features (Zhu et al 2012); therefore, it generally outperforms the LS algorithm. In contrast, selecting higher-ranking features forward becomes a direct and effective strategy; for example, better effects are achieved using the FILS algorithm with the LS improvement (Pang and Zhang 2020a).
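As a concrete sketch of Definition 1, the basic LS can be computed with NumPy. The implementation below follows the standard KNN-graph construction; k and t are free parameters here, and the code is illustrative rather than taken from the paper.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Basic Laplacian score of each feature (He et al. 2005).

    X : (u, n) data matrix.  Lower scores indicate features that better
    preserve the local manifold structure of the data.
    """
    u = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # symmetric KNN graph: i and j are connected if either is among
    # the other's k nearest neighbours (excluding the sample itself)
    order = np.argsort(d2, axis=1)
    knn = np.zeros((u, u), dtype=bool)
    for i in range(u):
        knn[i, order[i, 1:k + 1]] = True
    conn = knn | knn.T
    W = np.where(conn, np.exp(-d2 / t), 0.0)   # heat-kernel weights
    D = W.sum(axis=1)                          # degree vector (diag of D)
    L = np.diag(D) - W                         # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f_t = f - (f @ D) / D.sum()            # remove the weighted mean
        denom = f_t @ (D * f_t)
        scores[r] = (f_t @ L @ f_t) / denom if denom > 0 else np.inf
    return scores
```

A feature that keeps nearby samples nearby (e.g., one separating two tight clusters) receives a lower score than a pure-noise feature, matching the locality preservation interpretation above.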

Definition 2

(FILS (Pang and Zhang 2020a)) The original weight matrix in Eq. (1) is redefined so that parameter t is assigned as a statistic of the distances on KNN; here, the KNN sets of the connected samples determine the weights, and the symmetric case is treated similarly. By Eqs. (2)–(4), the weight matrix is updated to generate the corresponding degree and Laplacian matrices, where the sample matrix is restricted to the feature subset A. Thus, the modified LS of feature subset A, written J_FILS(A) in Eq. (6), is defined via the trace, i.e., the sum of the diagonal matrix elements. The selection criterion (for picking the most important feature) is based on the minimization in Eq. (7).

In Definition 2, the weight matrix is first defined by the modified LS through the assignment of parameter t to the distance production on KNN. This strategy introduces relative measurements and statistical information on KNN; therefore, it largely eliminates similar elements to accurately reflect the intrinsic relationships of the elements. Moreover, feature subsets rather than single features are of concern in the variant LS; therefore, it more completely contains local structures and actual interactions. Briefly, the new LS fully considers the mutual influences between elements and features to improve the previous LS, so it is beneficial in unsupervised FS. As a result, the corresponding feature indicator and optimization extraction are presented in Eq. (7) of Definition 2, and they are mainly established by virtue of minimality and inversion. They naturally lead to the FILS algorithm and its validity. Note that the abbreviation FILS is used to represent both the relevant mechanism and the feature extraction algorithm (Pang and Zhang 2020a).

Unsupervised feature selection based on incremental forward iterative Laplacian score (IFILS)

As observed, the FILS algorithm still has room for improvement with respect to unsupervised feature selection. Although each dynamic iteration necessarily contains two states, the beginning and ending states, this algorithm mainly addresses uncertainty measurements for the ending state of an iterative process. Therefore, leaving the beginning state undescribed causes both a basic defect of measurement incompleteness and a further possibility of FS improvement. In this section, incremental technology and complete measurements are introduced to address the underlying measurement issue in the FILS algorithm, and thus, an improved FS algorithm (i.e., the IFILS algorithm) is proposed.

Feature significance in the incremental process and its granulation nonmonotonicity/uncertainty

Herein, a process measure of feature importance is first mined based on the modified LS, and its granulation nonmonotonicity/uncertainty is further revealed. Feature selection with forward iterations concerns the core process from the initial subset A to its one-feature extension; therefore, the additional feature should be optimally selected to efficiently increase A. The procedure contains two states: the initial set A and its final extension. In the FILS algorithm, the evaluation measurement in Eq. (7) considers only the final state. Based on the broad treatment of the global subset, it undoubtedly outperforms the single evaluation of the local feature; however, this approach represents only the structural absoluteness of the process termination, so measurement weakness easily occurs. The initial state A inevitably exhibits subsequent iterative renewals; thus, the relevant relativity measurement should be added for integrated reinforcement, based on the concept of double quantification and its advantages (Zhang and Gou 2022). Therefore, a better structural mechanism that addresses both the initial and final states is worth introducing to realize systematic extraction and comprehensive assessment. Herein, the process difference is considered, and thus, the process is integrally estimated by the difference of the modified LS between the two states or its related variants. The contrast difference or difference operation has been extensively utilized in multiple disciplines, such as physics and mathematics, and the resulting feature significance has already benefited optimization selection and heuristic search in rough set analysis (Wang et al 2017; Zhang et al 2021; Zhang and Yao 2022).

Definition 3

(SIG) In an information system, the significance (SIG) of an additional feature f on feature subset A is defined as

SIG_A(f) = J_FILS(A) − J_FILS(A ∪ {f}),

and it is a new measure built from the basic modified LS and the related difference operation. SIG_A(f) utilizes the difference of the modified LS to express a type of feature importance; therefore, a better assessment of the increment process with dynamics is obtained. The difference characteristic of SIG implies a general increment, and it has become a common and effective form. In later studies, SIG is utilized to optimize measurements and select features.

As a basis and an extension, granulation nonmonotonicity/uncertainty, which is a fundamental property regarding knowledge cognition (Xu et al 2021; Zhang and Jiang 2022), is particularly discussed for the SIG metric. In fact, knowledge-based uncertainty is an inherent quality of intelligent information systems, and its relevant granulation monotonicity and nonmonotonicity have gained extensive attention in feature selection and attribute reduction (Stańczyk and Zielosko 2020; Zhang and Yao 2022; Gao et al 2019; Wang et al 2015, 2019). An inclusion relationship of feature subsets is theoretically required to discuss the granulation uncertainty and monotonicity/nonmonotonicity of SIG. In practice, researchers may focus more on a usual granulation chain with feature additions, i.e.,

A_1 ⊂ A_2 ⊂ ⋯ ⊂ A_n (where A_{r+1} = A_r ∪ {f_{r+1}}),

which concerns the incremental processes from A_r to A_{r+1}. This totally ordered sequence from feature granulation can be effectively used to probe hierarchical knowledge structures and corresponding measurement manifestations.
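As a minimal sketch, the difference form of Definition 3 can be expressed directly. Here `score_fn` is a hypothetical stand-in for the modified LS J_FILS (any map from feature subsets to scores); the sign convention is the one consistent with the values reported in Tables 3 and 6.

```python
def sig(score_fn, A, f):
    """Feature significance of Definition 3: the change of the modified LS
    when feature f is added to subset A, under the sign convention
    SIG_A(f) = J_FILS(A) - J_FILS(A | {f}).
    `score_fn` is a placeholder for the modified LS J_FILS."""
    A = frozenset(A)
    return score_fn(A) - score_fn(A | {f})
```

For example, a drop of the subset score from 0.1013 to 0.0488 (as between A_3 and A_4 in Table 3) yields a positive significance of +0.0525 for the added feature.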

Theorem 1

In an information system containing feature subsets related by inclusion, a fixed size order of the modified LS values does not necessarily hold, and the opposite inequality may partially or completely emerge. In other words, the modified LS has granulation nonmonotonicity and uncertainty.

Corollary 1

For the granulation chain (Eq. (10)), neither J_FILS(A_r) ≤ J_FILS(A_{r+1}) nor J_FILS(A_r) ≥ J_FILS(A_{r+1}) necessarily holds. That is, all modified LS values on the feature-incremental chain, i.e., J_FILS(A_1), J_FILS(A_2), …, J_FILS(A_n), have nonmonotonic and uncertain size changes.

Theorem 2

For the granulation chain (Eq. (10)), let A_{r+1} = A_r ∪ {f_{r+1}} so that f_{r+1} ∈ A_{r+1} \ A_r. Thus, Eqs. (9), (10) and (11) lead to the chain values SIG_{A_r}(f_{r+1}) = J_FILS(A_r) − J_FILS(A_{r+1}) for r = 1, …, n−1. Based on Eq. (13), neither SIG_{A_r}(f_{r+1}) ≥ 0 nor SIG_{A_r}(f_{r+1}) ≤ 0 necessarily holds; this basic size result implies sequential size changes with granulation nonmonotonicity and uncertainty for the SIG_{A_r}(f_{r+1}) values. Based on Eq. (14), the same conclusion applies to the general values SIG_{A_r}(f_n) toward the terminal feature (r = 1, …, n), where SIG_{A_n}(f_n) = 0 is additionally and reasonably stipulated.

Corollary 2

In an information system, the SIG measure has nonmonotonic and uncertain size relationships under the variable changes of the subset A and the added feature. In other words, a pair of metric values may have no necessary size order for the four combined pairs of single and dual changes in Eq. (18).

Theorems 1, 2 and Corollaries 1, 2 present the basic nonmonotonicity and further uncertainty of the two measures, mainly under granulation changes of feature subsets. For these metric properties, Theorem 1 and Corollary 1 support the modified LS, while Theorem 2 and Corollary 2 support SIG. As a comparison, Theorem 1 and Corollary 2 mainly focus on the direct change of feature subsets, while Corollary 1 and Theorem 2 reveal the incremental chain. The theoretical correctness, especially in terms of the relevant changes of Corollary 1 and Theorem 2 on the feature chain, will be verified through examples and experiments, as shown in Table 3, Fig. 2, Table 6, and Fig. 3. Since SIG is derived from the modified LS, the nonmonotonicity and uncertainty of the modified LS induce the similar properties of SIG; in other words, the nonmonotonicity and uncertainty of SIG mainly originate from the modified LS.

In summary, Theorem 1 describes the original and basic nonmonotonicity/uncertainty of the modified LS, and Corollary 1 is derived from it. Corollary 1 generates Theorem 2, which describes the SIG nonmonotonicity/uncertainty on the incremental feature chain, and the latter is the main conclusion that will be observed and verified. Finally, Corollary 2 can be explained by Theorem 1 or 2. All these properties of granulation nonmonotonicity and uncertainty are in-depth; therefore, they both enrich theoretical studies and underlie practical applications of the modified LS and the evolutive SIG for uncertainty characterization.
Table 3

Measurement values of the modified LS and SIG on the incremental chain of Table 2 (the Immunotherapy dataset), along the feature-incremental chain A_1 ⊂ A_2 ⊂ ⋯ ⊂ A_7

Measure              A_1      A_2      A_3      A_4      A_5      A_6      A_7
J_FILS(A_r)          0.0010   0.0054   0.1013   0.0488   0.1494   0.1745   0.2213
SIG_{A_r}(f_{r+1})  −0.0044  −0.0959  +0.0525  −0.1006  −0.0251  −0.0468   –
SIG_{A_r}(f_7)      −0.0199  −0.0503  −0.0705  −0.0700  −0.0488  −0.0468   0
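Under the difference form SIG_{A_r}(f_{r+1}) = J_FILS(A_r) − J_FILS(A_{r+1}) of Definition 3 (consistent with the reported rows), the Table 3 values can be cross-checked in a few lines; the numbers below are transcribed from the table.

```python
# Values transcribed from Table 3 (Immunotherapy incremental chain A_1..A_7).
J = [0.0010, 0.0054, 0.1013, 0.0488, 0.1494, 0.1745, 0.2213]
SIG_next = [-0.0044, -0.0959, +0.0525, -0.1006, -0.0251, -0.0468]

# Each SIG_{A_r}(f_{r+1}) equals the backward difference of the J_FILS row.
for r in range(len(SIG_next)):
    assert abs((J[r] - J[r + 1]) - SIG_next[r]) < 1e-9

# Both signs occur along the chain: the granulation nonmonotonicity of
# Theorem 2 is directly visible in the data.
signs = [s > 0 for s in SIG_next]
assert True in signs and False in signs
```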
Fig. 2

Three-way nonmonotonicity/uncertainty changes of the modified LS and SIG on the incremental chain of Table 2 (the Immunotherapy dataset)

Table 6

Three-way value table of the modified LS and SIG on incremental chains of nine UCI datasets; each cell lists [J_FILS(A_r), SIG_{A_r}(f_{r+1}), SIG_{A_r}(f_n)]

No.  Dataset  Local or complete feature-incremental chains (A_1, A_2, …, A_7)
(1)Blood[0.0291, − 0.0029, + 0.017][0.032, + 0.005, + 0.0107][0.027, + 0.0042,+ 0.0042][0.0228, –, 0]
(2)Ccbr[0.2519, + 0.0543, + 0.1636][0.1976, − 0.0178,+ 0.0611][0.2153, + 0.0008,+ 0.0141][0.2145, − 0.0395, − 0.0191][0.254, − 0.0362, − 0.0143][0.2902, − 0.034, − 0.0046][0.3242, + 0.0258, − 0.0079]
(3)Ecoli[0.0016, − 0.0162, − 0.0177][0.0178, − 0.0003, − 0.0254][0.018, − 0.0234, − 0.0296][0.0414, − 0.0252, − 0.0249][0.0666, − 0.0143, − 0.0241][0.0809, − 0.0114, − 0.0114][0.0923, –, 0]
(4)Glass[0.0285, − 0.0601, − 0.034][0.0885, + 0.0406, − 0.0152][0.0479, − 0.0331, − 0.0322][0.081, − 0.027, − 0.0211][0.108, − 0.0249, − 0.0194][0.133, − 0.0041, − 0.0143][0.1371, − 0.0309, − 0.014]
(5)Hayes-roth[0.0018, − 0.0048, − 0.0256][0.0066, − 0.0628, − 0.0676][0.0694, − 0.0785, − 0.082][0.1479, − 0.0654, − 0.0654][0.2133, –, 0]
(6)Iris[0.0031, − 0.0401, − 0.0135][0.0431, + 0.0074, + 0.0015][0.0358, + 0.0012, + 0.0012][0.0345, –, 0]
(7)Lung cancer[0.656, + 0.3857, + 0.549][0.2703, − 0.0571, + 0.1239][0.3274, − 0.0408, + 0.0638][0.3682, + 0.0462, + 0.0562][0.3221, − 0.0609, − 0.0496][0.3829, − 0.0613, − 0.0551][0.4443, − 0.075, − 0.0498]
(8)Wine[0.0057, − 0.0276, − 0.019][0.0333, − 0.0468, − 0.0288 ][0.0801, − 0.042, − 0.0239][0.1221, − 0.0478, − 0.0164][0.1699, − 0.0211, − 0.0042][0.1911, − 0.0004, − 0.0046][0.1915, − 0.0197, − 0.0013]
(9)COVID-19[0.6099, − 0.1356, − 0.175][0.7455, − 0.0765, − 0.1013][0.822, − 0.0859, − 0.0859][0.9079, + 0.0254, + 0.0798][0.8825, − 0.0503, + 0.0225][0.9328, + 0.032, + 0.032][0.9008, –, 0]
Fig. 3

Three-way value figures of the modified LS and SIG on incremental chains of nine UCI datasets

In terms of the incremental chain, the chaos in Eq. (12) implies the qualitative difference of positive and negative numbers or the quantitative difference of values in Eqs. (13) and (15); therefore, the SIG values in Eqs. (13) and (15) also become chaotic. A similar induction or explanation applies to the cases in Eqs. (14), (16) and (17). For a direct binary comparison, there are several basic cases of SIG for single and dual changes of the features and subsets, i.e., the four combined types in Eq. (18). Based on Eq. (18), subset inclusion represents knowledge granulation, while feature change still implies the usual parallel relationship or selection. For a single change, the incomparability is naturally related to the nonmonotonicity/uncertainty in Theorem 1; furthermore, dual changes become more complex and also exhibit incomparability, and the related derivations can be easily realized.

Increment-significance-based feature selection criterion and algorithm

In the above, feature significance (SIG) is established via the modified LS, and its robust uncertainty-measurement mechanism firmly underlies further applications, such as feature selection. Next, SIG is utilized to develop the corresponding optimization criterion and implementation algorithm for feature selection.

Definition 4

(IFILS) Eqs. (7) and (9) are combined to offer a new standard of feature selection, i.e., Eq. (19); here, the metric integration of a candidate feature is obtained by joining the modified LS with its SIG.

In Definition 4, the SIG-integrated index of Eq. (19) is introduced to replace the terminal-state measurement of Eq. (7); therefore, an improved selection strategy is constructed. The new feature index and extraction still use the modified LS but additionally carry the SIG difference, and they adhere to the structural relativity and dynamic integrity of the entire incremental process. Hence, Definition 4 (or its Eq. (19)) guides the natural construction of the relevant IFILS algorithm, and the corresponding Algorithm 1 has a procedural flowchart that is shown in Fig. 1.
Fig. 1

Procedural flowchart of the IFILS algorithm

For feature selection, Eqs. (7) and (19) have a large similarity and underlie the FILS and IFILS algorithms, respectively. In Eqs. (7) and (19), the modified LS encodes the local structures of the original data to act as a contrast criterion; when adding a feature, the terminal-state measure and the SIG-based measure imply the process-driven absolute and relative degrees, respectively, for maintaining the local structures of the data. In the FILS algorithm, if the subset measurement approaches that of the whole feature set F, then the feature subset can represent F well; in other words, a lower criterion value implies a more important feature, so the criterion is minimized to generate the FILS algorithm. Similarly, the IFILS algorithm is designed by minimizing the SIG-integrated criterion; when multiple features tie, the feature with the smallest label is preferentially utilized. Clearly, the IFILS algorithm and its performance benefit from both the effective uncertainty measurement of SIG and the corresponding convergence treatment of incremental forward iteration.

Algorithm 1 mainly simulates but slightly changes the FILS algorithm. The FILS algorithm necessarily enters the loop calculation regardless of whether the condition on A is satisfied; in contrast, Algorithm 1 adopts a condition judgement before entering the loop, as shown in Step 5. This type of improvement reduces the running costs of a complete circulation. In terms of computational complexity, the weight matrix in Step 3, the matrix computations in Step 4, and the feature iterations in Steps 5–15 determine the total complexity. The IFILS algorithm has a partial complexity increase in contrast to the FILS algorithm, mainly from the calculation of SIG. In addition, the IFILS algorithm actually concerns the underlying calculation of the modified LS only twice per evaluation, and thus, it remains feasible.
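The selection loop described above can be sketched as follows. `criterion_fn(A, f)` is a hypothetical stand-in for the SIG-integrated index of Eq. (19); the names and signature are illustrative rather than taken from Algorithm 1 itself.

```python
from typing import Callable, Iterable, List, Tuple

def ifils_select(features: Iterable[int],
                 criterion_fn: Callable[[Tuple[int, ...], int], float],
                 m: int) -> List[int]:
    """Forward-iteration skeleton of the IFILS strategy.

    At each step the candidate minimizing the criterion is added to the
    selected subset; ties are broken by the smallest feature label, as in
    the text.  The loop condition is judged before entering the loop
    (cf. Step 5), so no iteration runs once the stop condition holds.
    """
    remaining = sorted(set(features))
    selected: List[int] = []
    while remaining and len(selected) < m:   # condition checked up front
        best = min(remaining,
                   key=lambda f: (criterion_fn(tuple(selected), f), f))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Plugging in the terminal-state score alone recovers the FILS-style loop, while a criterion carrying the SIG difference yields the IFILS behavior; the loop structure is shared.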

Illustrative example of relevant measures and algorithms

Here, a table example is provided to illustrate the relevant measures and algorithms, mainly the granulation nonmonotonicity/uncertainty of the modified LS and SIG and the processing contrast of the IFILS and FILS algorithms. For this purpose, a defined part of the University of California at Irvine (UCI) dataset "Immunotherapy" is extracted. This numeric dataset and 11 other experimental datasets all come from the UCI repository (https://archive.ics.uci.edu/ml/index.php), and their basic information is described in Table 1. On this website, the "view all datasets" link or the "search" function can be directly utilized to acquire the relevant datasets; moreover, some datasets are saved or opened in other locations, so the data can be obtained in various manners through searches and downloads. The Immunotherapy dataset offers the feature set F = {f_1, f_2, …, f_7}, which contains seven selectable features.
Table 1

Basic descriptions of the experimental UCI datasets

No.   Dataset name                                   Samples  Features  Categories  Data type
(0)   Immunotherapy                                  90       7         2           Numeric
(1)   Blood                                          748      4         2           Numeric
(2)   Ccbr (Cervical cancer behavior risk)           72       19        2           Numeric
(3)   Ecoli                                          336      7         8           Numeric
(4)   Glass                                          214      9         6           Numeric
(5)   Hayes-roth                                     132      5         3           Numeric
(6)   Iris                                           150      4         3           Numeric
(7)   Lung cancer                                    32       56        3           Numeric
(8)   Wine                                           178      13        3           Numeric
(9)   COVID-19 (COVID-19 surveillance)               14       7         3           Nominal
(10)  Brain (Multi-view Brain Networks Data Set)     70       70        10          Numeric
(11)  Parkinsons                                     195      22        2           Numeric
Max-min normalization is implemented for data preprocessing. In 10-fold cross-validation, ten random trials are run for performance and statistical assessment, and each trial uses a collection of nine subsets for training and one subset for testing. In a given case, only one primary treatment is considered, to better illustrate the algorithmic procedures and comparisons. After normalization, the 1st random subset for testing is extracted, as shown in Table 2, where the subuniverse contains nine samples. Next, Table 2 is used for the measure observation and the algorithm demonstration.
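The preprocessing step can be sketched minimally as follows; Eq. (21) is taken to be the standard column-wise max-min rescaling to [0, 1], and the convention of mapping a constant column to 0 is an assumption here.

```python
import numpy as np

def max_min_normalize(X):
    """Column-wise max-min normalization to [0, 1].
    A constant column is mapped to 0 by convention (assumption)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    rng = X.max(axis=0) - lo
    rng[rng == 0] = 1.0  # avoid division by zero for constant features
    return (X - lo) / rng
```

After this step, every feature column of Table 2 lies in [0, 1], which is consistent with the listed values.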
Table 2

Decision subsystem on the Immunotherapy dataset’s 1st sample subset

X*   | f1   | f2   | f3   | f4   | f5   | f6   | f7   | d
x9   | 1.00 | 0.10 | 0.45 | 0.06 | 0.00 | 0.24 | 0.09 | 1
x10  | 1.00 | 0.41 | 1.00 | 0.28 | 1.00 | 0.03 | 0.04 | 0
x14  | 1.00 | 0.00 | 0.41 | 0.61 | 0.00 | 0.05 | 0.07 | 1
x19  | 1.00 | 0.00 | 0.50 | 1.00 | 0.00 | 0.06 | 0.07 | 1
x50  | 0.00 | 0.56 | 1.00 | 0.72 | 0.00 | 0.09 | 0.06 | 0
x58  | 1.00 | 0.93 | 0.82 | 0.00 | 0.50 | 0.03 | 0.34 | 1
x64  | 0.00 | 0.07 | 0.98 | 0.22 | 0.50 | 0.01 | 0.04 | 1
x78  | 1.00 | 0.76 | 0.61 | 0.39 | 0.00 | 0.04 | 0.01 | 1
x90  | 1.00 | 0.20 | 0.52 | 0.28 | 0.00 | 0.01 | 0.00 | 1

Measure observation

For the nonmonotonicity/uncertainty metric, the relevant results on incremental feature chains, i.e., the modified LS case in Eq. (12) of Corollary 1 and the SIG cases in Eqs. (15) and (17) of Theorem 2, are mainly considered. According to Table 2, an incremental feature chain is defined, and the relevant metric observations mainly concern three groups of measurement values: the modified LS and the two types of SIG. These values are listed in Table 3 and depicted in Fig. 2 (measurement values and three-way nonmonotonicity/uncertainty changes of the modified LS and SIG on the incremental chain of Table 2, the Immunotherapy dataset). Table 3 and Fig. 2 illustrate the relevant measure definitions, calculations, and properties, and thus the granulation nonmonotonicity/uncertainty in Eqs. (12), (15) and (17) is mainly verified. The value sequence of the modified LS shows a general increase; however, it also contains a decrease, and this phenomenon accords with the granulation nonmonotonicity/uncertainty of the modified LS in Corollary 1. The first SIG sequence contains only one positive value, which adheres to serial number 3, a middle number; moreover, there is some fluctuation among its five negative values. These two facts validate the granulation nonmonotonicity/uncertainty of SIG in item 1) of Theorem 2. The second SIG sequence contains only negative numbers; however, quantitative differences and fluctuation are observed among these values, which fully reflects the granulation nonmonotonicity/uncertainty of SIG in item 2) of Theorem 2.

Algorithm demonstration

For the Immunotherapy dataset, the training set is the complement of the 1st group of subsets, i.e., the remainder of the data outside Table 2. Parameter values are set according to the algorithmic requirements. The FILS and IFILS algorithms have different selection procedures and results, as shown in Tables 4 and 5, respectively. The FILS algorithm utilizes its optimal ordering to obtain one feature subset, while the IFILS algorithm adopts a different sequence to yield another subset. Beyond circulation optimization, there are further differences between these two algorithms; e.g., the IFILS algorithm selects f1 rather than f7 in the 2nd round.
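The procedures of Tables 4 and 5 instantiate the same greedy forward loop with different criteria. A generic sketch follows, where `criterion` is a stand-in for J_FILS or J_IFILS (defined in Eqs. (7) and (19), which are not reproduced here); the loop condition and the smallest-label tie rule mirror the description above.

```python
import numpy as np

def forward_select(features, criterion, r):
    """Generic forward iterative selection: repeatedly add the unselected
    feature minimizing criterion(A, f); when several features tie, the one
    with the smallest label wins (np.argmin returns the first minimum).
    `criterion` is a placeholder for J_FILS / J_IFILS."""
    A, remaining = [], list(features)
    while len(A) < r and remaining:      # loop condition (|A| < r) and (A-bar != empty)
        scores = [criterion(A, f) for f in remaining]
        best = remaining[int(np.argmin(scores))]
        A.append(best)
        remaining.remove(best)
    return A
```

With a toy criterion such as `lambda A, f: abs(f - 4)` over features 1..7, the loop selects 4 first and resolves the tie between 3 and 5 in favour of the smaller label, matching the stated tie rule.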
Table 4

The FILS algorithm's procedure on the Immunotherapy dataset's 1st random sample subset

No. | A              | Ā                            | J_FILS(f_k) (f_k ∈ Ā)                                    | Selective feature | Meet loop condition (|A|<r) ∧ (Ā≠∅)? | Accuracy
(1) | ∅              | {f1,f2,f3,f4,f5,f6,f7}       | [0.9955, 0.9754, 0.9759, 0.9141, 0.9955, 0.4510, 0.8910] | f6                | Yes                                   | 77.78%
(2) | {f6}           | {f1,f2,f3,f4,f5,f7}          | [0.8769, 0.5850, 0.5617, 0.4852, 0.7436, 0.4437]         | f7                | Yes                                   |
(3) | {f6,f7}        | {f1,f2,f3,f4,f5}             | [0.7268, 0.3770, 0.3992, 0.2624, 0.6088]                 | f4                | Yes                                   |
(4) | {f4,f6,f7}     | {f1,f2,f3,f5}                | [0.5531, 0.1142, 0.1026, 0.3755]                         | f3                | Yes                                   |
(5) | {f3,f4,f6,f7}  | {f1,f2,f5}                   | [0.3286, 0.1330, 0.2067]                                 | f2                | No                                    |
Table 5

The IFILS algorithm's procedure on the Immunotherapy dataset's 1st random sample subset

No. | A              | Ā                            | Meet loop condition (|A|<r) ∧ (Ā≠∅)? | J_IFILS(f_k) (f_k ∈ Ā)                                        | Selective feature | Accuracy
(1) | ∅              | {f1,f2,f3,f4,f5,f6,f7}       | Yes                                   | [44.1750, 44.1549, 44.1554, 44.0937, 44.1750, 43.6305, 44.0705] | f6                | 100%
(2) | {f6}           | {f1,f2,f3,f4,f5,f7}          | Yes                                   | [0.5741, 0.8660, 0.8892, 0.9657, 0.7073, 1.0072]               | f1                |
(3) | {f1,f6}        | {f2,f3,f4,f5,f7}             | Yes                                   | [1.0814, 1.0928, 1.0789, 1.0355, 1.1500]                       | f5                |
(4) | {f1,f5,f6}     | {f2,f3,f4,f7}                | Yes                                   | [1.1643, 1.1698, 1.1687, 1.2348]                               | f2                |
(5) | {f1,f2,f5,f6}  | {f3,f4,f7}                   | No                                    |                                                                |                   |
The subsets selected above, which are based on the unsupervised training, are next applied to the testing set (i.e., Table 2), and the accuracy of label prediction is formulated by decision classification. The prediction resorts to the KNN classifier to maintain balanced connections with the previous KNN processing approaches, such as Eq. (4), and the classifier uses the same distance and K. In Table 2, the subset selected by the FILS algorithm generates the predicted labels [1, 1, 1, 1, 1, 1, 1, 1, 1], and a classification accuracy of 77.78% is achieved against the real labels [1, 0, 1, 1, 0, 1, 1, 1, 1]; in contrast, a recognition rate of 100% is achieved using the subset selected by the IFILS algorithm, which predicts the labels [1, 0, 1, 1, 0, 1, 1, 1, 1]. Herein, the classification accuracy (or recognition rate) is defined as the percentage of correct predictions. The contrasting accuracies reflect the validity and superiority of the IFILS algorithm, which are rooted in the SIG advancement of the uncertainty measurement. In summary, Tables 4 and 5, which come from Table 2 as well as the remaining data, thoroughly demonstrate the processes and comparison of the FILS and IFILS algorithms. As a result, the measurement mechanism and recognition advantage of the IFILS algorithm are firmly validated.
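The accuracy bookkeeping of Eq. (23), together with a plain majority-vote KNN predictor, can be sketched as follows; the Euclidean distance and tie-handling details are assumptions, since the paper fixes its own distance and K.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, K=3):
    """Majority-vote KNN with Euclidean distance (distance and K are
    illustrative assumptions). y_train must be a NumPy array."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nbrs = y_train[np.argsort(d)[:K]]
        vals, counts = np.unique(nbrs, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

def accuracy(y_true, y_pred):
    """Classification accuracy (recognition rate): percentage of correct
    predictions, as in Eq. (23)."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))
```

For instance, predicting all ones against the real labels [1, 0, 1, 1, 0, 1, 1, 1, 1] reproduces the FILS figure of 7/9 ≈ 77.78%, while a perfect prediction reproduces the IFILS figure of 100%.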

Comparative experimental verification of unsupervised feature selection on the FILS and IFILS algorithms

Finally, data experiments are performed to further demonstrate the effectiveness and superiority of the IFILS algorithm, mainly in contrast to the FILS algorithm. Before this core treatment, the modified LS and SIG are also calculated so that their granulation nonmonotonicity/uncertainty is exhibited, mainly on incremental feature chains. The corresponding three-way values are shown in Table 6 and depicted in Fig. 3. For the basic settings, only the 1st sample subset is chosen from the 10-fold cross-validation; the chain number is varied for full verification, and the end feature is adopted for the last metric observation of the fixed features. Thus, the related results in Table 6 and Fig. 3 effectively validate all theoretical properties, especially the nonmonotonicity/uncertainty in Corollary 1 and Theorem 2.

In the above example, the relevant experimental settings and mechanisms are detailed: the dataset description in Table 1, data standardization in Eq. (21), 10-fold cross-validation, KNN classifier prediction, and recognition accuracy in Eq. (23). By deepening and enlarging the above example, more datasets, cross-validation, and parametric optimization are adopted to determine the statistical performance, where a fixed value is mainly used (Pang and Zhang 2020a). Next, four subsections are respectively formed from the Immunotherapy dataset, eight numeric datasets, the COVID-19 dataset, and two additional datasets (of medical diagnosis). As described in Table 1, all 12 datasets come from the UCI repository (https://archive.ics.uci.edu/ml/index.php). UCI datasets with real-world data have become authoritative and convenient for machine learning, and thus they are also generally utilized for studies on feature selection.
The relevant experiments concern both theoretical validation and practical application, and thus they adopt more datasets around medical diagnosis, including 8 cases: Immunotherapy, Blood, Ccbr, Ecoli, Lung cancer, COVID-19, Brain, and Parkinsons. Regarding the operational environment, all experiments are performed in MATLAB R2021a and run on an Intel Core i7 CPU at 2.80 GHz with 8 GB RAM and a 64-bit operating system.

The Immunotherapy dataset

The above example is obtained by performing experiments on the Immunotherapy dataset, where the 1st group is fixed. These settings are further extended through 10-fold cross-validation and parameter changes. Fig. 4(a) shows the accuracies obtained on a grid, and the IFILS algorithm occupies more dominant points with higher accuracies. Fig. 4(b) shows the 10-fold average accuracies on the feature chain with a representative subset, and the highest accuracy at each feature number r is achieved using the IFILS algorithm. Hence, the optimal accuracy achieved by the IFILS algorithm is better than that achieved by the FILS algorithm. For the Immunotherapy dataset, the IFILS algorithm thus outperforms the FILS algorithm, and the optimal accuracies of the two methods are mainly realized at middle feature numbers.
Fig. 4

The FILS and IFILS algorithms' accuracies on the Immunotherapy dataset

K is the surplus parameter for change analysis, and it is varied next. The K analysis of only the IFILS algorithm is sufficient and reasonable, because this new algorithm is the improved and more representative one. Table 7 shows the 10-fold average accuracies and standard deviations on the (K, r)-grid, together with the edge statistics of K and r optimization. For clarity, Fig. 5(a) shows the average accuracies, and the highest accuracy of 86.67% is obtained when K = 5 and r = 4. Fig. 5(b) shows the cut accuracies (with standard deviations). As K increases, the mean accuracies first rise from the lowest value to the highest and then decrease to a stabilized value. Fig. 5(c) and (d) show the optimization statistics on K and r, respectively, and the optimal accuracies adhere to middle values of K and r. In summary, a middle K value is appropriate for the IFILS algorithm as well as the FILS algorithm (Pang and Zhang 2020a).
Table 7

The IFILS algorithm’s 10-fold-average accuracies and standard deviations on the Immunotherapy dataset (with K, r)

r \ K          | 1            | 2            | 3            | 4            | 5            | 6
1              | 62.22±13.04  | 52.22±13.91  | 72.22±12.00  | 62.22±15.89  | 76.67±14.30  | 73.33±11.94
2              | 63.33±12.88  | 51.11±11.94  | 67.78±9.73   | 55.56±9.07   | 78.89±13.30  | 73.33±11.94
3              | 63.33±17.41  | 55.56±17.37  | 66.67±15.71  | 65.56±15.23  | 72.22±12.00  | 65.56±12.23
4              | 57.78±21.47  | 64.44±14.63  | 71.11±15.89  | 68.89±8.76   | ↑86.67±8.76  | 71.11±10.73
5              | 75.56±11.97  | 68.89±19.12  | 74.45±12.22  | 72.23±10.24  | 77.78±13.15  | 75.56±12.96
6              | ↑78.89±12.62 | ↑75.56±15.56 | ↑77.78±8.61  | ↑75.56±6.67  | 78.89±10.48  | 77.78±13.15
7              | 74.45±12.22  | 72.23±10.24  | 75.56±6.67   | 74.45±5.09   | 74.45±8.68   | ↑77.78±11.11
r optimization | 78.89±12.62  | 75.56±15.56  | 77.78±8.61   | 75.56±6.67   | 86.67±8.76   | 77.78±11.11

(↑ marks each K column's r-optimal accuracy.)
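The edge statistics of Table 7 (per-cell 10-fold mean ± standard deviation and the optimal (K, r) cell) amount to simple reductions over a fold × K × r accuracy array; a minimal sketch, with the array layout assumed for illustration:

```python
import numpy as np

def grid_statistics(acc):
    """acc[fold, K, r]: per-cell 10-fold mean and standard deviation,
    plus the (K, r) index of the best mean accuracy.
    The 3-D layout is an assumed convention, not the paper's code."""
    mean = acc.mean(axis=0)
    std = acc.std(axis=0)
    best = np.unravel_index(int(np.argmax(mean)), mean.shape)
    return mean, std, best
```

The "r optimization" row of Table 7 is then `mean.max(axis=1)` per K column, and the reported 86.67 ± 8.76 corresponds to the overall argmax cell.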
Fig. 5

The IFILS algorithm's accuracies on the Immunotherapy dataset (K, r)

The FILS and IFILS algorithms' 10-fold-average accuracies and r-optimization statistics on eight UCI datasets based on CART
The FILS and IFILS algorithms' 10-fold-average accuracies and r-optimization statistics on eight UCI datasets based on SVM
The FILS and IFILS algorithms' r-optimal three-way indexes (i.e., number, accuracy, time) on eight UCI datasets based on KNN, CART, SVM

Eight numeric datasets

By referring to the above experiments on the Immunotherapy dataset, the FILS and IFILS algorithms are implemented on eight additional numeric datasets and then compared. Besides the 10-fold cross-validation, parameter r spans an equidistant feature sequence. In terms of classifiers, KNN is still adopted with a suitable fixed K; moreover, the CART and SVM classifiers are added directly for algorithmic comparison. In the comparative experiments, classification accuracies are examined along three dimensions: the 10-fold cross-validation, the selection number r, and the classifier, and the main results are provided after the necessary integrated processing. First, Fig. 6 provides details of the 10-fold cross-validation, where r is optimized over its range to achieve the highest accuracy; in other words, Fig. 6 shows the r-optimal accuracies over the 10 folds using the three classifiers. By observation, the IFILS algorithm occupies more 10-fold points with higher accuracies in terms of KNN, CART, and SVM. Therefore, it has superiority over the FILS algorithm from the perspective of 10-fold cross-validation.
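The r-optimal points of Fig. 6 reduce, per fold, to a maximum over the candidate r values, and the "occupies more 10-fold points" comparison is then a dominance count. A sketch under an assumed fold × r accuracy layout:

```python
import numpy as np

def r_optimal_accuracies(acc):
    """acc[fold, r]: for each fold, the best accuracy over the candidate
    feature numbers r (how the r-optimal points are formed)."""
    return acc.max(axis=1)

def dominant_fold_count(acc_a, acc_b):
    """Number of folds on which algorithm A's r-optimal accuracy is at
    least B's, i.e., the 'occupies more points' comparison."""
    return int(np.sum(acc_a.max(axis=1) >= acc_b.max(axis=1)))
```

Running this for IFILS versus FILS per classifier gives the fold-level dominance proportions summarized in the text.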
Fig. 6

The FILS and IFILS algorithms' r-optimal accuracies on eight UCI datasets (with 10-fold times and three classifiers)

In general, 10-fold cross-validation is mainly used for integrated statistics, so the 10-fold dimension is naturally reduced. Based on the 10-fold statistics, Tables 8, 9 and 10 summarize the average accuracies and standard deviations for KNN, CART and SVM, respectively. Furthermore, the optimal and suboptimal accuracies from the r statistics are reflected in the last two columns, and bold text highlights the two-algorithm comparisons. Moreover, the 10-fold statistical averages of Tables 8, 9 and 10 are equivalently depicted in Fig. 7 to vividly clarify the relevant advancement of the IFILS algorithm. Based on the comparisons in the tables and figure, the IFILS algorithm generally has greater proportions of dominant features with higher accuracies, both per dataset and over all datasets. As r increases, the accuracy first rises rapidly and then increases gradually or decreases; therefore, the optimal accuracies are usually realized at the final or middle feature numbers. Regarding the entire feature set F, the highest accuracies on the Blood, Ecoli and Wine datasets are achieved by both algorithms; however, the IFILS algorithm additionally realizes the optimum on the Ccbr and Iris datasets from the perspective of KNN. The remaining observations on the CART and SVM classifiers lead to similar analyses and results. In terms of appropriate feature subsets, the IFILS algorithm achieves not only higher optimization accuracies but also fewer or equally many features for the three classifiers on almost all datasets, and the latter results imply practical optimization with dimensionality reduction. The sole exception for internal optimization comes from the Wine dataset when using the KNN and SVM classifiers.
For the Wine dataset on KNN (Table 8), optimal accuracies of 95.49% at r = 7 and 95.46% at r = 11 are achieved using the FILS and IFILS algorithms, respectively; although 95.49% > 95.46%, these two values are very close and never become significantly different; moreover, the highest accuracy of 96.08% on the entire feature set F (where r = 13) is actually achieved using both algorithms.
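The last two columns of Tables 8-10 can be derived mechanically from each row of 10-fold-average accuracies: the r = |F| column is simply the final entry, while the r < |F| column takes the highest accuracy over proper prefixes, breaking ties toward the smaller feature number. The sketch below is an illustration of this derivation (the function name is hypothetical); the numbers are the FILS/Blood/KNN row of Table 8, used only as a worked example.

```python
# Derive the two optimization columns of Tables 8-10 from a row of accuracies.

def optimization_columns(r_values, accuracies):
    """Return ((acc, r) at r = |F|, best (acc, r) over proper subsets r < |F|)."""
    full = (accuracies[-1], r_values[-1])            # "r = |F|" column
    proper = max(zip(accuracies[:-1], r_values[:-1]),
                 key=lambda p: (p[0], -p[1]))        # highest acc, then smallest r
    return full, proper

r_values = [1, 2, 3, 4]                              # Blood dataset, |F| = 4
acc = [70.05, 73.53, 75.94, 76.87]                   # FILS 10-fold averages (KNN)
full, proper = optimization_columns(r_values, acc)
# matches the table entries "76.87 (4)" and "75.94 (3)"
```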
Table 8

The FILS and IFILS algorithms’ 10-fold-average accuracies and r-optimization statistics on eight UCI datasets based on KNN

No. | Dataset | Algorithm | r | Optimization realization on feature set (r = |F|) | Optimization on feature proper subset (r < |F|)
(1) Blood | r = 1, 2, 3, 4
FILS  | 70.05 ± 3.94 | 73.53 ± 4.75 | 75.94 ± 5.72 | 76.87 ± 5.71 | 76.87 ± 5.71 (4) | 75.94 ± 5.72 (3)
IFILS | 70.05 ± 3.94 | 75.00 ± 6.73 | 76.75 ± 5.96 | 76.87 ± 5.71 | 76.87 ± 5.71 (4) | 76.75 ± 5.96 (3)
(2) Ccbr | r = 1, 4, 7, 10, 13, 16, 19
FILS  | 29.29 ± 24.57 | 76.43 ± 16.10 | 77.68 ± 21.02 | 79.29 ± 20.00 | 81.96 ± 16.42 | 83.39 ± 17.38 | 87.68 ± 13.79 | 87.68 ± 13.79 (19) | 83.39 ± 17.38 (16)
IFILS | 29.29 ± 24.57 | 65.36 ± 16.56 | 82.14 ± 14.68 | 91.96 ± 11.33 | 89.11 ± 14.30 | 87.68 ± 13.79 | 87.68 ± 13.79 | – | 91.96 ± 11.33 (10)
(3) Ecoli | r = 1, 2, 3, 4, 5, 6, 7
FILS  | 43.15 ± 3.62 | 49.09 ± 6.66 | 50.87 ± 8.85 | 62.75 ± 10.21 | 80.92 ± 10.24 | 84.78 ± 6.95 | 86.28 ± 5.17 | 86.28 ± 5.17 (7) | 84.78 ± 6.95 (6)
IFILS | 49.39 ± 7.54 | 50.56 ± 7.06 | 63.10 ± 6.23 | 77.40 ± 5.17 | 83.32 ± 4.72 | 84.80 ± 5.20 | 86.28 ± 5.17 | 86.28 ± 5.17 (7) | 84.80 ± 5.20 (6)
(4) Glass | r = 1, 3, 5, 7, 9
FILS  | 35.54 ± 9.80 | 57.92 ± 10.04 | 62.14 ± 10.83 | 65.35 ± 10.17 | 64.48 ± 8.71 | – | 65.35 ± 10.17 (7)
IFILS | 34.11 ± 7.94 | 52.81 ± 11.10 | 66.77 ± 8.50 | 67.23 ± 9.34 | 64.48 ± 8.71 | – | 67.23 ± 9.34 (7)
(5) Hayes-roth | r = 1, 2, 3, 4, 5
FILS  | 42.53 ± 9.99 | 34.18 ± 11.16 | 44.67 ± 10.90 | 45.44 ± 10.51 | 36.43 ± 10.28 | – | 45.44 ± 10.51 (4)
IFILS | 42.53 ± 9.99 | 54.34 ± 14.39 | 56.87 ± 10.36 | 71.15 ± 13.42 | 36.43 ± 10.28 | – | 71.15 ± 13.42 (4)
(6) Iris | r = 1, 2, 3, 4
FILS  | 52.67 ± 12.75 | 88.67 ± 8.92 | 92.00 ± 8.20 | 96.00 ± 5.62 | 96.00 ± 5.62 (4) | 92.00 ± 8.20 (3)
IFILS | 52.67 ± 12.75 | 96.00 ± 5.62 | 96.67 ± 4.71 | 96.00 ± 5.62 | – | 96.67 ± 4.71 (3)
(7) Lung cancer | r = 1, 10, 19, 28, 37, 46, 56
FILS  | 27.50 ± 27.79 | 23.33 ± 31.62 | 27.50 ± 35.15 | 34.17 ± 28.45 | 53.33 ± 24.91 | 47.50 ± 28.34 | 42.50 ± 27.34 | – | 53.33 ± 24.91 (37)
IFILS | 27.50 ± 27.79 | 55.83 ± 22.92 | 50.00 ± 22.22 | 54.17 ± 32.22 | 53.33 ± 24.91 | 41.67 ± 23.57 | 42.50 ± 27.34 | – | 55.83 ± 22.92 (10)
(8) Wine | r = 1, 3, 5, 7, 9, 11, 13
FILS  | 49.48 ± 12.10 | 68.01 ± 11.02 | 84.35 ± 9.02 | 95.49 ± 2.38 | 94.35 ± 3.82 | 94.90 ± 4.22 | 96.08 ± 3.77 | 96.08 ± 3.77 (13) | 95.49 ± 2.38 (7)
IFILS | 49.48 ± 12.10 | 83.76 ± 12.66 | 91.01 ± 9.52 | 92.68 ± 7.46 | 94.90 ± 4.22 | 95.46 ± 3.67 | 96.08 ± 3.77 | 96.08 ± 3.77 (13) | 95.46 ± 3.67 (11)
Table 9

The FILS and IFILS algorithms’ 10-fold-average accuracies and r-optimization statistics on eight UCI datasets based on CART

No. | Dataset | Algorithm | r | Optimization realization on feature set (r = |F|) | Optimization on feature proper subset (r < |F|)
(1) Blood | r = 1, 2, 3, 4
FILS  | 75.95 ± 5.76 | 73.53 ± 4.22 | 74.87 ± 5.34 | 74.87 ± 5.34 | – | 75.95 ± 5.76 (1)
IFILS | 75.95 ± 5.76 | 75.68 ± 5.56 | 75.68 ± 4.66 | 74.87 ± 5.34 | – | 75.95 ± 5.76 (1)
(2) Ccbr | r = 1, 4, 7, 10, 13, 16, 19
FILS  | 74.82 ± 17.66 | 70.71 ± 10.25 | 68.04 ± 13.37 | 77.68 ± 12.06 | 84.82 ± 10.14 | 83.39 ± 17.37 | 80.54 ± 16.67 | – | 84.82 ± 10.14 (13)
IFILS | 74.82 ± 17.66 | 74.64 ± 17.83 | 79.11 ± 12.08 | 83.39 ± 8.67 | 86.25 ± 13.10 | 86.25 ± 11.23 | 80.54 ± 16.67 | – | 86.25 ± 13.10 (13)
(3) Ecoli | r = 1, 2, 3, 4, 5, 6, 7
FILS  | 43.44 ± 3.56 | 45.81 ± 4.44 | 47.93 ± 5.87 | 58.33 ± 7.86 | 74.06 ± 10.23 | 76.13 ± 7.78 | 80.60 ± 8.67 | 80.60 ± 8.67 (7) | 76.13 ± 7.78 (6)
IFILS | 55.34 ± 6.92 | 46.11 ± 4.41 | 59.82 ± 6.99 | 73.51 ± 3.80 | 78.83 ± 8.20 | 80.02 ± 9.50 | 80.60 ± 8.67 | 80.60 ± 8.67 (7) | 80.02 ± 9.50 (6)
(4) Glass | r = 1, 3, 5, 7, 9
FILS  | 45.26 ± 16.10 | 63.98 ± 11.92 | 68.33 ± 10.32 | 68.64 ± 12.23 | 67.36 ± 9.80 | – | 68.64 ± 12.23 (7)
IFILS | 45.26 ± 16.10 | 58.27 ± 10.98 | 68.64 ± 10.79 | 68.77 ± 10.91 | 67.36 ± 9.80 | – | 68.77 ± 10.91 (7)
(5) Hayes-roth | r = 1, 2, 3, 4, 5
FILS  | 46.98 ± 12.90 | 45.44 ± 11.85 | 46.15 ± 15.28 | 59.07 ± 9.04 | 77.15 ± 14.58 | 77.15 ± 14.58 (5) | 59.07 ± 9.04 (4)
IFILS | 46.98 ± 12.90 | 53.74 ± 11.85 | 65.17 ± 9.60 | 77.15 ± 14.58 | 77.15 ± 14.58 | 77.15 ± 14.58 (5) | 77.15 ± 14.58 (4)
(6) Iris | r = 1, 2, 3, 4
FILS  | 51.33 ± 8.34 | 84.00 ± 10.04 | 94.00 ± 2.11 | 95.33 ± 4.50 | 95.33 ± 4.50 (4) | 94.00 ± 2.11 (3)
IFILS | 51.33 ± 8.34 | 94.00 ± 5.84 | 96.00 ± 4.66 | 95.33 ± 4.50 | – | 96.00 ± 4.66 (3)
(7) Lung cancer | r = 1, 10, 19, 28, 37, 46, 56
FILS  | 45.00 ± 18.92 | 45.00 ± 31.48 | 30.83 ± 25.47 | 35.00 ± 37.02 | 50.83 ± 39.37 | 47.50 ± 32.88 | 49.17 ± 32.50 | – | 50.83 ± 39.37 (37)
IFILS | 45.00 ± 18.92 | 45.83 ± 23.33 | 64.17 ± 24.23 | 56.67 ± 21.08 | 56.67 ± 21.08 | 62.50 ± 29.72 | 49.17 ± 32.50 | – | 64.17 ± 24.23 (19)
(8) Wine | r = 1, 3, 5, 7, 9, 11, 13
FILS  | 45.59 ± 9.12 | 61.83 ± 10.30 | 76.99 ± 11.79 | 89.90 ± 5.09 | 90.46 ± 10.49 | 88.72 ± 7.94 | 88.76 ± 6.93 | – | 90.46 ± 10.49 (9)
IFILS | 45.59 ± 9.12 | 83.66 ± 9.35 | 87.03 ± 9.95 | 91.57 ± 5.42 | 90.42 ± 5.93 | 89.31 ± 6.67 | 88.76 ± 6.93 | – | 91.57 ± 5.42 (7)
Table 10

The FILS and IFILS algorithms’ 10-fold-average accuracies and r-optimization statistics on eight UCI datasets based on SVM

No. | Dataset | Algorithm | r | Optimization realization on feature set (r = |F|) | Optimization on feature proper subset (r < |F|)
(1) Blood | r = 1, 2, 3, 4
FILS  | 76.21 ± 6.05 | 76.21 ± 6.05 | 76.48 ± 6.85 | 76.62 ± 6.85 | 76.62 ± 6.85 (4) | 76.48 ± 6.85 (3)
IFILS | 76.21 ± 6.05 | 76.62 ± 6.74 | 76.75 ± 6.88 | 76.62 ± 6.85 | – | 76.75 ± 6.88 (3)
(2) Ccbr | r = 1, 4, 7, 10, 13, 16, 19
FILS  | 73.57 ± 19.30 | 77.68 ± 13.81 | 76.43 ± 18.71 | 79.28 ± 11.76 | 87.32 ± 14.20 | 88.93 ± 11.19 | 88.93 ± 13.06 | 88.93 ± 13.06 (19) | 88.93 ± 11.19 (16)
IFILS | 73.57 ± 19.30 | 73.57 ± 19.30 | 77.50 ± 25.52 | 90.54 ± 14.63 | 90.54 ± 14.63 | 89.11 ± 14.30 | 88.93 ± 13.06 | – | 90.54 ± 14.63 (10)
(3) Ecoli | r = 1, 2, 3, 4, 5, 6, 7
FILS  | 43.15 ± 3.62 | 47.60 ± 3.93 | 53.89 ± 8.66 | 65.15 ± 4.46 | 79.44 ± 9.15 | 83.01 ± 7.19 | 85.69 ± 6.46 | 85.69 ± 6.46 (7) | 83.01 ± 7.19 (6)
IFILS | 50.85 ± 4.21 | 48.78 ± 3.53 | 64.29 ± 4.43 | 78.86 ± 7.35 | 84.81 ± 4.96 | 85.71 ± 5.21 | 85.69 ± 6.46 | – | 85.71 ± 5.21 (6)
(4) Glass | r = 1, 3, 5, 7, 9
FILS  | 41.08 ± 13.08 | 42.60 ± 10.37 | 56.45 ± 13.64 | 61.19 ± 9.16 | 61.64 ± 8.80 | 61.64 ± 8.80 (9) | 61.19 ± 9.16 (7)
IFILS | 38.70 ± 11.95 | 46.86 ± 13.57 | 59.37 ± 8.66 | 65.39 ± 8.79 | 61.64 ± 8.80 | – | 65.39 ± 8.79 (7)
(5) Hayes-roth | r = 1, 2, 3, 4, 5
FILS  | 34.29 ± 13.47 | 33.52 ± 10.06 | 46.04 ± 11.70 | 58.41 ± 15.58 | 63.08 ± 13.57 | 63.08 ± 13.57 (5) | 58.41 ± 15.58 (4)
IFILS | 34.29 ± 13.47 | 47.86 ± 15.00 | 56.87 ± 8.24 | 73.46 ± 6.55 | 63.08 ± 13.57 | – | 73.46 ± 6.55 (4)
(6) Iris | r = 1, 2, 3, 4
FILS  | 54.67 ± 15.01 | 83.33 ± 13.79 | 84.67 ± 9.46 | 90.00 ± 11.44 | 90.00 ± 11.44 (4) | 84.67 ± 9.46 (3)
IFILS | 54.67 ± 15.01 | 92.67 ± 5.84 | 92.00 ± 6.88 | 90.00 ± 11.44 | – | 92.67 ± 5.84 (2)
(7) Lung cancer | r = 1, 10, 19, 28, 37, 46, 56
FILS  | 45.00 ± 18.92 | 38.33 ± 21.94 | 48.33 ± 37.02 | 44.17 ± 27.79 | 46.67 ± 24.91 | 50.83 ± 28.45 | 66.67 ± 23.90 | 66.67 ± 23.90 (56) | 50.83 ± 28.45 (46)
IFILS | 45.00 ± 18.92 | 60.00 ± 31.87 | 56.67 ± 21.08 | 50.83 ± 17.77 | 50.83 ± 28.45 | 51.67 ± 25.09 | 66.67 ± 23.90 | 66.67 ± 23.90 (56) | 60.00 ± 31.87 (10)
(8) Wine | r = 1, 3, 5, 7, 9, 11, 13
FILS  | 44.38 ± 9.45 | 64.12 ± 9.95 | 84.38 ± 9.34 | 94.35 ± 5.32 | 95.46 ± 5.83 | 97.71 ± 4.05 | 98.30 ± 2.74 | 98.30 ± 2.74 (13) | 97.71 ± 4.05 (11)
IFILS | 44.38 ± 9.45 | 86.50 ± 9.54 | 89.80 ± 11.19 | 94.93 ± 4.88 | 96.57 ± 4.82 | 97.16 ± 4.85 | 98.30 ± 2.74 | 98.30 ± 2.74 (13) | 97.16 ± 4.85 (11)
Fig. 7

The FILS and IFILS algorithms’ 10-fold-average classification accuracies on eight UCI datasets (with number r and three classifiers)

Tables 8, 9, and 10 mainly come from the 10-fold statistics, and their accuracy values under r-optimal selection are extracted and summarized in Table 11. Moreover, Table 11 records the running times of the two algorithms. Therefore, Table 11 provides a comprehensive platform for macroscopically comparing the FILS and IFILS algorithms based on KNN, CART, and SVM, and the bottom average results over the eight-dataset statistics play an important role. Through observation and analysis, the dominant position of the IFILS algorithm relative to the FILS algorithm can be concluded from three assessment indicators, i.e., the reduction length, classification accuracy, and consumption time. In terms of the main indices, the IFILS algorithm exhibits a general advantage in both length and accuracy. Moreover, its execution time is only slightly greater than that of the FILS algorithm, which implies that the two algorithms are on the same level of time complexity. In other words, the IFILS algorithm has great superiority in pursuing feature simplification and recognition accuracy, and its running time is acceptable compared with that of the FILS algorithm.
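The bottom "Average" row of Table 11 is a plain mean of each indicator over the eight datasets. The sketch below (the function name is a hypothetical illustration) reproduces the averages of the selected feature counts for the KNN column, taken directly from Table 11.

```python
# Mean of one Table 11 indicator over the eight datasets (1)-(8).

def average_indicator(values):
    """Average an assessment indicator over the datasets, 2-decimal rounding."""
    return round(sum(values) / len(values), 2)

# Selected feature counts ("Number" under KNN), datasets (1)-(8) of Table 11.
fils_knn_numbers = [3, 16, 6, 7, 4, 3, 37, 7]
ifils_knn_numbers = [3, 10, 6, 7, 4, 3, 10, 11]
# average_indicator(fils_knn_numbers) -> 10.38, matching the Average row;
# average_indicator(ifils_knn_numbers) -> 6.75, the shorter IFILS reduction.
```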
Table 11

The FILS and IFILS algorithms’ r-optimal three-way indexes (i.e., number, accuracy, time) on eight UCI datasets based on KNN, CART, SVM

No. | Dataset | Algorithm | KNN: Number | Accuracy (%) | Time (s) | CART: Number | Accuracy (%) | Time (s) | SVM: Number | Accuracy (%) | Time (s)
1 | Blood | FILS  | 3 | 75.94 ± 5.72 | 2234.34 ± 60.75 | 1 | 75.95 ± 5.76 | 856.29 ± 46.24 | 3 | 76.48 ± 6.85 | 2234.67 ± 60.76
  |       | IFILS | 3 | 76.75 ± 5.96 | 2270.09 ± 48.89 | 1 | 75.95 ± 5.76 | 878.86 ± 57.67 | 3 | 76.75 ± 6.88 | 2270.47 ± 48.91
2 | Ccbr | FILS  | 16 | 83.39 ± 17.38 | 67.22 ± 2.09 | 13 | 84.82 ± 10.14 | 75.49 ± 1.54 | 16 | 88.93 ± 11.19 | 67.31 ± 2.09
  |      | IFILS | 10 | 91.96 ± 11.33 | 87.64 ± 2.89 | 13 | 86.25 ± 13.10 | 103.24 ± 3.19 | 10 | 90.54 ± 14.63 | 87.71 ± 2.88
3 | Ecoli | FILS  | 6 | 84.78 ± 6.95 | 455.74 ± 25.63 | 6 | 76.13 ± 7.78 | 455.82 ± 25.63 | 6 | 83.01 ± 7.19 | 455.92 ± 25.63
  |       | IFILS | 6 | 84.80 ± 5.20 | 655.07 ± 28.10 | 6 | 80.02 ± 9.50 | 655.15 ± 28.09 | 6 | 85.71 ± 5.21 | 655.23 ± 28.09
4 | Glass | FILS  | 7 | 65.35 ± 10.17 | 234.02 ± 5.81 | 7 | 68.33 ± 10.32 | 234.11 ± 5.82 | 7 | 61.19 ± 9.16 | 234.16 ± 5.82
  |       | IFILS | 7 | 67.23 ± 9.34 | 318.13 ± 2.28 | 7 | 68.77 ± 10.91 | 318.23 ± 2.29 | 7 | 65.39 ± 8.79 | 318.27 ± 2.29
5 | Hayes-roth | FILS  | 4 | 45.44 ± 10.51 | 17.62 ± 0.42 | 4 | 59.07 ± 9.04 | 17.70 ± 0.43 | 4 | 58.41 ± 15.58 | 17.71 ± 0.43
  |            | IFILS | 4 | 71.15 ± 13.42 | 24.59 ± 0.65 | 4 | 77.15 ± 14.58 | 24.69 ± 0.65 | 4 | 73.46 ± 6.55 | 24.69 ± 0.64
6 | Iris | FILS  | 3 | 92.00 ± 8.20 | 14.90 ± 0.02 | 3 | 94.00 ± 2.11 | 14.91 ± 0.03 | 3 | 84.67 ± 9.46 | 14.91 ± 0.004
  |      | IFILS | 3 | 96.67 ± 4.71 | 21.61 ± 0.01 | 3 | 96.00 ± 4.66 | 21.62 ± 0.01 | 2 | 92.67 ± 5.84 | 16.55 ± 0.02
7 | Lung cancer | FILS  | 37 | 53.33 ± 24.91 | 101.06 ± 5.59 | 37 | 50.83 ± 39.37 | 101.08 ± 5.59 | 46 | 50.83 ± 28.45 | 109.36 ± 4.590
  |             | IFILS | 10 | 55.83 ± 22.92 | 45.10 ± 1.70 | 19 | 64.17 ± 24.23 | 83.65 ± 2.68 | 10 | 60.00 ± 31.87 | 45.19 ± 1.702
8 | Wine | FILS  | 7 | 95.49 ± 2.38 | 234.52 ± 3.33 | 9 | 90.46 ± 10.49 | 289.98 ± 9.75 | 11 | 97.71 ± 4.05 | 307.67 ± 10.513
  |      | IFILS | 11 | 95.46 ± 3.67 | 451.72 ± 21.38 | 7 | 91.57 ± 5.42 | 348.67 ± 5.64 | 11 | 97.16 ± 4.85 | 451.83 ± 21.376
(1)-(8) | Average | FILS  | 10.38 | 74.46 ± 20.96 | 419.93 ± 704.26 | 10 | 74.95 ± 20.82 | 255.67 ± 270.04 | 12 | 75.15 ± 20.25 | 430.21 ± 701.85
        |         | IFILS | 6.75 | 79.88 ± 17.56 | 484.24 ± 713.86 | 7.5 | 79.98 ± 15.79 | 304.26 ± 299.02 | 6.63 | 80.21 ± 18.14 | 483.74 ± 714.38
In summary, the eight datasets with the three classifiers provide full algorithmic verification from the correspondence and optimization perspectives, and the IFILS algorithm generally outperforms the FILS algorithm and achieves better classification performance.

COVID-19 surveillance: a nominal dataset

The FILS and IFILS algorithms can also be applied to nominal datasets, and the COVID-19 dataset shown in Table 1 is adopted here. This important medical case can further reveal the superiority of the new algorithm. Only three operations differ, i.e., nominal-numeric transformation, cross-validation correction, and parameter K determination. The two qualitative symptom values of the seven features naturally correspond to two qualitative numbers, and these can be linearly translated to 2 and 0 to eliminate negativity. Since the COVID-19 dataset contains only 14 samples, 3-fold cross-validation is adopted instead. The K value used in the above experiments may become too large here, so K is mainly searched in {1, 2, 3, 4} for a full analysis of both algorithms. Table 12 shows the accuracies on the (K, r) grid based on the 3-fold statistics, and a 3-fold case and its mean are partly reflected in Fig. 8(a) and (b), respectively. In Table 12, the main body of average accuracies is imported from Fig. 8(c), and its cuts at a fixed K and a fixed r constitute two broken lines in Fig. 8(c) and (d); furthermore, the right and lower margins represent the maximal accuracies under K and r optimization, respectively, and these form the other two lines in Fig. 8(c) and (d). By contrast analysis, the IFILS algorithm obtains more dominant grid points and higher optimization accuracies. Therefore, this algorithm achieves the better performance on this nominal medical dataset.
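The nominal-numeric transformation described above can be sketched in a few lines. The concrete symptom labels ("Y"/"N") and the intermediate qualitative codes (+1/−1) are assumptions for illustration; the source only states that two qualitative values are coded as qualitative numbers and then linearly translated to 2 and 0 to eliminate negativity.

```python
# Nominal-numeric transformation sketch for a symptom table: each nominal
# value is first coded as a qualitative number (assumed +1/-1 here) and then
# linearly shifted (x -> x + 1) so the final codes 2/0 are nonnegative.

def encode_symptoms(records, positive="Y"):
    """Map each nominal symptom value to 2 (present) or 0 (absent)."""
    coded = [[(1 if v == positive else -1) for v in row] for row in records]
    return [[v + 1 for v in row] for row in coded]   # linear translation

sample = [["Y", "N", "Y"],
          ["N", "N", "Y"]]
# encode_symptoms(sample) -> [[2, 0, 2], [0, 0, 2]]
```

After this step, the numeric pipeline (KNN graph construction, Laplacian score, and 3-fold evaluation) applies to the nominal dataset unchanged.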
Table 12

The FILS and IFILS algorithms’ 3-fold-average accuracies and standard deviations on COVID-19 dataset (with K, r)

r | Algorithm | K = 1 | K = 2 | K = 3 | K = 4
1 | FILS  | 66.67 ± 24.94 | 58.33 ± 14.34 | ↓ 58.33 ± 14.34 | ↓ 58.33 ± 14.34
1 | IFILS | 66.67 ± 24.94 | 58.33 ± 14.34 | 58.33 ± 14.34 | 58.33 ± 14.34
2 | FILS  | 56.67 ± 4.71 | ↓ 65.00 ± 7.07 | 56.67 ± 4.71 | ↓ 58.33 ± 14.34
2 | IFILS | 66.67 ± 24.94 | 58.33 ± 14.34 | 58.33 ± 14.34 | 58.33 ± 14.34
3 | FILS  | ↓ 73.33 ± 18.86 | 58.33 ± 14.34 | ↓ 58.33 ± 14.34 | ↓ 58.33 ± 14.34
3 | IFILS | ↑ 73.33 ± 18.86 | ↑ 65.00 ± 7.07 | ↑ 65.00 ± 7.07 | ↑ 65.00 ± 7.07
4 | FILS  | 58.33 ± 14.34 | 58.33 ± 14.34 | 50.00 ± 8.16 | ↓ 58.33 ± 14.34
4 | IFILS | 60.00 ± 32.66 | 58.33 ± 14.34 | 56.67 ± 4.71 | 58.33 ± 14.34
5 | FILS  | 41.67 ± 14.34 | 50.00 ± 8.16 | 50.00 ± 8.16 | 50.00 ± 8.16
5 | IFILS | 58.33 ± 14.34 | 50.00 ± 8.16 | 63.33 ± 12.47 | ↑ 65.00 ± 17.80
6 | FILS  | 41.67 ± 14.34 | 50.00 ± 8.16 | 50.00 ± 24.49 | ↓ 58.33 ± 14.34
6 | IFILS | 58.33 ± 14.34 | 50.00 ± 8.16 | 35.00 ± 17.80 | 50.00 ± 8.16
7 | FILS  | 50.00 ± 8.16 | 50.00 ± 8.16 | 35.00 ± 17.80 | ↓ 58.33 ± 14.34
7 | IFILS | 50.00 ± 8.16 | 50.00 ± 8.16 | 35.00 ± 17.80 | 58.33 ± 14.34
r optimization | FILS  | ↓ 73.33 ± 18.86 | ↓ 65.00 ± 7.07 | ↓ 58.33 ± 14.34 | ↓ 58.33 ± 14.34
r optimization | IFILS | ↑ 73.33 ± 18.86 | ↑ 65.00 ± 7.07 | ↑ 65.00 ± 7.07 | ↑ 65.00 ± 7.07
Fig. 8

The FILS and IFILS algorithms’ accuracies on the COVID-19 dataset


Two additional medical diagnosis datasets

Finally, two additional datasets related to medical diagnosis, the Brain and Parkinsons datasets shown in Table 1, are adopted to comparatively analyze the FILS and IFILS algorithms and further verify the advantage of the IFILS algorithm over the FILS algorithm. Note that both are real-world datasets from medical diagnosis. The Brain dataset is a multilayer brain network dataset with resting-state electroencephalography data; these practical data come from the Department of Otolaryngology of Sun Yat-sen Memorial Hospital, Sun Yat-sen University. The Parkinsons dataset concerns the detection of Parkinson’s disease (PD); it records a range of biomedical voice measurements from 31 individuals, 23 of whom were diagnosed with PD, and it can be used to discriminate healthy individuals from patients with PD. The relevant accuracy results obtained by varying the feature number r are provided in Table 13. Using the 10-fold statistics, the results in Table 13 are first summarized as average accuracies and standard deviations for the KNN, CART, and SVM classifiers. Then, the last columns of Table 13 record the r-optimal statistics, where the bold labels highlight the comparative maxima. The algorithm comparison and superiority can be observed in Table 13. For the same feature number r and the same classifier, higher recognition rates are often achieved by the IFILS algorithm than by the FILS algorithm. When r = |F|, the two algorithms achieve the same recognition rate with the same classifier, and these results are generally the highest among the feature subsets; however, one exception exists on the Parkinsons dataset with the CART classifier.
Concretely, for this exception, the maximum accuracy achieved by the IFILS algorithm still exceeds that achieved by the FILS algorithm, so the IFILS algorithm outperforms the FILS algorithm. Regarding r-optimization over the searched range, the IFILS algorithm has a better effect than the FILS algorithm with respect to the two contrast indices, i.e., the recognition rate and the feature number. For the Brain dataset, the IFILS algorithm achieves higher optimization accuracies with smaller feature numbers than the FILS algorithm for all three classifiers. As further validated by the two medical diagnosis datasets, the IFILS algorithm not only removes redundant data but also improves the recognition rate; therefore, it constitutes a true improvement over the FILS algorithm.
Table 13

The FILS and IFILS algorithms’ 10-fold-average accuracies and r-optimization statistics on the Brain and Parkinsons datasets based on KNN, CART and SVM

No. | Dataset | Classifier | Algorithm | r | Optimization realization on feature set (r = |F|) | Optimization on feature proper subset (r < |F|)
(10) Brain | r = 1, 12, 23, 34, 45, 56, 70
KNN  | FILS  | 24.29 ± 15.13 | 44.29 ± 15.72 | 57.14 ± 21.3 | 50.00 ± 24.51 | 58.57 ± 20.7 | 61.43 ± 26.13 | 85.71 ± 11.66 | 85.71 ± 11.66 (70) | 61.43 ± 26.13 (56)
     | IFILS | 24.29 ± 15.13 | 67.14 ± 26.98 | 72.86 ± 20.7 | 77.14 ± 23.52 | 77.14 ± 20.43 | 72.86 ± 19.58 | 85.71 ± 11.66 | 85.71 ± 11.66 (70) | 77.14 ± 23.52 (34)
CART | FILS  | 11.43 ± 11.27 | 54.29 ± 24.09 | 54.29 ± 26.77 | 52.86 ± 25.24 | 54.29 ± 22.13 | 60.00 ± 24.09 | 62.86 ± 18.07 | 62.86 ± 18.07 (70) | 60.00 ± 24.09 (56)
     | IFILS | 11.43 ± 11.27 | 44.29 ± 14.21 | 61.43 ± 19.11 | 54.29 ± 18.81 | 58.57 ± 19.58 | 54.29 ± 16.22 | 62.86 ± 18.07 | 62.86 ± 18.07 (70) | 61.43 ± 19.11 (23)
SVM  | FILS  | 5.71 ± 12.05 | 45.71 ± 13.13 | 68.57 ± 19.98 | 67.14 ± 21.35 | 72.86 ± 20.70 | 82.86 ± 16.22 | 88.57 ± 13.13 | 88.57 ± 13.13 (70) | 82.86 ± 16.22 (56)
     | IFILS | 5.71 ± 12.05 | 78.57 ± 27.15 | 85.71 ± 16.50 | 82.86 ± 14.75 | 85.71 ± 13.47 | 84.29 ± 12.51 | 88.57 ± 13.13 | 88.57 ± 13.13 (70) | 85.71 ± 16.50 (23)

Conclusions

This paper is devoted to an unsupervised feature selection method based on the modified LS, and thus, the IFILS algorithm, whose flowchart is shown in Fig. 1, is proposed as an improvement over the FILS algorithm. The algorithmic validity and superiority come from the introduction and refinement of the systematic measure SIG based on the modified LS (Definition 3), and they are fully supported by examples and data experiments. Regarding the root of uncertainty in measurements, both the main SIG and the underlying modified LS exhibit granulation uncertainty/nonmonotonicity, and their characterizations (especially SIG’s informatization) function effectively in unsupervised feature selection. As a result, the IFILS algorithm outperforms the FILS algorithm and achieves better classification performance. This conclusion is generally validated by the experimental results, especially on the main ten datasets (Sects. 5.2 and 5.4), and the eight datasets and their results in Sect. 5.2 can be further summarized. For the eight datasets, Tables 8, 9, and 10 provide information on the chain search and optimization determination, from which the comparative advantages of the IFILS algorithm are revealed using the KNN, CART, and SVM classifiers; furthermore, Table 11 shows the optimal situations of feature selection, which comprehensively reflect the superiority of the IFILS algorithm in terms of selection number, prediction accuracy, and running time for the three classifiers. Regarding the figures, Fig. 6 shows the r-optimal accuracies under 10-fold cross-validation for the three classifiers, while Fig. 7 shows the 10-fold average accuracies as a function of the feature number r; these two figures also reveal the advantages of the IFILS algorithm from different yet clear perspectives.
The IFILS algorithm offers a new learning method for feature selection, but it has two possible limitations, each with a corresponding development opportunity. First, since the underlying modified LS depends on the KNN calculations of all elements, the IFILS algorithm involves extensive matrix processing, which is time-consuming. Second, the IFILS algorithm adheres to one specific approach to uncertainty measurement, and more robust measures and algorithms should be constructed by comprehensively considering additional perspectives. In the future, the time efficiency of the IFILS algorithm should be improved through optimization strategies, while its recognition effect can be further reinforced by combining other practical measures such as the dependency degree and information entropy. Furthermore, the IFILS algorithm can be extended from unsupervised feature selection to semisupervised learning, and its data mining capability is worth studying on numeric, nominal, and hybrid datasets. Moreover, in-depth experiments on broader datasets are required for the IFILS algorithm and its subsequent improvements, thus better facilitating real-world intelligent applications of machine learning.
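The KNN-graph matrix processing mentioned above underlies the Laplacian score itself. The sketch below implements the classic Laplacian score definition (He et al.), not the paper's modified LS or the FILS/IFILS selection procedure; it only illustrates why a full similarity matrix over all samples makes the computation costly.

```python
# Classic Laplacian score for one feature over a sample similarity graph.
# A smaller score means the feature better preserves the graph's local
# structure, so features are ranked in ascending order of their scores.

def laplacian_score(feature, W):
    """feature: list of per-sample values; W: symmetric similarity matrix."""
    n = len(feature)
    d = [sum(row) for row in W]                    # degree of each sample
    mu = sum(f * di for f, di in zip(feature, d)) / sum(d)
    g = [f - mu for f in feature]                  # remove the weighted mean
    # g^T L g = 0.5 * sum_ij W_ij (g_i - g_j)^2  (Laplacian quadratic form)
    num = 0.5 * sum(W[i][j] * (g[i] - g[j]) ** 2
                    for i in range(n) for j in range(n))
    den = sum(di * gi * gi for di, gi in zip(d, g))  # g^T D g
    return num / den if den else 0.0
```

On a two-clique toy graph, a feature that is constant within each clique scores 0 (perfect locality preservation), while a feature that alternates across connected samples scores high; the double loop over W is the O(n^2) matrix cost noted in the limitation above.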