
Ensemble of a subset of kNN classifiers.

Asma Gul¹,², Aris Perperoglou¹, Zardad Khan¹,³, Osama Mahmoud¹, Miftahuddin Miftahuddin¹, Werner Adler⁴, Berthold Lausen¹.

Abstract

Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks, built in two steps. First, we choose classifiers based on their individual performance using the out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets with their original features and with some added non-informative features to evaluate our method. The results are compared with usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.


Keywords: Non-informative features; Bagging; Ensemble methods; Nearest neighbour classifier

Year: 2016 PMID: 30931011 PMCID: PMC6404785 DOI: 10.1007/s11634-015-0227-5

Source DB: PubMed Journal: Adv Data Anal Classif ISSN: 1862-5355

Introduction

In supervised classification tasks, the aim is to construct a predictor that assigns a class label to new observations. To do so, training data are used in which a class label is associated with each pattern, and each observation is described by a feature vector. However, in many real-life classification problems one often encounters imprecise data, including non-informative features, which can dramatically increase the classification error of the algorithms (Nettleton et al. 2010). To overcome this problem, feature selection methods are usually recommended before classification to mitigate the effect of such non-informative features (Liu et al. 2014; Mahmoud et al. 2014). These methods search for the most discriminative subset of the original features, one that increases the classification performance of a classifier. However, different feature selection methods yield different feature subsets for the same data set, so the assessed relevance of the features varies. This encourages combining the results of several best feature subsets.

Combining multiple classifiers, known as ensemble techniques, has emerged as a promising way to improve the classification performance of weak learners and has gained a lot of interest in the last two decades (Barandela et al. 2013; Bauer and Kohavi 1999; Maclin and Opitz 2011; Melville et al. 2004). These techniques lead to a substantial reduction in classification error in many real-life applications and, in general, are more resilient to non-informative features in the data than an individual model (Khoshgoftaar et al. 2011; Melville et al. 2004). One of the simplest ensemble techniques is bootstrap aggregation (bagging), which combines the outputs of classifiers constructed on randomly generated bootstrap training sets (Breiman 1996a). In bagging, B bootstrap samples are randomly drawn from the learning set, and a base learner is developed on each of these samples. A new observation is then classified by majority voting of these individual classifiers. Bagging has been used with numerous variations in the literature (Bauer and Kohavi 1999; Hothorn and Lausen 2003a, b). It has been demonstrated that bagging can improve the prediction accuracy of weak classifiers, such as decision trees (Breiman 1996a; Hothorn et al. 2004; Hothorn and Lausen 2005).

One of the simplest and oldest methods for classification is the k nearest neighbours (kNN) classifier. It assigns an unknown observation to the majority class among its k nearest neighbours in the training data, as measured by a distance metric (Cover and Hart 1967; Guvenir and Akkus 1997). Despite its simplicity, kNN gives competitive results and in some cases even outperforms more complex learning algorithms. However, kNN is affected by non-informative features in the data, which is often the case with high dimensional data. Attempts have been made to improve the performance of the nearest neighbours classifier by ensemble techniques. Some related work on ensembles of kNN classifiers can be found in Grabowski (2002), Domeniconi and Yan (2004), Zhou and Yu (2005), Hall and Samworth (2005) and Samworth (2012). An ensemble of nearest neighbour classifiers in which each member has access to a random feature subset only, and the outcomes of these multiple nearest neighbour classifiers are combined for the final decision, is proposed in Bay (1998). A similar approach based on random feature subsets, random kNN, which builds on the idea of random forest, is proposed for classification of high dimensional data sets (Li et al. 2011). Li et al. (2011) rank the features according to their importance and obtain a final set of features for the final model.

In this manuscript we suggest an ensemble of a subset of kNN classifiers (ESkNN), particularly to deal with the issue of non-informative features in a data set. We apply ESkNN to benchmark and simulated classification problems and compare the results with those of simple kNN, bagged kNN (BkNN), random kNN (RkNN), an ensemble based on the multiple feature subset method (MFS), random forest (RF) and support vector machines (SVM). Experiments are carried out on the data sets with their original feature sets and with some added non-informative features.

Ensemble of subset of kNN classifiers

Let $\mathcal{L} = (X, Y) = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ be a training set consisting of n independent observations, where each $x_i$ is a d-dimensional feature vector and $Y$ is the vector of class labels. Each label takes one of J values, J being the total number of classes; here we consider the two class problem, thus $y_i \in \{0, 1\}$. Based on this available data set $\mathcal{L}$, a classifier predicts the class label for a new/test observation with feature vector $x$. Divide the training data into two parts, one for construction of the classifiers and the other, $\mathcal{L}_V$, for validation. For simplicity we keep the symbol $\mathcal{L}$ for the part used for construction of the models. Let us denote the d input features in $\mathcal{L}$ by $F = \{f_1, \ldots, f_d\}$. For a given subset size, say l, where $l < d$, a random subset of features, $F^{(1)}$, is drawn from $F$. Based on the randomly selected features, a bootstrap sample is drawn from $\mathcal{L}$. The new bootstrap learning set, $\mathcal{L}^{(1)}$, consists of l-dimensional feature vectors. This process is repeated until we get m training sets, $\mathcal{L}^{(1)}, \ldots, \mathcal{L}^{(m)}$, each containing n observations on l features. The base kNN classifier is constructed on each of these bootstrap training sets, generating a set of m classifiers.

When a random sample of the same size n is drawn with replacement from the training set, approximately one-third of the observations are left out of that sample. These observations are called out-of-bag (OOB) observations and can be used to estimate the classification error (Breiman 1996b). In our framework we use the OOB sample for the assessment of each classifier. The m classifiers are then ranked according to their individual classification accuracy on the OOB sample, and the first h of the m classifiers are selected. The selected classifiers are then assessed for their collective contribution as an ensemble on the validation set $\mathcal{L}_V$. This is done by starting from the best of the h classifiers and then adding the remaining classifiers one by one to the ensemble.

The ensemble is formed in a two stage procedure that assesses the models using two different performance measures, misclassification rate and Brier score. The formation of the ensemble of subset of kNN classifiers can be summarized as:

1. Draw a random sample of l features, without replacement, from the feature set $F$ of $\mathcal{L}$, and denote the selected feature subset by $F^{(1)}$.
2. Based on the selected random feature subset $F^{(1)}$, draw a bootstrap sample of size n, $\mathcal{L}^{(1)}$, from $\mathcal{L}$.
3. Construct the kNN classifier on $\mathcal{L}^{(1)}$.
4. Calculate the accuracy of the classifier on the OOB sample using the same feature set as used for its construction.
5. Iterate steps (1) to (4) m times and rank the m classifiers according to their accuracies.
6. Select the first h classifiers with the highest accuracies.

These selected classifiers are further assessed as follows:

7. The ensemble is started by combining the second best classifier with the best classifier, and classification performance is evaluated on the validation set $\mathcal{L}_V$.
8. The ensemble is then grown by adding the third best classifier and the performance is measured again; this process is carried out for all the h classifiers.
9. Let $\widehat{BS}^{\langle r- \rangle}$ be the Brier score of the ensemble of selected best kNN models without the rth model and $\widehat{BS}^{\langle r+ \rangle}$ the Brier score of the ensemble after including the rth model; then the rth model is selected if $\widehat{BS}^{\langle r+ \rangle} < \widehat{BS}^{\langle r- \rangle}$.

In the first stage the classification models are evaluated using the misclassification rate (MR) as the performance measure. A good classification model should have a lower misclassification rate than the alternatives, and thus the classification models with a low misclassification rate are selected. In the second stage of the algorithm the selected models are further evaluated using the Brier score as a performance measure. The Brier score measures the difference between the observed outcomes of the test instances and the estimated probabilities, which are in turn used to classify new observations using some threshold.
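The first stage (random feature subsets, bootstrap samples, ranking by OOB accuracy) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names `knn_predict` and `rank_knn_models`, the squared Euclidean distance, the toy Gaussian data and the values chosen for m, l, k and h are all hypothetical.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, feats):
    """Majority vote among the k nearest training rows, using only the columns in feats."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in feats), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def rank_knn_models(X, y, m, l, k, h, seed=0):
    """Build m kNN models on random feature subsets and keep the h best by OOB accuracy."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    models = []
    for _ in range(m):
        feats = rng.sample(range(d), l)                      # l features, without replacement
        boot = [rng.randrange(n) for _ in range(n)]          # bootstrap sample of size n
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]       # out-of-bag observations
        bx, by = [X[i] for i in boot], [y[i] for i in boot]  # a kNN "model" is its stored sample
        hits = sum(knn_predict(bx, by, X[i], k, feats) == y[i] for i in oob)
        acc = hits / len(oob) if oob else 0.0                # OOB accuracy on the same features
        models.append({"feats": sorted(feats), "X": bx, "y": by, "oob_acc": acc})
    models.sort(key=lambda mod: mod["oob_acc"], reverse=True)  # rank by OOB accuracy
    return models[:h]                                          # keep the h best

# Toy data (hypothetical): features 0 and 1 are informative, features 2..9 are noise.
data_rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[data_rng.gauss(3.0 * label, 1.0) for _ in range(2)]
     + [data_rng.gauss(0.0, 1.0) for _ in range(8)] for label in y]

top = rank_knn_models(X, y, m=50, l=3, k=5, h=10)
print([(mod["feats"], round(mod["oob_acc"], 2)) for mod in top[:3]])
```

On data of this kind the top-ranked models tend to be those whose feature subset contains at least one informative feature, which is the behaviour the OOB ranking is meant to exploit.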
Besides the traditional misclassification rate and other metrics, the Brier score can also be used to evaluate the predictive performance of a classifier. When the output of the classifier is used as a basis for decision making, a more detailed evaluation is required, in which not only the prediction accuracy of the classifier but also the quality of the probability estimates is considered. This can be done through a score such as the Brier score, which, in principle, measures the predictive ability/quality of a classifier in classifying new data (Hernández-Orallo et al. 2012; Steyerberg et al. 2010; Kruppa et al. 2014). Let the class labels of the test instances from the two classes, “positive” and “negative”, be represented by 0 or 1, i.e. $y \in \{0, 1\}$. The Brier score for the probabilities of the predicted class 1, $\hat{p}(y \mid X)$, is:

$$BS = E\left(y - \hat{p}(y \mid X)\right)^2.$$

An estimator for the above score is:

$$\widehat{BS} = \frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \left(y_i - \hat{p}(y_i \mid X)\right)^2,$$

where $n_{test}$ is the total number of test points and $y_i$ is the state of the outcome, $y_i \in \{0, 1\}$. A low Brier score indicates better performance of the predictor. Thus the models that reduce the Brier score of the ensemble are selected.

One technical reason for using the Brier score to assess the collective contribution of the models selected individually in the first stage is that this score is better able than the misclassification rate to determine the contribution of a model to the ensemble. To illustrate this, let the estimated probability that a test observation belongs to class 1, given that class 1 is the true class, from a classifier c1 be:

$$\hat{p}_{c_1} = 0.56.$$

Suppose that the cut-off for assigning this observation to class 1 is 0.5, which implies that the given observation is assigned to class 1 and the classification error is 0 (correct classification). The Brier score in this case is $(1 - 0.56)^2 = 0.1936$. Now consider that a second classifier, c2, gives the estimated probability for that observation as 0.68. The combined probability estimate of the two classifiers for the same observation, denoted by $\hat{p}_{c_{12}}$, is given as:

$$\hat{p}_{c_{12}} = \frac{0.56 + 0.68}{2} = 0.62.$$

Consequently, the Brier score decreases to $(1 - 0.62)^2 = 0.1444$. The classification error is 0 in both cases for the given cut-off. If a third classifier, c3, has an estimated probability of 0.88, the resulting combined probability is:

$$\hat{p}_{c_{123}} = \frac{0.56 + 0.68 + 0.88}{3} \approx 0.71.$$

Here the Brier score decreases to $(1 - 0.71)^2 = 0.0841$, while the classification error remains the same (0) as for the previous ensemble of two classifiers for the given cut-off. It follows that if classification errors were used to decide whether to add a classifier to the ensemble, classifiers c2 and c3 would not become part of the ensemble, as the error remains the same, whereas the Brier score decreases with the addition of classifiers c2 and c3, thus leading to an ensemble of size 3. The general pseudo code of ESkNN is given in Algorithm 1.
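The Brier-score-based second stage and the three-classifier illustration can be reproduced with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, classifier outputs are combined by simple averaging of class-1 probabilities, and the probability 0.56 for c1 is inferred from the reported Brier score of 0.1936 rather than quoted directly.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between 0/1 outcomes and predicted P(class 1)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def ensemble_prob(prob_lists):
    """Average the class-1 probabilities of several classifiers, point by point."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

def forward_select(y_val, ranked_probs):
    """Greedy second stage: keep the r-th model only if it lowers the ensemble Brier score."""
    chosen = [0]                                   # start from the best-ranked model
    best = brier_score(y_val, ranked_probs[0])
    for r in range(1, len(ranked_probs)):
        trial = chosen + [r]
        score = brier_score(y_val, ensemble_prob([ranked_probs[i] for i in trial]))
        if score < best:                           # Brier score with model r < without it
            chosen, best = trial, score
    return chosen, best

# One test observation whose true class is 1, as in the illustration.
y = [1]
c1, c2, c3 = [0.56], [0.68], [0.88]

bs1 = brier_score(y, c1)                             # 0.1936
bs12 = brier_score(y, ensemble_prob([c1, c2]))       # 0.1444
bs123 = brier_score(y, ensemble_prob([c1, c2, c3]))  # about 0.086 without intermediate rounding

chosen, final_score = forward_select(y, [c1, c2, c3])
print(round(bs1, 4), round(bs12, 4), round(bs123, 4), chosen)
```

The third value differs slightly from the 0.0841 quoted in the text because the text rounds the combined probability to 0.71 before squaring; the ordering of the three scores, and hence the selection of all three classifiers, is unaffected.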

Simulation study

In addition to the benchmark data sets, we assess ESkNN on simulated data using two simulation models. The models involve several variations to give an understanding of the behaviour of the classifiers under different situations. The details of the two models are given below.

Simulation model 1

In this model, the performance of the classifiers is investigated in different setups. Firstly, the predictors of the two classes are generated with correlated and uncorrelated structures respectively: the variables for class 1 are correlated and generated with a varying variance covariance structure, while the features determining class 2 are independent. A total of 500 independent binary class data sets are generated, each with 20 features. The variables for class 1 are generated from , while those of class 2 are generated from . The values considered for w in class 1 are 3, 5, 10, 15 and 20. The predictive performance of the algorithms is investigated by adding 50, 100, 200 and 500 non-informative features, generated from a normal distribution, to the data. The variance covariance matrix , which is a matrix, is: where are the covariances given by and , on the diagonal of , is the variance, = 1 when w is 1. Changing the value of w results in different degrees of correlation between the variables. The data are generated in such a manner that the variables within class 1 are correlated with each other and exhibit negligible or no correlation with the features from class 2.

Simulation model 2

The second simulation model, model 2, is a four-dimensional model derived from the model proposed in Mease et al. (2007). A set of 500 independent binary class data sets is generated, each consisting of 1000 observations and 4 features. The feature vector is a four-dimensional random vector uniformly distributed on [0, 100], and the response variable y has two outcomes, 0 or 1. The class is determined by r, the distance of the feature vector from the central point. The class probabilities given features are: The response values are generated from the above distribution using a Bernoulli random number generator. We extend the dimensions of this model by adding 50, 100, 200 and 500 non-informative features generated from a uniform distribution. The data complexity increases with the number of added non-informative features.
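Extending a data set with non-informative features, as done for both simulation models, amounts to appending columns that are generated independently of the class label. A minimal sketch follows; the helper name `add_noise_features` is hypothetical, and the range [0, 100] for the uniform noise is an assumption, since the text only states that the added features are uniform.

```python
import random

def add_noise_features(X, n_noise, low=0.0, high=100.0, seed=0):
    """Append n_noise columns of uniform noise, drawn independently of the class labels."""
    rng = random.Random(seed)
    return [list(row) + [rng.uniform(low, high) for _ in range(n_noise)] for row in X]

# Five toy observations with 4 uniform features, mirroring the dimensions of model 2.
base_rng = random.Random(1)
X = [[base_rng.uniform(0, 100) for _ in range(4)] for _ in range(5)]

# Feature counts after adding 50, 100, 200 and 500 non-informative features.
widths = [len(add_noise_features(X, extra)[0]) for extra in (50, 100, 200, 500)]
print(widths)
```

The original columns are left untouched, so any loss in accuracy after the extension can be attributed to the added noise dimensions alone.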

Simulation results and discussion

The average misclassification rates from model 1 and model 2 are presented in Tables 1, 2 and 3.

Table 1

Misclassification rate of the methods on the data sets with added non-informative features from model 1

Features	kNN	BkNN	RkNN	MFS	RF	SVM	ESkNN
20	0.050	0.047	0.046	0.048	0.052	0.043	0.044
20+50	0.063	0.058	0.062	0.061	0.055	0.055	0.047
20+100	0.076	0.067	0.071	0.066	0.066	0.057	0.046
20+200	0.114	0.104	0.089	0.084	0.063	0.065	0.046
20+500	0.146	0.127	0.142	0.112	0.062	0.084	0.046

The first column shows the number of non-informative features added to the data set. Results of the best performing method are highlighted in italics. The value of

Table 2

Misclassification rate of the classifiers on the data sets from model 1 for different values of w, listed in column 1, on 70 features (50 non-informative)

w	kNN	BkNN	RkNN	MFS	RF	SVM	ESkNN
3	0.198	0.196	0.185	0.168	0.084	0.103	0.147
5	0.221	0.213	0.182	0.169	0.058	0.115	0.162
10	0.225	0.198	0.114	0.104	0.026	0.100	0.114
15	0.200	0.180	0.057	0.061	0.012	0.086	0.076
20	0.185	0.164	0.035	0.041	0.008	0.077	0.039

Results of the best performing method for the corresponding value of w are shown in italics

Table 3

Misclassification rate of the methods on the data sets with added non-informative features from model 2

Features	kNN	BkNN	RkNN	MFS	RF	SVM	ESkNN
4	0.125	0.122	0.169	0.122	0.159	0.101	0.119
4+50	0.170	0.170	0.175	0.169	0.193	0.164	0.163
4+100	0.194	0.187	0.185	0.205	0.203	0.205	0.164
4+200	0.242	0.232	0.201	0.216	0.199	0.443	0.175
4+500	0.276	0.269	0.231	0.249	0.211	0.524	0.191

The first column shows the number of non-informative features added to the data set. Results of the best performing method are shown in italic font

The results from model 1 in Table 1 indicate that the classification accuracy of ESkNN is higher than that of all the other methods in most cases; the exception is the data with the original 20 features, where SVM outperforms all the methods by giving the minimum misclassification rate. The table reveals that, unsurprisingly, kNN shows a high error rate compared to the other methods, and that the performance of the kNN based methods declines with an increasing number of non-informative features in the data, whereas ESkNN still performs better.

From Table 2, there is an increase in the misclassification rate of all the classifiers except random forest. It can be observed that the prediction performance of the kNN based classification methods and SVM decreases with high variance and covariance of the data, i.e., for increasing values of w. However, random forest gives better classification accuracy in this case. Although the performance of kNN based methods declines, ESkNN consistently performs better than the other methods, except for random forest, in such situations.

The results of model 2 in Table 3 reveal that ESkNN consistently outperforms the other methods in the presence of non-informative features in the data. In the case of data with the original features only, however, SVM gives the best result, and in the case of 100 features ESkNN gives better results than the other methods and comparable to SVM. Bagged kNN provides the same results as usual kNN on the data with 4 features, and a slight accuracy gain over the usual kNN is achieved on the data with added non-informative features (Fig. 1).

Fig. 1

Misclassification rate, of simulated data from model 2 with added non-informative features. a 50 added non-informative features; b 100 added non-informative features; c 200 added non-informative features; d 500 added non-informative features

Experiments on bench mark data sets

The performance of the proposed method, in terms of misclassification rate, is evaluated on a total of 31 benchmark data sets. The data sets chosen cover a wide range of domains, namely microarray gene expression data sets and data sets from life science, finance and physical science. The “Diabetes” and “Sonar” data sets are from the R package “mlbench” (Leisch and Dimitriadou 2010); “Dystrophy” and “Glaucoma” are from “ipred” (Peters and Hothorn 2012). All the other data sets are from UCI (Bache and Lichman 2013). A summary of the data sets is given in Table 4.

Table 4

Summary of the data sets

Data sets	Sample size	Features	Feature type (continuous/discrete/categorical)
Haberman	306	3	(0/3/0)
Dystrophy	164	5	(2/3/0)
Mammographic	830	5	(0/5/0)
Transfusion	748	5	(2/3/0)
Phoneme	1000	5	(5/0/0)
Bupa	345	6	(1/5/0)
Appendicitis	106	7	(7/0/0)
Diabetes	768	8	(8/0/0)
Biopsy	683	9	(0/9/0)
SAheart	462	9	(5/3/1)
Indian liver	579	10	(5/4/1)
Solar-Flare	322	12	(0/10/2)
Credit approval	690	15	(2/13/0)
House vote	232	17	(0/0/17)
Bands	365	19	(13/6/0)
Hepatitis	80	19	(2/17/0)
Two norms	1000	20	(20/0/0)
German credit	1000	20	(0/7/13)
Body	507	24	(24/0/0)
WPBC	194	33	(31/2/0)
Sonar	208	60	(60/0/0)
Glaucoma	196	61	(61/0/0)
Musk	476	166	(0/166/0)

Number of observations, features and feature type. The first 8 are microarray data sets, the rest are from life, finance, physical, and social science


Experimental setup

The performance of ESkNN is evaluated on a total of 23 data sets, in two scenarios: on the benchmark data sets with their original features, and after adding non-informative features to the data sets. The performance of ESkNN in terms of misclassification rate is compared with usual kNN, bagged kNN, random kNN, MFS, random forest and SVM. Each data set is divided into training and test sets, with 90 % of the data used for training and 10 % for testing. The same training and test sets are used for all the methods, and the results are averaged over a total of 1000 such splits. All the experiments are carried out using R (R Core Team 2013). The value of k for , is selected by tenfold cross validation using the R package “e1071” for the kNN based methods (Meyer et al. 2012). Random forest is tuned using the R function “tune.randomForest” available within the same package. For SVM we used the “kernlab” R package (Karatzoglou et al. 2004); sigma is tuned with the automatic selection available in that package. The other parameters are fixed at their default values. A total of 1001 kNN models are generated on bootstrap samples, and 40 % of them are then selected for the second stage. The number of models generated is taken to be odd to break ties in voting among the classifiers when classifying a test point. The feature subset size is set to one-third of the input features; however, in low dimensions, in the case of the original features in the data, i.e., the feature subset size is taken as 2.
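The evaluation protocol (repeated random 90/10 splits, misclassification rate averaged over the splits, identical splits for every method) can be sketched as below. This is an illustration in Python rather than the R code used for the experiments; `repeated_holdout` and `majority_baseline` are hypothetical names, the baseline classifier is only a stand-in for the methods compared, and truncating 40 % of 1001 to 400 models is an assumption about how the fraction is rounded.

```python
import random

def repeated_holdout(X, y, fit_predict, n_splits=1000, test_frac=0.10, seed=0):
    """Average misclassification rate over random train/test splits.

    A fixed seed reproduces the same splits, so every method sees identical partitions.
    """
    rng = random.Random(seed)
    n = len(X)
    n_test = max(1, round(test_frac * n))
    total = 0.0
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        total += sum(p != y[i] for p, i in zip(preds, test)) / n_test
    return total / n_splits

def majority_baseline(train_X, train_y, test_X):
    """Placeholder classifier: always predict the most frequent training label."""
    top = max(set(train_y), key=train_y.count)
    return [top] * len(test_X)

# Toy data: 70 observations of class 0 and 30 of class 1.
y = [0] * 70 + [1] * 30
X = [[float(i)] for i in range(100)]
rate = repeated_holdout(X, y, majority_baseline, n_splits=200)

m = 1001            # number of kNN models generated on bootstrap samples
h = int(0.40 * m)   # models kept for the second stage (truncation is an assumption)
print(round(rate, 3), h)
```

Any classifier with the same `fit_predict(train_X, train_y, test_X)` shape can be passed in place of the baseline, which keeps the comparison between methods on identical splits.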

Results and discussion

The results on the data sets with their original features and with 500 added randomly generated non-informative features are reported in Tables 5 and 6 respectively. The results in Table 5 show that ESkNN outperforms or gives comparable results to the other methods considered here. It is interesting to note that, on the data sets with their original features, ESkNN outperforms the kNN based methods on most of the data sets and gives comparable results to random forest. ESkNN gives the best overall result on 8 data sets, random forest is better than all the other methods on 9 data sets, SVM gives the minimum classification error on 5 data sets, and on one data set RkNN outperforms the rest of the methods.

Table 5

Misclassification rate of kNN, RkNN, BkNN, MFS, RF, SVM and ESkNN

Data sets	kNN	BkNN	RkNN	MFS	RF	SVM	ESkNN
Haberman	0.243	0.24	0.255	0.241	0.271	0.325	0.237
Dystrophy	0.117	0.118	0.121	0.110	0.115	0.099	0.105
Mammographic	0.190	0.193	0.178	0.183	0.167	0.191	0.174
Transfusion	0.233	0.235	0.23	0.225	0.217	0.317	0.218
Phoneme	0.167	0.184	0.171	0.174	0.145	0.204	0.132
Bupa	0.320	0.327	0.219	0.327	0.271	0.319	0.319
Appendicitis	0.142	0.139	0.144	0.149	0.145	0.224	0.128
Diabetes	0.264	0.259	0.263	0.262	0.233	0.27	0.256
Biopsy	0.032	0.0311	0.028	0.039	0.027	0.058	0.020
SAheart	0.336	0.334	0.343	0.337	0.289	0.307	0.317
Indian liver	0.314	0.320	0.290	0.312	0.293	0.373	0.286
Solar-flare	0.027	0.026	0.025	0.026	0.025	0.042	0.022
Credit Approval	0.319	0.317	0.336	0.194	0.123	0.142	0.166
House Vote	0.082	0.082	0.089	0.072	0.036	0.033	0.042
Bands	0.389	0.393	0.342	0.383	0.265	0.367	0.350
Hepatitis	0.423	0.372	0.288	0.362	0.276	0.146	0.321
Two Norms	0.040	0.039	0.029	0.036	0.04	0.026	0.033
German Credit	0.307	0.306	0.296	0.308	0.23	0.291	0.286
Body	0.023	0.024	0.036	0.025	0.037	0.016	0.020
WPBC	0.241	0.240	0.235	0.244	0.196	0.285	0.235
Sonar	0.179	0.179	0.157	0.189	0.161	0.169	0.147
Glaucoma	0.193	0.193	0.192	0.196	0.105	0.122	0.176
Musk	0.142	0.142	0.113	0.114	0.110	0.133	0.103

The results of best performing methods on the corresponding data set are highlighted in italics
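The win counts quoted in the discussion of Table 5 can be re-derived from the table by taking, for each data set, the method with the smallest misclassification rate. The short Python check below uses the rows of Table 5 exactly as printed; the variable names are illustrative only.

```python
from collections import Counter

methods = ["kNN", "BkNN", "RkNN", "MFS", "RF", "SVM", "ESkNN"]

# Rows of Table 5: misclassification rates in the column order given in `methods`.
table5 = {
    "Haberman":        [0.243, 0.24, 0.255, 0.241, 0.271, 0.325, 0.237],
    "Dystrophy":       [0.117, 0.118, 0.121, 0.110, 0.115, 0.099, 0.105],
    "Mammographic":    [0.190, 0.193, 0.178, 0.183, 0.167, 0.191, 0.174],
    "Transfusion":     [0.233, 0.235, 0.23, 0.225, 0.217, 0.317, 0.218],
    "Phoneme":         [0.167, 0.184, 0.171, 0.174, 0.145, 0.204, 0.132],
    "Bupa":            [0.320, 0.327, 0.219, 0.327, 0.271, 0.319, 0.319],
    "Appendicitis":    [0.142, 0.139, 0.144, 0.149, 0.145, 0.224, 0.128],
    "Diabetes":        [0.264, 0.259, 0.263, 0.262, 0.233, 0.27, 0.256],
    "Biopsy":          [0.032, 0.0311, 0.028, 0.039, 0.027, 0.058, 0.020],
    "SAheart":         [0.336, 0.334, 0.343, 0.337, 0.289, 0.307, 0.317],
    "Indian liver":    [0.314, 0.320, 0.290, 0.312, 0.293, 0.373, 0.286],
    "Solar-flare":     [0.027, 0.026, 0.025, 0.026, 0.025, 0.042, 0.022],
    "Credit Approval": [0.319, 0.317, 0.336, 0.194, 0.123, 0.142, 0.166],
    "House Vote":      [0.082, 0.082, 0.089, 0.072, 0.036, 0.033, 0.042],
    "Bands":           [0.389, 0.393, 0.342, 0.383, 0.265, 0.367, 0.350],
    "Hepatitis":       [0.423, 0.372, 0.288, 0.362, 0.276, 0.146, 0.321],
    "Two Norms":       [0.040, 0.039, 0.029, 0.036, 0.04, 0.026, 0.033],
    "German Credit":   [0.307, 0.306, 0.296, 0.308, 0.23, 0.291, 0.286],
    "Body":            [0.023, 0.024, 0.036, 0.025, 0.037, 0.016, 0.020],
    "WPBC":            [0.241, 0.240, 0.235, 0.244, 0.196, 0.285, 0.235],
    "Sonar":           [0.179, 0.179, 0.157, 0.189, 0.161, 0.169, 0.147],
    "Glaucoma":        [0.193, 0.193, 0.192, 0.196, 0.105, 0.122, 0.176],
    "Musk":            [0.142, 0.142, 0.113, 0.114, 0.110, 0.133, 0.103],
}

# Best method per data set (no row of Table 5 has a tie for the minimum).
best = {name: methods[rates.index(min(rates))] for name, rates in table5.items()}
wins = Counter(best.values())
print(dict(wins))
```

The tally gives 8 data sets for ESkNN, 9 for RF, 5 for SVM and 1 for RkNN, in agreement with the counts stated in the text.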

Table 6

Misclassification rate of kNN, RkNN, BkNN, MFS, RF, SVM and ESkNN with added non-informative features to the data sets

Data sets	kNN	BkNN	RkNN	MFS	RF	SVM	ESkNN
Haberman	0.278	0.274	0.279	0.269	0.263	0.429	0.260
Dystrophy	0.249	0.248	0.291	0.237	0.118	0.252	0.204
Mammographic	0.217	0.223	0.180	0.225	0.158	0.527	0.189
Transfusion	0.238	0.237	0.237	0.239	0.236	0.517	0.230
Phoneme	0.279	0.279	0.252	0.351	0.269	0.538	0.243
Bupa	0.362	0.352	0.389	0.376	0.342	0.560	0.330
Appendicitis	0.207	0.209	0.277	0.209	0.150	0.215	0.197
Diabetes	0.358	0.354	0.349	0.348	0.248	0.530	0.328
Biopsy	0.065	0.067	0.086	0.102	0.027	0.067	0.052
SAheart	0.414	0.395	0.349	0.347	0.345	0.509	0.345
Indian liver	0.316	0.315	0.286	0.286	0.286	0.519	0.275
Solar-flare	0.027	0.022	0.021	0.025	0.022	0.022	0.022
Credit approval	0.354	0.354	0.320	0.345	0.322	0.546	0.317
House vote	0.128	0.125	0.126	0.112	0.032	0.109	0.095
Bands	0.405	0.396	0.358	0.354	0.359	0.549	0.343
Hepatitis	0.362	0.371	0.380	0.410	0.387	0.160	0.333
Two norms	0.047	0.045	0.038	0.052	0.038	0.052	0.034
German credit	0.308	0.305	0.301	0.371	0.285	0.517	0.300
Body	0.098	0.098	0.099	0.098	0.049	0.092	0.088
WPBC	0.262	0.251	0.235	0.235	0.235	0.252	0.225
Sonar	0.164	0.164	0.161	0.225	0.242	0.314	0.156
Glaucoma	0.256	0.249	0.242	0.272	0.154	0.236	0.242
Musk	0.184	0.182	0.169	0.168	0.165	0.290	0.161

The results of best performing methods on the corresponding data set are highlighted in italics

In the case of non-informative features in the data (Table 6), ESkNN gives a lower classification error than the other methods on 11 data sets, RF gives the best classification performance on 9 data sets, SVM gives the best result on one data set, and on two data sets there is no clear winner between random forest and ESkNN; however, ESkNN gives better performance than the kNN based methods and SVM. Here again, it is observed that ESkNN results in a smaller classification error than the kNN based methods on most of the data sets.

Conclusion and outlook

Building on the idea of ensemble techniques, we have proposed an ensemble of a subset of kNN classifiers (ESkNN) for classification tasks, particularly to deal with the issue of non-informative features in the data sets. Our approach forms an ensemble of the best kNN models, thus implicitly selecting the informative feature subsets and discarding the non-informative ones. ESkNN is assessed for its classification performance on simulated and benchmark data sets. Our results on these data sets show that ESkNN gives comparable results to RF and outperforms kNN and kNN based ensembles. The results from the simulations (Table 2) reveal that in the case of high variance in the classes RF performs better than the others. Random projection ensemble classification (Cannings and Samworth 2015) may allow further improvements. Moreover, it would be of interest to investigate whether recent proposals such as predictive hubs (Lausser et al. 2014) and representative prototypes (Müssel et al. 2015) can be exploited to develop ESkNN further. ESkNN is implemented and available as the R package “ESkNN” on CRAN (Gul et al. 2015).
