
An efficient diagnosis system for Parkinson's disease using kernel-based extreme learning machine with subtractive clustering features weighting approach.

Chao Ma, Jihong Ouyang, Hui-Ling Chen, Xue-Hua Zhao.

Abstract

A novel hybrid method named SCFW-KELM, which integrates effective subtractive clustering features weighting and a fast classifier kernel-based extreme learning machine (KELM), has been introduced for the diagnosis of PD. In the proposed method, SCFW is used as a data preprocessing tool, which aims at decreasing the variance in features of the PD dataset, in order to further improve the diagnostic accuracy of the KELM classifier. The impact of the type of kernel functions on the performance of KELM has been investigated in detail. The efficiency and effectiveness of the proposed method have been rigorously evaluated against the PD dataset in terms of classification accuracy, sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC), f-measure, and kappa statistics value. Experimental results have demonstrated that the proposed SCFW-KELM significantly outperforms SVM-based, KNN-based, and ELM-based approaches and other methods in the literature and achieved highest classification results reported so far via 10-fold cross validation scheme, with the classification accuracy of 99.49%, the sensitivity of 100%, the specificity of 99.39%, AUC of 99.69%, the f-measure value of 0.9964, and kappa value of 0.9867. Promisingly, the proposed method might serve as a new candidate of powerful methods for the diagnosis of PD with excellent performance.

Year: 2014; PMID: 25484912; PMCID: PMC4251425; DOI: 10.1155/2014/985789

Source DB: PubMed; Journal: Comput Math Methods Med; ISSN: 1748-670X; Impact factor: 2.238


1. Introduction

Parkinson's disease (PD) is a degenerative disease of the nervous system. It belongs to a large group of neurological conditions called motor system disorders, which result from the loss of dopamine-producing brain cells. The main symptoms of PD are (1) tremor or trembling in the hands, arms, legs, jaw, or head; (2) rigidity or stiffness of the limbs and trunk; (3) bradykinesia or slowness of movement; and (4) postural instability or impaired balance (http://www.ninds.nih.gov/research/parkinsonsweb/index.htm, last accessed: April 2012). At present, PD affects about 1% of the worldwide population over the age of 50, and this proportion is increasing as people live longer [1]. So far no curative medical treatment exists for PD, and medication is available only to relieve the symptoms of the disease [2]. It is therefore important to gain more insight into the problem and to improve the methods for dealing with PD. Here we focus on dysphonia, a group of vocal impairment symptoms that is reported to be one of the most significant symptoms of PD [3]. Research has shown that about 90% of people with PD exhibit such vocal impairment. The dysphonic indicators of PD make speech measurement an important part of diagnosis [4], and dysphonic measures have been proposed as a reliable tool to detect and monitor PD [5, 6].

Various researchers have studied the PD problem with machine learning methods. Little et al. [6] used a support vector machine (SVM) classifier with a Gaussian radial basis kernel function to predict PD, with a feature selection method to reduce the feature space, and obtained a best accuracy of 91.4%. Shahbaba and Neal [7] presented a nonlinear model based on Dirichlet mixtures for PD classification and compared it with multinomial logit models, decision trees, and SVM; the proposed model achieved a classification accuracy of 87.7%.
Das [8] carried out a comparative study of neural networks (NN), DMneural, regression, and decision trees for the diagnosis of PD; the experimental results showed that the NN method achieved an overall classification performance of 92.9%. Sakar and Kursun [9] combined a mutual information measure with SVM for the diagnosis of PD and achieved a classification result of 92.75%. Psorakis et al. [10] introduced sample selection strategies and model improvements for multiclass multikernel relevance vector machines and achieved a classification accuracy of 89.47% on the PD dataset. Guo et al. [11] combined genetic programming and the expectation maximization (EM) algorithm to diagnose PD on the ordinary feature data and achieved a classification accuracy of 93.1%. Luukka [12] proposed a method that combined fuzzy entropy measures with a similarity classifier to predict PD and achieved a mean classification accuracy of 85.03%. Li et al. [13] introduced a fuzzy-based nonlinear transformation approach together with SVM on the PD dataset and obtained a best classification accuracy of 93.47%. Ozcift and Gulten [14] combined the correlation-based feature selection method with a rotation forest ensemble classifier of 30 machine learning algorithms to distinguish PD; the proposed model reached a best classification accuracy of 87.13%. Åström and Koker [15] achieved a highest classification accuracy of 91.2% with a parallel neural network model for PD diagnosis. Spadoto et al. [16] adopted an evolutionary-based method together with the optimum-path forest (OPF) classifier for PD diagnosis and obtained a best classification accuracy of 84.01%. Polat [17] applied fuzzy C-means (FCM) clustering feature weighting (FCMFW) together with the k-nearest neighbor classifier for detecting PD and obtained a classification accuracy of 97.93%. Chen et al. [18] proposed a model that used principal component analysis based feature extraction together with the fuzzy k-nearest neighbor method to predict PD and achieved a best classification accuracy of 96.07%. Daliri [19] presented a chi-square distance kernel-based SVM to discriminate subjects with PD from healthy control subjects using gait signals and obtained a classification result of 91.2%. Zuo et al. [20] used a diagnosis model based on particle swarm optimization (PSO) to strengthen the fuzzy k-nearest neighbor classifier for the diagnosis of PD and achieved a mean classification accuracy of 97.47%.

These works show that most of the common classifiers from the machine learning community have been used for PD diagnosis. For nonlinear classification problems, data preprocessing methods such as feature weighting, normalization, and feature transformation can increase the performance of a standalone classifier. The choice of an efficient feature preprocessing method and an excellent classifier is therefore of significant importance for the PD diagnosis problem. To improve the efficiency and effectiveness of classification for the diagnosis of PD, this paper examines an efficient feature weighting method called subtractive clustering features weighting (SCFW) and a fast classification algorithm named kernel-based extreme learning machine (KELM). The SCFW method is used to map the features according to the data distributions in the dataset and to transform a linearly nonseparable dataset into a linearly separable one. In this way, similar data within each feature tend to gather together, so that the distinction between classes increases and the PD dataset can be classified correctly.
The SCFW method is reported to help improve the discrimination ability of classifiers in many applications, such as traffic accident analysis [21] and medical dataset transformation [22]. KELM is an improved version of the ELM algorithm based on kernel functions [23]. Its advantage is that only two parameters (the penalty parameter C and the kernel parameter γ) need to be adjusted, unlike ELM, which requires suitable values of weights and biases to be specified to improve the generalization performance [24]. Furthermore, KELM not only trains as fast as ELM but also achieves good generalization performance. The objective of the proposed method is to explore the performance of PD diagnosis using a two-stage hybrid modeling procedure that integrates SCFW with KELM. The proposed method first adopts SCFW to construct a discriminative feature space by weighting the features, and the weighted features then serve as the input of the KELM classifier. To evaluate the performance of the proposed hybrid method, classification accuracy (ACC), sensitivity, specificity, AUC, f-measure, and kappa statistic value have been used. Experimental results show that, with a proper kernel function, the proposed method achieves very promising results under 10-fold cross validation (CV).

The main contributions of this paper are summarized as follows. We propose, for the first time, to integrate the SCFW approach with the KELM classifier to detect PD in an efficient and effective way. In the proposed system, the SCFW method is employed as a data preprocessing tool to strengthen the discrimination between classes and thereby further improve the distinguishing performance of the KELM classifier. Compared with the existing methods in previous studies, the proposed diagnostic system has achieved excellent classification results.

The rest of the paper is organized as follows. Section 2 offers brief background knowledge on SCFW and KELM. Section 3 presents the detailed implementation of the diagnosis system. Section 4 describes the experimental design, and Section 5 gives the experimental results and discussion of the proposed method. Finally, Section 6 summarizes the conclusions and recommendations for future work.

2. The Theoretical Background of the Related Methods

2.1. Subtractive Clustering Features Weighting (SCFW)

Subtractive clustering is an improved version of the mountain clustering algorithm. The problem with mountain clustering is that its computational cost grows exponentially with the dimension of the problem. Subtractive clustering solves this problem by using the data points themselves as candidates for cluster centers, instead of the grid points used in mountain clustering, so the computational cost is proportional to the problem size rather than the problem dimension [25]. The subtractive clustering algorithm can be briefly summarized as follows:

Step 1.

Consider a collection of n data points $\{x_1, x_2, \ldots, x_n\}$ in M-dimensional space. Since each data point is a candidate for cluster center, the density measure at data point $x_i$ is defined as

$$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\|x_i - x_j\|^2}{(r_a/2)^2}\right),$$

where $r_a$ is a positive constant defining a neighborhood radius; it is used to determine the number of cluster centers. A data point therefore has a high density value if it has many neighboring data points. Data points outside the neighborhood radius contribute only slightly to the density measure. Here, $r_a$ is set to 0.5.

Step 2.

After the density measure of each data point has been calculated, the data point with the highest density measure is selected as the first cluster center. Let $X_{c1}$ be the point selected and $D_{c1}$ its density measure. Next, the density measure for each data point $x_i$ is revised as follows:

$$D_i = D_i - D_{c1} \exp\left(-\frac{\|x_i - X_{c1}\|^2}{(r_b/2)^2}\right),$$

where $r_b$ is a positive constant with $r_b = \eta \cdot r_a$, and $\eta$ is a constant greater than 1 that prevents cluster centers from lying too close to each other. In this paper, $r_b$ is set to 0.8.

Step 3.

After the density measure of each data point has been revised, the next cluster center $X_{c2}$ is selected and the density measures of all data points are revised again. The process is repeated until a sufficient number of cluster centers have been generated. In the SCFW method, the cluster centers of each feature are first calculated using subtractive clustering. The ratios of the means of the features to their cluster centers are then calculated, and the data of each feature are multiplied by these ratios [21]. The pseudocode of the SCFW method is given in Algorithm 1, and the flowchart of the weighting process is shown in Figure 1.
Algorithm 1

Pseudocode for weighting features based on subtractive clustering method.

Figure 1

The flowchart of SCFW algorithm.
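
The three steps of subtractive clustering and the mean-to-center weighting rule described in Section 2.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function names `subtractive_centers` and `scfw_weights`, the min-max scaling applied before clustering, the fixed `n_centers` stopping rule, and the use of only the first (highest-density) center of each feature are all choices made for this example.

```python
import numpy as np

def subtractive_centers(x, r_a=0.5, r_b=0.8, n_centers=1):
    """Return cluster centers chosen by subtractive clustering (sketch)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)   # shape (n, M)
    # Assumption: scale each dimension to [0, 1] so the radii are comparable.
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    z = (x - lo) / span
    # Pairwise squared distances between all points.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    # Step 1: density measure of every point.
    density = np.exp(-d2 / (r_a / 2) ** 2).sum(axis=1)
    centers = []
    for _ in range(n_centers):
        c = int(np.argmax(density))          # Step 2: highest density wins
        centers.append(x[c])
        # Steps 2-3: subtract the influence of the selected center.
        density = density - density[c] * np.exp(-d2[:, c] / (r_b / 2) ** 2)
    return np.array(centers)

def scfw_weights(X, **kwargs):
    """Weight of each feature = mean of the feature / its cluster center."""
    X = np.asarray(X, dtype=float)
    weights = np.ones(X.shape[1])
    for j in range(X.shape[1]):
        center = subtractive_centers(X[:, j], **kwargs)[0, 0]
        if center != 0:                      # avoid division by zero
            weights[j] = X[:, j].mean() / center
    return weights
```

Under these assumptions the weighted dataset is obtained as `X * scfw_weights(X)`, which multiplies every column by its own ratio.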

2.2. Kernel-Based Extreme Learning Machine (KELM)

ELM is an algorithm originally developed for training single hidden layer feed-forward neural networks (SLFNs) [26]. The essence of ELM is that the parameters of the hidden neurons are randomly created rather than tuned, which fixes the nonlinearities of the network without iteration. Figure 2 shows the structure of ELM.
Figure 2

The structure of ELM.

For N given samples $(x_i, y_i)$, an ELM with L hidden neurons and activation function h(x) has the output function

$$f(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\beta,$$

where $\beta = [\beta_1, \beta_2, \ldots, \beta_L]$ is the output weight vector connecting the hidden nodes to the output nodes. $H = \{h_{ij}\}$ ($i = 1, \ldots, N$ and $j = 1, \ldots, L$) is the hidden layer output matrix of the neural network. h(x) maps the data from the d-dimensional input space to the L-dimensional hidden layer feature space H, and thus h(x) is indeed a feature mapping. The output weights are determined by the least squares method:

$$\beta = H^{+} T,$$

where $H^{+}$ is the Moore-Penrose generalized inverse [26] of the hidden layer output matrix H and $T = [y_1, \ldots, y_N]^{T}$ collects the training targets. To improve the generalization capability of ELM in comparison with the least squares solution-based ELM, Huang et al. [23] proposed a kernel-based method for the design of ELM. They suggested adding a positive value 1/C (where C is a user-defined parameter) when calculating the output weights, such that

$$\beta = H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1} T.$$

Therefore, the output function is expressed as follows:

$$f(x) = h(x)\beta = h(x) H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1} T.$$

When the hidden feature mapping function h(x) is unknown, a kernel matrix for ELM is used according to the following equation:

$$\Omega_{\mathrm{ELM}} = HH^{T}, \qquad \Omega_{\mathrm{ELM}\,i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j),$$

where $K(x_i, x_j)$ is a kernel function. Many kernel functions, such as linear, polynomial, and radial basis function, can be used in kernel-based ELM. The output function of the KELM classifier can now be expressed as

$$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix}^{T} \left(\frac{I}{C} + \Omega_{\mathrm{ELM}}\right)^{-1} T.$$
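
The kernel form of the output function can be illustrated with a short Python sketch. It is not the implementation used in the paper: the class name `KELM`, the RBF parametrization `exp(-gamma * ||u - v||^2)`, and the encoding of the two classes as targets -1 and +1 are assumptions made for this example. Training reduces to solving one linear system with the matrix I/C + Ω.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2); this parametrization is an assumption."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

class KELM:
    """Kernel-based ELM sketch for two classes with targets in {-1, +1}."""

    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, T):
        self.X = np.asarray(X, dtype=float)
        T = np.asarray(T, dtype=float)
        omega = rbf_kernel(self.X, self.X, self.gamma)   # kernel matrix
        n = len(self.X)
        # alpha = (I/C + Omega)^{-1} T, computed without an explicit inverse.
        self.alpha = np.linalg.solve(np.eye(n) / self.C + omega, T)
        return self

    def decision_function(self, X):
        # f(x) = [K(x, x_1), ..., K(x, x_N)] alpha
        k = rbf_kernel(np.asarray(X, dtype=float), self.X, self.gamma)
        return k @ self.alpha

    def predict(self, X):
        return np.where(self.decision_function(X) >= 0, 1, -1)
```

Only C and the kernel parameter appear in the model, which mirrors the statement that KELM has two parameters to tune and no hidden-layer size.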

3. The Proposed SCFW-KELM Diagnosis System

This work proposes a novel hybrid method for PD diagnosis. The proposed model comprises two stages, as shown in Figure 3. In the first stage, the SCFW algorithm is applied to preprocess the data in the PD dataset. The purpose of this step is to map the features according to their distributions in the dataset and to transform the linearly nonseparable space into a linearly separable one. With this method, similar data within the same feature are gathered together, which substantially helps improve the discrimination ability of classifiers. In the next stage, KELM is evaluated on the weighted feature space with different types of activation functions to perform the classification. Finally, the best parameters and the most suitable activation function are obtained based on the performance analysis. The detailed pseudocode of the hybrid method is given in Algorithm 2.
Figure 3

The overall procedure of the proposed hybrid diagnosis system.

Algorithm 2

Pseudocode for the proposed model.

4. Experimental Design

4.1. Data Description

The experiments were performed on the PD dataset taken from the University of California Irvine (UCI) machine learning repository (http://archive.ics.uci.edu/ml/datasets/Parkinson, last accessed: January 2013). The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech, Denver, Colorado, who recorded the speech signals. The purpose of the PD dataset is to discriminate healthy people from those with PD, given the results of various medical tests carried out on a patient. The PD dataset consists of voice measurements from 31 people, of whom 23 were diagnosed with PD. There are 195 instances in the dataset, comprising 48 healthy and 147 PD cases. The time since diagnosis ranged from 0 to 28 years, and the ages of the subjects ranged from 46 to 85 years (mean 65.8). Each subject provided an average of six phonations of the vowel (yielding 195 samples in total), each 36 seconds in length [6]. Note that there are no missing values in the dataset and all features are real-valued. All 22 features, along with their descriptions, are listed in Table 1.
Table 1

The details of the whole 22 features of the PD dataset.

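| Label | Feature | Description |
|---|---|---|
| F1 | MDVP: Fo (Hz) | Average vocal fundamental frequency |
| F2 | MDVP: Fhi (Hz) | Maximum vocal fundamental frequency |
| F3 | MDVP: Flo (Hz) | Minimum vocal fundamental frequency |
| F4 | MDVP: Jitter (%) | Several measures of variation in fundamental frequency |
| F5 | MDVP: Jitter (Abs) | Several measures of variation in fundamental frequency |
| F6 | MDVP: RAP | Several measures of variation in fundamental frequency |
| F7 | MDVP: PPQ | Several measures of variation in fundamental frequency |
| F8 | Jitter: PPQ | Several measures of variation in fundamental frequency |
| F9 | MDVP: Shimmer | Several measures of variation in amplitude |
| F10 | MDVP: Shimmer (dB) | Several measures of variation in amplitude |
| F11 | Shimmer: APQ3 | Several measures of variation in amplitude |
| F12 | Shimmer: APQ5 | Several measures of variation in amplitude |
| F13 | MDVP: APQ | Several measures of variation in amplitude |
| F14 | Shimmer: DDA | Several measures of variation in amplitude |
| F15 | NHR | Two measures of ratio of noise to tonal components in the voice |
| F16 | HNR | Two measures of ratio of noise to tonal components in the voice |
| F17 | RPDE | Two nonlinear dynamical complexity measures |
| F18 | D2 | Two nonlinear dynamical complexity measures |
| F19 | DFA | Signal fractal scaling exponent |
| F20 | Spread1 | Three nonlinear measures of fundamental frequency variation |
| F21 | Spread2 | Three nonlinear measures of fundamental frequency variation |
| F22 | PPE | Three nonlinear measures of fundamental frequency variation |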

4.2. Experimental Setup

The proposed SCFW-KELM classification model was implemented on the MATLAB 7.0 platform. The SCFW algorithm was implemented from scratch. For KELM and ELM, the implementation available from http://www3.ntu.edu.sg/home/egbhuang/ was used. For SVM, the LIBSVM implementation originally developed by Chang and Lin [27] was used. The experiments were conducted on an Intel Dual-Core (2.0 GHz CPU) machine with 2 GB of RAM. To obtain valid results, k-fold CV with k = 10 was used to evaluate the classification results [28]. The dataset was divided into ten subsets; each time, nine of the ten subsets were put together to form the training set and the remaining subset was used as the test set. The average result across all 10 trials was then calculated. With this method, all the test sets are independent, which improves the reliability of the results. Because the partition of the dataset is arbitrary, the predictions of the model are not necessarily the same from one run to the next. To evaluate the performance on the PD dataset accurately, the experiment was therefore repeated 10 times and the results were averaged.
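
The evaluation protocol (10 runs of 10-fold CV, averaged) can be sketched as below. This is an illustrative outline rather than the authors' MATLAB code: `kfold_indices`, `repeated_cv`, and the callable `evaluate(train_idx, test_idx)` are hypothetical names, and the folds are not stratified by class because the text does not say whether stratification was used.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for one random k-fold partition."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # arbitrary partition of the data
    folds = [idx[i::k] for i in range(k)]     # k disjoint, near-equal subsets
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def repeated_cv(n_samples, evaluate, k=10, repeats=10):
    """Average a score over `repeats` independent runs of k-fold CV.

    `evaluate(train, test)` is a user-supplied (hypothetical) callable that
    trains a model on `train` and returns its score on `test`.
    """
    run_means = []
    for rep in range(repeats):
        scores = [evaluate(tr, te) for tr, te in kfold_indices(n_samples, k, seed=rep)]
        run_means.append(sum(scores) / k)
    return sum(run_means) / repeats
```

For the PD dataset one would call `repeated_cv(195, evaluate)` with an `evaluate` that fits the classifier on the training indices and returns, for example, the test accuracy.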

4.3. Measure for Performance Evaluation

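To evaluate the prediction performance of the SCFW-KELM model, we used six performance metrics: ACC, sensitivity, specificity, AUC, f-measure, and kappa statistic value. Based on the confusion matrix shown in Table 2, the metrics are defined as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%,$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100\%, \qquad \mathrm{Specificity} = \frac{TN}{FP + TN} \times 100\%,$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$

$$f\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
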
Table 2

The confusion matrix.

| | Predicted patients with PD | Predicted healthy persons |
|---|---|---|
| Actual patients with PD | True positive (TP) | False negative (FN) |
| Actual healthy persons | False positive (FP) | True negative (TN) |

In the confusion matrix, TP is the number of true positives, that is, cases of the PD class correctly classified as PD. FN is the number of false negatives, that is, cases of the PD class classified as healthy. TN is the number of true negatives, that is, cases of the healthy class correctly classified as healthy, and FP is the number of false positives, that is, cases of the healthy class classified as PD. ACC is a widely used metric for the class discrimination ability of classifiers. The receiver operating characteristic (ROC) curve is usually plotted as the true positive rate versus the false positive rate as the discrimination threshold of the classification algorithm is varied. The area under the ROC curve (AUC) is widely accepted in classification studies and is a good summary of the performance of a classifier [29]. The f-measure is a measure of a test's accuracy that is commonly used to assess binary classifiers; it is the harmonic mean of the classifier's precision and recall. Kappa error (KE), or Cohen's kappa statistic (KS), is adopted to compare the performance of different classifiers. KS is a good measure for detecting classifications that may be due to chance. The closer the KS value of a classifier is to 1, the less likely it is that its performance arose by chance. KS is therefore a recommended metric for the performance analysis of classifiers, and it is calculated as [30]

$$KS = \frac{P(A) - P(E)}{1 - P(E)},$$

where P(A) is the total agreement probability and P(E) is the agreement probability due to chance.
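
As a worked example, the confusion-matrix-based metrics can be computed in a few lines of Python. The function name `diagnostic_metrics` is hypothetical, and the chance agreement P(E) is computed from the row and column totals of the confusion matrix, which is the usual definition of Cohen's kappa and is assumed here. AUC is omitted because it requires continuous classifier scores rather than counts.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Metrics of Section 4.3 from the four confusion-matrix counts."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    sensitivity = tp / (tp + fn)          # also the recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Kappa: observed agreement P(A) versus chance agreement P(E),
    # the latter taken from the row and column totals.
    p_a = acc
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (p_a - p_e) / (1 - p_e)
    return {
        "acc": acc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }

# Counts reported for SCFW-KELM in Table 6: TP = 146, FN = 1, FP = 0, TN = 48.
print(diagnostic_metrics(tp=146, fn=1, fp=0, tn=48))
```

With the Table 6 counts this gives an accuracy of about 99.49%, an f-measure of about 0.9966, and a kappa of about 0.9863, in line with the values reported for SCFW-KELM. Sensitivity and specificity computed from pooled counts in this way need not equal the fold-averaged values given in the result tables.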

5. Experimental Results and Discussions

Experiment 1 (classification in the PD dataset).

In this experiment, we first evaluated KELM in the original feature space without SCFW. The type of kernel function is known to have a great influence on the performance of KELM. We therefore investigated the influence of different types of kernel functions, assigning initial values to their parameters. Four types of kernel functions were tried: the radial basis function (RBF_kernel), the wavelet kernel function (Wav_kernel), the linear kernel function (Lin_kernel), and the polynomial kernel function (Poly_kernel). Table 3 summarizes the classification performance on the PD dataset in terms of ACC, sensitivity, specificity, AUC, f-measure, and KS value; the results were obtained via the 10-fold CV scheme and are reported as the average (Mean), standard deviation (SD), maximum (Max), and minimum (Min). The table shows that the classification performance of KELM differs markedly across kernel functions. The best kernel function for discriminating the PD dataset was the RBF kernel: KELM with the RBF kernel outperformed KELM with the other three kernels, with mean values of 95.89%, 96.35%, 95.72%, and 96.04% for ACC, sensitivity, specificity, and AUC, an f-measure of 0.9724, and a KS value of 0.8925. KELM with the wavelet kernel obtained average results of 94.36%, 91.24%, 95.25%, and 93.19% for ACC, sensitivity, specificity, and AUC, with an f-measure of 0.9622 and a KS value of 0.8425, all lower than those of KELM with the RBF kernel. KELM with the polynomial kernel and KELM with the linear kernel gave successively worse results. Note that because KELM is trained with kernel functions rather than the sigmoid additive function of ELM, the number of hidden neurons has no influence on the performance of the KELM model and does not need to be considered.
Table 3

Results of KELM with different types of kernel functions in the original PD dataset without SCFW.

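| Kernel type | Performance metric | Mean | SD | Max | Min |
|---|---|---|---|---|---|
| RBF_kernel | ACC (%) | 95.89 | 4.66 | 100 | 89.74 |
| RBF_kernel | Sensitivity (%) | 96.35 | 5.19 | 100 | 88.89 |
| RBF_kernel | Specificity (%) | 95.72 | 5.93 | 100 | 88.00 |
| RBF_kernel | AUC (%) | 96.04 | 4.06 | 100 | 90.43 |
| RBF_kernel | f-measure | 0.9724 | | | |
| RBF_kernel | Kappa | 0.8925 | | | |
| Wav_kernel | ACC (%) | 94.36 | 4.59 | 100 | 87.18 |
| Wav_kernel | Sensitivity (%) | 91.24 | 6.02 | 100 | 83.33 |
| Wav_kernel | Specificity (%) | 95.15 | 5.23 | 100 | 86.21 |
| Wav_kernel | AUC (%) | 93.19 | 4.56 | 100 | 88.10 |
| Wav_kernel | f-measure | 0.9622 | | | |
| Wav_kernel | Kappa | 0.8425 | | | |
| Lin_kernel | ACC (%) | 89.23 | 7.99 | 97.44 | 79.49 |
| Lin_kernel | Sensitivity (%) | 66.07 | 22.33 | 90.91 | 41.67 |
| Lin_kernel | Specificity (%) | 97.32 | 2.80 | 100 | 93.33 |
| Lin_kernel | AUC (%) | 81.70 | 12.22 | 95.45 | 68.89 |
| Lin_kernel | f-measure | 0.9316 | | | |
| Lin_kernel | Kappa | 0.6333 | | | |
| Poly_kernel | ACC (%) | 90.77 | 4.29 | 97.44 | 87.18 |
| Poly_kernel | Sensitivity (%) | 87.73 | 11.54 | 100 | 75.00 |
| Poly_kernel | Specificity (%) | 91.83 | 5.73 | 96.77 | 82.76 |
| Poly_kernel | AUC (%) | 89.78 | 5.78 | 98.39 | 82.66 |
| Poly_kernel | f-measure | 0.9375 | | | |
| Poly_kernel | Kappa | 0.7547 | | | |
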
To investigate whether the SCFW method can improve the performance of KELM, we further evaluated the model on the PD dataset in the feature space weighted by SCFW. The proposed system consists of two stages. First, the SCFW approach was used to weight the features of the PD dataset, which constructed the weighted feature space. Table 4 lists the cluster centers of the features in the PD dataset obtained with the SCFW method. Figure 4 depicts the box graph representation of the original and weighted PD dataset with all 22 features. Figure 5 shows the distribution of the two classes of the original and weighted 195 samples along the best three principal components obtained with the principal component analysis (PCA) algorithm [31]. Figures 4 and 5 show that the discriminative ability of the original PD dataset was improved substantially by the SCFW approach. After the data preprocessing stage, the classification algorithms were applied to discriminate the weighted PD dataset.
Table 4

The cluster centers of the features of PD dataset using SCFW method.

| Feature | Center of the feature using SCFW (normal case) | Center of the feature using SCFW (PD case) |
|---|---|---|
| F1 | 154.229 | 181.938 |
| F2 | 197.105 | 223.637 |
| F3 | 116.325 | 145.207 |
| F4 | 0.006 | 0.006 |
| F5 | 0 | 0 |
| F6 | 0.003 | 0.003 |
| F7 | 0.003 | 0.003 |
| F8 | 0.01 | 0.01 |
| F9 | 0.03 | 0.03 |
| F10 | 0.282 | 0.276 |
| F11 | 0.016 | 0.015 |
| F12 | 0.018 | 0.018 |
| F13 | 0.024 | 0.013 |
| F14 | 0.047 | 0.045 |
| F15 | 0.025 | 0.028 |
| F16 | 21.886 | 24.678 |
| F17 | 0.499 | 0.443 |
| F18 | 0.718 | 0.696 |
| F19 | −5.684 | −6.759 |
| F20 | 0.227 | 0.161 |
| F21 | 2.382 | 2.155 |
| F22 | 0.207 | 0.123 |

Figure 4

The box graph representation of the original and weighted PD dataset.

Figure 5

Three-dimensional distribution of two classes in the original and weighted feature space by the best three principal components obtained with PCA method.
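
A projection onto the three leading principal components, as used for Figure 5, can be sketched as follows. The function name `pca_project` is hypothetical, and the sketch only centers the features before the decomposition; whether the authors also standardized the features is not stated, so no scaling is applied here.

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project the rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # shape (n_samples, n_components)
```

Applied to the 195 x 22 data matrix, before and after weighting, this yields the three coordinates per sample that a 3D scatter plot of the two classes would display.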

The detailed results obtained by SCFW-KELM with four types of different kernel functions were presented in Table 5. As seen from Table 5, all these best results were much higher than the ones obtained in the original feature space without SCFW. The classification performance in the PD dataset has significantly improved by using SCFW method. Compared with KELM with RBF kernel function in the original feature space, KELM with RBF kernel based on SCFW method increased the performance by 3.6%, 3.65%, 3.67%, and 3.65% in terms of ACC, sensitivity, specificity, and AUC and has obtained highest f-measure value of 0.9966 and highest KS value of 0.9863. The KELM models with the other three kernel functions also have got great improvements in terms of six performance metrics.
Table 5

Results of SCFW-KELM with different types of kernel functions in the PD dataset.

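| Kernel type | Performance metric | Mean | SD | Max | Min |
|---|---|---|---|---|---|
| RBF_kernel | ACC (%) | 99.49 | 1.15 | 100 | 97.44 |
| RBF_kernel | Sensitivity (%) | 100 | 0 | 100 | 100 |
| RBF_kernel | Specificity (%) | 99.39 | 1.36 | 100 | 96.97 |
| RBF_kernel | AUC (%) | 99.69 | 0.68 | 100 | 98.48 |
| RBF_kernel | f-measure | 0.9966 | | | |
| RBF_kernel | Kappa | 0.9863 | | | |
| Wav_kernel | ACC (%) | 96.92 | 2.15 | 100 | 94.87 |
| Wav_kernel | Sensitivity (%) | 98.46 | 3.44 | 100 | 92.31 |
| Wav_kernel | Specificity (%) | 96.54 | 2.39 | 100 | 93.33 |
| Wav_kernel | AUC (%) | 97.50 | 2.18 | 100 | 94.23 |
| Wav_kernel | f-measure | 0.9793 | | | |
| Wav_kernel | Kappa | 0.9194 | | | |
| Lin_kernel | ACC (%) | 96.92 | 2.15 | 100 | 94.87 |
| Lin_kernel | Sensitivity (%) | 90.43 | 8.85 | 100 | 81.82 |
| Lin_kernel | Specificity (%) | 99.29 | 1.60 | 100 | 96.43 |
| Lin_kernel | AUC (%) | 94.86 | 3.99 | 100 | 90.91 |
| Lin_kernel | f-measure | 0.9798 | | | |
| Lin_kernel | Kappa | 0.9147 | | | |
| Poly_kernel | ACC (%) | 97.43 | 2.56 | 100 | 94.87 |
| Poly_kernel | Sensitivity (%) | 96.67 | 7.45 | 100 | 83.33 |
| Poly_kernel | Specificity (%) | 97.37 | 3.61 | 100 | 93.10 |
| Poly_kernel | AUC (%) | 97.02 | 3.42 | 100 | 91.67 |
| Poly_kernel | f-measure | 0.9828 | | | |
| Poly_kernel | Kappa | 0.9323 | | | |
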
Table 6 presents the confusion matrices obtained by SCFW-KELM and KELM. As seen from Table 6, SCFW-KELM correctly classified 194 of the 195 cases and misclassified only one patient with PD as a healthy person, while KELM without the SCFW method correctly classified only 187 of the 195 cases, misclassifying 6 patients with PD as healthy persons and 2 healthy persons as patients with PD.
Table 6

Confusion matrix of KELM with RBF kernel function in the original and weighted PD dataset.

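| Method | Expected output | Predicted as patients with PD | Predicted as healthy persons |
|---|---|---|---|
| KELM | Patients with PD | 141 | 6 |
| KELM | Healthy persons | 2 | 46 |
| SCFW-KELM | Patients with PD | 146 | 1 |
| SCFW-KELM | Healthy persons | 0 | 48 |
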
For the SVM classifier, we used SVM with the RBF kernel. The performance of SVM is known to be sensitive to the combination of the penalty parameter C and the kernel parameter γ, so the best combination of (C, γ) needs to be selected for the classification task. Instead of setting the parameters (C, γ) manually, the grid-search technique [32] was adopted with 10-fold CV to find the best parameter values. The parameters were varied over the ranges $C = [2^{-15}, 2^{-14}, \ldots, 2^{11}]$ and $\gamma = [2^{-15}, 2^{-14}, \ldots, 2^{5}]$. All combinations of (C, γ) were tried, and the one with the best classification accuracy was chosen as the parameter values of the RBF kernel for training the model. For the original ELM, the classification performance with the sigmoid additive function is sensitive to the number of hidden neurons L, so the value of L needs to be specified by the user. Figure 6 presents the detailed results of ELM on the original and weighted PD dataset with the number of hidden neurons ranging from 1 to 50. Specifically, the average results of 10 runs of 10-fold CV were recorded for every number of neurons. As shown in Figure 6, the classification rates of ELM first improved as the number of hidden neurons increased and then fluctuated. On the original dataset, ELM achieved its highest mean classification accuracy with 40 hidden neurons, while on the dataset weighted with the SCFW method the highest mean classification accuracy was reached with only 26 hidden neurons.
Figure 6

The effects of hidden neurons in original ELM in the classification of the original and weighted PD dataset.

For the KNN classifier, we investigated the influence of the neighborhood size k on the classification performance on the PD dataset. In this study, the value of k was increased from 1 to 10. The results obtained from the KNN classifier with different values of k are shown in Figure 7. The figure shows that on the original dataset the best results were obtained by the 1-NN classifier and the performance decreased as k increased, whereas on the PD dataset weighted with the SCFW method the better results were achieved by 2-NN.
Figure 7

The effects of k in KNN in the classification of the original and weighted PD dataset.

For the KELM classifier, two parameters need to be specified: the penalty parameter C and the kernel parameter γ. In this study, the experiments on KELM used the best combination of (C, γ) found by the grid-search strategy. The parameters C and γ were both varied over the range $[2^{-15}, 2^{-14}, \ldots, 2^{15}]$, with a step size of 1 in the exponent. Figure 8 shows the classification accuracy surface in one run of the 10-fold CV procedure, where the x-axis and y-axis are $\log_2 C$ and $\log_2 \gamma$, respectively. Each mesh node in the (x, y) plane represents a parameter combination, and the z-axis denotes the test accuracy achieved with that combination.
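
The grid search over powers of two can be outlined in a few lines of Python. This is a sketch, not the authors' procedure in detail: `grid_search` is a hypothetical name, and `score(C, gamma)` stands for any user-supplied function that returns the mean 10-fold CV accuracy for a given parameter pair. Ties are resolved in favor of the first combination encountered, which is an arbitrary choice.

```python
from itertools import product

def grid_search(score, log2_C=range(-15, 16), log2_gamma=range(-15, 16)):
    """Return the (C, gamma) pair with the highest score on a log2 grid.

    `score(C, gamma)` is a hypothetical callable, e.g. the mean accuracy of
    10-fold CV for a KELM trained with these two parameters.
    """
    best_pair, best_score = None, float("-inf")
    for a, b in product(log2_C, log2_gamma):      # exponents with step size 1
        C, gamma = 2.0 ** a, 2.0 ** b
        s = score(C, gamma)
        if s > best_score:                        # keep the first best on ties
            best_pair, best_score = (C, gamma), s
    return best_pair
```

With the default ranges the grid contains 31 x 31 = 961 combinations, each of which requires one full cross-validation run.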
Figure 8

Test accuracy surface with parameters in KELM in the original and weighted PD dataset.

Table 7 summarized the comprehensive results achieved from four classifiers and those based on SCFW method in terms of ACC, sensitivity, specificity, AUC, f-measure, and KS value over 10 runs of 10-fold CV. Besides, the sum of computational time of training and that of testing in seconds was recorded. In this table, we can see that, with the aid of SCFW method, all these best results were much higher than the ones obtained in the original feature space. The SCFW-KELM model has achieved highest results of 99.49%, 100%, 99.39%, and 99.69% in terms of ACC, sensitivity, specificity, and AUC and got highest f-measure of 0.9966 and KS value of 0.9863, which outperforms the other three algorithms. Compared with KELM without SCFW, SCFW-KELM has improved the average performance by 3.6%, 3.65%, 3.67%, and 3.65% in terms of ACC, sensitivity, specificity, and AUC. Note that the running time of SCFW-KELM was extremely short, which costs only 0.0126 seconds.
Table 7

The results obtained from four algorithms in the original and weighted PD dataset.

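| Method | Performance metric | Original feature space (without SCFW) | Weighted feature space (with SCFW) |
|---|---|---|---|
| KELM-RBF | ACC (%) | 95.89 ± 4.66 | 99.49 ± 1.15 |
| KELM-RBF | Sensitivity (%) | 96.35 ± 5.19 | 100 ± 0 |
| KELM-RBF | Specificity (%) | 95.72 ± 5.93 | 99.39 ± 1.36 |
| KELM-RBF | AUC (%) | 96.04 ± 4.06 | 99.69 ± 0.68 |
| KELM-RBF | f-measure | 0.9724 | 0.9966 |
| KELM-RBF | Kappa | 0.8925 | 0.9863 |
| KELM-RBF | Time (s) | 0.00435 | 0.0126 |
| SVM | ACC (%) | 95.38 ± 1.15 | 97.95 ± 2.15 |
| SVM | Sensitivity (%) | 85.09 ± 10.45 | 96.67 ± 7.45 |
| SVM | Specificity (%) | 98.67 ± 2.98 | 98.71 ± 1.77 |
| SVM | AUC (%) | 91.88 ± 4.14 | 97.69 ± 3.46 |
| SVM | f-measure | 0.9699 | 0.9863 |
| SVM | Kappa | 0.8711 | 0.9447 |
| SVM | Time (s) | 1.24486 | 1.29817 |
| KNN | ACC (%) | 95.38 ± 5.25 | 97.43 ± 3.14 |
| KNN | Sensitivity (%) | 92.73 ± 11.85 | 97.78 ± 4.97 |
| KNN | Specificity (%) | 96.50 ± 4.38 | 97.38 ± 4.10 |
| KNN | AUC (%) | 94.61 ± 6.95 | 97.58 ± 2.60 |
| KNN | f-measure | 0.9692 | 0.9828 |
| KNN | Kappa | 0.8765 | 0.9431 |
| KNN | Time (s) | 1.2847 | 1.3226 |
| ELM | ACC (%) | 89.23 ± 6.88 | 96.92 ± 4.21 |
| ELM | Sensitivity (%) | 73.94 ± 13.18 | 95.78 ± 5.79 |
| ELM | Specificity (%) | 93.35 ± 6.27 | 97.19 ± 4.51 |
| ELM | AUC (%) | 83.64 ± 9.06 | 96.48 ± 4.36 |
| ELM | f-measure | 83.64 ± 9.06 | 0.9863 |
| ELM | Kappa | 0.7078 | 0.9447 |
| ELM | Time (s) | 1.1437 | 1.2207 |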
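For reference, the threshold-based metrics reported in Table 7 can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions and treats one class as "positive"; which class the study treats as positive is an assumption here, and the function name and the example counts are hypothetical. AUC is omitted because it requires classifier scores rather than counts.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute ACC, sensitivity, specificity, f-measure, and Cohen's kappa."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Kappa: observed agreement (acc) corrected for agreement expected by chance.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (acc - p_chance) / (1 - p_chance)
    return {
        "acc": acc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }

if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    print(binary_metrics(tp=40, fn=10, fp=5, tn=45))
```

Because kappa subtracts chance agreement, it is a stricter summary than ACC when the two classes are imbalanced.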
In comparison with SVM, SCFW-SVM achieved 97.95%, 96.67%, 98.71%, and 97.69% in terms of ACC, sensitivity, specificity, and AUC, improving the performance by 2.57, 11.58, 0.04, and 5.81 percentage points, respectively. KNN was also significantly improved by the SCFW method. For the ELM classifier, the best results on the original feature space were achieved with 36 hidden neurons, while SCFW-ELM achieved its best performance with fewer hidden neurons (only 26). This means that combining SCFW with ELM not only significantly improved the performance but also made the network structure of ELM more compact. Moreover, the sensitivity of SVM and ELM improved by 11.58 and 21.84 percentage points, respectively.

In both the original and the weighted feature space, KELM with the RBF kernel outperformed the other three models in terms of ACC, sensitivity, AUC, f-measure, and KS value. Although SVM achieved a specificity of 98.67% in the original feature space, its sensitivity, AUC, f-measure, and KS value were lower than those of KELM with the RBF kernel. KELM with the RBF kernel also performed much better than ELM with the sigmoid function. The reason may be that the relation between class labels and features in the PD dataset is linearly nonseparable; a kernel-based strategy works better in this case because it maps the linearly nonseparable data into a space in which they become linearly separable.

However, the performance of the SCFW-SVM approach was close to that of SCFW-KNN, which means that, after data preprocessing, SVM achieved a similar ability to discriminate the PD dataset as KNN. Additionally, it is interesting that the standard deviation of SCFW-KELM was much lower than that of KELM and was the smallest among all of the models, which suggests that the SCFW method made KELM more robust and reliable.
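The point about linear nonseparability can be illustrated on a toy problem. The sketch below is not an experiment from the study: the four one-dimensional points, the labels, and the parameter values C = 2^10 and γ = 4 are hypothetical, and the KELM solution used is the standard closed form beta = (I/C + K)^-1 T. The inner points belong to one class and the outer points to the other, so no single threshold on the line separates them.

```python
import numpy as np

def kelm_train_accuracy(K, y, C=2.0 ** 10):
    """Fit a KELM on kernel matrix K and return its accuracy on the same points."""
    T = np.eye(2)[y]                                   # one-hot targets
    beta = np.linalg.solve(np.eye(len(y)) / C + K, T)  # closed-form solution
    return float(((K @ beta).argmax(axis=1) == y).mean())

# Toy data: class 1 lies between the two class-0 points.
X = np.array([[-3.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 1, 1, 0])

d2 = (X - X.T) ** 2                        # pairwise squared distances
acc_linear = kelm_train_accuracy(X @ X.T, y)           # linear kernel
acc_rbf = kelm_train_accuracy(np.exp(-4.0 * d2), y)    # RBF kernel, gamma = 4

print("linear:", acc_linear, "rbf:", acc_rbf)
```

With the linear kernel the decision depends only on the sign of a single weight times x, so the two inner points cannot both be assigned to class 1; the RBF kernel has no such restriction and fits all four training points.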
In addition, the SCFW method may outperform FCM because it is more suitable for linearly nonseparable datasets: it uses a density measure of the data points to reduce the influence of outliers, whereas FCM tends to select outliers as initial centers. For comparison purposes, the classification accuracies achieved by previous methods for the PD diagnosis problem are presented in Table 8. As shown in the table, the developed method obtains better classification results than all of the methods proposed in previous studies.
Table 8

Classification accuracies achieved with our method and other methods.

| Study | Method | Accuracy (%) |
|---|---|---|
| Little et al. [6] | Preselection filter + exhaustive search + SVM | 91.40 (bootstrap with 50 replicates) |
| Shahbaba and Neal [7] | Dirichlet process mixtures | 87.70 (5-fold CV) |
| Das [8] | ANN | 92.90 (hold out) |
| Sakar and Kursun [9] | Mutual information + SVM | 92.75 (bootstrap with 50 replicates) |
| Psorakis et al. [10] | Improved mRVMs | 89.47 (10-fold CV) |
| Guo et al. [11] | GP-EM | 93.10 (10-fold CV) |
| Luukka [12] | Fuzzy entropy measures + similarity | 85.03 (hold out) |
| Ozcift and Gulten [14] | CFS-RF | 87.10 (10-fold CV) |
| Li et al. [13] | Fuzzy-based nonlinear transformation + SVM | 93.47 (hold out) |
| Åström and Koker [15] | Parallel NN | 91.20 (hold out) |
| Spadoto et al. [16] | PSO + OPF | 73.53 (hold out) |
| Spadoto et al. [16] | Harmony search + OPF | 84.01 (hold out) |
| Spadoto et al. [16] | Gravitational search + OPF | 84.01 (hold out) |
| Daliri [19] | SVM with chi-square distance kernel | 91.20 (50-50% training-testing) |
| Polat [17] | FCMFW + KNN | 97.93 (50-50% training-testing) |
| Chen et al. [18] | PCA-FKNN | 96.07 (average 10-fold CV) |
| Zuo et al. [20] | PSO-FKNN | 97.47 (10-fold CV) |
| This study | SCFW-KELM | 99.49 (10-fold CV) |
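The density measure used by subtractive clustering, mentioned above in the comparison of SCFW with FCM, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses the usual subtractive-clustering potential, P_i = Σ_j exp(−‖x_i − x_j‖² / (r/2)²), on features rescaled to [0, 1]. The radius r = 0.5, the function names, and the synthetic data are hypothetical, and only the selection of the first center is shown (the later centers and the feature-weighting step are not reproduced).

```python
import numpy as np

def density_potential(X, radius=0.5):
    """Potential of each point: large when many other points lie close to it."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (radius / 2.0) ** 2).sum(axis=1)

def first_center(X, radius=0.5):
    """Return the index of the highest-potential point and all potentials."""
    # Rescale every feature to [0, 1] so that one radius applies to all features.
    span = X.max(axis=0) - X.min(axis=0)
    Xn = (X - X.min(axis=0)) / np.where(span > 0, span, 1.0)
    p = density_potential(Xn, radius)
    return int(p.argmax()), p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: one tight cluster of 20 points plus a single distant outlier.
    X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), [[10.0, 10.0]]])
    idx, p = first_center(X)
    print("first center index:", idx, "outlier potential:", p[-1])
```

Because an isolated point accumulates almost no potential from its neighbors, it is not chosen as a center, which is the sense in which a density measure reduces the influence of outliers.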

Experiment 2 (classification in two other benchmark datasets).

Besides the PD dataset, two benchmark datasets from the UCI machine learning repository, the Cleveland Heart and Wisconsin Diagnostic Breast Cancer (WDBC) datasets, were used to further evaluate the efficiency and effectiveness of the proposed method. The same experimental flow as for the PD dataset was used for both datasets: the weighted feature space of each dataset was constructed using SCFW, and the weighted features were then evaluated with the four algorithms mentioned above. For the sake of convenience, only the classification results of the four algorithms are given. Table 9 shows the results obtained by the SCFW-KELM model in the classification of the original and weighted Cleveland Heart dataset, and Table 10 presents the results achieved by the SCFW-KELM model in the classification of the original and weighted WDBC dataset. As these results show, the proposed method also achieved excellent results on both datasets, which indicates its generality.
Table 9

Results of SCFW-KELM with different types of kernel functions in Cleveland heart dataset.

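| Kernel type | Performance metric | Mean | SD | Max | Min |
|---|---|---|---|---|---|
| RBF_kernel | ACC (%) | 99.34 | 0.91 | 100 | 98.33 |
| RBF_kernel | Sensitivity (%) | 100 | 0 | 100 | 100 |
| RBF_kernel | Specificity (%) | 98.75 | 1.72 | 100 | 96.67 |
| RBF_kernel | AUC (%) | 99.37 | 0.86 | 100 | 98.33 |
| RBF_kernel | f-measure | 0.9964 | | | |
| RBF_kernel | Kappa | 0.9867 | | | |
| Wav_kernel | ACC (%) | 99.01 | 0.90 | 100 | 98.36 |
| Wav_kernel | Sensitivity (%) | 100 | 0 | 100 | 100 |
| Wav_kernel | Specificity (%) | 97.84 | 2.02 | 100 | 95.83 |
| Wav_kernel | AUC (%) | 98.92 | 1.01 | 100 | 97.92 |
| Wav_kernel | f-measure | 0.9891 | | | |
| Wav_kernel | Kappa | 0.98 | | | |
| Lin_kernel | ACC (%) | 93.07 | 93.07 | 93.07 | 93.07 |
| Lin_kernel | Sensitivity (%) | 98.77 | 98.77 | 98.77 | 98.77 |
| Lin_kernel | Specificity (%) | 87.05 | 87.05 | 87.05 | 87.05 |
| Lin_kernel | AUC (%) | 92.91 | 92.91 | 92.91 | 92.91 |
| Lin_kernel | f-measure | 0.9195 | | | |
| Lin_kernel | Kappa | 0.8591 | | | |
| Poly_kernel | ACC (%) | 98.35 | 2.33 | 100 | 95.08 |
| Poly_kernel | Sensitivity (%) | 100 | 0 | 100 | 100 |
| Poly_kernel | Specificity (%) | 96.60 | 5.01 | 100 | 88.89 |
| Poly_kernel | AUC (%) | 98.30 | 2.50 | 100 | 94.44 |
| Poly_kernel | f-measure | 0.9817 | | | |
| Poly_kernel | Kappa | 0.9667 | | | |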
Table 10

Results of SCFW-KELM with different types of kernel functions in WDBC dataset.

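| Kernel type | Performance metric | Mean | SD | Max | Min |
|---|---|---|---|---|---|
| RBF_kernel | ACC (%) | 99.65 | 0.79 | 100 | 98.23 |
| RBF_kernel | Sensitivity (%) | 99.05 | 2.13 | 100 | 95.24 |
| RBF_kernel | Specificity (%) | 100 | 0 | 100 | 100 |
| RBF_kernel | AUC (%) | 99.52 | 1.06 | 100 | 97.62 |
| RBF_kernel | f-measure | 0.9972 | | | |
| RBF_kernel | Kappa | 0.9925 | | | |
| Wav_kernel | ACC (%) | 99.65 | 0.48 | 100 | 99.12 |
| Wav_kernel | Sensitivity (%) | 99.10 | 1.24 | 100 | 97.62 |
| Wav_kernel | Specificity (%) | 100 | 0 | 100 | 100 |
| Wav_kernel | AUC (%) | 99.54 | 0.66 | 100 | 98.65 |
| Wav_kernel | f-measure | 0.9958 | | | |
| Wav_kernel | Kappa | 0.9925 | | | |
| Lin_kernel | ACC (%) | 98.07 | 1.69 | 100 | 95.61 |
| Lin_kernel | Sensitivity (%) | 94.70 | 5.27 | 100 | 86.11 |
| Lin_kernel | Specificity (%) | 100 | 0 | 100 | 100 |
| Lin_kernel | AUC (%) | 97.35 | 2.63 | 100 | 93.06 |
| Lin_kernel | f-measure | 0.9848 | | | |
| Lin_kernel | Kappa | 0.9582 | | | |
| Poly_kernel | ACC (%) | 99.40 | 0.88 | 99.12 | 97.37 |
| Poly_kernel | Sensitivity (%) | 95.33 | 2.07 | 97.73 | 93.48 |
| Poly_kernel | Specificity (%) | 100 | 0 | 100 | 100 |
| Poly_kernel | AUC (%) | 97.67 | 1.04 | 98.86 | 96.74 |
| Poly_kernel | f-measure | 0.9944 | | | |
| Poly_kernel | Kappa | 0.962 | | | |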

6. Conclusions and Future Work

In this work, we have developed a new hybrid diagnosis method for addressing the PD problem. The main novelty of this paper lies in the proposed approach: the combination of the SCFW method and KELM with different types of kernel functions allows PD to be detected in an efficient and fast manner. Experimental results have demonstrated that the proposed system performs very well in discriminating patients with PD from healthy subjects. Comparative experiments were also conducted among KELM, SVM, KNN, and ELM, and the results show that the SCFW-KELM method outperforms the other three methods in terms of ACC, sensitivity, specificity, AUC, f-measure, and kappa statistic value. In addition, the proposed system outperforms the existing methods proposed in the literature. This empirical analysis indicates that the proposed method can be used as a promising alternative tool in medical decision making for PD diagnosis. Future work will focus on evaluating the proposed method on other medical diagnosis problems.