
New fuzzy support vector machine for the class imbalance problem in medical datasets classification.

Xiaoqing Gu1, Tongguang Ni1, Hongyuan Wang1.   

Abstract

In medical datasets classification, the support vector machine (SVM) is considered one of the most successful methods. However, most real-world medical datasets contain some outliers/noise, and the data often suffer from class imbalance. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM obtained by extending manifold regularization and assigning two misclassification costs for the two classes. The proposed FSVM-CIP can handle the class imbalance problem in the presence of outliers/noise and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show that FSVM-CIP outperforms, or is comparable to, related methods.


Year:  2014        PMID: 24790571      PMCID: PMC3982259          DOI: 10.1155/2014/536434

Source DB:  PubMed          Journal:  ScientificWorldJournal        ISSN: 1537-744X


1. Introduction

Computer techniques such as machine learning and pattern recognition have been widely adopted in modern medicine. One reason is that an enormous amount of data has to be gathered and analyzed, which is very hard or even impossible without computer techniques. Another reason is that computer techniques have enabled the digital analysis of pathological diagnoses and the automatic classification and detection of diseases. In some cases, the early symptoms of a disease are mild and give no obvious pointer to a possible diagnosis; moreover, many symptoms look very similar to each other even though they are caused by different diseases, so it may be difficult even for experienced doctors to make a correct diagnosis. An automatic classification system can therefore help doctors diagnose accurately, assess disorders remotely, and evaluate the treatment process [1]. In recent years, researchers have proposed many approaches for medical data classification, such as neural networks, Bayesian networks, and support vector machines (SVMs). Among them, SVM is considered one of the most successful [2]. For example, to improve time and accuracy in differentiating diffuse interstitial lung disease for computer-aided quantification, a hierarchical SVM was introduced which shows promise for various real-time and online image-based classification applications in clinical fields [3]. An SVM classifier was used for liver disorders, and its correct classification rate was highly competitive compared with other reported results [4]. A two-stage approach was proposed for medical datasets classification, in which the artificial bee colony algorithm is used for feature selection and SVM is used for classification [5]. The support vector machine proposed by Vapnik [6, 7] is an effective approach to pattern recognition problems.
SVM maps the sample points into a high-dimensional feature space and seeks an optimal separating hyperplane by maximizing the margin between the two classes. Training an SVM amounts to a quadratic programming (QP) problem, so the solution obtained is globally unique, and the sparsity of the solution supports good generalization. However, most real-world medical datasets contain some outliers and noisy examples, and the classical SVM is very sensitive to outliers/noise. To solve this problem, the fuzzy support vector machine (FSVM) [8] was proposed, in which each sample is given a fuzzy membership that denotes the attitude of the corresponding point toward one class; the membership represents how important the sample is to the decision surface. Nevertheless, many medical datasets are composed of "normal" samples with only a small percentage of "abnormal" ones, which leads to the so-called class imbalance problem. FSVM does not take the class distribution into consideration and can be sensitive to class imbalance. As a result, the hyperplane of FSVM can be skewed towards the minority class, and this skewness can degrade the performance of FSVM with respect to the minority class. To tackle this problem, Veropoulos et al. [9] proposed the different error costs (DEC) method, in which the SVM objective function is modified to assign two different misclassification cost values. One-class classification [10, 11] is sometimes used for novelty detection, as it only uses the normal training data; however, in many real medical datasets abnormal examples do exist, although they are very few. Furthermore, in classification tasks, the scatter matrix can play an important role when incorporated with the local intrinsic geometry structure of the samples [12], and some methods have recently been proposed to incorporate the structure of the data distribution into SVM.
A linear manifold learning method named locality preserving projection (LPP) was proposed in [13, 14], which aims at preserving the local manifold structure of the sample space. Although LPP enhances the local data compactness within each manifold, it does not separate manifolds with different class labels. In this paper, we propose a new FSVM method for the class imbalance problem (FSVM-CIP) which addresses both class imbalance and outliers/noise. FSVM-CIP not only considers the fuzziness of each training sample but also extends manifold regularization and maximizes the localized relative margin. It takes the positive and negative samples into consideration with different misclassification costs according to their imbalanced distributions. We systematically evaluated FSVM-CIP on five real-world medical datasets and compared its performance with four other SVM methods. The results show that the proposed method improves classification accuracy and handles classification problems with outliers/noise and imbalanced datasets more effectively. The rest of this paper is organized as follows. Section 2 briefly reviews the related works. Section 3 presents the details of FSVM-CIP in the linear case. Section 4 presents FSVM-CIP in the nonlinear case. The experimental results on five medical datasets are reported in Section 5, and some concluding remarks are given in Section 6.

2. Related Works

2.1. Fuzzy Support Vector Machines (FSVMs)

In the traditional SVM, all data points are considered equally important and are assigned the same penalty parameter in the objective function. However, in many real-world classification applications, some sample points, such as outliers or noise, may not belong exactly to one of the two classes, and each sample point does not contribute equally to the decision surface. To solve this problem, the fuzzy support vector machine was originally proposed in [8]. A fuzzy membership is introduced for each sample point so that different points can make different contributions to the construction of the decision surface. Suppose the training samples are {(x_i, y_i, s_i)}, i = 1,…, N, where x_i ∈ R^n is an n-dimensional sample point, y_i ∈ {−1, +1} is its class label, and s_i is a fuzzy membership satisfying σ ≤ s_i ≤ 1 for a sufficiently small constant σ > 0. The quadratic optimization problem for classification is

min_{w,b,ξ} (1/2)||w||^2 + C Σ_{i=1}^{N} s_i ξ_i
s.t. y_i(w^T x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1,…, N, (2)

where w is the normal vector of the separating hyperplane, b is a bias term, and C is a parameter, determined beforehand, that controls the tradeoff between the classification margin and the cost of misclassification. Since s_i is the attitude of the corresponding point x_i towards one class and the slack variable ξ_i is a measure of error, the term s_i ξ_i can be considered a measure of error with different weights. The larger s_i is, the more importantly the corresponding point is treated; the smaller s_i is, the less importantly it is treated. Thus, different input points make different contributions to the learning of the decision surface, and FSVM can find a more robust hyperplane by maximizing the margin while allowing some misclassification of less important points.
To solve the FSVM optimization problem, (2) is transformed into the following dual problem by introducing Lagrange multipliers α_i:

max_α Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j x_i^T x_j
s.t. Σ_{i=1}^{N} α_i y_i = 0, 0 ≤ α_i ≤ s_i C, i = 1,…, N. (3)

Compared with the standard SVM, the only difference is the upper bound on the values of α_i. By solving the dual problem in (3) for the optimal α_i, w and b can be recovered in the same way as in the standard SVM.
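The effect of the per-sample memberships can be sketched with scikit-learn, whose `SVC.fit` accepts a `sample_weight` argument that scales the penalty `C` per point, playing the role of s_i C in (2). The toy data, the outlier, and the linear-decay memberships below are illustrative choices, not the paper's setup:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two Gaussian blobs plus one outlier mislabeled into the positive class.
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2)), [[-2.0, 2.5]]])
y = np.array([-1] * 30 + [1] * 30 + [1])

# Fuzzy memberships: down-weight points far from their own class mean.
s = np.empty(len(y))
for c in (-1, 1):
    d = np.linalg.norm(X[y == c] - X[y == c].mean(axis=0), axis=1)
    s[y == c] = 1.0 - d / (d.max() + 1e-6)  # linear decay with distance

clf = SVC(kernel="linear", C=10.0)
clf.fit(X, y, sample_weight=s)   # effective per-sample bound is s_i * C
print(clf.score(X, y))
```

Because the outlier sits far from the positive-class mean, its membership is small and it barely influences the learned hyperplane.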

2.2. Locality Preserving Projections (LPP)

Locality preserving projection (LPP) [13, 14] is a linear dimensionality reduction algorithm based on feature extraction or projection. It builds an adjacency graph incorporating neighborhood information of the data set via the graph Laplacian and then computes a transformation matrix which maps the data points into a subspace. This linear transformation optimally preserves local neighborhood information in a certain sense, and the representation map it generates can be viewed as a linear discrete approximation to a continuous map that naturally arises from the geometry of the manifold. For a set X = {x_i} (i ∈ [1, N]), let N_k(x_i) denote the k nearest neighbors of node i, and let G denote the adjacency graph of dataset X. Here, the ith node corresponds to the data point x_i, and nodes i and j are connected by an edge if node i is among the k nearest neighbors of node j or vice versa, that is, if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i). The adjacency graph G is weighted as

W_ij = exp(−||x_i − x_j||^2 / t) if nodes i and j are connected, and W_ij = 0 otherwise, (4)

where exp(−||x_i − x_j||^2/t) is the heat kernel function, t is a constant, and ||x_i − x_j|| is the Euclidean distance in R^n between points i and j. LPP seeks the transformation vector w ∈ R^n minimizing the objective function

min_w Σ_{i,j} (w^T x_i − w^T x_j)^2 W_ij = w^T X L X^T w, (5)

subject to the normalization constraint involving D, where D is a diagonal matrix whose entries are the column sums of W, D_ii = Σ_j W_ij, and L = D − W is the Laplacian matrix. The transformation vector w minimizing (5) is given by the minimum-eigenvalue solution to the corresponding generalized eigenvalue problem. By minimizing this objective function, LPP preserves the intrinsic geometry and local structure of the data.
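The LPP pipeline above (heat-kernel adjacency, graph Laplacian, generalized eigenproblem) can be sketched as follows; `k`, `t`, and the small ridge added for numerical stability are illustrative choices:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, k=5, t=1.0, dim=2):
    """Minimal sketch of locality preserving projection; X has samples in rows."""
    N = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(d2[i])[1:k + 1]                 # k nearest neighbors (skip self)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)             # heat-kernel weights
    W = np.maximum(W, W.T)                                # edge if either point is a kNN of the other
    D = np.diag(W.sum(axis=1))
    L = D - W                                             # graph Laplacian
    # Generalized eigenproblem X^T L X w = lambda X^T D X w; keep smallest eigenvalues.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])           # ridge keeps B positive definite
    vals, vecs = eigh(A, B)
    return vecs[:, :dim]                                  # transformation vectors as columns

X = np.random.default_rng(1).normal(size=(50, 4))
P = lpp(X, k=5, t=1.0, dim=2)
print(P.shape)
```

`eigh(A, B)` solves the symmetric generalized eigenproblem directly, matching the minimum-eigenvalue characterization of the LPP solution.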

3. FSVM for the Class Imbalance Problem in the Linear Case

In this section, we first define the local within-class preserving scatter matrix in the linear case. Secondly, the optimization problem formulation of FSVM-CIP in the linear case is given. Moreover, the fuzzy membership functions for linear FSVM-CIP are defined. Finally, the algorithm of linear FSVM-CIP is summarized.

3.1. The Local within-Class Preserving Scatter Matrix in the Linear Case

Following the idea of [15], we build the nearest within-class neighbor graph to model the intrinsic geometry and local structure of the data. The graph preserves local neighborhood information in a certain sense, and it can be viewed as a linear discrete approximation to a continuous map that naturally arises from the geometry of the manifold. Since we have a binary classification problem, one class, denoted C_1, contains the sample points x_i with y_i = 1, and the other class, denoted C_2, contains the sample points x_i with y_i = −1. Set |C_1| = m_1 and |C_2| = N − m_1, where N is the total number of sample points.

Definition 1

For each data point x_i, let N_k(x_i) denote its set of k nearest within-class neighbors, and put an edge between x_i and each of its neighbors. The corresponding weight matrix W is defined on these edges, where D_ii = Σ_j W_ij normalizes each weight.

Definition 2

The local within-class preserving scatter matrix is

S_l = X (I − W)^T (I − W) X^T,

where I is the N × N identity matrix. In this case, the obtained nearest within-class neighbor graph attempts to preserve the local structure of the data set, and (I − W)^T(I − W) preserves the locality of nearby points with the same class label in the embedding space during the unfolding of nonlinear structures [15]. In fact, a heavy penalty is applied to the objective function through the weight W_ij if the neighboring data points x_i and x_j are mapped far apart. Hence, the minimization criterion attempts to ensure that the projections of x_i and x_j remain close whenever x_i and x_j are close. It is worth noting that the local within-class scatter matrix S_l is symmetric and positive semidefinite. S_l looks similar to the within-class scatter matrix S_w [16, 17] and to the Laplacian matrix L in LPP. However, S_l reflects the intrinsic geometry and local structure of the data, whereas S_w only considers the mean values of the samples in the different classes; and S_l carries class label and discriminating information, whereas L only considers the nearest-neighbor information of each data point in the input space, without considering class labels.
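Definition 2 can be sketched numerically. The heat-kernel weights over the k nearest same-class neighbors and the row normalization below are one plausible reading of Definition 1, so treat the weighting as an assumption; the symmetry and positive semidefiniteness asserted for S_l hold regardless of the weights:

```python
import numpy as np

def local_within_class_scatter(X, y, k=3, t=1.0):
    """Sketch of S_l = X (I - W)^T (I - W) X^T with a within-class kNN graph.
    Assumes heat-kernel weights on the k nearest same-class neighbors, row-normalized."""
    N, n = X.shape
    W = np.zeros((N, N))
    for i in range(N):
        same = np.where(y == y[i])[0]
        same = same[same != i]                           # same-class points, excluding x_i
        d2 = ((X[same] - X[i]) ** 2).sum(axis=1)
        idx = np.argsort(d2)[:k]                         # k nearest within-class neighbors
        W[i, same[idx]] = np.exp(-d2[idx] / t)
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-12) # D_ii normalization of each row
    M = np.eye(N) - W
    return X.T @ M.T @ M @ X                             # n x n, symmetric PSD

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = np.array([1] * 20 + [-1] * 20)
S = local_within_class_scatter(X, y, k=3)
print(S.shape)
```

Since S_l = (M X)^T (M X) is a Gram matrix, symmetry and positive semidefiniteness follow immediately, as the text claims.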

3.2. FSVM-CIP in the Linear Case

To tackle the imbalanced classification problem with noise and outliers, we integrate FSVM, ideas from imbalanced classification, and the local within-class preserving scatter. On the one hand, as shown in Figure 1, the linear classifier is the hyperplane w^T x + b = 0, which defines a region for the majority-class examples (w^T x_i + b ≥ 1 − ξ_i) and another for the minority-class examples (w^T x_j + b ≤ −(1 + ρ) + ξ_j); this is used to weaken the skew towards the minority class and enhance the locality maximum margin. On the other hand, by assigning a higher misclassification cost to the minority-class examples than to the majority-class examples, the effect of class imbalance can be reduced. In addition, to minimize the number of misclassifications, the local within-class scatter matrix S_l is used to preserve the intrinsic geometry and local structure of the data.
Figure 1

The hyperplanes of linear FSVM-CIP.

Accordingly, we define the primal problem of FSVM-CIP as

min_{w,b,ρ,ξ} (1/2)||w||^2 + (η/2) w^T S_l w − νρ + (1/(v_1 m_1)) Σ_{i=1}^{m_1} μ_i ξ_i + (1/(v_2 m_2)) Σ_{j=m_1+1}^{N} μ_j ξ_j
s.t. w^T x_i + b ≥ 1 − ξ_i, i = 1,…, m_1,
     w^T x_j + b ≤ −(1 + ρ) + ξ_j, j = m_1 + 1,…, N,
     ρ ≥ 0, ξ_i ≥ 0, ξ_j ≥ 0,

where m_1 and m_2 denote the numbers of positive (normal or majority class) and negative (abnormal or minority class) training points, with m_2 = N − m_1; ρ is a nonnegative variable, and ρ + 1 is the margin between the hyperplane and the minority-class examples; η is a nonnegative regularization constant trading off the local within-class scatter against the margin; v_1 and v_2 are positive penalty parameters tuning the cost of training errors for the positive and negative training data, respectively; ξ_i, ξ_j ≥ 0 are slack variables; and μ_i, μ_j are the fuzzy memberships of the two classes' examples. Clearly, w^T S_l w injects prior geometrical information into the penalty terms via manifold regularization: minimizing w^T S_l w means that data originally close and in the same class in the input space are likely to remain close in the output space, so this term preserves the local information of the manifold structure. Note that in FSVM-CIP we assign different fuzzy membership values to the training examples to reflect their differing importance, which is equivalent to assigning different misclassification costs μ_i/(v_1 m_1) and μ_j/(v_2 m_2) to different training examples. To reduce the effect of class imbalance, we can assign higher membership values μ_j or a lower parameter v_2 to the minority-class examples, while assigning lower membership values μ_i or a higher v_1 to the majority class. That is, the proposed method does not tend to skew the separating hyperplane towards the minority class, because the minority-class examples are now assigned a higher misclassification cost. By setting the costs μ_i/(v_1 m_1) and μ_j/(v_2 m_2) and extending manifold regularization, the learned optimal separating hyperplane enhances the relative maximum margin, and FSVM-CIP is less sensitive to class imbalance.
We then transform this problem into its corresponding dual. The primal Lagrangian (9) is formed with Lagrange multipliers α_i ≥ 0, γ ≥ 0, and s_i ≥ 0, and by the Karush-Kuhn-Tucker (KKT) conditions the derivatives of L(w, b, ρ, ξ, α, γ, s) with respect to the primal variables must vanish, giving conditions (10)–(14), where 1 = [1,…,1]^T denotes the N-dimensional vector of ones. In particular, we have w = (I + η S_l)^{−1} Σ_{i=1}^{N} α_i y_i x_i. Substituting (10)–(14) into (9), we obtain the dual form of the optimization problem (15), a quadratic program in α = [α_1,…, α_N] whose Hessian H has entries H_ij = y_i y_j x_i^T (I + η S_l)^{−1} x_j. Equation (15) is a typical convex quadratic programming problem and is easy to solve numerically. Suppose α* = [α_1*,…, α_N*] solves the above optimization problem; the optimal weight vector is then w* = (I + η S_l)^{−1} Σ_{i=1}^{N} α_i* y_i x_i. A training sample x_i (1 ≤ i ≤ N) is called a support vector (SV) if the corresponding Lagrange multiplier α_i > 0. Denote the SV sets as SV1 = {x_i | 0 < α_i ≤ μ_i/(v_1 m_1), 1 ≤ i ≤ m_1} and SV2 = {x_j | 0 < α_j ≤ μ_j/(v_2 m_2), m_1 + 1 ≤ j ≤ N}, and let s^+ and s^− denote the numbers of SVs in SV1 and SV2, respectively. According to the KKT conditions, the margin constraints hold with equality for the input data in SV1 and SV2 with slack variables ξ_i and ξ_j equal to 0; thus the optimal thresholds b* and ρ* can be calculated. From a numerical perspective, however, it is better to average the values of b* and ρ* obtained from all such data, so the optimal thresholds b* and ρ* are computed by averaging over the support vectors in SV1 and SV2. As a result, the corresponding decision function of the linear FSVM-CIP is f(x) = sign(w*^T x + b*). Note that, to deal with the small sample size problem, the identity matrix I is added to the scaled matrix η S_l before any inversion takes place; hence (I + η S_l) is always nonsingular and its inverse exists. Following the terminology in [18], a training sample x_i (1 ≤ i ≤ N) is called a margin error (ME) if the corresponding slack variable ξ_i > 0.
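The cost-weighting side of FSVM-CIP, without the manifold term, can be sketched by feeding the per-sample costs μ_i/(v_1 m_1) and μ_j/(v_2 m_2) into scikit-learn's `sample_weight`. The toy data, the values of v_1 and v_2, and λ below are illustrative stand-ins for the paper's tuned parameters:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Imbalanced toy data: 100 majority (+1, "normal") vs 10 minority (-1, "abnormal").
Xp, Xn = rng.normal(1.5, 1, (100, 2)), rng.normal(-1.5, 1, (10, 2))
X, y = np.vstack([Xp, Xn]), np.array([1] * 100 + [-1] * 10)

def memberships(X, y, lam=0.5):
    """Exponential decay with distance to the class mean."""
    mu = np.empty(len(y))
    for c in (1, -1):
        d = np.linalg.norm(X[y == c] - X[y == c].mean(axis=0), axis=1)
        mu[y == c] = 2.0 / (1.0 + np.exp(lam * d))
    return mu

v1, v2 = 0.05, 0.001          # lower v_2 => higher cost for the minority class
m1, m2 = (y == 1).sum(), (y == -1).sum()
mu = memberships(X, y)
w = np.where(y == 1, mu / (v1 * m1), mu / (v2 * m2))   # mu_i / (v m) per sample

clf = SVC(kernel="linear", C=1.0).fit(X, y, sample_weight=w)
pred = clf.predict(X)
sens = (pred[y == 1] == 1).mean()     # positives correctly recognized
spec = (pred[y == -1] == -1).mean()   # negatives correctly recognized
print(round(sens, 2), round(spec, 2))
```

With the minority class made expensive to misclassify, the hyperplane is no longer skewed towards it, which is the DEC idea the formulation builds on.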
We give the following theorem for parameter selection later.

Theorem 3

Let m^+ and m^− denote the numbers of MEs in the positive and negative classes, and let s^+ and s^− denote the numbers of SVs in the positive and negative classes, respectively. Then m^+, m^−, s^+, and s^− are bounded in terms of the mean fuzzy memberships, where the mean fuzzy memberships of the MEs in the positive and negative classes and the mean fuzzy memberships of the SVs in the positive and negative classes appear in the respective bounds. A proof of the above theorem can be found in the Appendix.

3.3. Fuzzy Membership Functions in the Linear Case

In FSVM, the fuzzy membership is used to reduce the effects of outliers or noise, and different fuzzy membership functions influence the fuzzy algorithm differently. Basically, the rule for assigning proper membership values to data points can depend on the relative importance of the data points to their own classes. In this paper, we consider the two fuzzy membership functions given in [19]. Given the sequence of training points, denote the means of the positive and negative classes as x̄^+ and x̄^−.

Definition 4

μ_lin is called the linear fuzzy membership and, for a sample x_i of the positive class, can be defined as

μ_lin(x_i) = 1 − ||x_i − x̄^+|| / (max_j ||x_j − x̄^+|| + δ),

and analogously for the negative class, where δ is a small positive value used to prevent μ_lin from becoming zero and ||·|| is the Euclidean distance.

Definition 5

μ_exp is called the exponential fuzzy membership and, for a sample x_i of the positive class, can be defined as

μ_exp(x_i) = 2 / (1 + exp(λ ||x_i − x̄^+||)),

and analogously for the negative class, where the parameter λ ∈ [0,1] determines the steepness of the decay.
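The two decay profiles can be compared on a toy vector of distances to the class mean; δ and λ below are illustrative choices:

```python
import numpy as np

def mu_lin(d, delta=1e-6):
    """Linear fuzzy membership: decays linearly with distance to the class mean."""
    return 1.0 - d / (d.max() + delta)

def mu_exp(d, lam=0.5):
    """Exponential fuzzy membership: lam in [0, 1] sets the steepness of the decay."""
    return 2.0 / (1.0 + np.exp(lam * d))

d = np.linspace(0.0, 5.0, 6)          # distances to the class mean
print(np.round(mu_lin(d), 3))
print(np.round(mu_exp(d), 3))
```

Both assign membership 1 at the class mean and shrink it for distant points; the exponential version never reaches zero, while the linear version does at the farthest point.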

3.4. Solution

Based on the above, we can state the approach of proposed FSVM-CIP in the linear case as Algorithm 1.
Algorithm 1

FSVM-CIP in the linear case.

4. FSVM for the Class Imbalance Problem in the Nonlinear Case

In this section, we extend the local within-class preserving scatter matrix and FSVM-CIP into feature space. Moreover, the fuzzy membership functions in feature space are defined. Finally, the algorithm of kernel FSVM-CIP is summarized.

4.1. Kernel Extension

To handle nonlinear classification, the kernel trick [20] is used to map the n-dimensional data points into a reproducing kernel Hilbert space (RKHS) [21] via a mapping function ϕ : R^n → H, x_i ↦ ϕ(x_i). A linear hyperplane f(v) = w^T ϕ(v) + b in the feature space H then corresponds to a nonlinear hyperplane in the original space R^n, where w, ϕ(v) ∈ H, v ∈ R^n, and b ∈ R. Let ϕ(X) denote the data matrix in feature space H, ϕ(X) = [ϕ(x_1), ϕ(x_2),…, ϕ(x_N)]; the kernel matrix K then has entries K_ij = K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j). The kernel local within-class scatter matrix in feature space is built from the class-wise blocks of K: I^(1) and I^(2) are identity matrices of orders m_1 and N − m_1, respectively; K^(1) and K^(2) are N × m_1 and N × (N − m_1) matrices, respectively, so that K = [K^(1), K^(2)]; and W^(1) and W^(2) are the within-class weight matrices of the two classes, built as in the linear case with D_ii = Σ_j W_ij as a normalizer. The kernel FSVM-CIP can then be obtained by solving the corresponding quadratic problem. Like its linear counterpart, the solution to this optimization problem can be found using Lagrange multipliers. By the representer theorem, w can be written as w = Σ_{i=1}^{N} β_i ϕ(x_i). We obtain the dual form of the optimization problem (27) with

M = Y K^T Q^{−1} K Y,
Q = K + η K^(1)(I^(1) − W^(1))^T(I^(1) − W^(1))K^(1)T + η K^(2)(I^(2) − W^(2))^T(I^(2) − W^(2))K^(2)T,

where α = [α_1,…, α_N] and Y = diag(y_1, y_2,…, y_N) is a diagonal matrix. Equation (27) is a typical convex quadratic programming problem and is easy to solve numerically. Suppose α* = [α_1*,…, α_N*] solves the above optimization problem; the optimal weight vector is then β* = Q^{−1} K^T Y α*, and the optimal thresholds b* and ρ* are computed by averaging over the support vectors, as in the linear case. Finally, the decision function of kernel FSVM-CIP is f(x) = sign(Σ_{i=1}^{N} β_i* K(x_i, x) + b*).
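The kernel quantities can be sketched as follows: an RBF Gram matrix and a decision function of the form f(x) = Σ_i β_i K(x_i, x) + b. The β and b values in the demo are random placeholders, not trained coefficients:

```python
import numpy as np

def rbf_gram(X, Z, sigma=1.0):
    """RBF Gram matrix with entries K_ij = exp(-||x_i - z_j||^2 / sigma)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma)

def decision(beta, b, X_train, x_new, sigma=1.0):
    """Kernel decision value f(x) = sum_i beta_i K(x_i, x) + b."""
    return rbf_gram(X_train, x_new[None, :], sigma)[:, 0] @ beta + b

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))
K = rbf_gram(X, X)
beta, b = rng.normal(size=20), 0.1    # placeholder coefficients
print(K.shape, float(decision(beta, b, X, X[0])))
```

A valid Gram matrix must be symmetric with unit diagonal for the RBF kernel, which is easy to verify on the output.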

Theorem 6

The matrix M in (27) is symmetric and positive semidefinite. A proof of the above theorem can be found in the Appendix. Next, we consider fuzzy membership functions in feature space.

Definition 7

μ_lin is called the linear fuzzy membership in feature space and is defined as in Definition 4, with distances computed in the feature space, where δ is a small positive value and ||·|| is the Euclidean distance.

Definition 8

μ_exp is called the exponential fuzzy membership in feature space and is defined as in Definition 5, with distances computed in the feature space, where the parameter λ ∈ [0,1] determines the steepness of the decay. Let ϕ̄^+ denote the mean of the positive class in feature space. The distance of ϕ(x_i) from ϕ̄^+ can then be computed through the kernel as

||ϕ(x_i) − ϕ̄^+||^2 = K_ii − (2/m_1) Σ_{x_j ∈ C_1} K_ij + (1/m_1^2) Σ_{x_j, x_k ∈ C_1} K_jk.

Likewise, the distance from the negative class mean can be given in a similar manner.

4.2. Solution

Based on the above, we can state the approach of kernel FSVM-CIP as Algorithm 2.
Algorithm 2

Kernel FSVM-CIP.

5. Experiments and Discussions

To evaluate the performance of the proposed FSVM-CIP, in this section it is compared with other representative related methods: standard FSVM [8], SVDD [11], FSVM for class imbalance learning (FSVM-CIL) [22], and FSVM with minimum within-class scatter (WCS-FSVM) [23]. We implement FSVM-CIP with the linear fuzzy membership and with the exponential fuzzy membership, denoted FSVM-CIPlin and FSVM-CIPexp, respectively. All experiments are performed in Matlab (R2010a) on a personal computer with a 2.99 GHz CPU, 4.0 GB RAM, and Microsoft Windows XP.

5.1. Data Preparation

In this section, we use five real-world medical datasets from the UCI machine learning repository [24] to demonstrate the classification performance of the proposed method. These five medical datasets are breast, heart, hepatitis, BUPA liver, and pima diabetes. It is highly likely that these real-world datasets contain outliers and noisy examples in different amounts [22]. In each of them, the positive class consists of the data for the healthy, normal, or benign cases, while the negative class contains the data for the diseased, abnormal, or malignant cases. Further details of these datasets are provided in Table 1, which lists the total number of positive data #pos, the total number of negative data #neg, the number of positive training examples m1, the number of negative training examples m2, the positive-to-negative imbalance ratio Ratio, and the data dimensionality d.
Table 1

Characteristics of the selected datasets.

Datasets        #pos   #neg   m1    m2    Ratio    d
Breast          458    241    240   120   2 : 1    9
Heart           120    150    80    20    4 : 1    13
Hepatitis       123    32     100   10    10 : 1   19
BUPA liver      200    145    150   10    15 : 1   6
Pima diabetes   268    500    180   10    18 : 1   8

5.2. Performance Measure and Experimental Settings

We use the sensitivity (proportion of positives correctly recognized), specificity (proportion of negatives correctly recognized), and accuracy (proportion of correctly classified instances) for classifier performance evaluation, as commonly used in medical datasets classification research [7]. Like existing SVM and FSVM algorithms, the solution is sensitive to the parameter settings. To evaluate the performance, our strategy is to fix a set of candidate parameter values and use the best cross-validation mean rate over the set to estimate the generalization accuracy. For FSVM-CIP, the parameter ν is searched in {1, 5, 10, 15, …, 80}, while v_1 and v_2 are selected from {0.001, 0.005, 0.01, 0.05}; η is selected from log_2 η ∈ {−5, −4.5, −4, …, 5.5, 6}; the heat kernel parameter t is searched in {0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0}; and the neighborhood parameter k is searched in {3, 5, 7, 9, 11, 13, 15}. In addition, when the linear fuzzy function is used, we set δ = 10^−6; when the exponential fuzzy function is used, the optimal value of λ is chosen from {0.1, 0.2, 0.3, …, 1}. The regularization parameter C for FSVM, SVDD, FSVM-CIL, and WCS-FSVM is selected from {0.001, 0.01, 0.1, 1, 10, 100}. In WCS-FSVM, β is selected from log_2 β ∈ {−5, −4.5, −4, …, 5.5, 6}. For FSVM-CIL, the fuzzy membership is based on the distance from the actual hyperplane and uses the exponential fuzzy membership with parameter λ, chosen from {0.1, 0.2, 0.3, …, 1}. For the kernel-based methods, we use the Gaussian RBF kernel exp(−||u − v||^2/σ), where the spread σ is searched in {τ^2/16, τ^2/8, τ^2/4, τ^2/2, τ^2, 2τ^2, 4τ^2, 8τ^2, 16τ^2} and τ^2 is the mean norm of the training data.
For parameter selection, we conduct fivefold cross-validation in a stratified manner, so that each validation set has the same positive-to-negative ratio as the training set. Finally, the experiment is repeated 10 times independently on each dataset.
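The evaluation metrics and the stratified splitting can be sketched with scikit-learn; the 9:1 toy labels below are illustrative:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def sens_spec_acc(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for labels in {+1, -1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sens = (y_pred[y_true == 1] == 1).mean()     # positives correctly recognized
    spec = (y_pred[y_true == -1] == -1).mean()   # negatives correctly recognized
    acc = (y_pred == y_true).mean()
    return sens, spec, acc

y = np.array([1] * 90 + [-1] * 10)               # 9:1 imbalance
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, val_idx in skf.split(np.zeros((100, 1)), y):
    # Each validation fold preserves the 9:1 positive-to-negative ratio.
    assert (y[val_idx] == 1).sum() == 18 and (y[val_idx] == -1).sum() == 2
print(sens_spec_acc(y, y))   # perfect prediction: (1.0, 1.0, 1.0)
```

Stratification matters here precisely because of the imbalance: an unstratified fold could contain no minority examples at all, making specificity undefined.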

5.3. Experimental Results

Test results of the FSVM-CIP method on the breast, heart, hepatitis, BUPA liver, and pima diabetes datasets are given for both the linear and nonlinear cases. Tables 2, 3, 4, 5, and 6 display the comparison with the other methods on these five datasets, respectively.
Table 2

Comparison of the classification results (%) on breast dataset.

Method                 Sensitivity      Specificity      Accuracy
Linear
  FSVM                 95.87 ± 0.017    95.04 ± 0.043    95.58 ± 0.035
  SVDD                 97.71 ± 0.065    90.90 ± 0.013    95.28 ± 0.052
  FSVM-CIL             95.87 ± 0.024    95.87 ± 0.015    95.81 ± 0.028
  WCS-FSVM             96.33 ± 0.067    95.04 ± 0.056    95.87 ± 0.047
  FSVM-CIPlin          96.98 ± 0.039    96.49 ± 0.022    96.76 ± 0.040
  FSVM-CIPexp          96.68 ± 0.011    96.69 ± 0.042    96.76 ± 0.037
Gaussian kernel
  FSVM                 96.33 ± 0.023    95.87 ± 0.051    96.17 ± 0.050
  SVDD                 97.30 ± 0.065    91.25 ± 0.013    95.44 ± 0.052
  FSVM-CIL             96.79 ± 0.059    95.87 ± 0.042    96.46 ± 0.055
  WCS-FSVM             96.97 ± 0.030    96.69 ± 0.093    96.76 ± 0.067
  FSVM-CIPlin          97.25 ± 0.055    96.29 ± 0.032    97.05 ± 0.042
  FSVM-CIPexp          97.25 ± 0.055    97.52 ± 0.045    97.34 ± 0.033
Table 3

Comparison of the classification results (%) on heart dataset.

Method                 Sensitivity      Specificity      Accuracy
Linear
  FSVM                 87.50 ± 0.080    80.77 ± 0.069    82.35 ± 0.069
  SVDD                 87.03 ± 0.021    77.69 ± 0.005    80.00 ± 0.051
  FSVM-CIL             85.00 ± 0.046    82.04 ± 0.110    82.35 ± 0.072
  WCS-FSVM             87.30 ± 0.071    81.54 ± 0.089    82.94 ± 0.088
  FSVM-CIPlin          85.00 ± 0.063    82.31 ± 0.083    82.84 ± 0.054
  FSVM-CIPexp          87.50 ± 0.025    82.31 ± 0.083    83.53 ± 0.055
Gaussian kernel
  FSVM                 86.70 ± 0.099    82.61 ± 0.087    83.35 ± 0.042
  SVDD                 90.35 ± 0.022    80.77 ± 0.034    82.80 ± 0.070
  FSVM-CIL             87.05 ± 0.034    81.54 ± 0.067    82.94 ± 0.044
  WCS-FSVM             91.00 ± 0.076    81.73 ± 0.083    84.12 ± 0.085
  FSVM-CIPlin          90.00 ± 0.045    82.31 ± 0.086    84.12 ± 0.052
  FSVM-CIPexp          86.05 ± 0.023    83.08 ± 0.078    84.71 ± 0.066
Table 4

Comparison of the classification results (%) on hepatitis dataset.

Method                 Sensitivity      Specificity      Accuracy
Linear
  FSVM                 82.60 ± 0.053    22.73 ± 0.087    53.33 ± 0.073
  SVDD                 73.91 ± 0.071    45.45 ± 0.011    60.00 ± 0.046
  FSVM-CIL             77.66 ± 0.026    45.46 ± 0.082    61.02 ± 0.070
  WCS-FSVM             79.56 ± 0.107    27.27 ± 0.062    53.33 ± 0.059
  FSVM-CIPlin          78.26 ± 0.046    45.46 ± 0.032    62.22 ± 0.023
  FSVM-CIPexp          78.26 ± 0.068    50.00 ± 0.086    64.44 ± 0.071
Gaussian kernel
  FSVM                 73.91 ± 0.038    31.82 ± 0.012    53.33 ± 0.025
  SVDD                 82.60 ± 0.053    42.86 ± 0.025    63.64 ± 0.030
  FSVM-CIL             77.26 ± 0.041    50.00 ± 0.086    63.84 ± 0.064
  WCS-FSVM             78.26 ± 0.015    36.36 ± 0.074    57.78 ± 0.056
  FSVM-CIPlin          73.51 ± 0.064    54.55 ± 0.037    64.44 ± 0.058
  FSVM-CIPexp          73.91 ± 0.050    59.10 ± 0.011    66.67 ± 0.036
Table 5

Comparison of the classification results (%) on BUPA liver dataset.

Method                 Sensitivity      Specificity      Accuracy
Linear
  FSVM                 88.10 ± 0.008    66.42 ± 0.073    72.19 ± 0.057
  SVDD                 87.27 ± 0.021    68.05 ± 0.063    72.72 ± 0.042
  FSVM-CIL             88.00 ± 0.004    67.44 ± 0.042    73.19 ± 0.015
  WCS-FSVM             84.00 ± 0.360    67.15 ± 0.068    71.66 ± 0.051
  FSVM-CIPlin          88.00 ± 0.004    67.88 ± 0.063    73.26 ± 0.031
  FSVM-CIPexp          86.00 ± 0.048    69.34 ± 0.072    73.80 ± 0.054
Gaussian kernel
  FSVM                 96.00 ± 0.057    66.67 ± 0.026    74.60 ± 0.038
  SVDD                 95.43 ± 0.033    71.24 ± 0.050    77.23 ± 0.017
  FSVM-CIL             95.00 ± 0.045    72.59 ± 0.052    78.37 ± 0.050
  WCS-FSVM             90.08 ± 0.070    67.44 ± 0.083    73.73 ± 0.062
  FSVM-CIPlin          94.00 ± 0.049    74.10 ± 0.045    79.46 ± 0.048
  FSVM-CIPexp          94.00 ± 0.049    73.33 ± 0.084    79.92 ± 0.074
Table 6

Comparison of the classification results (%) on pima diabetes dataset.

Method                 Sensitivity      Specificity      Accuracy
Linear
  FSVM                 91.91 ± 0.022    49.98 ± 0.053    55.36 ± 0.051
  SVDD                 88.65 ± 0.081    53.43 ± 0.062    58.45 ± 0.029
  FSVM-CIL             86.36 ± 0.064    55.10 ± 0.059    59.86 ± 0.060
  WCS-FSVM             87.50 ± 0.043    52.65 ± 0.024    57.96 ± 0.030
  FSVM-CIPlin          85.23 ± 0.021    57.76 ± 0.064    61.94 ± 0.043
  FSVM-CIPexp          84.09 ± 0.009    57.96 ± 0.062    61.94 ± 0.053
Gaussian kernel
  FSVM                 93.18 ± 0.031    51.02 ± 0.073    57.44 ± 0.053
  SVDD                 91.76 ± 0.025    56.86 ± 0.052    62.57 ± 0.028
  FSVM-CIL             90.91 ± 0.047    58.78 ± 0.084    63.67 ± 0.077
  WCS-FSVM             92.05 ± 0.010    54.69 ± 0.066    60.38 ± 0.053
  FSVM-CIPlin          88.84 ± 0.040    61.38 ± 0.063    65.57 ± 0.063
  FSVM-CIPexp          88.64 ± 0.029    61.43 ± 0.074    65.57 ± 0.070
The main observations from the performance comparisons are as follows.

(1) In many real-world applications a linear classifier is not powerful enough: in terms of accuracy, the kernel method improves classification performance on all five medical datasets.

(2) FSVM-CIP outperforms the other methods on almost all datasets, in both the linear and the nonlinear case, giving higher accuracy. This supports the claim that the locality maximum margin and the local structure information captured by the local within-class preserving scatter improve classification performance. Furthermore, assigning different misclassification costs based on the sizes of the two classes is a cost-sensitive learning solution to the imbalance problem in SVMs.

(3) For all the datasets considered, the classification accuracy of the FSVM-CIPexp setting is higher than that of the FSVM-CIPlin setting. Therefore, the FSVM-CIPexp setting, with an appropriate choice of the λ value, is an effective choice for any of these medical datasets. In other words, for medical datasets classification the exponential fuzzy membership performs better than the linear fuzzy membership in FSVM-CIP.

(4) For the breast and heart datasets, where the class imbalance is not pronounced, WCS-FSVM outperformed standard FSVM, SVDD, and FSVM-CIL; performance can indeed be improved when the structure of the data is taken into consideration. For the other three datasets, where the class imbalance is striking, the results of standard FSVM and WCS-FSVM are biased towards the majority class, which appears as lower specificity and lower accuracy; these two methods are therefore sensitive to the class imbalance problem. Meanwhile, SVDD and FSVM-CIL outperformed standard FSVM and WCS-FSVM: by assigning different misclassification costs to the minority and majority classes, the effect of class imbalance can be reduced.
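The two ingredients highlighted above, distance-based fuzzy memberships (linear or exponential, as in FSVM-CIL) and class-size-based misclassification costs, can be sketched as per-sample weights. This is a minimal illustration, not the authors' exact FSVM-CIP formulation; the membership formulas follow the common FSVM-CIL style, and the decay rate beta is a hypothetical default.

```python
import numpy as np

np.random.seed(0)

def fuzzy_memberships(X, y, beta=2.0, kind="exp"):
    """Distance-to-class-centre fuzzy memberships (FSVM-CIL style sketch).

    'lin' decays linearly with the distance to the class centre,
    'exp' decays exponentially with rate beta (hypothetical default).
    Outliers/noise, being far from their class centre, get small weights.
    """
    m = np.empty(len(y))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        d = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
        if kind == "lin":
            m[idx] = 1.0 - d / (d.max() + 1e-6)       # linear membership in (0, 1]
        else:
            m[idx] = 2.0 / (1.0 + np.exp(beta * d))   # exponential membership in (0, 1]
    return m

def class_costs(y):
    """Misclassification costs inversely proportional to class size,
    so errors on the minority class are penalised more heavily."""
    classes, counts = np.unique(y, return_counts=True)
    cost = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
    return np.array([cost[c] for c in y])

# Per-sample weight = fuzzy membership x class cost; such weights can be fed
# to a soft-margin SVM implementation that accepts per-sample error costs.
X = np.vstack([np.random.randn(40, 2), np.random.randn(10, 2) + 3])
y = np.array([0] * 40 + [1] * 10)
w = fuzzy_memberships(X, y) * class_costs(y)
```

With 40 majority and 10 minority samples, the minority class receives the larger cost, so a clean minority sample can outweigh a noisy majority one.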

5.4. Parameter Selection for Kernel FSVM-CIPexp

The parameter η > 0 is essential in the proposed method: it controls the tradeoff between the local within-class scatter and the margin. Figure 2 shows the impact of η on the classification accuracy of FSVM-CIPexp in the kernel case, with η selected from log2 η ∈ {−5, −4.5, −4, …, 5.5, 6}. The best accuracy for every dataset is reached inside this interval, so searching η over this range is reasonable.
Figure 2

The effect of the parameter η on kernel FSVM-CIPexp.
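The search protocol above can be sketched as a cross-validated sweep over the same log2 grid. Since FSVM-CIP itself is not available in standard libraries, this sketch uses the C parameter of a plain RBF-SVM as a stand-in for the tradeoff parameter; only the grid, log2 values from −5 to 6 in steps of 0.5, follows the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

np.random.seed(0)

# Toy data standing in for a medical dataset (hypothetical).
X = np.random.randn(120, 5)
y = (X[:, 0] + 0.3 * np.random.randn(120) > 0).astype(int)

# log2(eta) in {-5, -4.5, ..., 5.5, 6}, as in the paper's search range.
grid = 2.0 ** np.arange(-5.0, 6.5, 0.5)

# 5-fold cross-validated accuracy for each candidate value.
scores = [cross_val_score(SVC(C=c, kernel="rbf"), X, y, cv=5).mean() for c in grid]
best = grid[int(np.argmax(scores))]
```

Sweeping on a log scale is the usual choice here because the useful values of such tradeoff parameters span several orders of magnitude.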

Compared with standard FSVM, FSVM-CIP has an additional neighbor parameter k. To evaluate its influence on performance, the classification accuracy of kernel FSVM-CIPexp on the five medical datasets is recorded for each value of k in {3, 5, 7, 9, 11, 13, 15}. Figure 3 shows the results. The classification accuracy is low when k is small, increases as k grows, but then drops sharply when k becomes too large. When k is too small, too few nearest neighbors are used to capture the local structure; when k is too large, the neighborhoods become excessive, and preserving that many local relations may be inappropriate.
Figure 3

The effect of the parameter k on kernel FSVM-CIPexp.
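The structure that k controls can be sketched as a local within-class scatter: for each sample, differences to its k nearest same-class neighbors are accumulated into a scatter matrix. This is an illustrative version only; the exact neighbor weighting used in FSVM-CIP may differ.

```python
import numpy as np

np.random.seed(0)

def local_within_class_scatter(X, y, k=5):
    """Accumulate (x_i - x_j)(x_i - x_j)^T over each sample's k nearest
    neighbours of the SAME class, normalised by the number of samples.
    Small k uses very sparse neighbourhoods; large k preserves many
    local relations, which may over-constrain the classifier.
    """
    n, d = X.shape
    S = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        # Pairwise squared distances within the class.
        D = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(D, np.inf)          # exclude each point itself
        kk = min(k, len(Xc) - 1)
        for i in range(len(Xc)):
            nbrs = np.argsort(D[i])[:kk]     # k same-class nearest neighbours
            diff = Xc[i] - Xc[nbrs]          # shape (kk, d)
            S += diff.T @ diff
    return S / n

X = np.random.randn(60, 3)
y = np.array([0] * 30 + [1] * 30)
S = local_within_class_scatter(X, y, k=5)
```

Such a scatter matrix is symmetric positive semidefinite, so it can be added as a regularization term to an SVM objective without breaking convexity.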

6. Conclusion

Computer tools have greatly improved medical practice. Although they cannot replace doctors, they can make doctors' work easier and more effective. In this paper, a new fuzzy support vector machine for medical datasets classification, called FSVM-CIP, is proposed. The method is based on the local within-class preserving scatter and assigns two misclassification costs in the SVM objective function, so that it can learn from imbalanced datasets in the presence of outliers/noise while enhancing the locality maximum margin. Experiments on several UCI medical datasets compared the proposed method with several related methods: standard FSVM, SVDD, FSVM-CIL, and WCS-FSVM. The results show that the proposed method is highly successful compared with the others and seems very promising. Finally, we recommend FSVM-CIPexp, which uses the exponential fuzzy membership, as an effective choice for medical datasets classification applications. In future work, we intend to extend our investigation to large-scale classification problems.