
Determination of fetal state from cardiotocogram using LS-SVM with particle swarm optimization and binary decision tree.

Abstract

We use a least squares support vector machine (LS-SVM) with a binary decision tree to classify cardiotocograms and determine the fetal state. The parameters of LS-SVM are optimized by particle swarm optimization. The robustness of the method is examined by running 10-fold cross-validation. The performance of the method is evaluated in terms of overall classification accuracy. Additionally, receiver operating characteristic analysis and a cobweb representation are presented in order to analyze and visualize the performance of the method. Experimental results demonstrate that the proposed method achieves a remarkable classification accuracy rate of 91.62%.

Year: 2013 PMID: 24288574 PMCID: PMC3830816 DOI: 10.1155/2013/487179

Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238

1. Introduction

There is a growing tendency to use clinical decision support systems in medical diagnosis. These systems help to optimize medical decisions, improve medical treatments, and reduce financial costs [1, 2]. Many medical diagnosis procedures can be converted into intelligent data classification tasks, which fall into two categories: two-class tasks and multiclass tasks. The first type separates the data into only two classes, while the second type involves more than two classes [3].

Cardiotocography was introduced into obstetric practice in the early 1970s and has since been used worldwide for antepartum (before delivery) and intrapartum (during delivery) fetal monitoring. A cardiotocogram (CTG) is a recording of two distinct signals, fetal heart rate (FHR) and uterine activity (UA) [4]. It is used for determining the fetal state during both pregnancy and delivery. The aim of CTG monitoring is to identify babies who may be short of oxygen (hypoxic), so that the fetal condition can be assessed further or the baby can be delivered by caesarean section or natural birth [5]. Visual evaluation of the CTG not only takes time but also depends on the knowledge and clinical experience of the obstetrician. A clinical decision support system eliminates the inconsistency of visual evaluation. Several classification tools have been proposed for developing such systems [4, 6–10]. One of these tools is the support vector machine (SVM), which is used in [4, 8, 10]. In [4, 8], SVM is used for FHR signal classification with two classes, normal or at risk. The risk of metabolic acidosis for newborns is predicted from the FHR signal in [4], while the antepartum FHR signal is classified in [8].
In [10], a medical decision support system based on SVM and a genetic algorithm (GA) is presented for the evaluation of fetal well-being from CTG recordings as normal or pathologic. In [6], an approach based on hidden Markov models (HMM) is presented for automatic classification of FHR signals belonging to hypoxic and normal newborns. In [7], an ANBLIR (Artificial Neural Network Based on Logical Interpretation of fuzzy if-then Rules) system is used to evaluate the risk of low fetal birth weight as normal or abnormal using CTG signals recorded during pregnancy. In [9], an adaptive neurofuzzy inference system (ANFIS) is proposed for the prediction of fetal state from CTG recordings as normal or pathologic.

SVM was developed for two-class tasks, but classification problems are often multiclass. Several methods based on a binary decision tree (BDT) have been proposed in the literature to extend binary SVMs to multiclass problems, for example, [11, 12]. LS-SVM is a modified version of SVM in a least squares sense [13]. LS-SVM overcomes the higher computational load of SVM because it solves a set of linear equations, whereas SVM solves a quadratic programming problem. The choice of an appropriate kernel function and of the model parameters (including the kernel parameters) is crucial for SVM-based methods and directly influences the classification performance. The most common kernel functions used in the literature are polynomial, Gaussian radial basis, exponential radial basis, and sigmoid. Performance evaluation of classifiers is a fundamental step for determining the best classifier or the best set of parameters for a classifier [14]. In general, the overall classification accuracy is a natural way to measure the performance of a classifier.
The classifier predicts the class of each data point in the data set; a correct prediction is counted as a success and a wrong one as an error. The overall classification accuracy is the ratio of the number of successes to the total number of data points classified. For many classification problems, especially in medical diagnosis, the overall classification accuracy alone is not adequate because, in general, not all errors have the same consequences. Wrong diagnoses can carry different costs and dangers depending on which kind of mistake has been made [15]. Therefore, in such situations, receiver operating characteristic (ROC) analysis is usually performed in addition to overall classification accuracy [16].

In this paper, we use LS-SVM with a BDT for classification of the CTG data to determine the fetal state as normal, suspect, or pathologic. A Gaussian radial basis function is chosen as the kernel of LS-SVM, and the model parameters, which are the penalty factor and the width of the Gaussian kernel, are optimized by using particle swarm optimization (PSO). The robustness of the proposed method, LS-SVM-PSO-BDT, is examined with 10-fold cross-validation (10-fold CV) on the CTG data set taken from the UCI Machine Learning Repository. The performance of the method is evaluated in terms of overall classification accuracy. Additionally, ROC analysis and a cobweb representation are presented in order to analyze and visualize the performance of the method.

2. Support Vector Machine (SVM)

SVM is a powerful supervised learning algorithm based on statistical learning theory that has been widely used for solving data classification problems since it was first introduced by Boser et al. [17]. SVM builds a hyperplane that separates the data points into two classes with maximum margin. Given a training set of N data points $(x_i, y_i)$, $x_i \in \mathbb{R}^n$, and $y_i \in \{-1, +1\}$, where $x_i$ is a data point and $y_i$ is the corresponding class label, SVM requires the solution of the following primal optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, N, \tag{1}$$

where $w$ is the normal vector to the hyperplane, $b$ is the bias or offset scalar, $\xi_i$ are the slack variables which allow soft margins, $C$ is the penalty parameter which controls the trade-off between minimizing the error and maximizing the margin, and $\varphi(x_i)$ is a nonlinear mapping from the input space to the higher dimensional feature space [4, 8, 13, 17, 18]. The corresponding dual problem of (1) is given by

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C, \;\; i = 1, \dots, N, \tag{2}$$

where $\alpha_i$ are Lagrange multipliers and the term $K(x_i, x_j)$ is a kernel function representing the inner product of two vectors in the feature space, that is, $\varphi(x_i)^T \varphi(x_j)$. The kernel function must satisfy the well-known Mercer's condition. The data points for which $\alpha_i > 0$ are called support vectors, and they construct the following decision function [4, 8, 13, 17, 18]:

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b \right), \tag{3}$$

where $b = -\frac{1}{2} \sum_{i=1}^{N} y_i \alpha_i \left( K(x_+, x_i) + K(x_-, x_i) \right)$, and $x_+$ and $x_-$ are two arbitrary support vectors from the two different classes $y \in \{-1, +1\}$ [17].

3. Least Squares SVM (LS-SVM)

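LS-SVM was originally proposed by Suykens and Vandewalle as a modification to the SVM regression formulation [13]. The idea behind the modification is to transform the problem from a quadratic programming problem into a set of linear equations. The optimization problem is modified as follows:

$$\min_{w, b, e} \; J(w, b, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \quad \text{subject to} \quad y_i \left( w^T \varphi(x_i) + b \right) = 1 - e_i, \;\; i = 1, \dots, N, \tag{4}$$

where $\gamma$ and $e_i$ play roles similar to the penalty parameter $C$ and the slack variables $\xi_i$ of SVM, respectively. In (4), two modifications are made: the inequality constraints are replaced by equality constraints, and a squared loss function is taken for $e_i$. These modifications significantly simplify the problem [19]. To solve the optimization problem in (4), the Lagrangian function is defined as

$$L(w, b, e; \alpha) = J(w, b, e) - \sum_{i=1}^{N} \alpha_i \left\{ y_i \left( w^T \varphi(x_i) + b \right) - 1 + e_i \right\}, \tag{5}$$

where $\alpha_i$ are Lagrange multipliers, which can be positive or negative because of the equality constraints. The optimality conditions give

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow y_i \left( w^T \varphi(x_i) + b \right) - 1 + e_i = 0. \tag{6}$$

Defining $Z = [\varphi(x_1)^T y_1; \dots; \varphi(x_N)^T y_N]$, $Y = [y_1; \dots; y_N]$, $\mathbf{1} = [1; \dots; 1]$, $e = [e_1; \dots; e_N]$, and $\alpha = [\alpha_1; \dots; \alpha_N]$, and eliminating $w$ and $e$, a linear Karush-Kuhn-Tucker system is obtained as in (7) [13]:

$$\begin{bmatrix} 0 & Y^T \\ Y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix}, \tag{7}$$

where $I$ is the $N \times N$ identity matrix, $\Omega = Z Z^T$, and Mercer's condition can be applied to the matrix $\Omega$:

$$\Omega_{ij} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j). \tag{8}$$

The LS-SVM classifier takes the form in (9), which is similar to the SVM case in (3), and is found by solving the linear set of equations in (7):

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b \right). \tag{9}$$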
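To make the linear system in (7) concrete, the following is a minimal pure-Python sketch, not the authors' MATLAB code. The function names (`rbf_kernel`, `solve_linear`, `lssvm_train`, `lssvm_predict`) are hypothetical, the Gaussian kernel scaling 1/(2σ²) is an assumption, and a small Gaussian elimination routine stands in for whatever linear solver is actually used.

```python
import math

def rbf_kernel(a, b, sigma2):
    # Gaussian kernel; the 1/(2*sigma2) scaling is an assumption of this sketch.
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma2))

def solve_linear(A, rhs):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_train(X, y, gamma, sigma2):
    # Build [[0, Y^T], [Y, Omega + I/gamma]] and solve for [b; alpha].
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = y[i]
        A[i + 1][0] = y[i]
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf_kernel(X[i], X[j], sigma2)
        A[i + 1][i + 1] += 1.0 / gamma
    sol = solve_linear(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(x, X, y, b, alpha, sigma2):
    # Sign of the kernel expansion plus bias, as in the classifier form (9).
    s = b + sum(a * yi * rbf_kernel(xi, x, sigma2)
                for a, yi, xi in zip(alpha, y, X))
    return 1 if s >= 0 else -1
```

Because every training point receives a multiplier, the system has N + 1 unknowns; for a data set of a few thousand points a dense solver from a numerical library would normally replace the toy elimination routine.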

4. Particle Swarm Optimization (PSO)

PSO is a swarm intelligence based optimization method proposed by Kennedy and Eberhart, inspired by the social behavior of bird flocking and fish schooling [20]. In PSO, the procedure begins with an initialization step in which a population (swarm) of possible solutions (particles) is chosen in the search space; the swarm then searches for the optimum solution by updating the particles over generations. The particles are updated iteratively by using the following equations:

$$V_i(t+1) = \omega V_i(t) + c_1 r_1 \left( P_i - \lambda_i(t) \right) + c_2 r_2 \left( G - \lambda_i(t) \right), \qquad \lambda_i(t+1) = \lambda_i(t) + V_i(t+1), \tag{10}$$

where $\lambda_i = [\lambda_{i1}, \dots, \lambda_{iM}]$ and $V_i = [V_{i1}, \dots, V_{iM}]$ are the current position and the velocity of the ith particle in M dimensional space, $t$ is the iteration index, and $G = [G_1, \dots, G_M]$ and $P_i = [P_{i1}, \dots, P_{iM}]$ are the best position of the swarm and the best position of the ith particle, respectively. The value of the inertia weight $\omega$ is a trade-off between global search and local search. A bigger value of the inertia weight allows the particles to explore new areas of the search space (global search), while a smaller value lets the particles move within the current search area for fine tuning (local search). The cognitive and the social learning factors $c_1$ and $c_2$ are positive constants, and $r_1$ and $r_2$ are random numbers in the range [0,1] [20, 21].
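The velocity and position updates described above can be sketched as follows. This is an illustrative minimizer, not the authors' implementation: the function name `pso_minimize`, the zero initial velocities, the clamping of positions to the search bounds, the per-dimension random draws, and the iteration count are all assumptions. The default swarm size, inertia weight, and learning factors mirror the values reported in the experimental section of the paper.

```python
import random

def pso_minimize(objective, bounds, n_particles=25, n_iter=50,
                 omega=0.75, c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization: random positions inside the bounds, zero velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = []
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + cognitive pull (own best) + social pull (swarm best).
                vel[i][d] = (omega * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamping to the bounds is an assumption of this sketch.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)  # best value found so far, per iteration
    return gbest, gbest_val, history
```

For classifier tuning, `objective` would typically return a validation error for a candidate parameter pair, so that minimizing it selects the parameters.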

5. Binary Decision Tree (BDT)

BDT architecture for classification of data sets with R classes requires R − 1 classifiers. The architecture for classification of a data set with R classes is shown in Figure 1. There is a classifier at each node in the tree to make a binary decision.

Figure 1

BDT architecture for classification of data set with R classes.
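As a rough illustration of how R − 1 binary classifiers can resolve R classes, the sketch below walks a chain-shaped tree in which each node either assigns one class or passes the data point on to the next node. The chain shape, the function name `bdt_predict`, and the convention that a node returns +1 to stop are assumptions of this sketch; the figure may depict a different tree layout.

```python
def bdt_predict(x, node_classifiers, class_order):
    """Route x through a chain of binary nodes.

    node_classifiers[k](x) returns +1 to assign class_order[k],
    or -1 to descend to the next node. The last class needs no node.
    """
    if len(node_classifiers) != len(class_order) - 1:
        raise ValueError("R classes require exactly R - 1 binary classifiers")
    for clf, label in zip(node_classifiers, class_order):
        if clf(x) == 1:
            return label
    return class_order[-1]
```

With three classes this reduces to two nodes: the first separates one class from the other two, and the second separates the remaining pair.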

6. Cross-Validation (CV)

CV is one of the most commonly used statistical methods for evaluating and comparing learning algorithms by separating the data set into a training set and a testing set. In CV, the training and testing sets must cross over in successive rounds, so that each data point has a chance of being validated against [22]. The general form of CV is k-fold CV, in which the data set is divided into k groups of (almost) equal size and k iterations are made. In each iteration, one of the k groups is used for testing and the remaining k − 1 groups are used for training.
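The k-fold procedure can be sketched with plain index bookkeeping. The helper name `k_fold_indices` and the fixed random seed are assumptions; this version shuffles the indices without stratifying by class.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Return k (train, test) index pairs over n data points."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k groups of (almost) equal size.
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(j for f in folds[:t] + folds[t + 1:] for j in f)
        splits.append((train, test))
    return splits
```

Each data point appears in exactly one test set, and in the training set of the other k − 1 rounds.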

7. ROC Analysis

ROC analysis has been used as a standard tool for the design, optimization, and evaluation of two-class classifiers [23]. In ROC analysis with two classes, the notation given in Table 1 is used for the confusion matrix [24].

Table 1

Confusion matrix.

Predicted	Actual positive	Actual negative
Positive	TP (true positive)	FP (false positive)
Negative	FN (false negative)	TN (true negative)

ROC analysis investigates and employs the relationship between the sensitivity and specificity of two-class classifiers as the decision threshold varies [25]. Sensitivity is the true positive rate and specificity is the true negative rate; they are defined as TP/(TP+FN) and TN/(TN+FP), respectively [24]. The ROC curve represents the performance of a classifier in a two-dimensional graph in which, conventionally, the true positive rate is plotted against the false positive rate [25]. Detailed information about ROC analysis can be found in [23–28]. The extension of ROC analysis to more than two classes has been studied extensively in the literature [15, 23, 27, 29, 30]. For R classes, the confusion matrix is an R × R matrix whose diagonal entries contain the R correct classifications and whose off-diagonal entries contain the R² − R possible errors. Therefore, generating ROC curves for visualizing the performance of a classifier becomes difficult as the number of classes increases; for example, a six-dimensional space is required for three classes. Recently, the cobweb representation has been used to visualize the performance of classifiers as a multiclass version of ROC analysis [30].
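The two-class quantities defined above can be computed directly from the confusion matrix counts. In the sketch below, the function names and the convention that a score at or above the threshold is predicted positive are assumptions; `roc_points` sweeps the threshold over the observed scores and returns (false positive rate, true positive rate) pairs.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    # Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP).
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(scores, labels):
    """labels are 1 (positive) or 0 (negative); higher score means more positive."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        fn = sum(1 for s, l in zip(scores, labels) if s < thr and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < thr and l == 0)
        sens, spec = sensitivity_specificity(tp, fp, fn, tn)
        points.append((1.0 - spec, sens))  # x axis is the false positive rate
    return points
```

The sketch assumes both classes are present in `labels`; otherwise one of the rates is undefined.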

8. Cobweb Representation

The cobweb representation is generated from the misclassification ratios of the confusion ratio matrix, which is the column-normalized version of the confusion matrix. Consider a chance classification with R classes. The confusion ratio matrix has R² − R misclassification rates, all equal to 1/R. Misclassification rates of 1/R mean that, when confronted with a data point from one of the classes, the classifier assigns it to any of the R classes with equal probability. A polygon with R² − R equal sides can be formed to map the misclassification rates of the confusion ratio matrix. This polygon (the chance polygon) is used to compare the performance of any classifier with the chance classifier in terms of misclassification rates. Any polygon that lies within the chance polygon indicates a better performance than chance. For a chance classification with three classes, the misclassification rates are (0.33, 0.33, 0.33, 0.33, 0.33, 0.33), and the chance polygon becomes the hexagon shown in Figure 2 [30, 31].

Figure 2

Misclassification cobweb for a chance classification with three classes.
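One way to picture the chance polygon is to place each misclassification rate on its own radial axis, with the axes evenly spaced around the origin. The sketch below follows that layout; the evenly spaced axes, the axis ordering, and the function names are assumptions, since the figure itself is not reproduced here.

```python
import math

def chance_rates(r):
    # A chance classifier has R^2 - R misclassification rates, each equal to 1/R.
    return [1.0 / r] * (r * r - r)

def cobweb_vertices(rates):
    """Map each rate to a point on its own radial axis (assumed evenly spaced)."""
    n = len(rates)
    return [(rate * math.cos(2.0 * math.pi * k / n),
             rate * math.sin(2.0 * math.pi * k / n))
            for k, rate in enumerate(rates)]
```

For three classes, `cobweb_vertices(chance_rates(3))` gives the six corners of a regular hexagon with radius 1/3; a classifier whose vertices all lie closer to the origin performs better than chance on every misclassification rate.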

9. CTG Data Set

The CTG data set used in this study is taken from UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/datasets/Cardiotocography], (last accessed: June, 2013) and the details can be found in [32]. This data set has 2126 data points from three classes representing the fetal state as normal, suspect, or pathologic. All data points have 21 features, and these features are listed in Table 2.

Table 2

Features used for determining the fetal state.

Feature	Description
LB	FHR baseline (beats per minute)
AC	Number of accelerations per second
FM	Number of fetal movements per second
UC	Number of uterine contractions per second
DL	Number of light decelerations per second
DS	Number of severe decelerations per second
DP	Number of prolonged decelerations per second
ASTV	Percentage of time with abnormal short term variability
MSTV	Mean value of short term variability
ALTV	Percentage of time with abnormal long term variability
MLTV	Mean value of long term variability
Width	Width of FHR histogram
Min	Minimum (low frequency) of FHR histogram
Max	Maximum (high frequency) of FHR histogram
Nmax	Number of histogram peaks
Nzeros	Number of histogram zeros
Mode	Histogram mode
Mean	Histogram mean
Median	Histogram median
Variance	Histogram variance
Tendency	Histogram tendency

10. Proposed LS-SVM-PSO-BDT Method

The proposed LS-SVM-PSO-BDT method for fetal state determination is described in this section. Its architecture is given in Figure 3.

Figure 3

The proposed method's architecture.

Because the CTG data has three classes, the BDT has two nodes. A Gaussian radial basis function, given in (11), is chosen as the kernel function of the LS-SVMs:

$$K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2 \sigma^2} \right), \tag{11}$$

where $\sigma^2$ is the width of the kernel. The LS-SVM parameters, namely the penalty factor $\gamma$ and the kernel width $\sigma^2$, are optimized by using PSO. The training procedure of the method is summarized in the following sequential steps.

Step 1

Training data points are put into the root node and divided into two groups as PS (pathologic and suspect) and Nr (normal).

Step 2

LS-SVM_1 is trained on the data points in the root node to classify them as PS or Nr. Meanwhile, the LS-SVM_1 parameters are optimized by using PSO.

Step 3

LS-SVM_2 is trained on the data points in the subnode PS to classify them as P (pathologic) or S (suspect). Meanwhile, the LS-SVM_2 parameters are optimized by using PSO.

In the first step, pathologic and suspect data points are combined in one group, rather than normal and suspect data points, in order to minimize the risk of making decisions that cause abnormalities in babies.
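The grouping used in the training steps can be sketched as a relabeling of the three-class targets into two binary problems. The label strings "N", "S", "P", the ±1 sign conventions, and the function name `node_targets` are assumptions made only for illustration.

```python
def node_targets(labels):
    """Split three-class labels into the two binary problems of the tree.

    labels: sequence of "N" (normal), "S" (suspect), "P" (pathologic).
    Returns (root_targets, sub_indices, sub_targets).
    """
    # Root node: Nr (normal) = -1 versus PS (pathologic or suspect) = +1.
    root_targets = [-1 if label == "N" else 1 for label in labels]
    # Subnode PS: only the non-normal points are passed down.
    sub_indices = [i for i, label in enumerate(labels) if label != "N"]
    # Second node: P (pathologic) = +1 versus S (suspect) = -1.
    sub_targets = [1 if labels[i] == "P" else -1 for i in sub_indices]
    return root_targets, sub_indices, sub_targets
```

The first classifier would be trained on all points with `root_targets`, and the second only on the points listed in `sub_indices` with `sub_targets`.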

11. Experimental Results and Discussions

The proposed method, LS-SVM-PSO-BDT, is used for the classification of the CTG data set taken from the UCI Machine Learning Repository. In order to validate the robustness of the method, a 10-fold CV procedure is performed. The entire data set is randomly divided into ten subsets of approximately equal size, while keeping the proportion of data points from the different classes in each subset roughly the same as in the whole data set. In each fold, one subset is left out for testing, and the union of the remaining nine subsets is used for training. Thus, after ten folds, each subset has been used once for testing. The final result is the average of the results of these ten folds. In the experiment, the parameters for LS-SVM-PSO-BDT are set as follows. Twenty-five particles are used in each PSO. The initial values of the 25 particles for the penalty factor γ and the kernel width σ² are chosen on the interval γ, σ² ∈ [2⁻⁴, 2¹²]. The inertia weight and the cognitive and social learning factors of the PSOs are chosen as ω = 0.75, c₁ = 2, and c₂ = 2. The codes for the proposed method have been developed in MATLAB [33], without using any toolbox. The classification accuracies for the ten folds are reported in Table 3.

Table 3

Classification accuracy for each fold.

Fold-1	Fold-2	Fold-3	Fold-4	Fold-5	Fold-6	Fold-7	Fold-8	Fold-9	Fold-10
89.67%	94.84%	91.08%	94.84%	92.49%	91.55%	88.27%	90.14%	92.96%	90.14%

The overall classification accuracy of LS-SVM-PSO-BDT, which is the average accuracy of the ten folds, is 91.62%. Similar works on the classification of CTG data exist in the literature [4, 6–10]. A direct comparison of the methods in these works with the proposed method is not possible, because they all address two-class tasks and, in addition, the CTG data sets used in [4, 6–8] have different properties. Nevertheless, a comparison based on overall classification accuracy is provided in Table 4.

Table 4

Comparison of LS-SVM-PSO-BDT with the existing methods in similar works.

Method	Maximum classification accuracy	Number of classes	Number of data points
LS-SVM-PSO-BDT	91.62%	3	2126
SVM, Krupa et al., 2011 [8]	81.50%	2	129
SVM, Georgoulas et al., 2006 [4]	81.25%	2	80
Hidden Markov models, Georgoulas et al., 2004 [6]	83.00%	2	36
ANBLIR system, Czabanski et al., 2010 [7]	97.50%	2	685
ANFIS, Ocak and Ertunc, 2012 [9]	97.15%	2	1831
SVM and GA, Ocak, 2013 [10]	99.30% (specificity), 100% (sensitivity)	2	1831

Although the number of classes and the number of data points in the CTG data set used in our work are larger than those in the above-mentioned works, LS-SVM-PSO-BDT achieves a remarkable classification accuracy rate of 91.62%. In addition to overall classification accuracy, ROC methodology is used to analyze the performance of the method in more detail. For this purpose, a confusion matrix is created from the classification results and is given in Table 5. This table shows the number of correctly and incorrectly classified data points from the CTG data.

Table 5

Confusion matrix of LS-SVM-PSO-BDT.

Predicted	Actual normal	Actual suspect	Actual pathologic
Normal	1604	70	12
Suspect	38	208	29
Pathologic	13	17	135
Total	1655	295	176

In order to visualize the performance of the proposed method, a cobweb representation is presented. The cobweb representation is generated from the misclassification ratios of the confusion ratio matrix, which is the column-normalized version of the confusion matrix. The confusion ratio matrix of the proposed method is given in Table 6.

Table 6

Confusion ratio matrix of LS-SVM-PSO-BDT.

Predicted	Actual normal	Actual suspect	Actual pathologic
Normal	0.969	0.237	0.068
Suspect	0.023	0.705	0.165
Pathologic	0.008	0.058	0.767

Diagonal entries of the confusion ratio matrix show the correct classification ratios while its off-diagonal entries show the misclassification ratios. From Table 6, 96.90% of normal data points, 70.50% of suspect data points, and 76.70% of pathologic data points are correctly classified as normal, suspect, and pathologic, respectively. Cobweb representation of the proposed method is given in Figure 4. It can be seen from Figure 4 that the misclassification ratios of LS-SVM-PSO-BDT are smaller than those of the chance classifier.
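The column normalization that turns the confusion matrix of Table 5 into the confusion ratio matrix of Table 6 can be reproduced with a few lines of arithmetic. The helper names are hypothetical; the counts are copied from Table 5, with rows as predicted classes and columns as actual classes.

```python
# Rows: predicted normal, suspect, pathologic; columns: actual normal, suspect, pathologic.
confusion = [
    [1604, 70, 12],
    [38, 208, 29],
    [13, 17, 135],
]

def column_normalize(matrix):
    # Divide every entry by the total of its column (the actual class count).
    totals = [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]
    return [[row[j] / totals[j] for j in range(len(row))] for row in matrix]

def misclassification_rates(ratio_matrix):
    # Off-diagonal entries, read column by column.
    n = len(ratio_matrix)
    return [ratio_matrix[i][j] for j in range(n) for i in range(n) if i != j]

ratios = column_normalize(confusion)
rates = misclassification_rates(ratios)
chance = 1.0 / len(confusion)
better_than_chance = all(rate < chance for rate in rates)
```

Rounded to three decimals, `ratios` matches the entries of Table 6, and every off-diagonal rate lies below the chance level of 1/3.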

Figure 4

Misclassification cobweb for LS-SVM-PSO-BDT.

12. Conclusions

In this work, we use LS-SVM with a BDT for classification of the CTG data to determine the fetal state as normal, suspect, or pathologic. A Gaussian radial basis function is chosen as the kernel of LS-SVM, and the model parameters, which are the penalty factor and the width of the Gaussian kernel, are optimized by using PSO. The robustness of LS-SVM-PSO-BDT is examined by running 10-fold CV. The performance of the proposed method is evaluated in terms of overall classification accuracy. According to the empirical results, the proposed LS-SVM-PSO-BDT method achieves a remarkable overall classification accuracy rate of 91.62%. Additionally, ROC methodology is used to analyze the performance of the method in more detail. The correct classification and misclassification ratios of the method with respect to each individual class are presented: 96.90% of normal data points, 70.50% of suspect data points, and 76.70% of pathologic data points are correctly classified as normal, suspect, and pathologic, respectively. In order to visualize the performance of the method, a cobweb representation is presented. This representation indicates that the misclassification ratios of the proposed method are smaller than those of the chance classifier. The empirical results show that the proposed method can help obstetricians to make more accurate decisions in determining the fetal state.

12 in total

Review 1. Better decisions through science.

Authors: J A Swets; R M Dawes; J Monahan
Journal: Sci Am Date: 2000-10 Impact factor: 2.142

2. Basic principles of ROC analysis.

Authors: C E Metz
Journal: Semin Nucl Med Date: 1978-10 Impact factor: 4.446

3. Binary tree of SVM: a new fast multiclass training and classification algorithm.

Authors: Ben Fei; Jinbai Liu
Journal: IEEE Trans Neural Netw Date: 2006-05

4. Predicting the risk of metabolic acidosis for newborns based on fetal heart rate signal classification using support vector machines.

Authors: George Georgoulas; Chrysostomos D Stylios; Peter P Groumpos
Journal: IEEE Trans Biomed Eng Date: 2006-05 Impact factor: 4.538

5. Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis.

Authors: Thomas C W Landgrebe; Robert P W Duin
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2008-05 Impact factor: 6.226

6. Predicting the risk of low-fetal birth weight from cardiotocographic signals using ANBLIR system with deterministic annealing and epsilon-insensitive learning.

Authors: Robert Czabanski; Michal Jezewski; Janusz Wrobel; Janusz Jezewski; Krzysztof Horoba
Journal: IEEE Trans Inf Technol Biomed Date: 2010-02-02

7. A medical decision support system based on support vector machines and the genetic algorithm for the evaluation of fetal well-being.

Authors: Hasan Ocak
Journal: J Med Syst Date: 2013-01-16 Impact factor: 4.460

8. Evolving rule-based systems in two medical domains using genetic programming.

Authors: Athanasios Tsakonas; Georgios Dounias; Jan Jantzen; Hubertus Axer; Beth Bjerregaard; Diedrich Graf von Keyserlingk
Journal: Artif Intell Med Date: 2004-11 Impact factor: 5.326

9. Antepartum fetal heart rate feature extraction and classification using empirical mode decomposition and support vector machine.

Authors: Niranjana Krupa; Mohd Ali; Edmond Zahedi; Shuhaila Ahmed; Fauziah M Hassan
Journal: Biomed Eng Online Date: 2011-01-19 Impact factor: 2.819

10. An expert system based on Fisher score and LS-SVM for cardiac arrhythmia diagnosis.

Authors: Ersen Yılmaz
Journal: Comput Math Methods Med Date: 2013-06-19 Impact factor: 2.238
