Davies Segera, Mwangi Mbuthia, Abraham Nyete.
Abstract
Determining an optimal decision model is an important but difficult combinatorial task in imbalanced microarray-based cancer classification. Although the multiclass support vector machine (MCSVM) has already made an important contribution in this field, its performance depends solely on three aspects: the penalty factor C, the type of kernel, and the kernel parameters. To improve the performance of this classifier in microarray-based cancer analysis, this paper proposes the PSO-PCA-LGP-MCSVM model, which is based on particle swarm optimization (PSO), principal component analysis (PCA), and the multiclass support vector machine (MCSVM). The MCSVM uses a hybrid linear-Gaussian-polynomial (LGP) kernel that combines the advantages of three standard kernels (linear, Gaussian, and polynomial) in a novel manner: the linear kernel is linearly combined with a Gaussian kernel that embeds the polynomial kernel. Further, this paper proves that the LGP kernel satisfies the conditions of a valid kernel. To assess the effectiveness of the model, several experiments were conducted and its results were compared with those of three single-kernel models, namely, PSO-PCA-L-MCSVM (linear kernel), PSO-PCA-G-MCSVM (Gaussian kernel), and PSO-PCA-P-MCSVM (polynomial kernel). The comparison used two two-class and two multiclass imbalanced standard microarray datasets. Experimental results in terms of three extended assessment metrics (F-score, G-mean, and Accuracy) reveal the superior global feature extraction, prediction, and learning abilities of this model over the three single-kernel models.
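
The abstract describes the LGP kernel only in words, so the following is a minimal sketch under one plausible reading, not the paper's formula: the Gaussian kernel is evaluated on the distance induced by the polynomial kernel, and the result is mixed with the linear kernel by a weight `rho`. The function names and the parameters `rho`, `gamma`, `degree`, and `coef0` are hypothetical and chosen for illustration.

```python
import math


def linear_kernel(x, y):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Standard polynomial kernel (<x, y> + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_of_polynomial(x, y, gamma=0.1, degree=2, coef0=1.0):
    """Gaussian kernel on the squared distance in the polynomial feature space.

    ASSUMPTION: this is one way to read "Gaussian kernel embedding the
    polynomial kernel"; the source does not give the exact expression.
    """
    d2 = (polynomial_kernel(x, x, degree, coef0)
          - 2.0 * polynomial_kernel(x, y, degree, coef0)
          + polynomial_kernel(y, y, degree, coef0))
    return math.exp(-gamma * max(d2, 0.0))


def lgp_kernel(x, y, rho=0.5, gamma=0.1, degree=2, coef0=1.0):
    """Convex combination of the linear kernel and the Gaussian-of-polynomial
    kernel. ASSUMPTION: the mixing weight rho in [0, 1] is hypothetical."""
    return (rho * linear_kernel(x, y)
            + (1.0 - rho) * gaussian_of_polynomial(x, y, gamma, degree, coef0))


def gram_matrix(points, kernel):
    """Kernel (Gram) matrix for a list of points."""
    return [[kernel(p, q) for q in points] for p in points]


if __name__ == "__main__":
    pts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in gram_matrix(pts, lgp_kernel):
        print(["%.4f" % v for v in row])
```

Under this reading, validity follows from standard closure properties: a polynomial kernel with non-negative offset is positive semidefinite, a Gaussian of the distance it induces is positive semidefinite, and a non-negative weighted sum of valid kernels is again a valid kernel.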
Year: 2019 PMID: 31998772 PMCID: PMC6973196 DOI: 10.1155/2019/4085725
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Parameters and their respective ranges.
| Parameter | Range |
|---|---|
| | 0 < |
| log2 | −5 ≤ log2 |
| | 0 ≤ |
| | 2 ≤ |
| log2 | −15 ≤ log2 |
Initial PSO parameter settings.
| Parameter | Value |
|---|---|
| Maximum number of iterations | 50 |
| Inertial weight | 1 |
| Number of particles/swarm size | (1) PSO + L-MCSVM = 10 |
| | (2) PSO + G-MCSVM = 20 |
| | (3) PSO + |
| | (4) PSO + LGP-MCSVM = 80 |
| Cognition learning factor | 2.0 |
| Social learning factor | 2.0 |
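
To show how the settings in this table enter the standard PSO velocity update, here is a minimal sketch using the tabulated values (50 iterations, inertial weight 1, both learning factors 2.0, swarm size 10). The toy objective, the function name `pso_minimize`, the velocity clamp, and the position clipping are assumptions added for illustration. In the paper, the objective would presumably be a classification error of the MCSVM rather than the toy function used here.

```python
import random


def pso_minimize(objective, bounds, swarm_size=10, max_iter=50,
                 w=1.0, c1=2.0, c2=2.0, seed=0):
    """Basic global-best PSO. Returns (best_position, best_cost, history).

    ASSUMPTION: velocities are clamped to half the search range and positions
    are clipped to the bounds; neither detail is stated in the table.
    """
    rng = random.Random(seed)
    vmax = [0.5 * (hi - lo) for lo, hi in bounds]
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(swarm_size)]
    vel = [[0.0] * len(bounds) for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(p) for p in pos]
    g = min(range(swarm_size), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]

    for _ in range(max_iter):
        for i in range(swarm_size):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + cognitive pull (own best) + social pull (swarm best).
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax[d], min(vmax[d], v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            cost = objective(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
        history.append(gbest_cost)
    return gbest, gbest_cost, history


def sphere(p):
    """Toy objective (sum of squares), a stand-in for a model's error."""
    return sum(x * x for x in p)


if __name__ == "__main__":
    best, cost, _ = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
    print(best, cost)
```
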
Figure 1. Scheme of the proposed PSO-PCA-LGP-MCSVM algorithm.
The cancer microarray datasets utilized in this paper.
| Category | Dataset | Sample size | Number of genes | Number of classes |
|---|---|---|---|---|
| Two-class | AML-ALL | 72 | 7129 | 2 |
| | Colon | 62 | 2000 | 2 |
| Multiclass | St. Jude | 215 | 12558 | 7 |
| | Lung | 203 | 3312 | 5 |
Percentage proportion for the calibration, validation, and test sets.
| Dataset | % proportion for calibration set | % proportion for validation set | % proportion for test set |
|---|---|---|---|
| AML-ALL | 61.1 | 15.3 | 23.6 |
| Colon | 58.1 | 14.5 | 27.4 |
| St. Jude | 57.7 | 14.4 | 27.9 |
| Lung | 57.1 | 14.3 | 28.6 |
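
The percentages above can be turned back into sample counts using the sample sizes from the dataset table. The sketch below does this by simple rounding. The counts are an inference from the reported percentages, not figures stated in the source, and the name `split_counts` is hypothetical.

```python
# Sample sizes from the dataset table and split percentages from the table above.
SAMPLE_SIZES = {"AML-ALL": 72, "Colon": 62, "St. Jude": 215, "Lung": 203}
SPLIT_PERCENT = {
    "AML-ALL": (61.1, 15.3, 23.6),
    "Colon": (58.1, 14.5, 27.4),
    "St. Jude": (57.7, 14.4, 27.9),
    "Lung": (57.1, 14.3, 28.6),
}


def split_counts(n_samples, percents):
    """Round each percentage of n_samples to the nearest whole sample.

    ASSUMPTION: the source reports only percentages; rounding is our guess
    at how they map to (calibration, validation, test) counts.
    """
    return tuple(round(n_samples * p / 100.0) for p in percents)


if __name__ == "__main__":
    for name, n in SAMPLE_SIZES.items():
        counts = split_counts(n, SPLIT_PERCENT[name])
        print(name, counts, "total:", sum(counts))
```

For every dataset the rounded counts add up to the full sample size, for example 44 + 11 + 17 = 72 for AML-ALL, which suggests the reported percentages are rounded versions of whole-sample splits.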
Confusion matrix for a two-class problem.
| | Positive prediction | Negative prediction |
|---|---|---|
| Positive class | True positive (TP) | False negative (FN) |
| Negative class | False positive (FP) | True negative (TN) |
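
The three assessment metrics can be computed from the four cells of this confusion matrix. The sketch below uses the standard two-class definitions. The paper's "extended" multiclass versions are not given here, so the function `binary_metrics` is a hypothetical illustration of the two-class case only.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Accuracy, F-score, and G-mean from two-class confusion-matrix counts.

    ASSUMPTION: standard textbook definitions; ratios with a zero
    denominator are reported as 0.0.
    """
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # recall on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "f_score": 2.0 * precision * sensitivity / f_denom if f_denom else 0.0,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    print(binary_metrics(tp=8, fn=2, fp=1, tn=9))
```

With TP = 8, FN = 2, FP = 1, and TN = 9, the accuracy is 0.85, the F-score is 16/19 (about 0.842), and the G-mean is the square root of 0.72 (about 0.849). The G-mean penalizes a classifier that ignores the minority class, which is why it is commonly reported alongside accuracy for imbalanced data.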
Accuracy of all considered models on the four microarray datasets.
| Models | Colon | Lung | AML-ALL | St. Jude |
|---|---|---|---|---|
| PSO + L-MCSVM | | 0.9596 | | 0.9422 |
| PSO + P-MCSVM | 0.8235 | | | |
| PSO + G-MCSVM | 0.8235 | 0.9608 | 0.9412 | 0.9572 |
| PSO + LGP-MCSVM | | | | |
Values in bold represent the best result and values in italic denote the worst in each column, respectively.
F-score of all considered models on the four microarray datasets.
| Models | Colon | Lung | AML-ALL | St. Jude |
|---|---|---|---|---|
| PSO + L-MCSVM | | 0.9246 | 0.9328 | 0.7870 |
| PSO + P-MCSVM | 0.8211 | | | |
| PSO + G-MCSVM | 0.8211 | 0.9306 | | 0.8477 |
| PSO + LGP-MCSVM | | | | |
Values in bold represent the best result and values in italic denote the worst in each column, respectively.
G-mean of all considered models on the four microarray datasets.
| Models | Colon | Lung | AML-ALL | St. Jude |
|---|---|---|---|---|
| PSO + L-MCSVM | | 0.9791 | | 0.9557 |
| PSO + P-MCSVM | 0.8235 | | | |
| PSO + G-MCSVM | 0.8235 | 0.9792 | | 0.9661 |
| PSO + LGP-MCSVM | | | | |
Values in bold represent the best result and values in italic denote the worst in each column, respectively.