Brenda Jerop, Davies Rene Segera.
Abstract
Disease diagnosis faces challenges such as misdiagnosis, lack of diagnosis, and slow diagnosis. Several machine learning techniques have been applied to address these challenges: a set of symptoms is fed to a classification model that predicts the presence or absence of a disease. To improve on the performance of these techniques, this paper presents a technique that combines feature selection using principal component analysis (PCA), a hybrid kernel-based support vector machine (HKSVM) classification model, and hyperparameter optimization using a genetic algorithm (GA). The HKSVM introduces a new way of combining three kernels: radial basis function (RBF), linear, and polynomial. Combining local (RBF) and global (linear and polynomial) kernels improves model performance, because local kernels are better at distinguishing points that are close to each other, while global kernels are better suited to distinguishing points that are far apart. The PCA-GA-HKSVM is applied to seven medical datasets, two of them multiclass and five binary. The performance evaluation metrics were accuracy, precision, and recall. The PCA-GA-HKSVM offered better performance than the single-kernel support vector machines (SVMs).
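The abstract does not spell out how the three kernels are combined. As a minimal sketch, assuming a convex weighted sum (one common way to mix kernels, not necessarily the authors' formulation), a hybrid kernel could look like the following. The function names, default weights, `gamma`, `degree`, and `coef0` are all illustrative assumptions.

```python
import math


def linear_kernel(x, y):
    """Global kernel: plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Global kernel: (x . y + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Local kernel: decays quickly as the points move apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def hybrid_kernel(x, y, weights=(0.5, 0.3, 0.2), gamma=0.5, degree=2):
    """Assumed combination: convex weighted sum of RBF, linear, polynomial."""
    w_rbf, w_lin, w_poly = weights
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return (
        w_rbf * rbf_kernel(x, y, gamma)
        + w_lin * linear_kernel(x, y)
        + w_poly * polynomial_kernel(x, y, degree)
    )


def gram_matrix(points, kernel=hybrid_kernel):
    """Pairwise kernel values for a list of feature vectors."""
    return [[kernel(p, q) for q in points] for p in points]
```

A non-negative weighted sum of valid kernels is itself a valid kernel, so a Gram matrix built this way could be handed to an SVM implementation that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`). In a setup like the one described, the weights and kernel parameters would be the quantities tuned by the optimizer.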
Year: 2021; PMID: 34722764; PMCID: PMC8550829; DOI: 10.1155/2021/4784057
Source DB: PubMed; Journal: Biomed Res Int; Impact factor: 3.411
Figure 1: Flowchart of PCA-GA-HKSVM process.
Results of hyperparameter optimization.
| Dataset | Cross-validation accuracy | | | | |
|---|---|---|---|---|---|
| Respiratory diseases | 96.31% | 0.2208 | 0.3938 | 0.3855 | 0.2275 |
| Nephritis | 100% | 0.3950 | 0.4253 | 0.1797 | 0.1615 |
| Acute bladder inflammation | 98.97% | 0.4117 | 0.4579 | 0.1304 | 0.7639 |
| Breast cancer | 94.94% | 1 | 0 | 0 | 1 |
| Chronic kidney disease | 95.94% | 1 | 0 | 0 | 1 |
| Lymphography | 86.44% | 0.8635 | 0.1365 | 1.3878e-17 | 0.4877 |
| Heart disease | 80.97% | 0.949 | 0.0075 | 0.0075 | 1.00 |
Precision values for the different datasets.
| Dataset | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|
| Respiratory | 1.0000 | 0.8932 | 1.0000 | 0.9725 |
| Lymphography | 0.8 | 0.8571 | 0.8221 | 0.8081 |
| Nephritis | 1 | 1 | — | — |
| Acute bladder inflammation | 1 | 1 | — | — |
| Breast cancer | 0.9559 | 0.8913 | — | — |
| Chronic kidney disease | 1 | 0.9074 | — | — |
| Heart disease | 0.8427 | 0.7759 | — | — |
Recall values for the different datasets.
| Dataset | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|
| Respiratory | 0.9204 | 1.0000 | 0.9697 | 0.9636 |
| Nephritis | 1 | 1 | — | — |
| Acute bladder inflammation | 1 | 1 | — | — |
| Breast cancer | 0.9286 | 0.9318 | — | — |
| Chronic kidney disease | 0.8387 | 1 | — | — |
| Lymphography | 0.8571 | 0.8 | 0.8621 | 0.7936 |
| Heart disease | 0.7426 | 0.8654 | — | — |
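The per-class precision and recall values in the tables above follow from a confusion matrix in the usual way. The sketch below illustrates that computation; the function names and the toy matrix used in the usage note are assumptions for illustration, not the paper's data.

```python
def per_class_precision_recall(confusion):
    """Return a (precision, recall) pair for every class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    results = []
    for c in range(n):
        true_pos = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(n))  # column sum
        actual_c = sum(confusion[c])                          # row sum
        precision = true_pos / predicted_c if predicted_c else 0.0
        recall = true_pos / actual_c if actual_c else 0.0
        results.append((precision, recall))
    return results


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0
```

For a hypothetical binary matrix `[[8, 2], [1, 9]]`, class 1 has precision 8/9 and recall 8/10, class 2 has precision 9/11 and recall 9/10, and the overall accuracy is 17/20. The same function handles the four-class datasets without changes.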
Accuracy of PCA-GA-HKSVM vs. single kernel SVMs.
| Dataset | PCA-GA-HKSVM | RBF | Linear | Polynomial |
|---|---|---|---|---|
| Respiratory diseases | 96.3547% | 93.81% | 93.05% | 75.13% |
| Nephritis | 100% | 98% | 100% | 98% |
| Bladder inflammation | 100% | 98% | 100% | 98% |
| Breast cancer | 92.98% | 65.54% | 87.61% | 89.11% |
| Chronic kidney disease | 93.7500% | 92.21% | 90.63% | 92.74% |
| Lymphography | 82.8576% | 69.41% | 79.76% | 79.76% |
| Heart disease | 80.49% | 80.49% | 77.64% | 69.414% |
p value from Friedman's test.
| Dataset | p value |
|---|---|
| Respiratory diseases | 0.0952 |
| Nephritis | 0.2610 |
| Bladder inflammation | 0.1251 |
| Breast cancer | 0.0537 |
| Chronic kidney disease | 0.0825 |
| Lymphography | 0.0943 |
| Heart disease | 0.0621 |
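The record does not say how the Friedman test was set up for each dataset. As a hedged sketch, assuming each block is one cross-validation fold and each column is one classifier's score on that fold, the statistic and its chi-square p value can be computed as below. This version uses average ranks for ties but omits the tie-correction factor, and all names are illustrative; `scipy.stats.friedmanchisquare` is a library alternative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        terms = sum(y ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-y) * terms
    n = (df - 1) // 2
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, n + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman_test(blocks):
    """blocks: one row per block (e.g. fold), one score per classifier."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)
```

With four classifiers, as in the accuracy comparison, the statistic is referred to a chi-square distribution with three degrees of freedom. That approximation is asymptotic, so p values from a small number of folds should be read with care.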
Running time in seconds.
| Dataset | PCA-GA-HKSVM | RBF | Linear | Polynomial |
|---|---|---|---|---|
| Respiratory diseases | 61.456 | 2.182 | 31.27 | 2.378 |
| Nephritis | 13.616 | 0.545 | 0.675 | 0.598 |
| Bladder inflammation | 3.480 | 0.533 | 0.532 | 0.512 |
| Breast cancer | 62.737 | 0.800 | 2.233 | 6.557 |
| Chronic kidney disease | 58.228 | 0.708 | 2.671 | 3.903 |
| Lymphography | 5.563 | 0.358 | 0.999 | 0.472 |
| Heart disease | 47.759 | 0.798 | 3.050 | 0.648 |