Taher M Ghazal, Hussam Al Hamadi, Muhammad Umar Nasir, Mohammed Gollapalli, Muhammad Zubair, Muhammad Adnan Khan, Chan Yeob Yeun.
Abstract
Diseases such as cancer, dementia, and diabetes can be fatal, particularly when they are not diagnosed at an early stage. Computer science draws on biomedical studies to support their diagnosis, and advances in machine learning have made a range of techniques available for predicting and prognosing these diseases from different datasets. These datasets vary around the world in form (for example, image datasets and CSV datasets), so machine learning classifiers are needed that can predict cancer, dementia, and diabetes in humans. Several studies have used different machine learning classifiers to predict each of these diseases separately, with the help of different types of datasets. In this paper, we use a multifactorial genetic inheritance disorder dataset to predict all three. The proposed multiclass classification methodology applies support vector machine (SVM) and K-nearest neighbor (KNN) techniques and compares them on accuracy. Simulation results show that, for prediction of dementia, cancer, and diabetes from multifactorial genetic inheritance disorder, the proposed SVM model achieved 92.8% accuracy in training and 92.5% in testing, while the proposed KNN model achieved 92.8% in training and 91.2% in testing. The proposed SVM-based multifactorial genetic inheritance disorder prediction (MGIDP) model therefore gives better results than the proposed KNN model. Applying the proposed model can support early prognosis and prediction of cancer, dementia, and diabetes and can play a vital role in reducing the death rate around the world.
Year: 2022 PMID: 35685134 PMCID: PMC9173933 DOI: 10.1155/2022/1051388
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Proposed machine learning-based cancer, dementia, and diabetes prediction model from multifactorial genetic disorder.
Description of dataset attributes.
| No. | Attributes | Values |
|---|---|---|
| 1 | Patient age | 0–16 |
| 2 | Genes in mother's side | 1: yes; 2: no |
| 3 | Inherited from father | 1: yes; 2: no |
| 4 | Maternal gene | 1: yes; 2: no |
| 5 | Paternal gene | 1: yes; 2: no |
| 6 | Gender | 1: male; 2: female; 3: ambiguous |
| 7 | Birth asphyxia | 1: yes; 2: no |
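
As a rough illustration of how records with these seven attributes could be passed to a KNN classifier, the sketch below implements a plain majority-vote KNN in Python. It is not the authors' implementation: the toy records, the value of `k`, the function name `knn_predict`, and the use of Euclidean distance over the raw category codes are all assumptions made for the example.

```python
import math
from collections import Counter

# Feature order follows the attribute table: age, genes in mother's side,
# inherited from father, maternal gene, paternal gene, gender, birth asphyxia.
# All records below are made-up toy values, NOT rows from the actual dataset.
TRAIN = [
    ((2, 1, 2, 1, 2, 1, 2), "diabetes"),
    ((3, 1, 2, 1, 2, 2, 2), "diabetes"),
    ((4, 1, 1, 1, 2, 1, 2), "diabetes"),
    ((12, 2, 1, 2, 1, 2, 1), "cancer"),
    ((13, 2, 1, 2, 1, 1, 1), "cancer"),
    ((14, 2, 1, 2, 2, 2, 1), "cancer"),
    ((8, 1, 1, 2, 2, 3, 2), "dementia"),
    ((9, 1, 1, 2, 2, 1, 2), "dementia"),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training records nearest to query."""
    # Sort training records by Euclidean distance to the query vector.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    # Count the labels of the k closest records and take the most frequent one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


prediction = knn_predict(TRAIN, (13, 2, 1, 2, 1, 2, 1))
print(prediction)
```

Treating the category codes as numbers keeps the sketch short; a real pipeline would more likely scale the age attribute and one-hot encode the categorical ones before measuring distance.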
Figure 2. Number of patients from targeted classes.
Training performance of the proposed three-class SVM and KNN-based model.
| Instances (1446) | SVM: Dementia | SVM: Cancer | SVM: Diabetes | KNN: Dementia | KNN: Cancer | KNN: Diabetes |
|---|---|---|---|---|---|---|
| Dementia | 5 | 0 | 98 | 2 | 0 | 101 |
| Cancer | 0 | 57 | 10 | 0 | 56 | 11 |
| Diabetes | 6 | 2 | 1268 | 2 | 1 | 1273 |
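
One common way to summarize a confusion matrix like the one above is to divide the diagonal (correctly classified instances) by the total, with the miss rate as the complement. The sketch below applies that to the SVM training counts. The function name `overall_accuracy` is hypothetical, and the source does not state the exact formula behind its reported accuracy, so figures computed this way may differ from the reported ones.

```python
# SVM training confusion matrix, copied from the table above.
# Assumption: the diagonal holds the correctly classified instances.
CLASSES = ["dementia", "cancer", "diabetes"]
SVM_TRAIN = [
    [5, 0, 98],
    [0, 57, 10],
    [6, 2, 1268],
]


def overall_accuracy(matrix):
    """Percentage of instances on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


total_instances = sum(sum(row) for row in SVM_TRAIN)
accuracy = overall_accuracy(SVM_TRAIN)
miss_rate = 100.0 - accuracy  # complement of accuracy

print(f"instances: {total_instances}")
print(f"accuracy:  {accuracy:.2f}%")
print(f"miss rate: {miss_rate:.2f}%")
```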
Testing performance of the proposed three-class SVM and KNN-based model.
| Instances (620) | SVM: Dementia | SVM: Cancer | SVM: Diabetes | KNN: Dementia | KNN: Cancer | KNN: Diabetes |
|---|---|---|---|---|---|---|
| Dementia | 3 | 0 | 46 | 6 | 0 | 43 |
| Cancer | 0 | 27 | 3 | 0 | 20 | 10 |
| Diabetes | 0 | 0 | 541 | 11 | 0 | 530 |
Figure 3. Performance of the proposed SVM-based model with respect to MSSE vs. iterations.
Training simulation results of dementia class by the proposed model.
| Instances (1446) | CA (%) | CMR (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | NPV (%) | FPR (%) | FNR (%) | LPR (%) | LNR (%) | PPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 94.77 | 5.23 | 45.45 | 95.16 | 11.9 | 99.5 | 4.84 | 54.55 | 9.39 | 0.57 | 6.84 |
| KNN | 94.84 | 5.16 | 50 | 92.99 | 3.73 | 92.99 | 7.01 | 50 | 7.13 | 0.53 | 1.94 |
Training simulation results of cancer class by the proposed model.
| Instances (1446) | CA (%) | CMR (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | NPV (%) | FPR (%) | FNR (%) | LPR (%) | LNR (%) | PPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 99.17 | 0.83 | 96.61 | 99.27 | 90.47 | 99.27 | 0.73 | 3.39 | 132.34 | 0.034 | 85.07 |
| KNN | 99.17 | 0.83 | 98.24 | 99.2 | 90.32 | 99.20 | 0.8 | 1.76 | 122.8 | 0.017 | 83.58 |
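
The column abbreviations in these per-class tables are not defined in the extracted text. A plausible reading, stated here as an assumption, is CA = classification accuracy, CMR = classification miss rate, LPR and LNR = positive and negative likelihood ratios, with NPV and PPV the negative and positive predictive values. The sketch below computes these quantities one-vs-rest from the SVM training confusion matrix, using the standard textbook definitions. It assumes rows are predicted classes and columns are actual classes; with that orientation it reproduces the reported CA, sensitivity, and PPV of the SVM cancer row, though other entries need not match. The helper names are hypothetical.

```python
import math

# SVM training confusion matrix, copied from the training performance table.
# Assumption: rows are predicted classes and columns are actual classes.
CLASSES = ["dementia", "cancer", "diabetes"]
SVM_TRAIN = [
    [5, 0, 98],
    [0, 57, 10],
    [6, 2, 1268],
]


def ratio(num, den):
    """Divide safely; an undefined ratio (zero denominator) becomes NaN."""
    return num / den if den else math.nan


def one_vs_rest(matrix, k):
    """Collapse a multiclass matrix into (TP, FP, FN, TN) for class index k."""
    n = len(matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k][j] for j in range(n) if j != k)  # predicted k, actually other
    fn = sum(matrix[i][k] for i in range(n) if i != k)  # actually k, predicted other
    tn = sum(sum(row) for row in matrix) - tp - fp - fn
    return tp, fp, fn, tn


def class_metrics(tp, fp, fn, tn):
    """Standard binary metrics; percentages except the two likelihood ratios."""
    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    acc = ratio(tp + tn, tp + fp + fn + tn)
    return {
        "CA": 100 * acc,
        "CMR": 100 * (1 - acc),
        "Sensitivity": 100 * sens,
        "Specificity": 100 * spec,
        "F1": 100 * ratio(2 * tp, 2 * tp + fp + fn),
        "NPV": 100 * ratio(tn, tn + fn),
        "FPR": 100 * (1 - spec),
        "FNR": 100 * (1 - sens),
        "LPR": ratio(sens, 1 - spec),  # positive likelihood ratio
        "LNR": ratio(1 - sens, spec),  # negative likelihood ratio
        "PPV": 100 * ratio(tp, tp + fp),
    }


cancer = class_metrics(*one_vs_rest(SVM_TRAIN, CLASSES.index("cancer")))
for name, value in cancer.items():
    print(f"{name}: {value:.2f}")
```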
Training simulation results of diabetes class by the proposed model.
| Instances (1446) | CA (%) | CMR (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | NPV (%) | FPR (%) | FNR (%) | LPR (%) | LNR (%) | PPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 91.97 | 8.03 | 92.15 | 88.57 | 95.62 | 36.47 | 11.43 | 7.85 | 8.06 | 0.088 | 99.37 |
| KNN | 92.04 | 7.96 | 91.9 | 95.08 | 95.67 | 34.11 | 4.92 | 8.1 | 18.67 | 0.088 | 99.76 |
Testing simulation results of dementia class by the proposed model.
| Instances (620) | CA (%) | CMR (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | NPV (%) | FPR (%) | FNR (%) | LPR (%) | LNR (%) | PPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 92.58 | 7.42 | 100 | 92.54 | 12.24 | 100 | 7.46 | 0 | 13.4 | 0 | 6.12 |
| KNN | 91.29 | 8.71 | 35.29 | 92.86 | 18.18 | 98.07 | 7.14 | 64.71 | 4.94 | 0.69 | 12.24 |
Testing simulation results of cancer class by the proposed model.
| Instances (620) | CA (%) | CMR (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | NPV (%) | FPR (%) | FNR (%) | LPR (%) | LNR (%) | PPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 99.51 | 0.49 | 100 | 99.4 | 94.7 | 100 | 0.6 | 0 | 166.6 | 0 | 90 |
| KNN | 98.38 | 1.62 | 100 | 98.3 | 80 | 100 | 1.7 | 0 | 58.82 | 0 | 66.6 |
Testing simulation results of diabetes class by the proposed model.
| Instances (620) | CA (%) | CMR (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | NPV (%) | FPR (%) | FNR (%) | LPR (%) | LNR (%) | PPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 92.09 | 7.91 | 91.69 | 100 | 95.66 | 37.97 | 0 | 8.31 | 0 | 0.083 | 100 |
| KNN | 89.67 | 10.33 | 90.90 | 70.27 | 94.30 | 32.91 | 29.73 | 9.1 | 3.05 | 0.129 | 97.96 |
Proposed model parameter results.
| Instances (2067) | SVM training (%) (1446 instances) | SVM testing (%) (620 instances) | KNN training (%) (1446 instances) | KNN testing (%) (620 instances) |
|---|---|---|---|---|
| Accuracy | 92.8 | 92.5 | 92.8 | 91.2 |
| Miss rate | 7.2 | 7.5 | 7.2 | 8.8 |
Comparative analysis with previous work.
| Work | Model | Dataset | Classification accuracy (%) |
|---|---|---|---|
| Kee Pang Soh et al. | Logistic regression, random forest | Cancer mutate data | 77.7 |
| Javier De Velasco Oriol et al. | Linear ML | SNP dataset | 75 |
| Bassam Farran et al. | Logistic regression, KNN | Kuwait health network data | 81.3 |
| Proposed model for prediction of dementia, cancer, and diabetes | Machine learning (SVM and KNN) | Genome disorder data | 92.5 |