Zahid Ullah, Farrukh Saleem, Mona Jamjoom, Bahjat Fakieh, Faris Kateb, Abdullah Marish Ali, Babar Shah.
Abstract
Diabetes is a chronic disease that can cause several forms of long-term damage to the human body, including heart problems, kidney failure, depression, eye damage, and nerve damage. Several risk factors contribute to the disease, the most common being obesity, age, insulin resistance, and hypertension. Early detection of these risk factors is therefore vital in helping patients reverse diabetes at an early stage and live healthy lives. Machine learning (ML) is a useful tool that can detect diabetes from several risk factors and, based on the findings, provide a decision-based model that supports diagnosis of the disease. This study aims to detect the risk factors of diabetes using ML methods and to provide a decision support system that helps medical practitioners diagnose diabetes. In addition to various other preprocessing steps, this study balanced the BRFSS dataset with the synthetic minority over-sampling technique integrated with the edited nearest neighbor (SMOTE-ENN) method, which is more powerful than SMOTE alone. Several ML methods were applied to the processed BRFSS dataset to build prediction models for detecting the risk factors that help diagnose diabetes patients at an early stage. The prediction models were evaluated with various measures and showed high performance: the experimental results demonstrate that k-nearest neighbor (KNN) outperformed the other methods with an accuracy of 98.38% and sensitivity, specificity, and ROC/AUC scores of 98%. Moreover, compared with existing state-of-the-art methods, the results confirm the efficacy of the proposed models in terms of accuracy and the other evaluation measures. Balancing the dataset with SMOTE-ENN proved beneficial for building more accurate prediction models.
This was the main reason the proposed models could achieve higher accuracy than the existing ones.
Year: 2022 PMID: 36210985 PMCID: PMC9536939 DOI: 10.1155/2022/2557795
Source DB: PubMed Journal: Comput Intell Neurosci
Summary of related work.
| S. No. | Ref. | Dataset | Preprocessing method(s) | Outperformed method(s) | Model accuracy (%) |
|---|---|---|---|---|---|
| 1 | [ | Private | PCA, mRMR | RF | 80.84 |
|  |  | PIDD |  | RF | 77.21 |
| 2 | [ | PIDD | PCA | SVM, AB, bootstrap | 94.44 |
| 3 | [ | BRFSS-2014 | SMOTE | NN | 82.41 |
| 4 | [ | BRFSS | Different parameters used | RF | 86.80 |
| 5 | [ | — | — | LR | 77.9 |
| 6 | [ | PIDD | Feature selection | NN | 86.6 |
| 7 | [ | PIDD | Label encoding, normalization | SVM | 80.26 |
|  |  | Other |  | DT, RF | 96.81 |
| 8 | [ | PIDD | Feature extraction | RF | 88.31 |
| 9 | [ | Private | — | LR | 96.02 |
| 10 | [ | Private | — | Bagging | 97.7 |
Figure 1. Dataset description.
Figure 2. Imbalanced dataset.
Figure 3. Confusion matrix.
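The evaluation measures reported throughout the paper follow directly from the four cells of a binary confusion matrix like the one in Figure 3. A minimal sketch with hypothetical counts (the numbers are purely illustrative, not the paper's results):

```python
# Hypothetical confusion-matrix counts: true/false negatives and positives.
tn, fp, fn, tp = 90, 10, 5, 95

sensitivity = tp / (tp + fn)                # recall / true-positive rate -> 0.95
specificity = tn / (tn + fp)                # true-negative rate -> 0.90
precision = tp / (tp + fp)                  # positive predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)  # -> 0.925
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(sensitivity, specificity, round(precision, 3), accuracy, round(f1, 3))
```

With imbalanced medical data, sensitivity and specificity are more informative than accuracy alone, which is why the paper reports all of them.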
Comparison of the proposed method with existing studies that used the BRFSS dataset.
| Study | Dataset | Method | Accuracy (%) | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|
| [ | BRFSS-2014 | NN | 82.4 | 0.378 | 0.902 | 0.795 |
| [ | BRFSS-2017 | RF | 86.8 | — | — | — |
| Proposed method | BRFSS-2015 | KNN | 98.36 | 0.98 | 0.98 | 0.983 |
Comparison of the proposed method with existing studies that used other datasets.
| Study | Dataset | Method | Accuracy (%) | Precision | Sensitivity | Specificity | F1-score |
|---|---|---|---|---|---|---|---|
| [ | Private | RF | 80.84 | — | 0.85 | 0.767 | — |
|  | PIDD | RF | 77.21 | — | 0.746 | 0.799 | — |
| [ | PIDD | SVM, AB | 94.44 | 0.971 | 0.910 | — | — |
| [ | PIDD | LR, SVM | 78.85, 77.71 | 0.788, 0.774 | 0.789, 0.777 | — | 0.788, 0.775 |
|  |  | NN | 88.6 | — | — | — | — |
| [ | PIDD | RF | 88.31 | 0.88 | 0.86 | — | 0.87 |
| [ | Private | LR | 96.02 | 0.887 | 0.857 | — | 0.871 |
| Proposed method | BRFSS | KNN | 98.38 | 0.98 | 0.98 | 0.98 | 0.98 |
Figure 4. Accuracy of the proposed methods.
Model evaluation measures.
| Classifier | Precision | Sensitivity | Specificity | F1-score | AUC |
|---|---|---|---|---|---|
| KNN | 0.98 | 0.98 | 0.98 | 0.98 | 0.983 |
| RF | 0.96 | 0.95 | 0.95 | 0.95 | 0.955 |
| XGBoost | 0.95 | 0.95 | 0.96 | 0.95 | 0.951 |
| Bagging | 0.93 | 0.94 | 0.94 | 0.94 | 0.946 |
| AdaBoost | 0.94 | 0.94 | 0.95 | 0.94 | 0.944 |
Figure 5. ROC curves of the prediction models.
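ROC curves like those in Figure 5 are traced by sweeping a decision threshold over each model's predicted scores and plotting the true-positive rate against the false-positive rate. A small sketch assuming scikit-learn; the synthetic scores below are illustrative stand-ins for a fitted model's `predict_proba` output:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
# Informative-but-noisy synthetic scores: positives tend to score higher.
scores = 0.6 * y_true + 0.7 * rng.random(1000)

# Each (fpr, tpr) pair is one point on the ROC curve; AUC summarises it.
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print(f"AUC = {auc:.3f}")
```

A curve hugging the top-left corner (AUC near 1.0) indicates a model that separates the classes well, as the paper reports for KNN.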