Mohammed Zeyad Al Yousef, Adel Fouad Yasky, Riyad Al Shammari, Mazen S Ferwana.
Abstract
BACKGROUND: Saudi Arabia ranks 7th globally in diabetes prevalence, which is expected to reach 45.36% by 2030. The cost of diabetes is expected to increase to 27 billion Saudi riyals once undiagnosed individuals are also accounted for. Prevention and early detection can effectively address these challenges.
Year: 2022 PMID: 35866773 PMCID: PMC9302319 DOI: 10.1097/MD.0000000000029588
Source DB: PubMed Journal: Medicine (Baltimore) ISSN: 0025-7974 Impact factor: 1.817
Figure 1. Flow chart showing the process of preparing the data.
Attributes used in our study.
| Study | Attributes | | | | | |
|---|---|---|---|---|---|---|
| Current study (41 attributes) | Region | RBC | WBC | ALK_Phos | Sodium | HDL |
| | Gender | Hgb | MCHC | Adj_Ca | CO2 | Triglyceride |
| | Age | MPV | Mg | AGAP | Potassium | LDL |
| | SBP | HCT | Phosphorus | Creatinine | BUN | FBS |
| | DBP | MCH | Uric Acid | ALT | Chloride | A1c |
| | BMI | RDW | T Bili | AST | eGFR | RBS |
| | Platelet | MCV | Albumin | Ca | Cholesterol | |
A1c = glycated hemoglobin, Adj_Ca = adjusted calcium, AGAP = anion gap, ALK_Phos = alkaline phosphatase, ALT = alanine transaminase, AST = aspartate aminotransferase, BMI = body mass index, BUN = blood urea nitrogen, DBP = diastolic blood pressure, eGFR = estimated glomerular filtration rate, FBS = fasting blood sugar, HCT = hematocrit, HDL = high-density lipoprotein, Hgb = hemoglobin, LDL = low-density lipoprotein, MCH = mean corpuscular hemoglobin, MCHC = mean cell hemoglobin concentration, MCV = mean corpuscular volume, MPV = mean platelet volume, RBC = red blood cells, RBS = random blood sugar, RDW = red cell distribution width, SBP = systolic blood pressure, T Bili = total bilirubin, WBC = white blood cells.
Figure 2. Flow chart showing the process of labeling each MRN with the diagnosis. MRN = medical record number.
Distribution and statistical values of gender based on the region.
| | Total | | Central region | | Eastern region | | Western region | |
|---|---|---|---|---|---|---|---|---|
| Gender | N | N% | N | N% | N | N% | N | N% |
| Female | 10,051 | 46.90% | 6989 | 55.69% | 1147 | 30.98% | 1915 | 36.96% |
| Male | 7446 | 34.74% | 5045 | 40.20% | 1081 | 29.20% | 1320 | 25.48% |
| Gender not available | 3934 | 18.36% | 515 | 4.10% | 1474 | 39.81% | 1945 | 37.54% |
| Total | 21,431 | 100% | 12,549 | 100% | 3702 | 100% | 5180 | 100% |
Distribution and statistical values of diagnosis based on gender and region.
| | Diabetics | | Prediabetics | | Non-diabetics | | Total | |
|---|---|---|---|---|---|---|---|---|
| Gender | N | N% | N | N% | N | N% | N | N% |
| Female | 6132 | 47.94% | 1881 | 18.71% | 2038 | 20.27% | 10,051 | 46.89% |
| Male | 4554 | 35.60% | 1554 | 20.87% | 1338 | 17.96% | 7446 | 34.74% |
| Gender not available | 2105 | 16.46% | 638 | 16.21% | 591 | 15.02% | 3934 | 18.36% |
| Total | 12,791 | 59.93% | 4073 | 19.00% | 4567 | 21.31% | 21,431 | 100% |
Attributes and the number of missing values among the 21,431 patients.
| Attribute | Missing | Missing % | Attribute | Missing | Missing % | Attribute | Missing | Missing % |
|---|---|---|---|---|---|---|---|---|
| Region | 0 | 0.00% | MCV | 3229 | 15.07% | AST | 5548 | 25.89% |
| Gender | 2934 | 13.69% | WBC | 3667 | 17.11% | Ca | 6645 | 31.01% |
| Age | 2933 | 13.69% | MCHC | 3260 | 15.21% | Sodium | 3011 | 14.05% |
| SBP | 161 | 0.75% | Mg | 6159 | 28.74% | CO2 | 1927 | 8.99% |
| DBP | 161 | 0.75% | Phosphorus | 7297 | 34.05% | Potassium | 1860 | 8.68% |
| BMI | 861 | 4.02% | Uric Acid | 8107 | 37.83% | BUN | 1849 | 8.86% |
| Platelet | 3187 | 14.87% | T Bili | 5954 | 27.78% | Chloride | 1937 | 9.04% |
| RBC | 4139 | 19.31% | Albumin | 6218 | 29.01% | eGFR | 3463 | 16.16% |
| Hgb | 3164 | 14.76% | ALK_Phos | 5732 | 26.75% | Cholesterol | 3561 | 16.62% |
| MPV | 3171 | 14.80% | Adj_Ca | 8164 | 38.09% | HDL | 2894 | 13.50% |
| HCT | 3173 | 14.81% | AGAP | 1967 | 9.18% | Triglyceride | 3581 | 16.71% |
| MCH | 3216 | 15.01% | Creatinine | 2389 | 11.15% | LDL | 5803 | 27.08% |
| RDW | 4123 | 19.24% | ALT | 5356 | 24.99% | | | |
Adj_Ca = adjusted calcium, AGAP = anion gap, ALK_Phos = alkaline phosphatase, ALT = alanine transaminase, AST = aspartate aminotransferase, BMI = body mass index, BUN = blood urea nitrogen, DBP = diastolic blood pressure, eGFR = estimated glomerular filtration rate, HCT = hematocrit, HDL = high-density lipoprotein, Hgb = hemoglobin, LDL = low-density lipoprotein, MCH = mean corpuscular hemoglobin, MCHC = mean cell hemoglobin concentration, MCV = mean corpuscular volume, MPV = mean platelet volume, RBC = red blood cells, RDW = red cell distribution width, SBP = systolic blood pressure, T Bili = total bilirubin, WBC = white blood cells.
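Each value in the Missing % columns is the missing count divided by the 21,431 patients. As a small worked check, the arithmetic can be sketched as follows; `missing_pct` is a hypothetical helper name, and the counts are copied from the table above.

```python
TOTAL_PATIENTS = 21431  # cohort size stated in the table caption


def missing_pct(missing, total=TOTAL_PATIENTS):
    """Return the share of missing values as a percentage, rounded to 2 decimals."""
    return round(100 * missing / total, 2)


# Counts copied from the missing-values table
for name, missing in [("SBP", 161), ("MCV", 3229), ("Adj_Ca", 8164)]:
    print(f"{name}: {missing_pct(missing):.2f}%")
```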
Figure 3. The selected attributes according to their information gain measures.
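This record does not show how the information gain in Figure 3 was computed. As a rough illustration only, the standard definition (class entropy minus the weighted class entropy after splitting on an attribute) can be sketched as below. The function names and the toy data are assumptions, and the attribute is treated as already discretized.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Class entropy minus the weighted entropy after grouping by attribute value."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): one attribute separates the classes, the other does not
labels = ["DM", "DM", "Non-DM", "Non-DM"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

Ranking attributes by this score and keeping the highest ones is one common way to perform the kind of selection the figure describes.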
Comparison of the performance of the different classification models without using the synthetic minority oversampling technique.
| | | | | | | KNN | | |
|---|---|---|---|---|---|---|---|---|
| Measures | RF | SVM | LR | BC | BN | K = 1 | K = 10 | K = 50 |
| Precision | 56% | – | 57% | 56% | 59% | 50% | 52% | – |
| Recall | 60% | 61% | 62% | 59% | 63% | 55% | 62% | 60% |
| AUC | 0.67 | 0.53 | 0.67 | 0.70 | 0.71 | 0.55 | 0.62 | 0.62 |
| F-score | 54% | – | 53% | 56% | 60% | 52% | 51% | – |
| RMSE | 0.45 | 0.44 | 0.41 | 0.44 | 0.41 | 0.55 | 0.44 | 0.44 |
| Accuracy | 53% | 60% | 62% | 59% | 63% | 55% | 62% | 60% |
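The second comparison applies the synthetic minority oversampling technique (SMOTE). The record gives no implementation details, so the following is only a minimal sketch of the general idea: each synthetic record is interpolated between a minority-class record and one of its nearest minority-class neighbours. The function name, the parameters `k` and `seed`, and the toy points are assumptions, not the authors' setup.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


# Toy minority class (hypothetical values for two numeric attributes)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority records, oversampling in this way does not introduce values outside the range already observed for that class.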
Comparison of the performance of the different classification models using the synthetic minority oversampling technique.
| | | | | | | KNN | | |
|---|---|---|---|---|---|---|---|---|
| Measures | RF | SVM | LR | BC | BN | K = 1 | K = 10 | K = 50 |
| Precision | 60% | 53% | 54% | 66% | 62% | 51% | 53% | 49% |
| Recall | 28% | 61% | 56% | 59% | 66% | 53% | 58% | 60% |
| AUC | 0.64 | 0.58 | 0.66 | 0.70 | 0.75 | 0.56 | 0.60 | 0.59 |
| F-score | 22% | 54% | 55% | 56% | 61% | 52% | 53% | 48% |
| RMSE | 0.50 | 0.44 | 0.43 | 0.44 | 0.42 | 0.56 | 0.44 | 0.43 |
| Accuracy | 28% | 61% | 56% | 59% | 66% | 53% | 58% | 60% |
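In the table above, the recall and accuracy rows are identical for every model. That is what a class-weighted average of per-class recall produces, because it reduces to overall accuracy. Whether the authors reported weighted averages is an assumption; the sketch below uses a made-up 3-by-3 confusion matrix (rows are true classes, columns are predicted classes) purely to illustrate the identity.

```python
def weighted_metrics(cm):
    """Class-weighted precision, recall and F-score, plus accuracy, from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precision = recall = f_score = 0.0
    for i in range(n):
        support = sum(cm[i])                      # true records of class i
        predicted = sum(cm[r][i] for r in range(n))  # records predicted as class i
        p = cm[i][i] / predicted if predicted else 0.0
        r = cm[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f_score += weight * f
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": correct / total,
    }


# Hypothetical counts for three classes (diabetic, prediabetic, non-diabetic)
cm = [
    [50, 5, 5],
    [10, 20, 10],
    [5, 5, 30],
]
metrics = weighted_metrics(cm)
print(metrics)
```

Weighted precision and F-score do not share this property, which is consistent with those rows differing from accuracy in the table.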
A comparison summary of the models.
| Results | Performance measures | Validation and training | # of attributes | Class | # of records | Data set | Author |
|---|---|---|---|---|---|---|---|
| RF: recall (90%), precision (68%) | Sensitivity (recall) and PPV (precision) | Training/testing, percentage not mentioned | 18 | Dm, Non-DM | 66,325 | NGHA 2013–2015 | Daghistani T |
| PNN: accuracy 81.49% | Accuracy | 76% training/25% testing | 9 | Dm, Non-DM | 768 | PIMA | Soltani Z |
| Clustering + SVM: accuracy 98.93%, sensitivity 99.33%, specificity 98.73%, and AUC 0.97 | Accuracy, sensitivity, specificity, AUC | Cross validation | 9 | Dm, Non-DM | 768 | PIMA | Ilango B |
| Clustering + C4.5: accuracy 92.38%, sensitivity 90.38%, and specificity 93.29% | Accuracy, sensitivity, specificity | Cross validation | 9 | Dm, Non-DM | 768 | PIMA | Patil B |
| BN: precision (62%), recall (66%), AUC (0.75), F-score (61%), RMSE (0.42), and accuracy (66%) | Sensitivity (recall), PPV (precision), AUC, F-score, RMSE, accuracy | 70% training/30% testing | 41 | Dm, Pre Dm, Non-DM | 18,181 | NGHA 2015–2018 | Current work |
BN = Bayesian Network, Dm = Diabetic, Non-DM = Non-Diabetic, PNN = probabilistic neural network, Pre Dm = Pre-diabetic, RF = Random Forest, SVM = support vector machine.