Ebrahime Mohammed Senan, Mosleh Hmoud Al-Adhaileh, Fawaz Waselallah Alsaade, Theyazn H H Aldhyani, Ahmed Abdullah Alqarni, Nizar Alsharif, M Irfan Uddin, Ahmed H Alahmadi, Mukti E Jadhav, Mohammed Y Alzahrani.
Abstract
Chronic kidney disease (CKD) is among the top 20 causes of death worldwide and affects approximately 10% of the world's adult population. CKD is a disorder that disrupts normal kidney function. Because the number of people with CKD is increasing, effective prediction measures for early diagnosis are required. The novelty of this study lies in developing a diagnostic system to detect CKD. The study assists experts in exploring preventive measures for CKD through early diagnosis using machine learning techniques. It evaluates a dataset collected from 400 patients and containing 24 features. Missing numerical values were replaced with the mean, and missing nominal values with the mode. Recursive Feature Elimination (RFE) was applied to select the most important features. Four classification algorithms were applied: support vector machine (SVM), k-nearest neighbors (KNN), decision tree, and random forest. All four achieved promising performance. The random forest algorithm outperformed the others, reaching 100% accuracy, precision, recall, and F1-score. CKD is a serious, life-threatening disease with high rates of morbidity and mortality. Artificial intelligence techniques are therefore of great importance in the early detection of CKD, supporting experts and doctors in diagnosing the disease early enough to prevent kidney failure.
Year: 2021 PMID: 34211680 PMCID: PMC8208843 DOI: 10.1155/2021/1004767
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
The stages of development of CKD.
| Stage | Description | Glomerular filtration rate (GFR) (mL/min/1.73 m²) | Treatment stage |
|---|---|---|---|
| 1 | Kidney function is normal | ≥90 | Observation, blood pressure control |
| 2 | Kidney damage is mild | 60–89 | Observation, blood pressure control and risk factors |
| 3 | Kidney damage is moderate | 30–59 | Observation, blood pressure control and risk factors |
| 4 | Kidney damage is severe | 15–29 | Planning for end-stage renal failure |
| 5 | Established kidney failure | <15 | Treatment choices |
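The staging table above amounts to a threshold lookup on GFR. A minimal sketch follows, assuming that a GFR of exactly 15 belongs to stage 4 (per the 15–29 row) and that fractional readings such as 89.5 fall into the stage whose lower bound they meet. The function name `ckd_stage` is hypothetical and not part of the study.

```python
def ckd_stage(gfr: float) -> int:
    """Map a GFR reading (mL/min/1.73 m^2) to a CKD stage from 1 to 5.

    Thresholds follow the staging table; handling of fractional
    readings between rows (e.g. 89.5) is an assumption of this sketch.
    """
    if gfr < 0:
        raise ValueError("GFR cannot be negative")
    if gfr >= 90:
        return 1  # kidney function is normal
    if gfr >= 60:
        return 2  # mild damage
    if gfr >= 30:
        return 3  # moderate damage
    if gfr >= 15:
        return 4  # severe damage
    return 5      # established kidney failure
```

For example, `ckd_stage(45)` falls in the 30–59 row and returns stage 3.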
Figure 1. Factors affecting chronic kidney disease.
Figure 2. The proposed system for the diagnosis of CKD.
Statistical analysis of the dataset of numerical features.
| Features | Mean | Standard deviation | Max | Min |
|---|---|---|---|---|
| Age | 51.483 | 17.21 | 90 | 2 |
| Blood glucose random | 148.037 | 76.583 | 490 | 22 |
| Serum creatinine | 3.072 | 4.512 | 76 | 0.4 |
| Blood pressure | 76.469 | 13.756 | 180 | 50 |
| Blood urea | 57.426 | 49.987 | 391 | 1.5 |
| Potassium | 4.627 | 2.92 | 47 | 2.5 |
| Packed cell volume | 38.884 | 8.762 | 54 | 9 |
| Sodium | 137.529 | 9.908 | 163 | 4.5 |
| Hemoglobin | 12.526 | 2.815 | 17.8 | 3.1 |
| White blood cell count | 8406.12 | 2823.35 | 26400 | 2200 |
| Red blood cell count | 4.707 | 0.89 | 8 | 2.1 |
Statistical analysis of the dataset of nominal features.
| Features | Label | Count |
|---|---|---|
| Albumin | 0 | 245 |
| | 1 | 44 |
| | 2 | 43 |
| | 3 | 43 |
| | 4 | 24 |
| | 5 | 1 |
| Specific gravity | 1.005 | 7 |
| | 1.01 | 84 |
| | 1.015 | 75 |
| | 1.02 | 153 |
| | 1.025 | 81 |
| Sugar | 0 | 339 |
| | 1 | 13 |
| | 2 | 18 |
| | 3 | 14 |
| | 4 | 13 |
| | 5 | 3 |
| Pus cell | Normal | 324 |
| | Abnormal | 76 |
| Red blood cells | Normal | 353 |
| | Abnormal | 47 |
| Bacteria | Present | 22 |
| | Not present | 378 |
| Pus cell clumps | Present | 42 |
| | Not present | 358 |
| Diabetes mellitus | Yes | 137 |
| | No | 263 |
| Hypertension | Yes | 147 |
| | No | 253 |
| Edema | Yes | 76 |
| | No | 324 |
| Coronary artery disease | Yes | 34 |
| | No | 366 |
| Anemia | Yes | 60 |
| | No | 340 |
| Appetite | Good | 318 |
| | Poor | 82 |
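The abstract states that missing numerical values were replaced with the mean and missing nominal values with the mode. The following is a minimal sketch of that idea using only the Python standard library. The helper name `impute`, the use of `None` to mark a missing value, and the sample values are all assumptions for illustration and are not taken from the dataset.

```python
from statistics import mean, mode


def impute(values, kind):
    """Fill None entries with the column mean (numerical) or mode (nominal)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if kind == "numerical":
        fill = mean(observed)
    elif kind == "nominal":
        fill = mode(observed)  # most frequent observed label
    else:
        raise ValueError("kind must be 'numerical' or 'nominal'")
    return [fill if v is None else v for v in values]


# Illustrative columns only; these are not rows from the study's dataset.
hemoglobin = [15.4, None, 9.6, 11.2]
appetite = ["good", "poor", None, "good"]

filled_hemoglobin = impute(hemoglobin, "numerical")
filled_appetite = impute(appetite, "nominal")
```

Here the missing hemoglobin reading is filled with the mean of the three observed readings, and the missing appetite label with the most frequent label, `"good"`.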
The importance of predictive variables in diagnosing CKD.
| Features | Priority ratio (%) |
|---|---|
| al | 17.99 |
| hemo | 14.34 |
| pcv | 12.91 |
| sc | 12.09 |
| rc | 7.51 |
| bu | 6.56 |
| sg | 6.08 |
| pcv | 5.60 |
| htn | 4.64 |
| bgr | 3.48 |
| dm | 3.20 |
| pe | 1.25 |
| wc | 1.01 |
| sod | 0.92 |
| rbc | 0.91 |
| bp | 0.39 |
| su | 0.35 |
| appet | 0.28 |
| ba | 0.18 |
| age | 0.18 |
| cad | 0.09 |
| pcc | 0.06 |
| pot | 0.00 |
| ane | 0.00 |
Figure 3. Number of features vs. cross-validated score.
Environment setup of the proposed system.
| Resource | Details |
|---|---|
| CPU | Core i5 Gen6 |
| RAM | 8 GB |
| GPU | 4 GB |
| Software | Python |
Splitting dataset.
| Dataset | Numbers |
|---|---|
| Training | 300 patients |
| Testing and validation | 100 patients |
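The 300/100 partition above can be sketched as a seeded shuffle followed by a slice. The source does not say whether the split was random or stratified by class, so the shuffling, the seed, and the helper name `split_patients` are assumptions of this sketch.

```python
import random


def split_patients(records, n_train, seed=0):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)                # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    return shuffled[:n_train], shuffled[n_train:]


# Patient identifiers 0..399 stand in for the 400 patient records.
patient_ids = list(range(400))
train_ids, test_ids = split_patients(patient_ids, n_train=300)
```

A stratified split, which preserves the class ratio in both parts, would be a reasonable alternative when classes are imbalanced.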
Results of diagnosing CKD using four machine learning algorithms.
| Classifiers | SVM | KNN | Decision tree | Random forest |
|---|---|---|---|---|
| Accuracy % | 96.67 | 98.33 | 99.17 | 100.00 |
| Precision % | 92.00 | 100.00 | 100.00 | 100.00 |
| Recall % | 94.74 | 97.37 | 98.68 | 100.00 |
| F1-score % | 97.30 | 98.67 | 99.34 | 100.00 |
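The four measures in the table above are standard functions of a binary confusion matrix. The sketch below shows how they relate to one another. The source does not report a confusion matrix, so the counts used here are hypothetical, and the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration; not reported in the source.
metrics = classification_metrics(tp=75, fp=0, fn=1, tn=44)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which gives a quick consistency check on any reported row.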
Figure 4. Correlation between different features.
Comparison of the performance of our proposed system with previous studies.
| Previous studies | Accuracy % | Precision % | Recall % | F1-score % |
|---|---|---|---|---|
| Hore et al. | 92.54 | 85.71 | 96 | 90.56 |
| Vasquez-Morales et al. | 92 | 93 | 90 | 91 |
| Rady and Anwar | 95.84 | 84.06 | 93.55 | 88.55 |
| Elhoseny et al. | 85 | 88 | 88 | |
| Ogunleye and Wang | 96.8 | 87 | 93 | |
| Khan et al. | 95.75 | 96.2 | 95.8 | 95.8 |
| Chittora et al. | 90.73 | 83.34 | 93 | 88.05 |
| Jongbo et al. | 89.2 | 97.72 | 97.8 | |
| Harimoorthy and Thangavelu | 66.3 | 65.9 | 65.9 | |
| Proposed model (random forest) | 100 | 100 | 100 | 100 |
| Proposed model (decision tree) | 99.17 | 100 | 98.68 | 99.34 |
| Proposed model (KNN) | 98.33 | 100 | 97.37 | 98.67 |
| Proposed model (SVM) | 96.67 | 92 | 94.74 | 97.3 |
Figure 5. Comparison of system's performance on diagnostic accuracy in the two datasets.