Hamida Ilyas, Sajid Ali, Mahvish Ponum, Osman Hasan, Muhammad Tahir Mahmood, Mehwish Iftikhar, Mubasher Hussain Malik.
Abstract
BACKGROUND: Chronic Kidney Disease (CKD), a gradual decline in renal function over several months to years without any major symptoms, is a life-threatening disease. It is categorized into six stages of increasing severity based on the Glomerular Filtration Rate (GFR), which in turn is estimated from several attributes, such as age, sex, race and serum creatinine. Among the multiple available models for estimating the GFR value, the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation, which is a linear model, has been found to be quite efficient because it allows all CKD stages to be detected.
Keywords: CKD; Decision tree; GFR; J48; Machine learning; Random Forest
Year: 2021 PMID: 34372817 PMCID: PMC8351137 DOI: 10.1186/s12882-021-02474-z
Source DB: PubMed Journal: BMC Nephrol ISSN: 1471-2369 Impact factor: 2.388
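The abstract names CKD-EPI as the GFR estimator but does not reproduce the equation. The following is a minimal sketch that assumes the 2009 CKD-EPI creatinine equation, which uses the same four inputs the abstract lists (age, sex, race, serum creatinine). The function name and parameter names are illustrative and do not come from the paper.

```python
def ckd_epi_2009_egfr(scr_mg_dl: float, age_years: float,
                      female: bool, black: bool = False) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation.

    Assumption: the paper does not state which CKD-EPI variant it used;
    the 2009 form is shown here because it takes age, sex, race and
    serum creatinine as inputs.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine threshold
    alpha = -0.329 if female else -0.411    # exponent applied below the threshold
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * (min(ratio, 1.0) ** alpha)
            * (max(ratio, 1.0) ** (-1.209))
            * (0.993 ** age_years))
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


if __name__ == "__main__":
    # Hypothetical patient: 50-year-old male, serum creatinine 0.9 mg/dL.
    print(round(ckd_epi_2009_egfr(0.9, 50, female=False), 1))
```

Higher serum creatinine or higher age lowers the estimate, which is what pushes a patient toward the later stages.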
CKD Stages According to GFR Measurement Values
| Stage | GFR | Description |
|---|---|---|
| 1 | 90–100 mL/min | Normal kidney function or structural abnormalities |
| 2 | 60–89 mL/min | Mildly reduced kidney function |
| 3A | 45–59 mL/min | Moderately reduced kidney function |
| 3B | 30–44 mL/min | Moderately reduced kidney function |
| 4 | 15–29 mL/min | Severely reduced kidney function |
| 5 | < 15 mL/min or dialysis | End stage kidney failure |
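The stage boundaries in the table above can be expressed as a small lookup. This is a sketch under two assumptions not stated in the table: fractional values between the printed integer ranges (for example 59.5) are assigned by half-open intervals, and any GFR of 90 or more is treated as stage 1. The function name is illustrative.

```python
def ckd_stage(gfr: float, on_dialysis: bool = False) -> str:
    """Map a GFR value (mL/min) to the CKD stage label used in the table."""
    if on_dialysis or gfr < 15:
        return "5"      # end stage kidney failure
    if gfr < 30:
        return "4"      # severely reduced
    if gfr < 45:
        return "3B"     # moderately reduced
    if gfr < 60:
        return "3A"     # moderately reduced
    if gfr < 90:
        return "2"      # mildly reduced
    return "1"          # normal function or structural abnormalities


if __name__ == "__main__":
    for value in (95, 72, 50, 38, 20, 9):
        print(value, ckd_stage(value))
```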
Fig. 1 Block Diagram of Proposed Method Made in MS Visio 2013
Variable Description Used in Analysis
| Attribute Symbols and Description | Type | Class |
|---|---|---|
| age (Age) | Numerical | Predictor |
| bp (Blood Pressure) | Numerical | Predictor |
| sg (Specific Gravity) | Nominal | Predictor |
| al (Albumin) | Nominal | Predictor |
| su (Sugar) | Nominal | Predictor |
| rbc (Red Blood Cells) | Nominal | Predictor |
| pc (Pus Cell) | Nominal | Predictor |
| pcc (Pus Cell Clumps) | Nominal | Predictor |
| rc (Race) | Nominal | Predictor |
| bgr (Blood Glucose Random) | Numerical | Predictor |
| bu (Blood Urea) | Numerical | Predictor |
| sc (Serum Creatinine) | Numerical | Predictor |
| sod (Sodium) | Numerical | Predictor |
| pot (Potassium) | Numerical | Predictor |
| hemo (Hemoglobin) | Numerical | Predictor |
| pcv (Packed Cell Volume) | Numerical | Predictor |
| sex (Sex) | Nominal | Predictor |
| rc (Red Blood Cell Count) | Numerical | Predictor |
| htn (Hypertension) | Nominal | Predictor |
| dm (Diabetes Mellitus) | Nominal | Predictor |
| cad (Coronary Artery Disease) | Nominal | Predictor |
| appet (Appetite) | Nominal | Predictor |
| pe (Pedal Edema) | Nominal | Predictor |
| ane (Anemia) | Nominal | Predictor |
| class (Class) | Nominal | Target |
Fig. 2 A Generalized Model of Random Forest
Fig. 3 15-Fold Cross Validation
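For readers unfamiliar with the 15-fold scheme named in the Fig. 3 caption, the sketch below shows how 400 instances could be partitioned so that each instance is used for testing exactly once. It is a plain (non-stratified) split written with the standard library only. The function name and the seed are illustrative, and the paper's tool may shuffle or stratify differently.

```python
import random


def k_fold_indices(n_samples: int, k: int = 15, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]      # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_no, (train, test) in enumerate(k_fold_indices(400, k=15), start=1):
        print(fold_no, len(train), len(test))
```

With 400 instances and 15 folds, ten folds hold 27 test instances and five hold 26.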
Confusion Matrix for Multi-Class Classification
| Predicted Class | True Class A | True Class B | True Class C |
|---|---|---|---|
| A | TP_A | E_BA | E_CA |
| B | E_AB | TP_B | E_CB |
| C | E_AC | E_BC | TP_C |
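The per-stage TP, FP, FN and TN counts reported for each classifier can be derived from a multi-class matrix laid out as above, with rows as predicted classes and columns as true classes. The sketch below shows that one-vs-rest reduction. The 3x3 numbers are made up for illustration and are not taken from the paper.

```python
def per_class_counts(matrix, labels):
    """Reduce a multi-class confusion matrix to one-vs-rest counts.

    Assumes matrix[i][j] is the number of instances predicted as
    labels[i] whose true class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    counts = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[i]) - tp                  # predicted as label, truly another class
        fn = sum(row[i] for row in matrix) - tp   # truly label, predicted as another class
        tn = total - tp - fp - fn
        counts[label] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts


if __name__ == "__main__":
    # Hypothetical matrix, for illustration only.
    example = [
        [5, 1, 0],
        [2, 6, 1],
        [0, 1, 4],
    ]
    for label, c in per_class_counts(example, ["A", "B", "C"]).items():
        print(label, c)
```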
Confusion Matrix for J48
| a | b | c | d | e | f |
|---|---|---|---|---|---|
Confusion Matrix for Random Forest
| a | b | c | d | e | f |
|---|---|---|---|---|---|
| 3 | 2 | 0 | 0 | 5 | 5 |
| 3 | 9 | 0 | 0 | 0 | 3 |
| 0 | 0 | 23 | 5 | 2 | 11 |
| 0 | 0 | 4 | 42 | 0 | 12 |
| 0 | 0 | 12 | 0 | 11 | 7 |
| 1 | 1 | 2 | 6 | 1 | 75 |
Summary of classification outputs of J48 and Random Forest for Chronic Kidney Disease patients at stage 1
| Metric | J48 | Random Forest |
|---|---|---|
| Total instances | 400 | 400 |
| True Positive (TP) | 9 | 3 |
| True Negative (TN) | 376 | 379 |
| False Positive (FP) | 8 | 14 |
| False Negative (FN) | 7 | 4 |
| Accuracy | 96% | 96% |
| Sensitivity | 56% | 43% |
| Specificity | 98% | 96% |
| Precision | 0.56 | 0.429 |
| Recall | 0.52 | 0.176 |
| F-Measure | 0.55 | 0.250 |
| ROC Area | 0.86 | 0.947 |
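The Accuracy, Sensitivity and Specificity rows in the table above appear consistent with the standard definitions applied to the TP, TN, FP and FN counts. The sketch below recomputes them for the stage 1 column of J48. The function name is illustrative. The Precision, Recall, F-Measure and ROC Area rows are not recomputed here, since the source does not say how they were obtained.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity and specificity from one-vs-rest counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


if __name__ == "__main__":
    # Stage 1 counts reported for J48 in the table above.
    j48_stage1 = binary_metrics(tp=9, tn=376, fp=8, fn=7)
    for name, value in j48_stage1.items():
        print(f"{name}: {value:.4f}")
```

For these counts the accuracy is 385/400 = 0.9625, the sensitivity is 9/16 = 0.5625 and the specificity is 376/384, which round to the 96%, 56% and 98% shown in the table.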
Summary of classification outputs of J48 and Random Forest for Chronic Kidney Disease patients at stage 2
| Metric | J48 | Random Forest |
|---|---|---|
| Total Instances | 400 | 400 |
| True Positive (TP) | 21 | 11 |
| True Negative (TN) | 362 | 362 |
| False Positive (FP) | 9 | 19 |
| False Negative (FN) | 8 | 8 |
| Accuracy | 96% | 93% |
| Sensitivity | 72% | 58% |
| Specificity | 98% | 95% |
| Precision | 0.72 | 0.579 |
| Recall | 0.70 | 0.367 |
| F-Measure | 0.71 | 0.449 |
| ROC Area | 0.93 | 0.958 |
Summary of classification outputs of J48 and Random Forest for Chronic Kidney Disease patients at stage 3A
| Metric | J48 | Random Forest |
|---|---|---|
| Total instances | 400 | 400 |
| True Positive (TP) | 12 | 9 |
| True Negative (TN) | 381 | 381 |
| False Positive (FP) | 4 | 7 |
| False Negative (FN) | 3 | 3 |
| Accuracy | 98% | 98% |
| Sensitivity | 80% | 75% |
| Specificity | 99% | 98% |
| Precision | 0.80 | 0.75 |
| Recall | 0.75 | 0.56 |
| F-Measure | 0.77 | 0.64 |
| ROC Area | 0.92 | 0.99 |
Summary of classification outputs of J48 and Random Forest for Chronic Kidney Disease patients at stage 3B
| Metric | J48 | Random Forest |
|---|---|---|
| Total instances | 400 | 400 |
| True Positive (TP) | 50 | 42 |
| True Negative (TN) | 327 | 331 |
| False Positive (FP) | 8 | 16 |
| False Negative (FN) | 15 | 11 |
| Accuracy | 94% | 93% |
| Sensitivity | 77% | 79% |
| Specificity | 98% | 95% |
| Precision | 0.78 | 0.792 |
| Recall | 0.86 | 0.724 |
| F-Measure | 0.81 | 0.757 |
| ROC Area | 0.96 | 0.973 |
Summary of classification outputs of J48 and Random Forest for Chronic Kidney Disease patients at stage 4
| Metric | J48 | Random Forest |
|---|---|---|
| Total instances | 400 | 400 |
| True Positive (TP) | 72 | 75 |
| True Negative (TN) | 309 | 274 |
| False Positive (FP) | 16 | 13 |
| False Negative (FN) | 3 | 38 |
| Accuracy | 95% | 87% |
| Sensitivity | 96% | 66% |
| Specificity | 95% | 95% |
| Precision | 0.96 | 0.664 |
| Recall | 0.82 | 0.852 |
| F-Measure | 0.88 | 0.746 |
| ROC Area | 0.95 | 0.938 |
Summary of classification outputs of J48 and Random Forest for Chronic Kidney Disease patients at stage 5
| Metric | J48 | Random Forest |
|---|---|---|
| Total instances | 400 | 400 |
| True Positive (TP) | 28 | 23 |
| True Negative (TN) | 343 | 341 |
| False Positive (FP) | 13 | 18 |
| False Negative (FN) | 16 | 18 |
| Accuracy | 93% | 91% |
| Sensitivity | 64% | 56% |
| Specificity | 96% | 95% |
| Precision | 0.64 | 0.561 |
| Recall | 0.68 | 0.561 |
| F-Measure | 0.66 | 0.561 |
| ROC Area | 0.91 | 0.914 |
Overall Accuracy and Execution Time of Algorithms
| Metric | J48 | Random Forest |
|---|---|---|
| Overall accuracy (%) | 85.5 | 78.25 |
| Total execution time (seconds) | 0.03 | 0.28 |
Fig. 4 Comparison on the basis of overall accuracy
Detailed Information of Various Studies
| Machine Learning Technique | Year | Author | Source of Data Set | Disease | Tool | Accuracy | Execution Time (seconds) |
|---|---|---|---|---|---|---|---|
| Radial Basis Function | 2016 | S. Ramya et al. | Medical reports of patients collected from different laboratories | Chronic Kidney Disease | R | 85.3% | N/A |
| Logistic regression | 2019 | Jing Xiao | Medical records of patients in Shanghai Huadong Hospital | Chronic Kidney Disease | Online tool | 82% | N/A |
| Probabilistic Neural Networks (PNN) | 2019 | El-Houssainy A. Rady, Ayman S. Anwar | University of California Irvine (UCI) Machine Learning Repository | Chronic Kidney Disease | DTREG Predictive Modeling System | 96.7% | 12 |
Fig. 5 Comparison of studies on the basis of overall accuracy