| Literature DB >> 35206985 |
Ramesh Chandra Poonia1, Mukesh Kumar Gupta2, Ibrahim Abunadi3, Amani Abdulrahman Albraikan4, Fahd N Al-Wesabi5, Manar Ahmed Hamza6, Tulasi B1.
Abstract
Kidney disease has emerged as a major global public health concern. The kidneys remove toxins from the body through urine. In the early stages of the disease the patient experiences no symptoms, but recovery becomes difficult in the later stages, so doctors must be able to recognize the condition early to save patients' lives. Researchers have applied a variety of methods to detect the illness early, and prediction analysis based on machine learning has been shown to be more accurate than other methodologies. This research can also help us better understand global disparities in kidney disease, what can be done to address them, and how efforts can be coordinated to achieve global kidney health equity. This study provides a feature-based prediction model for detecting kidney disease. Various machine learning algorithms, including the k-nearest neighbors algorithm (KNN), artificial neural networks (ANN), support vector machines (SVM), Naïve Bayes (NB), and others, together with the Recursive Feature Elimination (RFE) and Chi-Square test feature-selection techniques, were used to build and analyze prediction models on a publicly available dataset of healthy and kidney disease patients. The experiments found that a logistic regression-based prediction model with optimal features chosen using the Chi-Square technique achieved the highest accuracy, 98.75 percent. The selected features were White Blood Cell Count (Wbcc), Blood Glucose Random (Bgr), Blood Urea (Bu), Serum Creatinine (Sc), Packed Cell Volume (Pcv), Albumin (Al), Hemoglobin (Hemo), Age, Sugar (Su), Hypertension (Htn), Diabetes Mellitus (Dm), and Blood Pressure (Bp).
Keywords: image matching; machine learning algorithms; medical information systems; morphological operations; usability score; artificial intelligence
Year: 2022 PMID: 35206985 PMCID: PMC8871759 DOI: 10.3390/healthcare10020371
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Summary of related work.
| Sr. No. | Author | Year | Machine Learning Algorithms and Accuracy (%) |
|---|---|---|---|
| 1. | A. J. Aljaaf et al. | 2018 | Naïve Bayes: 83.4%, J48: 86.23% |
| 2. | N. Borisagar, D. Barad, and P. Raval | 2017 | ANN: 99.5% |
| 3. | B. Boukenze, A. Haqiq, and H. Mousannif | 2018 | SVM: 63.5%, LR: 64.0%, C4.5: 63%, KNN: 55.15% |
| 4. | H. Polat, H. D. Mehr, and A. Cetin | 2019 | SVM: 97.5% |
| 5. | P. Panwong and N. Iam-On | 2016 | KNN: 86.32%, Naïve Bayes: 60.46%, ANN: 83.24%, RF: 86.60%, J48: 79.52% |
| 6. | Makino et al. | 2019 | KNN, Naïve Bayes + LDA + random subspace + tree-based decision: 94% |
| 7. | Ren et al. | 2019 | SVM + ReliefF: 92.7% |
| 8. | Ma F. et al. | 2019 | Fisher discriminant analysis and SVM: 96.7% |
| 9. | Almansour and colleagues | 2020 | KNN and SVM: 99% |
| 10. | J. Qin and colleagues | 2019 | SVM, KNN, and Naïve Bayes decision tree: 99.7% |
| 11. | Z. Segal and colleagues | 2019 | SVM, KNN, and decision tree: 99.1% |
| 12. | Khamparia et al. | 2020 | Logistic regression, KNN, SVM, random forest, Naïve Bayes, and ANN: 99.7% |
| 13. | Ebiaredoh-Mienye Sarah A. et al. | 2017 | SVM: 98.5% |
| 14. | Zhiyong Pang et al. | 2020 | Softmax regression: 98% |
| 15. | Tabassum, Mamatha et al. | 2017 | DT: 85%, RF: 85% |
| 16. | K. R. A. Padmanaban and G. Parthiban | 2016 | DT: 91%, Naïve Bayes: 86% |
| 17. | Sahil Sharma, Vinod Sharma, and Atul Sharma | 2018 | ANN: 80.4%, RF: 78.6% |
| 18. | Pratibha Devishri | 2019 | ANN: 86.40%, SVM: 77.12% |
| 19. | Sujata Drall, G. Singh Drall, S. Singh, and Bharat Naib | 2018 | Naïve Bayes: 94.8%, KNN: 93.75%, SVM: 96.55% |
LR: Logistic Regression; KNN: k-Nearest Neighbors; SVM: Support Vector Machines; CART: Classification and Regression Trees; ANN: Artificial Neural Networks; LDA: Linear Discriminant Analysis; DT: Decision Tree; RF: Random Forest.
Figure 1. Detection of chronic kidney disease using recursive feature elimination and classification algorithms. CKD: Chronic Kidney Disease; SVM: Support Vector Machine; KNN: k-Nearest Neighbors.
Details of the various kidney disease-related attributes.
| Name | Feature | Description |
|---|---|---|
| Age | Age | Patient’s age |
| Blood pressure | Bp | Blood pressure of the patient |
| Sugar level | Su | Sugar level of the patient |
| Bacteria | Ba | Presence of bacteria in the blood |
| Ratio of the density of urine | Sg | Ratio of the density of urine |
| Albumin level in the blood | Al | Ratio of the albumin level in the blood |
| Pedal edema | Pe | Does the patient have pedal edema or not |
| Red blood cells | Rbc | Red blood cells of the patient (normal or abnormal) |
| Patient class | Class | Does the patient have kidney disease or not |
| Pus cell clumps | Pcc | Presence of pus cell clumps in the blood |
| Anemia | Ane | Does the patient have anemia or not |
| Red blood cell count | Rbcc | Red blood cell count of the patient |
| Hypertension | Htn | Does the patient have hypertension or not |
| Serum creatinine | Sc | Serum creatinine level in the blood |
| Diabetes mellitus | Dm | Does the patient have diabetes or not |
| Blood urea | Bu | Blood urea level of the patient |
| Blood glucose | Bgr | Blood glucose random count |
| Sodium | Sod | Sodium level in the blood |
| White blood cell count | Wbcc | White blood cell count of the patient |
| Hemoglobin | Hemo | Hemoglobin level in the blood |
| Packed cell volume | Pcv | Packed cell volume in the blood |
| Pus cell | Pc | Pus cell count of the patient |
| Potassium | Pot | Potassium level in the blood |
| Appetite | Appet | Patient’s appetite |
| Coronary artery disease | Cad | Does the patient have coronary artery disease or not |
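These attributes correspond to the 24 features of the publicly available chronic kidney disease dataset used in the study. A minimal sketch of how such a dataset could be loaded and encoded is shown below; the file name, column spellings, and imputation strategy are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch (not the authors' code) of loading a CSV export of the
# 400-record chronic kidney disease dataset and encoding the attributes
# listed above. File name, column spellings, and imputation are assumptions.
import pandas as pd

df = pd.read_csv("kidney_disease.csv")  # hypothetical path to the dataset

# Yes/no and normal/abnormal style attributes from the table above.
categorical = ["rbc", "pc", "pcc", "ba", "htn", "dm", "cad",
               "pe", "ane", "appet", "class"]
for col in categorical:
    df[col] = df[col].astype("category").cat.codes  # simple label encoding

# Coerce the remaining attributes to numeric and impute missing values.
numeric = [c for c in df.columns if c not in categorical]
df[numeric] = df[numeric].apply(pd.to_numeric, errors="coerce")
df = df.fillna(df.median(numeric_only=True))

X = df.drop(columns=["class"])
y = df["class"]
```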
Figure 2. Flow chart of the proposed model. LR: Logistic Regression; NB: Naïve Bayes; SVM: Support Vector Machine; KNN: k-Nearest Neighbors; ANN: Artificial Neural Network; RFE: Recursive Feature Elimination.
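A sketch of the workflow in Figure 2, continuing from the loading snippet above: an 80/20 train/test split and the five classifiers compared in the paper. The split ratio and hyperparameters are illustrative assumptions, not the authors' settings.

```python
# Sketch of the workflow in Figure 2, continuing from the loading snippet
# above. The 80/20 split and hyperparameters are illustrative assumptions.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "LR":  make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "NB":  GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "ANN": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```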
Results of the prediction models with all features.
| Machine Learning Algorithm | Precision (%) | Recall (%) | F-Measure (%) | Accuracy (%) |
|---|---|---|---|---|
| Logistic regression | 98 | 97 | 98 | 97.5 |
| Naïve Bayes | 95 | 95 | 95 | 95 |
| Support Vector Machines | 98 | 97 | 98 | 97.5 |
| k-Nearest Neighbors | 76 | 66 | 66 | 66.25 |
| Artificial Neural Networks | 42 | 65 | 51 | 65 |
Figure 3. Results of the prediction models with all features. SVM: Support Vector Machine; KNN: K-Nearest Neighbors; ANN: Artificial Neural Network.
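Precision, recall, F-measure, and accuracy such as those reported above can be computed as sketched below, continuing from the previous snippets; `models["LR"]` refers to the fitted pipeline defined there.

```python
# Computing precision, recall, F-measure, and accuracy as in the table above
# (a sketch; models["LR"] is the fitted pipeline from the previous snippet).
from sklearn.metrics import accuracy_score, classification_report

y_pred = models["LR"].predict(X_test)
print(classification_report(y_test, y_pred, digits=3))  # precision/recall/F1
print("accuracy:", accuracy_score(y_test, y_pred))
```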
Results of the LR model with RFE feature selection technique.
| Performance Measure | Basic Logistic Regression | Logistic Regression with RFE |
|---|---|---|
| Precision (%) | 98 | 92 |
| Recall (%) | 97 | 94 |
| F-Measure (%) | 98 | 93 |
| Accuracy (%) | 97.5 | 91.25 |
RFE: Recursive Feature Elimination.
Figure 4. Comparison of LR models with and without RFE feature selection. LR: Logistic Regression; RFE: Recursive Feature Elimination.
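A hedged sketch of recursive feature elimination wrapped around logistic regression, roughly as compared above; retaining 10 features is an assumption for illustration, since the number is not stated here.

```python
# Recursive feature elimination wrapped around logistic regression, roughly
# as compared above; retaining 10 features is an assumption for illustration.
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rfe_lr = make_pipeline(
    StandardScaler(),
    RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10),
    LogisticRegression(max_iter=1000),
)
rfe_lr.fit(X_train, y_train)
print("LR + RFE test accuracy:", rfe_lr.score(X_test, y_test))
```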
Results of the SVM model with the RFE feature selection technique.
| Performance Measure | Basic SVM | SVM with RFE Feature Selection |
|---|---|---|
| Precision (%) | 98 | 98 |
| Recall (%) | 97 | 96 |
| F-Measure (%) | 98 | 97 |
| Accuracy (%) | 97.5 | 96.25 |
SVM: Support Vector Machine; RFE: Recursive Feature Elimination.
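The same idea applies to the SVM model above. RFE ranks features by model coefficients, so a linear kernel is assumed in this sketch (an implementation detail not stated in the source); helper imports come from the previous snippet.

```python
# Same idea with an SVM. RFE ranks features by model coefficients, so a
# linear kernel is assumed here; helper imports come from the previous sketch.
from sklearn.svm import SVC

rfe_svm = make_pipeline(
    StandardScaler(),
    RFE(estimator=SVC(kernel="linear"), n_features_to_select=10),
    SVC(kernel="linear"),
)
rfe_svm.fit(X_train, y_train)
print("SVM + RFE test accuracy:", rfe_svm.score(X_test, y_test))
```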
Features and their scores by the Chi-Square test.
| Features | Score |
|---|---|
| Wbcc | 12,733.73 |
| Bgr | 2428.328 |
| Bu | 2336.005 |
| Sc | 354.4105 |
| Pcv | 324.7065 |
| Al | 228.1047 |
| Hemo | 125.0657 |
| Age | 113.4602 |
| Su | 100.95 |
| Htn | 86.29181 |
| Dm | 82.2 |
| Bp | 80.02432 |
| Pe | 45.10802 |
| Ane | 35.6116 |
| Sod | 28.7933 |
| Pcc | 24.07546 |
| Rbcc | 20.848 |
| Cad | 19.93604 |
| Pc | 14.16913 |
| Ba | 12.58705 |
| Appet | 12.58703 |
| Rbc | 9.416036 |
| Pot | 4.071145 |
| Sg | 0.005035 |
Wbcc: White Blood Cell Count; Bgr: Blood Glucose Random; Bu: Blood Urea; Sc: Serum Creatinine; Pcv: Packed Cell Volume; Al: Albumin; Hemo: Hemoglobin; Su: Sugar; Htn: Hypertension; Dm: Diabetes Mellitus; Bp: Blood Pressure; Pe: Pedal Edema; Ane: Anemia; Sod: Sodium; Pcc: Pus Cell Clumps; Rbcc: Red Blood Cell Count; Cad: Coronary Artery Disease; Pc: Pus Cell; Ba: Bacteria; Appet: Appetite; Rbc: Red Blood Cells; Pot: Potassium; Sg: Ratio of the Density of Urine.
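Scores of this kind can be obtained with scikit-learn's `chi2` as sketched below. `chi2` requires non-negative inputs, so the features are min-max scaled first (a preprocessing assumption not taken from the paper); because of that scaling, the resulting values will differ in magnitude from the table above.

```python
# Sketch of Chi-Square scoring with scikit-learn. chi2 requires non-negative
# inputs, so features are min-max scaled first (an assumption not taken from
# the paper); the scaling changes the magnitude of the scores shown above.
import pandas as pd
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

X_pos = MinMaxScaler().fit_transform(X)
scores, _ = chi2(X_pos, y)
print(pd.Series(scores, index=X.columns).sort_values(ascending=False))
```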
Results of the LR prediction model with Chi-Square feature selection.
| Performance Measure | K = 5 | 5 < K < 15 | K > 14 |
|---|---|---|---|
| Precision (%) | 96 | 100 | 100 |
| Recall (%) | 92 | 98 | 96 |
| F-Measure (%) | 94 | 99 | 98 |
| Accuracy (%) | 92.5 | 98.75 | 97.5 |
Figure 5. Results of the LR prediction model with Chi-Square feature selection.
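A comparison of this kind can be reproduced by sweeping the number of Chi-Square-selected features K and scoring a logistic regression for each value, as in the sketch below; the K range and preprocessing are assumptions carried over from the earlier snippets.

```python
# Sweeping the number of Chi-Square-selected features K and scoring a logistic
# regression for each, mirroring the comparison above; the K range and
# preprocessing are assumptions carried over from the earlier snippets.
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

for k in range(5, 20):
    model = make_pipeline(
        MinMaxScaler(),                      # keeps chi2 inputs non-negative
        SelectKBest(chi2, k=k),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    print(f"K={k:2d}  test accuracy={model.score(X_test, y_test):.4f}")
```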
Comparative analysis of existing models on a dataset of 400 patients each with 24 attributes [2,27].
| Method | Accuracy (%) | Recall | Precision | F-Measure |
|---|---|---|---|---|
| Logistic regression | 91.8 | 1.00 | 0.98 | 0.98 |
| KNN | 92.7 | 0.88 | 0.98 | 0.92 |
| Naïve Bayes | 95.21 | 0.92 | 1.00 | 0.94 |
| SVM | 92.32 | 0.87 | 0.96 | 0.93 |
| Decision tree | 93.45 | 0.95 | 1.00 | 0.96 |
| Proposed method | 97.54 | 0.99 | 1.00 | 1.00 |
KNN: k-nearest neighbors algorithm; SVM: support vector machines.
Prediction models with and without various feature-selection techniques.
| Prediction Model | Accuracy (%) |
|---|---|
| Basic LR model | 97.5 |
| LR model + RFE feature selection | 91.25 |
| LR model + Chi-Square feature selection (K = 5) | 92.5 |
| LR model + Chi-Square feature selection (5 < K < 15) | 98.75 |
| LR model + Chi-Square feature selection (K > 14) | 97.5 |
Figure 6. Results of the models with and without feature selection. LR: Logistic Regression; RFE: Recursive Feature Elimination.