Sunday O Olatunji, Aisha Alansari, Heba Alkhorasani, Meelaf Alsubaii, Rasha Sakloua, Reem Alzahrani, Yasmeen Alsaleem, Reem Alassaf, Mehwash Farooqui, Mohammed Imran Basheer Ahmed, Jamal Alhiyafi.
Abstract
Alzheimer's disease (AD) is a silent disease that causes brain cells to die progressively, affecting consciousness, behavior, planning ability, and language, among other functions. The risk of AD increases exponentially with aging, doubling every 5-6 years, and the disease carries profound implications, such as swallowing difficulties and the loss of the ability to speak before death. According to the Ministry of Health in Saudi Arabia, the number of AD patients will triple by 2060, reaching 14 million patients worldwide. This rapid rise is driven by the silent progression of the disease, which leads to late diagnosis because the symptoms cannot be distinguished from the effects of normal aging. Moreover, with current medical capabilities, no specific medical examination can confirm AD with 100% certainty. The literature review revealed that most recent publications use images to diagnose AD, which is impractical for local hospitals with limited imaging capabilities, while studies that used clinical and demographic data failed to achieve adequate results. Consequently, this study aims to preemptively predict AD in Saudi Arabia by employing machine learning (ML) techniques. The dataset was acquired from King Fahad Specialist Hospital (KFSH) in Dammam, Saudi Arabia, and contains standard clinical tests for 152 patients. Four ML algorithms, namely support vector machine (SVM), k-nearest neighbors (k-NN), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost), were employed to preemptively diagnose the disease. The empirical results demonstrated the robustness of SVM for the preemptive diagnosis of AD, with accuracy, precision, recall, and area under the receiver operating characteristic curve (AUROC) of 95.56%, 94.70%, 97.78%, and 0.97, respectively, using 13 features selected by the sequential forward feature selection technique. This model can assist medical staff in controlling the progression of the disease at low cost.
Year: 2022 PMID: 36052046 PMCID: PMC9427223 DOI: 10.1155/2022/5476714
Source DB: PubMed Journal: Comput Intell Neurosci
Literature review summary.
| # | Author/s | Technique/s | Results | Limitations |
|---|---|---|---|---|
| 1 | Janghel and Rathore | Support vector machine (SVM), linear discriminant analysis, k-means clustering, and k-nearest neighbors (k-NN) | SVM, linear discriminant analysis, and k-means clustering achieved an accuracy of 100% using fMRI images, while k-NN achieved the highest accuracy of 76.56% using PET images | The datasets utilized are imbalanced |
| 2 | Gao et al. | Novel 3DMgNet architecture | The proposed architecture achieved an accuracy of 92.133%, sensitivity of 88.42%, specificity of 95.00%, and AUC of 94.443 | The model's sensitivity is considered low |
| 3 | Memon et al. | Logistic regression (LR), decision tree (DT), and support vector machine (SVM) | LR achieved an accuracy, specificity, and sensitivity of 98.12%, 95%, and 90%, respectively | The sensitivity is considered low |
| 4 | Dinu and Manju | Random forest (RF) and tree bagger (TB) | RF achieved an accuracy of 98.42%, sensitivity of 0.85, and specificity of 0.95 | The sensitivity is considered low |
| 5 | Salehi et al. | Convolutional neural network (CNN) | CNN achieved an accuracy of 99% | The dataset is imbalanced |
| 6 | Eke et al. | Support vector machine (SVM) | SVM achieved a sensitivity higher than 80%, specificity above 70%, and an AUC of at least 0.80 | The accuracy achieved needs improvement |
| 7 | Neelaveni and Devasana | Support vector machine (SVM) and decision tree (DT) | SVM achieved an accuracy of 85% | The accuracy achieved needs improvement |
| 8 | Leong and Abdullah | Deep neural network (DNN), random forest (RF), gradient boosting machines (GBM), support vector machine (SVM), and logistic regression (LR) | RF achieved an accuracy, sensitivity, specificity, and AUC of 94.39%, 88.24%, 100.00%, and 94.44%, respectively | The sensitivity is considered low |
| 9 | Wang et al. | Convolutional neural network (CNN) | CNN achieved an accuracy, sensitivity, and specificity of 97.65%, 97.96%, and 97.35%, respectively | The model is built with 8 layers, which increases the required computational cost |
| 10 | Liu et al. | Linear SVC, logistic regression CV, decision tree (DT), bagging, and multilayer perceptron (MLP) | Logistic regression CV achieved precision, recall, F1 score, and accuracy of 87.5%, 91.3%, 89.4%, and 86.1%, respectively | The accuracy needs improvement |
| 11 | Almubark et al. | Random forest (RF), gradient boosting (GB), support vector machine (SVM), and adaptive boosting (AdaBoost) | SVM achieved an accuracy, specificity, and sensitivity of 91.08%, 94%, and 85.71%, respectively | The sensitivity is considered low |
| 12 | Revathi et al. | Support vector machine (SVM), random forest (RF), and multinomial logistic regression (LR) | SVM, RF, and multinomial LR achieved accuracy rates of 86%, 71%, and 89%, respectively | The accuracy is considered low |
| 13 | Goenka and Tiwari | 3D convolutional neural network (CNN) | The model achieved a testing accuracy of 100% with a loss of 12.74%, validation accuracy of 98.08% with a loss of 14.59%, training accuracy of 100% with a loss of 9.5%, and precision, recall, and F1 score of 100% | The 3D convolution layers increase the required computational cost |
Figure 1: The study framework.
Feature descriptions.
| Feature | Description |
|---|---|
| Sex | Male or female |
| Age | Age in years |
| Temperature | The body temperature in degrees Celsius (°C) |
| White blood cells (WBC) | The white blood cell count in the blood |
| Red blood cells (RBC) | The red blood cell count in the blood |
| Pulse ox | The measurement of oxygen in the blood (oxygen saturation) |
| Platelet | The platelet count in the blood |
| MPV | Mean platelet volume: the average platelet size |
| RDW | Red cell distribution width: the variation in RBC size |
| MCH | Mean corpuscular hemoglobin: the average amount of hemoglobin in a single RBC |
| MCHC | Mean corpuscular hemoglobin concentration: the average amount of hemoglobin in a single RBC per unit volume, accounting for cell volume |
| MCV | Mean corpuscular volume: the average size of RBCs |
| Hematocrit | The ratio of the volume of RBCs to the overall volume of blood |
| Hemoglobin | The hemoglobin level in the blood |
| Pulse | The number of heartbeats per minute, also called the heart rate |
| Respiratory rate | The measurement of breathing rate per minute |
| BP-systolic | The highest blood pressure during ventricular contraction |
| BP-diastolic | The lowest pressure that is measured immediately before the subsequent contraction |
The statistical analysis of numerical attributes.
| Features | Mean | STD | Min | 25% | 50% | 75% | Max | Missing values |
|---|---|---|---|---|---|---|---|---|
| Age | 55.79 | 20.58 | 11.00 | 37.00 | 59.00 | 74.00 | 92.00 | 0.00 |
| Pulse | 80.35 | 12.45 | 49.00 | 72.00 | 78.50 | 88.75 | 117.00 | 18.00 |
| BP-systolic | 121.15 | 18.66 | 51.00 | 110.00 | 120.00 | 135.00 | 172.00 | 19.00 |
| Temperature | 36.70 | 0.33 | 35.40 | 36.50 | 36.70 | 36.80 | 38.30 | 19.00 |
| Respiratory rate | 20.15 | 2.68 | 0.00 | 20.00 | 20.00 | 20.00 | 35.00 | 19.00 |
| BP-diastolic | 72.90 | 11.24 | 27.00 | 66.00 | 73.00 | 79.00 | 110.00 | 19.00 |
| WBC | 6.92 | 2.81 | 0.70 | 4.90 | 6.60 | 8.10 | 16.10 | 37.00 |
| RBC | 4.39 | 0.75 | 2.20 | 4.13 | 4.46 | 4.84 | 5.87 | 37.00 |
| Hemoglobin | 12.23 | 2.18 | 5.00 | 11.45 | 12.50 | 13.70 | 17.40 | 37.00 |
| Hematocrit | 36.83 | 6.49 | 14.20 | 34.00 | 37.60 | 40.55 | 50.80 | 37.00 |
| MCV | 84.26 | 7.70 | 59.10 | 80.05 | 85.00 | 89.50 | 99.70 | 37.00 |
| MCH | 27.89 | 2.81 | 17.50 | 26.55 | 28.10 | 30.00 | 33.80 | 37.00 |
| MCHC | 33.17 | 1.27 | 28.60 | 32.50 | 33.40 | 34.00 | 35.40 | 37.00 |
| RDW | 14.91 | 2.33 | 11.70 | 13.50 | 14.30 | 15.55 | 24.00 | 37.00 |
| MPV | 8.80 | 1.13 | 6.00 | 8.00 | 8.80 | 9.40 | 12.90 | 38.00 |
| Platelet | 238.65 | 82.46 | 5.00 | 186.25 | 243.00 | 289.75 | 517.00 | 38.00 |
| Pulse ox | 98.24 | 3.78 | 65.00 | 98.00 | 99.00 | 100.00 | 100.00 | 40.00 |
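For reference, the table above is the kind of summary pandas produces directly. A minimal sketch, assuming the KFSH data were loaded from a CSV file (the filename and column labels here are hypothetical; the dataset itself is not public):

```python
import pandas as pd

# Load the clinical dataset (hypothetical filename; the KFSH data is not public).
df = pd.read_csv("kfsh_alzheimers.csv")

numerical = ["Age", "Pulse", "BP-systolic", "Temperature", "Respiratory rate",
             "BP-diastolic", "WBC", "RBC", "Hemoglobin", "Hematocrit", "MCV",
             "MCH", "MCHC", "RDW", "MPV", "Platelet", "Pulse ox"]

# Mean, std, min, quartiles, and max per attribute, plus the per-column
# missing-value count reported in the last column of the table above.
stats = df[numerical].describe().T
stats["Missing values"] = df[numerical].isna().sum()
print(stats[["mean", "std", "min", "25%", "50%", "75%", "max", "Missing values"]].round(2))
```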
The optimal hyperparameters of each classifier with the original and oversampled data.
| Classifier | Hyperparameter | Without oversampling | With oversampling |
|---|---|---|---|
| SVM | Cost | 5 | 4 |
| | Gamma | 1 | 1 |
| | Kernel | Linear | RBF |
| k-NN | n_neighbors | 5 | 5 |
| | Metric | Minkowski | Minkowski |
| AdaBoost | n_estimators | 100 | 300 |
| | Learning rate | 0.1 | 0.1 |
| XGBoost | n_estimators | 100 | 500 |
| | Booster | gbtree | gbtree |
| | Learning rate | 0.1 | 0.1 |
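The record does not state how these optima were found; a standard approach is a cross-validated grid search. A minimal sketch for the SVM, assuming scikit-learn, a 10-fold protocol (an assumption), and a candidate grid chosen here to cover the reported optima ("Cost" corresponds to `C` in scikit-learn); the placeholder arrays stand in for the prepared features and labels:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data standing in for the prepared KFSH features and labels.
rng = np.random.default_rng(0)
X = rng.random((152, 17))
y = rng.integers(0, 2, 152)

# Candidate values are assumptions; they merely cover the optima in the table.
param_grid = {
    "C": [1, 2, 3, 4, 5],
    "gamma": [0.01, 0.1, 1],
    "kernel": ["linear", "rbf"],
}

# Exhaustive search over the grid, scored by cross-validated accuracy.
search = GridSearchCV(SVC(), param_grid, cv=10, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)  # e.g., {'C': 4, 'gamma': 1, 'kernel': 'rbf'}
```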
Classifier accuracy, precision, and recall using the optimal hyperparameters.
| Classifier | Dataset | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| SVM | Original | 92.21 | 92.44 | 92.14 |
| | Oversampled | 93.33 | 93.32 | 95.56 |
| k-NN | Original | 87.63 | 86.54 | 84.29 |
| | Oversampled | 88.24 | 85.43 | 94.44 |
| AdaBoost | Original | 90.92 | 90.95 | 89.05 |
| | Oversampled | 91.16 | 92.32 | 92.22 |
| XGBoost | Original | 91.63 | 92.07 | 90.95 |
| | Oversampled | 91.60 | 91.42 | 93.33 |
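The record does not name the oversampling technique; SMOTE is a common choice for small clinical datasets and is used below purely as an illustration. Wrapping it in an imblearn Pipeline keeps synthetic samples out of the validation folds, which avoids optimistically biased scores:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Placeholder data standing in for the prepared KFSH features and labels.
rng = np.random.default_rng(0)
X = rng.random((152, 17))
y = rng.integers(0, 2, 152)

# Oversample only within each training fold, then fit the tuned SVM
# (SMOTE itself is an assumption; the record just says "oversampled").
pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("svm", SVC(C=4, gamma=1, kernel="rbf")),
])
scores = cross_validate(pipe, X, y, cv=10,
                        scoring=("accuracy", "precision", "recall"))
for metric in ("test_accuracy", "test_precision", "test_recall"):
    print(metric, round(scores[metric].mean(), 4))
```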
The best feature subset obtained for each classifier.
| Classifier | Number of features | Features selected | Accuracy (%) |
|---|---|---|---|
| SVM | 13 | {Sex, age, pulse, respiratory rate, BP-diastolic, white blood cells, red blood cells, hemoglobin, hematocrit, MCV, MCH, RDW, MPV} | 95.56 |
| k-NN | 6 | {Sex, age, respiratory rate, hematocrit, MCH, RDW} | 95.52 |
| AdaBoost | 10 | {Sex, age, BP-systolic, temperature, BP-diastolic, hematocrit, MCH, RDW, platelet, pulse ox} | 95.00 |
| XGBoost | 6 | {Sex, age, respiratory rate, white blood cells, MCV, MCHC} | 94.38 |
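Sequential forward selection greedily adds the single feature that most improves cross-validated accuracy until the target subset size is reached. A minimal sketch with scikit-learn's SequentialFeatureSelector, reproducing the 13-feature SVM setting on placeholder data (the fold count and the tuned hyperparameters carried over from above are assumptions):

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

feature_names = np.array(["Sex", "Age", "Pulse", "BP-systolic", "Temperature",
                          "Respiratory rate", "BP-diastolic", "WBC", "RBC",
                          "Hemoglobin", "Hematocrit", "MCV", "MCH", "MCHC",
                          "RDW", "MPV", "Platelet", "Pulse ox"])

# Placeholder data standing in for the oversampled KFSH dataset.
rng = np.random.default_rng(0)
X = rng.random((152, len(feature_names)))
y = rng.integers(0, 2, 152)

# Add one feature at a time, keeping the addition that maximizes
# cross-validated accuracy, until 13 features are selected.
sfs = SequentialFeatureSelector(SVC(C=4, gamma=1, kernel="rbf"),
                                n_features_to_select=13,
                                direction="forward", cv=10, scoring="accuracy")
sfs.fit(X, y)
print(feature_names[sfs.get_support()])
```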
The performance of the final selected models.
| Classifier | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|
| SVM | 95.56 | 94.70 | 97.78 |
| k-NN | 95.53 | 95.81 | 96.67 |
| AdaBoost | 95.00 | 96.00 | 94.44 |
| XGBoost | 94.38 | 94.18 | 95.56 |
Figure 2: (a) SVM confusion matrix, (b) k-NN confusion matrix, (c) AdaBoost confusion matrix, (d) XGBoost confusion matrix.
Figure 3: (a) SVM ROC-AUC curve, (b) k-NN ROC-AUC curve, (c) AdaBoost ROC-AUC curve, (d) XGBoost ROC-AUC curve.
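Plots like Figures 2 and 3 can be produced with scikit-learn's display helpers. A minimal sketch for the SVM panels, on placeholder data with an assumed 70/30 stratified split (the record does not state the evaluation protocol):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder data standing in for the selected 13-feature KFSH subset.
rng = np.random.default_rng(0)
X = rng.random((152, 13))
y = rng.integers(0, 2, 152)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Tuned SVM from the hyperparameter table above (assumed settings).
model = SVC(C=4, gamma=1, kernel="rbf").fit(X_tr, y_tr)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ConfusionMatrixDisplay.from_estimator(model, X_te, y_te, ax=ax1)  # cf. Figure 2(a)
RocCurveDisplay.from_estimator(model, X_te, y_te, ax=ax2)         # cf. Figure 3(a)
plt.tight_layout()
plt.show()
```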