| Literature DB >> 35888188 |
Ashir Javeed1,2, Ana Luiza Dallora2, Johan Sanmartin Berglund2, Peter Anderberg2,3.
Abstract
Dementia is a neurological condition that primarily affects older adults, and no cure or disease-modifying therapy is currently available. Symptoms of dementia can appear up to 10 years before an actual diagnosis. Hence, machine learning (ML) researchers have presented several methods for the early detection of dementia based on symptoms. However, these techniques suffer from two major flaws. The first is bias in ML models caused by imbalanced classes in the dataset; past research did not address this issue well or take preventative measures, and several ML models were trained here to illustrate this bias. To alleviate it, we deployed the synthetic minority oversampling technique (SMOTE) to balance the training of the proposed ML model. The second is the poor classification accuracy of ML models, which limits their clinical significance. To improve dementia prediction accuracy, we propose an intelligent learning system that is a hybrid of an autoencoder and an adaptive boosting (Adaboost) model. The autoencoder extracts relevant features from the feature space, and the Adaboost model classifies dementia using the extracted feature subset. The hyperparameters of the Adaboost model are fine-tuned using a grid search algorithm. Experimental findings reveal that the proposed learning system outperforms eleven similar systems proposed in the literature. It also improves the accuracy of the conventional Adaboost model by 9.8% and reduces its time complexity. Lastly, the proposed learning system achieved a classification accuracy of 90.23%, sensitivity of 98.00%, and specificity of 96.65%.
Keywords: machine learning; balanced accuracy; dementia prediction; oversampling
Year: 2022 PMID: 35888188 PMCID: PMC9318926 DOI: 10.3390/life12071097
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Demographic overview of the samples in the dataset.
| Age Group | Male | Female | Subj. | Diagnosed |
|---|---|---|---|---|
| 60 | 82 | 82 | 164 | 02 |
| 66 | 75 | 95 | 170 | 06 |
| 72 | 50 | 74 | 124 | 10 |
| 78 | 41 | 50 | 91 | 17 |
| 81 | 35 | 46 | 81 | 19 |
| 84 | 26 | 42 | 68 | 22 |
| 87 | 04 | 19 | 23 | 14 |
| 90+ | 00 | 05 | 05 | 01 |
| Total | 313 | 413 | 726 | 91 |
Overview of selected variables.
| Variable_Category | Variables_Names | Sum |
|---|---|---|
| Demographic | Age, Gender | 02 |
| Social | Education, Religious Belief, Religious Activities, Voluntary Association, Social Network, Support Network, Loneliness | 07 |
| Lifestyle | Light Exercise, Alcohol Consumption, Alcohol Quantity, Work Status, Physical-Workload, Present Smoker, Past Smoker, Number of Cigarettes a Day, Social Activities, Physically Demanding Activities, Leisure Activities | 11 |
| Medical History | Number of Medications, Family History of Importance, Myocardial Infarction, Arrhythmia, Heart Failure, Stroke, TIA/RIND, Diabetes Type 1, Diabetes Type 2, Thyroid Disease, Cancer, Epilepsy, Atrial Fibrillation, Cardiovascular Ischemia, Parkinson’s Disease, Depression, Other Psychiatric Diseases, Snoring, Sleep Apnea, Hip Fracture, Head Trauma, Developmental Disabilities, High Blood Pressure | 22 |
| Biochemical Test | Hemoglobin Analysis, C-Reactive Protein Analysis | 02 |
| Physical Examination | Body Mass Index (BMI), Pain in the last 4 weeks, Heart Rate Sitting, Heart Rate Lying, Blood Pressure on the Right Arm, Hand Strength in Right Arm in a 10s Interval, Hand Strength in Left Arm in a 10s Interval, Feeling of Safety from Rising from a Chair, Assessment of Rising from a Chair, Single-Leg Standing with Right Leg, Single Leg Standing with Left Leg, Dental Prosthesis, Number of Teeth | 13 |
| Psychological | Memory Loss, Memory Decline, Memory Decline 2, Abstract Thinking, Personality Change, Sense of Identity | 06 |
| Health Instruments | Sense of Coherence [ | 12 |
Figure 1. Schematic overview of the proposed intelligent learning system.
Figure 2. Class distribution before and after applying SMOTE.
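The balancing step behind Figure 2 can be illustrated with a minimal SMOTE sketch. The paper uses the standard SMOTE algorithm; the function below is a simplified, hypothetical reconstruction in NumPy (interpolating between a minority sample and one of its k nearest minority neighbors), not the authors' implementation.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    each chosen sample toward one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        # distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbors)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Toy imbalanced data: 20 majority vs 5 minority samples (illustrative only)
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(20, 3))
X_min = rng.normal(3.0, 1.0, size=(5, 3))
X_new = smote(X_min, n_new=15, k=3, rng=1)   # oversample to balance the classes
print(X_new.shape)  # (15, 3)
```

In practice a library implementation such as `imblearn.over_sampling.SMOTE` would be used; the key point is that oversampling is applied to the training split only, so the test set keeps its original distribution.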
Performance of conventional ML predictive models on the imbalanced dataset, where Acc. Train: accuracy on training data; Acc. Test: accuracy on test data; Sens.: sensitivity; Spec.: specificity; MCC: Matthews correlation coefficient.
| Model | Acc. Train (%) | Acc. Test (%) | Sens. (%) | Spec. (%) | F1 Score (%) | MCC |
|---|---|---|---|---|---|---|
| NB | 82.57 | 74.10 | 22.22 | 91.10 | 74.00 | 0.1428 |
| LR | 85.32 | 71.15 | 23.53 | 90.55 | 71.00 | 0.1228 |
| RF | 89.55 | 76.50 | 15.36 | 89.40 | 77.00 | 0.2278 |
| DT | 71.45 | 66.50 | 25.93 | 91.62 | 67.00 | 0.1882 |
| kNN | 78.56 | 48.40 | 16.67 | 89.62 | 49.00 | 0.0335 |
| SVM | 86.69 | 65.60 | 31.25 | 91.09 | 66.00 | 0.1896 |
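The sensitivity, specificity, and MCC columns above follow the usual confusion-matrix definitions. As a quick reference, a small sketch with illustrative counts (not taken from the study):

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and Matthews correlation coefficient
    from binary confusion-matrix counts."""
    sens = tp / (tp + fn)    # true-positive rate
    spec = tn / (tn + fp)    # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc

# Hypothetical counts for a dementia classifier on a small test set
sens, spec, mcc = binary_metrics(tp=9, fn=1, tn=8, fp=2)
print(f"Sens={sens:.2%}  Spec={spec:.2%}  MCC={mcc:.3f}")
# Sens=90.00%  Spec=80.00%  MCC=0.704
```

Unlike raw accuracy, MCC stays near zero for a model that simply predicts the majority class, which is why it exposes the bias on the imbalanced dataset above.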
Performance of conventional ML predictive models on the balanced dataset, where Acc. Train: accuracy on training data; Acc. Test: accuracy on test data; Sens.: sensitivity; Spec.: specificity; MCC: Matthews correlation coefficient.
| Model | Acc. Train (%) | Acc. Test (%) | Sens. (%) | Spec. (%) | F1 Score (%) | MCC |
|---|---|---|---|---|---|---|
| NB | 75.37 | 70.70 | 98.57 | 78.89 | 70.00 | 0.2287 |
| LR | 82.74 | 76.85 | 85.35 | 80.55 | 77.00 | 0.4038 |
| RF | 98.96 | 85.95 | 52.73 | 87.68 | 86.00 | 0.4264 |
| DT | 80.44 | 73.51 | 80.58 | 91.62 | 74.00 | 0.3526 |
| kNN | 78.56 | 67.49 | 75.16 | 55.62 | 67.00 | 0.2534 |
| SVM | 96.26 | 75.82 | 92.52 | 84.20 | 76.00 | 0.3596 |
Figure 3. ROC curves of ML models for dementia prediction.
Figure 4. Performance comparison of the proposed model with the conventional Adaboost model in terms of area under the curve (AUC).
Classification accuracy of the proposed autoencoder-SMOTE-Adaboost model with optimal Adaboost hyperparameters on the balanced dataset, where N: number of estimators; l: learning rate of Adaboost; F: number of features extracted; Acc. Train: accuracy on training data; Acc. Test: accuracy on test data; Sens.: sensitivity; Spec.: specificity.
| N | l | F | Acc. Train (%) | Acc. Test (%) | Sens. (%) | Spec. (%) |
|---|---|---|---|---|---|---|
| 400 | 0.05 | 06 | 90.44 | 88.29 | 89.85 | 82.66 |
| 100 | 0.01 | 02 | 88.48 | 89.54 | 82.63 | 91.58 |
| 100 | 0.05 | 02 | 88.48 | 89.54 | 85.63 | 78.98 |
| 100 | 0.01 | 10 | 87.12 | 90.00 | 92.14 | 83.56 |
| 300 | 0.1 | 12 | 89.54 | 90.16 | 86.32 | 91.74 |
| 400 | 0.1 | 15 | 92.41 | 87.58 | 91.05 | 86.48 |
| 300 | 0.1 | 03 | 89.32 | 89.54 | 86.00 | 90.55 |
| 100 | 0.05 | 05 | 88.48 | 90.00 | 87.82 | 95.74 |
| 400 | 0.05 | 06 | 92.10 | 90.23 | 97.86 | 98.12 |
| 200 | 0.05 | 01 | 88.76 | 89.54 | 85.00 | 81.41 |
| 50 | 0.1 | 04 | 89.48 | 90.00 | 78.36 | 88.00 |
| 50 | 0.05 | 07 | 90.13 | 86.36 | 89.05 | 95.48 |
| 200 | 0.1 | 06 | 94.08 | 86.36 | 98.00 | 90.00 |
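The rows above correspond to runs over a grid of estimator counts (N) and learning rates (l). A hedged sketch of such a search using scikit-learn's `GridSearchCV` with `AdaBoostClassifier`; the synthetic data stands in for the extracted feature subset, and the grid values mirror the table but are otherwise illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the autoencoder's extracted feature subset
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

grid = {
    "n_estimators": [50, 100, 200, 300, 400],   # N in the table
    "learning_rate": [0.01, 0.05, 0.1],         # l in the table
}
search = GridSearchCV(AdaBoostClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

Cross-validated grid search evaluates every (N, l) pair and keeps the combination with the best mean validation score, matching how the optimal row in the table would be selected.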
Performance of autoencoder-based predictive models on the balanced dataset, where Hyp.: hyperparameter value; F: number of features extracted; Acc. Train: accuracy on training data; Acc. Test: accuracy on test data; Sens.: sensitivity; Spec.: specificity.
| Model | Hyp. | F | Acc. Train (%) | Acc. Test (%) | Sens. (%) | Spec. (%) |
|---|---|---|---|---|---|---|
| AEC * + NB | V = 0.82 | 14 | 87.25 | 87.22 | 95.56 | 82.37 |
| AEC + LR | C = 10 | 15 | 84.40 | 87.15 | 85.35 | 90.87 |
| AEC + RF | | 10 | 100 | 86.00 | 52.73 | 83.45 |
| AEC + DT | | 18 | 86.23 | 88.18 | 80.58 | 89.68 |
| AEC + kNN | k = 14 | 20 | 100 | 83.48 | 79.16 | 95.32 |
| AEC + SVM | C = 0.5 | 12 | 87.15 | 86.22 | 92.52 | 80.28 |
AEC *: Autoencoder.
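For reference, the autoencoder feature-extraction idea can be sketched as a minimal single-hidden-layer linear autoencoder trained by gradient descent in NumPy. This is an illustrative reconstruction under simplifying assumptions (linear encoder/decoder, squared-error loss, random toy data), not the architecture used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))        # 200 samples, 20 raw features
X = (X - X.mean(0)) / X.std(0)        # standardize

n_hidden = 6                          # size of the extracted feature subset
W_enc = rng.normal(0, 0.1, size=(20, n_hidden))
W_dec = rng.normal(0, 0.1, size=(n_hidden, 20))
lr = 0.01

for epoch in range(200):
    H = X @ W_enc                     # encode: compressed features
    X_hat = H @ W_dec                 # decode: reconstruction
    err = X_hat - X
    # gradients of the mean squared reconstruction error
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

features = X @ W_enc                  # low-dimensional representation for the classifier
print(features.shape)  # (200, 6)
```

The encoder output (here 6 features) is what would be passed on to a downstream classifier such as Adaboost, replacing the full 20-dimensional input.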
Classification accuracies comparison with previously proposed methods for dementia prediction.
| Study (Year) | Method | Accuracy (%) | Balancing |
|---|---|---|---|
| P. C. Cho & W. H. Chen (2012) [ | PNNs | 83.00 | No |
| P. Gurevich et al. (2017) [ | SVM | 89.00 | Yes |
| D. Stamate et al. (2018) [ | Gradient Boosting | 88.00 | Yes |
| Visser et al. (2019) [ | XGBoost + RF | 88.00 | No |
| Dallora et al. (2020) [ | DT | 74.50 | Yes |
| M. Karaglani et al. (2020) [ | RF | 84.60 | No |
| E. Ryzhikova et al. (2021) [ | ANN + SVM | 84.00 | No |
| F. A. Salem et al. (2021) [ | RF | 88.00 | Yes |
| F. G. Gutierrez et al. (2022) [ | GA | 84.00 | No |
| G. Mirzaei & H. Adeli (2022) [ | MLP | 70.32 | No |
| A. Shahzad et al. (2022) [ | SVM | 71.67 | No |
| Proposed method (2022) | Autoencoder + SMOTE + Adaboost | 90.23 | Yes |