Bojan Bogdanovic1, Tome Eftimov2, Monika Simjanoska3,4.
Abstract
Alzheimer's disease remains a field of research with many open questions. The complexity of the disease prevents early diagnosis before visible symptoms affecting the individual's cognitive capabilities occur. This research presents an in-depth analysis of a large data set encompassing medical, cognitive and lifestyle measurements from more than 12,000 individuals. Several hypotheses were established, and their validity was questioned in light of the obtained results. The importance of appropriate experimental design is strongly stressed in the research. Thus, a sequence of methods for handling missing data, redundancy, data imbalance, and correlation analysis was applied to preprocess the data set, and an XGBoost model was then trained and evaluated with special attention to hyperparameter tuning. The model was explained using the Shapley values produced by the SHAP method. XGBoost produced an F1-score of 0.84 and as such is considered highly competitive among those published in the literature. This achievement, however, was not the main contribution of this paper. The goal of this research was to analyze the global and local interpretability of the intelligent model and to derive valuable conclusions about the established hypotheses. Those methods led to a single scheme which presents either the positive or the negative influence of the values of each of the features whose importance has been confirmed by means of Shapley values. This scheme might be considered an additional source of knowledge for physicians and other experts concerned with the exact diagnosis of early-stage Alzheimer's disease. The conclusions derived from the intelligent model's data-driven interpretability confronted all the established hypotheses. This research clearly showed the importance of an explainable machine learning approach that opens the black box and unveils the relationships among the features and the diagnoses.
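To make the notion of a Shapley value concrete, the following is a minimal sketch that computes exact Shapley values for one subject by enumerating all feature orderings. It is not the SHAP library's tree algorithm and not the paper's model: the function `toy_score`, the baseline, and the subject values are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by enumerating feature orderings.

    Features that have not yet been revealed in an ordering keep their
    baseline value. Cost grows factorially, so this only suits toy examples.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)              # start from the baseline input
        previous = predict(current)
        for name in order:
            current[name] = instance[name]    # reveal one feature
            value = predict(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

# Hypothetical toy score (assumption, not the paper's XGBoost model):
# higher CDRSB and lower MMSE raise the score, plus one interaction term.
def toy_score(x):
    return 0.5 * x["CDRSB"] - 0.1 * x["MMSE"] + 0.05 * x["CDRSB"] * x["APOE4"]

baseline = {"CDRSB": 0.0, "MMSE": 29.0, "APOE4": 0.0}  # hypothetical reference subject
subject = {"CDRSB": 4.0, "MMSE": 22.0, "APOE4": 2.0}   # hypothetical subject to explain

phi = shapley_values(toy_score, subject, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

The per-feature values sum to the difference between the subject's score and the baseline score, which is the additivity property that lets each value be read as a positive or negative influence on the prediction.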
Year: 2022 PMID: 35444165 PMCID: PMC9021280 DOI: 10.1038/s41598-022-10202-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Methodology workflow.
Figure 2. Data set summary.
Unique PTRACCAT values (only subjects with non-missing data are considered).
| Race | Number of subjects |
|---|---|
| White | 1046 |
| Black | 36 |
| More than one | 16 |
| Asian | 15 |
| Unknown | 3 |
| Hawaiian/Other PI | 3 |
| Am Indian/Alaskan | 2 |
Figure 3. Linear correlation heat map for the data set.
Figure 4. Scatter plot between ADAS11 & MMSE.
Original versus reduced targets’ distribution.
| Diagnosis | DX_bl | Values (original) | Values (reduced) |
|---|---|---|---|
| LMCI | 3.0 | 4644 | 3526 |
| CN | 1.0 | 3821 | 2652 |
| EMCI | 2.0 | 2319 | 1854 |
| AD | 5.0 | 1568 | 1196 |
| SMC | 4.0 | 389 | 364 |
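The reduced counts above show a strong class imbalance. As a rough illustration, the sketch below computes the imbalance ratio and how many samples each class would need if every class were oversampled up to the size of the largest one. That balancing target is an assumption made here for illustration; this record does not state the class sizes the authors used after balancing.

```python
# Reduced class counts taken from the table above.
reduced_counts = {"LMCI": 3526, "CN": 2652, "EMCI": 1854, "AD": 1196, "SMC": 364}

# Assumption: balance every class up to the largest class size.
target = max(reduced_counts.values())

# Number of additional (oversampled) records each class would need.
to_add = {label: target - count for label, count in reduced_counts.items()}

# Ratio between the largest and the smallest class.
imbalance_ratio = target / min(reduced_counts.values())

for label, extra in to_add.items():
    print(f"{label}: {reduced_counts[label]} records, {extra} to add")
print(f"imbalance ratio: {imbalance_ratio:.2f}")
```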
Figure 5. Targets’ distributions after data balancing.
Figure 6. Comparison of multiple imputation algorithms performance.
Figure 7. Oversampling algorithms comparison.
Figure 8. K-neighbours validation.
Figure 9. Confusion matrix for the XGBoost model.
XGBoost classification performance for each class.
| Diagnosis | Precision | Recall | Specificity | F1-score |
|---|---|---|---|---|
| CN | 0.86 | 0.90 | 0.96 | 0.88 |
| EMCI | 0.81 | 0.85 | 0.95 | 0.83 |
| LMCI | 0.78 | 0.81 | 0.94 | 0.80 |
| SMC | 0.94 | 0.78 | 0.99 | 0.85 |
| AD | 0.85 | 0.87 | 0.96 | 0.86 |
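For readers unfamiliar with how such per-class figures are obtained, the sketch below derives precision, recall, specificity and F1-score from a multi-class confusion matrix using a one-vs-rest view. The three-class matrix and its counts are hypothetical and are not the paper's confusion matrix (Figure 9).

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest metrics; confusion[i][j] counts true class i predicted as j.

    Assumes every class occurs at least once both as a true label and as a
    prediction, so no denominator is zero.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                         # correctly predicted as k
        fn = sum(confusion[k]) - tp                  # true k, predicted as other
        fp = sum(row[k] for row in confusion) - tp   # other, predicted as k
        tn = total - tp - fn - fp                    # everything else
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": tn / (tn + fp),
            "f1": 2 * precision * recall / (precision + recall),
        }
    return metrics

# Hypothetical counts for illustration only.
labels = ["CN", "MCI", "AD"]
confusion = [
    [8, 2, 0],  # true CN
    [1, 7, 2],  # true MCI
    [0, 1, 9],  # true AD
]

results = per_class_metrics(confusion, labels)
for label, values in results.items():
    print(label, {name: round(v, 2) for name, v in values.items()})
```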
Comparison of models’ performance.
| Model | Precision | Recall | Specificity | F1-score | Accuracy | Training time | Prediction time |
|---|---|---|---|---|---|---|---|
| XGBoost Classifier | 0.85 | 0.79 | 0.96 | 0.84 | 0.842 | 20.8 s | 103 ms |
| Random Forest Classifier | 0.78 | 0.79 | 0.95 | 0.79 | 0.787 | 5.16 s | 157 ms |
Figure 10. Summary variables importance plot for XGBoost model.
Figure 11. Venn diagram presenting features impact ranking for various XGBoost models based on 5-fold cross-validation.
Figure 12. Variables importance plot for CN diagnosis.
Figure 13. Variables importance plot for LMCI diagnosis.
Figure 14. Variables importance plot for AD diagnosis.
Figure 15. Dependence plots between FDG and CDRSB for different classes.
Figure 16. Dependence plot between APOE4 and AGE for AD class.
Figure 17. Dependence plot between RAVLT_immediate and MidTemp for LMCI class.
Figure 18. Features influence on a subject with AD diagnosis to be predicted as AD.
Figure 19. Features influence on a subject with AD diagnosis to be predicted as LMCI.
Figure 20. Comparison between features impact on the predicted (AD) and true class (LMCI) of a subject.
Figure 21. Comparison between features impact on the predicted (EMCI) and true class (SMC) of a subject.
Figure 22. Comparison between features impact on the predicted (CN) and true class (SMC) of a subject.
Figure 23. Comparison between features impact on the predicted (LMCI) and true class (AD) of a subject.
XGBoost global interpretability.
Feature groups: Demographics (Gender, Age, Education); Cognitive scores (CDRSB, MMSE, RAVLT_immediate); MRI (WholeBrain, Hippocampus, Entorhinal, MidTemp); PET (FDG); Genotype (APOE4).
| Diagnosis | Gender: M | Gender: F | Age: Young | Age: Old | Education: Low | Education: High | CDRSB: Low | CDRSB: High | MMSE: Low | MMSE: High | RAVLT_immediate: Low | RAVLT_immediate: High | WholeBrain: Low | WholeBrain: High | Hippocampus: Low | Hippocampus: High | Entorhinal: Low | Entorhinal: High | MidTemp: Low | MidTemp: High | FDG: Low | FDG: High | APOE4: 0 | APOE4: 1 | APOE4: 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CN | + | − | − | X | X | + | + | − | − | + | − | + | + | − | − | + | + | − | − | X | + | X | + | − | − |
| EMCI | − | + | + | − | − | X | − | + | − | X | − | − | − | + | − | + | − | + | − | + | − | + | − | X | + |
| LMCI | + | − | + | X | X | X | − | − | − | X | + | − | X | X | + | − | + | X | + | − | + | − | − | X | X |
| SMC | − | + | − | − | − | + | + | − | − | X | − | + | − | + | − | + | − | − | − | − | − | + | X | X | − |
| AD | − | + | + | + | + | − | − | + | + | − | + | − | − | + | + | − | − | − | X | + | + | − | − | + | + |
Merged scheme showing the influence of each feature on each of the diagnosis.
Feature groups: Demographics (Gender, Age, Education); Cognitive scores (CDRSB, MMSE, RAVLT_immediate); MRI (WholeBrain, Hippocampus, Entorhinal, MidTemp); PET (FDG); Genotype (APOE4).
| Diagnosis | Gender: M | Gender: F | Age: Young | Age: Old | Education: Low | Education: High | CDRSB: Low | CDRSB: High | MMSE: Low | MMSE: High | RAVLT_immediate: Low | RAVLT_immediate: High | WholeBrain: Low | WholeBrain: High | Hippocampus: Low | Hippocampus: High | Entorhinal: Low | Entorhinal: High | MidTemp: Low | MidTemp: High | FDG: Low | FDG: High | APOE4: 0 | APOE4: 1 | APOE4: 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CN | + | − | − | X | X | + | + | − | − | + | − | + | + | − | − | + | + | − | − | − | X | X | + | − | − |
| EMCI | − | + | + | − | − | X | − | + | − | + | − | − | − | + | − | + | − | + | − | + | − | + | − | + | X |
| LMCI | + | − | + | − | − | + | − | − | − | X | + | − | X | − | + | − | + | − | + | − | + | − | − | X | + |
| SMC | − | + | − | − | − | X | + | − | − | + | − | + | − | + | − | + | − | − | − | − | − | + | X | X | − |
| AD | − | + | + | + | + | − | − | + | + | − | + | − | − | + | + | − | − | − | X | + | + | − | − | + | + |
Comparison of our model with some of the contestants of the TADPOLE Challenge [52].
| Entry | Feature selection | Features | Missing data imputation | Prediction model | BCA | Training time | Prediction time (per subject) |
|---|---|---|---|---|---|---|---|
| Frog | Automatic | 490 | None | Gradient boosting | 0.849 | 1 h | – |
| Our XGB Model | Manual | 13 | Extra Trees Regressor | XGBoost | 0.842 | 20.8 sec | 0.03 ms |
| BenchmarkSVM | Manual | 6 | Mean of previous values | SVM | 0.764 | 20 sec | 0.001 sec |
| SMALLHEADS - NeuralNet | Automatic | 376 | Nearest neighbour | Deep NN | 0.605 | 40 min | 0.06 sec |
| Rocket | Manual | 6 | Median of diagnostic group | Linear mixed effects model | 0.519 | 5 min | 0.3 sec |
Analysis of the latest eminent literature.
| Reference | Dataset size | Features | Dataset origin | Description | Split | Methodology | Results | Comment |
|---|---|---|---|---|---|---|---|---|
| [ | 1909 subjects (MCI or AD) | 44 | Coalition Against Major Diseases (CAMD) | ADAS-Cog and MMSE scores, laboratory and clinical tests, background information | Train: 75%, Validate: 5%, Test: 20% | Conditional Restricted Boltzmann Machine (CRBM) | Accuracy: 0.5 (differentiation between actual and synthetic patient data); R2: 0.82 ± 0.01 (observed vs. predicted correlation) | Synthetic trajectories starting from real patients and entirely synthetic patients are generated. Missing data imputation is performed. CRBM does not model the correlation between cognitive scores and other variables very well. Some crucial parameters, such as levels of amyloid, are omitted from the dataset. The overall performance of the model is significant |
| [ | 36 subjects (HC: 13, AD: 23) / 32 (HC: 8, AD: 24) | 504 / 488 | VBSD / Dem@Care | Spectrogram features extracted from subjects’ voices. Each recording is segmented beforehand. | Train: 35 / 31, Test: 1 (subjects) | Logistic Regression CV (best among others) | Accuracy: 0.833 / 0.844; Precision: 0.869 / 0.913; Recall: 0.869 / 0.875; F1-Score: 0.869 / 0.894 | Provides a new and inventive approach to analyzing and predicting the disease. No data preprocessing is performed. Even after the segmentation, the datasets are still small. Hyperparameter tuning is not applied |
| [ | 343 sessions, 150 subjects (ND: 72, D: 78) | 15 | Open Access Series of Imaging Studies (OASIS) | MRI scans and other brain measurements, MMSE and CDR scores, demographic data | Random allocation to train, validate and test | Random Forest (best among others) | Accuracy: 0.868; Precision: 0.941; Recall: 0.8; AUC: 0.872 | Detailed data processing and examination. Complete workflow following consecutive stages from data preprocessing to model evaluation. Only the first visit for each patient is taken into account (e.g. cases where a patient converts from non-demented to demented are omitted). Only simple imputation techniques are considered |
| [ | 373 sessions, 150 subjects (ND: 72, D: 64, C: 14) | 15 | Open Access Series of Imaging Studies (OASIS) | MRI scans and other brain measurements, MMSE and CDR scores, demographic data | 10-fold cross-validation | Hybrid modeling (combination of four models) | Accuracy: 0.980; Precision: 0.981; Recall: 0.980; ROC: 0.991 | Three different approaches are analyzed: manual feature selection, automatic feature selection and hybrid modeling. The results obtained by hybrid modeling are impressive, with high and stable values. Not a single stage of data preprocessing and engineering is performed |
| [ | 5000 images (Mild, Very Mild, Non, Moderate Demented) | 1700 region proposals per image | Alzheimer’s Disease Neuroimaging Initiative (ADNI) | MRI scan images | Separate datasets for train and test | SVM, R-CNN and Fast R-CNN | Training time (h): R-CNN: 84 Fast R-CNN: 8.75 | The main goal is to provide comparison between different object detection algorithms in terms of their training and predicting times. No prediction results and accuracy metrics are shown. No data preprocessing is shown |
| [ | 1721 subjects (521 NC, 864 MCI, 336 AD) | 47 | AD Neuroimaging Initiative (ADNI) | MRI and PET scans, CSF, gene expression and cognitive scores | Train: 70%, Validate: 15%, Test: 15% | Recurrent Neural Network | Accuracy: AD - NC: 0.959; AD - MCI: 0.859; NC - MCI: 0.773 | The whole focus is on the RNN algorithm, its possibilities and its evaluation. Filling data between different timestamps is performed using three different approaches. No information about data preprocessing is given. No missing data imputation is performed (missing values are replaced with 0) |
| [ | 202 subjects (52 HC, 99 MCI, 51 AD) | 189 (MRI ROI: 93, PET ROI: 93, CSF: 3) | Alzheimer’s Disease Neuroimaging Initiative (ADNI) | MRI, FDG-PET and CSF biomarkers | 10-fold cross-validation | SVM (multiple kernel combination) | Accuracy: 0.932; Specificity: 0.933; Recall: 0.930 | This study presents a unified way of combining data from different sources into one kernel. Only three types of data are used. An improvement in one model’s effectiveness through precise feature selection is shown. Images are preprocessed before use |
| [ | Group I: CN: 20, AD: 20; Group II: CN: 14, AD: 14; Group III: CN: 57, AD: 33; Group IV: FTLD: 19 | – | Each group of subjects comes from a different community or research center | MR scans | Leave-one-out technique | SVM | Group I / Group II / Group III / Group IV: Accuracy: 0.950 / 0.929 / 0.811 / 0.892; Specificity: 0.950 / 0.857 / 0.930 / 0.947; Recall: 0.950 / 1.00 / 0.606 / 0.833 | Differentiation between AD and FTLD subjects is addressed, as they are often misidentified. Detailed image preprocessing is performed. Results are better than those of most earlier works that used MRI. Only two diagnoses are classified at a time |
| [ | 785 subjects (184 HC, 228 sMCI, 181 pMCI, 192 AD) | – | AD Neuroimaging Initiative (ADNI) | ROI, APOE4, cognitive scores and demographic data | 10-fold cross-validation | CNN | Accuracy: 0.925; Specificity: 0.850; Recall: 0.875 | A very detailed and mathematically supported approach to using neural networks for classification is presented. Data preprocessing and feature selection are performed. Special attention is paid to avoiding over/underfitting problems. All data is baseline |