Mario Merone, Sebastian Luca D'Addario, Pierandrea Mirino, Francesca Bertino, Cecilia Guariglia, Rossella Ventura, Adriano Capirchio, Gianluca Baldassarre, Massimo Silvetti, Daniele Caligiore.
Abstract
Alzheimer's disease (AD) diagnosis often requires invasive examinations (e.g., cerebrospinal fluid analyses), expensive tools (e.g., brain imaging), and highly specialized personnel. The diagnosis is commonly established only once the disorder has already caused severe brain damage and the clinical signs have become apparent. Accessible, low-cost approaches that identify subjects at high risk of developing AD years before they show overt symptoms are therefore fundamental, as they provide a critical time window for more effective clinical management, treatment, and care planning. This article proposes an ensemble-based machine learning algorithm for predicting AD development within 9 years of first overt signs, using just five clinical features that are easily obtained with neuropsychological tests. The validation of the system involved both healthy individuals and mild cognitive impairment (MCI) patients drawn from the ADNI open dataset, unlike previous studies that considered only MCI patients. The system shows higher balanced accuracy, negative predictive value, and specificity than other similar solutions. These results represent a further important step toward a preventive, fast-screening, machine-learning-based tool to be used as part of routine healthcare screenings.
Keywords: ADAS score; Cerebellar impairment; Clinical Dementia Rating Scale; Early diagnosis; Machine learning; Renal and genitourinary dysfunctions
Year: 2022 PMID: 36056985 PMCID: PMC9440971 DOI: 10.1186/s40708-022-00168-2
Source DB: PubMed Journal: Brain Inform ISSN: 2198-4026
Subjects composition
| Variable | Value |
|---|---|
| Male | 292 |
| Female | 233 |
| Age (years) | [55 to 90]; mean 73 |
| Married | 397 |
| Divorced | 43 |
| MCI | 241 (EMCI = 55; LMCI = 186); conv. to AD 104 |
| NT | 284 (SMC = 59); conv. to AD 20 |
| Right hand | 474 |
| Left hand | 51 |
| Education (years) | [6 to 20]; mean 16 |
| Evaluation time lapse (years) | [5 to 13]; mean 9 |
EMCI: early MCI; LMCI: late MCI; SMC: subjective memory complaints; NT: normotypical
Descriptive statistics for the ordinal data of all subjects (525)
| Feature | Min | Max | Mean | Std. dev. |
|---|---|---|---|---|
| | 1 | 40 | 15.79 | 8.64 |
| AGE | 55 | 90 | 73.27 | 6.54 |
| ANARTERR | 0 | 47 | 10.78 | 8.39 |
| BNTTOTAL | 11 | 30 | 27.61 | 2.83 |
| CATANIMSC | 6 | 38 | 19.39 | 5.35 |
| CATANINTR | 0 | 6 | 0.06 | 0.46 |
| CATANPERS | 0 | 13 | 0.74 | 1.25 |
| CDCARE | 0 | 1 | 0.01 | 0.12 |
| CDCOMMUN | 0 | 1 | 0.07 | 0.18 |
| CDGLOBAL | 0 | 0.5 | 0.23 | 0.25 |
| CDHOME | 0 | 1 | 0.07 | 0.18 |
| CDJUDGE | 0 | 1 | 0.16 | 0.25 |
| | 0 | 2 | 0.25 | 0.29 |
| CDORIENT | 0 | 2 | 0.10 | 0.22 |
| CLOCKSCOR | 1 | 5 | 4.59 | 0.71 |
| COPYSCOR | 0 | 5 | 4.81 | 0.50 |
| GDTOTAL | 0 | 6 | 1.13 | 1.29 |
| MMSCORE | 24 | 30 | 28.44 | 1.58 |
| NPISCORE | 0 | 17 | 1.25 | 2.23 |
| PTEDUCAT | 6 | 20 | 16.28 | 2.74 |
| RAVLT_forgetting_bl | − 3 | 15 | 4.31 | 2.51 |
| RAVLT_immediate_bl | 13 | 70 | 40.72 | 11.24 |
| RAVLT_learning_bl | 0 | 11 | 5.15 | 2.58 |
| RAVLT_perc_forgetting_bl | − 37.5 | 100 | 47.18 | 31.46 |
| TRAASCOR | 13 | 150 | 36.01 | 15.35 |
| TRABSCOR | 32 | 300 | 90.92 | 47.92 |
In bold, the features selected by the optimized feature-selection procedure
Descriptive statistics for the nominal data of all subjects (525)
| Feature | Mode | Min | Max |
|---|---|---|---|
| DXPARK | 0 | 0 | 1 |
| FHQDAD | 0 | 0 | 2 |
| FHQMOM | 0 | 0 | 2 |
| FHQSIB | 1 | 0 | 1 |
| MHPSYCH | 0 | 0 | 1 |
| MH2NEURL | 0 | 0 | 1 |
| | 1 | 0 | 1 |
| MH4CARD | 1 | 0 | 1 |
| MH5RESP | 0 | 0 | 1 |
| MH6HEPAT | 0 | 0 | 1 |
| MH7DERM | 0 | 0 | 1 |
| MH8MUSCL | 1 | 0 | 1 |
| MH9ENDO | 0 | 0 | 1 |
| MH10GAST | 0 | 0 | 1 |
| MH11HEMA | 0 | 0 | 1 |
| | 0 | 0 | 1 |
| MH13ALLE | 0 | 0 | 1 |
| MH14ALCH | 0 | 0 | 1 |
| MH16SMOK | 0 | 0 | 1 |
| MH17MALI | 0 | 0 | 1 |
| MH19OTHR | 0 | 0 | 1 |
| NXVISUAL | 1 | 1 | 2 |
| NXAUDITO | 1 | 1 | 2 |
| NXTREMOR | 1 | 1 | 2 |
| NXNERVE | 1 | 1 | 2 |
| NXMOTOR | 1 | 1 | 2 |
| NXFINGER | 1 | 1 | 2 |
| | 1 | 1 | 2 |
| NXSENSOR | 1 | 1 | 2 |
| NXTENDON | 1 | 1 | 2 |
| NXPLANTA | 1 | 1 | 2 |
| NXGAIT | 1 | 1 | 2 |
| PTGENDER | 1 | 1 | 2 |
| PTHAND | 1 | 1 | 2 |
| PTMARRY | 1 | 1 | 5 |
| PXGENAPP | 1 | 1 | 2 |
| PXHEADEY | 1 | 1 | 2 |
| PXNECK | 1 | 1 | 2 |
| PXCHEST | 1 | 1 | 2 |
| PXHEART | 1 | 1 | 2 |
| PXABDOM | 1 | 1 | 2 |
| PXPERIPH | 1 | 1 | 2 |
| PXMUSCUL | 1 | 1 | 2 |
In bold, the features selected by the optimized feature-selection procedure
Fig. 1 Nested tenfold cross-validation (CV) procedure for model development and evaluation. In the outer CV loop (top left), the dataset was partitioned into the ‘Model Development Set’ and the ‘Test Set’. In the inner CV loop (top right), the ‘Model Development Set’ was further partitioned into the ‘Training Set’ and the ‘Validation Set’. The inner loop consisted of a tenfold cross-validated grid search aimed at obtaining the best parameters for each of the three assembled classifiers. At the bottom of the figure, a single iteration of the outer CV loop is shown in diagram form
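The nested cross-validation in the caption can be sketched in scikit-learn as follows. This is a minimal sketch on synthetic data with a single SVM expert and a reduced grid (assumptions; the paper tuned three classifiers over the grids reported below), but the structure is the same: the inner grid search sees only the Model Development Set, and each tuned model is scored once on its held-out Test Set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score

# Synthetic stand-in for the five clinical features (assumption)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # outer CV loop
inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)  # inner CV loop

scores = []
for dev_idx, test_idx in outer.split(X, y):
    # Inner loop: grid search restricted to the Model Development Set
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10, 100]},
                        cv=inner, scoring="balanced_accuracy")
    grid.fit(X[dev_idx], y[dev_idx])
    # Outer loop: evaluate the tuned model on the held-out Test Set
    pred = grid.predict(X[test_idx])
    scores.append(balanced_accuracy_score(y[test_idx], pred))

print(round(float(np.mean(scores)), 3))  # unbiased estimate of generalization
```

Because hyperparameters are re-selected inside every outer fold, the outer Test Sets never influence model selection, which is what makes the reported performance estimates unbiased.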
Hyperparameters of the three models forming the MEE, and the ranges explored by the grid search method
| Models | | Hyperparameters | Range |
|---|---|---|---|
| Ensemble proposed | Neural Net | Optimizer | {‘SGD’, ‘Adagrad’, ‘Adadelta’, ‘Adam’, ‘Adamax’, ‘Nadam’} |
| | | Batch size | {10, 20, 40, 60, 80, 100} |
| | | Epochs | {10, 50, 100} |
| | | Number of hidden units | {2:2:50} |
| | Random Forest | Max depth | {5, 20, 50, 80, 110} |
| | | Min samples for leaf | {3, 4, 5, 10} |
| | | Min samples for split | {8, 10, 12, 24, 32} |
| | | Number of estimators | {30, 200, 300, 1000} |
| | Support Vector Machine | C parameter | {0.1, 1, 10, 100} |
| | | Gamma | {1, 0.1, 0.01, 0.001} |
| | | Kernel | {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’} |
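The search spaces in the table can be encoded roughly as follows for the two scikit-learn-compatible experts (a sketch: parameter names follow scikit-learn conventions, which are an assumption, and the neural network's grid is omitted since it was presumably tuned in a deep-learning framework). The random-forest grid is shown but only the SVM search is run, to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Random Forest grid from the table (shown for reference, not run here)
rf_grid = {
    "max_depth": [5, 20, 50, 80, 110],
    "min_samples_leaf": [3, 4, 5, 10],
    "min_samples_split": [8, 10, 12, 24, 32],
    "n_estimators": [30, 200, 300, 1000],
}

# SVM grid from the table
svm_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001],
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
}

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
search = GridSearchCV(SVC(), svm_grid, cv=3, scoring="balanced_accuracy")
search.fit(X, y)
print(search.best_params_)  # best combination found on this toy data
```

In the paper's pipeline this search would run inside the inner CV loop of the nested procedure, once per outer fold.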
Performance of the ML algorithms
| Classifier models | Sensitivity (%) | Specificity (%) | Precision (%) | NPV (%) | BA (%) | F1-score (%) |
|---|---|---|---|---|---|---|
| MEE proposed | 73.5 | 88.3 | 68.3 | 91.5 | 80.9 | 70.8 |
| AdaBoost | 54.2 | 93.3 | 73.8 | 87.0 | 73.7 | 62.5 |
| MLP | 76.3 | 78.4 | 54.4 | 91.6 | 77.4 | 63.5 |
| NB | 48.5 | 90.0 | 61.1 | 85.4 | 69.4 | 54.1 |
| DT | 59.1 | 87.0 | 66.0 | 87.5 | 73.1 | 62.4 |
| KNN | 52.8 | 93.5 | 73.4 | 87.4 | 74.6 | 61.4 |
| LR | 75.0 | 79.5 | 55.2 | 91.3 | 77.3 | 63.6 |
| RF | 68.6 | 86.0 | 63.1 | 89.9 | 77.3 | 65.7 |
| SVM | 65.6 | 89.0 | 66.9 | 89.4 | 77.3 | 66.2 |
Sensitivity: proportion of AD converters correctly labeled by the algorithm among all subjects who actually converted; Specificity: proportion of non-converters correctly labeled by the algorithm among all subjects who did not convert; Precision: proportion of true AD converters among all subjects the algorithm labeled as converters; Negative predictive value (NPV): proportion of predicted negatives that are real negatives, i.e., the probability that a predicted negative is a true negative; Balanced accuracy (BA): the average of sensitivity and specificity; F1-score: the harmonic mean of sensitivity and precision
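All of these metrics follow directly from the confusion matrix. A small worked example with hypothetical labels (an assumption for illustration; 1 marks a subject who converts to AD within the prediction window):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # actual conversion outcomes (hypothetical)
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]  # classifier output (hypothetical)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # recall on converters
specificity = tn / (tn + fp)   # recall on non-converters
precision = tp / (tp + fp)     # positive predictive value
npv = tn / (tn + fn)           # negative predictive value
ba = (sensitivity + specificity) / 2
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(sensitivity, specificity, precision, npv, ba, f1)
```

For a screening tool, the authors' emphasis on NPV and specificity is natural: a high NPV means a subject labeled as a non-converter is very likely truly at low risk.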
For each method, the average score obtained during the grid search and the hyperparameter values most frequently selected during the k-fold nested cross-validation are reported
| Classifier models | | Average score | Best hyperparameters |
|---|---|---|---|
| Ensemble proposed | NN | 78.5 | Optimizer: Adam |
| | | | Batch size: 60 |
| | | | Epochs: 100 |
| | | | Number of hidden units: 32 |
| | RF | 81.7 | Max depth: 80 |
| | | | Min samples for leaf: 3 |
| | | | Min samples for split: 12 |
| | | | Number of estimators: 100 |
| | SVM | 83.5 | |
| | | | Kernel: radial basis function |
| AdaBoost | | 80.6 | Algorithm: SAMME |
| | | | Learning rate: 0.1 |
| | | | Number of estimators: 250 |
| MLP | | 72.3 | Activation: identity |
| | | | Batch size: 20 |
| | | | Epochs: 80 |
| | | | Optimizer: Adam |
| | | | Number of hidden units: 16 |
| NB | | – | – |
| DT | | 78.3 | Criterion of split: Gini |
| | | | Max depth: 2 |
| | | | Min samples for leaf: 5 |
| | | | Split method: best |
| KNN | | 54.2 | Distance metric: manhattan |
| | | | Number of neighbors: 19 |
| LR | | 78.6 | |
| | | | Penalty: L1 |
| | | | Solver: newton-cg |
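A hypothetical reconstruction of the proposed ensemble from the most frequently selected hyperparameters above. Both the vote-combination rule (soft voting here) and the use of scikit-learn's MLPClassifier as a stand-in for the paper's neural network are assumptions, not details stated in the tables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Three experts configured with the most frequently selected hyperparameters
nn = MLPClassifier(hidden_layer_sizes=(32,), solver="adam",
                   batch_size=60, max_iter=100)
rf = RandomForestClassifier(max_depth=80, min_samples_leaf=3,
                            min_samples_split=12, n_estimators=100)
svm = SVC(kernel="rbf", probability=True)  # C value not reported in the table

# Combine the experts; soft voting averages predicted probabilities
mee = VotingClassifier([("nn", nn), ("rf", rf), ("svm", svm)], voting="soft")

# Synthetic stand-in for the five selected clinical features (assumption)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
mee.fit(X, y)
print(mee.predict(X[:5]))
```

An ensemble of heterogeneous experts like this tends to outperform its individual members when their errors are only weakly correlated, which is consistent with the MEE's lead in balanced accuracy over the single classifiers in the performance table.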