Huan Wang1, Li Sheng2, Shanhu Xu3, Yu Jin3, Xiaoqing Jin3, Song Qiao3, Qingqing Chen4, Wenmin Xing5, Zhenlei Zhao5, Jing Yan5, Genxiang Mao5, Xiaogang Xu5.
Abstract
Background: Early identification of Alzheimer's disease (AD) or mild cognitive impairment (MCI) can help direct preventive and supportive treatment, improve outcomes, and reduce medical costs. Existing advanced diagnostic tools are mostly based on neuroimaging and have limitations in cost, reliability, repeatability, accessibility, ease of use, and clinical integration. To address these problems, we developed, evaluated, and implemented an early diagnostic tool using machine learning and non-imaging factors. Methods and results: A total of 654 participants aged 65 or older were identified from a nursing home in Hangzhou, China. The information collected from these participants includes dementia status and 70 demographic, cognitive, socioeconomic, and clinical features. Logistic regression, support vector machine (SVM), neural network, random forest, extreme gradient boosting (XGBoost), least absolute shrinkage and selection operator (LASSO), and best subset models were trained, tuned, and internally validated using a novel double cross-validation algorithm and multiple evaluation metrics. The trained models were also compared and externally validated on a separate dataset of 1,100 participants from four communities in Zhejiang Province, China. The best-performing model was then identified and implemented online with a user-friendly interface. On the nursing dataset, the top three models are the neural network (AUROC = 0.9435), XGBoost (AUROC = 0.9398), and SVM with the polynomial kernel (AUROC = 0.9378). On the community dataset, the top three models are the SVM with the linear kernel (AUROC = 0.9282), random forest (AUROC = 0.9259), and SVM with the polynomial kernel (AUROC = 0.9213). The F1 scores and areas under the precision-recall curve showed that the SVMs, neural network, and random forest were robust on the imbalanced community dataset. Overall, the SVM with the polynomial kernel was found to be the best model.
The LASSO and best subset models identified 17 features most relevant to dementia prediction, drawn mostly from cognitive test results and socioeconomic characteristics.
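The LASSO result rests on the L1 penalty driving uninformative coefficients to exactly zero. A minimal sketch of that mechanism is cyclic coordinate descent with soft-thresholding. It assumes a squared-error loss, standardized feature columns, and a hypothetical penalty `lam`; the abstract does not state the loss, solver, or penalty the authors used, and a classification LASSO would normally use a logistic loss instead. The toy data are hypothetical.

```python
def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; anything inside [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes each column of X has mean 0 and mean square 1, and y is centered,
    so no intercept is fitted. Illustrative sketch only.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, starting from beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of feature j with the partial residual that excludes feature j
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam)  # no division needed: column is standardized
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


def selected_features(beta, names):
    """Features that survive the L1 penalty (nonzero coefficients)."""
    return [name for b, name in zip(beta, names) if b != 0.0]


# Hypothetical toy data: three standardized, orthogonal features; y depends mostly on x1
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.1, -2.1, 1.9, -1.9]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(selected_features(beta, ["x1", "x2", "x3"]))  # → ['x1']
```

Lowering `lam` lets weaker features back in (with `lam=0.05`, both `x1` and `x3` survive); in practice the penalty would be tuned by cross-validation.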
Keywords: Alzheimer’s disease; dementia; early diagnostic tool; machine learning; non-imaging factors
Year: 2022 PMID: 36092811 PMCID: PMC9461143 DOI: 10.3389/fnagi.2022.945274
Source DB: PubMed Journal: Front Aging Neurosci ISSN: 1663-4365 Impact factor: 5.702
FIGURE 1 Flow diagram of participant selection from the nursing home and the communities.
FIGURE 2 Visualization of the nursing dataset after dimension reduction by principal component analysis (PCA). Black dots represent patients diagnosed with Alzheimer’s disease (AD), red dots represent patients diagnosed with mild cognitive impairment (MCI), and green dots represent patients in neither class (non-AD/MCI).
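Figure 2 projects the nursing dataset onto its first two principal components. A minimal pure-Python sketch of that projection, assuming the features are already numeric: center the data, form the sample covariance matrix, and extract the top two eigenvectors by power iteration with deflation. The source does not say how its PCA was computed (for example, whether features were standardized first); power iteration is used here only to keep the sketch dependency-free, and the toy data are hypothetical.

```python
import math
import random


def center(X):
    """Subtract each column's mean."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]


def mat_vec(C, v):
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]


def top_eigenvector(C, n_iter=500, seed=0):
    """Dominant eigenpair of a symmetric positive semidefinite matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in C]
    for _ in range(n_iter):
        w = mat_vec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(C, v)))  # Rayleigh quotient (unit v)
    return eigenvalue, v


def pca_2d(X):
    """Project rows of X onto the first two principal components."""
    Xc = center(X)
    C = covariance(Xc)
    components = []
    for _ in range(2):
        eigenvalue, v = top_eigenvector(C)
        components.append(v)
        # Deflation: remove the found component so the next pass finds the next one
        C = [[C[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    scores = [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in Xc]
    return components, scores


# Hypothetical data: most variance along the first feature, some along the second
X = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
components, scores = pca_2d(X)
print([[round(s, 3) for s in row] for row in scores])
```

Eigenvector signs are arbitrary, so a plotted projection may be mirrored relative to another tool's output without any change in meaning.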
FIGURE 3 Flowchart of the double cross-validation.
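Figure 3 shows the double cross-validation used for internal validation. The article calls its version novel and its exact steps are not reproduced here, so the following is only a sketch of standard nested cross-validation under assumptions: an outer loop holds out each fold to estimate performance, and an inner loop, run only on the outer-training data, picks the hyperparameter. A k-nearest-neighbor classifier with `k` as the tuned hyperparameter stands in for the study's models, folds are assigned round-robin rather than randomly or stratified, and the toy data are hypothetical.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def k_folds(n, n_folds):
    """Deterministic round-robin folds: index i goes to fold i % n_folds."""
    return [[i for i in range(n) if i % n_folds == f] for f in range(n_folds)]


def split(X, y, test_idx):
    test = set(test_idx)
    train_idx = [i for i in range(len(X)) if i not in test]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


def accuracy(X_tr, y_tr, X_te, y_te, k):
    hits = sum(knn_predict(X_tr, y_tr, x, k) == t for x, t in zip(X_te, y_te))
    return hits / len(y_te)


def inner_cv_score(X, y, k, n_folds):
    """Mean validation accuracy of hyperparameter k using only the data passed in."""
    scores = [accuracy(*split(X, y, idx), k) for idx in k_folds(len(X), n_folds)]
    return sum(scores) / len(scores)


def double_cv(X, y, k_grid, n_outer=5, n_inner=3):
    """Outer loop estimates performance; inner loop tunes k on the outer-training part only."""
    results = []
    for test_idx in k_folds(len(X), n_outer):
        X_tr, y_tr, X_te, y_te = split(X, y, test_idx)
        best_k = max(k_grid, key=lambda k: inner_cv_score(X_tr, y_tr, k, n_inner))
        results.append((best_k, accuracy(X_tr, y_tr, X_te, y_te, best_k)))
    return results


# Hypothetical toy data: one feature, two classes split at 10
X = [[float(i)] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
results = double_cv(X, y, k_grid=[1, 3, 5])
print(results)
```

Keeping the tuning inside the inner loop means the held-out outer fold never influences the hyperparameter choice, so the averaged outer scores are not optimistically biased by tuning.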
Demographic characteristics (age, sex, and education) and summary cognitive test scores of the AD, MCI, and non-AD/MCI (normal) groups.
| | Nursing AD (n = 168) | Nursing MCI (n = 182) | Nursing Non-AD/MCI (n = 304) | Nursing Overall (n = 654) | Community AD (n = 59) | Community MCI (n = 118) | Community Non-AD/MCI (n = 923) | Community Overall (n = 1,100) |
| Sex | | | | | | | | |
| Male | 52 (31.0%) | 63 (34.6%) | 114 (37.5%) | 229 (35.0%) | 30 (50.8%) | 48 (40.7%) | 448 (48.5%) | 526 (47.8%) |
| Female | 116 (69.0%) | 119 (65.4%) | 190 (62.5%) | 425 (65.0%) | 29 (49.2%) | 70 (59.3%) | 475 (51.5%) | 574 (52.2%) |
| Age | 85 (± 6.4) | 85 (± 5.6) | 82 (± 6.4) | 84 (± 6.3) | 80 (± 5.4) | 79 (± 5.7) | 75 (± 6.2) | 76 (± 6.3) |
| Education | 2.3 (± 1.3) | 2.7 (± 1.3) | 3.4 (± 1.4) | 2.9 (± 1.4) | 2.4 (± 1.2) | 2.9 (± 1.1) | 3.4 (± 1.2) | 3.3 (± 1.2) |
| | 1.0 (± 1.1) | 2.1 (± 1.2) | 3.2 (± 0.88) | 2.3 (± 1.4) | 2.1 (± 1.2) | 2.6 (± 1.2) | 3.6 (± 0.69) | 3.4 (± 0.90) |
| | 0.98 (± 1.4) | 2.4 (± 1.8) | 4.2 (± 1.3) | 2.9 (± 2.0) | 3.1 (± 1.6) | 3.9 (± 1.3) | 4.7 (± 0.70) | 4.6 (± 0.95) |
| | 13 (± 6.6) | 22 (± 4.3) | 27 (± 2.6) | 22 (± 7.2) | 21 (± 4.7) | 24 (± 3.8) | 28 (± 2.2) | 28 (± 3.2) |
| | 4.3 (± 2.6) | 2.1 (± 1.8) | 0.80 (± 1.1) | 2.1 (± 2.3) | 4.3 (± 2.2) | 3.8 (± 2.0) | 1.7 (± 1.1) | 2.0 (± 1.5) |
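The group sizes can be recovered from the sex rows: each group's n is its male count plus its female count, and the reported percentages follow from those sizes. A short Python check of that arithmetic, using only counts from the table:

```python
# Male and female counts per group, copied from the demographics table
sex_counts = {
    ("Nursing", "AD"): (52, 116),
    ("Nursing", "MCI"): (63, 119),
    ("Nursing", "Non-AD/MCI"): (114, 190),
    ("Community", "AD"): (30, 29),
    ("Community", "MCI"): (48, 70),
    ("Community", "Non-AD/MCI"): (448, 475),
}


def group_summary(male, female):
    """Group size and male/female percentages rounded to one decimal place."""
    n = male + female
    return n, round(100 * male / n, 1), round(100 * female / n, 1)


totals = {}
for (dataset, group), (male, female) in sex_counts.items():
    n, pct_male, pct_female = group_summary(male, female)
    totals[dataset] = totals.get(dataset, 0) + n
    print(f"{dataset} {group}: n = {n}, male {pct_male}%, female {pct_female}%")
print(totals)  # → {'Nursing': 654, 'Community': 1100}
```

The totals match the 654 nursing-home and 1,100 community participants stated in the abstract.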
Classification performance of each model evaluated internally by the double CV on the nursing dataset.
| Method | Sensitivity | Specificity | Accuracy | AUROC |
| Logistic regression | 0.8229 | 0.8289 | 0.8256 | 0.9068 |
| SVM_l | 0.8143 | 0.8453 | 0.8287 | 0.9127 |
| SVM_r | 0.8600 | 0.8455 | 0.8532 | 0.9287 |
| SVM_s | 0.8200 | 0.8976 | 0.8562 | 0.9374 |
| SVM_p | 0.8343 | 0.8947 | 0.8624 | 0.9378 |
| Neural network | 0.8429 | 0.8751 | 0.8578 | 0.9435 |
| Random forest | 0.8314 | 0.8618 | 0.8455 | 0.9340 |
| XGBoost | 0.8457 | 0.8552 | 0.8501 | 0.9398 |
| LASSO | 0.8400 | 0.8882 | 0.8624 | 0.9341 |
| Best subset | 0.8371 | 0.8553 | 0.8456 | 0.9141 |
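Accuracy in this table is not independent of the other columns: it is the class-size-weighted average of sensitivity and specificity. The Python sketch below checks this, assuming the positive class is AD or MCI (168 + 182 = 350 participants, from the demographics table) and the negative class is non-AD/MCI (304). Under that assumption every row reproduces the reported accuracy to within about 0.0002; small last-digit differences are expected because double-CV metrics are averages over outer folds.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy as the class-size-weighted average of sensitivity and specificity."""
    n = n_pos + n_neg
    return (sensitivity * n_pos + specificity * n_neg) / n


# Assumption: positives are AD or MCI (168 + 182 = 350), negatives are non-AD/MCI (304)
N_POS, N_NEG = 350, 304
internal = {  # method: (sensitivity, specificity, reported accuracy)
    "Logistic regression": (0.8229, 0.8289, 0.8256),
    "SVM_l": (0.8143, 0.8453, 0.8287),
    "SVM_r": (0.8600, 0.8455, 0.8532),
    "SVM_s": (0.8200, 0.8976, 0.8562),
    "SVM_p": (0.8343, 0.8947, 0.8624),
    "Neural network": (0.8429, 0.8751, 0.8578),
    "Random forest": (0.8314, 0.8618, 0.8455),
    "XGBoost": (0.8457, 0.8552, 0.8501),
    "LASSO": (0.8400, 0.8882, 0.8624),
    "Best subset": (0.8371, 0.8553, 0.8456),
}
for method, (sens, spec, reported) in internal.items():
    implied = accuracy_from_rates(sens, spec, N_POS, N_NEG)
    print(f"{method}: reported {reported:.4f}, implied {implied:.4f}")
```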
FIGURE 4 The final neural network model. Weights are color-coded by sign (blue, positive; gray, negative), and line thickness is proportional to weight magnitude. Input features include sex, age, Mini-Cog test (c1–c2), Clock Drawing test (d1–d5), Mini-Mental State exam (e11–e15, e21–e25, e31–e33, e41–e45, e51–e53, e61–e62, e7, e8, e91–e93, e10, e16), AD8 screening (f1–f8), education (h1), occupation (h22–h29), marital status (h32–h35, h37), past medical history (h61–h66), number of medications (h8), smoking (h9), and drinking (h10).
Classification performance of each model evaluated externally on the community dataset.
| Method | Sensitivity | Specificity | Accuracy | AUROC | Precision | F1-score | AUPRC |
| Logistic regression | 0.6215 | 0.8895 | 0.8464 | 0.8435 | 0.5189 | 0.5656 | 0.5199 |
| SVM_l | 0.5763 | 0.9458 | 0.8864 | 0.9282 | 0.6711 | 0.6201 | 0.6652 |
| SVM_r | 0.6102 | 0.9404 | 0.8873 | 0.9137 | 0.6626 | 0.6353 | 0.6395 |
| SVM_s | 0.5650 | 0.9437 | 0.8827 | 0.9177 | 0.6579 | 0.6079 | 0.6560 |
| SVM_p | 0.6045 | 0.9415 | 0.8873 | 0.9213 | 0.6646 | 0.6331 | 0.6549 |
| Neural network | 0.5876 | 0.9426 | 0.8855 | 0.9139 | 0.6624 | 0.6228 | 0.6513 |
| Random forest | 0.5706 | 0.9437 | 0.8836 | 0.9259 | 0.6601 | 0.6121 | 0.6623 |
| XGBoost | 0.5424 | 0.9415 | 0.8773 | 0.9006 | 0.6400 | 0.5872 | 0.6323 |
| LASSO | 0.5932 | 0.9393 | 0.8836 | 0.9023 | 0.6522 | 0.6213 | 0.6284 |
| Best subset | 0.4859 | 0.9274 | 0.8564 | 0.8483 | 0.5621 | 0.5212 | 0.5432 |
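Precision, F1, and accuracy on the community data can likewise be derived from sensitivity, specificity, and the class sizes. The sketch assumes positives are AD or MCI (59 + 118 = 177 of 1,100 participants, about 16%) and negatives are non-AD/MCI (923). Because positives are rare, even a specificity near 0.94 leaves false positives numerous relative to true positives, which is why precision stays between about 0.52 and 0.68 while accuracy stays above 0.84, and why F1 and AUPRC are the more informative metrics on this imbalanced dataset.

```python
def metrics_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Precision, F1, and accuracy implied by sensitivity, specificity, and class sizes."""
    tp = sensitivity * n_pos          # expected true positives
    fp = (1 - specificity) * n_neg    # expected false positives
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + specificity * n_neg) / (n_pos + n_neg)
    return precision, f1, accuracy


# Assumption: positives are AD or MCI (59 + 118 = 177), negatives are non-AD/MCI (923)
N_POS, N_NEG = 177, 923
rows = {  # method: (sensitivity, specificity, reported precision, F1, accuracy)
    "Logistic regression": (0.6215, 0.8895, 0.5189, 0.5656, 0.8464),
    "SVM_p": (0.6045, 0.9415, 0.6646, 0.6331, 0.8873),
    "XGBoost": (0.5424, 0.9415, 0.6400, 0.5872, 0.8773),
}
for method, row in rows.items():
    implied = metrics_from_rates(row[0], row[1], N_POS, N_NEG)
    print(method, "implied", [round(v, 4) for v in implied], "reported", list(row[2:]))
```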
FIGURE 5 ROC and precision-recall (PR) curves of the machine learning models for detecting dementia. (A) Mean ROC curves for the models on the nursing dataset; each curve is the mean ROC curve over the outer loop of the double cross-validation. (B) ROC curves for the models on the community dataset. (C) PR curves for the models on the community dataset.
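The ROC and PR curves are summarized in the tables by their areas. A minimal sketch of two common estimators, assuming binary labels (1 = AD/MCI, 0 = otherwise) and real-valued model scores: AUROC computed as the probability that a randomly chosen positive scores above a randomly chosen negative (equivalent to the area under the empirical ROC curve, with ties counted as one half), and AUPRC approximated by average precision. The source does not say which AUPRC estimator was used; trapezoidal interpolation of the PR curve can give somewhat different values. The scores and labels below are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive (ties not specially handled)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
print(round(auroc(scores, labels), 4))              # → 0.8333
print(round(average_precision(scores, labels), 4))  # → 0.9167
```

The pairwise loop is quadratic in the number of participants, which is fine at this study's scale; a rank-based formula gives the same value in O(n log n).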
FIGURE 6 The role of education in predicting dementia. (A) Marginal distribution of education levels in the nursing dataset. (B) Correlation coefficients between education and the other features in the nursing dataset.
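Panel B of Figure 6 reports correlation coefficients between education and the other features. The caption does not say which coefficient was used; the sketch below assumes Pearson's r, and the feature names and values (`mmse_total`, `ad8_total`) are hypothetical stand-ins, not data from the study. For an ordinal variable such as education level, a rank-based coefficient (Spearman) is a common alternative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical records: education level alongside two other (made-up) feature columns
education = [1, 2, 3, 4, 5]
features = {
    "mmse_total": [14, 18, 22, 26, 29],
    "ad8_total": [6, 5, 3, 2, 0],
}
for name, values in features.items():
    print(f"education vs {name}: r = {pearson(education, values):.2f}")
```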