Sayantan Kumar, Inez Oh, Suzanne Schindler, Albert M Lai, Philip R O Payne, Aditi Gupta.
Abstract
OBJECTIVE: Alzheimer disease (AD) is the most common cause of dementia, a syndrome characterized by cognitive impairment severe enough to interfere with activities of daily life. We aimed to conduct a systematic literature review (SLR) of studies that applied machine learning (ML) methods to clinical data derived from electronic health records in order to model risk for progression of AD dementia.
Keywords: Alzheimer disease; clinical data; dementia; electronic health records; machine learning
Year: 2021 PMID: 34350389 PMCID: PMC8327375 DOI: 10.1093/jamiaopen/ooab052
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Exclusion criteria for research articles
| No. | Exclusion criteria | Reasons for exclusion |
|---|---|---|
| 1. | Only neuroimaging features were considered for predictive modeling | The scope of the review was clinical EHR-derived data, with or without imaging features |
| 2. | ML methods were not used for clinical predictive modeling related to AD/dementia | We excluded articles that performed cohort summarization and hypothesis testing using statistical methods such as logistic regression odds ratio, Chi-square distribution, Kruskal–Wallis test, etc. |
| 3. | Focus on a disease other than AD/dementia | AD/dementia is not the focus of the main analyses |
| 4. | AD/dementia is used only as an example of a neurodegenerative disease | AD/dementia is not the focus of the main analyses |
| 5. | Not peer-reviewed conference proceedings, journal, or preprints | Outside the scope of our review |
| 6. | Multiple publications from the same research group with similar final outcomes. In such cases, only the most recent studies were considered | Considered to be duplicate articles |
| 7. | Review articles | Review articles did not focus on a specific research goal |
Abbreviations: AD: Alzheimer disease; EHR: electronic health record; ML: machine learning.
Figure 1. PRISMA flow diagram. Abbreviation: ACM: Association for Computing Machinery.
Figure 2. Studies published per year which use machine learning on clinical data for prognostic estimates of Alzheimer’s Disease. For 2020, studies dated till May 31st were considered for the review.
Figure 3. Size of cohort used in the reviewed studies.
Features related to AD dementia identified by articles
| Feature categories | Measures/factors | Number of articles (%) |
|---|---|---|
| Neuroimaging | MRI (structural, functional, unspecified), PET (FDG, amyloid) | 35 (54%) |
| Cognitive assessments | MMSE, ADAS-Cog, others (CDR, FAQ) | 48 (75%) |
| Genetic | APOE ε4 | 24 (38%) |
| Laboratory | CSF, vitals, medications, medical history, other laboratory tests | 32 (50%) |
| Demographics | Age, gender, education, race | 47 (72%) |
| Clinical notes | Discharge summary | 4 (6%) |
Abbreviations: AD: Alzheimer disease; ADAS-Cog: Alzheimer’s Disease Assessment Scale-cognitive subscale; APOE ε4: apolipoprotein E epsilon 4 allele; CDR: clinical dementia rating; CSF: cerebrospinal fluid; FAQ: Functional Activities Questionnaire; FDG, •••; MMSE: Mini-Mental State Exam; MRI: magnetic resonance imaging; PET: positron emission tomography.
Figure 4. Relationship between the modality and accessibility of the datasets used in the included studies.
Specific computational and machine learning methods utilized
| Computational methods | Specific models | Number of articles (%) |
|---|---|---|
| Regression | Linear regression, logistic regression, Lasso regression, ridge regression, support vector regression | 22 (34%) |
| SVM | SVM with linear kernel, SVM with RBF kernel, SVM with polynomial kernel, support vector regression | 22 (34%) |
| Decision trees | Decision trees, random forest, AdaBoost, GBM | 32 (50%) |
| Bayesian networks | Naïve Bayes model, Bayesian belief networks, GMM | 13 (20%) |
| Neural networks | Multilayer perceptron, CNN-based models, RNN-based models, autoencoder, RBM, graph neural networks | 28 (44%) |
| NLP | Text mining | 4 (6%) |
| Others | KNN, k-Means | 7 (11%) |
Abbreviations: CNN: convolutional neural network; GBM: gradient boosting models; GMM: Gaussian mixture model; KNN: K-nearest neighbor; NLP: natural language processing; RBF: radial basis function; RBM: restricted Boltzmann machines; RNN: recurrent neural network; SVM: support vector machines.
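
As a rough illustration of one model family listed above, the following sketch fits a logistic regression risk model by batch gradient descent using only the Python standard library. It is not the method of any reviewed study. The two features (age and MMSE) are borrowed from the feature table, but the cohort, the label-generating rule, and every function name and hyperparameter here are hypothetical assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_cohort(n, seed=0):
    """Build a hypothetical cohort of (age, MMSE) rows with 0/1 progression labels.

    ASSUMPTION: the toy rule below (older age and lower MMSE raise risk)
    is invented for this sketch and is not taken from the review.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        age = rng.uniform(60, 90)
        mmse = rng.uniform(18, 30)
        risk = sigmoid(0.15 * (age - 75) - 0.6 * (mmse - 24))
        rows.append((age, mmse))
        labels.append(1 if rng.random() < risk else 0)
    return rows, labels


def standardize(rows):
    """Rescale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(w, b, x):
    """Return the modeled probability of progression for one standardized row."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


rows, labels = make_synthetic_cohort(400)
X = standardize(rows)
w, b = fit_logistic(X, labels)

# Training-set accuracy only; a real study would evaluate on held-out data.
accuracy = sum(
    (predict_risk(w, b, x) >= 0.5) == bool(y) for x, y in zip(X, labels)
) / len(labels)
print("weights (age, MMSE):", w, "bias:", b, "training accuracy:", accuracy)
```

On this synthetic cohort the fitted age weight should come out positive and the MMSE weight negative, mirroring the rule used to generate the labels. The accuracy is computed on the training data, so it says nothing about how such a model would generalize.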