Thomas W Rowe, Ioanna K Katzourou, Joshua O Stevenson-Hoare, Matthew R Bracher-Smith, Dobril K Ivanov, Valentina Escott-Price.
Abstract
Alzheimer's disease is a neurodegenerative disorder and the most common form of dementia. Early diagnosis may assist interventions to delay onset and reduce the progression rate of the disease. We systematically reviewed the use of machine learning algorithms for predicting Alzheimer's disease using single nucleotide polymorphisms and instances where these were combined with other types of data. We evaluated the ability of machine learning models to distinguish between controls and cases, while also assessing their implementation and potential biases. Articles published between December 2009 and June 2020 were collected using Scopus, PubMed and Google Scholar. These were systematically screened for inclusion leading to a final set of 12 publications. Eighty-five per cent of the included studies used the Alzheimer's Disease Neuroimaging Initiative dataset. In studies which reported area under the curve, discrimination varied (0.49-0.97). However, more than half of the included manuscripts used other forms of measurement, such as accuracy, sensitivity and specificity. Model calibration statistics were also found to be reported inconsistently across all studies. The most frequent limitation in the assessed studies was sample size, with the total number of participants often numbering less than a thousand, whilst the number of predictors usually ran into the many thousands. In addition, key steps in model implementation and validation were often not performed or unreported, making it difficult to assess the capability of machine learning models.
Keywords: AUC; Alzheimer’s disease; EPV; SNPs; machine learning
Year: 2021 PMID: 34805994 PMCID: PMC8598986 DOI: 10.1093/braincomms/fcab246
Source DB: PubMed Journal: Brain Commun ISSN: 2632-1297
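The abstract notes that the included studies reported discrimination in different ways: some as area under the curve (AUC), others as accuracy, sensitivity and specificity. The following is a minimal sketch of how these four measures relate to the same set of predictions. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_sensitivity_specificity(y_true, y_pred):
    """Threshold-dependent measures computed from hard class predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # proportion of cases correctly identified
    specificity = tn / (tn + fp)  # proportion of controls correctly identified
    return accuracy, sensitivity, specificity


def auc(y_true, scores):
    """AUC as the probability that a random case scores higher than a
    random control (ties count as one half)."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): three cases followed by three controls.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc, sens, spec = accuracy_sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking performance across all thresholds, which is one reason results reported with different measures are hard to compare directly.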
Figure 1. Visual breakdown of publication selection, based on a similar diagram found in PRISMA.
Figure 2. A forest plot displaying models used across publications which reported AUC, with the addition of confidence intervals derived using the Newcombe method. Column 1: Publication number as found in Supplementary Table 1, along with sample size. Column 2: Type of machine learning model. Column 3: Information to help distinguish between models in publications, including differing SNP numbers and methodologies.
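To illustrate how a confidence interval can be attached to a reported AUC when only the AUC and the group sizes are available, the sketch below uses the Hanley and McNeil (1982) standard error with a normal approximation. This is a swapped-in, simpler technique and is not the Newcombe method named in the caption, so its intervals will generally differ from those in the figure. The function name and the example inputs are hypothetical.

```python
import math


def hanley_mcneil_ci(auc, n_cases, n_controls, z=1.96):
    """Approximate confidence interval for an AUC.

    Uses the Hanley-McNeil (1982) standard error and a normal
    approximation (z = 1.96 for roughly 95% coverage).
    NOTE: this is NOT the Newcombe method used for the published figure.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_controls)
    se = math.sqrt(variance)
    # Clip to the valid AUC range [0, 1].
    lower = max(0.0, auc - z * se)
    upper = min(1.0, auc + z * se)
    return lower, upper


# Hypothetical example: AUC of 0.5 from 50 cases and 50 controls.
small_lower, small_upper = hanley_mcneil_ci(0.5, 50, 50)
# The same AUC from a tenfold larger sample gives a narrower interval.
large_lower, large_upper = hanley_mcneil_ci(0.5, 500, 500)
```

The interval narrows as the number of cases and controls grows, which is why the small sample sizes highlighted in the abstract translate into wide uncertainty around reported AUC values.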
Figure 3. A forest plot displaying all models used across publications which reported accuracy (ACC). Column 1: Publication number as found in Supplementary Table 1, along with sample size. Column 2: Type of machine learning model. Column 3: Information to help distinguish between models in publications, including differing SNP numbers and methodologies.
Summary of ML methods used in the analysed publications
| ML approach | Number of publications | Number of models reported across publications | Additional information |
|---|---|---|---|
| Support vector machines (SVMs) | 8 | 44 | Linear kernels (22 models, 5 studies). Quadratic polynomials (4 models, 2 studies). Cubic polynomials (4 models, 2 studies). Radial basis functions (3 models, 2 studies). Pearson kernel function (2 models, 1 study). Unreported kernels (9 models, 3 studies). A supervised method which uses distance-based calculations to separate samples into groups. |
| Penalised regression (LASSO) | 4 | 15 | All 15 LASSO regressions across 3 studies. A regression analysis which performs both feature selection and regularization. |
| Naïve Bayes (NB) | 4 | 10 | Six ordinary NB models, three tree-augmented NB models and one model-averaged NB. A probabilistic classifier which uses Bayes' theorem to make predictions. |
| Random forest (RF) | 3 | 5 | Five classification RFs used, two of which used the RPART package. These are an ensemble of decision trees which produce aggregated classifications. |
| Bayesian networks (BN) | 2 | 4 | Two BNs with the K2 learning algorithm, one Markov blanket and one minimal augmented Markov blanket. A graphical model which calculates conditional dependencies between variables using Bayesian statistics. |
| Linear models | 2 | 4 | Bootstrapping Stage-Wise Model Selection (BSWiMS). A supervised model-selection algorithm which uses a combination of linear models for prediction. |
| K nearest neighbour (KNN) | 2 | 3 | A distance-based algorithm which uses similarities in features to classify samples. |
| Ensemble methods | 1 | 2 | Ensembles combine a number of ML models to arrive at a collective prediction. |
| Logistic regression (LR) | 1 | 1 | A form of regression in which the outcome is a categorical variable. |
| Multi-factor dimensionality reduction (MFDR) | 1 | 1 | A technique used to detect combinations of independent variables that influence a dependent variable. |
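ML approach: type of machine learning model.
Number of publications: the number of publications in which the model type was used.
Number of models reported across publications: the number of models of this type reported across those publications.
Additional information: further information regarding the machine learning model used.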
BN = Bayesian networks; RF = random forest; KNN = K nearest neighbour; LASSO = least absolute shrinkage and selection operator; LR = logistic regression; MFDR = multi-factor dimensionality reduction; ML = machine learning.
Figure 4. A forest plot displaying all available events per variable (EPV) values across the included studies. Column 1: Publication number as found in Supplementary Table 1. Column 2: Number of samples. Column 3: Number of predictors used. Column 4: AUC of models, if reported. Column 5: ACC of models, if reported. Column 6: Values of EPV.
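As a rough illustration of the EPV quantity plotted above, the sketch below assumes the common definition of events per variable: the number of outcome events, taken here as the size of the smaller outcome group, divided by the number of candidate predictors. How EPV was computed for each study is not stated in this excerpt, so the definition, the function name and the example counts are all assumptions.

```python
def events_per_variable(n_cases, n_controls, n_predictors):
    """Events per variable (EPV) under an assumed common definition:
    size of the smaller outcome group divided by the number of
    candidate predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be a positive integer")
    events = min(n_cases, n_controls)
    return events / n_predictors


# Hypothetical study: 300 cases, 500 controls and 10,000 SNP predictors.
epv = events_per_variable(300, 500, 10_000)
```

With fewer than a thousand participants and many thousands of predictors, as the abstract describes, EPV falls far below 1, meaning there are many more candidate predictors than outcome events.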