José María Mateos-Pérez, Mahsa Dadar, María Lacalle-Aurioles, Yasser Iturria-Medina, Yashar Zeighami, Alan C Evans.
Abstract
In this paper, we provide an extensive overview of machine learning techniques applied to structural magnetic resonance imaging (MRI) data to obtain clinical classifiers. We specifically address practical problems commonly encountered in the literature, with the aim of helping researchers improve the application of these techniques in future work. Additionally, we survey how these algorithms are applied to a wide range of diseases and disorders (e.g. Alzheimer's disease (AD), Parkinson's disease (PD), autism, multiple sclerosis, and traumatic brain injury) to provide a comprehensive view of the state of the art in different fields.
Keywords: Alzheimer; Autism; Cross-validation; Ensembling; Machine learning; Multiple sclerosis; Neuroimaging; Parkinson; Predictive modeling; SVMs; Structural magnetic resonance imaging
Year: 2018 PMID: 30167371 PMCID: PMC6108077 DOI: 10.1016/j.nicl.2018.08.019
Source DB: PubMed Journal: Neuroimage Clin ISSN: 2213-1582 Impact factor: 4.881
Fig. 1. Image processing workflow, from the raw datasets to the final input matrix for the ML system. This example assumes that two MRI modalities are used: structural T1 and DTI. The complete pipeline, from image to data input matrix, involves three steps: a) image processing to obtain quantitative information (e.g. cortical thickness surfaces, FA volumes, or connectivity matrices); b) removal of spatial information (flattening) to obtain a single feature vector per subject; and c) aggregation of all feature vectors into a single data matrix. A corresponding label vector contains the classification target (e.g. the clinical state) for each subject. The process can incorporate further modalities such as PET, CSF, rs-fMRI, EEG, genetic, and behavioral information, but the final aggregated product would be similar.
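Steps b) and c) of Fig. 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the dictionary keys (`cortical_thickness`, `fa`, `connectivity`, `label`) and the function names are hypothetical, and image processing (step a) is assumed to have already produced fixed-size per-subject arrays. The connectivity matrix is assumed symmetric, so only its upper triangle is kept to avoid duplicated features.

```python
from typing import Dict, List, Sequence, Tuple


def upper_triangle(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a symmetric connectivity matrix, keeping only entries above the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def subject_feature_vector(subject: Dict) -> List[float]:
    """Step b): drop spatial layout and concatenate one subject's features."""
    ct = list(subject["cortical_thickness"])  # hypothetical key: one value per vertex/region
    fa = list(subject["fa"])                  # hypothetical key: one value per masked voxel
    conn = upper_triangle(subject["connectivity"])
    return ct + fa + conn


def build_data_matrix(subjects: List[Dict]) -> Tuple[List[List[float]], List]:
    """Step c): stack subject vectors into X (one row per subject) and collect labels y."""
    X = [subject_feature_vector(s) for s in subjects]
    if len({len(row) for row in X}) != 1:
        raise ValueError("all subjects must yield feature vectors of equal length")
    y = [s["label"] for s in subjects]
    return X, y
```

With three thickness values, two FA values, and a 3 × 3 connectivity matrix per subject, each row of `X` has 3 + 2 + 3 = 8 features.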
Fig. 2. Optimal workflow for constructing a classifier or predictor. Splitting the data into N folds for cross-validation is not the only step required to ensure generalizability. Internal cross-validation loops are needed to select a subset of relevant features (if feature selection is required) and to tune model hyperparameters (e.g. C in Gaussian SVMs, the number of neurons in neural networks, or the number of trees in random forests). Performing these steps on the full sample yields an overly optimistic estimate of classifier performance. The cross-validation evaluation can additionally be complemented with permutation tests (Golland and Fischl, 2003; Ojala and Garriga, 2010). Figure reproduced with permission from the original (Gabrieli et al., 2015).
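The nesting described for Fig. 2 can be sketched as below. What matters is where each step happens: feature ranking and the choice of how many features to keep (the hyperparameter here) use only the outer training fold, and each outer test fold is touched exactly once. To stay dependency-free, the sketch assumes binary 0/1 labels and swaps in a nearest-centroid classifier and a simple standardized mean-difference score as stand-ins for an SVM and a real feature-selection method; all function names are hypothetical. The permutation step follows the general idea of the cited permutation tests (re-running the whole procedure on shuffled labels), not a specific published implementation.

```python
import random
import statistics
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Deal the indices of each class round-robin into k folds (stratified k-fold)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def feature_scores(X: Matrix, y: Sequence[int]) -> List[float]:
    """Univariate score per feature: |mean(class 1) - mean(class 0)| / SD (labels 0/1)."""
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        col1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        sd = statistics.pstdev(col0 + col1) or 1e-12
        scores.append(abs(statistics.fmean(col1) - statistics.fmean(col0)) / sd)
    return scores


def fit_and_score(X_tr: Matrix, y_tr: Sequence[int], X_te: Matrix,
                  y_te: Sequence[int], n_features: int) -> float:
    """Select features and fit on training data only, then return held-out accuracy."""
    scores = feature_scores(X_tr, y_tr)
    keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n_features]

    def subset(X: Matrix) -> Matrix:
        return [[row[j] for j in keep] for row in X]

    Xs_tr, Xs_te = subset(X_tr), subset(X_te)
    centroids: Dict[int, List[float]] = {}
    for label in set(y_tr):
        rows = [r for r, lab in zip(Xs_tr, y_tr) if lab == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(row: List[float]) -> int:
        return min(centroids, key=lambda lab: sum((a - c) ** 2 for a, c in zip(row, centroids[lab])))

    return statistics.fmean([1.0 if predict(r) == t else 0.0 for r, t in zip(Xs_te, y_te)])


def nested_cv_accuracy(X: Matrix, y: Sequence[int], grid: Sequence[int],
                       outer_k: int = 5, inner_k: int = 3, seed: int = 0) -> float:
    """Outer loop estimates performance; inner loop picks n_features on training folds only."""
    outer_scores = []
    for test_idx in stratified_folds(y, outer_k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]

        def inner_score(n_features: int) -> float:
            accs = []
            for val_idx in stratified_folds(y_tr, inner_k, seed + 1):
                val = set(val_idx)
                fit = [i for i in range(len(y_tr)) if i not in val]
                accs.append(fit_and_score([X_tr[i] for i in fit], [y_tr[i] for i in fit],
                                          [X_tr[i] for i in val_idx], [y_tr[i] for i in val_idx],
                                          n_features))
            return statistics.fmean(accs)

        best = max(grid, key=inner_score)  # chosen without looking at test_idx
        outer_scores.append(fit_and_score(X_tr, y_tr, [X[i] for i in test_idx],
                                          [y[i] for i in test_idx], best))
    return statistics.fmean(outer_scores)


def permutation_p_value(X: Matrix, y: Sequence[int], grid: Sequence[int],
                        n_perm: int = 20, seed: int = 0):
    """Rerun the full nested CV on label-shuffled data; return (observed accuracy, p-value)."""
    observed = nested_cv_accuracy(X, y, grid, seed=seed)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if nested_cv_accuracy(X, y_perm, grid, seed=seed) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

In practice, libraries such as scikit-learn express the same structure by wrapping a `Pipeline` (e.g. `SelectKBest` followed by `SVC`) in `GridSearchCV` and passing that object to `cross_val_score`, with `permutation_test_score` covering the permutation step. Calling the selector on the full sample before splitting is exactly the leakage several table entries flag.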
Fig. 3. A) Using an SVM multi-kernel approach, Zhang et al. (2011) found 11 brain regions relevant for AD classification: left and right amygdala, left and right hippocampal formations, left and right uncus, left entorhinal cortex, left middle temporal gyrus, left temporal lobe, left perirhinal cortex and left parahippocampal gyrus. Assessing the importance of individual features in this way supports the use of ML techniques to understand the biological bases of diseases. B) Variable importance can also be reported without mapping it onto its spatial distribution. Figure reproduced with permission from the original (Westman et al., 2012; Zhang et al., 2011).
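Variable importance of the kind shown in panel B is often read off a linear model's weights. A minimal sketch, assuming a fitted linear classifier whose weight vector is available (the multi-kernel method of Zhang et al. (2011) is not reproduced here): raw weights are only comparable when features share a scale, so each weight is multiplied by that feature's standard deviation before ranking. The function name and the region names in the usage are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def standardized_importance(weights: Sequence[float], X: Sequence[Sequence[float]],
                            names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by |w_j| * SD_j: the change in the linear decision value
    caused by a one-standard-deviation change in feature j."""
    sds = [statistics.pstdev(col) for col in zip(*X)]
    scored = [(name, abs(w) * sd) for name, w, sd in zip(names, weights, sds)]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, a feature with a small raw weight but a large spread across subjects can outrank one with a larger raw weight, which is why unscaled weights can mislead.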
Summary of the classification papers in Alzheimer's disease. Unless otherwise noted, reported accuracy rates are the highest found in the paper for different groups, methods and input modalities.
| Reference | Groups (N) | Method | Input Modalities | Accuracy | Comments |
|---|---|---|---|---|---|
| ( | Controls-AD (several groups) | SVM (linear) | T1 | Up to 0.964 | Independent samples for training and testing |
| ( | MCI-c (100) - MCI-nc (164) | SVM | T1, cognitive | 0.66 | Features selected on independent AD vs. Control samples. |
| | | LDS | | 0.745 | |
| ( | Controls (25) – AD (28) | SVM (Gaussian) | rs-fMRI | 0.74 | AUC = 0.8 |
| | | | DTI | 0.85 | AUC = 0.87 |
| | | | T1 | 0.81 | AUC = 0.6 |
| | | SVM (multi-kernel) | rs-fMRI, DTI, T1 | 0.79 | AUC = 0.82 |
| | | | DTI, T1 | 0.85 | AUC = 0.89 |
| ( | Controls (162) – AD (137) | Multiple classifiers tested; linear SVM had the best accuracy. | T1 | 0.82 | MCI conversion or non-conversion at 18 months. Results for MCI-c vs. MCI-nc are non-significant. |
| | Controls (162) – MCI-c (76) | | | 0.76 | |
| | MCI-c (76) – MCI-nc (134) | | | 0.62 | |
| ( | Controls (143) – MCI (113) | Logistic regression | T1 | 0.823 | Independent samples for training and testing (AUC = 0.95). |
| ( | Controls (52) – AD (51) | SVM (linear) | T1 | 0.862 | No hyperparameter search. |
| Controls (51) – MCI (99) | SVM (multikernel) | PET | 0.865 | ||
| SVM (linear) | CSF | 0.821 | |||
| SVM (multikernel) | T1, PET, CSF | 0.932 | |||
| T1 | 0.72 | ||||
| PET | 0.716 | ||||
| CSF | 0.714 | ||||
| T1, PET, CSF | 0.745 | ||||
| ( | Controls (17) – MCI (20) | SVM (linear) | DTI | 0.667 | Leakage (features selected on full dataset). |
| | | SVM (Gaussian) | | 0.889 | |
| ( | Controls (229)-AD (188) | SVM (linear) | T1 | 0.883 | Leakage (voxels excluded based on a statistical assessment of the full dataset). Data mixed 1.5 T and 3 T images and were processed with 3 different software packages (FreeSurfer, MorphoBox, SPM). SVM hyperparameter search performed outside the cross-validation loop. |
| | Controls (229)-MCI (401) | | | 0.779 | |
| | MCI (401)-AD (188) | | | 0.687 | |
| | MCI-nc (130)-MCI-c (111) (2 y) | | | 0.688 | |
| | MCI-nc (103)-MCI-c (137) (3 y) | | | 0.698 | |
| ( | Controls (130) – AD (130) | SVM (linear) | T1 | 0.896 | |
| | | SVM (Gaussian) | | 0.893 | |
| ( | Controls (79) – AD (80) | SVM (linear) | T1 | 0.832 | Effect of age is removed from data. |
| ( | Controls (35) – AD (37) | Random forest | T1, CSF, genetic, FDG-PET | 0.89 | |
| | Controls – MCI (75) | | | 0.746 | |
| | MCI-nc (41) – MCI-c (34) | | | 0.580 | |
| ( | MCI-nc (170) – MCI-c (69) | SVM (linear kernel) | T1, CSF | 0.734 | Part of the features come from a method trained on Controls vs. AD. |
| ( | Controls (15) – MCI (15) | SVM (linear kernel) | T1 | 0.9 | Longitudinal dataset. |
| ( | Controls (110) – AD (116) | OPLS | T1, genetic, demographic | 0.876 | Accuracies for Controls – AD for different classifiers may use different input data. |
| MCI-nc (98) – MCI-c (21) | SVM (Gaussian kernel) Decision trees | 0.867 | |||
| Neural networks | 0.827 | ||||
| OPLS | 0.872 | ||||
| SVM (Gaussian kernel) Decision trees | 0.747 | ||||
| Neural networks | 0.709 | ||||
| 0.674 | |||||
| 0.701 | |||||
| ( | Controls (20) – AD (15) | SVM (Gaussian) | T1 | 0.882 | Hyperparameters optimized in the outer loop. Features selected on full dataset. |
| ( | Controls (112) – AD (117) | OPLS | T1 | 0.92 | |
| | Controls (112) – MCI (122) | | | 0.769 | |
| | MCI (122) – AD (117) | | | 0.71 | |
| ( | Controls (111) – AD (96) | OPLS | T1, CSF | 0.918 | |
| | Controls (111) – MCI (162) | | | 0.776 | |
| | MCI-nc (81) – MCI-c (81) | | | 0.685 | |
| ( | Controls (117) – AD (113) | wmSRC | T1, PET | 0.948 | |
| | Controls (117) – MCI (110) | | | 0.745 | |
| | MCI-nc (83) – MCI-c (27) | | | 0.778 | |
| ( | Controls (17) – MCI (10) | SVM (multi-kernel) | DTI, fMRI | 0.963 | Features selected on full dataset. |
| ( | Controls (50) – AD (37) | SVM (Gaussian) | DTI | 0.849 | Features selected on full dataset. |
| | Controls (50) – late MCI (39) | | | 0.79 | |
| ( | Controls (66) – AD (56) | SVM (linear) | T1 | 0.965 | |
| | Controls (66) – MCI (88) | | | 0.846 | |
| | MCI (88) – AD (56) | | | 0.759 | |
| ( | MCI-nc (96) – MCI-c (47) | Gaussian Process | T1, PET, APOE, CSF | 0.643 | Trained on healthy subjects + AD, tested on MCI cohort. |
| ( | Controls (162) – AD (137) | SVM | T1 | 0.92 | |
| | Controls (162) – MCI-c (76) | | | 0.86 | |
| | MCI-nc (134) – MCI-c (76) | | | 0.73 | |
| ( | Controls (75) – AD (75) | SVM (linear) | T1 | 0.92 | Other classifiers used (not reported here). |
| ( | Controls (282) | LDA | T1 | 0.63 | Multi-class classification results. Winner of CADDementia challenge. More complex classifiers did not improve performance. |
| | MCI (283) | | | | |
| | AD (154) | | | | |
| | CADDementia Test (354) | | | | |
| ( | Controls (52) – AD (45) | SVM (Gaussian) | T1, DTI | 0.902 | Uses a multiple kernel learning method to combine features from T1, DTI, and CSF. |
| | Controls (52) – MCI (58) | | | 0.794 | |
| | MCI (58) – AD (45) | | | 0.766 | |
| ( | Controls (190) – AD (190) | SVM (linear) | T1 | 0.885 | Model selection and optimization performed on 280 samples and validated on the remaining 100. |
| | | | T1, APOE | 0.893 | |
Indicates that accuracies have been computed from the sensitivity and specificity values reported in the paper: accuracy = sensitivity · prevalence + specificity · (1 − prevalence), where prevalence was obtained from the number of subjects in each group.
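The conversion in this footnote follows from writing accuracy as (TP + TN)/N, with TP = sensitivity · P and TN = specificity · (N − P) for P cases among N subjects, so prevalence = P/N. A small helper with a hypothetical name makes the arithmetic explicit:

```python
def accuracy_from_sens_spec(sensitivity: float, specificity: float,
                            n_positive: int, n_negative: int) -> float:
    """Accuracy implied by sensitivity and specificity for the given group sizes."""
    prevalence = n_positive / (n_positive + n_negative)  # fraction of cases in the sample
    return sensitivity * prevalence + specificity * (1 - prevalence)
```

For balanced groups (100 vs. 100), sensitivity 0.9 and specificity 0.8 give 0.85 (up to floating-point rounding); with unbalanced groups the larger group's rate dominates.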
Informative regions (GM and WM) for the classification tasks in AD. This table makes no distinction between the cohorts involved in the classification (AD, MCI, controls), as the affected regions have been shown to be similar for AD and MCI.
| Region | References | N |
|---|---|---|
| Gray Matter | ||
| Hippocampus | ( | 18 |
| Temporal lobes | ( | 10 |
| Amygdala | ( | 11 |
| Parahippocampal gyrus | ( | 9 |
| Middle temporal | ( | 8 |
| Entorhinal cortex | ( | 8 |
| Insula | ( | 6 |
| Inferior temporal | ( | 6 |
| Posterior cingulate | ( | 7 |
| Frontal lobes | ( | 3 |
| Inferior parietal | ( | 3 |
| Anterior cingulate | ( | 3 |
| Supramarginal gyrus | ( | 2 |
| Middle cingulate | ( | 2 |
| Thalamus | ( | 3 |
| Uncus | ( | 2 |
| Superior frontal lobe | ( | 1 |
| Parietal cortex | ( | 2 |
| Cerebellar areas | ( | 1 |
| Posterior middle frontal | ( | 1 |
| Fusiform gyrus | ( | 1 |
| Lingual | ( | 1 |
| Precuneus | ( | 6 |
| Superior temporal | ( | 6 |
| Perirhinal cortex | ( | 1 |
| Rectus gyrus | ( | 1 |
| Inferior lateral ventricle | ( | 1 |
| Isthmus cingulate gyrus | ( | 1 |
| Orbitofrontal cortex | ( | 2 |
| White Matter | ||
| Fornix (WM) | ( | 2 |
| Temporal lobes (WM) | ( | 2 |
| Ventral cingulum (WM) | ( | 1 |
| Caudate nucleus | ( | 2 |
| Corpus callosum (WM) | ( | 1 |
| Periventricular WM | ( | 2 |
| Parietal WM | ( | 1 |
| Frontal WM | ( | 1 |
| Occipital WM | ( | 1 |
| Inferior temporal WM | ( | 1 |
Summary of the classification papers in autism. Unless otherwise noted, reported accuracy rates are the highest found in the paper for different groups, methods and input modalities.
| Ref | Groups (N) | Method | Input Modalities | Accuracy | Comments |
|---|---|---|---|---|---|
| ( | Controls (153)-Autism (127) | Multiple | T1, rs-fMRI | 0.7 | Leakage: features selected on the full dataset. Uses 67 different classifiers from the WEKA toolbox. |
| ( | Controls (20)-Autism (20) | SVM (linear) | T1 | 0.9 | No hyperparameter search (fixed C = 1). |
| ( | Controls (84)-Autism (82) | SVR (Gaussian) | T1 | | Predicts ADOS scores instead of treating clinical state as a binary classification problem. No hyperparameter search (fixed γ). |
| ( | Controls (24) - Autism (24) | SVM (Gaussian) | T1 | 0.92 | Analysis is done per individual region |
| ( | Controls (18) - Autism (19) | Decision tree | T1, DTI, spectroscopy | 0.919 | Possible leakage: […] data points included were the significant resulting values of the statistical analyses of separate neuroimaging modalities |
| ( | Controls (42)-Autism (93) | LDA ensemble | MEG, DTI | 0.83 | Final accuracy rates are the result of ensembling LDA classifiers that use different combinations of input data. |
| | ASD/LI+ (36)-ASD/LI- (57) | | | 0.7 | |
| ( | Controls (59)-Autism (58) | SVM (multi-kernel) | T1 | 0.963 | |
| ( | Controls (22)-Autism (22) | SVM (linear) | T1 | 0.81 | No hyperparameter search (fixed C = 1). |
| ( | Controls (30 + 7)-Autism (30 + 12) | QDA | DTI | 0.916 | Independent test set. |
Summary of the classification papers in MS. Unless otherwise noted, reported accuracy rates are the highest found in each paper for different groups, methods and input modalities.
| Reference | Groups (N) | Method | Input Modalities | Accuracy | Comments |
|---|---|---|---|---|---|
| ( | CIS (74) - longitudinal (1 year) | SVM (polynomial) | T2, PD, clinical, demographic | 0.714 | |
| | CIS (74) - longitudinal (3 years) | | | 0.680 | |
| ( | Early MS (17) - late MS (17) | SVM (linear) | T1, T2 | 0.85 | |
| | Low lesion load MS (20)-High lesion load MS (20) | | | 0.83 | |
| | Benign MS (13)-Non-benign MS (13) | | | 0.77 | |
| ( | Controls (26) – MS (41) | SVM (linear) | T1, T2 | 0.96 | |
| ( | Controls (15 + 15)-EOPMS (15 + 16) | Logistic regression | T2 | 0.867* | Each voxel individually tested. 2 groups of subjects matched differently (lesion load, gender, & disease duration or age). |
| | Controls (15 + 15)-LOPMS (16 + 17) | | | 0.871* | |
| | EOPMS (15 + 16)-LOPMS (16 + 17) | | | 0.807* | |
Summary of the classification papers in PD.
| Ref | Groups (N) | Method | Input Modalities | Accuracy | Comments |
|---|---|---|---|---|---|
| ( | Controls (22) - PD (21) | SVM (linear) | T1 | 0.42 | Default C hyperparameter (C = 1). An F-contrast computed on the whole sample was applied as a feature weight. |
| | Controls (22) - PSP (10) | | | 0.937 | |
| | Controls (22) - MSA (11) | | | 0.788 | |
| | MSA (11) - PD (21) | | | 0.719 | |
| | MSA (11) - PSP (10) | | | 0.762 | |
| | PD (21) - PSP (10) | | | 0.968 | |
| ( | PD (57) - PSP (21) | SVM (kernel not specified) | T1, T2, DTI | 1 | An F-contrast computed on the whole sample was applied as a feature weight. |
| ( | Controls (22) - PD (20) | Bootstrap | DTI | 0.901 | |
| ( | PSP (17), PD (14), MSA (19) | Multinomial logit | T1 | 0.917 | |
| | Controls (19), PSP (17), PD (14), MSA (19) | | | 0.736 | |
| | PSP (17), PD (14), MSA-C (7), MSA-P (12) | | | 0.845 | |
| | Controls (19), PSP (17), PD (14), MSA-C (7), MSA-P (12) | | | 0.662 | |
| ( | Controls (14), PD (14), PSP (16), MSA (18) | Multinomial logit | T1, T2, DTI | Brier = 0.753 | Highest multiclass error score (Brier) obtained using GM only. |
| ( | Controls (28) - PD (28) | SVM (linear kernel) | T1 | 0.927 | Hyperparameter tuning procedure not reported. |
| | Controls (28) - PSP (28) | | | 0.970 | |
| | PD (28) - PSP (28) | | | 0.982 | |
| ( | PD (16) - PSP (8) + MSA (8) | SVM | T1 | 0.906 | PCA transformation applied on 149 healthy controls. Neither the kernel type nor the hyperparameter tuning procedure is reported. |
| ( | PD (17) - Other (23) | SVM (Gaussian kernel) | DTI | 0.975 | Heterogeneous “Other” containing patients with different diseases, including MSA and PSP. |
| ( | PD (16) - Other (20) | SVM (Gaussian kernel) | SWI | 0.869 | Same considerations as for ( |
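One PD entry above reports a Brier score instead of an accuracy. A common multiclass form of the Brier score averages, over subjects, the squared distance between the predicted class-probability vector and the one-hot encoding of the true class: 0 is perfect, 2 is the worst possible value, and lower is better. Some papers divide by the number of classes or report a transformed score, so this sketch (hypothetical function name) is not necessarily the exact definition used in the cited study.

```python
from typing import Sequence


def multiclass_brier(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean over subjects of sum_k (p_k - o_k)^2, where o is the one-hot true class."""
    total = 0.0
    for p, true_class in zip(probs, labels):
        total += sum((pk - (1.0 if k == true_class else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)
```

Unlike accuracy, this score also penalizes confident wrong predictions and rewards well-calibrated probabilities, which is why it is sometimes preferred for multiclass problems such as PD vs. PSP vs. MSA.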
Informative regions (GM and WM) for PD classification tasks.
| Region | References | Number |
|---|---|---|
| Gray Matter | ||
| Rectal gyrus | ( | 1 |
| Middle cingulate | ( | 1 |
| Left Putamen | ( | 1 |
| Right Putamen | ( | 1 |
| Thalamus | ( | 3 |
| Pons | ( | 1 |
| Midbrain | ( | 2 |
| Brainstem | ( | 2 |
| Caudate | ( | 2 |
| Putamen | ( | 1 |
| Precuneus | ( | 1 |
| Basal ganglia | ( | 1 |
| Cerebellum | ( | 1 |
| White Matter | ||
| Corpus callosum | ( | 1 |
| Brainstem | ( | 2 |
| Mesoencephalon | ( | 1 |
| Right frontal WM | ( | 1 |
Summary of the studies differentiating between normal controls and patients.
| Disease | Methods | Input Modalities | Mean Accuracy | Accuracy Range (Min–Max) | Number of Studies |
|---|---|---|---|---|---|
| Alzheimer's Disease | SVM, OPLS, Random Forests | T1, PET, DTI, CSF | 0.897 | 0.82–0.965 | 19 |
| Autism | SVM, Decision Tree, LDA, QDA | T1, DTI, Spectroscopy | 0.867 | 0.70–0.963 | 8 |
| Multiple Sclerosis | SVM, Logistic Regression | T1, T2 | 0.915 | 0.871–0.96 | 2 |
| Parkinson's Disease | SVM, Bootstrap, Multinomial Logit | T1, T2, DTI | 0.7472 | 0.42–0.927 | 5 |
| Attention Deficit Hyperactivity Disorder | SVM, Gaussian Process Classifier | T1 | 0.8475 | 0.793–0.902 | 2 |
| Depression | SVM | T1, DTI | 0.7477 | 0.70–0.831 | 3 |