Shangran Qiu, Prajakta S Joshi, Matthew I Miller, Chonghua Xue, Xiao Zhou, Cody Karjadi, Gary H Chang, Anant S Joshi, Brigid Dwyer, Shuhan Zhu, Michelle Kaku, Yan Zhou, Yazan J Alderazi, Arun Swaminathan, Sachin Kedar, Marie-Helene Saint-Hilaire, Sanford H Auerbach, Jing Yuan, E Alton Sartor, Rhoda Au, Vijaya B Kolachalama.
Abstract
Alzheimer's disease is the primary cause of dementia worldwide, with an increasing morbidity burden that may outstrip diagnosis and management capacity as the population ages. Current methods integrate patient history, neuropsychological testing and MRI to identify likely cases, yet effective practices remain variably applied and lacking in sensitivity and specificity. Here we report an interpretable deep learning strategy that delineates unique Alzheimer's disease signatures from multimodal inputs of MRI, age, gender, and Mini-Mental State Examination score. Our framework links a fully convolutional network, which constructs high-resolution maps of disease probability from local brain structure, to a multilayer perceptron and generates precise, intuitive visualization of individual Alzheimer's disease risk en route to accurate diagnosis. The model was trained using clinically diagnosed Alzheimer's disease and cognitively normal subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (n = 417) and validated on three independent cohorts: the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) (n = 382), the Framingham Heart Study (n = 102), and the National Alzheimer's Coordinating Center (NACC) (n = 582). Performance of the model that used the multimodal inputs was consistent across datasets, with mean area under curve values of 0.996, 0.974, 0.876 and 0.954 for the ADNI study, AIBL, Framingham Heart Study and NACC datasets, respectively. Moreover, our approach exceeded the diagnostic performance of a multi-institutional team of practicing neurologists (n = 11), and high-risk cerebral regions predicted by the model closely tracked post-mortem histopathological findings.
This framework provides a clinically adaptable strategy for using routinely available imaging techniques such as MRI to generate nuanced neuroimaging signatures for Alzheimer's disease diagnosis, as well as a generalizable approach for linking deep learning to pathophysiological processes in human disease.
Keywords: Alzheimer’s disease; biomarkers; dementia; neurodegeneration; structural MRI
Year: 2020 PMID: 32357201 PMCID: PMC7296847 DOI: 10.1093/brain/awaa137
Source DB: PubMed Journal: Brain ISSN: 0006-8950 Impact factor: 15.255
Figure 1. Schematic of the deep learning framework. The FCN model was developed using a patch-based strategy in which randomly selected samples (sub-volumes of size 47 × 47 × 47 voxels) of T1-weighted full MRI volumes were passed to the model for training (Step 1). The corresponding Alzheimer’s disease status of the individual served as the output for the classification model. Given that the operation of FCNs is independent of input data size, the model led to the generation of participant-specific disease probability maps of the brain (Step 2). Selected high-risk voxels from the disease probability maps were then passed to the MLP for binary classification of disease status (Model A in Step 3; MRI model). As a further control, we used only the non-imaging features (age, gender and MMSE) and developed an MLP model to classify individuals with Alzheimer’s disease and those with normal cognition (Model B in Step 3; non-imaging model). We also developed another model that integrated multimodal input data, combining the selected high-risk voxels of the disease probability maps with age, gender and MMSE score, to perform binary classification of Alzheimer’s disease status (Model C in Step 3; Fusion model). AD = Alzheimer’s disease; NC = normal cognition.
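The patch-based sampling in Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the uniform sampling scheme, the seed and the example volume shape of 182 × 218 × 182 voxels are assumptions made here; only the 47-voxel patch edge comes from the caption.

```python
import random

PATCH = 47  # sub-volume edge length in voxels, as stated in the caption


def sample_patch_origins(volume_shape, n_patches, patch=PATCH, seed=0):
    """Return random (x, y, z) origins of patch-sized cubes that fit in the volume.

    Uniform sampling is an assumption; the caption only says "randomly selected".
    """
    if any(dim < patch for dim in volume_shape):
        raise ValueError("volume is smaller than the patch")
    rng = random.Random(seed)
    return [
        tuple(rng.randint(0, dim - patch) for dim in volume_shape)
        for _ in range(n_patches)
    ]


def patch_slices(origin, patch=PATCH):
    """Convert an origin into slices that index one cubic sub-volume."""
    return tuple(slice(o, o + patch) for o in origin)


# Hypothetical volume shape, chosen only for illustration.
origins = sample_patch_origins((182, 218, 182), n_patches=5)
for origin in origins:
    print(origin, patch_slices(origin))
```

The slices returned by `patch_slices` can index any 3D array type, so the same origins work whether the volume is held as a NumPy array or a framework tensor.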
Study population and characteristics
| Characteristic | ADNI NC | ADNI AD | ADNI P-value | AIBL NC | AIBL AD | AIBL P-value | FHS NC | FHS AD | FHS P-value | NACC NC | NACC AD | NACC P-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age, years, median [range] | 76 [60, 90] | 76 [55, 91] | 0.4185 | 72 [60, 92] | 73 [55, 93] | 0.5395 | 73 [57, 100] | 81 [67, 94] | <0.0001 | 74 [56, 94] | 77 [55, 95] | 0.0332 |
| Education, years, median [range] | 16 [6, 20] | 16 [4, 20] | <0.0001 | NA | NA | NA | 14 [8, 25] | 13 [5, 25] | 0.3835 | 16 | 14.5 | 0.8363 |
| Gender, male (%) | 119 (51.96) | 101 (53.72) | 0.7677 | 144 (45.00) | 24 (38.71) | 0.40 | 37 (50.68) | 12 (41.38) | 0.51 | 126 (35.39) | 95 (45.45) | 0.0203 |
| MMSE, median [range] | 29 [25, 30] | 23.5 [18, 28] | <0.0001 | 29 [25, 30] | 21 [6, 28] | <0.0001 | 29 | 25 | <0.0001 | 29 | 22 | <0.0001 |
| APOE4, positive (%) | 61 (26.65) | 124 (65.97) | <0.0001 | 11 (3.44) | 12 (19.35) | <0.0001 | 13 (17.81) | 11 | 0.035 | 102 (28.65) | 112 (53.59) | <0.0001 |
Four independent datasets were used for this study: ADNI, AIBL, FHS and NACC. The ADNI dataset was randomly split in the ratio of 3:1:1, with 60% used for model training, 20% for internal validation and the remaining 20% for internal testing. The best performing model on the validation dataset was selected for making predictions on the ADNI test data as well as on the AIBL, FHS and NACC datasets, which served as external test datasets for model validation. All the MRI scans considered for this study were performed on individuals within ±6 months from the date of clinical diagnosis. AD = Alzheimer’s disease; NA = not available; NC = normal cognition.
Years of education not available for all AIBL study participants.
Years of education not available for some study participants.
MMSE scores not available for some subjects in the study cohort within 6 months of diagnosis.
APOE4 (genetic) information not available for some subjects in the study cohort.
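The 3:1:1 random split of the ADNI data described in the table note can be sketched as below. This is a hedged example under stated assumptions: the function name `split_3_1_1`, the seed, the rounding rule and the use of integer subject identifiers are illustrative, and the source does not say whether the split was stratified by diagnosis.

```python
import random


def split_3_1_1(subject_ids, seed=0):
    """Shuffle subject IDs and split them 60% / 20% / 20% (train / validation / test)."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(0.6 * len(ids))
    n_val = round(0.2 * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # whatever remains after train and validation
    return train, val, test


# 417 is the ADNI cohort size reported in the abstract; the IDs are placeholders.
train, val, test = split_3_1_1(range(417))
print(len(train), len(val), len(test))
```

Splitting by subject identifier rather than by scan keeps all data from one individual in a single partition, which avoids leakage between training and test sets.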
Figure 2. Subject-specific disease probability maps. (A) Disease probability maps generated by the FCN model highlight high-risk brain regions that are associated with Alzheimer’s disease pathology. Individual cases are shown, where blue indicates low risk and red indicates high risk of Alzheimer’s disease. The first two individuals were clinically confirmed to have normal cognition, whereas the other two individuals had a clinical diagnosis of Alzheimer’s disease. (B–D) Axial, coronal and sagittal stacks of disease probability maps from a single subject with clinically confirmed Alzheimer’s disease are shown. All imaging planes were used to construct 3D disease probability maps. Red indicates a locally inferred probability of Alzheimer’s disease >0.5, whereas blue indicates <0.5. AD = Alzheimer’s disease; NC = normal cognition.
Figure 3. Summary of the FCN model performance. (A) Voxel-wise maps of the Matthews correlation coefficient (MCC) were computed independently across all the datasets to demonstrate predictive performance derived from all regions within the brain. (B–D) Axial, coronal and sagittal stacks of the MCC maps at each cross-section from a single subject are shown. These maps were generated by averaging the MCC values on the ADNI test data.
Figure 4. Correlation of model findings with neuropathology. (A) Overlap of model-predicted regions of high Alzheimer’s disease risk with post-mortem findings of Alzheimer’s disease pathology in a single subject. This subject had clinically confirmed Alzheimer’s disease with affected regions including the bilateral asymmetrical temporal lobes and the right-side hippocampus, the cingulate cortex, the corpus callosum, part of the parietal lobe and the frontal lobe. The first column (i) shows MRI slices in three different planes, followed by a column (ii) that shows the corresponding model-predicted disease probability maps. A cut-off value of 0.7 was chosen to delineate the regions of high Alzheimer’s disease risk, which were overlaid on the MRI scan in the next column (iii). The next column (iv) depicts a segmented mask of cortical and subcortical structures of the brain obtained from FreeSurfer (Fischl, 2012). A sequential colour-coding scheme denotes different levels of pathology ranging from green (0, low) to pale red (4, high). The final column (v) shows the overlay of the magnetic resonance scan, disease probability maps of high Alzheimer’s disease risk and the colour-coded regions based on pathology grade. (B) We then qualitatively assessed trends of neuropathological findings from the FHS dataset (n = 11). The same colour-coding scheme as described above was used to represent the pathology grade (0–4) in the heat maps. White boxes in the heat maps indicate missing data. Using Spearman’s rank correlation coefficient, an increasing Alzheimer’s disease probability risk was associated with a higher grade of amyloid-β and tau accumulation in the hippocampal formation, the middle frontal region, the amygdala and the temporal region, respectively. Biel = Bielschowsky stain; L = left; R = right.
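The association between regional disease probability and pathology grade in panel B relies on Spearman’s rank correlation, which can be sketched as below. This is an illustrative stdlib-only version, not the authors' analysis code: the helper names and the example values are hypothetical, and ties are handled with average ranks, which matters because pathology grades (0–4) repeat.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for one brain region across five subjects.
region_probability = [0.12, 0.35, 0.41, 0.78, 0.90]
pathology_grade = [0, 1, 1, 3, 4]
print(spearman(region_probability, pathology_grade))
```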
Figure 5. Performance of the MLP model for Alzheimer’s disease classification and model comparison with neurologists. (A) Sensitivity-specificity and precision-recall curves calculated on the ADNI test set; the sensitivity-specificity curve plots sensitivity (the true positive rate) against specificity (the true negative rate). Individual neurologist performance is indicated by the red plus symbol, and averaged neurologist performance with error bars is indicated by the green plus symbol, on both the sensitivity-specificity and precision-recall curves on the ADNI test data. A visual description of pairwise Cohen’s kappa (κ), which denotes the inter-operator agreement between all 11 neurologists, is also shown. (B) Sensitivity-specificity and precision-recall curves calculated on the AIBL, FHS and NACC datasets, respectively. In all cases, model A is the MLP model that used MRI data as the sole input, model B is the MLP model with non-imaging features as input, and model C is the MLP model that used MRI data along with age, gender and MMSE values as inputs for binary classification.
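The pairwise Cohen’s kappa used to summarize agreement between neurologists can be sketched as below. This is a minimal illustration under stated assumptions: the rater names and their binary ratings (1 = Alzheimer’s disease, 0 = normal cognition) are hypothetical, and the functions are not taken from the study's code.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappa(ratings):
    """Return kappa for every unordered pair of raters."""
    return {
        (r1, r2): cohen_kappa(ratings[r1], ratings[r2])
        for r1, r2 in combinations(sorted(ratings), 2)
    }


# Hypothetical ratings from three raters on six cases.
ratings = {
    "rater_1": [1, 1, 0, 0, 1, 0],
    "rater_2": [1, 0, 0, 0, 1, 0],
    "rater_3": [1, 1, 0, 1, 1, 0],
}
for pair, kappa in pairwise_kappa(ratings).items():
    print(pair, round(kappa, 3))
```

With 11 raters this yields 55 pairs, which is why a matrix or heat map is a natural way to display the result.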
Performance of the deep learning models
| Model and dataset | Accuracy | Sensitivity | Specificity | F1-score | MCC |
|---|---|---|---|---|---|
| MRI model | | | | | |
| ADNI test | 0.834 ± 0.020 | 0.767 ± 0.036 | 0.889 ± 0.030 | 0.806 ± 0.024 | 0.666 ± 0.042 |
| AIBL | 0.870 ± 0.022 | 0.594 ± 0.119 | 0.924 ± 0.025 | 0.593 ± 0.088 | 0.520 ± 0.095 |
| FHS | 0.766 ± 0.064 | 0.901 ± 0.096 | 0.712 ± 0.123 | 0.692 ± 0.044 | 0.571 ± 0.056 |
| NACC | 0.818 ± 0.033 | 0.764 ± 0.031 | 0.849 ± 0.052 | 0.757 ± 0.033 | 0.613 ± 0.059 |
| Non-imaging model | | | | | |
| ADNI test | 0.957 ± 0.010 | 0.924 ± 0.019 | 0.983 ± 0.032 | 0.951 ± 0.010 | 0.915 ± 0.020 |
| AIBL | 0.915 ± 0.022 | 0.872 ± 0.037 | 0.923 ± 0.034 | 0.772 ± 0.035 | 0.731 ± 0.035 |
| FHS | 0.760 ± 0.042 | 0.517 ± 0.043 | 0.842 ± 0.068 | 0.512 ± 0.026 | 0.367 ± 0.053 |
| NACC | 0.854 ± 0.021 | 0.881 ± 0.013 | 0.838 ± 0.041 | 0.817 ± 0.019 | 0.703 ± 0.033 |
| Fusion model | | | | | |
| ADNI test | 0.968 ± 0.014 | 0.957 ± 0.014 | 0.977 ± 0.031 | 0.965 ± 0.014 | 0.937 ± 0.026 |
| AIBL | 0.932 ± 0.031 | 0.877 ± 0.032 | 0.943 ± 0.042 | 0.814 ± 0.054 | 0.780 ± 0.059 |
| FHS | 0.792 ± 0.039 | 0.742 ± 0.185 | 0.808 ± 0.082 | 0.633 ± 0.076 | 0.517 ± 0.098 |
| NACC | 0.852 ± 0.037 | 0.924 ± 0.025 | 0.810 ± 0.068 | 0.824 ± 0.032 | 0.714 ± 0.053 |
Three models were constructed for explicit performance comparison. The MRI model predicted Alzheimer’s disease status based upon imaging features derived from the patch-wise trained FCN. The non-imaging model consisted of an MLP that processed non-imaging clinical variables (age, gender, MMSE). The fusion model combined the imaging features used by the MRI model with the clinical variables used by the non-imaging model to form a multimodal imaging/non-imaging input to the MLP. Accuracy, sensitivity, specificity, F1-score and Matthews correlation coefficient (MCC) are reported for each. The fusion model was found to outperform the other models in nearly all metrics in each of the four datasets. Of interest, however, the MRI model and the non-imaging model each still displayed higher specificity and sensitivity than many of the human neurologists, all of whom used the full suite of available data sources to arrive at an impression.
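The five metrics reported in the table all follow from the counts of a binary confusion matrix, as the sketch below shows. The function name and the example counts are hypothetical and are not drawn from the study; treating Alzheimer’s disease as the positive class is an assumption.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, F1-score and MCC from counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the classes are imbalanced, as they are in the AIBL and FHS cohorts.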
Figure 6. Visualization of data. (A) Voxel-level MRI intensity values from all four datasets (ADNI, AIBL, FHS and NACC) were used as inputs and a two-dimensional plot was generated using t-SNE, a method for visualizing high-dimensional data. The colour in the plot represents the site; the digit ‘0’ represents cases with normal cognition (NC) and the digit ‘1’ represents cases with confirmed Alzheimer’s disease (AD). (B) This t-SNE plot was generated using only the ADNI dataset, with colour representing the scanner. The digit ‘0’ was used for normal cognition cases and ‘1’ for Alzheimer’s disease cases. (C) FCN-based outputs that served as input features to the MLP model were embedded in a two-dimensional plot generated using t-SNE for the two classes (Alzheimer’s disease and normal cognition). Colour (blue versus red) distinguishes normal cognition from Alzheimer’s disease cases, whereas a unique symbol shape represents individuals derived from the same cohort. Several individual cases that were clinically confirmed to have Alzheimer’s disease or normal cognition are also shown (indicated by a black circle overlying the respective datapoint). The plot also indicates co-localization of subjects in the feature space based on disease state rather than dataset of origin.