Martin Dyrba, Moritz Hanzig, Slawek Altenstein, Sebastian Bader, Tommaso Ballarini, Frederic Brosseron, Katharina Buerger, Daniel Cantré, Peter Dechent, Laura Dobisch, Emrah Düzel, Michael Ewers, Klaus Fliessbach, Wenzel Glanz, John-Dylan Haynes, Michael T Heneka, Daniel Janowitz, Deniz B Keles, Ingo Kilimann, Christoph Laske, Franziska Maier, Coraline D Metzger, Matthias H Munk, Robert Perneczky, Oliver Peters, Lukas Preis, Josef Priller, Boris Rauchmann, Nina Roy, Klaus Scheffler, Anja Schneider, Björn H Schott, Annika Spottke, Eike J Spruth, Marc-André Weber, Birgit Ertl-Wagner, Michael Wagner, Jens Wiltfang, Frank Jessen, Stefan J Teipel.
Abstract
BACKGROUND: Although convolutional neural networks (CNNs) achieve high diagnostic accuracy for detecting Alzheimer's disease (AD) dementia based on magnetic resonance imaging (MRI) scans, they are not yet applied in clinical routine. One important reason for this is a lack of model comprehensibility. Recently developed visualization methods for deriving CNN relevance maps may help to fill this gap as they allow the visualization of key input image features that drive the decision of the model. We investigated whether models with higher accuracy also rely more on discriminative brain regions predefined by prior knowledge.
Keywords: Alzheimer’s disease; Convolutional neural network; Deep learning; Layer-wise relevance propagation; MRI
Year: 2021 PMID: 34814936 PMCID: PMC8611898 DOI: 10.1186/s13195-021-00924-2
Source DB: PubMed Journal: Alzheimers Res Ther Impact factor: 6.982
Overview of previous studies applying neural networks for the detection of AD and MCI
| Study (chronologic order) | Data type | Sample: AD | Sample: MCI | Sample: CN | Algorithm | Groups | Accuracy | Balanced accuracy | AUC | Addressed model comprehensibility |
|---|---|---|---|---|---|---|---|---|---|---|
| Suk et al. | MRI GM and FDG-PET | 93 | 76/128 | 101 | RBM on class discriminative patches selected by statistical significance tests | AD/CN; MCI/CN; MCIc/MCInc | 95.4%; 85.7%; 74.6% | 94.9%; 80.6%; 71.6% | 0.988; 0.881; 0.747 | Visualization of selected features (image patches) and RBM model weights projected on MRI scan |
| Li et al. | MRI and FDG-PET | 51 | 43/56 | 52 | RBM for feature learning, SVM for classification | AD/CN; MCI/CN; MCIc/MCInc | 91.4%; 77.4%; 57.4% | | | No |
| Ortiz et al. | MRI GM and FDG-PET | 70 | 39/64 | 68 | RBM for feature learning, SVM for classification | AD/CN; MCIc/CN; MCIc/MCInc | 90%; 83%; 78% | | 0.95; 0.95; 0.82 | Visualization of SVM model weights projected on MRI scan |
| Aderghal et al. | MRI and DTI | 188 | 339 | 228 | CNN for hippocampus region of interest only | AD/CN; MCI/CN | 92.5%; 80.0% | 92.5%; 82.9% | | No |
| Liu et al. | FDG-PET | 93 | 146 | 100 | CNN and RNN | AD/CN; MCI/CN | 91.2%; 78.9% | | 0.953; 0.839 | Visualization of most contributing brain areas obtained from occlusion sensitivity analysis |
| Liu et al. | MRI | 199 | – | 229 | CNN on landmarks selected by statistical significance tests | AD/CN; MCIc/CN | 90.6% | | 0.957 | Visualization of top 50 anatomical landmarks used as input for the CNN |
| Lin et al. | MRI | 188 | 169/193 | 229 | CNN | AD/CN; MCIc/MCInc | 88.8%; 79.9% | | 0.861 | No |
| Böhle et al. | MRI | 211 | – | 169 | CNN | AD/CN | 88.0% | | | Visualization of LRP relevance and guided backpropagation maps, comparison of LRP relevance scores by group and brain region |
| Li et al. | MRI | Training 192; Test 225 | 383; 479 | 228; 639 | CNN for hippocampus only | AD/CN; MCIc/MCInc | 92.9% | | 0.958; 0.891 | Visualization of most contributing hippocampus areas obtained from CNN class activation mapping |
| Dyrba et al. | MRI | 189 | 219 | 254 | CNN for coronal slices covering hippocampus | AD/CN; MCI/CN | | | 0.93; 0.75 | Visualization of LRP and other methods’ relevance maps and comparison by diagnostic group |
| Lian et al. | MRI | Training 199; Test 159 | 167/226; 38/239 | 229; 200 | CNN | AD/CN; MCIc/MCInc | 90.3%; 80.9% | | 0.951; 0.781 | Visualization of most contributing image areas obtained from CNN class activation mapping |
| Qiu et al. | MRI | Training 188; Test1 62; Test2 29; Test3 209 | –; –; –; – | 229; 320; 73; 356 | FCN | AD/CN1; AD/CN2; AD/CN3 | 87.0%; 76.6%; 81.8% | | 0.870; 0.892; 0.881 | Visualization of most contributing brain areas obtained from occlusion sensitivity analysis |
| Wen et al. | MRI | Training 336; Test1 76; Test2 78 | 295/298; 20/13; – | 330; 429; 76 | CNN | AD/CN1; MCIc/MCInc1; AD/CN2 | 86%; 50%; 70% | | | No |
| Thibeau-Sutre et al. | MRI | Training 336; Test 76 | –; – | 330; 429 | CNN | AD/CN | 90% | | | Visualization of most contributing brain areas obtained from occlusion sensitivity analysis |
| Jo et al. | Tau-PET | 66 | – | 66 | CNN | AD/CN | 90.8% | | | Visualization of LRP relevance maps, visualization of most contributing brain areas obtained from occlusion sensitivity analysis |
Empty cells in the performance columns indicate that the respective values were not reported
AD, Alzheimer’s dementia; MCI, mild cognitive impairment; MCIc, MCI converted to dementia; MCInc, non-converter/stable MCI; CN, cognitively normal controls; DTI, diffusion tensor imaging; FCN, fully connected network; RBM, restricted Boltzmann machine; RNN, recurrent neural network; CNN, convolutional neural network; MRI, T1-weighted magnetic resonance imaging; GM, gray matter volume; FDG-PET, glucose metabolism derived from fluorodeoxyglucose positron emission tomography
Summary of sample characteristics
| Sample | CN | MCI | AD |
|---|---|---|---|
| ADNI-GO/2 | | | |
| Sample size (female) | 254 (130) | 220 (93) | 189 (80) |
| Age, mean (SD) | 75.4 (6.6) | 74.1 (8.1) | 75.0 (8.0) |
| Education, mean (SD) | 16.4 (2.7) | 16.2 (2.8) | 15.9 (2.7) |
| MMSE, mean (SD) | 29.1 (1.2) | 27.6 (1.9) | 22.6 (3.2) |
| RAVLT Delayed recall, mean (SD) | 7.6 (4.1) | 3.2 (3.7) | 0.8 (1.9) |
| WMS-LM Delayed recall, mean (SD) | 13.9 (3.7) | 5.1 (3.8) | 1.5 (2.1) |
| Hippocampus volume, mean (SD) | 6235 (756) | 5619 (963) | 4834 (930) |
| Amyloid status (neg/pos) | 177/77 | 79/141 | 28/161 |
| MRI field strength (1.5T/3T) | 71/183 | 49/171 | 35/154 |
| ADNI-3 | | | |
| Sample size (female) | 326 (211) | 187 (85) | 62 (27) |
| Age, mean (SD) | 70.0 (7.5) | 72.2 (7.5) | 74.8 (7.7) |
| Education, mean (SD) | 16.6 (2.2) | 16.6 (2.5) | 16.5 (2.4) |
| MMSE, mean (SD) | 29.1 (1.1) | 27.8 (2.0) | 23.1 (3.3) |
| RAVLT Delayed recall, mean (SD) | 8.3 (4.4) | 4.7 (4.7) | 0.3 (0.9) |
| WMS-LM Delayed recall, mean (SD) | 13.0 (3.5) | 7.2 (3.9) | 2.0 (2.8) |
| Hippocampus volume, mean (SD) | 6583 (649) | 6112 (902) | 4839 (978) |
| Amyloid status (neg/pos) | 75/39 | 19/27 | 3/17 |
| MRI field strength (1.5T/3T) | 0/326 | 0/187 | 0/62 |
| AIBL | | | |
| Sample size (female) | 448 (260) | 96 (46) | 62 (36) |
| Age, mean (SD) | 72.4 (6.2) | 74.3 (6.9) | 73.2 (7.3) |
| MMSE, mean (SD) | 28.7 (1.2) | 27.0 (2.2) | 21.2 (5.3) |
| WMS-LM Delayed recall, mean (SD) | 11.2 (4.3) | 4.9 (4.0) | 1.0 (1.9) |
| Hippocampus volume, mean (SD) | 6362 (704) | 5712 (1028) | 4940 (1055) |
| Amyloid status (neg/pos) | 316/101 | 34/54 | 6/53 |
| MRI field strength (1.5T/3T) | 55/393 | 7/89 | 2/60 |
| DELCODE | | | |
| Sample size (female) | 215 (124) | 155 (72) | 104 (61) |
| Age, mean (SD) | 69.5 (5.5) | 73.0 (5.7) | 75.2 (6.2) |
| Education, mean (SD) | 14.7 (2.7) | 14.0 (3.1) | 12.9 (3.1) |
| MMSE, mean (SD) | 29.5 (0.8) | 27.8 (2.0) | 23.1 (3.2) |
| WMS-LM Delayed recall, mean (SD) | 14.3 (3.6) | 7.4 (5.2) | 1.8 (2.8) |
| Hippocampus volume, mean (SD) | 6543 (679) | 5665 (950) | 4610 (944) |
| Amyloid status (neg/pos) | 58/28 | 30/57 | 5/49 |
| MRI field strength (1.5T/3T) | 0/215 | 0/155 | 0/104 |
Numbers indicate mean and standard deviation (SD) if not indicated otherwise. Years of education were not available for the AIBL dataset. RAVLT Delayed recall scores were not available for the AIBL and DELCODE samples
CN, cognitively normal controls; MCI, amnestic mild cognitive impairment; AD, Alzheimer’s dementia; SD, standard deviation; MMSE, Mini Mental State Examination; RAVLT, Rey Auditory Verbal Learning Test; WMS-LM, Wechsler Memory Scale Logical Memory Test; MRI, magnetic resonance imaging
Fig. 1 Data flow chart and convolutional neural network structure
Fig. 2 Web application to interactively examine the neural network relevance maps for individual MRI scans
Group separation performance for hippocampus volume and the convolutional neural network models
| Sample | Hippocampus volume (residuals): balanced accuracy (mean ± SD) | Hippocampus volume (residuals): AUC (mean ± SD) | 3D convolutional neural network: balanced accuracy (mean ± SD) | 3D convolutional neural network: AUC (mean ± SD) |
|---|---|---|---|---|
| ADNI-GO/2 | | | | |
| MCI vs. CN | (70.0% ± 6.8%) | (0.773 ± 0.091) | | |
| AD vs. CN | (84.4% ± 3.6%) | (0.945 ± 0.024) | | |
| MCI+ vs. CN− | (75.6% ± 7.1%) | (0.831 ± 0.080) | | |
| AD+ vs. CN− | (86.2% ± 4.2%) | (0.954 ± 0.025) | | |
| ADNI-3 | | | | |
| MCI vs. CN | 62.8% (63.1% ± 1.4%) | 0.683 | 63.1% (63.6% ± 1.5%) | 0.684 (0.677 ± 0.020) |
| AD vs. CN | 83.4% (83.4% ± 0.4%) | 0.917 | 84.4% (81.7% ± 2.9%) | 0.913 (0.899 ± 0.013) |
| MCI+ vs. CN− | 69.1% (69.2% ± 2.7%) | 0.791 | 69.8% (68.3% ± 4.4%) | 0.810 (0.742 ± 0.024) |
| AD+ vs. CN− | 83.6% (82.0% ± 1.8%) | 0.882 | 80.2% (75.5% ± 4.2%) | 0.830 (0.828 ± 0.028) |
| AIBL | | | | |
| MCI vs. CN | 67.4% (67.6% ± 0.5%) | 0.741 | 68.2% (67.3% ± 2.7%) | 0.763 (0.749 ± 0.012) |
| AD vs. CN | 84.1% (85.3% ± 1.5%) | 0.927 | 85.0% (82.3% ± 3.0%) | 0.950 (0.926 ± 0.007) |
| MCI+ vs. CN− | 78.5% (78.8% ± 0.9%) | 0.874 | 75.4% (73.6% ± 3.1%) | 0.828 (0.814 ± 0.022) |
| AD+ vs. CN− | 87.2% (89.1% ± 2.4%) | 0.976 | 88.3% (85.3% ± 3.3%) | 0.978 (0.958 ± 0.011) |
| DELCODE | | | | |
| MCI vs. CN | 69.0% (69.0% ± 9.6%) | 0.774 | 71.0% (69.7% ± 2.6%) | 0.775 (0.772 ± 0.017) |
| AD vs. CN | 88.4% (86.4% ± 3.0%) | 0.943 | 85.5% (80.5% ± 4.0%) | 0.953 (0.938 ± 0.013) |
| MCI+ vs. CN− | 77.4% (77.8% ± 0.7%) | 0.867 | 72.2% (74.9% ± 3.5%) | 0.840 (0.830 ± 0.017) |
| AD+ vs. CN− | 88.2% (87.6% ± 1.8%) | 0.954 | 83.3% (82.2% ± 4.0%) | 0.968 (0.956 ± 0.012) |
Reported values are for the single model trained on the whole ADNI-GO/2 dataset. In parentheses, the mean values and standard deviation for the ten models trained in the tenfold cross-validation procedure are provided to indicate the variability of the measures. Values for the ADNI-GO/2 sample (in italics) may be biased as the respective test subsamples were used to determine the optimal model during training. We still report them for better comparison of the model performance across samples.
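For readers less familiar with the measures in this table, the following is a minimal sketch of how a balanced accuracy and a "mean ± SD" summary over cross-validation folds can be computed. It assumes binary labels (1 = patient, 0 = control) and the usual definition of balanced accuracy as the mean of sensitivity and specificity. The function names and toy data are hypothetical and not taken from the study's code, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = patient, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / n_pos + tn / n_neg)


def summarize_folds(fold_scores):
    """Return (mean, sample SD) of per-fold scores; the sample SD is an assumption."""
    return mean(fold_scores), stdev(fold_scores)


# Toy data (hypothetical, not from the study): 4 patients, 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # sensitivity 3/4, specificity 5/6
```

Unlike plain accuracy, this measure is not inflated by unequal group sizes, which matters for samples such as AIBL where controls far outnumber patients.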
Fig. 3 Example relevance maps obtained for different people. Top row: Alzheimer’s dementia patients, middle row: patients with mild cognitive impairment, bottom row: cognitively normal controls
Fig. 4 Mean relevance maps for Alzheimer’s dementia patients (top row), patients with mild cognitive impairment (middle row), and healthy controls (bottom row) for the DELCODE validation sample. Relevance maps thresholded at 0.2 for better comparison
Fig. 5 Results from the occlusion sensitivity analysis. A gray matter volume loss of 50% was simulated in a cube of 30-mm edge length. Each voxel encodes the derived values when centering the cube at that position. Top: probability of AD for the areas with simulated atrophy. Bottom: total sum of image relevance depending on simulated atrophy. Numbers indicate the y-axis slice coordinates in MNI reference space
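The procedure described in the Fig. 5 caption can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation. The cube edge is given in voxels (`edge_vox`) because the voxel size needed to convert 30 mm is not stated here. `predict` is a hypothetical callable that maps a 3D array to a scalar, such as the model's AD probability or the total sum of relevance. The toy volume and stand-in model are invented for demonstration.

```python
import numpy as np


def simulate_atrophy(volume, center, edge_vox, loss=0.5):
    """Return a float copy of `volume` scaled by (1 - loss) inside a cube.

    The cube spans 2 * (edge_vox // 2) + 1 voxels per axis so that it is
    centered on `center`; it is clipped at the volume borders.
    """
    out = np.array(volume, dtype=float)  # copy, so the input stays untouched
    half = edge_vox // 2
    slices = tuple(
        slice(max(c - half, 0), min(c + half + 1, n))
        for c, n in zip(center, out.shape)
    )
    out[slices] *= 1.0 - loss
    return out


def occlusion_map(volume, predict, edge_vox, stride=1, loss=0.5):
    """Store the model output for each cube center; skipped positions stay NaN."""
    result = np.full(volume.shape, np.nan)
    for x in range(0, volume.shape[0], stride):
        for y in range(0, volume.shape[1], stride):
            for z in range(0, volume.shape[2], stride):
                occluded = simulate_atrophy(volume, (x, y, z), edge_vox, loss)
                result[x, y, z] = predict(occluded)
    return result


def toy_predict(v):
    """Stand-in model (hypothetical): output rises as mean gray matter falls."""
    return 1.0 - float(v.mean())


# Toy example: uniform "gray matter" volume, 3-voxel cube, 50% simulated loss.
volume = np.ones((6, 6, 6))
occ = occlusion_map(volume, toy_predict, edge_vox=3)
```

Each voxel of `occ` holds the output obtained with the cube centered there, matching the caption's description. With a real network, a stride larger than 1 keeps the number of forward passes manageable.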
Fig. 6 Scatter plot and correlation of bilateral hippocampus volume and neural network relevance scores for the hippocampus region for the DELCODE sample (r = −0.87, p < 0.001)
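The r value in the Fig. 6 caption is a correlation coefficient. Assuming it is Pearson's r, a minimal sketch of the computation is given below. The function name and the toy volume and relevance values are hypothetical and are not data from the study. The sketch does not compute the p value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Toy values (hypothetical): smaller hippocampus volume, higher relevance score.
volumes = [4000.0, 5000.0, 6000.0, 7000.0]
relevance = [0.9, 0.7, 0.4, 0.2]
r = pearson_r(volumes, relevance)
```

A strongly negative r, as in the caption, means that scans with smaller hippocampus volume tend to receive higher relevance scores in that region.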