Jong Bin Bae, Subin Lee, Wonmo Jung, Sejin Park, Weonjin Kim, Hyunwoo Oh, Ji Won Han, Grace Eun Kim, Jun Sung Kim, Jae Hyoung Kim, Ki Woong Kim.
Abstract
The classification of Alzheimer's disease (AD) using deep learning methods has shown promising results, but successful application in clinical settings requires a combination of high accuracy, short processing time, and generalizability to various populations. In this study, we developed a convolutional neural network (CNN)-based AD classification algorithm using magnetic resonance imaging (MRI) scans from AD patients and age/gender-matched cognitively normal controls from two populations that differ in ethnicity and education level. These populations come from the Seoul National University Bundang Hospital (SNUBH) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). For each population, we trained CNNs on five subsets using coronal slices of T1-weighted images that cover the medial temporal lobe. We evaluated the models on validation subsets from both the same population (within-dataset validation) and the other population (between-dataset validation). Our models achieved average areas under the curve of 0.91-0.94 for within-dataset validation and 0.88-0.89 for between-dataset validation. The mean processing time per person was 23-24 s. The within-dataset and between-dataset performances were comparable between the ADNI-derived and SNUBH-derived models. These results demonstrate the generalizability of our models to patients of different ethnicities and education levels, as well as their potential for deployment as fast and accurate diagnostic support tools for AD.
Year: 2020 PMID: 33335244 PMCID: PMC7746752 DOI: 10.1038/s41598-020-79243-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Within-dataset testing of AD classification algorithms.
| Trial | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| ADNI | | | | |
| 1st trial^a | 0.90 (0.81–0.95) | 0.85 (0.77–0.93) | 0.83 (0.67–0.93) | 0.90 (0.75–0.97) |
| 2nd trial^a | 0.97 (0.90–1.00) | 0.91 (0.85–0.97) | 0.85 (0.68–0.95) | 0.96 (0.85–1.00) |
| 3rd trial^a | 0.95 (0.88–0.99) | 0.92 (0.86–0.98) | 0.94 (0.81–0.99) | 0.91 (0.77–0.97) |
| 4th trial^a | 0.95 (0.87–0.99) | 0.89 (0.81–0.96) | 0.97 (0.85–1.00) | 0.82 (0.67–0.92) |
| 5th trial^a | 0.95 (0.87–0.99) | 0.89 (0.81–0.96) | 0.81 (0.65–0.92) | 0.95 (0.84–0.99) |
| Mean (SD) | 0.94 (0.03) | 0.89 (0.03) | 0.88 (0.07) | 0.91 (0.06) |
| SNUBH | | | | |
| 1st trial^a | 0.94 (0.87–0.98) | 0.92 (0.86–0.98) | 0.92 (0.79–0.98) | 0.93 (0.80–0.98) |
| 2nd trial^a | 0.88 (0.79–0.94) | 0.82 (0.74–0.91) | 0.79 (0.64–0.89) | 0.87 (0.70–0.96) |
| 3rd trial^a | 0.87 (0.77–0.94) | 0.85 (0.77–0.93) | 0.83 (0.66–0.93) | 0.86 (0.72–0.95) |
| 4th trial^a | 0.90 (0.81–0.96) | 0.87 (0.80–0.95) | 0.84 (0.69–0.93) | 0.91 (0.77–0.98) |
| 5th trial^a | 0.94 (0.86–0.98) | 0.94 (0.88–0.99) | 0.88 (0.73–0.97) | 0.98 (0.77–0.98) |
| Mean (SD) | 0.91 (0.03) | 0.88 (0.05) | 0.85 (0.05) | 0.91 (0.05) |
| ADNI vs. SNUBH^b | | | | |
| t | 1.93 | 0.40 | 0.72 | −0.14 |
| p | 0.09 | 0.70 | 0.49 | 0.89 |
AD classification algorithms were developed by randomly selecting 80% of the participants (156 AD patients and 156 CN controls) in each dataset (ADNI and SNUBH) and tested within each dataset on the remaining 20% of the participants (39 AD patients and 39 CN controls).
AUC area under the receiver operating characteristic curve, ADNI dataset from the Alzheimer’s Disease Neuroimaging Initiative, SNUBH dataset from the Seoul National University Bundang Hospital, SD standard deviation.
^a 95% confidence intervals in parentheses.
^b Comparison of performances on the ADNI and SNUBH datasets using Student’s t-test.
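The summary rows of the within-dataset table can be approximately reproduced from the per-trial values. The following is a minimal sketch, not the authors' analysis code: it assumes the first block of five trials is ADNI and the second is SNUBH, that the reported SD is the sample standard deviation, and that the Student's t-test uses a pooled variance. Because the inputs are the two-decimal values printed in the table, the t statistic only approximates the reported 1.93.

```python
from math import sqrt
from statistics import mean, stdev

# Per-trial AUCs as printed (rounded to two decimals) in the within-dataset table.
# Assumption: first block of trials = ADNI, second block = SNUBH.
adni_auc = [0.90, 0.97, 0.95, 0.95, 0.95]
snubh_auc = [0.94, 0.88, 0.87, 0.90, 0.94]


def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


t_auc = student_t(adni_auc, snubh_auc)
dof = len(adni_auc) + len(snubh_auc) - 2

print(f"ADNI  mean (SD): {mean(adni_auc):.2f} ({stdev(adni_auc):.2f})")
print(f"SNUBH mean (SD): {mean(snubh_auc):.2f} ({stdev(snubh_auc):.2f})")
print(f"t = {t_auc:.2f} with {dof} degrees of freedom")
```

The recomputed means and SDs round to the tabulated 0.94 (0.03) and 0.91 (0.03); the t statistic comes out near 2, slightly above the reported value, which is expected when working from rounded inputs.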
Between-dataset testing of AD classification algorithms.
| Trial | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| ADNI | | | | |
| 1st trial^a | 0.87 (0.84–0.90) | 0.82 (0.78–0.86) | 0.80 (0.74–0.85) | 0.84 (0.78–0.89) |
| 2nd trial^a | 0.88 (0.85–0.91) | 0.83 (0.79–0.87) | 0.77 (0.70–0.83) | 0.89 (0.83–0.93) |
| 3rd trial^a | 0.88 (0.84–0.91) | 0.83 (0.79–0.87) | 0.74 (0.67–0.80) | 0.92 (0.88–0.96) |
| 4th trial^a | 0.89 (0.86–0.92) | 0.84 (0.80–0.87) | 0.74 (0.68–0.80) | 0.93 (0.88–0.96) |
| 5th trial^a | 0.86 (0.82–0.89) | 0.81 (0.77–0.85) | 0.76 (0.69–0.82) | 0.86 (0.80–0.90) |
| Mean (SD) | 0.88 (0.01) | 0.83 (0.01) | 0.76 (0.03) | 0.89 (0.04) |
| SNUBH | | | | |
| 1st trial^a | 0.90 (0.87–0.93) | 0.83 (0.79–0.86) | 0.85 (0.79–0.90) | 0.80 (0.74–0.85) |
| 2nd trial^a | 0.88 (0.84–0.91) | 0.81 (0.77–0.85) | 0.72 (0.65–0.78) | 0.90 (0.85–0.94) |
| 3rd trial^a | 0.90 (0.86–0.92) | 0.82 (0.79–0.86) | 0.79 (0.73–0.85) | 0.86 (0.80–0.90) |
| 4th trial^a | 0.89 (0.85–0.92) | 0.83 (0.79–0.87) | 0.82 (0.75–0.87) | 0.84 (0.78–0.89) |
| 5th trial^a | 0.88 (0.84–0.91) | 0.80 (0.76–0.84) | 0.77 (0.71–0.83) | 0.83 (0.77–0.88) |
| Mean (SD) | 0.89 (0.01) | 0.82 (0.01) | 0.79 (0.05) | 0.85 (0.04) |
| ADNI vs. SNUBH^b | | | | |
| t | −1.53 | 1.01 | −0.79 | 1.64 |
| p | 0.17 | 0.34 | 0.45 | 0.14 |
| Within vs. between^c: ADNI | | | | |
| t | 5.52 | 5.57 | 2.56 | 0.51 |
| p | 0.005 | 0.005 | 0.06 | 0.64 |
| Within vs. between^c: SNUBH | | | | |
| t | 1.26 | 2.64 | 4.17 | 1.86 |
| p | 0.28 | 0.06 | 0.01 | 0.14 |
AD classification algorithms were developed by randomly selecting 80% of the participants (156 AD patients and 156 CN controls) in each dataset (ADNI and SNUBH) and tested within each dataset on the remaining 20% of the participants (39 AD patients and 39 CN controls).
AUC area under the receiver operating characteristic curve, ADNI dataset from the Alzheimer’s Disease Neuroimaging Initiative, SNUBH dataset from the Seoul National University Bundang Hospital, SD standard deviation.
^a 95% confidence intervals in parentheses.
^b Comparison of performances on the ADNI and SNUBH datasets using Student’s t-test.
^c Comparison of within-dataset and between-dataset performances using a paired t-test.
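The within-versus-between comparison can be illustrated with a paired t statistic on the per-trial AUCs. This is a sketch under stated assumptions rather than the authors' procedure: it assumes the first block of trials in each table belongs to ADNI, that trial k in the within-dataset table pairs with trial k in the between-dataset table, and it uses the rounded values as printed, so the result approximates rather than reproduces the reported 5.52.

```python
from math import sqrt
from statistics import mean, stdev

# Assumption: first block of trials in each table = ADNI, paired by trial number.
within_auc = [0.90, 0.97, 0.95, 0.95, 0.95]   # within-dataset AUCs, as printed
between_auc = [0.87, 0.88, 0.88, 0.89, 0.86]  # between-dataset AUCs, as printed


def paired_t(x, y):
    """Paired t statistic: mean of the pairwise differences over its standard error."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


t_paired = paired_t(within_auc, between_auc)
print(f"paired t = {t_paired:.2f} with {len(within_auc) - 1} degrees of freedom")
```

With only four degrees of freedom, a t statistic of this size is consistent with the small p value reported for the ADNI AUC comparison.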
Characteristics of participants.
| Characteristic | AD: ADNI | AD: SNUBH | AD: t or χ² | AD: p | CN: ADNI | CN: SNUBH | CN: t or χ² | CN: p |
|---|---|---|---|---|---|---|---|---|
| N | 195 | 195 | – | – | 195 | 195 | – | – |
| Age (years, mean ± SD) | 74.7 ± 8.2 | 74.5 ± 8.7 | 0.2 | 0.84 | 74.8 ± 6.6 | 73.9 ± 6.3 | 1.4 | 0.17 |
| Sex (women, n (%)) | 91 (46.7%) | 91 (46.7%) | 0.0 | > 0.99 | 91 (46.7%) | 91 (46.7%) | 0.0 | > 0.99 |
| Education (years, mean ± SD) | 15.5 ± 2.9 | 10.2 ± 5.4 | 12.2 | < 0.001 | 16.0 ± 2.7 | 11.4 ± 4.8 | 11.6 | < 0.001 |
| CDR (score, mean ± SD) | 0.8 ± 0.3 | 0.8 ± 0.3 | 0.0 | > 0.99 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 | > 0.99 |
| SOB (score, mean ± SD) | 4.5 ± 1.6 | 4.3 ± 1.9 | 1.0 | 0.30 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 | > 0.99 |
| MMSE (score, mean ± SD) | 23.0 ± 2.3 | 18.4 ± 5.0 | 11.7 | < 0.001 | 29.1 ± 1.3 | 27.2 ± 2.4 | 9.5 | < 0.001 |
AD Alzheimer’s disease, CN cognitively normal, ADNI Alzheimer’s Disease Neuroimaging Initiative, SNUBH Seoul National University Bundang Hospital, SD standard deviation, CDR clinical dementia rating scale, SOB sum of box scores of CDR, MMSE Mini Mental State Examination.
Figure 1. Preprocessing for extracting 2D coronal slices of the medial temporal lobe (MTL) from complete 3D brain scans. The input whole-brain 3D T1-weighted MRI images are subjected to an initial rigid transformation to fit a template, followed by brain extraction (skull stripping). Next, a second rigid transformation is applied to fit the skull-stripped version of the template. Once the subject image is in the same space as the template, the range of slices that corresponds to the MTL in the template is used to extract coronal slices from the template-registered subject image.
Figure 2. Diagram of the network architecture. For each subject, each of the 30 coronal slices is fed into the model independently, and the results for the 30 slices are averaged to produce an AD probability for that subject. The first part of the model consists of the architecture of a pretrained network (Inception v4), and the last part of the model involves the addition of the subject’s age, sex, and slice location (a). The specific constituents of Inception v4 are shown (stem, Inception-A, Inception-B, Inception-C, Reduction-A, Reduction-B) (b).
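The slice-to-subject aggregation described in this caption can be sketched as a plain average. The function names, the example probabilities, and the 0.5 decision threshold below are illustrative assumptions; the caption states only that the results of the 30 slices are averaged into one AD probability per subject.

```python
from statistics import mean


def subject_probability(slice_probs, expected_slices=30):
    """Average per-slice AD probabilities into one subject-level probability."""
    if len(slice_probs) != expected_slices:
        raise ValueError(f"expected {expected_slices} slices, got {len(slice_probs)}")
    if any(not 0.0 <= p <= 1.0 for p in slice_probs):
        raise ValueError("slice probabilities must lie in [0, 1]")
    return mean(slice_probs)


def classify(probability, threshold=0.5):
    """Label a subject from the averaged probability (the threshold is an assumption)."""
    return "AD" if probability >= threshold else "CN"


# Hypothetical per-slice model outputs for one subject (not data from the study).
example_slices = [0.8] * 20 + [0.5] * 10
p_subject = subject_probability(example_slices)
print(f"subject-level AD probability: {p_subject:.2f} -> {classify(p_subject)}")
```

Averaging treats every slice as equally informative; a weighted scheme would be a different design and is not what the caption describes.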
Figure 3. Training of the coronal slice-based AD classification model. We performed stratified fivefold cross-validation so that class balance was preserved between the training set and the validation set. The averaged probabilities of the five models (models a to e) generated from cross-validation are then averaged as an ensemble to give the final results in the testing phase.
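A stratified fivefold split and the final ensemble average can be sketched with the standard library alone. This is an illustration under assumptions, not the authors' pipeline: it assumes cross-validation runs on the 156 AD and 156 CN development participants mentioned in the table notes, and the helper names, the random seed, and the five example probabilities are hypothetical.

```python
import random
from statistics import mean


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def ensemble_probability(model_probs):
    """Average the subject-level probabilities produced by the fold models."""
    return mean(model_probs)


labels = ["AD"] * 156 + ["CN"] * 156
folds = stratified_folds(labels)
for name, fold in zip("abcde", folds):
    n_ad = sum(labels[i] == "AD" for i in fold)
    print(f"model {name}: validation fold has {n_ad} AD and {len(fold) - n_ad} CN")

# Hypothetical outputs of models a to e for one test subject.
print(f"ensemble probability: {ensemble_probability([0.9, 0.8, 0.7, 0.6, 0.5]):.2f}")
```

Each fold serves once as the validation set while the other four train one model, which yields the five models whose outputs are averaged at test time.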