Hongyoon Choi1,2, Seunggyun Ha1,2, Hyung Jun Im1,3, Sun Ha Paek4, Dong Soo Lee1,2,5. 1. Department of Nuclear Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea. 2. Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea. 3. Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea. 4. Department of Neurosurgery, Seoul National University Hospital, Seoul, Republic of Korea. 5. Korea Brain Research Institute, Daegu, Republic of Korea.
Abstract
Dopaminergic degeneration is a pathologic hallmark of Parkinson's disease (PD), which can be assessed by dopamine transporter imaging such as FP-CIT SPECT. Until now, imaging has been routinely interpreted by human readers, although this can show interobserver variability and result in inconsistent diagnosis. In this study, we developed a deep learning-based FP-CIT SPECT interpretation system to refine the imaging diagnosis of Parkinson's disease. This system, trained on SPECT images of PD patients and normal controls, shows high classification accuracy comparable with experts' evaluation referring to quantification results. Its high accuracy was validated in an independent cohort composed of patients with PD and nonparkinsonian tremor. In addition, we showed that some patients clinically diagnosed as PD who have scans without evidence of dopaminergic deficit (SWEDD), an atypical subgroup of PD, could be reclassified by our automated system. Our results suggest that the deep learning-based model can accurately interpret FP-CIT SPECT and overcome the variability of human evaluation. It could help imaging diagnosis of patients with uncertain Parkinsonism and provide objective patient group classification, particularly for SWEDD, in further clinical studies.
Keywords:
Deep learning; Deep neural network; FP-CIT; Parkinson's disease; SWEDD
Dopamine transporter (DAT) imaging such as 123I-fluoropropyl-carbomethoxyiodophenylnortropane (FP-CIT) single-photon emission computed tomography (SPECT) is one of the established tools for the diagnosis of Parkinson's disease (PD) (de la Fuente-Fernandez, 2012). In the clinical setting, visual analysis of FP-CIT SPECT has been routinely performed for determining whether a subject has dopaminergic degeneration. Currently, visual analysis combined with striatal DAT quantification is regarded as a standard practice in clinical studies (Albert et al., 2016). However, visual analysis is suboptimal because it is subject to interobserver variability (McKeith et al., 2007, Papathanasiou et al., 2012, Tondeur et al., 2010).
The main indication of FP-CIT SPECT is differentiating patients with mild or uncertain Parkinsonism (Marshall and Grosset, 2003). However, because of uncertainty in PD classification and DAT imaging interpretation, an atypical subgroup among PD patients has been consistently identified: scans without evidence of dopaminergic deficit (SWEDD). The term SWEDD refers to the absence of imaging abnormality in patients who are clinically diagnosed as PD. SWEDD patients account for approximately 10–15% of clinically diagnosed PD patients (Parkinson Study Group, 2000, Marek et al., 2014, Parkinson Study Group, 2002). There is growing evidence that SWEDD is different from typical PD in terms of pathophysiology and prognosis (Fahn et al., 2004, Schwingenschuh et al., 2010). However, the determination of SWEDD is often inconsistent because it relies on visual interpretation of DAT imaging, which has high sensitivity (98%) but low specificity (67%) in early PD (de la Fuente-Fernandez, 2012).
In this study, we aimed to develop an automated FP-CIT SPECT interpretation system based on deep learning for objective diagnosis. Recent development of deep learning is changing a variety of scientific and industrial fields (LeCun et al., 2015).
Deep convolutional neural networks (CNN), a type of deep learning, have dramatically improved the performance in image classification and detection (Krizhevsky et al., 2012, LeCun et al., 2015). Recently, deep learning techniques have started to be applied to medical images for segmentation, lesion detection, and disease classification (Choi and Jin, 2016, Ithapu et al., 2015, Moeskops et al., 2016, Pereira et al., 2016, Shen et al., 2015, Wong and Bressler, 2016, Zhang et al., 2015). Our objective in terms of clinical application was to discriminate PD among patients with uncertain Parkinsonism. In this study, the system was developed using the Parkinson's Progression Markers Initiative (PPMI) database. It was further validated in an independent dataset acquired from Seoul National University Hospital (SNUH) that consists of patients with PD and nonparkinsonian tremor.
Materials and methods
Subjects
Data used in the preparation of this article were obtained from two different cohorts, the PPMI database (www.ppmi-info.org/data) and the SNUH cohort. For up-to-date information on the PPMI study, visit www.ppmi-info.org. The subjects of the PPMI cohort in this study consisted of 431 patients with PD, 193 normal controls (NCs) and 77 patients with SWEDD. PD patients and NCs were divided into two datasets, a training/validation set and a test set, to develop the CNN and test its accuracy. The training/validation set consisted of 549 subjects (379 PD and 170 NCs). 75 subjects (52 PD and 23 NCs) were included in the PPMI test set to evaluate the accuracy of our framework. Training and test sets were randomly selected from the PPMI cohort. The two sets were divided so that the ratio between PD and NC was the same. The SNUH cohort was used as a test set independent of the training data. The SNUH cohort included 82 patients initially suspected of PD who underwent FP-CIT SPECT from Mar 2014 to Sep 2016. FP-CIT SPECT scans were acquired to determine the treatment plan and obtain an accurate diagnosis.
Informed consent to clinical testing and neuroimaging was obtained prior to participation in the PPMI cohort, as approved by the institutional review boards (IRB) of all participating institutions. The retrospective study using the SNUH cohort was approved by the IRB of our institute, and informed consent was waived due to the retrospective design. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Baseline diagnosis in the PPMI was made by clinical evaluation according to the UK PD Brain Bank criteria (Gibb and Lees, 1988). Patients with PD had had their clinical diagnosis for 2 years or less, and they were untreated. In addition, according to the PPMI diagnosis criteria, PD was diagnosed if a patient also had imaging evidence for dopaminergic deficits interpreted by the PPMI imaging core. Thus, the gold standard of our further analysis was the clinical diagnosis and the results of visual imaging interpretation determined by the imaging core consensus of PPMI. Patients with SWEDD were clinically PD patients, but they had no evidence of dopaminergic deficit in the imaging. Motor ratings were clinically assessed with the revised Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) part 3 at baseline.
Interpretation of FP-CIT SPECT from the SNUH cohort was initially determined by concurrence of image interpretation among 3 nuclear medicine physicians. Images were visually assessed and classified into two groups, patients with preserved and reduced DAT density. To reach a consensus on the imaging interpretation, readers referred to clinical symptoms and drug response according to the clinical follow-up. Accordingly, subjects of the SNUH cohort were divided into two groups, 72 PD patients and 10 patients with nonparkinsonian tremor.
FP-CIT SPECT images
Because of different SPECT systems in different centers, PPMI used a standardized imaging acquisition protocol. SPECT was performed at the screening visit. Prior to the injection of FP-CIT, subjects were pretreated with iodine solution for thyroid protection. Images were acquired within 4 ± 0.5 h after the radiotracer injection with a target dose of 111–185 MBq. SPECT data were acquired into a 128 × 128 matrix.
After the acquisition, the raw data were transferred to the PPMI imaging core and reconstructed using a hybrid ordered subset expectation maximization algorithm (Hermes Medical Solutions, Stockholm, Sweden). The subsequent processing was performed on PMOD (PMOD Technologies, Zurich, Switzerland). Attenuation correction was applied to the reconstructed data by Chang's correction. Spatial normalization into Montreal Neurological Institute (MNI) space was performed in PMOD using a template image based on a European multicenter database of healthy controls (Varrone et al., 2013). The dimension of the final preprocessed images was 91 × 109 × 91, and the voxel size was 2 × 2 × 2 mm³.
SPECT images of the SNUH dataset were acquired by a dedicated triple-head gamma camera (TRIONIX Triad XLT 3, Trionix Research Laboratory, Inc., Twinsburg, OH, USA) with a fan-beam collimator. Subjects were intravenously injected with 185 MBq of FP-CIT 3 h before image acquisition. Images were acquired with a 40-step step-and-shoot protocol at 45 s per step. Images were reconstructed as follows: 1) 128 × 128 matrices, 2) filtered back projection, 3) Butterworth filter with high cut frequency of 0.4 and roll-off degree of 5.0, 4) Chang's method for attenuation correction. Spatial normalization was performed in Statistical Parametric Mapping (SPM8, University College of London, London, UK) using an in-house template in MNI space. We checked whether the SPECT images normalized using SPM8 were aligned with those of the PPMI cohort normalized by PMOD. The final preprocessed images have the same dimensions and voxel size as those of the PPMI cohort.
Regional DAT binding ratio
Automated quantification of DAT binding ratio (BR) was performed for SPECT data as a conventional method for quantitative analysis. Each spatially normalized SPECT image was used to calculate regional BR. Target regions were putamen/caudate and occipital cortex. The automated anatomical labeling (AAL) template was used to segment the target regions of each SPECT image and mean counts were calculated. BR was defined as $\mathrm{BR} = (C_{\mathrm{target}} - C_{\mathrm{occipital}}) / C_{\mathrm{occipital}}$, where $C$ represents the mean counts of the region (putamen or caudate for the target). Note that the occipital cortex was regarded as the region of nonspecific binding.
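The binding ratio defined above can be illustrated with a minimal sketch. It assumes a flattened list of voxel counts and a parallel list of region labels; the function name `binding_ratio`, the label strings, and the toy counts are hypothetical and are not taken from the study's pipeline.

```python
from statistics import mean


def binding_ratio(voxels, labels, target_label, reference_label):
    """Return (C_target - C_reference) / C_reference from mean regional counts.

    voxels: flat sequence of voxel counts (hypothetical input layout)
    labels: region name for each voxel, e.g. derived from an atlas such as AAL
    """
    target = [v for v, lab in zip(voxels, labels) if lab == target_label]
    reference = [v for v, lab in zip(voxels, labels) if lab == reference_label]
    c_target = mean(target)        # mean counts in the specific-binding region
    c_reference = mean(reference)  # mean counts in the nonspecific region
    return (c_target - c_reference) / c_reference


# Toy data for illustration only (not real SPECT counts).
toy_voxels = [30.0, 34.0, 32.0, 10.0, 12.0, 8.0, 5.0]
toy_labels = ["putamen", "putamen", "putamen",
              "occipital", "occipital", "occipital", "background"]
toy_br = binding_ratio(toy_voxels, toy_labels, "putamen", "occipital")
```

With these toy counts the putamen mean is 32 and the occipital mean is 10, so the ratio is (32 − 10) / 10.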
Deep CNN architecture and training
We designed a deep CNN framework, PDNet, and the architecture is summarized in Fig. 1A. Input data were SPECT images downloaded from the PPMI database without further processing. Input voxel values were rescaled to the range from 0 to 255, and then the mean scalar value of each SPECT volume was subtracted. After this step, each 3D volume (91 × 109 × 91) was used as the input of PDNet.
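The intensity preprocessing described above might look like the following sketch. The text states only that values were rescaled to 0–255 and that the volume mean was subtracted; per-volume min-max scaling is an assumption made here, and the function name `preprocess` and the flattened-list input are hypothetical.

```python
def preprocess(voxels):
    """Rescale one volume to [0, 255], then subtract that volume's mean.

    Assumption: min-max scaling is computed per volume; the source does not
    specify how the 0-255 range was obtained.
    """
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        raise ValueError("constant volume cannot be rescaled")
    scaled = [255.0 * (v - lo) / (hi - lo) for v in voxels]
    volume_mean = sum(scaled) / len(scaled)
    return [s - volume_mean for s in scaled]


# Toy voxel values for illustration only.
centered = preprocess([0.0, 1.0, 2.0, 5.0])
```

After this step the values of each volume sum to zero while their spread still spans 255 units.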
Fig. 1
Deep convolutional neural network framework (PD Net) for interpretation of FP-CIT SPECT images. (A) A FP-CIT SPECT volume with matrix size 91 × 109 × 91 is used for an input matrix of PD Net. It consists of multiple 3-dimensional convolutional layers which learn image features from training data. Each convolutional layer is followed by ReLU activation function and max-pooling layers subsample images. The final output of PD Net has two nodes, which respectively correspond to Parkinson's disease and normal control. (B) Parameters of convolutional layers of PD Net were learned by training SPECT dataset to discriminate SPECT images of Parkinson's disease from those of normal controls. The accuracy of the classification was measured from two independent test datasets. Two expert readers interpreted same image data blinded to diagnosis. The accuracy of PD Net and the readers was compared. In addition, the classification using PD Net was tested in Parkinson's disease patients who have scans without evidence of dopaminergic deficit (SWEDD) whether PD Net interpreted those images as normal scans. (C) PPMI and SNUH cohorts were used for the PD Net training and validation. PD and NC subjects of PPMI data were randomly divided into two datasets, training/validation and test sets. SWEDD subjects of PPMI data were used for another set for testing refined diagnosis by PD Net. Another independent cohort, SNUH dataset was used for another test set for differentiating PD from nonparkinsonian tremor. For SWEDD cohorts, 2-year follow-up image and clinical diagnosis was reassessed.
Zero-padding along two axes (x- and z-axes) was applied so that the images have 109 × 109 × 109 matrix dimensions. The images were passed through the first 3-D convolutional layer, which produced 16 feature maps using 7 × 7 × 7 convolutional filters. As a stride of four voxels was applied, the size of the feature maps was 26 × 26 × 26.
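The 26 × 26 × 26 feature-map size follows from the standard output-size formula for a strided convolution. The sketch below assumes the usual floor convention and no padding beyond the zero-padding to 109 voxels; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Output length along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


padded_axis = 109                          # x and z are zero-padded from 91 to 109
pad_total = padded_axis - 91               # 18 voxels added along each padded axis
conv1_axis = conv_output_size(padded_axis, kernel=7, stride=4)
conv1_shape = (conv1_axis,) * 3            # (26, 26, 26)
```

Here (109 − 7) / 4 = 25.5, which floors to 25, giving 26 positions per axis.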
After the convolutional layer, a Rectified Linear Unit (ReLU) activation layer and a max-pooling layer followed. For the max-pooling operation, a pool size of 3 × 3 × 3 and a stride of two voxels were applied. 3-D convolutional layers with filter sizes of 5 × 5 × 5 and 3 × 3 × 3 followed. The numbers of filter banks of these two convolutional layers were 64 and 256, respectively. ReLU activation layers were applied after each of these convolutional layers. A max-pooling layer was applied after the second convolutional layer. Consequently, these multiple layers produced 256 feature vectors. The 256 features were connected to two output labels (fully-connected layer), Parkinson's disease and NC. A softmax function, an exponential activation function with a normalizing operator, was applied to discriminate the two labels after the output of the fully-connected layer. The network was trained to minimize the cross entropy loss between the predicted diagnosis and the true diagnosis of the patients.
This training was conducted by a stochastic gradient descent algorithm using the MatConvNet deep learning library (Version 1.0-beta 20) (Vedaldi and Lenc, 2015). 90.0% of the imaging data of the training/validation set (494/549 subjects) were used for the training. Those 494 SPECT scans were left-right flipped for imaging data augmentation. The remaining 10.0% of the training/validation set was used for validation, which helped monitor the performance of PDNet. The validation set was therefore used to determine the architecture and parameters including training epochs, number of nodes, layers and learning rate. PDNet was trained for 30 epochs. The momentum parameter was set to 0.9. The learning rate was initially 1 × 10⁻⁴ and decreased logarithmically to 1 × 10⁻⁶ at the final epoch.
Study design
The strategy for the image interpretation using PDNet is summarized in Fig. 1B.
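As a rough illustration of the training setup described in the training paragraph, the sketch below builds a learning-rate schedule that decays logarithmically from 1 × 10⁻⁴ to 1 × 10⁻⁶ over 30 epochs, and a left-right flip for augmentation. It is not the MatConvNet code used in the study. The function names are hypothetical, and the flip assumes the volume is stored as nested lists indexed `volume[x][y][z]` with x as the left-right axis.

```python
def log_lr_schedule(lr_start, lr_end, epochs):
    """Learning rates spaced evenly on a log scale, one per epoch."""
    if epochs == 1:
        return [lr_start]
    ratio = (lr_end / lr_start) ** (1.0 / (epochs - 1))
    return [lr_start * ratio ** epoch for epoch in range(epochs)]


def flip_left_right(volume):
    """Mirror a volume along its first axis (assumed here to be left-right)."""
    return volume[::-1]


learning_rates = log_lr_schedule(1e-4, 1e-6, 30)

# Tiny 3 x 1 x 1 toy volume for illustration only.
toy_volume = [[[1.0]], [[2.0]], [[3.0]]]
flipped = flip_left_right(toy_volume)
```

Each epoch multiplies the previous rate by the same constant factor, which is what even spacing on a log scale means.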
PDNet was trained on 90.0% of the data of the training/validation set, and the remaining 10.0% were used for validation, which helped find the best model of PDNet. Since model architectures and parameters could be varied during experiments, the validation dataset was used for model optimization. Validation data were randomly selected from the training/validation set so that they also have the same ratio of PD to NC. The performance was independently tested on two different test sets, the PPMI and SNUH datasets. An overall workflow for the training and testing process of PDNet and information on the two cohorts of the study are summarized in Fig. 1C.
Two readers visually reviewed images of the PPMI test set blinded to the diagnosis and clinical information. Images were visually labeled as ‘normal’ or ‘abnormal’ DAT binding. The accuracy of PDNet was compared with that of the readers. Additionally, PDNet classification was evaluated in the SWEDD group to test whether PDNet classified SWEDD patients as normal SPECT as the visual analysis did.
Accuracy test for PDNet and comparison with conventional analysis
Sensitivity, specificity, and accuracy of PDNet were calculated for the PPMI test set and those of the two readers were also obtained. As a conventional approach reflecting the clinical setting, the accuracy of the overall decision of the two readers referring to DAT BR quantification was also obtained. The two readers referred to the DAT BR of putamen and caudate and made a consensus image diagnosis for each image. The accuracy results were statistically compared with McNemar's nonparametric test. The degree of interobserver agreement between the two readers was measured by calculating Cohen's kappa values. The output of PDNet provided scores for the probability of PD and NC. Using the scores of PDNet, receiver operating characteristic (ROC) analysis was performed. ROC curves of the two readers were drawn. In addition, a ROC curve of the conventional quantification method, putaminal BR, was also drawn.
The area-under-curves (AUCs) were compared by a nonparametric test of DeLong for comparison of two correlated ROC curves (DeLong et al., 1988). ROC analysis was additionally performed in SNUH test set. ROC curves were drawn for the output score of PDNet and putaminal BR.
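The accuracy measures and the agreement statistic named above can be written out as a short sketch. The function names are hypothetical, and the counts used below are illustrative values chosen to fit a test set of 52 PD and 23 NC subjects; they are not taken from the study's supplementary table.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from 2 x 2 confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def cohens_kappa(both_abnormal, reader1_only, reader2_only, both_normal):
    """Cohen's kappa for two readers giving binary (abnormal/normal) labels."""
    n = both_abnormal + reader1_only + reader2_only + both_normal
    observed = (both_abnormal + both_normal) / n
    p1_abnormal = (both_abnormal + reader1_only) / n
    p2_abnormal = (both_abnormal + reader2_only) / n
    # Agreement expected by chance from each reader's marginal rates.
    expected = p1_abnormal * p2_abnormal + (1 - p1_abnormal) * (1 - p2_abnormal)
    return (observed - expected) / (1 - expected)


# Illustrative counts only.
example_metrics = diagnostic_metrics(tp=49, fp=0, tn=23, fn=3)
example_kappa = cohens_kappa(both_abnormal=40, reader1_only=5,
                             reader2_only=5, both_normal=50)
```

Kappa corrects the raw agreement rate for the agreement two readers would reach by chance given how often each calls a scan abnormal.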
Test for SWEDD group
As defined in the term of SWEDD, SWEDD patients had normal DAT binding according to the visual interpretation consensus. SPECT images of SWEDD were evaluated by PDNet to divide those patients into two groups, ‘normal’ and ‘abnormal DAT’. Follow-up SPECT scans after 2 years for the SWEDD patients were evaluated. Among 77 subjects, 42 subjects underwent 2-year follow-up SPECT scans. In addition, clinical follow-up diagnosis was reassessed. 56 subjects were available for 2-year follow-up clinical diagnosis data. In order to compare these two groups (PDNet normal/abnormal in SWEDD patients), BR at baseline as well as at 2-year follow-up was compared using Mann-Whitney test. Two-year follow-up visual interpretation results of the two groups were statistically compared using chi-square test. In addition, follow-up clinical diagnosis after 2 years for SWEDD patients was also assessed according to the PDNet classification.
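The chi-square comparison of follow-up visual interpretation between the two PDNet groups can be sketched for a 2 × 2 table as follows. Whether the authors applied a continuity correction is not stated, so the `yates` flag is an assumption, and the function name is hypothetical. For one degree of freedom the p-value equals erfc(√(χ²/2)), which avoids any dependency beyond the standard library.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the table [[a, b], [c, d]]; returns (statistic, p).

    Assumes every expected count is nonzero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # Yates continuity correction
            statistic += diff * diff / expected
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # survival function, df = 1
    return statistic, p_value


# Rows: PDNet abnormal / PDNet normal at baseline.
# Columns: follow-up visual consensus abnormal / normal (counts as in Table 3).
stat_corrected, p_corrected = chi_square_2x2(4, 1, 2, 35, yates=True)
stat_plain, p_plain = chi_square_2x2(4, 1, 2, 35, yates=False)
```

With expected counts this small, an exact test such as Fisher's would be a common alternative; the sketch only mirrors the test named in the text.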
Results
Accuracy for the classification between PD and NC
Clinical characteristics of the subjects are summarized in Table 1. Images of the PPMI test dataset were independently interpreted by two nuclear imaging experts. The interobserver agreement measured by kappa was 0.65 ± 0.11. The two readers disagreed on nine of the 75 test cases (12.0%).
Table 1
Subjects' demographics and clinical data.
| | PPMI training/validation set (n = 549): Parkinson's disease (n = 379) | PPMI training/validation set: Normal control (n = 170) | PPMI test set (n = 75): Parkinson's disease (n = 52) | PPMI test set: Normal control (n = 23) | PPMI SWEDD set: SWEDD (n = 77) | SNUH test set (n = 82): Parkinson's disease (n = 72) | SNUH test set: Normal control (n = 10) |
|---|---|---|---|---|---|---|---|
| Age | 61.5 ± 9.9 | 60.9 ± 11.5 | 63.0 ± 7.7 | 58.9 ± 9.2 | 60.1 ± 10.8 | 62.5 ± 11.4 | 64.9 ± 11.4 |
| Sex (M/F) | 245/134 | 112/58 | 33/19 | 16/7 | 47/30 | 38/34 | 4/6 |
| Disease duration (months) | 6.5 ± 6.5 | – | 6.8 ± 6.8 | – | 7.4 ± 8.0 | N/A | – |
| MDS-UPDRS part III | 22.1 ± 9.9 | – | 19.3 ± 8.8 | – | 14.8 ± 10.8 | N/A | – |
| Hoehn and Yahr stage | 1.6 ± 0.5 | – | 1.6 ± 0.5 | – | 1.5 ± 0.6 | 2.9 ± 0.8 | – |
The sensitivity, specificity, and accuracy for differentiating PD from NC were evaluated (Table 2). PDNet showed 94.2% sensitivity to detect abnormal DAT, which was not significantly different from the sensitivity of the two readers (98.1 and 96.2%, respectively). Specificity of PDNet was 100% and significantly higher than that of the two readers (73.9 and 56.5%; p = 0.030 and 0.002, respectively). Overall accuracy of PDNet was also significantly higher than that of the individual readers (96.0% vs. 90.7% and 84.0%; p = 0.008 and 0.001, respectively). The accuracy of PDNet was comparable with that of visual analysis referring to quantitative analysis. Visual analysis combined with conventional quantification showed 96.2%, 82.6% and 92.0% for sensitivity, specificity and accuracy, respectively (Table 2). Specificity of PDNet was significantly higher than that of visual analysis combined with conventional quantification.
The numbers of true positive, false positive, true negative and false negative were summarized in Supplementary Table 1. PDNet showed no false positive while visual analysis combined with conventional quantification showed 4 false positives. ROC curves for PDNet and the readers were drawn (Fig. 2). AUC value of PDNet was significantly higher than the individual readers as well as conventional quantification method, putaminal BR (0.988 ± 0.011, 0.860 ± 0.048, 0.763 ± 0.055 and 0.921 ± 0.034 for PDNet, reader 1, 2 and putaminal BR, respectively; p = 0.006, < 0.001 and 0.024 for PDNet vs. reader 1, vs. reader 2 and vs. putaminal BR, respectively).
Table 2
Accuracy of PD Net and visual interpretation for discriminating Parkinson's disease from normal control (PPMI cohort) and from nonparkinsonian tremor (SNUH cohort).
| | PPMI test set: Rater 1 | PPMI test set: Rater 2 | PPMI test set: Visual + conventional quantification | PPMI test set: PD Net | p-Value* vs. Rater 1 | p-Value* vs. Rater 2 | p-Value* vs. visual + conventional quantification | SNUH test set: PD Net |
|---|---|---|---|---|---|---|---|---|
| Sensitivity | 98.1% | 96.2% | 96.2% | 94.2% | n.s. | n.s. | n.s. | 98.6% |
| Specificity | 73.9% | 56.5% | 82.6% | 100% | 0.03 | 0.002 | 0.05 | 100% |
| Accuracy | 90.7% | 84.0% | 92.0% | 96.0% | 0.008 | 0.001 | n.s. (0.06) | 98.8% |
* p-Values are for comparison with PD Net and were uncorrected for multiple comparison.
Fig. 2
Receiver operating characteristic (ROC) curves for PD Net, human readers and conventional quantification. ROC curves are drawn for PD Net, the readers and putaminal binding ratio (BR) using PPMI test set data (Red line: PD Net, Blue line: reader 1, Green line: reader 2, Orange line: putaminal BR). Color shading shows the ROC curves of 95% CI. Area under curves were 0.988 ± 0.011, 0.860 ± 0.048, 0.763 ± 0.055 and 0.921 ± 0.034 for PD Net (A), reader 1 (B), reader 2 (C) and putaminal BR (D), respectively. ROC curves were also drawn for SNUH test set (E, F). Area under curves were 0.997 ± 0.003 and 0.968 ± 0.017 for PD Net and putaminal BR, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
PD Net for discriminating PD from nonparkinsonian tremor: independent test set
To validate PDNet, the performance was tested in an independent dataset acquired from SNUH. The PDNet was used to differentiate PD from patients with nonparkinsonian tremor. Accuracy of PDNet in this dataset was comparable with that in the PPMI dataset. Sensitivity, specificity and accuracy of PDNet for discriminating PD were 98.6%, 100% and 98.8%, respectively. ROC analysis revealed a trend of higher AUC value of PDNet than that of quantitative analysis using putaminal BR (0.997 ± 0.003 for PDNet and 0.968 ± 0.017 for putaminal BR; p = 0.081).
PD Net for SWEDD classification
All the scans of SWEDD patients were classified as ‘normal scan’ according to the consensus of PPMI visual interpretation. Among 77 patients, 6 patients (7.8%) revealed dopaminergic deficit when PDNet analyzed the SPECT images. They showed significantly lower DAT binding ratio (BR) of putamen and caudate nuclei than SWEDD patients with normal DAT according to the PDNet analysis (Putaminal BR: 1.22 ± 0.24 vs. 2.03 ± 0.40; Caudate BR: 1.33 ± 0.27 vs. 1.86 ± 0.40; p = 0.0002 and 0.002, respectively) (Table 3, Fig. 3). The follow-up assessment including FP-CIT SPECT was performed 2 years after the baseline study. Follow-up SPECT scans were also classified by the consensus of PPMI visual interpretation. 42 of the 77 subjects underwent 2-year follow-up SPECT as well as baseline SPECT. The follow-up visual interpretation changed to abnormal in 80.0% (4/5) of the SWEDD patients who showed abnormal DAT in PDNet at baseline and underwent follow-up SPECT scans. Those 4 subjects were clinically PD according to the follow-up diagnosis, and the one subject who showed normal SPECT in the follow-up scan was a patient with Alzheimer's dementia. On the other hand, 5.4% (2/37) of the SWEDD patients who showed normal DAT in PDNet and underwent follow-up SPECT became positive at 2-year follow-up. Thus, conversion of the imaging diagnosis to abnormal DAT at 2-year follow-up was significantly more frequent in subjects who showed abnormal DAT in baseline PDNet than in those who showed normal DAT in PDNet (p = 0.0001, Table 3). According to the clinical follow-up diagnosis, 76.5% (39/51) of the SWEDD subjects with normal PDNet had nonparkinsonian conditions including essential tremor and psychogenic illness. Only 23.5% (12/51) of them were still clinically PD in follow-up exams. Among these 12 clinical PD subjects, 9 had 2-year follow-up SPECT. Two subjects showed abnormal DAT in the 2-year follow-up scan while seven subjects still showed normal DAT in the follow-up (Table 3).
Table 3
Reclassification of SWEDD patients according to the results of PD Net.
| SWEDD patients | Baseline SPECT (n = 77): n | Baseline SPECT: Putamen BR | Baseline SPECT: Caudate BR | 2-year follow-up SPECT (n = 42): PPMI visual consensus (abnormal:normal) | 2-year follow-up SPECT: Putamen BR | 2-year follow-up SPECT: Caudate BR | Clinical follow-up diagnosis (n = 56) |
|---|---|---|---|---|---|---|---|
| PPMI visual consensus normal/PD Net abnormal | 6 | 1.22 ± 0.24 | 1.33 ± 0.27 | 4:1 (80.0%)a | 1.01 ± 0.21 | 1.12 ± 0.15 | 4: PD; 1: Alzheimer's dementia |
| PPMI visual consensus normal/PD Net normal | 71 | 2.03 ± 0.40 | 1.86 ± 0.40 | 2:35 (5.4%)b | 1.77 ± 0.45 | 1.64 ± 0.41 | 12: PDc; 10: Essential tremor; 10: No neurologic disease; 6: Psychogenic illness; 13: Other types of nonparkinsonian tremor |
| p-Value | | 0.0002 | 0.002 | 0.0001 | 0.001 | 0.002 | |
a No follow-up imaging and clinical diagnosis data for a subject.
b No follow-up imaging data for 34 subjects and no follow-up clinical diagnosis data for 20 subjects.
c Among 12 subjects, two showed abnormal follow-up SPECT and 7 subjects showed normal follow-up scan, still clinically SWEDD. The others did not undergo 2-year follow-up SPECT.
Fig. 3
Binding ratio (BR) of SWEDD patients according to the PD Net classification. Baseline putaminal (A) and caudate (B) BR of SWEDD patients who showed decreased dopamine transporter (DAT) in PD Net analysis were significantly lower than those of SWEDD patients who showed normal DAT in PD Net (Putaminal BR: 1.22 ± 0.24 vs. 2.03 ± 0.40; Caudate BR: 1.33 ± 0.27 vs. 1.86 ± 0.40; p = 0.0002 and 0.002, respectively). BR was calculated in 2-years follow-up scans (C, D). Follow-up putaminal (C) and caudate (D) BR were also significantly lower in SWEDD patients with baseline abnormal PD Net than SWEDD patients with baseline normal PD Net (Putaminal BR: 1.01 ± 0.21 vs. 1.77 ± 0.45; Caudate BR: 1.12 ± 0.15 vs. 1.64 ± 0.41; p = 0.001 and 0.002, respectively).
Additionally, DAT BR of putamen and caudate nuclei in 2-year follow-up scans also showed a significant difference between the two groups (Putaminal BR: 1.01 ± 0.21 vs. 1.77 ± 0.45; Caudate BR: 1.12 ± 0.15 vs. 1.64 ± 0.41; p = 0.001 and 0.002, respectively) (Fig. 3). Representative cases of refined SWEDD diagnosis are presented in Fig. 4. Though two baseline SPECT images were classified as normal according to the visual interpretation consensus, PDNet classified one as normal and the other as abnormal. The subject who had abnormal DAT on PDNet analysis showed abnormal DAT at 2-year follow-up even in visual analysis, while the other subject with normal DAT on PDNet analysis stayed normal in the follow-up scan.
Fig. 4
Refining SWEDD classification using PDNet analysis. Two representative cases show different imaging diagnoses by PDNet. Both subjects had normal DAT according to the visual interpretation consensus, while PDNet revealed that one subject (above) had reduced DAT in the striatum. The 2-year follow-up SPECT of this subject was abnormal according to the visual interpretation consensus. In contrast, the other SWEDD subject (below), who showed normal DAT in PDNet, still had normal DAT in the follow-up scan.
Discussion
We showed that the deep learning-based FP-CIT SPECT interpretation system could accurately and objectively determine dopaminergic degeneration and refine diagnosis. The accuracy of PDNet was comparable with that of experts' reading with reference to conventional quantitative analysis, which has been regarded as a clinical standard. Our approach was validated in the independent SNUH SPECT dataset for discriminating PD from nonparkinsonian tremor patients. In addition, some SWEDD patients, a heterogeneous group that could be inconsistently classified across clinical studies, had dopaminergic degeneration in PDNet analysis. Those patients eventually showed dopaminergic degeneration in the follow-up study, which implied that they could have been initially misclassified as SWEDD. As PDNet did not require complicated image feature selection and provided objective classification of SPECT images, it was practical to use in the clinical setting.
The main advantage of PDNet is its objectivity and high accuracy. It could overcome the interobserver variability of visual interpretation, which has been routinely performed in FP-CIT SPECT analysis (McKeith et al., 2007, Papathanasiou et al., 2012, Tondeur et al., 2010). In our study, Cohen's kappa of two independent readers was 0.65 ± 0.11, and the readers disagreed on the interpretation of 12.0% of cases in the test dataset. Such interobserver variability in image interpretation could affect the treatment plan as well as the clinical diagnosis. Moreover, the overall accuracy of PDNet for discriminating PD was significantly higher than that of the conventional quantification method, putaminal BR, as well as that of visual interpretation. The accuracy of PDNet was comparable with a clinical standard of image diagnosis made by multiple experts' reading with reference to quantification results (Albert et al., 2016). Moreover, the specificity of PDNet was significantly higher than that of this conventional analysis method. 
Because of its high accuracy and objective results, PDNet could have clinical impacts on the diagnosis of PD.
In the clinical setting, FP-CIT SPECT is mainly performed to discriminate neurodegenerative Parkinsonism from nonparkinsonian tremor. On the other hand, PDNet was trained to discriminate between PD patients and controls. This is not regarded as a common clinical indication for FP-CIT SPECT because the initial diagnosis of PD is made by clinical examination (Marshall and Grosset, 2003). In our study, using the SNUH dataset, we showed that PDNet could differentiate PD from nonparkinsonian tremor with high accuracy. It suggested the feasibility of applying PDNet to identifying neurodegenerative Parkinsonism among clinically ambiguous patients. Nevertheless, this test dataset was retrospectively collected and hardly reflected the performance for patients with mild or uncertain Parkinsonism. Therefore, a further prospective study will be needed to validate the clinical usefulness of PDNet for patients with uncertain Parkinsonism.
In addition, our approach could be used to refine diagnostic subgroups in clinical trials by objective identification of SWEDD participants. According to the results, PDNet identified abnormal DAT in 6 (7.8%) SWEDD patients, and most of them (80.0%) eventually showed abnormal DAT in longitudinal follow-up visual interpretation. In contrast, only two subjects among the SWEDD patients who had normal baseline DAT in PDNet analysis changed to abnormal DAT in the follow-up interpretation. SWEDD patients differ from PD patients, as previous studies showed poor responsiveness to levodopa, the first-line drug for the management of PD (Fahn et al., 2004). Of note, DAT of SWEDD patients mostly remains normal in long-term follow-up (Marek et al., 2005, Marek et al., 2014). SWEDD patients who had normal DAT in PDNet analysis mostly retained normal DAT after 2 years (94.6%). 
It suggested that SWEDD patients who showed abnormal DAT in PDNet might have resulted from misclassification. Furthermore, DAT BR of SWEDD patients who showed abnormal DAT in PDNet was significantly lower than that of SWEDD patients who showed normal DAT. Our results also imply that some patients with PD could have been misclassified as SWEDD in several clinical trials, as the imaging diagnosis has been made by visual interpretation. A recent retrospective study also showed that a large proportion of the SWEDD population was due to SPECT misinterpretation (Nicastro et al., 2016). Moreover, a systematic review related to SWEDD revealed that SWEDD patients were heterogeneous and mostly due to a clinical misdiagnosis of PD (Erro et al., 2016). Our results corresponded to this review, as most (76.5%) SWEDD patients with a normal PDNet result had nonparkinsonian tremor in long-term follow-up. This misclassification issue might influence the results of therapeutic interventions in clinical trials. An important advance in the application of PDNet to clinical studies could be the objective identification of dopaminergic degeneration, which refines the subgroup classification of PD patients, particularly for the SWEDD group.
According to the PDNet analysis, three PD subjects in the PPMI data were misclassified as normal. Among them, two subjects were also misclassified by experts' reading with reference to conventional quantitative analysis. They had relatively early PD (UPDRS part 3 scores of 8 and 17). For the other misclassified subject, the output score of PDNet was 0.42. This value was the highest in NCs. It suggests that decision criteria using a different threshold value for the PDNet output score could improve the diagnosis for clinical settings. Thus, we also used ROC analysis, which revealed that the AUC of PDNet was higher than that of conventional methods.
PDNet is superior to other automated methods in terms of ease of application and performance. 
Recently, other machine learning methods using quantitative parameters of FP-CIT SPECT, with or without clinical factors, showed good accuracy (90–96%) for the diagnosis of PD (Huertas-Fernandez et al., 2015, Illan et al., 2012, Prashanth et al., 2014). Though these methods also showed high accuracy, they have limitations in generalization and clinical implementation. They used imaging features such as striatal BR rather than the images themselves. The feature selection procedures are not standardized, as quantification of striatal BR could be affected by image processing steps such as normalization and selection of nonspecific regions (Brahim et al., 2015, Tossici-Bolt et al., 2006). PDNet directly analyzed all input voxels and automatically found patterns in them, which resulted in high accuracy without striatal BR calculation. Moreover, the generalized application of PDNet was validated in an independent cohort from SNUH.
Despite its high accuracy, PDNet was tested by discriminating PD patients from NCs. In the clinical setting, FP-CIT SPECT scans are acquired for patients with atypical Parkinsonism and SWEDD as well as PD. Thus, the accuracy of PDNet did not reflect patient characteristics in the clinic and could be overestimated. In addition, PDNet training relied on the gold standard diagnosis of the PPMI cohort, which was the clinical diagnosis combined with the visual imaging interpretation instead of a pure clinical diagnosis independent of the image interpretation. Nonetheless, the strength of PDNet is that it reduces interobserver variability and can provide consistent interpretation results. Because of this strength, it can be used in clinical trials that require objective biomarkers. As another limitation in the study design, PDNet ignored patients' characteristics such as age. Because DAT is influenced by aging (Pirker et al., 2000), PDNet could not differentiate age-related degeneration from PD-related degeneration. 
In the future, modified designs of deep neural networks that consider clinical variables could improve diagnostic performance in the clinical setting. The independent SNUH test set includes a relatively small number of patients with nonparkinsonian tremor. Furthermore, the gold standard diagnosis for the SNUH test set was made by visual interpretation results considering clinical information. Therefore, PDNet should be validated in a larger prospective cohort that includes patients with several movement disorders and uncertain Parkinsonism, with clinical follow-up diagnosis.
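The point raised above about choosing a different threshold for the PDNet output score can be illustrated with a minimal sketch. The scores below are hypothetical toy values, not PPMI or SNUH data. The convention that a higher output score indicates PD, and the cut-offs of 0.5 and 0.4, are assumptions made only for illustration. The functions `sens_spec` and `auc` are likewise illustrative helpers, not part of PDNet.

```python
from typing import List, Tuple


def sens_spec(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called PD.

    labels: 1 = PD, 0 = normal control (assumed convention).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve as the fraction of (PD, control) pairs in
    which the PD score is higher; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical output scores: four PD subjects, then four controls.
scores = [0.97, 0.91, 0.76, 0.42, 0.03, 0.08, 0.21, 0.45]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for cutoff in (0.5, 0.4):
    sensitivity, specificity = sens_spec(scores, labels, cutoff)
    print(f"cut-off {cutoff}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
print(f"AUC={auc(scores, labels):.4f}")
```

With these toy values, lowering the cut-off from 0.5 to 0.4 recovers the PD subject scored 0.42 at the cost of one false positive among controls. This is the trade-off that ROC analysis summarizes across all possible thresholds.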
Conclusion
We designed a deep CNN model, PDNet, for FP-CIT SPECT interpretation. Its accuracy for discriminating PD from NCs was comparable to that of the clinical standard, experts' visual interpretation combined with quantification. Our approach was also validated for discriminating PD from nonparkinsonian tremor using independent SPECT data. As an automated system, it could overcome interobserver variability, which might result in misclassification of subject groups. Accordingly, a promising application of PDNet will be objective diagnosis for patients with clinically uncertain Parkinsonism who show ambiguous FP-CIT SPECT results. Furthermore, it will be applicable to reclassification of the SWEDD group. In the future, the application will be extended to imaging interpretation in various diseases and to the development of imaging biomarkers.
The following is the supplementary data related to this article.
Supplementary Table 1
Numbers of true positives, false positives, true negatives, and false negatives for the PPMI test set.