D. Mudali, L. K. Teune, R. J. Renken, K. L. Leenders, J. B. T. M. Roerdink.
Abstract
Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross-validation (LOOCV) was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The main advantage of decision tree classification is that the results are easily understood by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the training data set, enhancing the decision tree method by bagging, and adding features based on (f)MRI data.
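The pipeline described in the abstract — SSM/PCA feature extraction, subject scores as features, decision-tree classification, LOOCV evaluation — can be sketched as follows. This is a minimal stand-in, not the authors' implementation: plain PCA approximates SSM/PCA (which additionally log-transforms and double-centers the voxel data), scikit-learn's CART-style tree approximates C4.5, and the data are synthetic; all names and sizes are illustrative assumptions.

```python
# Sketch of the pipeline: (SSM/)PCA features -> decision tree -> LOOCV.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_subjects, n_voxels = 38, 500                 # e.g. PD-HC: 38 subjects
X = rng.normal(size=(n_subjects, n_voxels))    # synthetic FDG-PET voxel data
y = np.array([0] * 18 + [1] * 20)              # 18 HC (0), 20 PD (1)

correct = 0
for train, test in LeaveOneOut().split(X):
    # Covariance patterns from the training scans; subject scores are the
    # projections onto those patterns (plain PCA as a stand-in for SSM/PCA).
    pca = PCA(n_components=10).fit(X[train])
    # Supervised classification on the subject scores (C4.5 in the paper,
    # scikit-learn's CART-style tree here).
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(pca.transform(X[train]), y[train])
    correct += int(tree.predict(pca.transform(X[test]))[0] == y[test][0])

performance = 100 * correct / n_subjects       # percentage correctly classified
print(f"LOOCV performance: {performance:.1f}%")
```

Because the held-out scan is excluded before the PCA step, the subject scores used for testing are never derived from the test scan itself, which is what makes the LOOCV estimate honest.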
Year: 2015 PMID: 25918550 PMCID: PMC4395991 DOI: 10.1155/2015/136921
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Classification steps.
Figure 2. The decision tree built from the PD-HC dataset. Oval-shaped interior nodes: features (subject scores) used to split the data. Threshold values are shown next to the arrows. Rectangular leaf nodes: the final class labels (red = PD, blue = HC).
Figure 3. The decision trees built from the MSA-HC (a) and PSP-HC (b) datasets. For details, refer to Figure 2.
Figure 4. The decision tree built from the combined PD-PSP-MSA-HC dataset.
Classifier performance for the different datasets (patients versus healthy controls, number of cases in brackets) in the LOOCV, without feature preselection. The column Perf. (%) indicates the overall percentage of cases correctly classified per group, Sensitivity (%) indicates the percentage of correctly classified healthy controls, and Specificity (%) indicates the percentage of correctly classified patients.
| Feature set (size) | Perf. (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| PD-HC (38) | 47.4 | 50 | 45 |
| MSA-HC (39) | 71.8 | 83.3 | 61.9 |
| PSP-HC (35) | 80.0 | 77.8 | 82.4 |
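Note the convention in this table: sensitivity refers to the healthy controls and specificity to the patients. A short sketch of how the per-group figures follow from that convention, using counts back-calculated from the MSA-HC row (83.3% of 18 controls = 15 correct; 61.9% of 21 patients = 13 correct); the function name and the 0/1 label coding are assumptions:

```python
# Per-group performance under the table's convention: sensitivity = fraction
# of healthy controls (label 0) classified correctly, specificity = fraction
# of patients (label 1) classified correctly.
import numpy as np

def group_performance(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    hc, pat = y_true == 0, y_true == 1
    sensitivity = 100 * np.mean(y_pred[hc] == 0)   # HC classified as HC
    specificity = 100 * np.mean(y_pred[pat] == 1)  # patients as patients
    overall = 100 * np.mean(y_pred == y_true)
    return overall, sensitivity, specificity

# MSA-HC: 18 controls (15 classified correctly), 21 patients (13 correctly)
y_true = [0] * 18 + [1] * 21
y_pred = [0] * 15 + [1] * 3 + [1] * 13 + [0] * 8
overall, sens, spec = group_performance(y_true, y_pred)
print(round(overall, 1), round(sens, 1), round(spec, 1))  # -> 71.8 83.3 61.9
```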
Classifier performance with preselection of features (patients versus healthy controls, number of cases in brackets). Features were preselected either as a percentage of the principal components, ordered from highest to lowest variance accounted for, or as the best number of PCs according to the Akaike information criterion (AIC). Cells marked — are not recoverable.
| Feature set (size) | 3% | 5% | 50% | 70% | 100% | AIC: 1 PC | AIC: 3 PCs | AIC: 5 PCs |
|---|---|---|---|---|---|---|---|---|
| PD-HC (38) | 55.3 | — | 57.9 | — | 47.4 | — | 50 | 47.4 |
| MSA-HC (39) | 71.8 | — | 69.2 | 71.8 | 71.8 | 66.7 | 69.2 | — |
| PSP-HC (35) | — | 80 | 77.1 | 77.1 | 80 | — | 80 | 80 |
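The paper's exact AIC formulation for choosing the number of PCs is not given in this excerpt. A hedged sketch of one common approach, scoring a logistic-regression model on the first k subject scores with AIC = 2k′ − 2 ln L (k′ = number of fitted parameters); the data are synthetic and every name here is an assumption:

```python
# AIC-based preselection of the number of principal components (sketch).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(39, 200))          # e.g. MSA-HC: 39 subjects (synthetic)
y = np.array([0] * 18 + [1] * 21)       # 18 HC, 21 MSA

scores = PCA(n_components=10).fit_transform(X)  # subject scores

def aic_for_k(k):
    # Fit on the first k subject scores; AIC = 2 * n_params - 2 * log-likelihood.
    model = LogisticRegression(max_iter=1000).fit(scores[:, :k], y)
    p = model.predict_proba(scores[:, :k])[np.arange(len(y)), y]
    loglik = np.log(p).sum()
    n_params = k + 1                    # k coefficients + intercept
    return 2 * n_params - 2 * loglik

best_k = min(range(1, 6), key=aic_for_k)
print("best number of PCs by AIC:", best_k)
```

The AIC trades goodness of fit against the number of parameters, so adding PCs only helps when the likelihood gain outweighs the 2-per-parameter penalty.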
Performance for binary classification of disease groups in the LOOCV. The number of cases per group is in brackets. The column Perf. indicates the percentage of subject cases correctly classified (all features included), Sensitivity indicates the percentage of the first disease group classified correctly, Specificity indicates the percentage of the second disease group classified correctly, and Perf. (AIC-5) indicates the performance when features are reduced to the best 5 PCs according to AIC.
| Group | Perf. (%) | Sensitivity (%) | Specificity (%) | Perf. (AIC-5) (%) |
|---|---|---|---|---|
| PD versus MSA (41) | 73.2 | 70 | 76.2 | 78 |
| PD versus PSP (37) | 67.6 | 80 | 52.9 | 70.3 |
| MSA versus PSP (38) | 68.4 | 76.2 | 58.8 | 71.1 |
Figure 5. The decision tree built from the disease groups compared to each other, that is, the PD-PSP-MSA dataset.
Performance for multiclass classification of the disease groups (number of cases in brackets) in the LOOCV with feature preselection. The columns Feat. and Perf. indicate the percentage of features used and the corresponding overall performance. The remaining columns show the confusion matrices (rows: predicted class; columns: true class) and the per-class accuracies. The number of subjects correctly classified for each class is in bold.
| Feat. % | Perf. % | Class | PD (20) | PSP (17) | MSA (21) |
|---|---|---|---|---|---|
| 100 | 65.5 | PD | **15** | 5 | 3 |
| | | PSP | 4 | **8** | 3 |
| | | MSA | 1 | 4 | **15** |
| | | Accuracy (%) | 75 | 47.1 | 71.4 |
| 50 | 67.2 | PD | **15** | 5 | 2 |
| | | PSP | 4 | **9** | 4 |
| | | MSA | 1 | 3 | **15** |
| | | Accuracy (%) | 75 | 52.9 | 71.4 |
| 25 | 69.0 | PD | **15** | 5 | 2 |
| | | PSP | 4 | **9** | 3 |
| | | MSA | 1 | 3 | **16** |
| | | Accuracy (%) | 75 | 52.9 | 76.2 |
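The per-class accuracies are column-wise ratios of the confusion matrix: each column holds one true class, so class accuracy = diagonal entry / column total, and overall performance = trace / grand total. A check using the 25%-of-features block, with diagonal entries back-calculated from the reported accuracies and group sizes (75% of 20 = 15, 52.9% of 17 = 9, 76.2% of 21 = 16):

```python
# Class accuracies and overall performance from a 3x3 confusion matrix.
import numpy as np

# rows: predicted PD, PSP, MSA; columns: true PD (20), PSP (17), MSA (21)
conf = np.array([[15, 5, 2],
                 [4, 9, 3],
                 [1, 3, 16]])

class_acc = 100 * np.diag(conf) / conf.sum(axis=0)  # diagonal / column total
overall = 100 * np.trace(conf) / conf.sum()         # trace / grand total
print(np.round(class_acc, 1), round(overall, 1))    # 75, 52.9, 76.2; 69.0
```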
The LOOCV performance for various types of classifiers. The features used were the subject scores obtained by applying the SSM/PCA method to all subjects included in the datasets. (∗) Note that for LDA only 90% of the features were considered, because of restrictions the classifier imposes when constructing the covariance matrix. For easy reference, the feature preselection results for C4.5 already presented in Table 2 are included.
| Classifier | PD-HC | MSA-HC | PSP-HC |
|---|---|---|---|
| Nearest neighbors | 76.3 | 76.9 | 80.0 |
| Linear SVM | 78.9 | 92.3 | 88.6 |
| Random forest | 63.2 | 61.5 | 71.4 |
| Naive Bayes | 65.8 | 71.8 | 71.4 |
| LDA (∗) | 50.0 | 61.5 | 65.7 |
| CART | 57.9 | 53.8 | 85.7 |
| C4.5 | 63.2 | 74.4 | 82.9 |
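The comparison in this table can be reproduced in outline with scikit-learn estimators standing in for the classifiers named (C4.5 approximated by an entropy-criterion CART). The data here are synthetic with the PSP-HC group sizes; everything in this sketch is an assumption rather than the paper's actual setup:

```python
# LOOCV accuracy for several classifier types on the same subject scores.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(35, 12))        # synthetic subject scores, PSP-HC size
y = np.array([0] * 18 + [1] * 17)    # 18 HC, 17 PSP

classifiers = {
    "Nearest neighbors": KNeighborsClassifier(),
    "Linear SVM": SVC(kernel="linear"),
    "Random forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    "CART (entropy, ~C4.5)": DecisionTreeClassifier(criterion="entropy",
                                                    random_state=0),
}

results = {}
for name, clf in classifiers.items():
    # Each LOOCV fold scores 0 or 1; the mean over folds is the accuracy.
    results[name] = 100 * cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
    print(f"{name:22s} {results[name]:.1f}%")
```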