Wei Du, Vince D Calhoun, Hualiang Li, Sai Ma, Tom Eichele, Kent A Kiehl, Godfrey D Pearlson, Tülay Adali.
Abstract
We present a novel method to extract classification features from functional magnetic resonance imaging (fMRI) data collected at rest or during the performance of a task. By combining a two-level feature identification scheme with kernel principal component analysis (KPCA) and Fisher's linear discriminant analysis (FLD), we achieve high classification rates in discriminating healthy controls from patients with schizophrenia. Experimental results using leave-one-out cross-validation show that features extracted from the default mode network (DMN) lead to a classification accuracy of over 90% in both data sets. Moreover, using a majority vote method that combines multiple features, we achieve a classification accuracy of 98% in the auditory oddball (AOD) task data and 93% in the rest data. Several components, including DMN, temporal, and medial visual regions, are consistently present in the set of features that yield high classification accuracy. The features we have extracted thus show promise as biomarkers for schizophrenia. Results also suggest that there may be different advantages to using resting fMRI data or task fMRI data.
Keywords: FLD; KPCA; classification; fMRI; independent component analysis
Year: 2012 PMID: 22675292 PMCID: PMC3366580 DOI: 10.3389/fnhum.2012.00145
Source DB: PubMed Journal: Front Hum Neurosci ISSN: 1662-5161 Impact factor: 3.169
Demographic and clinical characteristics of patients with schizophrenia (.
| Variable | SZ | HC | Group comparison |
|---|---|---|---|
| Age | 39.4 ± 12.7 | 31.5 ± 11.1 | 2.4/0.02 |
| Percent male | 82 | 68 | NS |
| NART, estimated IQ | 105.3 ± 6.9 | 111.3 ± 8.3 | NS |
| PANSS (P/N) | 15.8 ± 5.5/15.4 ± 5.6 | NA | NA |
| Percent treated with atypical antipsychotic medication | 100 | NA | NA |
| Percent treated with antidepressants | 43 | NA | NA |
| Percent with some psychotic symptoms | 67 | NA | NA |
SZ, schizophrenia; HC, healthy control; NS, non-significant; NART, national adult reading test; NA, non-applicable; P/N, positive/negative; Group comparisons are reported in the last column.
Figure 1. The flow chart shows a leave-one-out scheme given . This includes the preprocessing stage and the three-phase feature selection and extraction framework. Spatial components as inputs are obtained from the data preprocessing stage. Training and test data are processed separately in the whole procedure.
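
The leave-one-out scheme in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `fit` and `predict` callbacks are hypothetical names, and the nearest-class-mean classifier is only a stand-in for the paper's KPCA + FLD stages. The point it shows is that every data-dependent step runs inside `fit` on the training subjects alone, so the held-out subject never influences feature selection.

```python
def leave_one_out(samples, labels, fit, predict):
    """Return the fraction of held-out samples that are predicted correctly.

    fit(train_samples, train_labels) -> model   (hypothetical callback)
    predict(model, sample) -> label             (hypothetical callback)
    """
    correct = 0
    for i in range(len(samples)):
        # Hold out subject i; everything else is training data.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        # All feature selection/extraction must happen inside fit().
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Stand-in classifier on 1-D features (NOT the paper's KPCA + FLD).
def fit_nearest_mean(xs, ys):
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    return min(means, key=lambda label: abs(x - means[label]))


# Toy data, invented for illustration only.
toy_x = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
toy_y = ["HC", "HC", "HC", "SZ", "SZ", "SZ"]
accuracy = leave_one_out(toy_x, toy_y, fit_nearest_mean, predict_nearest_mean)
print(accuracy)
```

Passing the pipeline in as callbacks keeps the cross-validation loop independent of the particular feature extraction and classifier used.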
Figure 2. A one-sample . A two-sample t-test is performed to find voxels that are significantly different between the two groups. The features, comprising the voxels that are both significantly activated and significantly different, are identified in this two-level feature identification scheme.
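
A rough sketch of a two-level voxel selection in the spirit of this caption is given below. Several choices here are assumptions rather than details from the paper: the one-sample test is run over all training subjects pooled, the two-sample test is Welch's variant, and the thresholds `t1` and `t2` are applied to the absolute t statistic. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def one_sample_t(values):
    """t statistic for the null hypothesis that the mean is zero."""
    return mean(values) / sqrt(variance(values) / len(values))


def two_sample_t(a, b):
    """Welch's t statistic for the null hypothesis of equal group means."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def select_voxels(controls, patients, t1, t2):
    """Return indices of voxels that pass both levels.

    controls, patients: lists of per-subject voxel lists of equal length.
    Assumes no voxel has zero variance (that would divide by zero).
    """
    selected = []
    for v in range(len(controls[0])):
        hc = [subject[v] for subject in controls]
        sz = [subject[v] for subject in patients]
        activated = abs(one_sample_t(hc + sz)) > t1  # level 1: activation
        different = abs(two_sample_t(hc, sz)) > t2   # level 2: group difference
        if activated and different:
            selected.append(v)
    return selected


# Toy data (invented): voxel 0 is active and differs between groups,
# voxel 1 is active but identical across groups, voxel 2 is noise around zero.
controls = [[2.0, 1.0, 0.1], [2.2, 1.1, -0.1], [1.8, 0.9, 0.0]]
patients = [[1.0, 1.0, -0.1], [1.2, 1.1, 0.1], [0.8, 0.9, 0.0]]
print(select_voxels(controls, patients, t1=3.0, t2=3.0))
```

In a leave-one-out setting, this selection would be recomputed on the training subjects of each fold.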
Figure 3. Four slices from each component are shown in the figure; components are identified from the AOD task. Each component is entered into a one-sample t-test and thresholded at P < 1e−7 (corrected for multiple comparisons using the family-wise error (FWE) approach, implemented in SPM5).
The definitions of sensitivity and specificity.
| | Condition positive | Condition negative |
|---|---|---|
| Test outcome positive | True positive (TP) | False positive (FP) |
| Test outcome negative | False negative (FN) | True negative (TN) |
| | Sensitivity = TP/(TP + FN) | Specificity = TN/(FP + TN) |
TP, correctly diagnosed patients; FP, controls incorrectly identified as patients; TN, correctly diagnosed controls; FN, patients incorrectly identified as controls.
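
The definitions above translate directly into code. The sketch below is illustrative only: the function name is hypothetical, the counts are invented rather than taken from the paper, and accuracy is computed with the standard formula (TP + TN) divided by the total number of subjects.

```python
def diagnostic_rates(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of patients correctly diagnosed
    specificity = tn / (fp + tn)  # fraction of controls correctly diagnosed
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration.
acc, sens, spec = diagnostic_rates(tp=26, fp=2, tn=26, fn=2)
print(round(acc, 2), round(sens, 2), round(spec, 2))
```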
Figure 4. The figure shows features for controls versus patients after each feature extraction step. Each dot represents an individual, and the color of the dot indicates the correct diagnosis of either control (blue) or schizophrenia (red). Individuals are close to each other if the Euclidean distances between training data are small. The original training samples cannot be separated by a linear classifier. A two-level feature identification step is used to select significantly different voxels. After KPCA, most of the training samples separate into two groups. After FLD, the training samples are linearly separable and a maximum margin is obtained. The parameters in the two-level feature identification step are t1 = 0.5 and t2 = 0.5, which are selected during the training stage using the DMN component (from rest data) as input to the framework.
Classification results with AOD data (one component as features).
| Index | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| 1 | 0.93 | 0.93 | 0.93 |
| 2 | 0.86 | 0.89 | 0.82 |
| 3 | 0.91 | 0.86 | 0.96 |
| 4 | 0.88 | 0.86 | 0.89 |
| 5 | 0.82 | 0.82 | 0.82 |
| 6 | 0.86 | 0.86 | 0.86 |
| 7 | 0.84 | 0.75 | 0.93 |
| 8 | 0.84 | 0.89 | 0.79 |
| 9 | 0.82 | 0.79 | 0.86 |
| 10 | 0.80 | 0.82 | 0.79 |
| 11 | 0.79 | 0.71 | 0.86 |
| 12 | 0.82 | 0.86 | 0.79 |
| 13 | 0.84 | 0.68 | 1.00 |
| 14 | 0.86 | 0.79 | 0.93 |
Classification results using feature combination.
| Data set | Combinations | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| AOD | 1, 2, 3, 8, 14 | 0.98 | 1.00 | 0.98 |
| Rest | 1, 2, 11 | 0.93 | 0.93 | 0.93 |
| | 1, 4, 14 | | | |
| | 1, 2, 4, 11, 14 | | | |
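
A majority vote over single-component classifiers can be sketched as below. The paper's exact voting procedure is not described in this section, so this is one plausible reading: each listed component contributes one predicted label per subject, and the most frequent label wins. The function name and the example votes are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among per-component predictions.

    With two classes and an odd number of voters (as in the 3- and
    5-component combinations above), ties cannot occur.
    """
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Hypothetical votes for one subject from five single-component classifiers.
votes = ["SZ", "HC", "SZ", "SZ", "HC"]
print(majority_vote(votes))
```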
Classification results with rest data (one component as features).
| Index | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| 1 | 0.91 | 0.89 | 0.93 |
| 2 | 0.88 | 0.86 | 0.89 |
| 3 | NA | NA | NA |
| 4 | 0.84 | 0.86 | 0.82 |
| 5 | 0.82 | 0.93 | 0.71 |
| 6 | 0.80 | 0.82 | 0.79 |
| 7 | 0.82 | 0.79 | 0.86 |
| 8 | 0.84 | 0.82 | 0.86 |
| 9 | 0.84 | 0.89 | 0.79 |
| 10 | 0.79 | 0.79 | 0.79 |
| 11 | 0.80 | 0.82 | 0.79 |
| 12 | 0.82 | 0.82 | 0.82 |
| 13 | 0.80 | 0.64 | 0.96 |
| 14 | 0.84 | 0.89 | 0.79 |
Figure 5. The figure compares the classification accuracy in the AOD and rest data. Most of the components of interest lead to a higher accuracy in the AOD data than in the rest data.
Figure 6. The figure shows three estimated components in the rest data set. Each component is entered into a one-sample t-test, thresholded at P < 0.01 (corrected for multiple comparisons using FWE), and shown with 16 slices. The left and middle components are the estimated temporal and motor-temporal components in the control group, respectively. The right component is the only temporal-related component estimated in patients.