Qiongmin Ma1, Tianhao Zhang2, Marcus V Zanetti3, Hui Shen4, Theodore D Satterthwaite5, Daniel H Wolf5, Raquel E Gur5, Yong Fan2, Dewen Hu4, Geraldo F Busatto3, Christos Davatzikos2.
Abstract
With the advent of Big Data Imaging Analytics applied to neuroimaging, datasets from multiple sites need to be pooled into larger samples. However, heterogeneity across scanners, protocols and populations makes it challenging to find underlying disease signatures. The current work investigates the value of multi-task learning in finding disease signatures that generalize across studies and populations. We present a multi-task learning formulation in which the different tasks correspond to the different studies and populations being pooled together. We test this approach in an MRI study of the neuroanatomy of schizophrenia (SCZ) by pooling data from 3 different sites and populations: Philadelphia, Sao Paulo and Tianjin (50 controls and 50 patients from each site), which posed integration challenges due to variability in disease chronicity, treatment exposure, and data collection. Several existing methods are also tested for comparison. Experiments show that multi-task feature learning on multi-site data achieved higher classification accuracy than classification on single-site data or on pooled data, and that it also outperformed the other comparison methods. Several anatomical regions were identified as common discriminant features across sites. These included prefrontal, superior temporal, insular, anterior cingulate cortex, temporo-limbic and striatal regions consistently implicated in the pathophysiology of schizophrenia, as well as the cerebellum, precuneus, and fusiform, middle temporal, inferior parietal, postcentral, angular, lingual and middle occipital gyri. These results indicate that the proposed multi-task learning method is robust in finding consistent and reliable structural brain abnormalities associated with SCZ across different sites, in the presence of multiple sources of heterogeneity.
Keywords: Imaging heterogeneity; MRI; Multi-site classification; Multi-task learning; Schizophrenia; Sparsity
Year: 2018 PMID: 29984156 PMCID: PMC6029565 DOI: 10.1016/j.nicl.2018.04.037
Source DB: PubMed Journal: Neuroimage Clin ISSN: 2213-1582 Impact factor: 4.881
Characteristics of the participants in this study.
| Site | Group | Sample size | Gender (male/female) | Age (years), mean ± SD (range) |
|---|---|---|---|---|
| Site A | SCZ | 50 | 28/22 | 35.38 ± 11.78 (19–60) |
| | NC | 50 | 25/25 | 32.50 ± 12.96 (15–65) |
| | p value | | 0.69 | 0.25 |
| Site B | SCZ | 50 | 34/16 | 27.48 ± 7.89 (18–50) |
| | NC | 50 | 29/21 | 30.60 ± 8.17 (18–50) |
| | p value | | 0.30 | 0.055 |
| Site C | SCZ | 50 | 25/25 | 34.10 ± 8.44 (16–56) |
| | NC | 50 | 22/28 | 32.24 ± 11.42 (21–57) |
| | p value | | 0.55 | 0.36 |
| Site A + B + C | SCZ | 150 | 87/63 | 32.32 ± 10.08 (16–60) |
| | NC | 150 | 76/74 | 31.78 ± 10.99 (15–65) |
| | p value | | 0.20 | 0.66 |

Note: SCZ: schizophrenia; NC: normal controls. p values for gender are from the Pearson chi-square test; p values for age are from the two-sample t-test.
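As an illustration of how the group-comparison values in the participant table can be checked from the summary statistics alone, the following minimal sketch recomputes the pooled (Site A + B + C) comparisons. It assumes an uncorrected Pearson chi-square test on the 2×2 gender table and a pooled-variance two-sample t-test; the function names are hypothetical, and the t-test p value uses a normal approximation that is only reasonable for large samples such as the pooled one. It is not the authors' analysis code.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def two_sample_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Pooled-variance t statistic from means, SDs and sample sizes.

    Assumption: the p value uses a normal approximation to the t distribution,
    which is only adequate when n1 + n2 is large.
    """
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


# Pooled sample: 87/63 male/female in SCZ, 76/74 in NC.
chi2, p_gender = pearson_chi2_2x2(87, 63, 76, 74)
# Pooled sample: age 32.32 ± 10.08 (SCZ, n=150) vs 31.78 ± 10.99 (NC, n=150).
t, p_age = two_sample_t_from_summary(32.32, 10.08, 150, 31.78, 10.99, 150)
print(round(p_gender, 2), round(p_age, 2))
```

Under these assumptions the pooled values come out close to the 0.20 and 0.66 reported in the table; per-site values may differ slightly depending on whether a continuity correction or an exact t distribution is used.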
Fig. 1. Workflow of each iteration in multi-site classification (20 repeated experiments in total). (1) There were three sites in our study (sites A, B, and C). After preprocessing, the N samples at each site were randomly divided into 5 folds and assigned to three sets: one fold for testing, and 80% and 20% of the other 4 folds for training and validation, respectively. Each sample had feature dimension D. (2) Feature learning. The training sets from the three sites were used in the multi-task learning framework, which generated the feature weight matrix W for the three sites. Each of the three column vectors of W was then sorted by absolute value, and the weights of the site-shared features, identified from the K top-ranked feature weights, were strengthened, yielding the new feature weight matrix W′. (3) Parameter tuning. Using the feature weights learned in the previous step, the K features and their corresponding weights were used to classify the validation sets from the three sites. The parameter set that gave the best classification accuracy was selected. (4) Testing. The best parameter set was used in testing and the classification accuracy was obtained.
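The data split in step (1) and the identification of site-shared features in step (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and defining "site-shared" as the intersection of each site's K top-ranked features is an assumption, since the caption does not spell out the exact rule or how the shared weights are strengthened.

```python
import random


def split_site(n_samples, test_fold, n_folds=5, val_fraction=0.2, seed=0):
    """Split one site's sample indices into train, validation and test sets.

    One fold is held out for testing; the remaining folds are divided
    80%/20% into training and validation sets.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    test = folds[test_fold]
    rest = [i for k, fold in enumerate(folds) if k != test_fold for i in fold]
    n_val = round(val_fraction * len(rest))
    return rest[n_val:], rest[:n_val], test


def top_k_features(weights, k):
    """Indices of the k features with the largest absolute weight."""
    order = sorted(range(len(weights)), key=lambda d: abs(weights[d]), reverse=True)
    return set(order[:k])


def site_shared_features(columns, k):
    """Assumed rule: features ranked in the top k at every site.

    `columns` holds one weight vector per site (the columns of W).
    """
    shared = top_k_features(columns[0], k)
    for column in columns[1:]:
        shared &= top_k_features(column, k)
    return shared


# 100 subjects per site (50 SCZ + 50 NC), as in the study.
train, val, test = split_site(100, test_fold=0)
print(len(train), len(val), len(test))

# Toy weight vectors for three sites and four features.
W_columns = [
    [0.9, -0.1, 0.5, 0.0],
    [-0.8, 0.2, 0.05, 0.6],
    [0.7, 0.3, -0.4, 0.1],
]
print(site_shared_features(W_columns, k=2))
```

With 100 subjects per site, this split gives 64 training, 16 validation and 20 test samples per site in each iteration.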
Summary of the comparison methods.
| Method | Description |
|---|---|
| | The proposed multi-task classification framework on multi-site data, which contains |
| | Single-task classification on each single site data: the single-task feature learning step uses the |
| | Data from all the datasets are pooled together as a larger dataset. The classification framework is the same as the single-task classification. |
| | Using multi-site data, the feature learning step is multi-task learning, and uses SVM classifier with the linear kernel. |
| | Use single-task feature learning framework and a SVM classifier to classify each single site data. |
| | Use single-task feature learning framework and a SVM classifier to classify the pooled data. |
| PCA + SVM (SS) | Use principal component analysis (PCA) to learn features in each single site data and a SVM classifier with the linear kernel. |
| PCA + SVM (PO) | Use principal component analysis (PCA) to learn features on the pooled data and a SVM classifier with the linear kernel. |
| ttest2 + SVM (SS) | Use two-sample |
| ttest2 + SVM (PO) | Use two-sample |
The average accuracy values of multi-site, single-site and pooling classification with 20 repetitions in 10 experiments.
| Method | Site A | Site B | Site C | Average of 3 sites |
|---|---|---|---|---|
| | 0.76 | 0.56 | 0.825 | 0.713 |
| | 0.70 | 0.575 | 0.753 | 0.676 |
| | 0.713 | 0.648 | 0.658 | 0.673 |
| | 0.728 | 0.598 | 0.825 | 0.690 |
| | 0.610 | 0.573 | 0.668 | 0.617 |
| PCA + SVM (SS) | 0.668 | 0.474 | 0.724 | 0.622 |
| PCA + SVM (PO) | 0.646 | 0.494 | 0.732 | 0.624 |
| ttest2 + SVM (SS) | 0.562 | 0.468 | 0.680 | 0.570 |
| ttest2 + SVM (PO) | 0.672 | 0.530 | 0.646 | 0.616 |
MS: multi-site data; SS: single-site data; PO: pooled data.
Fig. 2. Left panel: locations of the site-shared features. Right panel: locations of the site-specific features corresponding to each site. The site-shared and site-specific features shown were obtained by multi-task learning on multi-site schizophrenia classification and have a cluster size of >50 voxels. The colorbar represents the weight values of the features. Cool and hot colors correspond to negative and positive weight values, respectively.
The site-shared gray matter alteration features of brain regions.
| Regions | Side | BA | Cluster size (voxels) | MNI coordinates |
|---|---|---|---|---|
| Cerebellum crus I, fusiform gyrus, lingual gyrus | L | 18,19 | 146 | −32,−86,−17/−29,−74,−3 |
| Middle temporal gyrus | L | 38,21,22 | 77 | −44,12,−35/−59,−14,−8/−56,−59,−13/−65,−35,−1 |
| Superior temporal gyrus, middle temporal gyrus, insula | R | 22,41,13 | 77 | 52,−41,−10/49,−17,1 |
| Anterior cingulate, superior medial frontal gyrus | L | 32 | 51 | −12,34,1/−12,34,25 |
| Middle occipital gyrus, angular | L/R | 39,40 | 111 | 45,−55,20/33,−76,32/−51,67,39 |
| Insula | R | 13 | 27 | 33,16,12 |
| Middle frontal gyrus, inferior triangular frontal gyrus | L/R | 9,10,44,45 | 162 | −51,26,28/−33,52,16/−42,22,13/39,35,24 |
| Postcentral | L | 2 | 18 | −49,−31,50 |
| Precuneus | L/R | 31 | 25 | −6,−43,36 |
| Inferior parietal gyrus | L | 40 | 15 | −40,−49,51 |
| Superior parietal gyrus | L | 7,5 | 42 | −28,−51,75 |
| Superior frontal gyrus, middle frontal gyrus | R | 9,6 | 56 | 27,47,32/33,41,35/35,8,42 |
| Superior frontal gyrus, middle frontal gyrus | L | 32,6 | 63 | −16,23,47/−34,11,52 |
| dlPFC | L/R | 9,6,32 | 119 | −16,23,47/−34,11,52/27,47,32/33,41,35/35,8,42 |
Note: BA = Brodmann area, R = right, L = left, dlPFC = dorsolateral prefrontal cortex.
The site-specific gray matter alteration features of brain regions in site A.
| Regions | Side | BA | Cluster size (voxels) | MNI coordinates |
|---|---|---|---|---|
| Fusiform gyrus, cerebellum VI, lingual gyrus | L | 37,19,18 | 135 | −20,−59,−15 |
| Inferior temporal gyrus, middle temporal gyrus | L | 37,20,19,21 | 50 | −56,−56,−10 |
| Superior temporal gyrus, middle temporal gyrus | L/R | 38,21,41 | 148 | −47,9,−32/55,−53,−4 |
| Putamen | R | | 69 | 30,8,15 |
| Middle frontal gyrus/superior frontal gyrus | R | 9,8,10 | 114 | 29,38,38 |
| Postcentral gyrus, superior parietal gyrus | R | 3,2,5,40 | 236 | −43,−37,65/26,−46,49 |
| Superior medial frontal gyrus, supplemental motor area, superior frontal gyrus | L | 8,6 | 203 | −19,35,58 |
| Superior parietal gyrus, postcentral gyrus, precuneus gyrus | L | 7,5 | 92 | −28,−54,75 |
| Precentral gyrus | L | 6 | 54 | −34,−12,77 |
Note: BA = Brodmann area, R = right, L = left.
The site-specific gray matter alteration features of brain regions in site B.
| Regions | Side | BA | Cluster size (voxels) | MNI coordinates |
|---|---|---|---|---|
| Cerebellum IV, V, fusiform gyrus | R | 19,36 | 51 | 22,−35,−20 |
| Hippocampus, parahippocampus, fusiform gyrus | R | 36 | 75 | 34,−35,−11 |
| Anterior cingulate cortex | L | 32,9,24,10 | 294 | −3,38,23 |
| Inferior parietal gyrus, middle occipital gyrus, angular | R | 40,39,7 | 105 | −46,−49,47 |
| Posterior cingulum | L | | 65 | −9,−43,36 |
| Inferior frontal gyrus | L | 13,44,47 | 38 | −39,22,13 |
| Middle frontal gyrus | L | 8,9 | 53 | −34,35,37 |
| Precuneus | L | 7 | 120 | −13,−67,37 |
| Postcentral gyrus, precentral gyrus | L/R | 6,4,3,40 | 180 | −52,−31,53 |
Note: BA = Brodmann area, R = right, L = left, dlPFC = dorsolateral prefrontal cortex.
The site-specific gray matter alteration features of brain regions in site C.
| Regions | Side | BA | Cluster size (voxels) | MNI coordinates |
|---|---|---|---|---|
| Cerebellum crus I, crus II, fusiform gyrus, lingual gyrus | L | 18,19 | 209 | −28,−84,−38 |
| Middle temporal gyrus, inferior temporal gyrus | L | 21,20,22 | 274 | −59,−12,−26 |
| Superior temporal gyrus, middle temporal gyrus, insula | R | 22,13,41,21 | 112 | 46,−14,−2 |
| Anterior cingulate cortex, superior medial frontal gyrus | L | 10,32,9 | 122 | 0,46,14 |
| Angular | R | 39,22,13 | 62 | 45,−55,23 |
| Middle frontal gyrus | L | 9,10 | 81 | −28,56,37 |
| Supramarginal gyrus | L | 40 | 80 | −54,−37,29 |
| Postcentral gyrus | R | 2,40,3 | 62 | 44,−31,55 |
| Precuneus | L/R | 7 | 72 | 2,−66,67 |
Note: BA = Brodmann area, R = right, L = left.