Gemma C Monte-Rubio1,2, Barbara Segura1,2,3,4, Antonio P Strafella5,6,7, Thilo van Eimeren8,9, Naroa Ibarretxe-Bilbao10, Maria Diez-Cirarda7, Carsten Eggers11,12,13, Olaia Lucas-Jiménez10, Natalia Ojeda10, Javier Peña10, Marina C Ruppert11,12, Roser Sala-Llonch1,3,14,15, Hendrik Theis9, Carme Uribe1,2,3,7, Carme Junque1,2,3,4.
Abstract
Multi-site MRI datasets are crucial for big data research. However, neuroimaging studies must contend with the batch effect. Here, we propose an approach that uses the predictive probabilities provided by Gaussian processes (GPs) to harmonize clinical-based studies. A multi-site dataset of 216 Parkinson's disease (PD) patients and 87 healthy subjects (HS) was used. We performed a site GP classification using MRI data. The outcomes estimated from this classification, redefined as Weighted HARMonization PArameters (WHARMPA), were used as regressors in two different clinical studies: a PD versus HS machine learning classification using GP, and a VBM comparison (FWE-p < .05, k = 100). The same studies were also conducted using conventional Boolean site covariates, and without any site information. The site GP classification yielded high scores: balanced accuracy (BAC) was 98.39% for grey matter images. The PD versus HS classification performed better when the WHARMPA were used to harmonize (BAC = 78.60%; AUC = 0.90) than when using the Boolean site information (BAC = 56.31%; AUC = 0.71) or no site information (BAC = 57.22%; AUC = 0.73). The VBM analysis harmonized using WHARMPA provided larger and more statistically robust clusters in regions previously reported in PD than when the Boolean site covariates or no corrections were added to the model. In conclusion, WHARMPA might encode global site-effects quantitatively and allow the harmonization of data. This method is user-friendly and provides a powerful solution, without complex implementations, for cleaning the analyses by removing variability associated with the differences between sites.
Keywords: Gaussian process; MRI harmonization; Parkinson's disease; VBM; machine learning; predictive probabilities; site-effects
Year: 2022 PMID: 35305545 PMCID: PMC9188966 DOI: 10.1002/hbm.25838
Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.399
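The harmonization idea summarized in the abstract (classify site with a GP, then reuse the predictive probabilities as regressors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn's `GaussianProcessClassifier` with a linear (`DotProduct`) kernel in a one-vs-rest scheme, out-of-fold probabilities from 5-fold cross-validation, and a plain least-squares residualization. The function names `site_probabilities` and `residualize` are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import DotProduct
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def site_probabilities(features, site_labels, n_splits=5, seed=0):
    """Out-of-fold predictive probabilities of site membership.

    Returns an array of shape (n_subjects, n_sites). These probabilities
    play the role of the harmonization parameters in this sketch.
    Assumption: linear kernel and one-vs-rest multiclass scheme.
    """
    clf = GaussianProcessClassifier(kernel=DotProduct(), random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_predict(
        clf, features, site_labels, cv=cv, method="predict_proba"
    )


def residualize(features, covariates):
    """Remove the linear contribution of the covariates from each feature.

    Assumption: ordinary least squares with an intercept. The probability
    columns sum to 1, so the design is rank deficient; np.linalg.lstsq
    returns the minimum-norm solution, which still yields valid residuals.
    """
    design = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ beta
```

In a design like the one described, the probabilities could equally be passed as covariates to the downstream classifier or to the VBM general linear model instead of being regressed out beforehand; the residualization here is only one way to make the effect concrete.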
Demographics of the whole multi‐site sample: Healthy subjects (HS) and Parkinson's disease (PD) patients
| | Site 1 | Site 2 | Site 3 | Site 4 | Total | ANOVA |
|---|---|---|---|---|---|---|
| HS | | | | | | |
| n | 26 (11) | 29 (15) | 17 (8) | 15 (9) | 87 (43) | |
| Age ± | 68.31 ± 7.52 | 64.21 ± 8.77 | 61.47 ± 7.40 | 68.40 ± 7.63 | 65.62 ± 8.29 | 0.020 |
| PD | | | | | | |
| n | 36 (14) | 86 (36) | 35 (8) | 59 (19) | 216 (77) | |
| Age ± | 68.00 ± 6.26 | 63.94 ± 10.37 | 64.77 ± 6.41 | 67.27 ± 10.32 | 65.66 ± 9.33 | 0.063 |
| | | | | | | |
| Sex (χ2 | .787 | .355 | .076 | .047 | .027 | |
| Age (Student's | .861 | .902 | .104 | .693 | .971 | |
*p<.05.
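Scores from center classification (multiclass Gaussian process classification)
| Feature set | Balanced accuracy (BAC %) | Class accuracy, Site 1 (%) | Class accuracy, Site 2 (%) | Class accuracy, Site 3 (%) | Class accuracy, Site 4 (%) | Class PPV, Site 1 (%) | Class PPV, Site 2 (%) | Class PPV, Site 3 (%) | Class PPV, Site 4 (%) |
|---|---|---|---|---|---|---|---|---|---|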
| GM | 98.39 | 98.39 | 96.52 | 100 | 98.65 | 98.39 | 99.11 | 100 | 94.81 |
| WM | 95.98 | 98.39 | 94.78 | 96.15 | 94.59 | 89.71 | 99.09 | 96.15 | 95.89 |
| GM + WM | 98.82 | 98.39 | 98.26 | 100 | 98.65 | 98.39 | 99.12 | 100 | 97.33 |
Note: Permutation test p‐value = .0099.
Abbreviations: GM, grey matter; WM, white matter.
Feature sets were derived from non‐Jacobian‐scaled data smoothed with an 11 mm FWHM kernel.
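As a consistency check on the table above, balanced accuracy is the mean of the per-class accuracies. The short sketch below recomputes it from the four class accuracies reported for each feature set; the helper name `balanced_accuracy` is illustrative only, and small differences in the last digit can arise because the inputs are already rounded.

```python
def balanced_accuracy(class_accuracies):
    """Balanced accuracy: the unweighted mean of per-class accuracies (in %)."""
    return sum(class_accuracies) / len(class_accuracies)


# Per-class accuracies (%) copied from the site-classification table.
site_class_accuracies = {
    "GM": [98.39, 96.52, 100.0, 98.65],
    "WM": [98.39, 94.78, 96.15, 94.59],
    "GM + WM": [98.39, 98.26, 100.0, 98.65],
}

for feature_set, accuracies in site_class_accuracies.items():
    print(f"{feature_set}: BAC = {balanced_accuracy(accuracies):.3f}%")
```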
FIGURE 1 (a) Weights map from the site classification using grey matter (GM) as the feature; (b) list of the regions with a contribution above 100 in the expected ranking (ER) for identifying site effects in the current sample; (c) histogram and (d) confusion matrix from the same classification analysis
Scores from the classification between Parkinson's disease (PD) patients and healthy subjects (HS)
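| Feature set | Correction for site effect | PRoNTo BAC (%) | PRoNTo class accuracy, HS (%) | PRoNTo class accuracy, PD (%) | PRoNTo predictive value, HS (%) | PRoNTo predictive value, PD (%) | PRoNTo AUC | Multisite BAC (%) | Multisite Sens | Multisite Spec |
|---|---|---|---|---|---|---|---|---|---|---|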
| GM | None | 57.22 | 21.84 | 92.59 | 54.29 | 74.63 | 0.73 | 57.21 | 0.22 | 0.92 |
| GM + WM | None | 58.59 | 26.44 | 90.74 | 53.49 | 75.38 | 0.73 | 58.59 | 0.26 | 0.91 |
| GM | Boolean site covariate | 56.31 | 17.24 | 95.37 | 60.00 | 74.10 | 0.71 | 56.22 | 0.17 | 0.95 |
| GM + WM | Boolean site covariate | 56.19 | 16.09 | 96.30 | 63.64 | 74.02 | 0.71 | 57.37 | 0.18 | 0.96 |
| GM | WHARMPA from GM | 78.60 | 63.22 | 93.98 | 80.88 | 86.38 | 0.90 | 79.09 | 0.64 | 0.94 |
| GM + WM | WHARMPA from GM | 76.07 | 58.62 | 93.52 | 78.46 | 84.87 | 0.90 | 76.67 | 0.60 | 0.93 |
| GM | WHARMPA from GM + WM | 76.29 | 60.92 | 91.67 | 74.65 | 85.34 | 0.90 | 77.33 | 0.63 | 0.92 |
| GM + WM | WHARMPA from GM + WM | 76.30 | 58.62 | 93.98 | 79.69 | 84.94 | 0.90 | 76.73 | 0.60 | 0.93 |
Abbreviations: Boolean site covariate, conventional center information; EoS, effect of site; GM, grey matter; None, no center correction used; Sens, sensitivity; Spec, specificity; WHARMPA, weighted harmonization parameters; WM, white matter.
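The percentages in this table can be related back to subject counts. The sketch below assumes that the two columns after BAC are the per-class accuracies for HS and PD (an interpretation, not something the table states) and uses the group sizes from the abstract (87 HS, 216 PD) to rebuild the confusion counts, then recomputes BAC and the class predictive values. All function names are hypothetical.

```python
N_HS, N_PD = 87, 216  # group sizes reported in the abstract


def confusion_counts(acc_hs, acc_pd, n_hs=N_HS, n_pd=N_PD):
    """Recover confusion counts from per-class accuracies given in %.

    Returns (HS correct, HS labelled PD, PD correct, PD labelled HS).
    """
    hs_correct = round(acc_hs / 100 * n_hs)
    pd_correct = round(acc_pd / 100 * n_pd)
    return hs_correct, n_hs - hs_correct, pd_correct, n_pd - pd_correct


def summary(acc_hs, acc_pd):
    """Recompute BAC and class predictive values (all in %)."""
    hs_ok, hs_as_pd, pd_ok, pd_as_hs = confusion_counts(acc_hs, acc_pd)
    return {
        "bac": 50 * (hs_ok / N_HS + pd_ok / N_PD),
        "pv_hs": 100 * hs_ok / (hs_ok + pd_as_hs),
        "pv_pd": 100 * pd_ok / (pd_ok + hs_as_pd),
    }


# GM features without site correction, and GM features with WHARMPA from GM.
no_correction = summary(21.84, 92.59)
wharmpa_gm = summary(63.22, 93.98)
print(no_correction)
print(wharmpa_gm)
```

Under this reading, the gain with WHARMPA comes mainly from the HS class: the number of correctly classified healthy subjects rises, while the PD class accuracy stays roughly the same.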
McNemar's test estimation between each pair of performances
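| Correction for site‐effect | Feature set | Boolean site covariate, GM | Boolean site covariate, GM + WM | WHARMPA from GM, GM | WHARMPA from GM, GM + WM | WHARMPA from GM + WM, GM | WHARMPA from GM + WM, GM + WM |
|---|---|---|---|---|---|---|---|
| None | GM | 0.031 | 0.011 | 0.000 | 0.000 | 0.000 | 0.000 |
| None | GM + WM | 0.002 | 0.000 | 0.001 | 0.002 | 0.000 | 0.004 |
| Boolean site covariate | GM | | 0.581 | 0.000 | 0.000 | 0.000 | 0.000 |
| Boolean site covariate | GM + WM | | | 0.000 | 0.000 | 0.000 | 0.000 |
| WHARMPA from GM | GM | | | | 0.549 | 0.508 | 0.388 |
| WHARMPA from GM | GM + WM | | | | | 0.238 | 1.000 |
| WHARMPA from GM + WM | GM | | | | | | 0.092 |
| WHARMPA from GM + WM | GM + WM | | | | | | |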
Note: Cut‐off of significance (α) was set to 0.05, and the confidence interval (CI) to 95%.
Abbreviations: GM, grey matter; None, no site information used; WHARMPA, weighted harmonization parameters; WM, white matter.
*p <.05.
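McNemar's test compares two classifiers evaluated on the same subjects by looking only at the discordant cases, that is, subjects that one model classifies correctly and the other does not. The sketch below implements the exact (binomial) two-sided variant using only the standard library. Whether the study used the exact or the chi-squared form is not stated here, so this choice is an assumption, and the discordant counts in the usage lines are invented for illustration.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: subjects classified correctly by model A only.
    c: subjects classified correctly by model B only.
    Under the null hypothesis, each discordant subject is equally likely
    to fall in either cell, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant subjects: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, not taken from the study.
print(mcnemar_exact(5, 5))
print(mcnemar_exact(0, 10))
```

A p-value below the stated cut-off (α = 0.05) would indicate that the two models disagree in a systematically one-sided way.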
FIGURE 2 Scores of the classification between Parkinson's disease (PD) patients and healthy subjects (HS) using grey matter (GM) tissue as the feature, and harmonization parameters from GM to protect against the scanner effect: (a) histogram of the classification, (b) ROC and AUC, (c) weights map showing the contribution of each region to the decision and (d) table of the regions that played a role in the classification decision according to the AAL atlas (https://www.pmod.com/files/download/v35/doc/pneuro/6750.htm). An expected ranking (ER) is assigned to each region; only contributions above 100 are listed in the table
FIGURE 3 Overlap of the outcomes from the comparison between healthy subjects (HS) > Parkinson's disease (PD) patients. Age and sex were used as covariates in all the models. The t‐score is represented in green for the weighted harmonization parameters (WHARMPA) from grey matter, in blue for the conventional Boolean site correction, and in red when no correction was applied. Right is left