Hao Henry Zhou, Vikas Singh, Sterling C Johnson, Grace Wahba.
Abstract
When sample sizes are small, the ability to identify weak (but scientifically interesting) associations between a set of predictors and a response may be enhanced by pooling existing datasets. However, variations in acquisition methods and the distribution of participants or observations between datasets, especially due to the distributional shifts in some predictors, may obfuscate real effects when datasets are combined. We present a rigorous statistical treatment of this problem and identify conditions where we can correct the distributional shift. We also provide an algorithm for the situation where the correction is identifiable. We analyze various properties of the framework for testing model fit, constructing confidence intervals, and evaluating consistency characteristics. Our technical development is motivated by Alzheimer's disease (AD) studies, and we present empirical results showing that our framework enables harmonizing of protein biomarkers, even when the assays across sites differ. Our contribution may, in part, mitigate a bottleneck that researchers face in clinical research when pooling smaller sized datasets and may offer benefits when the subjects of interest are difficult to recruit or when resources prohibit large single-site studies.
Keywords: causal model; maximum mean discrepancy; meta-analysis; multisite analysis; multisource
Year: 2018 PMID: 29386387 PMCID: PMC5816202 DOI: 10.1073/pnas.1719747115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. A shows the distributional shift of across ADNI and W-ADRC. B shows the distributional shift of hippocampus volume across ADNI and W-ADRC.
Fig. 2. A is an example of a graphical causal model. The colored nodes are an example of a d-separation rule, where and are d-separated by . B is the graphical causal model for our CSF data analysis example. Here, the population characteristics difference only has a direct causal effect on the age distribution. The sample selection bias is only directly related to diagnosis status for each specific study. Nodes denoting age and sex influence the CSF measurements denoted by , which then influence the diagnosis status . The CSF measurements and the nodes and are d-separated by diagnosis status and age.
Variations of age and diagnosis status across datasets
| Description | ADNI | W-ADRC |
| --- | --- | --- |
| Sample size | | |
| Age range ( | | |
| Diagnosis status (CN/AD), % | | |
Fig. 3. The plots of (A) and (B) show the empirical distributions of W-ADRC samples (blue), ADNI samples (red), and transformed ADNI samples (brown). W-ADRC samples are nicely matched with transformed ADNI samples.
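The match described in this caption can be quantified with the maximum mean discrepancy (MMD) listed in the keywords. The following is a minimal sketch on synthetic data, not the authors' implementation: the Gaussian kernel, the bandwidth, the affine shift (scale 2, offset 5), and all variable names are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between two 1-D samples."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
site_a = rng.normal(0.0, 1.0, size=300)      # synthetic stand-in for the target site
latent = rng.normal(0.0, 1.0, size=300)      # same underlying quantity at a second site
site_b = 2.0 * latent + 5.0                  # assumed affine "assay" distortion
site_b_corrected = (site_b - 5.0) / 2.0      # undo the assumed distortion

mmd_raw = mmd2_biased(site_a, site_b)
mmd_corrected = mmd2_biased(site_a, site_b_corrected)
print(f"MMD^2 raw: {mmd_raw:.3f}, corrected: {mmd_corrected:.3f}")
```

In this toy setting the correction is known in advance; in practice the transformation would have to be estimated, for example by searching for the parameters that minimize the MMD.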
The performance of thresholds in ADNI and W-ADRC
| Dataset | | | | | |
| --- | --- | --- | --- | --- | --- |
| W-ADRC | | | | | |
| Threshold | 568.08 | 629.39 | 48.86 | 0.77 | 0.07 |
| Sensitivity, % | 75.86 | 89.66 | 82.75 | 93.10 | 93.10 |
| Specificity, % | 92.23 | 69.90 | 67.96 | 86.41 | 79.61 |
| ADNI | | | | | |
| Threshold | 93.00 | 192.00 | 23.00 | 0.39 | 0.10 |
| Sensitivity, % | 69.6 | 96.4 | 67.9 | 85.7 | 91.1 |
| Specificity, % | 92.3 | 76.9 | 73.1 | 84.6 | 71.2 |
The W-ADRC thresholds are derived from corresponding ADNI thresholds reported in the literature (11) using Algorithm.
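For readers who want to see how sensitivity and specificity follow from a threshold, here is a small sketch. The measurements and labels are invented toy values, and the direction of the decision rule (values below the threshold count as positive) is an assumption; for some biomarkers the rule would point the other way, which the `positive_if_below` flag covers.

```python
def sensitivity_specificity(values, has_ad, threshold, positive_if_below=True):
    """Return (sensitivity %, specificity %) for a single-threshold rule."""
    tp = fn = tn = fp = 0
    for value, diseased in zip(values, has_ad):
        if positive_if_below:
            predicted_positive = value < threshold
        else:
            predicted_positive = value > threshold
        if diseased and predicted_positive:
            tp += 1
        elif diseased:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity

# Toy measurements (hypothetical), paired with diagnosis labels.
toy_values = [400.0, 500.0, 550.0, 600.0, 700.0, 800.0, 900.0, 650.0]
toy_has_ad = [True, True, True, False, False, False, False, True]

# 568.08 is the first W-ADRC threshold in the table; the data are not real.
sens, spec = sensitivity_specificity(toy_values, toy_has_ad, threshold=568.08)
print(sens, spec)
```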
Fig. 4. A shows the trend of MSPE for hippocampus volume as the sample size increases using 400 bootstraps. The bar plot covers the prediction error for three types of training set as depicted in the legend, including W-ADRC only (red), W-ADRC plus ADNI (green), and W-ADRC plus transformed ADNI (blue). The third model continues to perform the best. B shows the trend of classification accuracy with respect to patients with AD (solid lines) and healthy patients (dotted lines) as sample size increases using 400 bootstraps. An SVM model is used, and three types of training sets are shown in the legend. For samples with AD, the three methods converge to the same accuracy as the training sample size increases. For healthy CNs, the W-ADRC plus the transformed ADNI dataset is always better than the other two schemes. It is interesting to see that W-ADRC plus the raw ADNI data also performs better than W-ADRC alone, possibly because only 25 (24%) subjects from W-ADRC are diagnosed with AD; with few AD samples, even the uncorrected ADNI data nicely inform the classification model.
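The three-way comparison in panel A can be illustrated with a bootstrap sketch on synthetic data. This is not the study's pipeline: the linear model, the noise levels, the sample sizes, and the known affine distortion (scale 3, offset 10) of the second site's predictor are all assumptions chosen only to make the effect visible.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def bootstrap_mspe(x_train, y_train, x_extra, y_extra, x_test, y_test,
                   n_boot=400, seed=0):
    """Mean squared prediction error averaged over bootstrap resamples
    of the local training set, with optional extra data pooled in."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x_train), size=len(x_train))
        x = np.concatenate([x_train[idx], x_extra])
        y = np.concatenate([y_train[idx], y_extra])
        slope, intercept = fit_line(x, y)
        pred = slope * x_test + intercept
        errors.append(np.mean((pred - y_test) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
# Local site: small sample, split into training and test halves.
x_a = rng.normal(0.0, 1.0, size=60)
y_a = 2.0 * x_a + rng.normal(0.0, 0.5, size=60)
x_tr, y_tr, x_te, y_te = x_a[:30], y_a[:30], x_a[30:], y_a[30:]
# Second site: larger sample, predictor measured on a distorted scale.
latent_b = rng.normal(0.0, 1.0, size=200)
y_b = 2.0 * latent_b + rng.normal(0.0, 0.5, size=200)
x_b_raw = 3.0 * latent_b + 10.0
x_b_fixed = (x_b_raw - 10.0) / 3.0   # undo the assumed distortion

empty = np.array([])
mspe_local = bootstrap_mspe(x_tr, y_tr, empty, empty, x_te, y_te)
mspe_raw = bootstrap_mspe(x_tr, y_tr, x_b_raw, y_b, x_te, y_te)
mspe_fixed = bootstrap_mspe(x_tr, y_tr, x_b_fixed, y_b, x_te, y_te)
print(mspe_local, mspe_raw, mspe_fixed)
```

Unlike the figure, where raw pooling still helps somewhat, the distortion here is deliberately large, so pooling uncorrected data hurts prediction on the local site.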
Subsampling MMD Algorithm ()
| 1: Divide |
| 2: Decide subsample size |
| 3: For |
| 4: Generate subsamples |
| 5: Generate subsamples |
| 6: |
| 7: Calculate and record |
| 8: Set |
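The formulas in the steps above did not survive extraction, so the following is only a generic subsampling loop in the same spirit, not a reconstruction of the authors' algorithm. It repeatedly draws subsamples from two datasets, records a squared-MMD statistic for each pair, and summarizes the recorded values with an empirical quantile. The kernel, bandwidth, subsample size, number of rounds, and the synthetic data are all assumptions.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Biased squared-MMD estimate with a Gaussian kernel (1-D samples)."""
    def mean_kernel(a, b):
        diff = a[:, None] - b[None, :]
        return np.exp(-diff**2 / (2.0 * bandwidth**2)).mean()
    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)

def subsampled_mmd(x, y, subsample_size, n_rounds, seed=0):
    """Record the MMD statistic over repeated pairs of subsamples."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_rounds)
    for k in range(n_rounds):
        # Draw without replacement from each dataset.
        x_sub = rng.choice(x, size=subsample_size, replace=False)
        y_sub = rng.choice(y, size=subsample_size, replace=False)
        stats[k] = mmd2(x_sub, y_sub)
    return stats

rng = np.random.default_rng(42)
target = rng.normal(0.0, 1.0, size=200)                   # synthetic target site
source_raw = 2.0 * rng.normal(0.0, 1.0, size=200) + 5.0   # assumed affine distortion
source_fixed = (source_raw - 5.0) / 2.0                   # distortion undone

stats_raw = subsampled_mmd(target, source_raw, subsample_size=50, n_rounds=100)
stats_fixed = subsampled_mmd(target, source_fixed, subsample_size=50, n_rounds=100)
q95_fixed = np.quantile(stats_fixed, 0.95)
print(stats_raw.min(), q95_fixed)
```

In this toy example every statistic computed on the uncorrected data exceeds the 95% quantile of the statistics computed after correction, which is the kind of separation a goodness-of-fit check would look for.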