
When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, ℓ2-consistency and Neuroscience Applications.

Hao Henry Zhou, Yilin Zhang, Vamsi K Ithapu, Sterling C Johnson, Grace Wahba, Vikas Singh.

Abstract

Many studies in biomedical and health sciences involve small sample sizes due to logistical or financial constraints. Often, identifying weak (but scientifically interesting) associations between a set of predictors and a response necessitates pooling datasets from multiple diverse labs or groups. While there is a rich literature in statistical machine learning that addresses distributional shifts and inference in multi-site datasets, it is less clear when such pooling is guaranteed to help (and when it is not), independent of the inference algorithm used. In this paper, we present a hypothesis test to answer this question, for both classical and high-dimensional linear regression. We precisely identify regimes where pooling datasets across multiple sites is sensible, and show how such policy decisions can be made via simple checks executable on each site before any data transfer happens. With a focus on Alzheimer's disease studies, we present empirical results showing that, in the regimes suggested by our analysis, pooling a local dataset with data from an international study improves power.

Year:  2017        PMID: 31742253      PMCID: PMC6859896     

Source DB:  PubMed          Journal:  Proc Mach Learn Res


