DISTRIBUTED TESTING AND ESTIMATION UNDER SPARSE HIGH DIMENSIONAL MODELS.

Heather Battey, Jianqing Fan, Han Liu, Junwei Lu, Ziwei Zhu.

Abstract

This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
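The aggregation idea in the abstract can be illustrated with a minimal sketch for the simplest low dimensional case: fit the same estimator on each of k disjoint subsamples of size n/k and average the results. The function names `ols_slope` and `divide_and_conquer_slope`, the simulated data, and the choice of a one-covariate least-squares slope are all assumptions made for illustration; they are not taken from the paper, and the sparse high dimensional estimators described in the abstract (which involve debiasing and thresholding) are not reproduced here.

```python
import random
from statistics import fmean


def ols_slope(xs, ys):
    """Least-squares slope of y on x, fitted with an intercept."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def divide_and_conquer_slope(xs, ys, k):
    """Average the slopes fitted on k disjoint subsamples of size n // k."""
    n = len(xs)
    m = n // k  # subsample size; any remainder n % k is dropped in this sketch
    slopes = [
        ols_slope(xs[j * m:(j + 1) * m], ys[j * m:(j + 1) * m])
        for j in range(k)
    ]
    return fmean(slopes)


# Simulated data (an assumption for illustration): y = 2x + standard normal noise.
rng = random.Random(0)
n, k, true_slope = 10_000, 10, 2.0
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [true_slope * x + rng.gauss(0.0, 1.0) for x in xs]

full = ols_slope(xs, ys)                          # "oracle" using the full sample
aggregated = divide_and_conquer_slope(xs, ys, k)  # average of k subsample fits
print(f"full-sample slope: {full:.4f}")
print(f"aggregated slope (k={k}): {aggregated:.4f}")
```

Rerunning the sketch with larger k for fixed n gives a rough feel for the question the abstract poses, namely how many subsamples can be used before the aggregated estimator drifts away from the full-sample one.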

Keywords:  Divide and conquer; debiasing; massive data; thresholding. Subject classification: Primary 62F05, 62F10; secondary 62F12

Year:  2018        PMID: 30034040      PMCID: PMC6051757          DOI: 10.1214/17-AOS1587

Source DB:  PubMed          Journal:  Ann Stat        ISSN: 0090-5364            Impact factor:   4.028


  5 in total

1.  Non-Concave Penalized Likelihood with NP-Dimensionality.

Authors:  Jianqing Fan; Jinchi Lv
Journal:  IEEE Trans Inf Theory       Date:  2011-08       Impact factor: 2.501

2.  Variance estimation using refitted cross-validation in ultrahigh dimensional regression.

Authors:  Jianqing Fan; Shaojun Guo; Ning Hao
Journal:  J R Stat Soc Series B Stat Methodol       Date:  2012-01-01       Impact factor: 4.488

3.  A PARTIALLY LINEAR FRAMEWORK FOR MASSIVE HETEROGENEOUS DATA.

Authors:  Tianqi Zhao; Guang Cheng; Han Liu
Journal:  Ann Stat       Date:  2016-07-07       Impact factor: 4.028

4.  OPTIMAL COMPUTATIONAL AND STATISTICAL RATES OF CONVERGENCE FOR SPARSE NONCONVEX LEARNING PROBLEMS.

Authors:  Zhaoran Wang; Han Liu; Tong Zhang
Journal:  Ann Stat       Date:  2014       Impact factor: 4.028

5.  Challenges of Big Data Analysis.

Authors:  Jianqing Fan; Fang Han; Han Liu
Journal:  Natl Sci Rev       Date:  2014-06       Impact factor: 17.275

  5 in total

1.  dPQL: a lossless distributed algorithm for generalized linear mixed model with application to privacy-preserving hospital profiling.

Authors:  Chongliang Luo; Md Nazmul Islam; Natalie E Sheils; John Buresh; Martijn J Schuemie; Jalpa A Doshi; Rachel M Werner; David A Asch; Yong Chen
Journal:  J Am Med Inform Assoc       Date:  2022-07-12       Impact factor: 7.942

2.  Distributed Simultaneous Inference in Generalized Linear Models via Confidence Distribution.

Authors:  Lu Tang; Ling Zhou; Peter X-K Song
Journal:  J Multivar Anal       Date:  2019-11-28       Impact factor: 1.473

3.  Sampling-based estimation for massive survival data with additive hazards model.

Authors:  Lulu Zuo; Haixiang Zhang; HaiYing Wang; Lei Liu
Journal:  Stat Med       Date:  2020-11-03       Impact factor: 2.373

4.  ODAL: A one-shot distributed algorithm to perform logistic regressions on electronic health records data from multiple clinical sites.

Authors:  Rui Duan; Mary Regina Boland; Jason H Moore; Yong Chen
Journal:  Pac Symp Biocomput       Date:  2019

5.  Multisite learning of high-dimensional heterogeneous data with applications to opioid use disorder study of 15,000 patients across 5 clinical sites.

Authors:  Xiaokang Liu; Rui Duan; Chongliang Luo; Alexis Ogdie; Jason H Moore; Henry R Kranzler; Jiang Bian; Yong Chen
Journal:  Sci Rep       Date:  2022-06-30       Impact factor: 4.996

