
A PARTIALLY LINEAR FRAMEWORK FOR MASSIVE HETEROGENEOUS DATA.

Tianqi Zhao, Guang Cheng, Han Liu.

Abstract

We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring the heterogeneity of each sub-population. In particular, we propose an aggregation-type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and the same asymptotic distribution as if there were no heterogeneity. This oracular result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed and shown to possess the same asymptotic distribution as if the commonality information were available. We also test the heterogeneity among a large number of sub-populations. All of the above results require regularizing each sub-estimation as though it had the entire sample size. Our general theory applies to the divide-and-conquer approach that is often used to deal with massive homogeneous data. A technical by-product of this paper is statistical inference for general kernel ridge regression. Thorough numerical results are also provided to back up our theory.
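
The aggregation and plug-in steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the model y = x * beta_j + f(z) + noise, where the shared function f plays the role of the commonality parameter and the sub-population-specific coefficient beta_j plays the role of the heterogeneity parameter. The Gaussian kernel, its bandwidth, the choice lam = 1/N, the simulated data, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian kernel matrix between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def fit_subpopulation(X, z, y, lam):
    """Penalized least squares for y = X beta + f(z) + noise on one sub-sample.

    Minimizes (1/n)||y - X beta - K alpha||^2 + lam * alpha' K alpha.
    Returns beta and the kernel coefficients alpha of the fitted f.
    """
    n = len(y)
    K = gaussian_kernel(z, z)
    M = np.linalg.inv(K + n * lam * np.eye(n))   # (K + n*lam*I)^{-1}
    R = n * lam * M                               # equals I - S, S the kernel smoother
    beta = np.linalg.solve(X.T @ R @ X, X.T @ R @ y)
    alpha = M @ (y - X @ beta)
    return beta, alpha

def aggregate_f(z_new, alphas, z_list):
    """Aggregation step: average the sub-population estimates of f at z_new."""
    preds = [gaussian_kernel(z_new, z_j) @ a_j for a_j, z_j in zip(alphas, z_list)]
    return np.mean(preds, axis=0)

def plug_in_beta(X, z, y, alphas, z_list):
    """Plug-in step: remove the aggregated f, then run ordinary least squares."""
    resid = y - aggregate_f(z, alphas, z_list)
    return np.linalg.lstsq(X, resid, rcond=None)[0]

# --- simulated example (all values are illustrative assumptions) ---
rng = np.random.default_rng(0)
s, n = 5, 100                       # number of sub-populations, size of each
N = s * n                           # entire sample size
true_betas = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

def f0(z):
    """Shared (commonality) function used to simulate the data."""
    return np.sin(2 * np.pi * z)

X_list, z_list, y_list = [], [], []
for j in range(s):
    X = rng.normal(size=(n, 1))
    z = rng.uniform(size=n)
    y = X[:, 0] * true_betas[j] + f0(z) + 0.3 * rng.normal(size=n)
    X_list.append(X)
    z_list.append(z)
    y_list.append(y)

# Regularization is set from the entire sample size N, not the sub-sample size n.
lam = 1.0 / N
alphas = [fit_subpopulation(X, z, y, lam)[1]
          for X, z, y in zip(X_list, z_list, y_list)]

grid = np.linspace(0.05, 0.95, 50)
f_bar = aggregate_f(grid, alphas, z_list)
beta_plug = np.array([plug_in_beta(X, z, y, alphas, z_list)[0]
                      for X, z, y in zip(X_list, z_list, y_list)])

print("plug-in betas:", np.round(beta_plug, 3))
print("mean |f_bar - f0| on grid:", np.mean(np.abs(f_bar - f0(grid))))
```

Note that `lam` is computed from `N` rather than `n`, mirroring the abstract's statement that each sub-estimation is regularized as though it had the entire sample size. The specific rate used here is a placeholder, not the rate derived in the paper.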


Keywords: bias propagation; heterogeneous data; joint asymptotics; massive data; mean square error; partially linear model; reproducing kernel Hilbert space

Year:  2016        PMID: 28428647      PMCID: PMC5394596          DOI: 10.1214/15-AOS1410

Source DB:  PubMed          Journal:  Ann Stat        ISSN: 0090-5364            Impact factor:   4.028



1.  Learning bounds for kernel regression using effective data dimensionality.

Authors:  Tong Zhang
Journal:  Neural Comput       Date:  2005-09       Impact factor: 2.026


1.  A Massive Data Framework for M-Estimators with Cubic-Rate.

Authors:  Chengchun Shi; Wenbin Lu; Rui Song
Journal:  J Am Stat Assoc       Date:  2018-06-19       Impact factor: 5.033

2.  DISTRIBUTED TESTING AND ESTIMATION UNDER SPARSE HIGH DIMENSIONAL MODELS.

Authors:  Heather Battey; Jianqing Fan; Han Liu; Junwei Lu; Ziwei Zhu
Journal:  Ann Stat       Date:  2018-05-03       Impact factor: 4.028

3.  Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health.

Authors:  Peng Liao; Predrag Klasnja; Susan Murphy
Journal:  J Am Stat Assoc       Date:  2020-10-01       Impact factor: 5.033

4.  Sampling-based estimation for massive survival data with additive hazards model.

Authors:  Lulu Zuo; Haixiang Zhang; HaiYing Wang; Lei Liu
Journal:  Stat Med       Date:  2020-11-03       Impact factor: 2.373

5.  A difference-based approach in the partially linear model with dependent errors.

Authors:  Zhen Zeng; Xiangdong Liu
Journal:  J Inequal Appl       Date:  2018-10-01       Impact factor: 2.491

