A PARTIALLY LINEAR FRAMEWORK FOR MASSIVE HETEROGENEOUS DATA.

Abstract

We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring the heterogeneity of each sub-population. In particular, we propose an aggregation-type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and the asymptotic distribution it would have if there were no heterogeneity. This oracular result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed, and shown to possess the asymptotic distribution it would have if the commonality information were available. We also test the heterogeneity among a large number of sub-populations. All the above results require regularizing each sub-estimation as though it had the entire sample size. Our general theory applies to the divide-and-conquer approach that is often used to deal with massive homogeneous data. A technical by-product of this paper is statistical inference for general kernel ridge regression. Thorough numerical results are also provided to back up our theory.
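The divide-and-conquer idea mentioned in the abstract can be illustrated with a minimal sketch. It covers only the homogeneous special case (no sub-population-specific parameters), and it is not the paper's estimator: the Gaussian kernel, its bandwidth, the simulated data, the rate used for `lam`, and all function names are assumptions made for illustration. The point it shows is that each subset is fitted with a penalty chosen from the total sample size `N` rather than the subset size, and the subset fits are then averaged.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian (RBF) kernel matrix between two 1-D point sets."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def fit_krr(z, y, lam):
    """Kernel ridge regression: solve (K + n*lam*I) alpha = y."""
    n = len(z)
    K = gaussian_kernel(z, z)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda z_new: gaussian_kernel(z_new, z) @ alpha

def divide_and_conquer_krr(z, y, num_splits, lam):
    """Fit KRR on each subset with the same lam, then average the fits."""
    idx_blocks = np.array_split(np.arange(len(z)), num_splits)
    fits = [fit_krr(z[idx], y[idx], lam) for idx in idx_blocks]
    return lambda z_new: np.mean([f(z_new) for f in fits], axis=0)

# Simulated data (assumption): a smooth common function plus noise.
rng = np.random.default_rng(0)
N = 2000
z = rng.uniform(0.0, 1.0, N)
f0 = lambda t: np.sin(2.0 * np.pi * t)
y = f0(z) + 0.3 * rng.standard_normal(N)

# Penalty chosen from the total sample size N, not the subset size N/s.
# The exponent is purely illustrative, not a rate taken from the paper.
lam = N ** (-2.0 / 3.0)

f_bar = divide_and_conquer_krr(z, y, num_splits=10, lam=lam)
grid = np.linspace(0.05, 0.95, 50)
mse = np.mean((f_bar(grid) - f0(grid)) ** 2)
print(f"MSE of averaged estimator on grid: {mse:.5f}")
```

Choosing `lam` from `N` under-smooths each subset fit relative to what would be optimal for that subset alone; averaging then reduces the extra variance, which is the intuition behind regularizing "as though it had the entire sample size".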


Keywords: bias propagation; heterogeneous data; joint asymptotics; massive data; mean square error; partially linear model; reproducing kernel Hilbert space

Year: 2016 PMID: 28428647 PMCID: PMC5394596 DOI: 10.1214/15-AOS1410

Source DB: PubMed Journal: Ann Stat ISSN: 0090-5364 Impact factor: 4.028

1 in total

1. Learning bounds for kernel regression using effective data dimensionality.

Authors: Tong Zhang
Journal: Neural Comput Date: 2005-09 Impact factor: 2.026

5 in total

1. A Massive Data Framework for M-Estimators with Cubic-Rate.

2. DISTRIBUTED TESTING AND ESTIMATION UNDER SPARSE HIGH DIMENSIONAL MODELS.

3. Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health.

4. Sampling-based estimation for massive survival data with additive hazards model.

5. A difference-based approach in the partially linear model with dependent errors.