Online updating method with new variables for big data streams.

Chun Wang, Ming-Hui Chen, Jing Wu, Jun Yan, Yuping Zhang, Elizabeth Schifano.

Abstract

For big data arriving in streams, online updating is an important statistical method that breaks the storage barrier and the computational barrier under certain circumstances. In the regression context, online updating algorithms assume that the set of predictor variables does not change, and consequently cannot incorporate new variables that may become available midway through the data stream. A naive approach would be to discard all previous information and start updating with new variables from scratch. We propose a method that utilizes the information from earlier data in the online updating algorithm with bias corrections to improve efficiency. The method is developed for linear models first, and then extended to estimating equations for generalized linear models. Closed-form expressions for the efficiency gain over the naive approach are derived in a particular linear model setting. We compare the performance of our proposed bias-correcting approach and the naive approach in simulation studies with data generated from a normal linear model and a logistic regression model. The method is applied to a study on airline delay, where reasons for delays were only available more recently, starting in 2003.
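The baseline that the abstract builds on is online updating for the linear model. Only running summaries are stored, not the raw blocks. The sketch below is a minimal pure-Python illustration of that generic idea, assuming ordinary least squares with each block given as a list of predictor rows. It accumulates X_k^T X_k and X_k^T y_k over blocks and solves the normal equations on demand. It also shows the naive restart when a new predictor appears. It does not implement the authors' bias-corrected estimator. `OnlineOLS`, `solve`, and the toy data are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's running sums are untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest pivot magnitude into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0.0:
            raise ValueError("singular system: accumulate more data first")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class OnlineOLS:
    """Hypothetical helper: keeps only X^T X, X^T y and n across blocks."""

    def __init__(self, p: int):
        self.p = p
        self.xtx = [[0.0] * p for _ in range(p)]
        self.xty = [0.0] * p
        self.n = 0

    def update(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
        # The predictor set is fixed; a block with extra columns is rejected.
        if len(X) != len(y) or any(len(row) != self.p for row in X):
            raise ValueError("block shape does not match the fitted predictors")
        for row, yi in zip(X, y):
            for i in range(self.p):
                self.xty[i] += row[i] * yi
                for j in range(self.p):
                    self.xtx[i][j] += row[i] * row[j]
        self.n += len(y)

    def estimate(self) -> List[float]:
        # Equals the full-data OLS fit, because the summaries are sums.
        return solve(self.xtx, self.xty)


if __name__ == "__main__":
    # Early stream: intercept and x1 only (toy data, y = 1 + 2*x1 exactly).
    early = OnlineOLS(p=2)
    early.update([[1, 0], [1, 1]], [1, 3])
    early.update([[1, 2], [1, 3]], [5, 7])
    print(early.n, early.estimate())

    # A hypothetical new predictor x2 appears. The naive approach restarts
    # with p = 3 and discards everything held by `early`.
    naive = OnlineOLS(p=3)
    naive.update([[1, 0, 1], [1, 1, 0]], [1.5, 3.0])
    naive.update([[1, 2, 2], [1, 3, 1]], [6.0, 7.5])
    print(naive.n, naive.estimate())
```

Because X^T X and X^T y are sums over blocks, the accumulated estimate matches a single fit on all rows seen so far. The naive restart throws away the early p = 2 summaries once x2 arrives. That lost information is what the proposed bias-correcting approach is meant to reuse.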

Keywords:  Added variable; data compression; estimating equation; regression

Year:  2017        PMID: 29662263      PMCID: PMC5898930          DOI: 10.1002/cjs.11330

Source DB:  PubMed          Journal:  Can J Stat        ISSN: 0319-5724            Impact factor:   0.875


