Statistical methods and computing for big data.

Chun Wang, Ming-Hui Chen, Elizabeth Schifano, Jing Wu, Jun Yan.

Abstract

Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized, even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the challenges of big data. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and the performance of these criteria is assessed in a simulation study with stream data. Software packages are summarized with a focus on open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay.
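To make the idea of online updating for stream data concrete, the following is a minimal sketch in Python for ordinary least squares. It is not the authors' implementation and does not cover the variable selection extension or the logistic regression case study. It assumes a linear model in which each arriving chunk contributes only to the running cross-products X'X and X'y, so no past data need to be stored. The class name `OnlineLeastSquares`, the helper `simulate_stream`, and the simulated coefficients are hypothetical and chosen for illustration only.

```python
import numpy as np


class OnlineLeastSquares:
    """Hypothetical helper: least squares fitted from running cross-products."""

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))  # running X'X
        self.xty = np.zeros(n_features)                # running X'y
        self.n_obs = 0                                 # rows seen so far

    def update(self, X, y):
        """Fold one chunk into the summaries; the chunk can then be discarded."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.xtx += X.T @ X
        self.xty += X.T @ y
        self.n_obs += X.shape[0]

    def coef(self):
        """Current estimate, solving (X'X) beta = X'y for all data seen so far."""
        return np.linalg.solve(self.xtx, self.xty)


def simulate_stream(n_chunks=20, chunk_size=500, seed=0):
    """Yield simulated (X, y) chunks from an assumed linear model."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, -2.0, 0.5])  # assumed true coefficients
    for _ in range(n_chunks):
        X = np.column_stack(
            [np.ones(chunk_size), rng.normal(size=(chunk_size, 2))]
        )
        y = X @ beta + rng.normal(scale=0.1, size=chunk_size)
        yield X, y


model = OnlineLeastSquares(n_features=3)
for X_chunk, y_chunk in simulate_stream():
    model.update(X_chunk, y_chunk)

print(model.n_obs, model.coef())
```

Because the cross-products are additive over chunks, the estimate after the last chunk matches, up to floating-point error, the one obtained by fitting all rows at once, while memory use depends only on the number of predictors. A divide and conquer scheme can reuse the same summaries by computing them on separate blocks and adding the results.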

Keywords:  Bootstrap; Divide and conquer; External memory algorithm; High performance computing; Online update; Sampling; Software

Year: 2016; PMID: 27695593; PMCID: PMC5041595; DOI: 10.4310/SII.2016.v9.n4.a1

Source DB: PubMed; Journal: Stat Interface; ISSN: 1938-7989; Impact factor: 0.582


