
Principles of Experimental Design for Big Data Analysis.

Christopher C Drovandi, Christopher Holmes, James M McGree, Kerrie Mengersen, Sylvia Richardson, Elizabeth G Ryan.

Abstract

Big Datasets are endemic, but are often notoriously difficult to analyse because of their size, heterogeneity and quality. The purpose of this paper is to open a discourse on the potential for modern decision theoretic optimal experimental design methods, which by their very nature have traditionally been applied prospectively, to improve the analysis of Big Data through retrospective designed sampling in order to answer particular questions of interest. By appealing to a range of examples, it is suggested that this perspective on Big Data modelling and analysis has the potential for wide generality and advantageous inferential and computational properties. We highlight current hurdles and open research questions surrounding efficient computational optimisation in using retrospective designs, and in part this paper is a call to the optimisation and experimental design communities to work together in the field of Big Data analysis.

Keywords:  active learning; big data; dimension reduction; experimental design; sub-sampling

Year: 2017   PMID: 28883686   PMCID: PMC5584669   DOI: 10.1214/16-STS604

Source DB: PubMed   Journal: Stat Sci   ISSN: 0883-4237   Impact factor: 2.901


