Online cross-validation-based ensemble learning.

David Benkeser1, Cheng Ju1, Sam Lendle1, Mark van der Laan1.   

Abstract

Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to identify the algorithm with the best performance. We show that by basing estimates on the cross-validation-selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best-performing algorithm. We provide extensions of this approach, including online estimation of the optimal ensemble of candidate online estimators. We illustrate excellent performance of our methods using simulations and a real data example in which we make streaming predictions of infectious disease incidence using data from a large database.
Copyright © 2017 John Wiley & Sons, Ltd.
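
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes one common reading of online cross-validation, in which each candidate is scored on a new batch before being updated with it, and the candidate with the lowest accumulated loss is selected. The class names `RunningMean` and `SGDLinear`, the function `online_cv_select`, the squared-error loss, the step sizes, and the simulated data are all hypothetical choices made for illustration.

```python
import random


class RunningMean:
    """Hypothetical candidate: predicts the running mean of past outcomes."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self, x):
        return self.mean

    def update(self, batch):
        for _, y in batch:
            self.n += 1
            self.mean += (y - self.mean) / self.n


class SGDLinear:
    """Hypothetical candidate: one-feature linear regression fit by SGD."""

    def __init__(self, step):
        self.step = step
        self.intercept = 0.0
        self.slope = 0.0

    def predict(self, x):
        return self.intercept + self.slope * x

    def update(self, batch):
        for x, y in batch:
            err = self.predict(x) - y
            self.intercept -= self.step * err
            self.slope -= self.step * err * x


def online_cv_select(batches, candidates):
    """Score every candidate on each new batch BEFORE updating it,
    then return the index with the lowest average squared error."""
    cum_loss = [0.0] * len(candidates)
    n_seen = 0
    for batch in batches:
        # Validation step: the new batch has not been used for fitting yet.
        for k, est in enumerate(candidates):
            cum_loss[k] += sum((est.predict(x) - y) ** 2 for x, y in batch)
        n_seen += len(batch)
        # Training step: only now does each candidate see the batch.
        for est in candidates:
            est.update(batch)
    risks = [loss / n_seen for loss in cum_loss]
    best = min(range(len(candidates)), key=risks.__getitem__)
    return best, risks


def simulate_batches(n_batches, batch_size, rng):
    """Simulated i.i.d. stream (an assumption): y = 1 + 2x + noise."""
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            batch.append((x, 1.0 + 2.0 * x + rng.gauss(0.0, 0.2)))
        yield batch


rng = random.Random(0)
candidates = [RunningMean(), SGDLinear(step=0.1), SGDLinear(step=0.001)]
best, risks = online_cv_select(simulate_batches(200, 10, rng), candidates)
print("selected candidate index:", best)
print("online cross-validated risks:", risks)
```

Because each batch is scored before any candidate is fitted to it, no past data need to be stored or revisited, which is the property the abstract attributes to online estimators. The ensemble extension mentioned in the abstract would replace the final `min` with weights learned online over the candidates' predictions; that is not shown here.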

Keywords:  cross-validation; dependent data; ensemble learning; machine learning; online estimation; stochastic gradient descent; time-series

Year:  2017        PMID: 28474419      PMCID: PMC5671383          DOI: 10.1002/sim.7320

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


  4 in total

1.  Super learner.

Authors:  Mark J van der Laan; Eric C Polley; Alan E Hubbard
Journal:  Stat Appl Genet Mol Biol       Date:  2007-09-16

2.  Mortality risk score prediction in an elderly population using machine learning.

Authors:  Sherri Rose
Journal:  Am J Epidemiol       Date:  2013-01-29       Impact factor: 4.897

3.  Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): a population-based study.

Authors:  Romain Pirracchio; Maya L Petersen; Marco Carone; Matthieu Resche Rigon; Sylvie Chevret; Mark J van der Laan
Journal:  Lancet Respir Med       Date:  2014-11-24       Impact factor: 30.700

4.  Contagious diseases in the United States from 1888 to the present.

Authors:  Willem G van Panhuis; John Grefenstette; Su Yon Jung; Nian Shong Chok; Anne Cross; Heather Eng; Bruce Y Lee; Vladimir Zadorozhny; Shawn Brown; Derek Cummings; Donald S Burke
Journal:  N Engl J Med       Date:  2013-11-28       Impact factor: 91.245

  9 in total

1.  Improved small-sample estimation of nonlinear cross-validated prediction metrics.

Authors:  David Benkeser; Maya Petersen; Mark J van der Laan
Journal:  J Am Stat Assoc       Date:  2019-10-21       Impact factor: 5.033

2.  Collaborative-controlled LASSO for constructing propensity score-based estimators in high-dimensional data.

Authors:  Cheng Ju; Richard Wyss; Jessica M Franklin; Sebastian Schneeweiss; Jenny Häggström; Mark J van der Laan
Journal:  Stat Methods Med Res       Date:  2017-12-11       Impact factor: 3.021

3.  Adaptively stacking ensembles for influenza forecasting.

Authors:  Thomas McAndrew; Nicholas G Reich
Journal:  Stat Med       Date:  2021-10-14       Impact factor: 2.373

Review 4.  Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare.

Authors:  Jean Feng; Rachael V Phillips; Ivana Malenica; Andrew Bishara; Alan E Hubbard; Leo A Celi; Romain Pirracchio
Journal:  NPJ Digit Med       Date:  2022-05-31

5.  Finding hotspots: development of an adaptive spatial sampling approach.

Authors:  Ricardo Andrade-Pacheco; Francois Rerolle; Jean Lemoine; Leda Hernandez; Aboulaye Meïté; Lazarus Juziwelo; Aurélien F Bibaut; Mark J van der Laan; Benjamin F Arnold; Hugh J W Sturrock
Journal:  Sci Rep       Date:  2020-07-02       Impact factor: 4.379

6.  Piloting an integrated SARS-CoV-2 testing and data system for outbreak containment among college students: A prospective cohort study.

Authors:  Laura Packel; Arthur Reingold; Lauren Hunter; Shelley Facente; Yi Li; Anna Harte; Guy Nicolette; Fyodor D Urnov; Michael Lu; Maya Petersen
Journal:  PLoS One       Date:  2021-01-26       Impact factor: 3.240

7.  Emulator-based Bayesian optimization for efficient multi-objective calibration of an individual-based model of malaria.

Authors:  Theresa Reiker; Monica Golumbeanu; Andrew Shattock; Lydia Burgert; Thomas A Smith; Sarah Filippi; Ewan Cameron; Melissa A Penny
Journal:  Nat Commun       Date:  2021-12-10       Impact factor: 14.919

8.  Trend-following with better adaptation to large downside risks.

Authors:  Teruko Takada; Takahiro Kitajima
Journal:  PLoS One       Date:  2022-10-18       Impact factor: 3.752

9.  Optimising treatment decision rules through generated effect modifiers: a precision medicine tutorial.

Authors:  Eva Petkova; Hyung Park; Adam Ciarleglio; R Todd Ogden; Thaddeus Tarpey
Journal:  BJPsych Open       Date:  2019-12-03
