
A Sequential Monte Carlo Method for Bayesian Analysis of Massive Datasets.

Greg Ridgeway, David Madigan.

Abstract

Markov chain Monte Carlo (MCMC) techniques revolutionized statistical practice in the 1990s by providing an essential toolkit for making the rigor and flexibility of Bayesian analysis computationally practical. At the same time, the increasing prevalence of massive datasets and the expansion of the field of data mining have created the need for statistically sound methods that scale to these large problems. Except for the most trivial examples, current MCMC methods require a complete scan of the dataset for each iteration, eliminating their candidacy as feasible data mining techniques.

In this article we present a method for making Bayesian analysis of massive datasets computationally feasible. The algorithm simulates from a posterior distribution that conditions on a smaller, more manageable portion of the dataset. The remainder of the dataset may be incorporated by reweighting the initial draws using importance sampling. Computation of the importance weights requires a single scan of the remaining observations. While importance sampling increases efficiency in data access, it comes at the expense of estimation efficiency. A simple modification, based on the "rejuvenation" step used in particle filters for dynamic systems models, sidesteps the loss of efficiency with only a slight increase in the number of data accesses.

To show proof of concept, we demonstrate the method on two examples. The first is a mixture of transition models that has been used to model web traffic and robotics. For this example we show that estimation efficiency is not affected while offering a 99% reduction in data accesses. The second example applies the method to Bayesian logistic regression and yields a 98% reduction in data accesses.
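The subset-then-reweight idea in the abstract can be sketched in a few lines. This toy example uses a hypothetical conjugate model (normal data with known variance and a normal prior on the mean), not the paper's mixture-of-transition-models or logistic-regression examples: draws come from the posterior given a small subset, and a single pass over the remaining observations accumulates each draw's log importance weight.

```python
import numpy as np

# Toy illustration of the subset-posterior + importance-reweighting scheme.
# Hypothetical model: x_i ~ N(theta, sigma^2) with sigma known,
# prior theta ~ N(prior_mu, prior_sd^2). Not the paper's example models.
rng = np.random.default_rng(0)

sigma = 1.0                       # known observation noise
prior_mu, prior_sd = 0.0, 10.0    # weak normal prior on theta
data = rng.normal(2.0, sigma, size=2_000)

# Step 1: sample from the posterior conditioned on a manageable subset.
subset, remainder = data[:100], data[100:]
n1 = len(subset)
v1 = 1.0 / (1.0 / prior_sd**2 + n1 / sigma**2)          # subset posterior variance
m1 = v1 * (prior_mu / prior_sd**2 + subset.sum() / sigma**2)
draws = rng.normal(m1, np.sqrt(v1), size=2_000)

# Step 2: one scan of the remaining observations computes the log importance
# weight of each draw -- the log-likelihood of the remainder at that draw.
log_w = np.zeros_like(draws)
for x in remainder:               # single pass over the rest of the data
    log_w += -0.5 * ((x - draws) / sigma) ** 2
log_w -= log_w.max()              # stabilize before exponentiating
w = np.exp(log_w)
w /= w.sum()

# The reweighted draws approximate the full-data posterior; compare the
# weighted mean against the closed-form full-data posterior mean.
est = float(np.sum(w * draws))
v_full = 1.0 / (1.0 / prior_sd**2 + len(data) / sigma**2)
m_full = v_full * (prior_mu / prior_sd**2 + data.sum() / sigma**2)
print(round(est, 3), round(m_full, 3))
```

The "rejuvenation" modification mentioned in the abstract would, when the weights degenerate, resample draws in proportion to their weights and refresh them with a short MCMC move over a modest amount of additional data; that step is omitted from this sketch.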

Year:  2003        PMID: 19789656      PMCID: PMC2753529          DOI: 10.1023/A:1024084221803

Source DB:  PubMed          Journal:  Data Min Knowl Discov        ISSN: 1384-5810            Impact factor:   3.670


References:  2 in total

1.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.

Authors:  S Geman; D Geman
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  1984-06       Impact factor: 6.226

2.  An Equivalence Between Sparse Approximation and Support Vector Machines.

Authors: 
Journal:  Neural Comput       Date:  1998-07-28       Impact factor: 2.026

Cited by:  5 in total

1.  Sequential updating of a new dynamic pharmacokinetic model for caffeine in premature neonates.

Authors:  Sandrine Micallef; Billy Amzal; Véronique Bach; Karen Chardon; Pierre Tourneux; Frédéric Y Bois
Journal:  Clin Pharmacokinet       Date:  2007       Impact factor: 6.447

2.  (Review) Three case studies in the Bayesian analysis of cognitive models.

Authors:  Michael D Lee
Journal:  Psychon Bull Rev       Date:  2008-02

3.  Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling.

Authors:  Ioanna Manolopoulou; Cliburn Chan; Mike West
Journal:  Bayesian Anal       Date:  2010       Impact factor: 3.728

4.  Reuse, Recycle, Reweigh: Combating Influenza through Efficient Sequential Bayesian Computation for Massive Data.

Authors:  Jennifer A Tom; Janet S Sinsheimer; Marc A Suchard
Journal:  Ann Appl Stat       Date:  2010       Impact factor: 2.083

5.  Clustering algorithms: A comparative approach.

Authors:  Mayra Z Rodriguez; Cesar H Comin; Dalcimar Casanova; Odemir M Bruno; Diego R Amancio; Luciano da F Costa; Francisco A Rodrigues
Journal:  PLoS One       Date:  2019-01-15       Impact factor: 3.240

