Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling.

Ioanna Manolopoulou, Cliburn Chan, Mike West.

Abstract

One of the challenges in using Markov chain Monte Carlo for model analysis in studies with very large datasets is the need to scan through the whole dataset at each iteration of the sampler, which can be computationally prohibitive. Several approaches have been developed to address this, typically by drawing computationally manageable subsamples of the data. Here we consider the specific case where most of the data from a mixture model provide little or no information about the parameters of interest, and we aim to select subsamples such that the information extracted is most relevant. The motivating application arises in flow cytometry, where several measurements from a vast number of cells are available. Interest lies in identifying specific rare cell subtypes and characterizing them according to their corresponding markers. We present a Markov chain Monte Carlo approach where an initial subsample of the full dataset is used to guide selection sampling of a further set of observations targeted at a scientifically interesting, low-probability region. We define a Sequential Monte Carlo strategy in which the targeted subsample is augmented sequentially as estimates improve, and introduce a stopping rule for determining the size of the targeted subsample. An example from flow cytometry illustrates the ability of the approach to increase the resolution of inferences for rare cell subtypes.
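The two-stage idea in the abstract (a random initial subsample that locates a region of interest, followed by selection sampling concentrated on that region) can be illustrated with a toy one-dimensional sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Gaussian-kernel form of the selection weight, the simulated data, the fixed `gate` used to locate the rare region, and all function names. Any inference on the targeted subsample would also have to account for the selection weights, which this sketch does not attempt.

```python
import math
import random


def selection_weight(x, center, scale):
    """Acceptance probability in (0, 1]: an assumed Gaussian kernel around `center`."""
    return math.exp(-0.5 * ((x - center) / scale) ** 2)


def simulate(n, rare_frac, rng):
    """Toy data: a dominant N(0, 1) component and a rare N(6, 0.5) component.

    Each item is (value, is_rare); the label is kept only to check enrichment.
    """
    data = []
    for _ in range(n):
        if rng.random() < rare_frac:
            data.append((rng.gauss(6.0, 0.5), True))
        else:
            data.append((rng.gauss(0.0, 1.0), False))
    return data


def estimate_center(sample, gate):
    """Crude stand-in for a model fit: mean of sampled values above a hypothetical gate."""
    values = [x for x, _ in sample if x > gate]
    if values:
        return sum(values) / len(values)
    # Fall back to the largest sampled value if nothing passes the gate.
    return max(x for x, _ in sample)


def targeted_subsample(data, center, scale, rng):
    """Single pass over the data: keep each observation with probability w(x)."""
    return [
        (x, is_rare)
        for x, is_rare in data
        if rng.random() < selection_weight(x, center, scale)
    ]


def rare_fraction(sample):
    """Share of observations that truly come from the rare component."""
    return sum(1 for _, is_rare in sample if is_rare) / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(50_000, 0.005, rng)
    initial = rng.sample(data, 2_000)            # stage 1: random subsample
    center = estimate_center(initial, gate=4.0)  # locate the region of interest
    targeted = targeted_subsample(data, center, 1.0, rng)  # stage 2
    print("full data rare fraction:", rare_fraction(data))
    print("targeted size:", len(targeted))
    print("targeted rare fraction:", rare_fraction(targeted))
```

In this toy setting the targeted subsample is far smaller than the full dataset yet consists mostly of observations from the rare component, which is the kind of enrichment the abstract describes for rare cell subtypes.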

Year:  2010        PMID: 20865145      PMCID: PMC2943396     

Source DB:  PubMed          Journal:  Bayesian Anal        ISSN: 1931-6690            Impact factor:   3.728


  5 in total

1.  Understanding GPU Programming for Statistical Computation: Studies in Massively Parallel Massive Mixtures.

Authors:  Marc A Suchard; Quanli Wang; Cliburn Chan; Jacob Frelinger; Andrew Cron; Mike West
Journal:  J Comput Graph Stat       Date:  2010-06-01       Impact factor: 2.302

2.  T-cell quality in memory and protection: implications for vaccine design. (Review)

Authors:  Robert A Seder; Patricia A Darrah; Mario Roederer
Journal:  Nat Rev Immunol       Date:  2008-03-07       Impact factor: 53.106

3.  Automated high-dimensional flow cytometric data analysis.

Authors:  Saumyadipta Pyne; Xinli Hu; Kui Wang; Elizabeth Rossin; Tsung-I Lin; Lisa M Maier; Clare Baecher-Allan; Geoffrey J McLachlan; Pablo Tamayo; David A Hafler; Philip L De Jager; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-14       Impact factor: 11.205

4.  A Sequential Monte Carlo Method for Bayesian Analysis of Massive Datasets.

Authors:  Greg Ridgeway; David Madigan
Journal:  Data Min Knowl Discov       Date:  2003-07-01       Impact factor: 3.670

5.  Statistical mixture modeling for cell subtype identification in flow cytometry.

Authors:  Cliburn Chan; Feng Feng; Janet Ottinger; David Foster; Mike West; Thomas B Kepler
Journal:  Cytometry A       Date:  2008-08       Impact factor: 4.355

  5 in total

1.  Efficient Classification-Based Relabeling in Mixture Models.

Authors:  Andrew J Cron; Mike West
Journal:  Am Stat       Date:  2011-02-01       Impact factor: 8.710

2.  Hierarchical Bayesian mixture modelling for antigen-specific T-cell subtyping in combinatorially encoded flow cytometry studies.

Authors:  Lin Lin; Cliburn Chan; Sine R Hadrup; Thomas M Froesig; Quanli Wang; Mike West
Journal:  Stat Appl Genet Mol Biol       Date:  2013-06

3.  Parameterizing Spatial Models of Infectious Disease Transmission that Incorporate Infection Time Uncertainty Using Sampling-Based Likelihood Approximations.

Authors:  Rajat Malik; Rob Deardon; Grace P S Kwong
Journal:  PLoS One       Date:  2016-01-05       Impact factor: 3.240

4.  Clustering spatio-temporal series of confirmed COVID-19 deaths in Europe.

Authors:  A Bucci; L Ippoliti; P Valentini; S Fontanella
Journal:  Spat Stat       Date:  2021-10-06

5.  SWIFT-scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 2: biological evaluation.

Authors:  Tim R Mosmann; Iftekhar Naim; Jonathan Rebhahn; Suprakash Datta; James S Cavenaugh; Jason M Weaver; Gaurav Sharma
Journal:  Cytometry A       Date:  2014-02-14       Impact factor: 4.355

