Scalable Bayesian Nonparametric Clustering and Classification.

Yang Ni, Peter Müller, Maurice Diesendruck, Sinead Williamson, Yitan Zhu, Yuan Ji.

Abstract

We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and competitive classification performance relative to widely used competing classifiers. Supplementary materials for this article are available online.
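
The abstract does not spell out the individual steps, so the following is only a rough sketch of the general shape a multi-step, embarrassingly parallel clustering scheme might take. It assumes the data are split into shards, each shard is clustered independently, and the same routine is then reused on summaries of the local clusters to merge them. The names `cluster_1d`, `multi_step_cluster`, `n_shards` and `gap` are hypothetical, and the simple gap rule is a deterministic stand-in for the Markov chain Monte Carlo sampler, not the authors' method.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def cluster_1d(points, gap=1.0):
    """Stand-in for the MCMC sampler (assumption): group sorted 1-D points,
    starting a new cluster whenever the jump to the next point exceeds `gap`."""
    clusters = []
    for x in sorted(points):
        if clusters and x - clusters[-1][-1] <= gap:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return clusters


def multi_step_cluster(data, n_shards=4, gap=1.0):
    """Cluster shards independently, then merge local clusters by
    applying the same routine to their means."""
    # Step 1: split the data into shards; each shard is clustered on its own,
    # so the work can be farmed out without any communication between shards.
    shards = [data[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor() as pool:
        per_shard = pool.map(lambda s: cluster_1d(s, gap), shards)
        local = [c for clusters in per_shard for c in clusters]

    # Step 2: reuse the same routine on the local cluster means.
    means = [fmean(c) for c in local]
    merged = cluster_1d(means, gap)

    # Map each merged group of means back to the original observations.
    result = []
    for group in merged:
        members = [x for c, m in zip(local, means) if m in group for x in c]
        result.append(sorted(members))
    return result


if __name__ == "__main__":
    data = [0.0, 0.2, 0.4, 0.1, 0.3, 10.0, 10.2, 10.4, 10.1, 10.3]
    for cluster in multi_step_cluster(data, n_shards=2):
        print(cluster)
```

Threads are used here only to show that the shard-level calls are independent; with a real sampler, separate processes or machines would be the more natural choice.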

Keywords:  Electronic health records; non-conjugate models; parallel computing; product partition models

Year: 2019; PMID: 32982129; PMCID: PMC7518195; DOI: 10.1080/10618600.2019.1624366

Source DB: PubMed; Journal: J Comput Graph Stat; ISSN: 1061-8600; Impact factor: 2.302


