Yang Ni, Peter Müller, Maurice Diesendruck, Sinead Williamson, Yitan Zhu, Yuan Ji.
Abstract
We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and competitive classification performance relative to other widely used classifiers. Supplementary materials for this article are available online.
Keywords: Electronic health records; non-conjugate models; parallel computing; product partition models
Year: 2019; PMID: 32982129; PMCID: PMC7518195; DOI: 10.1080/10618600.2019.1624366
Source DB: PubMed; Journal: J Comput Graph Stat; ISSN: 1061-8600; Impact factor: 2.302