Abstract
We consider the modeling of data generated by a latent continuous-time Markov jump process whose state space is finite but of unknown dimension. Typically in such models the number of states must be pre-specified, and Bayesian inference for a fixed number of states has been studied only recently. In addition, although approaches to this problem have been developed for discrete-time models, no method has been successfully implemented for the continuous-time case. We focus on reversible jump Markov chain Monte Carlo, which allows trans-dimensional moves among different numbers of states and thereby supports Bayesian inference for the unknown number of states. Specifically, we propose an efficient split-combine move that facilitates exploration of the parameter space, and we demonstrate that it can be implemented effectively at scale. We then extend this algorithm to model-based clustering, so that the numbers of states and clusters are both determined during the analysis. The model formulation, inference methodology, and associated algorithm are illustrated by simulation studies. Finally, we apply the method to real data from a Canadian healthcare system in Quebec.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11222-021-10032-8.
Keywords: Bayesian model selection; Continuous-time processes; Hidden Markov models; Markov chain Monte Carlo; Model-based clustering; Reversible jump algorithms
Year: 2021 PMID: 34776654 PMCID: PMC8550639 DOI: 10.1007/s11222-021-10032-8
Source DB: PubMed Journal: Stat Comput ISSN: 0960-3174 Impact factor: 2.559
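To make the latent process in the abstract concrete, the following is a minimal sketch of simulating one path of a continuous-time Markov jump process from a generator matrix. The function name `simulate_ctmc`, the three-state generator `Q`, and the time horizon are illustrative assumptions and are not taken from the paper; the sketch covers only the latent process, not the reversible jump sampler or the split-combine move.

```python
import random


def simulate_ctmc(Q, start, t_end, rng):
    """Simulate one path of a continuous-time Markov jump process on [0, t_end).

    Q is a generator matrix: off-diagonal entries are non-negative jump
    rates and each row sums to zero. Returns a list of (jump_time, state)
    pairs, beginning with (0.0, start).
    """
    path = [(0.0, start)]
    t, state = 0.0, start
    while True:
        rate = -Q[state][state]          # total rate of leaving the current state
        if rate <= 0.0:                  # absorbing state: no further jumps
            break
        t += rng.expovariate(rate)       # exponential holding time
        if t >= t_end:
            break
        # The next state j is chosen with probability q_ij / rate.
        others = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in others]
        state = rng.choices(others, weights=weights)[0]
        path.append((t, state))
    return path


# Hypothetical three-state generator, used for illustration only.
Q = [[-1.0, 0.7, 0.3],
     [0.5, -1.5, 1.0],
     [0.2, 0.8, -1.0]]
path = simulate_ctmc(Q, start=0, t_end=10.0, rng=random.Random(1))
```

In a hidden Markov setting, observations would then be drawn at the observation times from a state-specific distribution (for example Normal or Poisson), so that the path itself is never seen directly.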
Example 5.1: Posterior distribution of the number of hidden states. The true number of states is four
| # of hidden states | Normal | Normal | Normal | Poisson |
|---|---|---|---|---|
| 1 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| 2 | 0.0005 | 0.0001 | 0.0001 | 0.0002 |
| 3 | 0.0002 | 0.0002 | 0.0002 | 0.0001 |
| 4 | 0.4906 | 0.3270 | 0.2534 | 0.6993 |
| 5 | 0.3845 | 0.3927 | 0.3624 | 0.2506 |
| 6 | 0.1175 | 0.2237 | 0.2786 | 0.0482 |
| 7 | 0.0067 | 0.0521 | 0.0928 | 0.0017 |
| 8 | 0.0000 | 0.0040 | 0.0126 | 0.0000 |
| 9 | 0.0000 | 0.0003 | 0.0000 | 0.0000 |
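As a reading aid for the table above, the sketch below computes the posterior mode and posterior mean of the number of states for each column. The probabilities are copied from the table; the column labels are placeholders, since the table itself only distinguishes the columns as three Normal cases and one Poisson case.

```python
states = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Posterior probabilities copied from the Example 5.1 table, one list per column.
posterior = {
    "Normal (1st column)": [0.0001, 0.0005, 0.0002, 0.4906, 0.3845, 0.1175, 0.0067, 0.0000, 0.0000],
    "Normal (2nd column)": [0.0001, 0.0001, 0.0002, 0.3270, 0.3927, 0.2237, 0.0521, 0.0040, 0.0003],
    "Normal (3rd column)": [0.0001, 0.0001, 0.0002, 0.2534, 0.3624, 0.2786, 0.0928, 0.0126, 0.0000],
    "Poisson": [0.0001, 0.0002, 0.0001, 0.6993, 0.2506, 0.0482, 0.0017, 0.0000, 0.0000],
}


def posterior_mode(probs):
    """Number of states with the highest posterior probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return states[best]


def posterior_mean(probs):
    """Posterior mean, renormalised because rounded entries need not sum to 1."""
    total = sum(probs)
    return sum(k * p for k, p in zip(states, probs)) / total


for label, probs in posterior.items():
    print(label, posterior_mode(probs), round(posterior_mean(probs), 3))
```

The mode equals the true value of four in the first Normal column and in the Poisson column, while the other two Normal columns place slightly more mass on five states.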
Fig. 1 Posterior distribution of the number of states for the Normal case with 100 replications for the same dataset
Fig. 2 Posterior distribution of the number of states with different prior specifications. TrPois(3.5) represents the zero-truncated Poisson(3.5). Geom(0.2) represents the geometric distribution with success probability 0.2. TrNegBin(2,0.75) represents the zero-truncated Negative Binomial(2,0.75), and Unif(1,10) represents the discrete uniform distribution
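The four priors named in the Fig. 2 caption can be compared numerically with a short sketch such as the one below. Two parameterisations are assumptions, not taken from the paper: the geometric distribution is taken to be supported on 1, 2, ..., and the negative binomial is taken in its "number of failures before the r-th success" form with success probability p before truncation at zero.

```python
import math


def tr_pois(k, lam):
    """Zero-truncated Poisson pmf on k = 1, 2, ..."""
    if k < 1:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k) / (1.0 - math.exp(-lam))


def geom(k, p):
    """Geometric pmf; support k = 1, 2, ... is an assumption."""
    return p * (1.0 - p) ** (k - 1) if k >= 1 else 0.0


def tr_negbin(k, r, p):
    """Zero-truncated negative binomial pmf for integer r.

    Assumes the untruncated form C(k + r - 1, k) * p**r * (1 - p)**k,
    whose mass at zero is p**r.
    """
    if k < 1:
        return 0.0
    untruncated = math.comb(k + r - 1, k) * p**r * (1.0 - p) ** k
    return untruncated / (1.0 - p**r)


def unif(k, lo, hi):
    """Discrete uniform pmf on lo, ..., hi."""
    return 1.0 / (hi - lo + 1) if lo <= k <= hi else 0.0


priors = {
    "TrPois(3.5)": lambda k: tr_pois(k, 3.5),
    "Geom(0.2)": lambda k: geom(k, 0.2),
    "TrNegBin(2,0.75)": lambda k: tr_negbin(k, 2, 0.75),
    "Unif(1,10)": lambda k: unif(k, 1, 10),
}

# Prior mass on 1 to 10 states under each specification.
for name, pmf in priors.items():
    print(name, [round(pmf(k), 4) for k in range(1, 11)])
```

Under these assumptions the priors differ markedly in how much mass they put on small state counts, which is the kind of sensitivity the figure is meant to probe.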
Example 5.3: Posterior distribution of the number of states (Intercept Only). The true number of states is three
| # of hidden states | Normal | Normal | Normal | Poisson |
|---|---|---|---|---|
| 1 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| 2 | 0.0001 | 0.0002 | 0.0002 | 0.0001 |
| 3 | 0.9823 | 0.9671 | 0.9533 | 0.9765 |
| 4 | 0.0174 | 0.0328 | 0.0465 | 0.0228 |
| 5 | 0.0003 | 0.0000 | 0.0000 | 0.0005 |
Example 5.4: Posterior distribution of the number of clusters with varying states. The true number of clusters is four
| # of cluster | Normal | Normal | Normal | Poisson |
|---|---|---|---|---|
| 1 | 0.0001 | 0.0002 | 0.0001 | 0.0001 |
| 2 | 0.0001 | 0.0003 | 0.0004 | 0.0001 |
| 3 | 0.0092 | 0.0072 | 0.0029 | 0.0151 |
| 4 | 0.8706 | 0.4085 | 0.3390 | 0.8713 |
| 5 | 0.1130 | 0.4354 | 0.2640 | 0.1130 |
| 6 | 0.0070 | 0.1016 | 0.1930 | 0.0004 |
| 7 | 0.0000 | 0.0468 | 0.2006 | 0.0000 |
Application: Posterior distribution of the number of states corresponding to models with and without healthcare utilizations as a covariate in Sects. 6.1.1 and 6.1.2, respectively
| # of states | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| With Covariates | 0.0001 | 0.0002 | 0.0002 | 0.4212 | | 0.0796 | 0.0415 | 0.0045 |
| Without Covariates | 0.0001 | 0.0005 | 0.3863 | | 0.1261 | 0.0064 | 0.0000 | 0.0000 |
The bold value represents the posterior mode of the number of states
Fig. 3 Application: Trace plot for the number of states over 20000 iterations to identify the number of states. The left panel is the observation model with the types of healthcare utilization as covariates, while the right panel is the model without covariates
Application: Exponential of coefficients (Parameters in the GLM for each state)
| Variables | State 1 | State 2 | State 3 | State 4 | State 5 |
|---|---|---|---|---|---|
| Intercept | 2.05 | 3.53 | 4.55 | 6.03 | 7.52 |
| (95% CI) | (1.80,2.34) | (3.00,4.02) | (3.74,5.32) | (5.35,6.87) | (7.02,8.14) |
| ED | 1.03 | 1.00 | 0.99 | 0.98 | 0.97 |
| (95% CI) | (1.00,1.06) | (0.98,1.02) | (0.97,1.00) | (0.97,0.99) | (0.95,0.99) |
| HOSP | 1.00 | 1.00 | 0.99 | 0.98 | 0.96 |
| (95% CI) | (0.95,1.05) | (0.95,1.02) | (0.95,1.01) | (0.97,1.00) | (0.92,0.99) |
| SPEC | 1.02 | 0.98 | 0.96 | 0.97 | 0.93 |
| (95% CI) | (0.98,1.06) | (0.96,1.01) | (0.94,0.98) | (0.95,0.98) | (0.89,0.96) |
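One way to read the exponentiated coefficients above, assuming the state-specific GLM uses a log link (as a Poisson regression for drug counts would), is that each exponentiated coefficient acts multiplicatively on the expected count. The sketch below applies this to the State 5 column; the covariate values are hypothetical and the helper name `expected_count` is not from the paper.

```python
# Exponentiated coefficients for State 5, copied from the table above.
state5 = {"Intercept": 7.52, "ED": 0.97, "HOSP": 0.96, "SPEC": 0.93}


def expected_count(exp_coef, covariates):
    """Expected count under an assumed log link.

    mean = exp(b0) * prod_j exp(b_j) ** x_j, where exp_coef holds the
    exponentiated coefficients and covariates maps names to values x_j.
    """
    mean = exp_coef["Intercept"]
    for name, value in covariates.items():
        mean *= exp_coef[name] ** value
    return mean


# Hypothetical covariate values, chosen only to illustrate the calculation.
baseline = expected_count(state5, {})
with_use = expected_count(state5, {"ED": 2, "HOSP": 1, "SPEC": 0})
print(round(baseline, 2), round(with_use, 2))
```

With all covariates at zero the expected count is the exponentiated intercept; values below 1 for ED, HOSP, and SPEC shrink that baseline as the corresponding covariate increases.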
Application: Expected number of drugs for the intercept-only model over the time spent in each state
| | State 1 | State 2 | State 3 | State 4 |
|---|---|---|---|---|
| Expected # of Drug Prescribed | 3.19 | 4.00 | 4.75 | 5.90 |
| (95% CI) | (2.89,3.31) | (3.29,4.58) | (4.53,5.84) | (5.85,6.05) |
Application: Posterior distribution of the number of clusters, and of the numbers of states conditional on three-cluster iterations
| | Number of clusters | Number of states: Cluster 1 | Number of states: Cluster 2 | Number of states: Cluster 3 |
|---|---|---|---|---|
| 1 | 0.0058 | 0.0000 | 0.0000 | 0.0000 |
| 2 | 0.3678 | 0.0000 | | |
| 3 | | 0.0418 | 0.2447 | 0.1566 |
| 4 | 0.0820 | | 0.0140 | 0.1627 |
| 5 | 0.0072 | 0.0002 | 0.0000 | 0.0269 |
| 6 | 0.0014 | 0.0000 | 0.0000 | 0.0058 |
Bold values represent the posterior modes of the numbers of clusters and states
Fig. 4 Application: Trace plots for the number of clusters and for the number of states on three-cluster iterations
Application: Expected number of drugs for the three-cluster Poisson model
| | | State 1 | State 2 | State 3 | State 4 |
|---|---|---|---|---|---|
| Cluster 1 | # of Drug Prescribed | 2.04 | 3.38 | 4.79 | 6.43 |
| | (95% CI) | (1.93,2.26) | (3.04,3.77) | (4.42,5.28) | (6.12,6.79) |
| Cluster 2 | # of Drug Prescribed | 3.72 | 6.62 | | |
| | (95% CI) | (3.13,4.48) | (4.19,7.74) | | |
| Cluster 3 | # of Drug Prescribed | 3.15 | 5.61 | | |
| | (95% CI) | (2.48,5.47) | (4.09,7.66) | | |