| Literature DB >> 33897113 |
Abstract
Given noisy, partial observations of a time-homogeneous, finite-statespace Markov chain, conceptually simple, direct statistical inference is available, in theory, via its rate matrix, or infinitesimal generator, Q , since exp ( Q t ) is the transition matrix over time t. However, perhaps because of inadequate tools for matrix exponentiation in programming languages commonly used amongst statisticians or a belief that the necessary calculations are prohibitively expensive, statistical inference for continuous-time Markov chains with a large but finite state space is typically conducted via particle MCMC or other relatively complex inference schemes. When, as in many applications Q arises from a reaction network, it is usually sparse. We describe variations on known algorithms which allow fast, robust and accurate evaluation of the product of a non-negative vector with the exponential of a large, sparse rate matrix. Our implementation uses relatively recently developed, efficient, linear algebra tools that take advantage of such sparsity. We demonstrate the straightforward statistical application of the key algorithm on a model for the mixing of two alleles in a population and on the Susceptible-Infectious-Removed epidemic model.Entities:
Keywords: Bayesian inference; Likelihood inference; Markov jump process; Matrix exponential
Year: 2021 PMID: 33897113 PMCID: PMC8054858 DOI: 10.1007/s00180-021-01102-6
Source DB: PubMed Journal: Comput Stat ISSN: 0943-4062 Impact factor: 1.000
Time (in units of 31 days), and numbers of susceptibles and infecteds, originally from Raggett (1982). The final rows indicates, for each pair of consecutive observations, the size of the statespace for evaluating the transition probability and the value for the associated rate matrix
| Time | 0 | 0.5 | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 | 4.0 |
|---|---|---|---|---|---|---|---|---|
| S | 254 | 235 | 201 | 153 | 121 | 110 | 97 | 83 |
| I | 7 | 14 | 22 | 29 | 20 | 8 | 8 | 0 |
| - | 261 | 946 | 2059 | 1387 | 289 | 197 | 346 | |
| - | 245 | 867 | 1868 | 1308 | 282 | 181 | 240 | |
| - | 101.5 | 171.4 | 217.1 | 170.1 | 83.1 | 53.6 | 106.3 |
Timings for estimating the full log-likelihood (1000 repeats) and the log-likelihood for the jump from the initial to the final observation (20 repeats) for the Eyam data set, number of sparse vector-matrix multiplications for one repeat, and the accuracies of the estimates. Results are given for the method of Ho et al. (2018) (HCS), the expAtv function in the expm package, which uses the Krylov subspace techniques of Sidje (1998), the method of Al-Mohy and Higham (2011) (AMH), the uniformisation algorithm (Unif) and the scaling and squaring algorithm (SS). The timing for SS on the jump likelihood was estimated from a single repeat
| Algorithm | Full likelihood | Jump likelihood | ||||
|---|---|---|---|---|---|---|
| Time (secs) | Mult | Accuracy | Time (secs) | Mult | Accuracy | |
| HCS | 45.3 | – | 9.7 | – | ||
| expAtv | 558.5 | – | 323.2 | – | ||
| AMH | – | 3701 | - | 14300 | ||
| Unif | 18.72 | 1596 | 15.2 | 3921 | ||
| SS | 1678 | – | 8940 | - | ||
Fig. 1Moran model (left): true values (), observations (), filtering/prediction mean (solid lines) and quantiles (dashed and dotted lines) for a further time of 5000 from data up to and data up to . Swansea measles SIR model (right): posterior median (solid line, with to show the positions) and posterior quantiles for the number of infected people at each real or latent observation time (top) and the cumulative number recovered by that time (bottom)
Number of measles notifications in the Swansea Local Authority area by month (from http://www.wales.nhs.uk/sitesplus/888/page/66389, February 10th 2020)
| Month | 2012 | 2013 | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Oct | Nov | Dec | Jan | Feb | Mar | Apr | May | Jun | Jul | |
| Day number | 0 | 30 | 61 | 92 | 120 | 151 | 181 | 212 | 242 | 273 |
| Notifications | 0 | 10 | 27 | 34 | 59 | 183 | 278 | 56 | 17 | 0 |