Subhadeep Mukhopadhyay, Douglas Fletcher.
Abstract
The two key issues of modern Bayesian statistics are: (i) establishing a principled approach for distilling a statistical prior that is consistent with the given data from an initial believable scientific prior; and (ii) developing a consolidated Bayes-frequentist data analysis workflow that is more effective than either of the two separately. In this paper, we propose the idea of "Bayes via goodness-of-fit" as a framework for exploring these fundamental questions, in a way that is general enough to embrace almost all of the familiar probability models. Several examples, spanning application areas such as clinical trials, metrology, insurance, medicine, and ecology, show the unique benefit of this new point of view as a practical data science tool.
Year: 2018 PMID: 29967358 PMCID: PMC6040203 DOI: 10.1038/s41598-018-28130-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Graphical diagnostic tool: U-functions for (a) rat tumor data; (b) terbinafine and ulcer data; (c) rolling tacks data. A deviation from uniformity (red dotted line) indicates that the default prior contradicts the observed data. The flat U-functions in panel (b) suggest that the default priors, Beta(1.24, 34.7) in the case of the terbinafine data, are consistent with the terbinafine and ulcer data.
Details on the distributions, their conjugate priors, and the resulting marginal and posterior distributions for four familiar distributions (two discrete and two continuous): Binomial, Poisson, Normal, and Exponential.
| Family | Conjugate prior | Marginal | Posterior |
|---|---|---|---|
| Binomial | Beta | Beta-binomial | Beta |
| Poisson | Gamma | Negative binomial | Gamma |
| Normal | Normal | Normal | Normal |
| Exponential | Gamma | Lomax (Pareto type II) | Gamma |
In the marginal of the Poisson-gamma model, p = 1/(1 + β). B(α, β) denotes the normalizing constant of the beta distribution.
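The closed forms behind the four conjugate pairs can be sketched as follows. This is a reconstruction from standard conjugacy results, not a copy of the original table entries: the symbols y, n, θ, σ², and λ, the Normal(α, β²) form of the normal prior, and the rate parametrization of the Gamma prior are all assumptions made here. Under a scale parametrization of the Gamma, the roles of p and 1 − p in the negative binomial marginal swap.

```latex
% Assumed notation: y = observation, theta = parameter, (alpha, beta) = prior hyperparameters.
% Gamma(alpha, beta) is taken with density proportional to theta^(alpha-1) exp(-beta*theta).
\begin{align*}
\text{Binomial}(n,\theta),\ \theta\sim\mathrm{Beta}(\alpha,\beta):\quad
  & f(y) = \binom{n}{y}\frac{B(\alpha+y,\ \beta+n-y)}{B(\alpha,\beta)},
  & \theta \mid y &\sim \mathrm{Beta}(\alpha+y,\ \beta+n-y) \\[4pt]
\text{Poisson}(\theta),\ \theta\sim\mathrm{Gamma}(\alpha,\beta):\quad
  & f(y) = \binom{y+\alpha-1}{y}\,(1-p)^{\alpha}\,p^{y},\quad p=\frac{1}{1+\beta},
  & \theta \mid y &\sim \mathrm{Gamma}(\alpha+y,\ \beta+1) \\[4pt]
\text{Normal}(\theta,\sigma^{2}),\ \theta\sim\mathrm{Normal}(\alpha,\beta^{2}):\quad
  & y \sim \mathrm{Normal}(\alpha,\ \sigma^{2}+\beta^{2}),
  & \theta \mid y &\sim \mathrm{Normal}\bigl(\lambda\alpha+(1-\lambda)y,\ (1-\lambda)\sigma^{2}\bigr),\quad
    \lambda=\frac{\sigma^{2}}{\sigma^{2}+\beta^{2}} \\[4pt]
\text{Exp}(\theta),\ \theta\sim\mathrm{Gamma}(\alpha,\beta):\quad
  & f(y) = \frac{\alpha\,\beta^{\alpha}}{(\beta+y)^{\alpha+1}},
  & \theta \mid y &\sim \mathrm{Gamma}(\alpha+1,\ \beta+y)
\end{align*}
```

With this parametrization the Poisson-gamma posterior mean is (α + y) p, which is linear in y.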
For the insurance data set, estimates of the number of claims expected in the following year from an individual who made y claims during the present year, obtained by five different methods.
| Claims | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Counts | 7840 | 1317 | 239 | 42 | 14 | 4 | 4 | 1 |
| Gamma PEB | 0.164 | 0.398 | 0.633 | 0.87 | 1.10 | 1.34 | 1.57 | 1.80 |
| Robbins’ EB | 0.168 | 0.363 | 0.527 | 1.33 | 1.43 | 6.00 | 1.75 | — |
| Deconvolve | 0.164 | 0.377 | 0.642 | 1.14 | 2.13 | 3.45 | 4.47 | 5.08 |
| NPMLE | 0.168 | 0.362 | 0.534 | 1.24 | 2.21 | 2.53 | 2.58 | 2.58 |
| DS Elastic-Bayes | 0.156 | 0.322 | 0.517 | 0.744 | 1.02 | 1.56 | 3.01 | 5.24 |
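The Robbins' EB row can be reproduced directly from the Counts row. The sketch below assumes a Poisson model for the number of claims and uses the classical Robbins formula, (y + 1) N(y + 1) / N(y), where N(y) is the number of policyholders with y claims; the function and variable names are illustrative and do not come from the paper.

```python
from typing import Dict, Optional

# Counts row of the table: counts[y] = number of policyholders with y claims.
counts: Dict[int, int] = {0: 7840, 1: 1317, 2: 239, 3: 42, 4: 14, 5: 4, 6: 4, 7: 1}


def robbins_estimate(counts: Dict[int, int], y: int) -> Optional[float]:
    """Robbins' estimate (y + 1) * N(y + 1) / N(y) under an assumed Poisson model.

    Returns None when N(y) is zero or N(y + 1) was not observed.
    """
    n_y = counts.get(y, 0)
    n_next = counts.get(y + 1)
    if n_y == 0 or n_next is None:
        return None
    return (y + 1) * n_next / n_y


# One estimate per observed claim count.
estimates = {y: robbins_estimate(counts, y) for y in sorted(counts)}

for y, est in estimates.items():
    print(y, "NA" if est is None else f"{est:.3f}")
```

The estimate depends only on two adjacent counts, which explains the erratic values in the sparse tail (for example 6.00 at y = 5, where N(5) = N(6) = 4) and the missing entry at y = 7.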
Figure 2. Comparisons of the DS(G, m) prior (solid red) with the respective parametric EB (PEB) priors g(θ; α, β) (dashed blue) for the (a) rat tumor data, (b) surgical node data, (c) Navy shipyard data, and (d) insurance data.
Figure 3. Estimated macro-inference summaries, with standard errors obtained by smooth bootstrap. Panel (a) displays the rat tumor data modes, located at 0.034 (±0.016) and 0.156 (±0.016). Panel (b) shows that the estimated unimodal prior of the terbinafine data has its mean at 0.034 (±0.006). Panel (c) presents the modes of the rolling tacks data at 0.55 (±0.022) and 0.77 (±0.018).
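As a rough consistency check on panel (b), assume that the estimated terbinafine prior coincides with the Beta(1.24, 34.7) prior named in Figure 1. The flat U-function suggests this, but the caption does not state it. The mean of a Beta(α, β) distribution then gives:

```latex
\mathbb{E}[\theta] \;=\; \frac{\alpha}{\alpha+\beta} \;=\; \frac{1.24}{1.24+34.7} \;\approx\; 0.0345
```

This agrees with the reported mean of 0.034 up to rounding of the prior parameters.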
Two group partitions of the rat tumor studies based on K-means clustering on the posterior mode predictions (see Section 3.3 and Fig. 5(c)).
| Group | Studies |
|---|---|
| 1 | (0, 20), (0, 20), (0, 20), (0, 20), (0, 20), (0, 20), (0, 20), (0, 19), (0, 19), (0, 19), (0, 19) |
| 2 | (3, 27), (2, 25), (2, 24), (2, 23), (2, 20), (2, 20), (2, 20), (2, 20), (2, 20), (2, 20), (1, 10) |
Figure 5. Comparisons of DS Elastic-Bayes and PEB posterior predictions of the rat tumor data: (a) posterior means, (b) posterior medians, and (c) posterior modes. The vertical red triangles indicate the locations of the modes of the DS prior; the blue triangles denote, respectively, the mean, median, and mode of the parametric (PEB) prior.
Measurements (sorted), along with their uncertainties, from different laboratories in the arsenic data.
| Laboratory | 1 | 2 | 3 | 4 | 5 | … | 25 | 26 | 27 | 28 |
|---|---|---|---|---|---|---|---|---|---|---|
| Measurement | 9.78 | 10.18 | 10.35 | 11.60 | 12.01 | … | 14.70 | 15.00 | 15.10 | 15.50 |
| Uncertainty | 0.30 | 0.46 | 0.07 | 0.78 | 2.62 | … | 0.30 | 1.00 | 0.20 | 1.60 |
Figure 4. Panel (a) shows the U-function, while panel (b) compares the DS prior (solid red) with the PEB prior g(θ; α, β) (dashed blue) for the arsenic data. Based on the estimated macro-inference summary, with standard errors obtained by smooth bootstrap, the best consensus value is the mode, 13.6 (±0.242).
Figure 6. Panel (a) illustrates the prior-data conflict for η = 0.1 versus η = 0.4; ‘*’ denotes 0.3, the true mean of ynew. Panel (b) shows the MSE ratios of PEB to frequentist MLE (PEB/FQ; green) and of PEB to DS (PEB/DS; red) with respect to η. Notice that as more prior-data conflict is introduced, DS outperforms PEB while frequentist MLE performance improves.
Figure 7. Panel (a) shows DS posterior plots of three observations from the surgical node data: (y = 7, n = 32), (y = 3, n = 6), and (y = 17, n = 18). In panels (b) through (f), solid red denotes the DS posterior and dashed blue the PEB posterior. Panel (b) is for the rat tumor data and panel (c) for the Navy shipyard data. The second row shows the posterior distributions for (d) y = 3, (e) y = 6, and (f) y = 8 from the rolling tacks data.
Figure 8. Panel (a) displays the estimated DS(G, m = 4) prior (solid red) with the PEB Gamma prior g(θ; α, β) (dashed blue) for the butterfly data; these results indicate that Fisher’s Gamma-prior guess required some correction. Panel (b) shows estimates of the number of butterfly species caught in the following year by the Gamma PEB, Robbins’ formula, Bayesian deconvolution, NPMLE, and our Elastic-Bayes estimate.