Anneli Schöniger, Thomas Wöhling, Luis Samaniego, Wolfgang Nowak.
Abstract
Bayesian model selection or averaging objectively ranks a number of plausible, competing conceptual models based on Bayes' theorem. It implicitly performs an optimal trade-off between performance in fitting available data and minimum model complexity. The procedure requires determining Bayesian model evidence (BME), which is the likelihood of the observed data integrated over each model's parameter space. Computing this integral is highly challenging because its dimension equals the number of model parameters. Three classes of techniques to compute BME are available, each with its own challenges and limitations: (1) Exact and fast analytical solutions are limited by strong assumptions. (2) Numerical evaluation quickly becomes infeasible for expensive models. (3) Approximations known as information criteria (ICs), such as the AIC, BIC, or KIC (Akaike, Bayesian, or Kashyap information criterion, respectively), yield contradictory results with regard to model ranking. Our study features a theory-based intercomparison of these techniques. We further assess their accuracy in a simplistic synthetic example where an exact analytical solution exists for some scenarios. In more challenging scenarios, we use a brute-force Monte Carlo integration method as reference. We continue this analysis with a real-world application of hydrological model selection. This is the first benchmarking of the various methods for BME evaluation against true solutions. Results show that BME values from ICs are often heavily biased and that the choice of approximation method substantially influences the accuracy of model ranking. For reliable model selection, bias-free numerical methods should be preferred over ICs whenever computationally feasible.
Keywords: Bayesian model evidence; Bayesian model selection; information criteria
Year: 2014 PMID: 25745272 PMCID: PMC4328146 DOI: 10.1002/2014WR016062
Source DB: PubMed Journal: Water Resour Res ISSN: 0043-1397 Impact factor: 5.240
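For orientation, the quantities named in the abstract can be written out as follows. The notation is an assumption made here for illustration ($M_k$ for model $k$, $\mathbf{u}_k$ for its parameters, $\mathbf{y}_o$ for the observed data) and is not necessarily the paper's own:

```latex
% Bayes' theorem on the model level: posterior model weight of M_k
P(M_k \mid \mathbf{y}_o)
  = \frac{p(\mathbf{y}_o \mid M_k)\, P(M_k)}
         {\sum_{i} p(\mathbf{y}_o \mid M_i)\, P(M_i)}

% Bayesian model evidence (BME): likelihood integrated over the parameter space
p(\mathbf{y}_o \mid M_k)
  = \int p(\mathbf{y}_o \mid M_k, \mathbf{u}_k)\,
         p(\mathbf{u}_k \mid M_k)\, \mathrm{d}\mathbf{u}_k
```

The integral has one dimension per model parameter, which is why its evaluation becomes expensive for models with many parameters.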
Overview of Methods to Evaluate Bayesian Model Evidence
| Evaluation method | Abbreviation | Eq. | Underlying Assumptions | Comp. Effort | Performance in Linear Test Case | Performance in Non-linear Test Cases | Recommended Use |
|---|---|---|---|---|---|---|---|
| Theoretical distribution of BME | - | 9 | Gaussian parameter prior and likelihood, linear model | Negligible | Exact | Not available | Whenever available |
| Normalizing constant of parameter posterior | - | 6 | Conjugate prior, linear model | Negligible | Exact | Not available | Whenever available |
| Kashyap's information criterion, evaluated at MLE | KIC@MLE | 14 | Gaussian parameter posterior, negligible influence of prior | Medium | Relatively accurate (assumptions mildly violated) | Inaccurate | KIC@MAP to be preferred |
| Kashyap's information criterion, evaluated at MAP | KIC@MAP | 15 | Gaussian parameter posterior | Medium | Exact (assumptions fulfilled) | Inaccurate | If assumptions are fulfilled or numerical techniques are too expensive |
| Bayesian information criterion | BIC | 16 | Gaussian parameter posterior, negligible influence of prior | Low | Potentially very inaccurate (depending on actual data set), ignores prior | Potentially very inaccurate (depending on actual data set), ignores prior | Not recommended for BMA |
| Akaike information criterion | AIC | 18 | (not derived as approximation to BME) | Low | Potentially very inaccurate (depending on actual data set), ignores prior | Potentially very inaccurate (depending on actual data set), ignores prior | Not recommended for BMA |
| Corrected Akaike information criterion | AICc | 19 | (not derived as approximation to BME) | Low | Potentially very inaccurate (depending on actual data set), ignores prior | Potentially very inaccurate (depending on actual data set), ignores prior | Not recommended for BMA |
| Simple Monte Carlo integration | MC | 23 | None | Extreme | Slow convergence, but bias-free | Slow convergence, but bias-free | Whenever computationally feasible |
| MC integration with importance sampling | MC IS | 24 | None | High | Faster convergence, but (potentially) biased | Faster convergence, but (potentially) biased | As a more efficient alternative to MC |
| MC integration with posterior sampling | MC PS | 25 | None | High | Even faster convergence, but even more biased (due to harmonic mean approach) | Even faster convergence, but even more biased (due to harmonic mean approach) | Not recommended for BMA |
| Nested sampling | NS | 26 | None | High | Slow convergence for BME (due to uncertainty in prior mass shrinkage), but bias-free | Slow convergence for BME (due to uncertainty in prior mass shrinkage), but bias-free | Promising alternative to MC, more research needed |
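The low-effort information criteria in the table can be sketched in a few lines. The formulas below are the standard textbook definitions of AIC, AICc, and BIC, which may differ in notation from the paper's Eqs. 16, 18, and 19. The function names, the example numbers, and the conversion BME ≈ exp(−IC/2) are assumptions made for illustration; as the table notes, AIC and AICc were not derived as approximations to BME.

```python
import math

def aic(max_log_like, n_par):
    """Akaike information criterion from the maximized log-likelihood."""
    return -2.0 * max_log_like + 2.0 * n_par

def aicc(max_log_like, n_par, n_obs):
    """AIC with the small-sample correction (requires n_obs > n_par + 1)."""
    correction = 2.0 * n_par * (n_par + 1) / (n_obs - n_par - 1)
    return aic(max_log_like, n_par) + correction

def bic(max_log_like, n_par, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * max_log_like + n_par * math.log(n_obs)

def log_bme_from_ic(ic_value):
    """Assumed conversion: treat exp(-IC/2) as an approximation to BME."""
    return -0.5 * ic_value

# Hypothetical inputs: max log-likelihood -10, 2 parameters, 15 data points.
for name, value in [("AIC", aic(-10.0, 2)),
                    ("AICc", aicc(-10.0, 2, 15)),
                    ("BIC", bic(-10.0, 2, 15))]:
    print(f"{name}: {value:.3f}, approx. log BME: {log_bme_from_ic(value):.3f}")
```

None of the three functions takes the parameter prior as input, which is consistent with the "ignores prior" entries in the table.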
Figure 1. Synthetic test case setup. Measurements are marked in black, prior estimates of the linear (L1, L2) and nonlinear (NL2, NL4) models are shown as solid lines, and 95% Bayesian prediction confidence intervals as dashed lines of the respective color.
Definition of Parameters Used in Different Scenarios of the Synthetic Test Case^a
| Parameter | Symbol | Value |
|---|---|---|
| Prior mean | ||
| Prior covariance | ||
| Data set size | Ns | Ns = 15 |
| Meas. error covariance | ||
| Data set size | Ns | |
| Prior covariance | ||
| Prior mean | ||
| Prior mean (L1) | ||
| Prior variance (L1) | s2 | |
| Prior mean (NL2) | ||
| Prior covariance (NL2) | ||
| Prior mean (NL4) | ||
| Prior covariance (NL4) | ||
^a For variations of the base case, only the differences from the base case parameters are listed.
Figure 2. Prior densities (gray), likelihood (orange), and posterior densities (blue) for the different scenarios of the synthetic test case. Contour lines represent 10–90% Bayesian confidence intervals: (a) variations of prior width (fractions of base case variance shown here: 0.5,…,5), (b) variations of prior/likelihood overlap (distance between prior mean and MLE shown here: 0,…,0.3).
Figure 3. Relative error of BME approximation with respect to the analytical solution for the synthetic base case as a function of ensemble size. IC solutions are plotted as horizontal lines, as they do not use realizations for BME evaluation. Results of the numerical evaluation schemes are presented with 95% Bayesian confidence intervals.
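The kind of comparison behind this figure can be sketched for a linear model with Gaussian prior and Gaussian measurement errors, where the BME is available in closed form. The setup below (design matrix `G`, prior, noise level, synthetic data, ensemble sizes) is entirely hypothetical and does not reproduce the paper's test case; it only illustrates simple Monte Carlo integration, that is, averaging the likelihood over samples drawn from the prior.

```python
import numpy as np

def gauss_logpdf(resid, cov):
    """Log density of zero-mean Gaussian residuals (one per row)."""
    resid = np.atleast_2d(resid)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.sum(resid * np.linalg.solve(cov, resid.T).T, axis=1)
    return -0.5 * (cov.shape[0] * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(42)

# Hypothetical linear model y = G @ u + noise (all values are illustrative).
n_obs = 15
G = np.column_stack([np.ones(n_obs), np.linspace(0.0, 1.0, n_obs)])
prior_mean = np.array([1.0, 2.0])
prior_cov = np.eye(2)
meas_cov = 0.5**2 * np.eye(n_obs)
y = G @ np.array([1.3, 1.6]) + 0.5 * rng.standard_normal(n_obs)

# Analytical reference: the data are marginally Gaussian,
# y ~ N(G @ prior_mean, G @ prior_cov @ G.T + meas_cov).
log_bme_exact = gauss_logpdf(y - G @ prior_mean,
                             G @ prior_cov @ G.T + meas_cov)[0]

def log_bme_mc(n_samples):
    """Simple MC integration: mean likelihood over prior samples (in log space)."""
    u = rng.multivariate_normal(prior_mean, prior_cov, size=n_samples)
    log_like = gauss_logpdf(y - u @ G.T, meas_cov)
    peak = log_like.max()  # subtract the peak to avoid underflow in exp
    return peak + np.log(np.mean(np.exp(log_like - peak)))

for n in (1_000, 10_000, 100_000):
    estimate = log_bme_mc(n)
    rel_err = abs(np.exp(estimate - log_bme_exact) - 1.0)
    print(f"ensemble size {n:>7d}: relative BME error {100.0 * rel_err:.2f}%")
```

The relative error typically shrinks as the ensemble grows, but individual runs fluctuate, which is why results of this kind are usually reported with confidence intervals over repeated runs.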
Figure 4. Synthetic test case results as a function of data set size: (a) approximation of BME, (b) relative error with respect to the analytical solution with 95% Bayesian confidence intervals, (c) likelihood term approximation, (d) Occam factor approximation. The result obtained from KIC@MAP represents the analytical solution in this case.
Figure 5. (left) Results obtained for the synthetic test case as a function of prior width and (right) as a function of distance between prior mean and MLE. (a and e) Approximation of BME, (b and f) relative error with respect to the analytical solution, (c and g) likelihood term approximation, (d and h) Occam factor approximation. The result obtained from KIC@MAP represents the analytical solution in this case.
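Why a Gaussian expansion around the MAP can coincide with the analytical solution for a linear Gaussian model can be checked numerically. The sketch below assumes that KIC@MAP corresponds to a Laplace approximation of the log BME, split into a likelihood term and an Occam factor; this reading, the variable names, and the test problem are assumptions made here, not the paper's Eq. 15.

```python
import numpy as np

def gauss_logpdf(resid, cov):
    """Log density of a single zero-mean Gaussian residual vector."""
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (resid.size * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(1)

# Hypothetical linear model y = G @ u + noise (all values are illustrative).
n_obs, n_par = 15, 2
G = np.column_stack([np.ones(n_obs), np.linspace(0.0, 1.0, n_obs)])
prior_mean = np.array([1.0, 2.0])
prior_cov = np.eye(n_par)
meas_cov = 0.5**2 * np.eye(n_obs)
y = G @ np.array([1.3, 1.6]) + 0.5 * rng.standard_normal(n_obs)

# Posterior precision and MAP estimate of the linear Gaussian model.
precision = G.T @ np.linalg.solve(meas_cov, G) + np.linalg.inv(prior_cov)
rhs = G.T @ np.linalg.solve(meas_cov, y) + np.linalg.solve(prior_cov, prior_mean)
u_map = np.linalg.solve(precision, rhs)

# Likelihood term: goodness of fit at the MAP.
log_like_term = gauss_logpdf(y - G @ u_map, meas_cov)

# Occam factor: prior density at the MAP times the posterior "volume".
_, logdet_precision = np.linalg.slogdet(precision)
log_occam = (gauss_logpdf(u_map - prior_mean, prior_cov)
             + 0.5 * n_par * np.log(2.0 * np.pi)
             - 0.5 * logdet_precision)

log_bme_laplace = log_like_term + log_occam
log_bme_exact = gauss_logpdf(y - G @ prior_mean,
                             G @ prior_cov @ G.T + meas_cov)

print(f"Laplace at MAP: {log_bme_laplace:.6f}")
print(f"Analytical:     {log_bme_exact:.6f}")
```

In this setting the posterior is exactly Gaussian, so the two values agree up to rounding. For nonlinear models or non-Gaussian priors the posterior is no longer Gaussian and the same expansion is only an approximation.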
Relative Error of BME Approximation Methods for Different Model Structures as Compared to the Reference Solution (Analytical Solution Equal to the KIC@MAP in Case of Linear Models, Brute-Force MC Integration in Case of Nonlinear Models, Highlighted in Italic Font)^a
| Method | ||||
|---|---|---|---|---|
| KIC@MLE | 0.9 | 30.4 | 24.9 | 99.8 |
| KIC@MAP | 13.2 | 59.4 | ||
| BIC | 94.0 | 21.3 | 37.5 | 70.3 |
| AIC | 176.4 | 59.7 | 179.2 | 22.5 |
| AICc | 137.0 | 3.2 | 69.4 | 83.4 |
| MC | 0.0 | 0.0 | ||
| MC IS | 0.8 [0.0; 2.2] | 0.1 [0.0; 0.4] | 1.1 [0.0; 2.9] | 1.5 [0.1; 4.1] |
| MC PS | 132.0 [25.7; 196.6] | 131.8 [23.5; 221.7] | 324.8 [40.8; 490.3] | 232.8 [18.1; 481.8] |
| NS | 2.4 [0.1; 6.4] | 2.8 [0.4; 27.2] | 4.0 [0.2; 10.7] | 11.2 [3.3; 18.9] |
^a 95% Bayesian confidence intervals of numerical results are given in brackets.
Figure 6. Posterior model weights as obtained from the different BME evaluation methods for linear (L1, L2) and nonlinear (NL2, NL4) models. L2, L2a, and L2b represent the same linear model with differently shaped priors. Green vertical lines indicate the reference solution (obtained from brute-force MC integration).
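Posterior model weights such as those in this figure follow from the BME values through Bayes' theorem. A minimal sketch, assuming log BME values are already available from one of the evaluation methods: the function name, the default of equal prior model weights, and the numbers are hypothetical.

```python
import math

def posterior_model_weights(log_bme, prior_weights=None):
    """Posterior model weights from log BME values via Bayes' theorem."""
    n_models = len(log_bme)
    if prior_weights is None:
        # Assumption: equal prior weights for all competing models.
        prior_weights = [1.0 / n_models] * n_models
    log_post = [lb + math.log(pw) for lb, pw in zip(log_bme, prior_weights)]
    peak = max(log_post)  # shift by the maximum so exp() cannot underflow to all zeros
    unnormalized = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]

# Hypothetical log BME values for three competing models.
weights = posterior_model_weights([-1000.0, -1001.0, -1005.0])
print([round(w, 4) for w in weights])
```

Because the weights depend on differences in log BME, an error of a few log units in a single BME estimate can change the ranking, which is one way to read the spread between methods in the figure.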
Figure 7. Model predictions of discharge for the first year of the calibration time series. (a) mHM1L, (b) mHM2L. Prior 95% Bayesian confidence intervals are shaded in gray, posterior 95% Bayesian confidence intervals in the respective color.
Performance of BME Approximation Methods in Hydrological Test Case as Compared to the Brute-Force MC Reference Solution (Highlighted in Italic Font)
| Method | Full Time Series | | Reduced Time Series | |
|---|---|---|---|---|
| KIC | −5.2 | −7.2 | −6.8 | −8.4 |
| BIC | −103.1 | −120.6 | −79.7 | −92.1 |
| AIC | −15.0 | −20.3 | −23.1 | −27.8 |
| AICc | −25.7 | −31.2 | ||
| MC IS | 1.4 | 3.3 | 1.0 | 1.5 |
| MC PS | 3.1 | 4.8 | 0.8 | 1.0 |
| NS | 1.1 | 0.5 | 0.3 | −0.1 |
Figure 8. Posterior model weights for (a) the full observation time series and (b) the reduced observation time series as obtained from the different BME evaluation methods for models mHM1L (dark blue) and mHM2L (light blue). The green vertical line indicates the reference solution generated by MC integration.