Mikael Sunnåker, Alberto Giovanni Busetto, Elina Numminen, Jukka Corander, Matthieu Foll, Christophe Dessimoz.
Abstract
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support the data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity in recent years, in particular for the analysis of complex problems arising in the biological sciences (e.g., in population genetics, ecology, epidemiology, and systems biology).
Year: 2013 PMID: 23341757 PMCID: PMC3547661 DOI: 10.1371/journal.pcbi.1002803
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Parameter estimation by Approximate Bayesian Computation: a conceptual overview.
Figure 2. A dynamic bistable hidden Markov model.
Example of ABC rejection algorithm.
| i | θ | Simulated Datasets (Step 2) | Summary Statistic ω | Distance ρ | Outcome (Step 4) |
| 1 | 0.08 | AABAAAABAABAAABAAAAA | 8 | 2 | accepted |
| 2 | 0.68 | | 13 | 7 | rejected |
| 3 | 0.87 | | 9 | 3 | rejected |
| 4 | 0.43 | | 6 | 0 | accepted |
| 5 | 0.53 | | 9 | 3 | rejected |
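The acceptance logic in the table can be checked with a short sketch. It rests on assumptions read off the table rather than stated in it: ω is taken to be the number of switches between states A and B (which gives 8 for the one sequence shown), the observed statistic is taken to be 6 (the value at which the listed distance is 0), and the tolerance is taken to be ε = 2 (distance 2 is accepted, distance 3 is rejected). The simulator `simulate`, the uniform prior, and all function names are hypothetical stand-ins, not the model of Figure 2.

```python
import random

def count_switches(seq):
    """Summary statistic (assumed): number of adjacent positions whose states differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

def outcome(omega, omega_obs, eps):
    """Accept a draw when the distance |omega - omega_obs| is within the tolerance eps."""
    return "accepted" if abs(omega - omega_obs) <= eps else "rejected"

def simulate(theta, length=20, rng=random):
    """Hypothetical simulator: a two-state chain that switches state with probability theta."""
    state = "A"
    out = [state]
    for _ in range(length - 1):
        if rng.random() < theta:
            state = "B" if state == "A" else "A"
        out.append(state)
    return "".join(out)

def abc_rejection(omega_obs, eps, n, rng):
    """Rejection ABC: sample theta from the prior, simulate, keep theta if close enough."""
    accepted = []
    for _ in range(n):
        theta = rng.random()  # uniform prior on [0, 1) (assumption)
        omega = count_switches(simulate(theta, rng=rng))
        if abs(omega - omega_obs) <= eps:
            accepted.append(theta)
    return accepted

OMEGA_OBS, EPS = 6, 2  # values inferred from the table, not given explicitly

# Reproduce the summary statistic of row 1 and the outcomes of rows 1 to 5.
print(count_switches("AABAAAABAABAAABAAAAA"))  # 8
for omega in (8, 13, 9, 6, 9):
    print(omega, abs(omega - OMEGA_OBS), outcome(omega, OMEGA_OBS, EPS))

# A larger run with the hypothetical simulator yields an approximate posterior sample.
sample = abc_rejection(OMEGA_OBS, EPS, 2000, random.Random(0))
print(len(sample), sum(sample) / len(sample))
```

With only five draws, as in the table, the accepted values give a very coarse picture of the posterior; the larger run illustrates how the sample fills in as the number of draws grows.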
Figure 3. Posterior of θ obtained in the example (red), compared with the true posterior distribution (black), and ABC simulations with large n.
The use of the insufficient summary statistic ω introduces a bias, even when requiring ε = 0 (light green).
Potential risks and remedies in ABC-based statistical inference.
| Error Source | Potential Issue | Solution | Subsection |
| Nonzero tolerance ε | The inexactness introduces a bias in the computed posterior distribution. | Theoretical/practical studies of the sensitivity of the posterior distribution to the tolerance. Noisy ABC. | Approximation of the posterior |
| Nonsufficient summary statistics | The information loss causes inflated credible intervals. | Automatic selection/semi-automatic identification of sufficient statistics. Model validation checks (e.g., Templeton 2009). | Choice and sufficiency of summary statistics |
| Small number of models/mis-specified models | The investigated models are not representative/lack predictive power. | Careful selection of models. Evaluation of the predictive power. | Small number of models |
| Priors and parameter ranges | Conclusions may be sensitive to the choice of priors. Model choice may be meaningless. | Check sensitivity of Bayes factors to the choice of priors. Some theoretical results regarding choice of priors are available. Use alternative methods for model validation. | Prior distribution and parameter ranges |
| Curse-of-dimensionality | Low parameter acceptance rates. Model errors cannot be distinguished from an insufficient exploration of the parameter space. Risk of overfitting. | Methods for model reduction if applicable. Methods to speed up the parameter exploration. Quality controls to detect overfitting. | Curse-of-dimensionality |
| Model ranking with summary statistics | The computation of Bayes factors on summary statistics may not be related to the Bayes factors on the original data, which may therefore render the results meaningless. | Only use summary statistics that fulfill the necessary and sufficient conditions to produce a consistent Bayesian model choice. Use alternative methods for model validation. | Bayes factor with ABC and summary statistics |
| Implementation | Little protection against common assumptions in the simulation and the inference process. | Sanity checks of results. Standardization of software. | Indispensable quality controls |
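The first row of the table recommends studying how sensitive the posterior is to the tolerance ε. A minimal sketch of such a check follows. The simulator (a count of switches over 19 transitions, each occurring with probability θ), the uniform prior, the observed value 6, and the names `simulate_switches` and `tolerance_sweep` are all illustrative assumptions, not part of the source.

```python
import random
import statistics

def simulate_switches(theta, transitions, rng):
    """Hypothetical simulator: number of switches in a chain that switches with probability theta."""
    return sum(rng.random() < theta for _ in range(transitions))

def tolerance_sweep(omega_obs, tolerances, n_draws=20000, transitions=19, seed=1):
    """Run rejection ABC once and filter the same draws at several tolerances."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        theta = rng.random()  # uniform prior on [0, 1) (assumption)
        draws.append((theta, simulate_switches(theta, transitions, rng)))

    results = {}
    for eps in tolerances:
        kept = [t for t, w in draws if abs(w - omega_obs) <= eps]
        if kept:
            results[eps] = (len(kept) / n_draws,
                            statistics.mean(kept),
                            statistics.pstdev(kept))
        else:
            results[eps] = (0.0, float("nan"), float("nan"))
    return results

# Acceptance rate, posterior mean, and posterior spread for each tolerance.
for eps, (rate, mean, sd) in tolerance_sweep(6, (0, 2, 5)).items():
    print(f"eps={eps}: acceptance={rate:.3f} mean={mean:.3f} sd={sd:.3f}")
```

Filtering one shared set of draws keeps the comparison across tolerances free of extra simulation noise. In this toy setting a larger ε raises the acceptance rate but widens the approximate posterior, which is the trade-off the table describes.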
Software incorporating ABC.
| Software | Keywords and Features | Reference |
| DIY-ABC | Software for fit of genetic data to complex situations. Comparison of competing models. Parameter estimation. Computation of bias and precision measures for a given model and known parameter values. | |
| ABC R package | Several ABC algorithms for performing parameter estimation and model selection. Nonlinear heteroscedastic regression methods for ABC. Cross-validation tool. | |
| ABC-SysBio | Python package. Parameter inference and model selection for dynamical systems. Combines ABC rejection sampler, ABC SMC for parameter inference, and ABC SMC for model selection. Compatible with models written in Systems Biology Markup Language (SBML). Deterministic and stochastic models. | |
| ABCtoolbox | Open source programs for various ABC algorithms including rejection sampling, MCMC without likelihood, a particle-based sampler, and ABC-GLM. Compatibility with most simulation and summary statistics computation programs. | |
| msBayes | Open source software package consisting of several C and R programs that are run with a Perl “front-end.” Hierarchical coalescent models. Population genetic data from multiple co-distributed species. | |
| PopABC | Software package for inference of the pattern of demographic divergence. Coalescent simulation. Bayesian model choice. | |
| ONeSAMP | Web-based program to estimate the effective population size from a sample of microsatellite genotypes. Estimates of effective population size, together with 95% credible limits. | |
| ABC4F | Software for estimation of F-statistics for dominant data. | |
| 2BAD | Two-event Bayesian ADmixture. Software allowing up to two independent admixture events with up to three parental populations. Estimation of several parameters (admixture, effective sizes, etc.). Comparison of pairs of admixture models. | |