On optimal selection of summary statistics for approximate Bayesian computation.

Matthew A Nunes, David J Balding.

Abstract

How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary statistics, which in practice are typically chosen on the basis of the investigator's intuition and established practice in the field. We propose two algorithms for automated choice of efficient data summaries. Firstly, we motivate minimization of the estimated entropy of the posterior approximation as a heuristic for the selection of summary statistics. Secondly, we propose a two-stage procedure: the minimum-entropy algorithm is used to identify simulated datasets close to the observed dataset, and each of these is in turn treated as an observed dataset for which the mean root integrated squared error of the ABC posterior approximation is minimized over sets of summary statistics. In a simulation study, we inferred the scaled mutation and recombination parameters, both singly and jointly, from a population sample of DNA sequences. The computationally fast minimum-entropy algorithm showed a modest improvement over existing methods, while our two-stage procedure showed substantial and highly significant further improvement for both univariate and bivariate inferences. We found that the optimal set of summary statistics was highly dataset-specific, suggesting that more generally there may be no globally optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.
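The minimum-entropy heuristic described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: a toy Gaussian model with a uniform prior stands in for the population-genetic model, the four candidate statistics are arbitrary, the entropy is estimated with a k-nearest-neighbour (Kozachenko-Leonenko) estimator with k = 4, and the search over subsets is exhaustive. The names `summarize`, `abc_accept`, `knn_entropy_1d` and `min_entropy_subset` are hypothetical.

```python
import itertools
import numpy as np

# Candidate summary statistics (an arbitrary illustrative choice).
STATS = {"mean": np.mean, "median": np.median, "var": np.var, "max": np.max}

def summarize(data):
    """Map datasets of shape (n_datasets, n_obs) to an (n_datasets, n_stats) array."""
    return np.column_stack([f(data, axis=-1) for f in STATS.values()])

def digamma_int(m):
    """Digamma function at a positive integer m."""
    return -np.euler_gamma + sum(1.0 / j for j in range(1, m))

def knn_entropy_1d(sample, k=4):
    """k-nearest-neighbour entropy estimate for a 1-D sample (k is an assumption)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    dists = np.abs(x[:, None] - x[None, :])
    dists.sort(axis=1)                      # column 0 is the zero self-distance
    rho = np.maximum(dists[:, k], 1e-12)    # distance to the k-th neighbour
    return digamma_int(n) - digamma_int(k) + np.log(2.0) + np.mean(np.log(rho))

def abc_accept(theta, sim_stats, obs_stats, cols, frac=0.05):
    """Rejection ABC: keep the parameters whose chosen statistics are closest."""
    s = sim_stats[:, cols]
    scaled = (s - obs_stats[cols]) / s.std(axis=0)
    dist = np.sqrt((scaled ** 2).sum(axis=1))
    n_keep = max(int(frac * len(theta)), 10)
    return theta[np.argsort(dist)[:n_keep]]

def min_entropy_subset(theta, sim_stats, obs_stats, k=4):
    """Search all non-empty subsets; return (entropy, column indices) of the minimum."""
    n_stats = sim_stats.shape[1]
    best = None
    for size in range(1, n_stats + 1):
        for cols in itertools.combinations(range(n_stats), size):
            accepted = abc_accept(theta, sim_stats, obs_stats, list(cols))
            h = knn_entropy_1d(accepted, k)
            if best is None or h < best[0]:
                best = (h, cols)
    return best

# Toy setup (assumption): data ~ Normal(theta, 1), prior theta ~ Uniform(-5, 5).
rng = np.random.default_rng(0)
n_obs, n_sim = 20, 5000
observed = rng.normal(1.0, 1.0, size=n_obs)
theta = rng.uniform(-5.0, 5.0, size=n_sim)
sims = rng.normal(theta[:, None], 1.0, size=(n_sim, n_obs))

sim_stats = summarize(sims)
obs_stats = summarize(observed[None, :])[0]
entropy, cols = min_entropy_subset(theta, sim_stats, obs_stats)
names = list(STATS)
print("selected statistics:", [names[c] for c in cols])
print("estimated posterior entropy:", entropy)
```

The selected subset depends on the simulated and observed data, so a different random seed may give a different answer, which is in keeping with the abstract's finding that the optimal set is dataset-specific. The second, two-stage procedure is not sketched here.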

Year:  2010        PMID: 20887273     DOI: 10.2202/1544-6115.1576

Source DB:  PubMed          Journal:  Stat Appl Genet Mol Biol        ISSN: 1544-6115


  26 in total

1.  Post-GWAS: where next? More samples, more SNPs or more biology? (Review)

Authors:  P Marjoram; A Zubair; S V Nuzhdin
Journal:  Heredity (Edinb)       Date:  2013-06-12       Impact factor: 3.821

2.  Integrating multiple lines of evidence into historical biogeography hypothesis testing: a Bison bison case study.

Authors:  Jessica L Metcalf; Stefan Prost; David Nogués-Bravo; Eric G DeChaine; Christian Anderson; Persaram Batra; Miguel B Araújo; Alan Cooper; Robert P Guralnick
Journal:  Proc Biol Sci       Date:  2014-01-08       Impact factor: 5.349

3.  Lack of confidence in approximate Bayesian computation model choice.

Authors:  Christian P Robert; Jean-Marie Cornuet; Jean-Michel Marin; Natesh S Pillai
Journal:  Proc Natl Acad Sci U S A       Date:  2011-08-29       Impact factor: 11.205

4.  AABC: approximate approximate Bayesian computation for inference in population-genetic models.

Authors:  Erkan O Buzbas; Noah A Rosenberg
Journal:  Theor Popul Biol       Date:  2014-09-26       Impact factor: 1.570

5.  Using ABC and microsatellite data to detect multiple introductions of invasive species from a single source.

Authors:  A Benazzo; S Ghirotto; S T Vilaça; S Hoban
Journal:  Heredity (Edinb)       Date:  2015-04-29       Impact factor: 3.821

6.  Bayesian learning from marginal data in bionetwork models.

Authors:  Fernando V Bonassi; Lingchong You; Mike West
Journal:  Stat Appl Genet Mol Biol       Date:  2011-10-27

7.  Storytelling and story testing in domestication.

Authors:  Pascale Gerbault; Robin G Allaby; Nicole Boivin; Anna Rudzinski; Ilaria M Grimaldi; J Chris Pires; Cynthia Climer Vigueira; Keith Dobney; Kristen J Gremillion; Loukas Barton; Manuel Arroyo-Kalin; Michael D Purugganan; Rafael Rubio de Casas; Ruth Bollongino; Joachim Burger; Dorian Q Fuller; Daniel G Bradley; David J Balding; Peter J Richerson; M Thomas P Gilbert; Greger Larson; Mark G Thomas
Journal:  Proc Natl Acad Sci U S A       Date:  2014-04-21       Impact factor: 11.205

8.  Fundamentals and Recent Developments in Approximate Bayesian Computation.

Authors:  Jarno Lintusaari; Michael U Gutmann; Ritabrata Dutta; Samuel Kaski; Jukka Corander
Journal:  Syst Biol       Date:  2017-01-01       Impact factor: 15.683

9.  Approximation Bayesian Computation.

Authors:  Paul Marjoram
Journal:  OA Genet       Date:  2013-05-01

10.  Calibrating spatio-temporal models of leukocyte dynamics against in vivo live-imaging data using approximate Bayesian computation.

Authors:  Juliane Liepe; Harriet Taylor; Chris P Barnes; Maxime Huvet; Laurence Bugeon; Thomas Thorne; Jonathan R Lamb; Margaret J Dallman; Michael P H Stumpf
Journal:  Integr Biol (Camb)       Date:  2012-02-10       Impact factor: 2.192
