Full likelihood inference from the site frequency spectrum based on the optimal tree resolution.

Raazesh Sainudiin, Amandine Véber.

Abstract

We develop a novel importance sampler to compute the full likelihood function of a demographic or structural scenario given the site frequency spectrum (SFS) at a locus free of intra-locus recombination. This sampler, instead of representing the hidden genealogy of a sample of individuals by a labelled binary tree, uses the minimal level of information about such a tree that is needed for the likelihood of the SFS and thus takes advantage of the huge reduction in the size of the state space that needs to be integrated over. We assume that the population may have demographically changed and may be non-panmictically structured, as reflected by the branch lengths and the topology of the genealogical tree of the sample, respectively. We also assume that mutations conform to the infinitely-many-sites model. We achieve this by a controlled Markov process that generates 'particles' in the hidden space of SFS histories which are always compatible with the observed SFS. To produce the particles, we use Aldous' Beta-splitting model for a one-parameter family of prior distributions over genealogical topologies or shapes (including that of the Kingman coalescent) and allow the branch lengths or epoch times to have a parametric family of priors specified by a model of demography (including exponential growth and bottleneck models). Assuming independence across unlinked loci, we can estimate the likelihood of a population scenario based on a large collection of independent SFS by an importance sampling scheme, using the (unconditional) distribution of the genealogies under this scenario when the latter is available. When it is not available, we instead compute the joint likelihood of the tree balance parameter β, assuming that the tree topology follows Aldous' Beta-splitting model, and of the demographic scenario determining the distribution of the inter-coalescence times or epoch times in the genealogy of a sample, in order to at least distinguish different equivalence classes of population scenarios leading to different tree balances and epoch times. Simulation studies are conducted to demonstrate the capabilities of the approach with publicly available code.
Copyright © 2018 Elsevier Inc. All rights reserved.
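
To make the role of the tree balance parameter β concrete, the following minimal Python sketch evaluates the split probabilities of Aldous' Beta-splitting model in its standard form, q_n(i) ∝ Γ(β+i+1)Γ(β+n−i+1) / (Γ(i+1)Γ(n−i+1)) for i = 1, ..., n−1 and β > −2. It is an illustration only and is not taken from the authors' publicly available code; the function name `beta_splitting_probs` is hypothetical. Under this form, β = 0 gives a uniform split size, which corresponds to the tree shape of the Kingman coalescent.

```python
import math


def beta_splitting_probs(n, beta):
    """Return [q_n(1), ..., q_n(n-1)]: the probability that a clade of n
    leaves splits into subclades of sizes i and n - i under Aldous'
    Beta-splitting model (standard form, assumed here; requires beta > -2).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if beta <= -2:
        raise ValueError("beta must be greater than -2")
    # Work with log-weights so that large n does not overflow the Gamma function.
    log_w = [
        math.lgamma(beta + i + 1) + math.lgamma(beta + n - i + 1)
        - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        for i in range(1, n)
    ]
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    # beta = 0: uniform split sizes (Kingman/Yule tree shape).
    print(beta_splitting_probs(5, 0.0))
    # beta = -1.5: uniform distribution over labelled tree shapes,
    # which favours unbalanced splits.
    print(beta_splitting_probs(5, -1.5))
```

Larger values of β concentrate mass on balanced splits (i close to n/2), while values near −2 favour highly unbalanced, comb-like trees, which is why β can serve as a summary of tree balance when the full genealogical distribution of a scenario is unavailable.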

Keywords:  Controlled Markov process on hidden genealogical trees; Importance sampler; Optimal tree resolution; Semi-parametric estimation

Year:  2018        PMID: 30048667     DOI: 10.1016/j.tpb.2018.07.002

Source DB:  PubMed          Journal:  Theor Popul Biol        ISSN: 0040-5809            Impact factor:   1.570