Yao-Ban Chan, Qiuyi Li, Celine Scornavacca.
Abstract
Summary methods seek to infer a species tree from a set of gene trees. A desirable property of such methods is statistical consistency: the probability of inferring the wrong species tree (the error probability) tends to 0 as the number of input gene trees becomes large. A popular paradigm is to infer a species tree that agrees with the maximum number of quartets from the input set of gene trees; this has been proved to be statistically consistent under several models of gene evolution. In this paper, we study the asymptotic behaviour of the error probability of such methods in this limit, and show that it decays exponentially. For a 4-taxon species tree, we derive a closed form for the asymptotic behaviour in terms of the probability that the gene evolution process produces the correct topology. We also derive bounds for the sample complexity (the number of gene trees required to infer the true species tree with a given probability), which outperform existing bounds. We then extend our results to bounds for the asymptotic behaviour of the error probability for any species tree, and compare these to the true error probability for some model species trees using simulations.
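To make the 4-taxon setting concrete, the following is a minimal sketch of how the error probability of a quartet plurality vote could be computed exactly for small N. It is not the paper's derivation. It assumes that each gene tree independently shows the correct quartet topology with probability p and each of the two incorrect topologies with probability (1 - p)/2 (as holds under the multispecies coalescent, where p = 1 - (2/3)e^{-l} for an internal branch of length l in coalescent units), and that ties are broken uniformly at random. The function names `msc_correct_prob` and `error_probability` are hypothetical.

```python
import math


def msc_correct_prob(branch_length):
    """Probability that a gene quartet matches the species quartet under
    the multispecies coalescent, for an internal branch of the given
    length in coalescent units: 1 - (2/3) * exp(-l)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-branch_length)


def error_probability(n_genes, p):
    """Exact probability that a plurality vote over n_genes quartets does
    not return the correct topology.

    Assumptions (not taken from the paper): the two wrong topologies are
    equally likely, gene trees are independent, ties are broken uniformly
    at random, and 0 < p < 1.
    """
    q = (1.0 - p) / 2.0
    log_p, log_q = math.log(p), math.log(q)
    log_n_fact = math.lgamma(n_genes + 1)
    total = 0.0
    # Enumerate all counts (a, b, c): a correct quartets, b and c wrong ones.
    for a in range(n_genes + 1):
        for b in range(n_genes - a + 1):
            c = n_genes - a - b
            # Log of the multinomial probability of observing (a, b, c).
            log_prob = (log_n_fact
                        - math.lgamma(a + 1)
                        - math.lgamma(b + 1)
                        - math.lgamma(c + 1)
                        + a * log_p + (b + c) * log_q)
            top = max(a, b, c)
            if a < top:
                wrong = 1.0  # a wrong topology wins outright
            else:
                winners = (a == top) + (b == top) + (c == top)
                wrong = 1.0 - 1.0 / winners  # random tie-break
            total += math.exp(log_prob) * wrong
    return total


if __name__ == "__main__":
    p = msc_correct_prob(0.5)
    for n in (10, 50, 100, 200):
        err = error_probability(n, p)
        # -log(err)/n is an empirical estimate of the exponential decay rate.
        print(n, err, -math.log(err) / n)
```

The double loop costs O(N^2) operations, which is adequate for a few hundred gene trees; the printed ratio -log(err)/N gives a rough empirical view of the exponential decay that the abstract describes.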
Keywords: Asymptotic behaviour; Multispecies coalescent; Sample complexity; Species tree
Year: 2022 PMID: 35976512 PMCID: PMC9385842 DOI: 10.1007/s00285-022-01786-4
Source DB: PubMed Journal: J Math Biol ISSN: 0303-6812 Impact factor: 2.164
Fig. 1 The function f(p)
Fig. 2 The logarithm of the error probability (for a 4-taxon species tree) vs N, for fixed p (fixed l under the MSC). We show the asymptotic behaviour (black), our bounds (blue), and the upper bound of Shekhar et al. (2017) (red) (color figure online)
Fig. 3 The sample complexity of ASTRAL (for a 4-taxon species tree) vs , for fixed . We show the asymptotic behaviour (black), our bounds (blue), and the bounds of Shekhar et al. (2017) (red) (color figure online)
Fig. 4 The three model trees. The branch that provides the lower bound is highlighted in red (color figure online)
Fig. 5 Growth constants for the three model trees, with our asymptotic bounds (blue) and the upper bound of Shekhar et al. (2017) (red) (color figure online)