Lawrence H Uricchio, Tandy Warnow, Noah A Rosenberg.
Abstract
BACKGROUND: Many methods for species tree inference require data from a sufficiently large sample of genomic loci in order to produce accurate estimates. However, few studies have attempted to use analytical theory to quantify "sufficiently large".
Keywords: Bipartitions; Coalescent; Gene trees; Species trees
Year: 2016 PMID: 28185570 PMCID: PMC5123308 DOI: 10.1186/s12859-016-1266-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic of a species tree (black) and two gene trees (blue, green). Coalescent events in a gene tree are constrained to occur only once lineages are present in the same population. The red dashed line indicates a species tree bipartition AB|CD, separating species A and B from species C and D. The same bipartition occurs in the blue gene tree; by contrast, the green gene tree does not contain this bipartition, instead containing AD|BC.
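The bipartition idea in the Fig. 1 caption can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: it assumes trees are written as nested Python tuples of leaf names, and the function names `leaves`, `bipartitions`, and `is_bipartition_cover` are hypothetical. It collects the nontrivial bipartitions of a tree and checks whether a set of gene trees contains every bipartition of a species tree, which is our reading of the term "bipartition cover" used in the later captions. The specific gene tree topologies are assumptions chosen only to reproduce the AB|CD and AD|BC bipartitions named in the caption.

```python
from itertools import chain


def leaves(tree):
    """Return the frozenset of leaf labels below a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def bipartitions(tree):
    """Return the nontrivial bipartitions of a tree.

    Each bipartition is a frozenset holding the two sides of the split,
    so AB|CD and CD|AB compare equal. Splits with a single taxon on one
    side are skipped because every tree on the same taxa contains them.
    """
    all_taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        rest = all_taxa - clade
        if len(clade) >= 2 and len(rest) >= 2:
            found.add(frozenset([clade, rest]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def is_bipartition_cover(species_tree, gene_trees):
    """True if every species tree bipartition appears in some gene tree."""
    seen = set()
    for gene_tree in gene_trees:
        seen |= bipartitions(gene_tree)
    return bipartitions(species_tree) <= seen


# Assumed topologies, chosen to match the bipartitions named in Fig. 1.
species = (("A", "B"), ("C", "D"))
blue = (("A", "B"), ("C", "D"))   # contains AB|CD
green = (("A", "D"), ("B", "C"))  # contains AD|BC instead

print(is_bipartition_cover(species, [blue]))
print(is_bipartition_cover(species, [green]))
```

Storing each split as an unordered pair of leaf sets means the check does not depend on where a tree is rooted, which matches the way the caption compares a species tree bipartition with those of the gene trees.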
Fig. 2 Two species tree topologies with four taxa. (a) Symmetric topology. (b) Asymmetric topology. Times T_1 and T_2 denote the species tree internal branch lengths.
Fig. 3 The probability (P) that a random set of n gene trees under the multispecies coalescent is a bipartition cover of a four-taxon asymmetric species tree, as a function of n. Points represent the exact probability computed at each n, for several values of T_1 (Eq. 5).
Fig. 4 Upper bound on the number of gene trees required for a random set of n gene trees to have probability at least q of being a bipartition cover of a k-taxon species tree with smallest internal branch length T_min. The plot uses Eq. 14. (a) q = 0.99. (b) q = 0.99999. The maximal number of independent gene trees in a genome is on the order of 10^4 to 10^5.
Fig. 5 The ratio of the upper bound on the minimum number of gene trees required to obtain a bipartition cover with probability q (Eq. 14) to the corresponding number of simulated gene trees required to obtain a bipartition cover with probability q. The ratio is plotted as a function of q, for several values of the number of species k. (a) T_min = 0.2. (b) T_min = 0.5. (c) T_min = 1.0. The y-axis is plotted on a logarithmic scale. The irregular spacing of q values results from our simulation procedure, in which each q is determined from 10^4 simulations at a fixed n in the set {1, 2, 3, 5, 10, 20, 50, 100, 200, 500}. For some large values of n at a fixed T_min, all 10^4 simulations produced a bipartition cover, meaning that the estimated q equals 1. In these cases, n computed from Eq. 14 is infinite and we do not plot the ratio.
Fig. 6 T_min under the Yule pure birth process for speciation at rate λ speciation events per coalescent time unit. (a) T_min as a function of the number of species k. The y-axis is plotted on a logarithmic scale. (b) The number of gene trees n required in Eq. 14 for obtaining with probability q all species tree bipartitions in a gene tree set, as a function of the T_min values from (a). The value of q is fixed at 0.99. Note that the maximal number of independent gene trees in a genome is approximately 10^4 to 10^5.