Abstract
MOTIVATION: A gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree with branch lengths, the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first, due to Degnan and Salter, enumerates all the so-called coalescent histories for the given species tree and gene tree topology; in general it runs in time exponential in the number of gene lineages. The second is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all cases.
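To make the notion of gene tree probability concrete, the following is a minimal sketch of the smallest non-trivial case, not the CompactCH or STELLS algorithm. It assumes the standard textbook formula for the probability that i lineages coalesce into j lineages within time t (in coalescent units), and a hypothetical species tree ((A,B):t, C) with one lineage sampled per species. The function names are illustrative only.

```python
from math import exp, factorial


def lineage_transition_prob(i, j, t):
    """Probability that i lineages coalesce into j lineages within time t
    (coalescent units), using the standard closed-form sum."""
    if not 1 <= j <= i:
        return 0.0
    total = 0.0
    for k in range(j, i + 1):
        # Product term of the closed-form expression.
        prod = 1.0
        for y in range(k):
            prod *= (j + y) * (i - y) / (i + y)
        coeff = (2 * k - 1) * (-1) ** (k - j) / (
            factorial(j) * factorial(k - j) * (j + k - 1)
        )
        total += exp(-k * (k - 1) * t / 2.0) * coeff * prod
    return total


def three_taxon_match_prob(t):
    """Probability that the gene tree topology matches the assumed species
    tree ((A,B):t, C) with one lineage per species.

    Either the A and B lineages coalesce on the internal branch, or both
    survive to the root, where 1 of the 3 possible pairs coalesces first.
    """
    return lineage_transition_prob(2, 1, t) + lineage_transition_prob(2, 2, t) / 3.0
```

With more lineages per species, the number of ways the coalescences can be placed on species tree branches grows quickly, which is why the exact algorithms described above take exponential time in general.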
Year: 2016 PMID: 27307621 PMCID: PMC4908345 DOI: 10.1093/bioinformatics/btw261
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. A gene tree (thin lines) inside the species tree (thick lines) is shown in (a). Gene lineages a1, a2 and a3 originate from species A, b1 and b2 from B, and c1 from C. The species tree is shown separately in (b) and the gene tree in (c). Internal nodes of both trees are labeled. Coalescents: internal nodes of the gene tree. Branches of trees are represented by their lower nodes.
The list of all compact coalescent histories for the species tree and the gene tree in Figure 1
| CCH | #histories |
|---|---|
| {3,2,1,5} | 1 |
| {3,2,1,4} | 2 |
| {3,2,1,3} | 2 |
| {3,2,1,2} | 1 |
| {2,2,1,4} | 1 |
| {2,2,1,3} | 2 |
| {2,2,1,2} | 1 |
CCH: compact coalescent history (the numbers are the upper lineage counts for species tree branches A, B, C and D). For each compact history, #histories is the number of coalescent histories that are merged into it.
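The table can be transcribed directly to show how much merging takes place in this small example. The snippet below is only a tally of the listed values; the variable names are illustrative, and it does not enumerate the histories themselves.

```python
# Compact coalescent history -> number of coalescent histories merged into it.
# Each tuple holds the upper lineage counts for species tree branches A, B, C and D,
# copied from the table above.
cch_table = {
    (3, 2, 1, 5): 1,
    (3, 2, 1, 4): 2,
    (3, 2, 1, 3): 2,
    (3, 2, 1, 2): 1,
    (2, 2, 1, 4): 1,
    (2, 2, 1, 3): 2,
    (2, 2, 1, 2): 1,
}

num_compact = len(cch_table)             # distinct compact histories
num_histories = sum(cch_table.values())  # coalescent histories before merging

print(f"{num_histories} coalescent histories merge into {num_compact} compact histories")
```

For the trees in Figure 1, the listed coalescent histories therefore collapse into fewer compact histories, each identified only by its lineage counts.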
Running time of CompactCH (outside the parentheses) and the STELLS algorithm (inside the parentheses) for computing the gene tree probability for 500 simulated gene trees
| m | 1 | 2 | 5 | 10 | 15 | 20 | 50 | 100 |
|---|---|---|---|---|---|---|---|---|
| 2 | <1 (<1) | <1 (<1) | 1 ( | 3 ( | 20 (453) | 94 (21 516) | 21 974 (—) | 1 166 613 (—) |
| 3 | <1 (<1) | 1 (<1) | 11 ( | 2062 (18747) | 83 272 (—) | — (—) | — (—) | — (—) |
| 4 | <1 ( | 1 ( | 1634 (218) | — (—) | — (—) | — (—) | — (—) | — (—) |
Results are not given when the computation takes longer than 15 days. Rows: m, the number of populations. Columns: g, the number of gene alleles per population. Time is in seconds.
Running time of CompactCH (outside the parentheses) and the STELLS algorithm (inside the parentheses) for computing the gene tree probability for 50 simulated gene trees with g = 1 (i.e. 1 allele per population)
| 2 | 3 | 4 | 5 | 10 | 15 |
|---|---|---|---|---|---|
| <1 (<1) | <1 (<1) | <1 (<1) | <1 (<1) | 3 ( | 208 ( |
Columns: number of populations. Time: in seconds.
Accuracy and time for inferring population trees using pairwise population distances
| Ht | Accuracy/Time | g = 2 | g = 4 | g = 10 | g = 20 | g = 30 |
|---|---|---|---|---|---|---|
| 0.1 | Inf. error | 0.38 (0.47) STELLS: 0.30 | 0.20 (0.34) | 0.20 (0.24) | 0.11 (0.18) | 0.11 (0.16) |
| 0.1 | Time | 8 s (44 h, 18 m, 8 s) | 33 s | 21 m, 49 s | 7 h, 2 m, 4 s | 36 h, 42 m, 49 s |
| 0.5 | Inf. error | 0.23 (0.22) STELLS: 0.14 | 0.20 (0.15) | 0.10 (0.12) | 0.12 (0.14) | 0.12 (0.13) |
| 0.5 | Time | 6 s (25 h, 30 m, 37 s) | 29 s | 7 m, 18 s | 1 h, 33 m, 5 s | 5 h, 56 m, 55 s |
Average over 50 replicates. Eight populations. Inference error: normalized RF distance. Population tree height (Ht): 0.1 or 0.5. 100 loci. g: number of haplotypes per population. Time: in seconds (s), minutes (m) and hours (h). Results for TreeMix are inside the parentheses. The original STELLS is only feasible for g = 2 and so only the results for the g = 2 case are provided for the original STELLS.
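The inference error above is a normalized Robinson-Foulds (RF) distance. As a rough illustration of that measure, the sketch below compares two rooted trees written as nested tuples. It assumes one common convention (non-trivial clades compared, normalized by the total number of such clades in both trees); the exact convention used for the table is not stated here, and the function names are illustrative.

```python
def clades(tree):
    """Return the non-trivial clades (frozensets of leaf names) of a rooted
    tree given as nested tuples, e.g. ((("A", "B"), "C"), "D")."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these leaves
    return found


def normalized_rf(tree1, tree2):
    """Clades found in exactly one tree, divided by the total number of
    non-trivial clades in both trees (0 = identical, 1 = no shared clade)."""
    c1, c2 = clades(tree1), clades(tree2)
    denom = len(c1) + len(c2)
    return len(c1 ^ c2) / denom if denom else 0.0


true_tree = ((("A", "B"), "C"), "D")
inferred = ((("A", "C"), "B"), "D")
print(normalized_rf(true_tree, inferred))
```

Under this convention, a value such as 0.11 would mean that roughly one in nine clades differs between the inferred and the true population tree.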
Fig. 2. The inferred population tree for ten populations in the 1000 Genomes Project, using 20 haplotypes from ten individuals per population. Branch lengths shown are the estimated times in coalescent units.