Leo van Iersel, Steven Kelk, Nela Lekić, Celine Scornavacca.
Abstract
BACKGROUND: Reticulate events play an important role in determining evolutionary relationships. Computing the minimum number of such events needed to explain the discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with a reticulation number larger than 40-50.
Year: 2014 PMID: 24884964 PMCID: PMC4023542 DOI: 10.1186/1471-2105-15-127
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Two binary trees T1 and T2 and the auxiliary graph D. A maximum agreement forest M of T1 and T2 is obtained by deleting the dashed edges. Graph D can be made acyclic by deleting either both filled or both unfilled vertices. Hence, removing either v1 and v2 or v3 and v4 from M makes it an acyclic agreement forest for T1 and T2 (see Lemma 3). The acyclic agreement forest M∖{v1,v2}, obtained by removing v1 and v2 from M, is depicted on the right.
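The step described in the caption, deleting a smallest set of vertices so that a directed graph becomes acyclic, can be illustrated with a brute-force search. The following is a minimal sketch, not the procedure used by CYCLEKILLER: the edge list is a hypothetical stand-in (the edges of D in Figure 1 are not given in the text), and the function names `is_acyclic` and `minimum_feedback_vertex_sets` are assumptions introduced for illustration.

```python
from itertools import combinations


def is_acyclic(vertices, edges):
    """Kahn's algorithm: True if the digraph induced on `vertices` has no directed cycle."""
    vertices = set(vertices)
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, w in edges:
        if u in vertices and w in vertices:
            succ[u].append(w)
            indeg[w] += 1
    queue = [v for v in vertices if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Every vertex is removed exactly when there is no directed cycle.
    return seen == len(vertices)


def minimum_feedback_vertex_sets(vertices, edges):
    """Return all smallest vertex sets whose deletion leaves an acyclic digraph.

    Exhaustive search over subsets: exponential time, only for tiny examples.
    """
    for k in range(len(vertices) + 1):
        hits = [set(c) for c in combinations(sorted(vertices), k)
                if is_acyclic(set(vertices) - set(c), edges)]
        if hits:
            return hits
    return []


# Hypothetical graph (NOT the graph D of Figure 1): two disjoint 2-cycles.
D_vertices = ["v1", "v2", "v3", "v4"]
D_edges = [("v1", "v3"), ("v3", "v1"), ("v2", "v4"), ("v4", "v2")]
solutions = minimum_feedback_vertex_sets(D_vertices, D_edges)
```

On this toy graph every minimum solution has two vertices, and both {v1, v2} and {v3, v4} are among them, mirroring the two choices named in the caption.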
Experimental results for instances with two binary trees
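| | | Exact: instances finished | | 2-approx: instances finished | 2-approx: avg. time (s) | 4-approx: instances finished | 4-approx: avg. time (s) | Avg. approx. ratio | Optimal found |
|---|---|---|---|---|---|---|---|---|---|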
| Easy | 800 | 767 | 798 | 800 | 3 | - | - | 1.003 | 96.6% |
| Medium | 640 | 199 | 2572 | 613 | 212 | 27 | <1 | 1.002 | 97.5% |
| Hard | 640 | 0 | 3600 | 440 | 1271 | 200 | 1.5 | - | - |
The third column indicates for how many instances at least one exact algorithm finished within one hour. The fifth column indicates for how many instances the 2-approx option of CYCLEKILLER finished within one hour. For the remaining instances, the 4-approx option finished within one hour, as can be seen from the seventh column. The average running times (in seconds) of the 2-approx and the 4-approx options are reported in the sixth and eighth columns, respectively. The average approximation ratio (ninth column) is taken over all instances for which at least one exact method finished. The last column indicates the percentage of those instances for which CYCLEKILLER found an optimal solution.
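The last two columns are computed only over instances for which an exact method finished. A minimal sketch of that bookkeeping follows; the `Instance` record, the `summarize` function, and the convention that a ratio of 0/0 counts as 1 are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    exact: Optional[int]  # optimum from an exact solver, None if it timed out
    approx: int           # reticulation number found by the approximation


def summarize(instances):
    """Average ratio and share of optimal answers over exactly solved instances."""
    solved = [i for i in instances if i.exact is not None]
    if not solved:
        return None  # no exact baseline, so no ratio can be reported
    # Assumed convention: if the optimum is 0, the ratio is taken to be 1.
    ratios = [i.approx / i.exact if i.exact > 0 else 1.0 for i in solved]
    optimal = sum(1 for i in solved if i.approx == i.exact)
    return {
        "solved": len(solved),
        "avg_ratio": sum(ratios) / len(ratios),
        "pct_optimal": 100.0 * optimal / len(solved),
    }
```

Instances with `exact=None` still count as solved by the approximation but are excluded from the ratio, which is why the Hard row of the table reports no ratio at all.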
Summary of results for instances with one binary and one nonbinary tree
| | | opt | Time | r(N) | Time | Ratio | r(N) | Time | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 25% | Simple | 7.504 | 8.004 | 7.567 | 0.967 | 1.007 | 11.421 | 0.996 | 1.532 |
| | Tricky | 17.000 | 203.650 | 17.288 | 3.675 | 1.003 | 27.238 | 3.638 | 1.600 |
| 50% | Simple | 6.736 | 9.896 | 6.829 | 0.942 | 1.008 | 10.900 | 0.925 | 1.639 |
| | Tricky | 14.976 | 374.263 | 16.288 | 3.388 | 1.006 | 26.413 | 3.438 | 1.640 |
| 75% | Simple | 5.139 | 12.304 | 5.263 | 0.867 | 1.011 | 8.692 | 0.963 | 1.659 |
| | Tricky | 10.500 | 391.575 | 13.475 | 3.263 | 1.006 | 23.200 | 3.275 | 1.633 |
| Worst case | | 20 | 600 | 22 | 15 | 1.75 | 37 | 13 | 3 |
We list the average hybridization number found (opt and r(N)), the average running time in seconds (Time) and where applicable the average approximation ratio (Ratio) for the three algorithms.
Summary of results for instances with two nonbinary trees
| | | opt | Time | r(N) | Time | Ratio | r(N) | Time | Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 25% | Simple | 7.168 | 12.971 | 7.240 | 43.967 | 1.032 | 16.338 | 2.463 | 2.343 |
| | Tricky | 16.148 | 279.100 | - | - | - | 35.638 | 7.000 | 2.193 |
| 50% | Simple | 5.933 | 11.150 | 5.900 | 41.325 | 1.030 | 13.721 | 2.004 | 2.405 |
| | Tricky | 13.216 | 379.238 | - | - | - | 32.363 | 7.200 | 2.331 |
| 75% | Simple | 3.654 | 1.121 | 3.729 | 4.208 | 1.015 | 9.075 | 1.483 | 2.590 |
| | Tricky | 8.672 | 183.150 | - | - | - | 21.950 | 5.800 | 2.294 |
| Worst case | | 20 | 600 | 29 | 600 | 1.5 | 56 | 22 | 4 |
The layout of the table is the same as that of Table 2.
Summary of results for dataset (204 gene trees) originally obtained from GreenPhylDB database
| | Min | Average | Max |
|---|---|---|---|
| Common taxa | 3 | 5.235 | 20 |
| opt | 0 | 0.873 | 7 |
| Ratio 4-approx | 1 | 1.002 | 1.2 |
| Ratio 6-approx | 1 | 1.088 | 3 |
| Gap (T-EST - MAF) | 0 | 0.010 | 1 |
| Gap (4-approx - MAF) | 0 | 0.020 | 2 |
| Time T-EST | 0 | 0.221 | 3 |
| Time 4-approx | 0 | 0.270 | 1 |
Common taxa is the number of taxa after restricting the gene tree and the species tree to common taxa. opt is the exact hybridization number, as computed by TERMINUSEST. Ratio 4-approx (resp. 6-approx) is the ratio of the solution obtained by NONBINARYCYCLEKILLER (running in 4-approx, resp. 6-approx mode) to the solution obtained by TERMINUSEST. Gap (T-EST - MAF) is the absolute gap between the optimum MAF solution (here computed with RSPR) and the exact hybridization number, as computed by TERMINUSEST. Gap (4-approx - MAF) is the absolute gap between the optimum MAF solution and the reticulation number of the solution generated by NONBINARYCYCLEKILLER running in its 4-approx mode. Time T-EST is the running time (in seconds) of TERMINUSEST, and Time 4-approx is the running time (in seconds) of NONBINARYCYCLEKILLER running in its 4-approx mode. In 202 instances TERMINUSEST returned the same size solution as RSPR, in 202 cases TERMINUSEST returned the same size solution as NONBINARYCYCLEKILLER (running in 4-approx mode), and in 201 cases NONBINARYCYCLEKILLER (running in 4-approx mode) returned the same size solution as RSPR.
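The gap rows and the agreement counts in the note above can be derived from per-gene-tree results as in the following minimal sketch. The `results` tuples are hypothetical values, not data from GreenPhylDB, and the helper `min_avg_max` is an assumption introduced for illustration.

```python
from statistics import mean

# Hypothetical per-gene-tree results: (MAF size, TERMINUSEST value, 4-approx value).
results = [(3, 3, 3), (5, 5, 6), (2, 3, 3), (0, 0, 0)]


def min_avg_max(values):
    """Summarize a list of numbers as (minimum, average, maximum)."""
    return min(values), mean(values), max(values)


# Absolute gaps to the optimum MAF solution, one value per gene tree.
gap_exact = [t_est - maf for maf, t_est, _ in results]
gap_approx = [approx - maf for maf, _, approx in results]

# Number of instances on which two methods return the same size solution.
same_exact_maf = sum(1 for maf, t_est, _ in results if t_est == maf)
same_exact_approx = sum(1 for _, t_est, approx in results if t_est == approx)
same_approx_maf = sum(1 for maf, _, approx in results if approx == maf)

summary = {
    "Gap (T-EST - MAF)": min_avg_max(gap_exact),
    "Gap (4-approx - MAF)": min_avg_max(gap_approx),
}
```

Since an acyclic agreement forest is in particular an agreement forest, the exact hybridization number is never smaller than the MAF optimum, so both gaps are non-negative.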
Summary of results for dataset (1003 gene trees) originally obtained from GreenPhylDB database
| | Min | Average | Max |
|---|---|---|---|
| Common taxa | 3 | 11.704 | 22 |
| opt | 0 | 2.854 | 10 |
| Ratio 4-approx | 1 | 1.025 | 2 |
| Ratio 6-approx | 1 | 1.264 | 3 |
| Gap (T-EST - MAF) | 0 | 0.048 | 1 |
| Gap (4-approx - MAF) | 0 | 0.165 | 3 |
| Time T-EST | 0 | 0.576 | 7 |
| Time 4-approx | 0 | 0.605 | 3 |
In 955 instances TERMINUSEST returned the same size solution as RSPR, in 911 cases TERMINUSEST returned the same size solution as NONBINARYCYCLEKILLER (running in 4-approx mode), and in 880 cases NONBINARYCYCLEKILLER (running in 4-approx mode) returned the same size solution as RSPR.
Summary of results for dataset (5924 gene trees) originally obtained from GreenPhylDB database
| | Min | Average | Max |
|---|---|---|---|
| Common taxa | 2 | 14.206 | 22 |
| opt | 0 | 3.613 | 12 |
| Ratio 4-approx | 1 | 1.027 | 2 |
| Ratio 6-approx | 1 | 1.277 | 3 |
| Gap (T-EST - MAF) | 0 | 0.065 | 2 |
| Gap (4-approx - MAF) | 0 | 0.195 | 4 |
| Time T-EST | 0 | 0.689 | 21 |
| Time 4-approx | 0 | 0.729 | 3 |
In 5553 instances TERMINUSEST returned the same size solution as RSPR, in 5297 cases TERMINUSEST returned the same size solution as NONBINARYCYCLEKILLER (running in 4-approx mode), and in 5030 cases NONBINARYCYCLEKILLER (running in 4-approx mode) returned the same size solution as RSPR.
Summary of results for dataset (5789 gene trees) originally obtained from GreenPhylDB database
| | Min | Average | Max |
|---|---|---|---|
| Common taxa | 3 | 17.319 | 22 |
| opt | 0 | 1.560 | 12 |
| Ratio 4-approx | 1 | 1.021 | 2 |
| Ratio 7-approx | 1 | 1.704 | 4 |
| Gap (T-EST - MAF) | 0 | 0.053 | 4 |
| Gap (4-approx - MAF) | 0 | 0.132 | 5 |
| Time T-EST | 0 | 0.422 | 15 |
| Time 4-approx | 0 | 1.182 | 14 |
In 5552 instances TERMINUSEST returned the same size solution as RSPR, in 5415 cases TERMINUSEST returned the same size solution as NONBINARYCYCLEKILLER (running in 4-approx mode), and in 5209 cases NONBINARYCYCLEKILLER (running in 4-approx mode) returned the same size solution as MAF. In this dataset the gene trees were also nonbinary, meaning that NONBINARYCYCLEKILLER had to use the MAF algorithm described in [21] instead of RSPR.