Pranjal Vachaspati, Tandy Warnow.
Abstract
Motivation: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which construct a tree from trees computed on subsets of the species, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method is both accurate and scalable enough for the large datasets that supertree methods were designed for, which contain thousands of species and many subset trees.
Year: 2017 PMID: 27663499 PMCID: PMC5870905 DOI: 10.1093/bioinformatics/btw600
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Average Robinson-Foulds Supertree (RFS) criterion scores on the simulated datasets; lower is better.
| # Taxa | 100 | 100 | 100 | 100 | 500 | 500 | 500 | 500 | 1000 | 1000 | 1000 | 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Scaffold density (%) | 20 | 50 | 75 | 100 | 20 | 50 | 75 | 100 | 20 | 50 | 75 | 100 |
| # Replicates | 9 | 10 | 10 | 10 | 8 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| ASTRAL | 32 | 31 | 38 | 45 | 170 | 190 | 225 | 274 | 365 | 414 | 502 | 591 |
| ASTRAL-enhanced | 32 | 30 | 38 | 45 | 163 | 182 | 221 | 274 | 337 | 393 | 491 | 591 |
| ASTRID | 40 | 41 | 50 | 41 | 360 | 914 | 905 | 223 | 1066 | 2447 | 2370 | 470 |
| MRL | 30 | 30 | 36 | 42 | 158 | 179 | 202 | 223 | 309 | 362 | 412 | 474 |
| MulRF | 32 | 34 | 38 | 282 | 315 | 279 | 229 | – | – | – | – | |
| PluMiST | 31 | 29 | 210 | 245 | 246 | 214 | – | – | – | – | ||
| FastRFS-basic | 29 | 152 | 173 | 191 | 209 | 325 | 366 | 394 | 434 | |||
| FastRFS-enhanced |
No results are shown for MulRF and PluMiST on the 1000-taxon datasets due to time constraints; otherwise, results are shown for the datasets on which all methods completed. The best result for each model condition is boldfaced.
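The RFS criterion scores a candidate supertree T against source trees t_1, ..., t_k as the sum over i of RF(T|L(t_i), t_i), where T|L(t_i) is T restricted to the leaves of t_i and RF counts the bipartitions present in exactly one of the two trees. The Python sketch below computes that score for small trees, assuming a hypothetical nested-tuple tree encoding; `leaf_set`, `splits`, `restrict`, `rf_distance` and `rfs_score` are illustrative names, not FastRFS's implementation or input format. It relies on the fact that the bipartitions of T|X are exactly the nontrivial restrictions of T's bipartitions to X.

```python
# Hypothetical encoding (an assumption, not FastRFS input): a tree is nested
# tuples of leaf names, e.g. ((("a", "b"), "c"), ("d", "e")). The nesting
# implies a root, but only the tree's unrooted bipartitions (splits) are used.


def leaf_set(tree):
    """Return the frozenset of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def canonical(side, taxa):
    """Encode the split side | (taxa - side) by the side lacking the smallest taxon."""
    return side if min(taxa) not in side else taxa - side


def splits(tree):
    """Return (taxa, nontrivial splits); nontrivial means both sides have >= 2 leaves."""
    taxa = leaf_set(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        if 2 <= len(clade) <= len(taxa) - 2:
            found.add(canonical(clade, taxa))
        return clade

    walk(tree)
    return taxa, found


def restrict(tree_splits, subset):
    """Splits of T|subset: intersect each split of T with subset, keep nontrivial ones."""
    out = set()
    for side in tree_splits:
        part = side & subset
        if 2 <= len(part) <= len(subset) - 2:
            out.add(canonical(part, subset))
    return out


def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: number of splits found in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


def rfs_score(supertree, source_trees):
    """RFS criterion: sum over source trees t of RF(T|L(t), t); lower is better."""
    taxa, super_splits = splits(supertree)
    total = 0
    for source in source_trees:
        source_taxa, source_splits = splits(source)
        if not source_taxa <= taxa:
            raise ValueError("source tree has leaves missing from the supertree")
        total += rf_distance(restrict(super_splits, source_taxa), source_splits)
    return total


supertree = ((("a", "b"), "c"), ("d", "e"))
sources = [(("a", "b"), ("c", "d")), (("a", "c"), ("b", "d"))]
print(rfs_score(supertree, sources))  # prints 2
```

Here the first source tree agrees with the restricted supertree (contribution 0), while the second disagrees on one split in each tree (contribution 2). Representing trees by split sets keeps restriction and RF comparison to plain set operations, which is adequate for illustration but not tuned for thousands of taxa.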
Fig. 1. RFS criterion scores of the supertree methods on the biological datasets; lower is better. MulRF and PluMiST could not be run on the CPL dataset due to its large size, so no values are shown for those methods on that dataset. Overall, FastRFS-enhanced produces the best RFS criterion scores on these datasets.
Supertree topology estimation error on simulated datasets, measured using the Robinson-Foulds error rate, expressed as a percentage
| # Taxa | 100 | 100 | 100 | 100 | 500 | 500 | 500 | 500 | 1000 | 1000 | 1000 | 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Scaffold density (%) | 20 | 50 | 75 | 100 | 20 | 50 | 75 | 100 | 20 | 50 | 75 | 100 |
| # Replicates | 9 | 10 | 10 | 10 | 8 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| ASTRAL | 14.0 | 11.6 | 10.0 | 15.3 | 14.8 | 12.7 | 11.2 | 16.9 | 15.7 | 13.6 | 11.6 | |
| ASTRAL-enhanced | 11.8 | 11.5 | 10.0 | 14.8 | 14.1 | 12.6 | 11.2 | 13.5 | 11.6 | |||
| ASTRID | 15.8 | 18.7 | 17.1 | 9.6 | 26.0 | 50.1 | 45.4 | 35.6 | 58.1 | 52.0 | ||
| MRL | 13.6 | 13.6 | 11.2 | 10.8 | 15.4 | 14.3 | 12.1 | 11.2 | 17.4 | 13.5 | 12.2 | |
| MulRF | 22.1 | 26.0 | 15.3 | 9.3 | 46.9 | 40.3 | 27.4 | 12.6 | – | – | – | – |
| PluMiST | 25.9 | 16.6 | 11.5 | 9.3 | 35.4 | 29.5 | 22.4 | 10.9 | – | – | – | – |
| FastRFS-basic | 13.5 | 14.3 | 14.5 | 14.3 | 12.4 | 11.1 | 17.3 | 15.6 | 13.5 | 12.0 | ||
| FastRFS-enhanced | 13.5 | 13.4 | 10.6 | 9.3 | 10.8 | 16.7 | 11.8 |
The best result for each model condition is boldfaced. No results are shown for PluMiST or MulRF on the 1000-taxon simulated datasets due to running time limitations for these methods. Results are averaged over the completed replicates.
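The RF error rate in the table above compares an estimated supertree with the true tree. The Python sketch below assumes the common normalization for binary trees, RF distance divided by its maximum 2(n − 3) and multiplied by 100. The exact normalization used for these results is an assumption here, and the nested-tuple encoding and the names `leaf_set`, `splits` and `rf_error_rate` are hypothetical.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels in a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return (taxa, nontrivial bipartitions), each keyed by the side lacking min(taxa)."""
    taxa = leaf_set(tree)
    anchor = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        if 2 <= len(clade) <= len(taxa) - 2:
            found.add(clade if anchor not in clade else taxa - clade)
        return clade

    walk(tree)
    return taxa, found


def rf_error_rate(estimated, true_tree):
    """Assumed definition: 100 * RF distance / 2(n - 3), for binary trees on n leaves."""
    est_taxa, est_splits = splits(estimated)
    true_taxa, true_splits = splits(true_tree)
    if est_taxa != true_taxa:
        raise ValueError("trees must share the same leaf set")
    n = len(true_taxa)
    if n < 4:
        raise ValueError("need at least 4 leaves for a nonzero maximum RF distance")
    return 100.0 * len(est_splits ^ true_splits) / (2 * (n - 3))


true_tree = ((("a", "b"), "c"), ("d", "e"))
estimated = ((("a", "c"), "b"), ("d", "e"))
print(rf_error_rate(estimated, true_tree))  # prints 50.0
```

Each 5-leaf tree has two nontrivial splits. The trees share de|abc but differ on ab|cde versus ac|bde, so the RF distance is 2 out of a maximum of 4, giving 50.0%.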
Fig. 2. Sequential running times (in seconds) of the supertree methods on the biological datasets. MulRF and PluMiST could not be run on the CPL dataset due to its large size, so no values are shown for those methods on that dataset.