
OCTAL: Optimal Completion of gene trees in polynomial time.

Sarah Christensen, Erin K Molloy, Pranjal Vachaspati, Tandy Warnow.

Abstract

BACKGROUND: For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable.
RESULTS: We introduce the Optimal Tree Completion problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. We present OCTAL, an algorithm that finds an optimal solution to this problem when the distance between trees is defined using the Robinson-Foulds (RF) distance, and we prove that OCTAL runs in O(n^2) time, where n is the total number of species. We report on a simulation study in which gene trees can differ from the species tree due to incomplete lineage sorting, and estimated gene trees are completed using OCTAL with a reference tree based on a species tree estimated from the multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach in ASTRAL-II, but the accuracy of a completed gene tree computed by OCTAL depends on how topologically similar the reference tree (typically an estimated species tree) is to the true gene tree.
CONCLUSIONS: OCTAL is a useful technique for adding missing taxa to incomplete gene trees and provides good accuracy under a wide range of model conditions. However, results show that OCTAL's accuracy can be reduced when incomplete lineage sorting is high, as the reference tree can be far from the true gene tree. Hence, this study suggests that OCTAL would benefit from using other types of reference trees instead of species trees when there are large topological distances between true gene trees and species trees.
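To make the objective in the abstract concrete, the following is a minimal sketch of the RF distance that the Optimal Tree Completion problem minimizes. It is not the OCTAL algorithm and is not taken from the paper's code. It assumes trees are written as nested Python tuples with an arbitrary root, and it takes the RF distance to be the number of non-trivial bipartitions found in exactly one of the two trees (some texts halve or normalize this count). The names `leaves`, `bipartitions`, and `rf_distance`, and the five-leaf example, are illustrative assumptions.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]
Split = FrozenSet[FrozenSet[str]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names in the tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[Split]:
    """Collect the non-trivial bipartitions of the tree viewed as unrooted."""
    all_leaves = leaves(tree)
    splits: Set[Split] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - clade
        # Keep only splits with at least two leaves on each side.
        if len(clade) >= 2 and len(other) >= 2:
            # Storing both sides as an unordered pair ignores the rooting.
            splits.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return splits


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Count bipartitions present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("RF distance requires identical leaf sets")
    return len(bipartitions(t1) ^ bipartitions(t2))


# Reference tree on {A, B, C, D, E}; the incomplete tree ((A,B),(C,D)) lacks E.
reference = (("A", "B"), ("C", ("D", "E")))

# Four of the five possible placements of E, one per edge of the incomplete tree.
candidates = {
    "E next to D": (("A", "B"), ("C", ("D", "E"))),
    "E next to C": (("A", "B"), (("C", "E"), "D")),
    "E on internal edge": (("A", "B"), ("E", ("C", "D"))),
    "E next to A": ((("A", "E"), "B"), ("C", "D")),
}

for name, tree in candidates.items():
    print(name, rf_distance(reference, tree))
```

In this toy case, placing E next to D reproduces the reference topology, so it is the completion an optimal method would have to match. Scoring every possible placement this way is only practical for very small trees; the abstract's contribution is an algorithm that finds an optimal completion in polynomial time.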

Keywords: Gene trees; Missing data; Multispecies coalescent; Phylogenomics; Species trees

Year: 2018; PMID: 29568323; PMCID: PMC5853121; DOI: 10.1186/s13015-018-0124-5

Source DB: PubMed; Journal: Algorithms Mol Biol; ISSN: 1748-7188; Impact factor: 1.405


