Consistency and convergence rate of phylogenetic inference via regularization.

Vu Dinh, Lam Si Tung Ho, Marc A Suchard, Frederick A Matsen.

Abstract

It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a non-standard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for estimating the discrete gene tree structure and continuous edge lengths (representing the amount of evolution that has occurred on that branch) simultaneously. We find that the regularized estimator is "adaptive fast converging," meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree.
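
As a rough sketch of the estimator described in the abstract, the penalized objective can be written as follows. The notation is an assumption made for illustration and is not taken from the paper: $\ell_k(T)$ is the log-likelihood of a candidate gene tree $T$ (topology together with edge lengths) given $k$ aligned sites, $T_0$ is the guide (species) tree, $d_{\mathrm{BHV}}$ is the Billera-Holmes-Vogtmann geodesic distance, and $\lambda_k > 0$ is a hypothetical regularization weight. The $1/k$ normalization of the likelihood is likewise an assumption.

```latex
% Sketch only: symbols \ell_k, T_0, \lambda_k and the 1/k scaling are assumed notation.
\hat{T}_k \;=\; \operatorname*{arg\,max}_{T}
  \left\{ \frac{1}{k}\,\ell_k(T) \;-\; \lambda_k \, d_{\mathrm{BHV}}(T, T_0)^{2} \right\}
```

Under this form, a larger $\lambda_k$ pulls the estimate toward the guide tree, while $\lambda_k \to 0$ recovers the ordinary maximum likelihood estimator. How the weight should scale with sequence length is not stated in the abstract and is left unspecified here.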

Keywords:  gene tree; maximum likelihood estimator; phylogenetics; regularization; species tree; tree reconstruction

Year: 2018; PMID: 30344357; PMCID: PMC6191858; DOI: 10.1214/17-AOS1592

Source DB: PubMed; Journal: Ann Stat; ISSN: 0090-5364; Impact factor: 4.028

