A stochastic Farris transform for genetic data under the multispecies coalescent with applications to data requirements.

Gautam Dasarathy, Elchanan Mossel, Robert Nowak, Sebastien Roch

Abstract

Species tree estimation faces many significant hurdles. Chief among them is that the trees describing the ancestral lineages of each individual gene (the gene trees) often differ from the species tree. The multispecies coalescent is commonly used to model this gene tree discordance, at least when it is believed to arise from incomplete lineage sorting, a population-genetic effect. Another significant challenge in this area is that molecular sequences associated to each gene typically provide limited information about the gene trees themselves. While the modeling of sequence evolution by single-site substitutions is well-studied, few species tree reconstruction methods with theoretical guarantees actually address this latter issue. Instead, a standard but unsatisfactory assumption is that gene trees are perfectly reconstructed before being fed into a so-called summary method. Hence much remains to be done in the development of inference methodologies that rigorously account for gene tree estimation error, or completely avoid gene tree estimation in the first place. In previous work, a data requirement trade-off was derived between the number of loci m needed for an accurate reconstruction and the length of the locus sequences k. It was shown that to reconstruct an internal branch of length f, one needs m to be of the order of [Formula: see text]. That previous result was obtained under the restrictive assumption that mutation rates as well as population sizes are constant across the species phylogeny. Here we further generalize this result beyond this assumption. Our main contribution is a novel reduction to the molecular clock case under the multispecies coalescent, which we refer to as a stochastic Farris transform.
As a corollary, we also obtain a new identifiability result of independent interest: for any species tree with [Formula: see text] species, the rooted topology of the species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock.
© 2022. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

Keywords:  Coalescent; Data requirement; Distance methods; Gene tree/species tree; Phylogenetic reconstruction

Year:  2022        PMID: 35394192      PMCID: PMC9258723          DOI: 10.1007/s00285-022-01731-5

Source DB:  PubMed          Journal:  J Math Biol        ISSN: 0303-6812            Impact factor:   2.164

