On inconsistency of the neighbor-joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled.

Edward Susko, Yuji Inagaki, Andrew J Roger.

Abstract

Using analytical methods, we show that under a variety of model misspecifications, Neighbor-Joining, minimum evolution, and least squares estimation procedures are statistically inconsistent. Failure to correctly account for differing rates-across-sites processes, failure to correctly model rate matrix parameters, and failure to adjust for parallel rates-across-sites changes (a rates-across-subtrees process) are all shown to lead to a "long branch attraction" form of inconsistency. In addition, failure to account for rates-across-sites processes is also shown to result in underestimation of evolutionary distances for a wide variety of substitution models, generalizing an earlier analytical result for the Jukes-Cantor model reported in Golding and a similar bias result for the GTR or REV model in Kelly and Rice (1996). Although standard rates-across-sites models can be employed in many of these cases to restore consistency, current models cannot account for other kinds of misspecification. We examine an idealized but biologically relevant case, where parallel changes in rates at sites across subtrees are shown to give rise to inconsistency. This changing rates-across-subtrees type of model misspecification cannot be adjusted for with conventional methods or without carefully considering the rate variation in the larger tree. The results are presented for four-taxon trees, but the expectation is that they have implications for larger trees as well. To illustrate this, a simulated 42-taxon example is given in which the microsporidia, an enigmatic group of eukaryotes, are incorrectly placed at the archaebacteria-eukaryotes split because of incorrectly specified pairwise distances. The analytical nature of the results lends insight into why long branch attraction tends to be a common form of inconsistency and why other forms of inconsistency like "long branches repel" can arise in some settings.
In many of the cases of inconsistency presented, a particular incorrect topology is estimated with probability converging to one, the implication being that measures of uncertainty like bootstrap support will be unable to detect that there is a problem with the estimation. The focus is on distance methods, but previous simulation results suggest that the zones of inconsistency for distance methods contain the zones of inconsistency for maximum likelihood methods as well.
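
The two effects described in the abstract, underestimated distances and long branch attraction, can be illustrated with a minimal sketch. It is not the authors' derivation: it assumes the simplest textbook setting, a Jukes-Cantor process with gamma-distributed site rates (mean 1, shape alpha) analysed with the plain Jukes-Cantor correction, and it uses the four-point rule (pick the split with the smallest sum of pairwise distances) as a stand-in for the distance methods on four taxa. The function names, the tree shape, and the values alpha = 0.5, long branch 1.0, and short branch 0.05 are all illustrative assumptions, not values from the paper.

```python
import math


def jc_estimate_under_gamma(d, alpha):
    """Large-sample limit of the Jukes-Cantor distance estimate when the true
    process is Jukes-Cantor with gamma site rates (mean 1, shape alpha)."""
    # Expected proportion of differing sites under JC + gamma rates.
    p = 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))
    # Plain JC correction, which wrongly assumes one rate for every site.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def true_path_lengths(long_branch, short_branch):
    """Pairwise path lengths on the assumed true tree ((A,B),(C,D)).

    A and C sit on long terminal branches; B, D and the internal branch
    are short (a Felsenstein-zone shape, chosen for illustration).
    """
    a, b = long_branch, short_branch
    return {
        "AB": a + b, "CD": a + b,          # pairs within a cherry
        "AC": 2 * a + b, "BD": 3 * b,      # pairs crossing the internal branch
        "AD": a + 2 * b, "BC": a + 2 * b,
    }


def four_point_choice(dist):
    """Return the split with the smallest sum of pairwise distances."""
    sums = {
        "AB|CD": dist["AB"] + dist["CD"],
        "AC|BD": dist["AC"] + dist["BD"],
        "AD|BC": dist["AD"] + dist["BC"],
    }
    return min(sums, key=sums.get)


# Illustrative parameter values (assumptions, not taken from the paper).
ALPHA, LONG, SHORT = 0.5, 1.0, 0.05

true_dist = true_path_lengths(LONG, SHORT)
# Distances the misspecified method converges to as sequence length grows.
limit_dist = {pair: jc_estimate_under_gamma(d, ALPHA) for pair, d in true_dist.items()}

print("split chosen from true distances:      ", four_point_choice(true_dist))
print("split chosen from misspecified limits: ", four_point_choice(limit_dist))
```

Because the limiting estimate is a concave function of the true distance, long paths are compressed more than short ones. In this sketch that compression is enough to make the split grouping the two long branches (A with C) look best, even though the distances fed in come from the tree ((A,B),(C,D)).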

Year: 2004; PMID: 15155796; DOI: 10.1093/molbev/msh159

Source DB: PubMed; Journal: Mol Biol Evol; ISSN: 0737-4038; Impact factor: 16.240


