Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.

Emily Jane McTavish, Mike Steel, Mark T Holder.

Abstract

Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the true distances. They showed that if the corrected distance is a concave function of the true distance, then inconsistency due to long branch attraction will occur. If these functions are convex, then two "long branch repulsion" trees will be preferred over the true tree, though these two incorrect trees are expected to be tied as the preferred tree. Here we extend their results, and demonstrate the existence of a tree shape (which we refer to as a "twisted Farris-zone" tree) for which a single incorrect tree topology will be guaranteed to be preferred if the corrected distance function is convex. We also report that the standard practice of treating gaps in sequence alignments as missing data is sufficient to produce non-linear corrected distance functions if the substitution process is not independent of the insertion/deletion process. Taken together, these results imply inconsistent tree inference under mild conditions. For example, if some positions in a sequence are constrained to be free of substitutions and insertion/deletion events while the remaining sites evolve with independent substitutions and insertion/deletion events, then the distances obtained by treating gaps as missing data can support an incorrect tree topology even given an unlimited amount of data.
Copyright © 2015 Elsevier Inc. All rights reserved.
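
The concave and convex cases attributed to Susko et al. (2004) in the abstract can be illustrated numerically on a four-taxon tree. The sketch below is not taken from the paper: the transformation functions (`1 - exp(-t)` as a concave example, `t * t` as a convex one), the branch lengths, and the helper name `pair_sums` are all assumptions chosen for illustration, and a least-sum four-point rule stands in for a distance-based tree estimator. It does not attempt to reproduce the "twisted Farris-zone" tree, whose exact shape is not given in the abstract.

```python
import math

def pair_sums(a, b, c, d, i, f):
    """Four-point sums for the true tree ((A,B),(C,D)).

    a, b, c, d are the terminal branch lengths of A, B, C, D and i is the
    internal branch. f is a hypothetical "corrected distance" function
    applied to each true path length. The pairing with the smallest sum
    is the topology a four-point rule would select.
    """
    dist = {
        "AB": f(a + b), "CD": f(c + d),
        "AC": f(a + i + c), "BD": f(b + i + d),
        "AD": f(a + i + d), "BC": f(b + i + c),
    }
    return {
        "AB|CD": dist["AB"] + dist["CD"],  # true topology
        "AC|BD": dist["AC"] + dist["BD"],
        "AD|BC": dist["AD"] + dist["BC"],
    }

def concave(t):
    # Assumed concave (saturating) transformation of the true distance.
    return 1.0 - math.exp(-t)

def convex(t):
    # Assumed convex transformation of the true distance.
    return t * t

long_, short, internal = 2.0, 0.1, 0.1  # illustrative branch lengths

# Long branches on non-sister taxa A and C, concave transformation.
lba = pair_sums(long_, short, long_, short, internal, concave)

# Long branches on sister taxa A and B, convex transformation.
lbr = pair_sums(long_, long_, short, short, internal, convex)

print("concave:", min(lba, key=lba.get), lba)
print("convex: ", lbr)
```

With these assumed values, the concave case selects the pairing that groups the two long branches (long branch attraction), while in the convex case the two incorrect pairings have equal sums, both smaller than the sum for the true pairing.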

Keywords:  Deletion; Distance methods; Gaps as missing data; Insertion; Invariant sites; Phylogenetics

Year:  2015        PMID: 26256643     DOI: 10.1016/j.ympev.2015.07.027

Source DB:  PubMed          Journal:  Mol Phylogenet Evol        ISSN: 1055-7903            Impact factor:   4.286


  5 in total

1.  A 250 plastome phylogeny of the grass family (Poaceae): topological support under different data partitions.

Authors:  Jeffery M Saarela; Sean V Burke; William P Wysocki; Matthew D Barrett; Lynn G Clark; Joseph M Craine; Paul M Peterson; Robert J Soreng; Maria S Vorontsova; Melvin R Duvall
Journal:  PeerJ       Date:  2018-02-02       Impact factor: 2.984

2.  Accurate Inference of Tree Topologies from Multiple Sequence Alignments Using Deep Learning.

Authors:  Anton Suvorov; Joshua Hochuli; Daniel R Schrider
Journal:  Syst Biol       Date:  2020-03-01       Impact factor: 15.683

3.  On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference.

Authors:  Alexis Criscuolo
Journal:  F1000Res       Date:  2020-11-10

4.  Maximum Likelihood Phylogenetic Inference is Consistent on Multiple Sequence Alignments, with or without Gaps.

Authors:  Jakub Truszkowski; Nick Goldman
Journal:  Syst Biol       Date:  2015-11-28       Impact factor: 15.683

5.  Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation.

Authors:  Eli Levy Karin; Dafna Shkedy; Haim Ashkenazy; Reed A Cartwright; Tal Pupko
Journal:  Genome Biol Evol       Date:  2017-05-01       Impact factor: 3.416

