Alignment errors strongly impact likelihood-based tests for comparing topologies.

Eli Levy Karin, Edward Susko, Tal Pupko.

Abstract

Estimating phylogenetic trees from sequence data is an extremely challenging and important statistical task. Within the maximum-likelihood paradigm, the best tree is a point estimate. To determine how strongly the data support such an evolutionary scenario, a hypothesis testing methodology is required. To this end, the Kishino-Hasegawa (KH) test was developed to determine whether one topology is significantly more supported by the sequence data than another. This test and its derivatives are widely used in phylogenetics and phylogenomics. Here, we show that the KH test is biased in the presence of alignment error and can lead to erroneous conclusions. Using simulations, we demonstrate that, because of alignment errors, the KH test often rejects one of the competing topologies even though both topologies are equally supported by the data. Specifically, we show that the KH test favors the guide tree used to align the analyzed sequences. Further, branch length optimization renders the test too conservative. We propose two possible corrections for these biases. First, we evaluated the impact of removing unreliable alignment columns and found that it decreases the bias at the cost of substantially reducing the test's power. Second, we developed a parametric test that entirely abolishes the biases without data filtering. This test incorporates the alignment construction step into the test's hypothesis, thus removing the above guide tree effect. We extend this methodology to the case of multiple-topology comparisons and demonstrate the applicability of the new methodology on an exemplary data set.
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
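
For orientation, the classical KH test mentioned in the abstract can be sketched in a few lines. This is a minimal illustration of the standard normal-approximation form of the test, not the corrected parametric test the authors propose. It assumes the per-site log-likelihoods of the two topologies have already been computed by some maximum-likelihood program on a single fixed alignment; the function name `kh_test` and its arguments are hypothetical.

```python
import math
from statistics import variance


def kh_test(site_lnl_a, site_lnl_b):
    """Two-sided KH test (normal approximation) on per-site log-likelihoods.

    site_lnl_a, site_lnl_b: per-site log-likelihoods of the same alignment
    columns under topology A and topology B (assumed inputs).
    Returns (delta, z, p_value), where delta = lnL(A) - lnL(B).
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both topologies must be scored on the same sites")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("at least two sites are needed to estimate variance")

    # Per-site differences; their sum is the test statistic.
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)

    # Standard error of the sum of n site-wise differences.
    se = math.sqrt(n * variance(diffs))
    if se == 0.0:
        # Degenerate case: every site shows the same difference.
        if delta == 0.0:
            return 0.0, 0.0, 1.0
        return delta, math.copysign(math.inf, delta), 0.0

    z = delta / se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, z, p_value
```

Note that this sketch treats the alignment as fixed and known, which is exactly the assumption the abstract reports as a source of bias toward the guide tree.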

Keywords:  KH test; SOWH test; alignment; alignment uncertainty; branch length optimization; likelihood; phylogeny; tree comparisons

Year:  2014        PMID: 25085999     DOI: 10.1093/molbev/msu231

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


  13 in total

1.  Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets.

Authors:  Michael Nute; Ehsan Saleh; Tandy Warnow
Journal:  Syst Biol       Date:  2019-05-01       Impact factor: 15.683

2.  Optimization of sequence alignments according to the number of sequences vs. number of sites trade-off.

Authors:  Julien Y Dutheil; Emeric Figuet
Journal:  BMC Bioinformatics       Date:  2015-06-09       Impact factor: 3.169

3.  Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty.

Authors:  A S Md Mukarram Hossain; Benjamin P Blackburne; Abhijeet Shah; Simon Whelan
Journal:  Genome Biol Evol       Date:  2015-07-01       Impact factor: 3.416

4.  GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters.

Authors:  Itamar Sela; Haim Ashkenazy; Kazutaka Katoh; Tal Pupko
Journal:  Nucleic Acids Res       Date:  2015-04-16       Impact factor: 16.971

5.  Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs.

Authors:  Joseph L Herman; Ádám Novák; Rune Lyngsø; Adrienn Szabó; István Miklós; Jotun Hein
Journal:  BMC Bioinformatics       Date:  2015-04-01       Impact factor: 3.169

6.  Indel reliability in indel-based phylogenetic inference.

Authors:  Haim Ashkenazy; Ofir Cohen; Tal Pupko; Dorothée Huchon
Journal:  Genome Biol Evol       Date:  2014-11-18       Impact factor: 3.416

7.  SpartaABC: a web server to simulate sequences with indel parameters inferred using an approximate Bayesian computation algorithm.

Authors:  Haim Ashkenazy; Eli Levy Karin; Zach Mertens; Reed A Cartwright; Tal Pupko
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

8.  Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation.

Authors:  Eli Levy Karin; Dafna Shkedy; Haim Ashkenazy; Reed A Cartwright; Tal Pupko
Journal:  Genome Biol Evol       Date:  2017-05-01       Impact factor: 3.416

9.  Divergence and adaptive evolution of the gibberellin oxidase genes in plants.

Authors:  Yuan Huang; Xi Wang; Song Ge; Guang-Yuan Rao
Journal:  BMC Evol Biol       Date:  2015-09-29       Impact factor: 3.260

10.  Inferring Indel Parameters using a Simulation-based Approach.

Authors:  Eli Levy Karin; Avigayel Rabin; Haim Ashkenazy; Dafna Shkedy; Oren Avram; Reed A Cartwright; Tal Pupko
Journal:  Genome Biol Evol       Date:  2015-11-03       Impact factor: 3.416
