Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking.

Marcin Bogusz, Simon Whelan.

Abstract

Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study. We first introduce PaHMM-Tree, a novel neighbor joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods. Our new method for calculating pairwise distances based on statistical alignment provides distance estimates that are as accurate as those obtained using standard methods based on the true alignment. Pairwise distance estimates based on the two-step process tend to be substantially less accurate. This improved performance carries through to tree inference, where PaHMM-Tree provides more accurate tree estimates than all of the pairwise distance methods assessed. For close to moderately divergent sequence data we find that the two-step methods using statistical inference, where information from all sequences is included in the estimation procedure, tend to perform better than PaHMM-Tree, particularly full statistical alignment, which simultaneously estimates both the tree and the alignment. For deep divergences we find the alignment step becomes so prone to error that our distance-based PaHMM-Tree outperforms all other methods of tree inference. Finally, we find that the accuracy of alignment-free methods tends to decline faster than standard two-step methods in the presence of alignment uncertainty, and identify no conditions where alignment-free methods are equal to or more accurate than standard phylogenetic methods even in the presence of substantial alignment error.

[Alignment-free; distance-based phylogenetics; pair Hidden Markov Models; phylogenetic inference; statistical alignment.]
© The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
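
The abstract describes PaHMM-Tree as a neighbor joining-based method that works from pairwise distances. As a rough illustration of that final step only, the sketch below implements textbook neighbor joining on a precomputed distance matrix. It is not the authors' code and does not estimate distances with pair Hidden Markov Models; the function name `neighbor_joining`, the input layout, and the toy distances are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbor joining; returns an unrooted tree in Newick format.

    names: list of at least three taxon labels (hypothetical input layout).
    dist:  dict mapping a pair (a, b) to the distance between a and b.
    """
    # Store distances under unordered keys so lookups ignore pair order.
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(names)

    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to every other active node.
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i) for i in nodes}
        # Join the pair that minimizes the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dab = d[frozenset((a, b))]
        # Branch lengths from the new internal node to a and b.
        la = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        lb = dab - la
        # The Newick subtree string doubles as the new node's label.
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab
                )
        nodes = [k for k in nodes if k not in (a, b)] + [new]

    # Resolve the last three nodes as a star around one internal node.
    x, y, z = nodes
    dxy, dxz, dyz = d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))]
    lx = (dxy + dxz - dyz) / 2
    ly = (dxy + dyz - dxz) / 2
    lz = (dxz + dyz - dxy) / 2
    return f"({x}:{lx:.3f},{y}:{ly:.3f},{z}:{lz:.3f});"


# Toy additive distances (made up for illustration) generated from a tree
# with branches A:1, B:2, C:4, D:5 and an internal branch of length 3.
toy = {
    ("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
    ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9,
}
print(neighbor_joining(["A", "B", "C", "D"], toy))
# prints (C:4.000,D:5.000,(A:1.000,B:2.000):3.000);
```

Because the toy distances are exactly additive, neighbor joining recovers the generating branch lengths. With estimated distances, as in the benchmarks the abstract reports, the accuracy of the tree depends on the accuracy of the distance estimates fed into this step.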

Year: 2017. PMID: 27633353. DOI: 10.1093/sysbio/syw074

Source DB: PubMed. Journal: Syst Biol. ISSN: 1063-5157. Impact factor: 15.683


