Statistically Consistent k-mer Methods for Phylogenetic Tree Reconstruction.

Elizabeth S Allman, John A Rhodes, Seth Sullivant.

Abstract

Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing that the corrected distance outperforms many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results also have implications for multiple sequence alignment, since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
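
For orientation, the uncorrected baseline that the abstract describes (the squared Euclidean distance between k-mer vectors) might be sketched as follows. This is only an illustrative sketch and not the authors' code: the function names `kmer_frequencies` and `squared_euclidean`, the DNA alphabet `ACGT`, and the normalization of counts to relative frequencies are assumptions made here. The model-based correction derived in the paper is not reproduced, since the abstract does not state its formula.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Return relative k-mer frequencies of seq, in lexicographic k-mer order.

    Assumption: seq contains only symbols from `alphabet` (no gaps or
    ambiguity codes), so every overlapping window is a valid k-mer.
    """
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        raise ValueError("sequence is shorter than k")
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    # One entry per possible k-mer, so vectors from different sequences align.
    return [counts["".join(p)] / n_windows for p in product(alphabet, repeat=k)]


def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to exercise the functions.
    x = kmer_frequencies("ACGTACGTAC", k=2)
    y = kmer_frequencies("ACGTTCGTAA", k=2)
    print(squared_euclidean(x, y))
```

A matrix of such pairwise values would typically be passed to a distance-based tree-building method; per the abstract, using this raw distance as an approximate tree metric can be statistically inconsistent, which is the motivation for the corrected distances.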

Keywords:  alignment-free methods; k-mer; phylogenetic trees

Year:  2016        PMID: 27387364     DOI: 10.1089/cmb.2015.0216

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  6 in total

1.  rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.

Authors:  Lars Hahn; Chris-André Leimeister; Rachid Ounit; Stefano Lonardi; Burkhard Morgenstern
Journal:  PLoS Comput Biol       Date:  2016-10-19       Impact factor: 4.475

2.  An efficient strategy using k-mers to analyse 16S rRNA sequences.

Authors:  Marcel Martínez-Porchas; Francisco Vargas-Albores
Journal:  Heliyon       Date:  2017-07-27

3.  kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity.

Authors:  Kevin D Murray; Christfried Webers; Cheng Soon Ong; Justin Borevitz; Norman Warthmann
Journal:  PLoS Comput Biol       Date:  2017-09-05       Impact factor: 4.475

4.  Phylogenetic Networks as Circuits With Resistance Distance.

Authors:  Stefan Forcey; Drew Scalzo
Journal:  Front Genet       Date:  2020-10-15       Impact factor: 4.599

5.  Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model.

Authors:  Metin Balaban; Nishat Anjum Bristy; Ahnaf Faisal; Md Shamsuzzoha Bayzid; Siavash Mirarab
Journal:  Bioinform Adv       Date:  2022-08-12

6.  Exploring lateral genetic transfer among microbial genomes using TF-IDF.

Authors:  Yingnan Cong; Yao-Ban Chan; Mark A Ragan
Journal:  Sci Rep       Date:  2016-07-25       Impact factor: 4.379
