Elizabeth S Allman, John A Rhodes, Seth Sullivant.
Abstract
Frequencies of k-mers in sequences are sometimes used as a basis for inferring phylogenetic trees without first obtaining a multiple sequence alignment. We show that a standard approach of using the squared Euclidean distance between k-mer vectors to approximate a tree metric can be statistically inconsistent. To remedy this, we derive model-based distance corrections for orthologous sequences without gaps, which lead to consistent tree inference. The identifiability of model parameters from k-mer frequencies is also studied. Finally, we report simulations showing that the corrected distance outperforms many other k-mer methods, even when sequences are generated with an insertion and deletion process. These results have implications for multiple sequence alignment as well, since k-mer methods are usually the first step in constructing a guide tree for such algorithms.
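For readers unfamiliar with the uncorrected approach the abstract refers to, the following is a minimal Python sketch of a squared Euclidean distance between k-mer frequency vectors. It is an illustration only, not the authors' code: the function names (`kmer_frequencies`, `squared_euclidean`, `kmer_distance`), the DNA alphabet `ACGT`, and the normalization by the number of windows are all assumptions made here. The model-based correction derived in the paper is not stated in the abstract and is not reproduced.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of each k-mer over `alphabet`, in lexicographic order.

    Assumption: frequencies are window counts divided by the number of
    length-k windows. Windows containing symbols outside `alphabet` are
    not reported, so the vector may then sum to less than 1.
    """
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for short sequences
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def squared_euclidean(u, v):
    """Sum of squared coordinate differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmer_distance(s1, s2, k):
    """Uncorrected squared Euclidean distance between k-mer frequency vectors."""
    return squared_euclidean(kmer_frequencies(s1, k), kmer_frequencies(s2, k))
```

A pairwise matrix of such distances could be passed to a distance-based tree method; according to the abstract, using the uncorrected values this way can be statistically inconsistent, which is what motivates the corrected distances.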
Keywords: alignment-free methods; k-mer; phylogenetic trees
Year: 2016 PMID: 27387364 DOI: 10.1089/cmb.2015.0216
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479