The average common substring approach to phylogenomic reconstruction.

Igor Ulitsky, David Burstein, Tamir Tuller, Benny Chor.

Abstract

We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may vary greatly. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings, which is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length l can be calculated in O(l) time. We implemented the algorithm using suffix arrays. Our implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic and taxonomic truth." To assess our approach, our results were compared to the traditional (single-gene or protein-based) maximum likelihood method. The obtained trees were compared to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. Comparing their outcome and running time to ours, using "traditional" trees and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. The simplicity and speed of our method allow for a whole genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method, but also suggest a number of novel phylogenetic insights.
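The distance measure described in the abstract can be illustrated with a minimal Python sketch. The abstract itself does not state the formula; the sketch assumes the form usually attributed to the average common substring (ACS) approach: for each position i of sequence A, take the length of the longest substring starting at i that also occurs somewhere in B, average these lengths to get L(A, B), and define the directed distance as log|B| / L(A, B) minus a self-comparison correction log|A| / L(A, A), then symmetrize. The function names are hypothetical, and the matching here is a naive quadratic search rather than the suffix-array implementation the authors describe.

```python
import math


def longest_match_lengths(a: str, b: str) -> list[int]:
    """For each start i in a, length of the longest prefix of a[i:] found anywhere in b.

    Naive search for illustration only; the paper reports a suffix-array
    implementation, which this sketch does not attempt to reproduce.
    """
    lengths = []
    for i in range(len(a)):
        k = 0
        # Extend the match one character at a time while it still occurs in b.
        while i + k < len(a) and a[i:i + k + 1] in b:
            k += 1
        lengths.append(k)
    return lengths


def average_common_substring(a: str, b: str) -> float:
    """L(a, b): mean of the longest-match lengths over all positions of a."""
    return sum(longest_match_lengths(a, b)) / len(a)


def acs_distance(a: str, b: str) -> float:
    """Symmetrized ACS-style distance (assumed form; sequences must be non-empty)."""

    def directed(x: str, y: str) -> float:
        l_xy = average_common_substring(x, y)
        if l_xy == 0:
            return math.inf  # no shared characters at all
        # Matching x against itself gives lengths n, n-1, ..., 1, so L(x, x) = (n + 1) / 2.
        l_xx = (len(x) + 1) / 2
        return math.log(len(y)) / l_xy - math.log(len(x)) / l_xx

    return (directed(a, b) + directed(b, a)) / 2
```

For example, `longest_match_lengths("abcd", "bcx")` returns `[0, 2, 1, 0]`, so `average_common_substring("abcd", "bcx")` is 0.75. With the exact self-comparison term used here, the distance of a sequence to itself is 0. Longer shared substrings raise L(A, B) and therefore lower the distance.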

Year:  2006        PMID: 16597244     DOI: 10.1089/cmb.2006.13.336

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  55 in total

1.  Forbidden penta-peptides.

Authors:  Tamir Tuller; Benny Chor; Nathan Nelson
Journal:  Protein Sci       Date:  2007-10       Impact factor: 6.725

2.  Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions.

Authors:  Gregory E Sims; Se-Ran Jun; Guohong A Wu; Sung-Hou Kim
Journal:  Proc Natl Acad Sci U S A       Date:  2009-02-02       Impact factor: 11.205

3.  Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis. (Review)

Authors:  Oliver Bonham-Carter; Joe Steele; Dhundy Bastola
Journal:  Brief Bioinform       Date:  2013-07-31       Impact factor: 11.622

4.  New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing. (Review)

Authors:  Kai Song; Jie Ren; Gesine Reinert; Minghua Deng; Michael S Waterman; Fengzhu Sun
Journal:  Brief Bioinform       Date:  2013-09-23       Impact factor: 11.622

5.  Prot-SpaM: fast alignment-free phylogeny reconstruction based on whole-proteome sequences.

Authors:  Chris-Andre Leimeister; Jendrik Schellhorn; Svenja Dörrer; Michael Gerth; Christoph Bleidorn; Burkhard Morgenstern
Journal:  Gigascience       Date:  2019-03-01       Impact factor: 6.524

6.  Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution.

Authors:  Se-Ran Jun; Gregory E Sims; Guohong A Wu; Sung-Hou Kim
Journal:  Proc Natl Acad Sci U S A       Date:  2009-12-14       Impact factor: 11.205

7.  Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method.

Authors:  Guohong Albert Wu; Se-Ran Jun; Gregory E Sims; Sung-Hou Kim
Journal:  Proc Natl Acad Sci U S A       Date:  2009-06-24       Impact factor: 11.205

8.  Sequence Comparison Without Alignment: The SpaM Approaches.

Authors:  Burkhard Morgenstern
Journal:  Methods Mol Biol       Date:  2021

9.  Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model.

Authors:  Zu-Guo Yu; Ka Hou Chu; Chi Pang Li; Vo Anh; Li-Qian Zhou; Roger Wei Wang
Journal:  BMC Evol Biol       Date:  2010-06-22       Impact factor: 3.260

10.  Fast algorithms for computing sequence distances by exhaustive substring composition.

Authors:  Alberto Apostolico; Olgert Denas
Journal:  Algorithms Mol Biol       Date:  2008-10-28       Impact factor: 1.405
