
Integrated gene and species phylogenies from unaligned whole genome protein sequences.

Gary W Stuart, Karen Moffett, Steve Baker.

Abstract

MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole-genome sequences that uses short character-string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent each protein of an organism uniquely and precisely as a vector in a high-dimensional space. Pairwise distances between such vectors can be calculated from the angle separating them, and the resulting distance values can be used to generate phylogenetic trees. Protein trees derived in this way can be examined directly for homologous sequences. Alternatively, the vectors of all proteins within an organism can be summed to give a vector representation of the organism, which is then used to generate species trees.
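The pipeline described above (tetrapeptide counting, singular value decomposition, angle-based distances, and summing protein vectors into an organism vector) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: a dense count matrix stands in for the sparse matrix used at genome scale; scaling the right singular vectors by the singular values, using the raw angle in radians as the distance, the optional `rank` cutoff, the helper names (`kmer_counts`, `frequency_matrix`, `protein_vectors`, `angle`, `distance_matrix`) and the toy sequences are all hypothetical choices. The resulting distance matrices would then be passed to a distance-based tree-building method such as neighbor joining, which is not shown.

```python
from collections import Counter
from typing import List, Optional

import numpy as np

K = 4  # tetrapeptides: overlapping words of four residues


def kmer_counts(seq: str, k: int = K) -> Counter:
    """Count the overlapping k-residue words in one protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def frequency_matrix(proteins: List[str], k: int = K) -> np.ndarray:
    """Words-by-proteins count matrix (dense here; sparse at genome scale)."""
    counts = [kmer_counts(p, k) for p in proteins]
    vocab = sorted(set().union(*counts))
    row = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(proteins)))
    for j, c in enumerate(counts):
        for w, n in c.items():
            A[row[w], j] = n
    return A


def protein_vectors(A: np.ndarray, rank: Optional[int] = None) -> np.ndarray:
    """One row per protein: right singular vectors scaled by singular values.

    Hypothetical choice: rank=None keeps every singular value.
    """
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = len(s) if rank is None else rank
    return (s[:r, None] * Vt[:r]).T


def angle(u: np.ndarray, v: np.ndarray) -> float:
    """Angle in radians between two vectors, used as the pairwise distance."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def distance_matrix(vectors: np.ndarray) -> np.ndarray:
    """Symmetric matrix of pairwise angles between the rows of `vectors`."""
    n = len(vectors)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = angle(vectors[i], vectors[j])
    return D


# Toy "genomes" (hypothetical sequences, not real data).
genomes = {
    "org1": ["MKTAYIAKQR", "GSHMLEDPVA"],
    "org2": ["MKTAYIAKQL", "GSHMLEDPVK"],
    "org3": ["WWPLQERTYC", "HHNNCCDDEE"],
}
names = list(genomes)
proteins = [p for name in names for p in genomes[name]]
owner = [name for name in names for _ in genomes[name]]

P = protein_vectors(frequency_matrix(proteins))  # protein-level vectors
protein_D = distance_matrix(P)                   # input for a protein tree

# Organism vector = sum of the vectors of that organism's proteins.
org = np.array([P[[i for i, o in enumerate(owner) if o == name]].sum(axis=0)
                for name in names])
D = distance_matrix(org)                         # input for a species tree
print(np.round(D, 3))
```

When every singular value is kept, the angles computed this way equal the angles between the raw count vectors, because the SVD only rotates vectors within the column space of the matrix; passing a smaller `rank` would trade that exactness for a lower-dimensional approximation.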
RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that largely agree with previously published trees derived from the same datasets by other methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. With this approach, unlike alignment-based phylogenetic methods, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.
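The claim that sequences differing by insertions or deletions stay recognizably similar can be illustrated with plain tetrapeptide counts. In this sketch (a hypothetical sequence, standard-library Python only), a three-residue insertion destroys only the few tetrapeptides that span the insertion point, so the cosine similarity of the two word-count vectors stays high. A naive position-by-position comparison with no gaps, used here only as a strawman and not as a real alignment program, loses almost all of its matches downstream of the insertion.

```python
import math
from collections import Counter


def tetrapeptides(seq):
    """Overlapping four-residue words and their counts."""
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))


def cosine(a, b):
    """Cosine of the angle between two word-count vectors (Counters)."""
    dot = sum(n * b[w] for w, n in a.items())  # missing words count as 0
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb)


def ungapped_identity(a, b):
    """Fraction of matching positions when the sequences are simply overlaid."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


original = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical protein
variant = original[:8] + "GGS" + original[8:]   # three-residue insertion

cos = cosine(tetrapeptides(original), tetrapeptides(variant))
ident = ungapped_identity(original, variant)
print(f"tetrapeptide cosine: {cos:.2f}")   # about 0.86
print(f"ungapped identity:   {ident:.2f}")  # about 0.27
```

Here 27 of the original 30 tetrapeptides survive the insertion, whereas the overlaid comparison matches only the 8 residues before the insertion plus one coincidental match after it.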


Year: 2002 | PMID: 11836217 | DOI: 10.1093/bioinformatics/18.1.100

Source DB: PubMed | Journal: Bioinformatics | ISSN: 1367-4803 | Impact factor: 6.937


