A comprehensive whole genome bacterial phylogeny using correlated peptide motifs defined in a high dimensional vector space.

Gary W Stuart, Michael W Berry.

Abstract

As whole genome sequences continue to expand in number and complexity, effective methods for comparing and categorizing both genes and species represented within extremely large datasets are required. Methods introduced to date have generally utilized incomplete and likely insufficient subsets of the available data. We have developed an accurate and efficient method for producing robust gene and species phylogenies using very large whole genome protein datasets. This method relies on multidimensional protein vector definitions supplied by the singular value decomposition (SVD) of a large sparse data matrix in which each protein is uniquely represented as a vector of overlapping tetrapeptide frequencies. Quantitative pairwise estimates of species similarity were obtained by summing the protein vectors to form species vectors, then determining the cosines of the angles between species vectors. Evolutionary trees produced using this method confirmed many accepted prokaryotic relationships. However, several unconventional relationships were also noted. In addition, we demonstrate that many of the SVD-derived right basis vectors represent particular conserved protein families, while many of the corresponding left basis vectors describe conserved motifs within these families as sets of correlated peptides (copeps). This analysis represents the most detailed simultaneous comparison of prokaryotic genes and species available to date.
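The pipeline described in the abstract (overlapping tetrapeptide counts, SVD, summed protein vectors, cosines between species vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy sequences, the function names `tetrapeptide_counts` and `species_cosines`, the `rank` parameter, the use of raw counts without any weighting, and the scaling of right singular vectors by singular values are all assumptions made for illustration. A dense `numpy.linalg.svd` stands in for the sparse SVD that a whole genome dataset would need.

```python
from collections import Counter

import numpy as np


def tetrapeptide_counts(seq, k=4):
    """Count overlapping k-residue peptides (k=4: tetrapeptides) in one protein."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def species_cosines(proteomes, rank=2):
    """Return (species names, cosine matrix) for a dict of species -> protein list.

    Hypothetical helper: rows of the data matrix are peptides, columns are proteins.
    """
    proteins = [(sp, seq) for sp, seqs in proteomes.items() for seq in seqs]
    counts = [tetrapeptide_counts(seq) for _, seq in proteins]
    vocab = sorted(set().union(*counts))
    row = {pep: i for i, pep in enumerate(vocab)}

    # Peptide-by-protein count matrix (dense here; sparse in any realistic setting).
    A = np.zeros((len(vocab), len(proteins)))
    for j, c in enumerate(counts):
        for pep, n in c.items():
            A[row[pep], j] = n

    # Left vectors live in peptide space, right vectors in protein space.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = min(rank, len(s))
    protein_vecs = Vt[:r].T * s[:r]  # one reduced vector per protein (assumed scaling)

    # Sum protein vectors into one vector per species.
    species = list(proteomes)
    S = np.zeros((len(species), r))
    for (sp, _), vec in zip(proteins, protein_vecs):
        S[species.index(sp)] += vec

    # Cosine of the angle between every pair of species vectors.
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    unit = S / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return species, unit @ unit.T


# Toy data (invented sequences, not real proteomes).
toy = {
    "A": ["MKVLAAGIVGLK", "MSTNPKPQRK"],
    "B": ["MKVLAAGIVGLR", "MSTNPKPQRR"],
    "C": ["GHWWYFFCCDDE", "QEHHWDYYFECC"],
}
species, cos = species_cosines(toy, rank=6)
print(species)
print(np.round(cos, 3))
```

With `rank` equal to the number of proteins nothing is truncated, so the cosines match those of the summed raw count vectors; a smaller `rank` keeps only the dominant correlated-peptide directions. A distance such as `1 - cos` could then be passed to any standard tree-building method.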

Year:  2003        PMID: 15290766     DOI: 10.1142/s0219720003000265

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  5 in total

Review 1.  Empirical distributional semantics: methods and biomedical applications.

Authors:  Trevor Cohen; Dominic Widdows
Journal:  J Biomed Inform       Date:  2009-02-14       Impact factor: 6.317

2.  An SVD-based comparison of nine whole eukaryotic genomes supports a coelomate rather than ecdysozoan lineage.

Authors:  Gary W Stuart; Michael W Berry
Journal:  BMC Bioinformatics       Date:  2004-12-17       Impact factor: 3.169

3.  Is multiple-sequence alignment required for accurate inference of phylogeny?

Authors:  Michael Höhl; Mark A Ragan
Journal:  Syst Biol       Date:  2007-04       Impact factor: 15.683

4.  Whole genome phylogenies for multiple Drosophila species.

Authors:  Arun Seetharam; Gary W Stuart
Journal:  BMC Res Notes       Date:  2012-12-04

5.  Co-phylog: an assembly-free phylogenomic approach for closely related organisms.

Authors:  Huiguang Yi; Li Jin
Journal:  Nucleic Acids Res       Date:  2013-01-18       Impact factor: 16.971
