Local homology recognition and distance measures in linear time using compressed amino acid alphabets.

Robert C Edgar.

Abstract

Methods for discovery of local similarities and estimation of evolutionary distance by identifying k-mers (contiguous subsequences of length k) common to two sequences are described. Given unaligned sequences of length L, these methods have O(L) time complexity. The ability of compressed amino acid alphabets to extend these techniques to distantly related proteins was investigated. The performance of these algorithms was evaluated for different alphabets and choices of k using a test set of 1848 pairs of structurally alignable sequences selected from the FSSP database. Distance measures derived from k-mer counting were found to correlate well with percentage identity derived from sequence alignments. Compressed alphabets were seen to improve performance in local similarity discovery, but no evidence was found of improvements when applied to distance estimates. The performance of our local similarity discovery method was compared with the fast Fourier transform (FFT) used in MAFFT, which has O(L log L) time complexity. At coverage comparable to FFT, our method is more than an order of magnitude faster. We suggest using k-mer distance for fast, approximate phylogenetic tree construction, and show that a speed improvement of more than three orders of magnitude can be achieved relative to standard distance methods, which require alignments.
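
To make the idea of a k-mer distance over a compressed alphabet concrete, the following is a minimal Python sketch. It is not the paper's implementation: the abstract does not give the distance formula or the alphabets tested, so the residue grouping in `GROUPS`, the normalization by the shorter sequence's k-mer count, and all function names are assumptions chosen for illustration. For fixed k, counting k-mers in a hash table takes one pass over each sequence, which is where the O(L) behaviour comes from.

```python
from collections import Counter

# Hypothetical 8-letter compressed alphabet (illustrative grouping only,
# not one of the alphabets evaluated in the paper).
GROUPS = ["AGST", "C", "DENQ", "FWY", "H", "ILMV", "KR", "P"]
# Map every residue to the first letter of its group.
COMPRESS = {aa: group[0] for group in GROUPS for aa in group}


def compress(seq):
    """Rewrite a protein sequence in the compressed alphabet ('X' = unknown)."""
    return "".join(COMPRESS.get(aa, "X") for aa in seq.upper())


def kmer_counts(seq, k):
    """Count every contiguous subsequence of length k in one pass."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(x, y, k=3, compressed=False):
    """Fraction of k-mers shared by x and y (assumed normalization:
    the number of k-mers in the shorter sequence)."""
    if compressed:
        x, y = compress(x), compress(y)
    n = min(len(x), len(y)) - k + 1
    if n <= 0:
        return 0.0
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    shared = sum(min(count, cy[kmer]) for kmer, count in cx.items())
    return shared / n


def kmer_distance(x, y, k=3, compressed=False):
    """Alignment-free distance in [0, 1]; 0 means identical k-mer content."""
    return 1.0 - kmer_similarity(x, y, k, compressed)


if __name__ == "__main__":
    a, b = "MKVLA", "MRILG"
    print(kmer_distance(a, b, k=3))                   # full alphabet
    print(kmer_distance(a, b, k=3, compressed=True))  # compressed alphabet
```

In this toy pair the two sequences share no 3-mers in the full 20-letter alphabet but become identical after compression, which illustrates why a compressed alphabet can expose similarity between more distantly related sequences. A matrix of such pairwise distances could then be passed to a standard distance-based tree builder, without computing any alignments.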

Year: 2004; PMID: 14729922; PMCID: PMC373290; DOI: 10.1093/nar/gkh180

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971

