Comparative evaluation of word composition distances for the recognition of SCOP relationships.

Susana Vinga, Rodrigo Gouveia-Oliveira, Jonas S Almeida.

Abstract

MOTIVATION: Alignment-free metrics were recently reviewed by the authors, but have not until now been the subject of a comparative study. This paper compares the classification accuracy of the word composition metrics covered in that review. It also presents a new distance definition between protein sequences, the W-metric, which bridges alignment metrics, such as scores produced by the Smith-Waterman algorithm, and methods based solely on L-tuple composition, such as Euclidean distance and information content.
RESULTS: The comparative study reported here used the SCOP/ASTRAL hierarchical protein structure database and assessed the discriminant value of alternative sequence dissimilarity measures by calculating areas under the Receiver Operating Characteristic (ROC) curves. Although alignment methods gave very good classification accuracy at the family and superfamily levels, alignment-free distances, in particular Standard Euclidean Distance, are as good as alignment algorithms when sequence similarity is lower, such as for recognition of fold or class relationships. This observation justifies using alignment-free distances to pre-filter homologous proteins, since word statistics are computed much faster than alignments.
AVAILABILITY: All MATLAB code used to generate the data is available upon request to the authors. Additional material available at http://bioinformatics.musc.edu/wmetric
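
The two ingredients named in the abstract, a distance based on L-tuple (word) composition and an area under the ROC curve, can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. The function names are hypothetical, and comparing relative word frequencies (rather than raw counts or variance-standardized counts) is an assumption made here for simplicity. The AUC is computed with the rank-based formulation: the fraction of (related, unrelated) pairs in which the related pair has the smaller distance, with ties counted as one half.

```python
from collections import Counter
import math


def word_counts(seq, L=2):
    """Count overlapping L-tuples (words) in a sequence."""
    if len(seq) < L:
        raise ValueError("sequence is shorter than the word length L")
    return Counter(seq[i:i + L] for i in range(len(seq) - L + 1))


def euclidean_distance(seq_a, seq_b, L=2):
    """Euclidean distance between relative word-frequency vectors.

    Assumption: frequencies are used so that sequences of different
    lengths are comparable; other normalizations are possible.
    """
    ca, cb = word_counts(seq_a, L), word_counts(seq_b, L)
    na, nb = sum(ca.values()), sum(cb.values())
    words = set(ca) | set(cb)
    # Counter returns 0 for words absent from one of the sequences.
    return math.sqrt(sum((ca[w] / na - cb[w] / nb) ** 2 for w in words))


def roc_auc(distances, related):
    """Area under the ROC curve for a dissimilarity measure.

    distances: one distance per sequence pair.
    related:   True if the pair shares the relationship being tested
               (for example, the same fold), False otherwise.
    A small distance is taken as evidence that a pair is related.
    """
    pos = [d for d, r in zip(distances, related) if r]
    neg = [d for d, r in zip(distances, related) if not r]
    if not pos or not neg:
        raise ValueError("need at least one related and one unrelated pair")
    wins = sum(
        1.0 if p < n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every related pair is closer than every unrelated pair, while 0.5 corresponds to a measure with no discriminant value.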

Year:  2004        PMID: 14734312     DOI: 10.1093/bioinformatics/btg392

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  14 in total

1.  Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences.

Authors:  Marika Kaden; Katrin Sophie Bohnsack; Mirko Weber; Mateusz Kudła; Kaja Gutowska; Jacek Blazewicz; Thomas Villmann
Journal:  Neural Comput Appl       Date:  2021-04-27       Impact factor: 5.606

2.  Automatic structure classification of small proteins using random forest.

Authors:  Pooja Jain; Jonathan D Hirst
Journal:  BMC Bioinformatics       Date:  2010-07-01       Impact factor: 3.169

3.  Biological sequences as pictures: a generic two dimensional solution for iterated maps.

Authors:  Jonas S Almeida; Susana Vinga
Journal:  BMC Bioinformatics       Date:  2009-03-31       Impact factor: 3.169

4.  Computing distribution of scale independent motifs in biological sequences.

Authors:  Jonas S Almeida; Susana Vinga
Journal:  Algorithms Mol Biol       Date:  2006-10-18       Impact factor: 1.405

5.  Fast algorithms for computing sequence distances by exhaustive substring composition.

Authors:  Alberto Apostolico; Olgert Denas
Journal:  Algorithms Mol Biol       Date:  2008-10-28       Impact factor: 1.405

6.  The effectiveness of position- and composition-specific gap costs for protein similarity searches.

Authors:  Aleksandar Stojmirović; E Michael Gertz; Stephen F Altschul; Yi-Kuo Yu
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937

7.  Word decoding of protein amino acid sequences with availability analysis: a linguistic approach.

Authors:  Kenta Motomura; Tomohiro Fujita; Motosuke Tsutsumi; Satsuki Kikuzato; Morikazu Nakamura; Joji M Otaki
Journal:  PLoS One       Date:  2012-11-21       Impact factor: 3.240

8.  Pattern-based phylogenetic distance estimation and tree reconstruction.

Authors:  Michael Höhl; Isidore Rigoutsos; Mark A Ragan
Journal:  Evol Bioinform Online       Date:  2007-02-25       Impact factor: 1.625

9.  Comparison study on k-word statistical measures for protein: from sequence to 'sequence space'.

Authors:  Qi Dai; Tianming Wang
Journal:  BMC Bioinformatics       Date:  2008-09-23       Impact factor: 3.169

10.  Alignment-free sequence comparison: benefits, applications, and tools. (Review)

Authors:  Andrzej Zielezinski; Susana Vinga; Jonas Almeida; Wojciech M Karlowski
Journal:  Genome Biol       Date:  2017-10-03       Impact factor: 13.583
