
Testing statistical significance scores of sequence comparison methods with structure similarity.

Tim Hulsen, Jacob de Vlieg, Jack A M Leunissen, Peter M A Groenen.

Abstract

BACKGROUND: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance test applied to each alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score over the e-value, using simulated data. We wanted to know whether this superiority holds when the methods are applied to existing, evolutionarily related protein sequences.
RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using receiver operating characteristic (ROC), coverage versus error (CVE) and average precision (AP) measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH in particular scores very highly.
CONCLUSION: The computationally intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations generally give better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
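
To make the Z-score idea concrete, the following is a minimal sketch of a Monte-Carlo Z-score for a Smith-Waterman score. It is not the implementation evaluated in the paper: the simple match/mismatch scoring, the linear gap penalty, the number of shuffles, and the function names `sw_score` and `z_score` are all assumptions made for illustration. The abstract only states that the Z-score is based on Monte-Carlo statistics; here that is taken to mean comparing the observed score against scores obtained after shuffling one of the sequences.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman with a linear gap penalty).

    The scoring parameters are illustrative assumptions, not the
    substitution matrix or gap model used in the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def z_score(a, b, n_shuffles=100, seed=0):
    """Monte-Carlo Z-score: (observed - mean of shuffled) / std of shuffled.

    Sequence b is shuffled n_shuffles times (an assumed scheme), which keeps
    its length and composition but destroys the residue order.
    """
    rng = random.Random(seed)
    observed = sw_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(sw_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        # Degenerate case: every shuffle gave the same score.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    print(z_score(seq, seq))
```

Each Z-score requires `n_shuffles` additional alignments per sequence pair, which illustrates why the abstract calls the Z-score computationally intensive compared with an e-value derived from a single alignment score.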

Year:  2006        PMID: 17038163      PMCID: PMC1618413          DOI: 10.1186/1471-2105-7-444

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


