Testing statistical significance scores of sequence comparison methods with structure similarity.

Tim Hulsen, Jacob de Vlieg, Jack A M Leunissen, Peter M A Groenen.

Abstract

BACKGROUND: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score over the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences.
RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as references. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH in particular has very high scores.
CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations generally give better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
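
The Z-score based on Monte-Carlo statistics mentioned in the abstract can be illustrated with a minimal sketch. It is not the implementation used by CluSTr, Protein World, or the authors: the match/mismatch scoring, the linear gap penalty, the number of shuffles, and the function names `smith_waterman_score` and `monte_carlo_z_score` are all assumptions chosen for brevity. The idea shown is the commonly described one: score the real pair, score the same pair many times with one sequence shuffled, and express the real score in standard deviations above the shuffled mean.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b.

    Toy scoring (assumption): fixed match/mismatch values and a linear gap
    penalty instead of a substitution matrix with affine gaps.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def monte_carlo_z_score(a, b, n_shuffles=100, seed=0):
    """Z-score of the real alignment score against shuffled versions of b.

    n_shuffles and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        shuffled_scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        # Degenerate case: every shuffle scored the same.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    print(monte_carlo_z_score(seq, seq))
```

Every shuffle requires a full alignment, so the cost grows linearly with `n_shuffles`. This is the sense in which the abstract calls the Z-score compute-intensive compared with an e-value derived from a single alignment score.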

Year: 2006 PMID: 17038163 PMCID: PMC1618413 DOI: 10.1186/1471-2105-7-444

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
