Empirical statistical estimates for sequence similarity searches.

W R Pearson

Abstract

The FASTA package of sequence comparison programs has been modified to provide accurate statistical estimates for local sequence similarity scores with gaps. These estimates are derived using the extreme value distribution from the mean and variance of the local similarity scores of unrelated sequences after the scores have been corrected for the expected effect of library sequence length. This approach allows accurate estimates to be calculated for both FASTA and Smith-Waterman similarity scores for protein/protein, DNA/DNA, and protein/translated-DNA comparisons. The accuracy of the statistical estimates is summarized for 54 protein families using FASTA and Smith-Waterman scores. Probability estimates calculated from the distribution of similarity scores are generally conservative, as are probabilities calculated using the Altschul-Gish lambda, kappa, and eta parameters. The performance of several alternative methods for correcting similarity scores for library-sequence length was evaluated using 54 protein superfamilies from the PIR39 database and 110 protein families from the Prosite/SwissProt rel. 34 database. Both regression-scaled and Altschul-Gish scaled scores perform significantly better than unscaled Smith-Waterman or FASTA similarity scores. When the Prosite/SwissProt test set is used, regression-scaled scores perform slightly better; when the PIR database is used, Altschul-Gish scaled scores perform best. Thus, length-corrected similarity scores improve the sensitivity of database searches. Statistical parameters that are derived from the distribution of similarity scores from the thousands of unrelated sequences typically encountered in a database search provide accurate estimates of statistical significance that can be used to infer sequence homology.
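
The abstract names the ingredients of the estimate (a length correction, the mean and variance of unrelated-sequence scores, and the extreme value distribution) but gives no formulas. The following is a minimal sketch of how those pieces could fit together, not the FASTA implementation itself. It assumes the length correction is a least-squares regression of score on the natural log of library sequence length, and that the resulting z-scores follow an extreme value (Gumbel) distribution with mean 0 and variance 1. All function names are hypothetical, and steps a real search would need, such as excluding related (high-scoring) sequences before fitting, are omitted.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_length_regression(scores, lengths):
    """Least-squares fit of score = a + b * ln(length).

    Returns (a, b, residual_sd). Assumes every library sequence
    passed in is unrelated to the query.
    """
    xs = [math.log(n) for n in lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual standard deviation with n - 2 degrees of freedom.
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, scores))
    return a, b, math.sqrt(ss_resid / (n - 2))


def z_score(score, length, a, b, sd):
    """Length-corrected score in units of the residual standard deviation."""
    return (score - (a + b * math.log(length))) / sd


def evd_pvalue(z):
    """P(Z >= z) for a Gumbel distribution with mean 0 and variance 1."""
    x = math.exp(-math.pi / math.sqrt(6.0) * z - EULER_GAMMA)
    return -math.expm1(-x)  # 1 - exp(-x), stable for very small x


def expectation(z, library_size):
    """Expected number of library sequences scoring at least z by chance."""
    return library_size * evd_pvalue(z)
```

In use, one would fit `a`, `b`, and `sd` once per search from the scores of the library sequences, then report `expectation(z_score(...), library_size)` for each hit. The length term matters because longer library sequences produce higher local similarity scores by chance alone.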

Year: 1998 | PMID: 9514730 | DOI: 10.1006/jmbi.1997.1525

Source DB: PubMed | Journal: J Mol Biol | ISSN: 0022-2836 | Impact factor: 5.469

