
Comparison of methods for searching protein sequence databases.

W R Pearson

Abstract

We have compared commonly used sequence comparison algorithms, scoring matrices, and gap penalties using a method that identifies statistically significant differences in performance. Search sensitivity with either the Smith-Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45-55, and optimized gap penalties instead of the conventional PAM250 matrix. More dramatic improvement can be obtained by scaling similarity scores by the logarithm of the length of the library sequence (ln()-scaling). With the best modern scoring matrix (BLOSUM55 or JO93) and optimal gap penalties (-12 for the first residue in the gap and -2 for additional residues), Smith-Waterman and FASTA performed significantly better than BLASTP. With ln()-scaling and optimal scoring matrices (BLOSUM45 or Gonnet92) and gap penalties (-12, -1), the rigorous Smith-Waterman algorithm performed better than either BLASTP or FASTA, although with the Gonnet92 matrix the difference from FASTA was not significant. Ln()-scaling performed better than normalization based on other simple functions of library sequence length. Ln()-scaling also performed better than scores based on normalized variance, but the differences were not statistically significant for the BLOSUM50 and Gonnet92 matrices. Optimal scoring matrices and gap penalties are reported for Smith-Waterman and FASTA, using conventional or ln()-scaled similarity scores. Searches with no penalty for gap extension, or no penalty for gap opening, or an infinite penalty for gaps performed significantly worse than the best methods. Differences in performance between FASTA and Smith-Waterman were not significant when partial query sequences were used. However, the best performance with complete query sequences was obtained with the Smith-Waterman algorithm and ln()-scaling.
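
The two scoring ideas in the abstract, a gap penalty that charges more for the first gapped residue than for additional ones, and scaling a similarity score by the logarithm of the library sequence length, can be illustrated with a minimal Python sketch. The abstract does not give the exact scaling formula, so the form used here (raw score multiplied by ln(reference length) / ln(library length)) and the reference length of 200 are assumptions made for illustration only. The function names `gap_penalty` and `ln_scaled` are hypothetical and do not come from FASTA or any other program.

```python
import math


def gap_penalty(length, first=-12, additional=-2):
    """Total penalty for a gap spanning `length` residues.

    Defaults follow the abstract: -12 for the first residue in the gap
    and -2 for each additional residue.
    """
    if length < 1:
        return 0
    return first + additional * (length - 1)


def ln_scaled(raw_score, library_length, reference_length=200):
    """Scale a raw similarity score by the log of the library sequence length.

    ASSUMPTION: the scaling form and `reference_length` are illustrative;
    the abstract only says scores are scaled by the logarithm of the
    library sequence length.
    """
    if library_length < 2:
        raise ValueError("library_length must be at least 2")
    return raw_score * math.log(reference_length) / math.log(library_length)


# A three-residue gap costs -12 for the first residue plus 2 * -2.
print(gap_penalty(3))  # -16

# Two library sequences with the same raw score: the longer one is
# scaled down more, so it ranks lower after ln()-scaling.
print(ln_scaled(100.0, 150) > ln_scaled(100.0, 1500))  # True
```

Under this sketch, a long library sequence must reach a higher raw score than a short one to obtain the same scaled score, which is the intuition behind correcting for sequence length.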

Year:  1995        PMID: 7549879      PMCID: PMC2143149          DOI: 10.1002/pro.5560040613

Source DB:  PubMed          Journal:  Protein Sci        ISSN: 0961-8368            Impact factor:   6.725


