Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships.

S E Brenner, C Chothia, T J Hubbard.

Abstract

Pairwise sequence comparison methods have been assessed using proteins whose relationships are known reliably from their structures and functions, as described in the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation tested the programs BLAST [Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448], and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195-197] and their scoring schemes. The error rate of all algorithms is greatly reduced by using statistical scores to evaluate matches rather than percentage identity or raw scores. The E-value statistical scores of SSEARCH and FASTA are reliable: the number of false positives found in our tests agrees well with the scores reported. However, the P-values reported by BLAST and WU-BLAST2 exaggerate significance by orders of magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform best, and they are capable of detecting almost all relationships between proteins whose sequence identities are >30%. For more distantly related proteins, they do much less well; only one-half of the relationships between proteins with 20-30% identity are found. Because many homologs have low sequence similarity, most distant relationships cannot be detected by any pairwise comparison method; however, those which are identified may be used with confidence.
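
The abstract's claim that E-values are "reliable" can be read as a calibration check: an E-value is the expected number of chance matches per query at that score or better, so the observed number of false positives per query at a given E-value cutoff should be close to the cutoff itself. The following is a minimal sketch of such a check in Python, not the authors' code. The `Hit` record, the function names, and the toy data are all hypothetical; in practice the `homologous` flag would come from a structural classification such as SCOP.

```python
from collections import namedtuple

# Hypothetical record for one reported match: the query and target identifiers,
# the E-value reported by the search program, and whether the pair is truly
# homologous according to an external reference (assumed to be known).
Hit = namedtuple("Hit", ["query", "target", "evalue", "homologous"])


def errors_per_query(hits, n_queries, threshold):
    """Observed false positives per query among hits with E-value <= threshold."""
    false_pos = sum(1 for h in hits if h.evalue <= threshold and not h.homologous)
    return false_pos / n_queries


def coverage(hits, n_true_pairs, threshold):
    """Fraction of known homologous pairs recovered with E-value <= threshold."""
    true_pos = sum(1 for h in hits if h.evalue <= threshold and h.homologous)
    return true_pos / n_true_pairs


# Toy data, invented purely for illustration (4 queries, 3 known homologous pairs).
HITS = [
    Hit("q1", "t1", 1e-10, True),
    Hit("q1", "t2", 0.5, False),
    Hit("q2", "t3", 1e-3, True),
    Hit("q2", "t4", 2.0, True),
    Hit("q3", "t5", 0.8, False),
    Hit("q4", "t6", 5.0, False),
]

if __name__ == "__main__":
    for cutoff in (0.01, 1.0, 10.0):
        epq = errors_per_query(HITS, n_queries=4, threshold=cutoff)
        cov = coverage(HITS, n_true_pairs=3, threshold=cutoff)
        print(f"E <= {cutoff}: errors/query = {epq:.2f}, coverage = {cov:.2f}")
```

For a well-calibrated score, the errors-per-query value tracks the cutoff; a score that exaggerates significance shows many more observed errors per query than the cutoff implies. Reporting coverage alongside it shows how many true relationships are found at that error level.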

Year:  1998        PMID: 9600919      PMCID: PMC27587          DOI: 10.1073/pnas.95.11.6073

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  34 in total

1.  Amino acid substitution matrices from protein blocks.

Authors:  S Henikoff; J G Henikoff
Journal:  Proc Natl Acad Sci U S A       Date:  1992-11-15       Impact factor: 11.205

2.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

3.  Database of homology-derived protein structures and the structural meaning of sequence alignment.

Authors:  C Sander; R Schneider
Journal:  Proteins       Date:  1991

4.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

Authors:  S Karlin; S F Altschul
Journal:  Proc Natl Acad Sci U S A       Date:  1990-03       Impact factor: 11.205

5.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

6.  Alignment of the amino acid sequences of distantly related proteins using variable gap penalties.

Authors:  A M Lesk; M Levitt; C Chothia
Journal:  Protein Eng       Date:  1986 Oct-Nov

7.  Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. (Review)

Authors:  M H Zweig; G Campbell
Journal:  Clin Chem       Date:  1993-04       Impact factor: 8.327

8.  Crystal structure of the catalytic domain of a thermophilic endocellulase.

Authors:  M Spezio; D B Wilson; P A Karplus
Journal:  Biochemistry       Date:  1993-09-28       Impact factor: 3.162

9.  Applications and statistics for multiple high-scoring segments in molecular sequences.

Authors:  S Karlin; S F Altschul
Journal:  Proc Natl Acad Sci U S A       Date:  1993-06-15       Impact factor: 11.205

10.  Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms.

Authors:  W R Pearson
Journal:  Genomics       Date:  1991-11       Impact factor: 5.736

  118 in total

1.  SCOP: a structural classification of proteins database.

Authors:  L Lo Conte; B Ailey; T J Hubbard; S E Brenner; A G Murzin; C Chothia
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  The ASTRAL compendium for protein structure and sequence analysis.

Authors:  S E Brenner; P Koehl; M Levitt
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

3.  PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information.

Authors:  J Qian; B Stenger; C A Wilson; J Lin; R Jansen; S A Teichmann; J Park; W G Krebs; H Yu; V Alexandrov; N Echols; M Gerstein
Journal:  Nucleic Acids Res       Date:  2001-04-15       Impact factor: 16.971

4.  LiveBench-1: continuous benchmarking of protein structure prediction servers.

Authors:  J M Bujnicki; A Elofsson; D Fischer; L Rychlewski
Journal:  Protein Sci       Date:  2001-02       Impact factor: 6.725

5.  ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches.

Authors:  T Rognes
Journal:  Nucleic Acids Res       Date:  2001-04-01       Impact factor: 16.971

6.  Expectations from structural genomics.

Authors:  S E Brenner; M Levitt
Journal:  Protein Sci       Date:  2000-01       Impact factor: 6.725

7.  Comparison of sequence profiles. Strategies for structural predictions using sequence information.

Authors:  L Rychlewski; L Jaroszewski; W Li; A Godzik
Journal:  Protein Sci       Date:  2000-02       Impact factor: 6.725

8.  Genome analysis: Assigning protein coding regions to three-dimensional structures.

Authors:  A A Salamov; M Suwa; C A Orengo; M B Swindells
Journal:  Protein Sci       Date:  1999-04       Impact factor: 6.725

9.  A comparison of position-specific score matrices based on sequence and structure alignments.

Authors:  Anna R Panchenko; Stephen H Bryant
Journal:  Protein Sci       Date:  2002-02       Impact factor: 6.725

10.  The CATH extended protein-family database: providing structural annotations for genome sequences.

Authors:  Frances M G Pearl; David Lee; James E Bray; Daniel W A Buchan; Adrian J Shepherd; Christine A Orengo
Journal:  Protein Sci       Date:  2002-02       Impact factor: 6.725
