
Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements.

A A Schäffer, L Aravind, T L Madden, S Shavirin, J L Spouge, Y I Wolf, E V Koonin, S F Altschul.

Abstract

PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 +/- 0.005 to 0.895 +/- 0.003. This does not include the benefits from four modifications we included in the 'baseline' version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set. This test involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence's amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.
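The abstract does not say which variant of the ROC measure was used. As an illustration only, the sketch below assumes the truncated ROC_n score that is common in sequence-search evaluation: for each of the first n false positives in a ranked hit list, count the true positives ranked above it, then divide the sum by n times the total number of true positives. The function name `roc_n`, its parameters, and the padding rule for lists with fewer than n false positives are assumptions made for this example, not details taken from the paper.

```python
def roc_n(ranked_labels, n, total_true):
    """Normalized truncated ROC score in [0, 1] (illustrative sketch).

    ranked_labels: booleans ordered best hit first; True marks a true positive.
    n:             number of false positives at which the curve is truncated.
    total_true:    number of true positives known for the query.
    """
    if n <= 0 or total_true <= 0:
        raise ValueError("n and total_true must be positive")

    tp_seen = 0   # true positives encountered so far
    fp_seen = 0   # false positives encountered so far
    area = 0      # sum over false positives of true positives ranked above

    for is_true in ranked_labels:
        if is_true:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the missing
    # ones are treated as ranking below every retrieved true positive.
    area += (n - fp_seen) * tp_seen

    return area / (n * total_true)


if __name__ == "__main__":
    # Two true positives ranked above two false positives: perfect retrieval.
    print(roc_n([True, True, False, False], n=2, total_true=2))
    # Interleaved hits score lower.
    print(roc_n([True, False, True, False], n=2, total_true=2))
```

Under these assumptions a score of 1 means every true positive outranks the first n false positives, and 0 means none does, which matches the 0 (worst) to 1 (best) range described in the abstract.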

Year:  2001        PMID: 11452024      PMCID: PMC55814          DOI: 10.1093/nar/29.14.2994

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


