Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching.

M Gribskov, N L Robinson.

Abstract

In this paper, we borrow the idea of the receiver operating characteristic (ROC) from clinical medicine and demonstrate its application to sequence comparison. The ROC includes elements of both sensitivity and specificity, and is a quantitative measure of the usefulness of a diagnostic. The ROC is used in this work to investigate the effects of scoring table and gap penalties on database searches. Studies on three families of proteins (4Fe-4S ferredoxins, lysR bacterial regulatory proteins, and bacterial RNA polymerase sigma-factors) lead to the following conclusions. Sequence families are quite idiosyncratic, but the best PAM distance for database searches using the Smith-Waterman method is somewhat larger than predicted by theoretical methods, about 200 PAM. The length-independent gap penalty (gap initiation penalty) is quite important, but shows a broad peak at values of about 20-24. The length-dependent gap penalty (gap extension penalty) is almost irrelevant, suggesting that successful database searches rely only to a limited degree on gapped alignments. Taken together, these observations lead to the conclusion that the optimal conditions for alignments and database searches are not, and should not be expected to be, the same.
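
As a minimal sketch of how an ROC value might be computed for a single database search (this is not the authors' code; the function name `roc_area`, the scores, and the family labels are illustrative assumptions), the example below ranks hits by score and measures how often a true family member outranks a non-member. A value of 1.0 means every family member scores above every non-member, so the measure reflects sensitivity and specificity together.

```python
from typing import Sequence


def roc_area(scores: Sequence[float], is_member: Sequence[bool]) -> float:
    """Area under the ROC curve for a ranked list of database hits.

    Hypothetical helper: `scores` are search scores (higher is better) and
    `is_member` marks which hits truly belong to the query family.
    Tied scores keep their input order; no tie correction is applied.
    """
    ranked = sorted(zip(scores, is_member), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, member in ranked if member)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one family member and one non-member")

    true_pos_seen = 0
    area = 0
    for _, member in ranked:
        if member:
            true_pos_seen += 1
        else:
            # Each non-member adds the number of members ranked above it.
            area += true_pos_seen
    return area / (n_pos * n_neg)


# Made-up scores for five hits; three are true family members.
example_scores = [90.0, 80.0, 70.0, 60.0, 50.0]
example_labels = [True, True, False, True, False]
example_area = roc_area(example_scores, example_labels)
```

Under these assumptions, repeating the calculation for each scoring table or gap penalty setting and comparing the resulting areas would be one way to carry out the kind of parameter comparison the abstract describes.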

Year:  1996        PMID: 16718863     DOI: 10.1016/s0097-8485(96)80004-0

Source DB:  PubMed          Journal:  Comput Chem        ISSN: 0097-8485


  125 in total

1.  Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. (Review)

Authors:  A A Schäffer; L Aravind; T L Madden; S Shavirin; J L Spouge; Y I Wolf; E V Koonin; S F Altschul
Journal:  Nucleic Acids Res       Date:  2001-07-15       Impact factor: 16.971

2.  Surface-histogram: a new shape descriptor for protein-protein docking.

Authors:  Shengyin Gu; Patrice Koehl; Joel Hass; Nina Amenta
Journal:  Proteins       Date:  2011-11-09

3.  FoldMiner: structural motif discovery using an improved superposition algorithm.

Authors:  Jessica Shapiro; Douglas Brutlag
Journal:  Protein Sci       Date:  2004-01       Impact factor: 6.725

4.  Finding weak similarities between proteins by sequence profile comparison.

Authors:  Anna R Panchenko
Journal:  Nucleic Acids Res       Date:  2003-01-15       Impact factor: 16.971

5.  Protein ranking: from local to global structure in the protein similarity network.

Authors:  Jason Weston; Andre Elisseeff; Dengyong Zhou; Christina S Leslie; William Stafford Noble
Journal:  Proc Natl Acad Sci U S A       Date:  2004-04-15       Impact factor: 11.205

6.  Prediction of functional sites by analysis of sequence and structure conservation.

Authors:  Anna R Panchenko; Fyodor Kondrashov; Stephen Bryant
Journal:  Protein Sci       Date:  2004-03-09       Impact factor: 6.725

7.  Sensitivity and selectivity in protein structure comparison.

Authors:  Michael L Sierk; William R Pearson
Journal:  Protein Sci       Date:  2004-03       Impact factor: 6.725

8.  PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences.

Authors:  K Ganesan; S Parthasarathy
Journal:  J Struct Funct Genomics       Date:  2011-12-03

9.  Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate.

Authors:  Hyrum D Carroll; Alex C Williams; Anthony G Davis; John L Spouge
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2015 May-Jun       Impact factor: 3.710

10.  Structure- and sequence-based function prediction for non-homologous proteins.

Authors:  Lee Sael; Meghana Chitale; Daisuke Kihara
Journal:  J Struct Funct Genomics       Date:  2012-01-22