
What are the baselines for protein fold recognition?

L J McGuffin, K Bryson, D T Jones.

Abstract

MOTIVATION: What constitutes a baseline level of success for protein fold recognition methods? As fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein, ranging from the length of the sequence to a knowledge of its secondary structure, to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods, and could these methods be used to construct viable hierarchical classifications?
EXPERIMENTS PERFORMED: A number of rapid automatic methods which score similarities between protein domains were devised and tested. These methods ranged from those that incorporated no secondary structure information, such as measuring absolute differences in sequence lengths, to more complex alignments of secondary structure elements. Each method was assessed for accuracy by comparison with the Class Architecture Topology Homology (CATH) classification. Methods were rated against both a random baseline fold assignment method as a lower control and FSSP as an upper control. Similarity trees were constructed in order to evaluate the accuracy of optimum methods at producing a classification of structure.
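As an illustration of the simplest kind of method mentioned above, the following is a minimal sketch of scoring domains by absolute difference in sequence length, alongside a random ranking as a lower control. The function names, the dictionary layout and the domain identifiers are hypothetical and chosen for this example only; the abstract does not specify how the authors implemented these methods.

```python
import random


def length_difference_score(len_a: int, len_b: int) -> int:
    """Dissimilarity between two domains: absolute difference in sequence length.

    Lower values mean the domains are considered more similar.
    """
    return abs(len_a - len_b)


def rank_by_length(query_len: int, library: dict) -> list:
    """Rank library domains from most to least similar to the query.

    `library` maps a (hypothetical) domain identifier to its sequence length.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    return sorted(
        library,
        key=lambda dom: (length_difference_score(query_len, library[dom]), dom),
    )


def random_ranking(library: dict, seed: int = 0) -> list:
    """Lower control: a seeded random permutation of the library domains."""
    rng = random.Random(seed)
    ids = sorted(library)
    rng.shuffle(ids)
    return ids


if __name__ == "__main__":
    # Hypothetical domain identifiers and lengths, for illustration only.
    library = {"domA": 120, "domB": 95, "domC": 300}
    print(rank_by_length(100, library))
    print(random_ranking(library))
```

The top-ranked library domain would then be taken as the fold assignment for the query and checked against its CATH fold.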
RESULTS: Using a rigorous comparison of methods with CATH, the random fold assignment method set a lower baseline of 11% true positives allowing for 3% false positives, and FSSP set an upper benchmark of 47% true positives at 3% false positives. The optimum secondary structure alignment method used here achieved 27% true positives at 3% false positives. Using a less rigorous Critical Assessment of Structure Prediction (CASP)-like sensitivity measurement, the random assignment achieved 6%, FSSP achieved 59% and the optimum secondary structure alignment method achieved 32%. Similarity trees produced by the optimum method illustrate that these methods cannot be used alone to produce a viable protein structural classification system.
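Figures of the form "true positives at 3% false positives" can be computed from a list of scored domain pairs, each labelled by whether the two domains share a fold in the reference classification. The sketch below shows one common way to do this; it is an assumption about the general form of the measurement, not the authors' exact procedure, and the function name and input layout are hypothetical.

```python
def true_positive_rate_at_fpr(scored_pairs, max_fpr=0.03):
    """Best true positive rate reachable while the false positive rate
    stays at or below `max_fpr`.

    `scored_pairs` is a list of (score, same_fold) tuples, where a higher
    score means the method considers the pair more similar and `same_fold`
    is True when the reference classification puts both domains in the
    same fold.
    """
    positives = sum(1 for _, same in scored_pairs if same)
    negatives = len(scored_pairs) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one same-fold and one different-fold pair")

    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Pairs with identical scores cannot be separated by a threshold,
        # so they are accepted together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / negatives > max_fpr:
            break
        best = tp / positives
        i = j
    return best


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    pairs = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
             (0.5, False), (0.4, False), (0.3, False)]
    print(true_positive_rate_at_fpr(pairs, max_fpr=0.25))
```

Running the same function on the scores from a random assignment and from a structural comparison method would give the lower and upper controls for a method under test.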
CONCLUSIONS: Simple methods that use perfect secondary structure information to assign folds cannot produce an accurate protein taxonomy; however, they do provide useful baselines for fold recognition. In terms of a typical CASP assessment, our results suggest that approximately 6% of targets with folds in the databases could be assigned correctly by random guessing, and as many as 32% could be recognised by trivial secondary structure comparison methods, given knowledge of their correct secondary structures.

Year:  2001        PMID: 11222263     DOI: 10.1093/bioinformatics/17.1.63

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  7 in total

1.  Rapid protein domain assignment from amino acid sequence using predicted secondary structure.

Authors:  Russell L Marsden; Liam J McGuffin; David T Jones
Journal:  Protein Sci       Date:  2002-12       Impact factor: 6.725

2.  A consensus view of fold space: combining SCOP, CATH, and the Dali Domain Dictionary.

Authors:  Ryan Day; David A C Beck; Roger S Armen; Valerie Daggett
Journal:  Protein Sci       Date:  2003-10       Impact factor: 6.725

3.  PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences.

Authors:  K Ganesan; S Parthasarathy
Journal:  J Struct Funct Genomics       Date:  2011-12-03

4.  The CATH hierarchy revisited-structural divergence in domain superfamilies and the continuity of fold space.

Authors:  Alison Cuff; Oliver C Redfern; Lesley Greene; Ian Sillitoe; Tony Lewis; Mark Dibley; Adam Reid; Frances Pearl; Tim Dallman; Annabel Todd; Richard Garratt; Janet Thornton; Christine Orengo
Journal:  Structure       Date:  2009-08-12       Impact factor: 5.006

5.  Alignment-free local structural search by writhe decomposition.

Authors:  Degui Zhi; Maxim Shatsky; Steven E Brenner
Journal:  Bioinformatics       Date:  2010-04-05       Impact factor: 6.937

6.  Benchmarking consensus model quality assessment for protein fold recognition.

Authors:  Liam J McGuffin
Journal:  BMC Bioinformatics       Date:  2007-09-18       Impact factor: 3.169

7.  Fold classification based on secondary structure--how much is gained by including loop topology?

Authors:  Jieun Jeong; Piotr Berman; Teresa Przytycka
Journal:  BMC Struct Biol       Date:  2006-03-08
