Assessing strategies for improved superfamily recognition.

Ian Sillitoe, Mark Dibley, James Bray, Sarah Addou, Christine Orengo.

Abstract

There are more than 200 completed genomes and over 1 million nonredundant sequences in public repositories. Although the structural data are more sparse (approximately 13,000 nonredundant structures solved to date), several powerful sequence-based methodologies now allow these structures to be mapped onto related regions in a significant proportion of genome sequences. We review a number of publicly available strategies for providing structural annotations for genome sequences, and we describe the protocol adopted to provide CATH structural annotations for completed genomes. In particular, we assess the performance of several sequence-based protocols employing hidden Markov model (HMM) technologies for superfamily recognition, including a new approach (SAMOSA [sequence augmented models of structure alignments]) that exploits multiple structural alignments from the CATH domain structure database when building the models. Using a data set of remote homologs detected by structure comparison and manually validated in CATH, a single-seed HMM library was able to recognize 76% of the data set. Including the SAMOSA models in the HMM library showed little gain in homolog recognition, although a slight improvement in alignment quality was observed for very remote homologs. However, using an expanded 1D-HMM library (CATH-ISL) increased the coverage to 86%. The single-seed HMM library has been used to annotate the protein sequences of 120 genomes from all three major kingdoms, allowing up to 70% of the genes or partial genes to be assigned to CATH superfamilies. It has also been used to recruit sequences from Swiss-Prot and TrEMBL into CATH domain superfamilies, expanding the CATH database eightfold.

Year:  2005        PMID: 15937274      PMCID: PMC2253352          DOI: 10.1110/ps.041056105

Source DB:  PubMed          Journal:  Protein Sci        ISSN: 0961-8368            Impact factor:   6.725


