A rapid classification protocol for the CATH Domain Database to support structural genomics.

F M Pearl, N Martin, J E Bray, D W Buchan, A P Harrison, D Lee, G A Reeves, A J Shepherd, I Sillitoe, A E Todd, J M Thornton, C A Orengo.

Abstract

In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives, has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.

Year:  2001        PMID: 11125098      PMCID: PMC29791          DOI: 10.1093/nar/29.1.223

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  13 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores.

Authors:  C A Wilson; J Kreychman; M Gerstein
Journal:  J Mol Biol       Date:  2000-03-17       Impact factor: 5.469

3.  IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices.

Authors:  A A Schäffer; Y I Wolf; C P Ponting; E V Koonin; L Aravind; S F Altschul
Journal:  Bioinformatics       Date:  1999-12       Impact factor: 6.937

4.  The CATH Dictionary of Homologous Superfamilies (DHS): a consensus approach for identifying distant structural homologues.

Authors:  J E Bray; A E Todd; F M Pearl; J M Thornton; C A Orengo
Journal:  Protein Eng       Date:  2000-03

5.  CORA--topological fingerprints for protein structural families.

Authors:  C A Orengo
Journal:  Protein Sci       Date:  1999-04       Impact factor: 6.725

6.  Fast structure alignment for protein databank searching.

Authors:  C A Orengo; N P Brown; W R Taylor
Journal:  Proteins       Date:  1992-10

7.  PDBsum: a Web-based database of summaries and analyses of all PDB structures.

Authors:  R A Laskowski; E G Hutchinson; A D Michie; A C Wallace; M L Jones; J M Thornton
Journal:  Trends Biochem Sci       Date:  1997-12       Impact factor: 13.807

Review 8.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

9.  CATH--a hierarchic classification of protein domain structures.

Authors:  C A Orengo; A D Michie; S Jones; D T Jones; M B Swindells; J M Thornton
Journal:  Structure       Date:  1997-08-15       Impact factor: 5.006

10.  Protein structure alignment.

Authors:  W R Taylor; C A Orengo
Journal:  J Mol Biol       Date:  1989-07-05       Impact factor: 5.469

  14 in total

1.  The CATH extended protein-family database: providing structural annotations for genome sequences.

Authors:  Frances M G Pearl; David Lee; James E Bray; Daniel W A Buchan; Adrian J Shepherd; Christine A Orengo
Journal:  Protein Sci       Date:  2002-02       Impact factor: 6.725

2.  Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database.

Authors:  Daniel W A Buchan; Adrian J Shepherd; David Lee; Frances M G Pearl; Stuart C G Rison; Janet M Thornton; Christine A Orengo
Journal:  Genome Res       Date:  2002-03       Impact factor: 9.043

3.  The CATH database: an extended protein family resource for structural and functional genomics.

Authors:  F M G Pearl; C F Bennett; J E Bray; A P Harrison; N Martin; A Shepherd; I Sillitoe; J Thornton; C A Orengo
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

4.  Gene3D: structural assignments for the biologist and bioinformaticist alike.

Authors:  Daniel W A Buchan; Stuart C G Rison; James E Bray; David Lee; Frances Pearl; Janet M Thornton; Christine A Orengo
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

5.  The PEDANT genome database.

Authors:  Dmitrij Frishman; Martin Mokrejs; Denis Kosykh; Gabi Kastenmüller; Grigory Kolesov; Igor Zubrzycki; Christian Gruber; Birgitta Geier; Andreas Kaps; Kaj Albermann; Andreas Volz; Christian Wagner; Matthias Fellenberg; Klaus Heumann; Hans-Werner Mewes
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

Review 6.  Protein folds and protein folding.

Authors:  R Dustin Schaeffer; Valerie Daggett
Journal:  Protein Eng Des Sel       Date:  2010-11-03       Impact factor: 1.650

7.  Assessing strategies for improved superfamily recognition.

Authors:  Ian Sillitoe; Mark Dibley; James Bray; Sarah Addou; Christine Orengo
Journal:  Protein Sci       Date:  2005-06-03       Impact factor: 6.725

Review 8.  Exploiting protein structure data to explore the evolution of protein function and biological complexity.

Authors:  Russell L Marsden; Juan A G Ranea; Antonio Sillero; Oliver Redfern; Corin Yeats; Michael Maibaum; David Lee; Sarah Addou; Gabrielle A Reeves; Timothy J Dallman; Christine A Orengo
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2006-03-29       Impact factor: 6.237

Review 9.  Bioinformatics approaches to classifying allergens and predicting cross-reactivity.

Authors:  Catherine H Schein; Ovidiu Ivanciuc; Werner Braun
Journal:  Immunol Allergy Clin North Am       Date:  2007-02       Impact factor: 3.479

Review 10.  Structural analysis of linear and conformational epitopes of allergens.

Authors:  Ovidiu Ivanciuc; Catherine H Schein; Tzintzuni Garcia; Numan Oezguen; Surendra S Negi; Werner Braun
Journal:  Regul Toxicol Pharmacol       Date:  2008-12-14       Impact factor: 3.271
