SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes.

Shashi B Pandit, Dilip Gosar, S Abhiman, S Sujatha, Sayali S Dixit, Natasha S Mhatre, R Sowdhamini, N Srinivasan.

Abstract

Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is commonly detected only after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating to one another the Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in both steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related to a SCOP family, either by close homologous connection or by distant relationship, potentially forming new superfamily connections. Using the profiles of the remaining 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space. For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.
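The abstract does not say how significant family-to-family matches were merged into potential superfamilies. As a rough illustration only, the sketch below assumes single-linkage grouping: any two families joined by a sufficiently significant hit end up in the same cluster. The function name `cluster_families`, the E-value cutoff of 1e-3, and the family identifiers are all hypothetical and are not taken from SUPFAM or IMPALA.

```python
from collections import defaultdict


def cluster_families(hits, evalue_cutoff=1e-3):
    """Group family IDs by single linkage over significant pairwise hits.

    hits: iterable of (query_family, matched_family, evalue) tuples.
    The cutoff is an assumed value, not the one used for SUPFAM.
    Returns a sorted list of clusters, each with at least two members.
    """
    parent = {}

    def find(x):
        # Return the representative of x's cluster (with path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the clusters containing a and b.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, match, evalue in hits:
        # Skip self-hits and hits weaker than the cutoff.
        if query != match and evalue <= evalue_cutoff:
            union(query, match)

    groups = defaultdict(list)
    for fam in parent:
        groups[find(fam)].append(fam)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical hits; the family names are placeholders, not real SUPFAM data.
example_hits = [
    ("famA", "famB", 1e-6),
    ("famB", "famC", 5e-4),
    ("famD", "famE", 1e-8),
    ("famA", "famD", 0.5),    # too weak, ignored
    ("famF", "famF", 1e-20),  # self-hit, ignored
]
clusters = cluster_families(example_hits)
print(clusters)  # [['famA', 'famB', 'famC'], ['famD', 'famE']]
```

Single linkage is the most permissive choice, since one chance hit can fuse two otherwise unrelated groups. A stricter variant might require reciprocal hits or several supporting matches before merging.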

Year:  2002        PMID: 11752317      PMCID: PMC99061          DOI: 10.1093/nar/30.1.289

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


