A comparison of sequence and structure protein domain families as a basis for structural genomics.

A Elofsson, E L Sonnhammer.

Abstract

MOTIVATION: Protein families can be defined based on structure or sequence similarity. We wanted to compare two protein family databases, one based on structural and one on sequence similarity, to investigate to what extent they overlap, the similarity in definition of corresponding families, and to create a list of large protein families with unknown structure as a resource for structural genomics. We also wanted to increase the sensitivity of fold assignment by exploiting protein family HMMs.
RESULTS: We compared Pfam, a protein family database based on sequence similarity, to Scop, which is based on structural similarity. We found that 70% of the Scop families exist in Pfam while 57% of the Pfam families exist in Scop. Most families that occur in both databases correspond well to each other, but in some cases they differ. Such cases highlight situations in which structure and sequence approaches differ significantly. The comparison enabled us to compile a list of the largest families that do not occur in Scop; these are suitable targets for structure prediction and determination, and may be useful to guide projects in structural genomics. Notably, 13 of the 20 largest protein families without a known structure are likely to be transmembrane proteins. We also exploited Pfam to increase the sensitivity of detecting homologs of proteins with known structure, by comparing query sequences to Pfam HMMs that correspond to Scop families. For SWISSPROT+TREMBL, this yielded an increase in fold assignment from 31% to 42% compared to using FASTA only. This method assigned a structure to 22% of the proteins in Saccharomyces cerevisiae, 24% in Escherichia coli, and 16% in Methanococcus jannaschii.
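
The bookkeeping behind the overlap percentages and the target list can be illustrated with a minimal sketch. It is not the authors' pipeline: the family names, sizes, and Pfam-to-Scop links below are invented toy data, and the function names are hypothetical. It assumes that a set of matched (Pfam family, Scop family) pairs has already been obtained by some sequence comparison step.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (Pfam family, Scop family) judged to correspond


def overlap_fractions(links: Set[Link], pfam: Set[str], scop: Set[str]) -> Tuple[float, float]:
    """Return (fraction of Scop families found in Pfam,
    fraction of Pfam families found in Scop)."""
    valid = [(p, s) for p, s in links if p in pfam and s in scop]
    pfam_hit = {p for p, _ in valid}
    scop_hit = {s for _, s in valid}
    return len(scop_hit) / len(scop), len(pfam_hit) / len(pfam)


def largest_without_structure(sizes: Dict[str, int], links: Set[Link], top_n: int) -> List[str]:
    """Rank Pfam families that have no Scop partner by member count, largest first."""
    with_structure = {p for p, _ in links}
    candidates = [fam for fam in sizes if fam not in with_structure]
    # Sort by descending size; break ties by name so the order is deterministic.
    return sorted(candidates, key=lambda fam: (-sizes[fam], fam))[:top_n]


# Toy data (assumed for illustration only, not taken from Pfam or Scop).
pfam_sizes = {"PF_A": 500, "PF_B": 300, "PF_C": 120, "PF_D": 40}
scop_families = {"SC_1", "SC_2", "SC_3"}
links = {("PF_A", "SC_1"), ("PF_C", "SC_2")}

scop_frac, pfam_frac = overlap_fractions(links, set(pfam_sizes), scop_families)
targets = largest_without_structure(pfam_sizes, links, top_n=2)
print(f"Scop families in Pfam: {scop_frac:.0%}")
print(f"Pfam families in Scop: {pfam_frac:.0%}")
print("Largest families without structure:", targets)
```

In this sketch the two fractions are computed over different denominators, which is why the abstract can report 70% in one direction and 57% in the other for the same set of correspondences.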

Year: 1999    PMID: 10383473    DOI: 10.1093/bioinformatics/15.6.480

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937

