
MetaFam: a unified classification of protein families. I. Overview and statistics.

K A Silverstein, E Shoop, J E Johnson, E F Retzel.

Abstract

MOTIVATION: Protein sequence classification is becoming an increasingly important means of organizing the voluminous data produced by large-scale genome sequencing projects. At present, there are several independent classification methods. To aid the general classification effort, we have created a unified protein family resource, MetaFam. MetaFam is a protein family classification built upon 10 publicly accessible protein family databases (Blocks+, DOMO, Pfam, PIR-ALN, PRINTS, PROSITE, ProDom, PROTOMAP, SBASE, and SYSTERS). MetaFam's family 'supersets', as we call them, are created automatically using set theory to compare families among the databases. Families of one database are matched to those in another when the intersection of their members exceeds all other possible family pairings between the two databases. Pairwise family matches are drawn together transitively to create a new list of protein family supersets.
RESULTS: MetaFam family supersets have several useful features: (1) each superset contains more members than the families from which it is composed, because each of the component family databases only works with a subset of our full non-redundant set of proteins; (2) conflicting assignments can be pinpointed quickly, since our analysis identifies individual members that are in conflict with the majority consensus; (3) family descriptions that are absent from automated databases can frequently be assigned; (4) statistics have been computed comparing domain boundaries, family size distributions, and overall quality of MetaFam supersets; (5) the supersets have been loaded into a relational database to allow for complex queries and visualization of the connections among families in a superset and the consensus of individual domain members; and (6) the quality of individual supersets has been assessed using numerous quantitative measures such as family consistency, connectedness, and size. We anticipate this new resource will be particularly useful to genomic database curators.
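The matching procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the MetaFam implementation: it assumes that "exceeds all other possible family pairings" means a strict reciprocal best overlap (ties yield no match), and it uses a union-find structure as one convenient way to take the transitive closure. The function names `best_matches` and `supersets`, and the toy databases `dbA`, `dbB`, `dbC` with their members, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def best_matches(db_a, db_b):
    """Return (family_a, family_b) pairs whose shared-member count is strictly
    larger than that of every competing pair involving either family.
    Assumption: this reciprocal, tie-rejecting rule is our reading of the abstract."""
    overlap = {}
    for fa, members_a in db_a.items():
        for fb, members_b in db_b.items():
            shared = len(members_a & members_b)
            if shared:
                overlap[(fa, fb)] = shared

    matches = []
    for (fa, fb), shared in overlap.items():
        # Rival pairs reuse exactly one of the two families in this pair.
        rivals = [n for (xa, xb), n in overlap.items()
                  if (xa == fa) != (xb == fb)]
        if all(shared > n for n in rivals):
            matches.append((fa, fb))
    return matches


def supersets(databases):
    """Group (database, family) nodes into supersets by joining pairwise
    matches transitively (union-find; an illustrative choice)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Register every family so unmatched ones become singleton supersets.
    for name, families in databases.items():
        for fam in families:
            find((name, fam))

    for (name_a, db_a), (name_b, db_b) in combinations(databases.items(), 2):
        for fa, fb in best_matches(db_a, db_b):
            parent[find((name_a, fa))] = find((name_b, fb))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical toy data: family name -> set of member protein identifiers.
toy = {
    "dbA": {"A1": {"p1", "p2", "p3"}, "A2": {"p4", "p5"}},
    "dbB": {"B1": {"p2", "p3", "p6"}, "B2": {"p5"}},
    "dbC": {"C1": {"p3", "p6", "p7"}},
}

for group in sorted(sorted(g) for g in supersets(toy)):
    print(group)
```

On this toy data, families A1, B1, and C1 fall into one superset whose combined membership (p1, p2, p3, p6, p7) is larger than that of any single component family, which mirrors feature (1) of the results.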

Year: 2001    PMID: 11294790    DOI: 10.1093/bioinformatics/17.3.249

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  4 in total

1.  The MetaFam Server: a comprehensive protein family resource.

Authors:  K A Silverstein; E Shoop; J E Johnson; A Kilian; J L Freeman; T M Kunau; I A Awad; M Mayer; E F Retzel
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  Exploring the nonlinear geometry of protein homology.

Authors:  Michael A Farnum; Huafeng Xu; Dimitris K Agrafiotis
Journal:  Protein Sci       Date:  2003-08       Impact factor: 6.725

3.  Tools and resources for identifying protein families, domains and motifs. (Review)

Authors:  Nicola J Mulder; Rolf Apweiler
Journal:  Genome Biol       Date:  2001-12-19       Impact factor: 13.583

4.  Microarrays for global expression constructed with a low redundancy set of 27,500 sequenced cDNAs representing an array of developmental stages and physiological conditions of the soybean plant.

Authors:  Lila O Vodkin; Anupama Khanna; Robin Shealy; Steven J Clough; Delkin Orlando Gonzalez; Reena Philip; Gracia Zabala; Françoise Thibaud-Nissen; Mark Sidarous; Martina V Strömvik; Elizabeth Shoop; Christina Schmidt; Ernest Retzel; John Erpelding; Randy C Shoemaker; Alicia M Rodriguez-Huete; Joseph C Polacco; Virginia Coryell; Paul Keim; George Gong; Lei Liu; Jose Pardinas; Peter Schweitzer
Journal:  BMC Genomics       Date:  2004-09-29       Impact factor: 3.969
