The properties of protein family space depend on experimental design.

Victor Kunin, Sarah A Teichmann, Martijn A Huynen, Christos A Ouzounis.

Abstract

MOTIVATION: Databases of protein families often exhibit drastically different properties of the protein family space.
RESULTS: We compared the properties of protein family space as reflected by exhaustive protein family databases and databases with predefined families. We used TRIBES, Protomap, ProDom and COGs as representatives of the exhaustive databases, and Pfam-A and Superfamily as databases that predefine families. We observe a power-law distribution of family sizes in all these databases, although in predefined databases the power-law line collapses before reaching smaller-sized families. We discuss the future trends of this power-law distribution and suggest that saturation in the sampling of protein family space will result in a distortion of the power law at small family sizes. For larger genome sizes, predefined databases show logarithmic growth of the number of families per genome, whereas exhaustive databases exhibit a virtually linear relationship. All databases consistently differ in the proportion of protein families shared between taxa. Predefined databases have a larger number of protein families shared between the three domains of life, while exhaustive databases show a much more fragmented distribution. We argue that these discrepancies reflect alternative approaches to the trade-off between sensitivity and specificity in the detection of homologous proteins. We conclude that these properties are complementary rather than contradictory, describing the protein universe from different perspectives.
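
To make the family-size observation concrete, the following is a minimal sketch of how a power-law exponent could be estimated from a list of family sizes. It is not the authors' procedure: the function names are hypothetical, the input data are synthetic, and a plain least-squares fit on log-log axes is assumed here only for simplicity (maximum-likelihood estimators are often preferred for real data).

```python
import math
from collections import Counter


def family_size_histogram(family_sizes):
    """Map each family size to the number of families of that size."""
    return Counter(family_sizes)


def fit_power_law_exponent(histogram):
    """Estimate alpha in count(size) ~ C * size**(-alpha).

    Uses ordinary least squares on (log size, log count); the fitted
    slope is -alpha. This is an illustrative stand-in, not the method
    used in the paper.
    """
    points = [
        (math.log(size), math.log(count))
        for size, count in histogram.items()
        if size > 0 and count > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two distinct family sizes")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx


# Synthetic example (not data from any of the databases named above):
# 64 families of size 1, 16 of size 2, 4 of size 4, 1 of size 8,
# i.e. count = 64 * size**(-2).
toy_sizes = [1] * 64 + [2] * 16 + [4] * 4 + [8]
alpha = fit_power_law_exponent(family_size_histogram(toy_sizes))
print(f"estimated exponent: {alpha:.3f}")  # approximately 2.0
```

On a histogram where small families are under-represented, as the abstract describes for predefined databases, the points at small sizes would fall below the fitted line, and a single exponent would describe only the upper part of the distribution.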

Year: 2005    PMID: 15769834    DOI: 10.1093/bioinformatics/bti386

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


  7 in total

1.  A limited universe of membrane protein families and folds.

Authors:  Amit Oberai; Yungok Ihm; Sanguk Kim; James U Bowie
Journal:  Protein Sci       Date:  2006-07       Impact factor: 6.725

2.  Evolutionary innovations and the organization of protein functions in genotype space.

Authors:  Evandro Ferrada; Andreas Wagner
Journal:  PLoS One       Date:  2010-11-30       Impact factor: 3.240

3.  Scaling properties of protein family phylogenies.

Authors:  Alejandro Herrada; Víctor M Eguíluz; Emilio Hernández-García; Carlos M Duarte
Journal:  BMC Evol Biol       Date:  2011-06-06       Impact factor: 3.260

4.  UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches.

Authors:  Baris E Suzek; Yuqi Wang; Hongzhan Huang; Peter B McGarvey; Cathy H Wu
Journal:  Bioinformatics       Date:  2014-11-13       Impact factor: 6.937

5.  MACHOS: Markov clusters of homologous subsequences.

Authors:  Simon Wong; Mark A Ragan
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937

6.  EPGD: a comprehensive web resource for integrating and displaying eukaryotic paralog/paralogon information.

Authors:  Guohui Ding; Yan Sun; Hong Li; Zhen Wang; Haiwei Fan; Chuan Wang; Dan Yang; Yixue Li
Journal:  Nucleic Acids Res       Date:  2007-11-05       Impact factor: 16.971

7.  On the extent and origins of genic novelty in the phylum Nematoda.

Authors:  James Wasmuth; Ralf Schmid; Ann Hedley; Mark Blaxter
Journal:  PLoS Negl Trop Dis       Date:  2008-07-02