
Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model.

J Qian, N M Luscombe, M Gerstein.

Abstract

Global surveys of genomes measure the usage of essential molecular parts, defined here as protein families, superfamilies or folds, in different organisms. Based on surveys of the first 20 completely sequenced genomes, we observe that the occurrence of these parts follows a power-law distribution. That is, the number of distinct parts (F) with a given genomic occurrence (V) decays as F = aV^{-b}, with a few parts occurring many times and most occurring infrequently. For a given organism, the distributions of families, superfamilies and folds are nearly identical, and this is reflected in the size of the decay exponent b. Moreover, the exponent varies between different organisms, with those of smaller genomes displaying a steeper decay (i.e. larger b). Clearly, the power law indicates a preference to duplicate genes that encode molecular parts that are already common. Here, we present a minimal but biologically meaningful model that accurately describes the observed power law. Although the model performs equally well for all three protein classes, we focus on the occurrence of folds in preference to families and superfamilies. This is because folds are comparatively insensitive to the effects of point mutations that can cause a family member to diverge beyond detectable similarity. In the model, genomes evolve through two basic operations: (i) duplication of existing genes; (ii) net flow of new genes. The flow term is closely related to the exponent b and can accommodate considerable gene loss; however, we demonstrate that the observed data are reproduced best with a net inflow, i.e. with more gene gain than loss. Moreover, we show that prokaryotes have much higher rates of gene acquisition than eukaryotes, probably reflecting lateral transfer. A further natural outcome of our model is an estimate of the fold composition of the initial genome, which potentially relates to the common ancestor of modern organisms.
Supplementary material pertaining to this work is available from www.partslist.org/powerlaw. Copyright 2001 Academic Press.
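
The two operations named in the abstract (duplication of existing genes, plus a net inflow of genes carrying new folds) can be illustrated with a toy simulation. The sketch below is not the authors' implementation. It assumes that each step either adds one gene with a previously unseen fold (with a hypothetical probability `p_new`) or duplicates a gene chosen uniformly at random, so that common folds are duplicated more often. Gene loss is not modelled. The function names, parameter values and the log-log least-squares fit are all illustrative assumptions; the fit in particular is a crude estimator of the exponent b in F = aV^{-b}, chosen for brevity.

```python
import math
import random
from collections import Counter


def simulate_genome(n_initial_folds=20, steps=5000, p_new=0.1, seed=0):
    """Grow a toy genome and return a Counter mapping fold id -> occurrence V.

    Assumption: at each step a gene with a brand-new fold arrives with
    probability p_new (net inflow); otherwise a uniformly chosen existing
    gene is duplicated, which favours folds that are already common.
    """
    rng = random.Random(seed)
    genes = list(range(n_initial_folds))  # initial genome: one gene per fold
    next_fold = n_initial_folds
    for _ in range(steps):
        if rng.random() < p_new:
            genes.append(next_fold)  # inflow of a gene with a new fold
            next_fold += 1
        else:
            genes.append(rng.choice(genes))  # duplication of an existing gene
    return Counter(genes)


def occurrence_histogram(fold_counts):
    """Return a Counter mapping occurrence V -> number of distinct folds F."""
    return Counter(fold_counts.values())


def fit_power_law(hist):
    """Least-squares fit of log F = log a - b * log V; returns (a, b).

    Needs at least two distinct V values. A straight-line fit in log-log
    space is only a rough estimate because the sparse tail is noisy.
    """
    xs = [math.log(v) for v in hist]
    ys = [math.log(f) for f in hist.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


if __name__ == "__main__":
    counts = simulate_genome()
    a, b = fit_power_law(occurrence_histogram(counts))
    print(f"fitted a = {a:.1f}, b = {b:.2f}")
```

Varying `p_new` in this sketch changes the balance between inflow and duplication, which gives a rough feel for the abstract's statement that the flow term is closely related to the exponent b.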

Year:  2001        PMID: 11697896     DOI: 10.1006/jmbi.2001.5079

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469

