Comparing genomes in terms of protein structure: surveys of a finite parts list.

M Gerstein, H Hegyi.

Abstract

We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in some respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g., analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into 'fold families.' This library can be built up automatically using a structure comparison program, and we describe how important objective statistical measures are for assessing similarities within the library and between the library and genome sequences. After building the library, one can use it to count the number of folds in genomes, expressing the results in the form of Venn diagrams and 'top-10' statistics for shared and common folds. Depending on the counting methodology employed, these statistics can reflect different aspects of the genome, such as the amount of internal duplication or gene expression. Previous analyses have shown that the common folds shared between very different microorganisms, i.e., in different kingdoms, have a remarkably similar structure, being composed of repeated strand-helix-strand super-secondary structure units. A major difficulty with this sort of 'fold-counting' is that only a small subset of the structures in a complete genome is currently known, and this subset is prone to sampling bias.
One way of overcoming biases is through structure prediction, which can be applied uniformly and comprehensively to a whole genome. Various investigators have, in fact, already applied many of the existing techniques for predicting secondary structure and transmembrane (TM) helices to the recently sequenced genomes. The results have been consistent: microbial genomes have similar fractions of strands and helices even though they have significantly different amino acid compositions. The fraction of membrane proteins with a given number of TM helices falls off rapidly as the number of TM elements increases, approximately according to a Zipf law. This latter finding indicates that there is no preference for the highly studied 7-TM proteins in microbial genomes. Continuously updated tables and further information pertinent to this review are available over the web at http://bioinfo.mbb.yale.edu/genome.
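
The two kinds of tally mentioned in the abstract, fold counts across genomes and the falloff in TM-helix counts, can be illustrated with a minimal Python sketch. It is not the authors' method. The function names, the input layout (one fold label per assigned gene) and the toy labels are all hypothetical, and the "Zipf law" is treated here simply as a power-law falloff fitted by least squares on log-log values, which is an assumption.

```python
from collections import Counter
import math


def fold_survey(genome_folds, top_n=10):
    """Summarise fold usage across genomes (hypothetical helper).

    genome_folds maps a genome name to a list of fold labels, one label per
    gene that was assigned a fold; repeated labels count as duplications.
    Returns (shared folds, folds unique to each genome, top shared folds).
    """
    counts = {g: Counter(folds) for g, folds in genome_folds.items()}
    fold_sets = {g: set(c) for g, c in counts.items()}

    # Central region of the Venn diagram: folds present in every genome.
    shared = set.intersection(*fold_sets.values())

    # Outer regions: folds found in exactly one genome.
    unique = {}
    for g, folds in fold_sets.items():
        others = set().union(*(s for h, s in fold_sets.items() if h != g))
        unique[g] = folds - others

    # "Top-N" list: shared folds ranked by total occurrences over all genomes.
    totals = Counter()
    for c in counts.values():
        totals.update({f: n for f, n in c.items() if f in shared})
    return shared, unique, totals.most_common(top_n)


def power_law_exponent(count_by_k):
    """Fit count ~ a * k**(-b) by least squares on log-log values; return b.

    count_by_k maps k (e.g. number of TM helices) to the number of proteins
    with that k. Entries with non-positive k or count are skipped.
    """
    pts = [(math.log(k), math.log(n))
           for k, n in count_by_k.items() if k > 0 and n > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    return -sxy / sxx


if __name__ == "__main__":
    # Toy data only; the labels and counts are made up for illustration.
    toy = {
        "genome_1": ["fold_A", "fold_A", "fold_B", "fold_C"],
        "genome_2": ["fold_A", "fold_B", "fold_B", "fold_B", "fold_D"],
    }
    print(fold_survey(toy))
    print(power_law_exponent({1: 1000, 2: 250, 4: 62.5}))
```

Counting every occurrence of a label, as done here, reflects internal duplication; counting each fold once per genome, or weighting by expression level, would emphasise the other aspects the abstract mentions.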

Year:  1998        PMID: 10357579     DOI: 10.1111/j.1574-6976.1998.tb00371.x

Source DB:  PubMed          Journal:  FEMS Microbiol Rev        ISSN: 0168-6445            Impact factor:   16.408

