Comparing genomes in terms of protein structure: surveys of a finite parts list.

Abstract

We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in some respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g., analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into 'fold families.' This library can be built up automatically using a structure comparison program, and we describe how important objective statistical measures are for assessing similarities within the library and between the library and genome sequences. After building the library, one can use it to count the number of folds in genomes, expressing the results in the form of Venn diagrams and 'top-10' statistics for shared and common folds. Depending on the counting methodology employed, these statistics can reflect different aspects of the genome, such as the amount of internal duplication or gene expression. Previous analyses have shown that the common folds shared between very different microorganisms, i.e., in different kingdoms, have a remarkably similar structure, comprising repeated strand-helix-strand super-secondary structure units. A major difficulty with this sort of 'fold-counting' is that only a small subset of the structures in a complete genome is currently known, and this subset is prone to sampling bias.
One way of overcoming biases is through structure prediction, which can be applied uniformly and comprehensively to a whole genome. Various investigators have, in fact, already applied many of the existing techniques for predicting secondary structure and transmembrane (TM) helices to the recently sequenced genomes. The results have been consistent: microbial genomes have similar fractions of strands and helices even though they have significantly different amino acid composition. The fraction of membrane proteins with a given number of TM helices falls off rapidly with more TM elements, approximately according to a Zipf law. This latter finding indicates that there is no preference for the highly studied 7-TM proteins in microbial genomes. Continuously updated tables and further information pertinent to this review are available over the web at http://bioinfo.mbb.yale.edu/genome.
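The two kinds of survey described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the function names `fold_survey` and `power_law_exponent`, the fold identifiers, and all the toy numbers are hypothetical, and the "Zipf law" is read here, as an assumption, as a simple power law in which the number of proteins with k TM helices is proportional to k^(-b). The first function computes the regions of a Venn diagram (folds shared by all genomes, folds unique to each) and a top-N ranking of shared folds; the second estimates b by least squares on log-log values.

```python
from collections import Counter
import math


def fold_survey(genome_folds, top_n=10):
    """Summarize fold assignments across genomes.

    genome_folds maps a genome name to a list of fold identifiers, one
    entry per gene matched to the fold library (so duplicates are counted).
    Returns (shared, unique, top_shared).
    """
    fold_sets = {g: set(folds) for g, folds in genome_folds.items()}
    # Central region of the Venn diagram: folds present in every genome.
    shared = set.intersection(*fold_sets.values())
    # Outer regions: folds found in exactly one genome.
    unique = {}
    for g, folds in fold_sets.items():
        others = set().union(*(s for h, s in fold_sets.items() if h != g))
        unique[g] = folds - others
    # Rank the shared folds by total occurrences over all genomes.
    totals = Counter()
    for folds in genome_folds.values():
        totals.update(f for f in folds if f in shared)
    return shared, unique, totals.most_common(top_n)


def power_law_exponent(tm_counts):
    """Least-squares estimate of b in count(k) ~ C * k**(-b).

    tm_counts maps a number of TM helices k (k >= 1) to a protein count.
    """
    pts = [(math.log(k), math.log(n))
           for k, n in tm_counts.items() if k > 0 and n > 0]
    if len({x for x, _ in pts}) < 2:
        raise ValueError("need counts for at least two distinct k values")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx  # slope is -b on the log-log plot


# Hypothetical toy data, for illustration only (not taken from the review).
toy_genomes = {
    "genome_A": ["fold1", "fold1", "fold2", "fold3"],
    "genome_B": ["fold1", "fold2", "fold2", "fold2", "fold4"],
}
shared, unique, top_shared = fold_survey(toy_genomes)
toy_tm_counts = {1: 1000.0, 2: 250.0, 4: 62.5}  # exact k**-2 decay
exponent = power_law_exponent(toy_tm_counts)
```

Counting every matched gene, as done here, lets internal duplication influence the ranking; counting each fold once per genome would instead measure presence or absence, which is one way the counting methodology changes what the statistics reflect.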

Year: 1998 PMID: 10357579 DOI: 10.1111/j.1574-6976.1998.tb00371.x

Source DB: PubMed Journal: FEMS Microbiol Rev ISSN: 0168-6445 Impact factor: 16.408

Cited

24 in total

1. Estimating the probability for a protein to have a new fold: A statistical computational model.

Authors: E Portugaly; M Linial
Journal: Proc Natl Acad Sci U S A Date: 2000-05-09 Impact factor: 11.205

2. SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics.

Authors: P Bertone; Y Kluger; N Lan; D Zheng; D Christendat; A Yee; A M Edwards; C H Arrowsmith; G T Montelione; M Gerstein
Journal: Nucleic Acids Res Date: 2001-07-01 Impact factor: 16.971

3. Comparing function and structure between entire proteomes.

Authors: J Liu; B Rost
Journal: Protein Sci Date: 2001-10 Impact factor: 6.725

Review 4. Ingestion-controlling network: what's language got to do with it?

Authors: Michael Myslobodsky; Richard Coppola
Journal: Rev Neurosci Date: 2010 Impact factor: 4.353

5. Proteomics of Mycoplasma genitalium: identification and characterization of unannotated and atypical proteins in a small model genome.

Authors: S Balasubramanian; T Schneider; M Gerstein; L Regan
Journal: Nucleic Acids Res Date: 2000-08-15 Impact factor: 16.971

6. An evolutionarily structured universe of protein architecture.

7. Phylogeny determined by protein domain content.

8. Universal sharing patterns in proteomes and evolution of protein fold architecture and life.

9. Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world.

10. The protein-tethered lipid bilayer: a novel mimic of the biological membrane.