
How representative are the known structures of the proteins in a complete genome? A comprehensive structural census.

M Gerstein

Abstract

BACKGROUND: Determining how representative the known structures are of the proteins encoded by a complete genome is important for assessing to what extent our current picture of protein stability and folding is overly influenced by biases in the structure databank (PDB). It is also important for improving database-based methods of structure prediction and genome annotation.
RESULTS: The known structures are compared to the proteins encoded by eight complete microbial genomes in terms of simple statistics such as sequence length, composition and secondary structure. The known structures are represented by a collection of nonhomologous domains from the PDB and a smaller list of 'biophysical proteins' on which folding experiments have concentrated. The proteins encoded by the genomes are considered as a whole and divided into various regions, such as known-structure homologue, low complexity (nonglobular), transmembrane or linker. Various tests are performed to assess the significance of the reported differences, in both a practical and a statistical sense.
CONCLUSIONS: The proteins encoded by the genomes are significantly different from those in the PDB. Their sequence lengths, which follow an extreme value distribution, are longer than the PDB proteins and much longer than the biophysical proteins. Their composition differs from the PDB proteins in having more Lys, Ile, Asn and Gln and less Cys and Trp. This is true overall and especially for the regions corresponding to soluble proteins of as yet unknown fold. Secondary-structure prediction on these uncharacterized regions indicates that they contain on average more helical structure than the PDB; differences about this mean are small, with yeast having slightly more sheet structure and Haemophilus influenzae and Helicobacter pylori more helical structure. Further information is available through the GeneCensus system at http://bioinfo.mbb.yale.edu/genome.
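The two simplest comparisons described above, residue composition and sequence length, can be illustrated with a minimal sketch. It is not the method used in the paper. The function names and the toy sequences are hypothetical, and the length fit assumes the Gumbel (type I) form of the extreme value distribution, estimated by the method of moments; the abstract does not state which form or fitting procedure was used.

```python
import math
from collections import Counter
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
EULER_GAMMA = 0.5772156649015329       # Euler-Mascheroni constant


def composition(sequences):
    """Fraction of each standard residue, pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_difference(genome_seqs, pdb_seqs):
    """Per-residue difference in fraction (genome minus PDB)."""
    genome = composition(genome_seqs)
    pdb = composition(pdb_seqs)
    return {aa: genome[aa] - pdb[aa] for aa in AMINO_ACIDS}


def gumbel_moments_fit(lengths):
    """Method-of-moments estimate of Gumbel (location, scale).

    Assumption: lengths follow a type I extreme value distribution,
    for which variance = (pi * scale)**2 / 6 and
    mean = location + EULER_GAMMA * scale.
    """
    scale = stdev(lengths) * math.sqrt(6) / math.pi
    location = mean(lengths) - EULER_GAMMA * scale
    return location, scale


# Hypothetical toy data, for illustration only (not real proteins).
toy_genome = ["MKKIINNQQL", "MKIKNQ"]
toy_pdb = ["MCWCAG", "ACWG"]

diff = composition_difference(toy_genome, toy_pdb)
enriched = sorted(aa for aa in AMINO_ACIDS if diff[aa] > 0)
location, scale = gumbel_moments_fit([len(s) for s in toy_genome + toy_pdb])
```

On real data, `genome_seqs` and `pdb_seqs` would be the full sets of sequences being compared, and a difference for a given residue would still need a significance test before being reported, as the abstract notes.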

Year: 1998    PMID: 9889159    DOI: 10.1016/S1359-0278(98)00066-2

Source DB: PubMed    Journal: Fold Des    ISSN: 1359-0278

