Selection of representative protein data sets.

U Hobohm, M Scharf, R Schneider, C Sander.

Abstract

The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de". The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
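The two selection strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the pairwise similarity relation has already been computed (for example with the 30% identity cutoff mentioned above) and is supplied as a list of similar pairs. The function names, the toy protein names, and the rule used for thinning (remove the protein with the most remaining neighbors first) are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, Iterable, List, Set, Tuple

Neighbors = Dict[str, Set[str]]


def build_neighbors(names: Iterable[str],
                    similar_pairs: Iterable[Tuple[str, str]]) -> Neighbors:
    """Build a symmetric neighbor map from precomputed similar pairs."""
    neighbors: Neighbors = {name: set() for name in names}
    for a, b in similar_pairs:
        if a != b:  # ignore self-pairs
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def select_from_ordered_list(ordered: List[str],
                             neighbors: Neighbors) -> List[str]:
    """First strategy: walk a quality-ordered list, keep each protein that
    has not been excluded yet, and exclude all of its neighbors."""
    excluded: Set[str] = set()
    selected: List[str] = []
    for protein in ordered:
        if protein in excluded:
            continue
        selected.append(protein)
        excluded.update(neighbors.get(protein, ()))
    return selected


def thin_clusters(neighbors: Neighbors) -> List[str]:
    """Second strategy (assumed thinning rule): repeatedly drop the protein
    with the most remaining neighbors until no similar pair is left."""
    remaining = {p: set(n) for p, n in neighbors.items()}
    while True:
        # Ties are broken by name so the result is deterministic.
        worst = max(remaining, key=lambda p: (len(remaining[p]), p),
                    default=None)
        if worst is None or not remaining[worst]:
            break
        for other in remaining.pop(worst):
            remaining[other].discard(worst)
    return sorted(remaining)


# Toy data (hypothetical names): "hub" is similar to a, b and c; d is unrelated.
names = ["hub", "a", "b", "c", "d"]
neighbors = build_neighbors(names, [("hub", "a"), ("hub", "b"), ("hub", "c")])

# The ordering stands in for the property being optimized, best first.
print(select_from_ordered_list(names, neighbors))  # ['hub', 'd']
print(thin_clusters(neighbors))                    # ['a', 'b', 'c', 'd']
```

On this toy input the ordered-list strategy keeps the preferred protein "hub" at the cost of a smaller set, while the thinning strategy discards it and ends with a larger set, which mirrors the trade-off between optimizing a property and maximizing set size that the abstract describes.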

Year:  1992        PMID: 1304348      PMCID: PMC2142204          DOI: 10.1002/pro.5560010313

Source DB:  PubMed          Journal:  Protein Sci        ISSN: 0961-8368            Impact factor:   6.725


  216 in total

1.  The ASTRAL compendium for protein structure and sequence analysis.

Authors:  S E Brenner; P Koehl; M Levitt
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Diversity of functions of proteins with internal symmetry in spatial arrangement of secondary structural elements.

Authors:  K Kinoshita; A Kidera; N Go
Journal:  Protein Sci       Date:  1999-06       Impact factor: 6.725

3.  Generation of deviation parameters for amino acid singlets, doublets and triplets from three-dimentional structures of proteins and its implications for secondary structure prediction from amino acid sequences.

Authors:  S A Mugilan; K Veluraja
Journal:  J Biosci       Date:  2000-03       Impact factor: 1.826

4.  Associative memory hamiltonians for structure prediction without homology: alpha-helical proteins.

Authors:  C Hardin; M P Eastwood; Z Luthey-Schulten; P G Wolynes
Journal:  Proc Natl Acad Sci U S A       Date:  2000-12-19       Impact factor: 11.205

5.  Chloroplast transit peptide prediction: a peek inside the black box.

Authors:  A I Schein; J C Kissinger; L H Ungar
Journal:  Nucleic Acids Res       Date:  2001-08-15       Impact factor: 16.971

6.  BETAWRAP: successful prediction of parallel beta-helices from primary sequence reveals an association with many microbial pathogens.

Authors:  P Bradley; L Cowen; M Menke; J King; B Berger
Journal:  Proc Natl Acad Sci U S A       Date:  2001-12-18       Impact factor: 11.205

7.  Classification of protein disulphide-bridge topologies.

Authors:  J M Mas; P Aloy; M A Martí-Renom; B Oliva; R de Llorens; F X Avilés; E Querol
Journal:  J Comput Aided Mol Des       Date:  2001-05       Impact factor: 3.686

8.  ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites.

Authors:  O Emanuelsson; H Nielsen; G von Heijne
Journal:  Protein Sci       Date:  1999-05       Impact factor: 6.725

9.  Improved amino acid flexibility parameters.

Authors:  David K Smith; Predrag Radivojac; Zoran Obradovic; A Keith Dunker; Guang Zhu
Journal:  Protein Sci       Date:  2003-05       Impact factor: 6.725

10.  Discovery of a significant, nontopological preference for antiparallel alignment of helices with parallel regions in sheets.

Authors:  Brandon M Hespenheide; Leslie A Kuhn
Journal:  Protein Sci       Date:  2003-05       Impact factor: 6.725
