
A 3D sequence-independent representation of the protein data bank.

D Fischer, C J Tsai, R Nussinov, H Wolfson.

Abstract

Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions, a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology; and (iii) there is no clear-cut definition of structural similarity. The latter depends on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins, or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent, automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process, without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required. We obtain a representative set of 220 chains with resolution better than 3.0 Å, or 268 chains including lower resolution entries, NMR entries and models.
The resulting set can serve as a basis for extensive structural classification and studies of 3D recurring motifs and of sequence-structure relationships. The clustering algorithm succeeds in classifying into the same structural family chains with no significant sequence homology, e.g. all the globins in one single group, all the trypsin-like serine proteases in another, or all the immunoglobulin-like folds in a third. In addition, unexpected structural similarities of interest have been automatically detected between pairs of chains. A cluster analysis of the representative structures demonstrates the way the "structural universe" is populated.
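
The scale argument in the abstract can be made concrete with a small sketch. The first function estimates the cost of an exhaustive all-against-all scan from the two figures quoted above (3250 chains, about 2 s per comparison); since the abstract says "more than 3250" and "approximately 2 s", the result is only a rough lower bound. The second function is a generic greedy "leader" clustering, shown purely to illustrate how comparing each new chain against current representatives only can cut the number of comparisons. It is not the authors' heuristic, whose details the abstract does not give. The names all_pairs_cost, greedy_representatives, same_family and the toy chain labels are hypothetical, and same_family is a stand-in for a real structural comparison.

```python
from typing import Callable


def all_pairs_cost(n_chains: int, seconds_per_comparison: float) -> tuple[int, float]:
    """Return (pairwise comparisons, total days) for an all-against-all scan."""
    n_pairs = n_chains * (n_chains - 1) // 2
    days = n_pairs * seconds_per_comparison / 86_400  # 86,400 seconds per day
    return n_pairs, days


def greedy_representatives(
    chains: list[str],
    is_similar: Callable[[str, str], bool],
) -> tuple[dict[str, list[str]], int]:
    """Generic leader clustering (illustrative; NOT the paper's algorithm).

    Each chain is compared only against the current representatives.
    It joins the first similar one, or becomes a new representative.
    Returns the clusters keyed by representative and the comparison count.
    """
    clusters: dict[str, list[str]] = {}
    comparisons = 0
    for chain in chains:
        for rep in clusters:
            comparisons += 1
            if is_similar(rep, chain):
                clusters[rep].append(chain)
                break
        else:
            # No representative matched: this chain starts a new cluster.
            clusters[chain] = [chain]
    return clusters, comparisons


# Toy data: the label before "_" plays the role of a structural family.
# A real is_similar would call a structural comparison program instead.
toy_chains = ["globin_a", "globin_b", "protease_a", "globin_c",
              "ig_a", "protease_b", "ig_b"]


def same_family(a: str, b: str) -> bool:
    return a.split("_")[0] == b.split("_")[0]


n_pairs, days = all_pairs_cost(3250, 2.0)
clusters, used = greedy_representatives(toy_chains, same_family)
print(f"all-against-all: {n_pairs} comparisons, about {days:.0f} days")
print(f"toy run: {len(clusters)} representatives after {used} comparisons")
```

With these inputs the exhaustive scan comes to several million comparisons, on the order of months of workstation time, which is the motivation for a clustering step. In the toy run the greedy pass needs fewer comparisons than the 21 pairs an exhaustive scan of seven chains would require. Note that a single greedy pass depends on input order and on the similarity threshold, which is one reason an iterative scheme such as the one the abstract mentions may be preferable.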

Year:  1995        PMID: 8771179     DOI: 10.1093/protein/8.10.981

Source DB:  PubMed          Journal:  Protein Eng        ISSN: 0269-2139


