Fragnostic: walking through protein structure space.

Iddo Friedberg, Adam Godzik.

Abstract

The Fragnostic (http://ffas.burnham.org/Fragnostic) web tool implements a novel and useful view of protein structure space. We mined a non-redundant subset of the PDB for common fragments shared between proteins inhabiting different SCOP folds. Subsequently, we formulated an inter-fold similarity measure based on fragment sharing. Fold space is described as a graph whose nodes are folds between which the edges are drawn depending on the extent of fragment sharing. In this fashion, Fragnostic helps discover meaningful relationships between proteins belonging to different folds, based on sharing similar fragments in the proteins comprising those folds. Distant fold similarity information is supplemented by annotations taken from Gene Ontology, SCOP and CATH. Overall, Fragnostic is a tool which helps discover structural and functional relationships between proteins which are distantly related or seemingly unrelated.

Year:  2005        PMID: 15980462      PMCID: PMC1160124          DOI: 10.1093/nar/gki363

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


BACKGROUND

The two popular protein classification schemes, CATH (1) and SCOP (2), partition the protein structure universe hierarchically, proceeding from coarse-grained to fine-grained partitions. The initial, coarse-grained partitioning of structure space is based on the secondary structure content. Because there are two well-ordered secondary structure elements, we have four possible classes as the topmost partitioning rank in those databases (SCOP and CATH actually use a few more, ad hoc classes). Classes are then more finely partitioned into folds (SCOP) or topologies (CATH), based on manual assignment. There may be between 100 and 200 folds per class. We know from experience that many proteins which are assigned to different folds share a structural/functional similarity. When proteins are categorically assigned to different folds, we lose important information about possible similarities between individual proteins assigned to different folds. Furthermore, because fold assignment is manual and sometimes arbitrary, there are cases where a fold–fold similarity between proteins inhabiting two different folds is glaringly obvious. These anomalies arise from the categorical assignment of proteins in a hierarchical classification scheme. We named the gap between the few classes and the many folds the ‘granularity gap’. This granularity gap acts as a barrier preventing us from seeing obvious and not-so-obvious similarities between proteins from different folds, as was elaborated upon in studies conducted by Harrison et al. (3) and Choi et al. (4).

BRIDGING THE GRANULARITY GAP

One way of bridging the granularity gap is to re-establish the relationships between fold populations using similarities at a sub-domain level. We have chosen to address this problem using short fragments shared between populations of proteins in different folds. Elsewhere (I. Friedberg and A. Godzik, submitted for publication) we describe in detail the generation and analysis of a fragment dataset. Briefly, we used a non-redundant set of solved structures, PDB-SELECT25 (5), to generate a dataset of 2.5 × 10^7 fragments of lengths 5, 10, 15 and 20 residues. Fragments were generated using a sliding window along each protein's sequence. Those fragments were aligned using FFAS03 (6), a sensitive profile–profile alignment program. The high-scoring profile-based alignments were then screened by aligning them structurally, and only the alignments with a C-α RMSD of ≤1 Å were retained. After this two-step screening process, we had a dataset of 1.25 × 10^5 fragment pairs. The fragments were derived without any assumptions regarding their secondary structure content, an ‘agnostic’ approach; hence, ‘Fragnostic’.

We proceeded to implement a similarity measure between folds, fragment-based fold similarity (FBFS), based on fragment sharing. Using FBFS, we generated four weighted graphs, one for each fragment length (5, 10, 15 and 20 residues). Each vertex represents a population of PDB-SELECT25 proteins in a given fold. Two vertices may be connected by a weighted edge, with the weight determined by the FBFS score. Given n folds, indexed (1, …, n), each fold has a set of fragment pairs shared with other folds: (X_1, X_2, …, X_n), X_i being the set of all fragment pairs which are shared in fold i; |X_i| is the number of those pairs. X_ij is the set of all fragment pairs shared between fold i and fold j, and |X_ij| is the number of such pairs. FBFS is then defined as follows:
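The sliding-window step and the bookkeeping behind the fragment-pair counts can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy sequence and the toy fold labels are all hypothetical, and the sketch assumes that only pairs joining two different folds are counted. It stops at the counts for each fold and each fold pair and does not compute FBFS itself.

```python
from collections import Counter


def sliding_fragments(sequence, length):
    """Return (start_index, fragment) for every window of `length` residues."""
    return [
        (start, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


def count_shared_pairs(fold_pairs):
    """Count retained fragment pairs per fold and per pair of folds.

    `fold_pairs` holds one (fold_a, fold_b) tuple per retained fragment pair.
    Assumption: pairs within a single fold are ignored, because the text
    describes fragments shared between different folds.
    """
    per_fold = Counter()   # fold -> number of shared pairs involving it
    per_pair = Counter()   # {fold_a, fold_b} -> number of shared pairs
    for fold_a, fold_b in fold_pairs:
        if fold_a == fold_b:
            continue
        per_fold[fold_a] += 1
        per_fold[fold_b] += 1
        per_pair[frozenset((fold_a, fold_b))] += 1
    return per_fold, per_pair


# Toy data, not taken from the Fragnostic dataset.
fragments = sliding_fragments("ACDEFGH", 5)
pairs = [("a.4", "c.37"), ("a.4", "c.37"), ("a.4", "b.1"), ("b.1", "b.1")]
per_fold, per_pair = count_shared_pairs(pairs)
```

Keying the per-pair counter on a `frozenset` makes the count symmetric, so the order in which the two folds of a pair are listed does not matter.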

IMPLEMENTATION

The Fragnostic web tool lets the user examine the relationships between fold populations, based on the graph representation outlined above. The user enters a fragment length, an FBFS threshold and a threshold on the number of shared fragments. The latter was introduced to correct a positive bias which may exist for folds with small populations. Fragnostic then generates a graph. Each vertex is shown as a circle, color-coded according to the SCOP class. The SCOP concise classification scheme code (SCCS) is shown in the vertex. SCCS is a four-position code assigned by SCOP to a family, with the first position (a letter) denoting the class, the second the fold, the third the superfamily and the fourth the family. Positions 2–4 of the SCCS are numbers, e.g. a.4.3.23. As each vertex is composed of a population of proteins with a common fold, only the first two positions of the SCCS are shown (a.4). Placing the cursor over a vertex shows its fold's SCOP-assigned title. Two vertices are connected by an edge if the FBFS score between them is higher than the threshold provided by the user. Figure 1 shows a part of such a graph.

Clicking on a vertex displays a table showing the SCOP domains from PDB-SELECT25 which belong to the vertex's fold. Each table entry is linked to a 3D model of that domain, viewed using the Rasmol program (7). The model is displayed as a cartoon, and the regions which are covered by fragments shared with other folds are colored. Colors range from blue to red: the ‘hotter’ (redder) the color, the more fragments are shared in that region with other folds (Figure 2). Using Rasmol, a simple yet powerful protein visualization tool, the user can further analyze the protein. The page is linked to the folds which are connected to the current one and to their connecting edges (see below). Clicking on an edge produces a table of all the fragment alignments making up this edge.
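The graph construction described above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's actual code: the function names and the scores are made up, and the sketch assumes that a fold pair passes the shared-fragment filter when its count is at least the user's threshold (the text gives the direction of the comparison only for FBFS). The output is plain GraphViz DOT text.

```python
def fold_code(sccs):
    """Reduce a full SCCS such as 'a.4.3.23' to its class.fold prefix ('a.4')."""
    parts = sccs.split(".")
    if len(parts) < 2:
        raise ValueError(f"not an SCCS with class and fold: {sccs!r}")
    return ".".join(parts[:2])


def build_edges(fbfs, shared_counts, fbfs_threshold, min_shared):
    """Keep fold pairs whose FBFS exceeds the threshold.

    `fbfs` and `shared_counts` map (fold_a, fold_b) tuples to a score and to
    a number of shared fragments. Assumption: a pair also needs at least
    `min_shared` shared fragments to be kept.
    """
    return {
        pair: score
        for pair, score in fbfs.items()
        if score > fbfs_threshold and shared_counts.get(pair, 0) >= min_shared
    }


def to_dot(edges):
    """Render the kept edges as an undirected GraphViz DOT graph."""
    lines = ["graph folds {"]
    for (fold_a, fold_b), score in sorted(edges.items()):
        lines.append(f'  "{fold_a}" -- "{fold_b}" [label="{score:.2f}"];')
    lines.append("}")
    return "\n".join(lines)


# Made-up scores and counts, for illustration only.
scores = {("a.4", "c.37"): 0.35, ("a.4", "b.1"): 0.10, ("b.1", "g.3"): 0.50}
counts = {("a.4", "c.37"): 4, ("a.4", "b.1"): 6, ("b.1", "g.3"): 0}
edges = build_edges(scores, counts, fbfs_threshold=0.2, min_shared=1)
dot_text = to_dot(edges)
```

With these toy inputs only the a.4 to c.37 edge survives: the a.4 to b.1 score is below the FBFS threshold, and the b.1 to g.3 pair has no shared fragments recorded.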
Whenever so annotated, a table entry will have Gene Ontology (GO) (8) terms associated with it, and/or Enzyme Commission (EC) classification number. The GO terms were taken from the PDB to GO mapping provided by The European Bioinformatics Institute (EBI). There may be multiple mappings between the chains and GO terms. This is because some protein chains have multiple functions, participate in more than one metabolic pathway, or are found in more than one cellular compartment. Care was taken, however, not to enter two GO terms when one clearly subsumes the other in an ‘is-a’ relationship. Thus if the term ‘phosphodiesterase’ appears associated with a given chain, ‘esterase’ will not be mentioned.
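The rule of not listing a GO term when a more specific listed term already implies it can be sketched as follows. This is a minimal illustration that assumes the ‘is-a’ links are available as a simple child-to-parents mapping; the function names and the three-term hierarchy are hypothetical and use the names from the example above rather than real GO identifiers.

```python
def ancestors(term, is_a):
    """Return every term reachable from `term` by following is-a links upward."""
    seen = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def drop_subsumed(terms, is_a):
    """Remove any term that is an is-a ancestor of another term in the set."""
    terms = set(terms)
    implied = set()
    for term in terms:
        implied |= ancestors(term, is_a)
    return terms - implied


# Toy hierarchy for illustration; not real GO identifiers.
toy_is_a = {"phosphodiesterase": ["esterase"], "esterase": ["hydrolase"]}
kept = drop_subsumed({"phosphodiesterase", "esterase", "kinase"}, toy_is_a)
```

Here ‘esterase’ is dropped because ‘phosphodiesterase’ implies it, while ‘kinase’ is kept because no other listed term subsumes it.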
Figure 1

Part of a Fragnostic graph for fragment length 10, an FBFS threshold of 0.2 and a shared-fragment threshold of 1. Circles are the SCOP fold populations, color-coded according to SCOP class: red, all alpha; blue, all beta; orange, alpha/beta; green, alpha + beta; and purple, small.

Figure 2

Coagulation factor X, light chain (PDB: 1FAX:L), which belongs to the knottins SCOP fold. The non-white areas are composed of length-10 fragments, shared with other folds.

CONCLUSIONS

We present Fragnostic as a novel method for walking through protein structure space. Rather than replacing SCOP with a new classification, it complements the existing classification by showing connections between known SCOP folds. Fragnostic is a powerful tool for revealing hidden inter-fold connections based on shared fragments. Fragnostic is also suitable for confirming hypotheses of structural or functional connections between proteins from different folds. In the future, we aim to permit querying using any SCOP entry, not only those in PDB-SELECT25. We are currently developing a fragment-to-structure matching method, so that the fragments (or rather a clustered library thereof) can be used as a structural motif library. Fragnostic was written using Zope (zope.org) for web content management and GraphViz (AT&T Laboratories) for displaying the graphs. The fragment dataset and associated information were generated using Biopython (biopython.org) and are maintained in a MySQL (MySQL AB) database.

REFERENCES

1.  CATH--a hierarchic classification of protein domain structures.

Authors:  C A Orengo; A D Michie; S Jones; D T Jones; M B Swindells; J M Thornton
Journal:  Structure       Date:  1997-08-15       Impact factor: 5.006

2.  SCOP: a structural classification of proteins database for the investigation of sequences and structures.

Authors:  A G Murzin; S E Brenner; T Hubbard; C Chothia
Journal:  J Mol Biol       Date:  1995-04-07       Impact factor: 5.469

3.  Quantifying the similarities within fold space.

Authors:  Andrew Harrison; Frances Pearl; Richard Mott; Janet Thornton; Christine Orengo
Journal:  J Mol Biol       Date:  2002-11-08       Impact factor: 5.469

4.  Local feature frequency profile: a method to measure structural similarity in proteins.

Authors:  In-Geol Choi; Jaimyoung Kwon; Sung-Hou Kim
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-25       Impact factor: 11.205

5.  Enlarged representative set of protein structures.

Authors:  U Hobohm; C Sander
Journal:  Protein Sci       Date:  1994-03       Impact factor: 6.725

6.  Comparison of sequence profiles. Strategies for structural predictions using sequence information.

Authors:  L Rychlewski; L Jaroszewski; W Li; A Godzik
Journal:  Protein Sci       Date:  2000-02       Impact factor: 6.725

7.  RASMOL: biomolecular graphics for all.

Authors:  R A Sayle; E J Milner-White
Journal:  Trends Biochem Sci       Date:  1995-09       Impact factor: 13.807

8.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330