
pdbFun: mass selection and fast comparison of annotated PDB residues.

Gabriele Ausiello, Andreas Zanzoni, Daniele Peluso, Allegra Via, Manuela Helmer-Citterich.

Abstract

pdbFun (http://pdbfun.uniroma2.it) is a web server for structural and functional analysis of proteins at the residue level. pdbFun gives fast access to the whole Protein Data Bank (PDB) organized as a database of annotated residues. The available data (features) range from solvent exposure to ligand binding ability, location in a protein cavity, secondary structure, residue type, sequence functional pattern, protein domain and catalytic activity. Users can select any residue subset (even including any number of PDB structures) by combining the available features. Selections can be used as probe and target in multiple structure comparison searches. For example, a search could involve, as a query, all solvent-exposed, hydrophilic residues that are not in alpha-helices and are involved in nucleotide binding. Possible targets include another selection, a single structure or a dataset composed of many structures. The output is a list of aligned structural matches offered in tabular and graphical format.

Substances:
Amino Acids
Proteins

Year: 2005 PMID: 15980442 PMCID: PMC1160259 DOI: 10.1093/nar/gki499

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Structural genomics projects (1) and the improvement of experimental techniques for structural analysis enrich the Protein Data Bank (PDB) (2) with structural data of very high quality and reliability. Nevertheless, few complete resources are available for analysing the connections between structural features and molecular functions that lie hidden in this huge amount of data. We identified some important characteristics that may be considered in the design and construction of a complete resource for establishing structure–function links: (i) presence of integrated data (number and type of different considered databases); (ii) level of detail of the data integration (i.e. structure, domain, residue and atom); (iii) level of integration between data and computational tool(s) in the resource and (iv) wholeness (the quantity of data that can be analysed at the same time). In the perspective described here, we propose pdbFun as a fast and user-friendly integrated web server for structural analysis of local similarities among proteins. pdbFun collects annotations derived from different databases (data integration), maps them onto single residues (good level of integration detail) and runs a local structural comparison algorithm on the selected residues (data/method integration). Queries and comparisons are allowed on any sets of annotations or residues, even including the entire PDB (wholeness).

Data integration can provide substantial advantages in the analysis of protein structures, as demonstrated and exemplified by PDBsum (3), a database providing a vast amount of information on the PDB entries. At present, the huge MSD project (4), merging all main databases with the PDB, represents the best implementation of this concept. Data integration can operate at different levels. Large volumes of data about protein structure and function are currently available in the biologically relevant databases. Such data can be integrated at the protein level. More effectively, for a focus on molecular function, they can be mapped onto protein residues. Data integration at the residue level is exemplified by the possibility of querying for solvent-exposed amino acids located in the alpha-helices of a protein structure. This feature has already been used in the SURFACE database (5) and has now been extended by MSDmine (unpublished resource).

Integration between data and one or more computational methods is a fundamental task. Such a task is achieved in tools where simple or complex selections of the integrated data can be built and straightforwardly used as input to an embedded method (e.g. running a comparison program only on proteins sharing a specified function). The last important property for a complete structural analysis tool is its ability to consider vast amounts of data at the same time, i.e. its wholeness or ability to work as a high-throughput resource. Queries can be formulated over much, or even all, of the available data. A user may choose to focus on all proteins belonging to a specified SCOP class or to select all the tryptophan residues in the catalytic sites of the whole PDB.

Overview

pdbFun is an integrated web tool for querying the PDB at the residue level and for local structural comparison. pdbFun integrates knowledge about single residues in protein structures, either taken from other databases or calculated with publicly available or in-house structural analysis tools. Each set of different annotations represents a feature. Typical features are secondary structure assignments or SMART domains (6), whose annotations are the H/T/E assignments or domain families, respectively, reported at the residue level. The user can build simple residue selections by including any number of annotations from a single feature, e.g. all residues belonging to any of three different SMART domains. The selections can be combined recursively to create more complex ones. For example, the user can then choose only the β-strand or turn residues of the previous three domains. Each selection can be manually refined by adding and removing single residues. Structural similarity can be searched between any pair of selections. All comparisons and queries are performed in real time with a fast program (Ausiello,G., Via,A. and Helmer-Citterich,M., manuscript submitted) running on the web server.

Features

The different features currently available are shown on, and can be accessed from, the homepage. The user can start creating one residue selection by choosing any one of the following (Figure 1):

Figure 1

A Selection table is shown. The user has created five selections: Selection 1, all PROSITE residues with the ATP keyword in the pattern description (using the motifs feature); Selection 2, all charged residues in the PDB (D, E, H, K and R in the residues feature); Selection 3, all exposed residues (surface feature); Selection 4, all charged residues in the selected motifs (Selection 1 INTERSECT Selection 2); Selection 5, all charged residues in the selected motifs that are not solvent-exposed (Selection 4 SUBTRACT Selection 3). The estimated time for comparing the first chain (see text) of Selection 5 as query and Selection 3 as target is 18 s.

Structures. All residues belonging to one or many PDB structures can be selected, up to and including the whole database.
Chains. All residues belonging to one or more chains can be selected. Lists of non-redundant PDB chains are available here as pre-calculated selections.
Surfaces. Residues can be selected according to their solvent-exposed or buried status given by the NACCESS program (7).
Clefts. The SURFNET program (8) is used to assign surface residues to protein cavities. Cavities are sorted by size (number 1 refers to the biggest).
Domains. Residues belonging to domains are annotated here using HMMER (9) on the SMART database.
Two-dimensional structures. Each residue is associated with the secondary structure assignment provided by the DSSP program (10) (E: extended strand; H: alpha-helix; T: hydrogen bonded turn, etc.).
Motifs. PROSITE patterns (11) as found on the sequences of the PDB chains.
Binding sites. Users can select residues whose distance is <3.5 Å from any ligand molecule present in the PDB. Choosing ATP or ADP selects all residues found at a distance closer than the defined threshold from the ATP or ADP nucleotides.
Active sites. Active site residues in a set of enzyme structures obtained from the CatRes database (12).
Residues. The 20 residue types [from A (alanine) to W (tryptophan)]. This feature helps the user to concentrate only on some kinds of residues, while ignoring all the others (e.g. all charged residues or aromatic residues).
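The distance criterion behind the Binding sites feature can be illustrated with a minimal Python sketch. It is not the pdbFun code: the function name, the data layout and the toy coordinates are hypothetical, and it assumes that the distance is measured as the minimum atom-to-atom distance, which the text does not specify.

```python
import math

def residues_near_ligand(residue_atoms, ligand_atoms, cutoff=3.5):
    """Return ids of residues with at least one atom closer than
    `cutoff` (in Å) to any ligand atom (assumed distance definition)."""
    selected = set()
    for res_id, atoms in residue_atoms.items():
        if any(math.dist(a, l) < cutoff for a in atoms for l in ligand_atoms):
            selected.add(res_id)
    return selected

# Toy coordinates (hypothetical, not taken from any PDB entry).
residue_atoms = {
    ("A", 15, "GLY"): [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    ("A", 16, "LYS"): [(10.0, 0.0, 0.0)],
    ("A", 17, "SER"): [(0.0, 3.5, 0.0)],
}
ligand_atoms = [(0.0, 0.0, 3.0)]

binding = residues_near_ligand(residue_atoms, ligand_atoms)
print(binding)
```

Only the glycine is selected here, because it is the only residue with an atom within 3.5 Å of the ligand atom.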

Annotations

By selecting a feature from the pdbFun main page, the user accesses the annotation page where single annotations of that particular feature can be chosen to create a simple selection of residues. The total number of selected residues corresponds to the sum of the residues selected by each annotation. We describe in detail the Motifs feature page. In the Motifs page, all PROSITE patterns are listed and represent the annotations. Fields duplicated locally are the pattern ‘id’, ‘name’ and ‘short description’. In addition, a ‘residues’ field indicates the number of annotated residues in the whole PDB. A ‘chains’ field indicates the number of chains containing at least one of the annotated residues. In order to facilitate searching, the annotations are organized in pages and can be sorted by any field. Annotations (i.e. specific PROSITE motifs) can be added to the current selection in various ways: manually (using check-boxes), by text search (only the selected field will be searched) or by uploading a user flat file containing a list of PROSITE codes. All the features available in pdbFun share identical organization. New features can therefore be added and annotations handled without the need to modify the code.

Let us take as an example how to select all PDB residues matching any of the PROSITE motifs involving ATP. In the Motifs page, sort the annotations by the ‘description’ field by clicking on the column header. Type ‘ATP’ in the search box (the search will be automatically conducted on the sorted field) and press the search button. All the 18 PROSITE motifs containing ‘ATP’ in the description are selected, and the user can go back to the main page, where the selection appears as a row in the Selection table.
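The text-search route to a simple selection can be sketched in Python as follows. This is an illustration only: the `Annotation` class, the `search` and `select` helpers and all identifiers in the toy data are hypothetical, and residues shared by two annotations are counted once here (a set union), which is an assumption about how overlaps are handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    id: str
    name: str
    description: str
    residues: frozenset  # residues as (pdb_code, chain, residue_number)

def search(annotations, field, text):
    """Case-insensitive substring search restricted to one field."""
    text = text.lower()
    return [a for a in annotations if text in getattr(a, field).lower()]

def select(annotations):
    """Merge the residues of the chosen annotations into one selection."""
    selected = set()
    for a in annotations:
        selected |= a.residues
    return selected

# Made-up annotations; these are not real PROSITE entries.
annotations = [
    Annotation("X0001", "TOY_ATP_A", "Toy ATP-binding motif",
               frozenset({("1abc", "A", 10), ("1abc", "A", 11)})),
    Annotation("X0002", "TOY_ZN", "Toy zinc finger",
               frozenset({("2xyz", "B", 5)})),
    Annotation("X0003", "TOY_ATP_B", "Toy motif, ATP synthase",
               frozenset({("1abc", "A", 11), ("3def", "A", 7)})),
]

hits = search(annotations, "description", "ATP")
selection = select(hits)
chains = {(pdb, chain) for pdb, chain, _ in selection}
print(len(hits), len(chains), len(selection))
```

The three printed numbers correspond to the ‘annotations’, ‘chains’ and ‘residues’ columns that a Selection table row would report for this toy selection.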

Simple selections

Whenever a selection is made, pdbFun stores it as a row in a Selection table that can be visualized by going back to the main page. Each selection is identified by a unique name, by a type (the feature used to generate it), by the number of annotations selected in the feature and by the total number of chains and residues in the PDB that have been selected. New selections can be created by choosing one of the features available in the upper part of the screen. Existing selections can be accessed and modified via the ‘annotations’ field. For example, see Figure 1. The selection created in the previous example now appears in the Selection table as ‘Selection 1’. The ‘feature type’ field is ‘motifs’. The number of annotations selected is 18 (the 18 PROSITE patterns whose description contains the ATP word). Such patterns have been found on 2952 different PDB chains and comprise a total of 31 801 residues.

Combining selections

All selections can be combined using the Boolean operators AND, OR and NOT. The result is a new selection containing a combination of their residues. The two selections to combine are chosen with the ‘probe’ and ‘target’ radio buttons. Applying ‘Intersect’ (AND) to two selections, such as Selections 1 and 2 in Figure 1, creates a new selection including only their common residues (another example would be the PDB proline residues that are found in alpha-helices), whereas with ‘Add’ (OR) the two selections are merged (e.g. all residues that are in a big surface cleft ‘or’ belong to some active site). ‘Subtract’ (NOT) is also a binary operator and should be understood as an ‘AND’ between the probe and the complement of the target (e.g. all the charged residues which are ‘not’ exposed). Each selection created can, in turn, be the object of a new combination. The ‘residues’ and ‘chains’ columns of the Selection table contain useful statistical information on the residue composition of the PDB. Questions such as ‘How many charged residues are buried in the whole PDB, or in a certain type of domain?’ can be answered in a fraction of a second.
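The three operators map directly onto set operations. The Python sketch below mirrors the chain of selections in Figure 1 on a handful of made-up residues; the function names and the data are hypothetical and say nothing about how pdbFun stores selections internally.

```python
def intersect(probe, target):
    """'Intersect' (AND): residues present in both selections."""
    return probe & target

def add(probe, target):
    """'Add' (OR): residues present in either selection."""
    return probe | target

def subtract(probe, target):
    """'Subtract' (NOT): probe AND the complement of target."""
    return probe - target

# Residues as (pdb_code, chain, number, type); toy data only.
r1 = ("1abc", "A", 10, "G")
r2 = ("1abc", "A", 11, "K")
r3 = ("1abc", "A", 12, "D")
r4 = ("2xyz", "B", 40, "E")

selection_1 = {r1, r2, r3}   # residues in the chosen motifs
selection_2 = {r2, r3, r4}   # charged residues
selection_3 = {r3, r4}       # solvent-exposed residues

selection_4 = intersect(selection_1, selection_2)   # charged, in motifs
selection_5 = subtract(selection_4, selection_3)    # ... and not exposed

print(sorted(selection_5))
```

Because every result is itself a set, it can be fed back into any operator, which is what makes recursive combination possible.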

Structural comparison

Selections can be chosen as probe and target of a structural comparison procedure to find local similarities in the spatial arrangement of residues. The selected residues in each chain of the probe are searched against the selected residues in each chain of the target. The comparison algorithm is guaranteed to find the largest subset of matching residues between two structures. The matching condition is an RMSD (root mean square deviation) <0.7 Å and a residue similarity >1.3 according to the Dayhoff substitution matrix. The algorithm is exhaustive, fast and independent of sequence and fold. All the probe (but not the target) residues must belong to a single PDB chain (if the probe is a multi-chain selection, only the first chain will be compared by default). Comparisons stop when a match is found comprising at least 10 residues. As soon as a new probe or target is chosen, an estimate of the comparison execution time is given at the bottom right of the screen.
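The geometric half of the matching condition can be illustrated with a short Python sketch. It only computes the RMSD between two already paired and already superposed sets of coordinates and compares it with the 0.7 Å threshold. The superposition step, the search for the largest matching subset and the Dayhoff similarity test are all omitted, since the text does not describe how they are carried out; the function names and coordinates are hypothetical.

```python
import math

RMSD_CUTOFF = 0.7  # Å, threshold quoted in the text

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired coordinates.
    Assumes both sets are already superposed and in matching order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def is_geometric_match(coords_a, coords_b, cutoff=RMSD_CUTOFF):
    """True if the paired residues satisfy the RMSD criterion."""
    return rmsd(coords_a, coords_b) < cutoff

# Toy coordinates, one representative point per residue (an assumption).
probe = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
close = [(0.3, 0.0, 0.0), (1.0, 0.3, 0.0), (0.0, 1.0, 0.3)]
far = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

print(is_geometric_match(probe, close), is_geometric_match(probe, far))
```

In the toy data each point of `close` is displaced by 0.3 Å and each point of `far` by 1.0 Å, so only the first pairing passes the threshold.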

Comparison results

When a comparison is started, the user is redirected to the Results page. Here new matches are immediately displayed as they are calculated. Matches are sorted by decreasing score and are displayed in pages. The probe chain matching residues are listed in the first column of the Results table. Each target chain is shown in a different column, together with the match length. Target residues are listed in the same rows as the probe residues to which they are structurally aligned (see Figure 2). Columns can be selected for a graphical view of the match in single or multiple alignment using a Java applet. A text file containing the results of the comparison is available for downloading.
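The layout of the Results table can be made concrete with a small Python sketch. The `results_table` function and the chain and residue labels are hypothetical, and sorting by match length stands in for the score, whose definition is not given in the text.

```python
def results_table(probe_residues, matches):
    """Build rows for a results table.

    probe_residues: ordered list of probe residue labels.
    matches: {target_chain: {probe_residue: aligned_target_residue}}.
    Columns are ordered by decreasing match length (assumed score).
    """
    columns = sorted(matches, key=lambda c: len(matches[c]), reverse=True)
    rows = [
        ["probe"] + columns,
        ["length"] + [str(len(matches[c])) for c in columns],
    ]
    for res in probe_residues:
        # A target residue sits in the row of the probe residue it aligns to.
        rows.append([res] + [matches[c].get(res, "-") for c in columns])
    return rows

# Made-up labels for illustration.
probe_residues = ["G12", "K16", "T35"]
matches = {
    "chainX": {"G12": "G40", "K16": "K44"},
    "chainY": {"G12": "G10", "K16": "K14", "T35": "S30"},
}

table = results_table(probe_residues, matches)
for row in table:
    print("\t".join(row))
```

Each target chain occupies one column, and a dash marks probe residues that have no aligned partner in that chain.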

Figure 2

The first Results page of a comparison. A manual selection of 5p21 (ras protein) residues involved in GTP binding was compared with the ∼5500 chains of a non-redundant PDB (50%). The output is shown in tabular and also graphic format. In the first column of the table, the matching residues of the query PDB chain are reported; in the adjacent columns, the other PDB chains follow, and the residues aligned in three dimensions appear in the same rows. The matched PDB chains are reported in the first row; the number of matched residues in the second one. Matching residues are also displayed upon selection (pressing on the ‘draw’ button) with a Java applet.

Manual selections

pdbFun allows the user to perform a manual selection of residues on a single PDB chain, according to his/her interest or personal knowledge (and not only by using the features calculated or extracted from pre-existing databases). Through the ‘chains’ field in the Selection table, the user accesses a page where he/she can choose the chain to work with manually. All the residues in the chain of interest will appear as a list, together with the available annotations. Sets of single residues can be chosen. A simple Java applet helps the user in selections. This selection appears in the Selection table as ‘manual selection’.

Non-redundant PDB sets

Non-redundant datasets of chains obtained from the PDB (2) at different (90, 70, 50 and 30%) redundancies are available and can be used to generate non-redundant selections of chains or as target datasets. These sets can be selected from the Chains feature page and modified manually or left as they are.

Implementation notes

In order to achieve high speed and a high level of interactivity, all residue data are stored in the server memory. A single C program executes both fast queries and structural comparisons, and a relational database is used only for the storage of the feature annotations list and for web user management. All selections can be run in a fraction of a second. Comparison times range from fractions of a second to minutes. No time limit is imposed on users (but a newly submitted job stops the running one). Web pages have been tested on the main browsers for the Windows and Linux platforms. Mac users should use Safari ≥1.2.

Future directions

Major future developments involve the addition of new features. Features in preparation are residue conservation derived by HSSP (13), presence in structural fold derived by CATH (14), user-defined sequence regular expressions and proximity of residues. Finally, to further improve the quality of integration among different data sources, part of the MSD data collection could be used. Upload of user structures will be made possible and statistical significance of the matches introduced.

13 in total

1. Assigning genomic sequences to CATH.

Authors: F M Pearl; D Lee; J E Bray; I Sillitoe; A E Todd; A P Harrison; J M Thornton; C A Orengo
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

Review 2. Structural genomics and its importance for gene function analysis.

Authors: J Skolnick; J S Fetrow; A Kolinski
Journal: Nat Biotechnol Date: 2000-03 Impact factor: 54.908

3. Analysis of catalytic residues in enzyme active sites.

Authors: Gail J Bartlett; Craig T Porter; Neera Borkakoti; Janet M Thornton
Journal: J Mol Biol Date: 2002-11-15 Impact factor: 5.469

4. SMART 4.0: towards genomic data integration.

Authors: Ivica Letunic; Richard R Copley; Steffen Schmidt; Francesca D Ciccarelli; Tobias Doerks; Jörg Schultz; Chris P Ponting; Peer Bork
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

5. Recent improvements to the PROSITE database.

Authors: Nicolas Hulo; Christian J A Sigrist; Virginie Le Saux; Petra S Langendijk-Genevaux; Lorenza Bordoli; Alexandre Gattiker; Edouard De Castro; Philipp Bucher; Amos Bairoch
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

6. Database of homology-derived protein structures and the structural meaning of sequence alignment.

Authors: C Sander; R Schneider
Journal: Proteins Date: 1991

Review 7. Profile hidden Markov models.

Authors: S R Eddy
Journal: Bioinformatics Date: 1998 Impact factor: 6.937

8. SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions.

Authors: R A Laskowski
Journal: J Mol Graph Date: 1995-10

9. PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids.

Authors: Roman A Laskowski; Victor V Chistyakov; Janet M Thornton
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. SURFACE: a database of protein surface regions for functional annotation.

Authors: Fabrizio Ferrè; Gabriele Ausiello; Andreas Zanzoni; Manuela Helmer-Citterich
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971
