PCDB: a database of protein conformational diversity.

Ezequiel I Juritz, Sebastian Fernandez Alberti, Gustavo D Parisi.

Abstract

PCDB (http://www.pcdb.unq.edu.ar) is a database of protein conformational diversity. For each protein, the database contains a redundant compilation of all the corresponding crystallographic structures obtained under different conditions. These structures can be considered different instances of protein dynamism. As a measure of conformational diversity we use the maximum RMSD obtained by comparing the structures deposited for each domain. The redundant structures were extracted following the CATH structural classification and cross-linked with additional information. In this way it is possible to relate a given amount of conformational diversity to different levels of information, such as protein function, presence of ligands and mutations, structural classification, active site information and organism taxonomy, among others. Currently the database contains 7989 domains with a total of 36581 structures from 4171 different proteins. The maximum RMSD registered is 26.7 Å and the average number of different structures per domain is 4.5.

Year: 2010 PMID: 21097895 PMCID: PMC3013735 DOI: 10.1093/nar/gkq1181

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Protein conformational diversity is key to understanding protein function. Since Max Perutz's early studies on the T and R forms of hemoglobin, increasing experimental evidence has supported the notion that the native state of a protein is not unique. In fact, the native state is better represented by an ensemble of conformers in equilibrium, which describes the conformational diversity or dynamism of a protein (1). The ensemble description has been shown to be essential for understanding central biological aspects of protein function, such as the catalytic process of enzymes (2–4), protein–protein recognition (5–7), macromolecular processes such as DNA replication and protein folding by chaperonins (8), enzyme promiscuity (9), signal transduction (10,11) and the ability of proteins to develop new functions (a property known as ‘evolvability’) (12,13). Nevertheless, characterizing the equilibrium ensemble of conformers, which involves studying the structural and thermodynamic features of each individual conformer, remains a major challenge. Different procedures have therefore been applied to the study of protein dynamism. Experimentally, nuclear magnetic resonance (NMR) spectroscopy is among the most widely used approaches and represents a promising and active area of research (14). Computationally, techniques such as coarse-grained molecular dynamics and Monte Carlo methods, used in combination with normal mode analysis, have proven useful for exploring the conformational landscape of proteins (15–19). Finally, a completely different approach to studying conformational diversity treats crystallographic structures of the same protein obtained under different conditions as snapshots, or instances, of protein dynamism.
This view is supported by the correlation found between the structural diversity observed in solution experiments such as NMR and that observed among crystallographic structures of proteins obtained under different conditions (6,20–25). A good correlation was also found when computational methods such as molecular dynamics were used to simulate protein dynamism and the results were compared with NMR solution structures (26,27). With thousands of structures redundantly deposited in structural databases (28), the extent and distribution of conformational diversity can be explored for a large number of proteins that are not accessible to the methodologies mentioned above. In this paper we use this approach to develop a database of proteins with conformational diversity. Here, we describe PCDB (Protein Conformational diversity DataBase), its web functionality and possible applications.

OVERVIEW OF PCDB

PCDB is a database of proteins showing conformational diversity. As mentioned above, conformational diversity is estimated from a redundant collection of structures for each protein domain deposited in the database. PCDB was developed from CATH v3.3, following its protein domain structural hierarchy and definitions (29). Briefly, CATH clusters protein domains by structural and sequence similarity in a hierarchy of nine levels, called CATHSOLID, in which the ‘D’ level assigns a number to each individual domain in the database and corresponds to the collection of different crystallographic structures of an individual protein. This level was used to build PCDB by collecting all protein domains classified in CATH that have at least two different crystallographic structures. The current version of PCDB contains 7989 protein domains from 4171 proteins, with 36581 structures in total: 34775 crystallographic structures and 1806 NMR structures (Table 1).
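The collection step described above can be sketched in a few lines. This is a minimal Python sketch, not PCDB's actual pipeline: it assumes the structures have already been parsed into hypothetical `(domain_key, structure_id, method)` tuples, groups them by domain, keeps only domains with at least two distinct structures, and tallies the experimental methods of the kept structures, mirroring the X-ray/NMR split quoted above.

```python
from collections import Counter, defaultdict

def build_domain_sets(records, min_structures=2):
    """Group structure records by domain and keep domains with redundant structures.

    `records` is an iterable of (domain_key, structure_id, method) tuples.
    This record layout is hypothetical; it is not CATH's file format.
    """
    by_domain = defaultdict(dict)  # domain_key -> {structure_id: method}
    for domain_key, structure_id, method in records:
        by_domain[domain_key][structure_id] = method  # repeated IDs collapse
    kept = {d: s for d, s in by_domain.items() if len(s) >= min_structures}
    # Tally experimental methods over the kept structures only.
    methods = Counter(m for s in kept.values() for m in s.values())
    return kept, methods

# Toy input with hypothetical identifiers.
records = [
    ("dom1", "1abc", "X-RAY"),
    ("dom1", "2xyz", "X-RAY"),
    ("dom1", "2xyz", "X-RAY"),  # duplicate entry, counted once
    ("dom2", "3def", "NMR"),    # only one structure, so dom2 is dropped
]
kept, methods = build_domain_sets(records)
print(sorted(kept))  # ['dom1']
print(methods)       # Counter({'X-RAY': 2})
```

Using a dictionary keyed by structure ID per domain means a structure listed twice is only counted once, so the threshold applies to distinct structures.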

Table 1.

Summary of the data available in PCDB v1.0

Domains	7989
Proteins	4171
Structures	36581
Maximum RMSDmax (Å)	26.7
Average structures per domain	4.7
Domains in class mainly alpha^a	1728
Domains in class mainly beta^a	2326
Domains in class mixed alpha-beta^a	3790
Domains in class few secondary structure^a	145

^a According to the CATH v3.3 (http://www.cathinfo.db) hierarchy (29).

The structures collected for each protein domain may have been crystallized under the same or different conditions. When the conditions are the same, the RMSD between different structures is known to range from 0.1 to 0.4 Å (30). Larger RMSD values are expected when conformational diversity is present, which can happen when crystallization conditions vary among the structures considered. In fact, RMSD values as high as 23.4 Å have been reported in studies of redundant protein structures (28). Following ligand addition, for example, it is well established that the conformational equilibrium can shift towards a high-affinity conformer, producing changes in tertiary structure (12,31,32). Other changes in crystallization conditions, such as modifications in the oligomerization state (33), pH and temperature, as well as the presence of mutations (34), can also modify the relative stability of conformers and thereby produce differences between crystallographic structures of the same protein. In addition, sequence modifications or crystallographic errors could introduce conformational diversity unrelated to biological causes. Because our method for measuring conformational diversity relies on the quality of the crystallographic structures, several filters were applied when building the database. The criteria used to select the structures are explained below, and a general PCDB building scheme is shown in Supplementary Figure S1. In PCDB, the structures are linked to information contained in the PDB concerning the crystallization procedure, together with supplementary data that could help explain the occurrence of conformational diversity. The factors considered are the presence of ligands, mutations, changes in the oligomeric state and changes in pH.
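One way to turn per-structure annotations into per-domain flags for these four factors is sketched below. It is a minimal sketch under stated assumptions: the `StructureInfo` fields are hypothetical, "presence" of ligands or mutations is taken to mean that at least one structure has them, and a "change" in oligomeric state or pH is taken to mean that at least two distinct values occur among the domain's structures. PCDB's exact rules may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructureInfo:
    """Hypothetical per-structure annotations, e.g. parsed from PDBML/XML."""
    has_ligand: bool
    has_mutation: bool
    oligomeric_state: str
    ph: Optional[float]  # None when the crystallization pH is not reported

def domain_factors(structures):
    """Flag which of the four factors could explain a domain's diversity."""
    known_ph = {s.ph for s in structures if s.ph is not None}
    return {
        "ligands": any(s.has_ligand for s in structures),
        "mutations": any(s.has_mutation for s in structures),
        # A "change" needs at least two distinct values among the structures.
        "oligomeric_state": len({s.oligomeric_state for s in structures}) > 1,
        "ph": len(known_ph) > 1,
    }

structures = [
    StructureInfo(has_ligand=False, has_mutation=False, oligomeric_state="monomer", ph=7.0),
    StructureInfo(has_ligand=True, has_mutation=False, oligomeric_state="dimer", ph=7.0),
    StructureInfo(has_ligand=False, has_mutation=False, oligomeric_state="monomer", ph=None),
]
print(domain_factors(structures))
# {'ligands': True, 'mutations': False, 'oligomeric_state': True, 'ph': False}
```

Unreported pH values are ignored rather than treated as a distinct value, so a missing annotation alone never registers as a pH change.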
The maximum RMSD (RMSDmax) among the redundant structures of each protein domain is used to evaluate the extent of the structural change. Using the data in PCDB, we found that at least one of these selected experimental features is present in 74% of all domains (Table 2) and in 60% of the domains with an RMSDmax above 0.4 Å. Besides the information on the crystallization procedure, each protein deposited in PCDB was cross-linked with different databases. In this way, a given extent of conformational diversity measured by RMSDmax can be related to diverse biological and structural information, such as biological function [GO terms (35) and Enzyme Commission (EC) numbers (36)], structural classification [CATH (29)], taxonomy (NCBI taxonomic ID and genus and species names), metabolic pathway, subcellular location, protein interactions, protein family, presence of a characterized catalytic site [Catalytic Site Atlas (37)] and derived InterPro links (38).
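Percentages such as the 74% and 60% figures above are proportions over domains, optionally restricted by an RMSDmax threshold. The sketch below shows how such a figure could be computed; the input layout of `(rmsd_max, flags)` pairs is a hypothetical assumption, and the toy data are illustrative, not values taken from PCDB.

```python
def fraction_with_factor(domains, rmsd_threshold=None):
    """Fraction of domains flagged with at least one factor.

    `domains` is a list of (rmsd_max, flags) pairs, where `flags` maps factor
    names to booleans. If `rmsd_threshold` is given, only domains with
    rmsd_max strictly above it are considered.
    """
    selected = [flags for rmsd_max, flags in domains
                if rmsd_threshold is None or rmsd_max > rmsd_threshold]
    if not selected:
        return 0.0
    return sum(any(flags.values()) for flags in selected) / len(selected)

# Toy data (RMSDmax in Å); not values taken from PCDB.
domains = [
    (0.2, {"ligands": False, "ph": False}),
    (0.3, {"ligands": True, "ph": False}),
    (1.5, {"ligands": False, "ph": True}),
    (2.0, {"ligands": True, "ph": False}),
    (3.0, {"ligands": False, "ph": False}),
]
overall = fraction_with_factor(domains)
above = fraction_with_factor(domains, rmsd_threshold=0.4)
print(f"{overall:.0%} overall, {above:.0%} above 0.4 Å")  # 60% overall, 67% above 0.4 Å
```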

Table 2.

Number of proteins in PCDB with different factors possibly promoting the observed conformational diversity

Possible condition promoting conformational diversity	Number of proteins
Mutations	268
Ligands	568
Changes in oligomeric state	536
Changes in pH	1029
Mutations and Ligands	77
Mutations and changes in oligomeric state	108
Mutations and changes in pH	213
Ligands and changes in oligomeric state	231
Ligands and changes in pH	387
Changes in oligomeric state and pH	613
All four conditions	269

PCDB consists of a web application written in PHP and connected to a MySQL database. The database includes information derived from numerous biological databases and online servers, as well as data produced by in-house scripts and programs. The PCDB search tool is based on dynamic SQL queries generated in PHP, and its browsing capability is based on SQL stored procedures executed dynamically from PHP. PCDB was built using the redundant structures of each protein domain collected from CATH v3.3 (39) (see Supplementary Figure S1). The structures belonging to each protein domain were structurally aligned using MAMMOTH (40), and the RMSDmax between conformers was recorded. Information about crystallization conditions was extracted from PDBML/XML files, together with the oligomeric state and the presence of sequence modifications, mutations, deletions and missing residues. Post-translational modifications were extracted from the ‘Controlled vocabulary of post-translational modifications’ provided by UniProt. Information about catalytic residues was extracted from the Catalytic Site Atlas (37). Further biological information for each structure was extracted from different databases: PDB (30), SIFTS (http://www.ebi.ac.uk/msd/sifts/) and UniProt (41).
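Once the structures of a domain are aligned, RMSDmax is the largest RMSD over all pairs of conformers. The sketch below does not reimplement MAMMOTH: it assumes hypothetical coordinate lists that are already matched residue by residue and superposed, computes the RMSD for every pair, and returns the maximum together with the pair that produces it (the pair whose superposition PCDB lets users retrieve). The `pair_rmsd` parameter is an assumption of this sketch, left open so that a function superposing each pair first could be plugged in.

```python
import itertools
import math

def rmsd(a, b):
    """RMSD between two matched, already superposed lists of (x, y, z) coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
                  for p, q in zip(a, b))
    return math.sqrt(squared / len(a))

def rmsd_max(conformers, pair_rmsd=rmsd):
    """Largest pairwise RMSD among conformers and the pair that produces it.

    `conformers` maps structure IDs to coordinate lists. Returns (0.0, None)
    when fewer than two conformers are given.
    """
    best_value, best_pair = 0.0, None
    for (id1, c1), (id2, c2) in itertools.combinations(conformers.items(), 2):
        value = pair_rmsd(c1, c2)
        if best_pair is None or value > best_value:
            best_value, best_pair = value, (id1, id2)
    return best_value, best_pair

# Hypothetical two-atom conformers (coordinates in Å).
conformers = {
    "A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "B": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "C": [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
}
value, pair = rmsd_max(conformers)
print(round(value, 3), pair)  # 1.414 ('A', 'C')
```

Because every pair is compared, the cost grows quadratically with the number of structures per domain, which stays small here given the reported average of under five structures per domain.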

APPLICATIONS

Conformational diversity is central to understanding protein function, so its characterization could support multiple applications. The PCDB database is designed to retrieve proteins with a given amount of conformational diversity, measured by RMSDmax, and to relate this value to different levels of information. There are two main ways to explore PCDB (Figure 1). The main search attribute is the extent of conformational diversity measured by RMSDmax. This search can be restricted using a set of four attributes (presence of ligands, presence of mutations, changes in oligomerization state and changes in pH) that characterize the experimental crystallization conditions of each structure. These attributes can be selected separately or in different combinations (Table 2) and can be used to explain the RMSDmax obtained for a given protein. In the example shown in Figure 1, we searched PCDB for proteins whose structures differ by an RMSDmax of 5 to 10 Å due to the presence of ligands. The resulting extent of conformational diversity can therefore be attributed to conformational changes upon ligand binding. Also in Figure 1, below the search field, the field for customizing the output information is displayed. In this section it is possible to select different levels of information, such as structural classification, protein function or subcellular location, among others. It is also possible to retrieve the structural superposition of the conformers with the maximum RMSD. Similar searches can explore PCDB using a single attribute or a combination of the attributes producing conformational changes. Furthermore, the biological and structural data contained in the customizable output field could be used to explore different trends related to conformational diversity.
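The Figure 1 search (RMSDmax between 5 and 10 Å, ligands present) maps naturally onto a dynamically assembled, parameterized SQL query, in line with the PHP-generated SQL described for the PCDB search tool. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table and column names are hypothetical, not PCDB's real schema. The `exclusive` flag is also an assumption of this sketch: it requires the unselected factors to be absent, which is one way to make the attribution to ligand binding stricter.

```python
import sqlite3

# Hypothetical column names; PCDB's actual MySQL schema is not described here.
FACTOR_COLUMNS = {
    "ligands": "has_ligands",
    "mutations": "has_mutations",
    "oligomeric_state": "oligomer_change",
    "ph": "ph_change",
}

def build_search(rmsd_low, rmsd_high, factors=(), exclusive=False):
    """Build a parameterized query for domains in an RMSDmax range.

    Selected factors must be present; with exclusive=True, the unselected
    factors must be absent as well.
    """
    unknown = set(factors) - set(FACTOR_COLUMNS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    clauses = ["rmsd_max BETWEEN ? AND ?"]  # values are bound, never inlined
    for name, column in FACTOR_COLUMNS.items():
        if name in factors:
            clauses.append(f"{column} = 1")  # column names come from the fixed mapping
        elif exclusive:
            clauses.append(f"{column} = 0")
    sql = ("SELECT domain_id, rmsd_max FROM domains WHERE "
           + " AND ".join(clauses) + " ORDER BY rmsd_max DESC")
    return sql, (rmsd_low, rmsd_high)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (domain_id TEXT, rmsd_max REAL, has_ligands INTEGER,"
             " has_mutations INTEGER, oligomer_change INTEGER, ph_change INTEGER)")
conn.executemany("INSERT INTO domains VALUES (?, ?, ?, ?, ?, ?)", [
    ("dom1", 7.2, 1, 0, 0, 0),
    ("dom2", 6.0, 1, 1, 0, 0),   # ligands and mutations
    ("dom3", 12.5, 1, 0, 0, 0),  # outside the 5-10 Å range
    ("dom4", 5.5, 0, 0, 1, 0),   # no ligands
])
sql, params = build_search(5.0, 10.0, factors=["ligands"], exclusive=True)
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('dom1', 7.2)]
```

Without `exclusive=True` the same search also returns `dom2`, whose range could equally reflect its mutations; whether the ligand attribution should exclude other factors is a design choice left to the user.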

Figure 1.

Searching PCDB using the presence of ligands as the possible origin of conformational diversity between 5 and 10 Å of RMSDmax (1). In the Format output section (2) it is possible to customize the biological and physicochemical information retrieved with the results.

FUTURE CONSIDERATIONS

We are interested in increasing the amount and diversity of biological and structural data available for each domain represented in the database, to enable correlation studies between conformational diversity and a broad spectrum of physicochemical parameters. One of our near-future goals is to introduce sequence alignments for each deposited protein to derive evolutionary information, such as the relative conservation of different positions and evolutionary rates. The link between the pattern of residue substitution and the extent of conformational diversity is a promising field for increasing our understanding of protein evolution; however, it remains almost unexplored. Besides this, and following previous work, we would like to enrich PCDB by introducing structures from closely homologous proteins (21) in order to increase the conformational representation of the deposited domains.
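Relative conservation of alignment positions, mentioned above as a future goal, is commonly quantified with per-column Shannon entropy. The sketch below shows that one common measure only; it is an assumption here, not the method PCDB will adopt, and the alignment is a hypothetical toy example. Lower entropy means a more conserved position, and gaps are ignored.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    0 means fully conserved; higher values mean more variable positions.
    Returns None for an all-gap column.
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return None
    n = len(residues)
    # Sum of p * log2(1/p); every term is non-negative.
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())

def conservation_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]

# Toy alignment of hypothetical sequences.
alignment = ["ACDE", "ACDF", "AC-W"]
profile = conservation_profile(alignment)
print([None if h is None else round(h, 3) for h in profile])  # [0.0, 0.0, 0.0, 1.585]
```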

CONCLUSIONS

Two main features differentiate PCDB from other databases containing information about conformational diversity in proteins (42,43). First, PCDB uses experimentally determined structures; second, these data are related to biological and structural information that can help explain the observed extent of conformational diversity. In the present version, PCDB contains 7989 protein domains with a broad range of conformational diversity, from a trivial zero to an RMSDmax of 26.7 Å. In this way PCDB could be an essential tool for understanding conformational diversity and, through it, protein function.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Proyectos de Investigacion Plurianuales (PIP) CONICET grant (112-200801-02849) and Universidad Nacional de Quilmes grant (53/B056); Ezequiel Juritz has a type II fellowship from CONICET. Funding for open access charge: CONICET and UNQ grants. Conflict of interest statement. None declared.

REFERENCES

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01

2. The ENZYME database in 2000.

Authors: A Bairoch
Journal: Nucleic Acids Res Date: 2000-01-01

3. Ligand-dependent dynamics and intramolecular signaling in a PDZ domain.

Authors: Ernesto J Fuentes; Channing J Der; Andrew L Lee
Journal: J Mol Biol Date: 2004-01-23

4. What contributions to protein side-chain dynamics are probed by NMR experiments? A molecular dynamics simulation analysis.

Authors: Robert B Best; Jane Clarke; Martin Karplus
Journal: J Mol Biol Date: 2005-03-16

5. Sampling of the native conformational ensemble of myoglobin via structures in different crystalline environments.

Authors: Dmitry A Kondrashov; Wei Zhang; Roman Aranda; Boguslaw Stec; George N Phillips
Journal: Proteins Date: 2008-02-01

6. The CATH Database provides insights into protein structure/function relationships.

Authors: C A Orengo; F M Pearl; J E Bray; A E Todd; A C Martin; L Lo Conte; J M Thornton
Journal: Nucleic Acids Res Date: 1999-01-01

7. Dynameomics: a comprehensive database of protein dynamics.

Authors: Marc W van der Kamp; R Dustin Schaeffer; Amanda L Jonsson; Alexander D Scouras; Andrew M Simms; Rudesh D Toofanny; Noah C Benson; Peter C Anderson; Eric D Merkley; Steven Rysavy; Dennis Bromley; David A C Beck; Valerie Daggett
Journal: Structure Date: 2010-03-14

8. The structural dynamics of macromolecular processes.

Authors: Daniel Russel; Keren Lasker; Jeremy Phillips; Dina Schneidman-Duhovny; Javier A Velázquez-Muriel; Andrej Sali
Journal: Curr Opin Cell Biol Date: 2009-02-14

9. A comparative analysis of the equilibrium dynamics of a designed protein inferred from NMR, X-ray, and computations.

Authors: Lin Liu; Leonardus M I Koharudin; Angela M Gronenborn; Ivet Bahar
Journal: Proteins Date: 2009-12

10. Dynamics and entropy of a calmodulin-peptide complex studied by NMR and molecular dynamics.

Authors: Ninad V Prabhu; Andrew L Lee; A Joshua Wand; Kim A Sharp
Journal: Biochemistry Date: 2003-01-21
