
PCDB: a database of protein conformational diversity.

Ezequiel I Juritz, Sebastian Fernandez Alberti, Gustavo D Parisi.

Abstract

PCDB (http://www.pcdb.unq.edu.ar) is a database of protein conformational diversity. For each protein, the database contains the redundant compilation of all the corresponding crystallographic structures obtained under different conditions. These structures could be considered as different instances of protein dynamism. As a measure of the conformational diversity we use the maximum RMSD obtained comparing the structures deposited for each domain. The redundant structures were extracted following CATH structural classification and cross linked with additional information. In this way it is possible to relate a given amount of conformational diversity with different levels of information, such as protein function, presence of ligands and mutations, structural classification, active site information and organism taxonomy among others. Currently the database contains 7989 domains with a total of 36581 structures from 4171 different proteins. The maximum RMSD registered is 26.7 Å and the average of different structures per domain is 4.5.


Year:  2010        PMID: 21097895      PMCID: PMC3013735          DOI: 10.1093/nar/gkq1181

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Protein conformational diversity is key to understanding protein function. Since the early studies of Max Perutz on the T and R forms of hemoglobin, increasing experimental evidence has supported the notion that the native state of proteins is not unique. In fact, the native state is better represented by an ensemble of conformers in equilibrium, which describes the conformational diversity or dynamism of a protein (1). It has been shown that the ensemble description is essential to understand central biological aspects of protein function such as the catalytic process of enzymes (2–4), protein–protein recognition (5–7), macromolecular processes such as DNA replication and protein folding by chaperonins (8), enzyme promiscuity (9), signal transduction (10,11) and the ability of proteins to develop new functions (a property known as ‘evolvability’) (12,13). Nevertheless, characterizing the equilibrium ensemble of conformers, which involves studying the structural and thermodynamic features of each individual conformer, remains a major challenge. Different procedures have therefore been applied to the study of protein dynamism. Experimentally, nuclear magnetic resonance (NMR) spectroscopy is among the most widely used approaches and represents a promising and active area of research (14). On the other hand, computational methods such as coarse-grained molecular dynamics and Monte Carlo techniques, used in combination with normal mode analysis, have proven to be useful tools to explore the conformational landscape of proteins (15–19). Finally, a completely different approach to studying conformational diversity considers crystallographic structures of the same protein obtained under different conditions to be snapshots or instances of protein dynamism.
This view is supported by the correlation found between the structural diversity observed in solution experiments such as NMR measurements and that derived from crystallographic structures of proteins obtained under different conditions (6,20–25). A good correlation was also found when computational methods, such as molecular dynamics, were used to simulate protein dynamism and the results were compared with solution structures from NMR (26,27). With thousands of structures redundantly deposited in structural databases (28), the extent and distribution of conformational diversity can be explored for a large number of proteins not accessible with the methodologies mentioned above. In this paper we have used this approach to develop a database of proteins with conformational diversity. Here, we describe PCDB (for Protein Conformational Diversity DataBase), its web functionality and possible applications.

OVERVIEW OF PCDB

PCDB is a database of proteins showing conformational diversity. As mentioned above, conformational diversity is estimated from a redundant collection of structures for each protein domain deposited in the database. PCDB was developed from the CATH database v3.3, following its protein domain structural hierarchy and definitions (29). Briefly, CATH clusters protein domains using structural and sequence similarities in a hierarchy defined by nine levels called CATHSOLID, where the ‘D’ level assigns a number to each individual domain in the database and corresponds to the collection of different crystallographic structures for an individual protein. This level was used to build PCDB by collecting all the protein domains with at least two different crystallographic structures classified in CATH. The current version of PCDB contains 7989 protein domains from 4171 proteins, with 36581 structures in total: 34775 crystallographic structures and 1806 NMR structures (Table 1).
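The selection rule described above (keep every domain that has at least two different structures) can be sketched as follows. This is a minimal illustration and not the PCDB build code: the input format, the function name `select_redundant_domains` and the identifiers in the example are hypothetical.

```python
from collections import defaultdict


def select_redundant_domains(records, min_structures=2):
    """Group structure IDs by domain ID and keep domains with enough structures.

    `records` is an iterable of (domain_id, structure_id) pairs; both
    identifiers are hypothetical stand-ins for CATH 'D'-level entries.
    """
    by_domain = defaultdict(set)
    for domain_id, structure_id in records:
        by_domain[domain_id].add(structure_id)  # a set ignores repeated entries
    return {
        domain: sorted(structures)
        for domain, structures in by_domain.items()
        if len(structures) >= min_structures
    }


# Made-up example input: domB has a single structure and is dropped.
records = [
    ("domA", "s1"), ("domA", "s2"), ("domA", "s2"),
    ("domB", "s3"),
    ("domC", "s4"), ("domC", "s5"), ("domC", "s6"),
]
selected = select_redundant_domains(records)
```

Using a set per domain means that the same structure listed twice does not count as redundancy; only distinct structures contribute to the threshold.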
Table 1.

Summary of the data available in PCDB v1.0

Item                                              Value
Domains                                           7989
Proteins                                          4171
Structures                                        36581
Maximum RMSDmax (Å)                               26.7
Average structures per domain                     4.7
Structures in class mainly alpha (a)              1728
Structures in class mainly beta (a)               2326
Structures in class mixed alpha-beta (a)          3790
Structures in class few secondary structure (a)   145

(a) According to the CATH v3.3 (http://www.cathinfo.db) hierarchy (29).

The structures collected for each protein domain may have been crystallized under the same or different conditions. When the conditions are the same, the RMSD between different structures is known to be as much as 0.1–0.4 Å (30). Larger RMSD values are expected when conformational diversity appears, which can happen when crystallization conditions vary among the structures considered. In fact, RMSD values as high as 23.4 Å have been reported in studies of redundant protein structures (28). Following the addition of ligands, for example, it is well established that the conformational equilibrium can shift towards a high-affinity conformer, producing changes in tertiary structure (12,31,32). Other changes in crystallization conditions, such as modifications in the oligomerization state (33), pH and temperature, as well as the presence of mutations (34), can also modify the relative stability of conformers and thus give rise to differences between crystallographic structures of the same protein. In addition, sequence modifications or crystallographic errors could introduce conformational diversity unrelated to biological causes. Because our method to measure conformational diversity relies on the quality of the crystallographic structures, different filters were used to build the database. The criteria used to select the structures are explained below, and a general PCDB building scheme can be found in Supplementary Figure S1. In PCDB, the structures are linked with information contained in PDB concerning the crystallization procedure and with supplementary data that could help to understand the occurrence of conformational diversity. The factors considered are the presence of ligands, mutations, changes in the oligomeric state and changes in pH.
The maximum RMSD (RMSDmax) among the redundant structures of each protein domain is used to evaluate the extent of the structural change. Using the data in PCDB, we found that at least one of the selected experimental factors is involved in 74% of all the domains (Table 2), and in 60% of the domains with an RMSDmax greater than 0.4 Å. Besides the information about the crystallization procedure, each of the proteins deposited in PCDB was cross-linked with different databases. In this way, a given extent of conformational diversity measured by RMSDmax can be related to diverse biological and structural information such as biological function [GO terms (35) and Enzyme Commission (EC) numbers (36)], structural classification [CATH (29)], taxonomy (NCBI taxonomic ID and genus and species names), metabolic pathway location, subcellular location, protein interactions, protein family, presence of a characterized catalytic site [Catalytic Site Atlas (37)] and derived InterPro links (38).
Table 2.

Number of proteins in PCDB with different factors possibly promoting the observed conformational diversity

Possible condition promoting conformational diversity   Number of proteins
Mutations                                               268
Ligands                                                 568
Changes in oligomeric state                             536
Changes in pH                                           1029
Mutations and ligands                                   77
Mutations and changes in oligomeric state               108
Mutations and changes in pH                             213
Ligands and changes in oligomeric state                 231
Ligands and changes in pH                               387
Changes in oligomeric state and pH                      613
Four conditions                                         269
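A tally of this kind, with one count per combination of conditions, can be sketched as below. The sketch assumes that each entry is assigned to exactly one combination (the exact set of conditions that differ among its structures); whether the counts in Table 2 are exclusive in this way is not stated in the text. The function name, condition labels and example data are hypothetical.

```python
from collections import Counter

# Fixed order of the four conditions; the labels are illustrative.
CONDITIONS = ("mutations", "ligands", "oligomeric_state", "pH")


def tally_conditions(annotations):
    """Count entries per exact combination of differing conditions.

    `annotations` maps a hypothetical entry ID to the set of conditions
    that differ among its structures.
    """
    counts = Counter()
    for entry_id, conditions in annotations.items():
        unknown = set(conditions) - set(CONDITIONS)
        if unknown:
            raise ValueError(f"unknown condition(s) for {entry_id}: {sorted(unknown)}")
        # Use the fixed order so each combination has one canonical key.
        key = tuple(c for c in CONDITIONS if c in conditions)
        counts[key] += 1
    return counts


# Made-up annotations for five entries.
annotations = {
    "p1": {"ligands"},
    "p2": {"ligands", "pH"},
    "p3": {"pH", "ligands"},
    "p4": set(),
    "p5": {"mutations", "ligands", "oligomeric_state", "pH"},
}
counts = tally_conditions(annotations)

# Fraction of entries with at least one differing condition.
explained = sum(n for key, n in counts.items() if key) / len(annotations)
```

The empty tuple collects entries for which none of the four conditions differ, so the fraction with at least one annotated condition follows directly from the same tally.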
PCDB is composed of a web application written in PHP and connected to a MySQL database. The database includes information derived from numerous biological databases and online servers, together with data generated by in-house scripts and programs. The PCDB search tool is based on dynamic SQL queries generated in PHP. The PCDB browsing capability is based on SQL stored procedures that are executed dynamically from PHP. PCDB was built using the redundant structures from each protein domain collected from CATH v3.3 (39) (see Supplementary Figure S1). The structures belonging to each protein domain were structurally aligned using MAMMOTH (40) and the RMSDmax between conformers was recorded. Information about crystallization conditions was extracted from PDBML/XML files, as were the oligomeric state and the presence of sequence modifications, mutations, deletions and missing residues. Post-translational modifications were extracted from the ‘Controlled vocabulary of post-translational modifications’ provided by UniProt. Information about catalytic residues was extracted from the Catalytic Site Atlas (37). Further biological information for each structure was extracted from different databases: PDB (30), SIFTS (http://www.ebi.ac.uk/msd/sifts/) and UniProt (41).
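The RMSDmax measure can be illustrated with a short sketch: compute the RMSD for every pair of conformers of a domain and keep the largest value. The sketch assumes that the structures have already been superimposed and that their atoms correspond one-to-one; it does not perform the structural alignment itself, which in PCDB was done with MAMMOTH. Function names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the two structures are already superimposed and that atoms
    correspond one-to-one; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def rmsd_max(conformers):
    """Return (largest RMSD, (name_i, name_j)) over all pairs of conformers.

    `conformers` maps a hypothetical structure name to its coordinates.
    With fewer than two conformers the result is (0.0, None).
    """
    best = (0.0, None)
    for (name_i, a), (name_j, b) in combinations(sorted(conformers.items()), 2):
        value = rmsd(a, b)
        if value > best[0]:
            best = (value, (name_i, name_j))
    return best


# Toy example: three two-atom "conformers" shifted along the z axis.
conformers = {
    "c1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "c2": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    "c3": [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],
}
value, pair = rmsd_max(conformers)
print(value, pair)  # 3.0 ('c1', 'c3')
```

Returning the pair alongside the value mirrors the idea of retrieving the two conformers that show the maximum RMSD for a domain.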

APPLICATIONS

Conformational diversity is central to understanding protein function, so its characterization could span multiple applications. PCDB is designed to retrieve proteins with a given amount of conformational diversity, measured by RMSDmax, and to relate this value to different levels of information. There are two main ways to explore PCDB (Figure 1). The main search attribute is the extent of conformational diversity measured by RMSDmax. This type of search can be restricted using a set of four attributes (presence of ligands, presence of mutations, changes in oligomerization state and changes in pH) that characterize the experimental crystallization conditions of each structure. These attributes can be selected separately or in different combinations (Table 2) and can be used to explain the RMSDmax obtained for a given protein. In the example shown in Figure 1, we were interested in searching PCDB for proteins with an RMSDmax of 5–10 Å between their respective structures due to the presence of ligands. The resulting extent of conformational diversity can therefore be unequivocally associated with conformational changes upon ligand binding. Also in Figure 1, below the search field, the field used to customize the output information is displayed. In this section it is possible to select different levels of information, such as structural classification, protein function or subcellular location, among others. It is also possible to retrieve the structural superposition of the conformers with the maximum RMSD. Similar searches could explore PCDB using a single attribute or a combination of the attributes producing conformational changes. Furthermore, the biological and structural data contained in the customizable output field could be used to explore different trends related to conformational diversity.
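A search of this kind (an RMSDmax range restricted by experimental attributes) can be sketched as a dynamically generated SQL query. PCDB itself uses PHP and MySQL; the sketch below uses Python with an in-memory SQLite database purely for illustration. The table, the column names and the rows are hypothetical, and treating the selected attributes as the only ones present is one possible reading of the ligand-only example above.

```python
import sqlite3

# Hypothetical, simplified schema; the real PCDB tables are not described in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE domain (
           domain_id TEXT PRIMARY KEY,
           rmsd_max REAL,
           ligands INTEGER,
           mutations INTEGER,
           oligomeric_change INTEGER,
           ph_change INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO domain VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("domA", 7.2, 1, 0, 0, 0),
        ("domB", 6.1, 1, 1, 0, 0),
        ("domC", 0.3, 1, 0, 0, 0),
        ("domD", 9.5, 0, 0, 0, 1),
    ],
)

ALLOWED_FLAGS = ("ligands", "mutations", "oligomeric_change", "ph_change")


def search(conn, rmsd_min, rmsd_max, required_flags):
    """Return domain IDs in an RMSDmax range where exactly `required_flags` are set."""
    unknown = set(required_flags) - set(ALLOWED_FLAGS)
    if unknown:
        raise ValueError(f"unknown attribute(s): {sorted(unknown)}")
    clauses = ["rmsd_max BETWEEN ? AND ?"]
    for flag in ALLOWED_FLAGS:  # column names come only from the fixed whitelist
        clauses.append(f"{flag} = {1 if flag in required_flags else 0}")
    sql = (
        "SELECT domain_id FROM domain WHERE "
        + " AND ".join(clauses)
        + " ORDER BY domain_id"
    )
    # Numeric bounds are passed as bound parameters, not interpolated.
    return [row[0] for row in conn.execute(sql, (rmsd_min, rmsd_max))]


hits = search(conn, 5.0, 10.0, {"ligands"})
print(hits)  # ['domA']
```

Building the WHERE clause from a fixed list of column names while binding the numeric limits as parameters keeps the query dynamic without inserting user input directly into the SQL text.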
Figure 1.

Searching PCDB using the presence of ligands as possible origin of conformational diversity between 5 and 10 units of RMSDmax (1). In the Format output section (2) it is possible to customize the biological and physicochemical information retrieved with the results.


FUTURE CONSIDERATIONS

We are interested in increasing the amount and diversity of biological and structural data available for each domain represented in the database, to enhance possible correlation studies between conformational diversity and a broad spectrum of physicochemical parameters. One of our near-future goals is to introduce sequence alignments for each deposited protein in order to derive evolutionary information such as the relative conservation of different positions and evolutionary rates. The link between the pattern of residue substitution and the extent of conformational diversity is a promising way to increase our understanding of protein evolution; however, it remains an almost unexplored field. Besides this, and following previous work, we would like to enrich PCDB by introducing structures from closely homologous proteins (21) in order to increase the conformational representation of the deposited domains.

CONCLUSIONS

Two main features differentiate PCDB from other databases containing information about conformational diversity in proteins (42,43). Firstly, PCDB uses experimentally determined structures; secondly, these data are related to biological and structural information that could possibly explain the observed extent of conformational diversity. In the present version, PCDB contains 7989 protein domains with a broad range of conformational diversity, from the trivial zero to an RMSDmax of 26.7 Å. In this way, PCDB could be an essential tool to understand conformational diversity and thereby obtain a better understanding of protein function.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Proyectos de Investigacion Plurianuales (PIP) CONICET grant (112-200801-02849) and Universidad Nacional de Quilmes grant (53/B056); Ezequiel Juritz has a type II fellowship from CONICET. Funding for open access charge: CONICET and UNQ grants. Conflict of interest statement. None declared.
  43 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  The ENZYME database in 2000.

Authors:  A Bairoch
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

3.  Ligand-dependent dynamics and intramolecular signaling in a PDZ domain.

Authors:  Ernesto J Fuentes; Channing J Der; Andrew L Lee
Journal:  J Mol Biol       Date:  2004-01-23       Impact factor: 5.469

4.  What contributions to protein side-chain dynamics are probed by NMR experiments? A molecular dynamics simulation analysis.

Authors:  Robert B Best; Jane Clarke; Martin Karplus
Journal:  J Mol Biol       Date:  2005-03-16       Impact factor: 5.469

5.  Sampling of the native conformational ensemble of myoglobin via structures in different crystalline environments.

Authors:  Dmitry A Kondrashov; Wei Zhang; Roman Aranda; Boguslaw Stec; George N Phillips
Journal:  Proteins       Date:  2008-02-01

6.  The CATH Database provides insights into protein structure/function relationships.

Authors:  C A Orengo; F M Pearl; J E Bray; A E Todd; A C Martin; L Lo Conte; J M Thornton
Journal:  Nucleic Acids Res       Date:  1999-01-01       Impact factor: 16.971

7.  Dynameomics: a comprehensive database of protein dynamics.

Authors:  Marc W van der Kamp; R Dustin Schaeffer; Amanda L Jonsson; Alexander D Scouras; Andrew M Simms; Rudesh D Toofanny; Noah C Benson; Peter C Anderson; Eric D Merkley; Steven Rysavy; Dennis Bromley; David A C Beck; Valerie Daggett
Journal:  Structure       Date:  2010-03-14       Impact factor: 5.006

Review 8.  The structural dynamics of macromolecular processes.

Authors:  Daniel Russel; Keren Lasker; Jeremy Phillips; Dina Schneidman-Duhovny; Javier A Velázquez-Muriel; Andrej Sali
Journal:  Curr Opin Cell Biol       Date:  2009-02-14       Impact factor: 8.382

9.  A comparative analysis of the equilibrium dynamics of a designed protein inferred from NMR, X-ray, and computations.

Authors:  Lin Liu; Leonardus M I Koharudin; Angela M Gronenborn; Ivet Bahar
Journal:  Proteins       Date:  2009-12

10.  Dynamics and entropy of a calmodulin-peptide complex studied by NMR and molecular dynamics.

Authors:  Ninad V Prabhu; Andrew L Lee; A Joshua Wand; Kim A Sharp
Journal:  Biochemistry       Date:  2003-01-21       Impact factor: 3.162

