The PMDB Protein Model Database.

Tiziana Castrignanò, Paolo D'Onorio De Meo, Domenico Cozzetto, Ivano Giuseppe Talamo, Anna Tramontano.

Abstract

The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74,000 models for approximately 240 proteins. The system is accessible at http://www.caspur.it/PMDB and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data.

Year: 2006 PMID: 16381873 PMCID: PMC1347467 DOI: 10.1093/nar/gkj105

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The data deluge brought about by the genomic projects has fostered an unprecedented level of expectation for new medical, pharmacological, environmental and biotechnological discoveries. Proteins mediate the majority of the functions of an organism, and all these functions are, by and large, determined by the proteins' 3D structure. Despite the progress achieved so far by structural genomics projects (1), the exploration of the complete protein structure space through experimental techniques such as X-ray crystallography and NMR spectroscopy is still out of reach, because these techniques are time- and resource-consuming and not necessarily successful in all cases. Consequently, the gap between the numbers of known protein structures and sequences is steadily increasing.

Natural proteins spontaneously assume a unique 3D structure that depends, for the most part, only upon the protein sequence. The problem of understanding the rules governing the folding process is very challenging and as yet unsolved. However, approximate methods for inferring the structure of a protein from its amino acid sequence are flourishing (2). Their results are of enormous relevance in many fields, from medicine to biology, from biotechnology to pharmacology. Information derived from protein models has indeed proven to be useful by itself and in combination with experiments. Protein models have been shown to be instrumental for the refinement of experimental structures (3,4), the design of site-directed mutants (5), the characterization of molecular function (6) and structure-based drug design (7). Not surprisingly, a growing number of scientific papers reporting the results of modelling experiments and their application to the design and interpretation of experiments are appearing in the literature. Unfortunately, the models described in these reports are rarely publicly available and, in general, only accessible via direct interaction with the authors.

The difficulty of mining the available structural model data leads to duplication of efforts and impairs the possibility of numerically evaluating the correctness of the models when the experimental structure becomes available. The establishment of public repositories for these protein 3D models can partly overcome these problems. Specialized databases, such as ModBase (8) and the SWISS-MODEL repository (9), are already available for automatically built protein structure models. We have developed, and describe here, a Protein Model Database (PMDB) where manually built models can be deposited and retrieved, together with their supporting information.

DATABASE CONTENT AND WEB ACCESS

PMDB, which is interactively accessible on the web, is a relational database of protein models submitted by users and obtained with different structure prediction techniques. The database is implemented on a Linux server (SUSE Linux Enterprise Server 9) running Apache, and the management system is MySQL 4.1.12. The queue management system is written in Perl. PHP scripts are used for launching applications such as BLAST, and the GD libraries are used for display. The current release contains >74 000 models for ∼240 proteins, the majority of which are predictions submitted to the ‘Critical Assessment of Techniques for Structure Prediction’ experiment (2). Other models include those generated by our group (10,11) and models that we uploaded using published alignments (12–15). The database entry point is a protein target, for which one or more structural models can be present in the database. Available information for each target includes the protein name, sequence and length, organism and, whenever applicable, links to the SwissProt sequence database (16). Several models can be present for each target protein, or for different regions of the same target protein, and the user can navigate through them using the graphical view shown in Figure 1. After the structure of a target is solved, the database entry is also linked to the experimental structure in the PDB (17).
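
The target-to-model relationship described above can be illustrated with a minimal sketch. It is not the PMDB schema: the table names, column names and model identifiers are invented for illustration, and Python's built-in sqlite3 module stands in for the MySQL server used by the actual system.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not PMDB's own.
SCHEMA = """
CREATE TABLE target (
    target_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    organism   TEXT,
    sequence   TEXT NOT NULL,
    pdb_code   TEXT              -- set once the experimental structure is solved
);
CREATE TABLE model (
    model_id   TEXT PRIMARY KEY, -- unique identifier assigned at submission
    target_id  INTEGER NOT NULL REFERENCES target(target_id),
    start_res  INTEGER NOT NULL, -- first target residue covered by the model
    end_res    INTEGER NOT NULL, -- last target residue covered by the model
    model_type TEXT NOT NULL CHECK (model_type IN ('TS', 'AL'))
);
"""


def build_demo_db():
    """Create an in-memory database with one target and two made-up models."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO target VALUES (1, 'Example protein', 'Homo sapiens', ?, NULL)",
        ("M" * 120,),
    )
    conn.executemany(
        "INSERT INTO model VALUES (?, 1, ?, ?, ?)",
        [("DEMO0001", 1, 60, "TS"), ("DEMO0002", 55, 120, "AL")],
    )
    conn.commit()
    return conn


def models_for_target(conn, target_id):
    """Return all models of one target, ordered by the region they cover."""
    cursor = conn.execute(
        "SELECT model_id, start_res, end_res, model_type "
        "FROM model WHERE target_id = ? ORDER BY start_res",
        (target_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in models_for_target(build_demo_db(), 1):
        print(row)
```

Keeping the covered residue range on the model row is one simple way to let several models refer to different regions of the same target.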

Figure 1

PMDB overview. Information about each of the models satisfying the search criteria can be easily retrieved. When a user uploads a model, its amino acid sequence is automatically retrieved from its coordinate file. Residues for which coordinates are not available in the PDB model file, if any, can be manually inserted.

Models can be submitted in the form of a PDB file (TS format) or as an alignment to one or more known protein structures (AL format). In the latter case the coordinates of the backbone of the model are built using the AL2TS program. When a user submits a model for a protein, the system verifies whether the target already exists (i.e. there is already a model for some regions of the same protein). If not, the target is created and the model mapped to it. Unless the target is an artificial or mutant protein, the target entry is linked to existing sequence databases [at present the NCBI nr database (18)]. The predictor can provide the NCBI id of the protein (in which case the system performs a sequence check), ask for a BLAST search in the database to retrieve the id (if more than one entry matches the sequence, the user is requested to select the correct one), or inform the system that the target is not expected to be present in any sequence database. The sequence of the target is derived from the submitted model PDB or alignment file. In the former case, if the distance between consecutive Cα atoms is larger than expected for connected residues, the user is asked whether he or she wants to complete the target sequence (Figure 1). The system also reports cases where atoms in the model are closer than the sum of their van der Waals radii. The database stores information about the author of the model, a short description of the method used and supporting evidence, in the form, for example, of a multiple sequence alignment. Submitters are also asked to assign a reliability value to their model(s) and to provide a literature reference, which can also be supplied at a later stage. Models can be kept on hold upon request and are made available to general users at most 6 months after deposition. At the end of the submission procedure, the model is assigned a unique identifier.
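
The two geometric checks mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the PMDB code: the 4.2 Å gap cutoff is an assumed tolerance (adjacent Cα atoms are normally about 3.8 Å apart), the clash test is simplified to Cα atoms only using a carbon van der Waals radius of 1.7 Å, and the function names are hypothetical. Atom fields are read from the fixed columns of PDB ATOM records.

```python
import math

CA_MAX_DIST = 4.2   # assumed cutoff in Å for two connected residues
CARBON_VDW = 1.7    # van der Waals radius of carbon in Å


def parse_ca_atoms(pdb_text):
    """Return (chain, residue number, (x, y, z)) for each Cα atom, in file order."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


def find_chain_breaks(ca_atoms, max_dist=CA_MAX_DIST):
    """List (chain, residue, next residue) where consecutive Cα atoms are too far apart."""
    breaks = []
    for (ch1, n1, p1), (ch2, n2, p2) in zip(ca_atoms, ca_atoms[1:]):
        if ch1 == ch2 and math.dist(p1, p2) > max_dist:
            breaks.append((ch1, n1, n2))
    return breaks


def find_ca_clashes(ca_atoms, min_dist=2 * CARBON_VDW):
    """List pairs of Cα atoms closer than the sum of their van der Waals radii."""
    clashes = []
    for i, (ch1, n1, p1) in enumerate(ca_atoms):
        for ch2, n2, p2 in ca_atoms[i + 1:]:
            if math.dist(p1, p2) < min_dist:
                clashes.append(((ch1, n1), (ch2, n2)))
    return clashes
```

A full check would consider all atoms and skip covalently bonded pairs, which are legitimately closer than the sum of their radii. Restricting the sketch to Cα atoms avoids that bookkeeping.
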
The user interface allows the model(s) to be searched by protein or organism name, protein accession number (in the nr database), author, PMDB model identifier and model type (i.e. a complete coordinate set indicated by TS or an alignment to a known structure indicated by AL). It is also possible to perform sequence similarity searches via BLAST (19). Search results are displayed in the form of a table, listing the records satisfying all selected criteria (Figure 1). Each row refers to a target sequence and related models, along with summary information. Every model that is not on hold can be downloaded or displayed through the 3D visualization program RASMOL (20).
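
The search behaviour described above, where a record is listed only if it satisfies all selected criteria, can be sketched as a conjunctive filter. The record fields, the case-insensitive substring matching and the function names are assumptions made for illustration; the source does not specify how PMDB matches text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical summary record; field names are illustrative only."""
    model_id: str
    protein: str
    organism: str
    author: str
    model_type: str       # "TS" (coordinates) or "AL" (alignment)
    on_hold: bool = False


def search(records, **criteria):
    """Return the records that satisfy every supplied criterion."""
    def matches(record):
        for field, wanted in criteria.items():
            value = getattr(record, field)
            if isinstance(value, str):
                # Assumed matching rule: case-insensitive substring.
                if wanted.lower() not in value.lower():
                    return False
            elif value != wanted:
                return False
        return True

    return [record for record in records if matches(record)]


def downloadable(records):
    """Keep only models that are not on hold."""
    return [record for record in records if not record.on_hold]
```

For example, `search(records, organism="homo", model_type="TS")` returns only coordinate models whose organism name contains "homo", and `downloadable(...)` then removes any that are still on hold.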

FUTURE DEVELOPMENTS

Immediate future plans for the database include support for UniProt identifiers (21) for the protein targets and for more sophisticated searches. We also plan to add provisions for evaluating the models, beyond the simple stereochemical checks performed at present, using tools such as WHATCHECK (22), Verify3D (23) and PROSA (24), as well as tools to automatically evaluate the quality of models of proteins whose structure is subsequently solved (25). This will make it possible to analyse the correlation between the actual quality of the models and the reliability values assigned by the authors, as well as those estimated by automatic verification tools.
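
The planned correlation analysis could take many forms; the source does not say which statistic would be used. As one possible sketch, the Pearson coefficient between author-assigned reliability values and a measured quality score can be computed as below. The function name and the choice of Pearson correlation are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the reliability values assigned by submitters and `ys` a quality score computed once the experimental structure is known. A rank-based statistic may be preferable if the two scales are not linearly related.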

24 in total

1. Evaluating the potential of using fold-recognition models for molecular replacement.

Authors: D T Jones
Journal: Acta Crystallogr D Biol Crystallogr Date: 2001-09-21

2. From protein structure to biochemical function?

Authors: Roman A Laskowski; James D Watson; Janet M Thornton
Journal: J Struct Funct Genomics Date: 2003

3. Critical assessment of methods of protein structure prediction (CASP)-round V.

Authors: John Moult; Krzysztof Fidelis; Adam Zemla; Tim Hubbard
Journal: Proteins Date: 2003

4. About the use of protein models.

Authors: Manuel C Peitsch
Journal: Bioinformatics Date: 2002-07 Impact factor: 6.937

5. UniProt: the Universal Protein knowledgebase.

Authors: Rolf Apweiler; Amos Bairoch; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

6. MODBASE, a database of annotated comparative protein structure models, and associated resources.

Authors: Ursula Pieper; Narayanan Eswar; Hannes Braberg; M S Madhusudhan; Fred P Davis; Ashley C Stuart; Nebojsa Mirkovic; Andrea Rossi; Marc A Marti-Renom; Andras Fiser; Ben Webb; Daniel Greenblatt; Conrad C Huang; Thomas E Ferrin; Andrej Sali
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

7. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.

Authors: Brigitte Boeckmann; Amos Bairoch; Rolf Apweiler; Marie-Claude Blatter; Anne Estreicher; Elisabeth Gasteiger; Maria J Martin; Karine Michoud; Claire O'Donovan; Isabelle Phan; Sandrine Pilbout; Michel Schneider
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

8. A model for recognition of polychlorinated dibenzo-p-dioxins by the aryl hydrocarbon receptor.

Authors: M Procopio; A Lahm; A Tramontano; L Bonati; D Pitea
Journal: Eur J Biochem Date: 2002-01

9. Protochlorophyllide oxidoreductase: a homology model examined by site-directed mutagenesis.

Authors: H E Townley; R B Sessions; A R Clarke; T R Dafforn; W T Griffiths
Journal: Proteins Date: 2001-08-15

10. A model for the hepatitis C virus envelope glycoprotein E2.

Authors: A T Yagnik; A Lahm; A Meola; R M Roccasecca; B B Ercole; A Nicosia; A Tramontano
Journal: Proteins Date: 2000-08-15
