
ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins.

Abstract

A major problem in structural biology is the recognition of errors in experimental and theoretical models of protein structures. The ProSA program (Protein Structure Analysis) is an established tool which has a large user base and is frequently employed in the refinement and validation of experimental protein structures and in structure prediction and modeling. The analysis of protein structures is generally a difficult and cumbersome exercise. The new service presented here is a straightforward and easy to use extension of the classic ProSA program which exploits the advantages of interactive web-based applications for the display of scores and energy plots that highlight potential problems spotted in protein structures. In particular, the quality scores of a protein are displayed in the context of all known protein structures and problematic parts of a structure are shown and highlighted in a 3D molecule viewer. The service specifically addresses the needs encountered in the validation of protein structures obtained from X-ray analysis, NMR spectroscopy and theoretical calculations. ProSA-web is accessible at https://prosa.services.came.sbg.ac.at.

Year: 2007 PMID: 17517781 PMCID: PMC1933241 DOI: 10.1093/nar/gkm290

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The availability of a structural model of a protein is one of the keys to understanding biological processes at a molecular level. Recent advances in experimental technology have led to the emergence of large-scale structure determination pipelines aimed at the rapid characterization of protein structures. The resulting amount of experimental structural information is enormous. The application of computational methods for the prediction of unknown structures adds a further wealth of structural models. The latest NAR web server issue, for example, lists about 50 tools in the category ‘3D Structure Prediction’ (1). Assessing the accuracy and reliability of experimental and theoretical models of protein structures is a task that must be addressed regularly; in particular, it is essential for maintaining the integrity, consistency and reliability of public structure repositories (2). ProSA (3) is a tool widely used to check 3D models of protein structures for potential errors. Its range of application includes error recognition in experimentally determined structures (4–6), theoretical models (7–10) and protein engineering (11,12). Here we present a web-based version of ProSA, ProSA-web, that encompasses the basic functionality of stand-alone ProSA and extends it with new features that facilitate interpretation of the results obtained. The overall quality score calculated by ProSA for a specific input structure is displayed in a plot that shows the scores of all experimentally determined protein chains currently available in the Protein Data Bank (PDB) (13). This feature relates the score of a specific model to the scores computed from all experimental structures deposited in the PDB. Problematic parts of a model are identified by a plot of local quality scores, and the same scores are mapped onto a display of the 3D structure using color codes.
A particular intention of the ProSA-web application is to encourage structure depositors to validate their structures before they are submitted to the PDB and to use the tool in the early stages of structure determination and refinement. The service requires only Cα atoms, so that low-resolution structures and approximate models obtained early in the structure determination process can be evaluated and compared against high-resolution structures. The ProSA-web service returns results almost instantaneously: the response time is on the order of seconds, even for large molecules.

WEB SERVER USAGE

Required input

ProSA-web requires the atomic coordinates of the model to be evaluated. Users can supply coordinates either by uploading a file in PDB format or by entering the four-letter code of a protein structure available from PDB. A chain identifier and an NMR model number may be used to specify a particular model. A list with possible values of these parameters is presented to the user if the entered chain identifier or model number is invalid. If no chain identifier or model number is supplied by the user, the first chain of the first model found in the PDB file is used for analysis.
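
The selection rule described above (an optional chain identifier and model number, defaulting to the first chain of the first model) can be illustrated with a minimal sketch. This is not ProSA-web's own parser; the function name `extract_ca_trace` and its return shape are hypothetical, and the column positions follow the fixed-width PDB file format. Files without MODEL records are assumed to contain a single model numbered 1.

```python
def extract_ca_trace(pdb_text, chain_id=None, model_no=None):
    """Collect C-alpha atoms for one chain of one model from PDB-format text.

    Hypothetical helper for illustration only. Returns a list of
    (residue_number, residue_name, (x, y, z)) tuples.
    """
    trace = []
    model = 1  # assumption: a file without MODEL records holds model 1
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            model = int(line.split()[1])
            continue
        # HETATM records are skipped, which also excludes calcium ions named CA.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):  # keep only the first alternate location
            continue
        if model_no is None:
            model_no = model  # default: first model found in the file
        if model != model_no:
            continue
        if chain_id is None:
            chain_id = line[21]  # default: first chain of the selected model
        if line[21] != chain_id:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        trace.append((int(line[22:26]), line[17:20].strip(), xyz))
    return trace
```

Calling `extract_ca_trace(text)` with no further arguments mirrors the default behaviour described in the text; passing `chain_id` or `model_no` narrows the selection. An empty result would correspond to the case in which the service presents the list of valid chain identifiers and model numbers.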

Range of computations

The computational engine used for the calculation of scores and plots is standard ProSA, which uses knowledge-based potentials of mean force to evaluate model accuracy (3). All calculations are carried out with Cα potentials; hence ProSA-web can also be applied to low-resolution structures or other cases where only the Cα trace is available (a set of Cβ potentials is included in the stand-alone version of ProSA, see Supplementary Data 1). After parsing the coordinates, the energy of the structure is evaluated using a distance-based pair potential (14,15) and a potential that captures the solvent exposure of protein residues (16). From these energies, two characteristics of the input structure are derived and displayed on the web page: its z-score and a plot of its residue energies. The z-score indicates overall model quality and measures the deviation of the total energy of the structure with respect to an energy distribution derived from random conformations (3,15). Z-scores outside a range characteristic for native proteins indicate erroneous structures. In order to facilitate interpretation of the z-score of the specified protein, its value is displayed in a plot that contains the z-scores of all experimentally determined protein chains in the current PDB (an example is shown in Figure 1A). Groups of structures from different sources (X-ray, NMR) are distinguished by different colors. This plot can be used to check whether the z-score of the protein in question is within the range of scores typically found for proteins of similar size belonging to one of these groups.
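
To make the z-score idea concrete, the following minimal sketch computes a standard z-score of a model energy against a set of reference energies from random conformations. It assumes the usual definition (difference from the mean, in units of the standard deviation); the exact normalization and the way ProSA generates its random conformations are described in references (3,15) and are not reproduced here. The function name `z_score` and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def z_score(model_energy, random_energies):
    """Standard z-score of a model energy relative to reference energies.

    Illustrative only: assumes z = (E_model - mean) / standard deviation,
    with the population standard deviation of the reference set.
    """
    mean = statistics.mean(random_energies)
    spread = statistics.pstdev(random_energies)
    if spread == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / spread
```

Under this definition, a model whose energy lies well below the energies of the random conformations receives a strongly negative z-score, which is the direction in which native-like structures are found in Figure 1A.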

Figure 1.

Investigation of two ABC transporter structures using the ProSA-web service. Subfigures (A–C) show the results for a monomer of MsbA (PDB code 1JSQ, chain A (17)). The structure was determined by X-ray crystallography to 4.5 Å resolution and had to be retracted due to problems in the interpretation of the crystallographic raw data (19). Subfigures (A, D and E) show the results for a monomer of Sav1866 (PDB code 2HYD, chain A (18)) as determined by X-ray crystallography to 3.0 Å resolution. Although homologous to 1JSQ, this structure differs considerably from the 1JSQ A chain. The ProSA-web results indicate that 2HYD has features characteristic for native structures. (A) ProSA-web z-scores of all protein chains in PDB determined by X-ray crystallography (light blue) or NMR spectroscopy (dark blue) with respect to their length. The plot shows only chains with less than 1000 residues and a z-score ≤ 10. The z-scores of 1JSQ-A and 2HYD-A are highlighted as large dots. (B) Energy plot of 1JSQ-A. Residue energies averaged over a sliding window are plotted as a function of the central residue in the window. A window size of 80 is used due to the large size of the protein chain (default: 40). (C) Jmol Cα trace of 1JSQ-A. Residues are colored from blue to red in the order of increasing residue energy. (D–E) Same as (B–C) but for 2HYD-A.

The energy plot shows the local model quality by plotting energies as a function of amino acid sequence position i (see Figure 1B and D for examples). In general, positive values correspond to problematic or erroneous parts of a model. A plot of single residue energies usually contains large fluctuations and is of limited value for model evaluation. Hence the plot is smoothed by calculating the average energy over each 40-residue fragment spanning positions i to i + 39, which is then assigned to the ‘central’ residue of the fragment at position i + 19.
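
The window averaging described above can be sketched as follows. The function name `smooth_energies` is hypothetical, and the treatment of the chain termini (positions without a full window are simply omitted) is an assumption; the text does not state how ProSA handles them. Indices are 0-based list positions.

```python
def smooth_energies(energies, window=40):
    """Average residue energies over every full window of consecutive residues.

    Returns (central_index, mean_energy) pairs. For the default window of 40
    the fragment starting at index i is assigned to index i + 19, as in the
    text. Positions near the termini without a full window are omitted
    (an assumption made for this sketch).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    offset = (window - 1) // 2  # 19 for a window of 40, 39 for a window of 80
    smoothed = []
    for start in range(len(energies) - window + 1):
        fragment = energies[start:start + window]
        smoothed.append((start + offset, sum(fragment) / window))
    return smoothed
```

With `window=80`, the setting used for the large chains in Figure 1B and D, each average is assigned to index i + 39 under the same rule.
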
In order to further narrow down those regions in the model that contribute to a bad overall score, ProSA-web visualizes the 3D structure of the protein using the molecule viewer Jmol (http://www.jmol.org). Residues with unusually high energies stand out by color from the rest of the structure (Figure 1C and E). The interactive facilities provided by Jmol, such as distance measurements, are available for exploring these regions in more detail.
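
As a rough illustration of the blue-to-red color coding, the sketch below maps residue energies linearly onto RGB values, with the lowest energy in pure blue and the highest in pure red. The linear scaling and the function name `energy_to_rgb` are assumptions; the text does not specify how ProSA-web derives its colors, and this is not the Jmol scripting interface.

```python
def energy_to_rgb(energies):
    """Map residue energies to (red, green, blue) tuples, blue = low, red = high.

    Illustrative linear mapping between the minimum and maximum energy;
    the actual ProSA-web color scale may differ.
    """
    if not energies:
        return []
    low, high = min(energies), max(energies)
    span = high - low
    colors = []
    for energy in energies:
        # All-equal energies fall back to the midpoint of the scale.
        fraction = 0.5 if span == 0 else (energy - low) / span
        colors.append((round(255 * fraction), 0, round(255 * (1 - fraction))))
    return colors
```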

Protein structure validation by example

In what follows, we provide a typical example of the application of ProSA-web in the validation of protein structures. We analyze two structures determined by X-ray analysis and deposited in the PDB. The first is the structure of MsbA from Escherichia coli, a homolog of the multi-drug resistance ATP-binding cassette (ABC) transporters (PDB code 1JSQ, release date 12 September 2001), determined to a resolution of 4.5 Å (17). The structure consists of an N-terminal transmembrane domain and a soluble nucleotide-binding domain. Doubts regarding the quality of 1JSQ were raised after the X-ray structure of a close homolog became available and turned out to be surprisingly different. This second structure, the multi-drug ABC transporter Sav1866 from Staphylococcus aureus (PDB code 2HYD, release date 5 September 2006), was determined to a resolution of 3.0 Å (18). Based on the newly determined structure, it was realized that the published MsbA model is incorrect, and as a consequence the related publication had to be retracted (19). Here, we apply the ProSA-web service to the analysis of the incorrect 1JSQ model and the recently released 2HYD model. An interesting aspect is that both structures contain a transmembrane domain. Since the energy functions used in ProSA are derived mainly from soluble globular proteins of known structure, it is not clear in advance to what extent the ProSA scores reflect problems in protein structures containing membrane-spanning domains. Figure 1A–C shows the results of ProSA-web obtained for 1JSQ (chain A). The z-score of this model is −0.60, a value far too high for a typical native structure. This can clearly be seen when the score is compared to the scores of other experimentally determined protein structures of the size of 1JSQ (Figure 1A). Furthermore, large parts of the energy plot show highly positive energy values, especially the N-terminal half of the sequence, which contains part of the membrane-spanning domain (Figure 1B).
In the Cα trace of the model, residues with high energies are shown in shades of red (Figure 1C), and it is evident from these figures that the N-terminal transmembrane domain as well as the C-terminal globular domain contain regions of unfavorable energies. Figure 1A also shows the location of the z-score for 2HYD (chain A). The value, −8.29, is in the range of native conformations. Overall, the residue energies are largely negative, with the exception of some peaks in the N-terminal part (Figure 1D). These peaks presumably correspond to membrane-spanning regions of the protein. In the Cα trace, these regions show up as clusters of residues colored in red (Figure 1E, lower left). The C-terminal domain shows a high number of residues colored in blue and an energy distribution that is entirely below the zero baseline, consistent with the parameters of a typical protein (Figure 1D and E).

CONCLUSION

The protein structure community is, to some extent, aware of the fact that the RCSB protein data base contains erroneous structures, but these errors are quite difficult to spot. Grossly misfolded structures are sometimes revealed after the results of subsequent independent structure determinations become available. Errors in regular PDB files generally remain unknown to the structural community until the corresponding revisions are made available. Hence, diagnostic tools that reveal unusual structures and problematic parts of a structure in a manner that is independent of the experimental data and the specific method employed are essential in many areas of protein structure research. ProSA is a diagnostic tool that is based on the statistical analysis of all available protein structures. The potentials of mean force compiled from the data base provide a statistical average over the known structures. Structures of soluble globular proteins whose z-scores deviate strongly from the data base average are unusual, and frequently such structures turn out to be erroneous. For proteins containing membrane-spanning regions, the significance of deviations from the average over the data base is less clear. Here, we provide an example of a published structure (1JSQ) that is known to be incorrect, as revealed by the subsequent independent X-ray analysis of a related protein that yielded a completely different conformation. The ProSA-web result obtained for 1JSQ shows extreme deviations when compared to all the structures in the PDB (Figure 1A). In contrast, the score obtained for the related 2HYD structure is close to the data base average. This result demonstrates that, for membrane proteins as well, large deviations from normality may indicate an erroneous structure.

SUPPLEMENTARY DATA

(1) ProSA stand-alone version: http://cms.came.sbg.ac.at/typo3/index.php?id=prosa_download
(2) List of studies that use ProSA for model validation: http://www.came.sbg.ac.at/typo3/index.php?id=prosa_literature

19 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

Review 2. Protein structure prediction: inroads to biology.

Authors: Donald Petrey; Barry Honig
Journal: Mol Cell Date: 2005-12-22 Impact factor: 17.970

3. Solution structure of human prolactin.

Authors: Kaare Teilum; Jeffrey C Hoch; Vincent Goffin; Sandrina Kinet; Joseph A Martial; Birthe B Kragelund
Journal: J Mol Biol Date: 2005-08-26 Impact factor: 5.469

4. Protein sequence randomization: efficient estimation of protein stability using knowledge-based potentials.

Authors: Markus Wiederstein; Manfred J Sippl
Journal: J Mol Biol Date: 2004-12-13 Impact factor: 5.469

5. Retraction.

Authors: Geoffrey Chang; Christopher B Roth; Christopher L Reyes; Owen Pornillos; Yen-Ju Chen; Andy P Chen
Journal: Science Date: 2006-12-22 Impact factor: 47.728

Review 6. Knowledge-based potentials for proteins.

Authors: M J Sippl
Journal: Curr Opin Struct Biol Date: 1995-04 Impact factor: 6.809

7. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins.

Authors: M J Sippl
Journal: J Mol Biol Date: 1990-06-20 Impact factor: 5.469

Review 8. Boltzmann's principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures.

Authors: M J Sippl
Journal: J Comput Aided Mol Des Date: 1993-08 Impact factor: 3.686

9. Recognition of errors in three-dimensional structures of proteins.

Authors: M J Sippl
Journal: Proteins Date: 1993-12

10. Engineered superoxide dismutase monomers for superoxide biosensor applications.

Authors: Moritz K Beissenhirtz; Frieder W Scheller; Maria S Viezzoli; Fred Lisdat
Journal: Anal Chem Date: 2006-02-01 Impact factor: 6.986
