Phospho3D: a database of three-dimensional structures of protein phosphorylation sites.

Andreas Zanzoni, Gabriele Ausiello, Allegra Via, Pier Federico Gherardini, Manuela Helmer-Citterich.

Abstract

Phosphorylation is the most common protein post-translational modification. Phosphorylated residues (serine, threonine and tyrosine) play critical roles in the regulation of many cellular processes. Since the amount of data produced by screening assays is growing continuously, the development of computational tools for collecting and analysing experimental data has become a pivotal task for unravelling the complex network of interactions regulating eukaryotic cell life. Here we present Phospho3D, http://cbm.bio.uniroma2.it/phospho3d, a database of 3D structures of phosphorylation sites, which stores information retrieved from the phospho.ELM database and is enriched with structural information and annotations at the residue level. The database also collects the results of a large-scale structural comparison procedure providing clues for the identification of new putative phosphorylation sites.

Year:  2006        PMID: 17142231      PMCID: PMC1669737          DOI: 10.1093/nar/gkl922

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The phosphorylation of specific protein residues is a crucial event in the regulation of several cellular processes, acting through the activation, deactivation or recognition of the target protein. A large fraction of eukaryotic proteins (∼30% of those encoded by the human genome) undergo this reversible post-translational modification (1). Phosphorylation of serine/threonine or tyrosine residues is carried out by protein kinases (PKs), one of the largest protein families, comprising 1.5–2.5% of all eukaryotic genes (2). Although the amount of data produced by screening assays is growing steadily (3–6), the experimental identification of phosphoproteins and the determination of individual phosphorylation sites remain difficult and time-consuming. Computational tools are therefore very useful for collecting and analysing experimental data.
Several sequence-based methods for predicting phosphorylation sites have been developed using different computational approaches, such as regular expressions with context-based rules (7), position-specific scoring matrices (PSSMs) (8), artificial neural networks (9,10), support vector machines (SVMs) (11,12), hidden Markov models (13) and iterative statistical methods (14). All these methods rest on the hypothesis that the sequence surrounding the phosphorylated residue is the main determinant of kinase specificity. They are reasonably accurate and work well for a number of specific kinases. However, the specificity determinants and rules remain elusive for a large number of protein kinases whose substrates share little or no sequence similarity in the known phosphopeptides. We propose that, at least in some cases, the rules of kinase specificity may reside in structural determinants that only occasionally overlap with sequence consensus motifs and that may be independent of the residue order in the protein sequence.
Here we describe Phospho3D, a database of 3D structures of phosphorylation sites. It collects information retrieved from the phospho.ELM database (15) and is enriched with structural information and diverse annotations at the residue level. In addition, the database stores the results of a large-scale local structural comparison, which suggest functional annotations of phosphorylation sites on the basis of 3D similarity. Cases of significant structural similarity between phosphorylation sites may indicate that the sites are phosphorylated by the same kinase.

DATABASE CONSTRUCTION AND CONTENT

The Phospho3D database was built from the phospho.ELM database, which gathers experimentally verified phosphorylation sites manually extracted from the literature. The phospho.ELM dataset used in this work (version 4.0) contains 5314 phosphorylation sites, or instances, belonging to 1805 different sequences. The correspondence between phospho.ELM sequences and Protein Data Bank (PDB) chains was established via the Seq2Struct resource (16), an exhaustive collection of annotated links between SwissProt-TrEMBL and PDB sequences. Links are based on sequence alignments that satisfy pre-established, highly reliable thresholds. From a list of 4530 sequence–structure links (for further details see website documentation), only those with the phosphorylatable residue inside the aligned region were retained, resulting in 2726 instances (166 unique phospho.ELM instances on 1219 protein chains). The basic information stored in Phospho3D consists of the instance, its flanking sequence (10 residues) and every residue whose distance from the instance does not exceed 12 Å; the residues within this distance define a 3D neighbourhood that we call a zone. For each zone, annotation at the residue level is provided, namely solvent accessibility computed by the NACCESS program (17), secondary structure assignment given by the DSSP program (18) and residue conservation taken from the ConSurf-HSSP database (19). Users can also retrieve information extracted from the phospho.ELM dataset, for instance the Medline reference PMID and, when available, the kinase(s) that phosphorylate(s) the given site. In addition, for each zone the results of a large-scale local structural comparison against a representative dataset of PDB (20) protein chains from eukaryotic organisms are given. The comparison was carried out using Query3D, a sequence- and fold-independent algorithm (21). Structural matches are assessed by two criteria: structural similarity and biochemical similarity.
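As a rough illustration of the zone definition given above, the following Python sketch collects the residues lying within 12 Å of a phosphorylation site. The `Atom` record, the `zone_residues` function and the use of the minimum inter-atomic distance are assumptions made for this example; the text does not specify how the residue-to-instance distance is measured, and this is not the Phospho3D implementation.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record (not a Phospho3D data structure)."""
    chain: str
    resnum: int
    resname: str
    xyz: Tuple[float, float, float]


def zone_residues(atoms: List[Atom], site_chain: str, site_resnum: int,
                  cutoff: float = 12.0) -> List[Tuple[str, int, str]]:
    """Return the residues having at least one atom within `cutoff` angstroms
    of any atom of the phosphorylation site (the site itself is excluded).

    Assumption: distance is the minimum inter-atomic distance between the
    two residues.
    """
    site_atoms = [a for a in atoms
                  if a.chain == site_chain and a.resnum == site_resnum]
    if not site_atoms:
        raise ValueError("phosphorylation site not found in structure")

    zone = set()
    for atom in atoms:
        if (atom.chain, atom.resnum) == (site_chain, site_resnum):
            continue  # skip atoms of the site residue itself
        if any(math.dist(atom.xyz, s.xyz) <= cutoff for s in site_atoms):
            zone.add((atom.chain, atom.resnum, atom.resname))
    return sorted(zone)


# Toy coordinates, invented purely for illustration.
toy_atoms = [
    Atom("A", 10, "SER", (0.0, 0.0, 0.0)),   # the phosphorylation site
    Atom("A", 11, "GLY", (5.0, 0.0, 0.0)),   # 5 angstroms away
    Atom("A", 50, "LYS", (0.0, 12.0, 0.0)),  # exactly at the cutoff
    Atom("A", 90, "ASP", (0.0, 0.0, 12.5)),  # just outside the cutoff
]
toy_zone = zone_residues(toy_atoms, "A", 10)
```

In practice the atom records would be parsed from a PDB file. A residue far away in sequence (residue 50 in the toy data) can still belong to the zone, which is why a zone captures structural context that a flanking sequence window alone would miss.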
Structural similarity requires that matching residues have a root mean square deviation (r.m.s.d.) lower than a given threshold, whereas biochemical similarity is evaluated using a Dayhoff substitution matrix (22). The score of a match is the number of matching residues that fulfil both similarity criteria. The significance of the score is evaluated by calculating its Z-score over the distribution of scores obtained when the query zone is compared with the whole dataset.
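The Z-score step can be sketched as follows. This is a minimal example that assumes the standard definition Z = (score − mean) / standard deviation, computed over all match scores obtained for one query zone; the function name `z_scores`, the use of the population standard deviation and the toy scores are assumptions made for illustration, not details taken from the Phospho3D pipeline.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_scores(scores: Sequence[float]) -> List[float]:
    """Convert raw match scores (numbers of matching residues) for one query
    zone into Z-scores relative to the whole score distribution.

    Assumption: population standard deviation; a flat distribution
    (zero spread) yields Z = 0 for every match.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Toy match scores for one query zone against eight dataset chains.
toy_scores = [2, 4, 4, 4, 5, 5, 7, 9]
toy_z = z_scores(toy_scores)
```

With these toy values the mean is 5 and the standard deviation is 2, so a match with 9 matching residues lies two standard deviations above the mean. Ranking matches by Z-score rather than by raw score compensates for zones that tend to produce many matching residues against any chain.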

THE WEB INTERFACE

The Phospho3D database can be searched by kinase name, PDB identification code or keyword. A browsing function has also been implemented. The information returned to the user consists of a brief description of the PDB structure(s) that fulfil the search criterion and a list of instances presented along with associated information (Figure 1). For each instance, the user can select three options related to the surrounding structural zone: a graphical view using the Jmol Java applet; a tabular view reporting the zone annotation at the residue level; and a list of 3D matches identified by local structural comparison. Each match can be visualized using Jmol. A tabular view of the matching residues is also presented (Figure 1).
Figure 1

The central panel shows a list of instances for the PDB file 1A52. For each instance, users can view the corresponding zone in the Jmol viewer, the annotation at the residue level and the results of the large-scale local structural comparison. For each structural match, the score, the Z-score and the r.m.s.d. are reported along with the SCOP fold (27) of the matching PDB files.

CONCLUSION AND FUTURE PERSPECTIVES

The Phospho3D database is a useful tool for analysing the structural features of experimentally verified phosphorylation sites. Moreover, it provides the results of a large-scale local structural comparison between the zones and a representative set of eukaryotic protein chains. The results of this comparison can point to new putative phosphorylation sites and suggest the kinase(s) responsible for their phosphorylation. Phospho3D will be updated whenever new phospho.ELM datasets are released. The annotations will be integrated as a feature in the pdbFun server (23). We are also planning to identify and annotate the sites that are recognized by protein phosphatases and phosphoresidue-binding modules (24–26). The Phospho3D dataset (annotations at the residue level and structural comparison results) is available upon request.
  25 in total

1.  Large-scale characterization of HeLa cell nuclear phosphoproteins.

Authors:  Sean A Beausoleil; Mark Jedrychowski; Daniel Schwartz; Joshua E Elias; Judit Villén; Jiaxu Li; Martin A Cohn; Lewis C Cantley; Steven P Gygi
Journal:  Proc Natl Acad Sci U S A       Date:  2004-08-09       Impact factor: 11.205

2.  Prediction of phosphorylation sites using SVMs.

Authors:  Jong Hun Kim; Juyoung Lee; Bermseok Oh; Kuchan Kimm; Insong Koh
Journal:  Bioinformatics       Date:  2004-07-01       Impact factor: 6.937

3.  The ConSurf-HSSP database: the mapping of evolutionary conservation among homologs onto PDB structures.

Authors:  Fabian Glaser; Yossi Rosenberg; Amit Kessel; Tal Pupko; Nir Ben-Tal
Journal:  Proteins       Date:  2005-02-15

4.  An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets.

Authors:  Daniel Schwartz; Steven P Gygi
Journal:  Nat Biotechnol       Date:  2005-11       Impact factor: 54.908

5.  A support vector machine approach to the identification of phosphorylation sites.

Authors:  Dariusz Plewczyński; Adrian Tkacz; Adam Godzik; Leszek Rychlewski
Journal:  Cell Mol Biol Lett       Date:  2005       Impact factor: 5.787

6.  Incorporating hidden Markov models for identifying protein kinase-specific phosphorylation sites.

Authors:  Hsien-Da Huang; Tzong-Yi Lee; Shih-Wei Tzeng; Li-Cheng Wu; Jorng-Tzong Horng; Ann-Ping Tsou; Kuan-Tsae Huang
Journal:  J Comput Chem       Date:  2005-07-30       Impact factor: 3.376

7.  SCOP: a structural classification of proteins database for the investigation of sequences and structures.

Authors:  A G Murzin; S E Brenner; T Hubbard; C Chothia
Journal:  J Mol Biol       Date:  1995-04-07       Impact factor: 5.469

8.  Seq2Struct: a resource for establishing sequence-structure links.

Authors:  Allegra Via; Andreas Zanzoni; Manuela Helmer-Citterich
Journal:  Bioinformatics       Date:  2004-09-28       Impact factor: 6.937

9.  Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins.

Authors:  Francesca Diella; Scott Cameron; Christine Gemünd; Rune Linding; Allegra Via; Bernhard Kuster; Thomas Sicheritz-Pontén; Nikolaj Blom; Toby J Gibson
Journal:  BMC Bioinformatics       Date:  2004-06-22       Impact factor: 3.169

10.  pdbFun: mass selection and fast comparison of annotated PDB residues.

Authors:  Gabriele Ausiello; Andreas Zanzoni; Daniele Peluso; Allegra Via; Manuela Helmer-Citterich
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971
