Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins.

Francesca Diella, Scott Cameron, Christine Gemünd, Rune Linding, Allegra Via, Bernhard Kuster, Thomas Sicheritz-Pontén, Nikolaj Blom, Toby J Gibson.

Abstract

BACKGROUND: Post-translational phosphorylation is one of the most common protein modifications. Phosphoserine, phosphothreonine and phosphotyrosine residues play critical roles in the regulation of many cellular processes. The fast-growing number of research reports on protein phosphorylation points to a general need for an accurate database dedicated to phosphorylation that provides easily retrievable information on phosphoproteins. DESCRIPTION: Phospho.ELM (http://phospho.elm.eu.org) is a new resource containing experimentally verified phosphorylation sites manually curated from the literature; it has been developed as part of the ELM (Eukaryotic Linear Motif) resource. Phospho.ELM constitutes the largest searchable collection of phosphorylation sites available to the research community. Phospho.ELM entries store information about substrate proteins, with the exact positions of residues known to be phosphorylated by cellular kinases. Additional annotation includes literature references, subcellular compartment, tissue distribution, information about the signaling pathways involved, and links to the molecular interaction database MINT. Phospho.ELM version 2.0 contains 1703 phosphorylation site instances for 556 phosphorylated proteins.
CONCLUSION: Phospho.ELM will be a valuable tool both for molecular biologists working on protein phosphorylation sites and for bioinformaticians developing computational predictors of the specificity of phosphorylation reactions.

Year: 2004; PMID: 15212693; PMCID: PMC449700; DOI: 10.1186/1471-2105-5-79

Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105


Background

The reversible phosphorylation of serine, threonine and tyrosine residues by enzymes of the kinase and phosphatase superfamilies is the most abundant post-translational modification in intracellular proteins [1,2] and is an important mechanism for regulating many cellular processes, such as proliferation, differentiation and apoptosis. Eukaryotic protein kinases form one of the largest multigene families, and the complete sequencing of the human genome has allowed the identification of almost all human protein kinases, which represent about 1.7% of all human genes [3]. However, the role of an individual protein kinase in a particular cellular process will be fully explained only when the basis of kinase substrate specificity is better understood. Determining the substrate specificity of protein kinases remains one of the major challenges in molecular biology. Phosphorylation site predictors such as the CBS predictor NetPhos [4], based on artificial neural networks [5,6], or Scansite [7], based on position-specific scoring matrices (PSSMs) derived from peptide libraries [8], have gone some way towards allowing molecular biologists to identify potential kinase substrate sites in query proteins, but they suffer to some degree from over-prediction. The ELM resource attempts to reduce such problems by filtering motif matches by context, using structure, cell compartment, taxonomic range and other properties of proteins [9]. Given the biological importance of protein kinases in cell signaling and the steadily growing volume of reports identifying phosphorylation sites [10], it has become impractical for experimental molecular biologists to keep track of all the phosphorylation modifications of proteins within their area of research. Furthermore, large-scale proteomic and systems biology approaches to cell regulation cannot succeed without full access to phosphorylation data.
There is therefore a need to create and maintain a comprehensive database of known, experimentally verified phosphorylation sites within proteins. We describe here Phospho.ELM [11], a server interfaced to a manually curated database of phosphorylation sites (instances) that provides easy access to information from the primary scientific literature concerning experimentally verified serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins.
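To make the PSSM approach mentioned above concrete, here is a minimal sketch of how a position-specific scoring matrix scores candidate serine/threonine sites. It is not Scansite's implementation: the matrix `TOY_PSSM`, its window (offsets -3 to +1 around the acceptor), the helper names `score_window` and `scan`, and the threshold of 2.0 are all hypothetical values chosen for illustration. Each candidate site's score is simply the sum of the per-position scores of the residues in its window.

```python
# Hypothetical toy PSSM: offset from the phosphoacceptor -> {residue: log-odds score}.
# Residues not listed at an offset score 0.0. All numbers are invented for
# illustration; real Scansite matrices are derived from peptide-library screens.
TOY_PSSM: dict[int, dict[str, float]] = {
    -3: {"R": 1.0, "K": 0.5},
    -2: {"R": 1.5, "K": 1.0},
    -1: {},
    0: {"S": 1.0, "T": 0.5},
    1: {"P": -2.0, "L": 0.5, "I": 0.5, "V": 0.5},
}


def score_window(seq: str, centre: int,
                 pssm: dict[int, dict[str, float]] = TOY_PSSM) -> float:
    """Sum the position-specific scores of the residues around seq[centre]."""
    return sum(pssm[off].get(seq[centre + off], 0.0) for off in pssm)


def scan(seq: str, threshold: float = 2.0,
         pssm: dict[int, dict[str, float]] = TOY_PSSM,
         acceptors: str = "ST") -> list[tuple[int, str, float]]:
    """Return (1-based position, residue, score) for acceptors scoring >= threshold."""
    seq = seq.upper()
    lo, hi = min(pssm), max(pssm)
    hits = []
    for i, aa in enumerate(seq):
        if aa not in acceptors:
            continue
        if i + lo < 0 or i + hi >= len(seq):
            continue  # skip acceptors whose window would run off either terminus
        score = score_window(seq, i, pssm)
        if score >= threshold:
            hits.append((i + 1, aa, score))
    return hits


print(scan("LRRASLG"))  # [(5, 'S', 4.0)]
print(scan("LRRASPG"))  # [] because Pro at +1 lowers the score to 1.5
```

The threshold is where the over-prediction trade-off lives in this sketch: lowering it reports more candidate sites, including more false positives, while raising it misses weaker genuine sites.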

Construction and content

Phospho.ELM is developed and deployed with open-source software. The database management system is PostgreSQL [12]. The software was developed in Python 2.2, including some modules from the BioPython.org project for retrieving information from SWISS-PROT and the PyGreSQL module for interfacing with PostgreSQL. The web interface uses the CGI model framework [13]. The Phospho.ELM 1.0 database contained 289 proteins. The current release (Phospho.ELM 2.0) integrates data from PhosphoBase, giving a total of 556 proteins (299 human, 52 mouse, 54 rat and 151 from other species). The Phospho.ELM dataset represents the largest collection of experimentally verified phosphorylation sites: the annotated proteins contain 556 tyrosine, 913 serine and 234 threonine phosphorylation sites (instances), which are verified substrates of 119 different protein kinases (Table 1).
Table 1

Selected protein kinases, their class, the number of known protein substrates and the instances recorded in Phospho.ELM.

Kinase  Class                    Substrates  Instances
CK2     Ser/Thr kinase                   54        138
PKA     Ser/Thr kinase                   88        170
PDK1    Ser/Thr kinase                   12         17
Src     non-receptor Tyr kinase          40         69
Abl     non-receptor Tyr kinase          14         21
FAK     non-receptor Tyr kinase           7         11
IR      receptor Tyr kinase              11         36
EGFR    receptor Tyr kinase              19         43
In the Phospho.ELM database, information is presented in two classes: instance and phosphoprotein. The key information consists of the phosphorylated site (instance) and its flanking sequence within a protein, for which experimental evidence has been found in the literature. Each instance is further annotated (where known) with the kinase(s) that phosphorylate(s) the site, the domain(s) that bind to the phosphorylated motif (particularly relevant for tyrosine phosphorylation, e.g. SH2 domains), and a link to the ELM server to retrieve further information about the kinase and the regular expression used to predict its substrates (see Fig. 1). Where available, hyperlinks are provided to protein structures containing phosphorylated residues [14]. Additional information for each protein kinase substrate includes the subcellular compartment (annotated with Gene Ontology terms [15,16]), tissue distribution, a list of interaction partners derived from the MINT database [17], and a diagram of a signaling pathway in which the protein is involved; when one is available, we provide a link to BioCarta: Charting Pathways of Life [18]. Controlled vocabularies to describe experimental evidence [19] will soon be included in the database.
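The two ideas in this paragraph, an instance stored with its flanking sequence and a regular expression describing a kinase's substrate motif, can be combined in a short sketch. The pattern `PKA_LIKE` below is an assumption that only approximates the classic PKA consensus (R/K)-(R/K)-x-(S/T) with no proline at +1; it is not the expression stored in the ELM resource. The helper names `flanking_window` and `acceptor_matches`, the 7-residue flank and the '-' padding are likewise hypothetical choices. The example uses Kemptide (LRRASLG), a well-known PKA substrate peptide.

```python
import re

# Illustrative pattern approximating the classic PKA consensus
# (R/K)-(R/K)-x-(S/T) with no Pro at +1. It is an assumption for this sketch,
# NOT the regular expression stored in the ELM resource. The zero-width
# lookahead lets overlapping occurrences all be found; group 1 is the acceptor.
PKA_LIKE = re.compile(r"(?=[RK][RK].([ST])[^P])")


def flanking_window(sequence: str, position: int, flank: int = 7) -> str:
    """Return the residue at 1-based `position` plus `flank` residues per side.

    Windows near a terminus are padded with '-' so all have length 2*flank+1.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} outside sequence of length {len(sequence)}")
    i = position - 1
    left = sequence[max(0, i - flank):i].rjust(flank, "-")
    right = sequence[i + 1:i + 1 + flank].ljust(flank, "-")
    return left + sequence[i] + right


def acceptor_matches(window: str, flank: int = 7, pattern: re.Pattern = PKA_LIKE) -> bool:
    """True if the motif matches with its acceptor exactly at the window centre.

    Note: a '-' padding character counts as "not Pro" for [^P], so a site at
    the very C-terminus can still match; a real filter might treat that differently.
    """
    return any(m.start(1) == flank for m in pattern.finditer(window.upper()))


window = flanking_window("LRRASLG", 5)  # Kemptide, a classic PKA substrate peptide
print(window)                    # ---LRRASLG-----
print(acceptor_matches(window))  # True
```

Requiring the match to sit at the window centre ties the regular expression to the annotated residue, rather than to any motif occurrence elsewhere in the flanking sequence.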
Figure 1

The simplified Phospho.ELM database scheme. The key data objects are Substrates (phosphoprotein) and Instances for which relevant information is stored, as well as links to external databases. pkey and fkey stand for "primary key" and "foreign key", respectively.

The database can be searched by substrate protein name, by kinase name (to obtain a list of known substrates), or by phosphopeptide-binding domain (to retrieve all instances that interact with the given domain). An example of a search output is given in Fig. 2.
Figure 2

A) Scheme for the PI3Kp85 protein with domains and phosphorylation sites. B) Output example of keyword search using PI3Kp85. Information about the phosphorylated sites includes the flanking sequence, the PubMed reference, the kinase responsible for the phosphorylation and links to additional information for the substrate and other relevant databases.
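The scheme of Fig. 1 (substrates linked to their instances through primary and foreign keys) and the three search modes described above can be sketched relationally. This is a minimal sketch under stated assumptions: SQLite from the Python standard library stands in for PostgreSQL, the table and column names are hypothetical rather than Phospho.ELM's actual schema, each instance stores a single kinase for brevity, and the rows are placeholders, not real entries.

```python
import sqlite3

# Hypothetical two-table version of the Fig. 1 scheme. SQLite stands in for
# PostgreSQL; names are illustrative, not Phospho.ELM's actual schema. For
# brevity each instance stores one kinase, whereas a site may have several.
SCHEMA = """
CREATE TABLE substrate (
    substrate_id   INTEGER PRIMARY KEY,                -- pkey
    name           TEXT NOT NULL,
    species        TEXT
);
CREATE TABLE instance (
    instance_id    INTEGER PRIMARY KEY,                -- pkey
    substrate_id   INTEGER NOT NULL
                   REFERENCES substrate(substrate_id), -- fkey
    position       INTEGER NOT NULL,                   -- 1-based residue number
    residue        TEXT NOT NULL CHECK (residue IN ('S', 'T', 'Y')),
    kinase         TEXT,                               -- NULL if unknown
    binding_domain TEXT,                               -- e.g. 'SH2'
    pmid           INTEGER                             -- literature reference
);
"""

JOIN = "FROM instance i JOIN substrate s ON s.substrate_id = i.substrate_id "


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the fkey link
    conn.executescript(SCHEMA)
    # Hypothetical placeholder rows, not real Phospho.ELM entries.
    conn.executemany("INSERT INTO substrate VALUES (?, ?, ?)",
                     [(1, "protein_A", "human"), (2, "protein_B", "mouse")])
    conn.executemany("INSERT INTO instance VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 1, 10, "S", "kinase_X", None, None),
        (2, 1, 42, "Y", "kinase_Y", "SH2", None),
        (3, 2, 7, "T", "kinase_X", None, None),
    ])
    return conn


def sites_for_protein(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Search by substrate name: all annotated sites on that protein."""
    return conn.execute("SELECT i.position, i.residue, i.kinase " + JOIN +
                        "WHERE s.name = ? ORDER BY i.position", (name,)).fetchall()


def substrates_for_kinase(conn: sqlite3.Connection, kinase: str) -> list[str]:
    """Search by kinase name: known substrates of that kinase."""
    rows = conn.execute("SELECT DISTINCT s.name " + JOIN +
                        "WHERE i.kinase = ? ORDER BY s.name", (kinase,))
    return [name for (name,) in rows]


def instances_for_domain(conn: sqlite3.Connection, domain: str) -> list[tuple]:
    """Search by phosphopeptide-binding domain: instances bound by that domain."""
    return conn.execute("SELECT s.name, i.position, i.residue " + JOIN +
                        "WHERE i.binding_domain = ? ORDER BY s.name, i.position",
                        (domain,)).fetchall()


conn = build_demo_db()
print(sites_for_protein(conn, "protein_A"))     # [(10, 'S', 'kinase_X'), (42, 'Y', 'kinase_Y')]
print(substrates_for_kinase(conn, "kinase_X"))  # ['protein_A', 'protein_B']
print(instances_for_domain(conn, "SH2"))        # [('protein_A', 42, 'Y')]
```

A production design would more likely move kinases and binding domains into their own tables with link tables, so that one site can carry several kinases and several literature references.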

Utility and discussion

The Phospho.ELM server will allow both 'wet-lab' biologists and bioinformaticians to retrieve extensive information about phosphoproteins easily. Further advances in kinase-specific phosphorylation site prediction require advanced algorithms combined with high-quality annotation of phosphorylation data, so Phospho.ELM is a valuable source of reliable data for developing new predictors. Currently, sufficient data for training a machine-learning method (e.g. circa 25 instances for a neural network) are available only for the best-characterized kinases; however, this number is expected to increase rapidly as a result of high-throughput proteomics initiatives. A method for kinase-specific substrate prediction for six S/T kinases has recently been developed at the Center for Biological Sequence Analysis (N. Blom, personal communication).
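As a rough illustration of the training-data constraint, the instance counts in Table 1 can be checked against the circa-25 figure. This sketch assumes that the threshold applies per kinase to the raw instance count and ignores redundancy among sites. It covers only the selected kinases listed in Table 1, not all 119. The names `TABLE_1` and `trainable_kinases` are hypothetical.

```python
# Figures transcribed from Table 1: kinase -> (class, substrates, instances).
TABLE_1: dict[str, tuple[str, int, int]] = {
    "CK2": ("Ser/Thr kinase", 54, 138),
    "PKA": ("Ser/Thr kinase", 88, 170),
    "PDK1": ("Ser/Thr kinase", 12, 17),
    "Src": ("non-receptor Tyr kinase", 40, 69),
    "Abl": ("non-receptor Tyr kinase", 14, 21),
    "FAK": ("non-receptor Tyr kinase", 7, 11),
    "IR": ("receptor Tyr kinase", 11, 36),
    "EGFR": ("receptor Tyr kinase", 19, 43),
}

# Rule of thumb quoted in the text: circa 25 instances to train a neural network.
MIN_INSTANCES = 25


def trainable_kinases(table: dict[str, tuple[str, int, int]],
                      min_instances: int = MIN_INSTANCES) -> list[str]:
    """Kinases in `table` whose recorded instance count reaches `min_instances`."""
    return sorted(k for k, (_cls, _subs, n) in table.items() if n >= min_instances)


print(trainable_kinases(TABLE_1))  # ['CK2', 'EGFR', 'IR', 'PKA', 'Src']
```

Under this assumption, PDK1, Abl and FAK fall below the threshold, which fits the text's point that only the best-characterized kinases currently have enough data.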

Conclusions

Currently, the set of known protein modification sites that regulate the cell is poorly integrated into bioinformatics resources. This hampers the research of systems biologists and of research groups both large and small. With Phospho.ELM we are working towards improving the catalogue of phosphorylation sites. Users are encouraged to help keep the database up to date by submitting additional information and their own datasets of phosphorylation sites for integration into Phospho.ELM. Those interested in becoming a data submission partner can send an email to phospho@elm.eu.org.

Availability and requirements

Phospho.ELM can be accessed on the public Apache2-powered website at http://phospho.elm.eu.org.

Authors' contributions

FD and SC were responsible for the annotation process and the Web design. Design of the database structure and implementation of the server software is credited to CG. RL contributed to the analysis of the data. AV is involved in linking structural databases. TSP implemented the PhosphoBase database. BK, NB and TJG were responsible for the overall project coordination. All authors read and approved the final manuscript.
References

1. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites.
Authors: N Blom; S Gammeltoft; S Brunak
Journal: J Mol Biol; Date: 1999-12-17

2. Signaling--2000 and beyond.
Authors: T Hunter
Journal: Cell; Date: 2000-01-07

3. The origins of protein phosphorylation.
Authors: Philip Cohen
Journal: Nat Cell Biol; Date: 2002-05

4. The Gene Ontology (GO) database and informatics resource.
Authors: M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal: Nucleic Acids Res; Date: 2004-01-01

5. E-MSD: the European Bioinformatics Institute Macromolecular Structure Database.
Authors: H Boutselakis; D Dimitropoulos; J Fillon; A Golovin; K Henrick; A Hussain; J Ionides; M John; P A Keller; E Krissinel; P McNeil; A Naim; R Newman; T Oldfield; J Pineda; A Rachedi; J Copeland; A Sitnov; S Sobhany; A Suarez-Uruena; J Swaminathan; M Tagari; J Tate; S Tromm; S Velankar; W Vranken
Journal: Nucleic Acids Res; Date: 2003-01-01

6. ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins.
Authors: Pål Puntervoll; Rune Linding; Christine Gemünd; Sophie Chabanis-Davidson; Morten Mattingsdal; Scott Cameron; David M A Martin; Gabriele Ausiello; Barbara Brannetti; Anna Costantini; Fabrizio Ferrè; Vincenza Maselli; Allegra Via; Gianni Cesareni; Francesca Diella; Giulio Superti-Furga; Lucjan Wyrwicz; Chenna Ramu; Caroline McGuigan; Rambabu Gudavalli; Ivica Letunic; Peer Bork; Leszek Rychlewski; Bernhard Küster; Manuela Helmer-Citterich; William N Hunter; Rein Aasland; Toby J Gibson
Journal: Nucleic Acids Res; Date: 2003-07-01

7. MINT: a Molecular INTeraction database.
Authors: Andreas Zanzoni; Luisa Montecchi-Palazzi; Michele Quondam; Gabriele Ausiello; Manuela Helmer-Citterich; Gianni Cesareni
Journal: FEBS Lett; Date: 2002-02-20

8. Phosphospecific proteolysis for mapping sites of protein phosphorylation.
Authors: Zachary A Knight; Birgit Schilling; Richard H Row; Denise M Kenski; Bradford W Gibson; Kevan M Shokat
Journal: Nat Biotechnol; Date: 2003-08-17

9. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.
Authors: Henning Hermjakob; Luisa Montecchi-Palazzi; Gary Bader; Jérôme Wojcik; Lukasz Salwinski; Arnaud Ceol; Susan Moore; Sandra Orchard; Ugis Sarkans; Christian von Mering; Bernd Roechert; Sylvain Poux; Eva Jung; Henning Mersch; Paul Kersey; Michael Lappe; Yixue Li; Rong Zeng; Debashis Rana; Macha Nikolski; Holger Husi; Christine Brun; K Shanker; Seth G N Grant; Chris Sander; Peer Bork; Weimin Zhu; Akhilesh Pandey; Alvis Brazma; Bernard Jacq; Marc Vidal; David Sherman; Pierre Legrain; Gianni Cesareni; Ioannis Xenarios; David Eisenberg; Boris Steipe; Chris Hogue; Rolf Apweiler
Journal: Nat Biotechnol; Date: 2004-02

10. A motif-based profile scanning approach for genome-wide prediction of signaling pathways.
Authors: M B Yaffe; G G Leparc; J Lai; T Obata; S Volinia; L C Cantley
Journal: Nat Biotechnol; Date: 2001-04