
Phospho.ELM: a database of phosphorylation sites--update 2008.

Francesca Diella, Cathryn M Gould, Claudia Chica, Allegra Via, Toby J Gibson.

Abstract

Phospho.ELM is a manually curated database of eukaryotic phosphorylation sites. The resource includes data collected from published literature as well as high-throughput data sets. The current release of Phospho.ELM (version 7.0, July 2007) contains 4078 phospho-protein sequences covering 12 025 phospho-serine, 2362 phospho-threonine and 2083 phospho-tyrosine sites. The entries provide information about the phosphorylated proteins and the exact position of known phosphorylated instances, the kinases responsible for the modification (where known) and links to bibliographic references. The database entries have hyperlinks to easily access further information from UniProt, PubMed, SMART, ELM, MSD as well as links to the protein interaction databases MINT and STRING. A new BLAST search tool, complementary to retrieval by keyword and UniProt accession number, allows users to submit a protein query (by sequence or UniProt accession) to search against the curated data set of phosphorylated peptides. Phospho.ELM is available on line at: http://phospho.elm.eu.org.

Year:  2007        PMID: 17962309      PMCID: PMC2238828          DOI: 10.1093/nar/gkm772

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Protein phosphorylation is one of the most-studied post-translational modifications: it has been estimated that up to one-third of proteins may be modified by protein kinases (1). This ubiquitous regulatory mechanism controls many biological processes, including cellular growth, differentiation and DNA repair (2). Knowing which residues in a protein are phosphorylated is central to understanding the signaling events in which the protein takes part; therefore much effort has been invested in identifying and characterizing phosphorylation sites. Traditional methods for measuring protein phosphorylation, such as mutational analysis and Edman degradation chemistry on phosphopeptides, are relatively time consuming and laborious, and require large amounts of purified protein. In contrast, mass spectrometry (MS)-based methods have emerged as powerful tools for the analysis of post-translational modifications owing to their higher sensitivity, selectivity and speed. Over the past few years MS, combined with enrichment strategies for phosphorylated proteins, e.g. isotope-coded affinity tags (ICAT) (3), stable isotope labeling by amino acids in cell culture (SILAC) (4) and the isobaric reagent iTRAQ (5), has been increasingly employed to identify novel phosphorylation sites. One consequence of this change in phosphorylation research is that bioinformatics resources need to be adapted and expanded to accommodate the new data. For the thousands of phosphorylation sites identified by phosphoproteomic MS, the information on which kinase phosphorylates them, and consequently the pathway in which they act, is still missing.
To improve the link between experimentally identified phosphorylation sites and protein kinases, Linding and collaborators (6) have recently used the Phospho.ELM data set to develop and train a method, NetworKIN (http://networkin.info/), that combines computational methods for predicting which group of kinases is likely to phosphorylate a given site with information about signaling pathways and protein interaction data. The analysis of protein phosphorylation by MS will clearly prove to be an invaluable source of information for understanding cellular signaling. For this reason, we consider it increasingly important to create and maintain publicly available phospho-protein databases, where the exponentially increasing number of known phosphorylation sites (7–12) can be easily accessed by the research community.

MATERIALS AND METHODS

The Phospho.ELM database

The content and the format of Phospho.ELM have been previously described in Diella et al. (13). While the general format of the database has remained essentially unchanged, some additions have been implemented to improve the data retrieval and presentation. The updated version also contains a much larger number of phosphorylation sites (see Figure 1), a new search tool based on sequence comparison and a Web Services interface.
Figure 1.

The plot shows the growth of the Phospho.ELM data set beginning with version 1.0 in December 2003 (panel A). The exponential growth of the phosphorylation instances from Version 5.0 is mainly due to incorporation of the high-throughput data sets. The overlapping of the instances derived from low-throughput (LTP) and high-throughput (HTP) experiments is also shown (panel B).

The user can query the database by protein name, UniProt accession number/identifier, kinase name or binding motif to get a list of all known phosphorylation sites (instances) in a specific protein. The main results page summarizes information about the substrate protein (e.g. a brief description of the protein, protein type, the UniProt protein identification number), the phosphorylation sites contained within it and their surrounding amino acids (+/−10). The annotations to each instance include (where available) the PubMed reference, the kinase(s) phosphorylating the given site, the phospho-peptide binding domain(s) and a link to the ELM server (14) to retrieve further information about the kinase. Also where available, hyperlinks are provided to protein structures containing phosphorylated residues (15). Recently, Zanzoni and collaborators (16) have developed Phospho3D, a database of three-dimensional structures of phosphorylation sites, which stores data derived from the Phospho.ELM database and is focused on the annotation of structural information at the residue level. Additional information for each protein kinase substrate includes the subcellular compartment [annotated with the Gene Ontology terms (17)], the tissue distribution and a list of interaction partners derived from the MINT (18) and STRING databases (19). The STRING interactors are shown in a summary graphic (network) that opens in a pop-up window. The network views provide links to the STRING database, where the information on the interactors is described in detail.
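As an illustration of the +/−10 sequence context shown for each instance, the following is a minimal sketch of how such a window could be extracted from a substrate sequence. The function name, the padding character and the toy sequence are assumptions made for this example; they are not taken from the Phospho.ELM code base.

```python
def flanking_window(sequence: str, position: int, flank: int = 10, pad: str = "-") -> str:
    """Return the residue at 1-based `position` with `flank` residues on each side.

    Positions that fall beyond either end of the sequence are filled with the
    single character `pad`, so the central residue always sits at index `flank`.
    (Hypothetical helper; not part of Phospho.ELM.)
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    idx = position - 1                                  # convert to 0-based index
    left = sequence[max(0, idx - flank):idx]            # up to `flank` residues upstream
    right = sequence[idx + 1:idx + 1 + flank]           # up to `flank` residues downstream
    return left.rjust(flank, pad) + sequence[idx] + right.ljust(flank, pad)


# Toy sequence for illustration only; the serine is at position 16.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
window = flanking_window(toy_sequence, 16)
```

Padding keeps every window the same length, which makes windows from different proteins directly comparable column by column.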

Data set

The current release of the Phospho.ELM data set (version 7.0, July 2007) contains 4078 phospho-protein sequences covering 12 025 phospho-serine, 2362 phospho-threonine and 2083 phospho-tyrosine sites, for a total of 16 470 sites. The data set is currently limited to metazoan species. This is partly due to our annotation capacity and partly because the kinases and nomenclature are so different in other lineages that they should be placed in separate databases. Although no animal species is purposely excluded from the data, human (11 197 phospho-sites) and mouse (2073 phospho-sites) are currently the most represented species because of their prevalence as model organisms in biological research, e.g. phosphoproteomic MS analyses have been mainly performed on human/mouse cell lines/tissues. For each phospho-site we report whether the phosphorylation evidence comes from small-scale analyses (low throughput; LTP), which typically focus on one or a few proteins at a time, or from large-scale experiments (high throughput; HTP), which mainly apply MS techniques. It is noteworthy that in our data set there is only a small overlap between instances identified by LTP and HTP experiments (Figure 1). This implies that most of the human phosphoproteome remains to be discovered. Figure 1 also shows that the identification of additional phosphorylation sites on proteins has been increasing at a much faster rate than the identification of novel phosphoproteins (e.g. see the SRRM2 protein, UniProt accession Q9UQ35). While this reveals that many more proteins are heavily phosphorylated than was previously known, it may be worth investigating whether the data also imply a strong bias in the proteins retrieved in the MS experiments. The kinase responsible for the phosphorylation is known for ∼21% of the Phospho.ELM instances. Currently, more than 250 kinases are annotated in the database (for a detailed list of the kinases see the related information at the Phospho.ELM home page).
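The residue counts quoted above can be checked and turned into proportions with a few lines of Python. This is only a worked-arithmetic sketch: the counts are those reported for version 7.0, while the variable and function names are chosen for this example.

```python
# Site counts reported for Phospho.ELM version 7.0 (July 2007).
SITES = {
    "phospho-serine": 12025,
    "phospho-threonine": 2362,
    "phospho-tyrosine": 2083,
}


def residue_shares(counts):
    """Return the total number of sites and each residue type's fraction of it."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


total, shares = residue_shares(SITES)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print("total sites:", total)
```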

The PhosphoBLAST search tool

A BLAST search has been implemented which is complementary to the retrieval by keyword or UniProt accession/identifier. This tool identifies phospho-peptides contained in the query sequence that match those stored in Phospho.ELM (Figure 2). It consists of a two-step process: a BLAST (20) search and a parsing of the BLAST output. The BLAST program performs a sequence-similarity search against the Phospho.ELM data set of peptides (16 471), which have been experimentally proven to contain phospho-residues. It returns a set of local gapped alignments between the query sequence peptides and the phospho-peptides. In the parsing stage, those matches that present more than 70% sequence similarity and that conserve the phospho-residue in the same position as the corresponding phospho-peptide are selected. The final output shows the list of chosen matches, with their alignments and links to database records.
Figure 2.

Output example of a PhosphoBLAST search using the Danio rerio Aurora A kinase sequence as the query. The summary graphic shows the phospho-hits on the query sequence and features from SMART. Details about the matches are shown below in the results table. By clicking on the ‘subject name’, users can retrieve additional information about the matched Phospho.ELM phosphorylated sites, including the flanking sequence, the PubMed reference, the kinase responsible for the phosphorylation (where known) and links to additional information for the substrate and other relevant databases.

The PhosphoBLAST tool does not aim to predict phosphorylation motifs in the query protein and is primarily useful for retrieving phosphorylation sites that are conserved in related proteins (whether orthologs or paralogs). Nevertheless, unrelated query proteins occasionally yield matching phosphorylation sites in Phospho.ELM that can be equally interesting: it is up to the user to consider carefully the possible biological meaning (e.g. shared kinase and/or phospho-peptide-binding domain specificities) associated with these match(es).
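The parsing stage of PhosphoBLAST (keep matches above 70% similarity whose phospho-residue is conserved at the aligned position) can be sketched as follows. This is an illustrative approximation rather than the tool's actual implementation: the `Hit` record and all function names are hypothetical, and percent identity over the alignment columns stands in for the BLAST similarity measure.

```python
from dataclasses import dataclass

PHOSPHO_RESIDUES = {"S", "T", "Y"}


@dataclass
class Hit:
    """One gapped local alignment (hypothetical record, not a BLAST data type)."""
    query_aln: str      # aligned query segment, '-' marks a gap
    subject_aln: str    # aligned phospho-peptide segment, '-' marks a gap
    subject_start: int  # 1-based peptide position of the first aligned subject residue
    phospho_pos: int    # 1-based peptide position of the phospho-residue


def percent_identity(hit: Hit) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if len(hit.query_aln) != len(hit.subject_aln) or not hit.query_aln:
        raise ValueError("aligned segments must be non-empty and of equal length")
    matches = sum(1 for q, s in zip(hit.query_aln, hit.subject_aln) if q == s and q != "-")
    return 100.0 * matches / len(hit.query_aln)


def phospho_residue_conserved(hit: Hit) -> bool:
    """True if the query carries the same S/T/Y in the column of the phospho-residue."""
    pos = hit.subject_start - 1
    for q, s in zip(hit.query_aln, hit.subject_aln):
        if s == "-":
            continue                    # gap in the subject: peptide position unchanged
        pos += 1
        if pos == hit.phospho_pos:
            return q == s and q in PHOSPHO_RESIDUES
    return False                        # phospho-residue lies outside the aligned region


def select_hits(hits, min_percent: float = 70.0):
    """Keep hits above the threshold that also conserve the phospho-residue."""
    return [h for h in hits if percent_identity(h) > min_percent and phospho_residue_conserved(h)]


# Toy alignments for illustration only.
hits = [
    Hit("RRASVAG", "RRASLAG", 1, 4),    # 6/7 identical, serine conserved
    Hit("RRAAVAG", "RRASVAG", 1, 4),    # 6/7 identical, serine replaced by alanine
    Hit("KKTSPQR", "RRASLAG", 1, 4),    # serine conserved, but only 1/7 identical
]
selected = select_hits(hits)
```

Checking the phospho-residue column separately from the overall score matters because a highly similar alignment can still replace the modified residue, as the second toy hit shows.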

Web service

To facilitate remote tool integration, a Web Service that provides programmatic access to the Phospho.ELM database has been implemented and is available at: http://phospho.elm.eu.org/webservice/phosphoELMdb.wsdl. The WSDL (Web Service Description Language) (21) file is WS-I compatible. The WS-Interoperability Basic Profile (22) proposes a set of rules to achieve interoperability of web services between different platforms. The WSDL file implements an XML wrapped document/literal style (23). The backend code is implemented in Java and runs on Axis2 (24) inside a Tomcat servlet container (25). The functionality provided by the Web Service encompasses the current interface functionality with some additional filters. The extra options implemented in the Web Service are to search by PubMed ID and to retrieve all instances with a PDB entry assigned to them.
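A client would normally start by inspecting the WSDL file to discover the operations on offer. The sketch below lists operation names from a WSDL 1.1 document using only the Python standard library. It assumes the service description follows WSDL 1.1 (the version targeted by the WS-I Basic Profile); the embedded sample document and its operation names are invented placeholders, not the real Phospho.ELM interface.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}


def list_operations(wsdl_text: str) -> dict:
    """Map each portType name in a WSDL 1.1 document to its operation names."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        names = [op.get("name") for op in port_type.findall("wsdl:operation", WSDL_NS)]
        operations[port_type.get("name")] = names
    return operations


# Placeholder WSDL fragment: every name below is hypothetical.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="ExampleService">
  <portType name="ExamplePortType">
    <operation name="exampleSearchByPubMedId"/>
    <operation name="exampleListInstancesWithPDB"/>
  </portType>
</definitions>"""

operations = list_operations(SAMPLE_WSDL)
```

To apply the same function to a live service, the WSDL text would first be downloaded from the address given above, for example with `urllib.request.urlopen`, provided that the endpoint is still reachable.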

Database access

Phospho.ELM is developed and deployed with open source software (26). Software is developed in Python including some modules from the BioPython project (27) to retrieve information from UniProt and PubMed. The web interface software uses the CGImodel framework (28). The data set is publicly available for academic users. Phospho.ELM can be accessed on the public Apache2 powered website at: http://phospho.elm.eu.org.

SUMMARY

Since its inception in 2004, the Phospho.ELM data set has been adopted by numerous bioinformatics tools and pipelines, e.g. the protein kinase-specific prediction server GPS (group-based phosphorylation scoring method) (29), RLIMS-P, a rule-based text-mining program designed to extract information on phosphorylation sites from abstracts (30), PhosphoregDB, a database of tissue and sub-cellular distribution of mammalian protein kinases and phosphatases (31), and NetworKIN, a computational approach that combines consensus sequence motifs and contextual data to predict which kinases phosphorylate experimentally identified phosphorylation sites (6). While we anticipate that the Phospho.ELM data set will continue to grow, we consider that the resource should be kept relatively lean in terms of the categories of data to be incorporated. On the other hand, links to external resources are under regular review and are likely to be augmented from time to time. For example, resources such as KEGG (32) and Reactome (33) that annotate cell signaling networks are increasing their pathway coverage, and it will clearly become essential to provide links to such resources. In the near future we intend to equip Phospho.ELM with links to the predicted kinase-substrate relations from the NetworKIN database (R. Linding et al., submitted for publication).
  26 in total

Review 1.  Signaling--2000 and beyond.

Authors:  T Hunter
Journal:  Cell       Date:  2000-01-07       Impact factor: 41.582

2.  The Gene Ontology (GO) database and informatics resource.

Authors:  M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  E-MSD: the European Bioinformatics Institute Macromolecular Structure Database.

Authors:  H Boutselakis; D Dimitropoulos; J Fillon; A Golovin; K Henrick; A Hussain; J Ionides; M John; P A Keller; E Krissinel; P McNeil; A Naim; R Newman; T Oldfield; J Pineda; A Rachedi; J Copeland; A Sitnov; S Sobhany; A Suarez-Uruena; J Swaminathan; M Tagari; J Tate; S Tromm; S Velankar; W Vranken
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

4.  ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins.

Authors:  Pål Puntervoll; Rune Linding; Christine Gemünd; Sophie Chabanis-Davidson; Morten Mattingsdal; Scott Cameron; David M A Martin; Gabriele Ausiello; Barbara Brannetti; Anna Costantini; Fabrizio Ferrè; Vincenza Maselli; Allegra Via; Gianni Cesareni; Francesca Diella; Giulio Superti-Furga; Lucjan Wyrwicz; Chenna Ramu; Caroline McGuigan; Rambabu Gudavalli; Ivica Letunic; Peer Bork; Leszek Rychlewski; Bernhard Küster; Manuela Helmer-Citterich; William N Hunter; Rein Aasland; Toby J Gibson
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

5.  Robust phosphoproteomic profiling of tyrosine phosphorylation sites from human T cells using immobilized metal affinity chromatography and tandem mass spectrometry.

Authors:  Laurence M Brill; Arthur R Salomon; Scott B Ficarro; Mridul Mukherji; Michelle Stettler-Gill; Eric C Peters
Journal:  Anal Chem       Date:  2004-05-15       Impact factor: 6.986

6.  Large-scale characterization of HeLa cell nuclear phosphoproteins.

Authors:  Sean A Beausoleil; Mark Jedrychowski; Daniel Schwartz; Joshua E Elias; Judit Villén; Jiaxu Li; Martin A Cohn; Lewis C Cantley; Steven P Gygi
Journal:  Proc Natl Acad Sci U S A       Date:  2004-08-09       Impact factor: 11.205

7.  Systematic discovery of in vivo phosphorylation networks.

Authors:  Rune Linding; Lars Juhl Jensen; Gerard J Ostheimer; Marcel A T M van Vugt; Claus Jørgensen; Ioana M Miron; Francesca Diella; Karen Colwill; Lorne Taylor; Kelly Elder; Pavel Metalnikov; Vivian Nguyen; Adrian Pasculescu; Jing Jin; Jin Gyoon Park; Leona D Samson; James R Woodgett; Robert B Russell; Peer Bork; Michael B Yaffe; Tony Pawson
Journal:  Cell       Date:  2007-06-14       Impact factor: 41.582

8.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes.

Authors:  S Karlin; S F Altschul
Journal:  Proc Natl Acad Sci U S A       Date:  1990-03       Impact factor: 11.205

9.  Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.

Authors:  Shao-En Ong; Blagoy Blagoev; Irina Kratchmarova; Dan Bach Kristensen; Hanno Steen; Akhilesh Pandey; Matthias Mann
Journal:  Mol Cell Proteomics       Date:  2002-05       Impact factor: 5.911

10.  Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins.

Authors:  Francesca Diella; Scott Cameron; Christine Gemünd; Rune Linding; Allegra Via; Bernhard Kuster; Thomas Sicheritz-Pontén; Nikolaj Blom; Toby J Gibson
Journal:  BMC Bioinformatics       Date:  2004-06-22       Impact factor: 3.169
