Holger Dinkel, Claudia Chica, Allegra Via, Cathryn M Gould, Lars J Jensen, Toby J Gibson, Francesca Diella.
Abstract
The Phospho.ELM resource (http://phospho.elm.eu.org) is a relational database designed to store in vivo and in vitro phosphorylation data extracted from the scientific literature and phosphoproteomic analyses. The resource has been actively developed for more than 7 years and currently comprises 42,574 serine, threonine and tyrosine non-redundant phosphorylation sites. Several new features have been implemented, such as structural disorder/order and accessibility information and a conservation score. Additionally, the conservation of the phosphosites can now be visualized directly on the multiple sequence alignment used for the score calculation. Finally, special emphasis has been put on linking to external resources such as interaction networks and other databases.
Year: 2010 PMID: 21062810 PMCID: PMC3013696 DOI: 10.1093/nar/gkq1104
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Examples of different search options for retrieving data stored in the Phospho.ELM resource
| Input query | Result |
|---|---|
| Protein name (keyword) | Retrieval of phosphorylation sites identified in sequences of the same substrate for multiple species |
| UniProt or Ensembl ACC | Retrieval of phosphorylation sites of a specific sequence |
| Kinase name | Retrieval of phosphorylation sites recognized by the specified kinase |
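| byAccession/src_human.html | Retrieval of the human ‘src’ entry (UniProt ACC P12931), addressed by its UniProt entry name |
| byAccession/P12931.html | Retrieval of the same entry, addressed by its UniProt ACC |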
| byAccession/P12931,P07948.html | Retrieval of all the phosphorylation sites of two proteins (UniProt ACC P12931 and P07948) |
| byAccession/P12931,P07948.csv | Retrieval of a plain output of the phosphorylation sites of two proteins (UniProt ACC P12931 and P07948) |
| byDomain/CBL_SH2.html | Retrieval of phosphorylation sites which bind to the phospho-binding domain CBL_SH2 |
| P12931.fasta | Retrieval of the protein sequence stored in the Phospho.ELM database |
| UniProt ACC or text sequence | Retrieval of phosphorylation sites that are conserved in related proteins (whether orthologues or paralogues) |
| pELMdbws = phosphoELMdbLocator().getphosphoELMdb() | Retrieval of phosphorylation sites recognized by the selected kinase (e.g. ALK) |
| kinaseName = 'ALK' | |
| req = getInstancesByKinaseTextSearchRequestMsg() | |
| req._QueryText = kinaseName | |
| result = pELMdbws.getInstancesByKinaseTextSearch(req) | |
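
The URL-style queries in the table can be assembled programmatically. The following is a minimal sketch, assuming that the `byAccession/` and `byDomain/` paths are resolved relative to the base URL given in the abstract; that path layout, and the helper names `by_accession_url` and `by_domain_url`, are assumptions made for illustration and are not part of the resource itself. The code only builds the URL strings and does not contact the server.

```python
from urllib.parse import quote

# Base URL quoted in the abstract. Joining the table's paths onto it
# is an assumption about the URL layout, not a documented guarantee.
BASE_URL = "http://phospho.elm.eu.org"


def by_accession_url(accessions, fmt="html"):
    """Build a byAccession URL for one or more UniProt accessions.

    Hypothetical helper: 'html' and 'csv' are the two output formats
    that appear in the table.
    """
    if fmt not in ("html", "csv"):
        raise ValueError("fmt must be 'html' or 'csv'")
    if not accessions:
        raise ValueError("at least one accession is required")
    # Several accessions are joined with commas, as in P12931,P07948.
    joined = ",".join(quote(acc.strip(), safe="") for acc in accessions)
    return f"{BASE_URL}/byAccession/{joined}.{fmt}"


def by_domain_url(domain):
    """Build a byDomain URL for a phospho-binding domain such as CBL_SH2."""
    return f"{BASE_URL}/byDomain/{quote(domain, safe='')}.html"


if __name__ == "__main__":
    print(by_accession_url(["P12931", "P07948"], fmt="csv"))
    print(by_domain_url("CBL_SH2"))
```

Keeping URL construction separate from any download step makes it easy to check the generated addresses before issuing requests against the live service.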
Figure 1. Output example of a Phospho.ELM search using the cyclin-dependent kinase inhibitor 1B (UniProt P46527) as query. The results table contains: the phosphorylated residue and its position; surrounding sequence; kinase responsible for the phosphorylation; literature reference; type of source (high-throughput, HTP, or low-throughput, LTP); conservation score (CS); link to ELM database; annotation of domain which binds to the phosphorylated residue; protein domain identified by SMART or Pfam; a disorder score calculated by IUPRED; link to PDB structure; and accessibility score calculated by Phospho3D. The conservation of the instance and the multiple sequence alignment that was used to calculate the CS can be inspected using the JALVIEW plugin (top right). Furthermore, links to Phospho3D and the respective ELM entry are shown at the bottom right.
Figure 2. Venn diagram comparing the sources of Phospho.ELM instances. A total of 4,249 instances have been obtained exclusively by LTP experiments and 37,413 instances solely by HTP assays, while 846 instances were confirmed by both HTP and LTP analyses.
Figure 3. Distribution of the conservation scores (CS) for LTP and HTP instances in the Phospho.ELM database. The CS varies between 0 and 1, where 1 represents the highest conservation. The two distributions differ significantly according to the Kolmogorov-Smirnov test (P-value < 2.2e-16), with the LTP sites being more conserved.
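
To make the comparison behind Figure 3 concrete, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distributions of the two samples. The sketch below computes that statistic in plain Python; the function name `ks_statistic` and the score lists are hypothetical, invented for illustration, and are not Phospho.ELM data. It does not compute a P-value; a statistics library such as SciPy (`scipy.stats.ks_2samp`) would normally be used for that.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions (ECDFs) of the two samples.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The ECDFs are step functions, so the maximum gap is reached
    # at one of the observed data points.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


if __name__ == "__main__":
    # Made-up conservation scores in [0, 1], for illustration only.
    ltp_scores = [0.62, 0.71, 0.80, 0.85, 0.93]
    htp_scores = [0.20, 0.35, 0.41, 0.58, 0.77]
    print(ks_statistic(ltp_scores, htp_scores))
```

A statistic near 0 means the two score distributions are nearly indistinguishable, while a value near 1 means they barely overlap.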
Figure 4. Histograms of IUPRED scores of Phospho.ELM instances within and outside of known domains. Instances with an IUPRED score above 0.5 are predicted to be in a region of polypeptide sequence that is intrinsically disordered (i.e. cannot fold into a stable native structure). Instances that reside outside globular domains have a tendency towards higher IUPRED scores (disordered, lower panel) whereas the scores of instances within domains are more evenly distributed (upper panel). Note that sites mapping outside the known domains are predicted to be predominantly in natively disordered polypeptides.
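
The 0.5 cutoff described in the Figure 4 caption amounts to a simple classification rule. The sketch below applies it to a list of scores; the function names and the example values are hypothetical and chosen only to illustrate the rule, and the scores are not taken from Phospho.ELM.

```python
# Cutoff quoted in the Figure 4 caption: scores above 0.5 are
# predicted to lie in intrinsically disordered regions.
DISORDER_THRESHOLD = 0.5


def is_disordered(iupred_score):
    """Return True if the score is strictly above the 0.5 cutoff."""
    return iupred_score > DISORDER_THRESHOLD


def disordered_fraction(scores):
    """Fraction of sites whose score is above the cutoff."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(is_disordered(s) for s in scores) / len(scores)


if __name__ == "__main__":
    # Made-up IUPRED scores for sites inside and outside domains.
    inside_domains = [0.12, 0.30, 0.48, 0.55, 0.71]
    outside_domains = [0.45, 0.66, 0.78, 0.83, 0.91]
    print(disordered_fraction(inside_domains))
    print(disordered_fraction(outside_domains))
```

Comparing the two fractions gives a single-number summary of the tendency the histograms show: a higher fraction for sites outside domains would correspond to the shift towards disorder described in the caption.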