Neil F W Saunders, Ross I Brinkworth, Thomas Huber, Bruce E Kemp, Bostjan Kobe.
Abstract
BACKGROUND: We have previously described an approach to predicting the substrate specificity of serine-threonine protein kinases. The method, named Predikin, identifies key conserved substrate-determining residues in the kinase catalytic domain that contact the substrate in the region of the phosphorylation site and so determine the sequence surrounding the phosphorylation site. Predikin was implemented originally as a web application written in JavaScript.
Year: 2008 PMID: 18501020 PMCID: PMC2412879 DOI: 10.1186/1471-2105-9-245
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Design and construction of the PredikinDB database. (a) Illustration showing how a UniProt entry is parsed to link protein kinase sequences (names in bold) with phosphorylation sites. (b) PredikinDB table schema showing links between fields. Field headers in italics are primary keys. Abbreviations in parentheses indicate the UniProt line from which the field was derived. For clarity, 38 fields containing key protein kinase residues used in substrate prediction are summarised as one field.
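The parsing step in panel (a) can be sketched roughly as follows. This is a minimal illustration, not the PredikinDB parser: it assumes old-style UniProt feature lines of the form `FT   MOD_RES  start  end  Phosphoserine; by KINASE.`, the function name `parse_phosphosites` is hypothetical, and the example entry is invented for demonstration.

```python
import re

# Assumed layout of an old-style UniProt feature line; real entries may differ.
MOD_RES = re.compile(
    r"^FT\s+MOD_RES\s+(\d+)\s+\d+\s+"
    r"Phospho(serine|threonine|tyrosine)(?:; by ([^.]+))?"
)
ONE_LETTER = {"serine": "S", "threonine": "T", "tyrosine": "Y"}


def parse_phosphosites(entry_text):
    """Return (position, residue, kinase_names) for each phosphorylation site."""
    sites = []
    for line in entry_text.splitlines():
        match = MOD_RES.match(line)
        if match is None:
            continue  # not a phosphorylation MOD_RES line
        position = int(match.group(1))
        residue = ONE_LETTER[match.group(2)]
        # "by A, B and C" lists the kinases annotated for this site, if any.
        kinases = []
        if match.group(3):
            kinases = [k.strip() for k in re.split(r",\s*|\s+and\s+", match.group(3))]
        sites.append((position, residue, kinases))
    return sites


# Invented example lines (hypothetical, not taken from a real entry).
EXAMPLE = """\
FT   MOD_RES      10     10       Phosphoserine; by PKA.
FT   MOD_RES      42     42       Phosphothreonine.
FT   MOD_RES      77     77       Phosphotyrosine; by SRC and ABL.
"""

sites = parse_phosphosites(EXAMPLE)
```

Sites with an empty kinase list correspond to phosphorylation sites that cannot be linked to a kinase sequence.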
Summary of current holdings in the PredikinDB database
| Unique substrates | 17,960 | 5,193 |
| Unique substrates linked to a kinase sequence | 707 | 459 |
| Phosphorylation sites | 55,044 | 8,100 |
| Sites linked to a kinase sequence | 1,448 | 887 |
| Unique kinase sequences linked to a phosphorylation site | 398 | 393 |
Figure 2. Construction of substrate scoring matrices using SQL queries to the PredikinDB database. Schematic showing how sequence features from a query protein kinase are used to query PredikinDB and generate Predikin scoring matrices.
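As a rough sketch of the idea in this schematic, the snippet below queries an in-memory SQLite table and turns the returned heptapeptides into a per-position frequency matrix. The table name `site`, its columns, the grouping keys `EXAMPLE_A` and `EXAMPLE_B`, and the function `build_matrix` are all hypothetical; the real PredikinDB schema and the Predikin scoring function are not reproduced here. The peptide strings are reused from the usage-case table purely as sample input.

```python
import sqlite3
from collections import Counter


def build_matrix(conn, sdr_residues):
    """Per-position amino acid frequencies for peptides linked to one SDR key."""
    rows = conn.execute(
        "SELECT heptapeptide FROM site WHERE sdr_residues = ?", (sdr_residues,)
    ).fetchall()
    peptides = [row[0] for row in rows]
    if not peptides:
        return []
    columns = [Counter() for _ in range(7)]  # one counter per heptapeptide position
    for peptide in peptides:
        for index, amino_acid in enumerate(peptide):
            columns[index][amino_acid] += 1
    total = len(peptides)
    return [{aa: count / total for aa, count in col.items()} for col in columns]


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (sdr_residues TEXT, heptapeptide TEXT)")
conn.executemany(
    "INSERT INTO site VALUES (?, ?)",
    [
        ("EXAMPLE_A", "KRATMVG"),
        ("EXAMPLE_A", "KRNSITE"),
        ("EXAMPLE_A", "KRKSTTP"),
        ("EXAMPLE_B", "RATSFFG"),
    ],
)
matrix = build_matrix(conn, "EXAMPLE_A")
```

Each element of `matrix` maps the amino acids observed at one heptapeptide position to their relative frequency among the selected sites.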
Figure 3. Location of substrate-determining residues in protein kinase A using HMM alignment. The profile HMM S_TKc from the SMART database was aligned to rat PKA (UniProt accession P27791) using the HMMER program hmmsearch. The 6 motifs used to locate SDRs are shown in bold. SDRs are underlined. The KE loop, used to determine the SDRs for the substrate +2 position, is italicised. SDRs used in substrate prediction for Ser/Thr-kinases are summarised under the alignment. Position refers to the number of residues N- or C-terminal to the substrate phosphorylation site. SDRs that determine the +2 position depend on KE loop length as follows: length 12–17 = AMK+10, AMK+11, AMK+12; length 18–20 = AMK+12, AMK+13, AMK+14; length < 12 or > 20 = E-7, E-6, E-5.
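The KE loop rule at the end of this caption can be written as a small lookup. This is only a sketch of the stated rule: the function name `plus2_sdrs` is hypothetical, and how the KE loop length is measured from the alignment is not shown here.

```python
def plus2_sdrs(ke_loop_length):
    """Return the SDR labels for the substrate +2 position, given KE loop length."""
    if 12 <= ke_loop_length <= 17:
        return ["AMK+10", "AMK+11", "AMK+12"]
    if 18 <= ke_loop_length <= 20:
        return ["AMK+12", "AMK+13", "AMK+14"]
    # Loops shorter than 12 or longer than 20 residues use the E-relative offsets.
    return ["E-7", "E-6", "E-5"]
```

For example, a loop of 15 residues selects the AMK+10 to AMK+12 positions, while a loop of 25 residues falls back to E-7 to E-5.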
DisEMBL and TMHMM predictions for phosphorylation sites in the PredikinDB database
| Residue | Experimental sites (1) | Predicted disordered (2) | Predicted TM helix (3) |
| S | 17,575 | 16,596 (94.4) | 6 (0.03) |
| T | 3,705 | 3,371 (91.0) | 6 (0.16) |
| Y | 1,929 | 1,410 (73.1) | 5 (0.26) |
| Total | 23,209 | 21,377 (92.1) | 17 (0.07) |
(1) Number of phosphoresidues annotated "experimental"
(2) Number and percentage of phosphoresidues predicted as disordered using at least one DisEMBL method
(3) Number and percentage of phosphoresidues predicted as TM helix using TMHMM
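The totals and percentages in this table can be recomputed from the per-residue counts. The sketch below copies the counts from the table; the variable and function names are illustrative, and the rounding (one decimal place for disorder, two for TM helix) is an assumption based on how the values are printed.

```python
# (experimental sites, predicted disordered, predicted TM helix) per residue,
# copied from the table above.
counts = {
    "S": (17575, 16596, 6),
    "T": (3705, 3371, 6),
    "Y": (1929, 1410, 5),
}


def percent(part, whole, digits):
    """Percentage of `part` in `whole`, rounded to `digits` decimal places."""
    return round(100.0 * part / whole, digits)


# Column-wise sums give the "Total" row.
totals = tuple(sum(column) for column in zip(*counts.values()))

summary = {
    residue: (percent(disordered, n, 1), percent(tm_helix, n, 2))
    for residue, (n, disordered, tm_helix) in counts.items()
}
summary["Total"] = (
    percent(totals[1], totals[0], 1),
    percent(totals[2], totals[0], 2),
)
```

Run on these counts, the column sums come to 23,209 sites, 21,377 predicted disordered and 17 predicted TM helix, in agreement with the Total row.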
Area under ROC curve (AUC), sensitivity (Sn) and specificity (Sp) values for Predikin and five comparable methods
| 0.86 (0.04) | 75.5 (9.2) | 86.6 (7.2) | 0.93 (0.02) | 89.4 (2.9) | 91.3 (2.1) | n/a | n/a | n/a | |
| 0.86 (0.05) | 73.7 (10.1) | 90.0 (9.3) | 0.88 (0.02) | 83.8 (3.5) | 94.1 (1.1) | 0.76 (0.07) | 73.0 (13.3) | 79.7 (17.9) | |
| 0.88 (0.04) | 74.6 (5.0) | 94.2 (1.8) | 0.91 (0.03) | 85.5 (6.4) | 93.4 (2.1) | 0.66 (0.09) | 61.0 (5.5) | 79.9 (13.8) | |
| 0.83 | 76.0 | 87.2 | 0.94 | 97.8 | 89.8 | 0.72 | 56.0 | 88.2 | |
| 0.78 | 52.9 | 92.5 | 0.95 | 90.8 | 86.6 | 0.89 | 80.0 | 85.1 | |
| 0.90 | 86.3 | 78.8 | 0.57 | 16.8 | 95.4 | 0.68 | 60.0 | 71.4 | |
| 0.92 | 92.2 | 83.6 | 0.95 | 97.8 | 89.5 | 0.81 | 60.0 | 98.1 | |
| 0.95 | 86.3 | 93.3 | 0.94 | 94.6 | 87.8 | 0.70 | 64.0 | 93.2 | |
(1) SDR method not applicable to Tyr kinases
(2) AUC, sensitivity and specificity values are the mean and standard deviation (in parentheses) of 10 cross-validation tests
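For readers unfamiliar with these metrics, the sketch below shows one common way to compute sensitivity, specificity and AUC from scored sites. It is not the evaluation code used for this table: the function names, the threshold and the toy scores are invented for illustration, and AUC is computed here with the pairwise rank formulation (the fraction of positive/negative pairs in which the positive scores higher, counting ties as one half).

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (Sn, Sp) as percentages, calling a site positive if score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented toy data: 1 marks a true phosphorylation site, 0 a non-site.
toy_scores = [90, 80, 70, 60, 50, 40]
toy_labels = [1, 1, 0, 1, 0, 0]

sn, sp = sensitivity_specificity(toy_scores, toy_labels, threshold=65)
area = auc(toy_scores, toy_labels)
```

In a cross-validation setting, these values would be computed once per test fold and then summarised as a mean and standard deviation.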
Predikin scores for two usage cases
| Substrate (1) | Position | Heptapeptide | Score | Kinase (2) | Score |
| CLA4 | 727 | KRATMVG | 92.93 | NP_592843 | 86.62 |
| YOL113W | 541 | KRATMVG | 92.93 | NP_594393 | 84.73 |
| YHL021C | 129 | KGSSFVS | 91.87 | NP_595739 | 81.60 |
| YKR010C | 527 | KRNSITE | 91.70 | NP_595616 | 81.60 |
| YNL049C | 526 | RATSFFG | 90.14 | NP_595629 | 81.60 |
| YDL056W | 477 | KRKSTTP | 88.70 | NP_587921 | 81.60 |
| YOL157C | 527 | KLFSFTK | 88.25 | NP_596349 | 81.60 |
| YBR198C | 157 | RAYSMLK | 87.71 | NP_595795 | 68.41 |
(1) Top 8 Predikin scores (SDR method) from a set of 163 putative substrates for protein kinase CLA4 (UniProt accession P48562) from S. cerevisiae
(2) Top 8 Predikin scores (KSD method) for kinases at SPTSPSY repeats in substrate Rpb1 (UniProt accession P36594) from S. pombe