Abstract
BACKGROUND: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in cellular processes. With the availability of high-throughput mass spectrometry-based experiments, the desire to annotate the catalytic kinases for in vivo phosphorylation sites has motivated the development of a variety of computational methods for large-scale prediction of kinase-specific phosphorylation sites. However, most of the proposed methods rely solely on the local amino acid sequences surrounding the phosphorylation sites. The increasing number of available three-dimensional structures makes it possible to physically investigate the structural environment of phosphorylation sites.
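To make the notion of "local amino acid sequences surrounding the phosphorylation sites" concrete, the following is a minimal sketch of how a sequence-only method might extract a fixed-width window around each candidate S, T, or Y residue. The function name `sequence_windows`, the flank width, the `-` padding character, and the toy sequence are all assumptions made for illustration; none of them come from the paper.

```python
def sequence_windows(sequence, flank=6, residues="STY", pad="-"):
    """Return (1-based position, residue, window) for each candidate site.

    `flank` is the number of residues kept on each side of the site.
    Its default is an illustrative assumption, not the paper's setting.
    """
    # Pad both ends so that sites near the termini still yield
    # windows of the full length 2 * flank + 1.
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            # Residue i sits at index i + flank in `padded`, so the slice
            # starting at i with length 2 * flank + 1 is centred on it.
            windows.append((i + 1, aa, padded[i:i + 2 * flank + 1]))
    return windows


# Made-up toy sequence, used only to show the output shape.
toy_sequence = "MKRSPTAYLE"
for position, residue, window in sequence_windows(toy_sequence, flank=3):
    print(position, residue, window)
```

Each window contains only residues that are adjacent along the chain, which is the limitation the abstract points to: residues that are close in space but distant in sequence never enter the window.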
Year: 2013 PMID: 24564522 PMCID: PMC3853090 DOI: 10.1186/1471-2105-14-S16-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. System flow of the proposed method.
Data statistics of experimentally verified phosphorylation sites in each resource.
| Data set | Data resource | Version | Number of phosphorylation sites (S) | Number of phosphorylation sites (T) | Number of phosphorylation sites (Y) | Number of phosphorylated proteins |
|---|---|---|---|---|---|---|
| | Phospho.ELM | 9.0 | 26,136 | 6,316 | 3,118 | 8,690 |
| | UniProtKB | 20120711 | 92,221 | 23,289 | 14,337 | 34,040 |
| - | | | | | | |
| | PhosphoSitePlus | 20120730 | 73,969 | 19,946 | 14,696 | 18,550 |
| | PHOSIDA | 1.0 | 7,391 | 1,300 | 278 | 2,212 |
| | SysPTM | 1.1 | 30,307 | 6,643 | 2,255 | 10,667 |
| | HPRD | 9.0 | 34,273 | 10,761 | 4,121 | 7,753 |
¹NR, non-redundant.
Figure 2. Sequence logos and radial cumulative propensity plots of nine kinase-specific substrate groups.
Cross-validation evaluation of sequence and structure-based phosphorylation site predictions on 3D structures.
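| Kinase group | Number of positive data | Number of negative data | Sequence-only Sn | Sequence-only Sp | Sequence-only Acc | Structural information Sn | Structural information Sp | Structural information Acc | Combination of sequence and structural information Sn | Combination of sequence and structural information Sp | Combination of sequence and structural information Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|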
| All serine data | 1554 | 3108 | 61.4% | 62.0% | 61.8% | 66.9% | 68.1% | 67.7% | |||
| CDK | 11 | 22 | 72.7% | 81.8% | 78.8% | 90.9% | 86.8% | 87.9% | |||
| CK1 | 10 | 20 | 20.0% | 90.0% | 66.7% | 100% | 95.0% | 96.7% | |||
| CK2 | 24 | 48 | 66.7% | 87.5% | 80.6% | 87.5% | 87.5% | 87.5% | |||
| MAPK | 17 | 34 | 52.9% | 94.1% | 80.4% | 76.5% | 97.1% | 90.2% | |||
| PIKK | 15 | 30 | 26.7% | 83.3% | 64.4% | 73.3% | 83.3% | 80.0% | |||
| PKA | 56 | 112 | 79.1% | 78.8% | 78.9% | 83.6% | 84.3% | 84.1% | |||
| PKB | 12 | 24 | 75.0% | 66.7% | 69.4% | 75.0% | 83.3% | 80.6% | |||
| PKC | 50 | 100 | 77.3% | 78.0% | 77.8% | 81.2% | 80.0% | 80.4% | |||
| PKG | 10 | 20 | 80.0% | 80.0% | 80.0% | 80.0% | 85.0% | 83.3% | |||
| PLK | 10 | 20 | 60.0% | 80.0% | 73.3% | 70.0% | 90.0% | 83.3% | |||
| STE20 | 10 | 20 | 70.0% | 75.0% | 73.3% | 80.0% | 90.0% | 86.7% | |||
| All threonine data | 603 | 1206 | 60.9% | 59.7% | 60.1% | 67.8% | 67.2% | 67.4% | |||
| MAPK | 13 | 26 | 69.2% | 76.9% | 74.3% | 69.2% | 76.9% | 74.3% | |||
| PKA | 10 | 20 | 70.0% | 90.0% | 83.3% | 80.0% | 85.0% | 83.3% | |||
| PKC | 13 | 26 | 61.5% | 76.9% | 71.8% | 69.2% | 88.5% | 82.1% | |||
| STE20 | 10 | 20 | 40.0% | 95.0% | 76.7% | 70.0% | 70.0% | 70.0% | |||
| All tyrosine data | 629 | 1258 | 62.0% | 63.3% | 62.8% | 64.1% | 63.4% | 63.8% | |||
| Abl | 18 | 36 | 50.0% | 88.9% | 75.9% | 66.7% | 80.6% | 75.9% | |||
| EGFR | 10 | 20 | 60.0% | 80.0% | 73.3% | 60.0% | 95.0% | 83.3% | |||
| InsR | 15 | 30 | 73.3% | 83.3% | 80.0% | 80.0% | 80.0% | 80.0% | |||
| Src | 57 | 114 | 77.2% | 75.4% | 76.0% | 79.1% | 83.3% | 81.9% | |||
| Syk | 11 | 22 | 63.6% | 90.9% | 81.8% | 72.7% | 86.4% | 81.8% | |||
Abbreviations: Sn, sensitivity; Sp, specificity; Acc, accuracy.
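As a worked illustration of the three measures, the sketch below computes Sn, Sp, and Acc from confusion-matrix counts using their standard definitions. The check uses the CDK serine row (11 positive and 22 negative data). The counts TP = 8 and TN = 18 are not stated in the table; they are inferred here as the only integer counts consistent with the first three percentages reported in that row, so treat them as an assumption.

```python
def sn_sp_acc(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return sensitivity, specificity, accuracy


# CDK serine row: 11 positives and 22 negatives (from the table).
# tp=8 and tn=18 are inferred from the reported percentages, not given.
sn, sp, acc = sn_sp_acc(tp=8, fn=3, tn=18, fp=4)
print(f"Sn={sn:.1%} Sp={sp:.1%} Acc={acc:.1%}")
# prints: Sn=72.7% Sp=81.8% Acc=78.8%
```

Because every group in the table uses twice as many negative as positive data, accuracy is weighted toward specificity: Acc = (Sn + 2 × Sp) / 3 for each row.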
Figure 3. The web interface of the PhosK3D prediction system. PhosK3D reports the predicted phosphorylation sites and the catalytic protein kinases involved. To reveal the characteristics of the phosphorylation sites, including the phosphorylated residues and surrounding sequences, the training set of phosphorylation sites and the sequence logos constructed for each protein kinase are also provided graphically on the web interface. Additionally, users can download the predicted results in tab-delimited format for further analysis. When a PDB ID or structure file is supplied as input to PhosK3D, the sequential neighborhood (blue) and spatial neighborhood (gray) of the predicted phosphorylation sites (orange) are shown to users. Moreover, the positively charged residues (blue) and negatively charged residues (red) surrounding the predicted phosphorylation sites are displayed in the Jmol viewer.
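The difference between the sequential and spatial neighborhoods named in this caption can be sketched as follows. This is an illustration under stated assumptions, not the PhosK3D implementation: the one-coordinate-per-residue representation, the 8 Å radius, the flank of 2, the choice of K/R as positive and D/E as negative, and the toy coordinates are all hypothetical.

```python
import math

# Assumed charge classes; the source does not list which residues it counts.
POSITIVE = {"K", "R"}
NEGATIVE = {"D", "E"}


def sequential_neighbors(residues, site, flank=2):
    """Indices within `flank` positions of `site` along the chain."""
    lo = max(0, site - flank)
    hi = min(len(residues), site + flank + 1)
    return [i for i in range(lo, hi) if i != site]


def spatial_neighbors(residues, site, radius=8.0):
    """Indices whose coordinate lies within `radius` of the site."""
    _, center = residues[site]
    return [
        i
        for i, (_, xyz) in enumerate(residues)
        if i != site and math.dist(center, xyz) <= radius
    ]


# Made-up (residue, (x, y, z)) records, one point per residue.
toy = [
    ("K", (0.0, 0.0, 0.0)),
    ("A", (3.8, 0.0, 0.0)),
    ("S", (7.6, 0.0, 0.0)),    # candidate phosphorylation site
    ("G", (11.4, 0.0, 0.0)),
    ("L", (15.2, 0.0, 0.0)),
    ("P", (30.0, 0.0, 0.0)),
    ("D", (8.0, 5.0, 0.0)),    # far in sequence, close in space
    ("E", (40.0, 40.0, 0.0)),
]

site = 2
near_in_sequence = sequential_neighbors(toy, site)
near_in_space = spatial_neighbors(toy, site)
positive = [i for i in near_in_space if toy[i][0] in POSITIVE]
negative = [i for i in near_in_space if toy[i][0] in NEGATIVE]
print(near_in_sequence, near_in_space, positive, negative)
```

In this toy chain the aspartate at index 6 is four positions away from the serine along the sequence, so a sequence window of flank 2 misses it, while the distance-based neighborhood captures it together with its negative charge.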
Figure 4. A case study of phosphorylation site prediction on the protein structure of pyruvate kinase 1 (PDB ID: 1A3W).
Figure 5. A case study of phosphorylation site prediction on the protein structure of histone (PDB ID: 2CV5).
Comparison of PredPhospho, PPSP, GPS 2.0, KinasePhos 2.0, and our method.
| Tools | PredPhospho | GPS 2.0 | PPSP | KinasePhos 2.0 | Our method |
|---|---|---|---|---|---|
| Method | SVM | GPS | BDT | SVM | SVM |
| Training feature | Sequence | Sequence | Sequence | Sequence | Sequence + |
| Material | PhosphoBase + Swiss-Prot | Phospho.ELM | Phospho.ELM | Phospho.ELM + UniProtKB | Phospho.ELM + UniProtKB |
| No. of kinase groups | 4 | 68 | 58 | ||
| Data input | Sequence | Sequence | Sequence | Sequence | Sequence, |
| 3D structure visualization | - | - | - | - | |
| PKA group | Sn = 70.1% | Sn = 88.2% | Sn = 86.9% | Sn = 86.9% | |
| PKC group | Sn = 70.9% | Sn = 82.9% | Sn = 84% | Sn = 84.3% | |
| CK2 group | Sn = 82.0% | Sn = 81.4% | Sn = 84.0% | Sn = 86.2% | |
| SRC group | - | Sn = 82.3% | Sn = 78.0% | ||
The highlights are marked in bold. For the PKA group, our method has the highest sensitivity and specificity. For the PKC group, GPS 2.0 has the highest sensitivity and our method has the highest specificity. For the CK2 group, our method has the highest sensitivity and PredPhospho has the highest specificity. For the SRC group, our method has the highest sensitivity and GPS 2.0 has the highest specificity.
Abbreviations: SVM, support vector machine; MCL, Markov cluster algorithm; GPS, group-based phosphorylation scoring method; BDT, Bayesian decision theory; MDD, maximal dependence decomposition; HMM, hidden Markov model; AAC, amino acid composition; CP, coupling pattern; SA, structural alphabet; Sn, sensitivity; Sp, specificity; Acc, accuracy.