Yung-Hao Wong, Tzong-Yi Lee, Han-Kuen Liang, Chia-Mao Huang, Ting-Yuan Wang, Yi-Huan Yang, Chia-Huei Chu, Hsien-Da Huang, Ming-Tat Ko, Jenn-Kang Hwang.
Abstract
Because protein phosphorylation is important in cellular control, much research has been undertaken to predict kinase-specific phosphorylation sites. Our previous work, KinasePhos 1.0, applied profile hidden Markov models (HMM) to the flanking residues of kinase-specific phosphorylation sites. Here, a new web server, KinasePhos 2.0, combines support vector machines (SVM) with the protein sequence profile and the protein coupling pattern, a novel feature for identifying phosphorylation sites. The coupling pattern [XdZ] denotes a pair of amino acid types X and Z that are separated by d amino acids. The differences or quotients of the coupling strength C(XdZ) between the positive set of phosphorylation sites and the background set of whole protein sequences from Swiss-Prot are computed to determine the number of coupling patterns used for training the SVM models. In an evaluation based on k-fold cross-validation and jackknife cross-validation, the average predictive accuracies for phosphorylated serine, threonine, tyrosine and histidine are 90, 93, 88 and 93%, respectively. KinasePhos 2.0 performs better than previously developed tools. The web server is freely available at http://KinasePhos2.mbc.nctu.edu.tw/.
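The coupling-pattern feature described above can be illustrated with a minimal Python sketch. The abstract does not give the formula for the coupling strength C(XdZ), so the sketch assumes a simple stand-in: the relative frequency of the pair (X, Z) among all residue pairs separated by d residues. The function names `coupling_counts`, `coupling_strength` and `rank_patterns`, and the parameters `max_gap` and `top_n`, are hypothetical and are not taken from KinasePhos 2.0.

```python
from collections import Counter


def coupling_counts(sequences, max_gap):
    """Count patterns (X, d, Z): residue X, then d residues, then residue Z."""
    counts = Counter()
    for seq in sequences:
        for i, x in enumerate(seq):
            for d in range(max_gap + 1):
                j = i + d + 1  # position of Z when d residues lie between X and Z
                if j >= len(seq):
                    break
                counts[(x, d, seq[j])] += 1
    return counts


def coupling_strength(sequences, max_gap):
    """ASSUMED definition: frequency of (X, Z) among all pairs at the same gap d."""
    counts = coupling_counts(sequences, max_gap)
    totals = Counter()
    for (_, d, _), n in counts.items():
        totals[d] += n
    return {key: n / totals[key[1]] for key, n in counts.items()}


def rank_patterns(positive, background, max_gap, top_n):
    """Rank patterns by |C_positive - C_background| and keep the top_n."""
    pos = coupling_strength(positive, max_gap)
    bg = coupling_strength(background, max_gap)
    diffs = {k: pos.get(k, 0.0) - bg.get(k, 0.0) for k in set(pos) | set(bg)}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
```

With toy inputs such as `rank_patterns(["RRS"], ["AAA"], max_gap=0, top_n=1)`, the pattern with the largest absolute difference between the two sets is returned first. The abstract also mentions quotients; a ratio could replace the difference in `rank_patterns`, with a guard against zero background frequencies.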
Year: 2007 PMID: 17517770 PMCID: PMC1933228 DOI: 10.1093/nar/gkm322
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1.The system flow of KinasePhos 2.0.
The statistics of phosphorylation sites obtained from Phospho.ELM and Swiss-Prot
| Data source | Number of phosphorylated proteins | Serine (S) sites | Threonine (T) sites | Tyrosine (Y) sites | Histidine (H) sites | Total sites |
|---|---|---|---|---|---|---|
| Phospho.ELM | 3674 | 9917 | 1890 | 1804 | 1 | 13 612 |
| Swiss-Prot* | 3148 | 4846 | 1035 | 901 | 42 | 6832 |
| Combined (non-redundant) | 5842 | 11 888 | 2433 | 2179 | 43 | 16 551 |
Note that the serine, threonine, tyrosine and histidine counts for Swiss-Prot sum to 6824 rather than 6832, because several phosphorylation sites are located on other kinds of residue. *Entries containing residues annotated as ‘phosphorylation’ in ‘MOD_RES’ were extracted; entries annotated as ‘by similarity’, ‘potential’ or ‘probable’ were excluded.
Figure 2.The comparison for the average precision (Prec), sensitivity (Sn), specificity (Sp) and accuracy (Acc) among the models trained with various features in phosphoserine, phosphothreonine, phosphotyrosine and phosphohistidine.
Figure 3.The web interface of KinasePhos 2.0.
The comparison among KinasePhos 2.0, DISPHOS, PredPhospho, GPS, PPSP and KinasePhos 1.0
| Tools | DISPHOS | PredPhospho | GPS | PPSP | KinasePhos 1.0 | KinasePhos 2.0 |
|---|---|---|---|---|---|---|
| Method | Logistic regression | SVM | MCL+GPS | BDT | MDD+HMM | CP+SVM |
| Number of kinases | – | 4 groups | 71 groups | 68 groups | 18 | 58 |
| Kinase PKA | – | Sn = 0.88, Sp = 0.91 | Sn = 0.89, Sp = 0.91 | Sn = 0.90, Sp = 0.92 | Sn = 0.91, Sp = 0.86 | Sn = 0.92, Sp = 0.89 |
| Kinase PKC | – | Sn = 0.79, Sp = 0.86 | Sn = 0.82, Sp = 0.83 | Sn = 0.82, Sp = 0.86 | Sn = 0.80, Sp = 0.87 | Sn = 0.84, Sp = 0.86 |
| Kinase CK2 | – | Sn = 0.84, Sp = 0.96 | Sn = 0.83, Sp = 0.88 | Sn = 0.83, Sp = 0.90 | Sn = 0.87, Sp = 0.85 | Sn = 0.87, Sp = 0.86 |
| Serine | Acc = 0.76 | Acc = 0.81 | – | – | Acc = 0.86 | Acc = 0.90 |
| Threonine | Acc = 0.81 | Acc = 0.77 | – | – | Acc = 0.91 | Acc = 0.93 |
| Tyrosine | Acc = 0.83 | – | – | – | Acc = 0.84 | Acc = 0.88 |
| Histidine | – | – | – | – | – | Acc = 0.93 |
| Overall performance | – | Acc = 0.76 ∼ 0.91 | – | – | Acc = 0.87 | Acc = 0.91 |
SVM, support vector machine; MCL, Markov cluster algorithm; GPS, group-based phosphorylation scoring method; BDT, Bayesian decision theory; MDD, maximal dependence decomposition; HMM, hidden Markov model; CP, coupling pattern; Sn, sensitivity; Sp, specificity; Acc, accuracy.
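
For reference, the four measures reported here (Prec, Sn, Sp, Acc) are conventionally computed from confusion-matrix counts. The sketch below uses the standard textbook definitions; the source does not spell out its formulas, so treating them as identical is an assumption, and the function name `classification_metrics` and the counts in the usage note are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard definitions from true/false positive and negative counts."""

    def ratio(num, den):
        # Return None instead of raising when a denominator is zero.
        return num / den if den else None

    return {
        "Prec": ratio(tp, tp + fp),                  # precision
        "Sn": ratio(tp, tp + fn),                    # sensitivity (recall)
        "Sp": ratio(tn, tn + fp),                    # specificity
        "Acc": ratio(tp + tn, tp + fp + tn + fn),    # accuracy
    }
```

For example, with made-up counts `tp=8, fp=2, tn=6, fn=4`, the function gives Prec = 0.8, Sn ≈ 0.667, Sp = 0.75 and Acc = 0.7.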