Inkyung Jung, Akihisa Matsuyama, Minoru Yoshida, Dongsup Kim.
Abstract
BACKGROUND: Post-translational modifications (PTMs) play a key role in regulating cell functions. Consequently, identifying PTM sites has a significant impact on understanding protein function and revealing cellular signal transduction. In particular, phosphorylation is a ubiquitous process, with a large portion of proteins undergoing this modification. Experimental methods to identify phosphorylation sites are labor-intensive and costly. With protein sequence data growing exponentially, the development of computational approaches to predict phosphorylation sites is highly desirable.
Year: 2010 PMID: 20122181 PMCID: PMC3009482 DOI: 10.1186/1471-2105-11-S1-S10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Illustration of the noise-reducing system. In step 1, we find the top 5 hits for a given query, where P is a phosphorylation peptide and N is a non-phosphorylation peptide. Next, in step 2, S scores are calculated between the top 5 hits and all peptides in a reference set (10 peptides); if a peptide i is not among the top 5 hits for a peptide j, the score (j, i) is set to zero. In step 3, we calculate indirect scores by summing each row of the indirect relationship matrix. During summation we assume that scores between two positive (or two negative) peptides are signal, while those between a positive and a negative peptide are noise. Finally, we count the phosphorylation peptides among the top 4 hits ranked by indirect score. In this example P2, P3, P4, and N2 are the top 4 hits; 3 of them are phosphorylation peptides, so we predict that the query peptide is a phosphorylation peptide.
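The steps in the caption can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names, the precomputed similarity inputs, the exclusion of a peptide from its own hit list, the rule that "signal" scores are added while "noise" scores are subtracted, and the majority threshold are all assumptions made for this example.

```python
import numpy as np


def top_k_indices(scores, k):
    """Return indices of the k largest scores, best first (ties keep index order)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return order[:k]


def predict_query(query_scores, ref_scores, is_phospho, k_direct=5, k_indirect=4):
    """Hypothetical sketch of the noise-reducing prediction.

    query_scores: similarity of the query to each reference peptide, shape (n,)
    ref_scores:   pairwise similarity between reference peptides, shape (n, n)
    is_phospho:   True for phosphorylation peptides, shape (n,)
    """
    query_scores = np.asarray(query_scores, dtype=float)
    ref_scores = np.asarray(ref_scores, dtype=float)
    is_phospho = np.asarray(is_phospho, dtype=bool)
    n = len(is_phospho)

    # Step 1: the top direct hits of the query.
    direct_hits = top_k_indices(query_scores, k_direct)

    # Step 2: indirect relationship matrix, one row per reference peptide i
    # and one column per direct hit j. Entries stay zero unless i is among
    # the top hits of j.
    indirect = np.zeros((n, len(direct_hits)))
    for col, j in enumerate(direct_hits):
        row = ref_scores[j].copy()
        row[j] = -np.inf  # assumption: a peptide is not its own hit
        for i in top_k_indices(row, k_direct):
            # Assumption: same-class scores count as signal (+),
            # cross-class scores count as noise (-).
            sign = 1.0 if is_phospho[i] == is_phospho[j] else -1.0
            indirect[i, col] = sign * ref_scores[j, i]

    # Step 3: the indirect score of each reference peptide is its row sum.
    indirect_scores = indirect.sum(axis=1)

    # Final step: vote among the top hits ranked by indirect score.
    final_hits = top_k_indices(indirect_scores, k_indirect)
    n_pos = int(is_phospho[final_hits].sum())
    return n_pos > k_indirect / 2, final_hits, indirect_scores


# Toy reference set of 10 peptides: 5 phosphorylation, 5 non-phosphorylation.
labels = np.array([True] * 5 + [False] * 5)
pairwise = np.where(labels[:, None] == labels[None, :], 0.8, 0.2)
query = np.where(labels, 0.9, 0.1)  # query resembles the positive peptides
prediction, hits, scores = predict_query(query, pairwise, labels)
```

In this toy case the query's direct hits are all positive peptides, so the final vote predicts a phosphorylation peptide. With real data the similarity scores would come from the features compared in the tables below.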
Performances of PKB-group kinases using different features
| | | | Noise-reducing | |
|---|---|---|---|---|
| ACC | 0.92 | 0.91 | 0.94 | 0.95 |
| P | 0.69 | 0.60 | 0.84 | 0.87 |
| R | 0.24 | 0.04 | 0.37 | 0.46 |
Performances of 48 kinase groups using different features
| | | | Noise-reducing | |
|---|---|---|---|---|
| ACC | 0.92 | 0.91 | 0.93 | 0.93 |
| P | 0.59 | 0.43 | 0.68 | 0.68 |
| R | 0.24 | 0.31 | 0.34 | 0.39 |
Figure 2. ROC curves with various features. The figure shows the number of true matches (phosphorylation peptides) as a function of the number of false matches (non-phosphorylation peptides), up to 1000 false matches, among 48 kinase families. Values in brackets give the number of windowed residues in the peptides. S with S (41 windowed residues) shows the best performance, which suggests that considering the long-range region is effective for identifying phosphorylation peptides.
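A curve of this kind, true matches counted against false matches up to a cutoff, can be computed from a ranked list of predictions. The sketch below is illustrative only; the function name and input layout are assumptions, and tied scores are ranked in input order.

```python
def true_vs_false_matches(scores, labels, max_false=1000):
    """Return (false_matches, true_matches) pairs along a ranking.

    scores: prediction score per peptide (higher means more confident)
    labels: True for phosphorylation peptides, False otherwise
    A point is recorded each time a false match is encountered, and the
    walk stops once max_false false matches have been seen.
    """
    # Rank peptides from highest to lowest score (stable for ties).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_matches = 0
    false_matches = 0
    curve = []
    for i in order:
        if labels[i]:
            true_matches += 1
        else:
            false_matches += 1
            curve.append((false_matches, true_matches))
            if false_matches >= max_false:
                break
    return curve


curve = true_vs_false_matches(
    [0.9, 0.8, 0.7, 0.6, 0.5],
    [True, False, True, True, False],
)
# curve == [(1, 1), (2, 3)]
```

A feature set whose curve rises faster ranks more phosphorylation peptides ahead of the first non-phosphorylation peptides.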
Performance comparison with AutoMotif for 48 kinases.
| Kinase | New method ACC | New method P | New method R | AutoMotif ACC | AutoMotif P | AutoMotif R | Kinase | New method ACC | New method P | New method R | AutoMotif ACC | AutoMotif P | AutoMotif R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CDK_group(102) | 0.93 | 0.57 | 0.85 | 0.94 | 0.80 | 0.46 | |||||||
| GSK-3_group(33) | 0.91 | 0.56 | 0.15 | 0.91 | 1.00 | 0.03 | |||||||
| IKK_group(35) | 0.89 | 0.27 | 0.11 | 0.91 | 0.00 | 0.00 | |||||||
| CaM-KII_group(55) | 0.92 | 0.63 | 0.18 | 0.92 | 0.89 | 0.15 | |||||||
| CK2_group(248) | 0.94 | 0.75 | 0.56 | 0.94 | 0.83 | 0.40 | PKB_group(84) | 0.94 | 0.87 | 0.46 | 0.95 | 0.87 | 0.57 |
| AMPK_group(38) | 0.93 | 0.86 | 0.32 | 0.91 | 1.00 | 0.06 | CDK1(147) | 0.94 | 0.63 | 0.84 | 0.92 | 0.65 | 0.28 |
| PKA_group(330) | 0.95 | 0.83 | 0.56 | 0.96 | 0.90 | 0.58 | |||||||
| PKC alpha(188) | 0.93 | 0.75 | 0.27 | 0.92 | 0.78 | 0.11 | RSK_group(23) | 0.95 | 0.81 | 0.57 | No data | ||
| DNA-PK(21) | 0.93 | 0.60 | 0.57 | ||||||||||
| Aurora(55) | 0.93 | 0.81 | 0.31 | ||||||||||
| MAPK_group(52) | 0.93 | 0.62 | 0.50 | 0.93 | 1.00 | 0.22 | Met(26) | 0.95 | 0.69 | 0.85 | |||
| MAPK3(88) | 0.94 | 0.68 | 0.69 | 0.95 | 0.88 | 0.55 | PHK_group(21) | 0.92 | 0.67 | 0.29 | |||
| GRK-2(29) | 0.92 | 1.00 | 0.10 | ||||||||||
| ROCK_group(23) | 0.92 | 0.67 | 0.26 | ||||||||||
| FGFR1(23) | 0.90 | 0.42 | 0.22 | ||||||||||
| PKC_group(236) | 0.93 | 0.76 | 0.26 | 0.93 | 0.85 | 0.24 | PDGFR(21) | 0.94 | 0.83 | 0.48 | |||
| CK1(39) | 0.92 | 0.62 | 0.21 | ||||||||||
| CDK5(22) | 0.95 | 0.76 | 0.59 | ||||||||||
| ATM(57) | 0.95 | 0.85 | 0.60 | 0.97 | 0.91 | 0.75 | PAK1(28) | 0.91 | 0.60 | 0.11 | |||
ACC, P, and R indicate accuracy, precision, and recall, respectively. The mean accuracy, precision, and recall of the new method for 36 kinases are 0.93, 0.67, and 0.40, respectively, while those of AutoMotif are 0.91, 0.47, and 0.17. The values in brackets give the number of known phosphorylation sites in the reference set. Kinase groups for which our method performs better than AutoMotif are bolded.
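For reference, the three measures can be computed from confusion-matrix counts. The sketch below uses the standard definitions; the function name and the convention of returning 0.0 when a denominator is zero are assumptions for this example and are not taken from the paper.

```python
def accuracy_precision_recall(tp, fp, tn, fn):
    """Compute (ACC, P, R) from true/false positive and negative counts."""
    total = tp + fp + tn + fn
    # Accuracy: fraction of all peptides classified correctly.
    acc = (tp + tn) / total if total else 0.0
    # Precision: fraction of predicted sites that are real sites.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real sites that were predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return acc, precision, recall


acc, p, r = accuracy_precision_recall(tp=3, fp=1, tn=5, fn=1)
# acc == 0.8, p == 0.75, r == 0.75
```

Because accuracy also counts correctly rejected negatives, it can stay high while recall is low whenever non-phosphorylation peptides greatly outnumber phosphorylation peptides, a pattern visible in several rows of the table.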
Average performance of the new method in 10-fold cross validation.
| Kinase | ACC | P | R | Kinase | ACC | P | R | Kinase | ACC | P | R |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CDK_group | 0.92 | 0.60 | 0.88 | MAPK3 | 0.93 | 0.61 | 0.73 | CDK1 | 0.94 | 0.63 | 0.90 |
| GSK-3_group | 0.88 | 0.32 | 0.17 | MAPK1 | 0.94 | 0.66 | 0.80 | CDK2 | 0.94 | 0.65 | 0.87 |
| PLK1 | 0.90 | 0.00 | 0.00 | MAPK8 | 0.91 | 0.53 | 0.83 | CK2 alpha | 0.95 | 0.76 | 0.66 |
| GRK_group | 0.91 | 0.15 | 0.07 | Lck | 0.92 | 0.75 | 0.20 | Lyn | 0.92 | 0.67 | 0.23 |
| EGFR | 0.92 | 0.69 | 0.42 | PKC_group | 0.92 | 0.76 | 0.30 | RSK_group | 0.96 | 0.80 | 0.85 |
| MAPK14 | 0.93 | 0.67 | 0.60 | Src | 0.92 | 0.69 | 0.27 | DNA-PK | 0.88 | 0.40 | 0.35 |
| InsR | 0.93 | 0.62 | 0.45 | IGF1R | 0.95 | 0.76 | 0.70 | Aurora | 0.92 | 0.58 | 0.26 |
| CK2_group | 0.94 | 0.71 | 0.58 | ATM | 0.96 | 0.87 | 0.70 | Met | 0.96 | 0.80 | 0.85 |
| AMPK_group | 0.94 | 0.60 | 0.27 | Abl | 0.93 | 0.77 | 0.30 | PHK_group | 0.90 | 0.10 | 0.05 |
| MAPKAPK2 | 0.92 | 0.63 | 0.33 | PDK-1 | 0.96 | 0.86 | 0.67 | GRK-2 | 0.94 | 0.30 | 0.15 |
| CK1_group | 0.90 | 0.20 | 0.10 | PKA alpha | 0.95 | 0.78 | 0.57 | ROCK_group | 0.94 | 0.70 | 0.35 |
| PKA_group | 0.95 | 0.81 | 0.59 | IKK_group | 0.91 | 0.43 | 0.20 | FGFR1 | 0.89 | 0.10 | 0.05 |
| PKC alpha | 0.92 | 0.69 | 0.28 | CaM-KIIalpha | 0.94 | 0.65 | 0.37 | PDGFR | 0.94 | 0.70 | 0.45 |
| Syk | 0.94 | 0.86 | 0.54 | GSK-3beta | 0.90 | 0.32 | 0.12 | CK1 | 0.93 | 0.55 | 0.23 |
| Fyn | 0.90 | 0.25 | 0.13 | CaM-KII_group | 0.92 | 0.62 | 0.20 | CDK5 | 0.97 | 0.87 | 0.75 |
| MAPK_group | 0.92 | 0.64 | 0.54 | PKB_group | 0.94 | 0.93 | 0.41 | PAK1 | 0.94 | 0.50 | 0.25 |
The mean accuracy, precision, and recall of the new method for 48 kinases are 0.93, 0.59, and 0.43, respectively.
Performance variation with seven methods. The scores indicate the area under the ROC curves.
| Method | CDK | CK2 | PKA | PKC |
|---|---|---|---|---|
| Noise-reducing | ||||
| GPS | 0.8130 | 0.8446 | 0.7574 | |
| KinasePhos | 0.8713 | 0.7508 | 0.8234 | 0.7440 |
| NetPhoK | 0.7767 | 0.8749 | 0.7581 | |
| PPSP | 0.8721 | 0.8767 | ||
| PredPhospho | 0.8670 | 0.7791 | 0.8537 | 0.7149 |
| Scansite | 0.7584 | 0.7734 | 0.7656 | 0.6397 |
Figure 3. The PostMod server input page (A) and result page (B). On the input page, the search sequence is pasted into the text box and one of 48 kinase types is selected. The default kinase type is AMPK_group. The example sequence is the AMPK beta-1 chain (UniProt ID P80386). The search results are shown in (B). There are 36 candidate phosphorylation sites (S, T), and three of them are recognized as phosphorylation sites (bolded lines).