Hongli Ma1,2, Guojun Li3,4, Zhengchang Su5.
Abstract
BACKGROUND: Protein phosphorylation by kinases plays crucial roles in various biological processes, including signal transduction and tumorigenesis. A better understanding of protein phosphorylation events in cells is therefore fundamental for studying protein functions and designing drugs to treat diseases caused by the malfunction of phosphorylation. Although a large number of phosphorylation sites in proteins have been identified using high-throughput phosphoproteomic technologies, the specific kinases that catalyze them remain largely unknown. Computational methods are therefore urgently needed to predict the kinases that catalyze the phosphorylation of these sites.
Keywords: Algorithm; Kinase; Kinase-substrate relationship; Phosphorylation
Year: 2020 PMID: 32753030 PMCID: PMC7646512 DOI: 10.1186/s12864-020-06895-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Evaluation of KSP on CK2A1 and Src when different numbers of top-ranked predictions were considered
| | top 10 | top 9 | top 8 | top 7 | top 6 | top 5 | top 4 | top 3 | top 2 | top 1 |
| TP | 458 | 453 | 451 | 447 | 436 | 426 | 415 | 389 | 351 | 274 |
| FP | 46 | 42 | 42 | 38 | 32 | 30 | 24 | 23 | 17 | 8 |
| TN | 319 | 323 | 323 | 327 | 333 | 335 | 341 | 342 | 348 | 357 |
| FN | 7 | 12 | 14 | 18 | 29 | 39 | 50 | 76 | 114 | 191 |
| TPR | 0.984946 | 0.974194 | 0.969892 | 0.961290 | 0.937634 | 0.916129 | 0.892473 | 0.836559 | 0.754839 | 0.589247 |
| FPR | 0.126027 | 0.115068 | 0.115068 | 0.104110 | 0.087671 | 0.082192 | 0.065753 | 0.063014 | 0.046575 | 0.021918 |
| TNR | 0.873973 | 0.884932 | 0.884932 | 0.895890 | 0.912329 | 0.917808 | 0.934247 | 0.936986 | 0.953425 | 0.978082 |
| FNR | 0.015054 | 0.025806 | 0.030108 | 0.038710 | 0.062366 | 0.083871 | 0.107527 | 0.163441 | 0.245161 | 0.410753 |
| ACCURACY | 0.936145 | 0.934940 | 0.932530 | 0.932530 | 0.926506 | 0.916867 | 0.910843 | 0.880723 | 0.842169 | 0.760241 |
| PRECISION | 0.908730 | 0.915152 | 0.914807 | 0.921649 | 0.931624 | 0.934211 | 0.945330 | 0.944175 | 0.953804 | 0.971631 |
| RECALL | 0.984946 | 0.974194 | 0.969892 | 0.961290 | 0.937634 | 0.916129 | 0.892473 | 0.836559 | 0.754839 | 0.589247 |
| F1 | 0.945304 | 0.943750 | 0.941545 | 0.941053 | 0.934620 | 0.925081 | 0.918142 | 0.887115 | 0.842737 | 0.733601 |
| | top 10 | top 9 | top 8 | top 7 | top 6 | top 5 | top 4 | top 3 | top 2 | top 1 |
| TP | 395 | 394 | 392 | 389 | 384 | 372 | 360 | 328 | 300 | 214 |
| FP | 38 | 37 | 35 | 35 | 32 | 20 | 18 | 16 | 12 | 8 |
| TN | 331 | 332 | 334 | 334 | 337 | 349 | 351 | 353 | 357 | 361 |
| FN | 13 | 14 | 16 | 19 | 24 | 36 | 48 | 80 | 108 | 194 |
| TPR | 0.968137 | 0.965686 | 0.960784 | 0.953431 | 0.941176 | 0.911765 | 0.882353 | 0.803922 | 0.735294 | 0.524510 |
| FPR | 0.102981 | 0.100271 | 0.094851 | 0.094851 | 0.086721 | 0.054201 | 0.048780 | 0.043360 | 0.032520 | 0.021680 |
| TNR | 0.897019 | 0.899729 | 0.905149 | 0.905149 | 0.913279 | 0.945799 | 0.951220 | 0.956640 | 0.967480 | 0.978320 |
| FNR | 0.031863 | 0.034314 | 0.039216 | 0.046569 | 0.058824 | 0.088235 | 0.117647 | 0.196078 | 0.264706 | 0.475490 |
| ACCURACY | 0.934363 | 0.934363 | 0.934363 | 0.930502 | 0.927928 | 0.927928 | 0.915058 | 0.876448 | 0.845560 | 0.740026 |
| PRECISION | 0.912240 | 0.914153 | 0.918033 | 0.917453 | 0.923077 | 0.948980 | 0.952381 | 0.953488 | 0.961538 | 0.963964 |
| RECALL | 0.968137 | 0.965686 | 0.960784 | 0.953431 | 0.941176 | 0.911765 | 0.882353 | 0.803922 | 0.735294 | 0.524510 |
| F1 | 0.939358 | 0.939213 | 0.938922 | 0.935096 | 0.932039 | 0.930000 | 0.916031 | 0.872340 | 0.833333 | 0.679365 |
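The rate and score rows in the two tables above are consistent with the standard confusion-matrix definitions applied to the TP, FP, TN, and FN counts. The following is a minimal sketch of those definitions, not code from KSP itself; the function name `confusion_metrics` and the dictionary keys are illustrative choices that mirror the table's row labels.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the standard binary-classification metrics from raw counts.

    Hypothetical helper for illustration; assumes every denominator is non-zero.
    """
    positives = tp + fn          # all true kinase-substrate pairs
    negatives = fp + tn          # all negative samples
    tpr = tp / positives         # true positive rate (same as recall)
    fpr = fp / negatives         # false positive rate
    precision = tp / (tp + fp)
    return {
        "TPR": tpr,
        "FPR": fpr,
        "TNR": tn / negatives,   # specificity, equal to 1 - FPR
        "FNR": fn / positives,   # miss rate, equal to 1 - TPR
        "ACCURACY": (tp + tn) / (positives + negatives),
        "PRECISION": precision,
        "RECALL": tpr,
        # F1 is the harmonic mean of precision and recall
        "F1": 2 * precision * tpr / (precision + tpr),
    }


# Counts taken from the "top 10" column of the first table above.
top10 = confusion_metrics(tp=458, fp=46, tn=319, fn=7)
for name, value in top10.items():
    print(f"{name}: {value:.6f}")
```

Applying the same function to any other column reproduces that column's rate and score rows, which also makes the trade-off visible: restricting to fewer top-ranked predictions raises precision and lowers recall.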
Fig. 1 Violin plot of the F1 scores of the predictions for the four kinases as a whole when different numbers of top predictions were considered
Fig. 2 Comparison between the sequence-based method and the combined method (Sequence + KSP) on the p-sites of CDK2 and ATM. The two boxplots on the top show the precision scores of the sequence-based method and the combined method on positive and negative samples. The two figures on the bottom show the ROC curves of these two methods on the p-sites of CDK2 and ATM
Fig. 3 Results of the 10-fold cross-validation on PKACA: the ROC curves of the sequence-based method and the combined method (Sequence + KSP)
Fig. 4 Comparison between KSP and five other methods (including one combined method) on CDK2 and ATM
Fig. 5 Comparison between KSP and other methods on the CMGC and AGC groups
Fig. 6 Flowchart of KSP for kinase-substrate pair prediction