Chi-Wei Chen, Lan-Ying Huang, Chia-Feng Liao, Kai-Po Chang, Yen-Wei Chu.
Abstract
Protein phosphorylation is one of the most important post-translational modifications. Many biological processes depend on it, including DNA repair, transcriptional regulation, and signal transduction, so abnormal regulation of phosphorylation often leads to disease. Accurate prediction of human phosphorylation sites could therefore support research into human diseases. We developed a kinase-specific phosphorylation prediction system, GasPhos, and propose a new feature selection approach, Gas, based on the ant colony system and a genetic algorithm. Performance evaluation strategies focused on individual kinases were used to choose the best learning model for each. Gas uses the mean decrease Gini index (MDGI) as the heuristic value for path selection and adopts binary transformation strategies and new state transition rules. GasPhos can predict phosphorylation sites for six kinases and showed better performance than other phosphorylation prediction tools. The disease-related phosphorylated proteins predicted with GasPhos are also discussed. Finally, Gas can be applied to other problems that require feature selection, where it could help improve prediction performance. GasPhos is available at http://predictor.nchu.edu.tw/GasPhos.
Keywords: ant colony system; feature selection; genetic algorithms; kinase; phosphorylation
Year: 2020 PMID: 33114312 PMCID: PMC7660635 DOI: 10.3390/ijms21217891
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Comparison of different heuristic functions.
| Kinase | Classifier | All Features | ACSFS (IG) | ACSFS (F-Score) | ACSFS (PCC) | ACSFS (MDGI) |
|---|---|---|---|---|---|---|
| CDK_S | BFTree | 0.711 | 0.723 | 0.747 | 0.748 | 0.746 |
| CDK_T | SimpleCart | 0.764 | 0.770 | 0.767 | 0.773 | 0.776 |
| CK2_S | NaiveBayes | 0.630 | 0.699 | 0.692 | 0.702 | 0.700 |
| CK2_T | MultiBoostAB | 0.625 | 0.630 | 0.681 | 0.682 | 0.692 |
| MAPK_S | BFTree | 0.742 | 0.745 | 0.770 | 0.775 | 0.773 |
| MAPK_T | BFTree | 0.843 | 0.850 | 0.859 | 0.849 | 0.854 |
| PKA_S | DecisionTable | 0.747 | 0.769 | 0.772 | 0.781 | 0.781 |
| PKA_T | RBFNetwork | 0.720 | 0.763 | 0.800 | 0.823 | 0.862 |
| PKC_S | DecisionTable | 0.558 | 0.574 | 0.585 | 0.585 | 0.585 |
| PKC_T | SimpleLogistic | 0.470 | 0.502 | 0.579 | 0.579 | 0.649 |
| Src_Y | NaiveBayes | 0.320 | 0.359 | 0.394 | 0.408 | 0.421 |
| Avg. | | 0.648 | 0.671 | 0.695 | 0.700 | 0.713 |
The results of the genetic algorithm (GA)-aided strategy with the mean decrease Gini index (MDGI).
| Kinase | ACSFS SN | ACSFS SP | ACSFS ACC | ACSFS MCC | ACSGAFS SN | ACSGAFS SP | ACSGAFS ACC | ACSGAFS MCC |
|---|---|---|---|---|---|---|---|---|
| CDK_S | 0.838 | 0.906 | 0.872 | 0.746 | 0.845 | 0.904 | 0.874 | 0.751 |
| CDK_T | 0.861 | 0.913 | 0.887 | 0.776 | 0.865 | 0.913 | 0.889 | 0.779 |
| CK2_S | 0.814 | 0.884 | 0.849 | 0.700 | 0.815 | 0.898 | 0.856 | 0.716 |
| CK2_T | 0.813 | 0.875 | 0.844 | 0.692 | 0.825 | 0.875 | 0.850 | 0.705 |
| MAPK_S | 0.877 | 0.894 | 0.885 | 0.773 | 0.877 | 0.901 | 0.889 | 0.779 |
| MAPK_T | 0.904 | 0.949 | 0.927 | 0.854 | 0.914 | 0.944 | 0.929 | 0.859 |
| PKA_S | 0.871 | 0.907 | 0.889 | 0.781 | 0.878 | 0.902 | 0.890 | 0.782 |
| PKA_T | 0.905 | 0.951 | 0.929 | 0.862 | 0.888 | 1.000 | 0.944 | 0.896 |
| PKC_S | 0.795 | 0.789 | 0.792 | 0.585 | 0.804 | 0.786 | 0.795 | 0.592 |
| PKC_T | 0.815 | 0.831 | 0.823 | 0.649 | 0.792 | 0.862 | 0.827 | 0.656 |
| Src_Y | 0.715 | 0.704 | 0.709 | 0.421 | 0.752 | 0.715 | 0.733 | 0.469 |
| Avg. | 0.837 | 0.873 | 0.855 | 0.713 | 0.841 | 0.882 | 0.862 | 0.726 |
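The four columns reported for each method (SN, SP, ACC, MCC) are standard confusion-matrix measures: sensitivity, specificity, accuracy, and the Matthews correlation coefficient. As a minimal sketch, assuming binary labels and a test set containing both classes, they can be computed as follows. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return SN, SP, ACC, and MCC from confusion-matrix counts.

    Assumes at least one positive and one negative example,
    so the SN and SP denominators are non-zero.
    """
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts for illustration only.
m = classification_metrics(tp=84, fn=16, tn=90, fp=10)
print({name: round(value, 3) for name, value in m.items()})
```

MCC uses all four cells of the confusion matrix, so it stays informative when one of SN or SP improves at the expense of the other.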
The result of the full pseudo-random proportional rule.
| Kinase | Binary Transformation Strategy | Pseudo-Random Proportional Rule | Full Pseudo-Random Proportional Rule |
|---|---|---|---|
| CDK_S | 0.751 (50) | 0.743 (104) | 0.755 (33) |
| CDK_T | 0.779 (41) | 0.779 (43) | 0.792 (27) |
| CK2_S | 0.716 (74) | 0.712 (112) | 0.712 (49) |
| CK2_T | 0.705 (71) | 0.705 (50) | 0.758 (44) |
| MAPK_S | 0.779 (45) | 0.769 (83) | 0.786 (30) |
| MAPK_T | 0.859 (29) | 0.866 (27) | 0.864 (20) |
| PKA_S | 0.782 (45) | 0.774 (62) | 0.795 (25) |
| PKA_T | 0.896 (55) | 0.896 (40) | 0.909 (34) |
| PKC_S | 0.592 (77) | 0.585 (126) | 0.615 (48) |
| PKC_T | 0.656 (73) | 0.651 (64) | 0.667 (54) |
| Src_Y | 0.469 (92) | 0.437 (132) | 0.477 (61) |
| Avg. | 0.726 (59.27) | 0.720 (76.64) | 0.739 (38.64) |
The number in parentheses is the number of features selected.
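For orientation, the standard pseudo-random proportional rule of the ant colony system can be sketched as below. This is not the authors' implementation: the "full" variant and the binary transformation strategy compared in the table are not specified in this record, so only the textbook rule is shown. The names `choose_feature`, `q0`, `beta`, and the feature labels are assumptions for illustration, and `heuristic` stands in for a per-feature value such as MDGI.

```python
import random


def choose_feature(candidates, pheromone, heuristic, q0=0.9, beta=2.0, rng=random):
    """Pick the next feature with the standard ACS pseudo-random proportional rule.

    With probability q0 the ant exploits: it takes the candidate with the
    largest pheromone * heuristic**beta. Otherwise it explores: it samples a
    candidate with probability proportional to that same score.
    Assumes `candidates` is non-empty and all scores are non-negative.
    """
    scores = {f: pheromone[f] * heuristic[f] ** beta for f in candidates}
    if rng.random() < q0:
        return max(scores, key=scores.get)   # exploitation
    # Exploration: roulette-wheel sampling over the scores.
    threshold = rng.random() * sum(scores.values())
    cumulative = 0.0
    chosen = None
    for chosen, score in scores.items():
        cumulative += score
        if threshold <= cumulative:
            break
    return chosen  # falls back to the last candidate on rounding error


# Hypothetical pheromone levels and heuristic values for three features.
pheromone = {"f1": 1.0, "f2": 1.0, "f3": 1.0}
heuristic = {"f1": 0.2, "f2": 0.7, "f3": 0.4}
picked = choose_feature(["f1", "f2", "f3"], pheromone, heuristic, rng=random.Random(0))
```

A larger `q0` makes the search greedier toward features with high heuristic values, while a smaller `q0` spends more steps exploring alternatives.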
The results of different feature selection methods.
| Kinase | Simulated Annealing Algorithm | Genetic Algorithm | Gas |
|---|---|---|---|
| CDK_S | 0.692 | 0.736 | 0.755 |
| CDK_T | 0.764 | 0.773 | 0.792 |
| CK2_S | 0.602 | 0.699 | 0.712 |
| CK2_T | 0.580 | 0.694 | 0.758 |
| MAPK_S | 0.746 | 0.760 | 0.786 |
| MAPK_T | 0.844 | 0.844 | 0.864 |
| PKA_S | 0.770 | 0.780 | 0.795 |
| PKA_T | 0.769 | 0.830 | 0.909 |
| PKC_S | 0.571 | 0.589 | 0.615 |
| PKC_T | 0.379 | 0.557 | 0.667 |
| Src_Y | 0.323 | 0.391 | 0.477 |
| Avg. | 0.640 | 0.696 | 0.739 |
The results of different predictors for various specific kinases.
| Kinase | KinasPhos 2.0 | GPS | iGPS | Musite | PPSP | GasPhos |
|---|---|---|---|---|---|---|
| CDK_S | 0.150 | 0.593 | 0.503 | 0.677 | 0.689 | 0.755 |
| CDK_T | 0.260 | 0.688 | 0.575 | 0.743 | 0.761 | 0.792 |
| CK2_S | 0.647 | 0.619 | 0.423 | 0.661 | 0.583 | 0.712 |
| CK2_T | 0.400 | 0.590 | 0.434 | 0.555 | 0.550 | 0.758 |
| MAPK_S | 0.390 | 0.588 | 0.613 | 0.691 | 0.696 | 0.786 |
| MAPK_T | N/A * | 0.730 | 0.708 | 0.820 | 0.824 | 0.864 |
| PKA_S | 0.161 | 0.765 | 0.516 | 0.747 | 0.747 | 0.795 |
| PKA_T | 0.560 | 0.813 | 0.514 | 0.719 | 0.700 | 0.909 |
| PKC_S | 0.166 | 0.464 | 0.466 | 0.493 | 0.521 | 0.615 |
| PKC_T | 0.231 | 0.459 | 0.410 | 0.418 | 0.436 | 0.667 |
| Src_Y | 0.075 | 0.459 | 0.329 | 0.285 | 0.319 | 0.477 |
| Avg. | 0.276 | 0.615 | 0.499 | 0.619 | 0.621 | 0.739 |
* N/A: not available.
Figure 1. Comparison of conserved sequences and feature subsets.
Proteomics analysis of various functional proteins by different predictors.
| Function Type | KinasPhos 2.0 | GPS | iGPS | Musite | PPSP | GasPhos |
|---|---|---|---|---|---|---|
| Defense proteins | 0.040 | 0.499 | 0.231 | 0.386 | 0.458 | 0.396 |
| Enzymes | 0.252 | 0.606 | 0.515 | 0.543 | 0.544 | 0.680 |
| Contractile proteins | 0.064 | 0.247 | 0.306 | 0.414 | 0.391 | 0.447 |
| Regulatory proteins | 0.212 | 0.393 | 0.273 | 0.426 | 0.469 | 0.539 |
| Receptor proteins | 0.176 | 0.571 | 0.447 | 0.500 | 0.543 | 0.588 |
| Other | 0.269 | 0.602 | 0.473 | 0.627 | 0.610 | 0.744 |
| Avg. | 0.169 | 0.486 | 0.374 | 0.483 | 0.502 | 0.566 |
Figure 2. Gas flowchart.
Figure 3. The path selection strategy adopted by Gas.
Figure 4. GasPhos flowchart.