Wang-Ren Qiu, Xuan Xiao, Zhao-Chun Xu, Kuo-Chen Chou.
Abstract
Protein phosphorylation is a posttranslational modification (PTM or PTLM) in which a phosphoryl group is added to one or more residues of a protein. The most commonly phosphorylated amino acids are serine (S), threonine (T), and tyrosine (Y). Protein phosphorylation plays a significant role in a wide range of cellular processes, and its dysregulation is implicated in many diseases. Therefore, for both basic research and drug development, we face a challenging problem: given an uncharacterized protein sequence containing many S, T, or Y residues, which of them can be phosphorylated and which cannot? To address this problem, we have developed a predictor called iPhos-PseEn by fusing four different pseudo component approaches (amino acids' disorder scores, nearest neighbor scores, occurrence frequencies, and position weights) into an ensemble classifier via a voting system. Rigorous cross-validation showed that the proposed predictor clearly outperformed its existing counterparts. For the convenience of experimental scientists, a user-friendly web server for iPhos-PseEn has been established at http://www.jci-bioinfo.cn/iPhos-PseEn, by which users can easily obtain their desired results without needing to go through the complicated mathematical equations involved.
Keywords: ensemble classifier; protein phosphorylation; pseudo components; random forests
Year: 2016 PMID: 27323404 PMCID: PMC5239474 DOI: 10.18632/oncotarget.9987
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
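Of the four pseudo components named in the abstract, occurrence frequencies are the simplest to illustrate. The sketch below assumes they amount to the relative frequency of each of the 20 standard amino acids within one peptide window. The paper's exact definition and normalization may differ, and the name `occurrence_frequencies` is hypothetical.

```python
# Hypothetical sketch: "occurrence frequencies" read as the amino acid
# composition of one peptide window (20 standard residues, fixed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence_frequencies(peptide: str) -> list[float]:
    """Return the relative frequency of each standard amino acid in `peptide`.

    Characters outside the 20 standard residues (e.g. a padding symbol) are
    skipped, so the values sum to 1 whenever at least one standard residue
    is present; an all-padding peptide yields a zero vector.
    """
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for residue in peptide.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

For `occurrence_frequencies("XMSAT")`, the padding `X` is ignored and A, M, S, and T each receive 0.25. The result is a fixed-length 20-dimensional vector, which is the form a random forest expects as input.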
Comparison of the proposed predictor with existing methods under 5-fold cross-validation on the same benchmark dataset
| Prediction method | Metric | ⊛ = S | ⊛ = T | ⊛ = Y |
|---|---|---|---|---|
| Musite | Acc (%) | 67.22 | 77.11 | 71.60 |
| PWAAC | Acc (%) | 67.89 | 66.65 | 63.04 |
| iPhos-PseEn | Acc (%) | 79.76 | 79.88 | 76.28 |
| Musite | MCC | 0.2538 | 0.2960 | 0.2472 |
| PWAAC | MCC | 0.2342 | 0.2079 | 0.1720 |
| iPhos-PseEn | MCC | 0.3901 | 0.3444 | 0.3244 |
| Musite | Sn (%) | 76.63 | 68.26 | 69.58 |
| PWAAC | Sn (%) | 71.74 | 69.23 | 67.70 |
| iPhos-PseEn | Sn (%) | 79.64 | 71.51 | 76.18 |
| Musite | Sp (%) | 66.28 | 77.94 | 71.79 |
| PWAAC | Sp (%) | 67.51 | 66.40 | 62.61 |
| iPhos-PseEn | Sp (%) | 79.78 | 80.68 | 76.29 |
Musite: the method developed by Gao et al. [22].
PWAAC: the method developed by Huang et al. [24].
iPhos-PseEn: the method proposed in this paper.
Metrics: see Eq. 14 for their definitions.
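Eq. 14 is not reproduced here. If it is equivalent to the standard confusion-matrix definitions of sensitivity (Sn), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC), which is the usual case even when a paper writes them in a different form, the four metrics in the table can be computed as shown below. MCC is the most informative of the four for this benchmark. The dataset has roughly ten negatives per positive, so Acc is dominated by Sp. The table reflects this: Musite's Acc for T (77.11%) sits close to its Sp (77.94%).

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sn, Sp, Acc and MCC from confusion-matrix counts (standard definitions).

    tp / fn: true sites predicted as sites / as non-sites
    tn / fp: non-sites predicted as non-sites / as sites
    """
    sn = tp / (tp + fn)                    # sensitivity: share of true sites found
    sp = tn / (tn + fp)                    # specificity: share of non-sites rejected
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally taken as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

For example, `binary_metrics(tp=8, fn=2, tn=85, fp=5)` gives Sn = 0.8 and Acc = 0.93, but its MCC is only about 0.66. This illustrates why MCC is reported alongside accuracy on imbalanced data.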
Figure 1. ROC curves showing the performance of Musite, PWAAC, and iPhos-PseEn when the center residue ⊛ is (A) S, (B) T, and (C) Y
See the main text for further explanation.
Figure 2. A partial screenshot of the top page of the iPhos-PseEn web server at http://www.jci-bioinfo.cn/iPhos-PseEn
Figure 3. A schematic drawing of the peptide model Pξ(⊛) when (A) ⊛ = S, (B) ⊛ = T, and (C) ⊛ = Y
See Eq. 3 and the relevant text for further explanation.
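The peptide model Pξ(⊛) centers a window on each candidate S, T, or Y residue. The sketch below assumes the window has ξ residues on each side, for 2ξ + 1 in total. It also pads positions beyond either end of the sequence with a dummy symbol `X`. The padding symbol and the function name `peptide_windows` are hypothetical. Consult the paper's Eq. 3 for its exact convention.

```python
def peptide_windows(sequence: str, center: str, xi: int, pad: str = "X") -> list[tuple[int, str]]:
    """Return (0-based position, peptide) for every `center` residue in `sequence`.

    Each peptide holds xi residues upstream, the center residue, and xi residues
    downstream (length 2*xi + 1). Positions beyond either terminus are filled
    with the hypothetical dummy symbol `pad`.
    """
    if xi < 0 or len(pad) != 1:
        raise ValueError("xi must be non-negative and pad a single character")
    padded = pad * xi + sequence + pad * xi
    windows = []
    for i, residue in enumerate(sequence):
        if residue == center:
            # sequence[i] sits at padded[i + xi], so the window is padded[i : i + 2*xi + 1]
            windows.append((i, padded[i : i + 2 * xi + 1]))
    return windows
```

For instance, `peptide_windows("MSAT", "S", 2)` returns `[(1, "XMSAT")]`: a single padding symbol fills the missing upstream position. Every window has the same length, so downstream feature extractors can assume a fixed input size.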
Summary of phosphorylation site samples in the benchmark dataset
| Subset | ⊛ = S | ⊛ = T | ⊛ = Y |
|---|---|---|---|
| Positive | 4,317 | 923 | 743 |
| Negative | 43,532 | 9,739 | 8,061 |
See Eqs. 1–3 and the relevant text for further explanation.
Of the negative samples for ⊛ = S, 21,564 are from the 845 phosphoserine proteins and 21,968 are from the 638 non-phosphorylated proteins.
Of the negative samples for ⊛ = T, 4,307 are from the 386 phosphothreonine proteins and 5,432 are from the 638 non-phosphorylated proteins.
Of the negative samples for ⊛ = Y, 3,968 are from the 249 phosphotyrosine proteins and 4,362 are from the 638 non-phosphorylated proteins.
Figure 4. A flow chart showing how the four individual random forest predictors are fused into an ensemble classifier via a voting system
See Eqs. 12–13 and the relevant text for further explanation.
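The fusion step in Figure 4 can be sketched as majority voting over four member predictors, one per pseudo component feature set. In the sketch below, each member is any callable that returns a probability of phosphorylation for a peptide. These callables stand in for the trained random forests, which are not reproduced here. Two details are assumptions, not taken from Eqs. 12–13: the 0.5 threshold, and the rule for breaking a 2-to-2 tie (falling back to the mean probability).

```python
from typing import Callable, Sequence

# A member predictor maps a peptide to an estimated probability of being a
# phosphorylation site; hypothetical stand-in for one trained random forest.
Member = Callable[[str], float]


def ensemble_vote(members: Sequence[Member], peptide: str, threshold: float = 0.5) -> bool:
    """Predict True (phosphorylation site) by majority vote of `members`."""
    if not members:
        raise ValueError("at least one member predictor is required")
    probs = [member(peptide) for member in members]
    yes_votes = sum(p >= threshold for p in probs)
    no_votes = len(probs) - yes_votes
    if yes_votes != no_votes:
        return yes_votes > no_votes
    # Tie (possible with an even number of members such as four): assumed
    # tie-break on the mean probability, not a rule taken from the paper.
    return sum(probs) / len(probs) >= threshold
```

With four members, a tie is always possible, which is why the sketch needs some tie-break rule. An odd number of members would avoid ties entirely.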