Shoukai Lin1, Qi Song1, Huan Tao1, Wei Wang1, Weifeng Wan1, Jian Huang1, Chaoqun Xu1, Vivien Chebii1, Justine Kitony1, Shufu Que1, Andrew Harrison2, Huaqin He1.
Abstract
Experimentally determined or computationally predicted protein phosphorylation sites for distinct species are becoming increasingly common. In this paper, we compare the predictive performance of a novel classification algorithm with different encoding schemes to develop a rice-specific protein phosphorylation site predictor. Our results suggest that the combination of Amino acid occurrence Frequency with Composition of K-Spaced Amino Acid Pairs (AF-CKSAAP) provides the best description of the relevant sequence features that surround a phosphorylation site. A support vector machine (SVM) using AF-CKSAAP achieves the best performance in classifying rice protein phosphorylation sites when compared to the other algorithms. We have used SVM with AF-CKSAAP to construct a rice-specific protein phosphorylation site predictor, Rice_Phospho 1.0 (http://bioinformatics.fafu.edu.cn/rice_phospho1.0). We measure the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) of Rice_Phospho 1.0 to be 82.0% and 0.64, respectively, significantly higher than the corresponding values for other predictors such as Scansite, Musite, PlantPhos and PhosphoRice. Rice_Phospho 1.0 also successfully predicted the experimentally identified phosphorylation sites in LOC_Os03g51600.1, a protein sequence that did not appear in the training dataset. In summary, Rice_Phospho 1.0 outputs reliable predictions of protein phosphorylation sites in rice, and will serve as a useful tool for the community.
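To make the AF-CKSAAP description above more concrete, the following is a minimal Python sketch of how a sequence window around a candidate site might be encoded. It rests on assumptions that the abstract does not confirm: AF is taken to be the 20-dimensional amino acid composition of the window, CKSAAP is taken to be the normalised count of residue pairs separated by k positions for k = 0..`k_max`, and the function names, the default `k_max`, and the example window are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def af_features(window):
    """Occurrence frequency of each of the 20 amino acids in the window (assumed AF definition)."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero for an empty window
    return [residues.count(a) / n for a in AMINO_ACIDS]


def cksaap_features(window, k_max=3):
    """Composition of k-spaced amino acid pairs for k = 0..k_max (400 values per k)."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
                total += 1
        total = total or 1
        # Normalise by the number of valid pairs found at this spacing.
        features.extend(counts[p] / total for p in PAIRS)
    return features


def af_cksaap_features(window, k_max=3):
    """Concatenate AF and CKSAAP into a single feature vector (assumed combination)."""
    return af_features(window) + cksaap_features(window, k_max)


# Hypothetical 7-residue window centred on a serine.
vector = af_cksaap_features("AAASAAA", k_max=1)
print(len(vector))
```

A vector of this kind would then be passed to a classifier such as an SVM; the window length and k range actually used by Rice_Phospho 1.0 are not stated in this abstract.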
Year: 2015 PMID: 26149854 PMCID: PMC4493637 DOI: 10.1038/srep11940
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. ROC curves of the predictive performance of SVM with three single encoding schemes.
*In the diagrams, a larger area under the ROC curve indicates better classification performance. The same applies to the figures below.
Performance of the three single encoding schemes on datasets of different sizes.
| Method | (+) sites | (−) sites | Ratio | Sn (%) | Sp (%) | ACC (%) | MCC |
|---|---|---|---|---|---|---|---|
| AF | 112 | 127 | 0.88:1 | 67.20 | 66.13 | 69.51 | 0.403 |
| AF | 365 | 370 | 0.99:1 | 72.00 | 73.07 | 75.11 | 0.461 |
| AF | 853 | 937 | 0.91:1 | 68.31 | 69.43 | 72.30 | 0.408 |
| AF | 1530 | 1630 | 0.94:1 | 69.37 | 68.21 | 70.15 | 0.391 |
| AF | 2107 | 2018 | 1.04:1 | 75.33 | 76.86 | 74.94 | 0.477 |
| KNN | 112 | 127 | 0.88:1 | 57.00 | 52.3 | 58.29 | 0.237 |
| KNN | 365 | 370 | 0.99:1 | 60.14 | 58.12 | 59.50 | 0.281 |
| KNN | 853 | 937 | 0.91:1 | 68.22 | 63.10 | 67.20 | 0.306 |
| KNN | 1530 | 1630 | 0.94:1 | 68.73 | 65.16 | 69.75 | 0.362 |
| KNN | 2107 | 2018 | 1.04:1 | 75.35 | 71.03 | 72.39 | 0.407 |
| CKSAAP | 112 | 127 | 0.88:1 | 82.77 | 80.30 | 83.37 | 0.633 |
| CKSAAP | 365 | 370 | 0.99:1 | 82.27 | 80.63 | 82.84 | 0.617 |
| CKSAAP | 853 | 937 | 0.91:1 | 81.02 | 80.12 | 81.27 | 0.623 |
| CKSAAP | 1530 | 1630 | 0.94:1 | 77.84 | 79.70 | 80.56 | 0.612 |
| CKSAAP | 2107 | 2018 | 1.04:1 | 80.02 | 79.33 | 80.41 | 0.605 |
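The Sn, Sp, ACC and MCC columns follow the standard confusion-matrix definitions. The sketch below shows those definitions for a single confusion matrix; the function name and the example counts are hypothetical and are not taken from the paper, whose reported values may be averages over cross-validation folds.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity: recall on (+) sites
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity: recall on (-) sites
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, tn=75, fp=25, fn=20)
print({name: round(value, 3) for name, value in metrics.items()})
```

MCC ranges from −1 to 1 and uses all four counts, which is why it is usually reported alongside ACC when the (+):(−) ratio varies.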
Figure 2. ROC curves of the predictive performance of SVM with the combined encoding schemes.
*A. ROC curves of SVM with AF-CKSAAP, AF and CKSAAP. B. ROC curves of SVM with AF-KNN, AF and KNN. C. ROC curves of SVM with CKSAAP-KNN, KNN and CKSAAP.
Figure 3. MCC of the predictive performance of different classification algorithms with different encoding schemes.
Predictive performance of SVM with three different encoding schemes on different phospho-amino acid sites.
| Methods | Phospho-amino acids | Ratio | Sn (%) | Sp (%) | ACC (%) | MCC |
|---|---|---|---|---|---|---|
| CKSAAP | Serine | 1:0.7 | 79.84 | 80.41 | 80.36 | 0.617 |
| CKSAAP | Serine | 1:1 | 80.32 | 80.55 | 80.51 | 0.619 |
| CKSAAP | Serine | 0.7:1 | 79.91 | 80.31 | 80.16 | 0.613 |
| CKSAAP | Threonine | 1:0.34 | 79.89 | 80.11 | 79.95 | 0.597 |
| CKSAAP | Threonine | 1:1 | 79.34 | 79.62 | 78.79 | 0.583 |
| CKSAAP | Threonine | 0.34:1 | 78.42 | 78.86 | 78.37 | 0.589 |
| CKSAAP | Tyrosine | 1:0.17 | 84.17 | 83.36 | 84.00 | 0.638 |
| CKSAAP | Tyrosine | 1:1 | 77.25 | 78.71 | 78.03 | 0.573 |
| CKSAAP | Tyrosine | 0.17:1 | 74.61 | 73.83 | 73.02 | 0.532 |
| AF-CKSAAP | Serine | 1:0.7 | 81.22 | 79.20 | 81.15 | 0.623 |
| AF-CKSAAP | Serine | 1:1 | 81.87 | 81.23 | 82.14 | 0.635 |
| AF-CKSAAP | Serine | 0.7:1 | 83.22 | 83.14 | 84.71 | 0.642 |
| AF-CKSAAP | Threonine | 1:0.34 | 78.57 | 78.27 | 79.52 | 0.601 |
| AF-CKSAAP | Threonine | 1:1 | 76.28 | 77.31 | 77.20 | 0.593 |
| AF-CKSAAP | Threonine | 0.34:1 | 78.19 | 77.22 | 78.05 | 0.591 |
| AF-CKSAAP | Tyrosine | 1:0.17 | 80.19 | 79.75 | 80.72 | 0.623 |
| AF-CKSAAP | Tyrosine | 1:1 | 78.35 | 77.55 | 79.12 | 0.597 |
| AF-CKSAAP | Tyrosine | 0.17:1 | 77.75 | 75.38 | 78.04 | 0.593 |
| CKSAAP-KNN | Serine | 1:0.7 | 77.14 | 75.43 | 76.31 | 0.542 |
| CKSAAP-KNN | Serine | 1:1 | 76.18 | 74.83 | 75.12 | 0.526 |
| CKSAAP-KNN | Serine | 0.7:1 | 80.86 | 80.17 | 81.14 | 0.624 |
| CKSAAP-KNN | Threonine | 1:0.34 | 71.77 | 71.19 | 72.37 | 0.497 |
| CKSAAP-KNN | Threonine | 1:1 | 72.46 | 72.81 | 73.34 | 0.509 |
| CKSAAP-KNN | Threonine | 0.34:1 | 75.53 | 73.11 | 74.27 | 0.520 |
| CKSAAP-KNN | Tyrosine | 1:0.17 | 68.91 | 69.71 | 69.53 | 0.418 |
| CKSAAP-KNN | Tyrosine | 1:1 | 69.92 | 70.24 | 70.11 | 0.433 |
| CKSAAP-KNN | Tyrosine | 0.17:1 | 72.86 | 73.42 | 70.34 | 0.465 |
Predictive performance of the top three SVM models trained on the negative dataset in Table S3 and the positive dataset that balances it.
| Models | Sn (%) | Sp (%) | ACC (%) | MCC | AUC |
|---|---|---|---|---|---|
| CKSAAP | 79.47 | 82.83 | 81.15 | 0.623 | 0.839 |
| AF-CKSAAP | 82.84 | 81.59 | 82.04 | 0.641 | 0.858 |
| CKSAAP-KNN | 77.46 | 79.86 | 78.63 | 0.579 | 0.820 |
AUC is the area under the ROC curve.
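As an illustration of how an AUC value of this kind can be obtained, the sketch below builds ROC points from classifier scores and integrates them with the trapezoidal rule. The function names, labels and scores are hypothetical; the paper's own tooling for ROC analysis is not described in this abstract.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, sweeping the threshold downwards.

    Assumes labels are 0/1 and that both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = phosphorylation site) and decision scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive and negative sites.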
Predictive performance of SVM models and newly developed predictors.
| Tools | Sn (%) | Sp (%) | ACC (%) | MCC |
|---|---|---|---|---|
| Musite | 55.18 | 81.91 | 73.32 | 0.368 |
| Scansite | 75.18 | 53.23 | 59.90 | 0.285 |
| PhosphoRice | 70.25 | 74.40 | 75.32 | 0.462 |
| Rice_Phospho 1.0 | 79.93 | 81.21 | 80.33 | 0.616 |
Figure 4. ROC curves of the predictive performance of Rice_Phospho 1.0 and PlantPhos.
Figure 5. Interface of the online predictor, Rice_Phospho 1.0, for rice protein phosphorylation sites.