Jianlin Shao, Dong Xu, Sau-Na Tsai, Yifei Wang, Sai-Ming Ngai.
Abstract
Protein methylation is one type of reversible post-translational modification (PTM) that plays vital roles in many cellular processes, such as transcriptional activity and DNA repair. Experimental identification of methylation sites on proteins without prior knowledge is costly and time-consuming. In silico prediction of methylation sites might not only provide researchers with candidate sites for further experimental determination, but also facilitate downstream characterization and site-specific investigation. In the present study, a novel approach based on Bi-profile Bayes feature extraction combined with support vector machines (SVMs) was used to develop a model for the Prediction of Protein Methylation Sites (BPB-PPMS) from primary sequence. Methylation can occur at many residues, including arginine, lysine, histidine, glutamine, and proline. At present, BPB-PPMS is designed to predict methylation status only for lysine and arginine residues, because there are not enough experimentally verified data to build and train prediction models for the other residues. In 5-fold cross-validation experiments, BPB-PPMS achieved a sensitivity of 74.71%, a specificity of 94.32%, and an accuracy of 87.98% for arginine, and a sensitivity of 70.05%, a specificity of 77.08%, and an accuracy of 75.51% for lysine. Results from the cross-validation experiments and from tests on independent data sets suggest that BPB-PPMS might facilitate the identification and annotation of protein methylation. In addition, BPB-PPMS can readily be extended to build predictors for other types of PTM sites. For public access, BPB-PPMS is available at http://www.bioinfo.bio.cuhk.edu.hk/bpbppms.
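To make the feature-extraction idea concrete, the following is a minimal sketch of a Bi-profile Bayes style encoding, assuming the usual formulation in which each residue of a peptide window is replaced by its per-position frequency in the positive (methylated) training peptides and, in a second half of the vector, by its frequency in the negative peptides. The function names and the toy peptides are hypothetical and are not taken from the study; the resulting vector would then be passed to an SVM.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_profile(peptides):
    """Per-position residue frequencies for a list of equal-length peptides."""
    length = len(peptides[0])
    profile = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        total = sum(counts.values())
        profile.append({aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS})
    return profile


def bpb_encode(peptide, pos_profile, neg_profile):
    """Encode a peptide as [positive-profile part] + [negative-profile part]."""
    pos_part = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    neg_part = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    return pos_part + neg_part


# Toy, made-up 5-residue windows (NOT data from the study).
positives = ["GRGGA", "ARGGS", "GRGAG"]
negatives = ["LKEDT", "PSLDV", "AKEDL"]

pos_profile = position_profile(positives)
neg_profile = position_profile(negatives)

# The encoded vector is twice as long as the peptide window.
features = bpb_encode("GRGGS", pos_profile, neg_profile)
print(len(features), features)
```

Because every window contributes one value per position from each of the two profiles, the SVM input has twice as many dimensions as the window has residues.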
Year: 2009 PMID: 19290060 PMCID: PMC2654709 DOI: 10.1371/journal.pone.0004920
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The optimal parameters and performance of BPB-PPMS.
| Methylated residues | Sliding window size | Type of kernel | C | γ | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUC (%) | MCC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arginine | 11 | RBF | 32 | 0.5 | 74.71 | 94.32 | 87.98 | 92.54 | 0.7729 |
| Lysine | 11 | RBF | 128 | 8 | 70.05 | 77.08 | 75.51 | 83.83 | 0.3400 |
The optimal parameter combination was determined by the grid-based search introduced in the LIBSVM package [11].
Here, the input window size for the SVMs is twice the sliding window size.
RBF, Radial Basis Function.
C, the penalty parameter of the error term in the objective function.
γ, the parameter in the Radial Basis Function.
AUC, the area under the ROC curve.
MCC, Matthews Correlation Coefficient.
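The grid-based parameter selection mentioned in the notes above can be sketched as an exhaustive search over pairs of the penalty parameter and the RBF parameter. The power-of-two ranges below are the ones suggested in the LIBSVM practical guide and are assumed here rather than confirmed for this study. `toy_cv_accuracy` is a hypothetical placeholder for a real 5-fold cross-validation run, constructed to peak at (32, 0.5) only so that the example returns a recognisable pair.

```python
import itertools
import math


def power_of_two_grid(start_exp, stop_exp, step=2):
    """Values 2**start_exp, 2**(start_exp + step), ..., up to 2**stop_exp."""
    return [2.0 ** e for e in range(start_exp, stop_exp + 1, step)]


def grid_search(score_fn, c_values, gamma_values):
    """Return (best_score, best_c, best_gamma) over all (C, gamma) pairs."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


# Assumed search ranges (LIBSVM guide defaults, not stated in the source).
c_grid = power_of_two_grid(-5, 15)
gamma_grid = power_of_two_grid(-15, 3)


def toy_cv_accuracy(c, gamma):
    """Hypothetical stand-in for cross-validated accuracy; NOT real results."""
    return -((math.log2(c) - 5) ** 2 + (math.log2(gamma) + 1) ** 2)


best_score, best_c, best_gamma = grid_search(toy_cv_accuracy, c_grid, gamma_grid)
print(best_c, best_gamma)
```

In practice `score_fn` would train an RBF-kernel SVM with the given pair and return its cross-validated accuracy; the search loop itself stays the same.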
Figure 1. ROC curves to assess the prediction performance of the three arginine prediction models.
The red, blue, and green curves denote the 5-fold cross-validation prediction performance of the Bi-profile Bayes SVM classifier, the Simple SVM classifier, and the Naïve Bayes classifier, respectively. (The corresponding average AUCs are 0.9254, 0.8958, and 0.8909, respectively.)
Figure 2. ROC curves to assess the prediction performance of the lysine prediction models.
The red, blue, and green curves denote the 5-fold cross-validation prediction performance of the Bi-profile Bayes SVM classifier, the Simple SVM classifier, and the Naïve Bayes classifier, respectively. (The corresponding average AUCs are 0.8383, 0.7498, and 0.7581, respectively.)
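The AUC values quoted for the ROC curves can be read as the probability that a randomly chosen methylated site receives a higher score than a randomly chosen non-methylated site. The sketch below computes AUC through that pairwise definition; the scores and labels are made up for illustration and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores (NOT data from the study).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation gives the same value more efficiently on large data sets.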
Comparison among Naïve Bayes classifier, simple SVM classifier and BPB-PPMS classifier in the 5-fold cross-validation experiment on the same training datasets.
| Methods | Methylated residues | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
| --- | --- | --- | --- | --- | --- |
| Naïve Bayes | Arginine | 67.82 | 85.35 | 79.68 | 0.5379 |
| Naïve Bayes | Lysine | 66.31 | 73.19 | 71.62 | 0.2755 |
| Simple SVM | Arginine | 70.11 | 89.01 | 82.90 | 0.6248 |
| Simple SVM | Lysine | 65.24 | 71.78 | 70.32 | 0.2502 |
| BPB-PPMS | Arginine | 74.71 | 92.46 | 86.80 | 0.7243 |
| BPB-PPMS | Lysine | 70.05 | 77.08 | 75.51 | 0.3400 |
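For readers comparing the columns above, the four measures follow from the confusion-matrix counts in the standard way. The sketch below uses hypothetical counts, since the underlying counts are not given here; the function name is likewise illustrative.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts (NOT from the study): 100 positives, 300 negatives.
metrics = binary_metrics(tp=70, fn=30, tn=270, fp=30)
print(metrics)
```

Unlike accuracy, MCC uses all four counts symmetrically, which is why it can stay low for a class-imbalanced data set even when accuracy looks high.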
Performance of BPB-PPMS and MeMo on independent test datasets in terms of BPB-PPMS.
| Server | Methylated residues | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) |
| --- | --- | --- | --- | --- | --- |
| MeMo | Arginine | - | 20.00 | 88.42 | 87.22 |
| MeMo | Lysine | - | 9.52 | 92.47 | 91.11 |
| BPB-PPMS | Arginine | 0.5 | 60.00 | 81.74 | 81.36 |
| BPB-PPMS | Arginine | 0.8 | 53.33 | 88.56 | 87.96 |
| BPB-PPMS | Lysine | 0.5 | 71.43 | 91.51 | 91.19 |
| BPB-PPMS | Lysine | 0.75 | 9.52 | 98.65 | 97.19 |
A prediction threshold value is not available in MeMo.
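The threshold column reflects a general trade-off: raising the probability cutoff for calling a site methylated tends to lower sensitivity and raise specificity. The sketch below illustrates this with made-up probabilities, assuming a site is predicted as methylated when its probability is at or above the threshold; that decision rule and the function name are assumptions, not details taken from the source.

```python
def sens_spec_at(threshold, probs, labels):
    """Sensitivity and specificity when predicting positive for prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up predicted probabilities (NOT data from the study).
probs = [0.95, 0.85, 0.60, 0.55, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for threshold in (0.5, 0.8):
    sens, spec = sens_spec_at(threshold, probs, labels)
    print(threshold, round(sens, 3), round(spec, 3))
```

When negatives greatly outnumber positives, as the accuracy values close to the specificity values in the table suggest, overall accuracy is driven mainly by specificity, so sensitivity should be read alongside it.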
Potential methylation sites predicted on Tat protein (P04610) through BPB-PPMS, Simple SVMs, and Naïve Bayes classifiers.
| Experimentally verified methylation sites on Tat protein | Potential methylation sites predicted on Tat protein | ||
| BPB-PPMS | Simple SVMs | Naïve Bayes | |
|
|
| K28(0.9012), | K19 (0.6577), |
The numbers in parentheses denote the predicted probability of methylation at the corresponding sites.