Sabit Ahmed, Afrida Rahman, Md Al Mehedi Hasan, Md Khaled Ben Islam, Julia Rahman, Shamim Ahmad.
Abstract
Post-translational modification (PTM) is the covalent modification of a protein after its biosynthesis and plays an essential role in the study of cell biology. Lysine phosphoglycerylation is a newly discovered, reversible type of PTM that affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our goal is to computationally characterize potential phosphoglycerylation sites in order to understand their functionality and causality more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites in proteins. It uses the probabilistic sequence-coupling information among the amino acid residues near phosphoglycerylation sites, along with a variable cost adjustment for the skewed training dataset, to improve prediction performance. It achieved around 99% accuracy with more than 0.96 MCC and 0.97 AUC in both 10-fold cross-validation and the independent test. Moreover, the standard deviation in 10-fold cross-validation is almost negligible. This performance indicates that predPhogly-Site outperformed the existing prediction tools and can be used as a promising predictor, preferably through its web interface at http://103.99.176.239/predPhogly-Site.
Year: 2021 PMID: 33793659 PMCID: PMC8016359 DOI: 10.1371/journal.pone.0249396
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. An overview of predPhogly-Site for phosphoglycerylation site prediction.
Summary of the non-redundant phosphoglycerylation dataset.
| Similarity threshold | No. of non-redundant proteins | Phosphoglycerylated sites | Non-phosphoglycerylated sites |
|---|---|---|---|
| 40% | 91 | 111 | 3249 |
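The 111-to-3249 split above is heavily skewed toward negative sites, which is why the abstract mentions a cost adjustment for the training data. The source does not spell out that adjustment, so the following is only a minimal sketch of one common heuristic, inverse-frequency class weighting; the helper name `inverse_frequency_weights` and the weighting formula are assumptions for illustration, not the scheme used by predPhogly-Site.

```python
# Class counts taken from the dataset summary table above.
N_POSITIVE = 111    # phosphoglycerylated lysine sites
N_NEGATIVE = 3249   # non-phosphoglycerylated lysine sites


def inverse_frequency_weights(counts):
    """Return per-class weights n_total / (n_classes * n_class).

    Generic heuristic (hypothetical helper), not necessarily the
    cost scheme used by predPhogly-Site.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


counts = {"positive": N_POSITIVE, "negative": N_NEGATIVE}
weights = inverse_frequency_weights(counts)
imbalance_ratio = N_NEGATIVE / N_POSITIVE

print(f"imbalance ratio (neg:pos) = {imbalance_ratio:.2f}")
for label, w in weights.items():
    print(f"weight[{label}] = {w:.4f}")
```

Under this heuristic a misclassified positive site costs roughly 29 times as much as a misclassified negative one, which mirrors the class ratio in the table.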
Fig 2. Amino acid frequencies around the K-PTM and non-K-PTM sites.
Fig 3. The conditional probability of amino acids at sample positions 1 to 13 and 15 to 28.
Fig 4. Probabilistic information of 21 amino acids at sample positions 14 and 15.
Selected parameters of 10-fold cross validation (10 iterations).
| Iteration | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| | 2⁰ | 2⁰ | 2⁰ | 2⁰ | 2⁰ | 2¹ | 2² | 2² | 2⁰ | 2⁰ |
| | 2⁻¹ | 2⁻² | 2⁻² | 2⁻² | 2⁻² | 2⁻¹ | 2⁻² | 2⁻² | 2⁻² | 2⁻² |
Cross-validation performance of predPhogly-Site on the benchmark dataset.
| Predictor | Sp | Sn | Pre | ACC | MCC | AUC |
|---|---|---|---|---|---|---|
| predPhogly-Site | 0.9997 ± 0.0001 | 1.00 ± 0.00 | 0.9920 ± 0.0027 | 0.9997 ± 0.0001 | 0.9958 ± 0.0014 | 0.9999 ± 0.00 |
Cross-validation performance of the existing prediction systems.
| Predictor | Sp | Sn | Pre | ACC | MCC | AUC |
|---|---|---|---|---|---|---|
| iPGK-PseAAC | 0.9846 | 0.4595 | 0.5050 | 0.9673 | 0.4648 | 0.7220 |
| iPGK-PseAAC | 0.9864 | 0.4555 | 0.9548 | 0.8119 | 0.5692 | 0.7230 |
| CKSAAP_PhoglySite | 0.8941 | 0.8288 | 0.2110 | 0.8920 | 0.3845 | 0.8615 |
| CKSAAP_PhoglySite | 0.9420 | 0.8285 | 0.8765 | 0.9043 | 0.7818 | 0.8854 |
| Phogly-PseAAC | 0.7064 | 0.6937 | 0.0747 | 0.7060 | 0.1550 | 0.7000 |
| Phogly-PseAAC | 0.7193 | 0.6927 | 0.5518 | 0.7102 | 0.3951 | 0.7062 |
| Bigram-PGK | 0.8973 | 0.9642 | 0.8253 | 0.9193 | 0.8330 | 0.9306 |
* Corresponds to the experimental findings reported by the Bigram-PGK study [11].
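The Sp (specificity), Sn (sensitivity), Pre (precision), ACC, and MCC columns in these tables all follow from the four confusion-matrix counts. The sketch below shows the standard formulas. The counts are not reported in the source: they are back-calculated, as an assumption, from the first CKSAAP_PhoglySite row (Sn = 0.8288, Sp = 0.8941) and the 111 / 3249 split in the dataset summary. AUC is omitted because it depends on the ranking of prediction scores, not on a single confusion matrix.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute Sp, Sn, Pre, ACC and MCC from confusion-matrix counts."""
    sp = tn / (tn + fp)                      # specificity
    sn = tp / (tp + fn)                      # sensitivity (recall)
    pre = tp / (tp + fp)                     # precision
    acc = (tp + tn) / (tp + fn + tn + fp)    # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sp": sp, "Sn": sn, "Pre": pre, "ACC": acc, "MCC": mcc}


# Assumed counts: back-calculated from the reported rates and the
# 111 / 3249 dataset split; they are illustrative, not source data.
N_POS, N_NEG = 111, 3249
tp = round(0.8288 * N_POS)   # 92
tn = round(0.8941 * N_NEG)   # 2905
metrics = binary_metrics(tp, N_POS - tp, tn, N_NEG - tn)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the computed precision, accuracy, and MCC agree with that table row to four decimals. The example also shows why a predictor can reach about 0.89 accuracy while its precision stays near 0.21: on a dataset this skewed, even a modest false-positive rate among 3249 negatives outnumbers the true positives.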
Fig 5. Cross-validation performance of the available predictors.
Prediction performance in the independent test.
| Predictor | Sp | Sn | Pre | ACC | MCC | AUC |
|---|---|---|---|---|---|---|
| iPGK-PseAAC | 0.9738 | 0.2927 | 0.2553 | 0.9535 | 0.2494 | 0.6332 |
| Phogly-PseAAC | 0.6837 | 0.6829 | 0.0622 | 0.6836 | 0.1329 | 0.6833 |
| CKSAAP_PhoglySite | 0.8823 | 0.7561 | 0.1649 | 0.8785 | 0.3161 | 0.8192 |
Fig 6. Comparative ROC curves between different prediction methods based on the independent test.