Samme Amena Tasmia, Md Kaderi Kibria, Khanis Farhana Tuly, Md Ariful Islam, Mst Shamima Khatun, Md Mehedi Hasan, Md Nurul Haque Mollah
Abstract
Serine phosphorylation is a type of protein post-translational modification (PTM) that plays an essential role in various cellular processes and disease pathogenesis. Numerous methods are used for the prediction of phosphorylation sites. However, the traditional wet-lab experimental approaches are time-consuming, laborious, and expensive. In this work, a computational predictor was proposed to predict serine phosphorylation sites in Schizosaccharomyces pombe (SP) by fusing three encoding schemes, namely the composition of k-spaced amino acid pairs (CKSAAP), binary encoding, and amino acid composition (AAC), with the random forest (RF) classifier. To date, this is the first method developed to predict serine phosphorylation sites for SP. Both the training and independent test performance scores were used to assess the proposed RF-based fusion prediction model against the others. We also investigated their performance by 5-fold cross-validation (CV). In all cases, the proposed predictor achieved the highest scores of true positive rate (TPR), true negative rate (TNR), accuracy (ACC), Matthews correlation coefficient (MCC), area under the ROC curve (AUC), and partial AUC (pAUC) at a false positive rate (FPR) of 0.20. Thus, the prediction performance reported in this paper indicates that the proposed approach may be a useful computational resource for predicting serine phosphorylation sites in fungi. The online interface of the software for the proposed prediction model is publicly available at http://mollah-bioinformaticslab-stat.ru.ac.bd/PredSPS/.
Year: 2022 PMID: 35173235 PMCID: PMC8850546 DOI: 10.1038/s41598-022-06529-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. An overview of the proposed PredSPS predictor.
Figure 2. The two study logos program [58] presents the amino acid propensities in the positive windows (phosphorylation sites) and negative windows (non-phosphorylation sites) of size 25.
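The three encodings named in the abstract, applied to sequence windows of size 25 as in Figure 2, can be illustrated with a minimal sketch. The function names, the CKSAAP gap range (k = 0 to 2), the treatment of unknown residues, and the use of plain concatenation as the fusion step are all assumptions made for illustration; this is not the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(window):
    """Amino acid composition: relative frequency of each residue in the window."""
    n = len(window)
    return [window.count(a) / n for a in AMINO_ACIDS]


def binary(window):
    """One-hot encoding: 20 indicators per position.

    Symbols outside the 20 standard residues map to all zeros (an assumption).
    """
    vec = []
    for residue in window:
        vec.extend(1 if residue == a else 0 for a in AMINO_ACIDS)
    return vec


def cksaap(window, k_max=2):
    """Composition of residue pairs separated by k = 0..k_max positions.

    k_max = 2 is a hypothetical default chosen for this sketch.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    vec = []
    for k in range(k_max + 1):
        n_pairs = len(window) - k - 1  # number of k-spaced pairs in the window
        counts = dict.fromkeys(pairs, 0)
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        vec.extend(counts[p] / n_pairs for p in pairs)
    return vec


def fused_features(window, k_max=2):
    """Fuse the three encodings by concatenation (assumed fusion step)."""
    return cksaap(window, k_max) + binary(window) + aac(window)


# A made-up 25-residue window with the candidate serine at the centre.
window = "ACDEFGHIKLMNSPQRSTVWYACDE"
features = fused_features(window)
print(len(features))
```

Under these assumptions each window yields 3 × 400 CKSAAP values, 25 × 20 binary values, and 20 AAC values. The fused vector would then be the input to a classifier such as a random forest.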
Training performance scores at FPR = 0.20 for 21 prediction models trained with a 1:2 ratio of positive to negative samples.
| Predictors | TPR | TNR | FNR | ACC | MCC | MCR | AUC | pAUC |
|---|---|---|---|---|---|---|---|---|
| ADA (CKSAAP) | 0.763 | 0.801 | 0.237 | 0.877 | 0.662 | 0.172 | 0.891 | 0.121 |
| ADA (binary) | 0.757 | 0.800 | 0.243 | 0.863 | 0.657 | 0.210 | 0.872 | 0.119 |
| ADA (AAC) | 0.750 | 0.802 | 0.250 | 0.862 | 0.643 | 0.198 | 0.876 | 0.115 |
| ADA (CKSAAP, binary) | 0.772 | 0.802 | 0.220 | 0.907 | 0.645 | 0.142 | 0.923 | 0.141 |
| ADA (CKSAAP, AAC) | 0.757 | 0.801 | 0.243 | 0.868 | 0.656 | 0.189 | 0.887 | 0.133 |
| ADA (binary, AAC) | 0.761 | 0.800 | 0.239 | 0.869 | 0.658 | 0.186 | 0.899 | 0.139 |
| SVM (CKSAAP) | 0.769 | 0.801 | 0.233 | 0.887 | 0.668 | 0.173 | 0.899 | 0.132 |
| SVM (binary) | 0.765 | 0.800 | 0.221 | 0.879 | 0.668 | 0.167 | 0.898 | 0.120 |
| SVM (AAC) | 0.638 | 0.801 | 0.362 | 0.737 | 0.541 | 0.268 | 0.820 | 0.079 |
| SVM (CKSAAP, binary) | 0.869 | 0.801 | 0.121 | 0.939 | 0.787 | 0.100 | 0.942 | 0.131 |
| SVM (CKSAAP, AAC) | 0.668 | 0.802 | 0.332 | 0.779 | 0.575 | 0.243 | 0.848 | 0.071 |
| SVM (binary, AAC) | 0.675 | 0.801 | 0.325 | 0.781 | 0.578 | 0.241 | 0.850 | 0.072 |
| RF (CKSAAP) | 0.867 | 0.801 | 0.113 | 0.935 | 0.789 | 0.107 | 0.934 | 0.152 |
| RF (binary) | 0.854 | 0.800 | 0.123 | 0.939 | 0.781 | 0.109 | 0.927 | 0.145 |
| RF (AAC) | 0.761 | 0.802 | 0.239 | 0.905 | 0.627 | 0.159 | 0.913 | 0.129 |
| RF (CKSAAP, binary) | 0.888 | 0.802 | 0.111 | 0.945 | 0.789 | 0.100 | 0.965 | 0.198 |
| RF (CKSAAP, AAC) | 0.857 | 0.801 | 0.143 | 0.932 | 0.778 | 0.113 | 0.942 | 0.157 |
| RF (binary, AAC) | 0.859 | 0.801 | 0.141 | 0.934 | 0.779 | 0.110 | 0.947 | 0.161 |
For each of ADA, SVM and RF, the best results are highlighted in bold.
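The column headings in this table can be related to one another with a short sketch. It assumes that the decision threshold is chosen on the negative samples so that the FPR does not exceed 0.20, that MCR denotes the misclassification rate (1 − ACC), and that FNR = 1 − TPR; the source does not spell out these definitions here, so they are labeled as assumptions. The function names and the example scores are hypothetical.

```python
import math


def threshold_at_fpr(neg_scores, target_fpr=0.20):
    """Return a threshold such that at most target_fpr of negatives score above it."""
    ranked = sorted(neg_scores, reverse=True)
    allowed_fp = int(target_fpr * len(ranked))  # negatives allowed above the threshold
    return ranked[allowed_fp]


def confusion_metrics(tp, fn, tn, fp):
    """Compute the tabulated scores from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FNR": fn / (tp + fn),
        "ACC": acc,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "MCR": 1.0 - acc,  # assumed meaning: misclassification rate
    }


def evaluate(pos_scores, neg_scores, target_fpr=0.20):
    """Score a set of predictions at a fixed false positive rate."""
    t = threshold_at_fpr(neg_scores, target_fpr)
    tp = sum(s > t for s in pos_scores)  # positives predicted positive
    fp = sum(s > t for s in neg_scores)  # negatives predicted positive
    return confusion_metrics(tp, len(pos_scores) - tp, len(neg_scores) - fp, fp)


# Made-up scores with a 1:2 ratio of positive to negative samples.
pos = [0.9, 0.8, 0.7, 0.4, 0.3]
neg = [0.85, 0.6, 0.5, 0.45, 0.35, 0.2, 0.15, 0.1, 0.05, 0.01]
print(evaluate(pos, neg))
```

Because TNR = 1 − FPR, fixing the FPR at 0.20 pins the TNR near 0.80 for every model, so the models differ mainly in TPR and in the scores derived from it.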
Performance scores at FPR = 0.20 for 21 prediction models under 5-fold CV on the training dataset, which consisted of a 1:2 ratio of positive to negative samples.
| Predictor: classifier (encoding) | TPR | TNR | FNR | ACC | MCC | MCR | AUC | pAUC |
|---|---|---|---|---|---|---|---|---|
| ADA (CKSAAP) | 0.676 (0.32) | 0.800 (0.00) | 0.323 (0.32) | 0.689 (0.01) | 0.378 (0.16) | 0.311 (0.01) | 0.737 (0.04) | 0.12 (0.06) |
| ADA (binary) | 0.613 (0.03) | 0.800 (0.01) | 0.386 (0.03) | 0.657 (0.01) | 0.315 (0.03) | 0.343 (0.01) | 0.718 (0.03) | 0.11 (0.07) |
| ADA (AAC) | 0.644 (0.31) | 0.801 (0.00) | 0.355 (0.31) | 0.692 (0.03) | 0.383 (0.02) | 0.291 (0.09) | 0.747 (0.05) | 0.133 (0.04) |
| ADA (CKSAAP, binary) | 0.650 (0.24) | 0.800 (0.01) | 0.349 (0.24) | 0.702 (0.12) | 0.407 (0.24) | 0.297 (0.12) | 0.771 (0.10) | 0.136 (0.05) |
| ADA (CKSAAP, AAC) | 0.661 (0.09) | 0.800 (0.00) | 0.339 (0.09) | 0.712 (0.10) | 0.417 (0.21) | 0.289 (0.10) | 0.783 (0.09) | 0.139 (0.03) |
| ADA (binary, AAC) | 0.653 (0.12) | 0.800 (0.00) | 0.347 (0.12) | 0.710 (0.13) | 0.412 (0.11) | 0.292 (0.13) | 0.778 (0.10) | 0.137 (0.09) |
| SVM (CKSAAP) | 0.677 (0.16) | 0.800 (0.00) | 0.323 (0.03) | 0.712 (0.12) | 0.425 (0.07) | 0.287 (0.02) | 0.788 (0.06) | 0.143 (0.07) |
| SVM (binary) | 0.683 (0.02) | 0.800 (0.00) | 0.317 (0.03) | 0.718 (0.01) | 0.438 (0.15) | 0.281 (0.09) | 0.787 (0.08) | 0.138 (0.04) |
| SVM (AAC) | 0.681 (0.12) | 0.801 (0.00) | 0.316 (0.12) | 0.704 (0.13) | 0.382 (0.06) | 0.325 (0.01) | 0.785 (0.03) | 0.134 (0.05) |
| SVM (CKSAAP, binary) | 0.711 (0.08) | 0.800 (0.00) | 0.293 (0.29) | 0.728 (0.11) | 0.445 (0.26) | 0.256 (0.11) | 0.799 (0.23) | 0.146 (0.09) |
| SVM (CKSAAP, AAC) | 0.543 (0.13) | 0.802 (0.00) | 0.456 (0.09) | 0.667 (0.23) | 0.376 (0.12) | 0.356 (0.04) | 0.800 (0.03) | 0.154 (0.11) |
| SVM (binary, AAC) | 0.567 (0.12) | 0.801 (0.00) | 0.432 (0.11) | 0.684 (0.13) | 0.382 (0.21) | 0.324 (0.12) | 0.803 (0.05) | 0.169 (0.10) |
| RF (CKSAAP) | 0.798 (0.15) | 0.800 (0.00) | 0.201 (0.15) | 0.749 (0.26) | 0.500 (0.20) | 0.251 (0.11) | 0.803 (0.16) | 0.145 (0.08) |
| RF (binary) | 0.735 (0.09) | 0.800 (0.00) | 0.264 (0.09) | 0.721 (0.14) | 0.443 (0.01) | 0.278 (0.02) | 0.793 (0.13) | 0.143 (0.06) |
| RF (AAC) | 0.691 (0.15) | 0.801 (0.00) | 0.308 (0.15) | 0.786 (0.26) | 0.584 (0.20) | 0.213 (0.11) | 0.791 (0.16) | 0.141 (0.08) |
| RF (CKSAAP, binary) | 0.806 (0.02) | 0.800 (0.00) | 0.193 (0.01) | 0.754 (0.02) | 0.510 (0.06) | 0.246 (0.10) | 0.823 (0.03) | 0.158 (0.02) |
| RF (CKSAAP, AAC) | 0.681 (0.13) | 0.800 (0.00) | 0.319 (0.13) | 0.659 (0.23) | 0.502 (0.18) | 0.182 (0.09) | 0.797 (0.14) | 0.151 (0.08) |
| RF (binary, AAC) | 0.725 (0.09) | 0.802 (0.00) | 0.275 (0.09) | 0.671 (0.14) | 0.588 (0.01) | 0.185 (0.02) | 0.826 (0.13) | 0.159 (0.06) |
For each of ADA, SVM and RF, the best results are highlighted in bold.
The values in parentheses indicate the standard error (SE).
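How a mean score and its standard error might be obtained from 5-fold CV can be sketched as follows. The sketch assumes stratified folds that preserve the 1:2 class ratio and an SE computed as the sample standard deviation of the per-fold scores divided by the square root of the number of folds; the source does not state either detail, and the function names and fold scores are hypothetical.

```python
import math
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping the class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal indices round-robin across folds
    return folds


def mean_and_se(fold_scores):
    """Mean and standard error of a score across the CV folds."""
    k = len(fold_scores)
    mean = sum(fold_scores) / k
    var = sum((s - mean) ** 2 for s in fold_scores) / (k - 1)  # sample variance
    return mean, math.sqrt(var / k)


# 10 positive and 20 negative samples, i.e. a 1:2 ratio.
labels = [1] * 10 + [0] * 20
folds = stratified_folds(labels)
print([len(f) for f in folds])

# Made-up per-fold AUC values, for illustration only.
mean, se = mean_and_se([0.80, 0.82, 0.78, 0.81, 0.79])
print(round(mean, 3), round(se, 3))
```

Each fold serves once as the validation set while the remaining four are used for training, and a table entry would then take the form "mean (SE)".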
Figure 3. Performance of the 21 prediction models under 5-fold CV on the training dataset, which consisted of a 1:2 ratio of positive to negative samples. (A) ROC curves for the 7 RF-based prediction models, (B) ROC curves for the 7 ADA-based prediction models, (C) ROC curves for the 7 SVM-based prediction models, and (D) ROC curves for the best prediction models with ADA, SVM, and RF.
Independent test performance scores at FPR = 0.20 for 21 prediction models trained with a 1:2 ratio of positive to negative samples.
| Predictors | TPR | TNR | FNR | ACC | MCC | MCR | AUC | pAUC |
|---|---|---|---|---|---|---|---|---|
| ADA (CKSAAP) | 0.631 | 0.800 | 0.369 | 0.665 | 0.331 | 0.334 | 0.726 | 0.121 |
| ADA (binary) | 0.614 | 0.800 | 0.385 | 0.657 | 0.316 | 0.342 | 0.718 | 0.118 |
| ADA (AAC) | 0.618 | 0.802 | 0.381 | 0.669 | 0.426 | 0.338 | 0.755 | 0.122 |
| ADA (CKSAAP, binary) | 0.635 | 0.800 | 0.364 | 0.697 | 0.397 | 0.303 | 0.763 | 0.138 |
| ADA (CKSAAP, AAC) | 0.621 | 0.801 | 0.398 | 0.682 | 0.467 | 0.309 | 0.782 | 0.140 |
| ADA (binary, AAC) | 0.626 | 0.802 | 0.393 | 0.691 | 0.487 | 0.308 | 0.788 | 0.142 |
| SVM (CKSAAP) | 0.718 | 0.800 | 0.281 | 0.721 | 0.442 | 0.278 | 0.793 | 0.137 |
| SVM (binary) | 0.677 | 0.800 | 0.322 | 0.716 | 0.433 | 0.283 | 0.790 | 0.136 |
| SVM (AAC) | 0.645 | 0.901 | 0.345 | 0.772 | 0.563 | 0.227 | 0.777 | 0.122 |
| SVM (CKSAAP, binary) | 0.728 | 0.800 | 0.278 | 0.734 | 0.467 | 0.268 | 0.796 | 0.140 |
| SVM (CKSAAP, AAC) | 0.698 | 0.902 | 0.301 | 0.761 | 0.526 | 0.238 | 0.801 | 0.145 |
| SVM (binary, AAC) | 0.701 | 0.901 | 0.298 | 0.812 | 0.667 | 0.209 | 0.804 | 0.149 |
| RF (CKSAAP) | 0.772 | 0.800 | 0.227 | 0.739 | 0.479 | 0.261 | 0.798 | 0.143 |
| RF (binary) | 0.729 | 0.800 | 0.271 | 0.716 | 0.432 | 0.283 | 0.786 | 0.142 |
| RF (AAC) | 0.732 | 0.902 | 0.267 | 0.771 | 0.544 | 0.228 | 0.795 | 0.147 |
| RF (CKSAAP, binary) | 0.777 | 0.800 | 0.222 | 0.749 | 0.478 | 0.261 | 0.814 | 0.154 |
| RF (CKSAAP, AAC) | 0.732 | 0.802 | 0.267 | 0.671 | 0.544 | 0.228 | 0.737 | 0.124 |
| RF (binary, AAC) | 0.761 | 0.802 | 0.239 | 0.760 | 0.627 | 0.159 | 0.805 | 0.129 |
| RF (CKSAAP, binary, AAC) | 0.825 | | | | | | | |
For each of ADA, SVM and RF, the best results are highlighted in bold.
Figure 4. Independent test performance for the 21 candidate prediction models. (A) ROC curves for the 7 RF-based prediction models, (B) ROC curves for the 7 ADA-based prediction models, (C) ROC curves for the 7 SVM-based prediction models, and (D) ROC curves for the best prediction models with ADA, SVM, and RF.
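The AUC and pAUC scores summarize ROC curves like those in the figures. A minimal sketch of both is given below, assuming that pAUC is the unnormalized area under the ROC curve over FPR from 0 to 0.20, computed with the trapezoidal rule; whether the authors normalize the partial area is not stated here. The function names and example scores are hypothetical.

```python
def roc_points(pos_scores, neg_scores):
    """ROC points (FPR, TPR), sweeping the threshold from high to low."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def partial_auc(points, max_fpr=0.20):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the curve linearly at the FPR cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores with a 1:2 ratio of positive to negative samples.
pos = [0.9, 0.8, 0.7, 0.4, 0.3]
neg = [0.85, 0.6, 0.5, 0.45, 0.35, 0.2, 0.15, 0.1, 0.05, 0.01]
points = roc_points(pos, neg)
print(round(partial_auc(points), 3))               # pAUC at FPR <= 0.20
print(round(partial_auc(points, max_fpr=1.0), 3))  # full AUC
```

Under this definition the pAUC cannot exceed 0.20, which appears consistent with the range of the pAUC columns in the tables.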