Samme Amena Tasmia, Fee Faysal Ahmed, Parvez Mosharaf, Mehedi Hasan, Nurul Haque Mollah.
Abstract
BACKGROUND: Lysine succinylation is one of the reversible protein post-translational modifications (PTMs) that regulate the structure and function of proteins. It plays a significant role in cellular physiology and in several diseases of humans and many other organisms. Accurate identification of succinylation sites is essential for understanding their biological functions and for drug development.
Keywords: Protein sequences; encoding schemes; feature selection; fusion model; lysine succinylation site; prediction; random forest
Year: 2021 PMID: 34220299 PMCID: PMC8188582 DOI: 10.2174/1389202922666210219114211
Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236
Fig. (1) Overview of the development of the proposed predictor.
Fig. (2) A schematic diagram of CKSAAP encoding. (A higher resolution/colour version of this figure is available in the electronic copy of the article).
Fig. (3) Amino acid propensities around positive windows (succinylation sites) and negative windows (non-succinylation sites) of size 25, generated with the Two Sample Logo software (Vacic et al., 2006) [29].
Fig. (4) ROC curves showing the performance of the candidate predictors on independent test dataset-1.
Fig. (5) ROC curves showing the performance of the candidate predictors on independent test dataset-2.
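As a rough illustration of the CKSAAP (composition of k-spaced amino acid pairs) encoding shown schematically in Fig. (2), the sketch below counts residue pairs separated by k positions within a sequence window and normalizes by the number of such pairs. The function name `cksaap`, the default `k_max=2`, the 20-letter alphabet, and the normalization are assumptions made for illustration; the record above does not state the exact settings used by the authors.

```python
from itertools import product

# Assumption: the 20 standard amino acids; any other symbol (e.g. padding) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(window, k_max=2, alphabet=AMINO_ACIDS):
    """Return normalized k-spaced residue-pair frequencies for k = 0..k_max."""
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        # Number of positions that have a partner k residues further along.
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
        denom = max(n_pairs, 1)  # guard against very short windows
        features.extend(counts[p] / denom for p in pairs)
    return features


vec = cksaap("AKA", k_max=1)
print(len(vec))  # 800 values: 2 gap sizes x 400 residue pairs
```

Each value of k contributes one block of 400 pair frequencies, so the feature vector grows linearly with `k_max`.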
Summary performance at FPR=0.10 for different candidate predictors based on the training dataset corresponding to 1:2 ratio of positive and negative windows.
| Predictor | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| ADA(Binary) | 0.752 | 0.901 | 0.248 | 0.867 | 0.652 | 0.187 | 0.881 | 0.131 |
| ADA(CKSAAP) | 0.747 | 0.900 | 0.253 | 0.853 | 0.637 | 0.221 | 0.865 | 0.110 |
| ADA(AAC) | 0.750 | 0.902 | 0.250 | 0.862 | 0.643 | 0.198 | 0.876 | 0.115 |
| ADA(CKSAAP, Binary) | 0.763 | 0.901 | 0.237 | 0.871 | 0.687 | 0.178 | 0.902 | 0.142 |
| ADA(CKSAAP, AAC) | 0.757 | 0.901 | 0.243 | 0.868 | 0.656 | 0.189 | 0.887 | 0.133 |
| ADA(Binary, AAC) | 0.761 | 0.900 | 0.239 | 0.869 | 0.658 | 0.186 | 0.899 | 0.139 |
| ADA(CKSAAP,Binary,AAC) | 0.769 | 0.901 | 0.231 | 0.873 | 0.690 | 0.171 | 0.908 | 0.144 |
| SVM(Binary) | 0.643 | 0.902 | 0.357 | 0.756 | 0.543 | 0.266 | 0.822 | 0.069 |
| SVM(CKSAAP) | 0.632 | 0.903 | 0.368 | 0.734 | 0.532 | 0.286 | 0.812 | 0.073 |
| SVM(AAC) | 0.638 | 0.901 | 0.362 | 0.737 | 0.541 | 0.268 | 0.820 | 0.079 |
| SVM(CKSAAP, Binary) | 0.665 | 0.901 | 0.335 | 0.777 | 0.573 | 0.250 | 0.841 | 0.069 |
| SVM(CKSAAP, AAC) | 0.668 | 0.902 | 0.332 | 0.779 | 0.575 | 0.243 | 0.848 | 0.071 |
| SVM(Binary, AAC) | 0.675 | 0.901 | 0.325 | 0.781 | 0.578 | 0.241 | 0.850 | 0.072 |
| SVM(CKSAAP,Binary,AAC) | 0.678 | 0.902 | 0.322 | 0.788 | 0.581 | 0.234 | 0.852 | 0.076 |
| RF(Binary) | 0.833 | 0.902 | 0.167 | 0.912 | 0.858 | 0.121 | 0.940 | 0.195 |
| RF(CKSAAP) | 0.801 | 0.903 | 0.199 | 0.903 | 0.703 | 0.134 | 0.904 | 0.181 |
| RF(AAC) | 0.761 | 0.902 | 0.239 | 0.905 | 0.627 | 0.159 | 0.913 | 0.129 |
| RF(CKSAAP, Binary) | 0.854 | 0.900 | 0.146 | 0.930 | 0.776 | 0.118 | 0.940 | 0.156 |
| RF(CKSAAP, AAC) | 0.857 | 0.901 | 0.143 | 0.932 | 0.778 | 0.113 | 0.942 | 0.157 |
| RF(Binary, AAC) | 0.859 | 0.901 | 0.141 | 0.934 | 0.779 | 0.110 | 0.947 | 0.161 |
Summary of average performance at FPR=0.10 for different candidate predictors based on 20-fold CV with 1:2 ratio of positive and negative window samples in the training dataset.
| Predictor | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| ADA(Binary) | 0.378 | 0.902 | 0.621 | 0.593 | 0.205 | 0.407 | 0.734 | 0.048 |
| ADA(CKSAAP) | 0.364 | 0.903 | 0.635 | 0.589 | 0.199 | 0.410 | 0.726 | 0.050 |
| ADA(AAC) | 0.344 | 0.901 | 0.655 | 0.607 | 0.251 | 0.392 | 0.702 | 0.030 |
| ADA(CKSAAP, Binary) | 0.557 | 0.899 | 0.443 | 0.683 | 0.376 | 0.318 | 0.783 | 0.069 |
| ADA(CKSAAP, AAC) | 0.478 | 0.901 | 0.521 | 0.670 | 0.368 | 0.329 | 0.774 | 0.059 |
| ADA(Binary, AAC) | 0.559 | 0.902 | 0.441 | 0.685 | 0.377 | 0.386 | 0.788 | 0.078 |
| ADA(CKSAAP,Binary,AAC) | 0.612 | 0.901 | 0.487 | 0.721 | 0.456 | 0.302 | 0.826 | 0.110 |
| SVM(Binary) | 0.456 | 0.900 | 0.515 | 0.650 | 0.317 | 0.350 | 0.745 | 0.058 |
| SVM(CKSAAP) | 0.343 | 0.901 | 0.657 | 0.578 | 0.178 | 0.421 | 0.719 | 0.044 |
| SVM(AAC) | 0.281 | 0.901 | 0.716 | 0.574 | 0.182 | 0.425 | 0.708 | 0.034 |
| SVM(CKSAAP, Binary) | 0.557 | 0.903 | 0.442 | 0.685 | 0.384 | 0.314 | 0.779 | 0.068 |
| SVM(CKSAAP, AAC) | 0.543 | 0.902 | 0.456 | 0.667 | 0.376 | 0.356 | 0.766 | 0.054 |
| SVM(Binary, AAC) | 0.567 | 0.901 | 0.432 | 0.684 | 0.382 | 0.324 | 0.771 | 0.069 |
| SVM(CKSAAP,Binary,AAC) | 0.598 | 0.902 | 0.401 | 0.700 | 0.422 | 0.312 | 0.814 | 0.100 |
| RF(Binary) | 0.725 | 0.902 | 0.275 | 0.871 | 0.688 | 0.175 | 0.895 | 0.132 |
| RF(CKSAAP) | 0.691 | 0.901 | 0.308 | 0.786 | 0.584 | 0.183 | 0.869 | 0.123 |
| RF(AAC) | 0.681 | 0.899 | 0.319 | 0.859 | 0.672 | 0.192 | 0.877 | 0.124 |
| RF(CKSAAP, Binary) | 0.789 | 0.902 | 0.211 | 0.917 | 0.761 | 0.153 | 0.948 | 0.153 |
| RF(CKSAAP, AAC) | 0.711 | 0.899 | 0.249 | 0.869 | 0.682 | 0.172 | 0.891 | 0.134 |
| RF(Binary, AAC) | 0.725 | 0.902 | 0.275 | 0.871 | 0.688 | 0.175 | 0.895 | 0.132 |
| RF(CKSAAP,Binary,AAC) | 0.800 | 0.902 | 0.200 | 0.919 | 0.766 | 0.141 | 0.958 | 0.163 |
(Binary), SVM(CKSAAP), SVM(AAC), SVM(CKSAAP, Binary), SVM(CKSAAP, AAC), SVM(Binary, AAC), SVM(CKSAAP, Binary, AAC), RF(Binary), RF(CKSAAP), RF(AAC), RF(CKSAAP, Binary), RF(CKSAAP, AAC), RF(Binary, AAC)) along with two existing predictors [14, 18].
Summary performance at FPR=0.10 for different candidate predictors based on independent test dataset-1.
| Predictor | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| ADA(Binary) | 0.645 | 0.901 | 0.345 | 0.772 | 0.563 | 0.227 | 0.841 | 0.101 |
| ADA(CKSAAP) | 0.637 | 0.902 | 0.363 | 0.769 | 0.558 | 0.231 | 0.812 | 0.100 |
| ADA(AAC) | 0.698 | 0.902 | 0.301 | 0.761 | 0.526 | 0.238 | 0.820 | 0.102 |
| ADA(CKSAAP,Binary) | 0.703 | 0.902 | 0.297 | 0.811 | 0.666 | 0.208 | 0.866 | 0.122 |
| ADA(CKSAAP,AAC) | 0.701 | 0.901 | 0.298 | 0.812 | 0.667 | 0.209 | 0.856 | 0.121 |
| ADA(Binary,AAC) | 0.732 | 0.902 | 0.267 | 0.791 | 0.587 | 0.208 | 0.881 | 0.123 |
| ADA(CKSAAP,Binary,AAC) | 0.739 | 0.901 | 0.260 | 0.799 | 0.602 | 0.200 | 0.889 | 0.138 |
| SVM(Binary) | 0.393 | 0.901 | 0.607 | 0.646 | 0.339 | 0.353 | 0.752 | 0.069 |
| SVM(CKSAAP) | 0.388 | 0.901 | 0.611 | 0.644 | 0.335 | 0.355 | 0.742 | 0.073 |
| SVM(AAC) | 0.389 | 0.901 | 0.610 | 0.645 | 0.336 | 0.353 | 0.747 | 0.075 |
| SVM(CKSAAP,Binary) | 0.425 | 0.901 | 0.575 | 0.669 | 0.382 | 0.310 | 0.762 | 0.069 |
| SVM(CKSAAP,AAC) | 0.389 | 0.901 | 0.610 | 0.645 | 0.336 | 0.353 | 0.748 | 0.075 |
| SVM(Binary,AAC) | 0.479 | 0.902 | 0.520 | 0.677 | 0.387 | 0.322 | 0.793 | 0.098 |
| SVM(CKSAAP,Binary,AAC) | 0.482 | 0.901 | 0.518 | 0.679 | 0.397 | 0.311 | 0.796 | 0.103 |
| RF(Binary) | 0.742 | 0.902 | 0.267 | 0.851 | 0.603 | 0.189 | 0.898 | 0.125 |
| RF(CKSAAP) | 0.701 | 0.901 | 0.299 | 0.810 | 0.593 | 0.220 | 0.864 | 0.121 |
| RF(AAC) | 0.732 | 0.902 | 0.267 | 0.771 | 0.544 | 0.228 | 0.868 | 0.124 |
| RF(CKSAAP,Binary) | 0.761 | 0.902 | 0.239 | 0.860 | 0.627 | 0.159 | 0.913 | 0.129 |
| RF(CKSAAP,AAC) | 0.742 | 0.902 | 0.257 | 0.781 | 0.554 | 0.208 | 0.878 | 0.128 |
| RF(Binary,AAC) | 0.761 | 0.902 | 0.239 | 0.860 | 0.627 | 0.159 | 0.913 | 0.129 |
| RF(CKSAAP,Binary,AAC) | 0.798 | 0.902 | 0.201 | 0.891 | 0.629 | 0.145 | 0.921 | 0.139 |
Summary performance at FPR=0.10 for different candidate predictors based on independent test dataset-2.
| Predictor | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| ADA(Binary) | 0.627 | 0.901 | 0.373 | 0.763 | 0.547 | 0.236 | 0.862 | 0.110 |
| ADA(CKSAAP) | 0.626 | 0.902 | 0.374 | 0.763 | 0.547 | 0.236 | 0.793 | 0.093 |
| ADA(AAC) | 0.702 | 0.901 | 0.297 | 0.756 | 0.515 | 0.243 | 0.816 | 0.100 |
| ADA(CKSAAP,Binary) | 0.643 | 0.903 | 0.357 | 0.812 | 0.602 | 0.189 | 0.874 | 0.113 |
| ADA(CKSAAP,AAC) | 0.631 | 0.901 | 0.368 | 0.783 | 0.592 | 0.217 | 0.846 | 0.109 |
| ADA(Binary,AAC) | 0.698 | 0.902 | 0.301 | 0.815 | 0.654 | 0.182 | 0.893 | 0.129 |
| ADA(CKSAAP,Binary,AAC) | 0.700 | 0.902 | 0.299 | 0.816 | 0.656 | 0.181 | 0.895 | 0.130 |
| SVM(Binary) | 0.604 | 0.902 | 0.396 | 0.704 | 0.558 | 0.302 | 0.775 | 0.077 |
| SVM(CKSAAP) | 0.453 | 0.903 | 0.652 | 0.613 | 0.324 | 0.543 | 0.689 | 0.05 |
| SVM(AAC) | 0.581 | 0.901 | 0.418 | 0.672 | 0.351 | 0.327 | 0.740 | 0.069 |
| SVM(CKSAAP,Binary) | 0.604 | 0.901 | 0.396 | 0.704 | 0.558 | 0.302 | 0.776 | 0.077 |
| SVM(CKSAAP,AAC) | 0.581 | 0.901 | 0.418 | 0.672 | 0.351 | 0.327 | 0.740 | 0.069 |
| SVM(Binary,AAC) | 0.667 | 0.902 | 0.332 | 0.727 | 0.458 | 0.272 | 0.807 | 0.089 |
| SVM(CKSAAP,Binary,AAC) | 0.678 | 0.901 | 0.321 | 0.731 | 0.466 | 0.268 | 0.810 | 0.099 |
| RF(Binary) | 0.724 | 0.902 | 0.276 | 0.845 | 0.680 | 0.204 | 0.902 | 0.125 |
| RF(CKSAAP) | 0.693 | 0.902 | 0.307 | 0.804 | 0.666 | 0.225 | 0.847 | 0.114 |
| RF(AAC) | 0.615 | 0.901 | 0.384 | 0.801 | 0.651 | 0.198 | 0.877 | 0.121 |
| RF(CKSAAP,Binary) | 0.745 | 0.901 | 0.255 | 0.881 | 0.601 | 0.148 | 0.910 | 0.138 |
| RF(CKSAAP,AAC) | 0.702 | 0.902 | 0.298 | 0.797 | 0.608 | 0.202 | 0.878 | 0.122 |
| RF(Binary,AAC) | 0.749 | 0.902 | 0.250 | 0.884 | 0.669 | 0.169 | 0.920 | 0.140 |
Performance comparison of the proposed predictor with existing predictors.
| Predictor | Dataset | | | | | |
|---|---|---|---|---|---|---|
| Succinsite2.0 | Dataset-1 | 0.632 | 0.872 | 0.866 | 0.241 | 0.845 |
| GPSuc | Dataset-1 | 0.693 | 0.877 | 0.872 | 0.279 | 0.885 |
| HybridSucc | Dataset-1 | 0.822 | 0.855 | 0.859 | 0.562 | 0.891 |
| CNN-SuccSite | Dataset-1 | 0.716 | 0.844 | 0.842 | 0.443 | 0.839 |
| Proposed Predictor | Dataset-1 | 0.798 | 0.902 | 0.891 | 0.629 | 0.921 |
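
The tables above summarize predictors at a fixed false positive rate (FPR) of 0.10. As a minimal sketch of how such a summary might be produced, the code below picks a score threshold so that at most 10% of negative windows are called positive, then derives a few common metrics from the resulting confusion matrix. The function name `metrics_at_fpr`, the tie-handling rule, the toy labels and scores, and the particular metrics returned (sensitivity, accuracy, MCC) are assumptions for illustration; the column headers of the tables are not preserved in this record, so this is not presented as the authors' exact procedure.

```python
def metrics_at_fpr(y_true, scores, target_fpr=0.10):
    """Threshold scores so the empirical FPR is at most target_fpr, then summarize."""
    neg = sorted((s for y, s in zip(y_true, scores) if y == 0), reverse=True)
    if not neg:
        raise ValueError("at least one negative sample is required")
    # Largest number of false positives allowed at the target FPR.
    allowed_fp = int(target_fpr * len(neg) + 1e-9)
    # A sample is called positive only if its score is strictly above the threshold.
    threshold = neg[allowed_fp] if allowed_fp < len(neg) else float("-inf")

    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s > threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive

    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return {
        "threshold": threshold,
        "tpr": tp / (tp + fn) if tp + fn else 0.0,  # sensitivity
        "fpr": fp / (fp + tn),
        "acc": (tp + tn) / (tp + tn + fp + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy data (hypothetical): 4 positive and 10 negative windows.
labels = [1, 1, 1, 1] + [0] * 10
scores = [0.9, 0.8, 0.35, 0.2,
          0.85, 0.5, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.01]
result = metrics_at_fpr(labels, scores)
print(result["tpr"], result["fpr"])  # 0.5 0.1
```

Fixing the FPR puts every predictor at the same specificity, so differences in sensitivity and MCC reflect the ranking quality of the scores rather than the choice of cutoff.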