Md Shakil Ahmed, Md Shahjaman, Enamul Kabir, Md Kamruzzaman.
Abstract
Lysine acetylation is one of the key types of protein post-translational modification (PTM); it is involved in many significant cellular processes and severe diseases. Experimental identification of protein acetylation sites is painstaking, time-consuming and expensive. Hence, there is significant interest in developing computational approaches for reliable prediction of acetylation sites from protein sequences. Feature selection from protein sequences plays a significant role in acetylation site prediction. We describe an improved feature selection approach for acetylation site prediction based on the kernel naive Bayes classifier (KNBC). We show that a KNBC built from features chosen by the new feature selection method outperforms existing methods for identifying acetylation sites. The sensitivity, specificity, accuracy (ACC), Matthews correlation coefficient (MCC) and area under the ROC curve (AUC) of the proposed method are 80.71%, 93.39%, 76.73%, 41.37% and 83.0%, respectively, at the optimum window size of 47. Thus, the kernel naive Bayes classifier finds application in acetylation site prediction.
Keywords: Acetylation; Binary Encoding; CKSAAP Encoding; Kernel Naive Bayes Classifier; Kruskal-Wallis test; Protein Sequences
Year: 2018 PMID: 30108418 PMCID: PMC6077816 DOI: 10.6026/97320630014213
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
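The abstract reports sensitivity, specificity, ACC, MCC and AUC. As a reminder of how these quantities are usually defined, here is a minimal sketch using the standard textbook formulas. The function names and the counts and scores in the example are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of true sites recovered
    specificity = tn / (tn + fp)                 # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


def auc_from_scores(pos_scores, neg_scores) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The pairwise AUC loop is quadratic in the number of samples; it is written this way for clarity, and a rank-based formulation would be preferable on large datasets.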
Figure 1: Schematic diagram of kernel naive Bayes modeling
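The general idea of a kernel naive Bayes classifier can be sketched as follows: each class-conditional feature density is estimated with a kernel density estimate instead of a single parametric distribution, and the per-feature log densities are summed with the log prior. This is a generic illustration, not the authors' implementation; the Gaussian kernel, the fixed `bandwidth` value, the class name and the toy data are all assumptions.

```python
import math
from collections import defaultdict


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


class KernelNaiveBayes:
    """Naive Bayes with one kernel density estimate per (class, feature)."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth  # assumed fixed bandwidth, not from the paper

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(y)
        self.log_priors = {c: math.log(len(rows) / n) for c, rows in by_class.items()}
        self.densities = {
            c: [
                gaussian_kde([r[j] for r in rows], self.bandwidth)
                for j in range(len(X[0]))
            ]
            for c, rows in by_class.items()
        }
        return self

    def log_scores(self, x):
        tiny = 1e-300  # floor to avoid log(0) far from all training points
        return {
            c: self.log_priors[c]
            + sum(math.log(max(d(v), tiny)) for d, v in zip(dens, x))
            for c, dens in self.densities.items()
        }

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


# Toy two-feature data, for illustration only.
X = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [1.0, 1.1], [0.9, 1.2], [1.1, 0.9]]
y = [0, 0, 0, 1, 1, 1]
model = KernelNaiveBayes().fit(X, y)
```

In practice the bandwidth would normally be tuned or chosen by a rule of thumb per feature; a single fixed value is used here only to keep the sketch short.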
Figure 2: Flow chart for predicting acetylation and non-acetylation sites using the kernel NB model with K-W feature selection (proposed). The working steps are as follows: (A) Data collection, preprocessing and construction of positive and negative groups using a suitable window size. (B) Feature extraction from the protein sequence fragments using the Perl programming language. (C) Classification of acetylation sites with the kernel naive Bayes machine learning algorithm. (D) Prediction scores and results of the proposed method.
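Steps (A) and (B) can be illustrated with a short sketch: cut a fixed-size window around every lysine (K), then compute the composition of k-spaced amino acid pairs (CKSAAP) for each fragment. The original feature extraction was done in Perl; the Python below is only an illustration. The padding symbol `X`, the value of `k_max`, and the function names are assumptions, not details from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for windows that run past a terminus


def lysine_windows(sequence, window_size=47):
    """Return (1-based position, fragment) for every lysine in the sequence."""
    half = window_size // 2
    padded = PAD * half + sequence + PAD * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # padded[i + half] is the lysine, so this slice is centred on it
            windows.append((i + 1, padded[i:i + window_size]))
    return windows


def cksaap(fragment, k_max=4):
    """Frequencies of residue pairs separated by k = 0..k_max other residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = len(fragment) - k - 1  # number of k-spaced positions
        for i in range(total):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # pairs containing the pad symbol are skipped
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in pairs)
    return features


# Toy sequence, for illustration only.
toy_windows = lysine_windows("MKAAK", window_size=5)
toy_features = cksaap("AAKA", k_max=1)
```

With 20 amino acids, each value of k contributes 400 pair frequencies, so `k_max=4` would give 2000 features per fragment before any feature selection.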
AUC comparison of the two encoding methods using the proposed method
| Encoding method | AUC (proposed method) |
| --- | --- |
| Binary | 0.761 |
| CKSAAP | 0.830 |
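For comparison with CKSAAP, binary encoding is commonly implemented as a one-hot vector per residue. The sketch below assumes a 20-dimensional vector per position with non-standard or padding symbols mapped to all zeros; the paper may use a different convention (for example, a 21st dimension for gaps), and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(fragment):
    """One-hot encode each residue; unknown symbols become all-zero blocks."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = []
    for residue in fragment:
        bits = [0] * len(AMINO_ACIDS)
        if residue in index:
            bits[index[residue]] = 1
        vector.extend(bits)
    return vector


# Toy fragment with one padding symbol, for illustration only.
encoded = binary_encode("AKX")
```

Under this assumption, a window of 47 residues yields 47 × 20 = 940 binary features.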
Figure 4: Performance of different classifiers for the (a) full dataset, (b) 1:1 dataset, (c) 1:2 dataset and (d) 1:3 dataset
Figure 3: Feature selection performance of different test statistics for the (a) full dataset, (b) 1:1 dataset, (c) 1:2 dataset and (d) 1:3 dataset
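Feature selection with the Kruskal-Wallis (K-W) test can be sketched as ranking each feature by its H statistic between the acetylated and non-acetylated groups. The code below computes H with the standard tie correction in plain Python. Ranking by H rather than by p-value, the function names, and the toy data are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of sample groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0  # constant feature -> 0


def rank_features(X, y):
    """Return feature indices sorted from most to least discriminative."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((kruskal_wallis_h([pos, neg]), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 6], [6, 3]]
y = [0, 0, 0, 1, 1, 1]
ranking = rank_features(X, y)
```

The tie correction matters for sparse encodings such as CKSAAP, where many fragments share a value of zero for a given pair.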
Figure 5: Comparison with other popular methods for acetylation site prediction