Sharaf Malebary1, Shaista Rahman2, Omar Barukab1, Rehab Ash'ari1, Sher Afzal Khan2.
Abstract
Acetylation is the most important post-translational modification (PTM) in eukaryotes; it has manifold effects at the protein level and transfers an acetyl group from acetyl coenzyme A to a specific site on a polypeptide chain. Acetylation sites play many important roles, including regulating membrane protein functions and strongly affecting the membrane interaction of proteins and membrane remodeling. Because of these properties, the correct identification of acetylation sites is essential to understanding the mechanism of acetylation in biological systems. Traditional experimental methods, such as mass spectrometry and site-directed mutagenesis, are used for this purpose, but they are tedious and time-consuming. To overcome such limitations, many computational models have been developed to distinguish acetylated sequences from non-acetylated sequences, but they perform poorly in terms of accuracy, sensitivity, and specificity. This work proposes an efficient and accurate computational model for predicting acetylation using machine learning approaches. The proposed model achieved an accuracy of 100% in the 10-fold cross-validation test, based on the Random Forest classifier combined with a feature extraction approach using statistical moments. The model was also validated by the jackknife, self-consistency, and independent tests, which achieved accuracies of 100%, 100%, and 97%, respectively; these results are far better than those of the existing models available in the literature.
Keywords: acetylation; machine learning; membrane proteins; post-translational modification; probabilistic neural network; random forest; statistical moments
Year: 2022 PMID: 35323738 PMCID: PMC8955084 DOI: 10.3390/membranes12030265
Source DB: PubMed Journal: Membranes (Basel) ISSN: 2077-0375
Figure 1. Acetylation protein.
Figure 2. Flowchart of the proposed predictor.
Figure 3. Schematic view of PNN.
Figure 4. The working mechanism of the Random Forest classifier.
Confusion matrix.
| Status Person | Predicted Patient (1) | Predicted Healthy Person (0) |
|---|---|---|
| Actual patient (1) | TP | FN |
| Actual healthy person (0) | FP | TN |
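The metrics reported in the result tables (ACC, Sn, Sp, Prec, MCC, Recall, F.m) can all be derived from the four confusion-matrix counts above. The following is a minimal sketch using the standard textbook definitions; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper. Under these definitions sensitivity and recall are the same quantity.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Uses the usual textbook definitions; this is an illustrative helper,
    not the authors' implementation.
    """
    total = tp + fn + fp + tn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0      # sensitivity (identical to recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0      # specificity
    prec = tp / (tp + fp) if (tp + fp) else 0.0    # precision
    f_m = 2 * prec * sn / (prec + sn) if (prec + sn) else 0.0  # F-measure
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"ACC": acc, "Sn": sn, "Sp": sp, "Prec": prec,
            "MCC": mcc, "Recall": sn, "F.m": f_m}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these definitions, ACC, Sn, Sp, Prec, and F.m always lie in [0, 1] (or 0 to 100 when expressed as a percentage), and MCC lies in [-1, 1].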
Figure 5. K-fold cross-validation (KFCV).
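As a rough illustration of the K-fold procedure, the sketch below partitions sample indices into k disjoint folds and yields one train/test split per fold. It assumes a simple shuffled, non-stratified split; the helper name `k_fold_indices` and the seed are hypothetical, and the exact splitting procedure used in the paper is not specified here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption: a plain shuffled split without stratification.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample is used for testing exactly once across the k folds.
splits = list(k_fold_indices(25, k=10))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(fold_no, len(train), len(test))
```

Setting `k` equal to the number of samples reduces this scheme to the jackknife (leave-one-out) test, in which every fold holds out a single sample.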
Result of 10-fold cross-validation based on the Random Forest classifier.
| K-Folds | Dataset 1 ACC | Dataset 1 Sn | Dataset 1 Sp | Dataset 1 Prec | Dataset 1 MCC | Dataset 1 Recall | Dataset 1 F.m | Dataset 2 ACC | Dataset 2 Sn | Dataset 2 Sp | Dataset 2 Prec | Dataset 2 MCC | Dataset 2 Recall | Dataset 2 F.m | Dataset 3 ACC | Dataset 3 Sn | Dataset 3 Sp | Dataset 3 Prec | Dataset 3 MCC | Dataset 3 Recall | Dataset 3 F.m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 5 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 6 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 7 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 8 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 9 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| 10 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| result | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
Figure 6. 10-fold Random Forest ROC curve.
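An ROC curve is commonly summarized by the area under it (AUC). The sketch below computes AUC through its rank interpretation: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical and are not values from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive, e.g. acetylated).
    scores: iterable of classifier scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four samples.
auc_mixed = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
auc_perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
print(auc_mixed, auc_perfect)
```

An AUC of 1.0 corresponds to a classifier that ranks every positive sample above every negative one, while 0.5 corresponds to random ranking.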
Result of 10-fold cross-validation based on the Probabilistic Neural Network.
| Dataset (final score) | ACC | Sn | Sp | Prec | MCC | Recall | F.m |
|---|---|---|---|---|---|---|---|
| Dataset 1 | 66.83 | 0.72 | 0.60 | 0.65 | 0.36 | 0.95 | 0.72 |
| Dataset 2 | 60 | 0.26 | 0.93 | 0.81 | 0.21 | 0.26 | 0.40 |
| Dataset 3 | 57.17 | 0.92 | 0.22 | 0.54 | 0.36 | 0.92 | 0.72 |
Figure 7. Ten-fold Probabilistic Neural Network ROC curve.
Jackknife test score based on Random Forest.
| Dataset (jackknife result) | ACC | Sn | Sp | Prec | MCC | Recall | F.m |
|---|---|---|---|---|---|---|---|
| Dataset 1 | 100 | 0.55 | 0.5 | 0.55 | 0.05 | 0.003 | 0.01 |
| Dataset 2 | 99.86 | 0.55 | 0.5 | 0.55 | 0.05 | 0.54931 | 0.55 |
| Dataset 3 | 99.86 | 0.55 | 0.5 | 0.55 | 0.05 | 0.54931 | 0.5 |
Jackknife score based on the Probabilistic Neural Network.
| Dataset (jackknife final score) | ACC | Sn | Sp | Prec | MCC | Recall | F.m |
|---|---|---|---|---|---|---|---|
| Dataset 1 | 66.87 | 0.55 | 0.5 | 0.5 | 0.6 | 0.41 | 0.42 |
| Dataset 2 | 59.77 | 0.5 | 0.5 | 0.5 | 0.6 | 0.13 | 0.20 |
| Dataset 3 | 57.41 | 0.55 | 0.4 | 0.5 | 0.6 | 0.50 | 0.48 |
Figure 8. Jackknife Random Forest ROC curve.
Figure 9. Jackknife PNN ROC curve.
Results of self-consistency based on Random Forest.
| Dataset | ACC | Sn | Sp | Prec | MCC | Recall | F.m |
|---|---|---|---|---|---|---|---|
| Dataset 1 | 100 | 1 | 0.99 | 1 | 0.99 | 0.997 | 1 |
| Dataset 2 | 100 | 1 | 1 | 1 | 0.99 | 0.997 | 1 |
| Dataset 3 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
Figure 10. Self-consistency Random Forest ROC curve.
Self-consistency test result for probabilistic neural network.
| Dataset | ACC | Sn | Sp | Prec | MCC | Recall | F.m |
|---|---|---|---|---|---|---|---|
| Dataset 1 | 66.83 | 0.72 | 0.60 | 0.64 | 0.36 | 0.72 | 0.68 |
| Dataset 2 | 60 | 0.26 | 0.93 | 0.80 | 0.20 | 0.26 | 0.39 |
| Dataset 3 | 57.17 | 0.92 | 0.22 | 0.54 | 0.36 | 0.92 | 1.84 |
Figure 11. Self-consistency Probabilistic Neural Network ROC curve.
Figure 12. Independent test.
Results of independent test based on the Random Forest.
| Subset | Dataset | ACC | Sn | Sp | Prec | MCC | Recall | F.m |
|---|---|---|---|---|---|---|---|---|
| Training Dataset | Dataset 1 | 97 | 1 | 1 | 1 | 1 | 0.969 | 1 |
| Training Dataset | Dataset 2 | 96 | 1 | 1 | 1 | 1 | 0.97 | 1 |
| Training Dataset | Dataset 3 | 96 | 1 | 1 | 1 | 1 | 0.95 | 1 |
| Testing Dataset | Dataset 1 | 98 | 1 | 1 | 1 | 1 | 0.96 | 1 |
| Testing Dataset | Dataset 2 | 95 | 1 | 1 | 1 | 1 | 0.93 | 1 |
| Testing Dataset | Dataset 3 | 97 | 1 | 1 | 1 | 1 | 0.96 | 1 |
Result of independent test based on Probabilistic Neural Network.
| Subset | Dataset | ACC | Sn | Sp | Prec | MCC | Recall | F.m |
|---|---|---|---|---|---|---|---|---|
| Training Dataset | Dataset 1 | 50.8 | 0.1 | | 1 | 0.6 | 0.115 | 0.20 |
| Training Dataset | Dataset 2 | 52.6 | 0.7 | 0.3 | 0.53 | 0.05 | 0.75 | 0.62 |
| Training Dataset | Dataset 3 | 51.33 | 1 | 0.1 | 1 | 0.5 | 0.97 | 0.98 |
| Testing Dataset | Dataset 1 | 50.8 | 1 | | 1 | 0.6 | 0.92 | 0.96 |
| Testing Dataset | Dataset 2 | 54.0 | 0.2 | 0.9 | 0.54 | 0.06 | 0.19 | 2.86 |
| Testing Dataset | Dataset 3 | 51.03 | 1 | 0.9 | 1 | 0.6 | 0.07 | 0.13 |
Figure 13. Independent Random Forest ROC curve.
Figure 14. Independent Probabilistic Neural Network ROC curve.
Performance of proposed model based on RF and PNN.
| Classifier | Dataset | ACC | Sn | Sp | Prec | MCC | Recall | F.m |
|---|---|---|---|---|---|---|---|---|
| RF | Dataset 1 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| RF | Dataset 2 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| RF | Dataset 3 | 100 | 1 | 1 | 1 | 1 | 1 | 1 |
| PNN | Dataset 1 | 66.83 | 0.72 | 0.6 | 0.65 | 0.36 | 0.95 | 0.72 |
| PNN | Dataset 2 | 60 | 0.26 | 0.93 | 0.81 | 0.21 | 0.26 | 0.40 |
| PNN | Dataset 3 | 57.17 | 0.92 | 0.22 | 0.54 | 0.36 | 0.92 | 0.72 |
Figure 15. ROC curves through Random Forest.
Figure 16. ROC curves through Probabilistic Neural Network.
Comparative analysis of the proposed acetylation model with the existing models.
| Prediction Models | ACC (%) | MCC | Sn (%) | Sp (%) | Prec (%) | F.m (%) |
|---|---|---|---|---|---|---|
| All-Mean JK | 74.64 | 0.4980 | 81.38 | 67.91 | 71.78 | 76.24 |
| iAcet-PseFDA | 77.55 | 0.5883 | 96.41 | 71.26 | 52.79 | 68.23 |
| InterPro | 68.25 | 0.3658 | 71.40 | 65.10 | 67.17 | 69.22 |
| iAcety–SmRF | 100 | 1.0 | 100 | 100 | 100 | 100 |
| iAcety–SmPNN | 66.83 | 0.36 | 72 | 60 | 65 | 72 |