Md Easin Arafat1, Md Wakil Ahmad1, S M Shovan2, Abdollah Dehzangi3,4, Shubhashis Roy Dipta1, Md Al Mehedi Hasan2, Ghazaleh Taherzadeh5, Swakkhar Shatabda1, Alok Sharma6,7,8,9.
Abstract
Post-translational modification (PTM) is defined as the alteration of a protein sequence upon interaction with different macromolecules after the translation process. Glutarylation is considered one of the most important PTMs and is associated with a wide range of cellular functions, including metabolism, translation, and specific subcellular localizations. During the past few years, a wide range of computational approaches has been proposed to predict Glutarylation sites. However, despite these efforts, the performance of Glutarylation site prediction has remained limited. One of the main challenges in tackling this problem is extracting features with significant discriminatory information. To address this issue, we propose a new machine learning method called BiPepGlut, which uses a bi-peptide-based evolutionary method for feature extraction. To build this model, we use the Extra-Trees (ET) classifier, which, to the best of our knowledge, has never been used for this task. Our results demonstrate that BiPepGlut significantly outperforms previously proposed models for this problem. BiPepGlut achieves 92.0%, 84.8%, 95.6%, 0.82, and 0.88 in accuracy, sensitivity, specificity, Matthews correlation coefficient, and F1-score, respectively. BiPepGlut is implemented as a publicly available online predictor.
Keywords: bi-peptide evolutionary features; extra-trees classifier; lysine Glutarylation; machine learning; post-translational modification
Year: 2020 PMID: 32878321 PMCID: PMC7565944 DOI: 10.3390/genes11091023
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Flow chart of the general architecture of BiPepGlut. The positive and negative sites were obtained from a public database. Features were extracted using the bi-peptide-based evolutionary feature extraction technique, and the useful features were then selected. The Extra-Trees (ET) classifier was then trained on these features and evaluated using 10-fold cross-validation and an independent test set.
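The 10-fold cross-validation step in this workflow can be illustrated with a minimal, standard-library sketch. It assumes a stratified split (class ratios kept roughly equal across folds), which this record does not confirm; the function name `stratified_k_fold` and the toy labels are hypothetical. In practice a library such as scikit-learn (`StratifiedKFold`, `ExtraTreesClassifier`) would typically handle both the splitting and the classifier.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class ratios roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share per class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: 1 = glutarylated lysine, 0 = non-glutarylated lysine.
labels = [1] * 40 + [0] * 60
splits = list(stratified_k_fold(labels, k=10))
```

Each of the ten folds serves once as the held-out set while the classifier is trained on the remaining nine; the reported cross-validation metrics would then be aggregated over the ten held-out folds.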
Figure 2. Illustration of lysine residues with their surrounding upstream and downstream amino acids. (a) A lysine residue with sufficient neighboring amino acids. (b) A lysine residue near the N-terminus or C-terminus, where dummy residues are added because the upstream or downstream segment has insufficient neighboring amino acids.
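The padding scheme in Figure 2 can be sketched as follows. This is an illustration only: the flank length (`flank`), the dummy residue symbol (`"X"`), and the function name `lysine_windows` are assumptions, since this record does not state the window size or the symbol used.

```python
def lysine_windows(sequence, flank=10, pad="X"):
    """Return (1-based position, window) for every lysine (K) in the sequence.

    Each window holds `flank` residues upstream and downstream of the lysine;
    missing neighbors near either terminus are filled with the dummy residue.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # The lysine sits at index i + flank in the padded string,
            # so the window spans padded[i] .. padded[i + 2 * flank].
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


# Toy sequence with one lysine near each terminus.
example = lysine_windows("MKAAK", flank=2)
```

With `flank=2`, the lysine at position 2 receives one dummy residue upstream and the lysine at position 5 receives two downstream, so every window has the same length regardless of where the site lies.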
Performance metrics used for comparing classifiers, based on 10-fold cross-validation.
| Classifier | ACC (%) | SN (%) | SP (%) | MCC | F1-Score |
|---|---|---|---|---|---|
| | 80.2% | 63.4% | 96.9% | 0.64 | 0.76 |
| | 79.7% | 67.8% | 91.5% | 0.61 | 0.76 |
| | 82.9% | 74.2% | 91.5% | 0.67 | 0.81 |
| | 79.2% | 74.7% | 83.8% | 0.59 | 0.78 |
| ET (BiPepGlut) | 81.5% | 70.0% | 92.9% | 0.64 | 0.79 |
| | 78.7% | 75.4% | 82.0% | 0.58 | 0.78 |
Performance metrics used for comparing classifiers, based on the independent test set.
| Classifier | ACC (%) | SN (%) | SP (%) | MCC | F1-Score |
|---|---|---|---|---|---|
| | 79.6% | 41.3% | 98.9% | 0.54 | 0.58 |
| | 80.3% | 45.7% | 97.8% | 0.55 | 0.61 |
| | 91.2% | 78.3% | 97.8% | 0.80 | 0.86 |
| | 85.4% | 76.1% | 90.1% | 0.67 | 0.78 |
| ET (BiPepGlut) | 92.0% | 84.8% | 95.6% | 0.82 | 0.88 |
| | 84.7% | 76.1% | 88.0% | 0.64 | 0.76 |
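The five metrics reported here (accuracy, sensitivity, specificity, Matthews correlation coefficient, and F1-score) all derive from the four cells of a binary confusion matrix. The sketch below uses their standard definitions. The counts passed in at the end are hypothetical: they were chosen only because they round to the 92.0%, 84.8%, 95.6%, 0.82, and 0.88 reported for BiPepGlut, and this record does not state the actual composition of the test set.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute ACC, SN, SP, MCC and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sn = tp / (tp + fn)  # sensitivity (recall on positive sites)
    sp = tn / (tn + fp)  # specificity (recall on negative sites)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"acc": acc, "sn": sn, "sp": sp, "mcc": mcc, "f1": f1}


# Hypothetical counts (assumption, not taken from the paper).
m = binary_metrics(tp=39, fn=7, tn=87, fp=4)
```

Because MCC uses all four cells, it stays informative when negative sites outnumber positive ones, which is why it is commonly reported alongside accuracy for PTM site prediction.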
Figure 3. Feature importance of the 210 features selected for our model development.
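The abstract describes the features as bi-peptide-based and evolutionary. One widely used formulation of such features is the bigram profile of a position-specific scoring matrix (PSSM), in which the scores of each pair of consecutive positions are multiplied and summed over the sequence. The sketch below shows that formulation on a toy matrix; whether BiPepGlut uses exactly this definition, or any normalization of the PSSM, is an assumption not confirmed by this record, and the function name `bigram_features` is hypothetical.

```python
def bigram_features(pssm):
    """Flattened bigram profile of a PSSM.

    pssm: list of rows, one per sequence position, each a list of
    per-amino-acid scores. Entry (m, n) of the result is the sum over
    consecutive positions i of pssm[i][m] * pssm[i + 1][n].
    """
    n_cols = len(pssm[0])
    feats = [[0.0] * n_cols for _ in range(n_cols)]
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                feats[m][n] += row[m] * next_row[n]
    # Flatten row by row into a single feature vector.
    return [value for line in feats for value in line]


# Toy 3-position, 2-column matrix (a real PSSM has 20 columns).
toy = bigram_features([[1, 2], [3, 4], [5, 6]])
```

With the usual 20 amino-acid columns this yields a 400-dimensional vector per site, from which a subset could then be selected by importance.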
Figure 4. Receiver operating characteristic (ROC) curves using 10-fold cross-validation.
Figure 5. Receiver operating characteristic (ROC) curves using the independent test set. The area under the curve (AUC) for each algorithm is indicated in parentheses.
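The AUC quoted for each ROC curve can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative site. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the toy scores are hypothetical, and the quadratic loop is meant for clarity rather than efficiency.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```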
Comparison of the performance of BiPepGlut with existing Glutarylation predictors using 10-fold cross-validation.
| Predictor Tool | ACC (%) | SN (%) | SP (%) | MCC | F1-Score |
|---|---|---|---|---|---|
| | 74.9% | 64.8% | 76.6% | 0.32 | 0.43 |
| | | 50.4% | | 0.51 | - |
| | 75.0% | | 68.0% | 0.50 | 0.73 |
| BiPepGlut | 81.5% | 70.0% | 92.9% | 0.64 | 0.79 |
Comparison of the performance of BiPepGlut with existing Glutarylation predictors using the independent test set.
| Predictor Tool | ACC (%) | SN (%) | SP (%) | MCC | F1-Score |
|---|---|---|---|---|---|
| | 75.4% | 51.8% | 78.5% | 0.22 | 0.33 |
| | 88.5% | 51.4% | 95.3% | 0.52 | - |
| | 72.0% | 73.0% | 70.0% | 0.43 | 0.72 |
| BiPepGlut | 92.0% | 84.8% | 95.6% | 0.82 | 0.88 |
Figure 6. Bar plot comparing the results of our model, BiPepGlut, with those of GlutPred [22], iGlu-Lys [23], and RF-GlutarySite [25].
Figure 7. Screenshot of the BiPepGlut homepage.