Arslan Siraj¹, Dae Yeong Lim¹, Hilal Tayara², Kil To Chong¹,³.
Abstract
Protein ubiquitylation is an essential post-translational modification process that plays a critical role in a wide range of biological functions, and even a degenerative role in certain diseases; consequently, it is a promising target for the treatment of various diseases. Given the importance of protein ubiquitylation, ubiquitylation sites have been identified experimentally by enzymatic approaches, mass spectrometry analysis, and combinations of multidimensional liquid chromatography and tandem mass spectrometry. However, these large-scale experimental screening techniques are time consuming, expensive, and laborious. To overcome these drawbacks, machine learning and deep learning-based predictors have been developed to predict ubiquitylation sites in a timely and cost-effective manner. Several computational predictors have been published for various species; however, predictors tend to be species-specific because the patterns around ubiquitylation sites remain unclear across different species. In this study, we propose a novel approach for predicting plant ubiquitylation sites using a hybrid deep learning model that combines a convolutional neural network (CNN) and long short-term memory (LSTM). The proposed method uses the actual protein sequence and physicochemical properties as inputs to the model and provides more robust predictions. The proposed predictor achieved its best results, with accuracies of 80% and 81% and F-scores of 79% and 82% on 10-fold cross-validation and an independent dataset, respectively. We also compared performance on the independent dataset with that of popular ubiquitylation predictors; the results demonstrate that our model significantly outperforms the other methods.
Keywords: CNN; LSTM; deep learning; post-translational modification; ubiquitylation
Year: 2021 PMID: 34064731 PMCID: PMC8151217 DOI: 10.3390/genes12050717
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Proposed model architecture.
Proposed Model Layer Hyperparameter Details.
| Layers | Hyperparameter Settings | Output Shape |
|---|---|---|
| Input_1 | shape = (31) | (31) |
| Embedding | input dim = 22; output dim = 32; input shape = (31) | (31, 32) |
| LSTM | units = 32; kernel reg = L2 (1 × 10^…); recurrent reg = L2 (1 × 10^…); bias reg = L2 (1 × 10^…) | (31, 32) |
| Dropout | rate = 0.2 | (31, 32) |
| MaxPooling1D | pool size = 2 | (15, 32) |
| Flatten_1 | flattens the matrix | (480) |
| Input_2 | shape = (31, 5) | (31, 5) |
| Conv1D | filters = 16; kernel_size = 3; activation = relu | (29, 16) |
| MaxPooling1D | pool size = 2 | (14, 16) |
| Dropout | rate = 0.2 | (14, 16) |
| Flatten_2 | flattens the matrix | (224) |
| Concatenate | concatenates Flatten_1 and Flatten_2 | (704) |
| Dense | units = 16; activation = relu | (16) |
| Dropout | rate = 0.4 | (16) |
| Dense | units = 2; activation = softmax | (2) |
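The output shapes in the hyperparameter table follow from simple length arithmetic. The sketch below recomputes them in plain Python, assuming Keras-style defaults that the table does not state explicitly: the LSTM returns one 32-dimensional vector per residue (implied by its (31, 32) output), Conv1D uses "valid" padding with stride 1, and MaxPooling1D uses "valid" padding with a stride equal to the pool size. The helper names are hypothetical.

```python
# Sketch: recompute the output shapes listed in the hyperparameter table.
# Assumptions (not stated in the table): Conv1D uses padding="valid", stride 1;
# MaxPooling1D uses padding="valid" with stride == pool size; the LSTM returns
# the full sequence (one 32-d vector per residue).

def conv1d_len(length: int, kernel_size: int, stride: int = 1) -> int:
    """Output length of a 'valid' 1-D convolution."""
    return (length - kernel_size) // stride + 1


def maxpool1d_len(length: int, pool_size: int) -> int:
    """Output length of 'valid' max pooling with stride equal to pool_size."""
    return (length - pool_size) // pool_size + 1


def branch_shapes(window: int = 31) -> dict:
    # Branch 1: Embedding(22 -> 32) -> LSTM(32, full sequence) -> Dropout
    #           -> MaxPooling1D(2) -> Flatten_1
    lstm_steps, lstm_units = window, 32
    pooled_1 = maxpool1d_len(lstm_steps, 2)        # 31 -> 15
    flat_1 = pooled_1 * lstm_units                 # 15 * 32 = 480

    # Branch 2: (window, 5) features -> Conv1D(16, kernel 3) -> MaxPooling1D(2)
    #           -> Dropout -> Flatten_2
    conv_steps, filters = conv1d_len(window, 3), 16  # 31 -> 29
    pooled_2 = maxpool1d_len(conv_steps, 2)          # 29 -> 14
    flat_2 = pooled_2 * filters                      # 14 * 16 = 224

    return {
        "flatten_1": flat_1,
        "flatten_2": flat_2,
        "concatenate": flat_1 + flat_2,              # 480 + 224 = 704
    }


print(branch_shapes())
```

With the default window of 31, `branch_shapes()` returns 480, 224 and 704 for the two flatten layers and the concatenation, matching the table; passing another window length shows how the width feeding the first Dense layer would change.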
Results of different techniques.
| Models | 10-Fold Cross Validation | | Independent | |
|---|---|---|---|---|
| LSTM-emb | 0.700 | 0.735 | 0.734 | 0.779 |
| CNN-emb | 0.704 | 0.739 | 0.733 | 0.776 |
| BiLSTM-onehot | 0.725 | 0.729 | 0.757 | 0.777 |
| CNN-onehot | 0.719 | 0.731 | 0.748 | 0.778 |
| CNN-onehot-PCA | 0.748 | 0.750 | 0.768 | 0.786 |
| Comb-emb-PCA (UbiComb) | | | | |
| RF-Comb | 0.762 | 0.757 | 0.781 | 0.800 |
Comb-emb-PCA (UbiComb) provided the best results in both 10-fold cross-validation and independent testing.
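Accuracy and F-score, the metrics quoted in the abstract, can be computed from confusion-matrix counts. A minimal sketch, assuming a binary task with 1 = ubiquitylation site and 0 = non-site, and assuming "F-score" means F1 (the harmonic mean of precision and recall). The two-unit softmax output is turned into a label by picking the class with the larger probability. All function names and the example data are hypothetical.

```python
def labels_from_softmax(probs):
    """Pick the class with the larger softmax probability (ties go to class 0)."""
    return [1 if p1 > p0 else 0 for p0, p1 in probs]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(y_true, y_pred):
    # F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R).
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: 4 sites followed by 4 non-sites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [(0.1, 0.9), (0.3, 0.7), (0.4, 0.6), (0.8, 0.2),
         (0.2, 0.8), (0.45, 0.55), (0.9, 0.1), (0.6, 0.4)]
y_pred = labels_from_softmax(probs)           # [1, 1, 1, 0, 1, 1, 0, 0]
print(accuracy(y_true, y_pred))               # 0.625
print(round(f1_score(y_true, y_pred), 3))     # 0.667
```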
Figure 2. ROC-AUC comparison of different techniques: (a) 10-fold cross-validation; (b) independent dataset results.
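ROC-AUC, the quantity compared in Figure 2, equals the probability that a randomly chosen positive fragment receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise (Mann-Whitney) definition. It is quadratic in the number of samples and intended only to illustrate the metric, not as the evaluation code used in the study; `roc_auc` is a hypothetical name.

```python
def roc_auc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of sites from non-sites.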
10-fold cross-validation results for different fragment lengths.
| Fragment | ACC | F-Score | AUC |
|---|---|---|---|
| 21 | 0.762 | 0.753 | 0.833 |
| 23 | 0.754 | 0.744 | 0.835 |
| 25 | 0.767 | 0.759 | 0.848 |
| 27 | 0.774 | 0.760 | 0.853 |
| 29 | 0.779 | 0.763 | 0.854 |
| 31 | | | |
| 33 | 0.780 | 0.769 | 0.859 |
| 35 | 0.782 | 0.773 | 0.854 |
| 37 | 0.771 | 0.770 | 0.856 |
| 39 | 0.788 | 0.777 | 0.856 |
| 41 | 0.777 | 0.763 | 0.855 |
A fragment length of 31 gave the best result.
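A fragment of length 31 can be read as a window of 15 residues on each side of a candidate lysine (K), the residue type to which ubiquitin is typically attached. The sketch below is hypothetical: it assumes windows are centred on every lysine, padded with a gap symbol `-` near the sequence ends, and integer-encoded with 0 for padding, 1 to 20 for the standard amino acids, and 21 for any other residue. That scheme yields 22 distinct indices, which fits the Embedding layer's input dim of 22, but the paper's exact windowing and encoding may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
PAD = "-"                              # hypothetical padding symbol
# Hypothetical index scheme: 0 = padding, 1..20 = standard residues, 21 = other.
INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
UNKNOWN_INDEX = len(AMINO_ACIDS) + 1   # 21, so indices span 0..21 (22 values)


def lysine_windows(sequence: str, window: int = 31):
    """Return (position, fragment) pairs for every lysine, centred in the window."""
    if window % 2 == 0:
        raise ValueError("window length must be odd so the lysine sits in the centre")
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    # Residue sequence[pos] sits at padded[pos + half], the centre of padded[pos:pos + window].
    return [(pos, padded[pos:pos + window])
            for pos, residue in enumerate(sequence) if residue == "K"]


def encode(fragment: str):
    """Map a fragment to integer indices suitable for an embedding layer."""
    return [0 if aa == PAD else INDEX.get(aa, UNKNOWN_INDEX) for aa in fragment]


for pos, frag in lysine_windows("MAKTLLK", window=5):
    print(pos, frag, encode(frag))
# 2 MAKTL [11, 1, 9, 17, 10]
# 6 LLK-- [10, 10, 9, 0, 0]
```

A short window of 5 is used in the example only to keep the output readable; the default of 31 matches the fragment length that performed best in the table.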
Figure 3. AUCs for different fragment lengths.
Comparison of UbiComb with a recent existing predictor.
| Models | 10-Fold Cross Validation | | Independent | |
|---|---|---|---|---|
| Wang et al. | 0.782 | 0.785 | 0.791 | 0.782 |
| UbiComb | | | | |
UbiComb gives improved results in both 10-fold cross-validation and independent testing.
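10-fold cross-validation scores are typically averaged over ten train/test splits of the training data. A minimal sketch of one common way to build those splits, stratified so each fold keeps the overall ratio of sites to non-sites. Whether the study stratified its folds, and with what random seed, is not stated here, so `stratified_kfold` and its parameters are hypothetical.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class's samples round-robin so every fold gets a similar share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    all_idx = set(range(len(labels)))
    for fold in folds:
        test = sorted(fold)
        train = sorted(all_idx - set(fold))
        yield train, test


labels = [1] * 50 + [0] * 50   # toy data: 50 sites, 50 non-sites
for fold_no, (train, test) in enumerate(stratified_kfold(labels)):
    positives = sum(labels[i] for i in test)
    print(fold_no, len(train), len(test), positives)   # e.g. "0 90 10 5"
```

Each fold's training indices would be used to fit a fresh model and its test indices to score it; the per-fold metrics are then averaged.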
Figure 4. AUCs from 10-fold cross-validation.
Figure 5. AUCs from 10-fold cross-validation on the tobacco species dataset.
Independent dataset comparison of UbiComb with existing predictors.
| Models | 10-Fold Cross Validation | | Independent | |
|---|---|---|---|---|
| UbPred | 0.719 | 0.738 | 0.626 | 0.678 |
| iUbiq-Lys | 0.799 | 0.837 | 0.563 | 0.671 |
| Ubisite | 0.752 | 0.794 | 0.596 | 0.681 |
| Deep Ub | 0.683 | 0.703 | 0.674 | 0.687 |
| DeepUbi | 0.739 | 0.741 | 0.733 | 0.734 |
| Wang et al. | 0.756 | 0.767 | 0.733 | 0.749 |
| UbiComb | | | | |
UbiComb gives the best results in both 10-fold cross-validation and independent testing.
Figure 6. AUC for independent testing.