Yin Luo, Jiulei Jiang, Jiajie Zhu, Qiyi Huang, Weimin Li, Ying Wang, Yamin Gao.
Abstract
Ubiquitination is one of the most important post-translational modifications of proteins. It is a widespread mechanism for regulating cellular responses in plants, takes part in many biological processes, and is involved in regulating plant disease resistance responses. Predicting ubiquitination sites is therefore an important technique for plant protection. Traditional experimental methods for determining ubiquitination sites are costly and time-consuming, whereas computational methods can predict these sites accurately and efficiently. At present, capsule networks and other deep learning models are each used on their own for this prediction task, and the resulting improvement is limited. A capsule network captures the spatial relationships among the internal features of a neural network, but it cannot identify long-distance dependencies, nor can it focus on individual amino acids in a protein sequence according to their importance. In this study, we combined convolutional neural networks and capsule networks to design a novel deep learning model, "Caps-Ubi." Ubiquitination sites are first characterized with a hybrid encoding that combines one-hot and amino acid continuous type encodings. The Caps-Ubi model then captures the sequence patterns and the dependencies within the encoded protein sequences, focuses on the important amino acids in those sequences, and is used for multispecies ubiquitination site prediction. In our experiments, the proposed Caps-Ubi method outperformed other similar methods in predicting ubiquitination sites.
Keywords: capsule network; hybrid encoding; plant protection; protein ubiquitination; site prediction
Year: 2022; PMID: 35693166; PMCID: PMC9175003; DOI: 10.3389/fpls.2022.884903
Source DB: PubMed; Journal: Front Plant Sci; ISSN: 1664-462X; Impact factor: 6.627
Data on protein ubiquitination sites.
| Dataset | No. of positive data | No. of negative data |
|---|---|---|
| Training | 44,214 | 44,214 |
| Validation | 4,913 | 4,913 |
| Testing | 5,459 | 5,459 |
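The split sizes in the table can be turned into proportions with a few lines of arithmetic. The sketch below uses only the positive-fragment counts from the table (the negative counts are identical); the variable names are illustrative. The resulting shares are consistent with a 90/10 split applied twice, but that reading is an assumption and is not stated in this section.

```python
# Positive-fragment counts copied from the table above.
counts = {"Training": 44_214, "Validation": 4_913, "Testing": 5_459}

total = sum(counts.values())
fractions = {name: n / total for name, n in counts.items()}

# Print each split's share of all positive fragments.
for name, frac in fractions.items():
    print(f"{name}: {counts[name]:,} of {total:,} = {frac:.1%}")
```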
Figure 1. Schematic diagram of one-hot encoding of protein fragments.
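As a rough illustration of what the figure depicts, the following Python sketch one-hot encodes a short protein fragment. The alphabet (20 standard amino acids plus a `-` padding symbol), its ordering, and the handling of unknown residues are assumptions made for this example; the encoding details used by Caps-Ubi are not given in this section.

```python
# Assumed alphabet: the 20 standard amino acids plus "-" for padding.
# The ordering is an illustrative choice, not taken from the paper.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot_encode(fragment):
    """Return one binary row per residue, with a single 1 at the residue's index."""
    rows = []
    for aa in fragment.upper():
        row = [0] * len(ALPHABET)
        # Unknown symbols (e.g. "X") fall back to the padding column in this sketch.
        row[INDEX.get(aa, INDEX["-"])] = 1
        rows.append(row)
    return rows


# A 4-residue fragment becomes a 4 x 21 binary matrix.
encoded = one_hot_encode("MKV-")
```

Each residue maps to a sparse row, so a fragment of length L yields an L x 21 matrix under these assumptions.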
The pseudocode of a dynamic routing mechanism.
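[Pseudocode table, ROUTING procedure: the symbols were lost in extraction. The surviving skeleton is a "for all capsules" initialization step, an iteration loop containing three "for all capsules" steps, and a closing "end for".]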
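For orientation, here is a minimal pure-Python sketch of the standard routing-by-agreement procedure from the capsule network literature (Sabour et al., 2017). It is not a reconstruction of the table above, and the function names, the nested-list layout of `u_hat`, and the default of three iterations are assumptions made for this example.

```python
import math


def squash(s):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing(u_hat, iterations=3):
    """u_hat[i][j] is the prediction vector from lower capsule i for upper capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start at zero
    v = []
    for _ in range(iterations):
        # Coupling coefficients: softmax over upper capsules, per lower capsule.
        c = [softmax(b_i) for b_i in b]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions, then squash to get the output vector.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update: raise the logit where prediction and output align.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Two lower capsules, two upper capsules, 2-dimensional vectors (toy values).
u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[0.9, 0.1], [0.1, 0.0]],
]
v = routing(u_hat)
norms = [math.sqrt(sum(x * x for x in vec)) for vec in v]
```

In this toy input both lower capsules make similar, large predictions for the first upper capsule, so its output vector ends up longer than that of the second.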
Figure 2. Network structure of the proposed model.
Figure 3. Accuracy on the validation set for various window lengths.
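The window length determines how many residues on either side of a candidate lysine are passed to the model. The sketch below shows one common way to cut such fragments, assuming an odd window centered on each lysine (K) and `-` padding at the sequence ends. The function name, the padding symbol, and the window sizes used here are illustrative and are not taken from the paper.

```python
def extract_fragments(sequence, window=31, pad="-"):
    """Return (position, fragment) pairs, one per lysine, each `window` residues long."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the lysine sits at the centre")
    half = window // 2
    seq = sequence.upper()
    padded = pad * half + seq + pad * half
    fragments = []
    for pos, aa in enumerate(seq):
        if aa == "K":
            # Residue `pos` sits at index pos + half in the padded string,
            # so the slice [pos, pos + window) is centred on it.
            fragments.append((pos, padded[pos:pos + window]))
    return fragments


# Toy sequence with lysines at positions 1 and 7 (0-based), window of 7.
fragments = extract_fragments("MKTAYIAKQR", window=7)
```

Repeating the extraction for several window lengths and comparing validation accuracy is the kind of sweep the figure summarizes.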
Comparison of various coding schemes.
| Feature | Model | Acc (%) | Sn (%) | Sp (%) | AUC | MCC |
|---|---|---|---|---|---|---|
| One-hot | CapsNet | 89.51 | 93.70 | 85.31 | 0.96 | 0.80 |
| One-hot | CNN | 84.93 | 86.39 | 82.93 | 0.93 | 0.70 |
| Amino acid continuous | CapsNet | 90.06 | 91.88 | 88.23 | 0.96 | 0.80 |
| Amino acid continuous | CNN | 83.83 | 85.25 | 82.41 | 0.91 | 0.68 |
| One-hot and amino acid continuous | CapsNet | 90.47 | 93.66 | 87.27 | 0.96 | 0.81 |
| One-hot and amino acid continuous | CNN | 84.67 | 82.62 | 86.72 | 0.93 | 0.70 |
Acc, accuracy of the model; Sn, sensitivity of the model; Sp, specificity of the model; AUC, area under curve; MCC, Matthews correlation coefficient.
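The threshold-dependent metrics listed above all follow from the four cells of a binary confusion matrix. The following Python sketch uses the standard definitions; the function name and the example counts are hypothetical and are not data from the study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on positives
    sp = tn / (tn + fp)                    # specificity: recall on negatives
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

On a balanced test set, accuracy equals the mean of sensitivity and specificity, which is a quick sanity check when reading rows of such a table.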
Figure 4. Receiver operating characteristic curves of Caps-Ubi and CNN on the test set.
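The area under a receiver operating characteristic curve can be computed without plotting, as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below implements that pairwise definition in plain Python; the function name and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels and predicted scores for illustration only.
auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.6])
```

This version compares every positive with every negative, so it is quadratic in the number of examples; for test sets of the size used here, a sort-based implementation would be the practical choice.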
Results of testing Caps-Ubi under natural-distribution data.
| Protein fragments | Acc (%) | Sn | Sp | AUC | MCC | Positive–negative ratio |
|---|---|---|---|---|---|---|
| 1,000 | 53.75 | 0.08 | 0.99 | 0.70 | 0.19 | 1:8 |
| 10,000 | 53.30 | 0.12 | 0.95 | 0.59 | 0.12 | 1:8 |
Acc, accuracy of the model; Sn, sensitivity of the model; Sp, specificity of the model; AUC, area under curve; MCC, Matthews correlation coefficient.
Proposed Caps-Ubi compared with other methods.
| Predictor | Acc (%) | Sn (%) | Sp (%) | AUC | MCC |
|---|---|---|---|---|---|
| UbiPred | 84.44 | 83.44 | 85.43 | 0.85 | 0.69 |
| UbSite | 74.5 | 65.5 | 74.8 | – | – |
| CKSAAP_UbSite | 73.4 | 69.85 | 76.96 | 0.81 | 0.47 |
| UbiProber | – | 37.0 | 90.0 | 0.77 | 0.63 |
| iUbiq-Lys | 82.14 | 80.56 | 99.39 | – | 0.50 |
| DeepUbi | 88.98 | 89.8 | 88.10 | 0.91 | 0.78 |
| Caps-Ubi | 91.34 | 93.11 | 89.34 | 0.96 | 0.83 |
Acc, accuracy of the model; Sn, sensitivity of the model; Sp, specificity of the model; AUC, area under curve; MCC, Matthews correlation coefficient.