Hongli Fu, Yingxi Yang, Xiaobo Wang, Hui Wang, Yan Xu.
Abstract
BACKGROUND: Protein ubiquitination occurs when the ubiquitin protein attaches to a lysine (K) residue of a target protein. In eukaryotes it is an important regulator of many cellular functions, such as signal transduction, cell division, and immune reactions. Experimental and clinical studies have shown that ubiquitination plays a key role in several human diseases, and recent advances in proteomic technology have spurred interest in identifying ubiquitination sites. However, most current computational tools for predicting target sites are based on small-scale data and shallow machine learning algorithms.
Keywords: Convolutional neural networks; Deep learning; Ubiquitination
Year: 2019 PMID: 30777029 PMCID: PMC6379983 DOI: 10.1186/s12859-019-2677-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The hyperparameter values selected by tuning
| Hyperparameter | Preferred Setting |
|---|---|
| Embedding length | 21 |
| Batch size | 65 |
| Maximum epoch | 30 |
| Convolution blocks | ([2–6], 64, ReLU) |
| Fully connected layer units | 128 |
| Cutoff | 0.5 |
| Dropout | 0.7 |
| Learning rate | 0.01 with decay rate 0.95 |
| Regularization | L2 |
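The settings in this table can be collected into a small configuration, sketched below. The key names and both helper functions are hypothetical. The sketch also makes three assumptions that the table does not confirm: that "[2–6]" means filter sizes 2 through 6 with 64 filters each, that the 0.95 decay is applied once per epoch, and that a score equal to the cutoff counts as positive. The table does not say whether the dropout value of 0.7 is a drop rate or a keep probability, so it is stored as given.

```python
# Hypothetical configuration mirroring the table; key names are illustrative.
CONFIG = {
    "embedding_length": 21,
    "batch_size": 65,
    "max_epochs": 30,
    "filter_sizes": [2, 3, 4, 5, 6],  # assumed reading of "[2–6]"
    "filters_per_size": 64,           # assumed reading of "64"
    "activation": "relu",
    "fc_units": 128,
    "cutoff": 0.5,
    "dropout": 0.7,                   # drop rate vs. keep probability not stated
    "initial_lr": 0.01,
    "lr_decay": 0.95,
    "regularization": "l2",
}


def learning_rate(epoch, initial_lr=CONFIG["initial_lr"], decay=CONFIG["lr_decay"]):
    """Exponentially decayed learning rate, assuming one decay step per epoch."""
    return initial_lr * decay ** epoch


def predict_label(probability, cutoff=CONFIG["cutoff"]):
    """Map a predicted probability to a class label (1 = ubiquitination site)."""
    return 1 if probability >= cutoff else 0
```

Under these assumptions the learning rate falls from 0.01 at epoch 0 to 0.0095 at epoch 1.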
The results of 4-, 6-, 8-, and 10-fold cross-validation with the One-Hot feature
| Cross-Validation | Acc (%) | Sn (%) | Sp (%) | AUC | MCC |
|---|---|---|---|---|---|
| 4-fold | 86.86 | 84.33 | 89.57 | 0.8787 | 0.74 |
| 6-fold | 88.47 | 86.65 | 90.43 | 0.8853 | 0.77 |
| 8-fold | 88.06 | 88.26 | 87.84 | 0.8884 | 0.76 |
| 10-fold | 89.58 | 87.65 | 91.65 | 0.8905 | 0.79 |
Fig. 1 ROC curves of different cross-validations. ROC curves and their AUC values for 4-, 6-, 8-, and 10-fold cross-validation with the One-Hot encoding scheme
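The AUC summarizes a ROC curve as the probability that a randomly chosen positive fragment scores higher than a randomly chosen negative one. A minimal sketch of that rank-based definition follows; it is a generic illustration, not the evaluation code used in the study, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(P * N), which is fine for
    illustration but slow for large datasets.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```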
The results of four different encoding schemes in the 10-fold cross-validation
| Features | Acc (%) | Sn (%) | Sp (%) | AUC | MCC |
|---|---|---|---|---|---|
| One-Hot | 89.58 | 87.65 | 91.65 | 0.8905 | 0.79 |
| One-Hot + CKSAAP | 88.98 | 89.80 | 88.10 | 0.9066 | 0.78 |
| One-Hot + PseAAC | 86.05 | 87.58 | 84.41 | 0.8847 | 0.72 |
| One-Hot + IPCP | 86.41 | 83.44 | 89.61 | 0.8932 | 0.73 |
Fig. 2 ROC curves of different feature constructions. ROC curves and their AUC values for the four feature sets in the 10-fold cross-validation. The curves lie very close to each other, which illustrates the robustness of the model
The results for naturally distributed DeepUbi data
| No. of fragments | Acc (%) | Sn (%) | Sp (%) | AUC | MCC | Pos:Neg |
|---|---|---|---|---|---|---|
| 900 | 50.56 | 45.50 | 91.00 | 0.5490 | 0.23 | 1:8 |
| 9000 | 49.56 | 44.46 | 90.30 | 0.6626 | 0.22 | 1:8 |
Comparison of DeepUbi and other ubiquitination prediction tools
| Predictor | No. of positive samples | Acc (%) | Sn (%) | Sp (%) | AUC | MCC |
|---|---|---|---|---|---|---|
| UbiPred | 151 | 84.44 | 83.44 | 85.43 | 0.85 | 0.69 |
| UbPred | 265 | 72.0 | – | – | 0.79 | – |
| UbSite | 385 | 74.5 | 65.5 | 74.8 | – | – |
| CKSAAP_UbSite | 263 | 73.4 | 69.85 | 76.96 | 0.81 | 0.47 |
| UbiProber | 22,192 | – | 37.0 | 90.0 | 0.77 | 0.63 |
| hCKSAAP_UbSite | 9537 | – | – | – | 0.77 | – |
| iUbiq-Lys | 659 | 82.14 | 80.56 | 99.39 | – | 0.50 |
| ESA-UbiSite | 85 | 94.0 | 96.0 | 92.0 | – | 0.92 |
| DeepUbi | 53,999 | 88.98 | 89.80 | 88.10 | 0.91 | 0.78 |
The DeepUbi results for the same number of samples as the other existing tools
| Predictor | Acc (%) | Sn (%) | Sp (%) | AUC | MCC |
|---|---|---|---|---|---|
| UbiPred | 84.44 | 83.44 | 85.43 | 0.85 | 0.69 |
| DeepUbi | 98.77 | 98.87 | 98.67 | 0.9993 | 0.98 |
| UbPred | 72.0 | – | – | 0.79 | – |
| DeepUbi | 98.51 | 98.45 | 98.57 | 0.9975 | 0.97 |
| UbSite | 74.5 | 65.5 | 74.8 | – | – |
| DeepUbi | 97.99 | 97.79 | 98.18 | 0.9933 | 0.96 |
| CKSAAP_UbSite | 73.4 | 69.85 | 76.96 | 0.81 | 0.47 |
| DeepUbi | 99.19 | 98.96 | 99.42 | 0.9959 | 0.98 |
| UbiProber | – | 37.0 | 90.0 | 0.77 | 0.63 |
| DeepUbi | 91.83 | 90.12 | 93.55 | 0.9093 | 0.84 |
| hCKSAAP_UbSite | – | – | – | 0.77 | – |
| DeepUbi | 94.10 | 92.31 | 95.89 | 0.9289 | 0.88 |
| iUbiq-Lys | 82.14 | 80.56 | 99.39 | – | 0.50 |
| DeepUbi | 98.92 | 98.90 | 98.93 | 0.9913 | 0.98 |
| ESA-UbiSite | 94.0 | 96.0 | 92.0 | – | 0.92 |
| DeepUbi | 95.59 | 95.53 | 95.65 | 0.9947 | 0.91 |
Fig. 3 Sequence analysis charts for ubiquitination and non-ubiquitination peptides. a A bar chart comparing the number of flanking amino acids surrounding the ubiquitination and non-ubiquitination peptides. b A circular chart comparing the percentage of flanking amino acids surrounding the ubiquitination and non-ubiquitination peptides. c Differences between ubiquitination and non-ubiquitination peptides, calculated and visualized with the Two Sample Logos web server
Fig. 4 Flow chart of the data collection and processing. First, the raw proteins are collected and redundant protein sequences are removed with CD-Hit; second, the protein sequences are cut with a sliding window of length 31 to obtain the positive and negative fragments; finally, a 30% identity threshold is applied to the negative samples to obtain balanced training data
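The sliding-window step can be sketched as follows: every lysine becomes the center of a 31-residue fragment, labeled positive if the site is annotated as ubiquitinated and negative otherwise. The function name, the 0-based site indices, and padding sequence ends with the symbol "X" are assumptions made for illustration; the caption does not say how fragments near the termini are handled.

```python
def extract_fragments(sequence, positive_sites, window=31, pad="X"):
    """Split a protein into lysine-centered fragments.

    positive_sites: set of 0-based indices of annotated ubiquitinated lysines.
    pad: placeholder used beyond the sequence ends (an assumption).
    """
    half = window // 2
    padded = pad * half + sequence + pad * half
    positives, negatives = [], []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        # In the padded string, residue i sits at index i + half,
        # so this slice is centered on the lysine.
        fragment = padded[i:i + window]
        if i in positive_sites:
            positives.append(fragment)
        else:
            negatives.append(fragment)
    return positives, negatives
```

With a window of 5 for brevity, `extract_fragments("MKAAK", {1}, window=5)` returns `(["XMKAA"], ["AAKXX"])`.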
Fig. 5 a Flow chart of the CNN deep learning model. b An example of the convolution-pooling structure. a A fragment is input and encoded; an embedding layer is constructed; multiple convolution-pooling layers are built; fully connected layers are constructed; and the output is obtained. b Filters of different sizes produce a series of feature maps; max-pooling is applied to each map and the results are concatenated to form a feature vector. Finally, the softmax function is used to obtain the classification
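The convolution-pooling idea in panel b can be illustrated in plain Python: a fragment is one-hot encoded, each filter slides along the sequence, ReLU and max-pooling reduce each feature map to one number, and the pooled values are concatenated. This is a sketch, not the authors' implementation. The 21-symbol alphabet (20 amino acids plus an assumed padding symbol "X") and all function names are illustrative, and real filters would be learned rather than written by hand.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # 20 amino acids + assumed padding symbol


def one_hot(fragment):
    """Encode each residue as a 21-dimensional indicator row.

    Symbols outside ALPHABET become all-zero rows.
    """
    return [[1.0 if aa == sym else 0.0 for sym in ALPHABET] for aa in fragment]


def conv_relu_maxpool(matrix, kernel):
    """Slide one kernel (k rows x 21 columns) along the sequence,
    apply ReLU to each response, and keep the maximum."""
    k = len(kernel)
    best = 0.0
    for start in range(len(matrix) - k + 1):
        total = sum(
            matrix[start + r][c] * kernel[r][c]
            for r in range(k)
            for c in range(len(ALPHABET))
        )
        best = max(best, max(total, 0.0))
    return best


def feature_vector(matrix, kernels):
    """Concatenate the pooled outputs of kernels with different sizes."""
    return [conv_relu_maxpool(matrix, kernel) for kernel in kernels]


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For instance, a hand-written size-2 kernel built from `one_hot("AK")` responds with 2.0 on the fragment "AKA", because the window "AK" matches it exactly.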