Wangren Qiu1, Chunhui Xu2, Xuan Xiao1, Dong Xu2,3.
Abstract
BACKGROUND: Ubiquitination, as a post-translational modification, is a crucial biological process in cell signaling, apoptosis, and localization. Identification of ubiquitination proteins is of fundamental importance for understanding the molecular mechanisms in biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitination proteins and ubiquitination sites, the vast majority of ubiquitination proteins remain undiscovered, even in well-studied model organisms.
Keywords: Ubiquitination; functional domain; machine learning; protein annotation; random forest; subcellular localization
Year: 2019 PMID: 32476995 PMCID: PMC7235393 DOI: 10.2174/1389202919666191014091250
Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236
Fig. (2) GO enrichment analysis of the training dataset. A, B, and C show the positive datasets for H. sapiens, M. musculus, and A. thaliana, respectively; D, E, and F show the negative datasets in the same species order.
Fig. (3) Network visualization of the GO enrichment analysis of the training dataset.
Fig. (4) Distribution of subcellular localization in the training data.
Fig. (6) ROC curves comparing performance with other prediction tools. AUC indicates the area under the curve.
Performance comparison of PSSM-Grey using 5-fold cross-validation.
| Algorithm | ACC | MCC | SN | SP | ACC | MCC | SN | SP |
|---|---|---|---|---|---|---|---|---|
| KNN | 79.99 | 60.52 | 86.63 | 73.35 | 80.43 | 61.37 | 86.79 | 74.08 |
| SVM | 88.68 | 77.60 | 84.75 | 92.61 | 87.62 | 75.28 | 86.00 | 89.24 |
| RF | 86.21 | 72.44 | 87.24 | 85.19 | 86.19 | 72.41 | 87.24 | 85.15 |
| Average | 84.96 | 70.19 | 86.20 | 83.72 | 84.75 | 69.68 | 86.68 | 82.82 |
Note: The abbreviations in the table are: Accuracy (ACC), Matthews Correlation Coefficient (MCC), Sensitivity (SN) and Specificity (SP).
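The table above is reported under 5-fold cross-validation. As a minimal sketch of what that procedure involves, the data are shuffled, split into five folds, and each fold is held out once for testing while the remaining four are used for training. The function name `k_fold_indices`, the seed, and the sample count are illustrative assumptions; the source does not say how the folds were drawn or whether they were stratified.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not the authors' code.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Example: 10 samples, 5 folds, so each fold holds out 2 samples.
splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(sorted(test_idx), len(train_idx))
```

A classifier would be trained on `train_idx` and scored on `test_idx` in each round, with the per-fold scores then combined into a single figure per metric.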
A comparison of eight features with different algorithms.
| Feature | ACC | MCC | SN | SP |
|---|---|---|---|---|
| GO | 82 81 80 | 64 62 60 | 83 79 79 | 81 83 81 |
| Pfam | 87 85 85 | 75 70 70 | 83 80 83 | 92 90 87 |
| Smart | 73 72 70 | 50 50 42 | 50 49 60 | 95 95 81 |
| PROSITE | 79 78 76 | 60 59 55 | 64 61 67 | 94 94 86 |
| SUPFAM | 75 73 74 | 52 48 50 | 60 56 60 | 90 90 88 |
| InterPro | 88 83 83 | 77 67 67 | 86 79 81 | 91 88 86 |
| PRINTS | 70 70 69 | 49 49 45 | 41 41 43 | 99 99 95 |
| SL* | 80 78 77 | 59 57 54 | 76 78 76 | 83 79 77 |
Note: The abbreviations in the table are: Accuracy (ACC), Matthews Correlation Coefficient (MCC), Sensitivity (SN) and Specificity (SP). Three algorithms were applied: Random Forest (RF), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). * indicates Subcellular Localization (SL).
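For reference, the four metrics named in the note can be computed from the counts of a binary confusion matrix using their standard definitions. The sketch below uses made-up counts, not values from the study, and the function name `binary_metrics` is an assumption for illustration. The tables appear to report these quantities multiplied by 100.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, MCC, SN and SP (as fractions) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sn = tp / (tp + fn)  # sensitivity: fraction of positives recovered
    sp = tn / (tn + fp)  # specificity: fraction of negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "SN": sn, "SP": sp}


# Illustrative counts only (100 positives, 100 negatives).
m = binary_metrics(tp=80, fp=10, tn=90, fn=20)
print({name: round(100 * value, 2) for name, value in m.items()})
```

Unlike ACC, MCC stays informative when one class is predicted far better than the other, which is why rows with very high SP but low SN (such as PRINTS) show a much lower MCC than ACC.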
A comparison of the performance of individual and combined features on the training data.
| Feature | ACC | MCC | SN | SP | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 GO | 82.18 | 64.39 | 83.29 | 81.08 | 81.52 | 83.29 |
| 2 Pfam | 87.35 | 75.01 | 82.88 | 91.83 | 91.03 | 82.88 |
| 3 Smart | 72.62 | 50.45 | 50.49 | 94.75 | 90.59 | 50.49 |
| 4 PROSITE | 78.69 | 60.24 | 63.50 | 93.89 | 91.23 | 63.50 |
| 5 SUPFAM | 75.01 | 52.30 | 60.43 | 89.59 | 85.34 | 60.43 |
| 6 InterPro | 88.42 | 76.91 | 86.25 | 90.59 | 90.16 | 86.25 |
| 7 PRINTS | 70.07 | 49.06 | 41.31 | 98.83 | 97.24 | 41.31 |
| 8 SL | 79.64 | 59.49 | 76.00 | 83.28 | 82.02 | 76.00 |
| 9 PSSM | 86.19 | 72.40 | 87.34 | 85.04 | 85.37 | 87.34 |
| Feature(1-8) | 89.74 | 79.53 | 87.85 | 91.63 | 91.30 | 87.85 |
| Feature(1-9) | 90.13 | 80.34 | 87.99 | 92.28 | 91.93 | 87.99 |
Note: The abbreviations in the table are: Accuracy (ACC), Matthews Correlation Coefficient (MCC), Sensitivity (SN) and Specificity (SP). “Feature(1-8)” indicates that the first 8 features were applied, and “Feature(1-9)” means that all features were applied.
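The note does not define the last two columns of the table. The final column repeats SN in every row, which is what recall would give, and the fifth column is numerically close to precision if the positive and negative sets are the same size. That balance is an assumption made here for the check, not something stated in the source. A small sketch of the arithmetic, with the hypothetical helper `precision_from_sn_sp`:

```python
def precision_from_sn_sp(sn, sp, pos_fraction=0.5):
    """Precision implied by sensitivity and specificity.

    pos_fraction is the share of positive samples; 0.5 assumes a
    balanced dataset (an assumption, not taken from the source).
    """
    tp = sn * pos_fraction                  # true-positive rate mass
    fp = (1.0 - sp) * (1.0 - pos_fraction)  # false-positive rate mass
    return tp / (tp + fp)


# SN and SP for the Pfam and InterPro rows, expressed as fractions.
pfam = precision_from_sn_sp(0.8288, 0.9183)
interpro = precision_from_sn_sp(0.8625, 0.9059)
print(round(100 * pfam, 2), round(100 * interpro, 2))
```

Both results agree with the fifth column for those rows to two decimal places, which supports reading that column as precision. Other rows match less exactly, so the agreement is suggestive rather than conclusive.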
A comparison of the performance of individual and combined features on the test data.
| Feature | ACC | MCC | SN | SP | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 GO | 77.45 | 57.40 | 91.63 | 63.27 | 71.51 | 91.63 |
| 2 Pfam | 82.39 | 65.06 | 78.10 | 86.67 | 85.49 | 78.10 |
| 3 Smart | 72.61 | 47.70 | 56.80 | 88.43 | 83.13 | 56.80 |
| 4 PROSITE | 78.14 | 56.86 | 71.05 | 85.23 | 82.79 | 71.05 |
| 5 SUPFAM | 72.71 | 45.78 | 66.80 | 78.63 | 75.81 | 66.80 |
| 6 InterPro | 80.16 | 60.40 | 81.76 | 78.56 | 79.31 | 81.76 |
| 7 PRINTS | 70.78 | 47.54 | 46.54 | 95.03 | 90.37 | 46.54 |
| 8 SL | 74.35 | 49.35 | 80.46 | 68.24 | 72.06 | 80.46 |
| 9 PSSM | 85.72 | 71.45 | 86.21 | 85.23 | 85.40 | 86.21 |
| Feature(1-8) | 86.27 | 72.57 | 87.25 | 85.29 | 85.58 | 87.25 |
| Feature(1-9) | 87.71 | 75.43 | 87.91 | 87.52 | 87.57 | 87.91 |
Note: The abbreviations in the table are: Accuracy (ACC), Matthews Correlation Coefficient (MCC), Sensitivity (SN) and Specificity (SP). “Feature(1-8)” indicates that the first 8 features were applied, and “Feature(1-9)” means that all features were applied.