Abstract
BACKGROUND: Ubiquitination is an important post-translational modification of proteins and has been widely investigated. Various experimental and computational methods have been developed to identify ubiquitination sites in protein sequences. This paper explores machine learning methods for predicting ubiquitination sites from the physicochemical properties (PCPs) of the amino acids in protein sequences.
Year: 2016 PMID: 26940649 PMCID: PMC4778322 DOI: 10.1186/s12859-016-0959-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Diagram of the process for segment-PCP prediction matrix generation
Sequence segment data set list for ubiquitination site prediction
| Data Set # | All Segments | Segments with Ubiquitination Central K Site | Segments with Non-Ubiquitination Central K Site | Data Sources |
|---|---|---|---|---|
| 1 | 300 | 150 | 150 | [ |
| 2 | 6838 | 3419 | 3419 | [ |
| 3 | 12236 | 6118 | 6118 | [ |
| 4 | 4608 | 263 | 4345 | [ |
| 5 | 3651 | 131 | 3520 | [ |
| 6 | 676 | 37 | 639 | [ |
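The data sets above consist of sequence segments with a lysine (K) at the center. As a rough illustration of what such a segment is, the sketch below cuts fixed-width windows around every K in a sequence. The function name `extract_k_segments`, the `half_window` parameter, the rule of skipping lysines too close to a terminus, and the toy sequence are all assumptions made for illustration; this record does not state the window size or the edge handling used in the paper.

```python
def extract_k_segments(sequence, half_window):
    """Return (index, segment) pairs for every lysine (K) that has a full window.

    half_window is a hypothetical parameter: the number of residues kept on
    each side of the central K.
    """
    segments = []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        start = i - half_window
        end = i + half_window + 1
        # Assumption: skip lysines too close to either terminus to fill the window.
        if start < 0 or end > len(sequence):
            continue
        segments.append((i, sequence[start:end]))
    return segments


# Hypothetical toy sequence, not taken from any of the data sets above.
toy = "MAKTLQKAGE"
for index, segment in extract_k_segments(toy, half_window=2):
    print(index, segment)
```

Each returned segment has length `2 * half_window + 1` with K in the middle position, so every segment can later be mapped residue by residue to numeric property values.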
Fig. 2 DAG model of naïve Bayes
Fig. 3 Illustrative example of EBMC
AUROC of ubiquitination prediction results from different methods based on different segment-PCP data sets
| Data Set # | EBMC | NB | FSNB | MANB | SVM | LR | LASSO |
|---|---|---|---|---|---|---|---|
| 1 | 0.6714 | 0.5289 | 0.5613 | 0.5545 | 0.6597 | 0.7244 | 0.6933 |
| 2 | 0.6467 | 0.5330 | 0.5582 | 0.5502 | 0.6035 | 0.6410 | 0.6041 |
| 3 | 0.6667 | 0.5141 | 0.5633 | 0.5192 | 0.6102 | 0.6476 | 0.6129 |
| 4 | 0.6646 | 0.6036 | 0.6193 | 0.6108 | 0.6670 | 0.7200 | 0.5000 |
| 5 | 0.6373 | 0.5505 | 0.5637 | 0.5804 | 0.6763 | 0.7235 | 0.5000 |
| 6 | 0.6001 | 0.5134 | 0.4838 | 0.5690 | 0.5758 | 0.5546 | 0.5000 |
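For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive segment receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below implements that pairwise definition in plain Python; the function name `auroc` and the toy labels and scores are illustrative assumptions, not data from the paper. It also shows that a predictor returning the same score for every segment yields exactly 0.5, which is the value reported for LASSO on data sets 4 to 6; whether that is the cause in the paper is not stated in this record.

```python
def auroc(labels, scores):
    """AUROC as P(score of positive > score of negative), ties counted as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical labels and scores).
labels = [1, 1, 0, 0, 0]
informative = [0.9, 0.4, 0.5, 0.2, 0.1]
constant = [0.3] * 5  # a predictor that gives every segment the same score

print(auroc(labels, informative))
print(auroc(labels, constant))
```

The pairwise loop is quadratic in the number of segments, which is fine for a toy example; a rank-based formulation would be preferable for the larger data sets.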
Statistical analysis and comparisons between EBMC and other methods for ubiquitination site prediction
| Data Type | | All | Balanced | UnBalanced | Large-scale |
|---|---|---|---|---|---|
| EBMC : NB | Outperformance % | 20.12 | 25.87 | 14.25 | 19.22 |
| | p-value | 0.0007 | 0.0072 | 0.0118 | 0.0132 |
| EBMC : FSNB | Outperformance % | 16.37 | 18.09 | 14.80 | 13.65 |
| | p-value | 0.0004 | 0.0047 | 0.0628 | 0.0082 |
| EBMC : MANB | Outperformance % | 15.18 | 22.47 | 8.03 | 16.14 |
| | p-value | 0.0056 | 0.0146 | 0.0284 | 0.0271 |
| EBMC : SVM | Outperformance % | 2.77 | 6.17 | −0.64 | 2.57 |
| | p-value | 0.3108 | 0.0950 | 0.7854 | 0.5527 |
| EBMC : LR | Outperformance % | −2.48 | −1.06 | −3.80 | −3.94 |
| | p-value | 0.3687 | 0.7238 | 0.5051 | 0.3268 |
| EBMC : LASSO | Outperformance % | 15.51 | 4.32 | 26.80 | 19.05 |
| | p-value | 0.0359 | 0.3800 | 0.0189 | 0.0461 |
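This record does not define "Outperformance %". One plausible reading, offered here only as an assumption, is the mean relative AUROC gain of EBMC over the other method, averaged over the data sets in each group. The sketch below applies that reading to the AUROC values tabulated earlier. The function name `mean_relative_gain` and the group definitions are illustrative; "UnBalanced" is taken to mean data sets 4 to 6, whose two classes differ in size in the segment table. Under this assumption the EBMC : NB and EBMC : LR entries for All and UnBalanced agree with the table to two decimals, while the Balanced entries come out slightly different, so the paper's exact definition may differ.

```python
# AUROC values copied from the AUROC table above (data sets 1 to 6, in order).
AUROC = {
    "EBMC":  [0.6714, 0.6467, 0.6667, 0.6646, 0.6373, 0.6001],
    "NB":    [0.5289, 0.5330, 0.5141, 0.6036, 0.5505, 0.5134],
    "FSNB":  [0.5613, 0.5582, 0.5633, 0.6193, 0.5637, 0.4838],
    "MANB":  [0.5545, 0.5502, 0.5192, 0.6108, 0.5804, 0.5690],
    "SVM":   [0.6597, 0.6035, 0.6102, 0.6670, 0.6763, 0.5758],
    "LR":    [0.7244, 0.6410, 0.6476, 0.7200, 0.7235, 0.5546],
    "LASSO": [0.6933, 0.6041, 0.6129, 0.5000, 0.5000, 0.5000],
}

# Group definitions are assumptions based on the class counts in the segment table.
ALL = [1, 2, 3, 4, 5, 6]
UNBALANCED = [4, 5, 6]


def mean_relative_gain(method, baseline, data_sets):
    """Mean of (AUROC_method - AUROC_baseline) / AUROC_baseline, in percent."""
    gains = []
    for ds in data_sets:
        m = AUROC[method][ds - 1]
        b = AUROC[baseline][ds - 1]
        gains.append((m - b) / b)
    return 100.0 * sum(gains) / len(gains)


for other in ["NB", "FSNB", "MANB", "SVM", "LR", "LASSO"]:
    print(other,
          round(mean_relative_gain("EBMC", other, ALL), 2),
          round(mean_relative_gain("EBMC", other, UNBALANCED), 2))
```

Averaging per-data-set relative gains gives every data set equal weight regardless of its size, which is one reason the result can differ from a gain computed on pooled predictions.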
Computational time (seconds) of ubiquitination predictions by different methods for different segment-PCP data sets
| Data Set # | EBMC | NB | FSNB | MANB | SVM | LR | LASSO |
|---|---|---|---|---|---|---|---|
| 1 | 5.039 | 1.232 | 2.794 | 4.385 | 0.785 | 0.412 | 1.147 |
| 2 | 270.115 | 9.220 | 571.632 | 43.486 | 34.763 | 17.946 | 3.384 |
| 3 | 586.531 | 11.045 | 1936.617 | 46.816 | 42.151 | 18.622 | 5.152 |
| 4 | 78.142 | 6.443 | 138.030 | 30.780 | 5.249 | 5.746 | 3.337 |
| 5 | 52.869 | 3.775 | 44.711 | 19.314 | 39.540 | 15.836 | 2.011 |
| 6 | 6.928 | 1.700 | 4.166 | 6.662 | 10.229 | 5.273 | 1.126 |
Fig. 4 AUROC comparison of different machine learning methods for ubiquitination prediction using different data sets