Van-Nui Nguyen, Kai-Yao Huang, Chien-Hsun Huang, Tzu-Hao Chang, Neil Bretaña, K Lai, Julia Weng, Tzong-Yi Lee.
Abstract
BACKGROUND: In eukaryotes, ubiquitin conjugation is an important mechanism underlying proteasome-mediated degradation of proteins and, as such, plays an essential role in the regulation of many cellular processes. In the ubiquitin-proteasome pathway, E3 ligases play important roles by recognizing a specific protein substrate and catalyzing the attachment of ubiquitin to a lysine (K) residue. As experimental data on ubiquitin conjugation sites accumulate, it becomes possible to develop prediction models that scale to large data sets. However, no previous work has focused on investigating the substrate specificities of ubiquitination sites. Herein, we present an approach that exploits an iterative statistical method to identify ubiquitin conjugation sites with substrate site specificities.
Year: 2015 PMID: 25707307 PMCID: PMC4331700 DOI: 10.1186/1471-2105-16-S1-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The reaction process of protein ubiquitination.
Figure 2. The system flowchart.
Data statistics of collected ubiquitination sites.
| Resource (data set) | Number of ubiquitinated proteins | Number of ubiquitinated lysines | Number of non-ubiquitinated lysines |
|---|---|---|---|
| Training data set | 6259 | 23949 | 228441 |
| Testing data set | 35494 | 110695 | 1217977 |
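The table counts every lysine as either ubiquitinated (positive) or non-ubiquitinated (negative). One common way to turn such annotations into training instances is to cut a fixed-length fragment centered on each lysine. The sketch below shows that step under stated assumptions: the 10-residue half-window, the `-` padding character, and the function name `extract_lysine_windows` are hypothetical choices for illustration, not settings given in this record.

```python
from typing import List, Set, Tuple

# Hypothetical half-window: the fragment length used by the authors is not
# stated here, so 10 residues on each side is only an assumption.
HALF_WINDOW = 10
PAD = "-"  # placeholder for positions that fall off either end of the protein


def extract_lysine_windows(
    sequence: str, ubi_positions: Set[int], half_window: int = HALF_WINDOW
) -> Tuple[List[str], List[str]]:
    """Split every lysine (K) of a protein into a positive or negative fragment.

    ubi_positions holds 1-based positions of experimentally verified
    ubiquitinated lysines; every other lysine is treated as a negative.
    """
    padded = PAD * half_window + sequence + PAD * half_window
    positives: List[str] = []
    negatives: List[str] = []
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        # Residue i of `sequence` sits at index i + half_window of `padded`,
        # so this slice is centered on the lysine.
        window = padded[i : i + 2 * half_window + 1]
        if (i + 1) in ubi_positions:
            positives.append(window)
        else:
            negatives.append(window)
    return positives, negatives
```

For example, `extract_lysine_windows("MKAKLK", {2}, half_window=2)` yields one positive fragment (`-MKAK`) and two negatives (`KAKLK`, `KLK--`), each with the lysine in the middle.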
Data statistics after redundancy removal with CD-HIT at different sequence identity thresholds (training data set: 6259 proteins; testing data set: 35494 proteins).
| Sequence identity | Training positive | Training negative | Testing positive | Testing negative |
|---|---|---|---|---|
| 100% (original) | 23949 | 228441 | 110695 | 1217977 |
| 90% | 21621 | 196808 | 38739 | 325640 |
| 80% | 21165 | 179691 | 36647 | 284713 |
| 70% | 20709 | 165560 | 35165 | 255134 |
| 60% | 18588 | 115296 | 29810 | 162044 |
| 50% | 10216 | 34428 | 14210 | 41700 |
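CD-HIT removes redundancy by greedy incremental clustering: each sequence is compared with the representatives kept so far and is absorbed into an existing cluster if its identity to a representative reaches the chosen threshold. The simplified stand-in below applies that idea to equal-length fragments. It is not CD-HIT itself: the ungapped, position-by-position identity and the helper names `identity` and `greedy_nonredundant` are assumptions made for illustration.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of positions with identical residues (equal-length fragments only)."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_nonredundant(fragments: List[str], threshold: float) -> List[str]:
    """Keep a fragment only if its identity to every kept representative is below threshold.

    Mirrors the greedy incremental idea behind CD-HIT, but replaces CD-HIT's
    alignment-based identity with a plain ungapped comparison.
    """
    representatives: List[str] = []
    for frag in fragments:
        if all(identity(frag, rep) < threshold for rep in representatives):
            representatives.append(frag)
    return representatives
```

Lowering the threshold removes more fragments, which matches the trend in the table: fewer instances survive at 50% identity than at 90%.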
Figure 3. Composition of amino acids surrounding ubiquitination sites. (A) Comparison of amino acid composition between Ubi-sites (blue) and non-Ubi-sites (red). (B) Position-specific amino acid composition surrounding Ubi-sites.
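Both panels of Figure 3 reduce to counting residues in aligned fragments: panel (A) pools all positions, while panel (B) counts each position separately. A minimal sketch of the two computations follows, assuming equal-length fragments in which `-` marks padding; the function names and the choice to exclude padding from the denominators are assumptions.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(fragments: List[str]) -> Dict[str, float]:
    """Overall frequency of each of the 20 amino acids across all fragments (padding ignored)."""
    counts = Counter(r for frag in fragments for r in frag if r in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def positional_composition(fragments: List[str]) -> List[Dict[str, float]]:
    """Per-position amino acid frequencies for equal-length fragments."""
    profile: List[Dict[str, float]] = []
    for pos in range(len(fragments[0])):
        column = [frag[pos] for frag in fragments if frag[pos] in AMINO_ACIDS]
        counts = Counter(column)
        n = len(column)
        profile.append({aa: (counts[aa] / n if n else 0.0) for aa in AMINO_ACIDS})
    return profile
```

Running `composition` once on positive fragments and once on negative fragments gives the two series compared in panel (A).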
MDD-identified substrate motifs for 2658 Ubi-sites (positive training data).
| MDD Group | Number of Ubi-sites |
|---|---|
| 1 | 98 |
| 2 | 87 |
| 3 | 191 |
| 4 | 574 |
| 5 | 423 |
| 6 | 131 |
| 7 | 122 |
| 8 | 260 |
| 9 | 75 |
| 10 | 697 |
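The ten groups above come from maximal dependence decomposition (MDD), the iterative statistical method mentioned in the abstract, which splits aligned fragments using chi-square tests of dependence between positions. The sketch below follows the classic formulation. For each non-central position i, an indicator (consensus residue versus anything else) is tested against the residues at every other position. The position with the largest summed chi-square statistic is chosen, and the fragments are split by whether they carry the consensus residue there. Recursing on both subsets, stopping on a minimum group size, and omitting a significance cutoff are simplifying assumptions, not the settings that produced these groups.

```python
from collections import Counter
from typing import List, Tuple


def chi_square(xs: List[str], ys: List[str]) -> float:
    """Pearson chi-square statistic for the contingency table of two categorical lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    stat = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n  # always > 0 for observed categories
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat


def mdd_split(fragments: List[str], center: int) -> Tuple[int, str, List[str], List[str]]:
    """One MDD step: return (position, consensus, with_consensus, without_consensus)."""
    length = len(fragments[0])
    best = None
    for i in range(length):
        if i == center:
            continue  # the central lysine is identical in every fragment
        column_i = [f[i] for f in fragments]
        consensus = Counter(column_i).most_common(1)[0][0]
        indicator = ["C" if r == consensus else "N" for r in column_i]
        # Sum the dependence of the consensus indicator on every other position.
        score = sum(
            chi_square(indicator, [f[j] for f in fragments])
            for j in range(length)
            if j not in (i, center)
        )
        if best is None or score > best[0]:
            best = (score, i, consensus)
    _, pos, consensus = best
    with_c = [f for f in fragments if f[pos] == consensus]
    without_c = [f for f in fragments if f[pos] != consensus]
    return pos, consensus, with_c, without_c


def mdd(fragments: List[str], center: int, min_size: int) -> List[List[str]]:
    """Recursively split fragments until a group has at most `min_size` members."""
    if len(fragments) <= min_size:
        return [fragments]
    _, _, with_c, without_c = mdd_split(fragments, center)
    if not with_c or not without_c:
        return [fragments]  # no informative split remains
    return mdd(with_c, center, min_size) + mdd(without_c, center, min_size)
```

Each split strictly shrinks both subsets, so the recursion always terminates, and the groups together contain every input fragment exactly once.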
Performance evaluation by five-fold cross-validation for all data and 10 MDD-clustered subgroups.
| Data set | SEN | SPE | ACC | MCC |
|---|---|---|---|---|
| | 77.55% | 70.59% | 72.85% | 0.453 |
| | 94.32% | 85.56% | 88.43% | 0.764 |
| | 84.82% | 70.53% | 75.17% | 0.519 |
| | 60.80% | 58.24% | 59.07% | 0.178 |
| | 66.90% | 60.68% | 62.70% | 0.258 |
| | 72.52% | 68.13% | 69.55% | 0.382 |
| | 70.49% | 62.60% | 65.16% | 0.310 |
| | 62.31% | 54.90% | 57.30% | 0.161 |
| | 70.67% | 58.97% | 62.77% | 0.278 |
| | 70.30% | 64.44% | 66.34% | 0.326 |
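SEN, SPE, ACC, and MCC are conventionally computed from the confusion matrix as SEN = TP/(TP+FN), SPE = TN/(TN+FP), ACC = (TP+TN)/(TP+TN+FP+FN), and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below assumes these standard definitions. The five-fold assignment (indices shuffled with a fixed seed and dealt round-robin into folds) is an illustrative assumption, not the authors' partitioning.

```python
import math
import random
from typing import List, Sequence, Tuple


def confusion_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Return (SEN, SPE, ACC, MCC) for binary labels where 1 = Ubi-site.

    Assumes both classes are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sen, spe, acc, mcc


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]
```

Because ACC = p·SEN + (1 − p)·SPE for a positive fraction p, each row also reveals its class balance. For example, the first row gives p = (72.85 − 70.59)/(77.55 − 70.59) ≈ 0.32, roughly one positive for every two negatives.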
Independent testing performance for single HMM and MDD-clustered HMMs.
| Models | SEN | SPE | ACC | MCC |
|---|---|---|---|---|
| MDD-Model 1 | 70.16% | 66.61% | 67.83% | 0.351 |
| MDD-Model 2 | 87.70% | 70.47% | 76.41% | 0.553 |
| MDD-Model 3 | 74.78% | 67.70% | 70.14% | 0.405 |
| MDD-Model 4 | 63.27% | 57.35% | 59.39% | 0.196 |
| MDD-Model 5 | 52.74% | 62.13% | 58.90% | 0.143 |
| MDD-Model 6 | 77.50% | 70.12% | 72.66% | 0.454 |
| MDD-Model 7 | 71.66% | 60.22% | 64.16% | 0.303 |
| MDD-Model 8 | 65.17% | 60.57% | 62.16% | 0.245 |
| MDD-Model 9 | 68.07% | 59.14% | 62.22% | 0.259 |
| MDD-Model 10 | 70.65% | 67.14% | 68.35% | 0.360 |
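The independent test compares a single HMM with one HMM per MDD group. The sketch below is a deliberately simplified stand-in, not the paper's HMM configuration. Each group is modeled by match-state emissions only; without insert or delete states, this is equivalent to a position-specific log-odds matrix against a uniform background. A fragment is called a Ubi-site when its best score over all group models reaches a threshold. The max-over-groups rule, the pseudocount of 1, the uniform background, and the threshold are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Profile = List[Dict[str, float]]


def train_profile(fragments: List[str], pseudocount: float = 1.0) -> Profile:
    """Per-position log-odds of each amino acid against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    profile: Profile = []
    for pos in range(len(fragments[0])):
        counts = Counter(f[pos] for f in fragments if f[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile: Profile, fragment: str) -> float:
    """Sum of per-position log-odds; padding or unknown characters contribute zero."""
    return sum(col.get(r, 0.0) for col, r in zip(profile, fragment))


def predict(profiles: List[Profile], fragment: str, threshold: float) -> bool:
    """Call a Ubi-site if the best-matching group profile reaches the threshold."""
    return max(score(p, fragment) for p in profiles) >= threshold
```

Taking the best group score lets a fragment that matches any one substrate motif be detected, even if it resembles the pooled data poorly.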
Figure 4. Comparison of ROC curves between our proposed method and UbiProber.
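A ROC curve plots the true-positive rate against the false-positive rate as the decision threshold sweeps over the predictor's scores, and the area under it (AUC) summarizes the curve in one number. A minimal sketch follows, assuming real-valued scores where higher means more likely ubiquitinated and labels where 1 marks a Ubi-site; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points  # the lowest threshold accepts everything, ending at (1, 1)


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

The trapezoidal AUC equals the probability that a randomly chosen Ubi-site scores higher than a randomly chosen non-Ubi-site, with ties counted as one half. This makes it a threshold-free way to compare two predictors such as those in Figure 4.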
Figure 5. A case study of ubiquitination site prediction on .