Chien-Hsun Huang, Min-Gang Su, Hui-Ju Kao, Jhih-Hua Jhong, Shun-Long Weng, Tzong-Yi Lee.
Abstract
BACKGROUND: The conjugation of ubiquitin to a substrate protein (protein ubiquitylation) proceeds through three sequential steps (E1 activation, E2 conjugation and E3 ligation) and is crucial to the regulation of protein function and activity in eukaryotes. This process typically links the last amino acid of ubiquitin (glycine 76) to a lysine residue of the target protein. High-throughput mass spectrometry-based proteomics has enabled large-scale identification of ubiquitin-conjugated peptides. Hence, a new web resource, UbiSite, was developed to identify ubiquitin-conjugation sites on lysines based on a large-scale proteome dataset.
Year: 2016 PMID: 26818456 PMCID: PMC4895383 DOI: 10.1186/s12918-015-0246-z
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Data statistics in the construction of training dataset and independent testing dataset
| Data set | Data resource | Number of ubiquitylated proteins | Number of ubiquitylated lysines | Number of non-ubiquitylated lysines |
|---|---|---|---|---|
| Training set | hCKSAAP_UbiSite | 2500 | 6118 | 6118 |
| | dbPTM 3.0 | 6259 | 23,949 | 228,441 |
| | mUbiSiDa | 35,494 | 110,695 | 1,217,977 |
| | Combined non-redundant data | 37,647 | 128,026 | 1,317,734 |
| | Non-homologous data (sequence identity ≦ 30 %) | 4828 | 5438 | 12,663 |
| Independent testing set | CPLM 2.0 | 32,429 | 139,950 | 1,109,432 |
| | Non-homologous data (sequence identity ≦ 30 %) | 2894 | 3732 | 10,664 |
Fig. 1 Flowchart of constructing two-layered prediction model based on MDDLogo-identified substrate motifs
Fig. 2 Sequence and structural characteristics of ubiquitin-conjugation sites. (a) Comparison of position-specific amino acid composition between ubiquitylation and non-ubiquitylation sites. (b) Comparison of solvent-accessible surface area between ubiquitylation and non-ubiquitylation sites. (c) Distribution of secondary structure around ubiquitylation sites
Performance evaluation of the investigated features in identifying ubiquitylation sites based on five-fold cross-validation
| Investigated features | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|
| 20D binary coding (20D) | 65.59 % | 67.09 % | 66.64 % | 0.303 |
| Amino Acid Composition (AAC) | 64.34 % | 65.44 % | 65.11 % | 0.275 |
| Amino Acid Pair Composition (AAPC) | 68.70 % | 70.72 % | 70.11 % | 0.367 |
| Position Weight Matrix (PWM) | 68.08 % | 67.99 % | 68.01 % | 0.334 |
| Position-Specific Scoring Matrix (PSSM) | 69.46 % | 70.69 % | 70.32 % | 0.374 |
| Solvent-Accessible Surface Area (SASA) | 64.58 % | 65.47 % | 65.20 % | 0.278 |
| Secondary Structure (SS) | 55.20 % | 60.51 % | 58.91 % | 0.145 |
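The first two feature names in the table can be illustrated with a minimal sketch. It assumes the conventional meaning of the terms: 20D binary coding as a one-hot vector per residue, and AAC as the fraction of each of the 20 amino acids in a sequence window centred on the candidate lysine. The window sequence, its length, and the function names are hypothetical and are not taken from the record above.

```python
# Illustrative sketch only: the window below is a made-up 13-residue peptide,
# and the window length is an assumption, not a value stated in the source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_coding(window):
    """One-hot encode every residue as a 20-dimensional 0/1 vector.

    Residues outside the 20 standard amino acids (e.g. padding) become
    all-zero vectors.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def amino_acid_composition(window):
    """Return the fraction of each standard amino acid in the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


window = "AGSLDEKTPRVQW"  # hypothetical window with the lysine (K) at the centre
one_hot = binary_coding(window)
aac = amino_acid_composition(window)
print(len(one_hot), len(aac))
```

Under these assumptions the binary coding grows with window length (20 values per residue), whereas AAC is always a 20-value vector and discards positional information.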
Performance evaluation of the SVM models trained with various features based on independent testing dataset (3732 ubiquitylation sites and 10,664 non-ubiquitylation sites)
| Training features | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|
| 20D binary coding (20D) | 62.59 % | 65.85 % | 65.00 % | 0.253 |
| Amino Acid Composition (AAC) | 66.37 % | 64.63 % | 65.08 % | 0.274 |
| Amino Acid Pair Composition (AAPC) | 69.05 % | 69.05 % | 69.05 % | 0.340 |
| Position Weight Matrix (PWM) | 73.90 % | 67.29 % | 69.01 % | 0.364 |
| Position-Specific Scoring Matrix (PSSM) | 73.20 % | 68.45 % | 69.68 % | 0.369 |
| Solvent-Accessible Surface Area (SASA) | 63.91 % | 61.36 % | 62.02 % | 0.223 |
| Secondary Structure (SS) | 55.60 % | 51.34 % | 52.45 % | 0.061 |
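The four columns are linked through the confusion matrix, so accuracy and MCC can be recomputed from the reported sensitivity, specificity and class sizes. The sketch below uses the standard definitions of these measures (assumed here, since the record does not spell them out) and the PSSM row of the independent test, with 3732 ubiquitylation and 10,664 non-ubiquitylation sites. The function names are illustrative, and the counts are fractional because they are reconstructed from rounded percentages.

```python
import math


def confusion_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Rebuild (TP, FN, TN, FP) from sensitivity, specificity and class sizes."""
    tp = sensitivity * n_pos
    fn = n_pos - tp
    tn = specificity * n_neg
    fp = n_neg - tn
    return tp, fn, tn, fp


def accuracy(tp, fn, tn, fp):
    """Fraction of all sites that are classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 if the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# PSSM row of the independent test: sensitivity 73.20 %, specificity 68.45 %.
tp, fn, tn, fp = confusion_from_rates(0.7320, 0.6845, 3732, 10664)
acc = accuracy(tp, fn, tn, fp)
m = mcc(tp, fn, tn, fp)
print(f"accuracy = {acc:.4f}, MCC = {m:.3f}")
```

This yields an accuracy of about 69.68 % and an MCC of about 0.369, matching the PSSM row. Because the negative class is roughly three times larger than the positive class, accuracy is dominated by specificity, which is why MCC is the more informative single figure for comparing features here.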
Fig. 3 Tree view of MDDLogo-clustered subgroups with statistically significant motifs for 5438 ubiquitylation sites
Fig. 4 Comparison of independent testing performance between single SVM model and two-layered SVM model
Fig. 5 Case study of identifying ubiquitylation sites on E3 ubiquitin-protein ligase DMA2 of Saccharomyces cerevisiae