Chun-Wei Tung, Shinn-Ying Ho.
Abstract
BACKGROUND: Ubiquitylation plays an important role in regulating protein functions. Experimental methods have recently been developed for the effective identification of ubiquitylation sites. To explore undiscovered ubiquitylation sites efficiently, this study aims to develop an accurate sequence-based prediction method for identifying promising ubiquitylation sites.
Year: 2008 PMID: 18625080 PMCID: PMC2488362 DOI: 10.1186/1471-2105-9-310
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Performance comparisons among various classifiers with the three kinds of features: (a) physicochemical property, (b) amino acid identity, and (c) evolutionary information.
Figure 2. The sequence logo of the 151 positive samples with w = 21: (a) information content and (b) frequency plot.
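Figure 2 plots per-position information content. As an illustration, the sketch below computes it as log2(20) minus the Shannon entropy of each aligned column, the standard sequence-logo quantity. It assumes ungapped, equal-length peptides and omits the small-sample correction that logo software may apply, so its values need not match the figure exactly. The toy peptides are hypothetical.

```python
import math
from collections import Counter


def information_content(peptides):
    """Per-position information content, in bits, of equal-length aligned peptides.

    R_i = log2(20) - H_i, where H_i = -sum_a f(a, i) * log2 f(a, i) is the
    Shannon entropy of column i. No small-sample correction is applied.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    n = len(peptides)
    max_bits = math.log2(20)  # 20 standard amino acids
    content = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        content.append(max_bits - entropy)
    return content


# Hypothetical 5-residue peptides centred on lysine (K).
toy = ["AAKLL", "GSKLE", "AAKVE", "MSKLD"]
for pos, bits in enumerate(information_content(toy), start=-2):
    print(f"{pos:+d}: {bits:.3f} bits")
```

A fully conserved column, such as the central lysine here, reaches the maximum of log2(20) ≈ 4.32 bits.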
Figure 3. Performance comparisons between the SVM with informative physicochemical properties (SVM+IPCP) and the other classifiers compared.
Figure 4. The best 10-fold cross-validation (10-CV) accuracies of SVM prediction with window size 21 for various numbers of features (properties) selected by IPMA in 30 independent runs.
The 31 informative physicochemical properties mined by IPMA.
| AAindex identity | Description | MED |
| NADH010102 | Hydropathy scale based on self-information values in the two-state model of 9% accessibility | 31.79 |
| BROC820102 | Retention coefficient in HFBA | 29.80 |
| MEIH800102 | Average reduced distance for side chain | 28.48 |
| LEVM780101 | Normalized frequency of alpha-helix, with weights | 25.17 |
| GUYH850104 | Apparent partition energies calculated from Janin index | 23.84 |
| CORJ870101 | NNEIG index | 23.18 |
| RACS770102 | Average reduced distance for side chain | 22.52 |
| GEOR030108 | Linker propensity from helical (annotated by DSSP) dataset | 22.52 |
| HARY940101 | Mean volumes of residues buried in protein interiors | 21.85 |
| GRAR740102 | Polarity | 19.87 |
| GUYH850105 | Apparent partition energies calculated from Chothia index | 19.87 |
| MEIH800103 | Average side chain orientation angle | 17.88 |
| KRIW790102 | Fraction of site occupied by water | 17.88 |
| LEVM780106 | Normalized frequency of reverse turn, unweighted | 14.57 |
| BULH740102 | Apparent partial specific volume | 13.25 |
| FAUJ880101 | Graph shape index | 11.92 |
| PUNT030102 | Knowledge-based membrane-propensity scale from 3D_Helix in MPtopo databases | 10.60 |
| HUTJ700103 | Entropy of formation | 9.93 |
| EISD840101 | Consensus normalized hydrophobicity scale | 8.61 |
| CEDJ970105 | Composition of amino acids in nuclear proteins (percent) | 7.28 |
| ZIMJ680102 | Bulkiness | 7.28 |
| CEDJ970103 | Composition of amino acids in membrane proteins (percent) | 5.96 |
| CHOC760103 | Proportion of residues 95% buried | 5.30 |
| CEDJ970102 | Composition of amino acids in anchored proteins (percent) | 5.30 |
| ROSM880102 | Side chain hydropathy, corrected for solvation | 4.64 |
| BROC820101 | Retention coefficient in TFA | 4.64 |
| FAUJ830101 | Hydrophobic parameter pi | 1.99 |
| NAKH920101 | AA composition of CYT of single-spanning proteins | 1.99 |
| ZHOH040102 | The relative stability scale extracted from mutation experiments | 1.99 |
| NAKH900101 | AA composition of total proteins | 1.32 |
| QIAN880129 | Weights for coil at the window position of -4 | 1.32 |
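The C5.0 rules that follow compare property values against thresholds, which implies that each peptide window is reduced to one number per property. A minimal sketch, assuming (this page does not confirm it) that a property is encoded as its mean over the residues in the window: the Kyte-Doolittle hydropathy scale (AAindex KYTJ820101) is used only as a stand-in because it is not among the 31 properties listed, and the helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy (AAindex KYTJ820101): a stand-in scale, not one of the 31 above.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_mean(peptide, scale):
    """Mean scale value over a window; residues missing from the scale
    (e.g. terminal padding symbols) are skipped. Assumed encoding."""
    values = [scale[aa] for aa in peptide if aa in scale]
    if not values:
        raise ValueError("window contains no residues covered by the scale")
    return sum(values) / len(values)


def encode(peptide, scales):
    """Hypothetical feature vector: one window mean per property scale."""
    return [window_mean(peptide, scale) for scale in scales]


print(encode("--MKAGLVKE", [KYTE_DOOLITTLE]))
```

Whether the central lysine itself is included in the average is another detail this page leaves open; here it is included.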
Figure 5. The decision tree derived by C5.0 from the informative physicochemical property features for classifying ubiquitylation sites.
Five concise if-then rules with confidence greater than 0.5, obtained by C5.0 using the 31 informative physicochemical properties.
| # | Rule | Confidence | Ubiquitylation site | Covered samples | Misclassified samples |
| 1 | MEIH800102 <= 0.95381 | 0.96 | N | 23 | 0 |
| 2 | HARY940101 > 135.2 AND CORJ870101 > 49.70762 | 0.90 | N | 49 | 4 |
| 3 | CEDJ970105 > 6.805556 | 0.85 | N | 18 | 2 |
| 4 | GEOR030108 <= 0.931333 | 0.75 | N | 10 | 2 |
| 5 | MEIH800102 > 0.95381 | 0.54 | Y | 279 | 128 |
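The reported confidences are consistent with the Laplace ratio that C5.0 uses for rule confidence, (n - m + 1) / (n + 2), where n is the number of covered samples and m the number misclassified; for rule 1, (23 - 0 + 1) / (23 + 2) = 0.96. The sketch below recomputes all five values from the table's own columns.

```python
def laplace_confidence(covered, misclassified):
    """Laplace ratio (n - m + 1) / (n + 2) for a rule that covers n training
    samples and misclassifies m of them."""
    if not 0 <= misclassified <= covered:
        raise ValueError("need 0 <= misclassified <= covered")
    return (covered - misclassified + 1) / (covered + 2)


# (covered, misclassified, reported confidence) for rules 1-5 in the table above.
RULES = [(23, 0, 0.96), (49, 4, 0.90), (18, 2, 0.85), (10, 2, 0.75), (279, 128, 0.54)]
for number, (n, m, reported) in enumerate(RULES, start=1):
    print(f"rule {number}: {laplace_confidence(n, m):.2f} (reported {reported:.2f})")
```

The +1 and +2 terms shrink estimates from rules with few covered samples toward 0.5, which is why rule 4 (10 samples, 2 errors) scores 0.75 rather than the raw 0.80.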
Figure 6. The system flow of the prediction server UbiPred.
The leave-one-out cross-validation (LOOCV) performance of the SVM with various kinds of features: informative physicochemical properties (UbiPred), amino acid identity, evolutionary information, and all physicochemical properties.
| # | Feature | Window size | C | γ | ACC (%) | SEN (%) | SPE (%) | MCC | AUC |
| 1 | Informative physicochemical properties (UbiPred) | 21 | 4 | 2^-1 | 84.44 | 83.44 | 85.43 | 0.69 | 0.85 |
| 2 | All physicochemical properties | 17 | 1 | 2^-4 | 72.19 | 70.86 | 73.51 | 0.44 | 0.74 |
| 3 | Amino acid identity | 13 | 2 | 2^-2 | 65.67 | 57.33 | 74.00 | 0.32 | 0.70 |
| 4 | Evolutionary information | 13 | 1 | 2^-7 | 66.33 | 72.00 | 60.67 | 0.33 | 0.71 |
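The ACC, SEN, SPE, and MCC columns all follow from a binary confusion matrix. In the sketch below, the counts TP = 126, FN = 25, TN = 129, FP = 22 are not given on this page; they are back-calculated (an assumption) from the 151 positive and 151 negative samples and the reported SEN and SPE of the UbiPred row, and they reproduce that row's ACC, SEN, SPE, and MCC after rounding.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """ACC, SEN, SPE (as fractions) and MCC from a 2x2 confusion matrix."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sen = tp / (tp + fn)  # sensitivity: fraction of positives predicted positive
    spe = tn / (tn + fp)  # specificity: fraction of negatives predicted negative
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sen, spe, mcc


# Counts back-calculated (assumption) for the UbiPred row: 151 positives, 151 negatives.
acc, sen, spe, mcc = binary_metrics(tp=126, fn=25, tn=129, fp=22)
print(f"ACC {acc:.2%}  SEN {sen:.2%}  SPE {spe:.2%}  MCC {mcc:.2f}")
```

MCC is included alongside ACC because it stays informative when the classes are unbalanced; with the balanced 151/151 set used here, the two rank the feature types the same way.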
Figure 7. Performance comparison, in terms of receiver operating characteristic (ROC) curves, of the SVM with various features: informative physicochemical properties (UbiPred), amino acid identity, evolutionary information, and all physicochemical properties.
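The AUC column in the LOOCV table summarizes ROC curves like those in Figure 7. One standard way to compute it, sketched here with hypothetical scores, is the rank (Mann-Whitney) form: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half. Whether the original study computed AUC this way or by integrating the plotted curve is not stated on this page.

```python
def auc_rank(pos_scores, neg_scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(auc_rank([0.9, 0.8], [0.7, 0.85]))  # hypothetical scores
```

The pairwise loop is O(n·m), which is fine for a few hundred samples; a sort-based rank sum scales better for large sets.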
The LOOCV performances of the SVM with 31 informative physicochemical properties on datasets of various sequence identity thresholds.
| Sequence identity threshold | Accuracy (%) | Number of positive samples | Number of negative samples |
| 100% | 84.44 | 151 | 151 |
| 90% | 82.71 | 145 | 150 |
| 80% | 81.72 | 141 | 149 |
| 70% | 80.63 | 136 | 148 |
| 60% | 81.23 | 131 | 146 |
| 50% | 80.80 | 130 | 146 |
| 40% | 79.70 | 121 | 145 |
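Each accuracy in this table is a LOOCV estimate: every one of the N samples is predicted by a model trained on the remaining N - 1. The loop below sketches that procedure. To stay dependency-free it swaps in a nearest-centroid classifier in place of the SVM used in the study; `fit` and `predict` are hypothetical hooks where an SVM would be plugged in, and the toy data are made up.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(model, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))


def loocv_accuracy(X, y, fit, predict):
    """Leave-one-out CV: train on all samples but one, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(X, y, nearest_centroid_fit, nearest_centroid_predict))
```

LOOCV needs N model fits, which is affordable for the 302-sample training set; it also uses almost all data for training in every fold, which matters when positives are scarce.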
Figure 8. Schema illustrating the training data (302 samples) and the independent dataset (3424 putative non-ubiquitylation sites), using w = 21 as an example.
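Ubiquitin is conjugated to lysine residues, so each sample is a window of w residues centred on a lysine. The sketch below extracts such windows from a protein sequence. The padding symbol used past either terminus and the 1-based position convention are assumptions rather than details given on this page, and the function name is hypothetical.

```python
def lysine_windows(sequence, w=21, pad="-"):
    """Yield (position, peptide) for each lysine (K): a window of odd width w
    centred on the lysine. Positions are 1-based; `pad` fills positions past
    either terminus (assumed padding scheme)."""
    if w < 1 or w % 2 == 0:
        raise ValueError("w must be a positive odd integer")
    half = w // 2
    padded = pad * half + sequence + pad * half
    for i, residue in enumerate(sequence):
        if residue == "K":
            # sequence[i] sits at padded[i + half], the centre of padded[i:i + w]
            yield i + 1, padded[i:i + w]


for position, peptide in lysine_windows("MSKLAEQK", w=5):
    print(position, peptide)
```

With w = 21, every window has 10 residues on each side of the lysine, matching the window size used for the logos in Figures 2 and 10.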
Figure 9. Histogram of UbiPred prediction scores for the 3424 putative non-ubiquitylation sites in the independent dataset. Sites with scores close to 1 are highly likely to be ubiquitylation sites.
List of 23 promising ubiquitylation sites identified from an independent dataset of 3424 putative non-ubiquitylation sites.
| Accession number | Position | Score | Accession number | Position | Score | Accession number | Position | Score |
| P19358 | 114 | 0.99 | P39976 | 323 | 0.90 | P38080 | 809 | 0.87 |
| Q9Y6K9 | 35 | 0.96 | P38261 | 147 | 0.89 | P10592 | 54 | 0.87 |
| P25694 | 6 | 0.96 | P25360 | 846 | 0.89 | P38080 | 792 | 0.87 |
| P40087 | 325 | 0.95 | P09936 | 195 | 0.88 | P12866 | 129 | 0.86 |
| Q08412 | 232 | 0.93 | P10591 | 54 | 0.88 | Q05911 | 460 | 0.86 |
| P04629 | 609 | 0.91 | Q06408 | 156 | 0.87 | P40087 | 410 | 0.86 |
| P16603 | 165 | 0.91 | P37303 | 283 | 0.87 | P38075 | 10 | 0.86 |
| P31539 | 626 | 0.91 | P32467 | 38 | 0.87 | | | |
Figure 10. The sequence logo of the 23 peptides of promising ubiquitylation sites with w = 21: (a) information content and (b) frequency plot.