Zhen Chen, Yong-Zi Chen, Xiao-Feng Wang, Chuan Wang, Ren-Xiang Yan, Ziding Zhang.
Abstract
As one of the most important reversible protein post-translational modifications, ubiquitination has been reported to be involved in many biological processes and closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and the corresponding ubiquitination sites. Here, a new bioinformatics tool named CKSAAP_UbSite was developed to predict ubiquitination sites from protein sequences. Built on a Support Vector Machine (SVM), CKSAAP_UbSite takes as input the composition of k-spaced amino acid pairs surrounding a query site (i.e. any lysine in a query sequence). When trained and tested on the dataset of yeast ubiquitination sites (Radivojac et al, Proteins, 2010, 78: 365-380), a 100-fold cross-validation on a 1∶1 ratio of positive and negative samples showed that the accuracy and MCC of CKSAAP_UbSite reached 73.40% and 0.4694, respectively. CKSAAP_UbSite was also extensively benchmarked and performed better than some existing predictors, suggesting that it can serve as a useful tool for the community. Currently, CKSAAP_UbSite is freely accessible at http://protein.cau.edu.cn/cksaap_ubsite/. Moreover, we also found that the sequence patterns around ubiquitination sites are not conserved across different species. To ensure reasonable prediction performance, the application of the current CKSAAP_UbSite should be limited to the yeast proteome.
Year: 2011 PMID: 21829559 PMCID: PMC3146527 DOI: 10.1371/journal.pone.0022930
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of CKSAAP_UbSite with the binary encoding and UbPred.
| Method | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
| --- | --- | --- | --- | --- |
| CKSAAP_UbSite | 69.85±1.67 | 76.96±2.52 | 73.40±1.71 | 0.4694±0.0347 |
| The binary encoding | 56.23±2.21 | 60.04±3.56 | 58.14±2.30 | 0.1630±0.0486 |
| UbPred | _ _ | _ _ | 72.00 | _ _ |
The corresponding measurement was represented as the average value ± standard deviation.
The corresponding value was cited from Radivojac et al (2010) [2]. ‘_ _’ means the corresponding value is not available.
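The figures in this table can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the paper: it assumes the four numeric columns are sensitivity, specificity, accuracy and MCC (a reading consistent with the numbers themselves), uses the standard definitions of accuracy and the Matthews correlation coefficient, and introduces the hypothetical helper `metrics_from_rates`. Because the published values are averages over cross-validation runs, the recomputed MCC is expected to agree only approximately.

```python
import math

def metrics_from_rates(sensitivity, specificity, neg_per_pos=1.0):
    """Accuracy and MCC from sensitivity/specificity given as fractions.

    neg_per_pos is the number of negative samples per positive sample
    (1.0 for a 1:1 ratio, 3.0 for a 1:3 ratio). Hypothetical helper,
    not part of CKSAAP_UbSite.
    """
    # Confusion-matrix entries, scaled so that there is one positive sample.
    tp = sensitivity
    fn = 1.0 - sensitivity
    tn = specificity * neg_per_pos
    fp = (1.0 - specificity) * neg_per_pos

    accuracy = (tp + tn) / (1.0 + neg_per_pos)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc

# CKSAAP_UbSite row: sensitivity 69.85%, specificity 76.96%, 1:1 ratio.
acc, mcc = metrics_from_rates(0.6985, 0.7696, neg_per_pos=1.0)
print(f"accuracy = {acc:.3f}, MCC = {mcc:.3f}")
```

With these inputs the recomputed accuracy and MCC land close to the reported 73.40% and 0.4694, which supports the column reading assumed above.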
Figure 1. ROC curves of CKSAAP_UbSite and the binary encoding scheme based on balanced ubiquitination and non-ubiquitination sites.
The performance of CKSAAP_UbSite and the binary encoding scheme was assessed through a 100-fold cross-validation strategy.
The top 25 features ranked by CHI- and IG- based feature selection methods.
| Top 25 features | CHI | IG |
| --- | --- | --- |
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
| 11 | HxxxxxN | |
| 12 | | LxK |
| 13 | | IxxxxxI |
| 14 | | IxxxxxL |
| 15 | | KK |
| 16 | PxY | |
| 17 | QxxN | |
| 18 | | |
| 19 | | |
| 20 | | |
| 21 | SxN | |
| 22 | | |
| 23 | NE | KxxxK |
| 24 | | |
| 25 | ExxA | IxxxL |
The feature ‘ExE’ represents a 1-spaced residue pair of ‘EE’, where x stands for any amino acid. The same representation was applied to other k-spaced residue pairs.
The k-spaced amino acid pairs in bold type mean they are consistently ranked as the top-25 features by both feature selection methods.
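To make the feature notation concrete, here is a minimal sketch of how the composition of k-spaced amino acid pairs could be computed for a sequence fragment centred on a lysine. It is an illustration under stated assumptions, not the authors' implementation: the function names, the flank size, the padding character, the maximum gap `k_max`, and the choice to normalise by the number of k-spaced positions are all assumptions made here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def feature_name(a, b, k):
    """Name a k-spaced pair in the table's notation, e.g. ('E', 'E', 1) -> 'ExE'."""
    return a + "x" * k + b

def window(sequence, site, flank=13, pad="-"):
    """Fragment of 2*flank+1 residues centred on the 0-based index `site`.

    Termini are padded with `pad`; flank size and pad symbol are assumptions.
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site : site + 2 * flank + 1]

def cksaap(fragment, k_max=5, alphabet=AMINO_ACIDS):
    """Return {feature name: composition} for every pair and k = 0..k_max.

    Composition = pair count / number of k-spaced positions in the fragment.
    Pairs containing a symbol outside `alphabet` (e.g. padding) are not counted.
    """
    features = {}
    for k in range(k_max + 1):
        n_positions = len(fragment) - k - 1
        counts = {feature_name(a, b, k): 0 for a, b in product(alphabet, repeat=2)}
        for i in range(max(n_positions, 0)):
            name = feature_name(fragment[i], fragment[i + k + 1], k)
            if name in counts:
                counts[name] += 1
        for name, count in counts.items():
            features[name] = count / n_positions if n_positions > 0 else 0.0
    return features

# Toy example: the lysine at index 5 of a made-up peptide, with a short flank.
seq = "MKEAEKLLK"
frag = window(seq, site=5, flank=3)
feats = cksaap(frag, k_max=2)
print(frag, feats["ExE"], feats["LxK"])
```

Each fragment yields 400 features per value of k, so the vector grows linearly with `k_max`; in a real pipeline these vectors would be the input to the SVM.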
Figure 2. The composition of the top-25 residue pairs resulting from two feature selection methods.
The composition of each residue pair is represented by a radial vector whose length is proportional to the composition concerned.
Figure 3. Two Two-Sample-Logos of the position-specific residue composition surrounding the ubiquitination sites and non-ubiquitination sites, which were inferred from Radivojac_dataset (A) and Cai_dataset_1 (B), respectively.
These two logos were prepared using the web server http://www.twosamplelogo.org/ and only residues significantly enriched and depleted surrounding ubiquitination sites (t-test, P<0.05) are shown.
Figure 4. Comparison of CKSAAP_UbSite and UbPred based on an independent dataset of 21 proteins.
Comparison of CKSAAP_UbSite with other predictors.
| Method | Dataset | Ratio | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
| --- | --- | --- | --- | --- | --- | --- |
| CKSAAP_UbSite | Radivojac_dataset | 1∶3 | 20.38 | 98.48 | 78.95 | 0.3374 |
| CKSAAP_UbSite | Cai_dataset_1 | 1∶3 | 6.74 | 99.53 | 76.33 | 0.1923 |
| CKSAAP_UbSite | Cai_dataset_2 | 1∶3 | 7.14 | 100.00 | 95.37 | 0.2610 |
| Cai et al | Cai_dataset_1 | 1∶3 | _ _ | _ _ | _ _ | 0.1420 |
| Cai et al | Cai_dataset_2 | 1∶3 | _ _ | _ _ | _ _ | 0.1390 |
| UbPred | Cai_dataset_2 | 1∶3 | _ _ | _ _ | _ _ | 0.1350 |
| Tung and Ho | Cai_dataset_2 | 1∶3 | _ _ | _ _ | _ _ | 0.1170 |
To save computational time, the default parameters of the RBF kernel in SVM training were employed in this benchmark experiment.
The result was based on the 100-fold cross-validation.
The result was based on the jackknife cross-validation.
The corresponding value was cited from Cai et al (2011) [21]. ‘_ _’ means the corresponding value is not available.