Yong-Zi Chen, Zhen Chen, Yu-Ai Gong, Guoguang Ying.
Abstract
Sumoylation is one of the most important reversible post-translational protein modifications and a crucial biochemical process in the regulation of a variety of biological functions. It is also closely involved in various human diseases. Accurate computational identification of sumoylation sites in protein sequences aids experimental design and mechanistic research in cell biology. In this study, we introduced amino acid hydrophobicity as a parameter into a traditional binary encoding scheme and developed a novel sumoylation site prediction tool termed SUMOhydro. Using a support vector machine (SVM), the proposed method was trained and tested on a stringent non-redundant sumoylation dataset. In leave-one-out cross-validation, the method achieved a Matthews correlation coefficient (MCC), specificity, sensitivity and accuracy of 0.690, 98.6%, 71.1% and 97.5%, respectively. In addition, SUMOhydro was benchmarked against previously described predictors on an independent dataset, and the results suggest that introducing hydrophobicity as an additional parameter can assist in the prediction of sumoylation sites. SUMOhydro is freely accessible at http://protein.cau.edu.cn/others/SUMOhydro/.
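Site predictors of this kind typically score every lysine (K), which is the residue type listed under "Site" in the performance tables, by looking at the peptide window around it. Below is a minimal sketch of that candidate-extraction step. It assumes a symmetric window with a hypothetical default half-width of 10 and uses `X` as padding. Neither value is given in this entry, and `lysine_windows` is an illustrative name.

```python
def lysine_windows(seq: str, half: int = 10, pad: str = "X") -> list[tuple[int, str]]:
    """Return (0-based position, window) for every lysine in seq.

    Each window has length 2 * half + 1 with the lysine in the centre; positions that
    fall off either end of the protein are filled with the padding character.
    """
    padded = pad * half + seq.upper() + pad * half
    return [
        (i, padded[i : i + 2 * half + 1])  # seq[i] sits at padded[i + half], the window centre
        for i, res in enumerate(seq.upper())
        if res == "K"
    ]


print(lysine_windows("MAKGEK", half=3))  # → [(2, 'XMAKGEK'), (5, 'KGEKXXX')]
```

Each window produced this way would then be turned into a numeric feature vector by one of the encoding schemes compared below.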
Year: 2012 PMID: 22720073 PMCID: PMC3375222 DOI: 10.1371/journal.pone.0039195
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Prediction performance of 10-fold cross-validation based on different encoding methods.
| Site | Encoding scheme | Sn (%) | Sp (%) | Ac (%) | MCC |
|---|---|---|---|---|---|
| K | Binary | 60.3±2.1 | 98.8±1.4 | 97.2±0.1 | 0.631±0.005 |
| | CKSAAP | 55.7±2.4 | 94.6±0.1 | 93.0±0.1 | 0.385±0.026 |
| | PSSM | 51.1±2.2 | 95.8±0.1 | 93.9±0.0 | 0.393±0.022 |
| | KNN | 56.0±1.2 | 98.6±0.0 | 96.8±0.0 | 0.584±0.006 |
| | Six_letter | 53.5±3.8 | 96.1±0.2 | 94.3±0.0 | 0.422±0.035 |
| | Nine_letter | 57.9±3.1 | 95.4±0.1 | 93.8±0.1 | 0.426±0.027 |
| | Hydrobinary | | | | |
| | Z_scales | 57.5±3.2 | 98.6±0.1 | 96.8±0.1 | 0.593±0.017 |
An SVM-based prediction algorithm was used, and the parameters of each encoding scheme were preliminarily optimized. The hydrobinary encoding scheme achieved the highest accuracy, and its Sn (sensitivity), Sp (specificity), Ac (accuracy) and MCC (Matthews correlation coefficient) values are shown in bold. Each measurement is given as the mean ± standard deviation.
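The four measures reported in these tables follow the standard confusion-matrix definitions. The minimal Python sketch below computes them from the counts of true and false positives and negatives in a single evaluation. The function name `evaluate` and the example counts are illustrative only.

```python
import math


def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Return Sn, Sp, Ac (in %) and MCC from confusion-matrix counts.

    Assumes both classes are present, i.e. tp + fn > 0 and tn + fp > 0.
    """
    sn = 100.0 * tp / (tp + fn)                    # sensitivity: true sites recovered
    sp = 100.0 * tn / (tn + fp)                    # specificity: non-sites correctly rejected
    ac = 100.0 * (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}


# Hypothetical counts for an imbalanced test set (100 sites, 1000 non-sites)
print(evaluate(tp=60, tn=950, fp=50, fn=40))
```

With these hypothetical counts, accuracy is about 91.8% while the MCC is only about 0.53. On heavily imbalanced site data, MCC is therefore the more discriminating of the four numbers. Averaging per-run results with `statistics.mean` and `statistics.stdev` would give entries in the mean ± SD form used above.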
Figure 1. Prediction performance for different ratios of positive to negative samples, based on binary encoding.
The performance of the binary encoding scheme was assessed using a 10-fold cross-validation strategy.
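The 10-fold and leave-one-out protocols compared in this entry differ only in the number of folds: leave-one-out is k-fold cross-validation with k equal to the number of samples. The minimal index-splitting sketch below uses plain random folds rather than stratified ones, because this entry does not say whether the folds were stratified by class. `kfold_indices` is a hypothetical helper.

```python
import random
from collections.abc import Iterator


def kfold_indices(n: int, k: int, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Setting k == n gives leave-one-out cross-validation (one test sample per fold).
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)       # reproducible shuffle
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 10-fold vs. leave-one-out on a toy set of 25 samples
print([len(test) for _, test in kfold_indices(25, 10)])  # → [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(sum(1 for _ in kfold_indices(25, 25)))             # → 25
```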
Prediction performance of leave-one-out cross-validation based on different encoding methods.
| Site | Encoding scheme | Sn (%) | Sp (%) | Ac (%) | MCC |
|---|---|---|---|---|---|
| K | Binary | 59.3±0.6 | 99.0±0.0 | 97.3±0.1 | 0.640±0.005 |
| | CKSAAP | 57.0±1.9 | 93.7±0.1 | 92.1±0.1 | 0.367±0.016 |
| | PSSM | 55.3±1.4 | 94.8±0.1 | 93.1±0.1 | 0.388±0.021 |
| | KNN | 57.9±1.9 | 98.4±0.1 | 96.6±0.2 | 0.576±0.019 |
| | Six_letter | 58.3±4.8 | 95.2±0.5 | 93.7±0.1 | 0.423±0.020 |
| | Nine_letter | 57.4±3.0 | 95.2±0.2 | 93.6±0.3 | 0.415±0.021 |
| | Hydrobinary | | | | |
| | Z_scales | 60.1±4.6 | 98.4±0.1 | 96.8±0.4 | 0.599±0.037 |
An SVM-based prediction algorithm was used, and the parameters of each encoding scheme were preliminarily optimized. The hydrobinary encoding scheme achieved the highest accuracy, and its Sn (sensitivity), Sp (specificity), Ac (accuracy) and MCC (Matthews correlation coefficient) values are shown in bold. Each measurement is given as the mean ± standard deviation.
Figure 2. ROC curves of SVM models with different encoding schemes under 10-fold cross-validation.
Figure 3. ROC curves of SVM models with different encoding schemes under leave-one-out cross-validation.
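Each ROC curve in these figures is traced by sweeping the decision threshold over a model's output scores and plotting the true positive rate against the false positive rate. For an SVM, the scores are typically its decision values. The minimal sketch below implements that sweep and computes the area under the curve with the trapezoidal rule. It assumes binary labels (1 = sumoylation site) and one real-valued score per sample. `roc_points` and `auc` are illustrative names, not part of SUMOhydro.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """Return (FPR, TPR) points obtained by lowering the threshold through every distinct score.

    labels: 1 for a sumoylation site, 0 otherwise; both classes must be present.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score, so tied scores move together.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc(pts))  # → 0.75
```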
Comparison of SUMOhydro with other predictors.
| Threshold | Method | Sn (%) | Sp (%) | Ac (%) | MCC |
|---|---|---|---|---|---|
| Low | SUMOsp2.0 | | 83.1 | 82.8 | 0.304 |
| | seeSUMO-SVM | 66.7 | 91.0 | 89.9 | 0.373 |
| | seeSUMO-RF | | 82.8 | 82.4 | 0.300 |
| | SUMOhydro | 70.8 | | | |
| Medium | SUMOsp2.0 | 62.5 | 92.6 | 91.2 | 0.381 |
| | seeSUMO-SVM | 54.2 | | | 0.397 |
| | seeSUMO-RF | | 88.4 | 87.6 | 0.351 |
| | SUMOhydro | 66.7 | 93.5 | 92.3 | |
| High | SUMOsp2.0 | 58.3 | 96.3 | 94.6 | |
| | seeSUMO-SVM | 37.5 | | | 0.386 |
| | seeSUMO-RF | | 90.4 | 89.3 | 0.362 |
| | SUMOhydro | 58.3 | 94.9 | 93.3 | 0.419 |
SUMOhydro, seeSUMO and SUMOsp2.0 were tested on an entirely independent dataset. The highest value at each threshold is indicated in bold.
Hydrophobicity scales for the 20 amino acids.
| Amino Acid | Feature Value | Amino Acid | Feature Value |
|---|---|---|---|
| A | 1.81 | M | 2.35 |
| C | 1.28 | N | -6.64 |
| D | -8.72 | P | 4.04 |
| E | -6.81 | Q | -5.54 |
| F | 2.98 | R | -14.92 |
| G | 0.94 | S | -3.40 |
| H | -4.66 | T | -2.57 |
| I | 4.92 | V | 4.04 |
| K | -5.55 | W | 2.33 |
| L | 4.92 | Y | -0.14 |
Cited from [35].
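The abstract describes adding amino acid hydrophobicity as a parameter to a traditional binary (one-hot) encoding, but this entry does not specify exactly how SUMOhydro combines the two. The sketch below assumes one plausible variant. Each window position gets a 20-bit one-hot vector followed by that residue's hydrophobicity value from the table above, and padding or unknown residues are encoded as all zeros. `hydrobinary_encode` is a hypothetical name, and this layout should be read as an illustration rather than the published method.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # fixed column order for the one-hot bits

# Hydrophobicity values copied from the table above
HYDROPHOBICITY = {
    "A": 1.81, "C": 1.28, "D": -8.72, "E": -6.81, "F": 2.98,
    "G": 0.94, "H": -4.66, "I": 4.92, "K": -5.55, "L": 4.92,
    "M": 2.35, "N": -6.64, "P": 4.04, "Q": -5.54, "R": -14.92,
    "S": -3.40, "T": -2.57, "V": 4.04, "W": 2.33, "Y": -0.14,
}


def hydrobinary_encode(window: str) -> list[float]:
    """Hypothetical hydrobinary encoding: per residue, 20 one-hot bits + 1 hydrophobicity value.

    Residues outside the 20 standard amino acids (e.g. padding 'X') become 21 zeros.
    """
    features: list[float] = []
    for res in window.upper():
        features.extend(1.0 if res == aa else 0.0 for aa in AMINO_ACIDS)
        features.append(HYDROPHOBICITY.get(res, 0.0))
    return features


vec = hydrobinary_encode("XAKGX")  # toy 5-residue window centred on K
print(len(vec))  # → 105
```

The hydrophobicity values span roughly -15 to 5, while the binary bits are 0 or 1. Rescaling each column (for example to [0, 1]) before SVM training is therefore a common precaution, although this entry does not say whether the original method did so.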
Z-scale descriptors (z1 to z5) for the 20 amino acids.
| Amino Acid | z1 | z2 | z3 | z4 | z5 | Amino Acid | z1 | z2 | z3 | z4 | z5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0.24 | -2.32 | 0.60 | -0.14 | 1.30 | M | -2.85 | -0.22 | 0.47 | 1.94 | -0.98 |
| C | 0.84 | -1.67 | 3.71 | 0.18 | -2.65 | N | 3.05 | 1.62 | 1.04 | -1.15 | 1.61 |
| D | 3.98 | 0.93 | 1.93 | -2.46 | 0.75 | P | -1.66 | 0.27 | 1.84 | 0.70 | 2.00 |
| E | 3.11 | 0.26 | -0.11 | -3.04 | -0.25 | Q | 1.75 | 0.50 | -1.44 | -1.34 | 0.66 |
| F | -4.22 | 1.94 | 1.06 | 0.54 | -0.62 | R | 3.52 | 2.50 | -3.50 | 1.99 | -0.17 |
| G | 2.05 | -4.06 | 0.36 | -0.82 | -0.38 | S | 2.39 | -1.07 | 1.15 | -1.39 | 0.67 |
| H | 2.47 | 1.95 | 0.26 | 3.90 | 0.09 | T | 0.75 | -2.18 | -1.12 | -1.46 | -0.40 |
| I | -3.89 | -1.73 | -1.71 | -0.84 | 0.26 | V | -2.59 | -2.64 | -1.54 | -0.85 | -0.02 |
| K | 2.29 | 0.89 | -2.49 | 1.49 | 0.31 | W | -4.36 | 3.94 | 0.59 | 3.44 | -1.59 |
| L | -4.28 | -1.30 | -1.49 | -0.72 | 0.84 | Y | -2.54 | 2.44 | 0.43 | 0.04 | -1.47 |
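A straightforward way to build the Z_scales encoding compared in the performance tables is to replace each residue in the window with its five descriptors z1 to z5 from the table above. The sketch below works under that assumption and maps padding or unknown residues to zeros. `zscale_encode` is an illustrative name, and the exact feature layout used in the study is not given in this entry.

```python
# Z-scale descriptors (z1..z5) copied from the table above
Z_SCALES = {
    "A": (0.24, -2.32, 0.60, -0.14, 1.30), "C": (0.84, -1.67, 3.71, 0.18, -2.65),
    "D": (3.98, 0.93, 1.93, -2.46, 0.75), "E": (3.11, 0.26, -0.11, -3.04, -0.25),
    "F": (-4.22, 1.94, 1.06, 0.54, -0.62), "G": (2.05, -4.06, 0.36, -0.82, -0.38),
    "H": (2.47, 1.95, 0.26, 3.90, 0.09), "I": (-3.89, -1.73, -1.71, -0.84, 0.26),
    "K": (2.29, 0.89, -2.49, 1.49, 0.31), "L": (-4.28, -1.30, -1.49, -0.72, 0.84),
    "M": (-2.85, -0.22, 0.47, 1.94, -0.98), "N": (3.05, 1.62, 1.04, -1.15, 1.61),
    "P": (-1.66, 0.27, 1.84, 0.70, 2.00), "Q": (1.75, 0.50, -1.44, -1.34, 0.66),
    "R": (3.52, 2.50, -3.50, 1.99, -0.17), "S": (2.39, -1.07, 1.15, -1.39, 0.67),
    "T": (0.75, -2.18, -1.12, -1.46, -0.40), "V": (-2.59, -2.64, -1.54, -0.85, -0.02),
    "W": (-4.36, 3.94, 0.59, 3.44, -1.59), "Y": (-2.54, 2.44, 0.43, 0.04, -1.47),
}


def zscale_encode(window: str) -> list[float]:
    """Concatenate z1..z5 for every residue; non-standard residues (e.g. padding 'X') map to zeros."""
    zero = (0.0,) * 5
    features: list[float] = []
    for res in window.upper():
        features.extend(Z_SCALES.get(res, zero))
    return features


print(len(zscale_encode("XAKGX")))  # → 25
```

Compared with the hypothetical hydrobinary layout, this gives 5 rather than 21 features per position, which keeps the SVM input compact.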