Ming-An Sun, Qing Zhang, Yejun Wang, Wei Ge, Dianjing Guo.
Abstract
BACKGROUND: Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation, and their reversible oxidation plays a critical role in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases or depend heavily on protein structural data, and thus cannot be widely used.
Keywords: Post-translational modification; Reactive oxygen species; Redox-sensitive cysteine; SVM-based recursive feature elimination; Support vector machine
Year: 2016 PMID: 27553667 PMCID: PMC4995733 DOI: 10.1186/s12859-016-1185-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Optimization of the parameters for feature extraction. a Performance with different numbers of nearby cysteines. The number of nearby cysteines is optimized between 1 and 10. b Performance with different window sizes. The window size is optimized between 3 and 25. The gray bars indicate the finally selected parameters
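The window-size parameter in panel b can be pictured as a fixed-length slice of the sequence centered on each cysteine. The following is a minimal sketch of that idea only: the function name `cysteine_windows`, the restriction to odd window sizes, and the `X` padding at the termini are assumptions made for illustration and are not taken from the source.

```python
def cysteine_windows(sequence, window_size, pad="X"):
    """Return (position, window) pairs for every cysteine in `sequence`.

    `window_size` must be odd so that the cysteine sits in the center.
    Positions are 0-based; termini are padded with `pad` (an assumption).
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    windows = []
    for pos, residue in enumerate(sequence):
        if residue == "C":
            # In `padded`, the residue at `pos` sits at index pos + half,
            # so the slice below is centered on it.
            windows.append((pos, padded[pos:pos + window_size]))
    return windows


if __name__ == "__main__":
    # Toy sequence, purely illustrative.
    for pos, window in cysteine_windows("MCKLACDEC", 5):
        print(pos, window)
```

Per-residue features such as a PSSM profile would then be read off for each position in the window; how the original tool encodes them is not described in this record.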
10-fold cross-validation of different combinations of features on RSC758
| Feature | ACC | SN | SP | MCC | AUC |
|---|---|---|---|---|---|
| D + PSSM + SS + SA + PCP | 0.650 | 0.540 | 0.761 | 0.309 | 0.705 |
| D + PSSM + SS + PCP | 0.653 | 0.529 | 0.777 | 0.316 | 0.705 |
| D + PSSM | 0.644 | 0.503 | 0.785 | 0.300 | 0.691 |
| D | 0.639 | 0.442 | 0.835 | 0.301 | 0.671 |
| SS | 0.555 | 0.770 | 0.339 | 0.121 | 0.559 |
| PSSM | 0.575 | 0.578 | 0.573 | 0.150 | 0.590 |
| SA | 0.557 | 0.552 | 0.562 | 0.114 | 0.554 |
| PCP | 0.525 | 0.611 | 0.439 | 0.051 | 0.542 |
The results are sorted by AUC value. The feature set in bold was selected as the optimal one
D: sequential distance to adjacent cysteines; PSSM: PSSM profile; SS: predicted secondary structure; SA: predicted solvent accessibility; PCP: physical-chemical property
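The column headers ACC, SN, SP and MCC are not defined in this record. Assuming they carry their standard meanings (accuracy, sensitivity, specificity and Matthews correlation coefficient), they can be computed from confusion-matrix counts as sketched below. The function name `binary_metrics` and the example counts are hypothetical; the counts assume equal numbers of positives and negatives, which the record does not state.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute ACC, SN, SP and MCC from confusion-matrix counts."""
    total = tp + fn + tn + fp
    acc = (tp + tn) / total
    sn = tp / (tp + fn)  # sensitivity: fraction of positives recovered
    sp = tn / (tn + fp)  # specificity: fraction of negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts: 1000 positives and 1000 negatives (equal class
    # sizes are an assumption), chosen so that SN = 0.540 and SP = 0.761,
    # matching the first row of the table.
    metrics = binary_metrics(tp=540, fn=460, tn=761, fp=239)
    for name, value in metrics.items():
        print(name, value)
```

Under the equal-class-size assumption, these counts give an ACC of about 0.65 and an MCC of about 0.309, in line with the first row of the table.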
Fig. 2 The ROC curves of SVM classifiers using the RSC758 dataset. The average values of true positive rate and false positive rate from 10-fold cross-validation are used
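The AUC values reported alongside these ROC curves summarize how well classifier scores rank positives above negatives. As a rough illustration, the area under an ROC curve equals the fraction of (positive, negative) pairs in which the positive example scores higher, with ties counted as one half. The sketch below computes that quantity directly; the function name `auc_from_scores` and the toy data are hypothetical, and it does not reproduce the fold-averaging described in the caption.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    `labels` holds 1 for positives and 0 for negatives; `scores` holds the
    classifier output, where a larger value means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, purely illustrative.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(auc_from_scores(labels, scores))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a dataset of a few hundred cysteines; a rank-based formulation would scale better.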
Fig. 3 Performance using different numbers of features selected by SVM-RFE for the RSC758 dataset. The x-axis indicates the number of selected features. The y-axis represents the ACC, MCC and AUC estimated from 10-fold cross-validation
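Recursive feature elimination repeatedly fits a linear model, ranks the surviving features by squared weight, and discards the lowest-ranked one. The sketch below shows only that elimination loop. To stay dependency-free it uses a difference of class means as a stand-in for the weights of a linear SVM, so it is not SVM-RFE itself; all function names and the toy data are hypothetical.

```python
def class_mean_weights(rows, labels, features):
    """Stand-in for linear-SVM weights: per-feature difference of class means."""
    weights = {}
    for f in features:
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
        weights[f] = sum(pos) / len(pos) - sum(neg) / len(neg)
    return weights


def recursive_feature_elimination(rows, labels, n_keep,
                                  fit_weights=class_mean_weights):
    """Drop the feature with the smallest squared weight until n_keep remain.

    Returns (kept, eliminated); `eliminated` lists feature indices in the
    order they were removed, least useful first.
    """
    remaining = list(range(len(rows[0])))
    eliminated = []
    while len(remaining) > n_keep:
        # Refit on the surviving features only, then rank them.
        weights = fit_weights(rows, labels, remaining)
        worst = min(remaining, key=lambda f: weights[f] ** 2)
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 weakly,
    # feature 2 not at all.
    rows = [[1.0, 0.2, 0.5], [0.9, 0.1, 0.4], [0.1, 0.1, 0.5], [0.0, 0.0, 0.4]]
    labels = [1, 1, 0, 0]
    print(recursive_feature_elimination(rows, labels, n_keep=1))
```

With the class-mean stand-in the weights do not change between rounds, so the loop reduces to a one-shot ranking. With a refitted linear SVM passed as `fit_weights`, the weights of the surviving features can shift after each removal, which is the point of doing the elimination recursively.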
10-fold cross-validation with forty selected features using different machine learning methods on RSC758
| Method | ACC | SN | SP | MCC | AUC |
|---|---|---|---|---|---|
| SVM | 0.679 | 0.602 | 0.756 | 0.362 | 0.727 |
| Naive Bayes | 0.648 | 0.450 | 0.846 | 0.322 | 0.713 |
| Random Forest | 0.664 | 0.611 | 0.718 | 0.330 | 0.711 |
| Artificial Neural Network | 0.662 | 0.615 | 0.708 | 0.325 | 0.698 |
The results are sorted by AUC value
Fig. 4 The ROC curves of different machine learning techniques using the forty selected features for the RSC758 dataset. The average values of true positive rate and false positive rate from 10-fold cross-validation are used
Fig. 5 Comparison of sequential distance to nearby cysteines between redox-sensitive and redox-insensitive cysteines. This result is derived from the RSC758 dataset. The x-axis indicates the index of nearby cysteines (for example, 1 indicates the nearest cysteine, and 2 indicates the 2nd nearest cysteine). The y-axis represents the log10-scaled sequential distance. The error bars represent the standard deviation
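The quantity plotted here, the log10-scaled sequential distance from a cysteine to its 1st, 2nd, ... nearest cysteine in the same sequence, can be sketched as follows. The function name `nearby_cysteine_distances` and the use of a `missing` placeholder for proteins with too few cysteines are assumptions; the record does not say how such cases are handled.

```python
import math


def nearby_cysteine_distances(sequence, position, k, missing=None):
    """log10 sequential distances to the k nearest other cysteines.

    `position` is the 0-based index of the query cysteine. The result is
    ordered nearest first and padded with `missing` when the sequence has
    fewer than k other cysteines (the padding rule is an assumption).
    """
    if sequence[position] != "C":
        raise ValueError("position does not point at a cysteine")
    distances = sorted(
        abs(i - position)
        for i, residue in enumerate(sequence)
        if residue == "C" and i != position
    )
    logged = [math.log10(d) for d in distances[:k]]
    return logged + [missing] * (k - len(logged))


if __name__ == "__main__":
    # Toy sequence: the cysteine at index 5 has neighbours at distance 3 and 4.
    print(nearby_cysteine_distances("MCKLACDEC", 5, k=3))
```

Because distinct positions differ by at least 1, the logarithm is always defined and the smallest possible value is 0, for two adjacent cysteines.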
Performance comparison between RSCP and COPA using BALOSCTdb by 10-fold cross-validation
| Method | Features | ACC | SN | SP | MCC | AUC |
|---|---|---|---|---|---|---|
| RSCP | 40 features selected using RSC758 | 0.683 | 0.671 | 0.696 | 0.362 | 0.727 |
| RSCP | 20 features selected using BALOSCTdb | 0.761 | 0.770 | 0.752 | 0.522 | 0.821 |
| COPA | 3 structure-based features | 0.786 | 0.776 | 0.795 | 0.572 | 0.823 |
Performance evaluation using OSCTdb by gene families
| Protein class | #Cys | ACC | SN | SP | MCC |
|---|---|---|---|---|---|
| Oxidoreductase | 175 | 0.606 | 0.815 | 0.482 | 0.297 |
| Hydrolase | 110 | 0.736 | 0.783 | 0.724 | 0.424 |
| Transferase | 96 | 0.479 | 0.739 | 0.397 | 0.121 |
| Non-enzyme proteins | 124 | 0.718 | 0.784 | 0.690 | 0.435 |
| Total | 537 | 0.629 | 0.789 | 0.561 | 0.322 |
Only gene families with at least ten redox-sensitive cysteines are shown