Song Guo¹, Chunhua Liu¹, Peng Zhou², Yanling Li¹.
Abstract
Tyrosine sulfation is a ubiquitous protein posttranslational modification in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. A prerequisite for exploring the molecular mechanism of tyrosine sulfation is the correct identification of candidate tyrosine sulfation residues. This paper presents a novel method for predicting protein tyrosine sulfation residues from primary sequences. Through informative feature construction, elaborate feature selection, and a parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. These results indicate that the proposed method can be effective in identifying this important protein posttranslational modification and that the feature selection scheme could be useful in research on predicting protein functional residues.
Year: 2016 PMID: 27034949 PMCID: PMC4806266 DOI: 10.1155/2016/8151509
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The system architecture of the proposed model.
Figure 2. IFS scatter plot for 472 features.
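The IFS curve in Figure 2 can be illustrated with a minimal sketch, assuming IFS denotes incremental feature selection in its usual sense: features are added one at a time in ranked order, each nested subset is scored, and the best-scoring subset is kept. The function names, the toy ranking, and the toy scoring function are hypothetical stand-ins; in the paper's setting the score would presumably be a cross-validated measure such as MCC.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[int],
    score_subset: Callable[[List[int]], float],
) -> Tuple[List[int], List[float]]:
    """Score nested prefixes of a ranked feature list and return the best prefix.

    ranked_features: feature indices, most relevant first (assumed given).
    score_subset: hypothetical callback returning a quality score for a subset.
    """
    scores: List[float] = []
    best_score = float("-inf")
    best_size = 0
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = score_subset(subset)
        scores.append(score)
        # Strict '>' keeps the smallest subset when scores tie.
        if score > best_score:
            best_score = score
            best_size = k
    return list(ranked_features[:best_size]), scores


def toy_score(subset: List[int]) -> float:
    # Hypothetical stand-in for a cross-validated score: reward three
    # "informative" features and apply a small penalty per feature used.
    informative = {3, 0, 7}
    return len(informative & set(subset)) - 0.01 * len(subset)


ranking = [3, 0, 7, 5, 1, 2]  # illustrative ranking, not from the paper
best_subset, curve = incremental_feature_selection(ranking, toy_score)
print(best_subset)
```

Plotting `curve` against subset size gives a scatter plot of the same kind as Figure 2, with the peak marking the selected subset.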
The parameter configuration used in four optimization algorithms.
| Algorithm | Parameter | Value |
|---|---|---|
| Grid search | [ | [−5, 15] |
| | [ | [−15, 5] |
| GA | Crossover | 0.6 |
| | Populations | 20 |
| | Mutation | 0.033 |
| | Max generation | 1000 |
| DPSO | Particles | 100 |
| | C1 | 1 |
| | C2 | 2 |
| | Max generation | 1000 |
| DFA | Group | 100 |
| | Randomness | 0.9 |
| | Absorption coefficient | 0.5 |
| | Max generation | 1000 |
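The grid search rows can be sketched as follows. The parameter names in those rows did not survive extraction, so this sketch assumes that [−5, 15] and [−15, 5] are base-2 exponent ranges for the SVM penalty C and the kernel width γ respectively. That reading is at least consistent with the reported values C = 64 = 2⁶ and γ = 0.03125 = 2⁻⁵, but it remains an assumption, as does the step size of 1.

```python
from itertools import product
from typing import List, Tuple

# Assumption: the two ranges are exponents of 2 for C and gamma.
LOG2_C_RANGE = (-5, 15)
LOG2_GAMMA_RANGE = (-15, 5)


def build_grid(
    c_range: Tuple[int, int],
    gamma_range: Tuple[int, int],
    step: int = 1,  # hypothetical step; the source does not state one
) -> List[Tuple[float, float]]:
    """Return every (C, gamma) pair on a power-of-two grid, bounds inclusive."""
    c_exps = range(c_range[0], c_range[1] + 1, step)
    g_exps = range(gamma_range[0], gamma_range[1] + 1, step)
    return [(2.0 ** c, 2.0 ** g) for c, g in product(c_exps, g_exps)]


grid = build_grid(LOG2_C_RANGE, LOG2_GAMMA_RANGE)
print(len(grid))
```

Each pair would then be evaluated by cross-validation and the best-scoring pair kept. The GA, DPSO, and DFA rows replace this exhaustive sweep with population-based search.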
The prediction performance of four algorithms.
| Method | SN (%) | SP (%) | ACC (%) | MCC (%) | Features |
|---|---|---|---|---|---|
| RES+IFS¹ | 91.49 | 96.01 | 94.67 | 88.74 | 103 |
| mRMR+IFS² | 86.71 | 91.66 | 90.08 | 84.65 | 127 |
| GA³ | 92.55 | 97.17 | 94.28 | 91.69 | 73 |
| DPSO⁴ | 93.73 | 97.59 | 95.04 | 92.66 | 62 |
| DFA⁵ | | | | | |

All models use a Gaussian kernel function. ¹ C = 64, γ = 0.03125; ² C = 64, γ = 0.04268; ³ C = 128, γ = 0.003790; ⁴ C = 128, γ = 0.01136; ⁵ C = 128, γ = 0.005062.
Figure 3. The ROC curves of the four algorithms.
Figure 4. Contributions of the different feature types. The black bars indicate the proportion of each feature type in the whole feature matrix; the grey bars represent the percentage of features selected within the corresponding feature type; and the white bars represent the share of each feature type in the final optimal feature subset.
Comparisons of the proposed method with other methods.
| Method | SN (%) | SP (%) | ACC (%) | MCC (%) |
|---|---|---|---|---|
| Sulfinator [ | 44.44 | 87.50 | 74.14 | 35.44 |
| SulfoSite [ | 83.33 | 87.50 | 86.21 | 68.94 |
| PredSulSite [ | 89.89 | 97.50 | 94.83 | 87.80 |
| This method | | | | |
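The SN, SP, ACC, and MCC columns follow from the four cells of a confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical: they were chosen only because they reproduce the Sulfinator row, and they are not taken from the paper.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return sensitivity, specificity, accuracy and MCC as fractions in [-1, 1]."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts (18 positives, 40 negatives) picked to match one table row.
metrics = classification_metrics(tp=8, fp=5, tn=35, fn=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

The tables report MCC multiplied by 100, which is why it appears as a percentage alongside the other three measures.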
Figure 5. The home page of DFA_PTSs.