Jian Zhang, Wenhan Chen, Pingping Sun, Xiaowei Zhao, Zhiqiang Ma.
Abstract
BACKGROUND: Predicting solvent accessibility can provide valuable clues for analyzing protein structure and function, for example in three-dimensional structure prediction and B-cell epitope prediction. To fully decipher the protein-protein interaction process, an initial but crucial step is to calculate protein solvent accessibility, especially when the tertiary structure of the protein is unknown. Although some effort has gone into protein solvent accessibility prediction, the performance of existing methods is far from satisfactory.
Keywords: Particle swarm optimization; Protein sequence; Solvent accessibility; Support vector regression
Year: 2015 PMID: 26478747 PMCID: PMC4608127 DOI: 10.1186/s13040-014-0031-3
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Figure 1. The architecture of PSAP for protein solvent accessibility prediction. Five types of sequence-derived features are generated and combined into an input vector, which is used to build the PSO-SVR model with a weighted sliding-window scheme.
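Figure 1 mentions a weighted sliding-window scheme, but this record gives neither the window size nor the weights. The sketch below is one plausible reading, assuming each residue already has a feature row (for example its PSSM, PS, DO, SS and PC values concatenated) and that each window position is scaled by a weight depending only on its distance from the center residue. The function name `window_features` and the weight values are hypothetical.

```python
from typing import List, Sequence


def window_features(profile: Sequence[Sequence[float]], center: int,
                    half_width: int, weights: Sequence[float]) -> List[float]:
    """Build the input vector for residue `center` (hypothetical scheme).

    profile: one feature row per residue, all rows the same length.
    weights: weights[d] scales the row at distance d from the center,
             so it needs half_width + 1 entries.
    """
    n_feat = len(profile[0])
    vector: List[float] = []
    for offset in range(-half_width, half_width + 1):
        pos = center + offset
        w = weights[abs(offset)]
        if 0 <= pos < len(profile):
            vector.extend(w * x for x in profile[pos])
        else:
            # Zero-pad window positions that fall beyond either chain terminus.
            vector.extend([0.0] * n_feat)
    return vector


if __name__ == "__main__":
    profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Center weight 1.0, immediate neighbors down-weighted to 0.5 (assumed values).
    print(window_features(profile, center=0, half_width=1, weights=[1.0, 0.5]))
    # → [0.0, 0.0, 1.0, 2.0, 1.5, 2.0]
```

A distance-only weighting keeps the vector length fixed at (2 × half_width + 1) × n_feat, which is what a kernel regressor such as SVR expects as input.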
Performance of SVR predictors using different combinations of sequence-derived features on PSAP2312
| Feature combination | MAE (%) | PCC |
|---|---|---|
| PSSM¹ | 17.3 | 0.49 |
| PSSM + PS² | 16.2 | 0.55 |
| PSSM + PS + DO³ | 15.5 | 0.61 |
| PSSM + PS + DO + SS⁴ | 15.2 | 0.65 |
| PSSM + PS + DO + SS + PC⁵ | 14.8 | 0.67 |

¹Position-specific scoring matrix; ²protein sequence information; ³native disorder; ⁴secondary structure features; ⁵physicochemical propensities.
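The tables report mean absolute error (MAE, in %) and the Pearson correlation coefficient (PCC) between observed and predicted solvent accessibility. A minimal sketch of both metrics follows, assuming relative solvent accessibility (RSA) values on a 0 to 1 scale, so MAE is multiplied by 100 to express it in percentage points. The function names are illustrative.

```python
import math
from typing import Sequence


def mae_percent(true_rsa: Sequence[float], pred_rsa: Sequence[float]) -> float:
    """Mean absolute error, in percentage points, for RSA values in [0, 1]."""
    return 100.0 * sum(abs(t - p) for t, p in zip(true_rsa, pred_rsa)) / len(true_rsa)


def pcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    observed = [0.10, 0.45, 0.80, 0.05]
    predicted = [0.20, 0.40, 0.70, 0.15]
    print(f"MAE = {mae_percent(observed, predicted):.2f}%, "
          f"PCC = {pcc(observed, predicted):.2f}")
```

Lower MAE and higher PCC are better, which is why each added feature type in the table above moves both columns in the same favorable direction.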
The performance of different machine learning methods using 3-fold cross-validation
| Method | MAE (%) | PCC |
|---|---|---|
| wKNN¹ | 14.9 | 0.63 |
| GBR² | 15.1 | 0.64 |
| SVR | | |

¹Weighted k-nearest neighbor (kernel = triangular, k = 19); ²generalized boosting regression (distribution = Gaussian, n.trees = 1000, shrinkage = 0.05, interaction.depth = 3). Best results are shown in bold.
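The wKNN footnote specifies a triangular kernel with k = 19. The sketch below shows one common way such a regressor works, loosely modeled on R's kknn package: the k nearest distances are scaled by the distance to the (k + 1)-th neighbor and passed through the triangular kernel 1 - d. This is an assumed formulation for illustration, not necessarily the authors' exact configuration, and `wknn_predict` is a hypothetical name.

```python
import math
from typing import Sequence


def wknn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[float],
                 query: Sequence[float], k: int = 19) -> float:
    """Weighted k-NN regression with a triangular kernel (illustrative)."""
    # (distance, target) pairs sorted from nearest to farthest.
    neighbors = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    # Scale by the (k + 1)-th distance so the k nearest fall in [0, 1].
    scale = neighbors[min(k, len(neighbors) - 1)][0] or 1.0
    num = den = 0.0
    for d, y in neighbors[:k]:
        weight = max(0.0, 1.0 - d / scale)  # triangular kernel: 1 - |d|
        num += weight * y
        den += weight
    if den == 0.0:
        # Every neighbor sits at the scaling distance: fall back to a plain mean.
        nearest = neighbors[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return num / den


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0], [10.0]]
    ys = [0.0, 1.0, 2.0, 10.0]
    print(wknn_predict(xs, ys, [0.0], k=2))  # weights 1.0 and 0.5 on targets 0 and 1
```

Closer training residues get larger weights, so the prediction is a smoothed local average rather than the unweighted mean of plain KNN.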
Performance of different parameter optimization methods using 3-fold cross-validation
| Method | MAE (%) | PCC | MAE (%) | PCC |
|---|---|---|---|---|
| SVR | 19.6 | 0.60 | 14.8 | 0.67 |
| SVR-grid search¹ | 17.3 | 0.67 | 14.7 | 0.69 |
| PSO-SVR² | | | | |

¹Kernel = Gaussian, C = 0.01, γ = 0.0025, ε = 0.05; ²kernel = Gaussian, C = 0.00762, γ = 0.00130, ε = 0.04129. Best results are shown in bold.
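The PSO-SVR footnote lists the Gaussian-kernel parameters (C, γ, ε) found by particle swarm optimization. A minimal global-best PSO is sketched below. Training an SVR is out of scope here, so `toy_cv_error` is a hypothetical stand-in for the 3-fold cross-validation MAE: a simple bowl in log10 parameter space with an arbitrary minimum. The inertia and acceleration coefficients (w = 0.7, c1 = c2 = 1.5), swarm size and search bounds are assumptions, not values from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def pso_minimize(objective: Callable[[List[float]], float], bounds: Bounds,
                 n_particles: int = 20, n_iter: int = 100, w: float = 0.7,
                 c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position in the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # clamp to box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for the 3-fold CV MAE of an SVR over
# (log10 C, log10 gamma, log10 epsilon); the minimum location is arbitrary.
TARGET = (-2.0, -3.0, -1.5)
BOUNDS = [(-4.0, 2.0), (-5.0, 0.0), (-3.0, 0.0)]


def toy_cv_error(p: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, TARGET))


if __name__ == "__main__":
    best, err = pso_minimize(toy_cv_error, BOUNDS, seed=1)
    c, gamma, eps = (10 ** x for x in best)
    print(f"C={c:.5f}, gamma={gamma:.5f}, epsilon={eps:.5f}, objective={err:.2e}")
```

Searching in log10 space is a common choice for SVR hyperparameters because plausible values span several orders of magnitude; unlike a fixed grid, the swarm can settle on non-grid values such as those in footnote 2.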
Comparison with other reported methods
| Method | MAE (%) | PCC | MAE (%) | PCC |
|---|---|---|---|---|
| EO | - | 0.49 | - | 0.52 |
| SVR | 14.8 | 0.68 | 14.2 | 0.69 |
| Real-SPINE | 14.5 | 0.68 | 13.8 | 0.70 |
| PR | - | - | | 0.64 |
| NetSurfP | 14.3 | 0.71 | 13.6 | 0.70 |
| PSO-SVR | | | | |
Unreported results are denoted by “-”; best results are shown in bold.
Experimental comparison between the proposed predictor and other reported classification predictors
| Method | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PR | 76.8 | 74.8 | 75.3 | 76.7 | 77.7 | 79.8 | 86.3 | - | - | - | - |
| Agent-based | 79.7 | 78.4 | 77.0 | 77.0 | 77.1 | 79.3 | 85.1 | - | - | - | - |
| Two-stage SVR | 81.1 | 78.7 | 77.6 | 77.3 | - | - | 79.5 | 84.3 | 89.9 | | 97.5 |
| SVR | 80.9 | 80.1 | 78.7 | - | - | - | 80.8 | 85.3 | | | |
| PSO-SVR | | | | | | | | | 90.2 | | |
Unreported results are denoted by “-”; best results are shown in bold.
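The classification comparison scores predictors after splitting residues into buried and exposed classes. The column thresholds are not recoverable from this record, so the sketch assumes a residue is exposed when its RSA exceeds a given cut-off and reports the percentage of residues whose predicted class matches the observed one. `two_state_accuracy` and the example cut-offs are hypothetical.

```python
from typing import Sequence


def two_state_accuracy(true_rsa: Sequence[float], pred_rsa: Sequence[float],
                       threshold: float) -> float:
    """Percentage of residues assigned to the same buried/exposed class.

    A residue counts as exposed when RSA > threshold (assumed convention).
    """
    correct = sum((t > threshold) == (p > threshold)
                  for t, p in zip(true_rsa, pred_rsa))
    return 100.0 * correct / len(true_rsa)


if __name__ == "__main__":
    observed = [0.02, 0.10, 0.30, 0.55, 0.80, 0.15]
    predicted = [0.05, 0.20, 0.25, 0.50, 0.60, 0.10]
    for cutoff in (0.05, 0.16, 0.25):  # example cut-offs only
        print(f"{cutoff:.2f}: {two_state_accuracy(observed, predicted, cutoff):.1f}%")
```

Because the same real-valued predictions are thresholded at each cut-off, a single regression model can be compared against dedicated classifiers across all threshold columns.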
Experimental performance of different servers on the independent dataset
| Method | Training dataset | Server | MAE (%) | PCC |
|---|---|---|---|---|
| NN | 513 proteins | NetSurfP | 14.5 | 0.66 |
| NN | 2640 proteins | Real-SPINE 3.0 | 14.2 | 0.69 |
| KNN | 5717 proteins | SANN | 14.3 | 0.69 |
| PSO-SVR | PSAP2312 | Our PSAP | 13.9 | 0.73 |
| PSO-SVR | CB502 | Our PSAP | 14.0 | 0.71 |
| PSO-SVR | Manesh215 | Our PSAP | 14.3 | 0.70 |
Figure 2. True and PSAP-predicted mean values for the 20 amino acid types on the PSAP2312 dataset. Blue bars represent the true mean values; red bars represent the PSO-SVR predicted values.
Figure 3. Mean prediction errors for the 20 amino acid types on the PSAP2312 dataset.
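Figures 2 and 3 summarize results per amino acid type. The sketch below groups residues by one-letter code and returns the mean observed RSA, mean predicted RSA and mean absolute error for each type. The input layout (a sequence string plus aligned observed and predicted RSA lists) and the name `per_type_stats` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def per_type_stats(sequence: str, true_rsa: Sequence[float],
                   pred_rsa: Sequence[float]) -> Dict[str, Tuple[float, float, float]]:
    """Map each residue type to (mean observed, mean predicted, mean |error|)."""
    # Running sums per type: observed, predicted, absolute error, count.
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for aa, t, p in zip(sequence, true_rsa, pred_rsa):
        acc = totals[aa]
        acc[0] += t
        acc[1] += p
        acc[2] += abs(t - p)
        acc[3] += 1
    return {aa: (s[0] / s[3], s[1] / s[3], s[2] / s[3]) for aa, s in totals.items()}


if __name__ == "__main__":
    stats = per_type_stats("AKGA", [0.10, 0.70, 0.50, 0.20], [0.15, 0.60, 0.40, 0.25])
    for aa, (obs, pred, err) in sorted(stats.items()):
        print(f"{aa}: observed={obs:.3f} predicted={pred:.3f} MAE={err:.3f}")
```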
Figure 4. Bar chart of prediction error, showing the relative number of residues predicted within each MAE range on the PSAP2312 dataset.
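Figure 4 bins residues by the size of their prediction error. A minimal sketch of that binning follows, assuming RSA values in [0, 1] and a bin width of 0.1 (10 percentage points); the bin width and the name `error_histogram` are hypothetical.

```python
from typing import List, Sequence


def error_histogram(true_rsa: Sequence[float], pred_rsa: Sequence[float],
                    bin_width: float = 0.1) -> List[float]:
    """Fraction of residues whose absolute RSA error falls in each bin.

    Bin i covers [i * bin_width, (i + 1) * bin_width); the last bin also
    absorbs an error of exactly 1.0.
    """
    n_bins = int(round(1.0 / bin_width))
    counts = [0] * n_bins
    for t, p in zip(true_rsa, pred_rsa):
        # The tiny offset stops float rounding (e.g. 0.3 / 0.1 < 3) from
        # pushing errors that sit exactly on a bin edge into the lower bin.
        idx = min(int(abs(t - p) / bin_width + 1e-9), n_bins - 1)
        counts[idx] += 1
    return [c / len(true_rsa) for c in counts]


if __name__ == "__main__":
    hist = error_histogram([0.0, 0.5, 1.0, 0.2], [0.05, 0.25, 0.0, 0.2])
    for i, frac in enumerate(hist):
        print(f"{i * 10:>3}-{(i + 1) * 10:<3}%: {frac:.2f}")
```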
Figure 5. Residue-specific prediction error and RSA (relative solvent accessibility) variability. Blue squares represent the prediction error of the PSO-SVR approach on the PSAP2312 dataset; red circles represent the standard deviation. The correlation between the PSO-SVR prediction error and the standard deviation is 96.9%.