Shunfang Wang, Xiaoheng Wang.
Abstract
BACKGROUND: Protein structural class prediction is a heavily researched subject in bioinformatics that plays a vital role in protein functional analysis, protein folding recognition, rational drug design and other related fields. However, when traditional feature expression methods are adopted, the features usually contain considerable redundant information, which leads to a very low recognition rate of protein structural classes.
Keywords: Different feature expressions; Fusion; Parallel 2-D wavelet denoising; Prediction of protein structural classes
Year: 2019 PMID: 31874617 PMCID: PMC6929547 DOI: 10.1186/s12859-019-3276-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Detailed information of the three datasets (number of proteins per class)
| Dataset | all-α | all-β | α/β | α + β | Total |
|---|---|---|---|---|---|
| 25PDB | 443 | 443 | 346 | 441 | 1673 |
| 1189PDB | 223 | 294 | 334 | 241 | 1092 |
| 640PDB | 138 | 154 | 177 | 171 | 640 |
Fig. 1 Flow chart of 2-D wavelet denoising
Pseudocode of the 2-D wavelet denoising algorithm (input: 2-D data, d1; output: new 2-D data, d2)
| Step | Operation |
|---|---|
| 1 | set x, n, t, j = 0; //set wavelet function, decomposition scale, threshold value and pointer j. |
| 2 | (L[j], h1[j], h2[j], h3[j]) = wavedec2(x, d1); //decompose data. |
| 3 | (h1[j], h2[j], h3[j]) = threshold(t, h1[j], h2[j], h3[j]); //quantize high frequency coefficients. |
| 4 | for j = 0 to n-1: //the process of decomposition. |
| 5 | (L[j + 1], h1[j + 1], h2[j + 1], h3[j + 1]) = wavedec2(x, L[j]); |
| 6 | (h1[j + 1], h2[j + 1], h3[j + 1]) = threshold(t, h1[j + 1], h2[j + 1], h3[j + 1]); j = j + 1; |
| 7 | for i = n-1 to 0: //the process of reconstruction. |
| 8 | L[i-1] = waverec2(x, L[i], h1[i], h2[i], h3[i]); i = i-1; |
| 9 | d2 = waverec2(x, L[i], h1[i], h2[i], h3[i]); //reconstruct data. |
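
The decompose, threshold and reconstruct steps of the pseudocode can be illustrated with a minimal pure-Python sketch. It makes several assumptions that are not in the source: the Haar wavelet stands in for the wavelet function x (the tables below use db, sym, coif and bior families), soft thresholding is used for the `threshold` step, the input dimensions are divisible by 2^n, and all function names (`haar_decompose`, `haar_reconstruct`, `soft_threshold`, `denoise_2d`) are hypothetical.

```python
def haar_decompose(a):
    """One 2-D Haar step: returns the approximation and three detail bands."""
    approx, h1, h2, h3 = [], [], [], []
    for r in range(0, len(a), 2):
        rows = ([], [], [], [])
        for c in range(0, len(a[0]), 2):
            a00, a01 = a[r][c], a[r][c + 1]
            a10, a11 = a[r + 1][c], a[r + 1][c + 1]
            rows[0].append((a00 + a01 + a10 + a11) / 2)  # low-frequency part
            rows[1].append((a00 + a01 - a10 - a11) / 2)  # top/bottom difference
            rows[2].append((a00 - a01 + a10 - a11) / 2)  # left/right difference
            rows[3].append((a00 - a01 - a10 + a11) / 2)  # diagonal difference
        for band, row in zip((approx, h1, h2, h3), rows):
            band.append(row)
    return approx, (h1, h2, h3)


def haar_reconstruct(approx, details):
    """Exact inverse of haar_decompose."""
    h1, h2, h3 = details
    out = []
    for r in range(len(approx)):
        top, bottom = [], []
        for c in range(len(approx[0])):
            l, a, b, d = approx[r][c], h1[r][c], h2[r][c], h3[r][c]
            top += [(l + a + b + d) / 2, (l + a - b - d) / 2]
            bottom += [(l - a + b - d) / 2, (l - a - b + d) / 2]
        out += [top, bottom]
    return out


def soft_threshold(band, t):
    """Shrink every coefficient towards zero by t (assumed thresholding rule)."""
    return [[(abs(v) - t) * (1 if v > 0 else -1) if abs(v) > t else 0.0
             for v in row] for row in band]


def denoise_2d(d1, n, t):
    """Decompose n levels, threshold the detail bands, then reconstruct."""
    approx, stack = d1, []
    for _ in range(n):
        approx, details = haar_decompose(approx)
        stack.append(tuple(soft_threshold(b, t) for b in details))
    for details in reversed(stack):
        approx = haar_reconstruct(approx, details)
    return approx
```

With a threshold of zero the input is recovered unchanged, and with a very large threshold every detail coefficient is removed so that only the mean survives. In practice a wavelet library such as PyWavelets (`pywt.wavedec2`, `pywt.threshold`, `pywt.waverec2`) would likely replace the hand-written Haar steps and give access to the other wavelet families.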
Fig. 2 Flow chart of the PWD-FU-PseAAC method
Prediction results of type 1 PseAAC by different values of λ on the 25PDB (jackknife test, %)
| Class | λ = 1 | λ = 2 | λ = 3 | λ = 4 | λ = 5 | λ = 6 | λ = 7 | λ = 8 | λ = 9 |
|---|---|---|---|---|---|---|---|---|---|
| all-α | 77.43 | 94.58 | 88.71 | 85.10 | 88.94 | 88.49 | 87.36 | 88.26 | 87.81 |
| all-β | 89.16 | 90.52 | 90.52 | 89.39 | 88.94 | 88.04 | 90.29 | 90.29 | 90.52 |
| α/β | 78.03 | 88.73 | 86.42 | 83.53 | 87.57 | 86.71 | 86.99 | 89.31 | 91.62 |
| α + β | 68.03 | 78.23 | 76.87 | 75.28 | 76.42 | 75.28 | 72.11 | 73.47 | 71.20 |
| OA | 78.18 | 87.98 | 85.59 | 83.32 | 85.36 | 84.52 | 84.04 | 85.11 | 84.94 |
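
The OA (overall accuracy) row appears to be the class-size-weighted mean of the per-class accuracies. The short check below is a sketch under that assumption: it combines the 25PDB class sizes from the dataset table with the λ = 1 column and reproduces the reported 78.18. The function name `overall_accuracy` is hypothetical.

```python
# Class sizes of the 25PDB dataset, taken from the dataset table.
class_sizes = {"all-α": 443, "all-β": 443, "α/β": 346, "α + β": 441}
# Per-class jackknife accuracies (%) for λ = 1, taken from the table above.
class_acc = {"all-α": 77.43, "all-β": 89.16, "α/β": 78.03, "α + β": 68.03}


def overall_accuracy(acc, sizes):
    """Size-weighted mean of per-class accuracies (assumed definition of OA)."""
    total = sum(sizes.values())
    return sum(acc[c] * sizes[c] for c in sizes) / total


oa = overall_accuracy(class_acc, class_sizes)
print(round(oa, 2))  # 78.18
```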
Prediction results of type 2 PseAAC by different values of r on the 25PDB (jackknife test, %)
| Class | r = 1 | r = 2 | r = 3 | r = 4 | r = 5 | r = 6 | r = 7 | r = 8 | r = 9 |
|---|---|---|---|---|---|---|---|---|---|
| all-α | 76.07 | 74.49 | 70.88 | 73.81 | 72.23 | 71.11 | 71.11 | 68.17 | 63.43 |
| all-β | 87.81 | 88.49 | 85.78 | 83.75 | 84.65 | 83.75 | 82.39 | 79.46 | 79.46 |
| α/β | 76.01 | 79.77 | 78.90 | 82.08 | 85.55 | 83.82 | 86.71 | 85.55 | 87.57 |
| α + β | 61.45 | 65.76 | 60.09 | 62.59 | 56.46 | 51.47 | 50.34 | 47.62 | 44.22 |
| OA | 75.31 | 76.99 | 73.64 | 75.19 | 74.12 | 71.91 | 71.85 | 69.34 | 67.60 |
Prediction results on the 25PDB by different wavelet functions and different wavelet decomposition scales using type 1 PseAAC (jackknife test, %)
| Wavelet | Scale 2 | Scale 3 | Scale 4 | Scale 5 |
|---|---|---|---|---|
| db2 | 78.60 | 80.27 | 82.07 | 87.09 |
| db4 | 83.68 | 87.99 | 94.08 | |
| db6 | 75.79 | 83.38 | 89.30 | 93.37 |
| sym2 | 78.60 | 80.27 | 82.07 | 87.09 |
| sym4 | 77.05 | 85.18 | 90.79 | 91.63 |
| sym6 | 78.06 | 78.30 | 81.59 | 84.82 |
| coif1 | 76.75 | 83.32 | 87.15 | 90.50 |
| coif3 | 78.90 | 86.01 | 91.57 | 91.69 |
| bior2.2 | 71.07 | 79.20 | 82.90 | 86.61 |
| bior2.4 | 73.52 | 82.37 | 84.88 | 83.68 |
Prediction results on the 25PDB by different wavelet functions and different wavelet decomposition scales using type 2 PseAAC (jackknife test, %)
| Wavelet | Scale 2 | Scale 3 | Scale 4 | Scale 5 |
|---|---|---|---|---|
| db2 | 74.90 | 84.28 | 88.58 | 91.21 |
| db4 | 78.84 | 76.99 | 86.01 | 86.25 |
| db6 | 78.00 | 85.00 | 89.90 | 91.15 |
| sym2 | 74.90 | 84.28 | 88.58 | 91.21 |
| sym4 | 79.01 | 83.32 | 91.57 | |
| sym6 | 75.43 | 83.44 | 87.45 | 89.60 |
| coif1 | 76.27 | 83.14 | 91.57 | 91.45 |
| coif3 | 78.90 | 76.93 | 80.63 | 82.96 |
| bior2.2 | 77.82 | 86.61 | 88.64 | 86.07 |
| bior2.4 | 74.30 | 88.16 | 92.77 | 93.19 |
Fig. 3 Prediction results by type 1 PseAAC on different decomposition scales and wavelet basis functions on the 25PDB
Fig. 4 Prediction results by type 2 PseAAC on different decomposition scales and wavelet basis functions on the 25PDB
Fig. 5 Comparisons of 1-D wavelet denoising and 2-D wavelet denoising on the 25PDB
Prediction results by choosing different values of K on the 25PDB (jackknife test, %)
| Class | K = 1 | K = 2 | K = 3 | K = 4 | K = 5 | K = 6 | K = 7 | K = 8 | K = 9 |
|---|---|---|---|---|---|---|---|---|---|
| all-α | 97.97 | 98.65 | 95.71 | 96.84 | 93.23 | 94.36 | 93.91 | 94.58 | 93.00 |
| all-β | 98.87 | 99.10 | 98.65 | 98.87 | 98.42 | 98.65 | 98.65 | 98.87 | 98.65 |
| α/β | 97.98 | 97.40 | 95.67 | 96.24 | 93.93 | 94.80 | 93.64 | 93.64 | 92.77 |
| α + β | 97.51 | 89.80 | 94.78 | 89.11 | 89.57 | 85.71 | 86.17 | 83.45 | 85.26 |
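
A jackknife test is a leave-one-out evaluation: each protein is held out in turn and classified using all the others. The sketch below shows how accuracy could be computed for a given K, assuming that K in the table is the number of neighbours of a KNN classifier and that Euclidean distance is used. Neither detail is stated in this excerpt, and the toy feature vectors and the function names `knn_predict` and `jackknife_accuracy` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(samples, k):
    """Leave-one-out accuracy (%): every sample is predicted from the rest."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return 100.0 * correct / len(samples)


# Toy data: two well-separated clusters standing in for two structural classes.
toy = [((0.0, 0.0), "all-α"), ((0.0, 1.0), "all-α"), ((1.0, 0.0), "all-α"),
       ((10.0, 10.0), "all-β"), ((10.0, 11.0), "all-β"), ((11.0, 10.0), "all-β")]
for k in (1, 3):
    print(k, jackknife_accuracy(toy, k))
```

Sweeping K in this way and reading off the per-class accuracies is one way a table like the one above could be produced.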
Fig. 6 Prediction results by choosing different values of K on the 25PDB
Comparison of different strategies on the 25PDB (prediction accuracy, %)
| Dataset | Strategy | all-α | all-β | α/β | α + β | OA |
|---|---|---|---|---|---|---|
| 25PDB | 1 | 53.05 | 44.24 | 75.72 | 16.55 | 45.79 |
| | 2 | 53.05 | 45.37 | 73.41 | 17.23 | 45.79 |
| | 3 | 98.19 | 98.19 | 97.11 | 94.10 | 96.89 |
| | 4 | 93.00 | 98.87 | 94.80 | 92.97 | 94.92 |
| | 5 | 96.16 | 99.32 | 97.98 | 94.78 | 97.01 |
Fig. 7 Comparison of different strategies on the 25PDB
Influence of different classifiers on prediction results on the 25PDB
| Classifier | all-α (%) | all-β (%) | α/β (%) | α + β (%) | OA (%) |
|---|---|---|---|---|---|
| Naive Bayes | 95.49 | 97.29 | 90.75 | 49.66 | 82.90 |
| KNN | |||||
| SVM | 98.65 | 97.97 | 97.11 | 97.51 | 97.85 |
Fig. 8 Influence of different classifiers on prediction results on the 25PDB
Prediction performance of model 1 on three benchmark datasets
| Dataset | Class | Sens (%) | Spec (%) | MCC | OA (%) |
|---|---|---|---|---|---|
| 25PDB | all-α | 97.97 | 99.84 | 0.983 | 98.09 |
| | all-β | 98.87 | 99.84 | 0.989 | |
| | α/β | 97.98 | 99.17 | 0.967 | |
| | α + β | 97.51 | 98.62 | 0.957 | |
| 1189 | all-α | 98.21 | 99.66 | 0.980 | 97.25 |
| | all-β | 99.32 | 99.87 | 0.993 | |
| | α/β | 99.10 | 97.23 | 0.950 | |
| | α + β | 91.29 | 99.41 | 0.930 | |
| 640 | all-α | 95.65 | 99.20 | 0.954 | 96.09 |
| | all-β | 98.05 | 99.59 | 0.979 | |
| | α/β | 97.18 | 96.98 | 0.928 | |
| | α + β | 93.57 | 98.93 | 0.936 | |
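
Sens, Spec and MCC are per-class measures, while OA is computed over all classes. The sketch below uses the standard one-vs-rest definitions of sensitivity, specificity and the Matthews correlation coefficient. The paper's exact formulas are not part of this excerpt, so treating them as identical is an assumption, and the confusion matrix and the function name `one_vs_rest_metrics` are made up for illustration.

```python
import math


def one_vs_rest_metrics(confusion, labels):
    """Per-class (Sens %, Spec %, MCC) plus overall accuracy (%).

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    per_class = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                  # class i missed
        fp = sum(row[i] for row in confusion) - tp   # others labelled as i
        tn = total - tp - fn - fp
        sens = 100.0 * tp / (tp + fn)
        spec = 100.0 * tn / (tn + fp)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0
        per_class[label] = (sens, spec, mcc)
    oa = 100.0 * sum(confusion[i][i] for i in range(len(labels))) / total
    return per_class, oa


# Hypothetical two-class confusion matrix, not data from the paper.
metrics, oa = one_vs_rest_metrics([[8, 2], [1, 9]], ["all-α", "all-β"])
print(metrics["all-α"], oa)
```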
Comparison with other methods on three benchmark datasets
| Dataset | Method | all-α (%) | all-β (%) | α/β (%) | α + β (%) | OA (%) |
|---|---|---|---|---|---|---|
| 25PDB | MEDP | 87.8 | 78.3 | 76.0 | 57.4 | 74.8 |
| | SCPRED | 92.6 | 80.1 | 74.0 | 71.0 | 79.7 |
| | PKS-PPSC | 89.2 | 86.7 | 82.6 | 65.6 | 81.3 |
| | Zhang et al. | 92.4 | 87.4 | 82.0 | 71.0 | 83.9 |
| | PSSS-PSSM | 96.6 | 87.1 | 83.0 | 78.9 | 86.6 |
| | PSSS-PsePSSM | 96.4 | 90.5 | 90.2 | 81.2 | 89.5 |
| | WD-PseAAC | 95.7 | 97.7 | 94.8 | 84.4 | 93.1 |
| 1189 | MEDP | 85.2 | 84.0 | 84.4 | 45.2 | 75.8 |
| | SCPRED | 89.1 | 86.7 | 89.6 | 53.8 | 80.6 |
| | PKS-PPSC | 89.2 | 86.7 | 82.6 | 65.6 | 81.3 |
| | Zhang et al. | 92.4 | 87.4 | 82.0 | 71.0 | 83.2 |
| | PSSS-PSSM | 94.2 | 88.4 | 85.3 | 71.8 | 85.0 |
| | PSSS-PsePSSM | 91.9 | 91.8 | 87.7 | 73.9 | 86.6 |
| | WD-PseAAC | 98.7 | 99.0 | 94.0 | 68.9 | 90.8 |
| 640 | MEDP | 84.8 | 75.3 | 86.4 | 53.8 | 74.7 |
| | SCPRED | 90.6 | 81.8 | 85.9 | 66.7 | 80.8 |
| | PKS-PPSC | 89.1 | 85.1 | 88.1 | 71.4 | 83.1 |
| | Zhang et al. | – | – | – | – | – |
| | PSSS-PSSM | – | – | – | – | – |
| | PSSS-PsePSSM | 87.0 | 81.2 | 84.7 | 70.8 | 81.0 |
| | WD-PseAAC | 92.8 | 95.5 | 92.1 | 78.9 | 89.5 |
Fig. 9 Comparison with other methods on the 25PDB
Fig. 10 Comparison with other methods on the 1189PDB
Fig. 11 Comparison with other methods on the 640PDB