Yunyun Liang1, Sanyang Liu1, Shengli Zhang1.
Abstract
Predicting protein structural classes for low-similarity sequences helps in understanding the fold patterns, regulation, functions, and interactions of proteins. Feature extraction is central to structural class prediction and draws mainly on the protein primary sequence, the predicted secondary structure sequence, and the position-specific scoring matrix (PSSM). Prediction based solely on the PSSM has recently played a key role in improving accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP, which fuses the consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT), all based on the PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted. A 700-dimensional (700D) feature vector is constructed, and its dimension is reduced to 224D by principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison with existing PSSM-based methods shows that our method achieves favorable and competitive performance. It offers an important complement to other PSSM-based methods for predicting protein structural classes of low-similarity sequences.
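The jackknife test named in the abstract holds out one sequence at a time, trains on all the others, and scores the held-out sequence. A minimal sketch of that loop follows. The abstract does not name the classifier, so the 1-nearest-neighbour rule here is only a placeholder, and the toy 2-D vectors stand in for the 224D feature vectors; the function names are hypothetical.

```python
import math


def nearest_neighbour_predict(train_x, train_y, query):
    """Placeholder classifier (1-NN, Euclidean distance).

    Assumption: used only to make the loop runnable; it is not
    claimed to be the classifier used in the paper.
    """
    best_label, best_dist = None, math.inf
    for x, y in zip(train_x, train_y):
        d = math.dist(x, query)
        if d < best_dist:
            best_label, best_dist = y, d
    return best_label


def jackknife_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Leave each sample out once, fit on the rest, score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy data: two well-separated clusters standing in for two structural classes.
toy_x = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
toy_y = ["all-alpha"] * 3 + ["all-beta"] * 3
print(jackknife_accuracy(toy_x, toy_y))
```

Because every sample is tested exactly once and never appears in its own training set, the result does not depend on a random split, which is why the test is usually described as rigorous.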
Year: 2015 PMID: 26788119 PMCID: PMC4693000 DOI: 10.1155/2015/370756
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
The compositions of three datasets adopted in this paper.
| Dataset | All-α | All-β | α/β | α+β | Total |
|---|---|---|---|---|---|
| 1189 | 223 | 294 | 334 | 241 | 1092 |
| 25PDB | 443 | 443 | 346 | 441 | 1673 |
| 640 | 138 | 154 | 177 | 171 | 640 |
The prediction accuracies of our method on the 1189, 25PDB and 640 datasets.
| Dataset | Structural class | Sens (%) | Spec (%) | F-measure | MCC | AUC |
|---|---|---|---|---|---|---|
| 1189 | All-α | 84.8 | 95.6 | 0.84 | 0.80 | 0.90 |
| | All-β | 85.4 | 94.1 | 0.85 | 0.79 | 0.90 |
| | α/β | 85.0 | 90.0 | 0.82 | 0.74 | 0.88 |
| | α+β | 55.2 | 91.3 | 0.59 | 0.49 | 0.73 |
| | OA | 78.5 | | | | |
| | AA | 77.6 | | | | |
| 25PDB | All-α | 94.4 | 96.4 | 0.92 | 0.90 | 0.95 |
| | All-β | 91.9 | 97.2 | 0.92 | 0.89 | 0.95 |
| | α/β | 71.1 | 95.7 | 0.76 | 0.70 | 0.83 |
| | α+β | 92.5 | 95.2 | 0.90 | 0.86 | 0.94 |
| | OA | 88.4 | | | | |
| | AA | 87.5 | | | | |
| 640 | All-α | 83.3 | 96.8 | 0.86 | 0.82 | 0.90 |
| | All-β | 83.1 | 95.3 | 0.84 | 0.79 | 0.89 |
| | α/β | 83.0 | 89.4 | 0.79 | 0.70 | 0.86 |
| | α+β | 60.2 | 87.4 | 0.62 | 0.49 | 0.74 |
| | OA | 77.0 | | | | |
| | AA | 77.4 | | | | |
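The relation between the per-class sensitivities and the OA and AA rows can be checked with a few lines of arithmetic. The sketch below assumes the usual definitions: OA is the fraction of all sequences predicted correctly (a class-size-weighted mean of the sensitivities) and AA is the unweighted mean of the sensitivities. Sens, Spec, and MCC are written in their standard confusion-count form. The function names are hypothetical, and the inputs are the 1189 rows of the tables above.

```python
import math


def sens_spec_mcc(tp, fp, tn, fn):
    """Standard one-vs-rest sensitivity, specificity and Matthews correlation."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc


def overall_and_average(class_sizes, class_sens_pct):
    """OA: size-weighted mean of per-class sensitivities; AA: plain mean."""
    correct = sum(n * s / 100 for n, s in zip(class_sizes, class_sens_pct))
    oa = 100 * correct / sum(class_sizes)
    aa = sum(class_sens_pct) / len(class_sens_pct)
    return oa, aa


# 1189 dataset: class sizes and reported sensitivities (%),
# in the order All-alpha, All-beta, alpha/beta, alpha+beta.
sizes_1189 = [223, 294, 334, 241]
sens_1189 = [84.8, 85.4, 85.0, 55.2]
oa, aa = overall_and_average(sizes_1189, sens_1189)
print(round(oa, 1), round(aa, 1))
```

Under these assumptions the 1189 inputs reproduce the reported OA and AA to within rounding, which also confirms that the third and fourth class rows correspond to the third and fourth columns of the dataset composition table.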
Figure 1: The flowchart of our proposed method.
Figure 2: Comparison of accuracies between our method with 224 features and the method with 700 features.
Performance comparison of our six feature groups on the 1189 dataset (prediction accuracy, %).
| Dataset | Features | All-α (%) | All-β (%) | α/β (%) | α+β (%) | OA (%) |
|---|---|---|---|---|---|---|
| 1189 | CSAAC-PSSM (20D) | 72.7 | 76.2 | 78.7 | 26.1 | 65.2 |
| | CSCM-PSSM (20D) | 69.1 | 76.9 | 82.0 | 29.9 | 66.5 |
| | Seg2-PsePSSM (200D) | 80.7 | 82.7 | 80.8 | 51.0 | 74.7 |
| | Seg3-PsePSSM (180D) | 79.8 | 80.6 | 81.4 | 48.1 | 73.5 |
| | Seg2-ACPSSM (160D) | 76.7 | 82.3 | 76.0 | 44.4 | 70.9 |
| | Seg3-ACPSSM (120D) | 69.1 | 77.6 | 78.4 | 38.6 | 67.5 |
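The dimensions of the two autocovariance groups can be illustrated with a sketch of a segmented autocovariance transformation. Several details are assumptions rather than facts from this record: the standard column-wise autocovariance formula is used, the PSSM is split into equal contiguous segments, normalization of the PSSM is omitted, and the lag depths (4 for two segments, 2 for three segments) are inferred only because 2 × 20 × 4 = 160 and 3 × 20 × 2 = 120 match the listed dimensions. The function names are hypothetical.

```python
def autocovariance(segment, max_lag):
    """Column-wise autocovariance of one PSSM segment for lags 1..max_lag.

    segment: list of rows, one per residue, each a list of column scores.
    Returns n_columns * max_lag values. Requires len(segment) > max_lag.
    """
    length = len(segment)
    n_cols = len(segment[0])
    means = [sum(row[j] for row in segment) / length for j in range(n_cols)]
    feats = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (segment[i][j] - means[j]) * (segment[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            feats.append(total / (length - lag))
    return feats


def segmented_autocovariance(pssm, n_segments, max_lag):
    """Split PSSM rows into contiguous segments and concatenate their features.

    Assumption: equal-length contiguous segments; the paper's exact
    segmentation scheme is not given in this record.
    """
    length = len(pssm)
    bounds = [round(k * length / n_segments) for k in range(n_segments + 1)]
    feats = []
    for start, end in zip(bounds, bounds[1:]):
        feats.extend(autocovariance(pssm[start:end], max_lag))
    return feats


# Synthetic 30-residue x 20-column matrix standing in for a real PSSM.
toy_pssm = [[((i * 7 + j * 3) % 11) / 10 for j in range(20)] for i in range(30)]
seg2 = segmented_autocovariance(toy_pssm, n_segments=2, max_lag=4)
seg3 = segmented_autocovariance(toy_pssm, n_segments=3, max_lag=2)
print(len(seg2), len(seg3))
```

With these assumed lag depths the two calls yield vectors of length 160 and 120, matching the Seg2-ACPSSM and Seg3-ACPSSM rows; other combinations of segment count and lag could produce the same sizes, so the lags should be checked against the full paper.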
The contribution of each feature group for the overall accuracy (%).
| Combination of feature groups | Dimension | 1189 | 25PDB | 640 |
|---|---|---|---|---|
| CSAACP | 20 | 65.2 | 62.0 | 66.0 |
| CSAACP + CSCMP (CSP) | 40 | 66.5 | 63.1 | 64.7 |
| CSP + Seg2-PseP | 240 | 75.2 | 74.4 | 75.8 |
| CSP + Seg2-PseP + Seg3-PseP | 420 | 76.2 | 87.7 | 74.5 |
| CSP + SegPseP + Seg2-ACP | 580 | 76.1 | 87.9 | 75.0 |
| CSP + SegPseP + Seg2-ACP + Seg3-ACP | 700 | 77.1 | 88.6 | 75.5 |
| CSP + SegPseP + SegACP-PCA | 224 | 78.5 | 88.4 | 77.0 |
Performance comparison of different methods on three datasets (prediction accuracy, %).
| Dataset | Method | All-α (%) | All-β (%) | α/β (%) | α+β (%) | OA (%) |
|---|---|---|---|---|---|---|
| 1189 | PSSM-S | 93.3 | 85.1 | 77.6 | 65.6 | 80.2 |
| | LCC-PSSM | 89.2 | 88.8 | 85.6 | 58.5 | 81.2 |
| | MBMGAC-PSSM | 79.8 | 85.0 | 84.7 | 50.6 | 76.3 |
| | RPSSM | 67.7 | 75.2 | 74.6 | 17.4 | 60.2 |
| | AADP-PSSM | 69.1 | 83.7 | 85.6 | 35.7 | 70.7 |
| | AATP | 72.7 | 85.4 | 82.9 | 42.7 | 72.6 |
| | MEDP | 85.2 | 84.0 | 84.3 | 45.2 | 75.8 |
| | PsePSSM | 82.0 | 82.3 | 84.1 | 44.0 | 74.4 |
| | AAC-PSSM-AC | 80.7 | 86.4 | 81.4 | 45.2 | 74.6 |
| | This paper | 84.8 | 85.4 | 85.0 | 55.2 | 78.5 |
| 25PDB | PSSM-S | 93.8 | 92.8 | 92.6 | 81.7 | 90.1 |
| | LCC-PSSM | 91.7 | 80.8 | 79.8 | 64.0 | 79.0 |
| | MBMGAC-PSSM | 86.7 | 81.5 | 79.5 | 61.7 | 77.2 |
| | RPSSM | 75.6 | 70.2 | 52.0 | 43.3 | 60.8 |
| | AADP-PSSM | 83.3 | 78.1 | 76.3 | 54.4 | 72.9 |
| | AATP | 81.9 | 74.7 | 75.1 | 55.8 | 71.7 |
| | MEDP | 87.8 | 78.3 | 76.0 | 57.4 | 74.8 |
| | AAC-PSSM-AC | 85.3 | 81.7 | 73.7 | 55.3 | 74.1 |
| | PsePSSM | 86.2 | 78.8 | 75.7 | 57.6 | 75.5 |
| | Xia et al. | 92.6 | 72.5 | 71.7 | 71.0 | 77.2 |
| | This paper | 94.4 | 91.9 | 71.1 | 92.5 | 88.4 |
| 640 | LCC-PSSM | 92.8 | 88.3 | 85.9 | 66.1 | 82.7 |
| | MBMGAC-PSSM | 86.2 | 83.1 | 85.3 | 63.2 | 79.1 |
| | MEDP | 84.8 | 75.3 | 86.4 | 53.8 | 74.7 |
| | PsePSSM | 73.9 | 76.6 | 85.3 | 51.5 | 71.7 |
| | This paper | 83.3 | 83.1 | 83.0 | 60.2 | 77.0 |