Abstract
Predicting secreted protein types from sequence data alone remains a challenging problem. In this study, we extract long-range correlation information and linear correlation information from the position-specific scoring matrix (PSSM). A total of 6800 features are extracted at 17 different gaps; 309 features are then selected by a filter feature selection method based on the training set. To verify the performance of our method, jackknife and independent dataset tests are performed on the test set, yielding overall accuracies of 93.60% and 100%, respectively. Comparison with the existing method shows that our method performs favorably for secreted protein type prediction.
Year: 2016 PMID: 27563663 PMCID: PMC4985605 DOI: 10.1155/2016/3206741
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
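The feature count in the abstract is consistent with one 20 × 20 block per gap (6800 / 17 = 400), but this record does not give the correlation formulas themselves. The following is therefore only a minimal sketch under an assumed construction: for each gap g, it averages the products of PSSM column a at position i and column b at position i + g. The function name `gap_correlation_features` and this particular averaging are hypothetical and may differ from the authors' definitions.

```python
import random

N_AMINO = 20  # a PSSM has one column per standard amino acid


def gap_correlation_features(pssm, max_gap=16):
    """Return (max_gap + 1) * 400 features from an L x 20 PSSM.

    Assumed (hypothetical) construction: for each gap g and each
    column pair (a, b), average pssm[i][a] * pssm[i + g][b] over all
    positions i for which i + g is still inside the sequence.
    """
    length = len(pssm)
    if any(len(row) != N_AMINO for row in pssm):
        raise ValueError("every PSSM row must have 20 entries")
    if length <= max_gap:
        raise ValueError("sequence must be longer than max_gap")

    features = []
    for g in range(max_gap + 1):
        n_pairs = length - g  # number of position pairs separated by g
        for a in range(N_AMINO):
            for b in range(N_AMINO):
                total = sum(pssm[i][a] * pssm[i + g][b] for i in range(n_pairs))
                features.append(total / n_pairs)
    return features


# Toy PSSM with random scores, for illustration only (not real data).
random.seed(0)
toy_pssm = [[random.uniform(-5, 5) for _ in range(N_AMINO)] for _ in range(40)]
all_features = gap_correlation_features(toy_pssm)         # gaps 0..16
g10_features = gap_correlation_features(toy_pssm, 10)     # gaps 0..10
```

With gaps 0 to 16 this yields 17 × 400 = 6800 values per protein, matching the total stated in the abstract; restricting to gaps 0 to 10 yields 4400.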
The number of proteins of each type in the training set and test set.
| Type | Training set | Test set |
|---|---|---|
| T1SP | 112 | 25 |
| T2SP | 99 | 29 |
| T3SP | 182 | 28 |
| T4SP | 62 | 22 |
| T5SP | 164 | 35 |
| T7SP | 48 | 33 |
Figure 1: The overall accuracy on the training dataset with G ranging from 0 to 16.
The number of selected features for the training set at G = 10 (g ranges from 0 to 10).
| Value of g | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of selected features | 35 | 36 | 45 | 30 | 22 | 18 | 33 | 28 | 22 | 22 | 18 |
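As a quick consistency check, the per-gap counts in this table can be tallied against the 309 selected features reported in the abstract. The sketch below also estimates the size of the candidate pool at G = 10, assuming the 6800 features are split evenly over the 17 gaps; that even split is an inference from the abstract, not something this record states.

```python
# Selected features per gap g, copied from the table above.
selected_per_gap = {0: 35, 1: 36, 2: 45, 3: 30, 4: 22, 5: 18,
                    6: 33, 7: 28, 8: 22, 9: 22, 10: 18}

total_selected = sum(selected_per_gap.values())

# Assumption: 6800 features over 17 gaps means 400 features per gap.
features_per_gap = 6800 // 17
candidate_pool = features_per_gap * len(selected_per_gap)
kept_fraction = total_selected / candidate_pool

print(total_selected, candidate_pool, round(kept_fraction, 3))
```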
The prediction quality of our method on training set and test set.
| Dataset | Class | Sens (%) | Spec (%) | MCC |
|---|---|---|---|---|
| Training set | T1SP | 91.07 | 99.64 | 0.94 |
| Training set | T2SP | 79.80 | 97.18 | 0.78 |
| Training set | T3SP | 89.01 | 89.90 | 0.76 |
| Training set | T4SP | 67.74 | 98.35 | 0.72 |
| Training set | T5SP | 96.34 | 99.20 | 0.96 |
| Training set | T7SP | 81.25 | 99.35 | 0.85 |
| Training set | OA | 87.26 | | |
| Test set | T1SP | 84.00 | 100.0 | 0.90 |
| Test set | T2SP | 100.0 | 97.90 | 0.94 |
| Test set | T3SP | 92.86 | 98.61 | 0.92 |
| Test set | T4SP | 86.36 | 98.67 | 0.87 |
| Test set | T5SP | 97.14 | 99.27 | 0.96 |
| Test set | T7SP | 96.97 | 97.84 | 0.93 |
| Test set | OA | 93.60 | | |
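This record does not include the formulas behind Sens, Spec, and MCC, so the sketch below uses the standard one-vs-rest definitions, which is an assumption about what the authors computed. The example counts for T1SP on the test set (21 of 25 recovered, no false positives among the 147 other proteins) are inferred from the table and the class sizes rather than reported directly; the function names are illustrative.

```python
import math


def one_vs_rest_metrics(tp, fn, fp, tn):
    """Standard sensitivity, specificity and Matthews correlation coefficient."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc


def per_class_counts(confusion, k):
    """Collapse a K x K confusion matrix (rows = true class) to
    (tp, fn, fp, tn) for class index k treated as the positive class."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


# Counts inferred from the table: T1SP on the test set.
sens, spec, mcc = one_vs_rest_metrics(tp=21, fn=4, fp=0, tn=147)
print(f"{100 * sens:.2f} {100 * spec:.2f} {mcc:.2f}")
```

Under these inferred counts the standard definitions give the T1SP test-set row of the table (84.00, 100.00, 0.90).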
The comparison of our prediction quality with Yu's method by independent dataset test on the test set.
| Method | Measure | T1SP | T2SP | T3SP | T4SP | T5SP | T7SP | Total |
|---|---|---|---|---|---|---|---|---|
| | Number of sequences | 25 | 29 | 28 | 22 | 35 | 33 | 172 |
| The “one-to-one” algorithm | Correct hit | 22 | 23 | 28 | 18 | 35 | 29 | 155 |
| The “one-to-one” algorithm | Sensitivity (%) | 88.00 | 79.31 | 100.00 | 81.82 | 100.00 | 87.88 | 90.12 |
| The “one-to-the-rest” algorithm | Correct hit | 20 | 22 | 28 | 17 | 34 | 27 | 148 |
| The “one-to-the-rest” algorithm | Sensitivity (%) | 80.00 | 75.86 | 100.00 | 77.27 | 97.14 | 81.82 | 86.05 |
| Our method | Correct hit | 25 | 29 | 28 | 22 | 35 | 33 | 172 |
| Our method | Sensitivity (%) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |