Qiwen Dong, Kai Wang, Bin Liu, Xuan Liu.
Abstract
Motivation. To assist efforts in determining and exploring the functional properties of proteins, it is desirable to characterize and predict protein flexibility. Results. In this study, conformational entropy is used as an indicator of protein flexibility. We first explore whether conformational change can capture protein flexibility. Well-defined decoy structures are converted into one-dimensional series of letters from a structural alphabet. Four structure alphabets are investigated: secondary structure in 3 classes and in 8 classes, the PB structure alphabet (16 letters), and the DW structure alphabet (28 letters). The conformational entropy is then calculated from the structure alphabet letters. Some of the proteins show a high correlation between conformational entropy and protein flexibility. We then predict protein flexibility from the amino acid sequence alone. The local structures are predicted by the dual-layer model, and the conformational entropy of the predicted class distribution is then calculated. The results show that conformational entropy is a good indicator of protein flexibility, but false positives remain a problem. The DW structure alphabet performs best, which suggests that a larger number of structure alphabet letters can capture more subtle local structures. Overall, this study provides a simple and efficient method for the characterization and prediction of protein flexibility.
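The abstract does not state the entropy formula. As a minimal sketch, assuming the conformational entropy at a residue position is the Shannon entropy of the structure alphabet letters observed at that position across the decoy structures, it could be computed as follows. The function name `positional_entropy` and the input format (one equal-length string of letters per decoy) are illustrative assumptions, and the toy strings are not data from the study.

```python
import math
from collections import Counter


def positional_entropy(encoded_decoys):
    """Shannon entropy (in bits) of the letter distribution at each position.

    encoded_decoys: list of equal-length strings, one per decoy structure,
    where each character is a structure alphabet letter (assumed format).
    """
    if not encoded_decoys:
        return []
    length = len(encoded_decoys[0])
    if any(len(s) != length for s in encoded_decoys):
        raise ValueError("all encoded decoys must have the same length")

    entropies = []
    # zip(*...) walks the decoys column by column, i.e. residue by residue.
    for column in zip(*encoded_decoys):
        counts = Counter(column)
        total = len(column)
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Toy example with a 3-letter alphabet (H, E, C); hypothetical input.
decoys = ["HHEC", "HHCC", "HECC", "HCEC"]
profile = positional_entropy(decoys)
```

Positions where every decoy has the same letter get zero entropy, while positions where the letters vary get higher values, which is the sense in which entropy can stand in for flexibility.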
Year: 2016; PMID: 27660756; PMCID: PMC5021887; DOI: 10.1155/2016/4628025
Source DB: PubMed; Journal: Biomed Res Int; Impact factor: 3.411
The average Q-scores of local structure prediction for the four structure alphabets.
| | Sec3 | Sec8 | PB | DW |
|---|---|---|---|---|
| Number of letters | 3 | 8 | 16 | 28 |
| Single-layer model | 0.756 | 0.593 | 0.564 | 0.432 |
| Dual-layer model | 0.765 | 0.614 | 0.585 | 0.456 |
The single-layer model takes the position-specific scoring matrix (PSSM) as input and outputs a probability for each structure alphabet letter. The dual-layer model adds a second classifier, which takes the output of the single-layer model as input and produces the final prediction. In both models, support vector machines are used as the classifiers.
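Because the models output a probability for each letter, the entropy of the predicted class distribution can be taken directly from those probabilities. The sketch below assumes Shannon entropy and a plain list of probabilities per residue. The function names and the renormalisation step are illustrative assumptions, not the authors' implementation.

```python
import math


def prediction_entropy(probabilities):
    """Shannon entropy (bits) of one residue's predicted letter distribution."""
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    h = 0.0
    for p in probabilities:
        q = p / total  # renormalise in case the scores do not sum to 1
        if q > 0:      # zero-probability letters contribute nothing
            h -= q * math.log2(q)
    return h


def entropy_profile(per_residue_probabilities):
    """Entropy for every residue of a sequence."""
    return [prediction_entropy(p) for p in per_residue_probabilities]


# Hypothetical 3-class output for three residues.
predicted = [
    [1.0, 0.0, 0.0],        # confident prediction
    [0.5, 0.5, 0.0],        # two letters equally likely
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
]
example_profile = entropy_profile(predicted)
```

A confident prediction yields an entropy near zero, and a uniform prediction over k letters yields the maximum, log2(k) bits, so alphabets with more letters allow a wider range of entropy values.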
The correlations between the conformational entropies and the B-factors.
| ID | <3 (a) | 3-4 | 4-5 | 5-6 | >6 | Sec3 (b) | Sec8 | PB | DW |
|---|---|---|---|---|---|---|---|---|---|
| 1res | 73 | 73 | 73 | 7 | 4 | 0.1105 | 0.1505 | 0.2454 | 0.2605 |
| 1am3 | 571 | 177 | 162 | 161 | 400 | 0.1139 | 0.2993 | 0.4110 | 0.5149 |
| 1r69 | 389 | 119 | 284 | 228 | 300 | 0.2028 | 0.4040 | 0.3909 | 0.3794 |
| 1utg | 1 | 20 | 401 | 290 | 300 | 0.2003 | 0.2990 | 0.2729 | 0.1653 |
| 1a32 | 364 | 125 | 95 | 142 | 300 | 0.2819 | 0.4818 | 0.5077 | 0.4145 |
| 1mzm | 9 | 306 | 317 | 171 | 300 | 0.0118 | 0.2734 | 0.3353 | 0.3144 |
| 1hyp | 1 | 0 | 34 | 270 | 300 | 0.1491 | 0.3579 | 0.1893 | 0.2889 |
| 1cei | 1 | 0 | 4 | 64 | 300 | 0.0821 | 0.3583 | 0.4335 | 0.4932 |
| 1pgx | 219 | 342 | 182 | 391 | 300 | 0.0264 | 0.2843 | 0.3339 | 0.3674 |
| 5icb | 3 | 142 | 481 | 225 | 300 | 0.4255 | 0.5660 | 0.5433 | 0.5635 |
| Ave | 163.1 | 130.4 | 203.3 | 194.9 | 280.4 | 0.1604 | 0.3474 | 0.3663 | 0.3762 |
(a) The number of decoy structures in each class.
(b) The correlation measured with the given structure alphabet.
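The record does not say which correlation coefficient was used. As an illustration only, assuming the Pearson coefficient between the per-residue entropy values and the per-residue B-factors, the calculation could look like this. The function name `pearson` and the toy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-residue values, not taken from the table above.
entropies = [0.0, 0.4, 1.1, 1.5, 0.2]
b_factors = [12.0, 15.5, 30.2, 41.0, 14.1]
r = pearson(entropies, b_factors)
```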
Prediction performance of the protein flexibilities by different structure alphabets.
| SA (a) | No. positive (b) | No. negative (c) | Sensitivity | Specificity | Precision | ROC |
|---|---|---|---|---|---|---|
| Sec3 | 6152 | 54737 | 0.6291 | 0.4543 | 0.1109 | 0.5457 |
| Sec8 | 9468 | 51421 | 0.5887 | 0.5677 | 0.1942 | 0.5741 |
| PB | 10625 | 50264 | 0.6209 | 0.5521 | 0.2114 | 0.5901 |
| DW | 16012 | 44877 | 0.6399 | 0.5725 | 0.2586 | 0.6193 |
(a) The structure alphabet type.
(b) The number of positive samples (flexible residues).
(c) The number of negative samples (rigid residues).
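The sensitivity, specificity, and precision columns follow the standard confusion-matrix definitions, with flexible residues as the positive class. A minimal sketch is given below. The function name, the 1/0 label encoding, and the toy labels are assumptions made for illustration.

```python
def flexibility_metrics(true_labels, predicted_labels):
    """Sensitivity, specificity and precision for binary labels.

    Labels are 1 for flexible (positive) and 0 for rigid (negative).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios are reported as NaN instead of raising.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of flexible residues found
        "specificity": ratio(tn, tn + fp),  # fraction of rigid residues kept rigid
        "precision": ratio(tp, tp + fp),    # fraction of flexible calls that are right
    }


# Hypothetical labels for six residues.
truth = [1, 1, 0, 0, 0, 0]
calls = [1, 0, 1, 0, 0, 0]
metrics = flexibility_metrics(truth, calls)
```

When negatives greatly outnumber positives, as in the sample counts above, even a moderate false positive rate produces many false positives in absolute terms, which pulls precision down.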
Figure 1. The ROC curve of the proposed method using different structure alphabets on the set of 171 protein sequences.
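A ROC curve of this kind is obtained by sweeping a decision threshold over the per-residue scores (here, presumably the entropy values) and recording the false positive rate and true positive rate at each step. The sketch below assumes higher scores indicate flexibility and uses the trapezoidal rule for the area under the curve. The function names and toy data are hypothetical.

```python
def roc_curve(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for flexible, 0 for rigid; scores: higher means more flexible.
    """
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scores for six residues.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_curve(labels, scores))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation of flexible from rigid residues.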