Ravindra Kumar, Bandana Kumari, Manish Kumar.
Abstract
Heat shock proteins (HSPs) are chaperone proteins present in every domain of life. They play a crucial role in the folding and unfolding of proteins, their sorting and assembly into multi-protein complexes, and cell cycle control, and they protect the cell during stress. Because no web-based predictor is available for the simultaneous prediction and classification of HSPs, a method that can predict and classify them efficiently is needed. In this study, we developed PredHSP, a two-tier method based on coupled amino acid composition and support vector machines that identifies heat shock proteins (1st tier) and classifies them into different families (2nd tier). At the 1st tier, we achieved a maximum accuracy of 76.66% with MCC 0.43, while at the 2nd tier we achieved maximum accuracies of 96.36% with MCC 0.87 for HSP20, 91.91% with MCC 0.83 for HSP40, 95.96% with MCC 0.72 for HSP60, 91.87% with MCC 0.71 for HSP70, 98.43% with MCC 0.70 for HSP90 and 97.48% with MCC 0.71 for HSP100. We have also developed a web server and a standalone package for the scientific community, which can be accessed at http://14.139.227.92/mkumar/predhsp/index.html.
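The two feature types named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: discrete composition is the percentage of each of the 20 standard residues, and coupled composition is the percentage of each of the 400 ordered residue pairs. The function names and the handling of non-standard residues are illustrative assumptions, not taken from PredHSP itself.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def discrete_composition(seq):
    """Percentage of each standard amino acid in seq (20 features)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [100.0 * c / total for c in counts]


def coupled_composition(seq):
    """Percentage of each adjacent residue pair in seq (400 features)."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard letters
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residue pairs")
    return [100.0 * counts[p] / total for p in PAIRS]
```

Each protein then becomes a fixed-length vector (20 or 400 values) regardless of its sequence length, which is what allows it to be passed to an SVM.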
Year: 2016 PMID: 27195495 PMCID: PMC4873250 DOI: 10.1371/journal.pone.0155872
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Protein distribution in training dataset.
| HSP Family | Description | Number of proteins |
|---|---|---|
| HSP20 | sHSP/Small HSP | 357 |
| HSP40 | DnaJ-class proteins | 1279 |
| HSP60 | GroEL/ES or chaperonins | 163 |
| HSP70 | DnaK/chaperones | 283 |
| HSP90 | Chaperonines | 58 |
| HSP100 | High Molecular Weight HSP | 85 |
Distribution of HSPs across different families in independent datasets.
The HGNC dataset contains human HSPs obtained from HGNC [14], and the mixed dataset contains rice HSPs obtained from Wang et al. [15] and Sarkar et al. [16].
| HSP family | HGNC dataset (number of proteins) | Mixed dataset, Wang et al. (number of proteins) | Mixed dataset, Sarkar et al. (number of proteins) |
|---|---|---|---|
| HSP20 | 11 | 14 | — |
| HSP40 | 49 | — | — |
| HSP60 | 14 | 4 | — |
| HSP70 | 17 | 7 | 24 |
| HSP90 | 4 | 3 | — |
| HSP100 | — | 3 | — |
Fig 1. Flow chart showing the prediction schema for HSPs and their families.
Fig 2. Schematic illustration of the categorization of predictions into different categories.
Fig 3. Relative enrichment and depletion of amino acids in HSPs and their families with reference to non-HSPs and other HSP families, respectively.
(3a) HSPs vs. non-HSPs; (3b) HSP20 vs. remaining HSP families; (3c) HSP40 vs. remaining HSP families; (3d) HSP60 vs. remaining HSP families; (3e) HSP70 vs. remaining HSP families; (3f) HSP90 vs. remaining HSP families; (3g) HSP100 vs. remaining HSP families.
Performance of discrete amino acid and coupled amino acid composition based SVM models during five-fold cross-validation (FFCV) at 1st tier.
| Composition | Sens | Spec | Accu | MCC | AUC | Para |
|---|---|---|---|---|---|---|
| Discrete Amino Acid Composition | 66.69 | 74.39 | 72.98 | 0.34 | 0.77 | -z c -j 5 -t 2 -g 0.01 |
| Coupled Amino Acid Composition | 74.45 | 77.17 | 76.66 | 0.43 | 0.84 | -z c -j 7 -t 1 -d 2 |
Sens, Spec, Accu, MCC, AUC and Para represent sensitivity, specificity, accuracy, Matthews correlation coefficient, area under the ROC curve and the SVM_light learning parameters at which the performance was achieved, respectively. All values except MCC and AUC are expressed as percentages.
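For reference, the threshold-dependent measures in the table follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study. It assumes both classes are present, so the sensitivity and specificity denominators are non-zero.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Sens, Spec, Accu (percent) and MCC from confusion-matrix counts.

    tp/fn: positives predicted correctly/incorrectly
    tn/fp: negatives predicted correctly/incorrectly
    """
    sens = 100.0 * tp / (tp + fn)
    spec = 100.0 * tn / (tn + fp)
    accu = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sens": sens, "Spec": spec, "Accu": accu, "MCC": mcc}


# Hypothetical counts for illustration only.
example = binary_metrics(tp=8, fn=2, tn=9, fp=1)
```

Because MCC uses all four counts, it stays informative on imbalanced data, where accuracy alone can look high even if the minority class is predicted poorly.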
Performance of discrete amino acid and coupled amino acid composition based SVM models during leave-one-out cross-validation (LOOCV) at 2nd tier.
| HSP Family | Composition | Sens | Spec | Accu | MCC | AUC | Para |
|---|---|---|---|---|---|---|---|
| HSP20 | Discrete | 84.87 | 86.24 | 86.02 | 0.60 | 0.96 | -z c -j 7 -t 2 -g 0.005 |
| HSP20 | Coupled | 92.16 | 97.16 | 96.36 | 0.87 | 1.00 | -z c -j 4 -t 2 -g 0.005 |
| HSP40 | Discrete | 86.55 | 84.88 | 85.84 | 0.71 | 0.94 | -z c -j 1 -t 1 -d 4 |
| HSP40 | Coupled | 96.09 | 86.26 | 91.91 | 0.83 | 0.99 | -z c -j 1 -t 2 -g 0.0005 |
| HSP60 | Discrete | 84.05 | 85.65 | 85.53 | 0.46 | 0.95 | -z c -j 9 -t 1 -d 5 |
| HSP60 | Coupled | 79.75 | 97.24 | 95.96 | 0.72 | 1.00 | -z c -j 10 -t 1 -d 3 |
| HSP70 | Discrete | 84.81 | 83.73 | 83.87 | 0.53 | 0.92 | -z c -j 5 -t 1 -d 5 |
| HSP70 | Coupled | 91.17 | 91.97 | 91.87 | 0.71 | 1.00 | -z c -j 6 -t 1 -d 2 |
| HSP90 | Discrete | 82.76 | 83.25 | 83.24 | 0.27 | 0.92 | -z c -j 22 -t 2 -g 0.0005 |
| HSP90 | Coupled | 72.41 | 99.12 | 98.43 | 0.70 | 1.00 | -z c -j 20 -t 2 -g 0.0005 |
| HSP100 | Discrete | 88.24 | 89.02 | 88.99 | 0.43 | 0.97 | -z c -j 37 -t 1 -d 5 |
| HSP100 | Coupled | 82.35 | 98.08 | 97.48 | 0.71 | 1.00 | -z c -j 19 -t 2 -g 0.0005 |
Sens, Spec, Accu, MCC, AUC and Para stand for sensitivity, specificity, accuracy, Matthews correlation coefficient, area under the ROC curve and SVM_light learning parameters, respectively. All values except MCC and AUC are expressed as percentages.
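The Para strings are SVM_light command-line options: `-z c` selects classification, `-j` is the cost factor by which errors on positive examples outweigh errors on negative examples, `-t` is the kernel type (1 polynomial, 2 radial basis function), `-d` is the polynomial degree and `-g` is the RBF gamma. As a reading aid, the sketch below decodes a well-formed option string such as `-z c -j 7 -t 1 -d 2` into a labelled dictionary. The helper name and the output keys are hypothetical; the defaults mirror SVM_light's documented ones.

```python
KERNELS = {0: "linear", 1: "polynomial", 2: "rbf", 3: "sigmoid"}
MODES = {"c": "classification", "r": "regression", "p": "ranking"}


def decode_svmlight_params(para):
    """Decode a string of SVM_light flag/value pairs into a dict."""
    tokens = para.split()
    if len(tokens) % 2:
        raise ValueError(f"expected flag/value pairs, got: {para!r}")
    opts = dict(zip(tokens[0::2], tokens[1::2]))
    decoded = {
        "mode": MODES[opts.get("-z", "c")],
        "cost_factor": float(opts.get("-j", 1.0)),
        "kernel": KERNELS[int(opts.get("-t", 0))],
    }
    if "-d" in opts:  # only meaningful for the polynomial kernel
        decoded["degree"] = int(opts["-d"])
    if "-g" in opts:  # only meaningful for the RBF kernel
        decoded["gamma"] = float(opts["-g"])
    return decoded
```

Read this way, the large `-j` values reported for HSP90 and HSP100 reflect how few positive examples those families contribute to training.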
Fig 4. ROC curves of SVM models based on amino acid and coupled amino acid composition for prediction of (4a) HSPs and (4b) different families of HSPs. The solid line represents the discrete amino acid composition (AA) based SVM model, while the broken line represents the coupled amino acid composition (CAA) based SVM model.
Comparison of performance of PredHSP with iHSP-PseRAAAC at 2nd tier.
| HSP Family | Sensitivity (iHSP-PseRAAAC/PredHSP) | Specificity (iHSP-PseRAAAC/PredHSP) | MCC (iHSP-PseRAAAC/PredHSP) |
|---|---|---|---|
| HSP20 | 87.68/92.16 | 96.36/97.16 | 0.82/0.87 |
| HSP40 | 95.31/96.09 | 84.87/86.26 | 0.99/0.83 |
| HSP60 | 66.87/79.75 | 98.93/97.24 | 0.69/0.72 |
| HSP70 | 79.15/91.17 | 86.54/91.97 | 0.54/0.71 |
| HSP90 | 51.72/72.41 | 99.89/99.12 | 0.30/0.70 |
| HSP100 | 69.41/82.35 | 99.84/98.08 | 0.83/0.71 |
Performance of PredHSP on human HSPs obtained from HGNC [14] and rice HSPs obtained from Wang et al. [15] and Sarkar et al. [16].
TP represents true prediction and FP represents false prediction.
| HSP class | Human, HGNC database: Total | Human, HGNC database: TP | Human, HGNC database: FP | Rice, Wang et al.: Total | Rice, Wang et al.: TP | Rice, Wang et al.: FP | Rice, Sarkar et al.: Total | Rice, Sarkar et al.: TP | Rice, Sarkar et al.: FP |
|---|---|---|---|---|---|---|---|---|---|
| HSP20 | 11 | 8 | 3 (2 non-HSP, 1 HSP40) | 14 | 12 | 2 (non-HSP) | — | — | — |
| HSP40 | 49 | 45 | 4 (non-HSP) | — | — | — | — | — | — |
| HSP60 | 14 | 9 | 5 (4 non-HSP, 1 HSP70) | 4 | 4 | 0 | — | — | — |
| HSP70 | 17 | 17 | 0 | 7 | 7 | 0 | 24 | 23 | 1 (HSP20) |
| HSP90 | 4 | 4 | 0 | 3 | 3 | 0 | — | — | — |
| HSP100 | — | — | — | 3 | 3 | 0 | — | — | — |
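The counts above can be turned into per-source percentages of correctly classified proteins. The sketch below copies the Total and TP columns from the table and treats TP/Total as the fraction of proteins assigned to the right family; the variable and function names are illustrative.

```python
# (total, correctly predicted) per family, copied from the Total and TP columns.
INDEPENDENT_RESULTS = {
    "HGNC (human)": {
        "HSP20": (11, 8), "HSP40": (49, 45), "HSP60": (14, 9),
        "HSP70": (17, 17), "HSP90": (4, 4),
    },
    "Wang et al. (rice)": {
        "HSP20": (14, 12), "HSP60": (4, 4), "HSP70": (7, 7),
        "HSP90": (3, 3), "HSP100": (3, 3),
    },
    "Sarkar et al. (rice)": {"HSP70": (24, 23)},
}


def percent_correct(rows):
    """Overall percentage of correctly predicted proteins in one dataset."""
    total = sum(t for t, _ in rows.values())
    correct = sum(c for _, c in rows.values())
    return 100.0 * correct / total


for source, rows in INDEPENDENT_RESULTS.items():
    for family, (total, correct) in rows.items():
        print(f"{source:22s} {family:7s} {correct}/{total} = {100.0 * correct / total:.1f}%")
    print(f"{source:22s} overall {percent_correct(rows):.1f}%")
```

Pooling within a dataset weights each family by its size, so the HGNC figure is dominated by HSP40, which contributes 49 of the 95 human proteins.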
Genome wide annotation of heat shock proteins in different organisms.
| Organism | Total number of HSPs | HSP20 | HSP40 | HSP60 | HSP70 | HSP90 | HSP100 |
|---|---|---|---|---|---|---|---|
| | 43 | 8 | 9 | 5 | 13 | 1 | 7 |
| | 51 | 8 | 22 | 3 | 15 | 2 | 1 |
| | 123 | 15 | 42 | 5 | 42 | 2 | 17 |
| | 145 | 12 | 82 | 9 | 30 | 6 | 6 |
| | 814 | 137 | 406 | 70 | 149 | 19 | 33 |
| | 2192 | 324 | 1212 | 158 | 403 | 11 | 84 |
| | 556 | 94 | 252 | 54 | 125 | 13 | 18 |
| | 331 | 62 | 172 | 24 | 61 | 4 | 8 |
| | 979 | 225 | 539 | 57 | 113 | 16 | 29 |