Ravindra Kumar, Bandana Kumari, Abhishikha Srivastava, Manish Kumar.
Abstract
Nuclear receptor proteins (NRPs) are transcription factors that regulate many vital cellular processes in animal cells. NRPs form a super-family of phylogenetically related proteins and are divided into sub-families on the basis of ligand characteristics and function. In the post-genomic era, when new proteins are being added to databases in a high-throughput mode, it becomes imperative to identify new NRPs using information from the amino acid sequence alone. In this study we report an SVM-based two-level prediction system, NRfamPred, that uses the dipeptide composition of proteins as input. At the 1st level, NRfamPred predicts whether the query protein is an NRP or a non-NRP; if the query protein is predicted to be an NRP, prediction moves to the 2nd level, which predicts its sub-family. Using leave-one-out cross-validation, we achieved an overall accuracy of 97.88% at the 1st level and 98.11% at the 2nd level with dipeptide composition. Benchmarking on independent datasets showed that NRfamPred had accuracy comparable to that of existing methods developed on the same dataset. Our method predicted 76 NRPs in the human proteome, of which 14 are novel. NRfamPred also predicted the sub-families of these 14 NRPs.
Year: 2014 PMID: 25351274 PMCID: PMC5381360 DOI: 10.1038/srep06810
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
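The abstract names dipeptide composition as the SVM input without giving its formula here. A minimal sketch of the usual definition follows: the percentage of each of the 400 ordered amino acid pairs among all adjacent pairs in the sequence. The names `AMINO_ACIDS`, `DIPEPTIDES` and `dipeptide_composition` are illustrative, and skipping pairs that contain non-standard residues is an assumption, not something the source states.

```python
from itertools import product

# The 20 standard amino acids; 20 x 20 = 400 ordered dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list with the percentage of each dipeptide.

    Assumption: pairs containing a non-standard residue (e.g. 'X') are
    skipped and do not count towards the total.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]


# Toy sequence (not a real protein): adjacent pairs are AC, CA, AC.
vector = dipeptide_composition("ACAC")
print(len(vector))
print(round(vector[DIPEPTIDES.index("AC")], 2))
```

Every protein maps to a vector of the same length regardless of sequence length, which is what lets a fixed-input classifier such as an SVM consume it.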
Performance of amino acid and dipeptide composition based SVM models during LOOCV at the 1st level. All values except MCC and AUC are percentages. Sens, Spec, Acc, MCC and AUC stand for sensitivity, specificity, accuracy, Matthews correlation coefficient and area under the ROC curve, respectively.
| Composition | Sens | Spec | Acc | MCC | AUC |
|---|---|---|---|---|---|
| Amino acid | 96.86 | 92.20 | 93.32 | 0.84 | 0.98 |
| Dipeptide | 96.23 | 98.40 | 97.88 | 0.94 | 1.00 |
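Sensitivity, specificity, accuracy and MCC all follow from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study's dataset. AUC is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Sens, Spec and Acc in percent; MCC in the range [-1, 1]."""
    sens = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    spec = 100.0 * tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + tn + fp + fn
    acc = 100.0 * (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sens": sens, "Spec": spec, "Acc": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=90, tn=180, fp=20, fn=10))
```

Unlike accuracy, MCC stays informative when the positive and negative classes differ in size, which is one common reason for reporting both.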
Performance of amino acid and dipeptide composition based SVM models during LOOCV at the 2nd level. All values except MCC and AUC are percentages. Sens, Spec, Acc, MCC and AUC stand for sensitivity, specificity, accuracy, Matthews correlation coefficient and area under the ROC curve, respectively.
| Sub-family | Amino acid Sens | Amino acid Spec | Amino acid Acc | Amino acid MCC | Amino acid AUC | Dipeptide Sens | Dipeptide Spec | Dipeptide Acc | Dipeptide MCC | Dipeptide AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| NRP0 | 50.00 | 97.96 | 94.34 | 0.55 | 0.79 | 83.33 | 95.92 | 94.97 | 0.70 | 0.95 |
| NRP1 | 80.00 | 82.57 | 81.76 | 0.60 | 0.86 | 98.00 | 99.08 | 98.74 | 0.97 | 0.99 |
| NRP2 | 83.33 | 78.86 | 79.87 | 0.54 | 0.84 | 91.67 | 96.75 | 95.60 | 0.88 | 0.97 |
| NRP3 | 78.38 | 98.36 | 93.71 | 0.82 | 0.98 | 100.00 | 99.18 | 99.37 | 0.98 | 1.00 |
| NRP4 | 85.71 | 90.79 | 90.57 | 0.47 | 0.92 | 100.00 | 99.34 | 99.37 | 0.93 | 1.00 |
| NRP5 | 58.33 | 97.96 | 94.97 | 0.61 | 0.84 | 83.33 | 100.00 | 98.74 | 0.91 | 0.98 |
| NRP6 | 80.00 | 100.00 | 99.37 | 0.89 | 1.00 | 100.00 | 100.00 | 100.00 | 1.00 | 1.00 |
| Overall | 76.73 | 92.98 | 90.66 | 0.63 | -- | 94.97 | 98.63 | 98.11 | 0.92 | -- |
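The LOOCV figures above come from a protocol in which each protein is held out once, a model is trained on all the others, and the held-out protein is then predicted. The sketch below shows that loop only. The nearest-centroid classifier is a deliberately simple stand-in for the SVM used in the study, and all names and the toy feature vectors are hypothetical.

```python
def leave_one_out(samples, labels, train, predict):
    """Predict every sample with a model trained on all other samples."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


def train_centroids(xs, ys):
    """Stand-in learner (not an SVM): the mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest in squared distance."""
    return min(model, key=lambda y: sum((a - b) ** 2
                                        for a, b in zip(model[y], x)))


# Toy 2-D feature vectors; real inputs would be composition vectors.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
labels = ["non-NRP"] * 3 + ["NRP"] * 3
print(leave_one_out(samples, labels, train_centroids, predict_centroid))
```

Passing `train` and `predict` as arguments keeps the loop independent of the learner, so an SVM implementation could be substituted without changing `leave_one_out`.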
Comparative performance of NRfamPred vis-à-vis iNR-PhysChem and NR-2L at the 1st level. For LOOCV, the comparison is made at the point where the sensitivities of NR-2L and iNR-PhysChem were equal to that of NRfamPred. iNR-PhysChem was not evaluated on an independent dataset in Ref. no. [16]; hence, the corresponding values for iNR-PhysChem are not shown. All values except MCC are percentages. (#Ref. no. [16], ¶Ref. no. [15])
| Predictor (LOOCV) | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|
| NRfamPred/iNR-PhysChem# | 96.23/96.23 | 98.40/98.80 | 97.88/98.18 | 0.94/0.96 |
| NRfamPred/NR-2L¶ | 98.11/98.11 | 96.40/90.80 | 96.81/92.56 | 0.92/0.83 |
Comparison of LOOCV performance of NRfamPred, iNR-PhysChem and NR-2L at the 2nd level of prediction. All values except MCC are percentages. The sensitivities of iNR-PhysChem and NR-2L were reported as accuracy in #Ref. no. [16] and ¶Ref. no. [15], respectively.
| Sub-family | NRfamPred Sensitivity | NRfamPred MCC | iNR-PhysChem Sensitivity# | iNR-PhysChem MCC# | NR-2L Sensitivity¶ | NR-2L MCC¶ |
|---|---|---|---|---|---|---|
| NRP0 | 83.33 | 0.70 | 66.67 | 0.81 | 75.00 | 0.86 |
| NRP1 | 98.00 | 0.97 | 94.00 | 0.87 | 86.00 | 0.88 |
| NRP2 | 91.67 | 0.88 | 97.22 | 0.93 | 86.11 | 0.85 |
| NRP3 | 100.00 | 0.98 | 100.00 | 0.95 | 100.00 | 0.86 |
| NRP4 | 100.00 | 0.93 | 71.43 | 0.84 | 85.71 | 0.70 |
| NRP5 | 83.33 | 0.91 | 83.33 | 0.91 | 83.33 | 0.86 |
| NRP6 | 100.00 | 1.00 | 100.00 | 1.00 | 100.00 | 1.00 |
| Overall | 94.97 | 0.92 | 92.45 | 0.91 | 88.68 | 0.87 |
Comparison of the performance of NRfamPred and NR-2L on PIND at the 2nd level of prediction. iNR-PhysChem was not evaluated on PIND in Ref. no. [16]; hence, the corresponding values for iNR-PhysChem are not shown. All values except MCC are percentages. ¶The sensitivity of NR-2L was reported as accuracy in Ref. no. [15].
| Sub-family | NRfamPred Sensitivity | NRfamPred MCC | NR-2L Sensitivity¶ | NR-2L MCC |
|---|---|---|---|---|
| NRP0 | 100.00 | 0.77 | 100.00 | 1.00 |
| NRP1 | 100.00 | 1.00 | 99.13 | 0.99 |
| NRP2 | 99.21 | 0.99 | 100.00 | 1.00 |
| NRP3 | 100.00 | 0.97 | 100.00 | 1.00 |
| NRP4 | 100.00 | 1.00 | 100.00 | 0.98 |
| NRP5 | 100.00 | 1.00 | 100.00 | 0.98 |
| NRP6 | -- | -- | -- | -- |
| Overall | 99.82 | 0.98 | 99.65 | -- |
Comparative performance of the NRfamPred, iNR-PhysChem and NR-2L web-servers on PIND. NRP6 was not evaluated because PIND contains no NRP6 sub-family data.
| Sub-family | Number of proteins in PIND | NRfamPred | iNR-PhysChem | NR-2L |
|---|---|---|---|---|
| NRP0 | 6 | 6 | 5 | 6 |
| NRP1 | 231 | 231 | 229 | 228 |
| NRP2 | 127 | 127 | 126 | 127 |
| NRP3 | 148 | 148 | 147 | 148 |
| NRP4 | 23 | 23 | 22 | 23 |
| NRP5 | 33 | 33 | 33 | 33 |
| NRP6 | NA | NA | NA | NA |
| Total | 568 | 568 | 562 | 565 |
| Non-NRP | 500 | 488 | 489 | 481 |
Figure 1. Classification schema of prediction on the basis of actual and predicted state.
At level 1, the decision is made on the basis of whether the query protein is predicted as an NRP or a non-NRP. At level 2, the predicted NRP is categorized into the same or a different sub-family. The level-2 schema is described for a hypothetical sub-family ‘X’.
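The two-level flow in the caption can be sketched as a small cascade: level 2 runs only when level 1 predicts an NRP. The helper `level2_outcome` shows one plausible one-vs-rest reading of the schema for a sub-family ‘X’. The figure itself is not reproduced here, so that reading is an assumption. The two toy classifiers are hypothetical placeholders, not the NRfamPred models.

```python
def predict_two_level(sequence, is_nrp, predict_subfamily):
    """Run level 1; only for predicted NRPs, run level 2."""
    if not is_nrp(sequence):
        return ("non-NRP", None)
    return ("NRP", predict_subfamily(sequence))


def level2_outcome(actual, predicted, subfamily):
    """One-vs-rest outcome of a level-2 prediction for sub-family 'X'."""
    if actual == subfamily:
        return "TP" if predicted == subfamily else "FN"
    return "FP" if predicted == subfamily else "TN"


# Hypothetical placeholders standing in for trained models.
def toy_level1(sequence):
    return len(sequence) > 5


def toy_level2(sequence):
    return "NRP1" if sequence.startswith("M") else "NRP2"


print(predict_two_level("MKT", toy_level1, toy_level2))
print(predict_two_level("MKTAYIA", toy_level1, toy_level2))
print(level2_outcome(actual="NRP1", predicted="NRP2", subfamily="NRP1"))
```

Because level 2 is never invoked for a predicted non-NRP, an NRP missed at level 1 cannot be recovered later in the cascade.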