Abstract
BACKGROUND: Nuclear receptors (NRs) form a large family of ligand-inducible transcription factors that regulate the expression of genes involved in numerous physiological phenomena, such as embryogenesis, homeostasis, cell growth and death. NR-related pathways are important targets of marketed drugs. Therefore, designing a reliable computational model for predicting NRs from amino acid sequence has become a significant biomedical problem.
Year: 2015 PMID: 26630876 PMCID: PMC4668603 DOI: 10.1186/s12859-015-0828-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The detailed NR subfamilies of the dataset
| NRs family | Subset | Number of proteins from NucleaRDB | Number of proteins after CD-HIT (cut off threshold 0.6) | D159 (cut off threshold 0.6) | D282 (cut off threshold 0.9) |
|---|---|---|---|---|---|
| Thyroid hormone like | NR1 | 1172 | 162 | 50 | 114 |
| HNF4-like | NR2 | 736 | 140 | 36 | 72 |
| Estrogen like | NR3 | 704 | 82 | 37 | 75 |
| Nerve Growth factor IB-like | NR4 | 119 | 23 | 7 | - |
| Fushi tarazu-F1 like | NR5 | 151 | 29 | 12 | 21 |
| Germ cell nuclear factor like | NR6 | 41 | 7 | 5 | - |
| Knirps like | NR7 | 47 | 21 | 12 | - |
| DAX like | NR8 | 46 | 10 | - | - |
| Overall | | 3016 | 474 | 159 | 282 |
Dataset
| Dataset | Numbers of NRs | Numbers of Non-NRs |
|---|---|---|
| Training Dataset | 474 | 500 |
Fig. 1 CGR picture. The segments are labeled serially with numbers 1–24 (also found in reference [30])
Fig. 2 Flowchart describing the operation process
Results of distinguishing NR proteins from non-NR proteins
| Feature set | Dimension | Sens | Spec | Acc | MCC | AUC |
|---|---|---|---|---|---|---|
| AAC | 20 | 0.9388 | 0.9320 | 0.9353 | 0.8706 | 0.9923 |
| CGR | 24 | 0.8567 | 0.8460 | 0.8511 | 0.7022 | 0.9290 |
| CTF | 343 | 0.9346 | 0.9880 | 0.9620 | 0.9240 | 0.9920 |
| AAC + CGR | 44 | 0.9388 | 0.9240 | 0.9312 | 0.8624 | 0.9923 |
| AAC + CTF | 363 | 0.9451 | 0.9800 | 0.9630 | 0.9261 | 0.9923 |
| CTF + CGR | 367 | 0.9409 | 0.9820 | 0.9620 | 0.9246 | 0.9915 |
| CTF + CGR + AAC | 387 | 0.9409 | 0.9800 | 0.9610 | 0.9220 | 0.9914 |
(10-fold cross-validation test)
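The Sens, Spec, Acc and MCC columns above can all be derived from one binary confusion matrix. The following is a minimal sketch, assuming the standard definitions of these measures. The counts it uses (445 of 474 NRs and 466 of 500 non-NRs predicted correctly) are not reported in the source; they were back-calculated here from the rounded AAC row and should be read as illustrative only.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (Sens, Spec, Acc, MCC) from confusion-matrix counts."""
    sens = tp / (tp + fn)                      # true positive rate
    spec = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)      # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, acc, mcc

# Assumed counts (inferred, not reported): 474 NRs and 500 non-NRs in total.
sens, spec, acc, mcc = binary_metrics(tp=445, fn=29, tn=466, fp=34)
print(f"Sens={sens:.4f} Spec={spec:.4f} Acc={acc:.4f} MCC={mcc:.4f}")
```

With these assumed counts the four values round to the figures listed in the AAC row.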
Fig. 3 Receiver operating characteristic (ROC) curves for NR predictions. The ROC curves illustrate the trade-off between true positive rate and false positive rate for SVM classifiers using seven different feature combinations on the new dataset D474
Success rates in identifying eight main NR families
| Feature set | Dimension | Overall Sens | Overall MCC | Gamma | C |
|---|---|---|---|---|---|
| AAC | 20 | 0.7173 | 0.6769 | 2.0231 | 71.8882 |
| CGR | 24 | 0.6772 | 0.6311 | 1.0098 | 77.9671 |
| CTF | 343 | 0.9430 | 0.9349 | 0.0192 | 11.0849 |
| AAC + CGR | 44 | 0.7806 | 0.7492 | 2.5595 | 13.0576 |
| AAC + CTF | 363 | 0.9473 | 0.9397 | 0.0159 | 10.3440 |
| CTF + CGR | 367 | 0.9409 | 0.9325 | 0.0015 | 104.92 |
| CTF + CGR + AAC | 387 | 0.9409 | 0.9325 | 0.0138 | 11.6455 |
(10-fold cross-validation test)
Predicting performance in identifying eight main NR families based on Feature set 3 and Feature set 5
| NR Subfamily | Feature set 3 Sens(i) | Feature set 3 Spec(i) | Feature set 3 Acc(i) | Feature set 3 MCC(i) | Feature set 5 Sens(i) | Feature set 5 Spec(i) | Feature set 5 Acc(i) | Feature set 5 MCC(i) |
|---|---|---|---|---|---|---|---|---|
| NR1 | | | | | | | 0.9620 | 0.8966 |
| NR2 | | | | | | | 0.9663 | 0.9189 |
| NR3 | | | 0.9852 | | | | 0.9873 | 0.955 |
| NR4 | | 1 | 0.9937 | | | 1 | 0.9937 | 0.9294 |
| NR5 | | 1 | 0.9958 | | | 1 | 0.9958 | 0.9627 |
| NR6 | | 1 | 0.9958 | | | 1 | 0.9958 | 0.8433 |
| NR7 | | 1 | 0.9979 | | | 1 | 0.9979 | 0.9748 |
| NR8 | | 1 | 0.9958 | | | 1 | 0.9958 | 0.8340 |
| Overall | 447/474 = 0.9430 | 0.9919 | 0.9858 | 0.9349 | 449/474 = 0.9473 | 0.9925 | 0.9868 | 0.9397 |
(10-fold cross-validation test)
Comparisons with NR-2L and iNR-PhysChem at a single level (jackknife test)
| Feature | Dimension | Acc | MCC | Independent test dataset |
|---|---|---|---|---|
| AAC | 20 | 0.9348 | 0.8288 | 0.9504 |
| CGR | 24 | 0.8847 | 0.7693 | 0.8268 |
| CTF | 343 | 0.9863 | 0.9625 | 0.9831 |
| AAC + CGR | 44 | 0.9439 | 0.8457 | 0.9410 |
| AAC + CTF | 363 | 0.9879 | 0.9667 | 0.9878 |
| CTF + CGR | 367 | 0.9863 | 0.9727 | 0.9850 |
| CTF + CGR + AAC | 387 | 0.9848 | 0.9583 | 0.9878 |
| NR-2L | 881 | 0.9256 | 0.8500 | 0.9803 |
| iNR-PhysChem | 1000 | 0.9818 | 0.9600 | - |
Comparisons with NR-2L and iNR-PhysChem at the second level (jackknife test)
| NR Subfamily | CTF Sens(i) | CTF MCC(i) | NR-2L Sens(i) | NR-2L MCC(i) | iNR-PhysChem Sens(i) | iNR-PhysChem MCC(i) |
|---|---|---|---|---|---|---|
| NR1 | 49/50 = 0.9800 | 0.9029 | 43/50 = 0.8600 | 0.88 | 47/50 = 0.9400 | 0.87 |
| NR2 | 32/36 = 0.8889 | 0.8907 | 31/36 = 0.8611 | 0.85 | 35/36 = 0.9722 | 0.93 |
| NR3 | 37/37 = 1 | 0.9660 | 37/37 = 1.00 | 0.86 | 37/37 = 1.00 | 0.95 |
| NR4 | 6/7 = 0.8571 | 0.9228 | 6/7 = 0.8571 | 0.70 | 5/7 = 0.7143 | 0.84 |
| NR5 | 10/12 = 0.8333 | 0.9067 | 10/12 = 0.8333 | 0.86 | 10/12 = 0.8333 | 0.91 |
| NR6 | 5/5 = 1 | 1 | 5/5 = 1.00 | 1.00 | 5/5 = 1.00 | 1.00 |
| NR0 | 10/12 = 0.8333 | 0.9067 | 9/12 = 0.7500 | 0.86 | 8/12 = 0.6667 | 0.81 |
| Overall | 149/159 = 0.9371 | 0.9266 | 141/159 = 0.8868 | 0.87 | 147/159 = 0.9245 | 0.91 |
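The Overall row in this table pools the per-subfamily counts rather than averaging the per-subfamily rates. A short sketch of that arithmetic for the CTF column, using only the counts printed in the table; the function name `overall_sensitivity` is a hypothetical helper introduced for illustration.

```python
# (correctly predicted, total) per subfamily, copied from the CTF column above
ctf_counts = {
    "NR1": (49, 50),
    "NR2": (32, 36),
    "NR3": (37, 37),
    "NR4": (6, 7),
    "NR5": (10, 12),
    "NR6": (5, 5),
    "NR0": (10, 12),
}

def overall_sensitivity(counts):
    """Pool correct and total counts over all subfamilies."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct, total, correct / total

correct, total, rate = overall_sensitivity(ctf_counts)
print(f"{correct}/{total} = {rate:.4f}")  # 149/159 = 0.9371
```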
The top-50 significant features in CTF and their p-values
| ID | Feature | p-value | ID | Feature | p-value |
|---|---|---|---|---|---|
| 1 | {C}-{RK}-{AGV} | 2.83E-113 | 26 | {HNQW}-{YMIS}-{C} | 1.83E-30 |
| 2 | {AGV}-{C}-{RK} | 1.28E-109 | 27 | {ILFP}-{ILFP}-{YMIS} | 4.15E-30 |
| 3 | {C}-{AGV}-{DE} | 1.10E-89 | 28 | {AGV}-{ILFP}-{ILFP} | 8.51E-30 |
| 4 | {DE}-{AGV}-{C} | 2.48E-89 | 29 | {YMIS}-{ILFP}-{ILFP} | 1.25E-29 |
| 5 | {C}-{RK}-{ILFP} | 5.91E-89 | 30 | {YMIS}-{YMIS}-{C} | 1.34E-29 |
| 6 | {C}-{DE}-{AGV} | 1.08E-85 | 31 | {YMIS}-{AGV}-{YMIS} | 1.70E-29 |
| 7 | {YMIS}-{C}-{DE} | 1.33E-72 | 32 | {RK}-{AGV}-{C} | 9.04E-29 |
| 8 | {RK}-{C}-{ILFP} | 3.69E-72 | 33 | {YMIS}-{AGV}-{AGV} | 1.34E-28 |
| 9 | {RK}-{RK}-{C} | 2.15E-54 | 34 | {YMIS}-{AGV}-{C} | 3.07E-28 |
| 10 | {AGV}-{C}-{AGV} | 1.37E-48 | 35 | {ILFP}-{HNQW}-{DE} | 1.03E-27 |
| 11 | {YMIS}-{C}-{RK} | 1.93E-46 | 36 | {HNQW}-{YMIS}-{AGV} | 1.21E-27 |
| 12 | {RK}-{C}-{HNQW} | 1.15E-42 | 37 | {RK}-{RK}-{YMIS} | 1.93E-27 |
| 13 | {ILFP}-{RK}-{RK} | 3.18E-41 | 38 | {RK}-{YMIS}-{ILFP} | 1.06E-26 |
| 14 | {HNQW}-{RK}-{C} | 2.37E-40 | 39 | {RK}-{ILFP}-{ILFP} | 2.32E-26 |
| 15 | {YMIS}-{YMIS}-{YMIS} | 7.58E-40 | 40 | {ILFP}-{YMIS}-{YMIS} | 3.03E-26 |
| 16 | {RK}-{AGV}-{ILFP} | 5.18E-37 | 41 | {ILFP}-{AGV}-{C} | 5.91E-26 |
| 17 | {HNQW}-{HNQW}-{C} | 1.32E-36 | 42 | {RK}-{HNQW}-{C} | 1.57E-25 |
| 18 | {AGV}-{DE}-{RK} | 4.68E-36 | 43 | {ILFP}-{YMIS}-{ILFP} | 2.26E-25 |
| 19 | {ILFP}-{ILFP}-{RK} | 4.00E-34 | 44 | {DE}-{RK}-{AGV} | 2.62E-25 |
| 20 | {AGV}-{AGV}-{YMIS} | 1.25E-33 | 45 | {ILFP}-{YMIS}-{DE} | 5.45E-25 |
| 21 | {C}-{ILFP}-{AGV} | 2.43E-33 | 46 | {C}-{AGV}-{AGV} | 7.34E-25 |
| 22 | {YMIS}-{YMIS}-{ILFP} | 2.21E-31 | 47 | {YMIS}-{ILFP}-{HNQW} | 9.15E-25 |
| 23 | {ILFP}-{ILFP}-{ILFP} | 2.27E-31 | 48 | {C}-{HNQW}-{AGV} | 1.05E-24 |
| 24 | {AGV}-{YMIS}-{YMIS} | 4.38E-31 | 49 | {YMIS}-{ILFP}-{YMIS} | 3.83E-24 |
| 25 | {AGV}-{YMIS}-{AGV} | 8.37E-31 | 50 | {C}-{HNQW}-{YMIS} | 4.09E-24 |
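Each CTF feature in the table above is a triad of amino-acid classes, and seven classes give 7 × 7 × 7 = 343 dimensions. The sketch below shows one way such a vector could be computed from a sequence. It rests on several assumptions that are not confirmed by the source: the fourth class is taken as Y, M, T, S, as in the usual conjoint-triad grouping (the table prints it as {YMIS}); the index layout `a*49 + b*7 + c` is arbitrary; and counts are normalized to frequencies, which may differ from the normalization the authors used.

```python
# Seven amino-acid classes (assumed grouping, see the note above).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

def ctf_vector(sequence):
    """Return a 343-dimensional triad-frequency vector for a protein sequence."""
    counts = [0] * (7 ** 3)
    total = 0
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three consecutive residues along the sequence.
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a is None or b is None or c is None:
            continue  # skip windows containing non-standard residues
        counts[a * 49 + b * 7 + c] += 1
        total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [n / total for n in counts]

# Hypothetical toy sequence: C-K-A is a {C}-{RK}-{AGV} triad.
vec = ctf_vector("CKA")
print(len(vec), vec[6 * 49 + 4 * 7 + 0])
```

In the toy call, the single window maps to the {C}-{RK}-{AGV} slot, the top-ranked feature in the table, so that entry carries all of the frequency mass.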
Relative importance of the top-50 significant features
| Feature | Dimension | D159 Acc | D159 MCC | D474 Acc | D474 MCC |
|---|---|---|---|---|---|
| CTF | 343 | 0.9863 | 0.9625 | 0.9620 | 0.9240 |
| Top-10 | 10 | 0.9621 | 0.9241 | 0.9312 | 0.8624 |
| Top-50 | 50 | 0.9772 | 0.9545 | 0.9538 | 0.9076 |
| CTF-10 | 333 | 0.9681 | 0.9363 | 0.9384 | 0.8768 |
| CTF-50 | 293 | 0.9408 | 0.8816 | 0.9035 | 0.8070 |
| NR-2L | 881 | 0.9256 | 0.8500 | - | - |
| iNR-PhysChem | 1000 | 0.9818 | 0.9600 | - | - |