Yu-Feng Huang, Li-Yuan Chiu, Chun-Chin Huang, Chien-Kang Huang.
Abstract
BACKGROUND: RNA-binding proteins (RBPs) play crucial roles in the post-transcriptional control of RNA. RBPs efficiently recognize specific RNA sequences once the RNA has been transcribed from DNA. To satisfy diverse functional requirements, RBPs are composed of multiple RNA-binding domains (RBDs) presented in various structural arrangements. The ability to computationally predict the RNA-binding residues in an RNA-binding protein can help biologists select important sites for site-directed mutagenesis in wet-lab experiments.
Year: 2010 PMID: 21143803 PMCID: PMC3005934 DOI: 10.1186/1471-2164-11-S4-S2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Prediction performance evaluated by 5-fold cross-validation on the training dataset, RB147
| Predictors | Sensitivity | Specificity | Precision | Accuracy | MCC | F-score | F0.5-score |
|---|---|---|---|---|---|---|---|
| ProteRNASVM | 38.85% ± 0.46% | 97.01% ± 0.09% | 75.99% ± 0.48% | 85.93% ± 0.08% | 0.4732 ± 0.0036 | 0.5170 ± 0.0040 | 0.6343 ± 0.0034 |
| ProteRNAWildSpan | 12.28% | 96.26% | 43.60% | 80.27% | 0.1489 | 0.1916 | 0.2887 |
| ProteRNA | 44.84% ± 0.37% | 93.56% ± 0.09% | 62.10% ± 0.25% | 84.28% ± 0.06% | 0.4378 ± 0.0027 | 0.5208 ± 0.0027 | 0.5766 ± 0.0022 |
Statistics of RNA-binding residues in the training dataset, RB147
| RNA | Number of RNA-binding residues | Total number of residues | Ratio of RNA-binding residues |
|---|---|---|---|
| rRNA | 3916 | 10267 | 38.14% |
| mRNA | 256 | 1878 | 13.63% |
| tRNA | 1230 | 12401 | 9.92% |
| others | 755 | 7778 | 9.71% |
| Total | 6157 | 32324 | 19.05% |
Prediction performance breakdown in terms of the categories of RNA using the training dataset, RB147
| Predictor | RNA | TP | FP | TN | FN | Sensitivity | Specificity | Precision | Accuracy | MCC | F-score | F0.5-score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ProteRNASVM | rRNA | 2060 | 537 | 5814 | 1856 | 52.60% | 91.54% | 79.32% | 76.69% | 0.4933 | 0.6326 | 0.7201 |
| | mRNA | 27 | 16 | 1606 | 229 | 10.55% | 99.01% | 62.79% | 86.95% | 0.2193 | 0.1806 | 0.3154 |
| | tRNA | 234 | 171 | 11000 | 996 | 19.02% | 98.47% | 57.78% | 90.59% | 0.2942 | 0.2862 | 0.4105 |
| | others | 109 | 93 | 6930 | 646 | 14.44% | 98.68% | 53.96% | 90.50% | 0.2441 | 0.2278 | 0.3487 |
| | Total | 2430 | 823 | 25344 | 3727 | 39.47% | 96.86% | 74.70% | 85.92% | 0.4741 | 0.5165 | 0.6338 |
| ProteRNAWildSpan | rRNA | 554 | 412 | 5939 | 3362 | 14.15% | 93.51% | 57.35% | 63.24% | 0.1274 | 0.2270 | 0.3560 |
| | mRNA | 67 | 121 | 1501 | 189 | 26.17% | 92.54% | 35.64% | 83.49% | 0.2139 | 0.3018 | 0.3323 |
| | tRNA | 50 | 173 | 10998 | 1180 | 4.07% | 98.45% | 22.42% | 89.09% | 0.0566 | 0.0688 | 0.1178 |
| | others | 85 | 272 | 6751 | 670 | 11.26% | 96.13% | 23.81% | 87.89% | 0.1045 | 0.1529 | 0.1947 |
| | Total | 756 | 978 | 25189 | 5401 | 12.28% | 96.26% | 43.60% | 80.27% | 0.1489 | 0.1916 | 0.2887 |
| ProteRNA | rRNA | 2256 | 878 | 5473 | 1660 | 57.61% | 86.18% | 71.98% | 75.28% | 0.4618 | 0.6400 | 0.6856 |
| | mRNA | 89 | 138 | 1484 | 167 | 34.77% | 91.49% | 39.21% | 83.76% | 0.2764 | 0.3685 | 0.3823 |
| | tRNA | 238 | 304 | 10867 | 992 | 19.35% | 97.28% | 43.91% | 89.55% | 0.2431 | 0.2686 | 0.3502 |
| | others | 177 | 366 | 6657 | 578 | 23.44% | 94.79% | 32.60% | 87.86% | 0.2118 | 0.2727 | 0.3024 |
| | Total | 2760 | 1686 | 24481 | 3397 | 44.83% | 93.56% | 62.08% | 84.28% | 0.4376 | 0.5206 | 0.5764 |
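The "Total" rows in this breakdown appear to be computed from counts pooled across the four RNA categories, not by averaging the per-category percentages. The following minimal sketch illustrates the difference for the ProteRNA rows. The counts are copied from the table; the variable and function names are illustrative assumptions, not part of the original work.

```python
# Confusion counts for ProteRNA on RB147, copied from the breakdown table.
counts = {
    "rRNA":   {"TP": 2256, "FP": 878, "TN": 5473,  "FN": 1660},
    "mRNA":   {"TP": 89,   "FP": 138, "TN": 1484,  "FN": 167},
    "tRNA":   {"TP": 238,  "FP": 304, "TN": 10867, "FN": 992},
    "others": {"TP": 177,  "FP": 366, "TN": 6657,  "FN": 578},
}

def sensitivity(c):
    """Fraction of true RNA-binding residues that were predicted as binding."""
    return c["TP"] / (c["TP"] + c["FN"])

# Pool the counts over all categories (this reproduces the "Total" row).
pooled = {k: sum(c[k] for c in counts.values()) for k in ("TP", "FP", "TN", "FN")}

pooled_sens = sensitivity(pooled)
# Unweighted mean of the per-category sensitivities, shown for contrast.
macro_sens = sum(sensitivity(c) for c in counts.values()) / len(counts)

print(f"pooled sensitivity: {pooled_sens:.2%}")
print(f"mean of per-category sensitivities: {macro_sens:.2%}")
```

Because rRNA contributes most of the binding residues and is also the best-predicted category, the pooled sensitivity is noticeably higher than the unweighted mean over categories.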
Comparison of ProteRNA with other predictors using the independent testing dataset, RB33
| Predictor* | TP | FP | TN | FN | Sensitivity | Specificity | Precision | Accuracy | MCC | F-score | F0.5-score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ProteRNA | 222 | 340 | 8563 | 660 | 25.17% | 96.18% | 39.50% | 89.78% | 0.2628 | 0.3075 | 0.3546 |
| PiRaNhA | 265 | 538 | 8365 | 617 | 30.05% | 93.96% | 33.00% | 88.20% | 0.2504 | 0.3145 | 0.3236 |
| Pprint | 447 | 1782 | 7121 | 435 | 50.68% | 79.98% | 20.05% | 77.34% | 0.2094 | 0.2873 | 0.2281 |
| BindN | 348 | 1613 | 7290 | 534 | 39.46% | 81.88% | 17.75% | 78.06% | 0.1527 | 0.2449 | 0.1994 |
| PRIP | 131 | 835 | 8068 | 751 | 14.85% | 90.62% | 13.56% | 83.79% | 0.0526 | 0.1418 | 0.1380 |
*Ordered by MCC.
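The reported metrics can be recomputed from the TP, FP, TN, and FN counts. The sketch below uses the standard textbook definitions of each measure, which is an assumption, since the formulas are not given in this excerpt. The function name `binary_metrics` is hypothetical. Applied to the ProteRNA row on RB33, these definitions agree with the tabulated values to the reported precision.

```python
from math import sqrt

def binary_metrics(tp, fp, tn, fn, beta=0.5):
    """Standard binary-classification metrics from confusion counts.

    Assumes every denominator is non-zero (no guard for empty classes).
    """
    sens = tp / (tp + fn)                   # sensitivity (recall)
    spec = tn / (tn + fp)                   # specificity
    prec = tp / (tp + fp)                   # precision
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    f1 = 2 * prec * sens / (prec + sens)    # F-score (beta = 1)
    b2 = beta ** 2
    # F-beta; beta = 0.5 weights precision more heavily than recall
    f_beta = (1 + b2) * prec * sens / (b2 * prec + sens)
    return {
        "sensitivity": sens, "specificity": spec, "precision": prec,
        "accuracy": acc, "mcc": mcc, "f_score": f1, "f_beta": f_beta,
    }

# ProteRNA on the independent testing dataset RB33 (counts from the table).
m = binary_metrics(tp=222, fp=340, tn=8563, fn=660)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The F0.5-score favors precision over sensitivity, which is consistent with ProteRNA ranking first on that measure despite its lower sensitivity than Pprint or BindN.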
Comparison of the top 10 ranking predictions with results from other predictors
| Rank | ProteRNA | PiRaNhA | Pprint | BindN | PRIP |
|---|---|---|---|---|---|
| (a) Rank by MCC | |||||
| 1 | 2PJP_A | ||||
| 2 | |||||
| 3 | 2PJP_A | 2HYI_D | |||
| 4 | 2NQP_B | ||||
| 5 | 2GYA_3 | 2IY5_A | |||
| 6 | |||||
| 7 | 2I82_C | 2G8K_A | 2J0Q_A | 2I82_C | |
| 8 | 2OZB_B | 2IPY_B | 2V47_C | ||
| 9 | 2V47_C | 2DR2_A | 2HVR_A | 2GJE_A | |
| 10 | 2DR2_A | 2GJE_D | 2QKK_F | 2GTT_G | 2JEA_B |
| MCC of Rank 1 | 0.6668 | 0.6415 | 0.6006 | 0.4364 | 0.5521 |
| MCC of Rank 10 | 0.3063 | 0.2719 | 0.2390 | 0.1951 | 0.0517 |
| (b) Rank by precision | |||||
| 1 | 2Q66_A | ||||
| 2 | 2PJP_A | ||||
| 3 | |||||
| 4 | |||||
| 5 | 2OZB_B | ||||
| 6 | |||||
| 7 | 2J0Q_A | 2IY5_A | |||
| 8 | |||||
| 9 | |||||
| 10 | 2G8K_A | 2Q66_A | 2GJE_A | 2GJE_A | |
| Precision of Rank 1 | 100.00% | 100.00% | 76.92% | 76.47% | 75.00% |
| Precision of Rank 10 | 50.00% | 35.71% | 25.00% | 24.00% | 13.33% |
Values in bold indicate listing in the top 10 by at least 5 predictors.
Values in bold and italics indicate listing in the top 10 by at least 4 predictors.
Figure 1. Case study. Residues colored green, red, and blue represent true positives, false positives, and false negatives, respectively. (a) RNA-binding residues predicted by ProteRNA. (b) RNA-binding residues predicted by PiRaNhA.
Figure 2. Case study on RluA (PDBID 2I82C). Residues colored green, red, and blue represent true positives, false positives, and false negatives, respectively. (a) RNA-binding residues predicted by ProteRNA. (b) RNA-binding residues predicted by PiRaNhA.
Datasets for ProteRNA
| (a) Training dataset - RB147 | |||||||
|---|---|---|---|---|---|---|---|
| 1A34_A | 1A9N_A | 1APG_A | 1ASY_A | 1AV6_A | 1B23_P | 1B2M_A | 1C0A_A |
| 1DDL_A | 1DFU_P | 1DI2_A | 1E8O_A | 1EC6_A | 1EIY_B | 1F7U_A | 1FEU_A |
| 1FFY_A | 1FJG_B | 1FJG_C | 1FJG_D | 1FJG_E | 1FJG_G | 1FJG_I | 1FJG_J |
| 1FJG_K | 1FJG_L | 1FJG_M | 1FJG_N | 1FJG_P | 1FJG_Q | 1FJG_S | 1FJG_T |
| 1FJG_V | 1G1X_A | 1G1X_B | 1G1X_C | 1G2E_A | 1GTF_Q | 1H2C_A | 1H3E_A |
| 1H4S_A | 1HQ1_A | 1HRO_W | 1I6U_A | 1J1U_A | 1J2B_A | 1JBR_A | 1JID_A |
| 1K8W_A | 1KNZ_A | 1KQ2_A | 1LAJ_A | 1LNG_A | 1M5O_C | 1M8V_A | 1M8X_A |
| 1MZP_A | 1N35_A | 1N78_A | 1NB7_A | 1OOA_A | 1PGL_2 | 1Q2S_A | 1QF6_A |
| 1QTQ_A | 1R3E_A | 1RMV_A | 1RPU_A | 1SDS_A | 1SER_A | 1SI3_A | 1T0K_B |
| 1TFW_A | 1U0B_B | 1UN6_B | 1UVJ_A | 1VFG_A | 1VQO_1 | 1VQO_2 | 1VQO_3 |
| 1VQO_A | 1VQO_B | 1VQO_C | 1VQO_D | 1VQO_E | 1VQO_G | 1VQO_H | 1VQO_I |
| 1VQO_J | 1VQO_K | 1VQO_L | 1VQO_M | 1VQO_N | 1VQO_P | 1VQO_Q | 1VQO_R |
| 1VQO_S | 1VQO_T | 1VQO_U | 1VQO_V | 1VQO_W | 1VQO_X | 1VQO_Y | 1VQO_Z |
| 1W2B_5 | 1WNE_A | 1WPU_A | 1WSU_A | 1WZ2_A | 1Y69_8 | 1Y69_K | 1Y69_U |
| 1YVP_A | 1YZ9_A | 1ZH5_A | 2A1R_A | 2A8V_A | 2ASB_A | 2AVY_F | 2AVY_U |
| 2AW4_0 | 2AW4_1 | 2AW4_2 | 2AW4_3 | 2AW4_D | 2AW4_E | 2AW4_G | 2AW4_H |
| 2AW4_J | 2AW4_L | 2AW4_N | 2AW4_P | 2AW4_Q | 2AW4_R | 2AW4_S | 2AW4_Y |
| 2AW4_Z | 2AZ0_A | 2BGG_A | 2BH2_A | 2BTE_A | 2BU1_A | 2BX2_L | 2CT8_A |
| 2D3O_1 | 2D3O_S | 2FMT_A | |||||
| (b) Independent Testing Dataset - RB33 | |||||||
| 1VS8_O | 2D6F_D | 2DB3_C | 2DER_B | 2DR2_A | 2DU3_A | 2F8S_A | 2FK6_A |
| 2G4B_A | 2G8K_A | 2GJE_A | 2GJE_D | 2GJW_C | 2GTT_G | 2GYA_3 | 2HVR_A |
| 2HYI_D | 2I82_C | 2IPY_B | 2IX1_A | 2IY5_A | 2J0Q_A | 2JEA_A | 2JEA_B |
| 2NQP_B | 2OZB_B | 2PJP_A | 2PY9_C | 2Q66_A | 2QAM_Z | 2QBE_T | 2QKK_F |
| 2V47_C | |||||||
Figure 4. An outline of RNA-binding residue prediction by the SVM-based classifier.