Abstract
RNA-protein interactions (RPIs) play an important role in a wide range of post-transcriptional regulation, and determining whether a given RNA-protein pair can interact is a vital prerequisite for dissecting the regulatory mechanisms of functional RNAs. Currently, expensive and time-consuming biological assays can determine only a very small portion of all RPIs, which calls for computational approaches that help biologists find candidate RPIs efficiently and correctly. Here, we combined an established algorithm, the conjoint triad feature (CTF), with another method, chaos game representation (CGR), to represent RNA-protein pairs, and developed a prediction model based on these representations and random forest (RF) classifiers. When tested on two benchmark datasets, RPI369 and RPI2241, the combined method (CTF+CGR) showed some superiority over four existing tools. In particular, on RPI2241 the CTF+CGR method improved prediction accuracy (ACC) from 0.91 (the best record among published works) to 0.95. When independently tested on a newly constructed dataset, RPI1449, which contained only experimentally validated RPIs released between 2014 and 2016, our method still showed some generalization capability, with an ACC of 0.75. Accordingly, we believe that our hybrid CTF+CGR method will be an important tool for predicting RPIs in the future.
Keywords: RNA-protein interactions; chaos game representation; conjoint triad feature; prediction; random forest
Year: 2018 PMID: 30117758 PMCID: PMC6984769 DOI: 10.1080/21655979.2018.1470721
Source DB: PubMed Journal: Bioengineered ISSN: 2165-5979 Impact factor: 3.269
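The CTF representation named in the abstract can be illustrated with a minimal Python sketch. Several details are assumptions that this record does not state: the seven-class amino-acid grouping (a grouping commonly used with conjoint triads), the use of RNA 4-mers (chosen because 4^4 = 256 matches the RNA dimension reported in the tables, just as 7^3 = 343 matches the protein dimension), the frequency normalization, and all function names.

```python
from itertools import product

# ASSUMPTION: seven-class amino-acid grouping often used with conjoint triads.
AA_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(AA_CLASSES) for aa in group}

# ASSUMPTION: RNA is encoded as 4-mers over ACGU (4**4 = 256 features).
RNA_ALPHABET = "ACGU"
RNA_KMER_INDEX = {
    "".join(p): i for i, p in enumerate(product(RNA_ALPHABET, repeat=4))
}


def normalize(counts):
    """Turn raw counts into frequencies; an all-zero vector stays all zero."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def protein_ctf(seq):
    """343-dim vector of class-triad frequencies for a protein sequence.

    Residues outside the 20 standard amino acids are dropped before
    triads are formed (a simplification of this sketch).
    """
    classes = [AA_TO_CLASS[a] for a in seq.upper() if a in AA_TO_CLASS]
    counts = [0] * (7 ** 3)
    for i in range(len(classes) - 2):
        a, b, c = classes[i:i + 3]
        counts[a * 49 + b * 7 + c] += 1
    return normalize(counts)


def rna_kmer(seq):
    """256-dim vector of 4-mer frequencies for an RNA sequence."""
    seq = seq.upper().replace("T", "U")
    counts = [0] * len(RNA_KMER_INDEX)
    for i in range(len(seq) - 3):
        j = RNA_KMER_INDEX.get(seq[i:i + 4])
        if j is not None:  # skip 4-mers containing ambiguous bases
            counts[j] += 1
    return normalize(counts)


def pair_features(protein, rna):
    """Concatenate protein and RNA vectors: 343 + 256 = 599 values."""
    return protein_ctf(protein) + rna_kmer(rna)
```

For any protein and RNA sequence, `pair_features` returns a 599-value vector, which matches the CTF dimension listed in the results tables; such a vector could then be passed to a random forest classifier.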
Results in predicting RPIs on RPI369 dataset (10-fold cross-validation test).
| Feature set | Dim | Sens | Spec | ACC | MCC | AUC | ntree | mtry |
|---|---|---|---|---|---|---|---|---|
| AAC+NC | 20 + 4 = 24 | 0.6856 | 0.7073 | 0.6965 | 0.3930 | 0.7011 | 372 | 17 |
| CTF | 343 + 256 = 599 | 0.8211 | 0.7696 | 0.7954 | 0.5916 | 0.8295 | 487 | 476 |
| CGR | 24 + 16 = 40 | 0.7019 | 0.7317 | 0.7168 | 0.4338 | 0.7559 | 338 | 15 |
| CTF+CGR | 599 + 40 = 639 | 0.8211 | 0.7778 | 0.7995 | 0.5995 | 0.7842 | 489 | 442 |
| CTF+CGR+AAC+NC | 599 + 40 + 20 + 4 = 663 | 0.7466 | 0.8010 | 0.7724 | 0.5500 | 0.8198 | 327 | 142 |
Results in predicting RPIs on RPI2241 dataset (10-fold cross-validation test).
| Feature set | Dim | Sens | Spec | ACC | MCC | AUC | ntree | mtry |
|---|---|---|---|---|---|---|---|---|
| AAC+NC | 20 + 4 = 24 | 0.7964 | 0.8298 | 0.8134 | 0.6268 | 0.8791 | 437 | 10 |
| CTF | 343 + 256 = 599 | 0.8415 | 0.8568 | 0.8492 | 0.6984 | 0.9163 | 426 | 406 |
| CGR | 24 + 16 = 40 | 0.7964 | 0.8659 | 0.8316 | 0.6643 | 0.8867 | 422 | 9 |
| CTF+CGR | 599 + 40 = 639 | 0.9192 | 0.9848 | 0.9520 | 0.9060 | 0.9722 | 482 | 104 |
| CTF+CGR+AAC+NC | 599 + 40 + 20 + 4 = 663 | 0.8405 | 0.8667 | 0.8536 | 0.7073 | 0.9163 | 385 | 306 |
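The Sens, Spec, ACC and MCC columns in the two tables above are standard binary-classification measures. The sketch below shows the usual definitions computed from confusion-matrix counts. The function name and the example counts are hypothetical, and the paper's own evaluation code is not part of this record.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard sensitivity, specificity, accuracy and MCC from raw counts."""
    sens = tp / (tp + fn)                    # true positive rate
    spec = tn / (tn + fp)                    # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)    # fraction of correct calls
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sens": sens, "Spec": spec, "ACC": acc, "MCC": mcc}


# Hypothetical counts, for illustration only (not taken from the paper).
example = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```

AUC is not covered here because it needs the classifier's ranked scores, not just the four counts.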
Figure 1. ROC curves of five groups of features on RPI369 (A) and RPI2241 (B).
Comparisons with four existing tools.
| Tools | RPI369 ACC | RPI369 MCC | RPI369 AUC | RPI2241 ACC | RPI2241 MCC | RPI2241 AUC |
|---|---|---|---|---|---|---|
| Muppirala et al. [ | 0.76 | – | – | 0.90 | – | – |
| Wang et al. [ | 0.77 | 0.46 | – | 0.76 | 0.42 | – |
| RPI-Pred [ | 0.92 | – | 0.95 | 0.84 | – | 0.89 |
| rpiCOOL [ | 0.80 | 0.60 | 0.88 | 0.91 | 0.81 | 0.97 |
| Our method | 0.80 | 0.60 | 0.78 | 0.95 | 0.91 | 0.97 |
Independent testing dataset and predicting result.
| Data sources | RNA-protein complexes in PDB database | Independent testing dataset RPI1449 | Comparisons of predicting results: Muppirala et al. [ | Comparisons of predicting results: our test result |
|---|---|---|---|---|
| 2014 | 378 | 1449 RNA-protein pairs after preprocessing | ACC: 1042/1449 = 0.7191 | ACC: 1092/1449 = 0.7536 |
| 2015 | 221 | | | |
| 2016 | 250 | | | |
| Total | 849 | | | |
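The arithmetic in the independent-test table can be rechecked with a few lines of Python. All numbers are copied from the table; only the variable names are introduced here.

```python
TOTAL_PAIRS = 1449  # RNA-protein pairs in RPI1449 after preprocessing

# Correctly predicted pairs, as listed in the table.
correct = {"Muppirala et al.": 1042, "Our test result": 1092}
accuracy = {name: n / TOTAL_PAIRS for name, n in correct.items()}

for name, acc in accuracy.items():
    print(f"{name}: ACC = {acc:.4f}")
# Muppirala et al.: ACC = 0.7191
# Our test result: ACC = 0.7536

# RNA-protein complexes in the PDB database by year.
complexes_per_year = {2014: 378, 2015: 221, 2016: 250}
total_complexes = sum(complexes_per_year.values())
print(total_complexes)  # 849
```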
Figure 2. CTF picture of protein.
Figure 3. CGR picture of protein. The segments are labelled serially with numbers 1-24.
Figure 4. CTF picture of RNA.
Figure 5. CGR picture of RNA. The segments are labelled serially with numbers 1-16.
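For readers unfamiliar with chaos game representation, the generic technique for an RNA sequence can be sketched as follows. This is not the authors' exact segmentation, which this record does not describe. The corner assignment, the starting point, and the 4 x 4 grid (chosen only because it yields 16 values, the same count as the segments in Figure 5) are all assumptions, and the 24-segment protein variant is not attempted.

```python
# ASSUMPTION: one common corner layout for the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}


def cgr_points(rna):
    """Chaos game: move halfway toward the corner of each successive base."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in rna.upper().replace("T", "U"):
        if base not in CORNERS:
            continue  # skip ambiguous bases
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid_features(rna, grid=4):
    """Fraction of CGR points in each cell of a grid x grid partition."""
    points = cgr_points(rna)
    counts = [0] * (grid * grid)
    for x, y in points:
        col = min(int(x * grid), grid - 1)  # clamp guards against x == 1.0
        row = min(int(y * grid), grid - 1)
        counts[row * grid + col] += 1
    n = len(points)
    return [c / n for c in counts] if n else [0.0] * (grid * grid)
```

Calling `cgr_grid_features` on an RNA sequence returns 16 frequencies that sum to 1 whenever the sequence contains at least one recognized base.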