Dharm Skandh Jain (1, 2), Sanket Rajan Gupte (1), Raviprasad Aduri (3).
Abstract
RNA-protein interactions (RPI) play a pivotal role in the regulation of various biological processes. Experimental validation of RPI is time-consuming, which has motivated computational prediction methods. The major limitation of these methods has been the accuracy and confidence of their predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on high-resolution structural data of RNA-protein complexes. The minimum structural unit, consisting of five residues, is used as the descriptor. Comparative analysis against existing methods shows consistently higher performance of our method irrespective of the length of the RNA in the RPI. The method has been successfully applied to map RPI networks involving both long noncoding RNA and TERRA RNA. The method is also shown to successfully predict RNA and protein hubs present in the RPI networks of four different organisms. The robustness of this method will provide a way to predict yet-unknown RPI networks for both long noncoding RNA and microRNA.
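The abstract names a gradient boosting classifier without giving its settings. The sketch below is a generic, from-scratch illustration of gradient boosting for binary classification, not the authors' implementation: it assumes log-loss, depth-one regression stumps as weak learners, and hypothetical defaults (`n_rounds=50`, `learning_rate=0.3`). Each round fits a stump to the residuals `y - p`, which are the negative gradient of the log-loss with respect to the model's log-odds.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (feature index, threshold, value if x[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Least-squares regression stump fitted to the residuals r."""
    best: Optional[Stump] = None
    best_err = math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if err < best_err:
                best_err, best = err, (j, t, lv, rv)
    if best is None:  # no split possible: fall back to a constant stump
        mean_r = sum(r) / len(r)
        best = (0, math.inf, mean_r, mean_r)
    return best


def stump_value(stump: Stump, row: Sequence[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


class GradientBoostingSketch:
    """Hypothetical minimal booster: log-loss, stumps, fixed learning rate."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.f0 = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "GradientBoostingSketch":
        p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.f0 = math.log(p / (1 - p))  # initial log-odds of the positive class
        self.stumps = []
        F = [self.f0] * len(X)
        for _ in range(self.n_rounds):
            residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]  # negative gradient
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            F = [fi + self.learning_rate * stump_value(stump, row) for fi, row in zip(F, X)]
        return self

    def predict_proba(self, row: Sequence[float]) -> float:
        score = self.f0 + self.learning_rate * sum(stump_value(s, row) for s in self.stumps)
        return sigmoid(score)

    def predict(self, row: Sequence[float]) -> int:
        return int(self.predict_proba(row) >= 0.5)
```

On a toy one-feature dataset such as `X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]` with `y = [0, 0, 0, 1, 1, 1]`, `GradientBoostingSketch().fit(X, y)` places every stump at the threshold 2.0. In practice an established library implementation would replace this loop; the sketch only makes the residual-fitting idea concrete.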
Year: 2018 PMID: 29934510 PMCID: PMC6015049 DOI: 10.1038/s41598-018-27814-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic of the feature generation used in the current method. F0 to F1023 represent the protein features (see text), and F1024 to F2047 represent the RNA features. These two vectors are concatenated to form the final feature vector of the RPI. The different classes are color-coded for clarity. Lowercase nucleotide letters denote modified residues corresponding to that particular nucleotide.
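The feature sizes in Figure 1 are consistent with counting five-residue windows over a four-class alphabet, since 4^5 = 1024 features per molecule and 2 × 1024 = 2048 in total. The sketch below assumes that reading. The amino acid grouping is hypothetical (a generic physicochemical split, not the structure-derived classes of the paper), each nucleotide is assumed to form its own class with lowercase (modified) residues mapped to their parent nucleotide, and counts are assumed to be normalized by the number of windows.

```python
from typing import Dict, List

K = 5          # window length: the five-residue structural unit
N_CLASSES = 4  # assumed, since 4 ** 5 == 1024 features per molecule

# HYPOTHETICAL amino acid classes for illustration only.
PROTEIN_CLASSES: Dict[str, int] = {}
for cls, residues in enumerate(["AGVLIPFMWC", "STYNQ", "KRH", "DE"]):
    for aa in residues:
        PROTEIN_CLASSES[aa] = cls

# Assumed: one class per nucleotide; lowercase (modified) residues share the parent's class.
RNA_CLASSES: Dict[str, int] = {}
for cls, nt in enumerate("ACGU"):
    RNA_CLASSES[nt] = cls
    RNA_CLASSES[nt.lower()] = cls


def kmer_vector(seq: str, classes: Dict[str, int]) -> List[float]:
    """Normalized counts of class-encoded 5-mers; length N_CLASSES ** K."""
    counts = [0.0] * (N_CLASSES ** K)
    codes = [classes.get(ch) for ch in seq]
    n_windows = 0
    for start in range(len(codes) - K + 1):
        window = codes[start:start + K]
        if None in window:  # skip windows containing unclassified residues
            continue
        index = 0
        for c in window:  # read the window as a base-4 number
            index = index * N_CLASSES + c
        counts[index] += 1
        n_windows += 1
    if n_windows:
        counts = [c / n_windows for c in counts]
    return counts


def rpi_features(protein: str, rna: str) -> List[float]:
    """Protein block (F0-F1023) followed by RNA block (F1024-F2047)."""
    return kmer_vector(protein, PROTEIN_CLASSES) + kmer_vector(rna, RNA_CLASSES)
```

Windows that contain a residue outside the class tables (for example `X` in a protein sequence) are skipped rather than guessed, and sequences shorter than five residues yield an all-zero block.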
Prediction results on the test datasets (RPI2825, RPI2435, and RPI390) using ten-fold nested cross-validation.
| Dataset*/Metric$ | RPI2825 | RPI2435 | RPI390 |
|---|---|---|---|
| Accuracy | 0.943 (0.012) | 0.950 (0.015) | 0.871 (0.023) |
| Precision | 0.953 (0.009) | 0.955 (0.012) | 0.891 (0.017) |
| Recall | 0.931 (0.015) | 0.949 (0.014) | 0.843 (0.024) |
| F-Score | 0.942 (0.013) | 0.952 (0.016) | 0.865 (0.019) |
| Area under ROC curve | 0.975 (0.014) | 0.987 (0.011) | 0.914 (0.016) |
*Datasets: comprehensive (RPI2825), ribosomal (RPI2435), and nonribosomal (RPI390). Standard deviations across the ten folds are given in parentheses.
$See the Methods for definitions of the metrics.
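The Methods section is not part of this excerpt, so the sketch below assumes the standard definitions: accuracy (TP + TN) / (TP + TN + FP + FN), precision TP / (TP + FP), recall TP / (TP + FN), F-score as the harmonic mean of precision and recall, and the area under the ROC curve computed as the probability that a randomly chosen positive pair scores above a randomly chosen negative pair (ties count one half).

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-score for binary labels (1 = interacting pair)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

As a consistency check under these definitions, the RPI2825 column's mean precision (0.953) and recall (0.931) give 2 × 0.953 × 0.931 / (0.953 + 0.931) ≈ 0.942, matching the reported F-score; because each metric is averaged over folds, this identity need not hold exactly in every column.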
Comparative analysis of the performance of XRPI, RPISeq-RF, RPISeq-SVM, and RPI-Pred on RPI2241 and RPI369 datasets using ten-fold cross-validation.
| Method | RPI2241 | RPI369 |
|---|---|---|
| XRPI | 0.960 | 0.931 |
| RPISeq-RF | 0.896 | 0.762 |
| RPISeq-SVM | 0.871 | 0.728 |
| RPI-Pred | 0.84 | 0.92 |
Values are prediction accuracies. Values for RPISeq and RPI-Pred are taken as reported in [14] and [16], respectively.
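The results above rest on ten-fold cross-validation, and the first table uses a nested variant in which hyperparameters are tuned only within each outer training split. The sketch below shows one way to generate such splits. The random shuffling, the seed, the absence of class stratification, and the inner fold count (`inner_k=5`) are assumptions for illustration, not details taken from the paper.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Split]:
    """Each fold serves once as the test set; the remaining folds form the training set."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def nested_splits(
    n: int, outer_k: int = 10, inner_k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Outer splits for evaluation; inner splits (drawn from outer_train only) for tuning."""
    for outer_train, outer_test in train_test_splits(n, outer_k, seed):
        inner = [
            ([outer_train[j] for j in tr], [outer_train[j] for j in te])
            for tr, te in train_test_splits(len(outer_train), inner_k, seed + 1)
        ]
        yield outer_train, outer_test, inner
```

Because the inner splits draw only on `outer_train`, the outer test fold never influences model selection, which keeps the per-fold scores and their standard deviations from being inflated by tuning.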
Comparative analysis of the performance of XRPI, RPISeq-RF, RPISeq-SVM, RPI-Pred, and lncPro on TeloPin dataset.
| Method | Human | Mouse | Total (%) |
|---|---|---|---|
|  | 124/140 | 36/41 | 160/181 (88.4%) |
|  | 136/140 | 40/41 | 176/181 (97.2%) |
| RPISeq-SVM | 130/140 | 33/41 | 163/181 (90.1%) |
| RPISeq-RF | 68/140 | 20/41 | 88/181 (48.6%) |
| RPI-Pred | 104/140 | 31/41 | 135/181 (74.6%) |
| lncPro | 9/140 | 1/41 | 10/181 (5.5%) |
The number of correctly predicted RPIs is shown; the prediction accuracy is given in parentheses.
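Each Total entry pools the correct predictions of all species before dividing by the pooled number of interactions, rather than averaging per-species accuracies. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
from typing import Sequence, Tuple


def pooled_accuracy(correct: Sequence[int], totals: Sequence[int]) -> Tuple[int, int, float]:
    """Return (pooled correct, pooled total, accuracy in percent rounded to 0.1)."""
    n_correct, n_total = sum(correct), sum(totals)
    return n_correct, n_total, round(100 * n_correct / n_total, 1)


# RPISeq-SVM row of the TeloPin table: 130/140 human and 33/41 mouse interactions.
print(pooled_accuracy([130, 33], [140, 41]))  # (163, 181, 90.1)
```

Averaging the two per-species accuracies for the same row (130/140 and 33/41) would instead give about 86.7%, because the smaller mouse set would be weighted as heavily as the human set. The same function applies unchanged to the four-organism totals of the NPInter table.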
Comparative analysis of the performance of XRPI, RPISeq-RF, RPISeq-SVM, RPI-Pred, and lncPro on NPInter v3.0 golden dataset.
| Method |  |  |  |  | Total (%) |
|---|---|---|---|---|---|
|  | 1533/1560 | 225/237 | 18/19 | 200/204 | 1976/2020 (97.8%) |
|  | 1498/1560 | 216/237 | 16/19 | 202/204 | 1932/2020 (95.6%) |
| RPISeq-SVM | 1347/1560 | 219/237 | 17/19 | 202/204 | 1785/2020 (88.4%) |
| RPISeq-RF | 1483/1560 | 234/237 | 18/19 | 201/204 | 1936/2020 (95.8%) |
| RPI-Pred | 1529/1560 | 201/237 | 19/19 | 204/204 | 1955/2020 (96.8%) |
| lncPro | 784/1560 | 191/237 | 15/19 | 131/204 | 1121/2020 (55.5%) |
The number of correctly predicted RPIs is shown; the prediction accuracy is given in parentheses.
Figure 2. RPI networks. (A) TERRA RNA-protein interaction network from Mus musculus. (B) Xist RNA-protein interaction network from Mus musculus. (C) LincRNA-protein interaction network from Mus musculus. (D) Whole-organism RNA-protein network from Saccharomyces cerevisiae. RNAs are depicted as blue circles and proteins as red circles. Correctly predicted interactions are shown as green edges and failed predictions as black edges.
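The network panels split known interactions into correctly predicted and missed edges, and the abstract refers to RNA and protein hubs. The sketch below treats a network as a set of (RNA, protein) edges and, as an assumption since the hub criterion is not stated in this excerpt, calls a node a hub when its degree reaches a chosen cutoff (`min_degree`). Node names in the example are placeholders, not molecules from the figure.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (RNA, protein)


def split_edges(known: Iterable[Edge], predicted: Set[Edge]) -> Tuple[List[Edge], List[Edge]]:
    """Partition known interactions into predicted (green) and missed (black) edges."""
    correct: List[Edge] = []
    failed: List[Edge] = []
    for edge in known:
        (correct if edge in predicted else failed).append(edge)
    return correct, failed


def hubs(edges: Iterable[Edge], min_degree: int) -> Tuple[List[str], List[str]]:
    """RNA and protein nodes whose degree is at least min_degree (assumed hub rule)."""
    edge_list = list(edges)
    rna_degree = Counter(rna for rna, _ in edge_list)
    protein_degree = Counter(protein for _, protein in edge_list)
    return (
        sorted(n for n, d in rna_degree.items() if d >= min_degree),
        sorted(n for n, d in protein_degree.items() if d >= min_degree),
    )


known = [("rna1", "protA"), ("rna1", "protB"), ("rna1", "protC"),
         ("rna2", "protA"), ("rna3", "protA")]
predicted = {("rna1", "protA"), ("rna1", "protB"), ("rna2", "protA"), ("rna3", "protA")}
correct, failed = split_edges(known, predicted)
print(failed)            # [('rna1', 'protC')]
print(hubs(known, 3))    # (['rna1'], ['protA'])
print(hubs(correct, 3))  # ([], ['protA'])
```

Comparing the hubs of the known network with those of the correctly predicted subnetwork shows which hubs a predictor recovers; in this toy case the protein hub survives, while the RNA hub falls below the cutoff once its missed edge is removed.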